Comments (14)
Is there any update on this?
from gradientaccumulator.
Hello, @Bidski! :]
> Is there any update on this?

I made an attempt at it before Christmas, but adding proper support for it with my current approach turned out to be more challenging than I had time for.

However, if you are able to test this feature very soon, I could make an attempt today. Also, what is your use case?
Hi @andreped
I won't be able to test until Monday, but I would be interested in seeing this working
> I won't be able to test until Monday, but I would be interested in seeing this working
No worries, that just means that I could make an attempt during the weekend instead. Will keep you updated on the feature. Stay tuned :]
Will make an attempt at this now, @Bidski and @innat.
@innat, please provide a gist, if you had one.
EDIT: I just noticed that I once made a test script, which is already part of the tests. Does this script reproduce your issue properly and represent a valid use case?
Just made an attempt now, and I keep running into the same issues. Essentially, this means that the multi-GPU strategy does not work with the `train_step` overload approach.
I'm quite preoccupied with finalizing my PhD work and I don't see that I have time to debug this further, as I myself don't use multiple GPUs simultaneously in my work. Perhaps anyone else could make an attempt? Are you interested in making an attempt, @innat?
I tried using the class provided here which should be useful for handling resources across replicas, and then looking here to see how it is used in a custom pipeline. But I warn you, it is quite the rabbit hole...
Couldn't help myself... Got a little further, @innat.
I added a GAModelWrapperV2 that is compatible with `tf.distribute.MirroredStrategy()`. At least, when running a simple test, it does not crash immediately (as the previous implementation did with the same strategy), and memory seems to be allocated across multiple GPUs.
However, the distribution of "mini"-batches is not done optimally. I assume that you want to split a batch into `k` smaller batches, distribute these across `k` GPUs, then catch the gradients from each GPU and accumulate these before the update. When doing it "optimally", I keep getting this error, but hopefully I can find a solution to it:
```
RuntimeError: Method requires being in cross-replica context, use get_replica_context().merge_call()
```
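The accumulate-then-update scheme described above (split a batch into `k` mini-batches, gather each replica's gradient, and average before a single update) can be sanity-checked without TensorFlow. Below is a framework-agnostic NumPy sketch on a toy linear-regression loss; `grad`, `X`, `y`, and the batch sizes are illustrative names for this sketch, not part of GradientAccumulator:

```python
import numpy as np

# Toy linear-regression loss: L(w) = mean((X @ w - y)**2)
# Gradient: dL/dw = 2/n * X.T @ (X @ w - y)
def grad(w, X, y):
    n = len(y)
    return 2.0 / n * X.T @ (X @ w - y)

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = rng.normal(size=8)
w = np.zeros(3)

# Full-batch gradient over all 8 samples.
g_full = grad(w, X, y)

# Split the batch into k = 4 mini-batches of 2 (one per "replica"),
# accumulate each replica's gradient, and average before the update.
k = 4
g_accum = np.zeros_like(w)
for Xs, ys in zip(np.split(X, k), np.split(y, k)):
    g_accum += grad(w, Xs, ys)
g_accum /= k

# Equal-sized mini-batches: the averaged accumulated gradient matches
# the full-batch gradient exactly.
assert np.allclose(g_full, g_accum)
```

Because the mini-batches are equal-sized, averaging the `k` per-replica mean gradients reproduces the full-batch gradient exactly, which is the invariant a multi-GPU accumulator needs to preserve.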
Any progress can be observed from the multi-gpu branch here. See here for the new model wrapper and here for a simple test script.
EDIT: After further inspection, I'm not really sure the computation actually runs on both GPUs. It might be that it "worked" just because only one GPU was used. I don't have time to debug this further, but perhaps someone else has time?
> EDIT: I just observed that I made a test script once, that is already part of tests. Should this script reproduce your issue properly and represent a valid use case?
I think, according to What should be in scope and what should be outside?, you should move the optimizer (lines 55-56) inside of the scope? Otherwise, I think that test should cover my use case. I also use `tf.distribute.ReductionToOneDevice` with `tf.distribute.MirroredStrategy`, but I'm not sure if this would have any significant impact on how this wrapper will operate.
Was just having a look at the tensorflow source code to see the interaction between `Model.fit()` and `Model.train_step()`, and I came across the `steps_per_execution` argument for `Model.compile` and this bit of code in `Model.make_train_function`.

This almost seems like a setup for gradient accumulation, if I'm not entirely delusional? Maybe a combination of `steps_per_execution == accum_steps` and a custom `train_step` function might be the way to go?
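For what it's worth, `steps_per_execution` mainly controls how many train steps are compiled into a single `tf.function` call for performance; each step still applies its own optimizer update, so on its own it is not gradient accumulation. A toy plain-Python sketch of the difference (function names are made up for this illustration, not TensorFlow API):

```python
# Toy illustration (plain Python, no TensorFlow): why steps_per_execution
# alone is not gradient accumulation.

def run_with_steps_per_execution(grads, lr=0.25):
    # steps_per_execution batches N train steps into one call, but each
    # step still applies its own weight update immediately.
    w = 0.0
    for g in grads:
        w -= lr * g          # one update per step
    return w

def run_with_accumulation(grads, lr=0.25):
    # Gradient accumulation defers the update: gradients are summed and
    # averaged, and a single update is applied at the end.
    w = 0.0
    g_accum = sum(grads) / len(grads)
    w -= lr * g_accum        # one update per accum_steps steps
    return w

grads = [1.0, 2.0, 3.0, 4.0]
print(run_with_steps_per_execution(grads))  # -2.5 (four separate updates)
print(run_with_accumulation(grads))         # -0.625 (one averaged update)
```

So a custom `train_step` that buffers gradients would still be needed on top of `steps_per_execution` to get true accumulation semantics.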
> > EDIT: I just observed that I made a test script once, that is already part of tests. Should this script reproduce your issue properly and represent a valid use case?
>
> I think, according to What should be in scope and what should be outside?, you should move the optimizer (lines 55-56) inside of the scope? Otherwise, I think that test should cover my use case. I also use `tf.distribute.ReductionToOneDevice` with `tf.distribute.MirroredStrategy`, but I'm not sure if this would have any significant impact on how this wrapper will operate
Made a new and improved test script here:
https://github.com/andreped/GradientAccumulator/blob/d2eeee307eefd11182342045622a7fce03319ba5/tests/test_multi_gpu_benchmark.py
I have also tried splitting the mini-batch into reduced mini-batches and distributing these, but it does not seem to be working:
https://github.com/andreped/GradientAccumulator/blob/multi-gpu/gradient_accumulator/GAModelWrapperV2.py#L62
If you have time, you can try to explore this further, but I am quite limited with time for the next week or so. Note that all these changes have been made on a separate branch, multi-gpu.
> Was just having a look at the tensorflow source code to see the interaction between `Model.fit()` and `Model.train_step()` and I came across the `steps_per_execution` argument for `Model.compile` and this bit of code in `Model.make_train_function`.
I often have a hard time understanding the docs. I'm not sure `steps_per_execution` is what we want, but you are free to explore that further. I'm happy to follow up on any attempt you make. That said, I think we might run into the same multi-GPU issue as we are observing here anyway, but I'm not sure.
@Bidski Just mentioning that a new release has been added which adds experimental support for optimizer wrapping, similar to what was common in TF1. All optimizers are supported; however, dynamic optimizers such as Adam show strange behaviour (results too far away from regular batch training). SGD works great, though.

With this new approach, it should be possible to add multi-GPU support much more easily. I will update you tomorrow, when I have made a new attempt at it with this new approach.
The latest release can be found here.
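For anyone curious what the optimizer-wrapping approach looks like in principle: below is a toy, framework-free Python sketch (not the actual GradientAccumulator implementation) of wrapping an optimizer so gradients are buffered for `accum_steps` calls and then applied once, averaged. All class and variable names here are made up for the sketch:

```python
class ToySGD:
    """Minimal stand-in for a real optimizer (illustrative only)."""
    def __init__(self, lr=0.5):
        self.lr = lr

    def apply_gradients(self, grads_and_vars):
        for g, v in grads_and_vars:
            v["value"] -= self.lr * g

class AccumOptimizerWrapper:
    """Wraps any optimizer: buffers gradients for `accum_steps` calls,
    then applies the averaged gradient in a single update."""
    def __init__(self, optimizer, accum_steps):
        self.optimizer = optimizer
        self.accum_steps = accum_steps
        self.step = 0
        self.buffer = {}

    def apply_gradients(self, grads_and_vars):
        grads_and_vars = list(grads_and_vars)
        # Accumulate gradients per variable instead of applying them.
        for g, v in grads_and_vars:
            self.buffer[id(v)] = self.buffer.get(id(v), 0.0) + g
        self.step += 1
        # Every accum_steps calls, apply the averaged gradient once.
        if self.step % self.accum_steps == 0:
            averaged = [(self.buffer[id(v)] / self.accum_steps, v)
                        for _, v in grads_and_vars]
            self.optimizer.apply_gradients(averaged)
            self.buffer.clear()

w = {"value": 1.0}
opt = AccumOptimizerWrapper(ToySGD(lr=0.5), accum_steps=4)
for g in [1.0, 2.0, 3.0, 4.0]:
    opt.apply_gradients([(g, w)])
# One update with the averaged gradient 2.5: w = 1.0 - 0.5 * 2.5 = -0.25
print(w["value"])  # -0.25
```

The appeal over the `train_step` overload is that the wrapper only intercepts `apply_gradients`, so the training loop (and, in principle, the distribution strategy) stays untouched.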
Seems like multi-GPU is not working as intended, even with the OptimizerWrapper. It seems to work just fine with one GPU, though.

If anyone wishes to debug this further, see this notebook I made public on Kaggle, which enables you to run tests with two GPUs for free: https://www.kaggle.com/code/andreped/grad-accum-multi-gpu?scriptVersionId=117764939
Optimizer wrapper is finally working with multi-GPU training!
Fixed in 47a51f3.
@Bidski Just letting you know that multi-gpu support has been officially added in the latest release v0.5.0.
Should work out-of-the-box with the optimizer wrapper. For the model wrapper, I have added experimental support, which only works for the SGD optimizer. Nonetheless, I believe the optimizer wrapper should fit your needs.
Let me know how it works and if you experience any issues using it :]