Comments (3)
Hi,
I did some benchmarks in our own environment (2 nodes, each with one Nvidia V100 GPU, 100 Gbps network, torch 1.3.1, ResNet20, CIFAR-10, batch_size=256). Here are the per-epoch training times for different compressors:
- NoneCompressor(allreduce): 11.03 s
- TopKCompressor(0.01)(allgather): 11.35 s
- QSGDCompressor(127)(allgather): 11.91 s
- YourCode(0.01)(allgather): 14.11 s
It looks like the time cost of your code is much higher than that of either TopK or QSGD.
Then I added a simple loop in TopKCompressor and QSGDCompressor to simulate high computation overhead:
| Number of loops | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| TopKCompressor(0.01)(allgather) | 11.35 | 12.80 | 13.25 | 13.60 |
| QSGDCompressor(127)(allgather) | 11.91 | 15.32 | 18.87 | 22.46 |
The training time grows roughly linearly with the computation overhead, so we can derive the per-loop computation cost for TopK and QSGD in this case: about 0.45 s and 3.4 s per epoch of training, respectively. Looking back at the time cost of your code, it makes more sense now. I think the reason a single TopK or QSGD run doesn't show a high time cost is probably the benefit of reduced communication; as the computation overhead increases, the communication savings become negligible.
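The per-loop cost above can be recovered from the table with a simple least-squares slope fit over epoch time vs. number of loops. A minimal sketch (the helper and variable names are mine, not from GRACE):

```python
# Derive the per-loop compute cost from the measured epoch times above
# with a least-squares slope fit (seconds per epoch per additional loop).
loops = [1, 2, 3, 4]  # number of simulated compression loops per step

epoch_times = {
    "TopKCompressor(0.01)": [11.35, 12.80, 13.25, 13.60],
    "QSGDCompressor(127)":  [11.91, 15.32, 18.87, 22.46],
}

def slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

for name, times in epoch_times.items():
    print(f"{name}: ~{slope(loops, times):.2f} s compute per epoch per loop")
```

For QSGD the fitted slope is close to the 3.4 s quoted above; for TopK the first data point is a bit off the trend, so the fit lands somewhat above 0.45 s.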
from grace.
Thanks for your reply.
I also ran some experiments on GRACE (changing the `tensors_size_are_same` flag in the Compressor `__init__`), and found that the time cost is much higher when `tensors_size_are_same` is set to False. So I think handling gradient tensors of different sizes brings higher computational overhead.
Indeed. When `tensors_size_are_same` is set to False, GRACE needs an additional allgather for the sizes of the tensors; that's why it is slower.
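The extra round exists because an allgather needs every rank to know its receive-buffer size in advance; when compressed tensors differ in length across ranks, the sizes themselves must be gathered first. A single-process toy sketch of the two-phase pattern (no real `torch.distributed`; the toy `allgather` is my stand-in):

```python
# Toy model: every rank contributes a buffer, and every rank receives the
# concatenation of all contributions (as a real flat allgather would).
def allgather(contributions):
    flat = [x for c in contributions for x in c]
    return [list(flat) for _ in contributions]

# Each rank holds a compressed gradient of a different length.
per_rank_tensors = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]

# Phase 1 (the extra communication): gather each rank's tensor size.
sizes_everywhere = allgather([[len(t)] for t in per_rank_tensors])

# Phase 2: gather the payloads themselves.
flat_everywhere = allgather(per_rank_tensors)

# Rank 0's view: split its flat buffer back into per-rank tensors
# using the sizes learned in phase 1.
sizes, flat = sizes_everywhere[0], flat_everywhere[0]
out, offset = [], 0
for s in sizes:
    out.append(flat[offset:offset + s])
    offset += s
print(out)  # each rank recovers [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
```

When all tensors have the same known size, phase 1 can be skipped entirely, which is what `tensors_size_are_same=True` exploits.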
from grace.