Comments (2)
Hi @lonnietc,
Please take a look at our publications that contain experiments for different algorithms implemented in hivemind: https://github.com/learning-at-home/hivemind#citation (take a look at the newest papers in "Additional publications" too). Hope it helps!
@justheuristic @mryab cc-ing you in case you have anything else to say.
Hivemind has several components that have different scaling properties.
For instance, hivemind.dht.DHT scales to 8192 nodes more or less seamlessly - and could probably go larger if we had the RAM (and patience) to test it.
In turn, hivemind.Optimizer requires some tweaking to go beyond 256 nodes - different averaging timeouts and/or groups. The only time (to my knowledge) we tested it with more than 1k nodes it required multiple averaging groups as in this paper.
As for hivemind.moe, its scaling properties depend on the network design. A model with multiple smaller MoE layers scales to more nodes than one big MoE. A 2d grid scales better than a 1d grid. I'd hazard a guess that a single MoE layer can scale to thousands of nodes with some tinkering (grid, beam search params), but I haven't ever done that.
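To make the 1d vs 2d grid point concrete, here is a rough back-of-the-envelope sketch. It assumes the gating function produces one score per index along each grid dimension, so that a grid addresses the product of its dimensions in experts while only computing the sum of its dimensions in scores. The helper names `grid_capacity` and `gating_outputs` are made up for illustration and are not part of the hivemind API.

```python
from math import prod


def grid_capacity(grid_size):
    """Number of distinct expert slots a grid can address (product of dims)."""
    return prod(grid_size)


def gating_outputs(grid_size):
    """Number of scores computed, ASSUMING one score per index per dimension."""
    return sum(grid_size)


# Three hypothetical grids that all address the same number of experts.
for grid in [(4096,), (64, 64), (16, 16, 16)]:
    print(grid, "experts:", grid_capacity(grid), "scores:", gating_outputs(grid))
```

Under this assumption, all three grids address the same number of experts, but the multi-dimensional ones need far fewer gating outputs, which is one plausible reason a 2d grid is easier to scale than a 1d one. It says nothing about network latency or beam search cost, which would need real measurements.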