The materials here are mainly about Distributed Deep Learning/Machine Learning systems design. If you also want to learn more about distributed systems in general (e.g. courses, storage, messaging systems), an awesome collection is here.
[toc]
Readings before you start. Rome was not built in a day, after all.
- TensorFlow - Detailed tutorials, friendly to newcomers.
- MXNet - Built for both efficiency and flexibility: a dynamic dependency scheduler plus graph optimization.
- PyTorch - Puts Python first.
- Caffe2
- Principles of Distributed Systems [ETH Zurich]
- Deep Learning [An MIT Press book]
- NNVM by Tianqi Chen (co-author of MXNet) - A modular, decentralized, and lightweight component to help build deep learning libraries.
- Weld by Matei Zaharia (co-author of Spark) - An LLVM-based runtime for data analytics applications that changes the interface between software libraries to enable powerful cross-library optimizations.
- Training Deep Nets with Sublinear Memory Cost - Trades computation for memory, giving a more memory-efficient training algorithm at a small extra computation cost.
- Memory-Efficient Backpropagation Through Time [NIPS 2016] - Reduces memory consumption of the backpropagation through time (BPTT) algorithm in RNNs.
- RDMA-based
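The sublinear-memory-cost paper listed above is essentially activation checkpointing: drop most intermediate activations during the forward pass and recompute each segment from its nearest stored checkpoint during backprop. Below is a minimal NumPy sketch of that idea, assuming a toy chain of ReLU layers and a squared-norm loss; the network, loss, and function names are illustrative, not from the paper:

```python
import numpy as np

def forward_segment(x, weights):
    """Run a segment of ReLU layers, returning every intermediate activation."""
    acts = [x]
    for W in weights:
        acts.append(np.maximum(W @ acts[-1], 0.0))
    return acts

def train_step_checkpointed(x, weights, k):
    """One forward/backward pass that stores only every k-th activation.

    With n layers, only the segment-boundary activations (checkpoints) are
    kept during the forward pass; each segment is recomputed from its
    checkpoint during the backward pass. Live activations drop from O(n)
    to O(n/k + k), at the cost of one extra forward pass per segment.
    """
    n = len(weights)
    checkpoints = [x]                 # activation entering each segment
    a = x
    for i, W in enumerate(weights):
        a = np.maximum(W @ a, 0.0)
        if (i + 1) % k == 0 and i + 1 < n:
            checkpoints.append(a)
    loss = 0.5 * np.sum(a * a)        # toy loss: 0.5 * ||output||^2
    grad_out = a                      # d(loss)/d(output)
    grads = [None] * n
    # Backward: walk the segments in reverse, recomputing each one.
    for seg in range(len(checkpoints) - 1, -1, -1):
        start, end = seg * k, min(seg * k + k, n)
        acts = forward_segment(checkpoints[seg], weights[start:end])
        for i in range(end - 1, start - 1, -1):
            pre = grad_out * (acts[i - start + 1] > 0)   # ReLU gradient
            grads[i] = np.outer(pre, acts[i - start])    # dL/dW_i
            grad_out = weights[i].T @ pre
    return loss, grads
```

Choosing k near sqrt(n) balances the two terms, which is how the paper reaches O(sqrt(n)) memory with roughly one extra forward pass.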