Comments (4)
Hi. Good question. I try my best to reduce total memory usage. I am not just using computational tricks, but also mathematical tricks (i.e. more memory-efficient algorithms). For K-Means and TSNE, yes, you're right, memory can explode dramatically. TSNE, for example, has to perform Barnes-Hut first, which essentially needs X @ X.T for the Euclidean distances. Not very memory efficient at all. Likewise, K-Means can cause problems when updating the centroids: subtracting the centroids from the data matrix creates a temporary copy, which doubles memory usage.
My aim is to reduce these temporary copies. But yes, I can deal with any input so long as your data fits in memory.
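To make the temporary-copy point concrete, here is a minimal NumPy sketch, not HyperLearn's actual code. It assumes the standard expansion ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2, and the function name `squared_distances` is made up for illustration. The idea is that the only large allocation is the (n, k) result, rather than an array of differences.

```python
import numpy as np

def squared_distances(X, centroids):
    """Squared Euclidean distances from every row of X to every centroid.

    Uses ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2, so the largest temporary
    is the (n, k) result rather than an (n, k, d) array of differences.
    """
    x_norms = np.einsum("ij,ij->i", X, X)                  # shape (n,)
    c_norms = np.einsum("ij,ij->i", centroids, centroids)  # shape (k,)
    dists = X @ centroids.T                                # shape (n, k)
    dists *= -2.0                                          # updated in place
    dists += x_norms[:, None]
    dists += c_norms[None, :]
    np.maximum(dists, 0.0, out=dists)                      # clip rounding noise
    return dists

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 16)).astype(np.float32)
centroids = X[:4].copy()  # hypothetical starting centroids
labels = squared_distances(X, centroids).argmin(axis=1)
```

The trade-off is some loss of precision from cancellation, which is why the result is clipped at zero.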
Say loading your data into RAM takes 12 GB out of 16 GB. Then I have 4 GB to play with. If you use, for example, Sklearn's or Scipy's linear regression / lstsq, memory usage spikes and you can run out. Using Cholesky decomposition, I reduced the overall memory usage by more than 50%.
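As a rough illustration of the Cholesky route (a sketch under assumptions, not necessarily how HyperLearn implements it), least squares can be solved through the normal equations, so the factorization only touches a (d, d) matrix. The function name `cholesky_lstsq` is hypothetical.

```python
import numpy as np

def cholesky_lstsq(X, y):
    """Solve min ||X w - y|| via the normal equations X.T X w = X.T y."""
    XtX = X.T @ X                   # (d, d), small when n >> d
    Xty = X.T @ y                   # (d,)
    L = np.linalg.cholesky(XtX)     # XtX = L @ L.T, L lower triangular
    z = np.linalg.solve(L, Xty)     # forward step: L z = X.T y
    return np.linalg.solve(L.T, z)  # backward step: L.T w = z

rng = np.random.default_rng(0)
X = rng.standard_normal((10_000, 8))
true_w = np.arange(1.0, 9.0)
y = X @ true_w + 0.01 * rng.standard_normal(10_000)
w = cholesky_lstsq(X, y)
```

`np.linalg.solve` is a general solver and is used here only to keep the sketch NumPy-only; `scipy.linalg.cho_factor` with `cho_solve` would exploit the triangular structure. Note that forming X.T X squares the condition number of X, so this approach is less robust on ill-conditioned data.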
Also, yes, I plan to break computations down if the memory usage would exceed total capacity. You can check Hyperlearn2, where I have started implementing the necessary memory checks before anything is even run.
https://github.com/danielhanchen/hyperlearn/blob/master/hyperlearn2/base.py
I'm slowly writing decorators to first:
- Convert the dtype to the lowest possible (int32 --> float32, not float64), etc.
- Check possible memory usage and, for now, tell the user there's a memory problem.
- If possible, use more memory-efficient algorithms when memory is restricted (e.g. using GESVD instead of GESDD if memory is a problem).
- In the future, I plan to perform batch processing. Say for K-Means, instead of subtracting every centroid in one go, I will do it sequentially.
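The first two steps of the list above could look roughly like the decorator below. This is a hypothetical sketch, not the code in `hyperlearn2/base.py`: the names `memory_guard`, `budget_bytes` and `temp_factor` are invented, the budget is passed in by hand instead of being read from the system, and raising `MemoryError` is just one way of telling the user.

```python
import functools
import numpy as np

def memory_guard(budget_bytes, temp_factor=2.0):
    """Hypothetical decorator: estimate working memory, refuse to run if it
    exceeds budget_bytes, otherwise downcast the input to float32."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(X, *args, **kwargs):
            X = np.asarray(X)
            # Rough estimate: the float32 input plus temp_factor copies of it.
            itemsize = np.dtype(np.float32).itemsize
            estimate = X.size * itemsize * (1.0 + temp_factor)
            if estimate > budget_bytes:
                raise MemoryError(
                    f"{func.__name__} may need ~{estimate / 1e6:.1f} MB, "
                    f"budget is {budget_bytes / 1e6:.1f} MB"
                )
            # Only cast after the check passes, so no copy is made in vain.
            if X.dtype != np.float32:
                X = X.astype(np.float32)
            return func(X, *args, **kwargs)
        return wrapper
    return decorator

@memory_guard(budget_bytes=1_000_000)
def column_means(X):
    return X.mean(axis=0)

small = np.arange(12, dtype=np.int32).reshape(4, 3)
means = column_means(small)  # runs, computed in float32
```

Calling `column_means(np.zeros((1000, 1000)))` with this budget raises `MemoryError` before any work is done.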
HyperLearn's goal is to make ML faster, but also less resource intensive.
It has been quite a while. Since then, I have managed to implement batch processing that iteratively computes the metrics, moving chunks between GPU and CPU memory. It works. I believe this could be a general pattern for similar methods, given how large datasets keep getting. On the other hand, methods that do not need everything in memory at once, such as UMAP, should be encouraged. That should make life easier.
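For readers who want the shape of that pattern, a minimal CPU-only sketch with NumPy follows. It is not the implementation described above: the function name `nearest_centroid_chunked` and the `chunk_rows` parameter are made up, and on a GPU each chunk would additionally be copied to the device and the labels copied back.

```python
import numpy as np

def nearest_centroid_chunked(X, centroids, chunk_rows=256):
    """Assign each row of X to its nearest centroid, one chunk at a time."""
    labels = np.empty(X.shape[0], dtype=np.int64)
    for start in range(0, X.shape[0], chunk_rows):
        chunk = X[start:start + chunk_rows]  # a view, not a copy
        # (rows_in_chunk, k, d) temporary, bounded by the chunk size.
        diff = chunk[:, None, :] - centroids[None, :, :]
        dists = np.einsum("nkd,nkd->nk", diff, diff)
        labels[start:start + chunk_rows] = dists.argmin(axis=1)
    return labels

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 8))
centroids = rng.standard_normal((5, 8))
labels = nearest_centroid_chunked(X, centroids)
```

Peak temporary memory scales with `chunk_rows` instead of the number of rows in `X`, so the chunk size can be tuned to whatever memory is left.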
@farleylai Heyy!! So sorry I closed your issue - we have a new Discord channel https://discord.gg/eJQzD4sH
We're repackaging the entire package and making it fully streamlined, much faster and supporting many more algos.
I agree on batch processing - processing chunks iteratively between GPU memory and CPU RAM is a smart approach!