Comments (5)
I just noticed this in practice - tried to train using nn.DataParallel and got the error that the tensors were on the wrong GPU.
from torchsample.
Yes, it's not quite as trivial as just stuffing everything into nn.DataParallel, but it's not terribly difficult either. I've done it in the past in my custom code but would love to switch to this framework, as it has additional callbacks, transforms, etc. that are nice to use.
from torchsample.
@jph00 you may want to examine your code to make sure you're working with the same GPU. The parallelization using nn.DataParallel seems to work for me if I leave out regularizers and constraints. I'm still trying to understand why, but it basically complains in module_trainer.py:
593 regularizer_loss = regularizers(self.model)
594 loss += regularizer_loss
--> 595 batch_logs['regularizer_loss'] = regularizer_loss.data[0]
AttributeError: 'float' object has no attribute 'data'
Looking at the RegularizerModule, I don't even see where the data attribute is coming from. It seems that loss is always a float... yet somehow the code runs fine when executed on a single GPU. Maybe @ncullen93 has an idea. I'm sure the fix is simple.
P.S. there's a similar issue with constraints.
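One possible way to make the logging line tolerant of both cases, as a minimal sketch only: the helper name scalar_for_logging is hypothetical and not part of torchsample, and it assumes the value is either a plain Python number, an object with an .item() method (as tensors have in current PyTorch), or an older-style object whose .data can be indexed with [0].

```python
def scalar_for_logging(value):
    """Return `value` as a plain Python float for storing in batch_logs.

    Hypothetical helper (not part of torchsample). It accepts a plain
    number or a tensor-like object, so the caller does not need to know
    which one the regularizer or constraint code returned.
    """
    # Already a plain Python number: nothing to unwrap.
    if isinstance(value, (int, float)):
        return float(value)
    # Tensor-like objects that expose .item() for one-element values.
    if hasattr(value, "item"):
        return float(value.item())
    # Fallback for older-style objects where the scalar lives in .data[0].
    return float(value.data[0])
```

With something like this, the failing line could read batch_logs['regularizer_loss'] = scalar_for_logging(regularizer_loss), and the same wrapper could be tried on the constraints line. It only avoids the AttributeError; it does not explain why the value is a float under nn.DataParallel.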
from torchsample.
I can get around the issue by simply commenting out the lines that store the regularization and constraints losses in batch_logs, but it's obviously not ideal. Not sure what the real solution is.
Also, there's an additional problem in the summary method of module_trainer.py:
torchsample/torchsample/modules/module_trainer.py in summary(self, input_size)
113 self.model.apply(register_hook)
114 # make a forward pass
--> 115 self.model(x)
116 # remove these hooks
117 for h in hooks:
TypeError: Broadcast function not implemented for CPU tensors
from torchsample.
I added a pull request partially addressing the multi-GPU issues: #40
from torchsample.