Comments (5)
same error with:
Deep Learning AMI (Ubuntu 18.04) Version 29.0
no idea why:
import os
print(os.environ['LD_LIBRARY_PATH'])
/usr/local/cuda-10.1/lib64:/usr/local/cuda-10.1/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib:/usr/local/cuda-10.1/efa/lib:/opt/amazon/efa/lib:/opt/amazon/efa/lib64:/usr/lib64/openmpi/lib/:/usr/local/lib:/usr/lib:/usr/local/mpi/lib:/lib/:/usr/lib64/openmpi/lib/:/usr/local/lib:/usr/lib:/usr/local/mpi/lib:/lib/::/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow
I think it compiles against the wrong CUDA version: 10.0, while TF clearly uses 10.1 in this setup, and 10.0 is NOT on the LD_LIBRARY_PATH of the default TF installation. I think Haste picks up 10.0 because the default CUDA directory, /usr/local/cuda, points to cuda-10.0 rather than cuda-10.1.
Is it possible to pull the CUDA version from TF during the make haste_tf run, maybe the way you do with the include directories?
from haste.
Actually adding /usr/local/cuda-10.0/lib64 to the LD_LIBRARY_PATH solved it.
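A quick way to confirm whether a given directory is missing from LD_LIBRARY_PATH is sketched below. The helper name `missing_lib_dirs` is hypothetical and only illustrates the check; it is not part of Haste.

```python
import os

def missing_lib_dirs(ld_library_path, required_dirs):
    """Return the entries of required_dirs that are absent from ld_library_path."""
    # Split on ':' (the Linux separator), drop empty entries produced by '::',
    # and normalize so that '/lib/' and '/lib' compare equal.
    present = {os.path.normpath(p) for p in ld_library_path.split(":") if p}
    return [d for d in required_dirs if os.path.normpath(d) not in present]

# Check the current process environment for the CUDA 10.0 library directory.
current = os.environ.get("LD_LIBRARY_PATH", "")
print(missing_lib_dirs(current, ["/usr/local/cuda-10.0/lib64"]))
```

If the printed list is non-empty, the listed directories would need to be added to LD_LIBRARY_PATH before starting Python.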
Or, alternatively, changing /usr/local/cuda to point to /usr/local/cuda-10.1 instead of /usr/local/cuda-10.0 at Haste compile time.
/usr/local/cuda is the standard path for the "current" installation of CUDA Toolkit. If multiple versions of CUDA Toolkit are present, /usr/local/cuda should point to the one you'd like to use. If Amazon's DLAMI builds TensorFlow against 10.1 but points /usr/local/cuda to 10.0, that's a misconfiguration on their part.
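To see which toolkit the /usr/local/cuda link currently resolves to, something like the following sketch could be used. The function name `resolve_cuda_home` is hypothetical, and the default path is simply the one discussed in this thread.

```python
import os

def resolve_cuda_home(path="/usr/local/cuda"):
    """Return the real directory that `path` points to, or None if it does not exist."""
    if not os.path.exists(path):
        return None
    # realpath follows symlinks, so a link to cuda-10.0 resolves to that directory.
    return os.path.realpath(path)

# Prints the resolved toolkit directory, or None when no CUDA Toolkit is linked there.
print(resolve_cuda_home())
```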
Regardless, I've made it possible to override the CUDA Toolkit path to build against without having to modify the Makefile. You can either export the $CUDA_HOME environment variable or specify it on the command-line while building Haste:
CUDA_HOME=/usr/local/cuda-10.1 make
Thanks for pushing through these issues and reporting them, @amurashov. I appreciate it.
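For scripted builds, the CUDA_HOME override described above could be passed through the environment from Python. This is only a sketch: `env_with_cuda_home` and `build_haste` are hypothetical helper names, and the repository path in the usage comment is a placeholder.

```python
import os
import subprocess

def env_with_cuda_home(cuda_home, base_env=None):
    """Return a copy of the environment with CUDA_HOME set to cuda_home."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_HOME"] = cuda_home
    return env

def build_haste(cuda_home, repo_dir="."):
    """Run `make` in repo_dir with CUDA_HOME exported; raises if make fails."""
    subprocess.run(["make"], cwd=repo_dir, env=env_with_cuda_home(cuda_home), check=True)

# Example (not run here; the path is a placeholder):
# build_haste("/usr/local/cuda-10.1", repo_dir="path/to/haste")
```

Copying the environment rather than mutating `os.environ` keeps the override local to the build process.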
There are two issues with detecting which version of CUDA TensorFlow is built against. First, there's no great way to do that as far as I can tell. The second is that Haste is multi-framework and PyTorch/TensorFlow could be built against different CUDA versions.
I think making it easier for the user to choose the CUDA version to build against is the best I can do here. Closing this out since there's no further action I can take on my part. Please re-open if you have suggestions on what else I could do.