floydhub / dockerfiles
Deep Learning Dockerfiles
Home Page: https://docs.floydhub.com/guides/environments/
License: Apache License 2.0
$ sudo docker run -it floydhub/tensorflow:1.9.0-gpu.cuda9cudnn7-py3_aws.32 python -c "import tensorflow; print(tensorflow.__version__)"
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/usr/local/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/local/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
But the following works:
$ sudo docker run --env "LD_LIBRARY_PATH=/usr/local/cuda/lib64/stubs:$LD_LIBRARY_PATH" -it floydhub/tensorflow:1.9.0-gpu.cuda9cudnn7-py3_aws.32 python -c "import tensorflow; print(tensorflow.__version__)"
1.9.0
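To see whether the dynamic loader can find libcuda.so.1 before importing TensorFlow, a small check like the one below may help. It is only a sketch: the helper name can_load is made up for illustration, and it uses nothing beyond the standard ctypes module.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Raised when the library is not on the loader's search path.
        return False
    return True


if __name__ == "__main__":
    # libcuda.so.1 is the library named in the ImportError above.
    print("libcuda.so.1 loadable:", can_load("libcuda.so.1"))
```

Running this inside the container with and without the LD_LIBRARY_PATH override should show whether the override is what makes the library visible. As far as I understand, a library under the stubs directory only satisfies the loader; it does not provide a working GPU driver.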
RAPIDS has recently dropped pip (PyPI) installation and now provides conda
packages instead. This requires a bit of work, and it can be done in 3 different ways:
dl-deps/dl-python, create a new environment and "move" all the pip packages under conda
The FloydHub documentation link given in the repository description ( https://docs.floydhub.com/home/environments/ ) returns a 404 Not Found error.
Version 2 of nvidia-docker (e.g. docker run --runtime=nvidia ...) did not work with OpenGL until a few days ago. As a result, we could not run the OpenAI Gym environments that rely on OpenGL (specifically, we could not render or X-forward them).
NVIDIA just released a new CUDA 9 image with OpenGL. This is great news, especially for people using Gym environments and ROS environments.
Here you can read a bit about the progress of the issue:
NVIDIA/nvidia-docker#534
Here is the NVIDIA link:
https://hub.docker.com/r/nvidia/cudagl/
I'm so excited about this! I'm resisting the urge to clone and rebuild all of your images just for the PyTorch image! And by the way, I see you've been updating Gym recently, so this should go right along with it.
XGBoost has to be built with GPU support for the GPU images, using the -DUSE_CUDA=ON flag.
I am using CircleCI to build the TensorFlow 1.8 GPU images from the Dockerfile.
The build takes up to several hours, but CircleCI limits the job time to 2 hours.
Have you run into this problem? Are you still using CircleCI?
The Python 3 GPU Dockerfile specifies ENV TF_CUDA_COMPUTE_CAPABILITIES=3.7, which is the compute capability of the K80. AWS also offers G3 instances with M60 cards, which have compute capability 5.2. Could that line be changed to ENV TF_CUDA_COMPUTE_CAPABILITIES=3.7,5.2 so that the TensorFlow build is optimized for all AWS GPU offerings?
See NVIDIA's listing of compute capabilities.
Hi,
Theano 0.9 has been released. It would be great if you could upgrade the version in the Docker image.
Thanks
Hi, I launched a machine by:
floyd run --mode jupyter --data SyccinddLDdS7p3vzcwGQ2 --env theano:py2 --gpu
In the notebook, I checked the Keras backend by:
!cat ~/.keras/keras.json
It returns:
{
"image_dim_ordering": "tf",
"epsilon": 1e-07,
"floatx": "float32",
"backend": "tensorflow"
}
Could you please change it to something like:
{
"image_dim_ordering": "th",
"epsilon": 1e-07,
"floatx": "float32",
"backend": "theano"
}
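Until the image default changes, the file can be rewritten from a notebook cell. The following is a minimal sketch, assuming the Keras 1.x-style keys shown above; the function name use_theano_backend is hypothetical and not part of Keras.

```python
import json
from pathlib import Path


def use_theano_backend(config_path):
    """Rewrite a Keras 1.x-style keras.json so it selects the Theano backend."""
    path = Path(config_path).expanduser()
    # Keep any existing settings (epsilon, floatx) and only change two keys.
    config = json.loads(path.read_text()) if path.exists() else {}
    config["backend"] = "theano"
    config["image_dim_ordering"] = "th"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=4))
    return config
```

Calling use_theano_backend("~/.keras/keras.json") before the first import of keras should be enough, since the file is read when Keras is imported.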
In the GPU Dockerfiles for TensorFlow you use the following compute capabilities to set up Bazel:
ENV TF_CUDA_COMPUTE_CAPABILITIES=3.0,3.5,5.2,6.0,6.1
However, the p2.xlarge instances in AWS use the K80, which supports compute capability 3.7.
Can you please explain what makes your current configuration optimized for AWS? Why not simply use 3.7 in the compute capabilities field?
Thanks.
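The mismatch described above can be checked mechanically. This is just an illustrative sketch with hypothetical helper names; it only tests whether a capability appears verbatim in the configured list and says nothing about how TensorFlow behaves for capabilities that are missing.

```python
def parse_capabilities(value):
    """Split a TF_CUDA_COMPUTE_CAPABILITIES-style string into a list of strings."""
    return [item.strip() for item in value.split(",") if item.strip()]


def is_listed(device_capability, value):
    """Return True if the device capability appears verbatim in the list."""
    return device_capability in parse_capabilities(value)


configured = "3.0,3.5,5.2,6.0,6.1"  # value quoted in the issue above
print(is_listed("3.7", configured))  # K80 capability cited above; prints False
print(is_listed("5.2", configured))  # prints True
```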
I normally work on Linux, but I've set up a py3 CPU version of TF 1.5.0 on Windows.
For some reason, I've got no internet access. Most of the suggestions on Stack Overflow for fixing internet connection issues assume that the host is Linux, so they haven't been any help.
Do you have any advice for running these containers on Windows?
I have Windows 10 Pro, and am happy to provide any details you might require.
I would love to see an optimised GPU TensorFlow Go image. I'll try to make my own, but I'm not sure where to start yet, so any help would be appreciated.
OpenAI Gym works in the PyTorch image, but actually running the Atari environments fails due to the lack of X11. What's the best way to get to the point where we can see the environments? I tried adding a desktop with VNC, but it messed up the GPU support.
Strangely enough, in my attempts the Atari environments ran when I started the container with docker. When I start the container with nvidia-docker (GPU enabled), the GPU is definitely working (torch.cuda.is_available()), but the Atari games error out on:
  File "/usr/local/lib/python3.5/site-packages/pyglet/gl/glx_info.py", line 83, in have_version
    raise GLXInfoException('pyglet requires an X server with GLX')
pyglet.gl.glx_info.GLXInfoException: pyglet requires an X server with GLX
To be clear, I have an NVIDIA GPU. My goal is to work with PyTorch and OpenAI Gym. Your PyTorch image is perfect, except that I need to see the Atari environments. I don't know how to see them, but I have been trying for 16 hours. I can follow instructions, but I can't seem to find any.
Could you please add https://github.com/spotify/annoy to the dl-base Docker image, so it's available across all DL experiments?
Annoy is used to look up word embeddings via nearest-neighbour search. It's quite a popular library for NLP tasks. Thank you so much.
Just a quick heads-up :)
Was playing around with this and noticed there isn't a license in the repository.
$ sudo docker run -t -i floydhub/pytorch:latest-gpu-py3 ipython
Python 3.5.3 (default, Feb 13 2017, 20:35:17)
Type "copyright", "credits" or "license" for more information.
IPython 5.3.0 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
In [1]: import torch
In [2]: torch.cuda.is_available()
Out[2]: False
Any idea what went wrong here?
Hello, I found a performance issue in the dataset definition in
dockerfiles/blob/master/dl/tensorflow/tests/1.13/dataset.py:
.map(decode_image) was called without num_parallel_calls.
I think adding it will increase the efficiency of your program.
The same issue also exists in .map(decode_label), .map(decode_image), and .map(decode_label).
Here is the TensorFlow documentation that supports this.
Looking forward to your reply. By the way, I would be glad to create a PR to fix it if you are too busy.
Currently the prepare.py script only looks for images that have changed in the most recent commit and builds them. Sometimes we need the flexibility to rebuild specific Docker images, or all images in a framework.
This could be achieved using environment variables that are injected at build time from the CircleCI web UI.
cc: @houqp
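One possible shape for the environment-variable override is sketched below. The variable names REBUILD_ALL and REBUILD_FRAMEWORK, the function select_dockerfiles, and the dl/<framework>/ path layout are all assumptions for illustration; none of them come from the actual prepare.py.

```python
import os


def select_dockerfiles(changed, all_dockerfiles, environ=None):
    """Pick which Dockerfiles to build, honouring optional override variables."""
    environ = os.environ if environ is None else environ
    # Hypothetical variable: rebuild everything regardless of the last commit.
    if environ.get("REBUILD_ALL") == "1":
        return sorted(all_dockerfiles)
    # Hypothetical variable: rebuild every image of one framework.
    framework = environ.get("REBUILD_FRAMEWORK")
    if framework:
        prefix = "dl/" + framework + "/"
        return sorted(p for p in all_dockerfiles if p.startswith(prefix))
    # Default: the current behaviour, only images changed in the last commit.
    return sorted(changed)
```

With neither variable set, the function falls back to the changed files, so ordinary commit-triggered builds would behave as they do today.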
ci/prepare.py script fails when there is no code change and the build was started manually.
Traceback (most recent call last):
  File "ci/prepare.py", line 69, in <module>
    for dockerfile_path in find_changed_dockerfiles():
  File "ci/prepare.py", line 37, in find_changed_dockerfiles
    git_compare = build_info['compare'].split('/')[-1]
AttributeError: 'NoneType' object has no attribute 'split'
cc: @houqp
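The traceback suggests that build_info['compare'] is None for manually started builds. A guard along these lines might avoid the crash; the helper name compare_ref is hypothetical, and what the caller should do when it returns None (skip the build, or rebuild everything) is left open here.

```python
def compare_ref(build_info):
    """Return the last path segment of build_info['compare'], or None if absent."""
    compare = build_info.get("compare")
    if not compare:
        # Manually triggered builds appear to carry no compare URL.
        return None
    return compare.split("/")[-1]
```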