floydhub / dockerfiles

Deep Learning Dockerfiles

Home Page: https://docs.floydhub.com/guides/environments/

License: Apache License 2.0

Python 74.83% Shell 8.43% HTML 9.71% Lua 6.35% Dockerfile 0.68%

deep-learning docker pytorch tensorflow torch

dockerfiles's Issues

floydhub/tensorflow seems to be missing stubs from LD_LIBRARY_PATH

$ sudo docker run -it floydhub/tensorflow:1.9.0-gpu.cuda9cudnn7-py3_aws.32 python -c "import tensorflow; print(tensorflow.__version__)"
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/usr/local/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/local/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory

But the following works:

$ sudo docker run --env "LD_LIBRARY_PATH=/usr/local/cuda/lib64/stubs:$LD_LIBRARY_PATH"  -it floydhub/tensorflow:1.9.0-gpu.cuda9cudnn7-py3_aws.32 python -c "import tensorflow; print(tensorflow.__version__)"
1.9.0
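
A quick way to confirm that this is a loader path problem, rather than a TensorFlow problem, is to ask the dynamic loader for the library directly. The sketch below is only a diagnostic aid and assumes nothing beyond the library name from the ImportError; the helper name can_load is made up for illustration.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can find and open library_name."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Raised when the shared object cannot be found or opened.
        return False
    return True


if __name__ == "__main__":
    # libcuda.so.1 is the library named in the ImportError above.
    print("libcuda.so.1 loadable:", can_load("libcuda.so.1"))
```

LD_LIBRARY_PATH is read when the process starts, so it has to be set before Python launches (as with --env in the docker run command above), not from inside the interpreter.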

Feature Request: RAPIDS

RAPIDS has recently dropped PyPI installation and now provides conda packages instead. Supporting it requires a bit of work, and it can be done in three different ways:

Install RAPIDS from source (a lot of work, not to mention that they recommend using conda even when building from source)
Install conda under dl-deps/dl-python, create a new environment and "move" all the PyPI packages under conda
Download the conda packages for the build we want and make them available inside the environment

The link given in the description

The link to the FloydHub documentation given in the repository description ( https://docs.floydhub.com/home/environments/ ) returns a 404 Not Found error.

base images on cudagl (for openGL support)

nvidia-docker version 2 (e.g. docker run --runtime=nvidia ...) did not work with OpenGL until a few days ago. So, for example, we cannot run the OpenAI Gym environments that rely on OpenGL (specifically, we cannot actually render or X-forward them).

NVIDIA just released a new CUDA 9 image with OpenGL. This is great news, especially for people using Gym environments and ROS environments.

Here you can read a bit about the issue progress
NVIDIA/nvidia-docker#534

Here's the nvidia link
https://hub.docker.com/r/nvidia/cudagl/

I'm so excited about this! I'm resisting the urge to clone and rebuild all of your images just for the pytorch image! And by the way, I see you've been updating gym recently so this should go right along with it.

Build xgboost with GPU support

XGBoost has to be built with GPU support for the GPU images, using the -DUSE_CUDA=ON flag.

circleci build timeout

I am using CircleCI to build the Dockerfile for the TensorFlow 1.8 GPU images.
The build takes up to several hours, but CircleCI limits the job time to 2 hours.
Do you have this problem? Are you still using CircleCI?

TF GPU compute

The Python 3 GPU Dockerfile specifies ENV TF_CUDA_COMPUTE_CAPABILITIES=3.7, which is the compute capability of the K80s. AWS also offers G3 instances with M60 cards, which have compute capability 5.2. Could that line be changed to ENV TF_CUDA_COMPUTE_CAPABILITIES=3.7,5.2 so that the TF that's built is optimized for all AWS GPU offerings?

See NVIDIA's site for the listing.
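
To make the request concrete, here is a small sketch that checks whether a given GPU's compute capability is explicitly listed in a TF_CUDA_COMPUTE_CAPABILITIES value. The helper names are hypothetical, and the check is deliberately naive: it only tests for an exact entry and does not model any fallback the CUDA toolchain may provide for unlisted capabilities.

```python
def parse_capabilities(value):
    """Split a TF_CUDA_COMPUTE_CAPABILITIES string into (major, minor) tuples."""
    capabilities = []
    for item in value.split(","):
        major, minor = item.strip().split(".")
        capabilities.append((int(major), int(minor)))
    return capabilities


def is_listed(value, gpu_capability):
    """Return True if gpu_capability (e.g. "5.2") is an exact entry in value."""
    return parse_capabilities(gpu_capability)[0] in parse_capabilities(value)


# Current setting from the issue: only the K80 capability is listed.
print(is_listed("3.7", "5.2"))      # False
# Proposed setting: both capabilities are listed.
print(is_listed("3.7,5.2", "5.2"))  # True
```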

Update theano

Hi,
Theano 0.9 has been released. It would be great if you can upgrade the version in the docker image.

Thanks

Keras backend in theano:py2

Hi, I launched a machine by:

floyd run --mode jupyter  --data SyccinddLDdS7p3vzcwGQ2  --env theano:py2  --gpu

In the notebook, I checked the Keras backend by:

!cat ~/.keras/keras.json

It returns:

{
    "image_dim_ordering": "tf", 
    "epsilon": 1e-07, 
    "floatx": "float32", 
    "backend": "tensorflow"
}

Could you please change it to something like:

{
    "image_dim_ordering": "th",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "theano"
}
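
Until the image is changed, the file can be rewritten from inside the notebook. The following is a minimal sketch, assuming the config lives at ~/.keras/keras.json as shown above; set_keras_backend is a hypothetical helper, and the demo runs against a temporary file rather than the real config.

```python
import json
import tempfile
from pathlib import Path


def set_keras_backend(config_path, backend="theano", dim_ordering="th"):
    """Update backend and image_dim_ordering, keeping other keys unchanged."""
    path = Path(config_path).expanduser()
    config = {}
    if path.exists():
        config = json.loads(path.read_text())
    config["backend"] = backend
    config["image_dim_ordering"] = dim_ordering
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=4))
    return config


# Demo on a temporary copy of the config reported in the issue.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "keras.json"
    demo_path.write_text(json.dumps({
        "image_dim_ordering": "tf",
        "epsilon": 1e-07,
        "floatx": "float32",
        "backend": "tensorflow",
    }))
    demo = set_keras_backend(demo_path)

print(demo["backend"], demo["image_dim_ordering"])
```

In the notebook the call would be set_keras_backend("~/.keras/keras.json"), run before Keras is imported. Keras also honours a KERAS_BACKEND environment variable, which may be simpler if only the backend needs to change.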

Tensorflow CUDA Compute capabilities

In the GPU Dockerfiles for TensorFlow you use the following capabilities to set up Bazel:
ENV TF_CUDA_COMPUTE_CAPABILITIES=3.0,3.5,5.2,6.0,6.1.

However, the p2.xlarge instances on AWS use the K80, which supports compute capability 3.7.

Can you please explain what makes your current configuration optimized for AWS? Why not simply use 3.7 in the compute capabilities field?

Thanks.

Can't access internet from tensorflow 1.5.0 container on windows

I normally work on Linux, but I've set up a py3 CPU version of TF 1.5.0 on Windows.

For some reason, I've got no internet access. Most of the suggestions on Stack Overflow for fixing internet connection issues assume that the host is Linux, so they haven't been any help.

Do you have any advice for running these containers on Windows?

I have Windows 10 Pro, and am happy to provide any details you might require.

Tensorflow + Go image

I would love to see an optimised GPU TensorFlow Go image. I'll try to make my own, but I'm not sure where to start yet, so any help would be appreciated.

vnc for openAI gym

OpenAI Gym works in the PyTorch image, but actually running the Atari environments fails due to the lack of X11. What's the best way to get to the point where we can see the environments? I tried adding a desktop with VNC, but it broke the GPU support.

In my attempts, strangely enough, the Atari environments run when I start the container with docker. When I start the container with nvidia-docker (GPU enabled), the GPU is definitely working (torch.cuda.is_available()), but the Atari games error out with:

File "/usr/local/lib/python3.5/site-packages/pyglet/gl/glx_info.py", line 83, in have_version
    raise GLXInfoException('pyglet requires an X server with GLX')
pyglet.gl.glx_info.GLXInfoException: pyglet requires an X server with GLX

To be clear, I have an NVIDIA GPU. My goal is to work with PyTorch and OpenAI Gym. Your PyTorch image is perfect, except that I need to see the Atari environments. I don't know how to see them, but I have been trying for 16 hours. I can follow instructions, but I can't seem to find any.

Add Annoy python package

Could you please add https://github.com/spotify/annoy to the dl-base Docker image, so it's available across all DL experiments?

Annoy is used to find word embeddings using nearest neighbours. It's quite a popular package for NLP tasks. Thank you so much.

Add license

Just a quick heads-up :)

Was playing around with this and noticed there isn't a license in the repository.

cuda not available

$ sudo docker run  -t -i floydhub/pytorch:latest-gpu-py3 ipython
Python 3.5.3 (default, Feb 13 2017, 20:35:17) 
Type "copyright", "credits" or "license" for more information.

IPython 5.3.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import torch

In [2]: torch.cuda.is_available()
Out[2]: False

Any idea what went wrong here?

Performance issues in blob/master/dl/tensorflow/tests/1.13/dataset.py(P2)

Hello, I found a performance issue in the dataset definition in
dockerfiles/blob/master/dl/tensorflow/tests/1.13/dataset.py:
.map(decode_image) is called without num_parallel_calls.
I think adding it would make your program more efficient.

The same issue also exists in the other calls to .map(decode_label),
.map(decode_image), and
.map(decode_label).

Here is the TensorFlow documentation that supports this.

Looking forward to your reply. By the way, I would be glad to create a PR to fix it if you are too busy.
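
As a rough illustration of the proposed change, the sketch below wraps the map calls so that each one passes num_parallel_calls. The helper name map_in_parallel and the default of 4 are assumptions for illustration; how decode_image and decode_label are actually chained in dataset.py is not shown in this issue. The helper works on any object with the tf.data.Dataset.map interface.

```python
def map_in_parallel(dataset, transforms, num_parallel_calls=4):
    """Apply each transform via dataset.map(..., num_parallel_calls=...)."""
    for transform in transforms:
        dataset = dataset.map(transform, num_parallel_calls=num_parallel_calls)
    return dataset


# Intended use with TensorFlow (not executed here; decode_image and
# decode_label are the functions named in the issue):
#
#   dataset = map_in_parallel(dataset, [decode_image])
```

The right value for num_parallel_calls depends on the available CPU cores, so it is worth measuring rather than assuming a speedup.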

Force build of specific docker images / frameworks

Currently the prepare.py script looks only for images that have changed in the most recent commit and builds them. Sometimes we need the flexibility to rebuild specific Docker images or all images in a framework.

This could be achieved using environment variables that are injected at build time from the CircleCI web UI.

cc: @houqp
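
One possible shape for this, as a sketch only: read a comma-separated list of path prefixes from an environment variable and add the matching Dockerfiles to the set detected from the last commit. The variable name FORCE_BUILD, both function names, and the example paths are hypothetical and not taken from prepare.py.

```python
import os


def forced_targets(environ=None):
    """Return the path prefixes requested through the FORCE_BUILD variable.

    FORCE_BUILD is a hypothetical name holding a comma-separated list.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("FORCE_BUILD", "")
    return {item.strip() for item in raw.split(",") if item.strip()}


def select_dockerfiles(changed, all_dockerfiles, environ=None):
    """Combine changed Dockerfiles with those matching a forced prefix."""
    targets = forced_targets(environ)
    forced = [
        path for path in all_dockerfiles
        if any(path.startswith(target) for target in targets)
    ]
    # dict.fromkeys keeps first-seen order and drops duplicates.
    return list(dict.fromkeys(list(changed) + forced))


# Example with made-up paths: force every image under dl/tensorflow.
print(select_dockerfiles(
    changed=["dl/pytorch/a/Dockerfile"],
    all_dockerfiles=["dl/pytorch/a/Dockerfile", "dl/tensorflow/b/Dockerfile"],
    environ={"FORCE_BUILD": "dl/tensorflow"},
))
```

Using a prefix match means a single value can name either one image directory or a whole framework directory.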

prepare.py fails when there is no code change

The ci/prepare.py script fails when there is no code change and the build was started manually:

Traceback (most recent call last):
  File "ci/prepare.py", line 69, in <module>
    for dockerfile_path in find_changed_dockerfiles():
  File "ci/prepare.py", line 37, in find_changed_dockerfiles
    git_compare = build_info['compare'].split('/')[-1]
AttributeError: 'NoneType' object has no attribute 'split'
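
The traceback shows that build_info['compare'] is None for a manually started build. A minimal guard, assuming build_info is a dict as the traceback suggests, could look like the sketch below; compare_ref is a hypothetical helper, and what the caller should do when it returns None (skip the build, or build everything) is left open.

```python
def compare_ref(build_info):
    """Return the last path segment of build_info['compare'], or None.

    None is returned when there is no compare URL, for example when the
    build was triggered manually without a code change.
    """
    compare_url = build_info.get("compare")
    if not compare_url:
        return None
    return compare_url.split("/")[-1]


# Manually triggered build: no compare URL, so no AttributeError.
print(compare_ref({"compare": None}))
# Illustrative URL only; the real value comes from the CI build info.
print(compare_ref({"compare": "https://example.com/org/repo/compare/abc...def"}))
```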