chainermn docker container (about chainermn, CLOSED)

chainer commented on August 16, 2024
chainermn docker container

from chainermn.

Comments (9)

keisukefukuda commented on August 16, 2024

Hi MannyKayy,
Thank you for trying ChainerMN.

It seems that gcc cannot find nccl.h in your include path.
Will you check the following things?

BTW, would you mind if we add your case to the troubleshooting guide?
We haven't tried nvidia-docker yet, so your experience would really help.

(1) Does the file nccl.h exist somewhere in your environment? It may be in /usr/local/include, but I'm not sure.

If (1) is yes: check your CPATH environment variable as described in http://chainermn.readthedocs.io/en/latest/installation/guide.html#nvidia-nccl

If (1) is no: go to https://github.com/NVIDIA/nccl , follow its README, and make sure it works. Then go back to http://chainermn.readthedocs.io/en/latest/installation/guide.html#nvidia-nccl .
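A minimal sketch for checking (1) and the CPATH setting together, assuming a POSIX shell. `find_header` is a hypothetical helper, not part of ChainerMN or NCCL, and the fallback directories it searches besides those in CPATH are assumptions; add others if NCCL was installed somewhere else.

```sh
#!/bin/sh
# Hypothetical helper: print every directory that contains the given header,
# searching the entries of CPATH first and then a few assumed default
# locations. Returns non-zero if the header is not found anywhere.
# The body runs in a subshell ( ... ) so the IFS change stays local.
find_header() (
  header="$1"
  found=1
  IFS=:   # split CPATH on ':' only
  for dir in ${CPATH:-} /usr/local/include /usr/include /usr/local/cuda/include; do
    if [ -n "$dir" ] && [ -f "$dir/$header" ]; then
      echo "$dir"
      found=0
    fi
  done
  exit "$found"
)
```

For example, `find_header nccl.h || echo "nccl.h not found"` either prints the directory to add to CPATH or tells you that NCCL still has to be built and installed.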

Hope it helps.
Thanks!
Keisuke


yoshihingis commented on August 16, 2024

Hey all,

I tried to write a Dockerfile for ChainerMN, shown below.
There were some warnings, but the build succeeded.
I ran my ChainerMN Docker image and started a ChainerMN container,
and then tested train_mnist.py for ChainerMN in the container.
However, I ran into the two issues below.

Could you give me any advice?

I'd really appreciate it if you could tell me how to solve these issues.

★Issues

1. mpiexec run-as-root issue

mpiexec has detected an attempt to run as root.
Running at root is strongly discouraged as any mistake (e.g., in
defining TMPDIR) or bug can result in catastrophic damage to the OS
file system, leaving your system in an unusable state.

You can override this protection by adding the --allow-run-as-root
option to your cmd line. However, we reiterate our strong advice
against doing so - please do so at your own risk.

2. Open MPI setting issue

root@1f3a59529a69:/usr/local/lib/python2.7/dist-packages/chainermn# mpiexec -allow-run-as-root -n 4 python train_mnist.py

The value of the MCA parameter "plm_rsh_agent" was set to a path
that could not be found:

plm_rsh_agent: ssh : rsh

Please either unset the parameter, or check that the path is correct
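Both errors can be checked for with a small shell sketch, assuming a POSIX shell and that Open MPI's mpiexec is on PATH. `run_mpi` and `find_rsh_agent` are hypothetical helper names, not part of Open MPI or ChainerMN. `run_mpi` adds `--allow-run-as-root` (the option suggested in the first message) only when running as root, which is typical inside a Docker container. `find_rsh_agent` looks for `ssh` and then `rsh`, matching the default `plm_rsh_agent` value "ssh : rsh" shown in the second message; if neither exists, installing an ssh client in the image (on Ubuntu, the openssh-client package) is a likely fix.

```sh
#!/bin/sh
# Hypothetical helpers for running Open MPI jobs inside a Docker container.

# Run mpiexec, adding --allow-run-as-root only when the current user is root.
# MPIEXEC may be set to override the launcher; it defaults to "mpiexec".
run_mpi() {
  if [ "$(id -u)" -eq 0 ]; then
    set -- --allow-run-as-root "$@"
  fi
  "${MPIEXEC:-mpiexec}" "$@"
}

# Print the path of the first agent from Open MPI's default plm_rsh_agent
# list ("ssh : rsh") that exists on PATH; return non-zero if neither does.
find_rsh_agent() {
  for agent in ssh rsh; do
    if command -v "$agent" >/dev/null 2>&1; then
      command -v "$agent"
      return 0
    fi
  done
  return 1
}
```

With these defined, `run_mpi -n 4 python train_mnist.py` can replace the manual mpiexec command above, and `find_rsh_agent || echo "install openssh-client"` shows whether the second error is caused by a missing ssh/rsh binary.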

★My Dockerfile for ChainerMN

FROM nvidia/cuda:8.0-cudnn5-devel-ubuntu14.04

ENV http_proxy my.company.com
ENV https_proxy my.company.com

RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    git \
    wget \
    make \
    nano \
    file \
    python-dev \
    python-pip \
    cython && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /home/

RUN wget http://www.open-mpi.org/software/ompi/v2.1/downloads/openmpi-2.1.1.tar.gz

#RUN file -z openmpi-2.1.1.tar.gz

RUN tar xzvf openmpi-2.1.1.tar.gz

WORKDIR openmpi-2.1.1

RUN ./configure --with-cuda
RUN make -j4
RUN make install
RUN ldconfig

RUN which mpicc
RUN mpicc -show
RUN which mpiexec
RUN mpiexec --version

WORKDIR /home/

RUN git clone https://github.com/NVIDIA/nccl.git

WORKDIR /home/nccl/

RUN make CUDA_HOME=/usr/local/cuda test

RUN make install

ENV PATH /usr/local/bin:/usr/local/cuda/bin:$PATH
ENV LD_LIBRARY_PATH /usr/local/lib:/usr/local/cuda/lib64:$LD_LIBRARY_PATH
ENV LIBRARY_PATH /usr/local/lib:$LIBRARY_PATH
ENV CPATH /usr/local/cuda/include:/usr/local/include:$CPATH

RUN pip install --upgrade urllib3
RUN pip install --upgrade pip
RUN pip install --upgrade cython
RUN pip install chainermn

WORKDIR /usr/local/lib/python2.7/dist-packages/chainermn

RUN wget https://raw.githubusercontent.com/pfnet/chainermn/master/examples/mnist/train_mnist.py


yoshihingis commented on August 16, 2024

Hi MannyKayy,

I faced the same issue: gcc could not find cuda_runtime_api.h and other headers.
Even with the right path set, this issue may not go away.
I copied all files under /usr/local/cuda/include to /usr/local/include.
(My NCCL headers were already in /usr/local/include.)
After doing that, the issue went away.


keisukefukuda commented on August 16, 2024

@MannyKayy,

Sorry, I think I misread your question. Your compiler did not find cuda_runtime_api.h, not nccl.h.
But anyway, as yoshihingis implied, the problem should be solved by setting CPATH to point to the directory that contains cuda_runtime_api.h, or by copying the file to the system's default include path.
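A sketch of the CPATH approach, assuming the CUDA headers are under /usr/local/cuda/include (the location used in the Dockerfile above). `add_to_cpath` is a hypothetical helper. It prepends a directory without leaving an empty element when CPATH was previously unset: a plain `CPATH=/usr/local/cuda/include:$CPATH` ends in a trailing ':' in that case, and gcc treats an empty element in CPATH as the current directory.

```sh
#!/bin/sh
# Hypothetical helper: prepend a directory to CPATH. ${CPATH:+:$CPATH}
# expands to ":<old value>" only when CPATH is already set and non-empty,
# so no empty (current-directory) element is introduced.
add_to_cpath() {
  CPATH="$1${CPATH:+:$CPATH}"
  export CPATH
}

add_to_cpath /usr/local/cuda/include
```

To confirm that gcc now finds the header, `echo '#include <cuda_runtime_api.h>' | gcc -E -x c - >/dev/null && echo found` should print "found".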

Thanks.


keisukefukuda commented on August 16, 2024

@yoshihingis,
Thank you for your comment and question.
Could you create a separate issue for your docker-related question, so that we can track it more easily?
Thanks!


yoshihingis commented on August 16, 2024

Dear Keisuke,

Thank you for your comment.
To make my Docker issue easier to track, I made a new, simpler Dockerfile that uses only Open MPI and a small test program, because the full ChainerMN setup is quite complex.

A container started from this new Dockerfile reproduces the same issues as my ChainerMN Dockerfile.

New Dockerfile

FROM ubuntu:14.04

ENV http_proxy my company proxy
ENV https_proxy my company proxy

RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    git \
    wget \
    make \
    nano \
    file \
    python-dev \
    python-pip \
    cython && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /home/

#RUN file -z openmpi-2.1.1.tar.gz

RUN wget http://www.open-mpi.org/software/ompi/v2.1/downloads/openmpi-2.1.1.tar.gz

RUN tar xzvf openmpi-2.1.1.tar.gz

WORKDIR openmpi-2.1.1

RUN ./configure
RUN make -j4
RUN make install
RUN ldconfig

RUN which mpicc
RUN mpicc -show
RUN which mpiexec
RUN mpiexec --version

WORKDIR /home/
RUN mkdir OpenMPI
WORKDIR /home/OpenMPI

RUN wget http://www.open-mpi.org/papers/workshop-2006/hello.c

RUN mpicc hello.c -o hello


MannyKayy commented on August 16, 2024

Thanks, @keisukefukuda and @yoshihingis .

I now have a working Dockerfile (tested on AWS g2.8xlarge instances).

If you want me to send a pull request, let me know.


keisukefukuda commented on August 16, 2024

Hi, @MannyKayy
It'd be really helpful if you could contribute your Dockerfile via a PR.
Could you create a directory named docker and put the Dockerfile in it?
Thanks!


MannyKayy commented on August 16, 2024

@keisukefukuda #71 done
