
Comments (2)

keisukefukuda commented on June 30, 2024

Thank you for the info, @yoshihingis.
I will investigate it, but I currently don't have access to a GPU-equipped, Docker-enabled machine, so it may take some time. I will first try it on non-GPU Docker and see what happens.


yoshihingis commented on June 30, 2024

Dear Keisuke,

I modified my Dockerfile and ChainerMN now runs on Docker, but only CPU mode works; in GPU mode I get errors.

I added "apt-get install openssh-server" and removed the wget of train_mnist.py for ChainerMN.

If you want to test my ChainerMN Docker container, please be careful about two points.

  1. How to run train_mnist.py:
     mpiexec --allow-run-as-root -n X python train_mnist.py

  2. After building the Dockerfile, running the image, and logging into the container with bash, please write the train_mnist.py code yourself, because my Dockerfile no longer wgets train_mnist.py (see the note just below).
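If you would rather not type the script by hand, the MNIST example can probably still be fetched from the ChainerMN repository; the path below is an assumption about where the ChainerMN 1.x example lives, so adjust it if the file has moved:

wget https://raw.githubusercontent.com/chainer/chainermn/master/examples/mnist/train_mnist.py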

Below are the GPU mode errors and my ChainerMN Dockerfile.

I'd really appreciate any advice about the GPU mode errors.

Regards,

==GPU mode Error==
root@2db4d2db4676:/usr/local/lib/python2.7/dist-packages/chainermn# mpiexec --allow-run-as-root -n 2 python train_mnist.py -g
Using hierarchical communicator
Using hierarchical communicator
(Both ranks print the same traceback; deinterleaved it reads:)

Traceback (most recent call last):
  File "train_mnist.py", line 119, in <module>
    main()
  File "train_mnist.py", line 56, in main
    comm = chainermn.create_communicator(args.communicator)
  File "/usr/local/lib/python2.7/dist-packages/chainermn/communicators/__init__.py", line 49, in create_communicator
    return HierarchicalCommunicator(mpi_comm=mpi_comm)
  File "/usr/local/lib/python2.7/dist-packages/chainermn/communicators/hierarchical_communicator.py", line 14, in __init__
    self.gpu_buffer_a = _memory_utility.DeviceMemory()
  File "/usr/local/lib/python2.7/dist-packages/chainermn/communicators/_memory_utility.py", line 48, in __init__
    "Cupy is not available.")
RuntimeError: DeviceMemory cannot be used: Cupy is not available.

Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.

mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

Process name: [[63255,1],1]
Exit code: 1
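The RuntimeError means ChainerMN's GPU communicator could not import CuPy inside the container. A quick way to confirm this (a minimal check, not part of the original report) is to run in the container:

python -c "import cupy; print(cupy.__version__)"

If this raises ImportError, CuPy is simply not installed in the image, which matches the traceback above.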

====My ChainerMN Dockerfile======

FROM nvidia/cuda:8.0-cudnn5-devel-ubuntu14.04

ENV http_proxy my_company_http_proxy
ENV https_proxy my_company_https_proxy

ENV LANG en_US.UTF-8

RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    openssh-server \
    git \
    wget \
    make \
    nano \
    file \
    python-dev \
    python-pip \
    cython && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /home/

RUN wget http://www.open-mpi.org/software/ompi/v2.1/downloads/openmpi-2.1.1.tar.gz

#RUN file -z openmpi-2.1.1.tar.gz

RUN tar xzvf openmpi-2.1.1.tar.gz

WORKDIR openmpi-2.1.1

RUN ./configure --with-cuda
RUN make -j4
RUN make install
RUN ldconfig

RUN which mpicc
RUN mpicc -show
RUN which mpiexec
RUN mpiexec --version

WORKDIR /home/

RUN git clone https://github.com/NVIDIA/nccl.git

WORKDIR /home/nccl/

RUN make CUDA_HOME=/usr/local/cuda test

RUN make install

ENV PATH /usr/local/bin:/usr/local/cuda/bin:$PATH
ENV LD_LIBRARY_PATH /usr/local/lib:/usr/local/cuda/lib64:$LD_LIBRARY_PATH
ENV LIBRARY_PATH /usr/local/lib:$LIBRARY_PATH
ENV CPATH /usr/local/cuda/include:/usr/local/include:$CPATH

RUN pip install --upgrade urllib3
RUN pip install --upgrade pip
RUN pip install --upgrade cython
RUN pip install chainermn

WORKDIR /usr/local/lib/python2.7/dist-packages/chainermn
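Note that the Dockerfile above installs chainermn but never installs CuPy, which would explain the GPU mode error. A minimal sketch of the extra step to try, assuming CuPy can be built from source against the CUDA 8.0 toolkit already in the image (a prebuilt wheel such as cupy-cuda80 may also work, depending on the CuPy version):

RUN pip install cupy

With cython already installed and CPATH pointing at /usr/local/cuda/include, the pip build should be able to find the CUDA headers.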

