williamfalcon / tensorflow-gpu-install-ubuntu-16.04

Tensorflow GPU install instructions for ubuntu 16.04 - Deep learning

ubuntu tensorflow nvidia-driver nouveau tensorflow-gpu deep-learning machine-learning

tensorflow-gpu-install-ubuntu-16.04's Introduction

Tensorflow GPU install on ubuntu 16.04

These instructions are intended to set up a deep learning environment for GPU-powered tensorflow.
See here for pytorch GPU install instructions

After following these instructions you'll have:

Ubuntu 16.04.
Cuda 9.0 drivers installed.
A conda environment with python 3.6.
The latest tensorflow version with gpu support.

Step 0: Nouveau drivers

Before you begin, you may need to disable nouveau, the open-source NVIDIA driver that ships with Ubuntu.

Option 1: Modify modprobe file

After you boot the Linux system and are sitting at a login prompt, press Ctrl+Alt+F1 to get to a terminal screen. Log in via this terminal screen.
Create the file /etc/modprobe.d/nouveau-blacklist.conf, e.g. with

sudo touch /etc/modprobe.d/nouveau-blacklist.conf

Put the following two lines in that file (it is owned by root, so edit it with sudo):

blacklist nouveau
options nouveau modeset=0

Regenerate the kernel initramfs

sudo update-initramfs -u

Reboot the system

sudo reboot

On reboot, verify that the nouveau drivers are not loaded

lsmod | grep nouveau

If this command prints anything, nouveau is still loaded: do not proceed with the installation until you have worked out why.
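The file-writing and verification steps of Option 1 can be sketched as two small shell functions. This is a minimal sketch, not part of the original instructions: the function names write_nouveau_blacklist and nouveau_loaded are hypothetical, and the default path is the one used above.

```shell
# Hypothetical helper: write the two blacklist lines to the given file.
# The default path is under /etc/modprobe.d, which needs root to write.
write_nouveau_blacklist() {
    target="${1:-/etc/modprobe.d/nouveau-blacklist.conf}"
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$target"
}

# Hypothetical helper: read `lsmod` output on stdin and succeed (exit 0)
# only if a module named exactly "nouveau" appears in the first column.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}
```

After the reboot, a check such as `lsmod | nouveau_loaded && echo "nouveau is still loaded"` would print the warning only when the module is present.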

Option 2: Modify Grub load command
From this stackoverflow solution

When the GRUB boot menu appears, highlight the Ubuntu menu entry and press the E key. Add the nouveau.modeset=0 parameter to the end of the linux line ... Then press F10 to boot.
When the login page appears, press Ctrl+Alt+F1.
Enter your username and password.
Uninstall all NVIDIA-related software:

sudo apt-get purge 'nvidia*'
sudo reboot

Installation steps

Update the apt package lists

sudo apt-get update

Install apt-get deps

sudo apt-get install openjdk-8-jdk git python-dev python3-dev python-numpy python3-numpy build-essential python-pip python3-pip python-virtualenv swig python-wheel libcurl3-dev curl

Install NVIDIA drivers

# The 16.04 installer works with 16.10.
# download drivers
curl -O http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.0.176-1_amd64.deb

# download key to allow installation
sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub

# install actual package
sudo dpkg -i ./cuda-repo-ubuntu1604_9.0.176-1_amd64.deb

# install cuda: refresh the package lists first so apt-get can find the packages from the repo added above
sudo apt-get update
sudo apt-get install cuda-9-0

2a. reboot Ubuntu

sudo reboot

2b. check nvidia driver install

nvidia-smi   

# you should see a list of gpus printed    
# if not, the previous steps failed.
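The check in step 2b can be made a little more explicit in a script. The sketch below is an illustration only: gpu_tool_available and list_gpus are hypothetical names, and the query fields are just one reasonable choice among those nvidia-smi offers.

```shell
# Hypothetical helper: succeed if the given command (default: nvidia-smi)
# can be found on PATH. It does not prove that the driver is loaded.
gpu_tool_available() {
    tool="${1:-nvidia-smi}"
    command -v "$tool" >/dev/null 2>&1
}

# Hypothetical helper: print one CSV line per GPU with its name and
# driver version. Fails if the driver is not installed or not loaded.
list_gpus() {
    nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
}
```

A typical use would be `gpu_tool_available && list_gpus || echo "driver install needs attention"`.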

Install cudnn

wget https://s3.amazonaws.com/open-source-william-falcon/cudnn-9.0-linux-x64-v7.3.1.20.tgz
sudo tar -xzvf cudnn-9.0-linux-x64-v7.3.1.20.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
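To confirm which cuDNN version ended up under /usr/local/cuda, the version macros in the copied header can be read back. This is a sketch under the assumption that the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL directly in cudnn.h, as cuDNN 7 does; cudnn_version is a hypothetical function name.

```shell
# Hypothetical helper: print MAJOR.MINOR.PATCH from a cuDNN header.
# Assumes the three macros are defined in the file passed as $1
# (default: the location the header was copied to above).
cudnn_version() {
    header="${1:-/usr/local/cuda/include/cudnn.h}"
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END {
            if (major == "") exit 1   # macros not found in this file
            print major "." minor "." patch
        }
    ' "$header"
}
```

Running `cudnn_version` with no arguments reads the default header path; a non-zero exit status means the file is missing or does not contain the macros.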

Add these lines to the end of ~/.bashrc:

export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"
export CUDA_HOME=/usr/local/cuda
export PATH="$PATH:/usr/local/cuda/bin"
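If this guide is followed more than once, the same export lines can pile up in ~/.bashrc. A minimal sketch of an idempotent append follows; append_once is a hypothetical helper, not something the original instructions use.

```shell
# Hypothetical helper: append a line to a file only if that exact line
# is not already present (grep -x matches whole lines, -F disables regex,
# so "$" characters in the export lines are treated literally).
append_once() {
    line="$1"
    file="$2"
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (single quotes keep $PATH unexpanded until bashrc is sourced):
# append_once 'export CUDA_HOME=/usr/local/cuda' ~/.bashrc
# append_once 'export PATH="$PATH:/usr/local/cuda/bin"' ~/.bashrc
```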

4a. Reload bashrc

source ~/.bashrc

Install miniconda

wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh   

# press s to skip terms   

# Do you approve the license terms? [yes|no]
# yes

# Miniconda3 will now be installed into this location:
# accept the location

# Do you wish the installer to prepend the Miniconda3 install location
# to PATH in your /home/ghost/.bashrc ? [yes|no]
# yes

5a. Reload bashrc

source ~/.bashrc

Create a python 3.6 conda env to install tf into

conda create -n tensorflow python=3.6

# press y a few times

Activate env

source activate tensorflow
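Before running pip, it can help to confirm that the environment really is active, since otherwise the packages land in the wrong Python. The sketch below assumes that conda exports the CONDA_DEFAULT_ENV variable on activation; require_conda_env is a hypothetical name.

```shell
# Hypothetical helper: fail with a message unless the active conda
# environment (as reported by CONDA_DEFAULT_ENV) matches the expected name.
require_conda_env() {
    expected="$1"
    if [ "${CONDA_DEFAULT_ENV:-}" != "$expected" ]; then
        echo "expected conda env '$expected', found '${CONDA_DEFAULT_ENV:-none}'" >&2
        return 1
    fi
}
```

For example, `require_conda_env tensorflow && pip install --upgrade pip` only upgrades pip when the tensorflow env is active.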

update pip (might already be up to date, but just in case...)

pip install --upgrade pip

Install stable tensorflow with GPU support for python 3.6

pip install --upgrade tensorflow-gpu

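# If the above fails, try installing a specific wheel instead.
# Note: this wheel is tensorflow 1.2.0, which was built against CUDA 8.0 rather than the CUDA 9.0 installed above.
# pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.2.0-cp36-cp36m-linux_x86_64.whl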

Test tf install

# start python shell   
python

# run test script   
import tensorflow as tf   

hello = tf.constant('Hello, TensorFlow!')

# when the session is created, you should see several log lines mentioning the gpu (if the install worked)
# otherwise, tensorflow is not running on the gpu
sess = tf.Session()
print(sess.run(hello))

or alternatively, from the shell:

python -c "import tensorflow as tf; tf.enable_eager_execution(); print(tf.reduce_sum(tf.random_normal([1000, 1000])))"
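Reading the log output by eye can be replaced with an exit status. The sketch below assumes a TensorFlow 1.x install in which tf.test.is_gpu_available() is present; tf_gpu_available is a hypothetical wrapper name, and the interpreter argument is only there so a specific python can be chosen.

```shell
# Hypothetical helper: exit 0 if TensorFlow reports a usable GPU, non-zero
# otherwise (including when python or tensorflow cannot be found).
# Assumes tf.test.is_gpu_available() exists in the installed TF 1.x version.
tf_gpu_available() {
    py="${1:-python}"
    "$py" -c 'import sys, tensorflow as tf; sys.exit(0 if tf.test.is_gpu_available() else 1)'
}
```

Inside the activated env, `tf_gpu_available && echo "GPU visible to tensorflow"` gives a one-line answer.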


tensorflow-gpu-install-ubuntu-16.04's Issues

How to resolve the issue of "ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory" when import tensorflow

Here is the error trace:

$ python 
Python 3.6.3 (default, May 20 2018, 18:46:07) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
Traceback (most recent call last):
  File "/home/yubrshen/.pyenv/versions/udacity_workspace/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/home/yubrshen/.pyenv/versions/udacity_workspace/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/home/yubrshen/.pyenv/versions/udacity_workspace/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/home/yubrshen/.pyenv/versions/3.6.3/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/home/yubrshen/.pyenv/versions/3.6.3/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
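A first diagnostic for an error like this is to check whether the named library is present in any directory on LD_LIBRARY_PATH, the variable set in the .bashrc step of the README. This is a sketch only, not a confirmed fix for the report above; find_lib is a hypothetical helper.

```shell
# Hypothetical helper: look for a shared library by file name in each
# directory of a colon-separated search path (default: LD_LIBRARY_PATH).
# Prints the first match and exits 0, or exits 1 if none is found.
find_lib() {
    lib="$1"
    search_path="${2:-${LD_LIBRARY_PATH:-}}"
    old_ifs="$IFS"
    IFS=':'
    for dir in $search_path; do
        if [ -n "$dir" ] && [ -e "$dir/$lib" ]; then
            IFS="$old_ifs"
            printf '%s\n' "$dir/$lib"
            return 0
        fi
    done
    IFS="$old_ifs"
    return 1
}
```

If `find_lib libcublas.so.9.0` finds nothing, either CUDA 9.0 is not installed or /usr/local/cuda/lib64 is missing from LD_LIBRARY_PATH in the shell that launched Python.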

Update with CUDA 9 and CuDNN 7

Please update this to the requirements of tensorflow 1.5.

Disabling nouveau on ubuntu 16.04.3

The supplied instructions for disabling the nouveau drivers on ubuntu 16.04.3 did not work for me. I had to follow NVIDIA's instructions to achieve that (plus a restart afterwards); check the instructions here. Additionally, I suggest that the README add a check to verify that the nouveau drivers are unloaded by running this command, which should not return anything:

lsmod | grep nouveau

Happy to submit a PR with the suggestions

sudo apt-get install cuda-9-0 failed with error message: E: Unable to locate package cuda-9-0

Is there any spelling error with cuda-9-0?

what about kernel 4.4 compatibility issues?

There is a lot of talk about having to downgrade the current 16.04 kernel, and that officially Cuda 9.1 requires kernel 4.4 and the current kernel on 16.04 doesn't work with Cuda 9.1.
http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

I've also run into Cuda/driver installation issues on 16.04.

The instructions listed in this github don't seem to downgrade the 16.04 kernel.
What's going on?

Add a reboot step before running nvidia-smi

I encountered an issue when running nvidia-smi. I needed to reboot my system after installing CUDA before the command would list my GPUs. It worked after rebooting, and I continued with the process.

Cuda compute capability

Ignoring visible gpu device (device: 0, name: GeForce GTX 660M, pci bus id: 0000:01:00.0, compute capability: 3.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5.
How can we solve this problem?
Thank you in advance.

Beginning installing Nvidia 390 or 396

You skipped the first part, installing the Nvidia drivers, and checking them with the "nvidia-smi" command in step 2b. Please tell us what you get for cuda toolkit 9.0?

tf-nightly-gpu build

First, thanks for this; it got me up and running with minimal fuss.

While running some test models, I was running into a runtime/compile-time version mismatch error.

pip install tf-nightly-gpu

solved it for me.

cuDNN 7.1 incompatible

Thank you for the great tutorial! I followed the exact same steps and everything seems to be working fine. However, when I tried to run keras projects in PyCharm, I got an error similar to the one reported here. I am trying to downgrade to cudnn 7.0.4 now to see how it goes.

gpg error

When running sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub while installing the nvidia drivers, I get this error:
gpgkeys: no key data found for http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
gpg: no valid OpenPGP data found.

Reboot required after step 2

Hey ! It worked like a charm. Many thanks.
I just noticed that reboot is required after step 2, at least for me.
Maybe you want to update ;)
Cheers,
Edouard

Ubuntu 16.04 display is gone

Tried this on Ubuntu 16.04 and followed the steps. The display has not been active since the reboot in step 2a. I am stuck now. I can still SSH to the machine, but it seems the boot configs are damaged and it no longer recognizes the GPUs. The output of "sudo update-initramfs -u" is:

update-initramfs: Generating /boot/initrd.img-4.13.0-32-generic
/usr/bin/objcopy:/boot/efi/EFI/ubuntu/shimx64.efi: No such file or directory
/usr/bin/objcopy: --change-section-vma .initrd=0x0000000003000000 never used
/usr/bin/objcopy: --change-section-vma .linux=0x0000000000040000 never used
/usr/bin/objcopy: --change-section-vma .cmdline=0x0000000000030000 never used
/usr/bin/objcopy: --change-section-vma .osrel=0x0000000000020000 never used
update-initramfs: Failed to generate Linuxium bootscript.

Shouldn't initramfs be updated after adding the blacklist file for nouveau?

To the best of my knowledge, you should add the command sudo update-initramfs -u after blacklisting nouveau. I think that the ramfs image has to be updated after editing the modprobe configuration.

cudnn needed upgrading

cudnn 7.0 is no longer compatible with the current release of tensorflow-gpu, so running some jobs gives a segmentation fault. Switching to the newest version of cudnn fixed this.