ageron / handson-ml2

A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.

License: Apache License 2.0


Machine Learning Notebooks

⚠ The 3rd edition of my book will be released in October 2022. The notebooks are available at ageron/handson-ml3 and contain more up-to-date code.

This project aims to teach you the fundamentals of Machine Learning in Python. It contains the example code and solutions to the exercises in the second edition of my O'Reilly book Hands-on Machine Learning with Scikit-Learn, Keras and TensorFlow.

Note: If you are looking for the first edition notebooks, check out ageron/handson-ml. For the third edition, check out ageron/handson-ml3.

Quick Start

Want to play with these notebooks online without having to install anything?

Use any of the following services (I recommend Colab or Kaggle, since they offer free GPUs and TPUs).

WARNING: Please be aware that these services provide temporary environments: anything you do will be deleted after a while, so make sure you download any data you care about.

  • Open In Colab

  • Open in Kaggle

  • Launch binder

  • Launch in Deepnote

Just want to quickly look at some notebooks, without executing any code?

  • Render nbviewer

  • github.com's notebook viewer also works but it's not ideal: it's slower, the math equations are not always displayed correctly, and large notebooks often fail to open.

Want to run this project using a Docker image?

Read the Docker instructions.

Want to install this project on your own machine?

Start by installing Anaconda (or Miniconda), git, and if you have a TensorFlow-compatible GPU, install the GPU driver, as well as the appropriate version of CUDA and cuDNN (see TensorFlow's documentation for more details).

Next, clone this project by opening a terminal and typing the following commands (do not type the first $ signs on each line, they just indicate that these are terminal commands):

$ git clone https://github.com/ageron/handson-ml2.git
$ cd handson-ml2

Next, run the following commands:

$ conda env create -f environment.yml
$ conda activate tf2
$ python -m ipykernel install --user --name=python3

Finally, start Jupyter:

$ jupyter notebook

If you need further instructions, read the detailed installation instructions.

FAQ

Which Python version should I use?

I recommend Python 3.8. If you follow the installation instructions above, that's the version you will get. Most code will work with other versions of Python 3, but some libraries do not support Python 3.9 or 3.10 yet, which is why I recommend Python 3.8.

I'm getting an error when I call load_housing_data()

Make sure you call fetch_housing_data() before you call load_housing_data(). If you're getting an HTTP error, make sure you're running the exact same code as in the notebook (copy/paste it if needed). If the problem persists, please check your network configuration.
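
For example, the typical call sequence looks roughly like this (a minimal sketch; see the chapter 2 notebook for the full definitions):

import os
import tarfile
import urllib.request
import pandas as pd

DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml2/master/"
HOUSING_PATH = os.path.join("datasets", "housing")
HOUSING_URL = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"

def fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):
    # Download housing.tgz and extract housing.csv into datasets/housing/
    os.makedirs(housing_path, exist_ok=True)
    tgz_path = os.path.join(housing_path, "housing.tgz")
    urllib.request.urlretrieve(housing_url, tgz_path)
    with tarfile.open(tgz_path) as housing_tgz:
        housing_tgz.extractall(path=housing_path)

def load_housing_data(housing_path=HOUSING_PATH):
    # Fails with FileNotFoundError if fetch_housing_data() was never called
    return pd.read_csv(os.path.join(housing_path, "housing.csv"))

fetch_housing_data()            # download first...
housing = load_housing_data()   # ...then load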

I'm getting an SSL error on macOS

You probably need to install the SSL certificates (see this StackOverflow question). If you downloaded Python from the official website, then run /Applications/Python\ 3.8/Install\ Certificates.command in a terminal (change 3.8 to whatever version you installed). If you installed Python using MacPorts, run sudo port install curl-ca-bundle in a terminal.

I've installed this project locally. How do I update it to the latest version?

See INSTALL.md

How do I update my Python libraries to the latest versions, when using Anaconda?

See INSTALL.md

Contributors

I would like to thank everyone who contributed to this project, either by providing useful feedback, filing issues, or submitting pull requests. Special thanks go to Haesun Park and Ian Beauregard, who reviewed every notebook and submitted many PRs, including help on some of the exercise solutions. Thanks as well to Steven Bunkley and Ziembla, who created the docker directory, and to GitHub user SuperYorio, who helped on some exercise solutions.


handson-ml2's Issues

Cannot access container & it only runs (exit code 0) for seconds

After copying environment.yml to the right place and commenting out the git-nbdiffdriver line (why didn't it work?), the container built successfully, but a) when starting it from Portainer it runs for a few seconds and then seems to stop, and b) even when I was fast I could not access anything on localhost:8888.

I might have the port wrong, but I can't recall where it was set, or work out anything else to do with the container.

I am only interested in the Python and ML side and have no experience with Linux or Docker, and I suspect I and others like me would benefit from a more explicit explanation of how to run this - it looks fantastic in prospect!

Would appreciate your expert advice :)

The last build output was:

olop@Minty ~/Downloads/handson-ml2-master/docker $ docker-compose build
Building handson-ml2
Step 1/19 : FROM continuumio/miniconda3:latest
---> 406f2b43ea59
Step 2/19 : RUN apt-get update && apt-get upgrade -y && apt-get install -y libpq-dev build-essential git sudo cmake zlib1g-dev libjpeg-dev xvfb ffmpeg xorg-dev libboost-all-dev libsdl2-dev swig unzip zip && rm -rf /var/lib/apt/lists/*
---> Using cache
---> 09dbaee5152a
Step 3/19 : RUN conda update -n base conda
---> Using cache
---> c8ad38ddfa7e
Step 4/19 : COPY docker/environment.yml /tmp/
---> Using cache
---> 6d80800eddca
Step 5/19 : RUN conda env create -f /tmp/environment.yml
---> Using cache
---> f5108c8f9e87
Step 6/19 : ARG username
---> Using cache
---> c37e2d9a8754
Step 7/19 : ARG userid
---> Using cache
---> 30bd5ca16a55
Step 8/19 : ARG home=/home/${username}
---> Using cache
---> e8d4e94deb07
Step 9/19 : ARG workdir=${home}/handson-ml2
---> Using cache
---> 337be092376a
Step 10/19 : RUN adduser ${username} --uid ${userid} --gecos '' --disabled-password && echo "${username} ALL=(root) NOPASSWD:ALL" > /etc/sudoers.d/${username} && chmod 0440 /etc/sudoers.d/${username}
---> Using cache
---> 2c0fbd3c6e5b
Step 11/19 : WORKDIR ${workdir}
---> Using cache
---> 77728e3a0ce7
Step 12/19 : RUN chown ${username}:${username} ${workdir}
---> Using cache
---> 71dae86963b8
Step 13/19 : USER ${username}
---> Using cache
---> bfb3057bacbe
Step 14/19 : WORKDIR ${workdir}
---> Using cache
---> 74a1a41cdc47
Step 15/19 : COPY docker/bashrc.bash /tmp/
---> 9287d9083afe
Step 16/19 : RUN cat /tmp/bashrc.bash >> ${home}/.bashrc
---> Running in 74fb5bed35ce
Removing intermediate container 74fb5bed35ce
---> 4474776e18d3
Step 17/19 : RUN echo "export PATH="${workdir}/docker/bin:$PATH"" >> ${home}/.bashrc
---> Running in 5e97fbd7e8b9
Removing intermediate container 5e97fbd7e8b9
---> b265c7b1aeee
Step 18/19 : RUN sudo rm /tmp/bashrc.bash
---> Running in 5a858f4d5a33
Removing intermediate container 5a858f4d5a33
---> a23ae6dd4a93
Step 19/19 : RUN mkdir -p ${home}/.jupyter && echo 'c.NotebookApp.password = u"sha1:c6bbcba2d04b:f969e403db876dcfbe26f47affe41909bd53392e"' >> ${home}/.jupyter/jupyter_notebook_config.py
---> Running in f13c82dc15cd
Removing intermediate container f13c82dc15cd
---> 0f02601cf182
Successfully built 0f02601cf182
Successfully tagged handson-ml2:latest

The edited Dockerfile looks like this:

FROM continuumio/miniconda3:latest

RUN apt-get update && apt-get upgrade -y \
    && apt-get install -y \
        libpq-dev \
        build-essential \
        git \
        sudo \
        cmake zlib1g-dev libjpeg-dev xvfb ffmpeg xorg-dev libboost-all-dev libsdl2-dev swig \
        unzip zip \
    && rm -rf /var/lib/apt/lists/*

RUN conda update -n base conda
COPY docker/environment.yml /tmp/
RUN conda env create -f /tmp/environment.yml

ARG username
ARG userid

ARG home=/home/${username}
ARG workdir=${home}/handson-ml2

RUN adduser ${username} --uid ${userid} --gecos '' --disabled-password \
    && echo "${username} ALL=(root) NOPASSWD:ALL" > /etc/sudoers.d/${username} \
    && chmod 0440 /etc/sudoers.d/${username}

WORKDIR ${workdir}
RUN chown ${username}:${username} ${workdir}

USER ${username}
WORKDIR ${workdir}

# The config below enables diffing notebooks with nbdiff (and nbdiff support
# in git diff command) after connecting to the container by "make exec" (or
# "docker-compose exec handson-ml2 bash")
#       You may also try running:
#         nbdiff NOTEBOOK_NAME.ipynb
#       to get nbdiff between checkpointed version and current version of the
# given notebook.

# commented out 1/11/2019 22:31
# RUN git-nbdiffdriver config --enable --global

# INFO: Optionally uncomment any (one) of the following RUN commands below to ignore either
#       metadata or details in nbdiff within git diff
#RUN git config --global diff.jupyternotebook.command 'git-nbdiffdriver diff --ignore-metadata'
# commented out 1/11/2019 22:31
# RUN git config --global diff.jupyternotebook.command 'git-nbdiffdriver diff --ignore-details'


COPY docker/bashrc.bash /tmp/
RUN cat /tmp/bashrc.bash >> ${home}/.bashrc
RUN echo "export PATH=\"${workdir}/docker/bin:$PATH\"" >> ${home}/.bashrc
RUN sudo rm /tmp/bashrc.bash


# INFO: Uncomment lines below to enable automatic save of python-only and html-only
#       exports alongside the notebook
# COPY docker/jupyter_notebook_config.py /tmp/
# RUN cat /tmp/jupyter_notebook_config.py >> ${home}/.jupyter/jupyter_notebook_config.py
# RUN sudo rm /tmp/jupyter_notebook_config.py


# INFO: Uncomment the RUN command below to disable git diff paging
# RUN git config --global core.pager ''


# INFO: Uncomment the RUN command below for easy and constant notebook URL (just localhost:8888)
#       That will switch Jupyter to using empty password instead of a token.
#       To avoid making a security hole you SHOULD in fact not only uncomment but
#       regenerate the hash for your own non-empty password and replace the hash below.
#       You can compute a password hash in any notebook, just run the code:
#          from notebook.auth import passwd
#          passwd()
#       and take the hash from the output

# uncommented 1/11/2019 22:22:33
RUN mkdir -p ${home}/.jupyter && \
    echo 'c.NotebookApp.password = u"sha1:c6bbcba2d04b:f969e403db876dcfbe26f47affe41909bd53392e"' \
    >> ${home}/.jupyter/jupyter_notebook_config.py

Docker image scikit-learn out of date

I created a fresh Docker image. Book 1 code example 1-1 asserts sklearn >= 0.20, but the installed version is 0.19.1.

Running make exec and then conda update scikit-learn says the user does not have write permissions to /opt/conda, and sudoing the command says the conda command is not found.

Installing on windows

Hi! Thanks for a very nice book. When I try to install on a Win10 machine, I get some errors when trying the requirements.txt:

a) jupyter notebook is missing. I needed to do python3 -m pip install jupyter

b) I needed to change pip3 to pip and python3 to just python (I only have Python 3.6.8, no 2.7)

c) tfx fails to install (Could not find a version that satisfies the requirement ml-metadata<0.15,>=0.14):

Collecting future
  Using cached https://files.pythonhosted.org/packages/3f/bf/57733d44afd0cf67580658507bd11d3ec629612d5e0e432beb4b8f6fbb04/future-0.18.1.tar.gz
Collecting dill
  Using cached https://files.pythonhosted.org/packages/c7/11/345f3173809cea7f1a193bfbf02403fff250a3360e0e118a1630985e547d/dill-0.3.1.1.tar.gz
Collecting tensorflow-metadata
  Using cached https://files.pythonhosted.org/packages/4e/c2/e4ed82a725c9f8160a0ed73f0511773be9f76343def86f6f47121f0e8430/tensorflow_metadata-0.15.0-py2.py3-none-any.whl
Collecting psutil
  Using cached https://files.pythonhosted.org/packages/86/91/f15a3aae2af13f008ed95e02292d1a2e84615ff42b7203357c1c0bbe0651/psutil-5.6.3-cp36-cp36m-win_amd64.whl
Requirement already satisfied, skipping upgrade: attrs in c:\users\kalle\venvs\my_env\lib\site-packages (from tensorflow-datasets==1.2.0->-r requirements.txt (line 52)) (19.3.0)
ERROR: Could not find a version that satisfies the requirement ml-metadata<0.15,>=0.14 (from tfx==0.14.0->-r requirements.txt (line 57)) (from versions: 0.12.0.dev0, 0.13.0.dev0, 0.13.1.dev0)
ERROR: No matching distribution found for ml-metadata<0.15,>=0.14 (from tfx==0.14.0->-r requirements.txt (line 57))

(my_env) C:\Users\Kalle\venvs\my_env\handson-ml2-master>

d) some other error (I am using TF 2.0.0 GPU, but I get what looks like a TF 1.14 error?):

  Building wheel for promise (setup.py) ... done
  Created wheel for promise: filename=promise-2.2.1-cp36-none-any.whl size=21294 sha256=27bd5f0ea45972456dae5e656e0a57938455f326249db262f323a750d2a77fe2
  Stored in directory: C:\Users\Kalle\AppData\Local\pip\Cache\wheels\92\84\9f\75e2235effae0e1c5a5c0626a503e532bbffcb7e79e672b606
  Building wheel for googleapis-common-protos (setup.py) ... done
  Created wheel for googleapis-common-protos: filename=googleapis_common_protos-1.6.0-cp36-none-any.whl size=77586 sha256=e6806e2edee25b1590cdc111a6cc456ecb792bbed3473faf14f107885fe345b7
  Stored in directory: C:\Users\Kalle\AppData\Local\pip\Cache\wheels\9e\3d\a2\1bec8bb7db80ab3216dbc33092bb7ccd0debfb8ba42b5668d5
Successfully built nltk opt-einsum gast absl-py future dill promise googleapis-common-protos
ERROR: tensorflow 1.14.0 has requirement tensorboard<1.15.0,>=1.14.0, but you'll have tensorboard 2.0.0 which is incompatible.
ERROR: tensorflow 1.14.0 has requirement tensorflow-estimator<1.15.0rc0,>=1.14.0rc0, but you'll have tensorflow-estimator 2.0.1 which is incompatible.
Installing collected packages: cycler, kiwisolver, numpy, pyparsing, matplotlib, pytz, pandas, scipy, scikit-learn, xgbo

Maybe you or someone else has some ideas about what is wrong?
Best regards / Kalle Prorok, Umeå, Sweden (teacher with 166 students on our Deep Learning course)

Ch1 combine_country_stats

The line gdp_per_capita.set_index("Country", inplace=True) should be removed. The index is already set to Country in pd.read_csv().
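
To illustrate what I mean (a generic pandas example with placeholder values, not the exact notebook code): if the frame's index is already "Country", for instance because the CSV was loaded with index_col="Country", then calling set_index("Country") again fails, because set_index looks for a column of that name.

import pandas as pd

# Generic illustration: a frame whose index is already named "Country",
# as if it had been loaded with pd.read_csv(..., index_col="Country")
gdp_per_capita = pd.DataFrame(
    {"GDP per capita": [1.0, 2.0]},  # placeholder values
    index=pd.Index(["United States", "France"], name="Country"),
)

# "Country" is the index, not a column, so this second call raises a
# KeyError along the lines of "None of ['Country'] are in the columns"
gdp_per_capita.set_index("Country", inplace=True)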

Docker building hangs forever

Not sure if it is related to this issue, but updating to the latest conda, as suggested in that post before it was closed, did not help. In Step 5/32 : RUN conda install -y -c conda-forge pyopengl xgboost nbdime I got:

Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.

and the script started to run several Examining..., Comparing specs that have this dependency..., Finding shortest conflict path... steps, package by package; both CPU and RAM usage ran high. I tried to let it finish, but it did not end overnight.

My working environment is macOS 10.14.6 on a mid-2018 MacBook Pro; the conda version is 4.7.11 from Miniconda, and the first-priority channel is conda-forge.

For some reason, it passes the solving-environment step when I change the base image on the first line of docker/Dockerfile from FROM continuumio/anaconda3:2019.03 to FROM continuumio/miniconda3:latest. This might affect any other user who installed Miniconda rather than Anaconda.

Installation on Windows - %matplotlib inline problems

Yes, I know ML practitioners all want us to use Linux and avoid Windows. But some of us do use Windows 10 for Machine Learning and TensorFlow.

I have Anaconda installed, and I created a TensorFlow environment to use the new TF 2.0 API. I used your environment.yml. This appears to work after I explicitly installed tensorflow-gpu 2.0 into the conda environment and upgraded to the CUDA 10.x Nvidia drivers. I can run the new tf.keras API and training. But I would like to run your code.

However, this environment does not support %matplotlib inline for the basic display of plots in Jupyter notebooks. Pillow was installed, but I still get the error:
from . import _imaging as core
if version != getattr(core, "PILLOW_VERSION", None):
ImportError: DLL load failed: The specified module could not be found.

My other environments do work with this instruction, so it's not necessarily a Windows problem; it looks like a package dependency within this particular conda environment prevents me from running it. All rather curious.

I appreciate that you, Aurelien, don't run on Windows and cannot help, but perhaps another reader has resolved this issue; that would be very helpful.

Theory Questions K Means for Preprocessing

Hi There,

I hope you're well.

Thank you for your book, it's been really useful and enjoyed reading it.

I was wondering if you could help clarify this issue for me:
When we use KMeans for preprocessing, where is the distance computed, and where is it stored?

For example,
You place KMeans and Logistic Regression into a Pipeline and use the .fit() method.

I understand that we are trying to get the distance of each value from its cluster, but does KMeans.fit(xtrain, ytrain) actually do this? Or is it initiated when we run the GridSearchCV? If so, what is the sequence of steps, and how do we know what's being performed?
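
For reference, the kind of pipeline I mean is roughly the following (my own minimal reconstruction on the digits dataset, so details may differ from the book):

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

X_digits, y_digits = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X_digits, y_digits, random_state=42)

pipeline = Pipeline([
    ("kmeans", KMeans(n_clusters=50, random_state=42)),
    ("log_reg", LogisticRegression(max_iter=1000, random_state=42)),
])

# As I understand it, pipeline.fit() calls kmeans.fit_transform(X_train) under the hood:
# each instance is replaced by its distances to the 50 centroids, and those distances
# are the features the logistic regression is trained on.
pipeline.fit(X_train, y_train)
print(pipeline.score(X_test, y_test))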

I know I didn't phrase this well, but I'm happy to try rephrasing it.

Thanks,

Viraj

Chap2 Exercise 3

I see that in exercise 3 of chapter 2, you convert the value of the parameter k to a negative value. Could you explain the benefit of doing this? Many thanks!

Install Docker on Mac

I tried to build the Docker image on a Mac, latest OS, 10.16.1.

I followed the instructions, switching to the docker directory and issuing:
make build

[....]

Downloading and Extracting Packages
sqlite-3.30.1 | 1.9 MB | ########## | 100%
cffi-1.13.1 | 224 KB | ########## | 100%
asn1crypto-1.2.0 | 162 KB | ########## | 100%
cryptography-2.8 | 612 KB | ########## | 100%
ca-certificates-2019 | 131 KB | ########## | 100%
openssl-1.1.1d | 3.7 MB | ########## | 100%
pip-19.3.1 | 1.9 MB | ########## | 100%
setuptools-41.6.0 | 652 KB | ########## | 100%
Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done
Removing intermediate container 8d784dabb7cf
---> feff143c8ef1
Step 4/20 : COPY docker/environment.yml /tmp/
ERROR: Service 'handson-ml2' failed to build: COPY failed: stat /var/lib/docker/tmp/docker-builder765996510/docker/environment.yml: no such file or directory
make: *** [build] Error 1

Chapter 11 Exercise 8 transfer learning question

I'm unclear on how to generate the digits 0-4 dataset. In the chapter 11 notebook, I see that the output layer has 8 neurons since it's only trying to output 1 of 8 classes. But do the "y" values have to be zero-based and contiguous? It looks like "yes" based on the example. For the exercise, I wanted to make a 0-4 dataset and a 5-9 dataset, leaving the y's equal to 5, 6, 7, 8, or 9. So should they be shifted down to 0-4 by subtracting 5? And why, in the exercise, do the generated binary values have to be floats? Wouldn't integers work?

For the splitting function, I wanted to do something like:

# Build training data with digits 0-4
def split_dataset(X, y):
    y_56789 = y > 4 # digits 5,6,7,8,9
    y_A = y[~y_56789] # since deleting 5 - 9, no need to modify remaining 0 - 4 class numbers
    y_B = y[y_56789]
    return ((X[~y_56789], y_A),
            (X[y_56789], y_B))
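
And to make dataset B's labels zero-based and contiguous (if that does turn out to be required), I was planning to simply shift them down after splitting, e.g. (using the same X_train/y_train as in the notebook):

(X_train_A, y_train_A), (X_train_B, y_train_B) = split_dataset(X_train, y_train)

# shift labels 5-9 down to 0-4 so they match a 5-unit softmax output layer
y_train_B = y_train_B - 5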

Does this work? Thanks.

Are loss/accuracy ratios a better early stopping condition than actual lack of improvement?

I'm going through the 2nd edition chapter 10 notebook and working on the "Training and Evaluating the Model" section. Going through the "Using Callbacks" section, the EarlyStopping method seems useful (by default using no improvement for some number of epochs). For some reason, my learning curves plot looks different. I may have tweaked something, but I think I have the random seed inits the same. It looks like mine starts to overfit way earlier than the book graph suggests. I can see how to make my own metric and incorporate into a custom early stopping callback, but it's not clear what the metric should be. Your example prints out the val_loss/loss ratio. I did that and it starts drifting up after epoch 10, which brings up some questions. Should this ratio be used as an early stopping condition? At what value (1.0, 1.5, . . .)? Better to use a ratio of accuracies as an overfitting indicator? Any other interesting metrics indicate overfitting? Graph below. Thanks.
[screenshot: learning curves plot]

Missing output value in 03_classification

Hi,

This issue applies to Chapter 3's code sample (i.e.: 03_classification).

When calculating the desired threshold value for a precision >= 0.90, the following is done:

threshold_90_precision = thresholds[np.argmax(precisions >= 0.90)]

Right after this statement, the following is written:

threshold_90_precision

Correct me if I'm wrong, but I believe this was done in order to demonstrate that the threshold in question is equal to 7816.155523682526 (note: 7813 is used in the graph examples / shown in comments, so there's a disparity there).

The issue is that the output is missing.

Thanks,
Hussein Khalil

Chapter 7 - Out-of-Bag Evaluation

In Chapter 7, the book says,

The BaggingClassifier automatically performs soft voting instead of hard voting if the base classifier can estimate class probabilities (i.e., if it has a predict_proba() method), which is the case with Decision Trees classifiers

Then in the following section of Out-of-Bag Evaluation, it says the oob error is calculated by

You can evaluate the ensemble itself by averaging out the oob evaluations of each predictor

It seems to me that this is just the average of the generalization error of each tree. Why can it be used as the generalization error for the ensemble model, especially given that the random forest is trained with soft voting?

Chapter 3: plot_digits() error

I have this code:

some_index = 5500
plt.subplot(121); plot_digits(X_test_mod[some_index])
plt.subplot(122) #plot_digits(y_test_mod[some_index])
plt.show()

But there are some errors in the function plot_digits()

def plot_digits(instances, images_per_row=10, **options): 
    size = 28 
    images_per_row = min(len(instances), images_per_row) 
    images = [instance.reshape(size,size) for instance in instances] 
    n_rows = (len(instances) - 1) // images_per_row + 1 
    row_images = [] 
    n_empty = n_rows * images_per_row - len(instances) 
    images.append(np.zeros((size, size * n_empty)))  
    for row in range(n_rows): 
        rimages = images[row * images_per_row : (row + 1) * images_per_row] 
        row_images.append(np.concatenate(rimages, axis=1)) 
    image = np.concatenate(row_images, axis=0) 
    plt.imshow(image, cmap = plt.cm.binary, **options) 
    plt.axis("off") 

It shows me this:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-89-2bf391be4bbb> in <module>
      1 some_index = 5500
----> 2 plt.subplot(121); plot_digits(X_test_mod[some_index])
      3 plt.subplot(122); plot_digits(y_test_mod[some_index])
      4 plt.show()
      5 

<ipython-input-39-75ce87a65aba> in plot_digits(instances, images_per_row, **options)
      9     size = 28 
     10     images_per_row = min(len(instances), images_per_row) 
---> 11     images = [instance.reshape(size,size) for instance in instances] 
     12     n_rows = (len(instances) - 1) // images_per_row + 1 
     13     row_images = [] 

<ipython-input-39-75ce87a65aba> in <listcomp>(.0)
      9     size = 28 
     10     images_per_row = min(len(instances), images_per_row) 
---> 11     images = [instance.reshape(size,size) for instance in instances] 
     12     n_rows = (len(instances) - 1) // images_per_row + 1 
     13     row_images = [] 

ValueError: cannot reshape array of size 1 into shape (28,28)

How can I solve that? I tried a lot of things, like reshaping, and nothing worked!
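
One thing that seems to avoid the error (since plot_digits() iterates over a list of images and reshapes each one) is to wrap the single row in a list, but I'm not sure it's the intended usage:

some_index = 5500
plt.subplot(121); plot_digits([X_test_mod[some_index]])  # pass a list containing 1 image
plt.subplot(122); plot_digits([y_test_mod[some_index]])
plt.show()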

Where is Appendix A?

Excellent book. I'm going through chapter 4 exercises, and was trying to check my answers. The notebook says to look in Appendix A, but I can't find it going through all the directories. Is it available yet? Did I miss it? Thanks.

Ch 18 on Mac / Docker

macOS 10.15.1, XQuartz 2.7.11. I have the error below.

XQuartz is in the path of the docker file.

I installed this file as instructed:

python3 -m pip install -U pyvirtualdisplay

try:
    import pyvirtualdisplay
    display = pyvirtualdisplay.Display(visible=0, size=(1400, 900)).start()
except ImportError:
    pass

xdpyinfo was not found, X start can not be checked! Please install xdpyinfo!

env.render()

---------------------------------------------------------------------------
NoSuchDisplayException                    Traceback (most recent call last)
<ipython-input-6-d9761596d5d9> in <module>
----> 1 env.render()

/opt/conda/envs/tf2/lib/python3.7/site-packages/gym/core.py in render(self, mode, **kwargs)
    231 
    232     def render(self, mode='human', **kwargs):
--> 233         return self.env.render(mode, **kwargs)
    234 
    235     def close(self):

/opt/conda/envs/tf2/lib/python3.7/site-packages/gym/envs/classic_control/cartpole.py in render(self, mode)
    148 
    149         if self.viewer is None:
--> 150             from gym.envs.classic_control import rendering
    151             self.viewer = rendering.Viewer(screen_width, screen_height)
    152             l,r,t,b = -cartwidth/2, cartwidth/2, cartheight/2, -cartheight/2

/opt/conda/envs/tf2/lib/python3.7/site-packages/gym/envs/classic_control/rendering.py in <module>
     25 
     26 try:
---> 27     from pyglet.gl import *
     28 except ImportError as e:
     29     raise ImportError('''

/opt/conda/envs/tf2/lib/python3.7/site-packages/pyglet/gl/__init__.py in <module>
    237     # trickery is for circular import
    238     _pyglet.gl = _sys.modules[__name__]
--> 239     import pyglet.window

/opt/conda/envs/tf2/lib/python3.7/site-packages/pyglet/window/__init__.py in <module>
   1894 if not _is_pyglet_docgen:
   1895     pyglet.window = sys.modules[__name__]
-> 1896     gl._create_shadow_window()
   1897 

/opt/conda/envs/tf2/lib/python3.7/site-packages/pyglet/gl/__init__.py in _create_shadow_window()
    206 
    207     from pyglet.window import Window
--> 208     _shadow_window = Window(width=1, height=1, visible=False)
    209     _shadow_window.switch_to()
    210 

/opt/conda/envs/tf2/lib/python3.7/site-packages/pyglet/window/xlib/__init__.py in __init__(self, *args, **kwargs)
    164                     self._event_handlers[message] = func
    165 
--> 166         super(XlibWindow, self).__init__(*args, **kwargs)
    167 
    168         global _can_detect_autorepeat

/opt/conda/envs/tf2/lib/python3.7/site-packages/pyglet/window/__init__.py in __init__(self, width, height, caption, resizable, style, fullscreen, visible, vsync, display, screen, config, context, mode)
    499 
    500         if not display:
--> 501             display = get_platform().get_default_display()
    502 
    503         if not screen:

/opt/conda/envs/tf2/lib/python3.7/site-packages/pyglet/window/__init__.py in get_default_display(self)
   1843         :rtype: `Display`
   1844         """
-> 1845         return pyglet.canvas.get_display()
   1846 
   1847 if _is_pyglet_docgen:

/opt/conda/envs/tf2/lib/python3.7/site-packages/pyglet/canvas/__init__.py in get_display()
     80 
     81     # Otherwise, create a new display and return it.
---> 82     return Display()
     83 
     84 if _is_pyglet_docgen:

/opt/conda/envs/tf2/lib/python3.7/site-packages/pyglet/canvas/xlib.py in __init__(self, name, x_screen)
     84         self._display = xlib.XOpenDisplay(name)
     85         if not self._display:
---> 86             raise NoSuchDisplayException('Cannot connect to "%s"' % name)
     87 
     88         screen_count = xlib.XScreenCount(self._display)

NoSuchDisplayException: Cannot connect to "None"

RNN Chapter

I wonder if there is a typo in the 15_RNN code? When you are plotting the graphs in the time series example, you use the following function. Looks like the labels for Actual and Forecast should be swapped.

def plot_multiple_forecasts(X, Y, Y_pred):
    n_steps = X.shape[1]
    ahead = Y.shape[1]
    plot_series(X[0, :, 0])
    plt.plot(np.arange(n_steps, n_steps + ahead), Y_pred[0, :, 0], "ro-", label="Actual")
    plt.plot(np.arange(n_steps, n_steps + ahead), Y[0, :, 0], "bx-", label="Forecast", markersize=10)
    plt.axis([0, n_steps + ahead, -1, 1])
    plt.legend(fontsize=14)

Chapter 3 - Classification - Titanic

This applies to Chapter 3 - Classification, more specifically the exercise for the Titanic data set.

When discussing ways to improve the results, the following is expressed:

(...) try to convert numerical attributes to categorical attributes: for example, different age groups had very different survival rates (see below), so it may help to create an age bucket category and use it instead of the age. Similarly, it may be useful to have a special category for people traveling alone since only 30% of them survived (see below).

We then proceed to create an "AgeBucket" attribute:

train_data["AgeBucket"] = train_data["Age"] // 15 * 15
train_data[["AgeBucket", "Survived"]].groupby(['AgeBucket']).mean()

Remarks:

  1. We are using train_data, which didn't go through the pre-processing pipeline (see: preprocess_pipeline). I assume this would impact the results due to the missing age entries (see train_data.info()) - should X_train[:, 0] be used instead?
  2. I have difficulty understanding the // 15 * 15 expression - what is the intention? (See my quick check after this list.)
  3. The creation of the age buckets seems to be missing. I was able to achieve something similar by using pandas.cut: train_data["AgeBucket"] = pd.cut(X_train[:, 0], bins=[0,15,30,45,60,75,100], labels=["0","15","30","45","60","75"])
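
Regarding point 2, my current reading is that // 15 * 15 simply floors each age down to the nearest multiple of 15, i.e. the lower edge of its 15-year bucket; a quick check (my own snippet, please correct me if I'm wrong):

import pandas as pd

ages = pd.Series([2, 14, 15, 22, 37, 61])
print(ages // 15 * 15)
# -> 0, 0, 15, 15, 30, 60: each age is floored to its 15-year bucket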

Thanks, I thoroughly enjoyed the exercise.

urllib not defined

Hello,

Sorry, I'm not a Python expert. In 01_the_machine_learning_landscape there is an error in this cell:

# Download the data
DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml2/master/"
os.makedirs(datapath, exist_ok=True)
for filename in ("oecd_bli_2015.csv", "gdp_per_capita.csv"):
    print("Downloading", filename)
    url = DOWNLOAD_ROOT + "datasets/lifesat/" + filename
    urllib.request.urlretrieve(url, datapath + filename)

the urllib import is missing
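
Adding the import at the top of the cell fixed it for me (assuming datapath is defined in an earlier cell, as in the notebook):

import os
import urllib.request

# Download the data
DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml2/master/"
os.makedirs(datapath, exist_ok=True)
for filename in ("oecd_bli_2015.csv", "gdp_per_capita.csv"):
    print("Downloading", filename)
    url = DOWNLOAD_ROOT + "datasets/lifesat/" + filename
    urllib.request.urlretrieve(url, datapath + filename)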

Simple question about pipelines

Hi @ageron
I'm really starting to finally wrap my head around linear regression, thanks to your wonderful book!
I just have a simple question, which may be pretty trivial, so forgive me for asking.
I have started using some of the datasets provided in sklearn.datasets and I'm trying to follow along with the end-to-end chapter. In your book you use pipelines to apply all the various transformation steps sequentially. If you're working with a dataset that does not have text data or null values, and the only transformation you want to apply is feature scaling, you would not need a pipeline, right? You would just call the StandardScaler on the data on its own?
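
Concretely, here is the kind of thing I have in mind (a toy example on a purely numerical sklearn dataset, just to make sure I'm understanding correctly):

from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)  # purely numerical, no missing values

# No text attributes and no nulls, so the only "transformation" is scaling:
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)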

Thanks again for this wonderful book, really learning a lot from it, despite being quite overwhelmed at first glance

[Question] learning <handson-ml2> with 1st edition book

Hi Aurelien,

First of all this is a question instead of an issue; please let me know if this is not the right place to ask a book-related question and redirect me to the right place :)

I bought the 1st edition of the book and so far have been enjoying it very much. As TensorFlow 2.0 (TF) came out, from the ipynbs in this repo it seems that the 2nd edition will focus on taking advantage of the newer version of TF. I noticed that Chapter 9 (Up and Running with TensorFlow) of the 1st edition is gone in the 2nd edition; may I know if this is because low-level compute-graph construction is no longer applicable in TF 2.0? And for a reader/learner like me, before the release of the 2nd edition in Aug 2019, do you have any suggestions or advice on using the 1st edition book to learn with the latest version of TF? Should I skip particular chapters? Your advice would be greatly appreciated.

Best Regards,
Ji

tensorflow-gpu 2.0.0 and cuda 10.1

The tensorflow-gpu 2.0.0 you get from pip only works with cuda 10.0. If you have the newer cuda 10.1 installed, you'll either have to downgrade cuda or compile tensorflow from source yourself.

I went the latter route and it works fine with cuda 10.1.

Classification: add labels in confusion_matrix.

In the classification chapter, there is a part with a confusion matrix. It is really confusing, because there TP stands for the False class and TN stands for the True class.
Also, in the first book (I don't have the second) the confusion matrix is shown transposed. TP should always be in the upper left corner.

Linear regression example in 2nd Edition book using unprocessed training data

It appears that the data used to test the trained linear regression model on page 75 of the 2nd edition of "Hands-on..." is using the unprocessed housing data frame. If the model was trained with housing_prepared shouldn't the examples (i.e. some_data=housing.iloc[:5]) use the processed data set as well (i.e. some_data=housing_prepared[:5])?

Chapter2-Housingdataset-FileNotFoundError

Hi, I am trying to run the code of your Housing example in chapter 2, but I got the following message:

FileNotFoundError                         Traceback (most recent call last)
<ipython-input-18-373f7b289efc> in <module>
----> 1 housing = load_housing_data()

<ipython-input-17-4d0bff7b3608> in load_housing_data(housing_path)
      2 def load_housing_data(housing_path=HOUSING_PATH):
      3     csv_path = os.path.join(housing_path, "housing.csv")
----> 4     return pd.read_csv(csv_path)

~/hanson-ml/ML/env/lib/python3.6/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)
    700                     skip_blank_lines=skip_blank_lines)
    701 
--> 702         return _read(filepath_or_buffer, kwds)
    703 
    704     parser_f.__name__ = name

~/hanson-ml/ML/env/lib/python3.6/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
    427 
    428     # Create the parser.
--> 429     parser = TextFileReader(filepath_or_buffer, **kwds)
    430 
    431     if chunksize or iterator:

~/hanson-ml/ML/env/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
    893             self.options['has_index_names'] = kwds['has_index_names']
    894 
--> 895         self._make_engine(self.engine)
    896 
    897     def close(self):

~/hanson-ml/ML/env/lib/python3.6/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
   1120     def _make_engine(self, engine='c'):
   1121         if engine == 'c':
-> 1122             self._engine = CParserWrapper(self.f, **self.options)
   1123         else:
   1124             if engine == 'python':

~/hanson-ml/ML/env/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
   1851         kwds['usecols'] = self.usecols
   1852 
-> 1853         self._reader = parsers.TextReader(src, **kwds)
   1854         self.unnamed_cols = self._reader.unnamed_cols
   1855 

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()

FileNotFoundError: [Errno 2] File b'datasets/housing/housing.csv' does not exist: b'datasets/housing/housing.csv'

I am not sure what's going on, but I realized that the link appearing in your code, namely:

https://raw.githubusercontent.com/ageron/handson-ml/master/

is broken. Is this the cause of the problem?

Regards.

18 - Reinforcement learning - It won't learn Deep-Q-Network

When I run the Deep Q-Network in TensorFlow 1.14, I need to disable tf.random.seed, which is fine, but then the model becomes unstable. It doesn't learn; in fact, it gets worse after each episode. I ran it maybe 20 times, and only 2 of them learned. Any ideas?

typo in chapter 1

I'm just reading the pdf version from ebooks.com.
I just found a typo, as below: 'Reinforcement Learning isag a very different beast.'
It should be 'Reinforcement Learning is a very different beast.'

environment.yml should be in docker

This is hard work for someone who a) doesn't do Linux, and b) just installed Docker, in complete ignorance, in Linux under VMware because of Win10 Home... z)

I think there's a problem: error in docker build

Step 4/20 : COPY docker/environment.yml /tmp/
ERROR: Service 'handson-ml2' failed to build: COPY failed: stat /var/lib/docker/tmp/docker-builder812836084/docker/environment.yml: no such file or directory
Makefile:9: recipe for target 'build' failed

because dockerfile contains

RUN conda update -n base conda
COPY docker/environment.yml /tmp/
RUN conda env create -f /tmp/environment.yml

but environment.yml is not in docker/

(After copying it in, I got to step 5 of 20 and "enabling notebook extension"... validating... done,
and then a valid environment seems to be built.)

Now I can get to step 15, but there's a new error...

Service 'handson-ml2' failed to build: The command '/bin/sh -c git-nbdiffdriver config --enable --global' returned a non-zero code: 127...

ah, probably because

/bin/sh: 1: git-nbdiffdriver: not found

on the line above

On the newly added confidence interval section in chapter 2

In chapter 2, the confidence interval of the generalization error (RMSE) is calculated by taking the square root of a t interval. The code is as follows:

from scipy import stats
confidence = 0.95
squared_errors = (final_predictions - y_test) ** 2 #y_test is real values, final_predictions is predicted values of y
np.sqrt(stats.t.interval(confidence, len(squared_errors) - 1, loc=squared_errors.mean(), scale=stats.sem(squared_errors)))

It's unintuitive to me, though (doesn't the sum of squared errors follow something like a ${\chi}^2$ distribution?). I wonder under what assumptions (such as what distribution of the variables) this result is derived. Thanks!
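
To make sure I'm reading the code correctly, here is my own re-derivation of the same interval by hand (a sketch assuming it is simply a t interval on the mean squared error followed by a square root; final_predictions and y_test as above):

import numpy as np
from scipy import stats

confidence = 0.95
squared_errors = (final_predictions - y_test) ** 2
m = len(squared_errors)
mean = squared_errors.mean()

# t interval on the *mean* squared error (a CLT-style argument on the sample mean),
# then the square root turns it into an interval on the RMSE:
tscore = stats.t.ppf((1 + confidence) / 2, df=m - 1)
tmargin = tscore * squared_errors.std(ddof=1) / np.sqrt(m)
print(np.sqrt(mean - tmargin), np.sqrt(mean + tmargin))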

Chapter 12 – Custom Models and Training with TensorFlow

Hi~

I seem to have found a mistake.

The sentence above the cell "In [100]:" in your jupyter notebook is "If you do the math, you will find that metric = loss * mean of sample weights (plus some floating point precision error).".

However, I found loss = metric * mean_of_sample_weights rather than metric = loss * mean_of_sample_weights after I did the math. And this is verified by the code in your jupyter notebook (cell "In [102]:" and cell "In [124]:").

  • cell "In [102]:":
In [102]: history.history["loss"][0], history.history["huber_fn"][0] * sample_weight.mean()
Out[102]: (0.11463345723687114, 0.11505695444753776)
  • cell "In [124]:"
In [124]: history.history["loss"][0], history.history["HuberMetric"][0] * sample_weight.mean()
Out[124]: (0.36160108902965715, 0.36160092789449944)

Tensorflow v 2.0 seems to be incompatible with standalone Keras

I'm working my way through Ch 10, and I decided to follow along by using the standalone Keras API rather than Tensorflow's. I had TF updated to 2.0 as in the book, and when I changed the import statements to reflect that, I got this error.
keras-team/keras#12379
The solutions they have are to downgrade to a TF 1.1x version, or to use TF's built-in Keras support. But it looks like using standalone Keras with TF 2 won't work as of right now.

Chapter 11, OneCycleScheduler

Shouldn't the _interpolate function in the OneCycleScheduler class be returning
(rate2 - rate1) * (self.iteration - iter1) / (iter2 - iter1)
instead of
(rate2 - rate1) * (iter2 - self.iteration) / (iter2 - iter1)
?
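
To check my reading, here is a quick standalone sanity check of both variants as plain linear interpolation (my own re-implementation, assuming the + rate1 offset that the class also adds):

def interpolate_a(iteration, iter1, iter2, rate1, rate2):
    # version with (iteration - iter1): gives rate1 at iter1 and rate2 at iter2
    return (rate2 - rate1) * (iteration - iter1) / (iter2 - iter1) + rate1

def interpolate_b(iteration, iter1, iter2, rate1, rate2):
    # version with (iter2 - iteration): gives rate2 at iter1 and rate1 at iter2 (reversed)
    return (rate2 - rate1) * (iter2 - iteration) / (iter2 - iter1) + rate1

print(interpolate_a(0, 0, 100, 0.01, 0.1), interpolate_a(100, 0, 100, 0.01, 0.1))
# -> 0.01 0.1
print(interpolate_b(0, 0, 100, 0.01, 0.1), interpolate_b(100, 0, 100, 0.01, 0.1))
# -> 0.1 0.01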

How to zip some features in tf.data.Dataset ?

I have a dataset with structure: (x, y_in, y_length, y_out) for my encoder-decoder network.
But my encoder-decoder network only accept ((x, y_in, y_length), y_out).
How to convert from (x, y_in, y_length, y_out) to ((x, y_in, y_length), y_out) ?
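
What I was considering (not sure whether it's the idiomatic way) is a simple map() that regroups the elements; a toy sketch with random tensors of the same 4-element structure:

import tensorflow as tf

# Toy dataset with the same structure: (x, y_in, y_length, y_out)
dataset = tf.data.Dataset.from_tensor_slices((
    tf.random.uniform([8, 10]),                          # x
    tf.random.uniform([8, 5]),                           # y_in
    tf.random.uniform([8], maxval=5, dtype=tf.int32),    # y_length
    tf.random.uniform([8, 5]),                           # y_out
))

# Regroup (x, y_in, y_length, y_out) into ((x, y_in, y_length), y_out)
dataset = dataset.map(lambda x, y_in, y_length, y_out: ((x, y_in, y_length), y_out))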

Feature Columns and Functional API

Hey there!

First off, super huge fan and very excited for the final product (bought the first book and loved it)

I thought it would be worth noting in the book that tf.feature_column doesn't seem to work with tf2.0 as of this time of writing (May 13 2019)

Specifically stuff in Chapter 13

columns_without_target = columns[:-1]
model = keras.models.Sequential([
    keras.layers.DenseFeatures(feature_columns=columns_without_target),
    keras.layers.Dense(1)
])
model.compile(loss="mse", optimizer="sgd", metrics=["accuracy"])
model.fit(dataset, steps_per_epoch=len(X_train) // batch_size, epochs=5)

based on tensorflow/tensorflow#28111 and Unable to use FeatureColumn with Keras Functional API

Adapting ResNet34 for 3D dataset

Hi,

First, thank you very much for your work, it is fantastic and I have enjoyed working through the book very much.

I am trying to implement from scratch the ResNet34 that you show in the book, on a 3D dataset, but I have run into an error message. When executing the for loop that adds the residual layers to the model, on the 5th iteration I get the error message:

"Dimensions must be equal, but are 128 and 64 for 'residual_unit_35/conv3d_243/convolution' (op: 'Conv3D') with input shapes: [?,12,14,12,128], [3,3,3,64,128]."

Here is the current code I am using to build the model:

class ResidualUnit(keras.layers.Layer):

    def __init__(self, filters, strides=1, activation="relu", **kwargs):
        super().__init__(**kwargs)
        self.activation = keras.activations.get(activation)
        self.main_layers = [
            keras.layers.Conv3D(filters, (3,3,3), strides=strides,
                                padding="same", use_bias=False),
            keras.layers.BatchNormalization(),
            self.activation,
            keras.layers.Conv3D(filters, (3,3,3), strides=1,
                                padding="same", use_bias=False),
            keras.layers.BatchNormalization()]
        self.skip_layers = []
        if strides > 1:
            self.skip_layers = [
                keras.layers.Conv3D(filters, (1,1,1), strides=strides,
                                    padding="same", use_bias=False),
                keras.layers.BatchNormalization()]

    def call(self, inputs):
        Z = inputs
        for layer in self.main_layers:
            Z = layer(Z)
        skip_Z = inputs
        for layer in self.skip_layers:
            skip_Z = layer(skip_Z)
        return self.activation(Z + skip_Z)

model = keras.models.Sequential()
model.add(keras.layers.Conv3D(64, (7,7,7), strides=2, padding="same",
                              use_bias=False, input_shape=[91,109,91,1]))
model.add(keras.layers.BatchNormalization())
model.add(keras.layers.Activation("relu"))
model.add(keras.layers.MaxPool3D(pool_size=3, strides=2, padding="same"))
prev_filters = 64
for filters in [64] * 3 + [128] * 4 + [256] * 6 + [512] * 3:
    strides = 1 if filters == prev_filters else 2
    model.add(ResidualUnit(filters, strides=strides))
    prev_filters = filters
    print(filters)
model.add(keras.layers.GlobalAveragePooling3D())
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(2, activation="sigmoid"))

Do you have any advice for resolving this?

Thank you,
Brady

Install on Mac

I have an up-to-date OS on a Mac, which was clean-installed in July. I followed the directions for a local pip install in the environment.

https://github.com/ageron/handson-ml2/blob/master/INSTALL.md

Preparation and then:

python3 -m pip install --upgrade -r requirements.txt

(I had tried to use Google Colab, but I am running Ch 18 for Reinforcement Learning and there is a problem with env.render() running from Colab, since it runs on a remote server.)

The problem is with a single line importing sklearn. When imported from IPython in the same environment, it works fine.

import sklearn

ImportError                               Traceback (most recent call last)
~/anaconda3/lib/python3.7/site-packages/numpy/core/__init__.py in <module>
     39 try:
---> 40     from . import multiarray
     41 except ImportError as exc:

~/anaconda3/lib/python3.7/site-packages/numpy/core/multiarray.py in <module>
     11 
---> 12 from . import overrides
     13 from . import _multiarray_umath

~/anaconda3/lib/python3.7/site-packages/numpy/core/overrides.py in <module>
      5 
----> 6 from numpy.core._multiarray_umath import (
      7     add_docstring, implement_array_function, _get_implementing_args)

ImportError: dlopen(/Users/df/anaconda3/lib/python3.7/site-packages/numpy/core/_multiarray_umath.cpython-37m-darwin.so, 2): Library not loaded: @rpath/libopenblas.dylib
  Referenced from: /Users/df/anaconda3/lib/python3.7/site-packages/numpy/core/_multiarray_umath.cpython-37m-darwin.so
  Reason: image not found

During handling of the above exception, another exception occurred:

ImportError                               Traceback (most recent call last)
<ipython-input-1-b7c74cbf5af0> in <module>
----> 1 import sklearn

~/anaconda3/lib/python3.7/site-packages/sklearn/__init__.py in <module>
     74 else:
     75     from . import __check_build
---> 76     from .base import clone
     77     from .utils._show_versions import show_versions
     78 

~/anaconda3/lib/python3.7/site-packages/sklearn/base.py in <module>
     11 import re
     12 
---> 13 import numpy as np
     14 
     15 from . import __version__

~/anaconda3/lib/python3.7/site-packages/numpy/__init__.py in <module>
    140     from . import _distributor_init
    141 
--> 142     from . import core
    143     from .core import *
    144     from . import compat

~/anaconda3/lib/python3.7/site-packages/numpy/core/__init__.py in <module>
     69 Original error was: %s
     70 """ % (sys.executable, exc)
---> 71     raise ImportError(msg)
     72 finally:
     73     for envkey in env_added:

ImportError: 

IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!

Importing the multiarray numpy extension module failed.  Most
likely you are trying to import a failed build of numpy.
Here is how to proceed:
- If you're working with a numpy git repository, try `git clean -xdf`
  (removes all files not under version control) and rebuild numpy.
- If you are simply trying to use the numpy version that you have installed:
  your installation is broken - please reinstall numpy.
- If you have already reinstalled and that did not fix the problem, then:
  1. Check that you are using the Python you expect (you're using /Users/df/anaconda3/bin/python),
     and that you have no directories in your PATH or PYTHONPATH that can
     interfere with the Python and numpy versions you're trying to use.
  2. If (1) looks fine, you can open a new issue at
     https://github.com/numpy/numpy/issues.  Please include details on:
     - how you installed Python
     - how you installed numpy
     - your operating system
     - whether or not you have multiple versions of Python installed
     - if you built from source, your compiler versions and ideally a build log

     Note: this error has many possible causes, so please don't comment on
     an existing issue about this - open a new one instead.

Original error was: dlopen(/Users/df/anaconda3/lib/python3.7/site-packages/numpy/core/_multiarray_umath.cpython-37m-darwin.so, 2): Library not loaded: @rpath/libopenblas.dylib
  Referenced from: /Users/df/anaconda3/lib/python3.7/site-packages/numpy/core/_multiarray_umath.cpython-37m-darwin.so
  Reason: image not found

GPU,TPU in Colab

Welcome. I have an idea: maybe there is a chance to add a README instruction on how to use Keras and TensorFlow 2.0 with a GPU or TPU in Colab? By default, everything runs on the CPU.

CH16: nan loss with stateful RNN

I'm unable to reproduce the output of section Stateful RNN in https://github.com/ageron/handson-ml2/blob/master/16_nlp_with_rnns_and_attention.ipynb

I've executed all cells of the notebook but when executing this cell:

model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
steps_per_epoch = train_size // batch_size // n_steps
model.fit(dataset, steps_per_epoch=steps_per_epoch, epochs=50,
                   callbacks=[ResetStatesCallback()])

I quickly end up with a nan loss:

Train for 313 steps
Epoch 1/50
 38/313 [==>...........................] - ETA: 45s - loss: nan

Instead of the output committed in the notebook:

Epoch 1/50
313/313 [==============================] - 101s 322ms/step - loss: 2.6180
Epoch 2/50
313/313 [==============================] - 98s 312ms/step - loss: 2.2312
...etc...

I'm using:

tensorflow  2.0.0rc0
numpy       1.17.0

on ubuntu 18.04.

I've tried to find the problem by using a manual training loop (with GradientTape...) and batches of 1 instance. I could reproduce the exact same results I get with model.fit(), but I can also access the intermediate layers, check the GRU weights, the dense layer weights and outputs, and the gradients. After learning from a few examples, the 2nd (upper) GRU layer outputs large numbers (gradually increasing as the time step increases), leading to large outputs in the dense layer, nan in the softmax output, and hence the nan loss. But I'm unable to explain why the network evolves in that direction.
I've added regularizers to the GRUs but they only slightly delay the problem.
I haven't tried with tensorflow 2.0.0-beta1.

Am I the only one? When this kind of thing happens, what's the best path to understanding the problem (and maybe solving it)?

For the other RNN (non-stateful) (section Creating and Training the Model) I obtained the same result as the one committed in the notebook.
