
Comments (5)

ageron commented on May 18, 2024

Hi @gkd720 ,
Thanks for your question! This difference may be due to the fact that the "sgd" optimizer used to have a default learning rate of 1e-3, but this was changed recently to 1e-2 (presumably to match the default in the multi-backend Keras implementation at keras.io).
So please use optimizer=keras.optimizers.SGD(lr=1e-3) instead of "sgd", and I'm guessing everything will fall back into place.
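For concreteness, a minimal sketch of what that looks like (the architecture here is just a placeholder, not the one from the exercise):

from tensorflow import keras

model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dense(300, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy",
              optimizer=keras.optimizers.SGD(learning_rate=1e-3),  # pin the old 1e-3 default explicitly
              metrics=["accuracy"])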


gkd720 commented on May 18, 2024

Yep. That was it. Thanks. Got to 88.25% training accuracy, 87.9% validation accuracy, and 86.6% on the test set after 70 epochs. Both validation and training loss are still decreasing pretty much monotonically, without any hyperparameter tuning, and with a val/train loss ratio of 1.04, so no overfitting suspected yet, and the accuracies might still drift up somewhat.
But back to my early stopping criteria questions above: what do the cool data scientists use (actual improvement amounts, loss ratio, accuracy ratio, etc.)? Or do they just eyeball an accuracy and think "good enough"? Thanks again.


ageron commented on May 18, 2024

Thanks for your feedback, glad to know that setting the learning rate to 1e-3 fixed the discrepancy. Regarding your early stopping question, I'm not sure about cool data scientists, but I think the rest of us use the validation metric as the criterion: we're interested in how well the system will perform on new instances, and the validation metric is a fairly good way to estimate this, so if it stops improving, we should stop training.
Comparing the training loss and the validation loss is useful for knowing how bad the overfitting is. If the spread is large, you don't just want to stop training, you also want to fix the problem, for example by training on more data or regularizing the model, which implies retraining.
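A minimal sketch of that criterion with the built-in callback (the min_delta and patience values are arbitrary placeholders, and model, X_train, etc. are assumed to be defined as in the code further down):

from tensorflow import keras

# Stop once the validation loss has not improved by at least min_delta
# for `patience` epochs, and roll back to the best weights seen so far.
early_stopping_cb = keras.callbacks.EarlyStopping(monitor="val_loss",
                                                  min_delta=1e-4,
                                                  patience=10,
                                                  restore_best_weights=True)
history = model.fit(X_train, y_train, epochs=100,
                    validation_data=(X_valid, y_valid),
                    callbacks=[early_stopping_cb])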
Hope this helps!


gkd720 commented on May 18, 2024

In your "If the spread is large", does that mean a large ratio, or just a large difference?
Anyway, I went ahead and did Chapter 10 Exercise 10 on the MNIST digits dataset (D'oh! I now suspect you meant the fashion dataset. Oh well, it was a good exercise and I wanted to try hyperparameter tuning.) Trying some typical ranges, it ran for almost 2 hours on my late 2012 iMac with a 3.4 GHz Intel Core i7 and 16 GB of memory, running at 800% CPU (4 hyper-threaded physical cores), which quickly kicked in the fan. The best model achieved 98.01% (WooHoo! . . . unless the digits are just not that hard compared to the fashion set).

Is it possible to see live results in the Jupyter notebook during the tuning run? I see the results flying by in my terminal window, but most of it scrolled out of the display buffer. I did see a warning about output going to stderr before some kind of "flag" assignment, so I guess I would have to explicitly write them to a file? I did see an occasional model summary fly by, so writing from the build_model function can work. Would assigning the fit results to a "history" variable have forced the output to Jupyter? Could they be seen live, or only after all experimentation was completed?

My concern is how to detect overfitting before all the runs are done. Since the test accuracy is so close to the train/valid results, should I even care if later evaluation suggests overfitting (0.0652/0.0046 == too big?)?

Code/results below for review, or as another data point in the solution space. Thanks.

Update (6/21/2019): I reran this exercise with the MNIST digits and fashion datasets, but I get the exact same model/params when done! Is this possible? And now, the test score for the digits run is 17%!! I checked my data assignment/manipulation, and the only difference is which keras.datasets module I use (mnist vs. fashion_mnist). What might I be doing wrong? Thanks.

Update (6/27/2019): OK, I suspected it was my fault, and it looks like it was, but I can't figure out how I messed up. Anyway, rerunning both the fashion and digits tuning runs at the same time, with fewer combinations (lower n_iter and fewer epochs), gives much more typical results: fashion at 86% and digits at 96%. I still need to look into getting live data during the runs, so I'll experiment with "history".

import numpy as np
import tensorflow as tf
from tensorflow import keras

# Digits run shown below; the fashion run only swaps in keras.datasets.fashion_mnist.
(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.mnist.load_data()

X_valid, X_train = X_train_full[:5000] / 255., X_train_full[5000:] / 255.
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]
X_test = X_test / 255.

def build_model(n_hidden=1, n_neurons=100, learning_rate=3e-3, input_shape=[28,28]):
    model = keras.models.Sequential()
    model.add(keras.layers.Flatten(input_shape=input_shape))
    for layer in range(n_hidden):
        model.add(keras.layers.Dense(n_neurons, activation="relu"))
        n_neurons = n_neurons // 3     # each successive layer keeps one third of the previous layer's neurons (integer division)
    model.add(keras.layers.Dense(10, activation="softmax"))     # We know the last layer must pick 1 of 10 classes
    optimizer = keras.optimizers.SGD(learning_rate)
    model.summary()
    model.compile(loss="sparse_categorical_crossentropy",
                  optimizer=optimizer,
                  metrics=["accuracy"])     # Need this for early stopping.
    return model

keras_clf = keras.wrappers.scikit_learn.KerasClassifier(build_model)

from scipy.stats import reciprocal
from sklearn.model_selection import RandomizedSearchCV

param_distribs = {
    "n_hidden": [1, 2, 3],
    "n_neurons": np.arange(100, 500),
    "learning_rate": reciprocal(3e-4, 3e-2),
}

np.random.seed(42)
tf.random.set_seed(42)
# May have to Kernel -> Reconnect to see all results.
rnd_search_cv = RandomizedSearchCV(keras_clf, param_distribs, n_iter=20, cv=3, verbose=2, n_jobs=-1)
rnd_search_cv.fit(X_train, y_train, epochs=100,
                  validation_data=(X_valid, y_valid),
                  callbacks=[keras.callbacks.EarlyStopping(patience=10)])

Fitting 3 folds for each of 20 candidates, totalling 60 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed: 54.9min
[Parallel(n_jobs=-1)]: Done  60 out of  60 | elapsed: 105.8min finished

. . .
Epoch 31/100
55000/55000 [==============================] - 3s 63us/sample - loss: 0.0046 - accuracy: 0.9998 - val_loss: 0.0652 - val_accuracy: 0.9820
. . .

model = rnd_search_cv.best_estimator_.model
model.summary()

Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten_4 (Flatten)          (None, 784)               0         
_________________________________________________________________
dense_11 (Dense)             (None, 341)               267685    
_________________________________________________________________
dense_12 (Dense)             (None, 113)               38646     
_________________________________________________________________
dense_13 (Dense)             (None, 10)                1140      
=================================================================
Total params: 307,471
Trainable params: 307,471
Non-trainable params: 0


rnd_search_cv.best_params_
Out[54]:
{'learning_rate': 0.02298924804076755, 'n_hidden': 2, 'n_neurons': 341}


model.evaluate(X_train, y_train)
55000/55000 [==============================] - 2s 31us/sample - loss: 0.0041 - accuracy: 0.9999
Out[56]:
[0.004096633061076599, 0.99985456]


model.evaluate(X_valid, y_valid)
5000/5000 [==============================] - 0s 31us/sample - loss: 0.0652 - accuracy: 0.9820
Out[57]:
[0.06520095226885751, 0.982]

model.evaluate(X_test, y_test)
10000/10000 [==============================] - 0s 33us/sample - loss: 0.0688 - accuracy: 0.9801
Out[59]:
[0.06882534091388516, 0.9801]


ageron commented on May 18, 2024

Sorry for the late response.

98% accuracy on MNIST is good. With convolutional neural nets, you can go beyond 99% accuracy, and with data augmentation, ensembling, learning rate schedules, and so on, you can reach 99.7%.
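For reference, a rough sketch of the kind of convnet meant here; the layer sizes are arbitrary placeholders, not a tuned architecture from this thread, and the 28x28 images need an extra channel dimension:

import numpy as np
from tensorflow import keras

# The dense-network code above feeds 28x28 arrays; a convnet needs a
# trailing channel dimension, e.g. X_train[..., np.newaxis].
cnn = keras.models.Sequential([
    keras.layers.Conv2D(32, 3, activation="relu", padding="same",
                        input_shape=[28, 28, 1]),
    keras.layers.MaxPooling2D(2),
    keras.layers.Conv2D(64, 3, activation="relu", padding="same"),
    keras.layers.MaxPooling2D(2),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(10, activation="softmax"),
])
cnn.compile(loss="sparse_categorical_crossentropy",
            optimizer="nadam", metrics=["accuracy"])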

To get live results in Jupyter, you probably should use the TensorBoard callback when you call the model.fit() method, and use the %tensorboard magic command to view progress.
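A sketch of that setup for a single model.fit() call (the log directory name is arbitrary):

import os
from tensorflow import keras

run_logdir = os.path.join(os.curdir, "my_logs", "run_001")   # arbitrary location
tensorboard_cb = keras.callbacks.TensorBoard(run_logdir)
history = model.fit(X_train, y_train, epochs=100,
                    validation_data=(X_valid, y_valid),
                    callbacks=[tensorboard_cb])

# Then, in a separate notebook cell:
# %load_ext tensorboard
# %tensorboard --logdir=./my_logs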

Regarding your question about detecting overfitting: if you train your model and get good performance on the validation set, you might not care that there's a bit of overfitting. So usually what happens is that you are unhappy with the validation performance, you try to understand what is happening, and the gap between the training metric and the validation metric is so great that you conclude it's an overfitting problem, so you regularize your model, or you find more training data, or you reduce the size of your neural network (fewer layers, fewer neurons...), and so on.

I've never tried to automate the interruption of training based on an overfitting criterion, but I guess you could. I think I would wait until the training metric reaches some threshold, and I would then stop training if and when the valid/train metric ratio reaches some other threshold. I'm not sure that's great, it's just my first hunch. However, if we interrupt training while the validation metric is still improving, we run the risk of throwing away a model that would have been fantastic, despite some overfitting.

What matters in the end is the generalization performance. I would prefer a model with 90% accuracy on the test set and 100% on the training set over a model with 80% accuracy on the test set and 85% on the training set. The first one overfits more than the second, but it's still much better.
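A rough sketch of that hunch as a custom Keras callback; both thresholds are arbitrary placeholders, not values recommended in this thread:

from tensorflow import keras

class OverfitStopping(keras.callbacks.Callback):
    """Wait until the training accuracy is decent, then stop training
    once the val/train loss ratio crosses a threshold."""
    def __init__(self, min_train_accuracy=0.95, max_loss_ratio=2.0):
        super().__init__()
        self.min_train_accuracy = min_train_accuracy
        self.max_loss_ratio = max_loss_ratio

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        acc = logs.get("accuracy", 0.)
        ratio = logs.get("val_loss", 0.) / max(logs.get("loss", 0.), 1e-12)
        if acc >= self.min_train_accuracy and ratio >= self.max_loss_ratio:
            print(f"\nStopping: val/train loss ratio {ratio:.2f} at epoch {epoch + 1}")
            self.model.stop_training = True

# Usage, e.g.: model.fit(..., callbacks=[OverfitStopping()])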

Hope this helps.

