rstudio-conf-2020 / dl-keras-tf
rstudio::conf(2020) deep learning workshop
License: Creative Commons Attribution Share Alike 4.0 International
For direct word embeddings, the output makes sense:
# natural language modeling embeddings
get_similar_words("horrible", word_embeddings)
# horrible terrible awful bad acting
# 1.0000000 0.9248301 0.8892507 0.8432761 0.8015473
But how do we interpret the relationships between words when the embeddings are learned as part of a classification model?
similar_classification_words("horrible", embedding_wts)
# horrible keith brooks blond york sporting
# 1.0000000 0.7858497 0.7819669 0.7724826 0.7616312 0.7583101
Is there a way to put this in better context?
This issue is based on a question raised during the workshop.
Problem
Loss values appear to be nan when the loss parameter is set to "mean_squared_error", as shown below.
model %>% compile(
  optimizer = optimizer_sgd(lr = 0.1),
  loss = "mean_squared_error",
  metrics = "mae"
)

history <- model %>% fit(
  x_train,
  y_train,
  batch_size = 16,
  validation_split = 0.2
)
Train on 1640 samples, validate on 411 samples
Epoch 1/10
1640/1640 [==============================] - 1s 397us/sample - loss: nan - mae: nan - val_loss: nan - val_mae: nan
Epoch 2/10
1640/1640 [==============================] - 1s 447us/sample - loss: nan - mae: nan - val_loss: nan - val_mae: nan
Epoch 3/10
1640/1640 [==============================] - 0s 268us/sample - loss: nan - mae: nan - val_loss: nan - val_mae: nan
Epoch 4/10
1640/1640 [==============================] - 1s 382us/sample - loss: nan - mae: nan - val_loss: nan - val_mae: nan
Reason
This is probably related to the range of the target (sale price), which reaches roughly 755,000. Squaring errors at that scale produces enormous loss and gradient values, and with a learning rate of 0.1 the weights quickly diverge to nan. That's why msle was recommended in the instructions and not mean_squared_error.
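A minimal sketch of the recommended fix, assuming the same model and data as above: compiling with the "msle" loss (keras's alias for mean squared logarithmic error) keeps the error magnitudes small because it compares log-scaled predictions and targets.

```r
# same setup as above, but with msle so large sale prices
# don't blow up the squared errors
model %>% compile(
  optimizer = optimizer_sgd(lr = 0.1),
  loss = "msle",
  metrics = "mae"
)

history <- model %>% fit(
  x_train,
  y_train,
  batch_size = 16,
  validation_split = 0.2
)
```

Rescaling the targets (e.g. dividing prices by 1,000) together with a smaller learning rate is another common way to avoid the blow-up.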
When running the following code chunk from a fresh session I get the following error:
history <- model %>% fit_generator(
train_generator,
steps_per_epoch = 100,
epochs = 30,
validation_data = validation_generator,
validation_steps = 50,
callbacks = callback_early_stopping(patience = 5)
)
WARNING:tensorflow:sample_weight modes were coerced from ... to ['...']
2020-01-28 00:41:21.650047: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-01-28 00:41:21.887585: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-01-28 00:41:22.620623: W tensorflow/core/common_runtime/bfc_allocator.cc:243] Allocator (GPU_0_bfc) ran out of memory trying to allocate 16.00MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
Error in py_call_impl(callable, dots$args, dots$keywords) : ResourceExhaustedError: OOM when allocating tensor with shape[6272,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [[node MatMul_3 (defined at /util/deprecation.py:324) ]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. [Op:__inference_distributed_function_1290] Function call stack: distributed_function
Are there specific strategies in keras to deal with class imbalance in the training set? Or any recommendations about models to be aware of when this is a problem? Many classification problems involve rare outcomes, occurring in <= 5% of cases.
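Not from the original thread, but one lever keras itself exposes is the class_weight argument to fit(), which up-weights the minority class in the loss. A minimal sketch, assuming a binary 0/1 outcome and an illustrative 1:10 weighting (tune the ratio to the actual imbalance):

```r
# errors on the rare positive class (label 1) count ~10x in the loss;
# class labels are given as names of the weight list
history <- model %>% fit(
  x_train, y_train,
  epochs = 10,
  batch_size = 32,
  class_weight = list("0" = 1, "1" = 10)
)
```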
I can see that docs/data/non-imdb-movie-reviews does not exist in the repo, and I couldn't see it created in requirements.Rmd. Am I missing something?
I've heard of using sequence embedding for things like visits to a website to predict purchase behavior or, more pertinent to my work, embeddings for patient visits to predict something like a hospital admission.
How can we extend what we will cover for word2vec and word embeddings to sequence embeddings where a timestamp is associated with each "word", similar to what is covered in the "patient2vec" paper (https://arxiv.org/abs/1810.04793) or med2vec (https://arxiv.org/abs/1602.05568)?
Using k_gradients(), which calls keras$backend$gradients(), results in the following error:
Error in py_call_impl(callable, dots$args, dots$keywords) : RuntimeError: tf.gradients is not supported when eager execution is enabled. Use tf.GradientTape instead.
I'm not sure whether this is related to something in my version/environment.
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] keras_2.2.5.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.2 here_0.1 lattice_0.20-35 rprojroot_1.3-2
[5] zeallot_0.1.0 grid_3.5.0 R6_2.4.0 backports_1.1.4
[9] jsonlite_1.6 magrittr_1.5 tfruns_1.4 whisker_0.3-2
[13] Matrix_1.2-14 reticulate_1.13 generics_0.0.2 tools_3.5.0
[17] xfun_0.8 compiler_3.5.0 base64enc_0.1-3 tensorflow_2.0.0
[21] knitr_1.24
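As the error message itself suggests, with eager execution (TensorFlow 2.x) gradients are computed with tf.GradientTape rather than keras$backend$gradients(). A minimal sketch using the tensorflow R package's %as% helper (the variable names are illustrative):

```r
library(tensorflow)

x <- tf$Variable(3)

# record operations on the tape while they run,
# then ask the tape for d(y)/d(x)
with(tf$GradientTape() %as% tape, {
  y <- x * x
})

grad <- tape$gradient(y, x)  # dy/dx = 2x = 6 at x = 3
```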
Issue
As I was installing keras on a new device, I got the following error on the first call to a keras function, mnist <- dataset_mnist():
Error in py_call_impl(callable, dots$args, dots$keywords) :
Exception: URL fetch failure on https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz: None -- [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1076)
Solution
Run Install Certificates.command.
Just in case this happens to others, we could add a note about it.
I have used the Kaggle API on Windows before, but I recently switched to macOS, and I'd highlight the following point about the first installation.
Issue: a bash: kaggle: command not found error.
My solution: I had to add alias kaggle="/Users/[my-user-name]/Library/Python/3.7/bin/kaggle" to .bash_profile, because that is where my kaggle executable lives.
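The same fix as a shell one-liner. The $HOME-relative path below is an assumption based on the default pip --user install location for Python 3.7 on macOS; adjust it to wherever pip actually placed the kaggle executable on your machine:

```shell
# append the alias to ~/.bash_profile;
# the bin directory assumes a pip --user install of Python 3.7 on macOS
echo 'alias kaggle="$HOME/Library/Python/3.7/bin/kaggle"' >> "$HOME/.bash_profile"
```

Open a new terminal (or source ~/.bash_profile) for the alias to take effect. Adding the bin directory to PATH instead of aliasing works equally well.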
Related issue: Kaggle/kaggle-api#59
Can you post/share examples of more complex embedding models with dense layers, dropout, etc., and any good reading materials around embedding models?