
notebooks's People

Contributors

amygdala, brandondutra, chmeyers, clemens-tolboom, craigcitro, gramster, lakshmanok, nikhilk, ojarjur, parthea, qimingj, rajivpb, rileyjbauer, supriyagarg, yebrahim, yuxuanchen


notebooks's Issues

Map charts don't load, missing key

Getting this error:

Google Maps API error: MissingKeyMapError https://developers.google.com/maps/documentation/javascript/error-messages#missing-key-map-error
_.kb @ js?v=3&callback=google.loader.callbacks.maps&sensor=false:37
(anonymous) @ common.js:53
(anonymous) @ common.js:194
c @ common.js:49
(anonymous) @ AuthenticationService.Authenticate?1shttp%3A%2F%2Flocalhost%3A8081%2Fnotebooks%2Fdev%2Fnotebooks%2F…:1

Add sort_index() to closing data

The dataframes do not seem to be in date-sorted order. This is not a problem for plotting the time series, but the autocorrelation, etc., assumes they are sorted.

A workaround is to add a sort_index() call to the first "Munge the data" code cell, e.g., so it reads:

closing_data = pd.DataFrame()
   . . .
closing_data['aord_close'] = aord['Close']

# Put the closing_data in sorted order *** Needed for autocorrelation to work ***
closing_data = closing_data.sort_index()

# Pandas includes a very convenient function for filling gaps in the data.
closing_data = closing_data.fillna(method='ffill')

Missing package causes error in sample: "Introduction to Python"

Copy issue from googledatalab/datalab#936

We need to either add the package or change the sample (the latter may be easier for now).
%%bash

apt-get install -y -q libxslt-dev libxml2-dev
pip install -q scrapy

debconf: delaying package configuration, since apt-utils is not installed
Command "/usr/bin/python -u -c "import setuptools, tokenize;file='/tmp/pip-build- OqPUd_/cryptography/setup.py';exec(compile(getattr(tokenize, 'open', open)(file).read().replace('\r\n', '\n'), file, 'exec'))" install --record /tmp/pip-NBepB4-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-OqPUd_/cryptography/

Is Storage API in sample notebook referencing pydatalab?

Hi,

Does the Storage API referenced in https://github.com/googledatalab/notebooks/blob/master/tutorials/Storage/Storage%20APIs.ipynb follow the API documented at http://googledatalab.github.io/pydatalab/datalab.storage.html? The source code for that documentation is in https://github.com/googledatalab/pydatalab/tree/v1.1/datalab/storage

If so, I couldn't find some classes, such as Item. The error thrown is: AttributeError: 'module' object has no attribute 'Item'

Code snippet for your reference:

import google.datalab.storage as storage

shared_bucket = storage.Item('BUCKET_NAME', "KEY_NAME")
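
For what it's worth, this may be a mismatch between two APIs: the snippet imports the newer google.datalab.storage module, while the linked docs describe the older datalab.storage package where Item lives. A minimal sketch of both variants, assuming the newer module exposes Bucket/Object rather than Item (the class names for the newer API are an assumption, not verified against the docs; constructor arguments are mirrored from the snippet above):

# Older API, matching the linked pydatalab v1.1 docs (this is where Item exists):
import datalab.storage as legacy_storage
shared_item = legacy_storage.Item('BUCKET_NAME', 'KEY_NAME')

# Newer API; assumption: objects are represented by storage.Object, not storage.Item.
import google.datalab.storage as storage
shared_bucket = storage.Bucket('BUCKET_NAME')
shared_object = storage.Object('BUCKET_NAME', 'KEY_NAME')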

ga sd notebooks

TODO

  1. Done: remove job.wait() calls, as jobs are now blocking.
  2. Done: check that I don't say 'transforms.json' or 'numerical_analysis.json'.
  3. Done: use the job name from the ctc? It doesn't have one.
  4. Done: make a README that says you have to run %projects set cloud-ml-dev. Not needed.

Unsupported loss function in seq2seq model.

I am exploring the following TensorFlow example: https://github.com/googledatalab/notebooks/blob/master/samples/TensorFlow/LSTM%20Punctuation%20Model%20With%20TensorFlow.ipynb. It is apparently written for TF v1, so I upgraded it with the v2 upgrade script, which reported three main issues:

ERROR: Using member tf.contrib.rnn.DropoutWrapper in deprecated module tf.contrib. tf.contrib.rnn.DropoutWrapper cannot be converted automatically. tf.contrib will not be distributed with TensorFlow 2.0, please consider an alternative in non-contrib TensorFlow, a community-maintained repository such as tensorflow/addons, or fork the required code.
ERROR: Using member tf.contrib.legacy_seq2seq.sequence_loss_by_example in deprecated module tf.contrib. tf.contrib.legacy_seq2seq.sequence_loss_by_example cannot be converted automatically. tf.contrib will not be distributed with TensorFlow 2.0, please consider an alternative in non-contrib TensorFlow, a community-maintained repository such as tensorflow/addons, or fork the required code.
ERROR: Using member tf.contrib.framework.get_or_create_global_step in deprecated module tf.contrib. tf.contrib.framework.get_or_create_global_step cannot be converted automatically. tf.contrib will not be distributed with TensorFlow 2.0, please consider an alternative in non-contrib TensorFlow, a community-maintained repository such as tensorflow/addons, or fork the required code.

So for compatibility I manually replaced framework.get_or_create_global_step with tf.compat.v1.train.get_or_create_global_step, and also rnn.DropoutWrapper with tf.compat.v1.nn.rnn_cell.DropoutWrapper.

But I was unable to find a solution for the tf.contrib.legacy_seq2seq.sequence_loss_by_example method, since I cannot find a backwards-compatible alternative. I tried installing TensorFlow Addons and using its seq2seq loss function, but I wasn't able to figure out how to adapt it to work with the rest of the code.

I stumbled across errors like "Consider casting elements to a supported type." and "Logits must be a [batch_size x sequence_length x logits] tensor", probably because I am not implementing something correctly.

My question: how do I implement a supported TensorFlow v2 alternative to this loss function so that it behaves like the code below?

    output = tf.reshape(tf.concat(axis=1, values=outputs), [-1, size])
    softmax_w = tf.compat.v1.get_variable("softmax_w", [size, len(TARGETS)], dtype=tf.float32)
    softmax_b = tf.compat.v1.get_variable("softmax_b", [len(TARGETS)], dtype=tf.float32)
    logits = tf.matmul(output, softmax_w) + softmax_b
    self._predictions = tf.argmax(input=logits, axis=1)    
    self._targets = tf.reshape(input_.targets, [-1])
    loss = tfa.seq2seq.sequence_loss(
        [logits],
        [tf.reshape(input_.targets, [-1])],
        [tf.ones([batch_size * num_steps], dtype=tf.float32)])
    self._cost = cost = tf.reduce_sum(input_tensor=loss) / batch_size
    self._final_state = state

Full code here.
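
For reference, a minimal sketch of how tfa.seq2seq.sequence_loss might be wired in so that it mimics the old loss. Unlike legacy sequence_loss_by_example, it takes 3-D logits of shape [batch_size, sequence_length, vocab_size] and 2-D targets/weights rather than lists. The variable names (logits, batch_size, num_steps, TARGETS, input_) are taken from the snippet above, and this is an untested assumption, not the notebook's actual fix:

import tensorflow as tf
import tensorflow_addons as tfa

# Reshape the flat [batch_size * num_steps, vocab] logits back to 3-D.
logits_3d = tf.reshape(logits, [batch_size, num_steps, len(TARGETS)])
targets_2d = tf.reshape(input_.targets, [batch_size, num_steps])
weights_2d = tf.ones([batch_size, num_steps], dtype=tf.float32)

# With averaging disabled this returns a [batch_size, num_steps] loss tensor,
# which is the closest analogue of sequence_loss_by_example's per-step losses.
loss = tfa.seq2seq.sequence_loss(
    logits_3d, targets_2d, weights_2d,
    average_across_timesteps=False,
    average_across_batch=False)

cost = tf.reduce_sum(input_tensor=loss) / batch_size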

My proposal: When this is resolved please update the notebook with newer version example.

Make the notebooks more reliable in test runs

We've recently added the .test.sh script which runs each of these notebooks to validate that none of the cells raise errors when executed.

Conceptually, this is fine, but most of these notebooks were written with the idea that they would be run once rather than repeatedly.

As such, we should go through each notebook and see if there are things we should change to make them more reliable when run multiple times.

As a concrete example, we have at least one issue in the 'tutorials/storage/Storage APIs.ipynb' notebook: it creates a sample bucket, adds an item to it, deletes just that one item, and then deletes the bucket. If that process fails after creating the sample item but before deleting it, then every subsequent run will find that the sample bucket is not empty, and attempts to delete the sample bucket will fail.

That particular notebook needs to be updated so it better handles the scenario where the sample bucket already exists, and there are probably similar issues in the other notebooks.
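
One way to make that cleanup idempotent is to delete any leftover objects before deleting the bucket, and to tolerate the bucket already existing. A minimal sketch using the standard google-cloud-storage client (the notebook itself uses the datalab storage wrapper, so the client choice and the bucket name here are illustrative assumptions):

from google.cloud import storage

client = storage.Client()
bucket_name = 'sample-bucket-name'  # hypothetical name

# Reuse the bucket if a previous run left it behind; otherwise create it.
bucket = client.bucket(bucket_name)
if not bucket.exists():
    bucket = client.create_bucket(bucket_name)

# ... run the sample steps ...

# Delete any leftover objects first, so a partially failed earlier run
# cannot leave a non-empty bucket that blocks deletion.
for blob in bucket.list_blobs():
    blob.delete()
bucket.delete()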

Error when using old training data

Hi all,

When I use model_file_prefix = model_dir to skip the training part (as I've already done it), I get an error upon running the following code:

from google.datalab.ml import ConfusionMatrix
from pprint import pprint

cm_data = run_eval(model_file_prefix, '/content/datalab/punctuation/datapreped/test.txt')
pprint(cm_data.tolist())
cm = ConfusionMatrix(cm_data, TARGETS)
cm.plot()

Error:
INFO:tensorflow:Restoring parameters from /content/lab/datalab/punctuation/model/eval/model.ckpt
INFO:tensorflow:Starting standard services.
INFO:tensorflow:Saving checkpoint to path /content/lab/datalab/punctuation/model/eval/model.ckpt
INFO:tensorflow:Starting queue runners.
INFO:tensorflow:Recording summary at step None.
INFO:tensorflow:Restoring parameters from /content/lab/datalab/punctuation/model/
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.NotFoundError'>, Unsuccessful TensorSliceReader constructor: Failed to find any matching files for /content/lab/datalab/punctuation/model/
[[Node: save/RestoreV2_6 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_6/tensor_names, save/RestoreV2_6/shape_and_slices)]]

Caused by op 'save/RestoreV2_6', defined at:
File "/usr/local/envs/py3env/lib/python3.5/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/usr/local/envs/py3env/lib/python3.5/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/ipykernel/main.py", line 3, in
app.launch_new_instance()
File "/usr/local/envs/py3env/lib/python3.5/site-packages/traitlets/config/application.py", line 658, in launch_instance
app.start()
File "/usr/local/envs/py3env/lib/python3.5/site-packages/ipykernel/kernelapp.py", line 474, in start
ioloop.IOLoop.instance().start()
File "/usr/local/envs/py3env/lib/python3.5/site-packages/zmq/eventloop/ioloop.py", line 177, in start
super(ZMQIOLoop, self).start()
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tornado/ioloop.py", line 887, in start
handler_func(fd_obj, events)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tornado/stack_context.py", line 275, in null_wrapper
return fn(*args, **kwargs)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 440, in _handle_events
self._handle_recv()
File "/usr/local/envs/py3env/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv
self._run_callback(callback, msg)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback
callback(*args, **kwargs)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tornado/stack_context.py", line 275, in null_wrapper
return fn(*args, **kwargs)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 276, in dispatcher
return self.dispatch_shell(stream, msg)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 228, in dispatch_shell
handler(stream, idents, msg)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 390, in execute_request
user_expressions, allow_stdin)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/ipykernel/ipkernel.py", line 196, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/ipykernel/zmqshell.py", line 501, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2728, in run_cell
interactivity=interactivity, compiler=compiler, result=result)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2850, in run_ast_nodes
if self.run_code(code, result):
File "/usr/local/envs/py3env/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2910, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 4, in
cm_data = run_eval(model_file_prefix, '/content/lab/datalab/punctuation/datapreped/test.txt')
File "", line 25, in run_eval
sv = tf.train.Supervisor(logdir=logdir)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 136, in new_func
return func(*args, **kwargs)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/training/supervisor.py", line 316, in init
self._init_saver(saver=saver)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/training/supervisor.py", line 464, in _init_saver
saver = saver_mod.Saver()
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1239, in init
self.build()
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1248, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1284, in _build
build_save=build_save, build_restore=build_restore)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 765, in _build_internal
restore_sequentially, reshape)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 428, in _AddRestoreOps
tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 268, in restore_op
[spec.tensor.dtype])[0])
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1031, in restore_v2
shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3160, in create_op
op_def=op_def)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1625, in init
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

NotFoundError (see above for traceback): Unsuccessful TensorSliceReader constructor: Failed to find any matching files for /content/lab/datalab/punctuation/model/
[[Node: save/RestoreV2_6 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_6/tensor_names, save/RestoreV2_6/shape_and_slices)]]


NotFoundError Traceback (most recent call last)
/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
1349 try:
-> 1350 return fn(*args)
1351 except errors.OpError as e:

/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/client/session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
1328 feed_dict, fetch_list, target_list,
-> 1329 status, run_metadata)
1330

/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
472 compat.as_text(c_api.TF_Message(self.status.status)),
--> 473 c_api.TF_GetCode(self.status.status))
474 # Delete the underlying status object from memory otherwise it stays alive

NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for /content/lab/datalab/punctuation/model/
[[Node: save/RestoreV2_6 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_6/tensor_names, save/RestoreV2_6/shape_and_slices)]]

During handling of the above exception, another exception occurred:

NotFoundError Traceback (most recent call last)
in ()
2 from pprint import pprint
3
----> 4 cm_data = run_eval(model_file_prefix, '/content/lab/datalab/punctuation/datapreped/test.txt')
5 #'/content/datalab/docs/samples/TensorFlow'
6 pprint(cm_data.tolist())

in run_eval(model_file_prefix, test_data_path)
25 sv = tf.train.Supervisor(logdir=logdir)
26 with sv.managed_session() as session:
---> 27 sv.saver.restore(session, model_file_prefix)
28 test_perplexity, cm_data = run_epoch(session, mtest, 1, word_to_id, is_eval=True)
29 return cm_data

/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/training/saver.py in restore(self, sess, save_path)
1684 if context.in_graph_mode():
1685 sess.run(self.saver_def.restore_op_name,
-> 1686 {self.saver_def.filename_tensor_name: save_path})
1687 else:
1688 self._build_eager(save_path, build_save=False, build_restore=True)

/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
893 try:
894 result = self._run(None, fetches, feed_dict, options_ptr,
--> 895 run_metadata_ptr)
896 if run_metadata:
897 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
1126 if final_fetches or final_targets or (handle and feed_dict_tensor):
1127 results = self._do_run(handle, final_targets, final_fetches,
-> 1128 feed_dict_tensor, options, run_metadata)
1129 else:
1130 results = []

/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
1342 if handle is None:
1343 return self._do_call(_run_fn, self._session, feeds, fetches, targets,
-> 1344 options, run_metadata)
1345 else:
1346 return self._do_call(_prun_fn, self._session, handle, feeds, fetches)

/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
1361 except KeyError:
1362 pass
-> 1363 raise type(e)(node_def, op, message)
1364
1365 def _extend_graph(self):

NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for /content/lab/datalab/punctuation/model/
[[Node: save/RestoreV2_6 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_6/tensor_names, save/RestoreV2_6/shape_and_slices)]]

Caused by op 'save/RestoreV2_6', defined at:
File "/usr/local/envs/py3env/lib/python3.5/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/usr/local/envs/py3env/lib/python3.5/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/ipykernel/main.py", line 3, in
app.launch_new_instance()
File "/usr/local/envs/py3env/lib/python3.5/site-packages/traitlets/config/application.py", line 658, in launch_instance
app.start()
File "/usr/local/envs/py3env/lib/python3.5/site-packages/ipykernel/kernelapp.py", line 474, in start
ioloop.IOLoop.instance().start()
File "/usr/local/envs/py3env/lib/python3.5/site-packages/zmq/eventloop/ioloop.py", line 177, in start
super(ZMQIOLoop, self).start()
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tornado/ioloop.py", line 887, in start
handler_func(fd_obj, events)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tornado/stack_context.py", line 275, in null_wrapper
return fn(*args, **kwargs)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 440, in _handle_events
self._handle_recv()
File "/usr/local/envs/py3env/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv
self._run_callback(callback, msg)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback
callback(*args, **kwargs)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tornado/stack_context.py", line 275, in null_wrapper
return fn(*args, **kwargs)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 276, in dispatcher
return self.dispatch_shell(stream, msg)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 228, in dispatch_shell
handler(stream, idents, msg)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 390, in execute_request
user_expressions, allow_stdin)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/ipykernel/ipkernel.py", line 196, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/ipykernel/zmqshell.py", line 501, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2728, in run_cell
interactivity=interactivity, compiler=compiler, result=result)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2850, in run_ast_nodes
if self.run_code(code, result):
File "/usr/local/envs/py3env/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2910, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 4, in
cm_data = run_eval(model_file_prefix, '/content/lab/datalab/punctuation/datapreped/test.txt')
File "", line 25, in run_eval
sv = tf.train.Supervisor(logdir=logdir)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 136, in new_func
return func(*args, **kwargs)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/training/supervisor.py", line 316, in init
self._init_saver(saver=saver)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/training/supervisor.py", line 464, in _init_saver
saver = saver_mod.Saver()
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1239, in init
self.build()
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1248, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1284, in _build
build_save=build_save, build_restore=build_restore)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 765, in _build_internal
restore_sequentially, reshape)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 428, in _AddRestoreOps
tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 268, in restore_op
[spec.tensor.dtype])[0])
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1031, in restore_v2
shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3160, in create_op
op_def=op_def)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1625, in init
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

NotFoundError (see above for traceback): Unsuccessful TensorSliceReader constructor: Failed to find any matching files for /content/lab/datalab/punctuation/model/
[[Node: save/RestoreV2_6 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_6/tensor_names, save/RestoreV2_6/shape_and_slices)]]

The path and file both exist.

When I use model_file_prefix = train(ttxt, vtxt, saved_model_path), the code works without error. Any idea why this occurs and how to fix it?
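
A guess at the cause, based on the error text: the saver is handed the bare model directory (/content/lab/datalab/punctuation/model/) rather than an actual checkpoint prefix such as .../punctuation-5796. A minimal sketch of resolving the prefix first; the directory path comes from the error above, and the rest is an assumption about how the notebook's variables are wired:

import tensorflow as tf

model_dir = '/content/lab/datalab/punctuation/model/'

# Resolve the latest checkpoint prefix (e.g. '.../punctuation-5796')
# instead of passing the directory itself to saver.restore().
model_file_prefix = tf.train.latest_checkpoint(model_dir)
if model_file_prefix is None:
    raise ValueError('No checkpoint found under %s' % model_dir)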

Internal Temporary file not writable

In the notebook at notebooks/samples/ML Toolbox/Regression/Census/2 Service Preprocess.ipynb, the following section:

analysis_path = os.path.join(workspace_path, 'analysis')
regression.analyze(dataset=train_data, output_dir=analysis_path, cloud=True)

gives the following error:

Running numerical analysis...Analyze: failed with error: The internal temporary file is not writable.

Machine Learning with Financial Data - unstable results

Hi,
I was trying out the "Machine Learning with Financial Data" notebook.
In the tutorial video, the prediction accuracy of the first trivial model is 65% (as described in the notebook's text). However, the current version of the notebook on GitHub shows an accuracy of 90%, and when I ran the notebook myself, both on my PC and on Google Cloud, I got 84%. I'm wondering what is going on here. Can anyone explain that to me?
Thanks,
Yang

Needed: DataLab integration with Google BigTable, Google DataProc (Spark)

We use Jupyter notebooks to access BigTable data like so:

from google.cloud import bigtable
from google.cloud import happybase

client = bigtable.Client(project=project_id, admin=True)
instance = client.instance(instance_id)
connection = happybase.Connection(instance=instance)
table = connection.table(table_name)

for key, row in table.scan():
    ...  # process each (row key, row dict) pair

(we then convert this into pandas DataFrames)
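
For context, a minimal sketch of that conversion step, assuming UTF-8 encoded column qualifiers and cell values (the exact decoding depends on the table schema):

import pandas as pd

records = []
for key, row in table.scan():
    # row is a dict of {column qualifier: cell value}, both as bytes.
    record = {col.decode('utf-8'): val.decode('utf-8') for col, val in row.items()}
    record['row_key'] = key.decode('utf-8')
    records.append(record)

df = pd.DataFrame(records)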

Regarding DataLab and DataProc integration: Jupyter-Spark integration (http://blog.insightdatalabs.com/jupyter-on-apache-spark-step-by-step/) is well established in data science, so how can we leverage DataLab notebooks for Spark jobs running on DataProc (e.g., stepwise PySpark job definitions, visualising job results)?

Also, how do we leverage IPython Parallel (https://ipyparallel.readthedocs.io/en/latest/) and the Jupyter Cluster notebook extensions in DataLab?

Beta2: Update tutorial notebook "Importing and Exporting Data"

table.to_file('/tmp/cars.csv')

TypeError Traceback (most recent call last)
in ()
----> 1 table.to_file('/tmp/cars.csv')

/usr/local/lib/python2.7/dist-packages/datalab/bigquery/_table.pyc in to_file(self, destination, format, csv_delimiter, csv_header)
648 for column in self.schema:
649 fieldnames.append(column.name)
--> 650 writer = csv.DictWriter(f, fieldnames=fieldnames, delimiter=csv_delimiter)
651 if csv_header:
652 writer.writeheader()

/usr/lib/python2.7/csv.pyc in __init__(self, f, fieldnames, restval, extrasaction, dialect, *args, **kwds)
135 extrasaction)
136 self.extrasaction = extrasaction
--> 137 self.writer = writer(f, dialect, *args, **kwds)
138
139 def writeheader(self):

TypeError: "delimiter" must be string, not unicode

How-to on handling large data

Creating a bug to track an internal bug...

It's very difficult to understand anything about best practices for magics or modules without linking out to readthedocs.

This needs to be covered in tutorials and sample notebooks, through markdown discussion and code, and in help text for queries.
Specific worthwhile additions:

  1. How to handle large data (a common complaint) for in-memory work when retrieved from BigQuery; this is needed for GA (see the sketch after this list).
  2. How to handle large data in memory; DataFrames won't scale. This is post-GA, and I have a separate tracking bug to use GraphLab's OSS alternative.

We should figure out how to better surface reference docs, as well as improve docs with a how-to set of notebooks to cover this sort of information.
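
As a starting point for item 1, a minimal sketch of keeping the in-memory result bounded by limiting the query before pulling it into a DataFrame. The bq.Query(...).results() pattern follows the BigQuery tutorial issues below; treat to_dataframe() and the row limit as assumptions about the datalab API and the data size:

import datalab.bigquery as bq

# Keep only a bounded number of rows in memory; push heavier filtering
# and aggregation into BigQuery itself rather than into pandas.
query = bq.Query(
    'SELECT * FROM [cloud-datalab-samples:httplogs.logs_20140615] LIMIT 10000')
df = query.results().to_dataframe()  # to_dataframe() is assumed, not verified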

Failed to send HTTP request on BigQuery APIs Notebook

I'm trying to run the BigQuery API tutorial that is included in Datalab, but I'm hitting an error on the very first step:

# Create and run a SQL query
bq.Query('SELECT * FROM [cloud-datalab-samples:httplogs.logs_20140615] LIMIT 3').results()

I feel like I might be missing something basic. Here is the Traceback:

Exception Traceback (most recent call last)
<ipython-input-2-03b0534f5548> in <module>()
      1 # Create and run a SQL query
----> 2 bq.Query('SELECT * FROM [cloud-datalab-samples:httplogs.logs_20140615] LIMIT 3').results()

/usr/local/lib/python2.7/dist-packages/datalab/bigquery/_query.pyc in results(self, use_cache, dialect, billing_tier)
    226     """
    227     if not use_cache or (self._results is None):
--> 228       self.execute(use_cache=use_cache, dialect=dialect, billing_tier=billing_tier)
    229     return self._results.results
    230 

/usr/local/lib/python2.7/dist-packages/datalab/bigquery/_query.pyc in execute(self, table_name, table_mode, use_cache, priority, allow_large_results, dialect, billing_tier)
    524     job = self.execute_async(table_name=table_name, table_mode=table_mode, use_cache=use_cache,
    525                              priority=priority, allow_large_results=allow_large_results,
--> 526                              dialect=dialect, billing_tier=billing_tier)
    527     self._results = job.wait()
    528     return self._results

/usr/local/lib/python2.7/dist-packages/datalab/bigquery/_query.pyc in execute_async(self, table_name, table_mode, use_cache, priority, allow_large_results, dialect, billing_tier)
    479                                                  billing_tier=billing_tier)
    480     except Exception as e:
--> 481       raise e
    482     if 'jobReference' not in query_result:
    483       raise Exception('Unexpected response from server')

Exception: Failed to send HTTP request.

AttributeError: 'dict_values' has no attribute 'index'

Upon running:

for s in sources:
    source, predicted = predictor.predict(s)
    print('\n---SOURCE----\n' + source)
    print('---PREDICTED----\n' + predicted)

I get the following error:
INFO:tensorflow:Restoring parameters from /content/lab/datalab/punctuation/model/punctuation-5796

AttributeError Traceback (most recent call last)
in ()
9
10 for s in sources:
---> 11 source, predicted = predictor.predict(s)
12 print('\n---SOURCE----\n' + source)
13 print('---PREDICTED----\n' + predicted)

in predict(self, content)
88 for i in indices:
89 words1[i], words1[i-1] = words1[i-1], words1[i]
---> 90 words2 = [self._word_to_id.keys()[self._word_to_id.values().index(data_x[index])] for index in range(len(puncts) - 1, len(data_x))]
91 all_words = words1 + [puncts[-1]] + words2
92 content = ' '.join(all_words)

in (.0)
88 for i in indices:
89 words1[i], words1[i-1] = words1[i-1], words1[i]
---> 90 words2 = [self._word_to_id.keys()[self._word_to_id.values().index(data_x[index])] for index in range(len(puncts) - 1, len(data_x))]
91 all_words = words1 + [puncts[-1]] + words2
92 content = ' '.join(all_words)

AttributeError: 'dict_values' object has no attribute 'index'

This is how index is defined:
words1 = [self._word_to_id.keys()[self._word_to_id.values().index(data_x[index])] for index in range(len(puncts) - 1)]

indices = [i for i, w in enumerate(words1) if w in PUNCTUATIONS]

for i in indices:
    words1[i], words1[i-1] = words1[i-1], words1[i]  # only line in the for loop

words2 = [self._word_to_id.keys()[self._word_to_id.values().index(data_x[index])] for index in range(len(puncts) - 1, len(data_x))]
all_words = words1 + [puncts[-1]] + words2
content = ' '.join(all_words)
min_step = len(puncts)

Can anyone explain why this occurs and/or how to fix this?
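
A likely cause, for anyone hitting this: under Python 3, dict.keys() and dict.values() return view objects, which cannot be indexed and have no .index() method, whereas the notebook code assumes Python 2 lists. A minimal sketch of a fix that builds an inverse lookup once instead of repeatedly indexing the values (variable names follow the snippet above):

# Build an id -> word mapping once; this avoids indexing dict views,
# which fails on Python 3 and is O(n) per lookup anyway.
id_to_word = {word_id: word for word, word_id in self._word_to_id.items()}

words1 = [id_to_word[data_x[index]] for index in range(len(puncts) - 1)]
words2 = [id_to_word[data_x[index]] for index in range(len(puncts) - 1, len(data_x))]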

BigQuery API Notebook throws error

I'm running through /docs/tutorials/BigQuery/BigQuery%20APIs.ipynb

In the second cell I get:

# Create and run a SQL query
bq.Query('SELECT * FROM [cloud-datalab-samples:httplogs.logs_20140615] LIMIT 3').results()

RequestException Traceback (most recent call last)
<ipython-input-2-03b0534f5548> in <module>()
      1 # Create and run a SQL query
----> 2 bq.Query('SELECT * FROM [cloud-datalab-samples:httplogs.logs_20140615] LIMIT 3').results()

/usr/local/lib/python2.7/dist-packages/datalab/bigquery/_query.pyc in results(self, use_cache, dialect, billing_tier)
    226     """
    227     if not use_cache or (self._results is None):
--> 228       self.execute(use_cache=use_cache, dialect=dialect, billing_tier=billing_tier)
    229     return self._results.results
    230 

/usr/local/lib/python2.7/dist-packages/datalab/bigquery/_query.pyc in execute(self, table_name, table_mode, use_cache, priority, allow_large_results, dialect, billing_tier)
    524     job = self.execute_async(table_name=table_name, table_mode=table_mode, use_cache=use_cache,
    525                              priority=priority, allow_large_results=allow_large_results,
--> 526                              dialect=dialect, billing_tier=billing_tier)
    527     self._results = job.wait()
    528     return self._results

/usr/local/lib/python2.7/dist-packages/datalab/bigquery/_query.pyc in execute_async(self, table_name, table_mode, use_cache, priority, allow_large_results, dialect, billing_tier)
    479                                                  billing_tier=billing_tier)
    480     except Exception as e:
--> 481       raise e
    482     if 'jobReference' not in query_result:
    483       raise Exception('Unexpected response from server')

RequestException: HTTP request failed: Invalid project ID '<PROJECT_ID>'. Project IDs must contain 6-63 lowercase letters, digits, or dashes. IDs must start with a letter and may not end with a dash.

The query itself runs fine in BigQuery.
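
The error suggests the notebook's '<PROJECT_ID>' placeholder was never replaced with a real project. A minimal sketch of setting the project before running the query; treat Context.default().set_project_id as an assumption about the datalab API, and 'my-gcp-project' as a placeholder:

from datalab.context import Context

# Replace with your own project ID before running the BigQuery cells.
Context.default().set_project_id('my-gcp-project')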

Consider adding a standalone sample notebook for Facets

Usage of Facets can be found in samples such as:
https://github.com/googledatalab/notebooks/blob/master/samples/contrib/mlworkbench/structured_data_classification_police/Predict%20Case%20Resolution%20(small%20data%20experience).ipynb

Given the importance of data visualization, please consider creating a standalone notebook to address best practices and how to use the built-in Facets feature in Datalab.

In addition, please add concrete examples using a sample dataset that explain how to interpret the Facets output and how to translate the insights into action items, such as data cleaning or feature engineering.
One possible structure could be:

  1. Image of the Facets result
  2. Human interpretation
  3. The code
