googledatalab / notebooks
Google Cloud Datalab samples and documentation
License: Apache License 2.0
Getting this error:
Google Maps API error: MissingKeyMapError https://developers.google.com/maps/documentation/javascript/error-messages#missing-key-map-error
_.kb @ js?v=3&callback=google.loader.callbacks.maps&sensor=false:37
(anonymous) @ common.js:53
(anonymous) @ common.js:194
c @ common.js:49
(anonymous) @ AuthenticationService.Authenticate?1shttp%3A%2F%2Flocalhost%3A8081%2Fnotebooks%2Fdev%2Fnotebooks%2F…:1
We already have Toolbox and lower-level TF examples (docs/samples/TensorFlow). This item tracks an Estimator-level sample for completeness.
The dataframes do not seem to be in date-sorted order. This is not a problem for plotting the time series, but the autocorrelation and related computations assume they are sorted.
A workaround is to add a sort_index() call to the first "Munge the data" code cell, e.g., so that it reads:
closing_data = pd.DataFrame()
. . .
closing_data['aord_close'] = aord['Close']
# Put the closing_data in sorted order *** Needed for autocorrelation to work ***
closing_data = closing_data.sort_index()
# Pandas includes a very convenient function for filling gaps in the data.
closing_data = closing_data.fillna(method='ffill')
prediction_dir is defined but not used in the notebook.
The intro notebook here needs to be updated to work with the new locally-run setup.
Copy issue from googledatalab/datalab#936
Need to add the package or change the sample (the latter may be easier for now).
%%bash
apt-get install -y -q libxslt-dev libxml2-dev
pip install -q scrapy

debconf: delaying package configuration, since apt-utils is not installed
Command "/usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-OqPUd_/cryptography/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-NBepB4-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-OqPUd_/cryptography/
Show a simple example (e.g., apache_beam.examples.wordcount) with both local and cloud runs, as sketched below.
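For illustration, a minimal sketch of what such a sample might run; the bucket, project, and region names here are placeholders, not values from the repo:

import subprocess

# Local run with the DirectRunner; writes results under /tmp.
subprocess.run([
    'python', '-m', 'apache_beam.examples.wordcount',
    '--input', 'gs://dataflow-samples/shakespeare/kinglear.txt',
    '--output', '/tmp/wordcount/results',
    '--runner', 'DirectRunner',
], check=True)

# Cloud run on Dataflow; requires a real project and a writable GCS bucket.
subprocess.run([
    'python', '-m', 'apache_beam.examples.wordcount',
    '--input', 'gs://dataflow-samples/shakespeare/kinglear.txt',
    '--output', 'gs://my-bucket/wordcount/results',
    '--runner', 'DataflowRunner',
    '--project', 'my-gcp-project',
    '--region', 'us-central1',
    '--temp_location', 'gs://my-bucket/tmp',
], check=True)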
Hi,
Does the Storage API referenced in https://github.com/googledatalab/notebooks/blob/master/tutorials/Storage/Storage%20APIs.ipynb follow the API documented at http://googledatalab.github.io/pydatalab/datalab.storage.html? The source code for this documentation is in https://github.com/googledatalab/pydatalab/tree/v1.1/datalab/storage
If so, I couldn't find some classes such as Item. The error thrown is AttributeError: 'module' object has no attribute 'Item'.
Code snippet for your reference:
import google.datalab.storage as storage
shared_bucket = storage.Item('BUCKET_NAME', "KEY_NAME")
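A hedged note for anyone hitting this: the snippet imports google.datalab.storage, while the linked docs describe the older datalab.storage module. To my understanding, the newer google.datalab.storage API renamed Item to Object and reaches objects through a Bucket; a sketch of what may work instead (bucket and key names are placeholders):

import google.datalab.storage as storage

# Assumption: in google.datalab.storage there is no Item class;
# objects are obtained from a Bucket instead.
shared_bucket = storage.Bucket('BUCKET_NAME')
shared_object = shared_bucket.object('KEY_NAME')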
TODO
I found this via the article: https://cloudplatform.googleblog.com/2016/03/TensorFlow-machine-learning-with-financial-data-on-Google-Cloud-Platform.html
But it was very hard to find. I initially thought it would be in https://github.com/googledatalab/datalab.
Please update your Readme files or the article to make this clearer. Thanks!
I am exploring the following TensorFlow example: https://github.com/googledatalab/notebooks/blob/master/samples/TensorFlow/LSTM%20Punctuation%20Model%20With%20TensorFlow.ipynb. It is apparently written in TF v1, so I upgraded it with the v2 upgrade script, which reported three main inconsistencies:
ERROR: Using member tf.contrib.rnn.DropoutWrapper in deprecated module tf.contrib. tf.contrib.rnn.DropoutWrapper cannot be converted automatically. tf.contrib will not be distributed with TensorFlow 2.0, please consider an alternative in non-contrib TensorFlow, a community-maintained repository such as tensorflow/addons, or fork the required code.
ERROR: Using member tf.contrib.legacy_seq2seq.sequence_loss_by_example in deprecated module tf.contrib. tf.contrib.legacy_seq2seq.sequence_loss_by_example cannot be converted automatically. tf.contrib will not be distributed with TensorFlow 2.0, please consider an alternative in non-contrib TensorFlow, a community-maintained repository such as tensorflow/addons, or fork the required code.
ERROR: Using member tf.contrib.framework.get_or_create_global_step in deprecated module tf.contrib. tf.contrib.framework.get_or_create_global_step cannot be converted automatically. tf.contrib will not be distributed with TensorFlow 2.0, please consider an alternative in non-contrib TensorFlow, a community-maintained repository such as tensorflow/addons, or fork the required code.
So for compatibility I manually replaced framework.get_or_create_global_step with tf.compat.v1.train.get_or_create_global_step, and also rnn.DropoutWrapper with tf.compat.v1.nn.rnn_cell.DropoutWrapper.
But I was unable to find a solution for the tf.contrib.legacy_seq2seq.sequence_loss_by_example method, since I cannot find a backwards-compatible alternative. I tried installing TensorFlow Addons and using its seq2seq loss function, but I wasn't able to figure out how to adapt it to work with the rest of the code. I stumbled across errors like "Consider casting elements to a supported type." and "Logits must be a [batch_size x sequence_length x logits] tensor", probably because I am not implementing something correctly.
My question: how do I implement a supported TensorFlow v2 alternative to this loss function, so that it acts similarly to the code below?
output = tf.reshape(tf.concat(axis=1, values=outputs), [-1, size])
softmax_w = tf.compat.v1.get_variable("softmax_w", [size, len(TARGETS)], dtype=tf.float32)
softmax_b = tf.compat.v1.get_variable("softmax_b", [len(TARGETS)], dtype=tf.float32)
logits = tf.matmul(output, softmax_w) + softmax_b
self._predictions = tf.argmax(input=logits, axis=1)
self._targets = tf.reshape(input_.targets, [-1])
loss = tfa.seq2seq.sequence_loss(
[logits],
[tf.reshape(input_.targets, [-1])],
[tf.ones([batch_size * num_steps], dtype=tf.float32)])
self._cost = cost = tf.reduce_sum(input_tensor=loss) / batch_size
self._final_state = state
Full code here.
My proposal: when this is resolved, please update the notebook with a newer-version example.
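For what it's worth, a hedged sketch of how tfa.seq2seq.sequence_loss could stand in for the legacy helper: it expects 3-D logits of shape [batch, time, vocab] and 2-D targets/weights of shape [batch, time], rather than the lists of flattened tensors the legacy API took. Variable names follow the snippet above:

import tensorflow as tf
import tensorflow_addons as tfa

# Reshape the flattened tensors back to [batch, time, ...] as tfa expects.
logits_3d = tf.reshape(logits, [batch_size, num_steps, len(TARGETS)])
targets_2d = tf.reshape(input_.targets, [batch_size, num_steps])
weights_2d = tf.ones([batch_size, num_steps], dtype=tf.float32)

# Keep per-element losses so the final cost matches the legacy
# reduce_sum(loss) / batch_size computation.
loss = tfa.seq2seq.sequence_loss(
    logits_3d, targets_2d, weights_2d,
    average_across_timesteps=False,
    average_across_batch=False)
cost = tf.reduce_sum(loss) / batch_size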
We've recently added the .test.sh script, which runs each of these notebooks to validate that none of the cells raise errors when executed.
Conceptually, this is fine, but most of these notebooks were written with the idea that they would be run once rather than repeatedly.
As such, we should go through each notebook and see if there are things we should change to make them more reliable when run multiple times.
As a concrete example, we have at least one issue in the 'tutorials/storage/Storage APIs.ipynb' notebook: it creates a sample bucket, adds an item to it, deletes just that one item, and then deletes the bucket. If that process fails after creating the sample item but before deleting it, then every subsequent run will find that the sample bucket is not empty, and attempts to delete the sample bucket will fail.
That particular notebook needs to be updated to make sure it better handles the scenario where the sample bucket already exists, and there are probably similar issues in the other notebooks. A sketch of what idempotent setup/teardown might look like follows.
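This is only a sketch under assumptions: it presumes the google.datalab.storage Bucket exposes exists(), create(), delete(), and objects(), and sample_bucket_name is a placeholder:

import google.datalab.storage as storage

bucket = storage.Bucket(sample_bucket_name)
if not bucket.exists():
    bucket.create()  # tolerate a bucket left over from an earlier failed run

# ... notebook body: write and read sample objects ...

# Teardown: remove any leftover objects first, so deleting the bucket
# cannot fail just because a previous run died mid-way.
for obj in bucket.objects():
    obj.delete()
bucket.delete()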
Hi all,
When I use model_file_prefix = model_dir to skip the training part (as I've already done it), I get an error upon running the following code:
from google.datalab.ml import ConfusionMatrix
from pprint import pprint
cm_data = run_eval(model_file_prefix, '/content/datalab/punctuation/datapreped/test.txt')
pprint(cm_data.tolist())
cm = ConfusionMatrix(cm_data, TARGETS)
cm.plot()
Error:
INFO:tensorflow:Restoring parameters from /content/lab/datalab/punctuation/model/eval/model.ckpt
INFO:tensorflow:Starting standard services.
INFO:tensorflow:Saving checkpoint to path /content/lab/datalab/punctuation/model/eval/model.ckpt
INFO:tensorflow:Starting queue runners.
INFO:tensorflow:Recording summary at step None.
INFO:tensorflow:Restoring parameters from /content/lab/datalab/punctuation/model/
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.NotFoundError'>, Unsuccessful TensorSliceReader constructor: Failed to find any matching files for /content/lab/datalab/punctuation/model/
[[Node: save/RestoreV2_6 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_6/tensor_names, save/RestoreV2_6/shape_and_slices)]]
NotFoundError Traceback (most recent call last)
/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
1349 try:
-> 1350 return fn(*args)
1351 except errors.OpError as e:
/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/client/session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
1328 feed_dict, fetch_list, target_list,
-> 1329 status, run_metadata)
1330
/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
472 compat.as_text(c_api.TF_Message(self.status.status)),
--> 473 c_api.TF_GetCode(self.status.status))
474 # Delete the underlying status object from memory otherwise it stays alive
NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for /content/lab/datalab/punctuation/model/
[[Node: save/RestoreV2_6 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_6/tensor_names, save/RestoreV2_6/shape_and_slices)]]
During handling of the above exception, another exception occurred:
NotFoundError Traceback (most recent call last)
in ()
2 from pprint import pprint
3
----> 4 cm_data = run_eval(model_file_prefix, '/content/lab/datalab/punctuation/datapreped/test.txt')
5 #'/content/datalab/docs/samples/TensorFlow'
6 pprint(cm_data.tolist())
in run_eval(model_file_prefix, test_data_path)
25 sv = tf.train.Supervisor(logdir=logdir)
26 with sv.managed_session() as session:
---> 27 sv.saver.restore(session, model_file_prefix)
28 test_perplexity, cm_data = run_epoch(session, mtest, 1, word_to_id, is_eval=True)
29 return cm_data
/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/training/saver.py in restore(self, sess, save_path)
1684 if context.in_graph_mode():
1685 sess.run(self.saver_def.restore_op_name,
-> 1686 {self.saver_def.filename_tensor_name: save_path})
1687 else:
1688 self._build_eager(save_path, build_save=False, build_restore=True)
/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
893 try:
894 result = self._run(None, fetches, feed_dict, options_ptr,
--> 895 run_metadata_ptr)
896 if run_metadata:
897 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
1126 if final_fetches or final_targets or (handle and feed_dict_tensor):
1127 results = self._do_run(handle, final_targets, final_fetches,
-> 1128 feed_dict_tensor, options, run_metadata)
1129 else:
1130 results = []
/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
1342 if handle is None:
1343 return self._do_call(_run_fn, self._session, feeds, fetches, targets,
-> 1344 options, run_metadata)
1345 else:
1346 return self._do_call(_prun_fn, self._session, handle, feeds, fetches)
/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
1361 except KeyError:
1362 pass
-> 1363 raise type(e)(node_def, op, message)
1364
1365 def _extend_graph(self):
NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for /content/lab/datalab/punctuation/model/
[[Node: save/RestoreV2_6 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_6/tensor_names, save/RestoreV2_6/shape_and_slices)]]
Caused by op 'save/RestoreV2_6', defined at:
File "/usr/local/envs/py3env/lib/python3.5/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/usr/local/envs/py3env/lib/python3.5/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/ipykernel/main.py", line 3, in
app.launch_new_instance()
File "/usr/local/envs/py3env/lib/python3.5/site-packages/traitlets/config/application.py", line 658, in launch_instance
app.start()
File "/usr/local/envs/py3env/lib/python3.5/site-packages/ipykernel/kernelapp.py", line 474, in start
ioloop.IOLoop.instance().start()
File "/usr/local/envs/py3env/lib/python3.5/site-packages/zmq/eventloop/ioloop.py", line 177, in start
super(ZMQIOLoop, self).start()
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tornado/ioloop.py", line 887, in start
handler_func(fd_obj, events)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tornado/stack_context.py", line 275, in null_wrapper
return fn(*args, **kwargs)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 440, in _handle_events
self._handle_recv()
File "/usr/local/envs/py3env/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv
self._run_callback(callback, msg)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback
callback(*args, **kwargs)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tornado/stack_context.py", line 275, in null_wrapper
return fn(*args, **kwargs)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 276, in dispatcher
return self.dispatch_shell(stream, msg)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 228, in dispatch_shell
handler(stream, idents, msg)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 390, in execute_request
user_expressions, allow_stdin)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/ipykernel/ipkernel.py", line 196, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/ipykernel/zmqshell.py", line 501, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2728, in run_cell
interactivity=interactivity, compiler=compiler, result=result)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2850, in run_ast_nodes
if self.run_code(code, result):
File "/usr/local/envs/py3env/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2910, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 4, in
cm_data = run_eval(model_file_prefix, '/content/lab/datalab/punctuation/datapreped/test.txt')
File "", line 25, in run_eval
sv = tf.train.Supervisor(logdir=logdir)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 136, in new_func
return func(*args, **kwargs)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/training/supervisor.py", line 316, in init
self._init_saver(saver=saver)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/training/supervisor.py", line 464, in _init_saver
saver = saver_mod.Saver()
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1239, in init
self.build()
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1248, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1284, in _build
build_save=build_save, build_restore=build_restore)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 765, in _build_internal
restore_sequentially, reshape)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 428, in _AddRestoreOps
tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 268, in restore_op
[spec.tensor.dtype])[0])
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1031, in restore_v2
shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3160, in create_op
op_def=op_def)
File "/usr/local/envs/py3env/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1625, in init
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
NotFoundError (see above for traceback): Unsuccessful TensorSliceReader constructor: Failed to find any matching files for /content/lab/datalab/punctuation/model/
[[Node: save/RestoreV2_6 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_6/tensor_names, save/RestoreV2_6/shape_and_slices)]]
The path and file both exist.
When I use model_file_prefix = train(ttxt, vtxt, saved_model_path), the code works without error. Any idea why this occurs and how to fix it?
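A hedged guess at the cause: Saver.restore expects a checkpoint prefix such as .../model.ckpt-NNN, not a bare directory, and the trailing slash in "Failed to find any matching files for /content/lab/datalab/punctuation/model/" suggests a directory was passed. A minimal sketch of the usual fix, assuming the checkpoints live under model_dir:

import tensorflow as tf

# Resolve the newest checkpoint prefix (e.g. '.../model.ckpt-1000') from the directory.
model_file_prefix = tf.train.latest_checkpoint(model_dir)
assert model_file_prefix is not None, 'no checkpoint found in %s' % model_dir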
In the notebook at notebooks/samples/ML Toolbox/Regression/Census/2 Service Preprocess.ipynb, the following section:
analysis_path = os.path.join(workspace_path, 'analysis')
regression.analyze(dataset=train_data, output_dir=analysis_path, cloud=True)
gives the following error:
Running numerical analysis...Analyze: failed with error: The internal temporary file is not writable.
In the example "Flower Classification (large dataset experience)," the link to the blog post is incorrect. Presumably it should be the same as in the small-dataset-experience example.
Hi,
I was trying out the "Machine Learning with Financial Data" notebook.
In the tutorial video, the prediction accuracy of the first trivial model is 65% (as described in the notebook's text); however, the current version of the notebook on GitHub shows an accuracy of 90%, and when I ran the notebook myself, both on my PC and on Google Cloud, I got 84%. I'm wondering what is going on here. Can anyone explain that to me?
Thanks,
Yang
We use Jupyter notebooks to access BigTable data like so:
from google.cloud import bigtable
from google.cloud import happybase
client = bigtable.Client(project=project_id, admin=True)
instance = client.instance(instance_id)
connection = happybase.Connection(instance=instance)
table = connection.table(table_name)
for key, row in table.scan():
    ...
(we then convert this into pandas DataFrames)
Regarding Datalab and Dataproc integration: Jupyter-Spark integration (http://blog.insightdatalabs.com/jupyter-on-apache-spark-step-by-step/) is a thing in data science, so how can we leverage Datalab notebooks over Spark jobs running on Dataproc (e.g., stepwise PySpark job definitions, visualising job results)?
Also, how do we leverage IPython Parallel (https://ipyparallel.readthedocs.io/en/latest/) and the Jupyter Cluster notebook extensions in Datalab?
In Census/4 Service Evaluate.ipynb, the schema used when defining a BigQuery data source should use the preferred column names:
predicted_target -> predicted
target_from_input -> target
table.to_file('/tmp/cars.csv')
TypeError Traceback (most recent call last)
in ()
----> 1 table.to_file('/tmp/cars.csv')
/usr/local/lib/python2.7/dist-packages/datalab/bigquery/_table.pyc in to_file(self, destination, format, csv_delimiter, csv_header)
648 for column in self.schema:
649 fieldnames.append(column.name)
--> 650 writer = csv.DictWriter(f, fieldnames=fieldnames, delimiter=csv_delimiter)
651 if csv_header:
652 writer.writeheader()
/usr/lib/python2.7/csv.pyc in __init__(self, f, fieldnames, restval, extrasaction, dialect, *args, **kwds)
135 extrasaction)
136 self.extrasaction = extrasaction
--> 137 self.writer = writer(f, dialect, *args, **kwds)
138
139 def writeheader(self):
TypeError: "delimiter" must be string, not unicode
Creating a bug to track an internal bug...
It's very difficult to understand anything about best practices for magics or modules without linking out to readthedocs.
This needs to be covered in tutorials and sample notebooks, through markdown discussion and code, and in the help text for queries.
Specific worthwhile additions:
- How to handle large data (a common complaint) for in-memory work when retrieved from BQ; this is needed for GA
- How to handle large data in memory, since DataFrames won't scale; this is post-GA and I have a separate tracking bug to use GraphLab's OSS alternative
We should figure out how to better surface reference docs, as well as improve docs with a how-to set of notebooks to cover this sort of information.
I'm trying to run the BigQuery API tutorial that is included in datalabs, but I'm hitting an error on the very first step:
# Create and run a SQL query
bq.Query('SELECT * FROM [cloud-datalab-samples:httplogs.logs_20140615] LIMIT 3').results()
I feel like I might be missing something basic. Here is the Traceback:
Exception Traceback (most recent call last)
<ipython-input-2-03b0534f5548> in <module>()
1 # Create and run a SQL query
----> 2 bq.Query('SELECT * FROM [cloud-datalab-samples:httplogs.logs_20140615] LIMIT 3').results()
/usr/local/lib/python2.7/dist-packages/datalab/bigquery/_query.pyc in results(self, use_cache, dialect, billing_tier)
226 """
227 if not use_cache or (self._results is None):
--> 228 self.execute(use_cache=use_cache, dialect=dialect, billing_tier=billing_tier)
229 return self._results.results
230
/usr/local/lib/python2.7/dist-packages/datalab/bigquery/_query.pyc in execute(self, table_name, table_mode, use_cache, priority, allow_large_results, dialect, billing_tier)
524 job = self.execute_async(table_name=table_name, table_mode=table_mode, use_cache=use_cache,
525 priority=priority, allow_large_results=allow_large_results,
--> 526 dialect=dialect, billing_tier=billing_tier)
527 self._results = job.wait()
528 return self._results
/usr/local/lib/python2.7/dist-packages/datalab/bigquery/_query.pyc in execute_async(self, table_name, table_mode, use_cache, priority, allow_large_results, dialect, billing_tier)
479 billing_tier=billing_tier)
480 except Exception as e:
--> 481 raise e
482 if 'jobReference' not in query_result:
483 raise Exception('Unexpected response from server')
Exception: Failed to send HTTP request.
Copying sample notebooks from GCS is no longer necessary or appropriate, so Readme.ipynb needs to be updated to show a git command instead.
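Presumably something along these lines; the clone destination is a guess at the usual Datalab content path, not a confirmed convention:

# Clone the samples into the Datalab content directory (destination path assumed).
!git clone https://github.com/googledatalab/notebooks.git /content/datalab/notebooks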
delete wrong git
Illustrate how to use LIME.
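For instance, a minimal sketch using the lime package on tabular data; train_x, feature_names, class_names, and model are illustrative placeholders, not names from the notebooks:

import numpy as np
from lime.lime_tabular import LimeTabularExplainer

# train_x: numpy array of training features; model: any classifier with predict_proba.
explainer = LimeTabularExplainer(train_x, feature_names=feature_names,
                                 class_names=class_names, mode='classification')
explanation = explainer.explain_instance(train_x[0], model.predict_proba, num_features=5)
print(explanation.as_list())  # top feature contributions for this one prediction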
Upon running:
for s in sources:
    source, predicted = predictor.predict(s)
    print('\n---SOURCE----\n' + source)
    print('---PREDICTED----\n' + predicted)
AttributeError Traceback (most recent call last)
in ()
9
10 for s in sources:
---> 11 source, predicted = predictor.predict(s)
12 print('\n---SOURCE----\n' + source)
13 print('---PREDICTED----\n' + predicted)
in predict(self, content)
88 for i in indices:
89 words1[i], words1[i-1] = words1[i-1], words1[i]
---> 90 words2 = [self._word_to_id.keys()[self._word_to_id.values().index(data_x[index])] for index in range(len(puncts) - 1, len(data_x))]
91 all_words = words1 + [puncts[-1]] + words2
92 content = ' '.join(all_words)
in <listcomp>(.0)
88 for i in indices:
89 words1[i], words1[i-1] = words1[i-1], words1[i]
---> 90 words2 = [self._word_to_id.keys()[self._word_to_id.values().index(data_x[index])] for index in range(len(puncts) - 1, len(data_x))]
91 all_words = words1 + [puncts[-1]] + words2
92 content = ' '.join(all_words)
AttributeError: 'dict_values' object has no attribute 'index'
This is how index is defined:
words1 = [self._word_to_id.keys()[self._word_to_id.values().index(data_x[index])] for index in range(len(puncts) - 1)]
indices = [i for i, w in enumerate(words1) if w in PUNCTUATIONS]
for i in indices:
words1[i], words1[i-1] = words1[i-1], words1[i]
#only line in for loop
words2 = [self._word_to_id.keys()[self._word_to_id.values().index(data_x[index])] for index in range(len(puncts) - 1, len(data_x))]
all_words = words1 + [puncts[-1]] + words2
content = ' '.join(all_words)
min_step = len(puncts)
Can anyone explain why this occurs and/or how to fix this?
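A hedged note on the likely cause: this code assumes Python 2, where dict.keys() and dict.values() return indexable lists; under Python 3 they return view objects with no index() method, hence the AttributeError. One way to adapt the failing line, as a sketch:

# Build a reverse map once instead of indexing dict views (works on Python 2 and 3).
id_to_word = {v: k for k, v in self._word_to_id.items()}
words2 = [id_to_word[data_x[index]] for index in range(len(puncts) - 1, len(data_x))]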
Visiting https://github.com/googledatalab/notebooks/blob/master/tutorials/Data/Interactive%20Charts%20with%20Google%20Charting%20APIs.ipynb, the last two charts do not have their output embedded (Out[10] and Out[12]).
The githubarchive:github.timeline table seems to have disappeared from BigQuery. There are some other GitHub tables in bigquery-public-data:github_repos that need to be checked, or the sample needs to be dropped.
I'm running through /docs/tutorials/BigQuery/BigQuery%20APIs.ipynb
In the second cell I get:
# Create and run a SQL query
bq.Query('SELECT * FROM [cloud-datalab-samples:httplogs.logs_20140615] LIMIT 3').results()
RequestException Traceback (most recent call last)
<ipython-input-2-03b0534f5548> in <module>()
1 # Create and run a SQL query
----> 2 bq.Query('SELECT * FROM [cloud-datalab-samples:httplogs.logs_20140615] LIMIT 3').results()
/usr/local/lib/python2.7/dist-packages/datalab/bigquery/_query.pyc in results(self, use_cache, dialect, billing_tier)
226 """
227 if not use_cache or (self._results is None):
--> 228 self.execute(use_cache=use_cache, dialect=dialect, billing_tier=billing_tier)
229 return self._results.results
230
/usr/local/lib/python2.7/dist-packages/datalab/bigquery/_query.pyc in execute(self, table_name, table_mode, use_cache, priority, allow_large_results, dialect, billing_tier)
524 job = self.execute_async(table_name=table_name, table_mode=table_mode, use_cache=use_cache,
525 priority=priority, allow_large_results=allow_large_results,
--> 526 dialect=dialect, billing_tier=billing_tier)
527 self._results = job.wait()
528 return self._results
/usr/local/lib/python2.7/dist-packages/datalab/bigquery/_query.pyc in execute_async(self, table_name, table_mode, use_cache, priority, allow_large_results, dialect, billing_tier)
479 billing_tier=billing_tier)
480 except Exception as e:
--> 481 raise e
482 if 'jobReference' not in query_result:
483 raise Exception('Unexpected response from server')
RequestException: HTTP request failed: Invalid project ID '<PROJECT_ID>'. Project IDs must contain 6-63 lowercase letters, digits, or dashes. IDs must start with a letter and may not end with a dash.
The query itself runs fine in BigQuery.
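A hedged reading: the error suggests the '<PROJECT_ID>' placeholder in the notebook was never replaced with a real project, so the request goes out with a literal invalid ID. One way to set it from the notebook, assuming the datalab Context API:

from datalab.context import Context

# Replace with your actual GCP project ID before running the query.
Context.default().set_project_id('my-gcp-project')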
On the Image Classification Service End-End notebook, the first paragraph discusses a blog post. The blog post links to a localhost page. Can you please update the link to the blog post page? Thank you!
Based on a customer request:
Show a sample doing basic I/O from/to GCS with a DataFrame in between, e.g. the sketch below.
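A minimal sketch, assuming gcsfs is installed so pandas can read and write gs:// paths directly; the bucket and file names are placeholders:

import pandas as pd

# GCS -> DataFrame
df = pd.read_csv('gs://my-bucket/data/input.csv')

# ... transform in pandas ...
df['row_total'] = df.sum(axis=1, numeric_only=True)

# DataFrame -> GCS
df.to_csv('gs://my-bucket/data/output.csv', index=False)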
Usage of Facets can be found at the samples such as:
https://github.com/googledatalab/notebooks/blob/master/samples/contrib/mlworkbench/structured_data_classification_police/Predict%20Case%20Resolution%20(small%20data%20experience).ipynb
Given the importance of data visualization, please consider creating a standalone notebook addressing best practices and how to use the built-in Facets feature in Datalab.
In addition, please add concrete examples using a sample dataset to explain how to interpret the Facets result and how to turn the insight into action items, such as data cleaning or feature engineering.
One possible structure could be:
Some of the documentation is outdated and lags behind the GA version.
For example, the BigQuery magic cells page does not use the GA version's %%bq magic.