pixiedust / pixiedust_node


Jupyter magic to allow Node.js code to run in a notebook

Home Page: https://medium.com/ibm-watson-data-lab/running-node-js-notebooks-in-watson-studio-a8f6545d8299

License: Apache License 2.0

Python 54.46% JavaScript 45.24% Shell 0.30%

javascript jupyter nodejs notebook pixiedust-node

pixiedust_node's Introduction

PixieDust

PixieDust is a productivity tool for Python or Scala notebooks, which lets a developer encapsulate business logic into something easy for your customers to consume.

New Book now available: Thoughtful Data Science

This book, published by Packt Publishing, is the user and developer reference for PixieDust.

Pixiedust developer community

Wait! There is a developer community? Yes, there is! If you are already a member, log in. If you would like to contribute, please join us.

Why you need it

Notebooks are a powerful tool for fast and flexible data analysis. But the learning curve is steep.

Python data science notebooks were first popularized in academia, and there are some formalities to work through before you can get to your analysis. For example, in a Python interactive notebook, a mundane task like creating a simple chart or saving data into a persistence repository requires mastery of complex code like this matplotlib snippet:

All this for a chart?

Once you do create a notebook that provides great data insights, it's hard to share with business users, who don't want to slog through all that dry, hard-to-read code, much less tweak it and collaborate.

PixieDust to the rescue.

What is PixieDust?

PixieDust is an open source helper library that works as an add-on to Jupyter notebooks to improve the user experience of working with data. It also fills a gap for users who have no access to configuration files when a notebook is hosted on the cloud.

Use in Python or Scala

PixieDust greatly simplifies working with Python display libraries like matplotlib, but works just as effectively in Scala notebooks too. You no longer have to compromise your love of Scala to generate great charts. PixieDust lets you bring robust Python visualization options to your Scala notebooks. Installer and instructions to use Scala with PixieDust are coming soon...

Features

PixieDust's current capabilities include:

packageManager lets you install Spark packages inside a Python notebook. This is something that you can't do today on hosted Jupyter notebooks, which prevents developers from using a large number of spark package add-ons.
Visualizations. A single API, display(), lets you visualize your Spark object in different ways: tables, charts, maps, etc. This module is designed to be extensible, providing an API that lets anyone easily contribute a new visualization plugin.

This sample visualization plugin uses d3 to show the different flight routes for each airport:
Embedded apps. Let nonprogrammers actively use notebooks. Transform a hard-to-read notebook into a polished graphic app for business users. Check out these preliminary sample apps:
- An app can feature embedded forms and responses, like flightpredict, which lets users enter flight details to see the likelihood of landing on time.
- Or present a sophisticated workflow, like our twitter demo, which delivers a real-time feed of tweets, trending hashtags, and aggregated sentiment charts with Watson Tone Analyzer.
Extensibility. Create your own visualizations or apps using the PixieDust extensibility APIs. If you know html and css, you can write and deliver amazing graphics without forcing notebook users to type one line of code. Use the shape of the data to control when PixieDust shows your visualization in a menu.
Export. Notebook users can download data as CSV, HTML, JSON, etc. to their local machine, or export it to a variety of back-end data sources, like Cloudant, dashDB, GraphDB, etc.
Scala Bridge. Use Scala directly in your Python notebook. Variables are automatically transferred from Python to Scala and vice versa. Learn more.

Or start in a Scala notebook. As mentioned, all these PixieDust features work not only in Python, but in Scala too. So if you prefer Scala, you'll soon be able to start there and use PixieDust to insert sophisticated Python graphic options within your Scala notebook. Instructions coming soon.
Spark progress monitor. Track the status of your Spark job. No more waiting in the dark. Notebook users can now see how a cell's code is running behind the scenes.

Watch this video to see PixieDust in action:

Usage

You can use PixieDust locally or online within IBM's Watson Studio.

Use online

To use PixieDust online

Sign up for a free trial on IBM's Watson Studio
Create a new notebook from URL using this template and learn the basics

https://github.com/pixiedust/pixiedust/blob/master/notebook/DSX/Welcome%20to%20PixieDust.ipynb
Review the documentation

Use locally

Pixiedust supports

Spark 1.6 or 2.0
Python 2.7 or 3.5

Sample notebooks

Wherever you prefer to work, try out the following sample notebooks:

Welcome to PixieDust. The ultimate notebook to get started with PixieDust.
Intro to PixieDust. Uses PackageManager to install GraphFrames, generates a dataframe from a simple data set, and lets you try the display() API. See also: Intro to PixieDust for Spark 2.x
Mapping Intro lets you load sample data sets, explore display() API features, including maps.

Tutorials

Discover hidden Facebook usage insights
FlightPredict II: The Sequel shows how to predict flight delays with PixieDust. Includes an embedded app
Sentiment Analysis of Twitter Hashtags with Spark revisits a spark streaming app this time using PixieDust and Jupyter. Includes an embedded app.

Contribute

Note: PixieDust currently supports Spark DataFrames, Spark GraphFrames and Pandas DataFrames, with more to come. If you can't wait, write your own today and contribute it back.

Read how to contribute for details on our code of conduct and instructions for submitting pull requests to us.

Developer Guide

Dive into the PixieDust developer docs and learn how to build your own custom visualization or embedded app. You can also pitch in and contribute an enhancement to PixieDust's core features.

We can't wait to see what you build.

License

Apache License, Version 2.0.

For details and all the legalese, read LICENSE.

pixiedust_node's People

Contributors

Stargazers

Watchers

pixiedust_node's Issues

Python 3.x: TypeError following the Medium post examples

!pip install pixiedust
!pip install pixiedust_node

import pixiedust_node

%%node
var date = new Date();
print(date);

TypeError                                 Traceback (most recent call last)
<ipython-input-4-26ef6f3fb6e5> in <module>()
----> 1 get_ipython().run_cell_magic('node', '', 'var date = new Date();\nprint(date);')

/opt/conda/lib/python3.6/site-packages/IPython/core/interactiveshell.py in run_cell_magic(self, magic_name, line, cell)
   2129             magic_arg_s = self.var_expand(line, stack_depth)
   2130             with self.builtin_trap:
-> 2131                 result = fn(magic_arg_s, cell)
   2132             return result
   2133 

<decorator-gen-126> in node(self, line, cell)

/opt/conda/lib/python3.6/site-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
    185     # but it's overkill for just that one bit of state.
    186     def magic_deco(arg):
--> 187         call = lambda f, *a, **k: f(*a, **k)
    188 
    189         if callable(arg):

/opt/conda/lib/python3.6/site-packages/pixiedust_node/__init__.py in node(self, line, cell)
     48     def node(self, line, cell):
     49         # write the cell contents to the Node.js process
---> 50         self.n.write(cell)
     51 
     52 try:

/opt/conda/lib/python3.6/site-packages/pixiedust_node/node.py in write(self, s)
     53 
     54     def write(self, s):
---> 55         self.ps.stdin.write(s)
     56         self.ps.stdin.write("\r\n")
     57 

TypeError: a bytes-like object is required, not 'str'

Any ideas?

Running `npm` calls causes 100% CPU usage

If I make an npm.install('packagename') call, the host machine's CPU usage rises to 100% and doesn't come down, even after the npm install operation has finished.

Exception from popen in python 3.5

Trying to import pixiedust_node fails in Python 3.5, with the following error:

Pixiedust database opened successfully

Pixiedust version 1.1.17

Table USER_PREFERENCES created successfully
Table service_connections created successfully

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-4a0fd16c56d9> in <module>()
----> 1 import pixiedust_node

~/miniconda/envs/tmp/lib/python3.5/site-packages/pixiedust_node/__init__.py in <module>()
     60         # start up a Node.js sub-process running a REPL
     61         path = os.path.join(__path__[0], 'pixiedustNodeRepl.js')
---> 62         node = Node(path)
     63 
     64         # pass the node process to the Node magics

~/miniconda/envs/tmp/lib/python3.5/site-packages/pixiedust_node/node.py in __init__(self, path)
    208         # process that runs the Node.js code
    209         args = (self.node_path, path)
--> 210         self.ps = self.popen(args)
    211         #print ("Node process id", self.ps.pid)
    212 

TypeError: __init__() got an unexpected keyword argument 'encoding'

In the code, the encoding argument is passed to popen if the Python version is 3 or greater:

    if sys.version_info.major == 3:
        popen_kwargs['encoding'] = 'utf-8'

But the encoding argument was only added to popen in Python 3.6 (see: https://docs.python.org/3.6/library/subprocess.html). I'm not sure if just changing the if to sys.version_info.major == 3 and sys.version_info.minor >= 6 will fix the problem.
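A hedged sketch of one way to guard the keyword, assuming the goal is text-mode pipes on every supported Python release. The helper name `build_popen_kwargs` is hypothetical and not part of pixiedust_node, and the pipe settings are assumptions, not the library's actual arguments. On releases before 3.6 it falls back to `universal_newlines=True`, which gives text mode with the locale's default encoding rather than an explicit UTF-8.

```python
import subprocess
import sys


def build_popen_kwargs(version_info=sys.version_info):
    """Return Popen keyword arguments valid for the given Python version.

    Hypothetical helper for illustration; not part of pixiedust_node.
    """
    kwargs = {
        'stdin': subprocess.PIPE,
        'stdout': subprocess.PIPE,
        'stderr': subprocess.STDOUT,
    }
    if tuple(version_info[:2]) >= (3, 6):
        # The encoding argument is accepted from Python 3.6 onwards.
        kwargs['encoding'] = 'utf-8'
    else:
        # Older releases: text mode using the locale's default encoding.
        kwargs['universal_newlines'] = True
    return kwargs
```

Comparing the `(major, minor)` tuple avoids the pitfall of testing `minor` on its own, which would misclassify a future major version.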

Exception thrown when calling Jupyter function

Hi,
I am trying to get the cell index in JavaScript and then use it as a Python variable.
I tried executing this line to capture the cell index:

%%node
var cell_index=Jupyter.notebook.get_selected_index()

But I got this message:

Thrown:

This line works in Jupyter (but the problem is that it's a JavaScript variable and not a Python one):

%%javascript
var cell_index=Jupyter.notebook.get_selected_index()

Do you have any solution for this problem?

404: Link needs update for installing Pixiedust

Currently, in the prerequisites section, you have the following sentence with a link for installing Pixiedust locally. It leads to a 404 page:

Notebooks can be run locally by installing Pixiedust and its prerequisites.

Working Directory should match Jupyter Python environment

It seems like the working directory for the node process should match the python working directory in jupyter. I think, related to #38, this would also default the npm location to this same working directory which makes sense and mirrors the behavior of the %%script node jupyter cell magic.

I'm happy to put together a PR here if needed. My expertise is definitely more on the node side and less on the jupyter and python side so feel free to correct any misunderstandings I may have here.

print doesn't create any output after display was invoked

PD 1.1.7, PD_node 0.2.1

import pixiedust_node

%%node
print('Hello world')

Hello world

%%node
var x = [];
x.push({x:1, y:2});
display(x);

...

%%node
print('Hello world')

** no output **

Use Nodejs vars in Python

Today, I can use store to save Nodejs vars for use in Python cells, but I can't do the opposite. That is, I can't access Python vars in %%node cells.

Always return execution result of first cell

I always get the result of the first node cell, no matter which cell I run.

Installing Node if it's not there

If pixiedust_node is installed, could it conceivably install Node.js (node/npm) on the target system? If so how?

@DTAIEB said

"yeah, I think we could do a command line download/install of node under PIXIEDUST_HOME directory when we detect it’s not there"

Currently, we assume node is in the PATH so we run

self.ps = subprocess.Popen( ('node', path), stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)

This could be changed to:

home = get_ipython().home_dir
self.ps = subprocess.Popen( ('node', path), stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, cwd = home)

This would ensure node's home directory is the same as Pixiedust's.
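Before any download or install step, the library would need to detect that Node.js is missing. A minimal sketch of that check, using only the standard library; the function name `find_node` and the alternative executable name `nodejs` are assumptions for illustration, and the download itself is deliberately left out.

```python
import shutil


def find_node(candidates=('node', 'nodejs')):
    """Return the full path of the first Node.js executable on PATH, or None.

    Hypothetical helper; the candidate names are an assumption.
    """
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None
```

A caller could fall back to an install under the PixieDust home directory only when `find_node()` returns `None`.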

NPM Modules Installed in Local Directory

npm.install() seems to install modules in the user directory. Is there some way in pixiedust_node to refer to modules installed in the same directory as the notebook file or control the installation location of npm.install()?

rename npm.remove to npm.uninstall or add alias to conform with npm synopsis

Follow npm's syntax for removing a package: https://docs.npmjs.com/cli/uninstall.

Is there a way to Download all the node cells as Javascript file?

TypeError: x.get is not a function

We saw the following error intermittently:

Sequence:

%%node
// connect to Cloudant using cloudant-quickstart
var cqs = require('cloudant-quickstart');
var cities = cqs('https://...-bluemix.cloudant.com/cities');

[works]

%%node
cities.get('2636749').then(print).catch(print);

[fails sometimes; see screen cap]

%%node
cities.get(['4562407','2636749','3530597']).then(print).catch(print);

Once the error was encountered, cities is rendered unusable (no method calls work anymore)

what determines the node version for pixiedust node?

how can I set it to a specific version like LTS

thanks

how to run whole node-js code cells in single shot without %%node

Hey Guys,

How do I execute the whole notebook without putting %%node in every cell?

It's very irritating to add %%node in every cell.
We need something like Run All Cells in a single hit.

thanks,

npm.uninstall() does not work (though npm.remove() does)

The README suggests they both work.

How do I convert an array of values so that they are correctly understood by `display()`?

I'm new to Jupyter and pixiedust and I'm having a hard time with my first experiments.

I'm looping through an array to recover values by time stamp and put them in a new array to be displayed as a line graph.

        var data = [];
        body.donors.forEach(
            function(donor) {
                var epochDate = new Date(donor.data_envio).setHours(0,0,0,0);
                var obj = {
                    date: new Date(epochDate),
                    valor: Number(donor.valor)
                };
                data.push(obj);
            }
        );
        display(data);

When I do this, the generated chart says "x must be a label or position".

Opening "Options", the date field is shown as "string".

I've tried formatting the date field as ISO 8601 but it is still understood as a string.

I've found no information/documentation on how to "cast" my data so that pixiedust correctly understands it.
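One possible explanation, assuming the array crosses from Node.js to Python as JSON: JSON has no date type, so a JavaScript `Date` is serialized as an ISO 8601 string and arrives in Python as `str`. The sketch below shows that behaviour with the standard library and a conversion on the Python side. The payload and the helper name `parse_iso_dates` are made up for illustration, and whether `display()` then treats the column as a date is not confirmed here.

```python
import json
from datetime import datetime

# Format produced by JavaScript's Date.prototype.toISOString().
ISO_FORMAT = '%Y-%m-%dT%H:%M:%S.%fZ'


def parse_iso_dates(rows, field):
    """Return copies of rows with the named field converted from str to datetime."""
    return [
        dict(row, **{field: datetime.strptime(row[field], ISO_FORMAT)})
        for row in rows
    ]


# Illustrative payload: what JSON serialization yields for a Date value.
payload = '[{"date": "2017-08-29T00:00:00.000Z", "valor": 10}]'
rows = json.loads(payload)                # the date is a plain string here
converted = parse_iso_dates(rows, 'date')  # the date is now a datetime
```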

Running npm package in cell like one would do on command line

Is it possible to run a npm package within a cell after it's been installed like you would from the command line? I have installed shp2json with npm.install('shp2json') and would like to then run it with !shp2json /path/to/shapefile > output.json but receive an error saying /bin/sh: 1: shp2json: not found

Cleanup: terminate the pixiedust_node background process when the notebook server shuts down

If one opens a notebook and imports the pixiedust_node package, a Node.js background process (pixiedustNodeRepl.js) is launched. This process currently remains running after the notebook server is shut down. If possible, we should terminate the process, since it no longer serves a purpose at that point.
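A minimal sketch of one possible cleanup approach, assuming it is enough to stop the child when the Python kernel process exits normally. `atexit` handlers do not run if the kernel is killed by a signal, so this is not a complete answer. The child here is a stand-in for the Node.js process, and `stop_process` is a hypothetical helper name.

```python
import atexit
import subprocess
import sys


def stop_process(ps, timeout=5):
    """Terminate a child process if it is still running and wait for it to exit."""
    if ps.poll() is None:
        ps.terminate()
        try:
            ps.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            # The child ignored the request; force it to stop.
            ps.kill()
            ps.wait()


# Stand-in for the Node.js REPL process: a child that sleeps for a minute.
ps = subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(60)'])
# Ask the interpreter to stop the child when this Python process exits.
atexit.register(stop_process, ps)
```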

why does it print Uncaught for no reason

every declaration will print Uncaught why?

weird dotted output

a node cell that does only function declaration
is outputting this weird dotted output

thanks

Cannot run node commands - a bytes-like object is required

I could not run a node command. I got the message that a bytes-like object is required. I installed Jupyter using anaconda3. Here is how I fixed it:

Added encode when writing to stdin
When reading from stdout, convert from byte to string
Also needed to flush after writing to stdin

Let me know if you need it as a pull request
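The three steps listed above can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: a small Python child process stands in for Node.js so the example runs anywhere, and `round_trip` is a hypothetical name.

```python
import subprocess
import sys

# Stand-in child: echoes one line from stdin back in upper case.
# The real library would launch Node.js here instead.
CHILD = (
    'import sys; '
    'sys.stdout.write(sys.stdin.readline().upper()); '
    'sys.stdout.flush()'
)


def round_trip(text):
    """Send text to a child process over byte pipes and read one line back."""
    ps = subprocess.Popen(
        [sys.executable, '-c', CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    try:
        # 1. Encode str to bytes before writing to stdin.
        ps.stdin.write((text + '\n').encode('utf-8'))
        # 2. Flush so the child actually receives the data.
        ps.stdin.flush()
        # 3. Decode the bytes read from stdout back to str.
        return ps.stdout.readline().decode('utf-8').strip()
    finally:
        ps.stdin.close()
        ps.stdout.close()
        ps.wait()
```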

AttributeError: module 'jinja2.ext' has no attribute 'with_'

when trying to import pixiedust_node into jupyter-lab, an error occurs: AttributeError: module 'jinja2.ext' has no attribute 'with_'; jinja2 is installed and all permissions are given (checked in the folder with env)

Does this still work? Deprecation and Uncaught error.

When I try to import this project, I get a deprecation error, which (from what I can see) means this project doesn't work? I tried creating a variable in Python and accessing it in node/JS, and it just says Uncaught :/

Thanks!

Will not display any graphs

This is the error I got:

def join_path(self, template, parent):
in template()
TemplateAssertionError: no filter named 'tojson'

x.sum('field').then(console.log) triggers 'int' object has no attribute '__getitem__' error

PD 1.1.7 pd_node 0.2.3

Didn't try other aggregation operators.

%%node
var cqs = require('cloudant-quickstart');
const cities = cqs('https://56953ed8-3fba-4f7e-824e-5498c8e1d18e-bluemix.cloudant.com/cities');
cities.get('2636749').then(print).catch(console.error);
// works
cities.get('2636749').then(console.log).catch(console.error);
// works
cities.sum('population').then(print).catch(print);
2694222973
// fails
cities.sum('population').then(console.log).catch(print);

2694222973
'int' object has no attribute '__getitem__'

Workaround: use print instead of console.log or console.error

Source notebook: https://github.com/ibm-watson-data-lab/nodebook-code-pattern/blob/master/notebooks/nodebook_1.ipynb

print statement that is not compatible with Python 3 in last release

I see from your Python3 milestone that you are working toward Python 3 compatibility and have already merged in many fixes. However, your most recent release (v0.2.0) contains a fairly new line of code that does not work in Python 3. In my notebook I see this right away:

import pixiedust_node

Traceback (most recent call last):

  File "c:\python36\lib\site-packages\IPython\core\interactiveshell.py", line 2862, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  File "<ipython-input-1-4377929b0946>", line 1, in <module>
    import pixiedust_node

  File "c:\python36\lib\site-packages\pixiedust_node\__init__.py", line 20, in <module>
    from .node import Node, Npm

  File "c:\python36\lib\site-packages\pixiedust_node\node.py", line 91
    print '!!! Warning: store is now deprecated - Node.js global variables are automatically propagated to Python !!!'
                                                                                                                     ^
SyntaxError: Missing parentheses in call to 'print'
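For reference, the function form of `print` parses on both Python 3 and Python 2.7. On 2.7 the `file=` keyword used below also needs `from __future__ import print_function` at the top of the module. A minimal sketch; the function name `warn_store_deprecated` is hypothetical, and only the message text is taken from the traceback.

```python
import sys

DEPRECATION_MESSAGE = (
    '!!! Warning: store is now deprecated - Node.js global variables '
    'are automatically propagated to Python !!!'
)


def warn_store_deprecated(stream=None):
    """Print the deprecation notice using the function form of print."""
    if stream is None:
        stream = sys.stdout
    print(DEPRECATION_MESSAGE, file=stream)
```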

Error running npm.list

When I run npm.list from a %%node cell in a notebook I receive the following error:

CalledProcessError: Command '['npm', 'list', '-s']' returned non-zero exit status 1

Here is the output from npm list -s on the command line:

/Users/markwatson
├── UNMET PEER DEPENDENCY @angular/[email protected]
├── UNMET PEER DEPENDENCY @angular/[email protected]
├── UNMET PEER DEPENDENCY @angular/[email protected]
├── UNMET PEER DEPENDENCY @angular/[email protected]
├── UNMET PEER DEPENDENCY @angular/[email protected]
├── UNMET PEER DEPENDENCY @angular/[email protected]
├── UNMET PEER DEPENDENCY @angular/[email protected]
├── UNMET PEER DEPENDENCY @angular/[email protected]
├── [email protected]
├─┬ [email protected]
│ ├── [email protected]
│ ├─┬ [email protected]
...
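The error message matches what `subprocess.check_output` or `subprocess.check_call` raises when a command exits with a non-zero status. A sketch of how the output could still be shown in that case, assuming it is acceptable to display the tree despite the failure status. A Python child stands in for `npm list -s`, and `run_and_capture` is a hypothetical helper name.

```python
import subprocess
import sys


def run_and_capture(args):
    """Run a command and return (exit_status, output), even on a non-zero status."""
    try:
        output = subprocess.check_output(args, stderr=subprocess.STDOUT)
        return 0, output.decode('utf-8')
    except subprocess.CalledProcessError as err:
        # Output captured before the failure is kept on the exception.
        return err.returncode, err.output.decode('utf-8')


# Stand-in for `npm list -s`: prints a line, then exits with status 1.
status, text = run_and_capture(
    [sys.executable, '-c', "print('dependency tree'); raise SystemExit(1)"]
)
```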

Print statements are sometimes delayed until running the cell a second time

I created the following cell:

%%node
print(new Date());

Sometimes this works without issue. Sometimes the value is not printed. Running the cell a second time results in two printouts:

%%node
print(new Date());

"2017-08-29T21:09:40.994Z"
"2017-08-29T21:09:46.047Z"

Problem with mysql module

Hi!
I'm experiencing trouble trying to use node-mysql within a pixiedust-node project.

The cell setup is the following:

import pixiedust_node
npm.install('mysql')

This works as expected showing the logos of both pixiedust and pixiedust node and then the stats of the module install.
The problem comes with the nodejs part. The code works perfectly in a nodejs project, but not within pixiedust-node:

%%node
var mysql = require('mysql');
var dburl = 'database-url.com';
var con = mysql.createConnection({
  host: dburl,
  user: 'dbuser',
  password: 'dbpassword',
  database: 'dbdatabase'
});
con.connect(function(err) {
    if (err) throw err;
});
var query ='SELECT user_created_date from user LIMIT 100';
con.query(query, function (err, result, fields) {
            if (err) throw err;
            console.log(result);
});
con.end();

I write it down here as a single block, but I've tried it in different cell layouts to find more precisely where the error is located. And it is located in the mysql.createConnection() call:

... ... ... ... ... TypeError: Converting circular structure to JSON
at JSON.stringify (<anonymous>)
at globalVariableChecker (/home/javier/anaconda3/lib/python3.7/site-packages/pixiedust_node/pixiedustNodeRepl.js:26:22)
at REPLServer.writer (/home/javier/anaconda3/lib/python3.7/site-packages/pixiedust_node/pixiedustNodeRepl.js:67:5)
at finish (repl.js:683:38)
at finishExecution (repl.js:310:7)
at REPLServer.defaultEval (repl.js:396:7)
at bound (domain.js:395:14)
at REPLServer.runBound [as eval] (domain.js:408:12)
at REPLServer.onLine (repl.js:639:10)
at REPLServer.emit (events.js:182:13)
/home/javier/anaconda3/lib/python3.7/site-packages/pixiedust_node/pixiedustNodeRepl.js:26
const j = JSON.stringify(r.context[v]);
^
TypeError: Converting circular structure to JSON
at JSON.stringify (<anonymous>)
at Timeout.globalVariableChecker [as _onTimeout] (/home/javier/anaconda3/lib/python3.7/site-packages/pixiedust_node/pixiedustNodeRepl.js:26:22)
at ontimeout (timers.js:436:11)
at tryOnTimeout (timers.js:300:5)
at unrefdHandle (timers.js:520:7)
at Timer.processTimers (timers.js:222:12)

Running multiple cells yields unexpected results ...

Based on IBM/nodejs-in-notebooks#5, which was raised against the sample notebook. Logging it here for reference; in general, "run X" cells should probably be used with caution to avoid unexpected results. When running one cell at a time, the issues are not observed.

Erratic output behaviour

Sometimes, output from Node cells and/or node functions is cut or not displayed at all.
Short descriptive video (trying to display the result of the help() function) available here.

Wrong working directory used when running node

The node subprocess is launched using the wrong cwd (see https://github.com/ibm-watson-data-lab/pixiedust_node/blob/master/pixiedust_node/node.py#L27)
it should match the npm command which is using the current working directory.

I think we should have both node and npm using the central working directory e.g. PIXIEDUST_HOME/node.

Note: you can get the PIXIEDUST_HOME directory using the Environment class
from pixiedust.utils.environment import Environment
Environment.pixiedustHome
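A sketch of the shared working directory idea, under the assumption that both subprocesses simply receive the same `cwd` argument. A temporary directory stands in for `Environment.pixiedustHome` so the example runs without PixieDust installed. The child is a Python one-liner standing in for node and npm, and the helper names are hypothetical.

```python
import os
import subprocess
import sys
import tempfile


def shared_workdir(home):
    """Create (if needed) and return the node working directory under home."""
    workdir = os.path.join(home, 'node')
    os.makedirs(workdir, exist_ok=True)
    return workdir


def run_in(workdir, args):
    """Run a command with cwd set to workdir and return its decoded output."""
    return subprocess.check_output(args, cwd=workdir).decode('utf-8').strip()


# Stand-in for Environment.pixiedustHome so the sketch runs without PixieDust.
home = tempfile.mkdtemp()
workdir = shared_workdir(home)
# Stand-in child that reports its working directory; node and npm would
# both be launched with the same cwd argument.
reported = run_in(workdir, [sys.executable, '-c', 'import os; print(os.getcwd())'])
```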

/cc @glynnbird @bradnoble

add requests to install dependencies

When importing pixiedust_node in a notebook, the following error happens:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-9-327dd5855a1c> in <module>()
      8 from scipy.interpolate import interp1d
      9 from skimage.draw import bezier_curve
---> 10 import pixiedust_node

~/dev/readonce/venv/lib/python3.6/site-packages/pixiedust_node/__init__.py in <module>()
     19 from IPython.core.error import TryNext
     20 import warnings
---> 21 from .node import Node, Npm
     22 import os
     23 from pixiedust.utils.shellAccess import ShellAccess

~/dev/readonce/venv/lib/python3.6/site-packages/pixiedust_node/node.py in <module>()
     11 import IPython
     12 import pandas
---> 13 from pixiedust.display import display
     14 from pixiedust.utils.environment import Environment
     15 from pixiedust.utils.shellAccess import ShellAccess

~/dev/readonce/venv/lib/python3.6/site-packages/pixiedust/__init__.py in <module>()
     29 
     30     #shortcut to logging
---> 31     import pixiedust.utils.pdLogging as pdLogging
     32     logger = pdLogging.getPixiedustLogger()
     33     getLogger = pdLogging.getLogger

~/dev/readonce/venv/lib/python3.6/site-packages/pixiedust/utils/__init__.py in <module>()
     16 
     17 import os
---> 18 from . import storage
     19 import pkg_resources
     20 import binascii

~/dev/readonce/venv/lib/python3.6/site-packages/pixiedust/utils/storage.py in <module>()
     26 from pkg_resources import get_distribution
     27 from re import search
---> 28 from requests import post
     29 from os import environ as env
     30 from pixiedust.utils.printEx import *

So the requests package should probably be explicitly listed in install_requires.

TypeError: Converting circular structure to JSON at JSON.stringify

I'm attempting to use the node-postgres library from pixiedust_node. The following simple setup fails:

import pixiedust_node
npm.install(('node-fetch', 'pg'))

var { Pool } = require('pg');
var pool = new Pool({
  user: 'congress23',
  host: 'localhost',
  database: 'vol_congress23',
  password: '',
  port: 5431,
});
pool.query('SELECT NOW()', (err, res) => {
    console.log(err,res);
});

Oddly, the error occurs even when trying to console.log("it works") instead of the results, so I'm not even sure where/what is causing the circular reference error. The same code executes just fine when run directly from node. Here's the stacktrace:

TypeError: Converting circular structure to JSON
at JSON.stringify (<anonymous>)
at globalVariableChecker (/anaconda3/lib/python3.6/site-packages/pixiedust_node/pixiedustNodeRepl.js:26:22)
at REPLServer.writer (/anaconda3/lib/python3.6/site-packages/pixiedust_node/pixiedustNodeRepl.js:67:5)
at finish (repl.js:512:38)
at REPLServer.defaultEval (repl.js:279:5)
at bound (domain.js:301:14)
at REPLServer.runBound [as eval] (domain.js:314:12)
at REPLServer.onLine (repl.js:468:10)
at emitOne (events.js:116:13)
at REPLServer.emit (events.js:211:7)
er
/anaconda3/lib/python3.6/site-packages/pixiedust_node/pixiedustNodeRepl.js:26
const j = JSON.stringify(r.context[v]);
^
TypeError: Converting circular structure to JSON
at JSON.stringify (<anonymous>)
at Timeout.globalVariableChecker [as _onTimeout] (/anaconda3/lib/python3.6/site-packages/pixiedust_node/pixiedustNodeRepl.js:26:22)
at ontimeout (timers.js:482:11)
at Timer.unrefdHandle (timers.js:595:5)
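Both stack traces point at `JSON.stringify(r.context[v])` in `globalVariableChecker`, which suggests that any global holding a circular object (such as a connection pool) triggers the error, regardless of what the cell prints. The actual fix would live in the JavaScript REPL code. As an analogy only, the same failure and a defensive guard look like this in Python, where `json.dumps` raises `ValueError` on a circular reference; `safe_dumps` is a hypothetical name.

```python
import json


def safe_dumps(value):
    """Serialize value to JSON, returning None when it cannot be serialized."""
    try:
        return json.dumps(value)
    except (TypeError, ValueError):
        # ValueError: circular reference; TypeError: unsupported type.
        return None


# A structure that refers to itself, like many connection objects do.
circular = {}
circular['self'] = circular
```

Skipping values that cannot be serialized, rather than letting the exception escape, would keep one unserializable global from breaking every later cell.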