pixiedust / pixiedust_node

Jupyter magic to allow Node.js code to run in a notebook

Home Page: https://medium.com/ibm-watson-data-lab/running-node-js-notebooks-in-watson-studio-a8f6545d8299

License: Apache License 2.0

Python 54.46% JavaScript 45.24% Shell 0.30%
javascript jupyter nodejs notebook pixiedust-node

pixiedust_node's Introduction

PixieDust

PixieDust is a productivity tool for Python or Scala notebooks that lets developers encapsulate business logic into something easy for their customers to consume.

New Book now available: Thoughtful Data Science

This book, published by Packt Publishing, is the user and developer reference for PixieDust.

Pixiedust developer community

Wait! There is a developer community? Yes, there is! If you are already a member, log in. If you would like to contribute, please join us.

Why you need it

Notebooks are a powerful tool for fast and flexible data analysis. But the learning curve is steep.

Python data science notebooks were first popularized in academia, and there are some formalities to work through before you can get to your analysis. For example, in a Python interactive notebook, a mundane task like creating a simple chart or saving data into a persistence repository requires mastery of complex code like this matplotlib snippet:

All this for a chart?

Once you do create a notebook that provides great data insights, it's hard to share with business users, who don't want to slog through all that dry, hard-to-read code, much less tweak it and collaborate.

PixieDust to the rescue.

What is PixieDust?

PixieDust is an open source helper library that works as an add-on to Jupyter notebooks to improve the user experience of working with data. It also fills a gap for users who have no access to configuration files when a notebook is hosted on the cloud.

Use in Python or Scala

PixieDust greatly simplifies working with Python display libraries like matplotlib, and it works just as effectively in Scala notebooks. You no longer have to compromise your love of Scala to generate great charts: PixieDust lets you bring robust Python visualization options to your Scala notebooks. An installer and instructions for using Scala with PixieDust are coming soon.

Features

PixieDust's current capabilities include:

  • packageManager lets you install Spark packages inside a Python notebook. This is something you can't do today on hosted Jupyter notebooks, which prevents developers from using a large number of Spark package add-ons.

  • Visualizations. A single API called display() lets you visualize your Spark object in different ways: tables, charts, maps, and more. This module is designed to be extensible, providing an API that lets anyone easily contribute a new visualization plugin.

    This sample visualization plugin uses d3 to show the different flight routes for each airport:

    graph map

  • Embedded apps. Let nonprogrammers actively use notebooks. Transform a hard-to-read notebook into a polished graphic app for business users. Check out these preliminary sample apps:

    • An app can feature embedded forms and responses, like flightpredict, which lets users enter flight details to see the likelihood of landing on time.
    • Or present a sophisticated workflow, like our twitter demo, which delivers a real-time feed of tweets, trending hashtags, and aggregated sentiment charts with Watson Tone Analyzer.
  • Extensibility. Create your own visualizations or apps using the PixieDust extensibility APIs. If you know HTML and CSS, you can write and deliver amazing graphics without forcing notebook users to type one line of code. Use the shape of the data to control when PixieDust shows your visualization in a menu.

  • Export. Notebook users can download data as CSV, HTML, JSON, and other formats, either locally to their laptop or into a variety of back-end data sources such as Cloudant, dashDB, and GraphDB.

    save as options

  • Scala Bridge. Use Scala directly in your Python notebook. Variables are automatically transferred from Python to Scala and vice versa. Learn more.

    Or start in a Scala notebook. As mentioned, all these PixieDust features work not only in Python, but in Scala too. So if you prefer Scala, you'll soon be able to start there and use PixieDust to insert sophisticated Python graphic options within your Scala notebook. Instructions coming soon.

  • Spark progress monitor. Track the status of your Spark job. No more waiting in the dark. Notebook users can now see how a cell's code is running behind the scenes.

Watch this video to see PixieDust in action:

about PixieDust

Usage

You can use PixieDust locally or online within IBM's Watson Studio.

Use online

To use PixieDust online

Use locally

  • PixieDust supports:
    • Spark 1.6 or 2.0
    • Python 2.7 or 3.5

Sample notebooks

Wherever you prefer to work, try out the following sample notebooks:

Tutorials

Contribute

Note: PixieDust currently supports Spark DataFrames, Spark GraphFrames and Pandas DataFrames, with more to come. If you can't wait, write your own today and contribute it back.

Read how to contribute for details on our code of conduct and instructions for submitting pull requests to us.

Developer Guide

Dive into the PixieDust developer docs and learn how to build your own custom visualization or embedded app. You can also pitch in and contribute an enhancement to PixieDust's core features.

We can't wait to see what you build.

License

Apache License, Version 2.0.

For details and all the legalese, read LICENSE.

pixiedust_node's People

Contributors

bradnoble, elainethale, glynnbird, jsnowacki, ptitzler, shnizzedy

pixiedust_node's Issues

Python 3.x: TypeError following the Medium post examples

!pip install pixiedust
!pip install pixiedust_node
import pixiedust_node
%%node
var date = new Date();
print(date);
TypeError                                 Traceback (most recent call last)
<ipython-input-4-26ef6f3fb6e5> in <module>()
----> 1 get_ipython().run_cell_magic('node', '', 'var date = new Date();\nprint(date);')

/opt/conda/lib/python3.6/site-packages/IPython/core/interactiveshell.py in run_cell_magic(self, magic_name, line, cell)
   2129             magic_arg_s = self.var_expand(line, stack_depth)
   2130             with self.builtin_trap:
-> 2131                 result = fn(magic_arg_s, cell)
   2132             return result
   2133 

<decorator-gen-126> in node(self, line, cell)

/opt/conda/lib/python3.6/site-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
    185     # but it's overkill for just that one bit of state.
    186     def magic_deco(arg):
--> 187         call = lambda f, *a, **k: f(*a, **k)
    188 
    189         if callable(arg):

/opt/conda/lib/python3.6/site-packages/pixiedust_node/__init__.py in node(self, line, cell)
     48     def node(self, line, cell):
     49         # write the cell contents to the Node.js process
---> 50         self.n.write(cell)
     51 
     52 try:

/opt/conda/lib/python3.6/site-packages/pixiedust_node/node.py in write(self, s)
     53 
     54     def write(self, s):
---> 55         self.ps.stdin.write(s)
     56         self.ps.stdin.write("\r\n")
     57 

TypeError: a bytes-like object is required, not 'str'

Any ideas?
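
One likely cause, offered as a sketch and not as the project's actual fix: on Python 3 a pipe opened by `subprocess.Popen` is a binary stream by default, so writing a `str` to `stdin` raises exactly this `TypeError`. Opening the pipes in text mode with `universal_newlines=True` lets the process accept `str`. The example uses a small Python child process as a stand-in for the Node.js REPL so that it runs without Node installed; `ECHO_CHILD` and `send_line` are hypothetical names.

```python
import subprocess
import sys

# Stand-in child process (assumption): echoes one line back in upper case.
ECHO_CHILD = "import sys; sys.stdout.write(sys.stdin.readline().upper())"


def send_line(text):
    """Write a str to a child's stdin via text-mode pipes and return its reply."""
    ps = subprocess.Popen(
        [sys.executable, "-c", ECHO_CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # text mode: pipes accept and return str
    )
    out, _ = ps.communicate(text + "\n")
    return out.strip()
```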

Exception from popen in python 3.5

Trying to import pixiedust_node fails in Python 3.5, with the following error:

Pixiedust database opened successfully

Pixiedust version 1.1.17

Table USER_PREFERENCES created successfully
Table service_connections created successfully

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-4a0fd16c56d9> in <module>()
----> 1 import pixiedust_node

~/miniconda/envs/tmp/lib/python3.5/site-packages/pixiedust_node/__init__.py in <module>()
     60         # start up a Node.js sub-process running a REPL
     61         path = os.path.join(__path__[0], 'pixiedustNodeRepl.js')
---> 62         node = Node(path)
     63 
     64         # pass the node process to the Node magics

~/miniconda/envs/tmp/lib/python3.5/site-packages/pixiedust_node/node.py in __init__(self, path)
    208         # process that runs the Node.js code
    209         args = (self.node_path, path)
--> 210         self.ps = self.popen(args)
    211         #print ("Node process id", self.ps.pid)
    212 

TypeError: __init__() got an unexpected keyword argument 'encoding'

In the code, the encoding argument is passed to popen if the Python version is 3 or greater:

    if sys.version_info.major == 3:
        popen_kwargs['encoding'] = 'utf-8'

But the encoding argument was only added to popen in Python 3.6 (see: https://docs.python.org/3.6/library/subprocess.html). I'm not sure if just changing the if to sys.version_info.major == 3 and sys.version_info.minor >= 6 will fix the problem.
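
A version check along these lines would cover both cases. This is a sketch, not the project's code: `build_popen_kwargs` is a hypothetical helper, and the comparison uses a tuple so that 3.6 and every later release receive `encoding` while 3.5 and earlier do not.

```python
import subprocess
import sys


def build_popen_kwargs(version_info=sys.version_info):
    """Return Popen keyword arguments suited to the given interpreter version."""
    kwargs = {
        "stdin": subprocess.PIPE,
        "stdout": subprocess.PIPE,
        "stderr": subprocess.STDOUT,
    }
    # Popen only accepts `encoding` from Python 3.6 onwards.
    if tuple(version_info[:2]) >= (3, 6):
        kwargs["encoding"] = "utf-8"
    return kwargs
```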

Exception thrown when calling Jupyter function

Hi,
I am trying to get the cell index in JavaScript and then use it as a Python variable.
I tried executing this line to capture the cell index:

%%node
var cell_index=Jupyter.notebook.get_selected_index()

But I got this message:

Thrown:

This line works in Jupyter (but the problem is that it's a JavaScript variable and not a Python one):

%%javascript
var cell_index=Jupyter.notebook.get_selected_index()

Do you have any solution for this problem?

Working Directory should match Jupyter Python environment

It seems like the working directory for the node process should match the python working directory in jupyter. I think, related to #38, this would also default the npm location to this same working directory which makes sense and mirrors the behavior of the %%script node jupyter cell magic.

I'm happy to put together a PR here if needed. My expertise is definitely more on the node side and less on the jupyter and python side so feel free to correct any misunderstandings I may have here.

Use Nodejs vars in Python

Today, I can use store to save Nodejs vars for use in Python cells, but I can't do the opposite. That is, I can't access Python vars in %%node cells.

Installing Node if it's not there

If pixiedust_node is installed, could it conceivably install Node.js (node/npm) on the target system? If so how?

@DTAIEB said

"yeah, I think we could do a command line download/install of node under PIXIEDUST_HOME directory when we detect it’s not there"

Currently, we assume node is in the PATH so we run

self.ps = subprocess.Popen( ('node', path), stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)

This could be changed to:

home = get_ipython().home_dir
self.ps = subprocess.Popen( ('node', path), stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, cwd = home)

This would ensure node's home directory is the same as Pixiedust's.
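
The two ideas in this thread, detecting a missing `node` binary and launching the subprocess in a chosen directory, can be sketched together. The helper names are hypothetical, the demonstration launches the Python interpreter instead of Node.js so that it runs on a machine without Node, and the system temporary directory stands in for the PixieDust home directory. `shutil.which` needs Python 3.3 or later.

```python
import os
import shutil
import subprocess
import sys
import tempfile


def find_node():
    """Return the path of `node` if it is on the PATH, otherwise None."""
    return shutil.which("node")


def launch_in_dir(args, cwd):
    """Start a subprocess whose working directory is `cwd`."""
    return subprocess.Popen(
        args,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        cwd=cwd,
    )


# Demonstration: a Python child reports the working directory it was given.
target_dir = tempfile.gettempdir()
ps = launch_in_dir([sys.executable, "-c", "import os; print(os.getcwd())"], target_dir)
child_cwd = ps.communicate()[0].decode().strip()
```

When `find_node()` returns `None`, the caller could fall back to the download step suggested above instead of failing inside `Popen`.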

NPM Modules Installed in Local Directory

npm.install() seems to install modules in the user directory. Is there some way in pixiedust_node to refer to modules installed in the same directory as the notebook file or control the installation location of npm.install()?

TypeError: x.get is not a function

We saw the following error intermittently:

screen shot 2018-02-09 at 11 13 32 am

Sequence:

%%node
// connect to Cloudant using cloudant-quickstart
var cqs = require('cloudant-quickstart');
var cities = cqs('https://...-bluemix.cloudant.com/cities');

[works]

%%node
cities.get('2636749').then(print).catch(print);

[fails sometimes; see screen cap]

%%node
cities.get(['4562407','2636749','3530597']).then(print).catch(print);

Once the error is encountered, cities becomes unusable (no method calls work anymore).

How do I convert an array of values so that they are correctly understood by `display()`?

I'm new to Jupyter and pixiedust and I'm having a hard time with my first experiments.

I'm looping through an array to recover values by time stamp and put them in a new array to be displayed as a line graph.

        var data = [];
        body.donors.forEach(
            function(donor) {
                var epochDate = new Date(donor.data_envio).setHours(0,0,0,0);
                var obj = {
                    date: new Date(epochDate),
                    valor: Number(donor.valor)
                };
                data.push(obj);
            }
        );
        display(data);

When I do this, the generated chart says "x must be a label or position".

Opening "Options", the date field is shown as "string".

I've tried formatting the date field as ISO 8601, but it is still treated as a string.

I've found no information/documentation on how to "cast" my data so that pixiedust correctly understands it.
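
One possible explanation, stated as an assumption: data crossing from Node.js to Python is serialized as JSON, which has no date type, so a JavaScript `Date` arrives as an ISO 8601 string. A sketch of converting such strings back into `datetime` objects in a Python cell follows. The sample row and the helper name are made up, and whether `display()` then plots the column as a date has not been verified here.

```python
import json
from datetime import datetime

# Hypothetical payload, shaped like JSON.stringify output for the array above.
payload = '[{"date": "2018-02-09T00:00:00.000Z", "valor": 10.5}]'


def parse_rows(text):
    """Parse JSON rows and turn the ISO 8601 `date` strings into datetimes."""
    rows = json.loads(text)
    for row in rows:
        row["date"] = datetime.strptime(row["date"], "%Y-%m-%dT%H:%M:%S.%fZ")
    return rows


rows = parse_rows(payload)
```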

Running npm package in cell like one would do on command line

Is it possible to run a npm package within a cell after it's been installed like you would from the command line? I have installed shp2json with npm.install('shp2json') and would like to then run it with !shp2json /path/to/shapefile > output.json but receive an error saying /bin/sh: 1: shp2json: not found
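
A probable cause is that npm places the executables of locally installed packages in `node_modules/.bin`, which is not on the shell's `PATH`. A sketch of locating such an executable from Python follows; `find_npm_bin` is a hypothetical helper, and the directory that `npm.install()` writes to is an assumption you would need to confirm on your system.

```python
import os


def find_npm_bin(name, install_dir):
    """Return the path of an npm-installed executable, or None if absent."""
    candidate = os.path.join(install_dir, "node_modules", ".bin", name)
    return candidate if os.path.exists(candidate) else None
```

If it returns a path, that path could be used in place of the bare command name after `!`, or passed to `subprocess.check_output`.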

weird dotted output

a node cell that does only function declaration
is outputting this weird dotted output

Screen Shot 2020-05-05 at 17 32 57

thanks

Cannot run node commands - a bytes-like object is required

I could not start a node command; the error message said that a bytes-like object is required. I installed Jupyter using anaconda3. Here is how I fixed it:

  • Added encode when writing to stdin
  • When reading from stdout, convert from byte to string
  • Also needed to flush after writing to stdin

Let me know if you need it as a pull request
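
The three changes listed above can be sketched as follows. This is an illustration, not the submitted patch: a small Python child stands in for the Node.js REPL, and `ask` is a hypothetical helper.

```python
import subprocess
import sys

# Stand-in child (assumption): reads one line and echoes it back.
CHILD = "import sys; sys.stdout.write(sys.stdin.readline()); sys.stdout.flush()"


def ask(text):
    """Send one line to a child over binary pipes and return its reply as str."""
    ps = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    ps.stdin.write((text + "\n").encode("utf-8"))  # encode str to bytes
    ps.stdin.flush()                               # flush so the child sees it
    reply = ps.stdout.readline().decode("utf-8")   # decode bytes back to str
    ps.stdin.close()
    ps.stdout.close()
    ps.wait()
    return reply.rstrip("\r\n")
```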

Does this still work? Deprecation and Uncaught error.

When I try to import this project, I get a deprecation error, which (from what I can see) means this project doesn't work? I tried creating a variable in Python and accessing it in node/JS, and it just says Uncaught :/

image

Thanks!

Will not display any graphs

This is the error I got:

def join_path(self, template, parent):
in template()
TemplateAssertionError: no filter named 'tojson'

x.sum('field').then(console.log) triggers 'int' object has no attribute '__getitem__' error

PD 1.1.7 pd_node 0.2.3

Didn't try other aggregation operators.

%%node
var cqs = require('cloudant-quickstart');
const cities = cqs('https://56953ed8-3fba-4f7e-824e-5498c8e1d18e-bluemix.cloudant.com/cities');
cities.get('2636749').then(print).catch(console.error);
// works
cities.get('2636749').then(console.log).catch(console.error);
// works
cities.sum('population').then(print).catch(print);
2694222973
// fails
cities.sum('population').then(console.log).catch(print);

2694222973
'int' object has no attribute '__getitem__'

Workaround: use print instead of console.log or console.error

Source notebook: https://github.com/ibm-watson-data-lab/nodebook-code-pattern/blob/master/notebooks/nodebook_1.ipynb
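
A guess at the mechanism, offered only as a hypothesis: if the Python side parses each line of Node.js output as JSON and then indexes the result like a dictionary, a bare number such as `2694222973` parses to an `int`, and indexing it fails with this message on Python 2. A defensive parser could look like the sketch below; `parse_repl_line` is a hypothetical helper and is not taken from the project.

```python
import json


def parse_repl_line(line):
    """Return a dict for JSON-object lines, or None for anything else."""
    try:
        value = json.loads(line)
    except ValueError:
        return None  # not JSON at all: treat as plain text output
    # Numbers, strings and lists are valid JSON but cannot be indexed by key.
    return value if isinstance(value, dict) else None
```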

print statement that is not compatible with Python 3 in last release

I see from your Python3 milestone that you are working toward Python 3 compatibility and have already merged in many fixes. However, your most recent release (v0.2.0) contains a fairly new line of code that does not work in Python 3. In my notebook I see this right away:

import pixiedust_node
Traceback (most recent call last):

  File "c:\python36\lib\site-packages\IPython\core\interactiveshell.py", line 2862, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  File "<ipython-input-1-4377929b0946>", line 1, in <module>
    import pixiedust_node

  File "c:\python36\lib\site-packages\pixiedust_node\__init__.py", line 20, in <module>
    from .node import Node, Npm

  File "c:\python36\lib\site-packages\pixiedust_node\node.py", line 91
    print '!!! Warning: store is now deprecated - Node.js global variables are automatically propagated to Python !!!'
                                                                                                                     ^
SyntaxError: Missing parentheses in call to 'print'
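
A form of that line that parses on both Python 2 and Python 3 is sketched below. Adding `warnings.warn` alongside the printed message is a suggestion, not the project's actual fix, and `warn_store_deprecated` is a hypothetical name.

```python
import warnings

MESSAGE = ("store is now deprecated - Node.js global variables "
           "are automatically propagated to Python")


def warn_store_deprecated():
    """Emit the deprecation notice without the Python 2 print statement."""
    warnings.warn(MESSAGE, DeprecationWarning, stacklevel=2)
    # print with parentheses and a single argument works on Python 2 and 3.
    print("!!! Warning: " + MESSAGE + " !!!")
```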

Error running npm.list

When I run npm.list from a %%node cell in a notebook I receive the following error:

CalledProcessError: Command '['npm', 'list', '-s']' returned non-zero exit status 1

Here is the output from npm list -s on the command line:

/Users/markwatson
├── UNMET PEER DEPENDENCY @angular/[email protected]
├── UNMET PEER DEPENDENCY @angular/[email protected]
├── UNMET PEER DEPENDENCY @angular/[email protected]
├── UNMET PEER DEPENDENCY @angular/[email protected]
├── UNMET PEER DEPENDENCY @angular/[email protected]
├── UNMET PEER DEPENDENCY @angular/[email protected]
├── UNMET PEER DEPENDENCY @angular/[email protected]
├── UNMET PEER DEPENDENCY @angular/[email protected]
├── [email protected]
├─┬ [email protected]
│ ├── [email protected]
│ ├─┬ [email protected]
...
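
`subprocess.check_call` and `subprocess.check_output` raise `CalledProcessError` whenever the command exits with a non-zero status, and the output above suggests that `npm list` does so when it finds unmet peer dependencies, even though the listing itself is still printed. One way to keep that output is sketched below; `run_and_capture` is a hypothetical helper, and a Python child that prints a line and exits with status 1 stands in for `npm`.

```python
import subprocess
import sys


def run_and_capture(args):
    """Return (exit_status, output) without raising on a non-zero status."""
    try:
        output = subprocess.check_output(args, stderr=subprocess.STDOUT)
        status = 0
    except subprocess.CalledProcessError as err:
        # The exception still carries whatever the command printed.
        output = err.output
        status = err.returncode
    return status, output.decode("utf-8")


# Stand-in for `npm list -s` (assumption): prints a line, then exits with 1.
fake_npm = [sys.executable, "-c",
            "import sys; print('UNMET PEER DEPENDENCY'); sys.exit(1)"]
status, output = run_and_capture(fake_npm)
```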

Problem with mysql module

Hi!
I'm experiencing trouble trying to use node-mysql within a pixiedust-node project.

The cell setup is the following:

import pixiedust_node
npm.install('mysql')

This works as expected showing the logos of both pixiedust and pixiedust node and then the stats of the module install.
The problem comes with the nodejs part. The code works perfectly in a nodejs project, but not within pixiedust-node:

%%node
var mysql = require('mysql');
var dburl = 'database-url.com';
var con = mysql.createConnection({
  host: dburl,
  user: 'dbuser',
  password: 'dbpassword',
  database: 'dbdatabase'
});
con.connect(function(err) {
    if (err) throw err;
});
var query ='SELECT user_created_date from user LIMIT 100';
con.query(query, function (err, result, fields) {
            if (err) throw err;
            console.log(result);
});
con.end();

I write it down here as a single block, but I've tried it in different cell layouts to find more precisely where the error is located. And it is located in the mysql.createConnection() call:

... ... ... ... ... TypeError: Converting circular structure to JSON
at JSON.stringify (<anonymous>)
at globalVariableChecker (/home/javier/anaconda3/lib/python3.7/site-packages/pixiedust_node/pixiedustNodeRepl.js:26:22)
at REPLServer.writer (/home/javier/anaconda3/lib/python3.7/site-packages/pixiedust_node/pixiedustNodeRepl.js:67:5)
at finish (repl.js:683:38)
at finishExecution (repl.js:310:7)
at REPLServer.defaultEval (repl.js:396:7)
at bound (domain.js:395:14)
at REPLServer.runBound [as eval] (domain.js:408:12)
at REPLServer.onLine (repl.js:639:10)
at REPLServer.emit (events.js:182:13)
/home/javier/anaconda3/lib/python3.7/site-packages/pixiedust_node/pixiedustNodeRepl.js:26
const j = JSON.stringify(r.context[v]);
^
TypeError: Converting circular structure to JSON
at JSON.stringify (<anonymous>)
at Timeout.globalVariableChecker [as _onTimeout] (/home/javier/anaconda3/lib/python3.7/site-packages/pixiedust_node/pixiedustNodeRepl.js:26:22)
at ontimeout (timers.js:436:11)
at tryOnTimeout (timers.js:300:5)
at unrefdHandle (timers.js:520:7)
at Timer.processTimers (timers.js:222:12)
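
The trace points at `JSON.stringify` being applied to a global variable, and a connection object typically refers back to itself. Any real fix would belong in `pixiedustNodeRepl.js`; as an analogy only, the same kind of guard is sketched here in Python, where `json.dumps` raises `ValueError` on a circular structure. `safe_dumps` is a hypothetical name.

```python
import json


def safe_dumps(value):
    """Serialize `value` to JSON, or return None if it cannot be serialized."""
    try:
        return json.dumps(value)
    except (TypeError, ValueError):
        # ValueError: circular reference; TypeError: unsupported object type.
        return None


connection = {"name": "db"}
connection["self"] = connection  # deliberately circular
```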

Erratic output behaviour

Sometimes, output from Node cells and/or node functions is cut or not displayed at all.
Short descriptive video (trying to display the result of the help() function) available here.

Wrong working directory used when running node

The node subprocess is launched using the wrong cwd (see https://github.com/ibm-watson-data-lab/pixiedust_node/blob/master/pixiedust_node/node.py#L27)
it should match the npm command which is using the current working directory.

I think we should have both node and npm using the central working directory e.g. PIXIEDUST_HOME/node.

Note: you can get the PIXIEDUST_HOME directory using the Environment class
from pixiedust.utils.environment import Environment
Environment.pixiedustHome
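
A sketch of that proposal: build the shared directory once and hand it to both the `node` and `npm` subprocesses as their `cwd`. The `pixiedust_home` argument would come from `Environment.pixiedustHome`; it is passed in here so that the example runs without PixieDust installed, and `node_working_dir` is a hypothetical helper.

```python
import os
import tempfile


def node_working_dir(pixiedust_home):
    """Return PIXIEDUST_HOME/node, creating the directory if it is missing."""
    path = os.path.join(pixiedust_home, "node")
    if not os.path.isdir(path):
        os.makedirs(path)
    return path


# Demonstration with a temporary directory standing in for PIXIEDUST_HOME.
workdir = node_working_dir(tempfile.mkdtemp())
```

The result could then be passed as `cwd=workdir` to each `subprocess.Popen` call.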

/cc @glynnbird @bradnoble

add requests to install dependencies

When importing pixiedust_node in a notebook, the following error happens:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-9-327dd5855a1c> in <module>()
      8 from scipy.interpolate import interp1d
      9 from skimage.draw import bezier_curve
---> 10 import pixiedust_node

~/dev/readonce/venv/lib/python3.6/site-packages/pixiedust_node/__init__.py in <module>()
     19 from IPython.core.error import TryNext
     20 import warnings
---> 21 from .node import Node, Npm
     22 import os
     23 from pixiedust.utils.shellAccess import ShellAccess

~/dev/readonce/venv/lib/python3.6/site-packages/pixiedust_node/node.py in <module>()
     11 import IPython
     12 import pandas
---> 13 from pixiedust.display import display
     14 from pixiedust.utils.environment import Environment
     15 from pixiedust.utils.shellAccess import ShellAccess

~/dev/readonce/venv/lib/python3.6/site-packages/pixiedust/__init__.py in <module>()
     29 
     30     #shortcut to logging
---> 31     import pixiedust.utils.pdLogging as pdLogging
     32     logger = pdLogging.getPixiedustLogger()
     33     getLogger = pdLogging.getLogger

~/dev/readonce/venv/lib/python3.6/site-packages/pixiedust/utils/__init__.py in <module>()
     16 
     17 import os
---> 18 from . import storage
     19 import pkg_resources
     20 import binascii

~/dev/readonce/venv/lib/python3.6/site-packages/pixiedust/utils/storage.py in <module>()
     26 from pkg_resources import get_distribution
     27 from re import search
---> 28 from requests import post
     29 from os import environ as env
     30 from pixiedust.utils.printEx import *

So the requests package should probably be explicitly listed in install_requires.

TypeError: Converting circular structure to JSON at JSON.stringify

I'm attempting to use the node-postgres library from pixiedust_node. The following simple setup fails:

import pixiedust_node
npm.install(('node-fetch', 'pg'))

%%node
var { Pool } = require('pg');
var pool = new Pool({
  user: 'congress23',
  host: 'localhost',
  database: 'vol_congress23',
  password: '',
  port: 5431,
});
pool.query('SELECT NOW()', (err, res) => {
    console.log(err,res);
});

Oddly, the error occurs even when trying to console.log("it works") instead of the results, so I'm not even sure where/what is causing the circular reference error. The same code executes just fine when run directly from node. Here's the stacktrace:

TypeError: Converting circular structure to JSON
at JSON.stringify (<anonymous>)
at globalVariableChecker (/anaconda3/lib/python3.6/site-packages/pixiedust_node/pixiedustNodeRepl.js:26:22)
at REPLServer.writer (/anaconda3/lib/python3.6/site-packages/pixiedust_node/pixiedustNodeRepl.js:67:5)
at finish (repl.js:512:38)
at REPLServer.defaultEval (repl.js:279:5)
at bound (domain.js:301:14)
at REPLServer.runBound [as eval] (domain.js:314:12)
at REPLServer.onLine (repl.js:468:10)
at emitOne (events.js:116:13)
at REPLServer.emit (events.js:211:7)
er
/anaconda3/lib/python3.6/site-packages/pixiedust_node/pixiedustNodeRepl.js:26
const j = JSON.stringify(r.context[v]);
^
TypeError: Converting circular structure to JSON
at JSON.stringify (<anonymous>)
at Timeout.globalVariableChecker [as _onTimeout] (/anaconda3/lib/python3.6/site-packages/pixiedust_node/pixiedustNodeRepl.js:26:22)
at ontimeout (timers.js:482:11)
at Timer.unrefdHandle (timers.js:595:5)
