
spark-notebook / spark-notebook


Interactive and Reactive Data Science using Scala and Spark.

License: Apache License 2.0

CoffeeScript 0.82% CSS 1.01% HTML 2.65% Shell 0.06% Scala 10.73% Java 0.01% JavaScript 54.20% Makefile 0.02% Jupyter Notebook 29.34% Less 1.16%
apache-spark notebook scala data-science spark reactive

spark-notebook's Issues

Open notebooks from any folder

In IPython, the command `ipython notebook` can be run from any directory, launching a web service that serves all the .ipynb files in the current directory and its subdirectories.

It would be useful to maintain the same approach, so notebooks can be stored within projects / git repos / etc.

Integrate with bokeh

Some work on this has already been started here.

It needs to be continued here; it's a great move and idea! This would ease the integration of drawing capabilities, or at least provide a first, complete one.

Other ideas would be nvd3, c3.js and so on.

Download notebook as code

Add a menu item to download the code part as a Scala file. Maybe consider rendering markdown cells as comments; :cp and :dp will be more complicated and could be left out at first.

Cursor size glitch

It happens on lines whose number is > 1. The cursor spans more than one line in height, which is very distracting and uncomfortable.

Change default port 9000

Hey guys, I hope this is the right spot to ask questions (and possibly raise an issue).

I would like to run spark-notebook on a server that is also used as the namenode of a Hadoop cluster (for a proof of concept).
When starting spark-notebook, it produces an error:
"org.jboss.netty.channel.ChannelException: Failed to bind to: /0.0.0.0:9000"
Most likely it cannot bind to that port, since the Hadoop filesystem is already bound to port 9000.

Is there a way to change the port of spark-notebook?
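spark-notebook is built on the Play framework, so the usual Play system property is the likely knob for moving it off port 9000; a hedged sketch (the launcher script path and its forwarding of -D flags are assumptions):

```shell
# Override the Play HTTP port at launch time (assumption: the spark-notebook
# launcher forwards JVM -D properties to Play, as Play apps normally do).
# 9001 is just an example of a port Hadoop is not already using.
./bin/spark-notebook -Dhttp.port=9001
```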

Best regards,
Benjamin

Dependencies are not updated

When executing "Update classpath and Spark's jars", the dependencies do not seem to be updated as documented.

At the end of the execution, the key spark.jars is still empty.

Also, trying to add mllib with the following line doesn't have any effect:

:dp "org.apache.spark" % "spark-mllib_2.10" % "1.2.0"

Executing spark while "play run" is failing

This is a tricky one (again), because it looks like it only happens when running in dev mode (i.e. `play run`).

Hence it's still hard to debug or develop anything involving Spark execution (Spark, SparkInfo, Sql, etc.), which is unfortunate.

So the symptom is a mismatch when writing/reading Tasks through the closure serializer (as far as the current investigation shows). The problem occurs for scala.Option and scala.None$. Java serialization reports a bad serialVersionUID, but the reported ids (see below) are for Option and None.type, which is normal. However, the written bytes should only reference None$, never Option.

No idea how this happens, but my guess is that the Option field in Task is metrics.

[error] o.a.s.e.Executor - Exception in task 1.0 in stage 0.0 (TID 1)
java.io.InvalidClassException: scala.Option; local class incompatible: stream classdesc serialVersionUID = -2062608324514658839, local class serialVersionUID = 5081326844987135632
    at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617) ~[na:1.7.0_72]
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622) ~[na:1.7.0_72]
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) ~[na:1.7.0_72]
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622) ~[na:1.7.0_72]
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) ~[na:1.7.0_72]
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771) ~[na:1.7.0_72]
[error] o.a.s.s.TaskSetManager - Task 7 in stage 0.0 failed 1 times; aborting job
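One way to narrow such a mismatch down is to print the serialVersionUID that the local classpath computes for the offending classes on both the driver and the executor side, and compare them with the UID recorded in the stream. A minimal sketch using the standard java.io.ObjectStreamClass API (it only inspects the local side; it does not reproduce the failure):

```scala
import java.io.ObjectStreamClass

// serialVersionUID that *this* classpath derives for a class; a mismatch with
// the UID recorded in the stream means the two sides loaded different builds.
def localUid(cls: Class[_]): Long =
  ObjectStreamClass.lookup(cls).getSerialVersionUID

println(s"scala.Option -> ${localUid(classOf[Option[_]])}")
println(s"scala.None$$  -> ${localUid(None.getClass)}")
```

Running this in a `play run` session and in a plain launch should show whether dev mode is loading a different build of the Scala library.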

Forked process won't work using `play run`

This is mainly a Play (sbt) problem, actually: it seems to interfere with classpath construction and loading, hence the classes are not found in the forked process.

A few things have been tried so far, like this; however, while the process can now find the classes, it fails weirdly at runtime:
cannot find function f in StringContext.

The thing is that f is actually a macro and should have been injected into the class's bytecode (AFAIK), hence it should resolve.

A clash between Scala versions is one potential cause, but that's not certain.

Run notebook on mesos

"Executor Spark home spark.mesos.executor.home is not set!" even if spark.executor.uri is correctly set
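spark.mesos.executor.home and spark.executor.uri are both standard Spark configuration keys; a hedged Scala sketch of setting them explicitly on the SparkConf before the context is created (the paths are placeholders, and whether spark-notebook exposes the conf at this point is an assumption, so treat this as plain-Spark guidance):

```scala
import org.apache.spark.SparkConf

// Mesos executors need to know where Spark is installed on the slaves even
// when spark.executor.uri is set; spark.mesos.executor.home supplies that
// path. Both values below are placeholders, not values from this report.
val conf = new SparkConf()
  .set("spark.executor.uri", "hdfs://namenode/frameworks/spark-1.2.0.tgz")
  .set("spark.mesos.executor.home", "/opt/spark")
```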

Header glitch -- disappearing oO

Sometimes, after runs or similar actions (even just on first load), the header is not visible and it breaks the UI.

Restoring it requires a click somewhere around the banner. This is a classic symptom of bad DOM/CSS manipulation.

Sanitize the themes

Long story short: lots of CSS imports...

Longer:
Quite a few different themes are in use, like jquery-ui and bootstrap.
Also, libs like bokeh define clashing class names (with jquery-ui, AFAIK) and either ship with or rely on certain CSS libs (jquery-ui / bootstrap?).

Using :cp loses env

Since :cp restarts the REPL, the environment (mainly the variables created over the session's history) is lost.

The solution is to spawn a new REPL and replay the history of the previous one into it.
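The proposed fix can be sketched as a session that records every statement and replays it into the fresh interpreter after a restart. This is a toy model: the "interpreter" below is a stand-in Map, not the real Scala REPL.

```scala
// Toy sketch of the replay idea: keep the statement history and feed it to a
// fresh interpreter after :cp forces a restart.
class ReplSession {
  private val history = scala.collection.mutable.Buffer.empty[String]
  private var env = Map.empty[String, Int] // stands in for real REPL bindings

  def eval(stmt: String): Unit = { history += stmt; interpret(stmt) }

  private def interpret(stmt: String): Unit = stmt.split("=") match {
    case Array(k, v) => env += (k.trim -> v.trim.toInt)
    case _           => ()
  }

  // :cp would wipe the interpreter; replaying history restores the bindings.
  def restart(): Unit = { env = Map.empty; history.foreach(interpret) }

  def lookup(name: String): Option[Int] = env.get(name)
}
```

The real implementation would re-interpret the recorded source lines instead of parsing assignments, but the bookkeeping is the same.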

SparkR integration

Need a new context (:r?)

Need to see how to integrate and use SparkR from scala.

Create a print css

When printing, only part of the notebook is printed; the rest is actually hidden.

Dockerized Container

It would be nice to spin it up in a VM or a container, on Kubernetes or Mesos.
I've started working on it, but the command is a bit of a hack.
Security, an easy URL, and sbt are a little painful;
I'll send a PR when I can.

Enable interpolation in Markup block

It's painful to have to emit markup like <h2></h2> from code blocks just because we need to render a local variable.

The idea here is to make local variables accessible to the markup renderer.
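A sketch of the kind of interpolation this would enable, assuming the markup renderer were given access to cell-local bindings (the names here are illustrative only):

```scala
// A value computed in a code cell...
val rowCount = 42

// ...that a variable-aware markup renderer could substitute into its HTML,
// much like Scala's standard s-interpolator does here.
val rendered = s"<h2>Processed $rowCount rows</h2>"

println(rendered)
```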

no response to statement execution

I am testing with the "Simple spark" example. When executing any command there is no response.

Terminal log:

Embedded server listening at
http://0.0.0.0:8899
Press any key to stop.
log4j:ERROR Could not find value for key log4j.appender.console
log4j:ERROR Could not instantiate appender named "console".
log4j:WARN No appenders could be found for logger (notebook.kernel.pfork.BetterFork$).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
[WARN] [12/20/2014 16:32:12.241] [main] [EventStream(akka://Remote)] [akka.event-handlers] config is deprecated, use [akka.loggers]

Git Clone on Windows machine

error: unable to create file Spark+on+Mesos+using+C*.snb (Invalid argument)

Probably some irregular character in the file name (Windows forbids `*` in file names).

Can I submit a PR?

PySpark integration

Need a new context (:py?)

Need to see how to integrate and use PySpark from scala.

Hadoop 2.3 support

A version with Spark 1.1 (or 1.2) and Hadoop 2.3 would be appreciated.

Allow Inputs to remain visible after execution

Input blocks in the notebook are always hidden after a "successful" execution, i.e. one that throws no exception.
This is sometimes a bit awkward, because the result might be a Try in a failed state, or we may simply want to keep seeing the input code.

What could be done is to add a menu entry plus a shortcut for a new mode in which inputs are never hidden (à la ALT+A).

adam usage

I see that you added info about configuring spark-notebook with various things. I think a guide about configuring it with ADAM would also be useful.
