spark-notebook / spark-notebook
Interactive and Reactive Data Science using Scala and Spark.
License: Apache License 2.0
`ul`, `li`, `table`, etc. (some are already in `package.scala`)
In IPython, the command `ipython notebook` can be called from any directory, launching a web service that serves all the `.ipynb` files in the current directory and its subdirectories.
It would be useful to maintain the same approach, so the notebooks can be stored within projects / git repos / etc.
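For reference, the IPython behaviour described above is just (paths are examples):

```
# IPython serves every .ipynb found under the directory it is started from:
cd ~/my-project        # any directory containing notebooks
ipython notebook       # the web UI lists ./**/*.ipynb
```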
Work on this has already started here; it needs to be continued, since it's a great move and idea! This would ease the integration of drawing capabilities, at least a first, complete one.
Other candidates are nvd3, c3.js, and so on.
Widgets that would be useful:
- File browser
- HDFS browser
- Tachyon browser
- ...
Add a new context `:sh`, like `:sql` and `:cp`.
We can use `scala.sys.process` to execute the commands.
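A minimal sketch of what the `:sh` context could do with `scala.sys.process` (`ShContext` is a hypothetical name, not the actual spark-notebook API):

```scala
// Run a shell command and capture its stdout; `!!` throws on non-zero exit.
import scala.sys.process._

object ShContext {
  def run(cmd: String): String = Seq("sh", "-c", cmd).!!
}
```

Usage: `ShContext.run("ls -1")` returns the listing of the working directory as a `String`.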
When using the assembly build launched with `java -jar`, some problems happen that don't occur under sbt.
The symptom is that some files cannot be found in the assembly by Unfiltered (or Netty?).
`Form` is losing the provided data. It would also be great to store this information somewhere (added as JSON in the block on the server side?).
Add a menu item to download the code part as a Scala file. Maybe render markdown as comments; `:cp` and `:dp` will be more complicated and could be left aside at first.
Show the rows of the RDD in a table.
Even cooler would be to have R's `summary` functionality on it.
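For illustration, a `summary` in the spirit of R's could look like this on a plain `Seq[Double]` (a stand-in for `RDD[Double]`; the function and quantile method are illustrative, not the project's API):

```scala
// Hypothetical summary à la R: min, quartiles, mean, max.
def summary(xs: Seq[Double]): Map[String, Double] = {
  require(xs.nonEmpty, "summary of an empty collection")
  val s = xs.sorted
  // Nearest-rank quantile, good enough for a sketch.
  def q(p: Double): Double = s(((s.size - 1) * p).toInt)
  Map(
    "min"    -> s.head,
    "q1"     -> q(0.25),
    "median" -> q(0.5),
    "mean"   -> s.sum / s.size,
    "q3"     -> q(0.75),
    "max"    -> s.last
  )
}
```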
It happens on lines with number > 1. The cursor spans more than one line in height, which is very distracting and uncomfortable.
This line fails when executed before the IPython init, hence it breaks the observable mechanism, which is almost everything interesting...
Hey guys, I hope this is the right spot to ask questions (and possibly raise an issue).
I would like to run spark-notebook on a server that is also used as the namenode of a Hadoop cluster (for a proof of concept).
When starting spark-notebook, it produces an error:
"org.jboss.netty.channel.ChannelException: Failed to bind to: /0.0.0.0:9000"
Most likely it cannot bind to that port, since the Hadoop filesystem is already bound to port 9000.
Is there a way to change the port of spark-notebook?
Best regards,
Benjamin
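Not an official answer, but spark-notebook is built on Play, so the standard Play `http.port` system property should move it off port 9000 (invocation details are a guess):

```
# From the packaged distribution:
./bin/spark-notebook -Dhttp.port=9001

# Or with the assembly jar (the -D must precede -jar):
java -Dhttp.port=9001 -jar spark-notebook.jar
```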
Executing "Update classpath and Spark's jars", it seems the dependencies are not updated as documented.
At the end of the execution, the key `spark.jars` is still empty.
Also, trying to add MLlib with the following code doesn't have any effect:
:dp org.apache.spark % "spark-mllib_2.10" % 1.2.0
This is a tricky one (again), because it looks like it only happens when running in dev mode (i.e. `run`).
Hence, it's still hard to debug or develop things involving Spark execution (Spark, SparkInfo, Sql, etc.)... which is sad.
So the symptom is a mismatch while writing/reading `Task`s through the closure serializer (as far as the current investigation has discovered). The problem occurs for `scala.Option` and `scala.None$`. The (Java) serialization reports a bad serial version UID, but the shown IDs (see below) are for `Option` and `None.type` → that's normal. However, the written bytes should only refer to `None$` and never `Option`.
No idea how this happens yet; however, I guess that the `Option` field in `Task` is `metrics`.
```
[error] o.a.s.e.Executor - Exception in task 1.0 in stage 0.0 (TID 1)
java.io.InvalidClassException: scala.Option; local class incompatible: stream classdesc serialVersionUID = -2062608324514658839, local class serialVersionUID = 5081326844987135632
  at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617) ~[na:1.7.0_72]
  at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622) ~[na:1.7.0_72]
  at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) ~[na:1.7.0_72]
  at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622) ~[na:1.7.0_72]
  at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) ~[na:1.7.0_72]
  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771) ~[na:1.7.0_72]
[error] o.a.s.s.TaskSetManager - Task 7 in stage 0.0 failed 1 times; aborting job
```
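To double-check the local side of the mismatch, the UID the local classloader computes can be read directly with a plain JDK API (just a debugging aid, not a fix):

```scala
import java.io.ObjectStreamClass

// The UID Java serialization derives for the locally loaded scala.Option,
// to compare against the stream classdesc UID from the exception above.
val localUid: Long =
  ObjectStreamClass.lookup(classOf[Option[_]]).getSerialVersionUID
println(s"local scala.Option serialVersionUID = $localUid")
```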
Plug some stuff into `Deps.scala`.
Clicking the S3 link from https://github.com/andypetrella/spark-notebook/releases: is that expected, instead of a directory hierarchy?
This is mainly a `play` (`sbt`) problem, actually: it seems that it interferes with the classpath construction and loading, hence the classes are not found in the forked process.
Some things have been tried so far, like this; however, while the process can now find the classes, it fails weirdly at runtime: it cannot find the function `f` in `StringContext`.
The thing is that `f` is actually a macro and should have been injected into the class's bytecode (AFAIK), hence it should be resolved.
A clash in the Scala version is one potential problem, but I'm not sure.
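For context, `f` is the compile-time formatting macro on `StringContext`, so its expansion should not need any runtime lookup of a method named `f`:

```scala
// The f interpolator is expanded by a macro at compile time into a
// String.format call; nothing named `f` has to exist at runtime.
val approx: String = f"pi is about ${math.Pi}%.2f"
```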
"Executor Spark home `spark.mesos.executor.home` is not set!" even though `spark.executor.uri` is correctly set.
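For reference, both keys are plain Spark configuration entries; a spark-defaults style sketch of the reported setup might look like this (values are placeholders, not the reporter's actual paths):

```
spark.executor.uri          hdfs:///tmp/spark-1.2.0-bin-hadoop2.3.tgz
spark.mesos.executor.home   /opt/spark
```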
The switch to the new IPython UI broke it
This is very annoying... you need to play with `→`, `←`, and `End` to get to the first character of a block...
I get "Failed to initialize compiler: object scala.runtime in compiler mirror not found" errors.
What I did was just run `sbt run` and try the simplest example.
It's painful to have to download the deps and then still have to create the corresponding `:cp` block.
So `resolve` should have a parameter that updates the classpath in one go.
It'll need this: https://issues.apache.org/jira/browse/SPARK-4923
Also, `akka` has been bumped to a new version, so the switch to Play 2.3 should be okay now (hence some adaptations in the websocket would be nice, like using actors right away).
When the page is closed or reloaded, the remote process should be shut down.
That would be a great showcase.
/cc @lossyrob ^^
Sometimes, after runs or similar actions (even just at first load), the header is not visible and breaks the UI.
It requires a click somewhere around the banner. This is a classic symptom of bad DOM/CSS manipulation.
Docker installation
Data table, `:sh`, Hadoop and version builds
So on and so forth
Long story short: lots of CSS imports...
Longer: quite a few different themes are used, like jQuery UI and Bootstrap.
Also, libs like Bokeh define clashing class names (with jQuery UI, AFAIK) and either ship or rely on certain CSS libs (jQuery UI / Bootstrap?).
Since using `:cp` will restart the REPL, the environment (variables, mainly created from the history) is lost.
The solution is to ask for a new REPL, passing it the history of the previous one...
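The proposed fix can be sketched like this (the types are illustrative toys, not the actual spark-notebook internals):

```scala
// A toy REPL that only remembers what it has evaluated.
case class Repl(history: List[String]) {
  def eval(code: String): Repl = copy(history = history :+ code)
}

// After a :cp change: start a fresh REPL and replay the old history
// so the user-visible environment is rebuilt.
def restartWith(old: Repl): Repl =
  old.history.foldLeft(Repl(Nil))((repl, code) => repl.eval(code))
```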
The current one is very basic and can be found here.
In the near future such a thing should be even easier to have → apache/spark#3009
Need a new context (`:r`?).
Need to see how to integrate and use SparkR from Scala.
Plug some stuff into `Deps.scala`.
Ref(?): http://site.kuali.org/maven/wagons/maven-s3-wagon/1.1.20/
It would be so fun, especially if it had an evolving graph.
When printing, only a part of the notebook is printed; the rest is actually hidden.
It would be nice to spin it up in a VM or a container, on Kubernetes or Mesos.
I've started working on it, but the command part is a bit of a hack.
Security, easy URLs, and sbt are a little painful; I'll send a PR when I can.
It's painful to use `<h2></h2>`-style code in code blocks when we need to render a local variable.
The idea here is to make the local variables accessible to the markup renderer.
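One way the idea could look, as a plain string substitution (hypothetical helper, not the actual renderer API):

```scala
// Replace {{name}} placeholders in the markup with notebook-scope values.
def render(template: String, vars: Map[String, String]): String =
  vars.foldLeft(template) { case (acc, (k, v)) => acc.replace(s"{{$k}}", v) }
```

Usage: `render("<h2>{{title}}</h2>", Map("title" -> "Results"))` yields `<h2>Results</h2>`.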
And update the main page to choose which SparkContext to use when a notebook is opened.
Within a notebook, allow Spark to choose one of them.
I am testing using the "Simple spark" example. When executing any command there is no response.
terminal-log:
```
Embedded server listening at http://0.0.0.0:8899
Press any key to stop.
log4j:ERROR Could not find value for key log4j.appender.console
log4j:ERROR Could not instantiate appender named "console".
log4j:WARN No appenders could be found for logger (notebook.kernel.pfork.BetterFork$).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
[WARN] [12/20/2014 16:32:12.241] [main] [EventStream(akka://Remote)] [akka.event-handlers] config is deprecated, use [akka.loggers]
```
error: unable to create file Spark+on+Mesos+using+C*.snb (Invalid argument)
Probably some irregular char in the file name.
Can I submit a PR?
Need a new context (`:py`?).
Need to see how to integrate and use PySpark from Scala.
A version with Spark 1.1 (or 1.2) and Hadoop 2.3 would be appreciated.
Input blocks in the notebook are always hidden after a "successful" execution, which means no exception was thrown.
This is sometimes a bit awkward because the result might be a `Try` in a failed state, or we may simply want to keep seeing the input code.
What could be done is to add a menu entry plus a shortcut for a new mode where inputs are never hidden (à la `ALT+A`).
The cluster list can be used to list the spark clusters that can be used to preconfigure notebooks
Deb, ...
I see that you added info about configuring spark-notebook with different things. I think a guide about configuration with ADAM would also be useful.
Scala.js is much more convenient for interacting with JavaScript than Playground (just like http://www.scala-js-fiddle.com/, it is easy to use). And one can also write visualization code with Scala.js.