
How to use Sparta.jl? (spark.jl issue, closed)

dfdx commented on August 20, 2024
How to use Sparta.jl?

from spark.jl.

Comments (10)

dfdx commented on August 20, 2024

Spark.jl is currently at a very early stage: it supports only very basic operations (map, map_partitions, reduce and collect) and works only with byte arrays. So for any serious production or research code right now, I would suggest using PySpark or SparkR, depending on your preference. Nevertheless, you can leave requests for the features that matter most to you so that I can set priorities.

However, if you are brave enough to try Sparta.jl, here are short instructions to get you started:

  1. Clone the project: from the Julia REPL, run Pkg.clone("https://github.com/dfdx/Sparta.jl").

  2. Build the Java part of Sparta.jl: go to <sparta-root>/jvm/sparta and run mvn clean package (assuming you have Maven installed on your system).

  3. Write and run a sample program, for example:

    using Sparta
    sc = SparkContext()
    rdd = text_file(sc)
    collect(rdd)

Feel free to ask any questions, submit issues, or just blame me for functionality that doesn't work :)


dfdx commented on August 20, 2024

Also note that Spark and HDFS are best suited for working with big data. If you just need heavy computations on relatively small data, consider using Julia's native ClusterManager and all the great tools for parallel computing in Julia.
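
As a minimal sketch of that native route (not Sparta code), the following assumes a modern Julia, where the parallel tools live in the Distributed standard library; on the v0.3 release used in this thread the same functions were available without the import. The function name heavy and the worker count are made up for illustration.

```julia
using Distributed

# Start two local worker processes; the count is arbitrary for this sketch.
addprocs(2)

# Hypothetical workload. @everywhere defines it on every worker process.
@everywhere heavy(x) = sum(sin(x * k) for k in 1:100_000)

# pmap hands each input to a free worker and collects the results in order.
results = pmap(heavy, 1:8)
println(length(results))
```

Each call to heavy runs on whichever worker is free, so this pattern fits compute-bound tasks over small inputs, where shipping data to a Spark cluster would add overhead without much benefit.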


gaganworks commented on August 20, 2024

I followed the installation steps above. mvn clean package succeeded; the last few lines of its output were:

[INFO] Attaching shaded artifact.
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 21:16 min
[INFO] Finished at: 2015-08-29T00:37:30+05:30
[INFO] Final Memory: 76M/755M
[INFO] ------------------------------------------------------------------------

But sc = SparkContext() fails, as shown below:

julia> using Sparta

julia> sc = SparkContext()
ERROR: SparkContext not defined


dfdx commented on August 20, 2024

Have you pulled the latest master? I recently pushed some fixes, including exporting SparkContext, so you may have missed them. Just go to ~/.julia/v0.3/Sparta/ and run git pull origin master.


gaganworks commented on August 20, 2024

Yes, I pulled the latest master and recompiled using "mvn clean package". Apart from the Sparta package, do I need to install other packages that Sparta in turn uses? I suspect this is the case. Please see below:

julia> using IJulia

julia> using BinDeps

julia> using Sparta
ERROR: Docile not found
in require at loading.jl:47
in include at boot.jl:245
in include_from_node1 at loading.jl:128
in include at boot.jl:245
in include_from_node1 at loading.jl:128
in reload_path at loading.jl:152
in _require at loading.jl:67
in require at loading.jl:51
while loading C:\Users\gagands.julia\v0.3\Sparta\src\core.jl, in expression starting on line 2
while loading C:\Users\gagands.julia\v0.3\Sparta\src\Sparta.jl, in expression starting on line 15

julia> using Docile
ERROR: Docile not found
in require at loading.jl:47

julia>


dfdx commented on August 20, 2024

Correct, Sparta additionally uses the Docile, JavaCall and Iterators packages. I pushed a fix to master, but for you it should be easier to just add these packages manually:

Pkg.add("Docile")
Pkg.add("JavaCall")
Pkg.add("Iterators")

Since this package is still at a very early stage, you will most likely run into many more issues. If you could describe what you are working on, I would try to adjust the current code for your immediate needs and maybe develop the parts that are most critical for your use case.


gaganworks commented on August 20, 2024

Hi, thanks, and apologies for the late reply; I was travelling. I am working on implementing Monte Carlo analytics in Julia running on Spark nodes. I want to use the compute nodes exposed by Spark for parallel Monte Carlo analytics. I will clone the latest master of Sparta and research how this can be done. Thanks.
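
One way to picture such a workload is the sketch below. It is plain Julia and uses no Sparta calls, and it assumes the simulation splits into independent batches that are mapped and then reduced. The function hits, the batch count, and the per-batch seeds are all made up for illustration.

```julia
using Random

# Count how many of n uniform points in the unit square land inside
# the quarter circle of radius 1. Each batch gets its own seeded generator.
function hits(seed::Integer, n::Integer)
    rng = MersenneTwister(seed)
    return count(_ -> rand(rng)^2 + rand(rng)^2 <= 1, 1:n)
end

batches = 8            # hypothetical number of independent batches
n_per_batch = 100_000  # hypothetical samples per batch

# "map" step: one partial count per batch, independent of the others.
partials = map(seed -> hits(seed, n_per_batch), 1:batches)

# "reduce" step: combine the partial counts into a single estimate of pi.
pi_estimate = 4 * reduce(+, partials) / (batches * n_per_batch)
println(pi_estimate)
```

Because the batches share no state, the map step is the part a cluster could distribute, while the reduce step only needs the small partial counts.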


dfdx commented on August 20, 2024

I'm closing this issue, as the README now has instructions for building and running the project.


PallHaraldsson commented on August 20, 2024

Googling for "Sparta.jl", I only found this. Just to be sure, is this that project, renamed? Maybe the README should say so.


dfdx commented on August 20, 2024

Yes, this repository had the name "Sparta.jl" originally, but was renamed to "Spark.jl" before the first registered version. I wonder why you were looking for "Sparta.jl"? That name is really old, so it might be better to update references in the source you used.

