DistributedDataFrame (dataframes.jl, closed)

juliadata commented on August 20, 2024
DistributedDataFrame

from dataframes.jl.

Comments (9)

ViralBShah commented on August 20, 2024

Cc: @tanmaykm

tanmaykm commented on August 20, 2024

I was thinking of the following model for this:

  • New objects: DistributedDataVector, DistributedDataFrame, DistributedGroupedDataFrame
  • DistributedDataVector: simplified 1D darray
  • DistributedDataFrame: collection of remote refs to DataFrames / DistributedDataVectors
  • DistributedGroupedDataFrame: collection of DistributedDataFrames
  • To start with, we can have an interface that reads files with rows split by row number. mmap can be used to map different portions of a single large file.
  • Implement most operations on DataFrame and DataVector as defined in operators.jl
  • Have convert methods to get all parts of a DistributedDataVector / DistributedDataFrame locally as DataVector / DataFrame.
  • All operations implemented using pmap / pmapreduce underneath
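
A minimal Python sketch of part of this plan, under stated assumptions: the input is a newline-delimited file with one integer per row, threads stand in for Julia worker processes, and `row_aligned_ranges`, `read_column`, and `DistributedVector` are hypothetical names, not DataFrames.jl API. It uses mmap to find row-aligned byte ranges in one large file, loads each range as a chunk, runs a pmapreduce-style reduction over the chunks, and gathers them back into a local list.

```python
import mmap
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


class DistributedVector:
    """Hypothetical stand-in for a DistributedDataVector: a 1-D vector
    stored as contiguous chunks, one per (simulated) worker."""

    def __init__(self, chunks):
        self.chunks = [list(c) for c in chunks]

    def __len__(self):
        return sum(len(c) for c in self.chunks)

    def mapreduce(self, f, op, pool):
        """pmapreduce-style: reduce each non-empty chunk in parallel, then
        combine the partial results locally. Assumes the vector is non-empty."""
        nonempty = [c for c in self.chunks if c]
        partials = pool.map(lambda c: reduce(op, map(f, c)), nonempty)
        return reduce(op, partials)

    def collect(self):
        """Gather all chunks into one local list (cf. converting to a DataVector)."""
        return [x for c in self.chunks for x in c]


def row_aligned_ranges(path, nparts):
    """Split a newline-delimited file into up to `nparts` byte ranges that
    start and end on row boundaries. mmap lets us search for the newline
    nearest each cut point without reading the whole file."""
    size = os.path.getsize(path)
    if size == 0:
        return []
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        bounds = [0]
        for i in range(1, nparts):
            guess = max(i * size // nparts, bounds[-1])
            nl = mm.find(b"\n", guess)  # first newline at or after the guess
            cut = size if nl == -1 else nl + 1
            if bounds[-1] < cut < size:  # skip empty or trailing ranges
                bounds.append(cut)
        bounds.append(size)
    return list(zip(bounds[:-1], bounds[1:]))


def read_column(path, ranges, parse=int):
    """Load each byte range as one chunk, one thread ('worker') per range."""
    def load(rng):
        start, stop = rng
        with open(path, "rb") as f:
            f.seek(start)
            data = f.read(stop - start)
        return [parse(line) for line in data.splitlines() if line.strip()]

    with ThreadPoolExecutor() as pool:
        return DistributedVector(pool.map(load, ranges))


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
        tmp.write(b"".join(b"%d\n" % i for i in range(1, 101)))
        path = tmp.name
    col = read_column(path, row_aligned_ranges(path, 4))
    with ThreadPoolExecutor() as pool:
        total = col.mapreduce(lambda x: x, lambda a, b: a + b, pool)
    print(len(col.chunks), len(col), total)  # 4 100 5050
    os.remove(path)
```

Because each range ends just after a newline, no row is split between two chunks, which is what makes per-chunk parsing and reduction safe.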

I can have a go at this if it sounds good. Would be glad to have any comments/discussions before I start.

ViralBShah commented on August 20, 2024

I wonder if a design is possible without using Distributed prefixes for everything. Right now, I can't think of an alternative.

johnmyleswhite commented on August 20, 2024

This would be great. I'd suggest just building the data structures up first since I think we'll see what needs to be handled there as we go on.

As you go, please let us know when you find functions that should be defined in terms of AbstractDataFrame so that DataFrame and DistributedDataFrame are handled together.
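
The idea of defining shared functions against AbstractDataFrame can be illustrated with a small Python sketch. Julia would express this with an abstract type and multiple dispatch; here an abstract base class plays that role, and the primitives `names` and `chunks` and the generic `colsum` are hypothetical. Generic methods are written once against the two primitives, so a local frame and a chunked "distributed" frame share them unchanged.

```python
from abc import ABC, abstractmethod


class AbstractDataFrame(ABC):
    """Hypothetical shared interface. Concrete frames implement two
    primitives; generic operations are written once in terms of them."""

    @abstractmethod
    def names(self):
        """Return the column names, in order."""

    @abstractmethod
    def chunks(self, name):
        """Return a column as a list of row-contiguous chunks."""

    # Generic methods: they never ask where the chunks live.
    def ncol(self):
        return len(self.names())

    def nrow(self):
        names = self.names()
        return sum(len(c) for c in self.chunks(names[0])) if names else 0

    def colsum(self, name):
        # reduce each chunk, then combine the partial sums
        return sum(sum(c) for c in self.chunks(name))


class DataFrame(AbstractDataFrame):
    """Local frame: each column is one list, i.e. a single chunk."""

    def __init__(self, columns):
        self._cols = {k: list(v) for k, v in columns.items()}

    def names(self):
        return list(self._cols)

    def chunks(self, name):
        return [self._cols[name]]


class DistributedDataFrame(AbstractDataFrame):
    """Stand-in for a distributed frame: each column is a list of chunks
    (kept in-process here; a real version would hold remote references)."""

    def __init__(self, columns):
        self._cols = {k: [list(c) for c in v] for k, v in columns.items()}

    def names(self):
        return list(self._cols)

    def chunks(self, name):
        return self._cols[name]


if __name__ == "__main__":
    local = DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})
    dist = DistributedDataFrame({"a": [[1, 2], [3, 4]], "b": [[10, 20], [30, 40]]})
    for df in (local, dist):
        print(type(df).__name__, df.nrow(), df.ncol(), df.colsum("b"))
```

Both frames report 4 rows, 2 columns, and a column sum of 100 for `b`; only the two primitives differ between the concrete types.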

tanmaykm commented on August 20, 2024

I thought of having just the DistributedDataVector (with an abstract DataVector), but was not sure it would be enough to handle all the functionality.

johnmyleswhite commented on August 20, 2024

I think we really want the rows of a DataFrame to be distributed, rather than the columns.

tanmaykm commented on August 20, 2024

Yes, I agree.

I was wondering whether it would be better to implement the DistributedDataFrame as a collection of DistributedDataVectors rather than as a collection of remote DataFrames.

johnmyleswhite commented on August 20, 2024

Oh, definitely. So long as you can guarantee that each of the vectors is split in the same way, I think we should follow the existing definition of DataFrame and define things in terms of columns.
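
A sketch of the column-oriented design described here, assuming rows (not columns) are what gets distributed: a frame is a set of distributed columns that must all be split at the same row boundaries, so partition i of every column holds the same rows and row-wise operations such as filtering can run per partition. This is Python with in-process threads standing in for Julia workers; `DistributedColumn`, `DistributedFrame`, and `filter_rows` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


class DistributedColumn:
    """Hypothetical stand-in for a DistributedDataVector: one list per partition."""

    def __init__(self, chunks):
        self.chunks = [list(c) for c in chunks]

    def layout(self):
        # number of rows held by each partition
        return [len(c) for c in self.chunks]


class DistributedFrame:
    """Column-oriented frame whose columns share one row partitioning."""

    def __init__(self, columns):
        layouts = {name: col.layout() for name, col in columns.items()}
        if len({tuple(lay) for lay in layouts.values()}) > 1:
            raise ValueError(f"columns are not split the same way: {layouts}")
        self.columns = dict(columns)

    def nparts(self):
        return len(next(iter(self.columns.values())).chunks) if self.columns else 0

    def filter_rows(self, pred, pool):
        """Keep rows where pred(row_dict) is true. Each partition holds
        complete rows, so it is filtered independently."""
        names = list(self.columns)

        def keep(i):
            part = [self.columns[n].chunks[i] for n in names]
            rows = [r for r in zip(*part) if pred(dict(zip(names, r)))]
            # transpose the surviving rows back into per-column lists
            return [[r[k] for r in rows] for k in range(len(names))]

        parts = list(pool.map(keep, range(self.nparts())))
        return DistributedFrame(
            {n: DistributedColumn([p[k] for p in parts]) for k, n in enumerate(names)}
        )


if __name__ == "__main__":
    df = DistributedFrame({
        "id": DistributedColumn([[1, 2, 3], [4, 5]]),
        "x": DistributedColumn([[0.5, 1.5, 2.5], [3.5, 4.5]]),
    })
    with ThreadPoolExecutor() as pool:
        out = df.filter_rows(lambda r: r["x"] > 1.0, pool)
    print(out.columns["id"].chunks)  # [[2, 3], [4, 5]]
```

Checking the layout once in the constructor is what lets each partition be treated as an ordinary local frame; any operation that changes row counts has to apply the same change to every column, as the filter does here.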

garborg commented on August 20, 2024

See the prototypes branch for an early implementation / inspiration.
