andresmor-ms / data-wrangling-components

This project was forked from microsoft/datashaper.

Processing engine and React components for constructing configuration-based data transformation and processing pipelines.

License: MIT License

JavaScript 0.72% Shell 0.01% TypeScript 90.47% EJS 0.05% Dockerfile 0.04% Python 7.91% Jupyter Notebook 0.54% CSS 0.27%

datashaper

This project provides a collection of web components for doing lightweight data wrangling.

The project has four goals:

  1. Create a shareable client/server schema for serialized wrangling instructions.
  2. Maintain an implementation of a basic client-side wrangling engine (largely based on Arquero).
  3. Maintain a Python implementation using common wrangling libraries (e.g., pandas) for backend or data science deployments.
  4. Provide some reusable React components so wrangling operations can be incorporated into webapps easily.

The first goal is nascent, and is currently covered by TypeScript typings in the core JavaScript package. However, our intent is to eventually extract a JSON Schema specification that is more readily consumable by cross-platform services. In addition, our API largely mirrors Arquero's for now; we'll review for areas of parameter commonality and make some generalizations in the future.
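
As a rough illustration of what a serialized wrangling instruction could look like, the sketch below types a single step using the field names that appear in the join example under Usage (verb, input, output, args). The StepSpec name and the isStepSpec guard are hypothetical and are not taken from the core package's typings.

```typescript
// Hypothetical shape of one serialized wrangling step, inferred from the
// join example in this README; not the package's published typings.
interface StepSpec {
  verb: string
  input: string
  output: string
  args: Record<string, unknown>
}

// Runtime guard for JSON received from a server or read from a file.
function isStepSpec(value: unknown): value is StepSpec {
  if (typeof value !== 'object' || value === null) return false
  const v = value as Record<string, unknown>
  return (
    typeof v.verb === 'string' &&
    typeof v.input === 'string' &&
    typeof v.output === 'string' &&
    typeof v.args === 'object' &&
    v.args !== null &&
    !Array.isArray(v.args)
  )
}

// Round-trip a step through JSON, as a client and server would.
const serialized = JSON.stringify({
  verb: 'join',
  input: 'parents',
  output: 'output',
  args: { other: 'kids', on: ['id'] },
})
const parsed: unknown = JSON.parse(serialized)

console.log(isStepSpec(parsed)) // true
console.log(isStepSpec({ verb: 'join' })) // false
```

A guard like this only checks the outer shape. Per-verb validation of args is the kind of thing a JSON Schema specification would presumably cover.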

Individual documentation for the JavaScript and Python implementations can be found in their respective folders. Broad documentation about building pipelines and the available verbs is available in the docs folder.

We currently have four packages:

  • core - this is the primary engine for pipeline execution. It includes low-level operational primitives to execute a wide variety of relational algebra transformations over Arquero tables. The pipeline is essentially an implementation of async chain-of-command, executing verbs serially based on an input table context and set of step configurations.
  • react - this is a set of React components, one for each verb, that you can include in web apps to enable transformation pipeline building.
  • utilities - this is a set of helpers for working with files, etc., to ease building data wrangling applications.
  • webapp - this is an example/test webapp that includes all of the verb components and allows creation, execution, and saving of pipeline JSON files.
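
The serial execution described for the core package can be sketched in a few lines. The following is a simplified stand-in, not the core engine: it uses plain arrays of objects instead of Arquero tables, a Map instead of the real table store, and two toy verb implementations. All names here (Table, Step, VerbFn, verbs, runSteps, and the select args shape) are assumptions made for illustration.

```typescript
type Row = Record<string, unknown>
type Table = Row[]

interface Step {
  verb: string
  input: string
  output: string
  args: Record<string, unknown>
}

type VerbFn = (
  input: Table,
  args: Record<string, unknown>,
  store: Map<string, Table>,
) => Promise<Table>

// Toy verb implementations; the real engine delegates to Arquero.
const verbs: Record<string, VerbFn> = {
  // Keep only the listed columns.
  select: async (input, args) => {
    const columns = args.columns as string[]
    return input.map((row) => {
      const picked: Row = {}
      for (const column of columns) picked[column] = row[column]
      return picked
    })
  },
  // Inner join with another stored table on a single key column.
  join: async (input, args, store) => {
    const other = store.get(args.other as string) ?? []
    const key = (args.on as string[])[0]
    const joined: Table = []
    for (const left of input) {
      for (const right of other) {
        if (right[key] === left[key]) joined.push({ ...left, ...right })
      }
    }
    return joined
  },
}

// Run steps one after another; each output is written back to the store.
async function runSteps(store: Map<string, Table>, steps: Step[]): Promise<Table> {
  let last: Table = []
  for (const step of steps) {
    const verb = verbs[step.verb]
    if (!verb) throw new Error(`unknown verb: ${step.verb}`)
    const input = store.get(step.input)
    if (!input) throw new Error(`unknown table: ${step.input}`)
    last = await verb(input, step.args, store)
    store.set(step.output, last)
  }
  return last
}

const demoStore = new Map<string, Table>([
  ['parents', [{ id: 1, name: 'bob' }, { id: 2, name: 'joe' }, { id: 3, name: 'jane' }]],
  ['kids', [{ id: 1, kid: 'billy' }, { id: 1, kid: 'jill' }, { id: 2, kid: 'kaden' }, { id: 2, kid: 'kyle' }, { id: 3, kid: 'moe' }]],
])

const finished = runSteps(demoStore, [
  { verb: 'join', input: 'parents', output: 'joined', args: { other: 'kids', on: ['id'] } },
  { verb: 'select', input: 'joined', output: 'output', args: { columns: ['name', 'kid'] } },
])

// Logs 5 rows; the first is { name: 'bob', kid: 'billy' }.
finished.then((result) => console.log(result))
```

Awaiting each verb inside the loop is what makes the chain serial: a step never starts before the previous step's output has been stored.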

Building

  • You need node and yarn installed
  • Run: yarn
  • Then: yarn build
  • Run the webapp locally: yarn start

Usage

The webapp uses both the core engine and React components to build a small application that demonstrates how to use the wrangling components. At a basic level, you need a set of input tables, which you place in a TableStore (basically a chain execution context). You add wrangling steps to a Pipeline, then run it to generate an output table.

Tables in the store are referenced by key. Steps can create any number of output tables that are also written to the store. Future steps can therefore build upon previous/intermediate outputs however you'd like. See the every-operation.json example for a sample of every verb we currently support.

Example joining two tables:

    import { table } from 'arquero'
    import { createTableStore, createPipeline } from '@datashaper/core'

    // id   name
    // 1    bob
    // 2    joe
    // 3    jane
    const parents = table({
        id: [1, 2, 3],
        name: ['bob', 'joe', 'jane']
    })

    // id   kid
    // 1    billy
    // 1    jill
    // 2    kaden
    // 2    kyle
    // 3    moe
    const kids = table({
        id: [1, 1, 2, 2, 3],
        kid: ['billy', 'jill', 'kaden', 'kyle', 'moe']
    })

    const store = createTableStore()

    store.set({
        id: 'parents',
        table: parents
    })
    store.set({
        id: 'kids',
        table: kids
    })

    const pipeline = createPipeline(store)

    pipeline.add({
        verb: 'join',
        input: 'parents',
        output: 'output',
        args: {
            other: 'kids',
            on: ['id']
        }
    })

    // id   name    kid
    // 1    bob     billy
    // 1    bob     jill
    // 2    joe     kaden
    // 2    joe     kyle
    // 3    jane    moe
    const result = await pipeline.run()
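
Because step outputs are written back to the store by key, a later step can name an earlier step's output as its input. The sketch below shows a two-step specification chained that way, along with a small check that every referenced table exists by the time its step runs. It is a standalone illustration that does not call the core package. StepRef, findUnresolvedInputs, and the select step's args shape are hypothetical.

```typescript
interface StepRef {
  verb: string
  input: string
  output: string
  args: Record<string, unknown>
}

// Returns one message per table key that is referenced before it exists.
function findUnresolvedInputs(initialKeys: string[], steps: StepRef[]): string[] {
  const available = new Set(initialKeys)
  const problems: string[] = []
  steps.forEach((step, index) => {
    const needed = [step.input]
    // A join also reads a second table, named by `other` in the example above.
    if (typeof step.args.other === 'string') needed.push(step.args.other)
    for (const key of needed) {
      if (!available.has(key)) {
        problems.push(`step ${index} (${step.verb}) needs missing table "${key}"`)
      }
    }
    // The output becomes available to every later step.
    available.add(step.output)
  })
  return problems
}

const chained: StepRef[] = [
  { verb: 'join', input: 'parents', output: 'joined', args: { other: 'kids', on: ['id'] } },
  // Builds on the intermediate 'joined' table rather than an input table.
  { verb: 'select', input: 'joined', output: 'output', args: { columns: ['name', 'kid'] } },
]

console.log(findUnresolvedInputs(['parents', 'kids'], chained)) // []
console.log(findUnresolvedInputs(['parents'], chained))
// [ 'step 0 (join) needs missing table "kids"' ]
```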

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

Contributors

andresmor-ms, darthtrevino, dayesouza, dreness, dworthen, gaudyb, gbm2494, microsoft-github-operations[bot], microsoftopensource, monik182, natoverse