split-apply-combine

This is a sample implementation of ddply and a few other R-style functions that I wrote for my June 2013 presentation to the Bay Area Clojure Users Group.

This is not meant to be a complete implementation, but rather a demonstration of the Split-Apply-Combine concepts proposed by Hadley Wickham in his paper The Split-Apply-Combine Strategy for Data Analysis (Journal of Statistical Software, April 2011, Volume 40, Issue 1) and implemented in the plyr library. See http://plyr.had.co.nz for more information on the plyr project.

There is API documentation for these functions at http://tomfaulhaber.github.io/split-apply-combine.
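
For readers new to the idea, the split-apply-combine strategy can be sketched in a few lines of plain Clojure. This is only an illustration built on clojure.core, not the ddply implementation in this repository: the split-apply-combine and mean functions, the :group key, and the sample rows (with made-up values) are all hypothetical names chosen for the example.

```clojure
;; Hypothetical sample rows; the values are made up for illustration.
(def rows
  [{:symbol "AMZN" :close 10.0}
   {:symbol "AMZN" :close 14.0}
   {:symbol "IBM"  :close 20.0}
   {:symbol "IBM"  :close 30.0}])

(defn mean
  "Arithmetic mean of a non-empty collection of numbers."
  [xs]
  (/ (reduce + xs) (count xs)))

(defn split-apply-combine
  "Split rows into groups by key-fn, apply summarize to each group,
  and combine the per-group results into one sequence of maps.
  A sketch of the strategy, not this library's ddply."
  [key-fn summarize rows]
  (for [[k group] (group-by key-fn rows)]    ; split
    (merge {:group k} (summarize group))))   ; apply, then combine

;; One summary row per stock symbol, holding the mean closing price.
(def summary
  (split-apply-combine :symbol
                       (fn [group] {:mean-close (mean (map :close group))})
                       rows))
```

Here summary holds one map per distinct :symbol. The order of the groups is not guaranteed, since group-by returns an ordinary map.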

Components

These are the files and what's in each of them:

In the root directory:

  • script.clj has the set of commands that I ran interactively to construct the results we saw during the live demo. I also used nrow, head, and frequencies to do some ad hoc exploration of the data along the way.

In src/split-apply-combine:

  • core.clj has transform, transform*, and colwise with their supporting functions. There are also a few other generally useful functions.
  • ply.clj has the implementation of ddply, ddply*, d_ply, and d_ply* and their various supporting functions.
  • cpu.clj has routines for defining, loading, and manipulating the CPU load data I showed during the demo. (Note that the only part of this we used during the demo was the cpu-files data, which holds the list of data files.)
  • stock.clj has routines for loading and manipulating the stock data I spoke about. This data was all pulled from Yahoo Finance.

Data files (in data/):

  • tech-stocks.csv is some simple stock data that includes the data on the slides (Amazon, IBM, and Microsoft from the first four months of 2013). Read it with the read-saved-data function. It's easy to pull your own data from Yahoo Finance: just use the load-yahoo-data function.
  • scdb_agent_*.cpu is the set of files with CPU load data that we used during the demo portion of the presentation. Thanks to SpaceCurve, Inc. (www.spacecurve.com) for permission to use this data.

License

Copyright © 2013 Tom Faulhaber

Distributed under the Eclipse Public License, the same as Clojure.
