yonglehou / thunder

This project forked from thunder-project/thunder

Large-scale image and time series analysis with Spark

Home Page: http://thunder-project.org

License: Apache License 2.0

Python 99.75% Shell 0.25%

Thunder

Large-scale image and time series analysis with Spark - project page

About

Thunder is a library for analyzing large-scale spatial and temporal data. It's fast to run, easy to extend, and designed for interactivity. It is built on Spark, a new framework for cluster computing.

Thunder includes utilities for loading and saving different formats, classes for working with distributed spatial and temporal data, and modular functions for time series analysis, factorization, and model fitting. Analyses can easily be scripted or combined. It is written against Spark's Python API (PySpark), making use of scipy, numpy, and scikit-learn.

Documentation

This README contains basic info on installation and usage and how to get help. See the complete documentation for more details, tutorials, and API references. We also maintain separate development documentation for reference if you are running on Thunder's master branch.

Quick start

Thunder is designed to run on a cluster, but local testing is a great way to learn and develop. Many computers can install it with just a few simple steps. If you aren't currently using Python for scientific computing, Anaconda is highly recommended.

  1. Download the latest "pre-built for Hadoop 1.x" version of Spark and set an environment variable

    export SPARK_HOME=/your/path/to/spark

  2. Install Thunder

    pip install thunder-python

  3. Start Thunder from the terminal

    thunder

    from thunder import ICA
    data = tsc.makeExample("ica")
    model = ICA(c=2).fit(data)

To run in IPython, set this environment variable before starting:

    export IPYTHON=1
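
The two environment variables described above (`SPARK_HOME` and `IPYTHON`) can also be prepared from Python before launching the shell. The sketch below is only an illustration: the helper name `thunder_env` is hypothetical and not part of Thunder, and it only builds the environment dictionary, leaving the launch itself to the caller.

```python
import os


def thunder_env(spark_home, use_ipython=False, base_env=None):
    """Return a copy of the environment with SPARK_HOME (and optionally IPYTHON) set.

    Hypothetical helper for illustration; not part of Thunder.
    """
    # Fail early if the Spark path is wrong, rather than at launch time.
    if not os.path.isdir(spark_home):
        raise ValueError("SPARK_HOME does not point to a directory: %r" % spark_home)
    # Copy so the caller's environment (or os.environ) is never modified in place.
    env = dict(os.environ if base_env is None else base_env)
    env["SPARK_HOME"] = spark_home
    if use_ipython:
        env["IPYTHON"] = "1"
    return env
```

The returned dictionary can then be passed to a launcher call such as `subprocess.call(["thunder"], env=env)`.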

To run analyses as standalone jobs, use the submit script:

    thunder-submit <analysis name or script file> <datadirectory> <outputdirectory> <opts>
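
When jobs are submitted from another program, it can help to assemble the command as an argument list instead of a shell string, so that paths containing spaces need no quoting. The following is a minimal sketch, assuming only the argument order shown in the usage line above; `build_submit_command` is a hypothetical helper, and the file names in the demo are placeholders.

```python
def build_submit_command(analysis, data_dir, output_dir, opts=()):
    """Assemble the thunder-submit argument list in the order shown in the usage line.

    Hypothetical helper; it does not validate the analysis name or the options.
    """
    if not analysis or not data_dir or not output_dir:
        raise ValueError("analysis, data_dir and output_dir are all required")
    # Extra options are appended unchanged after the three positional arguments.
    return ["thunder-submit", analysis, data_dir, output_dir] + list(opts)


if __name__ == "__main__":
    # Placeholder values for illustration only; "my_analysis.py" is not a real script.
    cmd = build_submit_command("my_analysis.py", "/path/to/data", "/path/to/output")
    print(" ".join(cmd))
```

The list form can be handed directly to `subprocess.call(cmd)`.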

We also include a script for launching an Amazon EC2 cluster with Thunder preinstalled:

    thunder-ec2 -k mykey -i mykey.pem -s <number-of-nodes> launch <cluster-name>

Analyses

Thunder currently includes two primary data types for distributed spatial and temporal data, and five main analysis packages: classification (decoding), clustering, factorization, image processing, and regression. It also provides an entry point for loading and converting a variety of raw data formats, and utilities for exporting or visually inspecting results. Scripts can be used to run standalone analyses, but the underlying classes and functions can also be used from within the PySpark shell or an IPython notebook for easy interactive analysis.

Input and output

The primary data types in Thunder, Images and Series, can each be loaded from a variety of raw input formats, including text or flat binary files (for Series) and binary, tifs, or pngs (for Images). Files can be stored locally, on a networked file system, on Amazon's S3, on Google Storage, or in HDFS. Where needed, metadata (e.g. model parameters) can be provided as numpy arrays or loaded from JSON or MAT files. Results can be visualized directly from the Python shell or in an IPython notebook using matplotlib, seaborn, or a new interactive visualization library we are developing called lightning.
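
As an illustration of preparing metadata such as model parameters from a JSON file, the sketch below reads a rectangular table of numbers using only the standard library. The file layout (a top-level array of equal-length rows) and the helper name `load_params` are assumptions made for this example, not a format or function defined by Thunder.

```python
import json


def load_params(path):
    """Read a rectangular list of numbers from a JSON file.

    Assumes the file holds a top-level array of equal-length rows; this layout
    is an illustration, not a format defined by Thunder.
    """
    with open(path) as f:
        rows = json.load(f)
    if not isinstance(rows, list) or not rows or not all(isinstance(r, list) for r in rows):
        raise ValueError("expected a non-empty list of rows")
    # Reject ragged input so the result can be treated as a 2-D array.
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("rows have different lengths")
    return [[float(v) for v in r] for r in rows]
```

If numpy is available, the result can be converted with `numpy.asarray(load_params(path))` before being used as a parameter array.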

Help

We maintain a chatroom on gitter. You can also post questions or ideas to the mailing list. If you find a reproducible bug, submit an issue. If posting an issue, please provide information about your environment (e.g. local usage or EC2, operating system) and instructions for reproducing the error.

Contributions

Thunder is a community effort, and thus far features contributions from the following individuals:

Andrew Osheroff, Ben Poole, Chris Stock, Davis Bennett, Jascha Swisher, Jason Wittenbach, Jeremy Freeman, Josh Rosen, Kunal Lillaney, Logan Grosenick, Matt Conlen, Michael Broxton, Noah Young, Ognen Duzlevski, Richard Hofer, Owen Kahn, Ted Fujimoto, Tom Sainsbury, Uri Laserson

If you have ideas or want to contribute, submit an issue or pull request, or reach out to us on gitter, twitter, or the mailing list.
