ognis1205 / spark-tda

SparkTDA is a package for Apache Spark providing Topological Data Analysis Functionalities.

License: Apache License 2.0

Scala 98.10% Python 1.90%

topological-data-analysis tda spark apache-spark machine-learning ml mllib

spark-tda's Introduction

SparkTDA

The scalable topological data analysis package for Apache Spark. This project aims to implement the following features:

Scalable Mapper Implemented as Reeb Diagrams, i.e., Reeb Cosheaves
Scalable Mapper Implementation
Scalable Multiscale Mapper Implementation
Scalable Tower Computation for Multiscale Mapper
Scalable Persistent Homology Computation on Top of Apache Spark

If you would like to know how to use the above-mentioned features, or to learn more about their implementation details, please follow the links.

Status

WIP and EXPERIMENTAL. This package is still a proof of concept of scalable topological data analysis support for Apache Spark, so it is not ready for production use.

Examples

Mapper

2-skeletons of the Reeb Diagram of MNIST (40 intervals on the 1st principal component with 50% overlap)	2-skeletons of the Reeb Diagram of MNIST (20 intervals on the 1st principal component with 50% overlap)
60k images clustered in 784 dimensions without any projection loss	60k images clustered in 784 dimensions without any projection loss
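
The captions above describe a cover of the filter range (here, the first principal component) by equal-length intervals with 50% overlap. The following is a minimal sketch of how such a cover could be built, not the package's own implementation. It assumes that "overlap" means the fraction of an interval's length shared with its neighbor; the function names `overlapping_cover` and `assign_to_intervals` are hypothetical and do not appear in SparkTDA.

```python
from typing import List, Tuple


def overlapping_cover(lo: float, hi: float, n_intervals: int,
                      overlap: float) -> List[Tuple[float, float]]:
    """Cover [lo, hi] with n equal-length intervals (hypothetical helper).

    Assumption: `overlap` is the fraction of an interval's length that it
    shares with its next neighbor, e.g. 0.5 for "50% overlap".
    """
    if n_intervals < 1:
        raise ValueError("n_intervals must be at least 1")
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    # The n intervals span: length + (n - 1) * length * (1 - overlap) = hi - lo
    length = (hi - lo) / (1 + (n_intervals - 1) * (1 - overlap))
    step = length * (1 - overlap)
    return [(lo + i * step, lo + i * step + length) for i in range(n_intervals)]


def assign_to_intervals(value: float,
                        cover: List[Tuple[float, float]]) -> List[int]:
    """Return the indices of all cover intervals that contain `value`."""
    return [i for i, (a, b) in enumerate(cover) if a <= value <= b]


# Small worked case: two intervals with 50% overlap on [0, 3].
cover = overlapping_cover(0.0, 3.0, 2, 0.5)
print(cover)                            # [(0.0, 2.0), (1.0, 3.0)]
print(assign_to_intervals(1.5, cover))  # [0, 1]
```

Points that fall in the overlap belong to two intervals at once, which is what lets clusters from neighboring intervals share members and become connected in the resulting diagram.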

Requirements

This library requires Spark 2.0+

Building and Running Unit Tests

To compile this project, run sbt package from the project home directory. This will also run the Scala unit tests. To run the unit tests, run sbt test from the project home directory. This project uses the sbt-spark-package plugin, which provides the 'spPublish' and 'spPublishLocal' tasks. We recommend using this library with Apache Spark by passing a comma-delimited list of Maven coordinates to --packages, so that the package and its dependencies are downloaded from the local repository or the official Spark Packages repository.

The package can be published locally with:

$ sbt spPublishLocal

The package can be published to Spark Packages with (requires authentication and authorization):

$ sbt spPublish

Using with Spark Shell

This package can be added to Spark using the --packages command line option. For example, to include it when starting the spark shell:

$ spark-shell --packages ognis1205:spark-tda:0.0.1-SNAPSHOT-spark2.2-s_2.11
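
When several packages are needed, --packages takes them as one comma-delimited list of Maven coordinates. The sketch below shows how such a command line could be assembled. The helper `spark_shell_command` is hypothetical, and the second coordinate (`com.example:other-lib:1.0.0`) is a placeholder for illustration only, not a real dependency of this project.

```python
import shlex
from typing import Iterable, List


def spark_shell_command(coordinates: Iterable[str]) -> List[str]:
    """Build a spark-shell argument list from Maven coordinates.

    Hypothetical helper: each coordinate is expected to have the form
    groupId:artifactId:version.
    """
    coords = [c.strip() for c in coordinates if c.strip()]
    if not coords:
        raise ValueError("at least one coordinate is required")
    for c in coords:
        if len(c.split(":")) != 3:
            raise ValueError(f"not a groupId:artifactId:version coordinate: {c}")
    # --packages takes a single comma-delimited value.
    return ["spark-shell", "--packages", ",".join(coords)]


cmd = spark_shell_command([
    "ognis1205:spark-tda:0.0.1-SNAPSHOT-spark2.2-s_2.11",
    "com.example:other-lib:1.0.0",  # placeholder coordinate, illustration only
])
print(shlex.join(cmd))
```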

Future Work

Mapper

Write Wiki
Implement Python APIs
Publish to Spark Packages
Benchmark
Consider using GraphFrames instead of plain GraphX
Implement some useful filter functions, e.g., Gaussian Density, Graph Laplacian, etc., as transformers

Related Software & Projects

References

Mapper

KNN/ANN/SNN

LSH

M. S. Charikar (2002). Similarity Estimation Techniques from Rounding Algorithms, 34th STOC, 2002.

spark-tda's Issues

Use Docker

Is your feature request related to a problem? Please describe.

Nope!

Describe the solution you'd like

Use Docker to provide a handy, easy-to-use package.

Additional context

N/A

Estimate buffer width for each cover interval.

Is your feature request related to a problem? Please describe.

This is not related to any problem. For now, the buffer width for VP tree search is estimated over
the entire data set. In practical datasets, however, the preferable buffer width should be computed
for each interval of the cover. VP tree construction is designed to handle data skew, which implies
that the buffer width would change across intervals.

Describe the solution you'd like

In the ReebDiagram estimator, buffer width estimation will be done for each interval.

Additional context

N/A
