
lucasimi / tda-mapper-python


A simple and efficient Python implementation of Mapper algorithm for Topological Data Analysis

Home Page: https://tda-mapper.readthedocs.io/en/main/

License: Apache License 2.0

Python 100.00%
mapper-algorithm topological-data-analysis mapper tda topological-machine-learning topology topology-visualization

tda-mapper-python's Introduction


tda-mapper

A simple and efficient Python implementation of Mapper algorithm for Topological Data Analysis

The Mapper algorithm is a well-known technique in topological data analysis that represents data as a graph. Mapper is used in fields such as machine learning, data mining, and the social sciences because it preserves topological features of the underlying space and provides a visual representation that facilitates exploration and interpretation. For in-depth coverage of Mapper, see the original paper.

This library contains an implementation of Mapper, where the construction of open covers is based on vp-trees for improved performance and scalability. The details about this methodology are contained in our preprint.
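
As a rough illustration of why vp-trees help with cover construction, here is a generic textbook vantage-point tree with a ball (range) query. This is a sketch only: the class name `VPTree`, its method `ball_search`, and the median-split strategy are assumptions made for this example and are not taken from the library or the preprint.

```python
import random
import statistics


class VPTree:
    """Textbook vantage-point tree with ball queries (illustrative only)."""

    def __init__(self, points, dist, rng=None):
        self._dist = dist
        self._rng = rng or random.Random(0)
        self._root = self._build(list(points))

    def _build(self, pts):
        if not pts:
            return None
        # Pick a random vantage point and remove it from the working list.
        vp = pts.pop(self._rng.randrange(len(pts)))
        if not pts:
            return (vp, 0.0, None, None)
        dists = [self._dist(vp, p) for p in pts]
        radius = statistics.median(dists)
        # Points within the median distance go inside, the rest outside.
        inner = [p for p, d in zip(pts, dists) if d <= radius]
        outer = [p for p, d in zip(pts, dists) if d > radius]
        return (vp, radius, self._build(inner), self._build(outer))

    def ball_search(self, center, eps):
        """Return all stored points within distance eps of center."""
        found, stack = [], [self._root]
        while stack:
            node = stack.pop()
            if node is None:
                continue
            vp, radius, inner, outer = node
            d = self._dist(center, vp)
            if d <= eps:
                found.append(vp)
            if d - eps <= radius:  # query ball may intersect the inner region
                stack.append(inner)
            if d + eps > radius:   # query ball may intersect the outer region
                stack.append(outer)
        return found


tree = VPTree(range(10), dist=lambda a, b: abs(a - b))
neighbors = sorted(tree.ball_search(5, 2))
```

The triangle inequality lets the query skip whole subtrees, which is the general reason such trees scale better than comparing the query against every point.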

Step 1: Choose lens
Step 2: Cover image
Step 3: Run clustering
Step 4: Build graph
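
The four steps (lens, cover, clustering, graph) can be sketched on a one-dimensional toy dataset using only the standard library. Everything here is an assumption made for illustration: the helper names `interval_cover`, `gap_clustering`, and `toy_mapper` are hypothetical, the clustering is a simple gap threshold rather than DBSCAN, and none of it reflects how the library is implemented.

```python
from itertools import combinations


def interval_cover(values, n_intervals, overlap_frac):
    """Yield index lists for overlapping intervals covering the lens values."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals
    pad = width * overlap_frac / 2
    for k in range(n_intervals):
        a = lo + k * width - pad
        b = lo + (k + 1) * width + pad
        yield [i for i, v in enumerate(values) if a <= v <= b]


def gap_clustering(points, idxs, eps):
    """Split indices into clusters where sorted neighbors are more than eps apart."""
    clusters, current = [], []
    for i in sorted(idxs, key=lambda j: points[j]):
        if current and points[i] - points[current[-1]] > eps:
            clusters.append(current)
            current = []
        current.append(i)
    if current:
        clusters.append(current)
    return clusters


def toy_mapper(points, lens, n_intervals=4, overlap_frac=0.5, eps=0.6):
    nodes = []
    # Step 2: cover the image of the lens with overlapping intervals.
    for idxs in interval_cover(lens, n_intervals, overlap_frac):
        # Step 3: cluster the points that fall in each interval.
        nodes.extend(set(c) for c in gap_clustering(points, idxs, eps))
    # Step 4: connect two nodes whenever they share at least one point.
    edges = [(a, b) for a, b in combinations(range(len(nodes)), 2)
             if nodes[a] & nodes[b]]
    return nodes, edges


points = [0.0, 0.5, 1.0, 1.5, 2.0, 5.0, 5.5, 6.0]
lens = list(points)  # Step 1: the lens is the identity in this toy case
nodes, edges = toy_mapper(points, lens)
```

With these inputs the graph has three nodes and a single edge, so the two well-separated groups of points show up as two connected components.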

Example

The following example can be used to kickstart your analysis. In this toy example we use a two-dimensional dataset of two concentric circles. The Mapper graph is a topological summary of the whole point cloud.

import numpy as np

from sklearn.datasets import make_circles
from sklearn.decomposition import PCA
from sklearn.cluster import DBSCAN

from tdamapper.core import MapperAlgorithm
from tdamapper.cover import CubicalCover
from tdamapper.plot import MapperLayoutInteractive

X, y = make_circles(                # generate a labelled dataset
    n_samples=5000,
    noise=0.05,
    factor=0.3,
    random_state=42)
lens = PCA(2).fit_transform(X)

mapper_algo = MapperAlgorithm(
    cover=CubicalCover(
        n_intervals=10,
        overlap_frac=0.3),
    clustering=DBSCAN())
mapper_graph = mapper_algo.fit_transform(X, lens)

mapper_plot = MapperLayoutInteractive(
    mapper_graph,
    colors=y,                       # color according to categorical values
    cmap='jet',                     # Jet colormap, for classes
    agg=np.nanmean,                 # aggregate on nodes according to mean
    dim=2,
    iterations=60,
    seed=42,
    width=600,
    height=600)

fig_mean = mapper_plot.plot()
fig_mean.show(config={'scrollZoom': True})

mapper_plot.update(                 # reuse the plot with the same positions
    colors=y,
    cmap='viridis',                 # viridis colormap, for ranges
    agg=np.nanstd,                  # aggregate on nodes according to std
)

fig_std = mapper_plot.plot()
fig_std.show(config={'scrollZoom': True})
Figures: Dataset, Mapper graph (average), Mapper graph (standard deviation)

More examples can be found in the documentation.

Demo App

You can also run a demo app locally by running:

pip install -r app/requirements.txt
streamlit run app/streamlit_app.py

Citations

If you want to use tda-mapper in your work or research, you can cite the archive uploaded on Zenodo, pointing to the specific version of the software used in your work.

If you want to cite the methodology on which tda-mapper is based, you can use the preprint.


tda-mapper-python's Issues

Unexpected labels in core.mapper_connected_components

In the function core.mapper_connected_components the assignment of labels is incorrect.
To reproduce:

data = [0, 1, 2, 3]

class MockCover:

    def apply(self, X):
        yield [0, 3]
        yield [1, 3]
        yield [1, 2]
        yield [0, 1, 3]

cover = MockCover()
clustering = TrivialClustering()
labels = mapper_labels(data, data, cover, clustering)

This example should yield a single connected component, because every yielded open set overlaps with at least one other. It does not, because the body of core.mapper_connected_components uses union-find and assigns each point the root available at that moment, which can still change in later iterations of the loop.

uf = UnionFind(label_values)
labels = [-1 for _ in X]
for lbls in itm_lbls:
    len_lbls = len(lbls)
    root = -1
    # noise points
    if len_lbls == 1:
        root = uf.find(lbls[0])
    elif len_lbls > 1:
        for first, second in zip(lbls, lbls[1:]):
            root = uf.union(first, second)
    labels.append(root)

To fix this, we must first perform uf.union over all items, then run a second loop that assigns labels using uf.find.
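
A minimal sketch of this two-pass approach is given below. The `UnionFind` class and the function `connected_component_labels` are written here only for illustration and are assumptions: they are not the library's own classes, and the real signature of core.mapper_connected_components is different.

```python
class UnionFind:
    """Minimal union-find with path compression (illustrative only)."""

    def __init__(self, items):
        self._parent = {x: x for x in items}

    def find(self, x):
        root = x
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited element directly at the root.
        while self._parent[x] != root:
            self._parent[x], x = root, self._parent[x]
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx != ry:
            self._parent[ry] = rx
        return rx


def connected_component_labels(item_labels):
    """item_labels[i] lists the node labels that contain point i."""
    label_values = {lbl for lbls in item_labels for lbl in lbls}
    uf = UnionFind(label_values)
    # First pass: merge all node labels that share a point.
    for lbls in item_labels:
        for first, second in zip(lbls, lbls[1:]):
            uf.union(first, second)
    # Second pass: roots are final now, so find() is stable; noise gets -1.
    return [uf.find(lbls[0]) if lbls else -1 for lbls in item_labels]


# Node memberships per point for the MockCover example above,
# assuming each yielded open set becomes exactly one node.
item_labels = [[0, 3], [1, 2, 3], [2], [0, 1, 3]]
labels = connected_component_labels(item_labels)
```

Because labels are read only after every union has been applied, all four points end up with the same label, and the output has exactly one entry per point.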

Unexpected output from core.mapper_connected_components

Running the following lines after the example in readme.md, I get a list twice the size of X.shape[0].

cover = CubicalCover(n_intervals=10, overlap_frac=0.65)
clustering = FailSafeClustering(clustering=AgglomerativeClustering(10), verbose=False)

from tdamapper import core
l = core.mapper_connected_components(X,lens,cover,clustering)

The first X.shape[0] elements of the output list are -1.
I am using version tda-mapper==0.2.0.
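
The doubled length with a leading run of -1 is consistent with the loop quoted in the previous issue, which pre-fills the list and then appends to it. The sketch below reproduces that pattern in isolation; `buggy_labels` and `fixed_labels` are hypothetical helper names, and whether this is the exact cause in version 0.2.0 is an assumption.

```python
def buggy_labels(roots):
    labels = [-1 for _ in roots]      # one placeholder per point
    for root in roots:
        labels.append(root)           # appends instead of overwriting
    return labels


def fixed_labels(roots):
    labels = [-1 for _ in roots]
    for i, root in enumerate(roots):
        labels[i] = root              # overwrite the placeholder in place
    return labels


roots = [7, 7, 8]
doubled = buggy_labels(roots)
correct = fixed_labels(roots)
```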
