scikit-tda / kepler-mapper
Kepler Mapper: A flexible Python implementation of the Mapper algorithm.
Home Page: https://kepler-mapper.scikit-tda.org
License: MIT License
Can you please explain to me what is meant by verbose?
For example:
mapper = km.KeplerMapper(verbose=2)
How do I use it? I am confused; please guide me.
Thanks
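For what it's worth, verbose appears to control how much progress logging the mapper prints: 0 is silent, 1 prints the major steps, and 2 adds per-cube detail. This is inferred from the log output quoted elsewhere on this page, not from the kmapper source, so treat this toy sketch of the convention as an assumption:

```python
def progress_messages(verbose):
    """Collect the messages a run would print at a given verbosity level."""
    msgs = []
    if verbose >= 1:
        msgs.append("..Projecting data using: sum")        # high-level steps
    if verbose >= 2:
        msgs.append("There are 19 points in cube_0 / 10")  # per-cube detail
    return msgs

assert progress_messages(0) == []      # verbose=0: silent
assert len(progress_messages(2)) == 2  # verbose=2: everything
```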
Hey Devs,
I've often thought it would be useful to have a write/read utility for mapper networks produced by KeplerMapper. Is this something being developed? If not, how would you suggest one could be written? I suppose a basic interface with NetworkX could work, but maybe you reckon there is a better way?
Let me know your thoughts
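Since the other issues on this page suggest the mapper output is a plain dict (graph["nodes"] mapping node ids to member row indices, plus links and meta), a minimal write/read utility could be sketched with stdlib json alone. The graph structure below is an assumption for illustration, not the exact kmapper schema:

```python
import json

# Hypothetical mapper output: plain dict of nodes/links/meta
graph = {"nodes": {"cube0_cluster0": [0, 1, 2]},
         "links": {"cube0_cluster0": ["cube1_cluster0"]},
         "meta": {"n_samples": 3}}

def write_graph(graph, path):
    with open(path, "w") as f:
        json.dump(graph, f)

def read_graph(path):
    with open(path) as f:
        return json.load(f)

write_graph(graph, "mapper_graph.json")
restored = read_graph("mapper_graph.json")
assert restored == graph  # lossless round trip
```

A NetworkX adapter could then be layered on top of the same dict if graph algorithms are needed.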
By integrating CI, Travis will automatically run the test suite before any pull request.
I've added the .travis.yml settings file and tested everything so it works on my fork. From what I can tell, only the owner of the repo can integrate Travis into the repo.
@MLWave, when you have a second, could you turn this on? It took me <3 minutes on my fork, just follow the first few steps here.
Once this is done, we can
Great software! I'm trying to run the code in a Jupyter notebook, but it doesn't work, i.e. I get the .html file saved but no display in the notebook.
from IPython.core.display import display, HTML
...
display(mapper.visualize(graph, path_html="make_circles_keplermapper_output.html",
title="make_circles(n_samples=5000, noise=0.03, factor=0.3)"))
I also tried HTML(mapper.visualize...) with no result.
Any suggestion?
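One possible workaround, assuming the symptom means visualize() returns None in this version: read the saved file back and display its contents, rather than the return value. The write below is a stand-in for the mapper.visualize(...) call:

```python
# Stand-in for mapper.visualize(graph, path_html=path, ...)
path = "make_circles_keplermapper_output.html"
with open(path, "w") as f:
    f.write("<html><body>graph</body></html>")

# Read the saved page back in
with open(path) as f:
    page = f.read()

# In a notebook cell you would then do:
# from IPython.core.display import display, HTML
# display(HTML(page))
assert "<html>" in page
```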
Hello Kepler-Mapper devs and users,
This is not an open issue, but a proposal for a possible future pull request.
Over the last weekend I worked on a new version of kmapper in my local repo that generates an interactive Plotly plot in the Jupyter Notebook.
Here are a few experiments: http://nbviewer.jupyter.org/gist/empet/3ad5d13ad662eb18ae682f2a49035420
Please share your opinion on such a possibility of plotting the topological graph associated with a dataset.
On my master sauln/kepler-mapper@274e3ea I have added a setup.py file so kepler-mapper can be installed with python setup.py install. The new import format would be
from kmapper import KeplerMapper
or
import kmapper as km
What are your thoughts on this? Once we set the format, I can run through the docs and examples with an update.
I am totally new to topological data analysis and trying to learn. When I copy the code into a Python notebook and replace the data with my own, the code runs OK, but there is no plot output. Do I need to install other software? The Python I use is Anaconda.
Hi there,
When using HDBSCAN (from the hdbscan library, as suggested in Issue #68), the code terminates with:
ValueError: k must be less than or equal to the number of training points
This is with the default behaviour, i.e. using mapper.map(projected_X=projected_data, inverse_X=data, clusterer=hdbscan.HDBSCAN()).
The same data and projection work fine with DBSCAN(), so I assume it's an issue with the interaction with hdbscan. Maybe you know something about this?
The process: https://github.com/Kaggle/docker-python
Some examples already:
https://inclass.kaggle.com/triskelion/mapping-with-sum-row/notebook
https://www.kaggle.com/triskelion/testing-python-3/notebook
https://www.kaggle.com/triskelion/isomap-all-the-digits-2
If we get the new containers to build with kmapper, we can use KeplerMapper on all Kaggle's datasets and use their notebooks for replication/easy forks.
Dear all,
Many thanks for Kepler Mapper. It is a very interesting project.
I have looked at the examples in the notebooks folder. I have found a problem with the notebook KeplerMapper Newsgroup20 Pipeline.ipynb. I get the following error when I try to run the line projected_X = mapper.fit_transform(X, projection=[TfidfVectorizer(analyzer="char", ngram_range=(1,6), max_df=0.83, min_df=0.05), TruncatedSVD(n_components=100, random_state=1729), Isomap(n_components=2, n_jobs=-1)], scaler=[None, None, MinMaxScaler()]):
IndexError Traceback (most recent call last)
in ()
10 Isomap(n_components=2,
11 n_jobs=-1)],
---> 12 scaler=[None, None, MinMaxScaler()])
13
14 print("SHAPE",projected_X.shape)
~/kepler-mapper/kmapper/kmapper.py in fit_transform(self, X, projection, scaler, distance_matrix)
219 if self.verbose > 0:
220 print("\n..Projecting data using: %s" % (str(projection)))
--> 221 X = X[:, np.array(projection)]
222
223 # Scaling
IndexError: arrays used as indices must be of integer (or boolean) type
I have tried different ways to solve the problem, but without any success. Could you suggest some possible solutions?
Many thanks for considering my request.
Best wishes
All information about the constructed graph should be stored in graph["meta_graph"]. The visualize method should use attributes from this dictionary rather than attributes found on self.
This would allow us to separate the construction of the visualization from the construction of the graph object alone. Additionally, the graph could be saved with all of its relevant information and graphed at a later time.
What is the best way to assimilate information from both fit_transform and map? Maybe fit_transform sets what it can, and then map adds its information as well?
Hi, I am extremely sorry if I am asking some simple questions, as I am a mathematician by profession and not a programmer.
I would like to know
I would highly appreciate it if you respond.
Hi, thanks for providing the great software for TDA!
I am looking for a way to download the TDA plot with a white background, to use and cite it in a research paper. Is there a way to extract it as a clean, high-resolution plot, other than the current option of downloading the HTML with a black background?
Thanks!
Thank you for your hard work, you are appreciated.
It would be great if you considered releasing a GUI for kepler-mapper, allowing the user to:
I would be glad to donate to such a project to bring it to life.
Waiting for your reply
Now that @michiexile abstracted a nerve class, we can include support for other nerves. Most obvious are a min-intersection nerve and an n-simplices nerve. Though I believe we will also need a custom nerve for multi-mapper and we could experiment with a multi-nerve.
It would be nice to be able to set a minimum number of points that two clusters must share for them to be considered connected.
edges = nerve({"a": [1,2,3], "b": [2,3,4], "c": [4,5,6]}, min_intersection=2)
-> ["a", "b"] in edges and ["b", "c"] not in edges
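The min-intersection behavior sketched above could be implemented in a few lines. This is a sketch of the proposed semantics, not kmapper's actual nerve code:

```python
from itertools import combinations

def min_intersection_nerve(nodes, min_intersection=1):
    """Connect two clusters only if they share >= min_intersection members."""
    edges = []
    for (a, members_a), (b, members_b) in combinations(nodes.items(), 2):
        if len(set(members_a) & set(members_b)) >= min_intersection:
            edges.append([a, b])
    return edges

edges = min_intersection_nerve({"a": [1, 2, 3], "b": [2, 3, 4], "c": [4, 5, 6]},
                               min_intersection=2)
# "a" and "b" share {2, 3}; "b" and "c" share only {4}
assert ["a", "b"] in edges and ["b", "c"] not in edges
```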
It would be nice to have a general n-simplex nerve that constructs simplices of order n or less.
Before building this, is there an established format for simplices? Are there any libraries that we could use?
Most promising simplicial complex libraries found in the wild:
I'd prefer not to reinvent the wheel but I think a strong python simplicial complex library could be useful to the community.
There's a bug in dict_to_json that prevents linking from the json data back to node data.
The JSON for a given node is defined as "name": "node_id" when it should actually be "name": node_id. I've made a quick fix.
Hello All,
Really like this project; better than all the other Mapper tools in Python right now, IMO.
However, I am consistently having a problem where, whenever I include inverse_X in the mapping, the mapping returns 0 nodes and 0 edges. I have tried this with a number of different data sets without success. I have done this on data sets with 300 dimensions and 200,000 rows, with 200 and 100 dimensions, and on all your sample sets, with the same result. It's possible I am just ignorant, but I believe that all of these data sets should work with inverse_X, given it is just projecting along the original data. No errors are produced for this problem. If inverse_X is not included, the mapper works as expected.
On another note, I am going to start working on a right-click selection tool for displaying the contents of nodes, as I need this for my research. No idea how successful I'll be. I'll let you know.
The distribution plot on the right pane shows the distribution for all data points. It would be nice to show the distribution of points in each node.
One option for this could be to show a pie chart at each node, or during hover, change the histogram in the right hand pane.
I'd like to make some tweaks to the output web page, but as it stands that's slightly awkward to do; the HTML is contained within the Python as a string and so is the JS. It's particularly a problem as I'm doing these changes for someone with less programming experience who might have a hard time resolving any conflicts that could arise trying to keep a fork up-to-date.
Would splitting them out into separate files that are then read in by the Python and used to produce the finished HTML be a problem? Or potentially just setting them up as Jinja2 templates and rendering them that way? It would make the HTML/Javascript a bit neater by replacing all the %s markers with actual variable names. I could try either, but would prefer not to spend time on it if it's incompatible with the project design somehow!
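To illustrate the idea with stdlib string.Template (Jinja2 would look similar): $-placeholders replace the positional %s markers, and the template text could live in its own .html file read in by the Python. The template and variable names here are hypothetical, not the actual kmapper page:

```python
from string import Template

# Hypothetical page template; in practice this string would be loaded
# from a separate .html file shipped with the package.
page = Template("<html><head><title>$title</title></head>"
                "<body><script>var graph = $graph_json;</script></body></html>")

html = page.substitute(title="My Mapper Graph", graph_json="{}")
assert "My Mapper Graph" in html
```

Named placeholders make it much harder to mismatch the substitution order than a long chain of %s markers.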
Hi there
I can see that there is a TODO to implement a cover-defining API. I was wondering what is the best way of creating a user-defined cover at the moment (if it is possible at all). From what I can tell, we are currently restricted to an (n_bins, overlap_perc) method. Is it possible to define a cover explicitly (for one or more dimensions in the lens) using cutoff values or similar (like setting the maximum and minimum values of the covering space in each dimension)? I ask because, in its current implementation, I think the [non-]presence of an outlier can skew the covering space quite drastically.
Let me know what my options are for the covering space. I would also be interested to know the status of the above TODO. More information as to how the cover class currently works might also be useful if I was going to write my own.
Thanks!
Edit: I've modified the code so that you can pass kmapper.map a CoverBounds variable.
If CoverBounds is None, the behavior is unchanged.
However, CoverBounds can also be an (ndim_lens, 2) array, with min, max for every dimension of your lens. If the default behavior is fine for a particular dimension, pass it np.float('inf'), np.float('inf').
For example, if I have a lens in R2 and want to set the minimum and maximum of the second dimension to be 0 and 1, I can pass:
mapper.map(CoverBounds=np.array([[np.float('inf'), np.float('inf')], [0, 1]]))
and that should have the desired behavior.
Edit 2: I might change it so that rather than inf detection, it works off None detection in the CoverBounds array.
I think a system designed like this should produce exactly the same cover, independent of input data limits.
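The None-detection variant from Edit 2 could be sketched like this (resolve_bounds is a hypothetical helper, not the actual patch): per dimension, a None entry falls back to the data limits, while a number overrides them.

```python
import numpy as np

def resolve_bounds(lens, cover_bounds=None):
    """Per-dimension (min, max) of the cover; None entries use data limits."""
    lo, hi = lens.min(axis=0), lens.max(axis=0)
    if cover_bounds is not None:
        for d, (bmin, bmax) in enumerate(cover_bounds):
            if bmin is not None:
                lo[d] = bmin
            if bmax is not None:
                hi[d] = bmax
    return lo, hi

lens = np.array([[0.2, -3.0], [0.8, 7.0]])
# Dimension 0: default behavior; dimension 1: clamp the cover to [0, 1]
lo, hi = resolve_bounds(lens, cover_bounds=[(None, None), (0, 1)])
assert lo.tolist() == [0.2, 0.0] and hi.tolist() == [0.8, 1.0]
```

With explicit bounds like this, the same cover is produced regardless of outliers in the input data.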
Devs - let me hear your thoughts on this - I can clean up and submit a pull request.
There's a slightly odd interaction between the minimum cluster size and cells with few entries. In kmapper.py:372, cells are only checked for clustering if there are >= min_cluster_samples samples within them. But min_cluster_samples is set to n_clusters.
So if you set n_clusters to 3, then any cell with 3 samples in it will produce 3 separate 1-sample clusters in the output. Any cell with 2 samples will produce 0 clusters (and thus likely a different unique sample count in the output). This probably has little to no impact on the graph and is unlikely to show up except in small trial datasets, but it is a bit confusing that the parameter is reused for a different (if related) purpose.
In the breast cancer example you use the following code lines to create a custom 1-D lens with Isolation Forest:
model = ensemble.IsolationForest(random_state=1729)
model.fit(X)
lens1 = model.decision_function(X).reshape((X.shape[0], 1))
My question is: what is this lens1 doing? I mean, what kind of projection do we have? Are we projecting our data onto the predictions of the Isolation Forest? I am not sure; please help.
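To make the quoted lines concrete: decision_function returns one anomaly score per sample (higher scores are more "normal" under the isolation-forest model), and the reshape turns the scores into a column vector so each row of the data is projected to its own anomaly score. A self-contained sketch with stand-in data:

```python
import numpy as np
from sklearn import ensemble

X = np.random.RandomState(0).rand(100, 4)  # stand-in dataset

model = ensemble.IsolationForest(random_state=1729)
model.fit(X)

# One score per sample, reshaped into a (n_samples, 1) column: a 1-D lens.
lens1 = model.decision_function(X).reshape((X.shape[0], 1))
assert lens1.shape == (100, 1)
```

So the lens is not a class prediction: it orders points from most anomalous to most typical, and Mapper then covers that axis with overlapping intervals.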
The switch to using links to the JS/CSS rather than folding it into the file directly has unfortunately introduced some portability & sustainability issues.
It makes the saved HTML files dependent on the install directory of the module, and any environment used whilst running it, making redistributing them difficult. In addition, the HTML version is static whilst the JS/CSS is linked, potentially leading to old HTML files becoming unusable if the JS/CSS they depend on changes to require tags or elements not present in the old HTML.
I have the following queries:
1. How do I use Kepler Mapper with categorical data?
2. How do I use the distance function described in kmapper.py? I mean, when I ran Mapper in R, I applied it to a distance matrix. What is the idea of distance in kmapper?
3. How do I color the nodes of the graph according to my own rule? For example, if I want to color the nodes in my Mapper output graph by the number of points in each node, on a color scale?
I would highly appreciate it if you could help me out with these queries.
Thanks
I really enjoyed reading this notebook: https://github.com/MLWave/kepler-mapper/tree/master/notebooks/self-guessing#references
I'm curious though: what do the predictions from a strong self-guesser for the "blue sea star" image at the start of the notebook look like?
Hi Devs
In the confidence_graphs notebook, image tooltips are used in the visualisation. This is something we feel could be very useful in our work here.
However, when we create our image tooltips in the same way and follow the procedure outlined in the notebook, we find that rather than getting a nice array of images under MEMBERS in the output html file, all of our images overlay each other (essentially, the html tag doesn't seem to be processed correctly).
I assume this is down to a change in the visualization since the notebook was written. I'm not sure whether this is a browser issue (using Chrome). Are you aware of this, and if so, is it possible to create image tooltips?
Lee
Hi devs,
I am sure you will solve this issue. I am sending you the relevant part of the code here:
graph = mapper.map(projected_data, X, nr_cubes=14, overlap_perc=0.8,
                   clusterer=sklearn.cluster.DBSCAN(eps=15, min_samples=4))
model = ensemble.IsolationForest(random_state=1729)
model.fit(X)
usecolor = model.decision_function(X).reshape((X.shape[0], 1))
node = graph['nodes'].values()
cluster = list(node)
# My own function: I am trying to color each node by the average anomaly score of the data points in that node
z = []
for i in range(0, len(cluster)):
    z.append(np.round(np.mean(usecolor[cluster[i]]) * 10, decimals=4, out=None))
print(z)
s = np.asarray(z)
s
I am getting the following output:
Mapping on data shaped (150, 4) using lens shaped (150, 2)
Creating 196 hypercubes.
Created 57 edges and 24 nodes in 0:00:00.054112.
[0.6021, 0.4893, 0.2385, -0.2715, 0.7183, 0.7395, 0.6809, 0.4972, 0.333, 0.6087, 0.6839, 0.5749, 0.3008, 0.4609, 0.3176, 0.2205, 0.1462, -0.05, 0.4609, 0.4583, 0.0631, 0.559, 0.5166, 0.1199]
array([ 0.6021, 0.4893, 0.2385, -0.2715, 0.7183, 0.7395, 0.6809,
0.4972, 0.333 , 0.6087, 0.6839, 0.5749, 0.3008, 0.4609,
0.3176, 0.2205, 0.1462, -0.05 , 0.4609, 0.4583, 0.0631,
0.559 , 0.5166, 0.1199])
Now I go to the next part of the code:
# Visualize it
mapper.visualize(simplicial_complex, path_html="/home/dhananjay/kepler-mapper/keplermapper-iris.html", custom_meta={"Data:": "Me"}, custom_tooltips=Y,color_function=z)
But I am getting an error:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-14-0692ad6431ec> in <module>()
6 0.4972, 0.333 , 0.6087, 0.6839, 0.5749, 0.3008, 0.4609,
7 0.3176, 0.2205, 0.1462, -0.05 , 0.4609, 0.4583, 0.0631,
----> 8 0.559 , 0.5166, 0.1199]))
~/kepler-mapper/kmapper/kmapper.py in visualize(self, graph, color_function, custom_tooltips, custom_meta, path_html, title, save_file, X, X_names, lens, lens_names, show_tooltips)
507 mapper_data = format_mapper_data(graph, color_function, X,
508 X_names, lens,
--> 509 lens_names, custom_tooltips, env)
510
511 histogram = graph_data_distribution(graph, color_function)
~/kepler-mapper/kmapper/visuals.py in format_mapper_data(graph, color_function, X, X_names, lens, lens_names, custom_tooltips, env)
58 for i, (node_id, member_ids) in enumerate(graph["nodes"].items()):
59 node_id_to_num[node_id] = i
---> 60 c = _color_function(member_ids, color_function)
61 t = _type_node()
62 s = _size_node(member_ids)
~/kepler-mapper/kmapper/visuals.py in _color_function(member_ids, color_function)
219
220 def _color_function(member_ids, color_function):
--> 221 return _color_idx(np.mean(color_function[member_ids]))
222 # return int(np.mean(color_function[member_ids]) * 30)
223
IndexError: index 50 is out of bounds for axis 1 with size 24
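The traceback suggests visualize() indexes color_function with the row indices of the original data, so it needs one value per sample (150 for iris), not one per node (the 24-entry z above). A workaround sketch, with a hypothetical node-to-member mapping standing in for graph['nodes']:

```python
import numpy as np

n_samples = 150
anomaly = np.random.RandomState(0).rand(n_samples)  # stands in for usecolor
nodes = {"n0": [0, 1, 2], "n1": [2, 3, 4]}          # stands in for graph['nodes']

# Spread each node's mean anomaly score back onto its member rows.
color = np.zeros(n_samples)
for member_ids in nodes.values():
    color[member_ids] = anomaly[member_ids].mean()

# Pass `color` (length n_samples) as color_function instead of z (length n_nodes).
assert color.shape == (n_samples,)
```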
Currently, the map method assumes the clustering class has a labels_ attribute. This is not part of the API and so is not true for all sklearn clustering methods.
It would be preferable to use the fit_predict method instead to extract the cluster labels.
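The two approaches side by side, on a toy dataset with two obvious clusters:

```python
import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([[0.0], [0.1], [10.0], [10.1]])

# Relying on the attribute works only for estimators that happen to set labels_:
labels_attr = DBSCAN(eps=0.5, min_samples=2).fit(X).labels_

# fit_predict is part of the scikit-learn clusterer API, so it is the safer call:
labels_pred = DBSCAN(eps=0.5, min_samples=2).fit_predict(X)

assert (labels_attr == labels_pred).all()
```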
AttributeError Traceback (most recent call last)
in ()
9
10 # Fit to and transform the data
---> 11 projected_data = mapper.fit_transform(data, projection=[0,1]) # X-Y axis
12
13 # Create dictionary called 'complex' with nodes, edges and meta-information
C:\Users\admin\Anaconda3\lib\km.py in fit_transform(self, X, projection, scaler)
62 pass
63 print("\n..Projecting data using: \n\t%s\n"%str(projection))
---> 64 X = reducer.fit_transform(X)
65
66 # Detect if projection is a string (for standard functions)
AttributeError: 'list' object has no attribute 'fit_transform'
Hi all,
I am switching from Ayasdi to open-source Mapper and was looking to see whether Kepler-Mapper has a metric selection function, just like the "projection"/lens selection function in this class. As far as I know, the original Python Mapper implementation has a list of metric options, as follows:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html
Can anyone help on this please?
-thanks.
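One practical route: any scipy pdist metric can be precomputed into a square distance matrix. The tracebacks quoted elsewhere on this page show fit_transform taking a distance_matrix argument, which looks like the intended entry point, though that is an assumption; check your installed version's signature.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

X = np.random.RandomState(0).rand(50, 4)

# Pick any metric pdist supports ("cosine", "cityblock", "correlation", ...)
D = squareform(pdist(X, metric="cosine"))
assert D.shape == (50, 50) and np.allclose(D, D.T)
```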
Currently:
Implement/Research:
Re-visit:
Hi, first of all, this is a great open source implementation of Mapper; I have been searching for one. I am trying to build a complex using some weather data.
My shape:
X.shape
(7477, 2)
Circle example shape:
data.shape
(5000, 2)
Yet I'm getting ValueError: all the input array dimensions except for the concatenation axis must match exactly.
Here's a sample of my data:
X[0]
array([ 1.02, 28. ])
I'm sure I'm doing something silly; could you guide me?
Hi, I am customizing my color function using an external variable from the same dataset (code below).
import numpy as np
import pandas as pd
import kmapper as km
from mst_clustering import MSTClustering
lens = df['var_lens']
graph = mapper.map(lens, df[vars_used_for_distance], nr_cubes=10,
overlap_perc=0.95,
clusterer=MSTClustering(cutoff=2))
# Visualization
mapper.visualize(graph, path_html="psycho-scores-MST.html",
color_function=df['var_color'].values,
custom_tooltips=df['var_color'].values)
I noticed kepler-mapper uses a discrete range (0-10) of colors, however I could not understand from the visuals.py code which statistic within the cluster it uses.
Using
for cluster in graph['nodes']:
    print(np.nanmean(df['var_color'][graph['nodes'][cluster]].values))
I get the means in each cluster, but I can't pass the values to the node colors in color_function.
I have a sufficient mathematical background, but my python skills are not very good. Congratulations for the library. It is really good and has wonderful visualizations.
Hi there,
thanks for this amazing, fancy version of Mapper! After working through a couple of datasets using km, I have a few suggestions for the next update that will hopefully be helpful to others as well:
in the 3D output, when we move the mouse over nodes, we only see the classification label (e.g., if the outcome is binary, we only see 0/1). What is not shown, but would be extremely helpful for later validation of the results with traditional statistical approaches, is the (number of) row IDs within each node. If there were a way to see how many rows (assuming your data is one ID per row) are in each cluster, it would be really informative.
After adding that feature, it might be worth adding another function that lets us select a specific cluster (assuming we have several clusters in the output) and download the row IDs in it. In this way, we can take the clusters generated by km and load them into logistic regression or other traditional approaches to find out what drives the separation of such clusters.
Thanks again and please let me know if you need extra clarification on this.
-Yuzu
We'd like all idioms used in KeplerMapper to fit seamlessly with the scikit-learn API.
What needs to be changed? Below are some things that have been brought up before, and questions to ask about the current API.
- Add a fit method and have fit_transform only run fit and return the result.
- How does the map method fit into this design? Should map be fit instead?
Please suggest other changes to the API!
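For illustration, the proposed shape could look like this sketch, where fit does the work and fit_transform comes for free from TransformerMixin. MapperEstimator and its placeholder projection are hypothetical, not the actual kmapper code:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class MapperEstimator(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        # Placeholder "projection": sum each row down to one lens value
        self.lens_ = X.sum(axis=1, keepdims=True)
        return self

    def transform(self, X):
        return self.lens_

# fit_transform is inherited from TransformerMixin: fit(X).transform(X)
lens = MapperEstimator().fit_transform(np.ones((5, 3)))
assert lens.shape == (5, 1)
```

Inheriting from BaseEstimator also gives get_params/set_params, so the estimator would slot into sklearn pipelines and grid searches.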
Hi, I installed KeplerMapper from source, and running the newsgroup notebook I still get the same error as in the recently closed issue. Is there something wrong? Thank you for the wonderful work and dedication.
mapper = km.KeplerMapper(verbose=2)
projected_X = mapper.fit_transform(X,
projection=[TfidfVectorizer(analyzer="char",
ngram_range=(1,6),
max_df=0.83,
min_df=0.05),
TruncatedSVD(n_components=100,
random_state=1729),
Isomap(n_components=2,
n_jobs=-1)],
scaler=[None, None, MinMaxScaler()])
print("SHAPE",projected_X.shape)
..Projecting data using: [TfidfVectorizer(analyzer='char', binary=False, decode_error='strict',
dtype=<class 'numpy.int64'>, encoding='utf-8', input='content',
lowercase=True, max_df=0.83, max_features=None, min_df=0.05,
ngram_range=(1, 6), norm='l2', preprocessor=None, smooth_idf=True,
stop_words=None, strip_accents=None, sublinear_tf=False,
token_pattern='(?u)\\b\\w\\w+\\b', tokenizer=None, use_idf=True,
vocabulary=None), TruncatedSVD(algorithm='randomized', n_components=100, n_iter=5,
random_state=1729, tol=0.0), Isomap(eigen_solver='auto', max_iter=None, n_components=2, n_jobs=-1,
n_neighbors=5, neighbors_algorithm='auto', path_method='auto', tol=0)]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-4-3a323e834380> in <module>()
10 Isomap(n_components=2,
11 n_jobs=-1)],
---> 12 scaler=[None, None, MinMaxScaler()])
13
14 print("SHAPE",projected_X.shape)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\kmapper\kmapper.py in fit_transform(self, X, projection, scaler, distance_matrix)
219 if self.verbose > 0:
220 print("\n..Projecting data using: %s" % (str(projection)))
--> 221 X = X[:, np.array(projection)]
222
223 # Scaling
IndexError: arrays used as indices must be of integer (or boolean) type
I'd like to talk about future directions of kepler-mapper and some work I'd like to do. Before I get too far ahead of myself, I want to make sure you (@MLWave) agree with the directions so a permanent fork won't be necessary.
Immediate steps are
Do you have a vision or direction for kepler-mapper? There is considerable new research on the method and I think kepler-mapper would be a great platform to introduce some of these ideas.
Currently, map takes arguments for nr_cubes and overlap_perc as a single integer or float. I would like the option to specify a different number of cubes and overlap for each dimension.
I imagine a test would look something like this:
lens = np.random.rand(10,3)
cover = Cover(lens, nr_cubes=[10,20,30])
assert len(cover.cubes) == 10*20*30
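The cube count in that test falls out of a Cartesian product over per-dimension grids. A sketch with a hypothetical helper (cube_centers is not the actual Cover class):

```python
import numpy as np
from itertools import product

def cube_centers(lens, nr_cubes):
    """One grid of centers per lens dimension; cubes are their product."""
    axes = [np.linspace(lens[:, d].min(), lens[:, d].max(), n)
            for d, n in enumerate(nr_cubes)]
    return list(product(*axes))

lens = np.random.RandomState(0).rand(10, 3)
centers = cube_centers(lens, [10, 20, 30])
assert len(centers) == 10 * 20 * 30
```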
After the API changes of v00008 some of the examples have not been updated.
These include
It would be really helpful for interactive analysis if the tooltip labels could be strings instead of just integer labels. Otherwise one has to reference their original label mapping to understand which integers correspond to what labels/categories. If this functionality is already implemented please demonstrate. Thank you.
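In the meantime, readable tooltips can be built from integer labels before calling visualize(). custom_tooltips takes one entry per sample (judging from the examples above), so an array of strings should work; the label names here are illustrative:

```python
import numpy as np

y = np.array([0, 1, 2, 1])  # integer class labels, one per sample
names = {0: "setosa", 1: "versicolor", 2: "virginica"}

# Map each integer label to its human-readable name
tooltip_s = np.array([names[label] for label in y])

# mapper.visualize(graph, custom_tooltips=tooltip_s, ...)
assert tooltip_s[3] == "versicolor"
```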
For example, the version of km that digits.py uses assumes that the KeplerMapper class has a reducer attribute. In the km.py given in the repository, this parameter is in the fit_transform method and is called projection. Also, digits.py uses the parameter name cluster_algorithm instead of clusterer (as it is in km.py).
I have seen that this project has gone through considerable development and improvement in the last months. About one year ago, I spent about 2 months studying TDA and its applications to (mostly scientific) data analysis. When it was time to review MAPPER, I started playing with KeplerMapper and found it extremely convenient for MAPPER-based data exploration (it was basically the only good open implementation out there).
However, I was missing some interactivity (changing the variable used for node colouring, recomputing the simplicial complex with other clustering/coverer parameters, etc.) for the exploration of the output simplicial complex. With the aim of understanding MAPPER better, I started writing my own implementation (which I called cartographer), re-using scikit-learn as much as possible, inspired by what was done in KeplerMapper.
Given that this project is now undergoing active development and is definitely more mature, with more user adoption than mine, I think it would be interesting to see whether some of the simplified API design choices and implementation changes could be ported to KeplerMapper to improve usability and performance, in case you are interested.
So that we can discuss what could be re-used within KeplerMapper, I list here the main design and implementation changes made during the rewrite (it is pretty simple and can be seen here):
Opted for separating the visualisation of the simplicial complex from its actual computation (the scikit-learn cluster-like model). My aim was to be able to adapt the visualisation details at a later stage and also have the possibility to either serve a standalone html or see the visualisation within a Jupyter Notebook/Lab.
The Mapper class inherits from ClusterMixin, and the three Mapper components can be configured in the constructor call: filterer (the transforming function reducing from the high dimensionality of the data, or from a lower dimensionality, to compute the nerve), coverer (the transformer function that defines the overlapping spaces from which the nerve is computed) and clusterer (the clustering algorithm that is actually used in the algorithm).
The coverer, see the HyperRectangleCoverer, divides the input space into overlapping regions. One trick I discovered that speeds up execution considerably is to reduce set-intersection checks to overlapping regions, by having the coverer return an overlap matrix (which could also be sparse) and checking for intersection only on those subsets.
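A toy version of that trick: only test cluster pairs whose covering regions overlap, according to a (possibly sparse) overlap matrix. The matrix and cluster memberships below are made up for illustration:

```python
# overlap[i][j] is True when the covering regions of clusters i and j overlap
overlap = [[True, True, False],
           [True, True, True],
           [False, True, True]]
clusters = {0: {1, 2}, 1: {2, 3}, 2: {9}}

edges = []
for i in clusters:
    for j in clusters:
        # Skip the expensive set intersection whenever regions cannot overlap
        if i < j and overlap[i][j] and clusters[i] & clusters[j]:
            edges.append((i, j))

assert edges == [(0, 1)]
```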
Standalone documentation with Jupyter notebooks (the output D3.js graph can be explored within a Jupyter output cell), executed with nbsphinx when the docs are built by the CI.
In addition to the features I implemented in cartographer, I also spent a few weeks thinking about how to implement other improvements (e.g. bidirectional Jupyter widget visualization, how to deal with hyper-parameters, multi-scale MAPPER approaches). I would be glad to discuss those as well and contribute to their integration.
I want to use that picture in an article; however, printing directly from the browser always has problems.
I also tried the approach in issue #73, but it failed.
I am wondering if there is a way to produce a neat picture for an article.
Thanks a lot.
Hi, I just opened PR #87. The context is that I had a precomputed distance matrix that I wanted to cluster with. I used t-SNE to scale it down to two dimensions for the filter function, but I didn't see a way to use metric='precomputed' with DBSCAN for the clustering with inverse_X, because the hypercube slicing makes things un-square. So this PR is what I did to tell Mapper to give a square matrix to the clusterer.
When using a classifier:
ValueError: shape mismatch: value array of shape (40000,2) could not be broadcast to indexing result of shape (40000,1)
All changes for v1.1 are currently incorporated in the dev branch.
Included in the release is
What needs to happen before we can deploy the next release :
Hi kepler-mapper team,
Thank you very much for your great TDA tool!
Could you please update the examples in the kepler-mapper repo according to the latest version of kmapper? I ran digits.py and got this error message:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-2-1ae2740795d8> in <module>()
51 path_html="keplermapper_digits_custom_tooltips.html",
52 graph_gravity=0.25,
---> 53 custom_tooltips=tooltip_s)
54 # Tooltips with the target y-labels for every cluster member
55 mapper.visualize(graph,
TypeError: visualize() got an unexpected keyword argument 'graph_gravity'
I checked the visualize() method definition, and indeed it has no keyword 'graph_gravity'.
I commented it out and it worked, but I noticed that although in your code visualize() is called with the default value, None, for the color_function, the node colors in my html file are different from those of the nodes in your html file posted in the examples folder (my nodes mostly have the central colors of the jet colormap, and only a few are red).
It's obvious that your plot was generated with a special color_function that is not set in digits.py.
I'd like to know a method of node coloring in the case of a filter function with values in
All references on TDA related to the Mapper algorithm explain only how to color the nodes in the case of a filter function with real values.
For the breast-cancer example you provided color_function="average_signal_cluster", but the color_function value should be a numpy.array of floats, and this string caused the error:
./visuals.py in init_color_function(graph, color_function)
13 color_function = np.arange(n_samples).reshape(-1, 1)
14 else:
---> 15 color_function = color_function.reshape(-1, 1)
16 # MinMax Scaling to be friendly to non-scaled input.
17 scaler = preprocessing.MinMaxScaler()
AttributeError: 'str' object has no attribute 'reshape'
Thanks!!! :)
The README is getting very long, and it would be nice to add more examples, tutorials, and guides. We could migrate most of the docs to another venue and leave the README as a short guide with setup and contribution instructions. To do this, I think we could use mkdocs to build the pages for us and publish the docs with ReadTheDocs.
I am running the following code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pandas import DataFrame
import kmapper as km
import sklearn
d = {'x': np.cos(np.arange(1, 100)), 'y': np.sin(np.arange(1, 100))}
df = DataFrame(data=d)
mapper = km.KeplerMapper(verbose=2)
lens = mapper.fit_transform(df)
complex = mapper.map(lens, df,
                     clusterer=sklearn.cluster.DBSCAN(eps=0.1, min_samples=5),
                     nr_cubes=10, overlap_perc=0.5)
mapper.visualize(complex, path_html="/home/dhananjay/kepler-mapper/keplermapper-fig8-xaxis.html",
                 title="fig8-xaxis")
I am getting the following output and error:
..Composing projection pipeline length 1:
Projections: sum
Distance matrices: False
Scalers: MinMaxScaler(copy=True, feature_range=(0, 1))
..Projecting on data shaped (99, 2)
..Projecting data using: sum
..Scaling with: MinMaxScaler(copy=True, feature_range=(0, 1))
Mapping on data shaped (99, 2) using lens shaped (99, 1)
Minimal points in hypercube before clustering: 1
Creating 10 hypercubes.
There are 19 points in cube_0 / 10
Found 0 clusters in cube_0
There are 10 points in cube_1 / 10
Found 0 clusters in cube_1
There are 9 points in cube_2 / 10
Found 0 clusters in cube_2
There are 11 points in cube_3 / 10
Found 0 clusters in cube_3
There are 28 points in cube_4 / 10
Found 0 clusters in cube_4
There are 17 points in cube_5 / 10
Found 0 clusters in cube_5
There are 9 points in cube_6 / 10
Found 0 clusters in cube_6
There are 9 points in cube_7 / 10
Found 0 clusters in cube_7
There are 12 points in cube_8 / 10
Found 0 clusters in cube_8
There are 14 points in cube_9 / 10
Found 0 clusters in cube_9
Created 0 edges and 0 nodes in 0:00:00.020664.
kmapper/kmapper.py:133: FutureWarning: reshape is deprecated and will raise in a subsequent release. Please use .values.reshape(...) instead
X = np.sum(X, axis=1).reshape((X.shape[0], 1))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-42-d130d74910ae> in <module>()
12 # Visualize it
13 mapper.visualize(complex, path_html="/home/dhananjay/kepler-mapper/keplermapper-fig8-xaxis.html",
---> 14 title="fig8-xaxis")
15
/home/dhananjay/kepler-mapper/kmapper/kmapper.pyc in visualize(self, graph, color_function, custom_tooltips, custom_meta, path_html, title, save_file, inverse_X, inverse_X_names, projected_X, projected_X_names)
438 """
439
--> 440 color_function = init_color_function(graph, color_function)
441 json_graph = dict_to_json(
442 graph, color_function, inverse_X, inverse_X_names, projected_X, projected_X_names, custom_tooltips)
/home/dhananjay/kepler-mapper/kmapper/visuals.pyc in init_color_function(graph, color_function)
9 # If no color_function provided we color by row order in data set
10 # Reshaping to 2-D array is required for sklearn 0.19
---> 11 n_samples = np.max([i for s in graph["nodes"].values() for i in s]) + 1
12 if color_function is None:
13 color_function = np.arange(n_samples).reshape(-1, 1)
/usr/local/lib/python2.7/dist-packages/numpy/core/fromnumeric.pyc in amax(a, axis, out, keepdims)
2270
2271 return _methods._amax(a, axis=axis,
-> 2272 out=out, **kwargs)
2273
2274
/usr/local/lib/python2.7/dist-packages/numpy/core/_methods.pyc in _amax(a, axis, out, keepdims)
24 # small reductions
25 def _amax(a, axis=None, out=None, keepdims=False):
---> 26 return umr_maximum(a, axis, None, out, keepdims)
27
28 def _amin(a, axis=None, out=None, keepdims=False):
**ValueError: zero-size array to reduction operation maximum which has no identity**
Can you please resolve this?