jsevo / taxumap
License: MIT License
If you run the command with a directory and file of any other name under the -m flag, it gives an error.
Results should be placed in the phylo-umap/results folder, but that doesn't happen. All the logs and embedding.csv end up in the working directory instead.
Do we need the notebook directory @jsevo ? The one with the R analyses.
It would be good to have a more extensive explanation of how to use the '-a' and '-w' flags. For example, how should '-a' be used if there is only one taxonomic level to group at (and is this possible)? Similarly, the '-w' flag explanation doesn't state the limits for weights (0-10?) or whether bigger numbers give more weight to that taxonomic level. An example would be great.
When using cityblock, we won't have to do what's happening here:
Line 20 in 61165d1
The distances can be calculated at ASV level and then the columns of distance matrices can be added based on taxonomy. This distance works, maybe others too.
Hello,
I've found that I was able to get TaxUMAP to work by specifying filenames to the Taxumap function, but providing them as dataframes didn't seem to work.
Here is a screenshot of what portions of the relative abundance and taxonomy dataframes look like
And here is the error I get when running transform_self:
Error in rule taxumap:
jobid: 0
output: analysis/taxumap/amadeus/output/amadeus_embedding.feather, analysis/taxumap/amadeus/output/amadeus_dominant_taxon.feather
RuleException:
KeyError in line 64 of /Users/funnellt/Projects/PICI_microbiome/workflow/rules/taxumap.smk:
'Methanobrevibacter_smithii'
File "/Users/funnellt/Projects/PICI_microbiome/workflow/rules/taxumap.smk", line 64, in __rule_taxumap
File "/Users/funnellt/Projects/phylo-umap/taxumap/taxumap_base.py", line 113, in transform_self
File "/Users/funnellt/Projects/phylo-umap/taxumap/tools.py", line 42, in tax_agg
File "/Users/funnellt/Projects/phylo-umap/taxumap/tools.py", line 55, in aggregate_at_taxlevel
File "/Users/funnellt/Projects/phylo-umap/taxumap/tools.py", line 55, in <listcomp>
I'm running Taxumap like this:
import pandas as pd
from taxumap.taxumap_base import Taxumap

# `input` is the input object of the Snakemake rule
relab = pd.read_csv(input['relab'])
tax = pd.read_csv(input['tax'])
taxumap = Taxumap(taxonomy=tax, microbiota_data=relab)
taxumap.transform_self(
    neigh=28,
    min_dist=0
)
However, it works fine if I just specify the filenames like this:
taxumap = Taxumap(taxonomy=input['tax'], microbiota_data=input['relab'])
taxumap.transform_self(
neigh=28,
min_dist = 0
)
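One possible explanation, offered as a guess rather than a confirmed diagnosis: the traceback ends in a label lookup inside aggregate_at_taxlevel, and a KeyError on a taxon name is what pandas raises when a dataframe read with the default integer index is queried by that name. The sketch below uses small made-up tables (the column names and values are illustrative assumptions, not the real data) and assumes the identifiers sit in the first column of each CSV, so that index_col=0 makes them the index.

```python
import io
import pandas as pd

# Hypothetical toy tables; column names and values are illustrative only.
tax_csv = io.StringIO(
    "ASV,Phylum,Family\n"
    "Methanobrevibacter_smithii,Euryarchaeota,Methanobacteriaceae\n"
    "Escherichia_coli,Proteobacteria,Enterobacteriaceae\n"
)
relab_csv = io.StringIO(
    "sample,Methanobrevibacter_smithii,Escherichia_coli\n"
    "sample_1,0.25,0.75\n"
    "sample_2,0.5,0.5\n"
)

# Default read: the index is 0, 1, ..., so a lookup by taxon name fails.
tax_default = pd.read_csv(tax_csv)
try:
    tax_default.loc["Methanobrevibacter_smithii"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Reading with index_col=0 turns the first column into the index,
# which is what a label-based lookup needs.
tax_csv.seek(0)
tax = pd.read_csv(tax_csv, index_col=0)
relab = pd.read_csv(relab_csv, index_col=0)

# Same kind of lookup as the list comprehension in the traceback.
families = [tax.loc[name]["Family"] for name in relab.columns]
```

If this is the cause, passing dataframes read with index_col=0 (or calling set_index on the identifier column) may be enough; whether Taxumap expects exactly this layout is not confirmed here.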
There is some overengineering going on with respect to the type checking of inputs.
from taxumap import Taxumap
import numpy
# standard_normal takes `size`, not `shape`
X = numpy.random.standard_normal(size=(500, 5))
tumap = Taxumap(rel_abundances = X)
This passes the data as a numpy array, which we currently do not accept. It raises repeated errors and logs them, ending with an empty NameError. Let's aim to make that simple type checking less complicated.
tumap = Taxumap(rel_abundances = X)
_name_value_error:ERROR
Please provide the constructor with one of the following: a filepath to your rel_abundances file via parameter `fpx`, or with an initialized variable for your rel_abundances dataframe via the parameter `rel_abundances`
Traceback (most recent call last):
File "/Users/schluj05/workspace/phylo-umap/taxumap/taxumap/taxumap.py", line 112, in __init__
raise NameError
NameError
_name_value_error:ERROR
Please provide the constructor with one of the following: a filepath to your rel_abundances file via parameter `fpx`, or with an initialized variable for your rel_abundances dataframe via the parameter `rel_abundances`
Traceback (most recent call last):
File "/Users/schluj05/workspace/phylo-umap/taxumap/taxumap/taxumap.py", line 112, in __init__
raise NameError
NameError
_name_value_error:ERROR
Please provide the constructor with one of the following: a filepath to your rel_abundances file via parameter `fpx`, or with an initialized variable for your rel_abundances dataframe via the parameter `rel_abundances`
Traceback (most recent call last):
File "/Users/schluj05/workspace/phylo-umap/taxumap/taxumap/taxumap.py", line 112, in __init__
raise NameError
NameError
_name_value_error:ERROR
Please provide the constructor with one of the following: a filepath to your rel_abundances file via parameter `fpx`, or with an initialized variable for your rel_abundances dataframe via the parameter `rel_abundances`
Traceback (most recent call last):
File "/Users/schluj05/workspace/phylo-umap/taxumap/taxumap/taxumap.py", line 112, in __init__
raise NameError
NameError
_name_value_error:ERROR
Please provide the constructor with one of the following: a filepath to your rel_abundances file via parameter `fpx`, or with an initialized variable for your rel_abundances dataframe via the parameter `rel_abundances`
Traceback (most recent call last):
File "/Users/schluj05/workspace/phylo-umap/taxumap/taxumap/taxumap.py", line 112, in __init__
raise NameError
NameError
_name_value_error:ERROR
Please provide the constructor with one of the following: a filepath to your rel_abundances file via parameter `fpx`, or with an initialized variable for your rel_abundances dataframe via the parameter `rel_abundances`
Traceback (most recent call last):
File "/Users/schluj05/workspace/phylo-umap/taxumap/taxumap/taxumap.py", line 112, in __init__
raise NameError
NameError
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-305-d8e2ccc3394d> in <module>
----> 1 tumap = Taxumap(random_state=1, rel_abundances = rX)
~/workspace/phylo-umap/taxumap/taxumap/taxumap.py in __init__(self, agg_levels, weights, rel_abundances, taxonomy, fpt, fpx, name, random_state)
110 else:
111 #TODO currently only file paths or pd.DataFrame accepted. Passing a numpy array raises a NameError without explanation. Adding explanation now, but should be possible to pass a np.array
--> 112 raise NameError
113 except (ValueError, NameError) as e:
114 _name_value_error(e, "fpx", "rel_abundances")
NameError:
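As a rough illustration of what simpler input checking could look like, here is a minimal sketch. The helper name coerce_to_dataframe is hypothetical and not part of taxumap, and reading files with index_col=0 is an assumption about the file layout. It checks the type once and raises a single descriptive error instead of a bare NameError.

```python
from pathlib import Path

import numpy as np
import pandas as pd


def coerce_to_dataframe(data, name="rel_abundances"):
    """Return `data` as a DataFrame, or raise one descriptive error.

    Hypothetical helper for illustration; not part of taxumap.
    """
    if isinstance(data, pd.DataFrame):
        return data
    if isinstance(data, np.ndarray):
        if data.ndim != 2:
            raise ValueError(f"`{name}` must be 2-dimensional, got ndim={data.ndim}")
        # An array carries no labels, so default integer labels are used.
        return pd.DataFrame(data)
    if isinstance(data, (str, Path)):
        # Assumes the first column holds the row identifiers.
        return pd.read_csv(data, index_col=0)
    raise TypeError(
        f"`{name}` must be a file path, pandas.DataFrame, or numpy.ndarray, "
        f"got {type(data).__name__}"
    )


X = np.random.standard_normal(size=(500, 5))
df = coerce_to_dataframe(X)

# An unsupported type produces one clear message.
try:
    coerce_to_dataframe(42)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
```

Note that an unlabeled array cannot be matched against a taxonomy table, so accepting numpy input would presumably also require the caller to supply column labels.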
The tutorial example "adjusting_taxumap_parameters" does not work. An error occurs when parsing the input. The file path strings provided for the example data ("example_data/taxonomy.csv") appear not to be parsed correctly. The error arises from one of the nested input validations, which returns "None" for the taxonomy table.
AttributeError Traceback (most recent call last)
Input In [2], in <module>
1 tu = Taxumap(
2 taxonomy="./example_data/taxonomy.csv",
3 microbiota_data="./example_data/microbiota_table.csv",
4 agg_levels=["Phylum", "Family"],
5 weights=[1, 1],
6 )
----> 8 tu.transform_self()
10 tu.scatter()
File ~/workspace/testtumap/taxumap/taxumap/taxumap_base.py:115, in Taxumap.transform_self(self, scale, debug, distance_metric, **kwargs)
112 low_precision=np.float(kwargs["low_precision"])
114 # Shouldn't need `try...except` because any Taxumap object should have proper attributes
--> 115 Xagg = tax_agg(
116 self.rel_abundances,
117 self.taxonomy,
118 self.agg_levels,
119 distance_metric,
120 self.weights,
121 low_precision=low_precision
122 )
124 rs = np.random.RandomState(seed=self.random_state)
126 if self._is_transformed:
File ~/workspace/testtumap/taxumap/taxumap/tools.py:48, in tax_agg(rel_abundances, taxonomy, agg_levels, distance_metric, weights, low_precision)
46 for agg_level, weight in zip(agg_levels, weights):
47 warnings.warn("aggregating on %s" % agg_level)
---> 48 Xagg = aggregate_at_taxlevel(_X, taxonomy, agg_level)
49 Xagg = ssd.cdist(Xagg, Xagg, distance_metric)
50 Xagg = pd.DataFrame(Xagg, index=_X.index, columns=_X.index)
File ~/workspace/testtumap/taxumap/taxumap/tools.py:60, in aggregate_at_taxlevel(X, tax, level)
58 """Helper function. For a given taxonomic level, aggregate relative abundances by summing all members of corresponding taxon."""
59 _X_agg = X.copy()
---> 60 _X_agg.columns = [tax.loc[x][level] for x in _X_agg.columns]
61 _X_agg = _X_agg.groupby(_X_agg.columns, axis=1).sum()
62 try:
File ~/workspace/testtumap/taxumap/taxumap/tools.py:60, in <listcomp>(.0)
58 """Helper function. For a given taxonomic level, aggregate relative abundances by summing all members of corresponding taxon."""
59 _X_agg = X.copy()
---> 60 _X_agg.columns = [tax.loc[x][level] for x in _X_agg.columns]
61 _X_agg = _X_agg.groupby(_X_agg.columns, axis=1).sum()
62 try:
AttributeError: 'NoneType' object has no attribute 'loc'
I had to go into the phylo-umap folder and run python3 setup.py install to install it.
The Taxumap object requires a "microbiome_data" argument, but the description says "relative_abundances" are required. This should be made consistent.
Show more examples with different levels.
adjusting_taxumap_parameters.ipynb
UserWarning: using precomputed metric; transform will be unavailable for new data and inverse_transform will be unavailable for all data warn(save_embedding:WARNING
I suggest we remove the logger completely and replace with warnings. That is better than print statements because print statements can't easily be suppressed by the user. The current logging system as-is is not very helpful for the user. What do you think, @jsevo ?
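To illustrate the point about suppression, here is a small sketch using only the standard library warnings module. The function save_embedding_notice and its message are hypothetical stand-ins, not taxumap code.

```python
import warnings


def save_embedding_notice():
    # Hypothetical stand-in for a library function that currently logs.
    warnings.warn("embedding was written to the working directory", UserWarning)
    return "done"


# A caller can silence the warning locally without touching library code.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", UserWarning)
    quiet_result = save_embedding_notice()

# Or capture it for inspection, for example in a test.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    save_embedding_notice()

messages = [str(w.message) for w in caught]
```

Neither option is available for plain print statements, which is the main argument for warnings here.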