jsevo / taxumap


License: MIT License

Python 100.00%

taxumap's People

ayeaton marcelladane granthussey nickp60

taxumap's Issues

Requirement for the microbiome table to be named microbiota_table.csv

If you run the command with the -m flag pointing to a directory and file with any other name, it gives an error.

Parsing of various taxonomic formats

are taxonomy tables filled correctly

write some basic tests

Results in current directory

Results should be placed in the phylo-umap/results folder, but that doesn't happen. Instead, all the logs and embedding.csv end up in the working directory.
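A minimal sketch of the expected behavior, assuming a hypothetical helper `save_embedding_csv` and a `results` subfolder under a caller-supplied base directory. Neither the helper name nor its signature is taken from the taxumap code base.

```python
import csv
import tempfile
from pathlib import Path


def save_embedding_csv(rows, base_dir, filename="embedding.csv"):
    """Write rows to <base_dir>/results/<filename> and return the path.

    Hypothetical helper for illustration; not part of taxumap.
    """
    results_dir = Path(base_dir) / "results"
    # Create the folder if needed instead of falling back to the cwd.
    results_dir.mkdir(parents=True, exist_ok=True)
    out_path = results_dir / filename
    with out_path.open("w", newline="") as handle:
        csv.writer(handle).writerows(rows)
    return out_path


# Toy rows (made up) written into a temporary base directory.
with tempfile.TemporaryDirectory() as tmp:
    path = save_embedding_csv([["index", "x", "y"], ["s1", 0.1, 0.2]], tmp)
    written_to_results = path.parent.name == "results" and path.exists()
```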

Taxumap.scatter() throws an error

Likely because the embedding is now stored both as an np.array (.embedding) and as a dataframe with a data-like index (.df_embedding).

notebooks directory unneeded

Do we need the notebooks directory, @jsevo? The one with the R analyses.

Explanation for flags in README

It would be good to have a more extensive explanation of how to use the '-a' and '-w' flags. For example, how should '-a' be used if there is only one taxonomic level you want to group at (and is this possible)? Similarly, the '-w' flag explanation doesn't state the limits for weights (0-10?) or whether bigger numbers mean more weight for that taxonomic level. An example would be great.

TODO: optimize distance calculations at different tax levels

When using cityblock, we won't have to do what's happening here:

taxumap/taxumap/tools.py

Line 20 in 61165d1

def tax_agg(rel_abundances, taxonomy, agg_levels, distance_metric, weights):

The distances can be calculated at ASV level and then the columns of distance matrices can be added based on taxonomy. This distance works, maybe others too.
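One way to read this idea, sketched with toy data and hypothetical helper names (none of them come from tools.py): the signed per-ASV differences between two samples are computed once, then summed within each taxon, and the absolute value is taken only after grouping. For cityblock this reproduces the distance between the aggregated profiles.

```python
from collections import defaultdict


def aggregate(sample, asv_to_taxon):
    """Sum the abundances of all ASVs that belong to the same taxon."""
    totals = defaultdict(float)
    for asv, value in sample.items():
        totals[asv_to_taxon[asv]] += value
    return dict(totals)


def cityblock(u, v):
    """Manhattan distance between two dicts that share the same keys."""
    return sum(abs(u[key] - v[key]) for key in u)


def cityblock_from_asv_diffs(diffs, asv_to_taxon):
    """Group signed ASV-level differences by taxon, then take absolute values."""
    grouped = defaultdict(float)
    for asv, diff in diffs.items():
        grouped[asv_to_taxon[asv]] += diff
    return sum(abs(total) for total in grouped.values())


# Toy data (made up): two samples, three ASVs, two families.
family = {"asv1": "FamA", "asv2": "FamA", "asv3": "FamB"}
s1 = {"asv1": 0.5, "asv2": 0.25, "asv3": 0.25}
s2 = {"asv1": 0.25, "asv2": 0.5, "asv3": 0.25}

direct = cityblock(aggregate(s1, family), aggregate(s2, family))
diffs = {asv: s1[asv] - s2[asv] for asv in s1}
shortcut = cityblock_from_asv_diffs(diffs, family)
```

In this toy case `direct` and `shortcut` are both 0.0, while the plain ASV-level cityblock distance is 0.5. Summing already-absolute ASV-level terms would therefore not give the family-level distance; the signs have to be kept until after grouping.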

Differences between inputting filenames and dataframes

Hello,

I've found that I was able to get TaxUMAP to work by specifying filenames to the Taxumap function, but providing them as dataframes didn't seem to work.

Here is a screenshot of what portions of the relative abundance and taxonomy dataframes look like

And here is the error I get when running transform_self:

Error in rule taxumap:
    jobid: 0
    output: analysis/taxumap/amadeus/output/amadeus_embedding.feather, analysis/taxumap/amadeus/output/amadeus_dominant_taxon.feather

RuleException:
KeyError in line 64 of /Users/funnellt/Projects/PICI_microbiome/workflow/rules/taxumap.smk:
'Methanobrevibacter_smithii'
  File "/Users/funnellt/Projects/PICI_microbiome/workflow/rules/taxumap.smk", line 64, in __rule_taxumap
  File "/Users/funnellt/Projects/phylo-umap/taxumap/taxumap_base.py", line 113, in transform_self
  File "/Users/funnellt/Projects/phylo-umap/taxumap/tools.py", line 42, in tax_agg
  File "/Users/funnellt/Projects/phylo-umap/taxumap/tools.py", line 55, in aggregate_at_taxlevel
  File "/Users/funnellt/Projects/phylo-umap/taxumap/tools.py", line 55, in <listcomp>

I'm running Taxumap like this:

        import pandas as pd
        from taxumap.taxumap_base import Taxumap

        # Read both tables into dataframes before passing them to Taxumap
        relab = pd.read_csv(input['relab'])
        tax = pd.read_csv(input['tax'])

        taxumap = Taxumap(taxonomy=tax, microbiota_data=relab)
        taxumap.transform_self(
            neigh=28,
            min_dist=0
        )

However, it works fine if I just specify the filenames like this:

        taxumap = Taxumap(taxonomy=input['tax'], microbiota_data=input['relab'])
        taxumap.transform_self(
            neigh=28,
            min_dist = 0
        )
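A possible cause, offered as an assumption since the thread does not confirm it: `pd.read_csv` with default arguments keeps the identifier as an ordinary column, whereas the label-based lookup `tax.loc[x][level]` seen in `aggregate_at_taxlevel` needs the identifiers in the index. The sketch below uses a made-up one-row table and an assumed column name `ASV` to show the difference.

```python
import io

import pandas as pd

# Toy taxonomy table (made up); the first column holds the taxon identifier.
tax_csv = io.StringIO(
    "ASV,Phylum,Family\n"
    "Methanobrevibacter_smithii,Euryarchaeota,Methanobacteriaceae\n"
)

# Default read: the identifier is an ordinary column and the index is 0..n-1.
tax_default = pd.read_csv(tax_csv)
try:
    tax_default.loc["Methanobrevibacter_smithii"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Using the first column as the index makes label-based lookup work.
tax_csv.seek(0)
tax_indexed = pd.read_csv(tax_csv, index_col=0)
family = tax_indexed.loc["Methanobrevibacter_smithii"]["Family"]
```

If this is what happens, passing `index_col` when reading both tables may be enough to make the dataframe route behave like the filename route.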

excessive error messaging when parsing incorrect inputs

There is some over-engineering going on with respect to the type checking of inputs.

from taxumap import Taxumap
import numpy
# standard_normal takes `size`, not `shape`
X = numpy.random.standard_normal(size=(500, 5))
tumap = Taxumap(rel_abundances=X)

This passes the data as a numpy array, which we currently do not accept. It raises repeated errors and logs them, ending on an empty NameError. Let's aim to make that simple type checking less complicated.

tumap = Taxumap(rel_abundances = X)
_name_value_error:ERROR
Please provide the constructor with one of the following: a filepath to your rel_abundances file via parameter `fpx`, or with an initialized variable for your rel_abundances dataframe via the parameter `rel_abundances`
Traceback (most recent call last):
  File "/Users/schluj05/workspace/phylo-umap/taxumap/taxumap/taxumap.py", line 112, in __init__
    raise NameError
NameError
_name_value_error:ERROR
Please provide the constructor with one of the following: a filepath to your rel_abundances file via parameter `fpx`, or with an initialized variable for your rel_abundances dataframe via the parameter `rel_abundances`
Traceback (most recent call last):
  File "/Users/schluj05/workspace/phylo-umap/taxumap/taxumap/taxumap.py", line 112, in __init__
    raise NameError
NameError
_name_value_error:ERROR
Please provide the constructor with one of the following: a filepath to your rel_abundances file via parameter `fpx`, or with an initialized variable for your rel_abundances dataframe via the parameter `rel_abundances`
Traceback (most recent call last):
  File "/Users/schluj05/workspace/phylo-umap/taxumap/taxumap/taxumap.py", line 112, in __init__
    raise NameError
NameError
_name_value_error:ERROR
Please provide the constructor with one of the following: a filepath to your rel_abundances file via parameter `fpx`, or with an initialized variable for your rel_abundances dataframe via the parameter `rel_abundances`
Traceback (most recent call last):
  File "/Users/schluj05/workspace/phylo-umap/taxumap/taxumap/taxumap.py", line 112, in __init__
    raise NameError
NameError
_name_value_error:ERROR
Please provide the constructor with one of the following: a filepath to your rel_abundances file via parameter `fpx`, or with an initialized variable for your rel_abundances dataframe via the parameter `rel_abundances`
Traceback (most recent call last):
  File "/Users/schluj05/workspace/phylo-umap/taxumap/taxumap/taxumap.py", line 112, in __init__
    raise NameError
NameError
_name_value_error:ERROR
Please provide the constructor with one of the following: a filepath to your rel_abundances file via parameter `fpx`, or with an initialized variable for your rel_abundances dataframe via the parameter `rel_abundances`
Traceback (most recent call last):
  File "/Users/schluj05/workspace/phylo-umap/taxumap/taxumap/taxumap.py", line 112, in __init__
    raise NameError
NameError
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-305-d8e2ccc3394d> in <module>
----> 1 tumap = Taxumap(random_state=1, rel_abundances = rX)

~/workspace/phylo-umap/taxumap/taxumap/taxumap.py in __init__(self, agg_levels, weights, rel_abundances, taxonomy, fpt, fpx, name, random_state)
    110             else:
    111                 #TODO currently only file paths or pd.DataFrame accepted. Passing a numpy array raises a NameError without explanation. Adding explanation now, but should be possible to pass a np.array
--> 112                 raise NameError
    113         except (ValueError, NameError) as e:
    114             _name_value_error(e, "fpx", "rel_abundances")

NameError:
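A minimal sketch of what a simpler check could look like, assuming a hypothetical helper `load_table` that is not part of taxumap: one function, one pass, and a single TypeError that names the offending argument. Reading with `index_col=0` is also an assumption about the file layout.

```python
import os

import numpy as np
import pandas as pd


def load_table(data, name):
    """Return `data` as a DataFrame or raise one descriptive error.

    Hypothetical helper for illustration; not the taxumap implementation.
    """
    if isinstance(data, pd.DataFrame):
        return data
    if isinstance(data, (str, os.PathLike)):
        # Assumption: the first column of the file holds the row labels.
        return pd.read_csv(data, index_col=0)
    raise TypeError(
        f"{name} must be a pandas DataFrame or a file path, "
        f"got {type(data).__name__}"
    )


# Passing a numpy array now fails once, with an explanation.
X = np.random.standard_normal(size=(500, 5))
try:
    load_table(X, "rel_abundances")
    message = None
except TypeError as err:
    message = str(err)
```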

table requirements: which column needs to be called ASV? should it be ASV/OTU or something generic?

Examples not working

The tutorial example "adjusting_taxumap_parameters" does not work. An error occurs when parsing the input. The path strings pointing to the example data (e.g. "example_data/taxonomy.csv") appear not to be parsed correctly. The error arises from one of the nested input validations, which returns "None" for the taxonomy table.

AttributeError                            Traceback (most recent call last)
Input In [2], in <module>
      1 tu = Taxumap(
      2     taxonomy="./example_data/taxonomy.csv",
      3     microbiota_data="./example_data/microbiota_table.csv",
      4     agg_levels=["Phylum", "Family"],
      5     weights=[1, 1],
      6 )
----> 8 tu.transform_self()
     10 tu.scatter()

File ~/workspace/testtumap/taxumap/taxumap/taxumap_base.py:115, in Taxumap.transform_self(self, scale, debug, distance_metric, **kwargs)
    112     low_precision=np.float(kwargs["low_precision"])
    114 # Shouldn't need `try...except` because any Taxumap object should have proper attributes
--> 115 Xagg = tax_agg(
    116     self.rel_abundances,
    117     self.taxonomy,
    118     self.agg_levels,
    119     distance_metric,
    120     self.weights,
    121     low_precision=low_precision
    122 )
    124 rs = np.random.RandomState(seed=self.random_state)
    126 if self._is_transformed:

File ~/workspace/testtumap/taxumap/taxumap/tools.py:48, in tax_agg(rel_abundances, taxonomy, agg_levels, distance_metric, weights, low_precision)
     46 for agg_level, weight in zip(agg_levels, weights):
     47     warnings.warn("aggregating on %s" % agg_level)
---> 48     Xagg = aggregate_at_taxlevel(_X, taxonomy, agg_level)
     49     Xagg = ssd.cdist(Xagg, Xagg, distance_metric)
     50     Xagg = pd.DataFrame(Xagg, index=_X.index, columns=_X.index)

File ~/workspace/testtumap/taxumap/taxumap/tools.py:60, in aggregate_at_taxlevel(X, tax, level)
     58 """Helper function. For a given taxonomic level, aggregate relative abundances by summing all members of corresponding taxon."""
     59 _X_agg = X.copy()
---> 60 _X_agg.columns = [tax.loc[x][level] for x in _X_agg.columns]
     61 _X_agg = _X_agg.groupby(_X_agg.columns, axis=1).sum()
     62 try:

File ~/workspace/testtumap/taxumap/taxumap/tools.py:60, in <listcomp>(.0)
     58 """Helper function. For a given taxonomic level, aggregate relative abundances by summing all members of corresponding taxon."""
     59 _X_agg = X.copy()
---> 60 _X_agg.columns = [tax.loc[x][level] for x in _X_agg.columns]
     61 _X_agg = _X_agg.groupby(_X_agg.columns, axis=1).sum()
     62 try:

AttributeError: 'NoneType' object has no attribute 'loc'

Installation doesn't work with pip3

I had to go into the phylo-umap folder and run python3 setup.py install to install it.

Fill taxonomy table function not working

Attempting to fill:

Yields:

Re-writing function now.

mismatch between api and documentation

The Taxumap object requires a "microbiome_data" argument, but the description indicates that "relative_abundances" is required. This should be made consistent.

explain that taxumap does not need to always have two levels

show more examples with different levels.

adjusting_taxumap_parameters.ipynb

command line warning from umap (precomputed distance matrix) should be suppressed

UserWarning: using precomputed metric; transform will be unavailable for new data and inverse_transform will be unavailable for all data
  warn(
save_embedding:WARNING
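A minimal sketch of one way to silence only this message with the standard `warnings` module. The function `fit_precomputed` is a hypothetical stand-in that emits the same text; it is not the real UMAP call.

```python
import warnings


def fit_precomputed():
    """Stand-in for the UMAP call; emits the message quoted in this issue."""
    warnings.warn(
        "using precomputed metric; transform will be unavailable for new data "
        "and inverse_transform will be unavailable for all data",
        UserWarning,
    )
    return "embedding"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # baseline: record everything else
    # Ignore only this message; the pattern is matched against its start.
    warnings.filterwarnings(
        "ignore", message="using precomputed metric", category=UserWarning
    )
    result = fit_precomputed()
    warnings.warn("some other warning", UserWarning)
```

Filtering on the message text rather than on `UserWarning` as a whole keeps unrelated warnings visible.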

Remove logging

I suggest we remove the logger completely and replace it with warnings. That is better than print statements, because print statements can't easily be suppressed by the user. The current logging system is not very helpful for the user. What do you think, @jsevo?
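A minimal sketch of the proposal, assuming a hypothetical warning category `TaxumapWarning` and a stand-in `save_embedding` function (neither is taken from the taxumap code): the library calls `warnings.warn`, and the user can opt out with a single filter.

```python
import warnings


class TaxumapWarning(UserWarning):
    """Hypothetical warning category; not defined by taxumap."""


def save_embedding(path=None):
    """Stand-in for a library function that used to log a warning."""
    if path is None:
        warnings.warn("no output path given; embedding not saved", TaxumapWarning)
        return False
    return True


# Default behaviour: the warning is visible to the user.
with warnings.catch_warnings(record=True) as shown:
    warnings.simplefilter("always")
    save_embedding()

# Opt-out: one filter line silences only this library's warnings.
with warnings.catch_warnings(record=True) as silenced:
    warnings.simplefilter("always")
    warnings.filterwarnings("ignore", category=TaxumapWarning)
    save_embedding()
```

A dedicated category is a design choice, not a requirement; it lets users silence the library without hiding warnings from other packages.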