taxumap's People

Contributors

granthussey, jsevo, nickp60

taxumap's Issues

Results in current directory

Results should be placed in the phylo-umap/results folder, but that doesn't happen. All the logs and embedding.csv end up in the working directory.
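
One way to get the expected behavior is to build the output path explicitly and create the folder before writing. The sketch below is only an illustration: `resolve_output_path` and its `results` default are hypothetical names, not part of taxumap, and the real output handling may differ.

```python
import tempfile
from pathlib import Path


def resolve_output_path(filename, outdir="results"):
    """Return a path for `filename` inside `outdir`, creating `outdir` if needed.

    Hypothetical helper for illustration; not part of taxumap.
    """
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)  # no error if the folder already exists
    return out / filename


# Demo in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    target = resolve_output_path("embedding.csv", outdir=Path(tmp) / "results")
    target.write_text("taxumap-1,taxumap-2\n")
    wrote_into_results = target.exists() and target.parent.name == "results"

print(wrote_into_results)
```

Writing logs through the same helper would keep them out of the working directory as well.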

Explanation for flags in README

It would be good to have a more extensive explanation of how to use the '-a' and '-w' flags. For example, how do you use '-a' if there is only one taxonomic level you want to group at, and is that even possible? Similarly, the '-w' flag explanation doesn't state the limits for weights (0-10?) or whether bigger numbers give more weight to that taxonomic level. An example would be great.

Differences between inputting filenames and dataframes

Hello,

I've found that I was able to get TaxUMAP to work by specifying filenames to the Taxumap function, but providing them as dataframes didn't seem to work.

Here is a screenshot of what portions of the relative abundance and taxonomy dataframes look like:
[image]

And here is the error I get when running transform_self:

Error in rule taxumap:
    jobid: 0
    output: analysis/taxumap/amadeus/output/amadeus_embedding.feather, analysis/taxumap/amadeus/output/amadeus_dominant_taxon.feather

RuleException:
KeyError in line 64 of /Users/funnellt/Projects/PICI_microbiome/workflow/rules/taxumap.smk:
'Methanobrevibacter_smithii'
  File "/Users/funnellt/Projects/PICI_microbiome/workflow/rules/taxumap.smk", line 64, in __rule_taxumap
  File "/Users/funnellt/Projects/phylo-umap/taxumap/taxumap_base.py", line 113, in transform_self
  File "/Users/funnellt/Projects/phylo-umap/taxumap/tools.py", line 42, in tax_agg
  File "/Users/funnellt/Projects/phylo-umap/taxumap/tools.py", line 55, in aggregate_at_taxlevel
  File "/Users/funnellt/Projects/phylo-umap/taxumap/tools.py", line 55, in <listcomp>

I'm running Taxumap like this:

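        import pandas as pd
        from taxumap.taxumap_base import Taxumap

        # `input` is the Snakemake rule's input object
        relab = pd.read_csv(input['relab'])
        tax = pd.read_csv(input['tax'])

        # Pass the loaded dataframes rather than file paths
        taxumap = Taxumap(taxonomy=tax, microbiota_data=relab)
        taxumap.transform_self(
            neigh=28,
            min_dist=0
        )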

However, it works fine if I just specify the filenames like this:

        taxumap = Taxumap(taxonomy=input['tax'], microbiota_data=input['relab'])
        taxumap.transform_self(
            neigh=28,
            min_dist = 0
        )
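
A possible cause, offered as a guess rather than a diagnosis: the traceback ends in the list comprehension in `aggregate_at_taxlevel`, and another traceback on this page shows that function looking taxa up with `tax.loc[x][level]`. That lookup only works if the taxonomy dataframe is indexed by taxon identifier, whereas `pd.read_csv` without `index_col` labels rows 0, 1, 2, and so on. The sketch below reproduces the difference with made-up tables; the column names and the choice of `index_col=0` are assumptions, not taxumap's documented requirements.

```python
import io

import pandas as pd

# Made-up stand-ins for the taxonomy and relative abundance files.
tax_csv = "taxon,Phylum,Family\nt1,Firmicutes,Lachnospiraceae\nt2,Bacteroidetes,Bacteroidaceae\n"
relab_csv = "sample,t1,t2\ns1,0.7,0.3\ns2,0.4,0.6\n"

# Default read: rows are labeled 0, 1, ... so a lookup by taxon name fails.
tax_default = pd.read_csv(io.StringIO(tax_csv))
try:
    tax_default.loc["t1"]["Phylum"]
    default_lookup_failed = False
except KeyError:
    default_lookup_failed = True

# Indexed read: the first column becomes the row labels.
tax = pd.read_csv(io.StringIO(tax_csv), index_col=0)
relab = pd.read_csv(io.StringIO(relab_csv), index_col=0)

# Same lookup pattern as in the traceback: map each abundance column to its phylum.
phyla = [tax.loc[x]["Phylum"] for x in relab.columns]
print(default_lookup_failed, phyla)
```

If this is the cause, passing `index_col` (or calling `set_index`) before constructing `Taxumap` may be enough. Which column taxumap expects as the index is not confirmed here.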

Excessive error messaging when parsing incorrect inputs

There is some overengineering going on with respect to the type checking of inputs.

from taxumap import Taxumap
import numpy
# 500 x 5 array of random values, passed as a numpy array
X = numpy.random.standard_normal(size=(500, 5))
tumap = Taxumap(rel_abundances=X)

This passes the data as a numpy array, which we currently do not accept. It raises repeated errors and logs them, ending in an empty NameError. Let's aim to make that simple type checking less complicated.

tumap = Taxumap(rel_abundances = X)
_name_value_error:ERROR
Please provide the constructor with one of the following: a filepath to your rel_abundances file via parameter `fpx`, or with an initialized variable for your rel_abundances dataframe via the parameter `rel_abundances`
Traceback (most recent call last):
  File "/Users/schluj05/workspace/phylo-umap/taxumap/taxumap/taxumap.py", line 112, in __init__
    raise NameError
NameError
_name_value_error:ERROR
Please provide the constructor with one of the following: a filepath to your rel_abundances file via parameter `fpx`, or with an initialized variable for your rel_abundances dataframe via the parameter `rel_abundances`
Traceback (most recent call last):
  File "/Users/schluj05/workspace/phylo-umap/taxumap/taxumap/taxumap.py", line 112, in __init__
    raise NameError
NameError
_name_value_error:ERROR
Please provide the constructor with one of the following: a filepath to your rel_abundances file via parameter `fpx`, or with an initialized variable for your rel_abundances dataframe via the parameter `rel_abundances`
Traceback (most recent call last):
  File "/Users/schluj05/workspace/phylo-umap/taxumap/taxumap/taxumap.py", line 112, in __init__
    raise NameError
NameError
_name_value_error:ERROR
Please provide the constructor with one of the following: a filepath to your rel_abundances file via parameter `fpx`, or with an initialized variable for your rel_abundances dataframe via the parameter `rel_abundances`
Traceback (most recent call last):
  File "/Users/schluj05/workspace/phylo-umap/taxumap/taxumap/taxumap.py", line 112, in __init__
    raise NameError
NameError
_name_value_error:ERROR
Please provide the constructor with one of the following: a filepath to your rel_abundances file via parameter `fpx`, or with an initialized variable for your rel_abundances dataframe via the parameter `rel_abundances`
Traceback (most recent call last):
  File "/Users/schluj05/workspace/phylo-umap/taxumap/taxumap/taxumap.py", line 112, in __init__
    raise NameError
NameError
_name_value_error:ERROR
Please provide the constructor with one of the following: a filepath to your rel_abundances file via parameter `fpx`, or with an initialized variable for your rel_abundances dataframe via the parameter `rel_abundances`
Traceback (most recent call last):
  File "/Users/schluj05/workspace/phylo-umap/taxumap/taxumap/taxumap.py", line 112, in __init__
    raise NameError
NameError
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-305-d8e2ccc3394d> in <module>
----> 1 tumap = Taxumap(random_state=1, rel_abundances = rX)

~/workspace/phylo-umap/taxumap/taxumap/taxumap.py in __init__(self, agg_levels, weights, rel_abundances, taxonomy, fpt, fpx, name, random_state)
    110             else:
    111                 #TODO currently only file paths or pd.DataFrame accepted. Passing a numpy array raises a NameError without explanation. Adding explanation now, but should be possible to pass a np.array
--> 112                 raise NameError
    113         except (ValueError, NameError) as e:
    114             _name_value_error(e, "fpx", "rel_abundances")

NameError: 
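
A simpler check could fail once, with one message. The sketch below is a hypothetical helper (`load_table` is not a taxumap function) that accepts a file path, a `pandas.DataFrame`, or a numpy array and raises a single `TypeError` for anything else. Reading paths with `index_col=0` is an assumption about the file layout.

```python
import os

import numpy as np
import pandas as pd


def load_table(data, name="rel_abundances"):
    """Return `data` as a DataFrame, or raise one descriptive TypeError.

    Hypothetical helper for illustration; not part of taxumap.
    """
    if isinstance(data, pd.DataFrame):
        return data
    if isinstance(data, np.ndarray):
        # Rows and columns get default integer labels.
        return pd.DataFrame(data)
    if isinstance(data, (str, os.PathLike)):
        return pd.read_csv(data, index_col=0)
    raise TypeError(
        f"{name} must be a file path, a pandas DataFrame, or a numpy array; "
        f"got {type(data).__name__}"
    )


X = np.random.standard_normal(size=(500, 5))
table = load_table(X)
print(table.shape)
```

Integer column labels will not match a taxonomy table, so a real implementation would still need taxon names from somewhere when given a bare array.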

Examples not working

The tutorial example "adjusting_taxumap_parameters" does not work. An error occurs when parsing the input: the path strings provided for the example data (e.g. "example_data/taxonomy.csv") appear not to be parsed correctly. The error arises from one of the nested input validations, which returns None for the taxonomy table.

AttributeError                            Traceback (most recent call last)
Input In [2], in <module>
      1 tu = Taxumap(
      2     taxonomy="./example_data/taxonomy.csv",
      3     microbiota_data="./example_data/microbiota_table.csv",
      4     agg_levels=["Phylum", "Family"],
      5     weights=[1, 1],
      6 )
----> 8 tu.transform_self()
     10 tu.scatter()

File ~/workspace/testtumap/taxumap/taxumap/taxumap_base.py:115, in Taxumap.transform_self(self, scale, debug, distance_metric, **kwargs)
    112     low_precision=np.float(kwargs["low_precision"])
    114 # Shouldn't need `try...except` because any Taxumap object should have proper attributes
--> 115 Xagg = tax_agg(
    116     self.rel_abundances,
    117     self.taxonomy,
    118     self.agg_levels,
    119     distance_metric,
    120     self.weights,
    121     low_precision=low_precision
    122 )
    124 rs = np.random.RandomState(seed=self.random_state)
    126 if self._is_transformed:

File ~/workspace/testtumap/taxumap/taxumap/tools.py:48, in tax_agg(rel_abundances, taxonomy, agg_levels, distance_metric, weights, low_precision)
     46 for agg_level, weight in zip(agg_levels, weights):
     47     warnings.warn("aggregating on %s" % agg_level)
---> 48     Xagg = aggregate_at_taxlevel(_X, taxonomy, agg_level)
     49     Xagg = ssd.cdist(Xagg, Xagg, distance_metric)
     50     Xagg = pd.DataFrame(Xagg, index=_X.index, columns=_X.index)

File ~/workspace/testtumap/taxumap/taxumap/tools.py:60, in aggregate_at_taxlevel(X, tax, level)
     58 """Helper function. For a given taxonomic level, aggregate relative abundances by summing all members of corresponding taxon."""
     59 _X_agg = X.copy()
---> 60 _X_agg.columns = [tax.loc[x][level] for x in _X_agg.columns]
     61 _X_agg = _X_agg.groupby(_X_agg.columns, axis=1).sum()
     62 try:

File ~/workspace/testtumap/taxumap/taxumap/tools.py:60, in <listcomp>(.0)
     58 """Helper function. For a given taxonomic level, aggregate relative abundances by summing all members of corresponding taxon."""
     59 _X_agg = X.copy()
---> 60 _X_agg.columns = [tax.loc[x][level] for x in _X_agg.columns]
     61 _X_agg = _X_agg.groupby(_X_agg.columns, axis=1).sum()
     62 try:

AttributeError: 'NoneType' object has no attribute 'loc'

Remove logging

I suggest we remove the logger completely and replace it with warnings. Warnings are better than print statements because print statements can't easily be suppressed by the user. The current logging system is not very helpful for the user as it stands. What do you think, @jsevo?
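
To illustrate the point about suppression: the traceback in the "Examples not working" issue shows `tools.py` already calling `warnings.warn("aggregating on %s" % agg_level)`. The sketch below uses a toy function, `aggregate`, as a stand-in for that code (it is not taxumap's implementation) and shows how a caller can capture or silence such messages with the standard `warnings` module.

```python
import warnings


def aggregate(levels):
    """Toy stand-in for a library function that reports progress via warnings."""
    for level in levels:
        warnings.warn(f"aggregating on {level}", UserWarning)
    return len(levels)


# Capture the warnings so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    aggregate(["Phylum", "Family"])
messages = [str(w.message) for w in caught]

# A user who wants silence can filter them without touching library code.
with warnings.catch_warnings(record=True) as silenced:
    warnings.simplefilter("ignore")
    n_levels = aggregate(["Phylum", "Family"])

print(messages, len(silenced), n_levels)
```

The filters are restored when each `with` block exits, so the suppression stays local to the call.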
