
Songbird

What does Songbird produce?

The primary output from Songbird is a file containing differentials. These describe the log-fold change of features (microbes, metabolites, ...) with respect to certain field(s) in your sample metadata. The most important aspect of these differentials is their rankings, which are obtained by sorting a column of differentials from lowest to highest. These rankings give information on the relative associations of features with a given covariate.
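As a quick sketch of what "ranking" means here, using pandas and a made-up differentials column (the feature IDs and values below are hypothetical; in practice you would load your own differentials file):

```python
import pandas as pd

# Hypothetical differentials for five features and one covariate column.
# In practice you would load these with something like
# pd.read_csv("differentials.tsv", sep="\t", index_col=0).
diffs = pd.DataFrame(
    {"Depth": [0.8, -1.2, 0.1, 2.3, -0.5]},
    index=["K00001", "K00002", "K00003", "K00004", "K00005"],
)

# Ranking = sorting one column of differentials from lowest to highest.
ranked = diffs["Depth"].sort_values()

# Features at the extremes are the most strongly associated with the
# low or high end of the covariate.
print(ranked.index[0])   # most negatively associated feature
print(ranked.index[-1])  # most positively associated feature
```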

For more details, please see the paper introducing Songbird.

What do I need to run Songbird?

Songbird has a lot of parameters you can specify, but the three required parameters are:

  1. A BIOM table containing feature counts for the samples in your dataset
  2. A sample metadata file containing the covariates you're interested in studying (this should be a tab-separated file)
  3. A "formula" specifying the covariates to be included in the model Songbird produces, and their interactions (see this section for details)

How do I run Songbird?

You can run Songbird as a standalone tool from the command-line or as a QIIME 2 plugin. Either works!

The "Red Sea" dataset

This README's tutorials use a subset of the Red Sea metagenome dataset as an example dataset. This data is stored in the data/redsea/ directory within this repository. Throughout the rest of this README, we'll just refer to this dataset as "the Red Sea dataset", "Red Sea", or something like that.

Features in this dataset are KEGG orthologs (KOs) -- see the dataset's paper, in particular this section, for more details.

What will this README cover?

This README is divided into a few main sections:

  1. Using Songbird "standalone"
  2. Using Songbird through QIIME 2
  3. Specifying a formula
  4. Interpreting model fitting information
  5. Adjusting parameters to get reasonable fitting
  6. Validating by comparing to null/baseline models
  7. FAQs
  8. Visualizing Songbird's differentials
  9. More information

1. Using Songbird "standalone"

Installation

First off, make sure that conda is installed.

Songbird can be installed from the conda-forge channel as follows:

conda create -n songbird_env songbird "pandas>=0.18.0,<1" -c conda-forge
source activate songbird_env
Sidenote

If you get an error about "redirecting" from that first command, you probably need to add a \ (a backslash character) before the > and < characters. This can be a problem if you're using fish as your shell.

Running Songbird

Let's try running Songbird on the "Red Sea" example data included with this repository. You can just copy-and-paste the command below into your terminal, assuming you've activated your songbird_env.

songbird multinomial \
	--input-biom data/redsea/redsea.biom \
	--metadata-file data/redsea/redsea_metadata.txt \
	--formula "Depth+Temperature+Salinity+Oxygen+Fluorescence+Nitrate" \
	--epochs 10000 \
	--differential-prior 0.5 \
	--training-column Testing \
	--summary-interval 1 \
	--summary-dir results

The output differentials will be stored in results/differentials.tsv.

REQUIRED: Checking the fit of your model

You need to diagnose the fit of the model Songbird creates (e.g. to make sure it isn't "overfitting" to the data). When running Songbird standalone, you can do this using Tensorboard:

tensorboard --logdir .

When you open up Tensorboard in a web browser, it will show plots of the cross validation score and loss. See this section on interpreting model fitting for details on how to understand these plots, and see the section of the FAQs on running Songbird standalone for details on how to use Tensorboard.

Once you have a good fit for your model, you need to run a null model for comparison to make sure that your model is more predictive than the null model. See this section on how to run and compare a null model.

A note about metadata and running Songbird standalone

If you have any "comment rows" in your metadata -- for example, a #q2:types row -- you will need to delete these before running Songbird standalone, since otherwise standalone Songbird will interpret these rows as actual samples. (We're planning on addressing this in the future.)
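Until then, a minimal way to drop such rows yourself is to keep the header line and filter out any later line starting with `#` (the column names below are made up for illustration):

```python
# Sketch: strip "#"-prefixed comment rows (such as "#q2:types") from
# metadata lines before a standalone Songbird run. The header line is
# always kept, since some metadata headers themselves start with "#".

def strip_comment_rows(lines):
    """Keep the header (first line) and any later row not starting with '#'."""
    header, *rows = lines
    return [header] + [r for r in rows if not r.startswith("#")]

raw = [
    "sampleid\tDepth\tTemperature",
    "#q2:types\tnumeric\tnumeric",
    "S1\t10\t25.3",
    "S2\t50\t22.1",
]
cleaned = strip_comment_rows(raw)
print(len(cleaned))  # 3 lines: header + two samples
```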

2. Using Songbird through QIIME 2

Installation

First, you'll need to make sure that QIIME 2 (any version from 2019.7 through 2020.6) is installed before installing Songbird. In your QIIME 2 conda environment, you can install Songbird by running:

pip install songbird

Next, run the following command to get QIIME 2 to "recognize" Songbird:

qiime dev refresh-cache

Importing data into QIIME 2

If your data is not already in QIIME 2 format, you can import your BIOM tables into QIIME 2 "Artifacts". Starting from the Songbird root folder, let's import the Red Sea example data into QIIME 2 by running:

qiime tools import \
	--input-path data/redsea/redsea.biom \
	--output-path redsea.biom.qza \
	--type "FeatureTable[Frequency]"

Running Songbird

With your QIIME 2-formatted feature table, you can run Songbird through QIIME 2 as follows:

qiime songbird multinomial \
	--i-table redsea.biom.qza \
	--m-metadata-file data/redsea/redsea_metadata.txt \
	--p-formula "Depth+Temperature+Salinity+Oxygen+Fluorescence+Nitrate" \
	--p-epochs 10000 \
	--p-differential-prior 0.5 \
	--p-training-column Testing \
	--p-summary-interval 1 \
	--o-differentials differentials.qza \
	--o-regression-stats regression-stats.qza \
	--o-regression-biplot regression-biplot.qza

You can add the --verbose option to see a progress bar while this is running.

REQUIRED: Checking the fit of your model

You need to diagnose the fit of the model Songbird creates (e.g. to make sure it isn't "overfitting" to the data). When you run Songbird through QIIME 2, it will generate a regression-stats.qza Artifact. This can be visualized by running:

qiime songbird summarize-single \
	--i-regression-stats regression-stats.qza \
	--o-visualization regression-summary.qzv

The resulting visualization (viewable using qiime tools view or at view.qiime2.org) contains two plots. These plots are analogous to the two plots shown in Tensorboard's interface (the top plot shows cross-validation results, and the bottom plot shows loss information). The interpretation of these plots is the same as with the Tensorboard plots: see this section on interpreting model fitting for details on how to understand these plots.

Once you have a good fit for your model, you need to run a null model for comparison to make sure that your model is more predictive than the null model. See this section on how to run and compare a null model.

3. Specifying a formula

Hang on, what is a formula?

A formula specifies the statistical model to be built based on the columns in the metadata file. For example, if a user wanted to build a statistical model testing for differences between disease states while controlling for sex, the formula would look something like:

--formula "diseased+sex"

where diseased and sex are columns in the sample metadata file. This is similar to the statistical formulas used in R, but the order in which you specify the variables (a.k.a. column names) is not important. The backend we use here for processing formulas is called patsy.

The metadata columns used in the --formula can be either numeric or categorical. As you can imagine, there are many possible ways to specify metadata columns in your formula -- in particular, encoding categorical variables can get tricky. (For example, should a nominal variable like sex be encoded the same way as an ordinal variable like disease_progression? Probably not!) In this section we'll go over a few common cases, and link to some more general documentation at the end in case these examples aren't sufficient.

The implicit "reference": how categorical variables are handled

Let's say your formula just includes one categorical variable:

--formula "healthy_or_sick"

...where the only possible values of healthy_or_sick are healthy and sick. The output differentials produced by Songbird will only have two columns:

  1. Intercept
  2. healthy_or_sick[T.healthy] OR healthy_or_sick[T.sick].

The second differential column indicates association with the given value: healthy_or_sick[T.healthy] differentials indicate association with healthy-valued samples using sick-valued samples as a reference, and healthy_or_sick[T.sick] differentials indicate association with sick-valued samples using healthy-valued samples as a reference. You'll only get one of these columns; the choice of reference value, if left unspecified, is arbitrary.

The reference value is used as the denominator in the log-fold change computation of differentials. So, for healthy_or_sick[T.healthy] -- a differential column where sick-valued samples are implicitly set as the reference -- the features with the most negative differential ranking values will be more associated with sick-valued samples, whereas the features with the most positive differential ranking values will be more associated with healthy-valued samples.

It is possible to explicitly set the reference value in your formula. Going back to the healthy_or_sick example, if you know you want to use healthy-valued samples as a reference, you can describe this in the formula like so:

--formula "C(healthy_or_sick, Treatment('healthy'))"

This will ensure that your second column of differentials is C(healthy_or_sick, Treatment('healthy'))[T.sick] -- that is, the association with sick-valued samples using healthy-valued samples as a reference.

If you have more than two categories in a given column (e.g. healthy, disease A, or disease B) then Songbird will return K-1 columns of differentials (where K is the number of categories in a given column) with each category compared to the reference. In this example it would return two columns; one with disease A compared to healthy and one with disease B compared to healthy.
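You can preview exactly which differential columns a formula will produce by running it through patsy directly (patsy is the formula backend Songbird uses; the toy metadata below is made up):

```python
import pandas as pd
from patsy import dmatrix  # patsy is the formula backend Songbird uses

# Toy metadata: one two-category column and one three-category column.
metadata = pd.DataFrame({
    "healthy_or_sick": ["healthy", "sick", "sick", "healthy"],
    "disease_state": ["healthy", "disease A", "disease B", "healthy"],
})

# Explicitly set 'healthy' as the reference, as in the formula above.
design = dmatrix("C(healthy_or_sick, Treatment('healthy'))", metadata)
print(design.design_info.column_names)
# ['Intercept', "C(healthy_or_sick, Treatment('healthy'))[T.sick]"]

# With K categories you get K-1 non-intercept columns (here, K=3 gives 2).
design3 = dmatrix("C(disease_state, Treatment('healthy'))", metadata)
print(len(design3.design_info.column_names) - 1)
```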

Do you have some more simple examples of using formulas?

Sure! In general, it's a good idea to read over the patsy documentation -- there are a lot of ways you can specify formulas. However, here are a few small examples of common formula uses.

Example 1: I have a categorical metadata field, and I want to explicitly set the reference

This was described above. Basically, you'll want to specify a formula of something like

--formula "C(your_metadata_field, Treatment('desired_reference_value'))"

See this blog post for more details.

Example 2: I have a categorical metadata field with ordered values, and I want to account for that

See this blog post for details on how to do this.

Basically, you'll want to specify a formula of something like

--formula "C(your_metadata_field, Diff, levels=['first_level', 'second_level', 'third_level'])"

How many variables can be passed into a formula?

That depends on the number of samples you have -- the rule of thumb is that the number of variables should be no more than about 10% of your number of samples. So if you have 100 samples, your formula should not have more than 10 variables. Use this measure with caution, since the number of categories also matters: a categorical variable with k categories counts as k-1 variables, so a column with 3 categories is represented as 2 variables in the model. Continuous (numerical) variables only count as 1 variable each. Beware of overfitting, though! You can mitigate the risk of overfitting by adjusting the --differential-prior parameter. For more information on --differential-prior and other Songbird parameters, please see this section of the FAQs on parameters.
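The counting rule above can be sketched in a few lines (the mix of columns here is hypothetical):

```python
# Sketch of the variable-counting rule of thumb described above.
# Each categorical column with k categories contributes k - 1 variables;
# each continuous column contributes 1.

def count_model_variables(categorical_levels, n_continuous):
    """categorical_levels: list of category counts, one per categorical column."""
    return sum(k - 1 for k in categorical_levels) + n_continuous

# Example: one 3-category column, one 2-category column, two continuous columns.
n_vars = count_model_variables([3, 2], 2)
print(n_vars)  # 2 + 1 + 2 = 5

# Rule of thumb: with 100 samples, keep this at or below ~10.
print(n_vars <= 100 * 0.10)
```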

Further documentation on patsy formulas

In case the above examples didn't cut it, here are some links to external documentation that may be helpful, sorted very roughly into "less" and "more" technical categories:

A bit less technical
A bit more technical

If you have any other resources you'd like to see added here, feel free to send a PR/issue.

Hey, isn't it "formulae" instead of "formulas"?

That's debatable!

4. Interpreting model fitting information

Regardless of whether you run Songbird standalone or through QIIME 2, you'll need to check up on how Songbird's model has fit to your dataset.

If you ran Songbird standalone, you can open up Tensorboard in your results directory (or the directory above your results directory) to see something like this:

Tensorboard shown for the standalone Songbird run on the Red Sea data (You may need to wait for a while for Tensorboard to fully load everything. Refreshing the page seems to help.)

And if you ran Songbird through QIIME 2, you can open up your regression-summary.qzv using qiime tools view or using view.qiime2.org to see something like this:

Summary of the QIIME 2 Songbird run on the Red Sea data (Note that this screenshot has been zoomed out a lot in order to show the full display.)

Hey! I don't see anything in my plots, what's up with that?

If you don't see anything in these plots, or if the plots only show a small handful of points, you probably need to decrease your --summary-interval/--p-summary-interval parameter. This parameter (specified in seconds) impacts how often a "measurement" is taken for these plots. By default, it's set to 10 seconds -- but if your Songbird runs are finishing up in only a few seconds, that isn't very helpful!

Try setting the --summary-interval/--p-summary-interval to 1 to record the loss at every second. This should give you more detail in these plots. (If what you want to do is make Songbird run longer, i.e. over more iterations, then you can increase the --epochs/--p-epochs parameter -- more on this later.)

Also, as mentioned above: if you're using Tensorboard, you may also need to refresh the graph a few times to get stuff to show up.

4.1. Explaining Graph 1: cv_error or Cross validation score

This is a graph of the prediction accuracy of the model: the model tries to guess the count values for the test samples that were set aside in the command above, using only the metadata categories it has. Then it looks at the real values and sees how close it was.

The x-axis is the number of iterations (meaning the number of times the model has trained across the entire dataset). Every time you iterate across the training samples, the model also runs on the test samples, and the averaged results are plotted on the y-axis.

The number of iterations is influenced by a number of parameters, most notably by the --epochs/--p-epochs parameter. All else being equal, the greater this number is the more iterations the model will go through.

The y-axis is the average number of counts off for each feature. The model is predicting the sequence counts for each feature in the samples that were set aside for testing. So in the Cross Validation graphs shown above it means that, on average, the model is off by around 5 to 10 counts, which is low. However, this is ABSOLUTE error -- not relative error (unfortunately we can’t do relative errors because of the sparsity of metagenomic datasets).

How can I tell if this graph "looks good"?

The raw numbers will be variable, but the most important thing is the shape of the graph. You want to see exponential decay and a stable plateau. The cross validation graphs shown in the Tensorboard/QIIME 2 summaries above are good examples. Importantly, you need to compare your model to a null model to ensure that the covariates entered into the formula are improving the model fit. More details on this can be found in section 6: Validating by comparing to null/baseline models.

4.2. Explaining Graph 2: loss

This graph is labelled "loss" because "loss" is the function being optimized. The goal here is to reduce the error of the training samples.

This graph represents how well the model fits your data. Just like the prediction accuracy graph, the x-axis is the number of iterations (meaning times the model is training across the entire dataset).

The y-axis is the negative log probability of the model fitting the data -- so LOWER is better (maximizing the probability = minimizing the negative log probability).

How can I tell if this graph "looks good"?

Again, the numbers vary greatly by dataset, but you want to see the curve decaying and plateau as close to zero as possible (the loss graphs shown in the Tensorboard/QIIME 2 summaries above are nice).

5. Adjusting parameters to get reasonable fitting

An introductory note

It's worth noting that, ignoring stuff like --output-dir, the only required parameters to Songbird are a feature table, metadata, and a formula.

In the examples using the Red Sea dataset above, the reason we changed the --epochs and --differential-prior to different values from the default was due to consulting Tensorboard to optimize parameter fit.

Okay, so how should I adjust parameters to get my model to fit properly?

It's recommended to start with a small formula (with only a few variables in the model) and increase from there, because it makes debugging easier. (See this section on specifying formulas for more information.) If your graphs are going down but not exponentially and not plateauing, you should consider increasing the number of iterations by, for example, increasing --epochs/--p-epochs.

If your graphs are going down but then going back up, this suggests overfitting; try reducing the number of variables in your formula, or reducing --differential-prior/--p-differential-prior. As a rule of thumb, you should try to keep the number of metadata categories less than 10% the number of samples (e.g. for 100 samples, no more than 10 metadata categories).

If you have a lot of samples, you may want to try increasing the --num-random-test-examples/--p-num-random-test-examples and/or --batch-size/--p-batch-size parameters. The former specifies how many samples will be held out of model training and used for model testing. As a rule of thumb, hold out ~10-20% of your samples.

If your sample categories are not evenly distributed (e.g. you have 100 healthy samples but only 20 sick samples), you can specify which samples will be used for training versus testing. By default, the train/test samples are randomly assigned. You can add a column to your metadata file that marks each sample as either Train or Test (these values are case sensitive!) and specify this column with --training-column/--p-training-column. Make sure that a relatively even number of each sample category is in the Train and Test groups (i.e. so you are not training only on healthy samples and testing only on sick ones). More information on specifying the training column can be found here.

Is there anything else I can do?

There are other parameters besides the ones mentioned above that you can tweak to adjust your model's fitting. Run songbird multinomial --help (standalone) or qiime songbird multinomial --help (QIIME 2) to get a list of all available parameters.

6. Validating by comparing to null/baseline models

6.1. Generating a null model

Now that we have generated a model and understand how to interpret the diagnostic plots, it is important to compare this model to a model made without any metadata input (a "null model"). In other words, this will allow us to see how strongly the covariates in the formula can be associated with the features of the model, as compared to random chance. We can do this by simply supplying --formula/--p-formula with "1" as shown below.

Please note that the standalone and QIIME 2 versions of Songbird will get you different things. Both show you a plot with multiple model fitting curves, but:

  1. The standalone version of Songbird has support through Tensorboard for comparing model fitting (see the screenshot below).

  2. The QIIME 2 version of Songbird uses a much simpler interface. It also includes a Q2 value.

We recommend trying this out with the QIIME 2 version at first.

6.1.1. Null models and "standalone" Songbird

If you're using Songbird standalone, Tensorboard makes it particularly easy to compare different parameters: if you simply change a parameter and run Songbird again (under a different output file name) that graph will pop up on top of the first graphs in Tensorboard! Let's generate a null model as follows:

songbird multinomial \
	--input-biom data/redsea/redsea.biom \
	--metadata-file data/redsea/redsea_metadata.txt \
	--formula "1" \
	--epochs 10000 \
	--differential-prior 0.5 \
	--training-column Testing \
	--summary-interval 1 \
	--summary-dir results

Notice how this is the same command that we used before (e.g. we have the same --differential-prior and --epochs), with the exception that the --formula is now just "1".

Let's open up Tensorboard. You can toggle the graphs on and off in the lower left hand panel, and readjust the axes for a given graph (or set of graphs) by clicking the blue expansion rectangle underneath the graph. Within Tensorboard, you can also view a summary of final results under the HPARAMS tab -- that will look something like this:

HParams for the Red Sea Data

Above, the model with the lowest cross-validation error is highlighted in green. You can see how its performance compares to other models that were fit on this dataset with various parameters. Additionally, the model's diagnostic plots are shown, which will have an exponential decay and a stable plateau if the model converged.

6.1.2. Null models and QIIME 2 + Songbird

If you're running Songbird through QIIME 2, the qiime songbird summarize-paired command allows you to view two sets of diagnostic plots at once as follows:

# Generate a null model
qiime songbird multinomial \
	--i-table redsea.biom.qza \
	--m-metadata-file data/redsea/redsea_metadata.txt \
	--p-formula "1" \
	--p-epochs 10000 \
	--p-differential-prior 0.5 \
	--p-training-column Testing \
	--p-summary-interval 1 \
	--o-differentials null-diff.qza \
	--o-regression-stats null-stats.qza \
	--o-regression-biplot null-biplot.qza

# Visualize the first model's regression stats *and* the null model's
# regression stats
qiime songbird summarize-paired \
	--i-regression-stats regression-stats.qza \
	--i-baseline-stats null-stats.qza \
	--o-visualization paired-summary.qzv

The summary generated will look something like the following:

Summary of the QIIME 2 Songbird run on the Red Sea data

This visualization isn't quite as fancy as the Tensorboard one we showed above, but it does include a Q2 value.

6.1.2.1. Sidenote: Interpreting Q2 values

The Q2 score is adapted from the partial least squares literature. Here it is given by Q^2 = 1 - m1/m2, where m1 is the average absolute model error and m2 is the average absolute error of the null or baseline model. If Q2 is close to 1, the model has high predictive accuracy on the cross-validation samples. If Q2 is low or below zero, the model has poor predictive accuracy, suggesting possible overfitting. This statistic behaves similarly to the R2 classically used in an ordinary linear regression when the m2 model's --p-formula is "1".
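As a sketch, with made-up error values (m1 and m2 would come from your model's and the null model's held-out predictions):

```python
# Q^2 = 1 - m1/m2, as defined above. The error values here are made up.

def q2_score(m1, m2):
    """m1: average absolute model error; m2: average absolute null-model error."""
    return 1.0 - m1 / m2

print(round(q2_score(m1=4.0, m2=10.0), 2))   # 0.6 -> clearly beats the null model
print(round(q2_score(m1=12.0, m2=10.0), 2))  # -0.2 -> worse than the null model
```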

If the Q2 score is extremely close to 0 (or negative), this indicates that the model is overfit or that the metadata supplied to the model are not predictive of microbial composition across samples. You can think about this in terms of "how does using the metadata columns in my formula improve a model?" If there isn't really an improvement, then you may want to reconsider your formula.

... But as long as your Q2 score is above zero, your model is learning something useful.

6.2. Generating baseline models

(For the sake of simplicity, this subsection just uses Songbird through QIIME 2.)

In addition to comparing your model to a null model, it may also be useful to compare it to "baseline" models, which contain only a subset of the formula's variables. The Q2 score will let you determine how much a given variable (or set of variables) improves the model, and therefore how important it is for determining differentially abundant microbes.

For instance, in the Red Sea dataset we can fit a model on just Depth and compare it to our original model to gauge how informative the other variables are (i.e. Temperature, Salinity, Oxygen, Fluorescence, and Nitrate). The Q2 value we see in the paired-summary.qzv is the predictive accuracy estimated from the variables left out of the regression fit.

# Generate a baseline model
qiime songbird multinomial \
	--i-table redsea.biom.qza \
	--m-metadata-file data/redsea/redsea_metadata.txt \
	--p-formula "Depth" \
	--p-epochs 10000 \
	--p-differential-prior 0.5 \
	--p-training-column Testing \
	--p-summary-interval 1 \
	--o-differentials baseline-diff.qza \
	--o-regression-stats baseline-stats.qza \
	--o-regression-biplot baseline-biplot.qza

# Visualize the first model's regression stats *and* the baseline model's
# regression stats
qiime songbird summarize-paired \
	--i-regression-stats regression-stats.qza \
	--i-baseline-stats baseline-stats.qza \
	--o-visualization paired-summary-2.qzv

This plot provides evidence for how much the model fit improved by adding Temperature, Salinity, Oxygen, Fluorescence, and Nitrate. A positive Q2 score indicates an improvement over the baseline model.

TL;DR

Basically, you'll want to futz around with the parameters until you see two nice exponential decay graphs, where your model provides an improvement in the cross-validation graph compared to the null model. Once you have that, you can start to look at the differentials Songbird produced!

7. FAQs

7.1. FAQs: Running Songbird standalone

Q. What am I looking at in the output directory?

A. There are 3 major types of files to note:

  1. differentials.tsv: This contains the ranks of the features for certain metadata categories. The higher the rank, the more associated it is with that category. The lower the rank, the more negatively associated it is with a category. The recommended way to view these files is to sort the features within a given column in this file and investigate the top/bottom features with the highest/lowest ranks. (Qurro makes this sort of analysis easier.)

    The first column in differentials.tsv contains the IDs of your features (if you plugged in a QIIME 2 feature table and your "features" are ASVs/sOTUs/..., then you can look up the sequence or bacterial name by merging with rep-seqs or taxonomy, respectively). Once you have identified the features that change the most and least (have the highest and lowest differentials) you can plot the log ratio of these features across metadata categories or gradients!

  2. checkpoint: this points to checkpoint files -- these can be used for saving intermediate results. This is more important for jobs that will take days to run, where the model's parameters can be investigated while the program is running, rather than waiting for differentials.tsv to be written.

  3. events.out.tfevents.*: These files are what is being read into Tensorboard when it visualizes model fitting plots.

Q. I'm confused, what is Tensorboard?

A. Tensorboard is a diagnostic tool that runs in a web browser. To open tensorboard, make sure you’re in the songbird environment (songbird_env) and cd into the folder you are running the script above from. Then run:

tensorboard --logdir .

The returned line will look something like:

TensorBoard 1.9.0 at http://Lisas-MacBook-Pro-2.local:6006 (Press CTRL+C to quit)

Open that URL in a browser. (Hint: if that doesn't work, try just localhost plus the port number -- here, localhost:6006.) Leave this tab alone. Now any Songbird output directories that you add to the folder Tensorboard is running in will be added to the webpage.

This should produce a page with 2 graphs, which Tensorboard actively updates as Songbird runs. See this section on interpreting model fitting for details on how to understand these plots.

7.2. FAQs: Running Songbird through QIIME 2

Q. What are all of these QZA files, and what can I do with them?

A. When you run qiime songbird multinomial, you'll get three output QIIME 2 artifacts:

  1. differentials.qza: This is analogous to the differentials.tsv file described above. It is represented as a QIIME 2 FeatureData[Differential] artifact, so you can directly load it into QIIME 2 plugins that accept differentials, like Qurro.

  2. regression-stats.qza: This artifact contains information about how Songbird's model fitting went. You can visualize this using qiime songbird summarize-single, and if you have multiple Songbird runs on the same dataset you can visualize two artifacts of this type by using qiime songbird summarize-paired. See this section on interpreting model fitting for details on how to understand the resulting visualization.

  3. regression-biplot.qza: This is a biplot. It's a bit unconventionally structured in that points in the biplot correspond to features and arrows in the biplot correspond to covariates. We'll show how to visualize this later in this FAQ section.

Q. (This answer uses the Red Sea dataset.) What can I do with that regression-biplot.qza file I get from running qiime songbird multinomial? Can I eat it?

A. You can't eat it, unfortunately. But you can visualize it in Emperor!

qiime emperor biplot \
	--i-biplot regression-biplot.qza \
	--m-sample-metadata-file data/redsea/feature_metadata.txt \
	--p-ignore-missing-samples \
	--p-number-of-features 7 \
	--o-visualization emperor-biplot

You can view the resulting visualization using qiime tools view or at view.qiime2.org.

These biplots have a different interpretation than usual: the points correspond to features, and the arrows correspond to covariates of interest. Running these models on the full dataset can yield something like the following biplot:

7.3. FAQs: Parameters

Q. The "specifying a formula" section mentioned that I can mitigate overfitting with --differential-prior. What does that mean?

A. By overfitting, we're referring to scenarios where the model attempts to memorize data points rather than building predictive models that uncover biological patterns. See https://xkcd.com/1725/

The --differential-prior parameter specifies the width of the prior distribution of the differentials. For --differential-prior 1, roughly 99% of the differentials (given in differentials.tsv) are expected to fall between -3 and +3 (log fold change). The larger the differential prior, the bigger the changes the parameters are allowed to have, so you want to keep this relatively small. If you see overfitting (e.g. the loss keeps improving while the cross-validation error increases in Tensorboard) you may consider reducing the differential prior in order to reduce the parameter space.
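To build intuition for how the prior width maps to a range of differentials, here's a sketch assuming the prior is (roughly) a normal distribution whose standard deviation equals the --differential-prior value:

```python
from statistics import NormalDist

# Sketch, assuming a roughly normal prior with standard deviation equal to
# the --differential-prior value. About 99.7% of the prior mass then lies
# within +/- 3 standard deviations, which is where the "differentials mostly
# between -3 and +3 for a prior of 1" intuition comes from.
prior_scale = 1.0
prior = NormalDist(mu=0.0, sigma=prior_scale)
mass = prior.cdf(3 * prior_scale) - prior.cdf(-3 * prior_scale)
print(round(mass, 3))  # 0.997
```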

Q. What's up with the --training-column argument?

A. That is used for cross-validation if you have a specific reproducibility question that you are interested in answering. If this is specified, only samples labeled "Train" under this column will be used for building the model and samples labeled "Test" will be used for cross validation. In other words the model will attempt to predict the feature abundances for the "Test" samples. Ideally the random samples held out for testing will be a relatively even mix of samples from different categories. This parameter can be especially useful if your samples are not evenly distributed across sample categories, so that you can ensure even distribution of sample categories across the "Train" and "Test" groups.

The resulting prediction accuracy is used to evaluate the generalizability of the model, in order to determine if the model is overfitting or not. If this argument is not specified, then 5 random samples will be chosen for the test dataset. If you want to allocate more random samples for cross-validation, the --num-random-test-examples argument can be specified. As a rule of thumb, the number of test samples held out for cross-validation is typically 10-20% of your samples.

Note that the Train and Test values are case sensitive -- if you specify train and test instead, things may go wrong.

Q. How long should I expect this program to run?

A. There are many factors that influence this. The numbers of samples and features in your dataset play a large role, as does the --epochs/--p-epochs parameter.

The larger the batch size, the more samples you average per iteration, but the fewer iterations you have -- which can sometimes mean it takes longer to reach convergence (so you may have to compensate by increasing the epochs). On the other hand, if you decrease the batch size, you get more iterations, but the variability between iterations is higher. Whether this matters also depends on whether your model converges at all, and on the --learning-rate, which specifies the step size (a smaller step size gives finer resolution, but may take longer to converge). You will need to consult Tensorboard (or your regression-stats.qza output) to make sure that your model fit is sane. See this paper for more details on gradient descent: https://arxiv.org/abs/1609.04747
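The trade-off can be made concrete: for a fixed number of epochs, the number of gradient-descent updates scales inversely with the batch size (an illustrative back-of-the-envelope calculation; Songbird's internal bookkeeping may differ slightly):

```python
import math

def approx_num_updates(num_samples, epochs, batch_size):
    """Approximate number of gradient updates performed over a training run."""
    return epochs * math.ceil(num_samples / batch_size)

# Halving the batch size roughly doubles the number of (noisier) updates:
small_batch = approx_num_updates(num_samples=100, epochs=1000, batch_size=5)
large_batch = approx_num_updates(num_samples=100, epochs=1000, batch_size=10)
```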

Q. How exactly does filtering work?!?

A. There are two filtering parameters: --min-feature-count and --min-sample-count. Be careful: these two parameters have different behaviors. --min-sample-count filters out samples according to their total read count. For instance, if --min-sample-count 1000 is specified, then all samples with fewer than 1000 reads will be filtered out.

--min-feature-count filters out features according to how many samples they appear in. For instance, if --min-feature-count 10 is specified, then all features that appear in fewer than 10 samples will be thrown out. It is important to note that we are filtering according to the number of samples rather than the number of reads. This behavior follows a rule of thumb commonly used in linear regression: if a microbe appears in fewer than 10 samples, it is difficult to fit a meaningful line for that microbe. In other words, the study does not have enough resolution to say anything meaningful about that microbe in the context of differential abundance analysis. The --min-feature-count filter is applied after the --min-sample-count filter, so it's possible for (for example) a sample to get filtered out, which in turn causes a feature to get filtered out.
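To make the difference (and the filtering order) concrete, here is a sketch of both filters on a toy count table -- the sample and feature names are made up:

```python
counts = {
    "S1": {"F1": 500, "F2": 0, "F3": 3},
    "S2": {"F1": 800, "F2": 2, "F3": 0},
    "S3": {"F1": 400, "F2": 1, "F3": 5},
}
min_sample_count, min_feature_count = 500, 2

# --min-sample-count: drop samples whose TOTAL READS fall below the threshold.
samples = {s: fs for s, fs in counts.items()
           if sum(fs.values()) >= min_sample_count}  # S3 (406 reads) is dropped

# --min-feature-count (applied AFTER the sample filter): keep features PRESENT
# in at least min_feature_count of the remaining samples (presence, not depth).
features = [f for f in ["F1", "F2", "F3"]
            if sum(1 for fs in samples.values() if fs[f] > 0) >= min_feature_count]
# F2 appeared in 2 samples originally, but only 1 after S3 was dropped,
# so it gets filtered out too -- the cascade described above.
```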

7.4. FAQs: Output files

Q. Why do I have so many columns in my differentials even when I'm only using one continuous variable?

A. If Songbird sees that a sample metadata column contains any non-numeric values -- even things like Not Applicable, NA, nan, etc. -- then Songbird will assume that that sample metadata column is categorical (and not quantitative). This is likely why you're seeing multiple columns for a single sample metadata column.
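The heuristic is roughly this (a simplified sketch, not Songbird's actual implementation):

```python
def is_quantitative(column_values):
    """A metadata column is quantitative only if EVERY value parses as a number."""
    try:
        for v in column_values:
            float(v)
        return True
    except (TypeError, ValueError):
        return False

is_quantitative(["1.5", "2", "3.7"])   # one column in the differentials output
is_quantitative(["1.5", "NA", "3.7"])  # "NA" forces categorical -> many columns
```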

If you're having this problem and you're running Songbird standalone, you will need to delete any #q2: headers at the start of the sample metadata file -- otherwise, Songbird will interpret these lines as describing actual samples.

7.5. FAQs: Installation

Q. I've installed Songbird, but when I try to run it I get an error that says something like ImportError: /lib64/libc.so.6: version `GLIBC_2.15' not found. What's going on here?

A. In our experience, this is a problem with installing TensorFlow on conda. You should be able to get around this problem by uninstalling TensorFlow, then re-installing it from the defaults conda channel:

conda uninstall tensorflow
conda install -c defaults tensorflow

8. Visualizing Songbird's differentials

Qurro generates interactive visualizations of the differentials produced by Songbird! A Qurro demo using the Red Sea dataset is available here.

You can also view the output differentials as a nice, sortable table using qiime metadata tabulate:

qiime metadata tabulate \
	--m-input-file differentials.qza \
	--o-visualization differentials-viz.qzv

(The output .qzv file can be viewed using qiime tools view or at view.qiime2.org.)

(Screenshot: tabulated differentials)

9. More information

9.1. Further Documentation

For a more complete tutorial including real datasets, see the following URL:

https://github.com/knightlab-analyses/reference-frames

9.2. Acknowledgements

Credits to Lisa Marotz (@lisa55asil) and Marcus Fedarko (@fedarko) for the FAQs and a ton of the documentation in this README.

9.3. Citations

If you use this tool and you like it, please cite it:

@article{morton2019establishing,
  title={Establishing microbial composition measurement standards with reference frames},
  author={Morton, James T and Marotz, Clarisse and Washburne, Alex and Silverman, Justin and Zaramela, Livia S and Edlund, Anna and Zengler, Karsten and Knight, Rob},
  journal={Nature communications},
  volume={10},
  number={1},
  pages={2719},
  year={2019},
  publisher={Nature Publishing Group}
}

songbird's People

Contributors

antgonza, eldeveloper, fedarko, gibsramen, gwarmstrong, lisa55asil, mortonjt


songbird's Issues

Null distribution?

Would be great if we could handle a null distribution for differentials and derive something that looks like a p-value
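One common way to do this (sketched below with simulated numbers; the permutation machinery itself is the part Songbird would need to provide) is to compare the observed differential against differentials from models fit to permuted metadata, yielding an empirical p-value:

```python
import random

random.seed(0)

# Hypothetical: one feature's observed differential, plus differentials from
# 999 refits with shuffled metadata (the "null distribution").
observed = 2.4
null_differentials = [random.gauss(0, 1) for _ in range(999)]

# Two-sided empirical p-value with the standard +1 correction:
exceed = sum(abs(d) >= abs(observed) for d in null_differentials)
p_value = (1 + exceed) / (len(null_differentials) + 1)
```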

pypi install

Right now, we can install songbird via pip, but it'll require numpy and biom-format to be installed beforehand. One way to do this is to create a conda environment as follows:

conda create -n songbird pip numpy biom-format
source activate songbird
pip install songbird

This is not great, since even for the simple pip install, we'll need to have a conda install.
May want to explore deploying docker containers.

ModuleNotFoundError: No module named 'songbird.q2'

Hi,
I installed your software in a conda environment where qiime2 was already installed following the 2/3 commands. Then when I type qiime dev refresh-cache I got this error

Traceback (most recent call last):
  File "/Users/michoug/miniconda3/envs/qiime2/bin/qiime", line 11, in <module>
    sys.exit(qiime())
  File "/Users/michoug/miniconda3/envs/qiime2/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/Users/michoug/miniconda3/envs/qiime2/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/Users/michoug/miniconda3/envs/qiime2/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/michoug/miniconda3/envs/qiime2/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/michoug/miniconda3/envs/qiime2/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/michoug/miniconda3/envs/qiime2/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/Users/michoug/miniconda3/envs/qiime2/lib/python3.6/site-packages/q2cli/dev.py", line 27, in refresh_cache
    import q2cli.cache
  File "/Users/michoug/miniconda3/envs/qiime2/lib/python3.6/site-packages/q2cli/cache.py", line 301, in <module>
    CACHE = DeploymentCache()
  File "/Users/michoug/miniconda3/envs/qiime2/lib/python3.6/site-packages/q2cli/cache.py", line 61, in __init__
    self._state = self._get_cached_state(refresh=refresh)
  File "/Users/michoug/miniconda3/envs/qiime2/lib/python3.6/site-packages/q2cli/cache.py", line 107, in _get_cached_state
    self._cache_current_state(current_requirements)
  File "/Users/michoug/miniconda3/envs/qiime2/lib/python3.6/site-packages/q2cli/cache.py", line 200, in _cache_current_state
    state = self._get_current_state()
  File "/Users/michoug/miniconda3/envs/qiime2/lib/python3.6/site-packages/q2cli/cache.py", line 238, in _get_current_state
    plugin_manager = qiime2.sdk.PluginManager()
  File "/Users/michoug/miniconda3/envs/qiime2/lib/python3.6/site-packages/qiime2/sdk/plugin_manager.py", line 44, in __new__
    self._init()
  File "/Users/michoug/miniconda3/envs/qiime2/lib/python3.6/site-packages/qiime2/sdk/plugin_manager.py", line 59, in _init
    plugin = entry_point.load()
  File "/Users/michoug/miniconda3/envs/qiime2/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2346, in load
    return self.resolve()
  File "/Users/michoug/miniconda3/envs/qiime2/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2352, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
ModuleNotFoundError: No module named 'songbird.q2'

Any ideas ?

Unable to install songbird and Qiime together

Hello.

I've been trying to install songbird in my existing Qiime environment (r2019.7) but doing so results in an error upon trying to use any Qiime functionality (and also cache refresh).

Traceback (most recent call last):
  File "/miniconda3/envs/nash_mice/bin/qiime", line 11, in <module>
    sys.exit(qiime())
  File "/miniconda3/envs/nash_mice/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/miniconda3/envs/nash_mice/lib/python3.6/site-packages/click/core.py", line 716, in main
    with self.make_context(prog_name, args, **extra) as ctx:
  File "/miniconda3/envs/nash_mice/lib/python3.6/site-packages/click/core.py", line 641, in make_context
    self.parse_args(ctx, args)
  File "/miniconda3/envs/nash_mice/lib/python3.6/site-packages/q2cli/click/command.py", line 43, in parse_args
    return super().parse_args(ctx, args)
  File "/miniconda3/envs/nash_mice/lib/python3.6/site-packages/click/core.py", line 1086, in parse_args
    echo(ctx.get_help(), color=ctx.color)
  File "/miniconda3/envs/nash_mice/lib/python3.6/site-packages/click/core.py", line 516, in get_help
    return self.command.get_help(self)
  File "/miniconda3/envs/nash_mice/lib/python3.6/site-packages/click/core.py", line 879, in get_help
    self.format_help(ctx, formatter)
  File "/miniconda3/envs/nash_mice/lib/python3.6/site-packages/click/core.py", line 898, in format_help
    self.format_options(ctx, formatter)
  File "/miniconda3/envs/nash_mice/lib/python3.6/site-packages/q2cli/click/command.py", line 157, in format_options
    for subcommand in self.list_commands(ctx):
  File "/miniconda3/envs/nash_mice/lib/python3.6/site-packages/q2cli/commands.py", line 93, in list_commands
    plugins = sorted(self._plugin_lookup)
  File "/miniconda3/envs/nash_mice/lib/python3.6/site-packages/q2cli/commands.py", line 77, in _plugin_lookup
    import q2cli.core.cache
  File "/miniconda3/envs/nash_mice/lib/python3.6/site-packages/q2cli/core/cache.py", line 403, in <module>
    CACHE = DeploymentCache()
  File "/miniconda3/envs/nash_mice/lib/python3.6/site-packages/q2cli/core/cache.py", line 61, in __init__
    self._state = self._get_cached_state(refresh=refresh)
  File "/miniconda3/envs/nash_mice/lib/python3.6/site-packages/q2cli/core/cache.py", line 107, in _get_cached_state
    self._cache_current_state(current_requirements)
  File "/miniconda3/envs/nash_mice/lib/python3.6/site-packages/q2cli/core/cache.py", line 200, in _cache_current_state
    state = self._get_current_state()
  File "/miniconda3/envs/nash_mice/lib/python3.6/site-packages/q2cli/core/cache.py", line 238, in _get_current_state
    plugin_manager = qiime2.sdk.PluginManager()
  File "/miniconda3/envs/nash_mice/lib/python3.6/site-packages/qiime2/sdk/plugin_manager.py", line 44, in __new__
    self._init()
  File "/miniconda3/envs/nash_mice/lib/python3.6/site-packages/qiime2/sdk/plugin_manager.py", line 59, in _init
    plugin = entry_point.load()
  File "/miniconda3/envs/nash_mice/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2433, in load
    self.require(*args, **kwargs)
  File "/miniconda3/envs/nash_mice/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2456, in require
    items = working_set.resolve(reqs, env, installer, extras=self.extras)
  File "/miniconda3/envs/nash_mice/lib/python3.6/site-packages/pkg_resources/__init__.py", line 786, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'mock>=2.0.0' distribution was not found and is required by tensorflow-estimator

I also tried creating a fresh conda environment installing songbird first and then Qiime, but the issue persists.

For my purposes I created a separate environment for songbird and used the non-qiime CLI, but this is a bit of a hassle.

Running MacOS Mojave 10.14.5

q2-songbird Biplot Metadata Issue

Hello @mortonjt ,

I am trying to create the regression biplot in emperor using the following command:

qiime emperor biplot \
	--i-biplot songbird/regression-biplot.qza \
	--m-sample-metadata-file taxonomy/taxonomy.qza \
	--p-ignore-missing-samples \
	--p-number-of-features 8 \
	--o-visualization songbird/emperor-biplot.qzv

I found out that I need to input my feature metadata (which in my case is the taxonomy file I used for the DEICODE biplot) instead of my sample metadata. This creates the biplot with my metadata variables as arrows, but now I cannot look at the sample metadata associated with my points in Emperor. Is there a way I can do this that I am missing?

Thanks,
Zach

Handle filtering edge cases

  1. Handle edge case with hold out data where not all categories are represented
  2. Do the same with the training data

fix qiime2 deps

qiime2 is currently locked into numpy=1.12

This will need to be updated in order for qiime2 to be compatible with any version of tensorflow

Accounting for different summary intervals in summarize-paired?

It looks like the code for _summarize() assumes that both stats inputs have the same summary interval:

Note
----
This assumes that the same summary interval was used
for both analyses.

As far as I'm aware, this isn't documented in the user-facing API (I didn't know this was the case until today, and I'm guessing not a lot of other Songbird users know this). Is this still a restriction? I tested this by running the same qiime songbird multinomial command twice (once with a summary interval of 5, and once with a summary interval of 1)—attaching a picture showing the resulting summarize-paired visualization (baseline = interval of 1, model = interval of 5).

(Screenshot taken 2019-09-18: the resulting summarize-paired visualization, with baseline and model curves overlaid)

This looks fine. The curves' starting x-axis positions are different, but that seems to be a natural result of when the recording started (so not really a problem)—in the summarize-single for the model curve, it also starts at around 10k iterations.

tldr

If having different summary intervals isn't a problem, I can close this issue and remove the corresponding note in the code. If it is a problem, here are some ideas for improving the user experience that I'd be happy to add:

  1. Specify this restriction clearly in the summarize-paired documentation
  2. Throw an error in _convergence_plot under the if baseline is not None: block if len(regression.index) != len(baseline.index)

Let me know what you think. Thanks!
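For idea 2, the guard could look something like this (a sketch; the function and variable names follow the issue text and are not Songbird's actual API):

```python
def check_summary_intervals(regression_index, baseline_index):
    """Proposed check for _convergence_plot when a baseline is supplied."""
    if len(regression_index) != len(baseline_index):
        raise ValueError(
            "summarize-paired expects the same summary interval for both "
            "the model and baseline runs."
        )

check_summary_intervals([0, 5, 10], [0, 5, 10])  # OK: matching intervals
```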

Making certain command-line arguments minimally required

In the standalone script, you can just run songbird multinomial by itself -- this should give some sort of click-generated error along the lines of "please pass in a biom and metadata at least", but instead right now it tries to read in None as a metadata file, which gives the user a semi-cryptic error message.

mfedarko@mwf-knightlabwork ~/S/f/songbird> songbird multinomial                                                                                                                                                            (qiime2-2019.1) 
Traceback (most recent call last):
  File "/anaconda3/envs/qiime2-2019.1/bin/songbird", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/Users/mfedarko/Software/forks/songbird/scripts/songbird", line 110, in <module>
    songbird()
  File "/anaconda3/envs/qiime2-2019.1/lib/python3.5/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/anaconda3/envs/qiime2-2019.1/lib/python3.5/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/anaconda3/envs/qiime2-2019.1/lib/python3.5/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/anaconda3/envs/qiime2-2019.1/lib/python3.5/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/anaconda3/envs/qiime2-2019.1/lib/python3.5/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/Users/mfedarko/Software/forks/songbird/scripts/songbird", line 69, in multinomial
    metadata = read_metadata(metadata_file)
  File "/Users/mfedarko/Software/forks/songbird/songbird/util.py", line 122, in read_metadata
    filepath, dtype=object)
  File "/anaconda3/envs/qiime2-2019.1/lib/python3.5/site-packages/pandas/io/parsers.py", line 709, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/anaconda3/envs/qiime2-2019.1/lib/python3.5/site-packages/pandas/io/parsers.py", line 433, in _read
    filepath_or_buffer, encoding, compression)
  File "/anaconda3/envs/qiime2-2019.1/lib/python3.5/site-packages/pandas/io/common.py", line 211, in get_filepath_or_buffer
    raise ValueError(msg.format(_type=type(filepath_or_buffer)))
ValueError: Invalid file path or buffer object type: <class 'NoneType'>

All that's really needed to fix this is just telling click which parameters are necessary for a run of songbird, and making those required. Beyond --input-biom, --metadata-file, and --formula, I'm not sure which are strictly required.

(QIIME 2 marks the BIOM, metadata, and formula params as required for the multinomial command -- as far as I can tell this is due to these params not having any default option set in songbird/q2/_method.py.)

@mortonjt if you'd like I can try to submit a PR for this: let me know which songbird multinomial params you'd like to be required.

More Patsy formula examples in README? / More documentation on design matrices, ANOVA, etc.

Something I've been thinking about is that, to a pretty large degree, "using Songbird" boils down to choosing a formula.

I know that the current README already links to the Patsy docs, and I know that explaining statistical formulas to the user in the README is a tall order, but it might be a good idea to at least add a few example formulas for common use cases (e.g. using C() and Treatment coding to explicitly set a reference).

...At the very least we could add a link to this blog post :)
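As a teaser for what such examples could cover, here is treatment (dummy) coding done by hand -- it mirrors what a formula like C(depth, Treatment('Deep')) asks Patsy to produce (the column name and category levels here are made up):

```python
categories = ["Deep", "Shallow", "Shallow", "Deep", "Mid"]
reference = "Deep"  # the level everything else is compared against

# One indicator column per non-reference level, plus an intercept.
levels = sorted(set(categories) - {reference})  # ["Mid", "Shallow"]
design = [[1] + [int(c == lvl) for lvl in levels] for c in categories]
# Columns: Intercept, depth[T.Mid], depth[T.Shallow]
```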

Tensorboard and qiime2

At the moment, tensorboard is disabled when running the qiime2 songbird command.
The original motivation was because an additional logging directory needs to be created in order to enable this.

I'm not sure how qiime2 is expected to handle logging directories - should they be typed?

Thoughts @fedarko @ebolyen?

Adding --p-logging-dir option

As discussed in #70 , we don't want to arbitrarily save logging information in a default location (esp. if it is running in Qiita or something).

Having a --p-logging-dir option may be needed in order to pipe this information to /tmp if necessary. We'll want to avoid the option of completely disabling logging if possible (too much code will have to change).

@antgonza , feel free to qiime in.

Detect and ignore #q2:types rows in standalone metadata input

Standalone Songbird doesn't understand these rows, so it'll interpret them as actual samples (messing things up slightly). This is already documented in the README, but it'd be good to automatically handle these situations.

I wrote a function in Qurro that should handle this -- see e.g. this part of the code. I can take a crack at incorporating this into Songbird.
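The core of such a fix is small (a sketch only; Qurro's actual implementation handles more edge cases):

```python
def strip_q2_directives(metadata_lines):
    """Drop QIIME 2 directive rows (e.g. '#q2:types') before parsing metadata."""
    return [ln for ln in metadata_lines if not ln.startswith("#q2:")]

lines = ["sample-id\tdepth", "#q2:types\tnumeric", "S1\t3.5"]
cleaned = strip_q2_directives(lines)
```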

numerical issues

Sooner or later, this discussion will pop up.

There will be some numerical issues when converting back and forth between alr and clr coordinates.
In other words, you shouldn't need to use clr_inv to convert clr coordinates to proportions, and then convert proportions to alr coordinates. It should be one step.

For instance, we know that we can convert from logarithms to alr coordinates as follows

alr(x) = log(x) A

where

A = [[1, 0, ..., 0, -1]
     [0, 1, ..., 0, -1]
     ...
     [0, 0, ..., 1, -1]]

There should be a corresponding inverse A' such that log(x) = alr(x) A'.
If we can do this, we can easily convert logs to clr coordinates (since this is just mean centering).
The solution needs to be simple and fast (i.e. no matrix inversions or other cubic time crap).
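Concretely, the one-step conversion is cheap because clr is just mean-centering of log-ratios -- no matrix inversion required (a sketch with a toy composition):

```python
import math

x = [0.2, 0.3, 0.5]
logx = [math.log(v) for v in x]

# alr with the last part as the reference: alr_i = log(x_i / x_D)
alr = [lx - logx[-1] for lx in logx[:-1]]

# clr straight from the logs: mean-center, no detour through proportions
mean_logx = sum(logx) / len(logx)
clr = [lx - mean_logx for lx in logx]

# one-step alr -> clr: append 0 for the reference part, then mean-center
full = alr + [0.0]
mean_full = sum(full) / len(full)
clr_from_alr = [v - mean_full for v in full]
```

This works because full_i = log(x_i) - log(x_D), so subtracting its mean cancels the log(x_D) term and leaves exactly the clr coordinates -- linear time, no inversion.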

summarize-single doesn't actually need a feature table as input

As far as I can tell, the feature table is only used in the summary commands to get the number of samples (i.e. the n parameter to _summarize()). Furthermore, n is only used in summarize-paired.

At minimum, we should be able to safely remove the --i-feature-table input for the summarize-single command.

As a maybe-out-there idea, we could also have qiime songbird multinomial automatically generate a visualization of the stats, which'd be convenient (note: this might not be possible with the current QIIME 2 architecture, that's ok if so).

Drop duplicates in metadata

Getting this error

Traceback (most recent call last):
  File "/Users/jmorton/miniconda3/envs/qiime2-2019.7/bin/songbird", line 113, in <module>
    songbird()
  File "/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/Users/jmorton/miniconda3/envs/qiime2-2019.7/bin/songbird", line 76, in multinomial
    min_sample_count, min_feature_count)
  File "/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/songbird/util.py", line 172, in match_and_filter
    table = table.sort(sort_f=sort_f, axis='sample')
  File "/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/biom/table.py", line 2123, in sort
    return self.sort_order(sort_f(self.ids(axis=axis)), axis=axis)
  File "/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/songbird/util.py", line 170, in sort_f
    return [xs[metadata.index.get_loc(x)] for x in xs]
  File "/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/songbird/util.py", line 170, in <listcomp>
    return [xs[metadata.index.get_loc(x)] for x in xs]
IndexError: boolean index did not match indexed array along dimension 0; dimension is 176 but corresponding boolean dimension is 196

This is because there are duplicate metadata entries.

Not sure if this sort of stuff is being caught in q2 metadata. But we probably want to catch this in the standalone api.

We can add a fix to the following line
https://github.com/biocore/songbird/blob/master/songbird/util.py#L168

metadata = metadata.loc[~metadata.index.duplicated(keep='first')]
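The keep='first' behavior from that one-liner, illustrated without pandas (the sample IDs are made up):

```python
# Metadata rows keyed by sample ID; S1 appears twice.
rows = [("S1", "day0"), ("S2", "day1"), ("S1", "day2")]

seen, deduped = set(), []
for sample_id, value in rows:
    if sample_id not in seen:  # keep only the FIRST occurrence of each ID
        seen.add(sample_id)
        deduped.append((sample_id, value))
```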

summary command for qiime2 CLI

This command will have

  1. A crude loss curve
  2. A crude cross-validation curve
  3. The option to merge multiple summaries together to provide a pseudo-R2 (i.e. allow for the customization of a null-hypothesis).

Using a smaller default summary-interval

I think lowering this to something in the 5 -- 30 range would be helpful, especially since the default parameters for running Songbird mean that most datasets (at least the ones I've tried) get processed pretty quickly.

(Sorry to inundate you with issues btw, I'm mostly just trying to log these here so I remember to address them soon)

Installation of songbird within the 2019.4 qiime2 environment errors out

Attempting to create a conda environment with both qiime2 and songbird and keep running into issues.

Created the qiime2 environment using the conda installation instructions as specified in the documentation: https://docs.qiime2.org/2019.4/install/native/#install-qiime-2-within-a-conda-environment

Attempted to install songbird with the command:
conda install -c conda-forge songbird

Which resulted in this error:

QIIME is caching your current deployment for improved performance. This may take a few moments and should only happen once per deployment.
Traceback (most recent call last):
  File "/home/demo/demo01/.conda/envs/qiime2-2019.4/bin/qiime", line 11, in <module>
    sys.exit(qiime())
  File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/click/core.py", line 716, in main
    with self.make_context(prog_name, args, **extra) as ctx:
  File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/click/core.py", line 641, in make_context
    self.parse_args(ctx, args)
  File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/q2cli/click/command.py", line 42, in parse_args
    return super().parse_args(ctx, args)
  File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/click/core.py", line 1086, in parse_args
    echo(ctx.get_help(), color=ctx.color)
  File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/click/core.py", line 516, in get_help
    return self.command.get_help(self)
  File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/click/core.py", line 879, in get_help
    self.format_help(ctx, formatter)
  File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/click/core.py", line 898, in format_help
    self.format_options(ctx, formatter)
  File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/q2cli/click/command.py", line 153, in format_options
    for subcommand in self.list_commands(ctx):
  File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/q2cli/commands.py", line 91, in list_commands
    plugins = sorted(self._plugin_lookup)
  File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/q2cli/commands.py", line 75, in _plugin_lookup
    import q2cli.core.cache
  File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/q2cli/core/cache.py", line 400, in <module>
    CACHE = DeploymentCache()
  File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/q2cli/core/cache.py", line 61, in __init__
    self._state = self._get_cached_state(refresh=refresh)
  File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/q2cli/core/cache.py", line 107, in _get_cached_state
    self._cache_current_state(current_requirements)
  File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/q2cli/core/cache.py", line 200, in _cache_current_state
    state = self._get_current_state()
  File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/q2cli/core/cache.py", line 238, in _get_current_state
    plugin_manager = qiime2.sdk.PluginManager()
  File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/qiime2/sdk/plugin_manager.py", line 44, in __new__
    self._init()
  File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/qiime2/sdk/plugin_manager.py", line 59, in _init
    plugin = entry_point.load()
  File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2433, in load
    self.require(*args, **kwargs)
  File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2456, in require
    items = working_set.resolve(reqs, env, installer, extras=self.extras)
  File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/pkg_resources/__init__.py", line 786, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'Cython>=0.23' distribution was not found and is required by hdmedians

Then attempted to install songbird by cloning the repo and using pip as specified in this issue: #46

This appeared to complete installation correctly, but results in this error when running songbird

$ qiime dev refresh-cache
QIIME is caching your current deployment for improved performance. This may take a few moments and should only happen once per deployment.
Traceback (most recent call last):
  File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: /lib64/libc.so.6: version `GLIBC_2.15' not found (required by /home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/demo/demo01/.conda/envs/qiime2-2019.4/bin/qiime", line 11, in
sys.exit(qiime())
File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/click/core.py", line 764, in call
return self.main(*args, **kwargs)
File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/q2cli/builtin/dev.py", line 30, in refresh_cache
import q2cli.core.cache
File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/q2cli/core/cache.py", line 400, in
CACHE = DeploymentCache()
File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/q2cli/core/cache.py", line 61, in init
self._state = self._get_cached_state(refresh=refresh)
File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/q2cli/core/cache.py", line 107, in _get_cached_state
self._cache_current_state(current_requirements)
File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/q2cli/core/cache.py", line 200, in _cache_current_state
state = self._get_current_state()
File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/q2cli/core/cache.py", line 238, in _get_current_state
plugin_manager = qiime2.sdk.PluginManager()
File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/qiime2/sdk/plugin_manager.py", line 44, in __new__
self._init()
File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/qiime2/sdk/plugin_manager.py", line 59, in _init
plugin = entry_point.load()
File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2434, in load
return self.resolve()
File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2440, in resolve
module = __import__(self.module_name, fromlist=['__name__'], level=0)
File "/home/demo/demo01/songbird/songbird/q2/__init__.py", line 2, in <module>
from ._method import multinomial
File "/home/demo/demo01/songbird/songbird/q2/_method.py", line 6, in <module>
import tensorflow as tf
File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/tensorflow/__init__.py", line 28, in <module>
from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import
File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/tensorflow/python/__init__.py", line 49, in <module>
from tensorflow.python import pywrap_tensorflow
File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
raise ImportError(msg)
ImportError: Traceback (most recent call last):
File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
from tensorflow.python.pywrap_tensorflow_internal import *
File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
_pywrap_tensorflow_internal = swig_import_helper()
File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/imp.py", line 243, in load_module
return load_dynamic(name, filename, file)
File "/home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/imp.py", line 343, in load_dynamic
return _load(spec)
ImportError: /lib64/libc.so.6: version `GLIBC_2.15' not found (required by /home/demo/demo01/.conda/envs/qiime2-2019.4/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so)

Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/errors

for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.

Can anyone provide some guidance on what might be going wrong?

Thanks!

Improving QIIME 2 summary visualizations

This came up in #70. For some background here: I tried using these options back in July, couldn't make sense of the plots, and when I asked about it the response was opening #63 :) ...So I ended up just running Songbird outside of QIIME 2 for that project.

Since generating TF logging files through QIIME 2 sounds like it'd be a hassle, keeping these plots in the Q2 Songbird plugin (and improving them, to be usable by people who aren't intimately familiar with ML/Songbird/Tensorflow) would make sense.

So! Here's what the current output of qiime songbird summarize-single looks like (zoomed out to show the full display)—

Screen Shot 2019-09-17 at 3 27 32 PM

And here's the current output from qiime songbird summarize-paired
Screen Shot 2019-09-17 at 3 35 17 PM

For reference, here's a screenshot of tensorboard showing a single run:

Screen Shot 2019-09-17 at 3 40 16 PM

Ideas for improving the Q2 summary plots

  1. Make the y-axes (Loglikelihood and Cross validation score) more descriptive, because the meanings aren't clear (I've talked to other people who also aren't sure how to interpret these).

    • I get that Tensorboard's labels also aren't that useful, but the saving grace there is that Songbird's README does a super good job of explaining how to interpret those plots. If we can at least make it clear which of the summary plots corresponds to which of the Tensorboard plots, that would make interpreting these a lot easier. (I'm guessing it's switched around now, and that the bottom Cross validation score plot corresponds to the top cv_error Tensorboard plot?)
    • When I first tried using summarize-single back in July, the shape of the plots was a lot grosser than the shape shown here. (The summarize-single plot here was generated using already-tuned-using-Tensorboard hyperparameters, which is why it looks nice.) Users who run into this page will likely start off with gross plots that don't resemble the expected shape at all, which is why having context is important.
  2. Add text explaining how to interpret these plots

    • This could be mostly copy-pasted from the README (we could even add a hyperlink to the relevant part(s) of the README, as an alternative).
    • This could cover common problems people run into, e.g. "so if you don't see anything in these plots you probably need to decrease your --p-summary-interval".

These are cosmetic changes, mostly, so fixing them shouldn't be too difficult. I'd like to hear your thoughts on these ideas—can start working on a PR to resolve this immediately afterwards, either tonight or tomorrow.

Mismatches between sample metadata and BIOM table are demonstrably causing (small?) problems with filtering

songbird/songbird/util.py

Lines 130 to 167 in be66273

def match_and_filter(table, metadata, formula,
                     min_sample_count, min_feature_count):
    """ Matches and aligns biom and metadata tables.

    This will also return the patsy representation.

    Parameters
    ----------
    table : biom.Table
        Table of abundances
    metadata : pd.DataFrame
        Sample metadata

    Returns
    -------
    table : biom.Table
        Filtered biom table
    metadata : pd.DataFrame
        Sample metadata
    """
    # match them
    metadata = metadata.loc[table.ids(axis='sample')]

    def sample_filter(val, id_, md):
        return id_ in metadata.index and np.sum(val) > min_sample_count

    def read_filter(val, id_, md):
        return np.sum(val > 0) > min_feature_count

    def metadata_filter(val, id_, md):
        return id_ in metadata.index

    table = table.filter(metadata_filter, axis='sample')
    table = table.filter(sample_filter, axis='sample')
    table = table.filter(read_filter, axis='observation')
    metadata = metadata.loc[table.ids(axis='sample')]

So: if table.ids(axis='sample') contains IDs that aren't in the metadata file, then the metadata file will still include rows for those sample IDs, albeit filled with NaN values.

This makes both metadata_filter() and the first part of sample_filter() effectively not work, since these will be true for every ID in the table (after running the first line with metadata = metadata.loc[table.ids(axis='sample')], at least).

Proof of this with a small DataFrame:

>>> df = pd.DataFrame({"col1": [1, 2, 3], "col2": [6, 7, 8]}, index=["a", "b", "c"])
>>> df.loc['a'] # works as expected
col1    1
col2    6
Name: a, dtype: int64
>>> df.loc['d'] # also works as expected: throws a KeyError (d isn't in df's index)
... long stack trace goes here ...
>>> df = df.loc[['a', 'b', 'd']] # wait, what?
>>> df
    col1   col2
a    1.0    6.0
b    2.0    7.0
d    NaN    NaN

>>> 'd' in df.index # this is what metadata_filter() is doing
True

I don't think this will throw off results (I'd imagine tensorflow can ignore NaNs satisfactorily), but I'm not sure.
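A minimal sketch of the sort of fix I have in mind (untested against Songbird's actual code; match_ids is a hypothetical helper name) would be to intersect the two ID sets explicitly before calling .loc, so samples missing from the metadata get dropped instead of materializing as all-NaN rows:

```python
import pandas as pd

def match_ids(table_ids, metadata):
    """Return metadata restricted to sample IDs present in BOTH inputs.

    Building an explicit intersection avoids pandas' behavior of
    silently creating all-NaN rows for labels missing from the index.
    """
    common = [i for i in table_ids if i in metadata.index]
    if not common:
        raise ValueError("No sample IDs are shared between the table and metadata.")
    return metadata.loc[common]

# Demo with the same toy DataFrame as above:
df = pd.DataFrame({"col1": [1, 2, 3], "col2": [6, 7, 8]}, index=["a", "b", "c"])
matched = match_ids(["a", "b", "d"], df)
print(list(matched.index))  # -> ['a', 'b']
```

The same intersection could then drive the table filtering, which would make metadata_filter() and the first half of sample_filter() redundant.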

I don't currently have the bandwidth to work on a fix for this, but might be able to do so in a week.

I've been thinking recently anyway that it might be worth taking some time to make a library that basically mimics qiime2.Metadata without actually requiring the user to have Q2 installed. This would enable us to handle metadata, do matching with BIOM tables, etc. a lot more confidently in standalone versions of Q2 plugins (as well as in general tool development). Because pandas is great, but sometimes it does some weird stuff.

Alpha release checklist

  • Travis setup
  • Update README with examples + tutorial
  • setup.py corrections
  • License
  • Changelog
  • Pip install
  • Conda install
  • Basic summary command
  • q2 songbird plugin

Error: Got unexpected extra arguments ( )

When I try to run the following input:

songbird multinomial
  --input-biom /projects/qiita_data/BIOM/73084/otu_table.redist.species.biom
  --metadata-file /home/vjcantu/maz1/songbird/YooMeta2.txt
  --formula “genotype+activated”
  --epochs 20000
  --batch-size 10
  --differential-prior 1
  --learning-rate 0.001  
  --min-feature-count 5
  --num-random-test-examples 10
  --summary-dir /home/vjcantu/maz1/songbird/

I get the following error:

Usage: songbird multinomial [OPTIONS]
Try "songbird multinomial --help" for help.

Error: Got unexpected extra arguments (                    )

Any tips on how to get around this would be appreciated.
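This error usually comes from the shell rather than Songbird itself: the curly quotes (“ ”) around the formula, which word processors insert automatically, are not treated as quoting characters by the shell, and a line-continuation backslash stops working if anything (even a space) follows it on the same line. A version of the command that avoids both pitfalls (paths kept from the original; this is a guess at the intent, not a verified fix):

```shell
songbird multinomial \
  --input-biom /projects/qiita_data/BIOM/73084/otu_table.redist.species.biom \
  --metadata-file /home/vjcantu/maz1/songbird/YooMeta2.txt \
  --formula "genotype+activated" \
  --epochs 20000 \
  --batch-size 10 \
  --differential-prior 1 \
  --learning-rate 0.001 \
  --min-feature-count 5 \
  --num-random-test-examples 10 \
  --summary-dir /home/vjcantu/maz1/songbird/
```

Each backslash must be the very last character on its line.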

tensorflow and tqdm not included in setup.py

This causes ImportErrors when songbird has been installed via pip install -e .

When running something like qiime --help, QIIME 2 tries to load songbird (which involves stuff from songbird/q2/_method.py and songbird/multinomial.py being imported), and these modules require tensorflow and tqdm.

This problem can be fixed by just running pip install tqdm "tensorflow>1.5" after installing songbird (these requirements are listed in ci/conda_requirements.txt but not in setup.py). Unless I'm missing something here, would it make sense to just add these requirements to setup.py? Can submit a PR if it would help.
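For concreteness, a hedged sketch of what the setup.py change might look like (the exact version pin is an assumption, mirroring ci/conda_requirements.txt):

```python
# Hypothetical addition to setup.py: declare the runtime dependencies
# that are currently only listed in ci/conda_requirements.txt.
install_requires = [
    "tensorflow>1.5",  # assumed pin, mirroring ci/conda_requirements.txt
    "tqdm",
]
# ...then pass install_requires=install_requires to the existing setup() call.
```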

Thanks!

Output type for multinomial

This conversation originated from discussions with @fedarko about appropriate output types for the songbird multinomial command.

To provide some background, the original reason we wanted to output a FeatureTable[Frequency] object was so that we can output a tabular object the users can view.

This sort of defeats the original purpose, since a BIOM file is not easily viewable as a table. Having an ordination results object wouldn't be ideal for this purpose either, since the OrdinationResults object can't be easily viewed as a table.

One option worth considering is to have 2 outputs from songbird multinomial, namely

  • --o-differentials FeatureData[Differentials]: a new type, which we will define, holding tabular data that can be unzipped for users to view directly
  • --o-biplot PCoAResults % Properties('biplot'): basically merges in all of the results from regression-biplot

q2-Songbird: plotting regression-biplots issue

Hi Jamie,
Unless I am misunderstanding something, I think there might be an issue with the QIIME 2 Songbird plugin regarding making the regression biplot. When I use your RedSea tutorial dataset, the qiime songbird multinomial function appears to run correctly, but when I plug the regression-biplot.qza into qiime emperor biplot I get an error:

(qiime2-2019.1) jobakerlt-osx:march-q2songbird jobaker$ qiime songbird multinomial \

--i-table redsea.qza
--m-metadata-file redsea_metadata.txt
--p-formula "Depth+Temperature+Salinity+Oxygen+Fluorescence+Nitrate"
--o-differentials redsea_differentials.qza
--o-regression-stats redsea_regression-stats.qza
--o-regression-biplot redsea_regression-biplot.qza
--verbose
2019-05-07 20:40:05.079391: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-05-07 20:40:05.079660: I tensorflow/core/common_runtime/process_util.cc:69] Creating new thread pool with default inter op setting: 8. Tune using inter_op_parallelism_threads for best performance.
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8000/8000 [00:10<00:00, 731.36it/s]
Saved FeatureData[Differential] to: redsea_differentials.qza
Saved SampleData[SongbirdStats] to: redsea_regression-stats.qza
Saved PCoAResults % Properties(['biplot']) to: redsea_regression-biplot.qza
(qiime2-2019.1) jobakerlt-osx:march-q2songbird jobaker$ qiime emperor biplot
--i-biplot redsea_regression-biplot.qza
--m-sample-metadata-file redsea_metadata.txt
--p-number-of-features 7
--o-visualization redsea_emperor-biplot
Plugin error from emperor:

None of the sample identifiers match between the metadata and the coordinates. Verify that you are using metadata and coordinates corresponding to the same dataset.

Debug info has been saved to /var/folders/rr/2hbbkrqs5mn86rnt2s4yywsr0003kj/T/qiime2-q2cli-err-y6jdxgcr.log

I noticed this error initially when running my own dataset, using my same OTU table and metadata table that had worked with making a biplot from DEICODE (i.e. the biplot didn't give a similar error then). Since your RedSea tutorial data had the same issue I don't actually think it's a lack of match up between the OTU table and metadata. If it helps, my qiime info is here:
System versions
Python version: 3.6.5
QIIME 2 release: 2019.1
QIIME 2 version: 2019.1.0
q2cli version: 2019.1.0

Installed plugins
alignment: 2019.1.0
composition: 2019.1.0
cutadapt: 2019.1.0
dada2: 2019.1.0
deblur: 2019.1.0
deicode: 0.1.6
demux: 2019.1.0
diversity: 2019.1.0
emperor: 2019.1.0
feature-classifier: 2019.1.0
feature-table: 2019.1.0
fragment-insertion: 2019.1.0
gneiss: 2019.1.0
longitudinal: 2019.1.0
metadata: 2019.1.0
phylogeny: 2019.1.0
quality-control: 2019.1.0
quality-filter: 2019.1.0
rankratioviz: 0.0.0
sample-classifier: 2019.1.0
songbird: 0.8.2
taxa: 2019.1.0
types: 2019.1.0
vsearch: 2019.1.0

Thanks!

k-way poisson

Scalable Poisson regression specifically tailored for categorical data.

Index matching Float / String

Index from md is float:
(Float64Index([ 10317.000054161, 10317.000033729, 10317.000065509,
...
dtype='float64', name='#SampleID', length=609),

Index from BIOM table is string:
array(['10317.000054161', '10317.000033729', '10317.000065509',
...
'10317.000029409', '10317.000059111', '10317.000038006'],
dtype=object))

Yields error:
miniconda3/envs/regression/lib/python3.5/site-packages/pandas/core/indexing.py", line 1206, in _validate_read_indexer
key=key, axis=self.obj._get_axis_name(axis)))
KeyError: "None of [['10317.000054161' .....'10317.000059111'\n '10317.000038006']] are in the [index]"

From command:
songbird multinomial
--formula 'vioscreen_fiber'
--input-biom exported-subset-by-country/feature-table.biom
--metadata-file by_country.csv
--summary-dir results
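A hedged workaround (assuming the metadata is loaded with pandas; the toy CSV below stands in for by_country.csv) is to force the sample-ID column to be read as strings, or to cast the index afterwards, so it can match the BIOM table's string IDs:

```python
from io import StringIO
import pandas as pd

csv = "#SampleID,vioscreen_fiber\n10317.000054161,12.5\n10317.000033729,8.1\n"

# Without dtype=str, pandas parses the IDs as floats, so they no longer
# match the BIOM table's string IDs:
md_float = pd.read_csv(StringIO(csv), index_col="#SampleID")
# Forcing string parsing keeps the two indexes comparable:
md_str = pd.read_csv(StringIO(csv), index_col="#SampleID",
                     dtype={"#SampleID": str})

print(md_float.index.dtype)               # float64
print(md_str.index.dtype)                 # object (strings)
print("10317.000054161" in md_str.index)  # True
```

Casting after the fact (md.index = md.index.astype(str)) is riskier, since the float round-trip can change trailing digits.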

Fine tune summary legends

It should be possible to retrieve the input strings for the --p-formula option for each regression model from the QIIME 2 provenance. It would be super useful if these formulas could be displayed directly in the summary. For example, in the image below, rather than displaying model vs baseline, it would display "Depth+Temperature+Salinity+Oxygen+Fluorescence+Nitrate" vs "1" (using the formulas from the Red Sea tutorial).

image

TensorFlow issue with OpenMP

Hi,
I'm on a mac and when I ran the command

qiime songbird multinomial --i-table redsea.biom.qza --m-metadata-file data/redsea/redsea_metadata.txt --p-formula "Depth+Temperature+Salinity+Oxygen+Fluorescence+Nitrate" --o-differentials differentials.qza --o-regression-stats regression-stats.qza --o-regression-biplot regression-biplot.qza --verbose

I got this error

OMP: Error #15: Initializing libiomp5.dylib, but found libiomp5.dylib already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
Abort trap: 6

I solved this using this link, where the suggested answer is to install this package:

conda install nomkl

Maybe it would be useful to add this to the dependencies?

Better defaults!

  1. batch_size around 5 (since most studies are small)
  2. learning_rate around 1e-3

Any other suggestions?

How is # iterations actually being calculated?

The README says that

Number of iterations = --epoch # multiplied by the --batch-size parameter

...But I'm testing out Songbird on the Red Sea dataset with --epochs 10000 and the default --batch-size 5, and instead of 50k iterations I'm seeing ~80k iterations. I do have --differential-prior 0.5; might that be a reason why there are more iterations than expected?

This isn't a big issue, but if the actual number of iterations is computed differently it'd be a good idea to update the README accordingly (at the very least, removing this equation if it is incorrect).
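For context, one common convention (and I'm only speculating that this is what Songbird actually does; the exact formula would need checking against the source) is that the iteration count depends on the dataset size, not just on epochs and batch size. The --differential-prior should have no effect on the iteration count under either formula.

```python
import math

def total_iterations(num_samples, epochs, batch_size):
    # If one epoch means one full pass over the data, each epoch
    # contains ceil(num_samples / batch_size) gradient steps.
    return epochs * math.ceil(num_samples / batch_size)

# Under this convention, ~80k iterations with --epochs 10000 and
# --batch-size 5 would correspond to a dataset of ~40 samples:
print(total_iterations(40, 10000, 5))  # 80000
```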

include colinearity discussion?

Should we warn new users about possible collinearity issues if their metadata columns are too similar (e.g. including both a continuous BMI value and a quartile category of BMI)?
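To illustrate the BMI example: a quartile category derived from a continuous variable is, by construction, strongly correlated with it, which is exactly the kind of near-collinearity that destabilizes regression coefficients. A quick sanity check users could run on their own metadata (a sketch using numpy only, with simulated data):

```python
import numpy as np

rng = np.random.default_rng(42)
bmi = rng.uniform(18, 40, size=500)       # continuous covariate
cutoffs = np.quantile(bmi, [0.25, 0.5, 0.75])
bmi_quartile = np.digitize(bmi, cutoffs)  # derived 0-3 quartile category

# Pearson correlation between the continuous value and its own quartile
# encoding: close to 1.0, i.e. the two covariates carry redundant
# information and should not both go into the formula.
r = np.corrcoef(bmi, bmi_quartile)[0, 1]
print(round(r, 2))
```

Flagging any pair of covariates with |r| above, say, 0.9 would catch the most obvious cases.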
