changwn / scfea

single cell Flux Estimation Analysis (scFEA) Try the below web server!

Home Page: http://scflux.org/

License: Other

Python 5.35% Jupyter Notebook 94.65%

single-cell metabolism scrna-seq metabolic-modules

scfea's Introduction

scFEA: A graph neural network model to estimate cell-wise metabolic flux using single cell RNA-seq data

Change Log

v1.0

Release scFEA initial version with full paper and installation manual
Release human complete metabolic flux modules

v1.1

Update full modules with metabolites names.
Update data_dir and input_dir two directories. data_dir is for model files and input_dir is for single cell input files.
Fix bugs in single cell imputation step by using Magic.

v1.1.2

Release mouse complete metabolic flux modules
Release tutorial using Jupyter. The tutorial contains the full installation manual, installation testing, and two examples of scFEA for the human and mouse models respectively. It also includes an R script for loading the predicted flux result and a simple visualization demo.
Add parameters output_flux_file and output_balance_file, which allow users to define custom output file names for the predicted flux and balance files.
Fix bug in cName_c70_m168.csv, which contained an NA value

To be released soon

If there is a topic you are interested in, please feel free to open an issue. I can also merge your completed function into the main branch.

Abstract

Metabolic heterogeneity and the metabolic interplay between cells and their microenvironment are known to be significant contributors to disease treatment resistance. Our understanding of the intra-tissue metabolic heterogeneity and cooperation phenomena among cell populations is unfortunately quite limited, without a mature single cell metabolomics technology. To mitigate this knowledge gap, we developed a novel computational method, namely scFEA (single cell Flux Estimation Analysis), to infer single cell fluxome from single cell RNA-sequencing (scRNA-seq) data. scFEA is empowered by a comprehensively reorganized human metabolic map as focused metabolic modules, a novel probabilistic model to leverage the flux balance constraints on scRNA-seq data, and a novel graph neural network based optimization solver. The intricate information cascade from transcriptome to metabolome was captured using multi-layer neural networks to fully recapitulate the non-linear dependency between enzymatic gene expressions and reaction rates. We experimentally validated scFEA by generating an scRNA-seq dataset with matched metabolomics data on cells of perturbed oxygen and genetic conditions. Application of scFEA on this dataset demonstrated the consistency between predicted flux and metabolic imbalance with the observed variation of metabolites in the matched metabolomics data. We also applied scFEA on publicly available single cell melanoma and head and neck cancer datasets, and discovered different metabolic landscapes between cancer and stromal cells. The cell-wise fluxome predicted by scFEA empowers a series of downstream analyses including identification of metabolic modules or cell groups that share common metabolic variations, sensitivity evaluation of enzymes with regard to their impact on the whole metabolic flux, and inference of cell-tissue and cell-cell metabolic communications.

The computational framework of scFEA

The manuscript and supplementary methods

Our paper and supplementary methods are available here!

Supplementary figures and tables

Download supplementary files

Supplementary Tables:

Table S1. Information of reorganized human metabolic map.
Table S2. Differentially expressed genes (DEG) and Pathway Enrichment (PE) results of the Pa03c cell line data.
Table S3. ssGSEA results, metabolomics data and clusters of metabolic modules derived in the Pa03c cell line data.
Table S4. Predicted cell type specific fluxome and metabolic imbalance in the melanoma and head and neck cancer data.

Supplementary Figures:

Figure S1. qRT-PCR results. Mock and SCR are controls and siRef-1 is the knockdown of APEX1.
Figure S2. Correlation between metabolomic difference of the eight metabolites and differences of the averaged ssGSEA score of the modules using the eight metabolites as a substrate, in the APEX1-KD cells vs control. The x-axis is the difference of averaged ssGSEA score in the APEX1-KD cells vs control and the y-axis is the fold change of observed metabolomic profile.
Figure S3. The impact of each gene to the metabolic module 1-14 (glycolysis and TCA cycle modules) in the Pa03c cell line data. The x-axis represents genes and y-axis represents impacts. The larger absolute value on the y-axis indicates a stronger impact of the gene to the metabolic module.
Figure S4. tSNE plot of the cell clusters generated based on metabolic flux of the pancreatic cancer cell line data.
Figure S5. Boxplots of the predicted fluxes of Valine -> Succinyl-CoA, Isoleucine -> Succinyl-CoA, Isoleucine -> Acetyl-CoA, Glutathione -> Glycine + Cysteine, Glutathione -> Glutamate, Glutamate -> Glutamine and predicted changes in the abundance of Glutathione and Glutamate in the PV-ADSC of high stemness (HS) and more differentiation (MD).
Figure S6. Convergence of the flux balance loss and non-negative loss during the training of scFEA on the pancreatic cancer cell line data. The hyperparameters of the two losses were set differently to form four experiments. The flux balance loss, non-negative loss and total loss are shown in blue, red and dashed black, respectively.

Requirements and Installation

scFEA is implemented in Python 3. If you don't have Python, please download the Python 3 version of Anaconda.

torch >= 0.4.1
numpy >= 1.15.4
pandas >= 0.23.4
matplotlib >= 3.0.2
magic >= 2.0.4

Download scFEA:

git clone https://github.com/changwn/scFEA

Install requirements:

cd scFEA
conda install --file requirements
conda install pytorch torchvision -c pytorch
pip install --user magic-impute

Usage

You can list the input arguments for scFEA with the --help option:

python src/scFEA.py --help
usage: scFEA.py [-h] [--data_dir <data_directory>]
                [--input_dir <input_directory>] [--res_dir <data_directory>]
                [--test_file TEST_FILE] [--moduleGene_file MODULEGENE_FILE]
                [--stoichiometry_matrix STOICHIOMETRY_MATRIX]
                [--sc_imputation {True,False}]

scFEA: A graph neural network model to estimate cell-wise metabolic flux using
single cell RNA-seq data

optional arguments:
  -h, --help            show this help message and exit
  --data_dir <data_directory>
                        The data directory for scFEA model files.
  --input_dir <input_directory>
                        The data directory for single cell input data.
  --res_dir <data_directory>
                        The data directory for result [output]. The output of scFEA includes two matrices, predicted metabolic flux and metabolites
                        stress at single cell resolution.
  --test_file TEST_FILE
                        The test SC file [input]. The input of scFEA is a single cell profile matrix, where row is gene and column is cell. Example
                        datasets are provided in /data/ folder. The input can be raw counts or normalised counts. The logarithm would be performed
                        if value larger than 30.
  --moduleGene_file MODULEGENE_FILE
                        The table contains genes for each module. We provide human and mouse two models in scFEA. For human model, please use
                        module_gene_m168.csv which is default. All candidate moduleGene files are provided in /data/ folder.
  --stoichiometry_matrix STOICHIOMETRY_MATRIX
                        The table describes relationship between compounds and modules. Each row is an intermediate metabolite and each column is
                        metabolic module. For human model, please use cmMat_171.csv which is default. All candidate stoichiometry matrices are
                        provided in /data/ folder.
  --cName_file CNAME_FILE
                        The name of compounds. The table contains two rows. First row is compounds name and second row is corresponding id.
  --sc_imputation {True,False}
                        Whether perform imputation for SC dataset (recommend set to <True> for 10x data).

Run code with default parameters:

python src/scFEA.py

Another example:

python src/scFEA.py --input_dir data --res_dir output --test_file Melissa_full.csv

Citation

If you find our work helpful in your research or work, please cite us.

N. Alghamdi, W. Chang, P. Dang, X. Lu, C. Wan, Z. Huang, J. Wang, M. Fishel, S. Cao, C. Zhang. scFEA: A graph neural network model to estimate cell-wise metabolic flux using single cell RNA-seq data, under review at Genome Research, 2020.

Questions & Problems

If you have any questions or problems, please feel free to open a new issue here. We will fix the new issue ASAP. For code questions, please contact Wennan Chang.

Wennan Chang ([email protected])

For any other questions or requests, please contact the Principal Investigator of the BDRL lab.

Prof. Chi Zhang ([email protected])

PhD candidate at Biomedical Data Research Lab (BDRL), Indiana University School of Medicine

scfea's Issues

interpretation of result

Thanks for your nice software.
After applying scFEA to Melissa_full.csv, I encountered several problems.

What do these two files mean (balance*.csv and Melissa_full_module171*.csv)?
Why are the column names of balance*.csv numbers?
Where can I find the cName_file?

thanks

Error in performing analysis in Seurat integrated data

Hi Everyone!

I am having trouble running the analysis on integrated single-cell data. I followed the steps in tutorial 2 to store the expression data from the RNA assay, but an error appears when I analyze this data. Has anyone faced the same issue when analyzing an integrated Seurat object? So far, I have only seen tutorial 2 handle a single single-cell dataset.

Thank you in advance for your help.

How to biologically interpret negative flux values?

Hi everyone,
As far as I can see, many people have the same question as me: what does a negative flux value mean biologically? Does anyone have any good ideas?

Support for sparse matrices / AnnData?

Hi,
Thank you for the great tool.
I was wondering whether there is a plan to support sparse matrices (e.g. as used in AnnData files) as inputs?
Right now, using a T4 GPU with 16GB GPU memory, the largest dataset I can run scFEA with is about 10,000 cells x 500 genes.
This is largely because scFEA reads in a dense counts matrix into GPU memory.
If this could be a sparse matrix in the future, then I would assume scFEA could scale to more cells.

How to add other more metabolic pathways?

Hi,
scFEA is an amazing method for metabolic analysis and it helps me a lot. I am wondering whether I can add more pathways, such as arachidonic acid metabolism, to the original metabolic pathways, and how to create a STOICHIOMETRY_MATRIX file with entries of -1, 0 and 1 on my own. I cannot figure out how to create the STOICHIOMETRY_MATRIX file. Thank you very much!
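
One possible way to assemble such a table is sketched below. It follows the --stoichiometry_matrix help text (each row is an intermediate metabolite, each column is a metabolic module), but the sign convention (-1 for a metabolite consumed by a module, 1 for one produced), the rule used to decide which metabolites count as intermediates, and the modules after M_1 are illustrative assumptions, not the authors' documented procedure.

```python
import pandas as pd

# Hypothetical modules as (module id, substrate, product). Replace with your own pathway.
modules = [
    ("M_1", "Glucose", "G6P"),
    ("M_2", "G6P", "G3P"),
    ("M_3", "G3P", "3PD"),
]

def build_stoichiometry(modules):
    """Return a metabolites x modules table with entries in {-1, 0, 1}."""
    metabolites = sorted({m for _, s, p in modules for m in (s, p)})
    columns = [module_id for module_id, _, _ in modules]
    mat = pd.DataFrame(0, index=metabolites, columns=columns)
    for module_id, substrate, product in modules:
        mat.loc[substrate, module_id] = -1  # assumed: consumed by the module
        mat.loc[product, module_id] = 1     # assumed: produced by the module
    # Assumed definition of "intermediate": produced by one module and consumed by another.
    is_intermediate = (mat == 1).any(axis=1) & (mat == -1).any(axis=1)
    return mat[is_intermediate]

cm_mat = build_stoichiometry(modules)
print(cm_mat)
# cm_mat.to_csv("cmMat_custom.csv")  # hypothetical output file name
```

A matching module-gene table for the new modules would still be needed, and its layout should be copied from the files shipped in the data folder rather than from this sketch.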

Zero output in Tutorial1 example2

Thanks for sharing the great tool!
I successfully ran the code in tutorial1.ipynb, but mouse_flux.csv is all zeros from row 27 onward. The only change I made in src/scFEA.py is 'BATCH-SIZE = 64'.
Any suggestion will be greatly appreciated!

%%bash
#cd /Users/chang/Documents/work/flux/scFEA
cd ./
python src/scFEA.py --data_dir data --input_dir input \
    --test_file mouse_example_data.csv \
    --moduleGene_file module_gene_complete_mouse_m168.csv \
    --stoichiometry_matrix cmMat_complete_mouse_c70_m168.csv \
    --sc_imputation True \
    --output_flux_file output/mouse_flux_new.csv \
    --output_balance_file output/mouse_balance_new.csv

I cannot output a suitable cell_id file from the seurat object

TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.

Hi, I ran into an error when running the mouse data. My input data is in .csv format.
The error occurred at the following step.

Starting load data...
Calculating MAGIC...
Running MAGIC on 51914 cells and 22382 genes.
Calculating graph and diffusion operator...
Calculating PCA...
Calculated PCA in 39.08 seconds.
Calculating KNN search...
Calculated KNN search in 493.08 seconds.
Calculating affinities...
Calculated affinities in 490.14 seconds.
Calculated graph and diffusion operator in 1027.61 seconds.
Running MAGIC with solver='exact' on 22382-dimensional data may take a long time. Consider denoising specific genes with genes=<list-like> or using solver='approximate'.
Calculating imputation...
Calculated imputation in 104.30 seconds.
Calculated MAGIC in 1135.44 seconds.

Traceback (most recent call last):
File "scFEA/./src/scFEA.py", line 370, in <module>
main(args)
File "scFEA/./src/scFEA.py", line 140, in main
cmMat = torch.FloatTensor(cmMat).to(device)
TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.

Please help me! Thanks.
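
The traceback points at torch.FloatTensor(cmMat), which rejects arrays of object dtype. A common cause of an object-dtype array is a non-numeric cell somewhere in the CSV, although that is not confirmed for this report. The sketch below shows one way to locate and coerce such cells with pandas; the small table stands in for a stoichiometry file and is made up for illustration.

```python
import pandas as pd

# Made-up stand-in for a stoichiometry table that contains one non-numeric cell.
df = pd.DataFrame(
    {"M_1": [1, -1, 0], "M_2": [0, "x", -1]},
    index=["G6P", "G3P", "3PD"],
)
print(df.dtypes)  # M_2 has dtype 'object'

# Coerce every column to numbers; cells that cannot be parsed become NaN.
numeric = df.apply(pd.to_numeric, errors="coerce")

# List the (row, column) positions that failed to parse so the CSV can be fixed.
bad_cells = [
    (row, col)
    for col in df.columns
    for row in df.index[numeric[col].isna() & df[col].notna()]
]
print(bad_cells)

# A float array like this is a type that torch.FloatTensor accepts.
values = numeric.to_numpy(dtype="float32")
```

Fixing the offending cells in the input file is preferable to silently coercing them, since a NaN in the matrix would propagate into the results.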

How to biologically interpret negative flux values

Hi,
I would like to know how to biologically interpret negative flux values? Does this indeed mean that the reaction has reversed? Would you anticipate this to mean that there may be an accumulation of the metabolite in to the reaction?
Many thanks

Create STOICHIOMETRY_MATRIX

Hi everyone

I am trying to do the analysis for another organism beyond human and mouse.

Does someone know how to create a STOICHIOMETRY_MATRIX file with entries of -1, 0 and 1?

I contacted the developers by email, but I did not get an answer from them. I did not understand very well how I could construct the STOICHIOMETRY_MATRIX file. May I ask for some help?

Many thanks in advance for your help.

Paola

Could you share the script for “perturbation analysis to identify high impact metabolic genes”?

Hi,
The scFEA is an amazing perspective for metabolic analysis, it helps me a lot. I was wondering if you could share the code for "perturbation analysis to identify high impact metabolic genes".
Thank you very much.

Missing of lipid metabolism network file in download page

Thanks for the useful tools!!
May I know where to download the lipid metabolism network? It seems that the related file is absent from the Download Page of scFEA. It would be really helpful for us to have that file. Thanks!

Bests,
Elaine

ValueError: operands could not be broadcast together with shapes

Hi! It's really nice software to work with. I've been using it on Windows with my laptop and it works fine. However, when I switched to using scFEA on macOS, I always ran into the error below:

Traceback (most recent call last):
File "/Users/yuhaihui/scFEA/src/scFEA.py", line 370, in <module>
main(args)
File "/Users/yuhaihui/scFEA/src/scFEA.py", line 184, in main
module_scale = torch.FloatTensor(module_scale.values/ moduleLen)

ValueError: operands could not be broadcast together with shapes (0,9) (168, )

I am very glad to receive help of any kind from you.

10X data

Thanks for your software.
I have a question: can scFEA be used to analyze 10X scRNA-seq data?

spatial transcriptomic data analysis (spatial dependent metabolic and biochemical changes)

Would you provide spatial transcriptomic data analysis and more visualization for us? Thank you very much!

questions about Pandas?

(base) C:\Users\Lenovo\scFEA>python src/scFEA.py --data_dir data --input_dir input --test_file Seurat_geneExpr.csv --moduleGene_file module_gene_glutaminolysis1_m23.csv --stoichiometry_matrix cmMat_glutaminolysis1_c17_m23.csv --cName_file cName_glutaminolysis1_c17_m23.csv --output_flux_file output/Seurat_gluta_flux.csv --output_balance_file output/Seurat_gluta_balance.csv
Starting load data...
Load compound name file, the balance output will have compound name.
Load data done.
Starting process data...
Traceback (most recent call last):
File "C:\Users\Lenovo\scFEA\src\scFEA.py", line 370, in <module>
main(args)
File "C:\Users\Lenovo\scFEA\src\scFEA.py", line 172, in main
geneExprDf = geneExprDf.append(temp, ignore_index = True, sort=False)
^^^^^^^^^^^^^^^^^
File "D:\ANACONDA\Lib\site-packages\pandas\core\generic.py", line 6296, in __getattr__
return object.__getattribute__(self, name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'DataFrame' object has no attribute 'append'. Did you mean: '_append'?

Is this a pandas issue? How can I resolve it? Thanks a lot!
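
DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, which matches this AttributeError. Two workarounds are to install a pandas version below 2.0 or to replace the call with pandas.concat. The sketch below shows the replacement on made-up data and assumes temp is a DataFrame; if temp were a Series holding one row, it would need temp.to_frame().T first.

```python
import pandas as pd

# Made-up stand-ins for the variables named in the traceback.
gene_expr_df = pd.DataFrame({"cell_1": [1.0], "cell_2": [2.0]})
temp = pd.DataFrame({"cell_1": [3.0], "cell_2": [4.0]})

# pandas < 2.0: gene_expr_df = gene_expr_df.append(temp, ignore_index=True, sort=False)
# pandas >= 2.0: concatenate along rows instead.
gene_expr_df = pd.concat([gene_expr_df, temp], ignore_index=True, sort=False)
print(gene_expr_df)
```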

"duplicate" metabolites with different values in balance file

Hi, I provide two screenshots and hope it can be helpful to solve this problem.

Error analysing Visium data

Hello @changwn,

Thank you for the great package. I have successfully installed the package and run it on the Melissa_full.csv data set. However, I have not been able to run any of the Visium data (including the one made available in the package under /data): as soon as it starts training the neural networks I get the following error:

Since the compatibility of scFEA with Visium is a main interest, how do you recommend I solve this?

Thank you.

Melissa

Inputs in different folders

I would like to run scFEA with inputs in different folders (e.g. expression in one folder and the inputs supplied by scFEA in another folder). Can I just specify an empty string for the data path and then the full path for each individual file?

Example 3 producing blank results

When following example 4.3 from the tutorial, I get blank files (other than the column and row names) for both the flux and balance files. What could be a reason for this?

How to explain the loss terms and set the hyperparameters

Dear sir,

The function of scFEA is exactly what I need, thanks for your useful work!
I have some questions about scFEA.
I have tested a cancer scRNA-seq dataset. The picture shows the loss in the result.

I don't know why cellVar is so high and does not converge.

Could you please help me understand these five items?

Another question: where can I set the hyperparameters? Is it necessary to set them?

Best wishes,
Shang Yunfei

Input file row and columns do not match documentation

I supplied input expression genes*cells, e.g. wc -l input.csv returns the number of genes+1 (30673). There are 102143 cells in the file. However, when I run the scFEA it seems that it swaps genes and cells (see below). So there is something wrong as documentation says that genes should be in rows.

Calculating MAGIC...
Starting load data...
  Running MAGIC on 30672 cells and 102143 genes.
  Calculating graph and diffusion operator...
    Calculating PCA...
    Calculated PCA in 357.77 seconds.
    Calculating KNN search...
    Calculated KNN search in 165.82 seconds.
    Calculating affinities...
    Calculated affinities in 146.12 seconds.
  Calculated graph and diffusion operator in 682.13 seconds.
  Running MAGIC with `solver='exact'` on 102143-dimensional data may take a long time. Consider denoising specific genes with `genes=<list-like>` or using `solver='approximate'`.
  Calculating imputation...
  Calculated imputation in 2589.52 seconds.
Calculated MAGIC in 3316.86 seconds.
/home/icb/karin.hrovatin/.local/lib/python3.8/site-packages/graphtools/graphs.py:287: RuntimeWarning: Detected zero distance between 42878 pairs of samples. Consider removing duplicates to avoid errors in downstream processing.
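
Since a MAGIC run on a matrix of this size takes close to an hour, it can help to confirm the orientation of the file before launching scFEA. The sketch below counts rows and columns with the standard library only. It assumes the first row holds cell IDs and the first column holds gene names, as the --test_file help text implies; the tiny file it writes is made up for illustration.

```python
import csv
import os
import tempfile

def matrix_shape(path):
    """Return (n_rows, n_cols) of a CSV matrix, excluding the header row and the name column."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        n_cols = len(header) - 1  # first column is assumed to hold row names
        n_rows = sum(1 for row in reader if row)
    return n_rows, n_cols

# Tiny made-up example: 3 genes (rows) x 2 cells (columns).
tmp = tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="")
tmp.write(",cell_1,cell_2\nGAPDH,5,0\nPKM,2,7\nLDHA,0,1\n")
tmp.close()

n_genes, n_cells = matrix_shape(tmp.name)
print(f"{n_genes} genes x {n_cells} cells")
os.remove(tmp.name)
```

If the counts match the expected gene and cell numbers but MAGIC still reports them swapped, the mismatch lies in how the matrix is handled after loading rather than in the file itself.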

Error in loading compound name file

Hi,
I'm getting a 'file not found' error when trying to load the compound name file if I specify a non-standard data directory with the data_dir flag. The error went away when I used the standard data folder as my data directory. In my case, I specified the directory with:
--data_dir scFEA/data

and got the error:
FileNotFoundError: [Errno 2] No such file or directory: './data/cName_c70_m168.csv'

From looking through your code, I think the error is coming from line 145 of your scFEA.py code:
"./data/" + cName_file,
where calling the data directory variable data_path would fix it.

It seems like a great tool, thank you for building it!

Separate or pooled analysis for different experimental conditions

Hi,

Thank you for developing this great tool!

I was wondering whether I should run scFEA separately on the scRNA-seq data from each experimental condition, or whether I can simply run scFEA on a concatenated expression matrix to compare the fluxome between conditions.

For example, to generate Figure 4 (APEX1-KD vs control), did you run scFEA separately on 1) Normoxia Control, 2) Normoxia KD, 3) Hypoxia Control and 4) Hypoxia KD and then compare the flux? or did you run scFEA on a concatenated expression matrix that includes all four conditions and then compare the flux?

Given scFEA tries to minimise the overall flux imbalance across all input cells, the former approach (run scFEA separately and compare the flux) sounds more reasonable than the latter, but I am not sure if the flux estimates from separate scFEA runs are comparable.

Thanks,

Balance values interpretation

As others have indicated previously, it is not clear how to interpret the balance values (especially the negative values). Could you explain it in simpler terms?

Error with large mouse dataset

Hi,

Thank you very much for providing this really interesting and useful analysis pipeline!
Installation went smoothly with your documentation - thank you also for that!
I ran the pipeline on the test dataset and it worked without a problem.

Then I tried a larger mouse dataset and got an error - see below.
Any ideas on how to fix this are highly appreciated!

Thank you very much!

Calculating MAGIC...
Starting load data...
Running MAGIC on 2566 cells and 17278 genes.
Calculating graph and diffusion operator...
Calculating PCA...
Calculated PCA in 4.22 seconds.
Calculating KNN search...
Calculated KNN search in 0.60 seconds.
Calculating affinities...
Calculated affinities in 0.60 seconds.
Calculated graph and diffusion operator in 5.56 seconds.
Running MAGIC with solver='exact' on 17278-dimensional data may take a long time. Consider denoising specific genes with genes=<list-like> or using solver='approximate'.
Calculating imputation...
Calculated imputation in 3.07 seconds.
Calculated MAGIC in 8.75 seconds.
Load data done.
Starting process data...
Traceback (most recent call last):
File "scFEA/src/scFEA.py", line 327, in
main(args)
File "scFEA/src/scFEA.py", line 157, in main
module_scale = torch.FloatTensor(module_scale.values/ moduleLen)
ValueError: operands could not be broadcast together with shapes (2566,161) (162,)
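
The shapes in the message say that a table with 161 module columns is being divided by a vector with 162 entries. One plausible but unconfirmed cause is a module whose genes are all absent from the expression matrix, so that no column is built for it. The sketch below reproduces the NumPy error and lists such modules; the module-to-gene mapping and the gene names are made up for illustration.

```python
import numpy as np

# Reproduce the error: 161 columns cannot be divided element-wise by 162 values.
module_scale = np.ones((4, 161))
module_len = np.ones(162)
try:
    module_scale / module_len
except ValueError as err:
    print(err)

# Hypothetical module -> genes table and the set of genes present in the expression matrix.
module_genes = {"M_1": ["Hk1", "Gck"], "M_2": ["Gpi1"], "M_3": ["Pfkl", "Pfkm"]}
genes_in_data = {"Hk1", "Pfkl", "Pfkm"}

# Modules with no gene in the data are candidates for the missing column.
empty_modules = [
    module for module, genes in module_genes.items()
    if not genes_in_data.intersection(genes)
]
print(empty_modules)
```

Checking gene symbol capitalization and species (human versus mouse symbols) against the module-gene file is a reasonable first step when a module turns out to be empty.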

res_dir bug and solution

In the "scFEA.py" file, in lines 222 and 324, the specifications for "lossName" and "balanceName" directly use the "output" folder without utilizing the "res_dir" parameter provided by the user. This results in an error when the user specifies an output path other than "output".
FileNotFoundError: [Errno 2] No such file or directory: './output/lossValue_20231124-161348.txt'
The issue can be resolved by changing the content of these two lines to the following:
lossName = "./" + res_dir + "/lossValue_" + timestr + ".txt"
balanceName = "./" + res_dir + "/balance_" + timestr + ".csv"
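
The same fix can be written with os.path.join, which also handles absolute paths and trailing slashes, and with os.makedirs so that a missing output folder is created rather than raising FileNotFoundError. This is a minimal sketch, not the repository's code: the function name output_paths is hypothetical, and the timestamp format is inferred from the file name in the error message.

```python
import os
import tempfile
import time

def output_paths(res_dir, timestr=None):
    """Build the loss and balance file paths inside res_dir, creating the folder if needed."""
    timestr = timestr or time.strftime("%Y%m%d-%H%M%S")
    os.makedirs(res_dir, exist_ok=True)
    loss_name = os.path.join(res_dir, "lossValue_" + timestr + ".txt")
    balance_name = os.path.join(res_dir, "balance_" + timestr + ".csv")
    return loss_name, balance_name

# Usage with a throwaway directory that does not exist yet.
res_dir = os.path.join(tempfile.mkdtemp(), "my_results")
loss_name, balance_name = output_paths(res_dir, "20231124-161348")
print(os.path.basename(loss_name))
print(os.path.basename(balance_name))
```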

How can I use scFEA in other species?

Hi,
Thanks for the useful tool for metabolic analysis. I am trying to use this tool on our model species (axolotl). I can replace the gene names in the single-cell input matrix with human gene names, but I am not sure how to obtain the two example RData files (mouse_example_cell_ident.RData and mouse_module_info.RData) from my axolotl single-cell RNA-seq data.
I hope for your reply and help.
Thanks for your work.

How to get the over-all flux score in a "Supermodule"

I only got the individual 168 small module reaction flux score, but I am having trouble getting an overall score for a Supermodule.
For example:
The first Supermodule "glycolysis + TCA" has 14 small modules, and I only have the flux score and a ridge plot for each small module.
How to get the overall flux score for Supermodule 1?
Can I just simply add them up? I don't find any related information in their tutorial either.

issue for NA in balance file

Hi,
Thank you for this awesome tool. I found that NA is included in the balance file. I am not sure which metabolite corresponds to it. Thanks.

scFEA_tutorial2 Report an error

Seurat_gluta_flux.csv
After scFEA done, an error was reported when performing FindClusters(obj, resolution = 0.5, verbose = F)

r$> predFlux <- read.csv('./output/Seurat_gluta_flux.csv', header = T, row.names = 1)
r$> predFlux <- data.matrix(predFlux)
r$> predFlux0 <- t(predFlux)
r$> # add flux as a new assay
obj[["FLUX"]] <- CreateAssayObject(counts = predFlux0)
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')

r$> DefaultAssay(obj) <- 'FLUX'
r$> obj <- FindVariableFeatures(obj, selection.method = "vst", nfeatures = 2000, verbose = F)
r$> obj <- ScaleData(obj, features = rownames(obj), assay = 'FLUX', verbose = F)
r$> obj <- RunPCA(obj, features = VariableFeatures(object = obj), npcs = 10, reduction.name = 'pca.flux', verbose = F)
Warning message:
Cannot add objects with duplicate keys (offending key: PC_) setting key to original value 'pca.flux_'
r$> obj <- FindNeighbors(obj, dims = 1:2, verbose = F)
r$> obj <- FindClusters(obj, resolution = 0.5, verbose = F)
Error in FindClusters.Seurat(obj, resolution = 0.5, verbose = F) :
Provided graph.name not present in Seurat object

Redundant reactions in the M171 list

Hi, the following two reactions seem to be redundant in the M171 list:
M_1,Glucose_G6P,Glucose,C00267,G6P,C00668,1
M_106,Glucose_Glucose-6-phosphate,Glucose,C00267,Glucose-6-phosphate,C00668,13

Are these reactions representing different biological processes?

Thanks,

issue of cName_glutaminolysis1 and 2

Hi Wennan, the cName of Lactate has not been updated; it should be C00256.

How to map modules to pathways

Dear Sir or Madam,

Thanks for the powerful tool

I used module_gene_m168.csv and cmMat_171.csv as input and generated a result file for 168 modules, which I think covers almost all of human metabolism.
However, modules alone are hard for most readers to interpret in follow-up analysis; pathways are more widely accepted and more biologically meaningful. Could you tell me how to map the modules to pathways?
I noticed that each module contains all of its related reactions, so can I treat the flux of a metabolic pathway as the sum of all modules that contain this pathway?
For the TCA cycle, for example, should I focus on all modules or just the modules in supermodule 1?

I also noticed that the flux of a module is the same as the flux of every reaction inside the module, so how should I interpret a reaction that exists in multiple modules?

Hope to get your reply soon, any help is appreciated!

Best wishes,
Shang

Supplementary methods

Where can I find the supplementary methods (e.g. simulation methods) that you say are in the supplement? I can find only supplementary figures and tables, but no methods on bioRxiv.

scFEA output

For the same data, when I work with a smaller number of cells (1000), it works fine. But when I work with many more cells (>50000), the flux and balance files are all NA. Why is that?

Input types

Could you add some documentation on what kind of inputs/file formats the tool accepts, e.g. expanding descriptions in

python src/scFEA.py --help

Issue on downstream analysis

Hi chang,

Thanks for the powerful tool. I noticed that scFEA consists of three major computational components; however, I did not find any guidelines or results for downstream analysis after I got the output of scFEA. Could you kindly provide suggestions for downstream analysis? Any help will be appreciated. Thanks!

wrong names for metabolites

Thanks for your software! However, why are the names of the metabolites numbers?

Problem about scFEA_tutorial2.ipynb

Running `scFEA`

Hi,

Thanks a lot for releasing scFEA. I'm very excited to test it on my data.

However, I was wondering if you could include a brief section on how to install and run the tool. I believe
that would be very beneficial.

Thanks!

issue of tutorial

Hi, "./scFEA/input/cell_ident.RData" is missing. Could I trouble you to upload it? Also, how should I run scFEA for samples with two conditions (such as cancer and normal)? Thanks a lot.

Is it ok to remove mitochondrial genes in scRNA-seq data if I'm going to use scFEA?

Hello. Is it ok to remove mitochondrial genes in scRNA-seq data if I'm going to use scFEA?

Tutorial2 Bug

Hi,

Thank you so much for all your work!

I've come across an error in your tutorial2, when you compute the tSNE of the FLUX matrix using Seurat.

obj <- RunTSNE(obj, dims = 1:2, assay = 'FLUX', reduction.name = "tsne.flux", verbose = F)

As both UMAP and tSNE functions in seurat by default use the PCA embedding, it is using the PCA calculated from the gene matrix previously. If you set dims = 1:10 you can see that the resulting tSNE plot is identical to the one computed on the gene matrix. Instead the pca embedding to be used for the tSNE function should be specified with reduction = 'pca.flux'

I would also recommend to include a few lines on how to calculate the optimal number of pca dimensions to be used in the tSNE calculation, as two dimensions (dims = 1:2) wouldn't really require dimensionality reduction.
Seurat has a JackStraw() function or the less computationally intensive ElbowPlot().

Thanks again for scFEA, I look forward to using it in the future.

AttributeError: 'DataFrame' object has no attribute 'append'. Did you mean: '_append'?

When I use the test data Melissa_full.csv to run the code, the error below appears.

AttributeError: 'DataFrame' object has no attribute 'append'. Did you mean: '_append'?

How do I fix it? Thank you.

RuntimeError: value cannot be converted to type uint8_t without overflow: 3416

Thanks for the brilliant scFEA. However, I am running into a problem. The error is as below:

Load compound name file, the balance output will have compound name.
Load data done.
Starting process data...
Process data done.
Starting train neural network...
  0%|                                                                                                                                                      | 0/100 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "/Users/rg/scFEA/src/scFEA.py", line 370, in <module>
    main(args)
  File "/Users/rg/scFEA/src/scFEA.py", line 235, in main
    geneScale = X_scale_batch, moduleScale = m_scale_batch)
  File "/Users/rg/scFEA/src/scFEA.py", line 50, in myLoss
    if sum(diff > 0) == m.shape[0]: # solve Nan after several iteraions
RuntimeError: value cannot be converted to type uint8_t without overflow: 3416

My script is:

python /Users/rg/scFEA/src/scFEA.py --data_dir /Users/XX/scFEA/data \
    --input_dir /Users/XX/scFEA/input \
    --moduleGene_file module_gene_complete_mouse_m168.csv \
    --test_file count_matrix.csv \
    --cName_file cName_complete_mouse_c70_m168.csv \
    --sc_imputation True \
    --stoichiometry_matrix cmMat_complete_mouse_c70_m168.csv \
    --output_flux_file output/adj_flux.csv \
    --output_balance_file output/adj_balance.csv

I am using macOS Ventura 13.2.1 (22D68) with 64GB memory, and the versions of the related packages are:

 numpy                              1.17.0
 torch                              1.0.1
 pandas                             1.0.5
 matplotlib                         3.2.2
 magic-impute                       3.0.0

I am completely new to Python, so I have no idea what caused this error.
Please let me know if there are any solutions to this problem.
Thank you

fluxStatuTest[i, :] = out_m_batch.detach().numpy() type error

Hi, I'm trying to run this on a GPU and it fails with the error below:

any suggestions on how to fix that?

I'm using:
numpy==1.21.5
torch==1.11.0 (build py3.7_cuda10.2_cudnn7.6.5_0 -c pytorch)

python /nfs/team297/kt16/Softwares/scFEA/src/scFEA.py --data_dir /nfs/team297/kt16/Softwares/scFEA/data --input_dir soc_untreated --test_file expression_mat.csv --moduleGene_file module_gene_m168.csv --stoichiometry_matrix cmMat_c70_m168.csv
Starting load data...
Load compound name file, the balance output will have compound name.
Load data done.
Starting process data...
Process data done.
Starting train neural network...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:57<00:00,  1.74it/s]
Training time:  57.375060081481934
Traceback (most recent call last):
  File "/nfs/team297/kt16/Softwares/scFEA/src/scFEA.py", line 370, in <module>
    main(args)
  File "/nfs/team297/kt16/Softwares/scFEA/src/scFEA.py", line 299, in main
    fluxStatuTest[i, :] = out_m_batch.detach().numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

cmMat_171.csv is missing

Hello

I am trying to run the code on single-cell human data and I didn't find the cmMat_171.csv (stoichiometry_matrix) in the data folder. What is the alternative?


--stoichiometry_matrix STOICHIOMETRY_MATRIX
                        The table describes relationship between compounds and modules. Each row is an intermediate metabolite and each column is
                        metabolic module. For human model, please use cmMat_171.csv which is default. All candidate stoichiometry matrices are
                        provided in /data/ folder.

Small result file value

I was checking my result metabolite and flux files and most of the values were smaller than 0.05.
In the tutorial sample, by comparison, the flux values are in the thousands.
Does this indicate that nothing changed in my datasets, or that something went wrong? I am using normalized counts, by the way.