
error in diffExp (flair, CLOSED)

brookslabucsc commented on August 11, 2024
error in diffExp

from flair.

Comments (11)

csoulette commented on August 11, 2024

Hi ematlab,

I suspect the problem is a simple formatting error that prevents gene expression from being quantified properly; perhaps something has gone wrong with the count_matrix.tsv file. Can you confirm that your count_matrix.tsv follows the format we've outlined in the readme (posted here)?

For example:

ids samp1_conditionA_batch1 samp2_conditionA_batch1 samp3_conditionA_batch2 ...
0042c9e7-b993_ENSG00000131368.3 237.0 156.0 165.0 150.0 ...
0042d216-6b08_ENSG00000101940.13 32.0 14.0 25.0 ...

It would be very helpful if you could attach/forward part or whole of your counts matrix.
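
A rough way to check the layout before attaching the file is sketched below. It is only an illustration under stated assumptions: the matrix is tab-separated, the first header field is `ids`, and every other header field has the form `sample_condition_batch`. The function name `check_count_matrix` is hypothetical and not part of flair.

```python
import csv
import io


def check_count_matrix(handle):
    """Return a list of problems found in a tab-separated count matrix."""
    problems = []
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, None)
    if not header:
        return ["file is empty"]
    if header[0] != "ids":
        problems.append("first header field should be 'ids', got %r" % header[0])
    for name in header[1:]:
        # Assumed convention: sample_condition_batch (exactly three parts).
        if len(name.split("_")) != 3:
            problems.append("column %r is not sample_condition_batch" % name)
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            problems.append(
                "line %d has %d fields, expected %d" % (line_no, len(row), len(header))
            )
            continue
        for value in row[1:]:
            try:
                float(value)
            except ValueError:
                problems.append("line %d has non-numeric count %r" % (line_no, value))
    return problems


# Two-line excerpt in the assumed layout; an empty list means nothing was flagged.
example = io.StringIO(
    "ids\tsample2_conditionB_batch1\tsample1_conditionA_batch1\n"
    "0000a32d-27a2-42a1-a4ba-17dfa77d17e7;16_ENSG00000100422.13\t2.0\t7.0\n"
)
print(check_count_matrix(example))
```

An empty result only means the layout matches these assumptions; it says nothing about whether the counts themselves are sensible.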

Thanks
-CMS


ematlab commented on August 11, 2024

Hi, thank you for your reply.
I'm attaching part of our counts matrix:

ids sample2_conditionB_batch1 sample1_conditionA_batch1
0000a32d-27a2-42a1-a4ba-17dfa77d17e7;16_ENSG00000100422.13 2.0 7.0
00082903-7250-4b72-b379-143a0a9ca04f;16_chr20:63743000 2.0 1.0
00154906-62c9-4b68-ad01-6cdce8ce1405;0_chr11:63909000 3.0 12.0
001990ab-5a56-4644-90a6-7779fc0d6ef1;16_ENSG00000166888.11 40.0 58.0
00297a92-b90b-45ec-b182-47efdcb367d4;0_ENSG00000133619.17 4.0 4.0
002ab489-1c61-43d0-9066-14b98c0b4e8a;0_chr3:185483000 3.0 0.0
0035309b-19ad-4eac-a77b-bc78d6f792de;16_ENSG00000159733.13 1.0 2.0
0036ad34-deaa-4664-8116-e8d4055aa056;16_ENSG00000140988.15-2 10.0 11.0
003c97c6-af60-4f77-84ae-e2dbecc827b3;16_ENSG00000132465.10 62.0 1573.0
003d0572-9025-4255-a129-6eec8187830c;0_ENSG00000104859.14 8.0 9.0
00481c66-1d75-431d-8479-9b2b31f31c6c;0_chr15:51003000 3.0 11.0
004d4553-5d77-43a6-854a-5b1f69804431;0_ENSG00000138092.10 0.0 8.0
005b08d6-ba3b-487b-ab11-68ce8fb00b4a;0_ENSG00000143093.14 3.0 1.0
005ba608-dca1-4256-a669-2c88c24247b1;0_ENSG00000100297.15 4.0 44.0
00615da0-22c7-4d26-8746-6ca625da5a44;0_ENSG00000147403.16 108.0 424.0
0068d1da-d1e3-4937-bd9f-feda6ec3e41a;0_chr7:12242000 1.0 2.0
007471e5-4490-49b7-8820-aad7880356b2;0_ENSG00000188820.12 4.0 81.0

diffExp module also returned the following error:
Traceback (most recent call last):
  File "/home/flair-master/flair-master/bin/runDU.py", line 23, in <module>
    from rpy2 import robjects
  File "/usr/local/lib/python2.7/dist-packages/rpy2-2.8.6-py2.7-linux-x86_64.egg/rpy2/robjects/__init__.py", line 16, in <module>
    import rpy2.rinterface as rinterface
  File "/usr/local/lib/python2.7/dist-packages/rpy2-2.8.6-py2.7-linux-x86_64.egg/rpy2/rinterface/__init__.py", line 92, in <module>
    from rpy2.rinterface._rinterface import (baseenv,
ImportError: /usr/local/lib/python2.7/dist-packages/rpy2-2.8.6-py2.7-linux-x86_64.egg/rpy2/rinterface/_rinterface.so: undefined symbol: R_NilValue

We also tried reinstalling rpy2 version 2.8.6, without success.
Thank you again


csoulette commented on August 11, 2024

Hi ematlab,

Looks like there was a minor formatting bug in the formula table for DESeq2, and also in some old code that was used to compute isoform usage percentages (your first issue). As for the rpy2 issue, I've run my data and a modified version of your data through python2.7 / rpy2 version 2.8.6 without any error. I would make sure all of the required libraries for R are properly installed.
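
To narrow down the rpy2 import problem, it may help to collect a few facts about the environment first. The sketch below is not part of flair and does not diagnose anything by itself; it assumes Python 3 (for `shutil.which`), and the function name `rpy2_environment_report` is made up for illustration.

```python
import importlib
import os
import shutil


def rpy2_environment_report():
    """Collect basic facts about the R/rpy2 setup visible to this interpreter."""
    report = {
        "R_on_PATH": shutil.which("R"),  # None if no R executable is found
        "R_HOME": os.environ.get("R_HOME"),  # None if the variable is unset
        "LD_LIBRARY_PATH": os.environ.get("LD_LIBRARY_PATH"),
    }
    try:
        importlib.import_module("rpy2.rinterface")
        report["rpy2_import"] = "ok"
    except Exception as exc:  # e.g. the ImportError shown in the traceback
        report["rpy2_import"] = "%s: %s" % (type(exc).__name__, exc)
    return report


for key, value in rpy2_environment_report().items():
    print("%s = %s" % (key, value))
```

Posting this output together with the traceback makes it easier to see whether the interpreter that runs runDU.py can find R and rpy2 at all.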

A question about your data though, is your comparison between only two samples? The diffExp module is explicitly for datasets that have at least 3 samples per condition. If you are trying to compare expression simply between two samples, I would advise following the instructions on our readme for using the diff_iso_usage.py script, which is a module specifically for comparing two samples.

Thanks~

-CMS


ematlab commented on August 11, 2024

Hi csoulette

I used the diff_iso_usage.py script and it works. Just one question: the results file contains fewer gene records than the input.psl file (19414 vs 2835). Is that expected?
I also reinstalled the R libraries and rpy2, and ran diffExp using python3. This worked better, but still not well: the final results contained only the "die_conditionB_v_conditionA_drimseq2_results.tsv" file (I used a modified psl input with 3 technical replicates per condition, just to test the script).
The main error reported by diffExp was:
"__init__.py:146: RRuntimeWarning: Error: object "\001NULL\001" not found",
followed by these warning messages:
  warnings.warn(x, RRuntimeWarning)
/usr/local/lib/python3.5/dist-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Inoltre:
  warnings.warn(x, RRuntimeWarning)
/usr/local/lib/python3.5/dist-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Warning message:
  warnings.warn(x, RRuntimeWarning)
/usr/local/lib/python3.5/dist-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: In formals(fun) :
  warnings.warn(x, RRuntimeWarning)
/usr/local/lib/python3.5/dist-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: argument is not a function
  warnings.warn(x, RRuntimeWarning)
/home/ezio/flair-master/flair-master/bin/runDU.py:105: FutureWarning: read_table is deprecated, use read_csv instead.
  quantDF = pd.read_table(matrix, header=0, sep='\t', index_col=0)
/usr/local/lib/python3.5/dist-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: ! Using a subset of 0.1 genes to estimate common precision !
  warnings.warn(x, RRuntimeWarning)
/usr/local/lib/python3.5/dist-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: ! Using common_precision = 993.7223 as prec_init !
  warnings.warn(x, RRuntimeWarning)
/usr/local/lib/python3.5/dist-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: ! Using loess fit as a shrinkage factor !
  warnings.warn(x, RRuntimeWarning)
/usr/local/lib/python3.5/dist-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: * Fitting the DM model..
  warnings.warn(x, RRuntimeWarning)
/usr/local/lib/python3.5/dist-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Using the regression approach.
  warnings.warn(x, RRuntimeWarning)
/usr/local/lib/python3.5/dist-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Took 3.1816 seconds.
  warnings.warn(x, RRuntimeWarning)
/usr/local/lib/python3.5/dist-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: * Fitting the BB model..
  warnings.warn(x, RRuntimeWarning)
/usr/local/lib/python3.5/dist-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Took 0.9573 seconds.
  warnings.warn(x, RRuntimeWarning)
/usr/local/lib/python3.5/dist-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Using the one way approach.
  warnings.warn(x, RRuntimeWarning)
/usr/local/lib/python3.5/dist-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Took 1.7099 seconds.
  warnings.warn(x, RRuntimeWarning)
/usr/local/lib/python3.5/dist-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: * Calculating likelihood ratio statistics..
  warnings.warn(x, RRuntimeWarning)
/usr/local/lib/python3.5/dist-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Took 0.0025 seconds.
  warnings.warn(x, RRuntimeWarning)
/usr/local/lib/python3.5/dist-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Took 0.8891 seconds.
  warnings.warn(x, RRuntimeWarning)
/usr/local/lib/python3.5/dist-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Took 0.0031 seconds.
  warnings.warn(x, RRuntimeWarning)
/usr/local/lib/python3.5/dist-packages/rpy2/robjects/pandas2ri.py:191: FutureWarning: from_items is deprecated. Please use DataFrame.from_dict(dict(items), ...) instead. DataFrame.from_dict(OrderedDict(items)) may be used to preserve the key order.
  res = PandasDataFrame.from_items(items)

Thanks for your help


belgravia commented on August 11, 2024

To address the diff_iso_usage.py results: Yes, that behavior is expected. The script outputs the most significant isoform usage switch per gene between your two samples, hence fewer lines in the output. However, to avoid confusion, I have now changed the output of that script so that a Fisher's p-value is assigned to every isoform. Genes that have only 1 isoform will have NA assigned to that isoform. The script now also takes two column arguments in case your expression values are not in adjacent columns. Please rerun if you get the chance.
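
To give a feel for what a per-isoform Fisher's p-value means, here is a minimal pure-Python sketch. It is an illustration, not the code in diff_iso_usage.py: it assumes each isoform is tested with a 2x2 table of its own counts against the summed counts of the gene's other isoforms in the two samples, and the function names and counts are made up.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-7))
    return min(p, 1.0)


def isoform_pvalues(gene_counts):
    """gene_counts maps isoform id -> (count in sample A, count in sample B)."""
    total_a = sum(a for a, _ in gene_counts.values())
    total_b = sum(b for _, b in gene_counts.values())
    results = {}
    for iso, (a, b) in gene_counts.items():
        if len(gene_counts) < 2:
            results[iso] = "NA"  # single-isoform genes get NA
        else:
            # Assumed table: this isoform vs. all other isoforms of the gene.
            results[iso] = fisher_exact_two_sided(a, b, total_a - a, total_b - b)
    return results


# Made-up integer counts for one gene with two isoforms.
counts = {"isoform_1": (30, 5), "isoform_2": (10, 40)}
for iso, p in isoform_pvalues(counts).items():
    print(iso, p)
```

Counts must be integers here, so values such as 2.0 from a count matrix would need to be converted first.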

diff_iso_usage.py should only be used for pairwise comparisons with no replicates: 1 condition compared to 1 other condition. If you have 3 conditions, you would run diff_iso_usage.py multiple times, comparing condition A to B, B to C, and also A to C, depending on what you are interested in. You should only run flair-diffExp if you have 3 or more replicates for each condition. We're looking into all the rpy2 error messages in the meantime.
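
The rule above can be sketched as a small check on the count matrix header. This is an illustration only: it assumes column names follow the `sample_condition_batch` convention from the readme, and the helper names `replicates_per_condition` and `choose_tool` are hypothetical.

```python
from collections import Counter


def replicates_per_condition(header_fields):
    """Count sample columns per condition from names like sample_condition_batch."""
    counts = Counter()
    for name in header_fields:
        if name == "ids":
            continue
        parts = name.split("_")
        if len(parts) != 3:
            raise ValueError("unexpected column name: %r" % name)
        counts[parts[1]] += 1
    return counts


def choose_tool(header_fields, min_replicates=3):
    """Suggest which script fits the design, following the rule of thumb above."""
    counts = replicates_per_condition(header_fields)
    if counts and all(n >= min_replicates for n in counts.values()):
        return "diffExp"
    if counts and all(n == 1 for n in counts.values()):
        return "diff_iso_usage.py"
    return "neither (diffExp needs 3+ replicates per condition)"


header = ["ids", "sample2_conditionB_batch1", "sample1_conditionA_batch1"]
print(choose_tool(header))  # prints "diff_iso_usage.py"
```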


csoulette commented on August 11, 2024

Hi ematlab,

Apologies for the delay. I've changed some code to make the diffExp module a little stricter about what type of input is acceptable, since, as I've mentioned, it should only be used in cases where you are comparing groups with at least 3 replicates.

The module is more or less a wrapper for DESeq2, so the requirements for a successful differential expression analysis are inherited from DESeq2. Therefore, for your run with technical replicates, I think such an experiment would only work if there is enough variability in your technical replicates.
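
One rough way to see how much variability the technical replicates carry is to count the features whose values are identical across all replicates of a condition. The sketch below is only a proxy under that assumption; it is not a DESeq2 requirement check, and the function name and numbers are made up.

```python
def identical_fraction(rows, condition_columns):
    """Fraction of rows whose counts are identical across the given columns."""
    if not rows:
        return 0.0
    flat = sum(
        1 for row in rows if len({row[i] for i in condition_columns}) == 1
    )
    return flat / len(rows)


# Made-up counts: three technical replicates of one condition per row.
rows = [
    [10.0, 10.0, 10.0],  # no variability at all
    [3.0, 4.0, 3.0],     # some variability
]
print(identical_fraction(rows, [0, 1, 2]))  # prints 0.5
```

A value close to 1 would suggest the replicates are nearly copies of each other, which is the situation described above.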

Please try the latest version of flair and let me know if any problems arise. My response time will be much better moving forward. If there are still issues in running your data, it may be necessary to take a closer look at your inputs, if possible.

Thanks for your patience ~

-CMS


ematlab commented on August 11, 2024

Hi csoulette,
thank you for your effort. Don't worry about the time, we have plenty of it.
As you suggested, I tried the latest flair version (including the new dependency "salmon") and got the following error:
dge_stderr.txt

thanks again!


csoulette commented on August 11, 2024

Hi ematlab,

The error you are getting looks like a write permission error. This may be good news, since I think it means that the workflows are actually finishing but cannot write any output because of the permission error. The diffExp module is generating a dge_stderr.txt file, which means you can write to the directory you are specifying just fine, so the issue is likely due to the method I'm using to write the output pdfs and tables. I've changed some code and am now using an alternative method for writing all of the output files. Please pull the latest version of flair (mainly the runDU.py and runDE.py scripts in ./bin) and let me know if the updates have solved the issue.
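
To rule out the directory itself, a quick check like the one below can confirm that files can be created there. It is a generic sketch, not something flair does internally, and `can_write_to` is a made-up helper name.

```python
import os
import tempfile


def can_write_to(directory):
    """Return True if a file can actually be created in the directory."""
    try:
        # The temporary file is removed again when the block exits.
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
        return True
    except OSError:
        return False


print(can_write_to(os.getcwd()))
```

Creating a real file is a stricter test than only inspecting permission bits, since it also catches read-only mounts and missing directories.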

-CMS


ematlab commented on August 11, 2024

Hi csoulette,
thanks again. I followed your instructions and it seems to work well. I got 4 tsv files and two pdf files containing results. However, an error file is still present.
dge_stderr.txt

Thank you very much for your precious support!


csoulette commented on August 11, 2024

Hey ematlab,

There was an issue specific to python2 in the creation of the formula matrix. I had pushed a fix for this issue, but it seems to have broken something else. I have now implemented a fix for the formatting of the formula matrix for the differential expression and usage analysis. If you pull runDU.py and rerun the analysis, all should work fine!

As always, thanks for your patience! (=

-CMS


ematlab commented on August 11, 2024

Hi csoulette,
please forgive me for the huge delay. Yesterday I finally had time to rerun the analysis with your update. I got the same four files and one error file (below). In any case, I obtained the information I was looking for. I'm reporting the error file just to keep you updated.
I think you did a great job! Thanks a lot!
dge_stderr.txt

