Comments (11)
Hi ematlab,
I suspect the problem is caused by a simple formatting error that prevents gene expression from being quantified properly; something may have gone wrong with the count_matrix.tsv file. Can you confirm that your count_matrix.tsv follows the format we've outlined in the readme (posted here)?
For example:
ids samp1_conditionA_batch1 samp2_conditionA_batch1 samp3_conditionA_batch2 ...
0042c9e7-b993_ENSG00000131368.3 237.0 156.0 165.0 150.0 ...
0042d216-6b08_ENSG00000101940.13 32.0 14.0 25.0 ...
It would be very helpful if you could attach or forward part or all of your counts matrix.
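A quick way to sanity-check the matrix before sending it is sketched below. It is only a rough check, not part of flair: the function names are made up for illustration, and the rules it enforces (tab-separated fields, a first column named `ids`, sample columns with three underscore-separated parts, numeric counts) are assumptions inferred from the example above rather than taken from the flair code.

```python
import re

# Assumed column-name pattern, inferred from the example header:
# <sample>_<condition>_<batch>, i.e. three underscore-separated fields.
COLUMN_PATTERN = re.compile(r"^[^_\s]+_[^_\s]+_[^_\s]+$")


def check_header(header_line):
    """Return a list of human-readable problems found in the header line."""
    fields = header_line.rstrip("\n").split("\t")
    if len(fields) < 2:
        return ["header is not tab-separated or has no sample columns"]
    problems = []
    if fields[0] != "ids":
        problems.append("first column is %r, expected 'ids'" % fields[0])
    for name in fields[1:]:
        if not COLUMN_PATTERN.match(name):
            problems.append("column %r is not <sample>_<condition>_<batch>" % name)
    return problems


def check_rows(lines, n_columns):
    """Return (line_number, message) pairs for malformed data rows."""
    problems = []
    # Data rows start on line 2 of the file (line 1 is the header).
    for number, line in enumerate(lines, start=2):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != n_columns:
            problems.append((number, "expected %d fields, found %d" % (n_columns, len(fields))))
            continue
        for value in fields[1:]:
            try:
                float(value)
            except ValueError:
                problems.append((number, "non-numeric count %r" % value))
    return problems
```

To use it, read count_matrix.tsv, pass the first line to `check_header`, and pass the remaining lines to `check_rows` together with the number of header fields; two empty lists mean the file matches the assumed layout.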
Thanks
-CMS
from flair.
Hi, thank you for your reply.
I am attaching part of our counts matrix:
ids sample2_conditionB_batch1 sample1_conditionA_batch1
0000a32d-27a2-42a1-a4ba-17dfa77d17e7;16_ENSG00000100422.13 2.0 7.0
00082903-7250-4b72-b379-143a0a9ca04f;16_chr20:63743000 2.0 1.0
00154906-62c9-4b68-ad01-6cdce8ce1405;0_chr11:63909000 3.0 12.0
001990ab-5a56-4644-90a6-7779fc0d6ef1;16_ENSG00000166888.11 40.0 58.0
00297a92-b90b-45ec-b182-47efdcb367d4;0_ENSG00000133619.17 4.0 4.0
002ab489-1c61-43d0-9066-14b98c0b4e8a;0_chr3:185483000 3.0 0.0
0035309b-19ad-4eac-a77b-bc78d6f792de;16_ENSG00000159733.13 1.0 2.0
0036ad34-deaa-4664-8116-e8d4055aa056;16_ENSG00000140988.15-2 10.0 11.0
003c97c6-af60-4f77-84ae-e2dbecc827b3;16_ENSG00000132465.10 62.0 1573.0
003d0572-9025-4255-a129-6eec8187830c;0_ENSG00000104859.14 8.0 9.0
00481c66-1d75-431d-8479-9b2b31f31c6c;0_chr15:51003000 3.0 11.0
004d4553-5d77-43a6-854a-5b1f69804431;0_ENSG00000138092.10 0.0 8.0
005b08d6-ba3b-487b-ab11-68ce8fb00b4a;0_ENSG00000143093.14 3.0 1.0
005ba608-dca1-4256-a669-2c88c24247b1;0_ENSG00000100297.15 4.0 44.0
00615da0-22c7-4d26-8746-6ca625da5a44;0_ENSG00000147403.16 108.0 424.0
0068d1da-d1e3-4937-bd9f-feda6ec3e41a;0_chr7:12242000 1.0 2.0
007471e5-4490-49b7-8820-aad7880356b2;0_ENSG00000188820.12 4.0 81.0
The diffExp module also returned the following error:
Traceback (most recent call last):
  File "/home/flair-master/flair-master/bin/runDU.py", line 23, in <module>
    from rpy2 import robjects
  File "/usr/local/lib/python2.7/dist-packages/rpy2-2.8.6-py2.7-linux-x86_64.egg/rpy2/robjects/__init__.py", line 16, in <module>
    import rpy2.rinterface as rinterface
  File "/usr/local/lib/python2.7/dist-packages/rpy2-2.8.6-py2.7-linux-x86_64.egg/rpy2/rinterface/__init__.py", line 92, in <module>
    from rpy2.rinterface._rinterface import (baseenv,
ImportError: /usr/local/lib/python2.7/dist-packages/rpy2-2.8.6-py2.7-linux-x86_64.egg/rpy2/rinterface/_rinterface.so: undefined symbol: R_NilValue
We also tried reinstalling rpy2 version 2.8.6, without success.
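Since the traceback fails on the rpy2 import itself, before any flair code runs, it can help to reproduce the import outside of flair and record which interpreter is being used. The snippet below is a minimal, hypothetical diagnostic (the helper names `try_import` and `report` are not part of flair or rpy2); it uses only the standard library and reports the error text instead of raising.

```python
import importlib
import sys


def try_import(module_name):
    """Try to import a module; return (ok, detail) instead of raising."""
    try:
        importlib.import_module(module_name)
    except Exception as err:  # deliberately broad: this is a diagnostic only
        return False, "%s: %s" % (type(err).__name__, err)
    return True, "imported"


def report(module_names):
    """Return printable lines describing the interpreter and each import attempt."""
    lines = ["Python %d.%d at %s" % (sys.version_info[0], sys.version_info[1], sys.executable)]
    for name in module_names:
        ok, detail = try_import(name)
        lines.append("%s: %s (%s)" % (name, "OK" if ok else "FAILED", detail))
    return lines
```

Running `print("\n".join(report(["rpy2.rinterface", "rpy2.robjects"])))` with the same interpreter that launches flair shows whether the undefined-symbol error is independent of flair.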
Thank you again
from flair.
Hi ematlab,
Looks like there was a minor formatting error/bug in the formula table for DESeq2, and also in some old code that was used to compute isoform usage percentages (your first issue). As for the rpy2 issue, I've run through my data and a modified version of your data using python2.7 / rpy2 version 2.8.6 without any error. I would make sure all of the required libraries for R are properly installed.
A question about your data though: is your comparison between only two samples? The diffExp module is explicitly for datasets that have at least 3 samples per condition. If you are trying to compare expression simply between two samples, I would advise following the instructions on our readme for using the diff_iso_usage.py script, which is a module specifically for comparing two samples.
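The rule of thumb above (diffExp for at least 3 samples per condition, diff_iso_usage.py for a two-sample comparison) can be checked directly from the matrix header. This is only an illustrative sketch: the function names are hypothetical, and it assumes the condition is the second underscore-separated field of each column name, as in the example header earlier in the thread.

```python
from collections import Counter


def replicates_per_condition(sample_columns):
    """Count columns per condition, assuming names like sample_condition_batch."""
    counts = Counter()
    for name in sample_columns:
        parts = name.split("_")
        if len(parts) != 3:
            raise ValueError("unexpected column name: %r" % name)
        counts[parts[1]] += 1
    return counts


def suggest_tool(sample_columns, min_replicates=3):
    """Suggest diffExp only if every condition has enough replicates."""
    counts = replicates_per_condition(sample_columns)
    if len(counts) >= 2 and all(n >= min_replicates for n in counts.values()):
        return "diffExp"
    return "diff_iso_usage.py"
```

For the two-column matrix posted above, each condition has a single sample, so the sketch points to diff_iso_usage.py.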
Thanks~
-CMS
from flair.
Hi csoulette
I used the diff_iso_usage.py script and it works. Just one question: the results file contains fewer gene records than the input .psl file (2835 vs 19414). Is that expected?
I also reinstalled the R libraries and rpy2, and I ran diffExp using python3. This worked better, but still not correctly: the final results contained only the "die_conditionB_v_conditionA_drimseq2_results.tsv" file (I used a modified psl input with 3 technical replicates per condition, just to test the script).
The main error reported by diffExp was:
"__init__.py:146: RRuntimeWarning: Error: object "\001NULL\001" not found",
followed by these warning messages:
  warnings.warn(x, RRuntimeWarning)
/usr/local/lib/python3.5/dist-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Inoltre:
  warnings.warn(x, RRuntimeWarning)
/usr/local/lib/python3.5/dist-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Warning message:
  warnings.warn(x, RRuntimeWarning)
/usr/local/lib/python3.5/dist-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: In formals(fun) :
  warnings.warn(x, RRuntimeWarning)
/usr/local/lib/python3.5/dist-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: argument is not a function
  warnings.warn(x, RRuntimeWarning)
/home/ezio/flair-master/flair-master/bin/runDU.py:105: FutureWarning: read_table is deprecated, use read_csv instead.
  quantDF = pd.read_table(matrix, header=0, sep='\t', index_col=0)
/usr/local/lib/python3.5/dist-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: ! Using a subset of 0.1 genes to estimate common precision !
  warnings.warn(x, RRuntimeWarning)
/usr/local/lib/python3.5/dist-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: ! Using common_precision = 993.7223 as prec_init !
  warnings.warn(x, RRuntimeWarning)
/usr/local/lib/python3.5/dist-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: ! Using loess fit as a shrinkage factor !
  warnings.warn(x, RRuntimeWarning)
/usr/local/lib/python3.5/dist-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: * Fitting the DM model..
  warnings.warn(x, RRuntimeWarning)
/usr/local/lib/python3.5/dist-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Using the regression approach.
  warnings.warn(x, RRuntimeWarning)
/usr/local/lib/python3.5/dist-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Took 3.1816 seconds.
  warnings.warn(x, RRuntimeWarning)
/usr/local/lib/python3.5/dist-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: * Fitting the BB model..
  warnings.warn(x, RRuntimeWarning)
/usr/local/lib/python3.5/dist-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Took 0.9573 seconds.
  warnings.warn(x, RRuntimeWarning)
/usr/local/lib/python3.5/dist-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Using the one way approach.
  warnings.warn(x, RRuntimeWarning)
/usr/local/lib/python3.5/dist-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Took 1.7099 seconds.
  warnings.warn(x, RRuntimeWarning)
/usr/local/lib/python3.5/dist-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: * Calculating likelihood ratio statistics..
  warnings.warn(x, RRuntimeWarning)
/usr/local/lib/python3.5/dist-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Took 0.0025 seconds.
  warnings.warn(x, RRuntimeWarning)
/usr/local/lib/python3.5/dist-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Took 0.8891 seconds.
  warnings.warn(x, RRuntimeWarning)
/usr/local/lib/python3.5/dist-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Took 0.0031 seconds.
  warnings.warn(x, RRuntimeWarning)
/usr/local/lib/python3.5/dist-packages/rpy2/robjects/pandas2ri.py:191: FutureWarning: from_items is deprecated. Please use DataFrame.from_dict(dict(items), ...) instead. DataFrame.from_dict(OrderedDict(items)) may be used to preserve the key order.
  res = PandasDataFrame.from_items(items)
Thanks for your help
from flair.
To address the diff_iso_usage.py results: Yes, that behavior is expected. The script outputs the most significant isoform usage switch per gene between your two samples, hence fewer lines in the output. However, to avoid confusion, I have now changed the output of that script so that a Fisher's p-value is assigned to every isoform. Genes that have only 1 isoform will have NA assigned to that isoform. The script now also takes two column arguments in case your expression values are not in adjacent columns. Please rerun if you get the chance.
diff_iso_usage.py should only be used for pairwise comparisons with no replicates: 1 condition compared to 1 other condition. If you have 3 conditions, you would run diff_iso_usage.py multiple times, comparing condition A to B, B to C, and also A to C, depending on what you are interested in. You should only run flair-diffExp if you have 3 or more replicates for each condition. We're looking into all the rpy2 error messages in the meantime.
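To make the per-isoform Fisher's p-value concrete, here is a small self-contained sketch of one way such a test can be set up for a single gene. It is not the code in diff_iso_usage.py: the 2x2 table layout (this isoform vs. all other isoforms of the gene, in sample A vs. sample B), the function names, and the use of `None` in place of NA are all assumptions made for illustration. It needs Python 3.8+ for `math.comb`.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def isoform_usage_pvalues(gene_counts):
    """gene_counts maps isoform id -> (count in sample A, count in sample B) for one gene."""
    if len(gene_counts) < 2:
        # A gene with a single isoform cannot switch usage; report it as NA (None).
        return {iso: None for iso in gene_counts}
    total_a = sum(int(a) for a, _ in gene_counts.values())
    total_b = sum(int(b) for _, b in gene_counts.values())
    pvalues = {}
    for iso, (a, b) in gene_counts.items():
        a, b = int(a), int(b)
        # Rows: sample A, sample B. Columns: this isoform, all other isoforms.
        pvalues[iso] = fisher_exact_two_sided(a, total_a - a, b, total_b - b)
    return pvalues
```

Under these assumptions, every isoform of a multi-isoform gene gets its own p-value, and the smallest one per gene corresponds to the "most significant switch" that the earlier output reported.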
from flair.
Hi ematlab,
Apologies for the delay. I've changed some code to make the diffExp module a little stricter about what type of input is acceptable, since, as I've mentioned, it should only be used in cases where you are comparing groups with at least 3 replicates.
The module is more or less a wrapper for DESeq2, so the requirements for a successful differential expression analysis are inherited from DESeq2. Therefore, for your run with technical replicates, I think such an experiment would only work if there is enough variability in your technical replicates.
Please try the latest version of flair and let me know if any problems arise. My response time will be much better moving forward. If there are still issues running your data, it may be necessary to take a closer look at your inputs, if possible.
Thanks for your patience ~
-CMS
from flair.
Hi csoulette,
thank you for your effort. Don't worry about the time, we have all the time in the world.
As you suggested, I tried the latest flair version (including the new dependency "salmon") and got the following error:
dge_stderr.txt
thanks again!
from flair.
Hi ematlab,
The error you are getting looks like a write permission error. This may be good news, since I think it means that the workflows are actually finishing but cannot write any output due to the permission error. It seems that the diffExp module is generating a dge_stderr.txt file, which means you can write to the directory you are specifying just fine, so the issue is likely due to the method I'm using to write the output pdfs and tables. I've changed some code and am now using an alternative method for writing all of the output files. Please pull the latest version of flair (mainly the runDU.py and runDE.py scripts in ./bin) and let me know if the updates have solved the issue.
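To rule out the output directory itself as the cause, one can test writability by actually creating and removing a temporary file there. The helper below is a hypothetical, standard-library-only sketch and is not how flair writes its outputs.

```python
import os
import tempfile


def can_write_to(directory):
    """Return True if a file can actually be created and removed in `directory`."""
    try:
        fd, path = tempfile.mkstemp(prefix="write_test_", dir=directory)
    except OSError:
        # Covers missing directories as well as permission errors.
        return False
    os.close(fd)
    os.remove(path)
    return True
```

If `can_write_to` returns True for the output directory passed to diffExp, the permission error most likely comes from how individual files are written rather than from the directory.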
-CMS
from flair.
Hi csoulette,
thanks again. I followed your instructions and it seems to work well. I got 4 tsv files and two pdf files containing results. However, an error file is still present.
dge_stderr.txt
Thank you very much for your precious support!
from flair.
Hey ematlab,
There was an issue specific to python2 in the creation of the formula matrix. I had pushed a fix for this issue, but it seems to have broken something else. I have now implemented a fix for the formatting of the formula matrix for the differential expression and usage analysis. If you pull runDU.py and rerun the analysis, all should work fine!
As always, thanks for your patience! (=
-CMS
from flair.
Hi csoulette,
please forgive me for the huge delay. Yesterday I finally had time to rerun the analysis with your update. I got the same four files and one error file (below). In any case, I obtained the information I was looking for. I'm reporting the error file just to keep you updated.
I think you did a great job! Thanks a lot!
dge_stderr.txt
from flair.