Building SuperTranscripts: A linear representation of transcriptome data
License: Other
After installing Lace:
conda env create -f environment.yml
conda activate lace
pip install .
running Lace returns:
Lace/Lace_run.py -h
Traceback (most recent call last):
File "/share/software/Lace/latest/Lace/Lace_run.py", line 14, in <module>
from Lace.BuildSuperTranscript import SuperTran
File "/home/rna/software/miniconda3/envs/lace/lib/python3.9/site-packages/Lace/BuildSuperTranscript.py", line 11, in <module>
import networkx as nx
File "/home/rna/software/miniconda3/envs/lace/lib/python3.9/site-packages/networkx/__init__.py", line 114, in <module>
import networkx.generators
File "/home/rna/software/miniconda3/envs/lace/lib/python3.9/site-packages/networkx/generators/__init__.py", line 14, in <module>
from networkx.generators.intersection import *
File "/home/rna/software/miniconda3/envs/lace/lib/python3.9/site-packages/networkx/generators/intersection.py", line 13, in <module>
from networkx.algorithms import bipartite
File "/home/rna/software/miniconda3/envs/lace/lib/python3.9/site-packages/networkx/algorithms/__init__.py", line 16, in <module>
from networkx.algorithms.dag import *
File "/home/rna/software/miniconda3/envs/lace/lib/python3.9/site-packages/networkx/algorithms/dag.py", line 23, in <module>
from fractions import gcd
ImportError: cannot import name 'gcd' from 'fractions' (/home/rna/software/miniconda3/envs/lace/lib/python3.9/fractions.py)
Searched 68096 bases in 10 sequences
add_edge() takes exactly 3 arguments (4 given)
FAILED to construct
add_edge() takes exactly 3 arguments (4 given)
FAILED to construct
add_edge() takes exactly 3 arguments (4 given)
FAILED to construct
add_edge() takes exactly 3 arguments (4 given)
FAILED to construct
The following lines are reported and then the process stops.
Bypass the I/O by writing an aligner or incorporating BLAT using Cython?
I think Lace still isn't handling the strand properly, e.g. see RECK in /mnt/storage/nadiad/work_area/20160203_ALL/simulation/SIM/lace on our server
input files are all.fasta and all.groupings
in /mnt/storage/nadiad/work_area/20160203_ALL/simulation/SIM
Lace was installed using the conda instructions on the wiki.
This newest version of Lace (including commenting out the print_exception line) returns a file where some sequences have a -1 count for transcripts and whirls. This seems unusual.
Ideally, the output would describe which blocks belong to which transcripts,
and include something to indicate that a bubble was burst or that the longest contig was returned.
e.g.
NDUFV2.fasta Number of transcripts: 4, Bubbles broken
[chej2tc@mu01 Example]$ /GS01/software/biosoft/python/python3.5/bin/python /GS01/software/biosoft/Lace-1.00/Lace.py -o test2 Example_Genome.fasta clusters.txt
(Lace ASCII-art banner)
Lace Version: 0.82
Last Editted: 30/01/17
Creating output directory
Creating dictionary of transcripts in clusters...
Creating a fasta file per gene...
Now Building SuperTranscript for each gene...
sh: blat: command not found
FAILED to construct
sh: blat: command not found
FAILED to construct
BUILT SUPERTRANSCRIPTS ---- 0.11878800392150879 seconds ----
Dear Oshlack,
I ran Lace.py with this command, and although the output file was created, there were two FAILED messages. I don't know why it failed. Is there a package I didn't install correctly, or is this expected when running Lace?
Another question:
I don't understand how to use the "ClusterFile". For the species I study I only have a Trinity assembly, and I don't know how to create the ClusterFile. Also, why do we need a ClusterFile?
"GenomeFile" makes me think of the genome reference fasta file.
Does a cluster sequence in the "SuperDuper.fasta" file correspond to a unigene? If so, can I run the regular non-model-species transcriptome pipeline, i.e. annotate these cluster genes (unigenes) and do downstream differential expression analysis?
I made an assembly of Arabidopsis using Trinity and am trying to run Lace; however, I am getting errors. I figured out where it goes wrong, but not why.
In the function BuildGraph, between #Copy graph before simplifying and ####### Whirl Elimination ######################, I get a runtime error: dictionary changed size during iteration.
And in the first BLAT run, I sometimes get the error:
add_edge() takes 3 positional arguments but 4 were given
I am working with Python 3.6 and used your method to install Lace. Any tips?
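The add_edge() error matches the signature change between networkx 1.x and 2.x: 1.x accepted an attribute dictionary as an extra positional argument, while 2.x only accepts keyword attributes. A minimal stand-in (hypothetical code, neither networkx nor Lace itself) reproduces the message:

```python
# Stand-in mirroring the networkx 2.x Graph.add_edge signature
# (self, u, v, **attr) -- an illustration, not networkx code.
class Graph:
    def __init__(self):
        self.edges = {}

    def add_edge(self, u, v, **attr):
        self.edges[(u, v)] = attr

G = Graph()
try:
    # networkx 1.x style: attribute dict passed positionally
    G.add_edge("n1", "n2", {"weight": 3})
except TypeError as e:
    print(e)  # add_edge() takes 3 positional arguments but 4 were given

# networkx 2.x style: unpack the dict into keyword arguments
G.add_edge("n1", "n2", **{"weight": 3})
```

In other words, code written against networkx 1.x must either be ported to keyword-style calls or run in an environment with networkx 1.x pinned.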
I have a more conceptual question. I have used Lace to create the reference SuperTranscript from de novo assembly in order to call variants between 2 individuals reared under 2 conditions (4 samples/4 libraries/4 vcf files). Reads used for calling SNPs were the same used for the de novo assembly, which was performed by trinity.
I would like to ask why only heterozygous SNPs, defined as those with at least one read supporting the reference allele, should be further analysed. I thought homozygous SNPs would be more informative for me, as I want to detect any differences between the 2 strains.
Are these heterozygous SNPs those that are represented by GT:0/1 in the vcf files?
Thanks! Sofia
On a large dataset (made from 30 mouse samples, of different tissues, 100M RNAseq reads per sample) Lace consistently stalls without error. I traced this to excessive memory usage (>200GB of RAM), which exceeds our capacity to run the program on the whole dataset.
The de novo assembly was conducted in Trinity and the clustering was done using the necklace protocol: https://github.com/Oshlack/necklace.
Hi there,
Lace & SuperTranscripts sound excellent for non-model organisms without a reference genome. I'd love to try it, though my application would be slightly different. I think it may still work, but I'd like your opinion.
I work with single celled eukaryotic algae. While they don't seem to usually splice their transcripts, they are riddled with paralogs which they transcribe. I'd like to use this method to compare paralogs by treating them the same as splice variants. Do you see any problems with this?
Cheers!
Hello!
Could you please help me resolve an issue I encountered while using Lace. Every time I run it on my dataset, the job finishes without any warnings, but nothing is produced. Lace works successfully on the test data. I have Corset-produced clusters.
I have no idea how to resolve the issue. I have been reinstalling and reconfiguring Lace and trying different SLURM parameters for weeks.
Best regards
Asan
Problem reported by user over email.
Appears to be a single cluster that uses all the memory.
User sent data and the problem was reproduced. We need to investigate Cluster-16676.1839
I noticed during my use of Lace that when I included networkx v2 in my conda environment, it ran without error but created an incorrect SuperTranscriptome where all "whirl" counts were set to 0 and no case change occurred in the sequences.
It would be helpful to raise this error at runtime so that new users can identify it and adjust their environment setup accordingly (especially since many other programs require networkx v2).
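The requested check could look something like the hypothetical guard below (not part of Lace), which could be called once at startup with networkx.__version__:

```python
# Hypothetical startup guard (not in Lace itself): refuse to run when the
# installed networkx major version differs from the one Lace was written for.
def check_networkx_version(version: str, required_major: int = 1) -> None:
    major = int(version.split(".")[0])
    if major != required_major:
        raise RuntimeError(
            f"Lace requires networkx {required_major}.x but found {version}; "
            "please adjust your environment (e.g. pin networkx in environment.yml)."
        )

check_networkx_version("1.11")      # passes silently
# check_networkx_version("2.6.3")   # raises RuntimeError
```

Failing fast with an explicit message would prevent the silent, incorrect output described above.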
Hello,
Lace always stalls, without stopping or giving any error...
The individual fasta files are generated and SuperTranscripts are then built (some using large amounts of RAM, up to 256 GB plus 128 GB of swap), but after a while the program stalls for a long time. After I stop it with Ctrl-C, the following output is given. Any ideas what goes wrong?
^CProcess ForkPoolWorker-5:
Traceback (most recent call last):
File "/data/analysis/Dietmar/SW/Supertranscript/Lace-master/Lace.py", line 192, in
Split(args.GenomeFile,args.ClusterFile,args.cores,args.maxTran,args.outputDir)
File "/data/analysis/Dietmar/SW/Supertranscript/Lace-master/Lace.py", line 136, in Split
pool.join()
File "/data/analysis/Dietmar/SW/Anaconda/Anaconda3/envs/lace/lib/python3.6/multiprocessing/pool.py", line 510, in join
self._worker_handler.join()
File "/data/analysis/Dietmar/SW/Anaconda/Anaconda3/envs/lace/lib/python3.6/threading.py", line 1056, in join
self._wait_for_tstate_lock()
File "/data/analysis/Dietmar/SW/Anaconda/Anaconda3/envs/lace/lib/python3.6/threading.py", line 1072, in _wait_for_tstate_lock
elif lock.acquire(block, timeout):
Traceback (most recent call last):
KeyboardInterrupt
File "/data/analysis/Dietmar/SW/Anaconda/Anaconda3/envs/lace/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/data/analysis/Dietmar/SW/Anaconda/Anaconda3/envs/lace/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/data/analysis/Dietmar/SW/Anaconda/Anaconda3/envs/lace/lib/python3.6/multiprocessing/pool.py", line 108, in worker
task = get()
File "/data/analysis/Dietmar/SW/Anaconda/Anaconda3/envs/lace/lib/python3.6/multiprocessing/queues.py", line 343, in get
res = self._reader.recv_bytes()
File "/data/analysis/Dietmar/SW/Anaconda/Anaconda3/envs/lace/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/data/analysis/Dietmar/SW/Anaconda/Anaconda3/envs/lace/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/data/analysis/Dietmar/SW/Anaconda/Anaconda3/envs/lace/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
KeyboardInterrupt
Hello!
Here is an issue I have recently encountered: the Python version was incompatible with the networkx package.
Traceback (most recent call last):
File "/project/_app/Lace/Lace/Lace_run.py", line 14, in <module>
from Lace.BuildSuperTranscript import SuperTran
File "/home/user/miniconda3/envs/lace/lib/python3.11/site-packages/Lace/BuildSuperTranscript.py", line 11, in <module>
import networkx as nx
File "/home/user/miniconda3/envs/lace/lib/python3.11/site-packages/networkx/__init__.py", line 114, in <module>
import networkx.generators
File "/home/user/miniconda3/envs/lace/lib/python3.11/site-packages/networkx/generators/__init__.py", line 14, in <module>
from networkx.generators.intersection import *
File "/home/user/miniconda3/envs/lace/lib/python3.11/site-packages/networkx/generators/intersection.py", line 13, in <module>
from networkx.algorithms import bipartite
File "/home/user/miniconda3/envs/lace/lib/python3.11/site-packages/networkx/algorithms/__init__.py", line 16, in <module>
from networkx.algorithms.dag import *
File "/home/user/miniconda3/envs/lace/lib/python3.11/site-packages/networkx/algorithms/dag.py", line 23, in <module>
from fractions import gcd
ImportError: cannot import name 'gcd' from 'fractions' (/home/user/miniconda3/envs/lace/lib/python3.11/fractions.py)
Have a look at a compatibility table: fractions.gcd(a, b) was removed in Python 3.9, and math.gcd(a, b) (available since Python 3.5) is its replacement. Either a recent networkx version should be used or the Python version downgraded.
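A minimal illustration of the change, written as a generic compatibility shim (not networkx code):

```python
# fractions.gcd existed up to Python 3.8 (deprecated since 3.5); from
# Python 3.9 onward only math.gcd is available, with the same behaviour.
try:
    from fractions import gcd  # Python <= 3.8 only
except ImportError:
    from math import gcd       # Python >= 3.5; the only option on >= 3.9

print(gcd(12, 18))  # 6
```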
Best regards
Asan
Add functionality to:
Sadly, networkx 2.3 clashes with Python 3.9, since fractions.gcd() has been removed in favour of math.gcd(). Because of that, running Lace after installation causes an error:
ImportError: cannot import name 'gcd' from 'fractions'
Pinning the version of Python in Lace-1.14.1/environment.yml fixes this problem. I tried python=3.5, and now it works.
Could you add an appropriate shebang (#!/usr/bin/env python) to the various Python scripts? Users (and Bioconda) won't need to modify them then.
In the example "Differential Transcript Usage on a non-model organism", the script for the DTU analysis requires biological replicates. Is it possible to do the analysis without biological replicates? The DTU script from the example (voom_diff.R) doesn't run without them.
I modified the script and it now runs without biological replicates, but I don't know if the analysis is correct.
My script is this:
# library
library('edgeR')
## Read in data
counts <- read.table("Counts/counts.txt",header=TRUE,sep="\t")
##Define groups
treatment = c('T1','T2')
## Make exon id
eid = paste0(counts$Chr,"-S",counts$Start,"-E",counts$End)
## Define design matrix
design <- model.matrix(~treatment)
## Make DGElist and normalise
dx <- DGEList(counts[,c(7:8)])
dx <- calcNormFactors(dx, group = treatment)
## glmFit
gfit <- glmFit(dx, design, dispersion = 0.1)
ds <- diffSpliceDGE(gfit, geneid = counts$Chr, exonid = eid)
## Results
topSpliceDGE(ds, number = 20, test = "Simes")
plotSpliceDGE(ds)
I found a few odd entries in the chicken SuperDuperTrans.gff for the genes AKAP2 and FAM188B: blocks are annotated beyond the length of the SuperTranscript. Both of these genes have another gene whose name includes theirs (PALM2-AKAP2 and INMT-FAM188B), and I think the annotation of the two is getting confused. Some output from a command that ran into the issue is pasted below.
Feature (AKAP2:4374-5634) beyond the length of AKAP2 size (3007 bp). Skipping.
Feature (AKAP2:5637-7198) beyond the length of AKAP2 size (3007 bp). Skipping.
Feature (FAM188B:2849-3179) beyond the length of FAM188B size (753 bp). Skipping.
Feature (FAM188B:3179-3409) beyond the length of FAM188B size (753 bp). Skipping.
Feature (FAM188B:2849-3179) beyond the length of FAM188B size (753 bp). Skipping.
Feature (FAM188B:3179-3237) beyond the length of FAM188B size (753 bp). Skipping.
Feature (FAM188B:3238-3282) beyond the length of FAM188B size (753 bp). Skipping.
Hi
on the https://github.com/Oshlack/Lace/wiki/Usage-Documentation page the command
python Lace.py Example/Example_Genome.fasta Example/clusters.txt -t -o Test
does not work because it should be
python3 Lace.py Example/Example_Transcripts.fasta Example/clusters.txt -t -o Test
Add Logger to script
Hello!
I configured a virtual environment using your environment.yml file and changed the Python version to 3.9. While running Lace on the test data, this error occurred:
'DiGraph' object has no attribute 'node'
The suggestion on StackExchange is to change the networkx version to 1.1 or to modify the files used by DiGraph.
What is your preferred way to solve the issue?
Best regards
Asan
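For context, Graph.node was a deprecated alias that networkx removed in 2.4 in favour of Graph.nodes. A minimal stand-in (not networkx code) reproduces the error and shows the fix:

```python
# Stand-in illustrating the rename: networkx 2.4 removed the Graph.node
# alias, keeping only Graph.nodes. This class only mimics that surface.
class DiGraph:
    def __init__(self):
        self.nodes = {}   # the networkx 2.x name; 1.x also exposed .node

G = DiGraph()
G.nodes["n1"] = {"base": "A"}

# Code written for networkx 1.x accesses G.node[...] and fails on 2.4+:
try:
    G.node["n1"]
except AttributeError as e:
    print(e)  # 'DiGraph' object has no attribute 'node'

# Fix: use G.nodes[...] instead (or pin networkx < 2.4 in environment.yml).
print(G.nodes["n1"]["base"])  # A
```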
The 9th column should give the gene ID and/or transcript ID, not be ".".
Suggestion: so that typing "ls" in the directory where ribbon ran doesn't take forever, and so the final .fasta and .psl files are quick to identify.
Error reported at the end of a run (to do with clean up).
It doesn't seem to affect the results.
e.g.
ZNF385A.fasta Number of transcripts: 1
should be
ZNF385A Number of transcripts: 1
So it matches the cluster ID.
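A one-line sketch of a possible fix (hypothetical, not Lace's actual code) that strips the extension so the printed name matches the cluster ID:

```python
import os

# Strip the ".fasta" extension from the per-gene file name before printing,
# so the reported name matches the cluster ID.
fname = "ZNF385A.fasta"
cluster_id = os.path.splitext(os.path.basename(fname))[0]
print(f"{cluster_id} Number of transcripts: 1")  # ZNF385A Number of transcripts: 1
```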
Hello team of the Oshlack lab,
do you have experience with BUSCO analysis on SuperTranscript data?
I have used Corset and Lace to cluster and stitch plant transcriptome assemblies. Afterwards, BUSCO did not find a lot of the expected orthologs. However, when using OrthoFinder (which uses BLAST/DIAMOND) to find orthologs in additional species, the assemblies looked more complete.
Do you think, SuperTranscripts are in principle compatible with BUSCO?
Could you think of an alternative way to check the completeness of the SuperTranscript assemblies?
Thank you,
Maria
At least featureCounts expects it. A one-line example:
PYGL SuperTranscript exon 0 301 . . 0
vs.
PYGL SuperTranscript exon 0 301 . . 0 .
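Until that is fixed upstream, one possible workaround (a hypothetical post-processing helper, not part of Lace) is to pad each GFF line to the nine tab-separated fields featureCounts expects:

```python
# Pad a GFF line to the expected number of tab-separated fields, using "."
# for any missing trailing columns (here, the 9th "attributes" column).
def pad_gff_line(line: str, n_fields: int = 9) -> str:
    fields = line.rstrip("\n").split("\t")
    fields += ["."] * (n_fields - len(fields))
    return "\t".join(fields)

fixed = pad_gff_line("PYGL\tSuperTranscript\texon\t0\t301\t.\t.\t0")
print(fixed.count("\t"))  # 8 tabs -> 9 fields
```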