Comments (15)
Darn. Thanks for checking. I've raised the issue on the kb-python GitHub, so we'll see if they have a solution. In the meantime, I suppose I can just map on AWS. Sucks a little, because the reason I switched to kallisto was so I could map locally, but it is what it is. Thanks for this tutorial though - very useful to have.
from seurat-to-rna-velocity.
Unfortunately, you'll have to go back to your BAM/FASTQ files. Seurat objects won't contain the necessary information to run RNA Velocity. I used to prefer Velocyto's run command to generate a loom file (which will have the unspliced and spliced counts needed).
velocyto run -b filtered_barcodes.tsv -o output_path -m repeat_msk_srt.gtf possorted_genome_bam.bam reference_annotation.gtf
The full reference can be found here. The -m option (a GTF of repeat regions to mask) is optional.
However, lately I have been using Kallisto Bustools (KB) to generate loom files, as it is much faster than Velocyto's run command (KB works from your FASTQ files). I just edited my tutorial to add a section on KB.
Hi basilkhuder,
I've managed to download the SRA files from the server and convert them to FASTQ files.
Based on your recent tutorial updates, I can use either the two-step KB process or Velocyto's run command to generate a loom file, right?
I would like to try KB as you recommend, and I want to make sure I understand the commands.
I've downloaded both the FASTA and GTF files for my species (from the Ensembl link provided in the tutorial):
Mus_musculus.GRCm38.100.gtf.gz
Mus_musculus.GRCm38.cdna.all.fa.gz
and, let's say, I have 5 FASTQ files:
1.fastq
2.fastq
3.fastq
4.fastq
5.fastq
If I run the code below to use KB, I'm not sure where to put the reference file names and the FASTQ files.
1st step
kb ref -i index.idx -g t2g.txt -f1 cdna.fa -f2 intron.fa -c1 cdna_t2c.txt -c2 intron_t2c.txt --workflow lamanno
fasta.fa
gtf.gtf
2nd step
kb count -i transcriptome.idx -g t2g.txt -x 10xv2 --lamanno --loom -f1 cdna.fa -f2 intron.fa -c1 cdna_t2c.txt -c2 intron_t2c.txt read_1.fastq.gz read_2.fastq.gz
Could you elaborate on these commands?
Thank you
Everything looks good! For the kb ref step, the FASTA and GTF files just go at the very end:
kb ref -i index.idx -g t2g.txt -f1 cdna.fa -f2 intron.fa -c1 cdna_t2c.txt -c2 intron_t2c.txt --workflow lamanno fasta.fa gtf.gtf
If you run into any errors trying the code, let me know.
I tried the command below:
kb ref -i transcriptome.idx -g t2g.txt -f1 cdna.fa -f2 intron.fa -c1 cdna_t2c.txt -c2 intron_t2c.txt --lamanno Mus_musculus.GRCm38.cdna.all.fa Mus_musculus.GRCm38.100.gtf
With this command I could not use --workflow lamanno, as I got an error: unrecognized arguments: --workflow
If I use just --lamanno instead, it creates six files:
cdna.fa
cdna_t2c.txt
intron.fa
intron_t2c.txt
t2g.txt
transcriptome.idx
However, I cannot run the second command, kb count; I get this error message:
kb: error: unrecognized arguments: -f1 -f2 intron.fa
I think this is because all the files except transcriptome.idx are empty and are therefore not recognized.
The --workflow lamanno option only works on the latest KB-python version. Either way, I think you are getting empty files because you are using the cDNA FASTA rather than the genome's DNA primary assembly. Download these files instead:
wget ftp://ftp.ensembl.org/pub/release-98/fasta/mus_musculus/dna/Mus_musculus.GRCm38.dna.primary_assembly.fa.gz
wget ftp://ftp.ensembl.org/pub/release-98/gtf/mus_musculus/Mus_musculus.GRCm38.98.gtf.gz
And run the following code:
kb ref -i transcriptome.idx -g t2g.txt -f1 cnda.fa -f2 intron.fa -c1 cdna_t2c.txt -c2 introns_t2c.txt --lamanno Mus_musculus.GRCm38.dna.primary_assembly.fa.gz Mus_musculus.GRCm38.98.gtf.gz
I just tested this myself and it successfully generated the needed files for kb count
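Since empty output files were the symptom here, a quick check before moving on to kb count can save some time. The following is only a minimal sketch: check_kb_ref_outputs is a hypothetical helper (not part of kb-python), and the file names in the usage comment are assumed to match the kb ref command above.

```shell
# Hypothetical helper (not part of kb-python): report every given file that is
# missing or empty. Returns non-zero if at least one file fails the check.
check_kb_ref_outputs() {
  status=0
  for f in "$@"; do
    # [ -s FILE ] is true only if FILE exists and has a size greater than zero
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage (file names as passed to kb ref above):
# check_kb_ref_outputs transcriptome.idx t2g.txt cnda.fa intron.fa cdna_t2c.txt introns_t2c.txt
```

If the function prints nothing and returns 0, all listed files exist and contain data.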
Thank you, basilkhuder.
The command you provided seems to be running, but my terminal is stuck at the last step:
Indexing to transcriptome.idx
I'm running this on a Mac with 32GB of RAM. It has been running for about 18 hours, and I'm not sure if this is usual.
kb ref -i transcriptome.idx -g t2g.txt -f1 cnda.fa -f2 intron.fa -c1 cdna_t2c.txt -c2 introns_t2c.txt --lamanno Mus_musculus.GRCm38.dna.primary_assembly.fa.gz Mus_musculus.GRCm38.98.gtf.gz
[2020-05-16 03:01:29,525] INFO Decompressing Mus_musculus.GRCm38.dna.primary_assembly.fa.gz to tmp
[2020-05-16 03:01:44,636] INFO Sorting tmp/Mus_musculus.GRCm38.dna.primary_assembly.fa
[2020-05-16 03:07:41,644] INFO Decompressing Mus_musculus.GRCm38.98.gtf.gz to tmp
[2020-05-16 03:07:43,832] INFO Sorting tmp/Mus_musculus.GRCm38.98.gtf
[2020-05-16 03:08:28,293] INFO Splitting genome into cDNA at cnda.fa
[2020-05-16 03:09:12,983] INFO Creating cDNA transcripts-to-capture at cdna_t2c.txt
[2020-05-16 03:09:13,940] INFO Splitting genome into introns at intron.fa
[2020-05-16 03:12:25,162] INFO Creating intron transcripts-to-capture at cdna_t2c.txt
[2020-05-16 03:12:30,973] INFO Concatenating cDNA and intron FASTAs
[2020-05-16 03:12:38,094] INFO Creating transcript-to-gene mapping at t2g.txt
[2020-05-16 03:12:45,986] INFO Indexing to transcriptome.idx
I can't say why the process still seems to be running, but I can say that 32GB won't be enough RAM to run KB, let alone the rest of your RNA Velocity analysis. In the newer version of KB, there's a -n parameter that splits the index into parts, which requires less memory. However, I am concerned that even if you get past creating your velocity index files, you'll run into issues when you import your single-cell data. Is there any way you can use a high-performance computer with more RAM to run this analysis?
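To see how much memory a machine has before starting the index build, something like the following can help. This is only a sketch: total_ram_gb is a hypothetical helper (not part of kb-python), and the 32 GB comparison simply mirrors the figure mentioned above rather than any documented requirement.

```shell
# Hypothetical helper (not part of kb-python): print total physical RAM in whole GB.
total_ram_gb() {
  if [ -r /proc/meminfo ]; then
    # Linux: MemTotal is reported in kB
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
  else
    # macOS: hw.memsize is reported in bytes
    sysctl -n hw.memsize | awk '{ printf "%d\n", $1 / 1024 / 1024 / 1024 }'
  fi
}

ram_gb=$(total_ram_gb)
echo "Total RAM: ${ram_gb} GB"
# Assumption: 32 GB is used as the cutoff only because it was reported as too little here.
if [ "$ram_gb" -le 32 ]; then
  echo "Warning: building the velocity index may run out of memory on this machine." >&2
fi
```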
Sadly, I do not have any alternative options. Do you think I might be able to get around the low-RAM issue by using server-based Python?
basilkhuder, I found a way to use a high-performance computer and successfully ran kb ref.
Now I'm having trouble with kb count, where I get this error:
"kb: error: unrecognized arguments: -f1 -f2 intron.fa"
Great! Can you send over your command? Also, which version of KB-Python did you install?
I've installed kb-python 0.24.4 and ran this command:
kb ref -i index.idx -g t2g.txt -f1 cdna.fa -f2 intron.fa -c1 cdna_t2c.txt -c2 intron_t2c.txt --workflow lamanno -n 4 Mus_musculus.GRCm38.dna.primary_assembly.fa.gz Mus_musculus.GRCm38.98.gtf.gz
This successfully generated 6 files:
cdna.fa
cdna_t2c.txt
intron.fa
intron_t2c.txt
t2g.txt
transcriptome.idx
I then ran the command below:
kb count -i transcriptome.idx -g t2g.txt -x 10xv3 --lamanno --loom -f1 cdna.fa -f2 intron.fa -c1 cdna_t2c.txt -c2 intron_t2c.txt SRR10870267.fastq.gz fastq.gz SRR10870268.fastq.gz
and got this error:
kb: error: unrecognized arguments: -f1 -f2 intron.fa SRR10870267.fastq.gz fastq.gz SRR10870268.fastq.gz
@skim245 - You don't need the -f1 and -f2 options (or the FASTA files they point to) for kb count.
@basilkhuder By any chance, have you had successful results using index splitting in kb? I'm using the devel branch and split my velocity index into 8 parts, but kb count doesn't seem to work. For example, if you use -i index.idx -n 8, you get back index.idx_cdna and then 7 index.idx_intron.x files (x from 0 to 6). Then, in kb count, I tried setting -i index.idx, but this isn't recognized.
@skim245 Apologies for the late reply! I don't get notifications for replies to issues. @skannan4 is completely right, omit those tags. I'm going to do a large edit on my tutorial in the upcoming days, so I'll get this fixed.
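For reference, the kb count call with those options dropped might look roughly like the sketch below. Several things here are assumptions: kb_count_velocity is a hypothetical wrapper (not part of kb-python), the index is taken to be an unsplit transcriptome.idx, and the two FASTQ arguments stand in for the barcode/UMI read and the cDNA read of one 10x v3 run (read_1.fastq.gz and read_2.fastq.gz are the placeholder names from the tutorial command). By default the wrapper only prints the command so it can be reviewed first.

```shell
# Hypothetical wrapper (not part of kb-python).
# $1 = barcode/UMI read FASTQ, $2 = cDNA read FASTQ (assumed order for 10x data).
kb_count_velocity() {
  # Same options as the kb count command quoted in this thread, minus -f1/-f2,
  # which are kb ref options. Per this thread, newer kb-python versions use
  # --workflow lamanno in place of --lamanno.
  set -- kb count -i transcriptome.idx -g t2g.txt -x 10xv3 \
    --lamanno --loom -c1 cdna_t2c.txt -c2 intron_t2c.txt "$1" "$2"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"   # default: only print the command
  else
    "$@"        # DRY_RUN=0: actually run kb
  fi
}

# Usage:
# kb_count_velocity read_1.fastq.gz read_2.fastq.gz            # print only
# DRY_RUN=0 kb_count_velocity read_1.fastq.gz read_2.fastq.gz  # run kb count
```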
@skannan4 I haven't. However, I did a quick test run last night using those options and ran into exactly the same problems as you. I've had plenty of other problems with KB-python, which is a shame since Velocyto's run command takes much, much longer and requires more resources.