Comments (15)

skannan4 avatar skannan4 commented on June 13, 2024 1

Darn. Thanks for checking. I've raised the issue on the kb-python GitHub, so we'll see if they have a solution. In the meantime, I suppose I can just map on AWS. Sucks a little, because the reason I switched to kallisto was so I could map locally, but it is what it is. Thanks for this tutorial, though. It's very useful to have.

from seurat-to-rna-velocity.

basilkhuder avatar basilkhuder commented on June 13, 2024

Unfortunately, you'll have to go back to your BAM/FASTQ files. Seurat objects won't contain the necessary information to run RNA Velocity. I used to prefer Velocyto's RUN command to generate a loom file (which will have the unspliced and spliced counts needed).

velocyto run -b filtered_barcodes.tsv -o output_path -m repeat_msk_srt.gtf possorted_genome_bam.bam reference_annotation.gtf

The full reference can be found here. The -m option is optional.

However, lately I have been using Kallisto Bustools (KB) to generate loom files, as it is much faster than Velocyto RUN (KB will use your FASTQ files). I just edited my tutorial to add a section on KB.

skim245 avatar skim245 commented on June 13, 2024

Hi basilkhuder,
I've managed to download the SRA files from the server and convert them to FASTQ files.
Based on the recent updates to your tutorial, I can use either the KB two-step process or Velocyto's RUN command to generate a loom file, right?

I would like to try your recommendation, KB, and I want to make sure I understand the commands.
I've downloaded both the FASTA and GTF files for my species (from the Ensembl link provided in the tutorial):
Mus_musculus.GRCm38.100.gtf.gz
Mus_musculus.GRCm38.cdna.all.fa.gz

and let's say I have 5 FASTQ files:
1.fastq
2.fastq
3.fastq
4.fastq
5.fastq
If I'm going to run the code below to use KB, I'm not quite sure where to put the reference file names and the FASTQ files.

1st step
kb ref -i index.idx -g t2g.txt -f1 cdna.fa -f2 intron.fa -c1 cdna_t2c.txt -c2 intron_t2c.txt --workflow lamanno
fasta.fa
gtf.gtf

2nd step
kb count -i transcriptome.idx -g t2g.txt -x 10xv2 --lamanno --loom -f1 cdna.fa -f2 intron.fa -c1 cdna_t2c.txt -c2 intron_t2c.txt read_1.fastq.gz read_2.fastq.gz

Could you elaborate further on these commands?

Thank you

basilkhuder avatar basilkhuder commented on June 13, 2024

Everything looks good! For the kb ref step, the fasta and GTF files just go at the very end:

kb ref -i index.idx -g t2g.txt -f1 cdna.fa -f2 intron.fa -c1 cdna_t2c.txt -c2 intron_t2c.txt --workflow lamanno fasta.fa gtf.gtf

If you run into any errors trying the code, let me know.
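As a minimal sketch of the argument order described above: the helper below only prints the kb ref command (a dry run) so the placement of the two positional files is easy to see. The function name `kb_ref_cmd` is hypothetical, and the output file names are the placeholders already used in this thread, not required names.

```shell
#!/bin/sh
# Dry-run helper (hypothetical name): prints a kb ref command, does not run kb.
# All options come first; the genome FASTA and the GTF are the last two
# arguments on the command line, in that order.
kb_ref_cmd() {
    genome_fasta=$1   # reference FASTA
    gtf=$2            # matching GTF annotation
    printf 'kb ref -i index.idx -g t2g.txt -f1 cdna.fa -f2 intron.fa -c1 cdna_t2c.txt -c2 intron_t2c.txt --workflow lamanno %s %s\n' \
        "$genome_fasta" "$gtf"
}

# Print the command for the placeholder file names used above.
kb_ref_cmd fasta.fa gtf.gtf
```

Once the printed command looks right, it can be pasted into the terminal as-is.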

skim245 avatar skim245 commented on June 13, 2024

I've tried the command below:
kb ref -i transcriptome.idx -g t2g.txt -f1 cdna.fa -f2 intron.fa -c1 cdna_t2c.txt -c2 intron_t2c.txt --lamanno Mus_musculus.GRCm38.cdna.all.fa Mus_musculus.GRCm38.100.gtf

With this command I could not use --workflow lamanno,

as I was getting an error: error: unrecognized arguments: --workflow

But if I just use --lamanno, it creates six files:
cdna.fa
cdna_t2c.txt
intron.fa
intron_t2c.txt
t2g.txt
transcriptome.idx

However, I cannot run the second command, kb count.
I get this error message:

kb: error: unrecognized arguments: -f1 -f2 intron.fa

I think this is because all the files except transcriptome.idx are empty
and thus are unrecognized.

basilkhuder avatar basilkhuder commented on June 13, 2024

The --workflow lamanno option only works on the latest KB-python version. Either way, I think you are getting empty files because you are using a cDNA reference rather than the DNA primary assembly. Download these files instead:

wget ftp://ftp.ensembl.org/pub/release-98/fasta/mus_musculus/dna/Mus_musculus.GRCm38.dna.primary_assembly.fa.gz

wget ftp://ftp.ensembl.org/pub/release-98/gtf/mus_musculus/Mus_musculus.GRCm38.98.gtf.gz

And run the following code:

kb ref -i transcriptome.idx -g t2g.txt -f1 cnda.fa -f2 intron.fa -c1 cdna_t2c.txt -c2 introns_t2c.txt --lamanno Mus_musculus.GRCm38.dna.primary_assembly.fa.gz Mus_musculus.GRCm38.98.gtf.gz

I just tested this myself and it successfully generated the needed files for kb count
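Since the empty outputs came from passing a cDNA FASTA where a genome FASTA was expected, a quick check of the first FASTA header can catch the mix-up before a long run. The sketch below assumes Ensembl-style headers (genome records carry a `dna:` tag, cDNA records carry the word `cdna` after the transcript ID); the function name `fasta_kind` is hypothetical, and FASTA files from other sources may simply report `unknown`.

```shell
#!/bin/sh
# Classify a FASTA as "genome" or "cdna" from its first header line.
# Assumption: Ensembl-style headers, for example
#   >1 dna:chromosome chromosome:GRCm38:1:...       (genome assembly)
#   >ENSMUST... cdna chromosome:GRCm38:...          (cDNA)
fasta_kind() {
    file=$1
    # Read only the first line; decompress on the fly for .gz inputs.
    case "$file" in
        *.gz) header=$(gzip -dc "$file" | head -n 1) ;;
        *)    header=$(head -n 1 "$file") ;;
    esac
    case "$header" in
        *" cdna "*|*" cdna:"*)               echo cdna ;;
        *" dna:"*|*" dna_sm:"*|*" dna_rm:"*) echo genome ;;
        *)                                   echo unknown ;;
    esac
}

# Example usage (file name taken from the download step above):
#   fasta_kind Mus_musculus.GRCm38.dna.primary_assembly.fa.gz
```

If this reports `cdna` for the file you are about to give to kb ref with the lamanno workflow, swap in the primary assembly first.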

skim245 avatar skim245 commented on June 13, 2024

Thank you, basilkhuder.

The command you provided seems to be running, but my terminal is stuck at the last step:
Indexing to transcriptome.idx

I'm running this on a Mac with 32GB of RAM,
and it has been running for about 18 hours. I'm not sure if this is usual.

kb ref -i transcriptome.idx -g t2g.txt -f1 cnda.fa -f2 intron.fa -c1 cdna_t2c.txt -c2 introns_t2c.txt --lamanno Mus_musculus.GRCm38.dna.primary_assembly.fa.gz Mus_musculus.GRCm38.98.gtf.gz
[2020-05-16 03:01:29,525] INFO Decompressing Mus_musculus.GRCm38.dna.primary_assembly.fa.gz to tmp
[2020-05-16 03:01:44,636] INFO Sorting tmp/Mus_musculus.GRCm38.dna.primary_assembly.fa
[2020-05-16 03:07:41,644] INFO Decompressing Mus_musculus.GRCm38.98.gtf.gz to tmp
[2020-05-16 03:07:43,832] INFO Sorting tmp/Mus_musculus.GRCm38.98.gtf
[2020-05-16 03:08:28,293] INFO Splitting genome into cDNA at cnda.fa
[2020-05-16 03:09:12,983] INFO Creating cDNA transcripts-to-capture at cdna_t2c.txt
[2020-05-16 03:09:13,940] INFO Splitting genome into introns at intron.fa
[2020-05-16 03:12:25,162] INFO Creating intron transcripts-to-capture at cdna_t2c.txt
[2020-05-16 03:12:30,973] INFO Concatenating cDNA and intron FASTAs
[2020-05-16 03:12:38,094] INFO Creating transcript-to-gene mapping at t2g.txt
[2020-05-16 03:12:45,986] INFO Indexing to transcriptome.idx

basilkhuder avatar basilkhuder commented on June 13, 2024

I can't say why the process still seems to be running, but I can say that 32GB won't be enough RAM to run KB, let alone the rest of your RNA Velocity analysis. In the newer version of KB, there's a -n parameter that splits the index into parts, which will require less memory. However, I am concerned that even if you get past creating your velocity index files, you'll run into issues when you import your single-cell data. Is there any way you can use a high-performance computer with more RAM to run this analysis?

skim245 avatar skim245 commented on June 13, 2024

Sadly, I do not have any alternative options. Do you think I might be able to get around this low-RAM issue by using server-based Python?

skim245 avatar skim245 commented on June 13, 2024

basilkhuder, I found a way to use a high-performance computer and successfully ran kb ref.
Now I'm having trouble with kb count,
where I get this error:

"kb: error: unrecognized arguments: -f1 -f2 intron.fa"

basilkhuder avatar basilkhuder commented on June 13, 2024

Great! Can you send over your command? Also, which version of KB-Python did you install?

skim245 avatar skim245 commented on June 13, 2024

I've installed kb-python-0.24.4
and ran this command:

kb ref -i index.idx -g t2g.txt -f1 cdna.fa -f2 intron.fa -c1 cdna_t2c.txt -c2 intron_t2c.txt --workflow lamanno -n 4 Mus_musculus.GRCm38.dna.primary_assembly.fa.gz Mus_musculus.GRCm38.98.gtf.gz

--which successfully generated 6 files
##cdna.fa
##cdna_t2c.txt
##intron.fa
##intron_t2c.txt
##t2g.txt
##transcriptome.idx

-- and ran the command below

kb count -i transcriptome.idx -g t2g.txt -x 10xv3 --lamanno --loom -f1 cdna.fa -f2 intron.fa -c1 cdna_t2c.txt -c2 intron_t2c.txt SRR10870267.fastq.gz fastq.gz SRR10870268.fastq.gz

-- and got this error
kb: error: unrecognized arguments: -f1 -f2 intron.fa SRR10870267.fastq.gz fastq.gz SRR10870268.fastq.gz

skannan4 avatar skannan4 commented on June 13, 2024

@skim245 - You don't need the -f1 and -f2 tags with those fasta files for kb count.

@basilkhuder By any chance have you had successful results using the index splitting in kb? I'm using the devel branch, and split my velocity index into 8 parts. But kb count doesn't seem to work. For example, if you use -i index.idx -n 8, you get back index.idx_cdna and then 7 index.idx_intron.x files (x from 0 to 6). Then, in kb count, I tried setting -i index.idx - but this isn't recognized.

basilkhuder avatar basilkhuder commented on June 13, 2024

@skim245 Apologies for the late reply! I don't get notifications for replies to issues. @skannan4 is completely right, omit those tags. I'm going to do a large edit on my tutorial in the upcoming days, so I'll get this fixed.
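For illustration, here is a sketch of the kb count call with the -f1 and -f2 options left out, keeping the other flags from the command that produced the error. It only prints the command (a dry run). The function name `kb_count_cmd` and the read file names `sample_R1.fastq.gz` and `sample_R2.fastq.gz` are hypothetical, and the value given to -i is assumed to match whatever index file kb ref actually wrote.

```shell
#!/bin/sh
# Dry-run helper (hypothetical name): prints a kb count command, does not run kb.
# Unlike kb ref, kb count is given no -f1/-f2 FASTA options; it takes the
# index (-i), the transcript-to-gene map (-g), the two transcripts-to-capture
# files (-c1, -c2), and then the FASTQ files as positional arguments.
kb_count_cmd() {
    read1=$1   # first FASTQ of the pair (hypothetical name in the example)
    read2=$2   # second FASTQ of the pair
    printf 'kb count -i transcriptome.idx -g t2g.txt -x 10xv3 --lamanno --loom -c1 cdna_t2c.txt -c2 intron_t2c.txt %s %s\n' \
        "$read1" "$read2"
}

# Print the command for one pair of reads.
kb_count_cmd sample_R1.fastq.gz sample_R2.fastq.gz
```

Note that the failing command also contained a stray `fastq.gz` token between the two SRR files; only real FASTQ paths should follow the options.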

basilkhuder avatar basilkhuder commented on June 13, 2024

I haven't. However, I just did a quick test run last night using those options and ran into the exact same problems as you. I've had plenty of other problems with KB-python, which is a shame since Velocyto RUN takes much, much longer and requires more resources.
