mandalorion's People

Contributors

rvolden

mandalorion's Issues

cell-type specific isoform analysis

Dear Rodger,

Could you help me with the analysis of cell-type specific isoforms?

After obtaining the cell barcode list for each cell type, I pooled the reads.fasta files and the subreads.fastq files into single files (pooled.fasta and pooled.fastq, respectively) and ran Mandalorion, but it failed.

The problem seems to arise because the additional information separated by "_" has been trimmed from the read IDs and subread IDs.

Here are the details of what I did:

  1. I used the FASTA and FASTQ files from the MergeUMIs10x.py step of 10xR2C2.
$ head cell_208_GTTTGGAAGTGTCATC.merged.fasta
>71c10016-715b-4bb8-8808-2059ecc38312
AGACGTTCTTCGCCGA....ATGACACTTCCAAAC
>b81eb912-76a4-4ac1-b18c-f47010c334f5
GCTCTTTCTCAGTGA.....CCGGGTGGTTTGCTT
$ head cell_208_GTTTGGAAGTGTCATC.merged.subreads.fastq
@71c10016-715b-4bb8-8808-2059ecc38312
TATTGTGTACCTTTTGCTAG...CGGCCGCCCA
+
>?;<;;>==<;=@>()$$$-336...5;=>>;.&
@71c10016-715b-4bb8-8808-2059ecc38312
ATACCTTCCGTTCA...TGCGGCCGCCCATAGC
+
###$%-.%/088<+-2...>?ADPC?@BJ
  2. After getting barcode lists per cell type using Seurat, I merged the FASTA and FASTQ files of the same cell type
$ cat \
   cell_173_CACGTTCGTATGTCCA.merged.fasta \
   cell_229_TGCCGAGCATGACGTT.merged.fasta \
   > Pooled_reads.fasta

$ cat \
   cell_173_CACGTTCGTATGTCCA.merged.subreads.fastq \
   cell_229_TGCCGAGCATGACGTT.merged.subreads.fastq \
  > pooled.fastq
  3. I ran Mandalorion on the pooled reads
python3 Mandalorion_nonqsub.sh \
 -c ${cfg} \
 -s 500 \
 -g ${gtf_fn_step10} \
 -G ${ref} \
 -a ${adapter} \
 -f pooled.fasta \
 -b pooled.fastq \
 -p ${newdir}/mandalorian_output \
 -O 0,70,0,70 \
 -t ${thread_step10} \
 -e TGGG,AAAA

  4. I got the following error message
[M::mm_idx_gen::66.980*1.80] collected minimizers
[M::mm_idx_gen::75.689*3.01] sorted minimizers
[M::main::75.689*3.01] loaded/built the index for 25 target sequence(s)
[M::mm_mapopt_update::80.209*2.90] mid_occ = 748
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 25
[M::mm_idx_stat::82.165*2.85] distinct minimizers: 167178949 (35.49% are singletons); average occurrences: 5.986; average spacing: 3.086
[M::worker_pipeline::215.486*10.32] mapped 30792 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -G 400k --secondary=no -ax splice:hq -t 15 Homo_sapiens_assembly38.fasta Pooled_reads.fasta
[M::main] Real time: 215.642 sec; CPU: 2224.208 sec; Peak RSS: 27.709 GB
SAM from mm: false
Took 844.717088ms to run.
rm: cannot remove 'mandalorian_output//parsed_reads/': No such file or directory
rm: cannot remove 'mandalorian_output//mp/': No such file or directory
Using medaka from your path, not the config file.
rm: cannot remove 'mandalorian_output//mp': No such file or directory
Traceback (most recent call last):
  File "createConsensi.py", line 318, in <module>
    main()
  File "createConsensi.py", line 303, in main
    determine_consensus(name, fasta, fastq, str(counter))
  File "createConsensi.py", line 159, in determine_consensus
    fastq_reads_full, fastq_reads_partial = read_fastq_file(fastq)
  File "createConsensi.py", line 122, in read_fastq_file
    number = int(name_root[1])
IndexError: list index out of range

Could you help me with this issue?
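
The last frame of the traceback, `number = int(name_root[1])`, indexes the second element of a list built from the read name. Below is a minimal sketch of why that fails on the headers shown in step 1, and of one way to append a numeric suffix so that the index exists. It assumes `name_root` comes from splitting the header on "_"; the source of createConsensi.py is not shown in this thread, so that is a guess. The helper names `subread_number` and `add_suffixes` are hypothetical, and whether the counter should restart per read ID or run across the whole file is not stated here; this sketch restarts per read ID.

```python
from collections import defaultdict


def subread_number(header):
    """Mimic the failing lookup: read the part after the first '_' as an int."""
    name_root = header.strip().lstrip("@").split("_")  # assumed parsing, not the real code
    return int(name_root[1])  # IndexError when the header contains no '_'


def add_suffixes(fastq_lines):
    """Append '_<n>' to every FASTQ header, counting from 0 for each read ID."""
    seen = defaultdict(int)
    out = []
    for i, line in enumerate(fastq_lines):
        if i % 4 == 0:  # header line of a 4-line FASTQ record
            read_id = line.rstrip("\n")
            out.append(f"{read_id}_{seen[read_id]}\n")
            seen[read_id] += 1
        else:
            out.append(line)  # sequence, '+', and quality lines pass through
    return out


# Two shortened subreads that share one read ID, as in the pooled FASTQ.
records = [
    "@71c10016-715b-4bb8-8808-2059ecc38312\n", "TATTG\n", "+\n", ">?;<;\n",
    "@71c10016-715b-4bb8-8808-2059ecc38312\n", "ATACC\n", "+\n", "###$%\n",
]
fixed = add_suffixes(records)
```

Record boundaries are found by line position rather than by a leading "@", because a quality string may itself begin with "@".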

medaka command error

I temporarily resolved the previous issue (#2) by adding unique numbers starting from 0, but then ran into another issue with medaka:

Traceback (most recent call last):
  File "createConsensi.py", line 318, in <module>
    main()
  File "createConsensi.py", line 303, in main
    determine_consensus(name, fasta, fastq, str(counter))
  File "createConsensi.py", line 273, in determine_consensus
    temp_folder + '/' + counter, temp_folder + '/' + counter)

subprocess.CalledProcessError: Command 'medaka -f -i  mp/1_subsampled.fastq -d  mp/1_consensus_1.fasta -o mp/1 > mp/1_medaka_messages.txt 2>&1' returned non-zero exit status 2

(I changed os.system to subprocess.run so that exceptions propagate.)
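
A minimal sketch of that kind of change, for readers who want to reproduce it: `os.system` only returns the exit status, which is easy to ignore, whereas `subprocess.run(..., check=True)` raises `subprocess.CalledProcessError` on a non-zero status. The helper name `run_logged` is hypothetical, and the failing command is a Python one-liner standing in for the external tool.

```python
import os
import subprocess
import sys
import tempfile


def run_logged(cmd, log_path):
    """Run cmd (a list of arguments), write stdout and stderr to log_path,
    and raise subprocess.CalledProcessError on a non-zero exit status."""
    with open(log_path, "w") as log:
        subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT, check=True)


# Stand-in for a failing tool: exits with status 2 without doing anything.
failing = [sys.executable, "-c", "import sys; sys.exit(2)"]

log_path = os.path.join(tempfile.mkdtemp(), "messages.txt")
try:
    run_logged(failing, log_path)
    status = 0
except subprocess.CalledProcessError as err:
    status = err.returncode  # the failure is now visible to the caller
```

Passing the command as a list avoids the shell, so the `> file 2>&1` redirection from the original command string is replaced by the `stdout` and `stderr` arguments.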

Running the medaka command directly gave this error:

usage: medaka [-h] [--version]
              {compress_bam,features,train,consensus,smolecule,consensus_from_features,fastrle,stitch,variant,snp,tools}
              ...
medaka: error: argument command: invalid choice: 'mp/1_subsampled.fastq' (choose from 'compress_bam', 'features', 'train', 'consensus', 'smolecule', 'consensus_from_features', 'fastrle', 'stitch', 'variant', 'snp', 'tools')

I initially installed medaka using this command:

conda create -n medaka -c conda-forge -c bioconda medaka
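
The usage message above shows that this medaka build expects a subcommand as its first argument, so `medaka -f -i ...` is rejected by the argument parser before any work is done. A hedged sketch of building the call differently follows. It assumes the `medaka_consensus` wrapper script that ships with medaka is on PATH and accepts `-i`, `-d`, `-o`, and `-f`; check `medaka_consensus -h` for the installed version. The function name `build_consensus_cmd` is hypothetical, and whether the resulting output layout matches what createConsensi.py reads afterwards is not verified here.

```python
import shutil


def build_consensus_cmd(reads, draft, out_dir, tool="medaka_consensus"):
    """Return an argument list for a medaka consensus run.

    Assumes `tool` takes -i (reads), -d (draft), -o (output directory) and
    -f (overwrite existing output); verify against the installed version.
    """
    return [tool, "-f", "-i", reads, "-d", draft, "-o", out_dir]


cmd = build_consensus_cmd(
    "mp/1_subsampled.fastq", "mp/1_consensus_1.fasta", "mp/1"
)

# shutil.which returns None when the tool is not on PATH, so this is a
# cheap check to make before trying to launch the command.
available = shutil.which(cmd[0]) is not None
```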

Newbie question - Can I even use Mandalorion?

Hi. I just downloaded our very first batch of ONT cDNA reads from the contractor we use for sequencing, and I want to assemble transcripts from these data. Searching online for potential assemblers, I found a master's thesis that evaluated Mandalorion, so I downloaded Mandalorion and set it up. But then I saw that Mandalorion takes as input reads produced by R2C2. From what I can find online, R2C2 is not a standard protocol, so it is very likely that the "standard workflow" ONT data our contractor generated did not come from R2C2. Assuming I am right about that, does it mean there is no way for me to use Mandalorion?

Thanks in advance for answering this question, and I apologize for having to ask such a basic question.

John Martinson
