rvolden / 10xr2c2

Scripts for analyzing 10x R2C2 data

License: MIT License

Python 100.00%

10xr2c2's Issues

detBarcodes.py result - barcodes have low frequency in general

Dear Roger,

I finished running MergeUMIs.py and tried to find the most frequent cell barcodes using detBarcodes.py.

However, the counts per barcode are extremely low.

Of the 3,564,096 barcodes obtained from C3POa_postprocessing.py (file: R2C2_full_length_consensus_reads_10X_sequences.fasta),

only 37,728 (1.06%) fall in the top 1,500 barcodes (obtained by summing the final number in each ID line of 1500_most_frequent_bcs.fasta).

The count per barcode also drops to 1 from roughly the 700th most frequent barcode onward.

Here is the command I ran:

python3 detBarcodes.py  \
  R2C2_full_length_consensus_reads_10X_sequences.fasta  \
  737K-august-2016.txt \
  > 1500_most_frequent_bcs.fasta

Here is the barcode count distribution plotted by detBarcodes.py:

These are the 25 most frequent barcodes:

>barcode_0_3214
GATCAGTAGTGAACAT
>barcode_1_2998
GGGACCTCAACCGCCA
>barcode_2_2941
ACTGTCCAGGCCCGTT
>barcode_3_2751
TATCAGGTCGCATGAT
>barcode_4_2656
GGGTCTGTCCGAACGC
>barcode_5_1930
TTCGGTCGTTAGGGTG
>barcode_6_1656
GGTGAAGGTTTGTTGG
>barcode_7_1334
GCACATATCTCTAAGG
>barcode_8_1165
GCAGTTAAGCAGCCTC
>barcode_9_1156
TGGTTAGGTATATGGA
>barcode_10_1030
TCAGGTACACTAGTAC
>barcode_11_923
AACCATGTCTCGAGTA
>barcode_12_849
GGGACCTGTAGGAGTC
>barcode_13_849
ATTTCTGGTCTCTCTG
>barcode_14_754
CTGCCTAGTTTGACAC
>barcode_15_587
GTTCATTGTAGCTAAA
>barcode_16_558
GACCAATAGTTTAGGA
>barcode_17_428
CCGGTAGGTCAAAGAT
>barcode_18_422
TCAGGTAAGGCAGGTT
>barcode_19_359
ACACTGACATCGGTTA
>barcode_20_335
ATGAGGGGTTTGACAC
>barcode_21_287
TGCGGGTGTAATTGGA
>barcode_22_284
CACAGGCTCTCGTTTA
>barcode_23_263
ACACTGAAGGGATCTG
>barcode_24_254
GCAGCCAGTCATCGGC
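The summed count quoted above can be reproduced with a short script. The following is a minimal sketch, assuming every header follows the `>barcode_<rank>_<count>` pattern shown in the listing; the helper names `barcode_counts` and `fraction_in_top` are hypothetical and are not part of detBarcodes.py.

```python
def barcode_counts(fasta_lines):
    """Yield the read count stored at the end of each '>barcode_<rank>_<count>' header."""
    for line in fasta_lines:
        if line.startswith(">"):
            # The count is the last underscore-separated field of the header.
            yield int(line.strip().rsplit("_", 1)[1])


def fraction_in_top(fasta_lines, total_reads):
    """Return (summed count, fraction of total_reads) for the listed barcodes."""
    assigned = sum(barcode_counts(fasta_lines))
    return assigned, assigned / total_reads


# First three records from the listing above, used as a small example.
example = [
    ">barcode_0_3214", "GATCAGTAGTGAACAT",
    ">barcode_1_2998", "GGGACCTCAACCGCCA",
    ">barcode_2_2941", "ACTGTCCAGGCCCGTT",
]
assigned, frac = fraction_in_top(example, total_reads=3_564_096)
print(assigned, f"{frac:.4%}")
```

For the full file, the same functions can be given an open file handle, for example `fraction_in_top(open("1500_most_frequent_bcs.fasta"), 3_564_096)`.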

Could you help me with this?

Error report: fastq reader

This part of the code causes racon to fail:

#MergeUMIs.py
158 :      name, seed = split_name[0], 0

All subreads that match one consensus read share the same root name, so setting seed=0 gives different subreads identical names. racon then fails with the following error:

[hrs@node12 PM-PS-0001-T_R2C2_q7_pass_merged.fastq_step02_c3poa_post]$ cat ../racon_messages.txt
[racon::Polisher::initialize] loaded target sequences 0.000187 s
[racon::Polisher::initialize] loaded sequences 0.000122 s
[racon::Overlap::transmute] error: unequal lengths in sequence and overlap file for sequence 3d5c86cb-ffb5-4813-958c-356360cfc519_0!
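One way to avoid duplicate names is to number the subreads within each root name. The following is a minimal sketch of that idea only; it is not the repository's actual fix, and the function `uniquify_names` is a hypothetical helper. It assumes the only requirement is that each subread name passed to racon is distinct.

```python
from collections import defaultdict


def uniquify_names(root_names):
    """Append a per-root running index so repeated root names become distinct."""
    seen = defaultdict(int)  # root name -> number of times seen so far
    unique = []
    for root in root_names:
        unique.append(f"{root}_{seen[root]}")
        seen[root] += 1
    return unique


# Three subreads of the read from the error message, plus one hypothetical other read.
roots = ["3d5c86cb-ffb5-4813-958c-356360cfc519"] * 3 + ["other-read"]
for name in uniquify_names(roots):
    print(name)
```

With this numbering, the three subreads of the first read get the suffixes `_0`, `_1` and `_2` instead of all ending in `_0`.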

How is "R2C2_10x_postprocessed.fasta" generated?

Hi, I am trying to understand the workflow, but it appears to begin with a "postprocessed" FASTA rather than with what I can download from SRA. How do I generate the "R2C2 reads and R2C2 10x reads" described in the README? In other words, is the 10x cell barcode/UMI always the first 26 bp of the R2C2 read, or is the orientation sometimes reversed?

Thank you,
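For reference, the 26 bp mentioned in the question can be split as sketched below. This assumes the 10x 3' v2 layout of a 16 bp cell barcode followed by a 10 bp UMI, which is an assumption here and not something the README excerpt confirms. The sketch does not settle whether R2C2 reads are sometimes reversed; `reverse_complement` is included only so that both orientations can be checked. All names are hypothetical.

```python
BARCODE_LEN = 16  # assumed 10x v2 cell barcode length
UMI_LEN = 10      # assumed 10x v2 UMI length

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_barcode_umi(seq26):
    """Split a 26 bp sequence into (cell barcode, UMI)."""
    if len(seq26) != BARCODE_LEN + UMI_LEN:
        raise ValueError(f"expected {BARCODE_LEN + UMI_LEN} bp, got {len(seq26)}")
    return seq26[:BARCODE_LEN], seq26[BARCODE_LEN:]


# The barcode comes from the listing in the first issue; the UMI is made up.
barcode, umi = split_barcode_umi("GATCAGTAGTGAACAT" + "ACGTACGTAC")
print(barcode, umi, reverse_complement(barcode))
```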

results of 10xR2C2

Hi,
I ran MergeUMIs10x.py and got four outputs: .matched_reads.txt, .merged.fasta, .merged.subreads.fastq and .UMI_only.fasta. What is the difference between .UMI_only.fasta and .merged.fasta? The .merged.fasta file contains more reads than .UMI_only.fasta, so which one should I use for analysis? Thank you.

Best regards,
Chujie

Is it okay to skip the MergeUMIs.py step?

Hello!

I am trying to run 10xR2C2 on my C3POa postprocessed data.

I am wondering whether it is okay to skip ExtractUMIs.py and MergeUMIs.py (but not MergeUMIs10x.py).

I ran C3POa_postprocessing.py with the following command:

time python ${C3POA_base}/C3POa_postprocessing.py \
 -i ${consensus} \
 -o ${odir2} \
 -c ${cfg} \
 -n ${thread} \
 -bt \
 -a ${adapter} \
 -b

I get the following files:

R2C2_full_length_consensus_reads.fasta
R2C2_full_length_consensus_reads_10X_sequences.fasta
R2C2_full_length_consensus_reads_left_splint.fasta
R2C2_full_length_consensus_reads_right_splint.fasta

I then ran ExtractUMIs.py and MergeUMIs.py on the resulting files with the following commands:

time python3 ${R2C2}/ExtractUMIs.py \
 -i5 ${odir2}/R2C2_full_length_consensus_reads_right_splint.fasta \
 -i3 ${odir2}/R2C2_full_length_consensus_reads_left_splint.fasta \
 -i ${odir2}/R2C2_full_length_consensus_reads.fasta \
 -o ${odir3_temp1}

time python3 ${R2C2}/MergeUMIs.py  \
 -f ${odir2}/R2C2_full_length_consensus_reads.fasta \
 -s ${odir1}/Splint1/R2C2_Subreads.fastq \
 -o ${odir3_temp2} \
 -u ${odir3_temp1}/R2C2_full_length_consensus_reads.UMI \
 -c ${cfg}

After that, I found that the "R2C2_full_length_consensus_reads_UMI_merged.fasta" file produced by MergeUMIs.py does not list IDs in the same order as "R2C2_full_length_consensus_reads_10X_sequences.fasta" from C3POa_postprocessing.py.
This seems to be because several consensus reads with the same splint UMI are merged into a single FASTA record.

This caused problems in the demux step: all non-matching reads are discarded.

Could you help me with this issue?
How can I match "R2C2_full_length_consensus_reads_10X_sequences.fasta" (C3POa_postprocessing.py output) with "R2C2_full_length_consensus_reads_UMI_merged.fasta" (MergeUMIs.py output)?
Or is it okay to just skip the splint-UMI merging steps?
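To see how far the two files diverge, the read IDs can be compared as sets, which ignores record order. The following is a minimal sketch under one assumption: that the read ID is the first whitespace-delimited token of each header line. How MergeUMIs.py names merged records is not stated in this thread, so the IDs in the example are made up, and `fasta_ids` and `compare_ids` are hypothetical helpers.

```python
def fasta_ids(lines):
    """Collect the first whitespace-delimited token of every FASTA header."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:  # skip a bare '>' with no ID
                ids.add(fields[0])
    return ids


def compare_ids(lines_a, lines_b):
    """Report IDs shared by both inputs and IDs unique to each, ignoring order."""
    a, b = fasta_ids(lines_a), fasta_ids(lines_b)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


# Made-up IDs: read2 is missing from the second file, as if merged away.
tenx = [">read1", "ACGT", ">read2", "TTGA", ">read3", "CCAG"]
merged = [">read3", "CCAGCCAG", ">read1", "ACGTACGT"]
report = compare_ids(tenx, merged)
print({key: sorted(value) for key, value in report.items()})
```

With real data, open file handles for the two FASTA files can be passed in place of the lists.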

bug: demux_nano.py

Hello Roger,

Thank you for your amazing tools,
I am trying to analyse R2C2 long-read single-cell data. After finishing the C3POa pipeline, I started on the 10xR2C2 steps, but I got stuck during demultiplexing, at the step that separates reads into cells, as shown below:

The ERROR:

Output:
/dir/demuxed
bcGuide file only ..... without the Fasta file

Could you please help me solve this issue?
I ran all the steps of the C3POa pipeline successfully and got R2C2_consensus_reads.fasta and R2C2_consensus_subreads.fastq.

Your help would be appreciated

Thanks in advance
Usama

score.mat

Dear Roger,

Thank you for offering a great tool for nanopore scRNA-seq data processing.

I have a few questions regarding the data processing:

1. score.mat

The UMI merging step needs scores.mat, but I cannot find it in this repo or in the C3POa repo.
Where can I find the scores.mat file?

2. preprocessing order

I am following the instructions in the repos and the bioRxiv preprint (Highly Multiplexed Single-Cell Full-Length cDNA Sequencing of human immune cells with 10X Genomics and R2C2).

I am planning to run the scripts in the following order:

C3POa.py
C3POa_postprocessing.py
ExtractUMIs.py
MergeUMIs.py
MergeUMIs10x.py
detBarcodes.py
Demultiplex_R2C2_reads_kmerBased.py
match_fastas.py
demux_nano.py
make_cell_subreads.py

However, I am a little confused, since the order of the scripts differs slightly between this repo's README.md and the preprint.
Should the UMI merging steps be run before detBarcodes.py, or after all the other scripts?
