Comments (5)
Hi,
- The bam file should be sorted by coordinates using samtools sort, because ExtractHLAread.sh extracts HLA-related reads by coordinates. When converting bam to fastq, samtools fastq can handle such a bam file.
As for the large number of unpaired reads, please check the following two points:
- Please ensure the reference version (hg19 or hg38) is the same when generating the bam and when running ExtractHLAread.sh.
- For the unpaired reads, please check how they are aligned in the bam, and see which alleles the unextracted read ends are mapped to.
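One way to carry out the second check is to tally where the mates of the extracted reads are mapped. The following is only a minimal sketch, assuming the alignments are available as SAM text lines (for example, the output of samtools view); the function name tally_mate_references is hypothetical and not part of SpecHLA.

```python
from collections import Counter


def tally_mate_references(sam_lines):
    """Count the reference names that the mates are mapped to (SAM column 7, RNEXT)."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        rname, rnext = fields[2], fields[6]
        if flag & 0x8:
            mate_ref = "unmapped"  # flag bit 0x8: the mate is unmapped
        elif rnext == "=":
            mate_ref = rname  # "=" means the mate is on the same reference
        else:
            mate_ref = rnext
        counts[mate_ref] += 1
    return counts
```

If most mates of the unpaired reads fall on contigs outside the extracted region, that would suggest why only one end was extracted.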
Please let me know if it works.
Best,
shuai
from spechla.
Hi shuai,
I looked at samtools fastq --help, which also says the input needs to be sorted by read name. However, running samtools fastq on bam files sorted by read name and on bam files sorted by coordinates yielded little difference in results. With bam bam2Fastq, however, bam files sorted by read name yield 20 times as many paired reads as those sorted by coordinates.
Hi,
Apologies for any confusion caused. It seems that ExtractHLAread.sh uses bam bam2Fastq to obtain the fastq files. To investigate the discrepancy in read numbers resulting from bam bam2Fastq, I suggest examining the reads that are extracted with the "sorted by read name" approach but not with the "sorted by coordinates" approach. Please select a subset of these reads and verify the regions to which they are mapped by examining the BAM. If the reads belong to the HLA region, it is likely that the "sorted by read name" approach is correct; otherwise, the "sorted by coordinates" approach may be more appropriate.
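The comparison suggested above can be sketched as a set difference on read names. This is only an illustration, assuming plain 4-line FASTQ records; the helper names fastq_read_names and only_in_first are hypothetical.

```python
def fastq_read_names(lines):
    """Return the set of read names in FASTQ text, assuming 4 lines per record."""
    names = set()
    for i, line in enumerate(lines):
        if i % 4 != 0 or not line.strip():
            continue  # only the first line of each record holds the name
        name = line.strip().split()[0]
        if name.startswith("@"):
            name = name[1:]
        # drop a trailing /1 or /2 mate suffix so both ends share one name
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        names.add(name)
    return names


def only_in_first(first_lines, second_lines):
    """Read names present in the first FASTQ but missing from the second."""
    return sorted(fastq_read_names(first_lines) - fastq_read_names(second_lines))
```

Passing the lines of the FASTQ from the name-sorted run as the first argument and those from the coordinate-sorted run as the second gives the reads to look up in the BAM.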
By the way, could you please check the alignment details of the reads ("unpaired reads produced are far more than the paired reads") in the original BAM?
Hi shuai,
Thanks for your reply and suggestions, I will check this later. I noticed that others were also discussing the issue of preprocessing and gave some suggestions. This might help:
Kingsford-Group/kourami#30
https://bitbucket.nygenome.org/projects/COMPBIO/repos/hla_prep/browse
Thanks for the information. According to that issue, the reason for extracting too many unpaired reads might be that multiple alignments per read are kept, rather than only the primary alignment.
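To illustrate what "only the primary alignment" means in practice, here is a minimal sketch that drops secondary (flag 0x100) and supplementary (flag 0x800) records from SAM text lines. The function names are hypothetical, and this is not how ExtractHLAread.sh itself works; with samtools, the same selection is usually made with samtools view -F 0x900.

```python
SECONDARY = 0x100      # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM flag bit: supplementary alignment


def is_primary(flag):
    """True if the record is neither a secondary nor a supplementary alignment."""
    return (flag & (SECONDARY | SUPPLEMENTARY)) == 0


def keep_primary(sam_lines):
    """Keep header lines and primary alignment records from SAM text lines."""
    kept = []
    for line in sam_lines:
        if line.startswith("@"):
            kept.append(line)  # header lines pass through unchanged
            continue
        fields = line.split("\t")
        if len(fields) > 1 and is_primary(int(fields[1])):
            kept.append(line)
    return kept
```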