Comments (4)
Hi, @sloth-eat-pudding,
Thank you for your interest in ClairS.
- Our design is to select heterozygous SNPs from both normal and tumor samples for phasing and haplotagging. These signals are more likely to represent true germline heterozygous SNPs than somatic mutations.
- We should use only heterozygous germline SNP variants for phasing. In some cases, somatic variants show patterns similar to germline variants (possibly because of quality or a high normal AF) and are identified as germline by Clair3. We are actively working on excluding these from the phasing process to avoid confusing the model.
- It would be highly beneficial to have longer phase sets and an improved haplotagging ratio. We believe that having more phased alignments will significantly enhance performance. Do you have any hints on parameter settings that would improve phasing performance? Thanks!
- Currently, we categorize the haplotypes into germline H1 and H2 only. It would be beneficial to include somatic haplotypes (H3) as well, but linking distant somatic variants to obtain somatic haplotypes would be challenging. After analyzing the data, we have observed that the distance between two somatic variants can range from 10k to 100k bases, which makes acquiring somatic haplotypes difficult even with ONT reads.
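The selection described in the first two points can be sketched roughly as follows. This is only an illustration under stated assumptions, not the ClairS, Clair3, or LongPhase implementation: it assumes germline calls for each sample are already available as simple records, and `Call`, `is_het_snp`, and `shared_het_snps` are hypothetical names introduced here. It keeps only SNPs called heterozygous in both samples, so a site that is heterozygous in the tumor alone (a possible somatic variant) is left out of phasing.

```python
from typing import List, NamedTuple


class Call(NamedTuple):
    """Hypothetical minimal variant record (not a ClairS/Clair3 type)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    genotype: str  # VCF-style GT string, e.g. "0/1" or "1|0"


def is_het_snp(call: Call) -> bool:
    """True for a single-base substitution with one ref and one alt allele."""
    alleles = set(call.genotype.replace("|", "/").split("/"))
    is_snp = len(call.ref) == 1 and len(call.alt) == 1
    return is_snp and alleles == {"0", "1"}


def shared_het_snps(normal: List[Call], tumor: List[Call]) -> List[Call]:
    """Keep tumor calls that are heterozygous SNPs in both samples."""
    normal_keys = {
        (c.chrom, c.pos, c.ref, c.alt) for c in normal if is_het_snp(c)
    }
    return [
        c for c in tumor
        if is_het_snp(c) and (c.chrom, c.pos, c.ref, c.alt) in normal_keys
    ]


# Toy data, invented for illustration only.
normal_calls = [
    Call("chr1", 100, "A", "G", "0/1"),  # heterozygous in normal
    Call("chr1", 200, "C", "T", "1/1"),  # homozygous alternate
    Call("chr1", 300, "G", "A", "0/1"),  # heterozygous in normal only
]
tumor_calls = [
    Call("chr1", 100, "A", "G", "0/1"),   # heterozygous in both: kept
    Call("chr1", 200, "C", "T", "1/1"),   # homozygous: dropped
    Call("chr1", 250, "T", "C", "0/1"),   # tumor only: dropped
    Call("chr1", 300, "G", "AT", "0/1"),  # insertion, not a SNP: dropped
]

phasing_candidates = shared_het_snps(normal_calls, tumor_calls)
print([(c.chrom, c.pos) for c in phasing_candidates])
```

Requiring agreement between the two samples is a conservative choice: it cannot remove somatic variants that Clair3 also calls as germline in the normal sample, which is the remaining case mentioned above.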
We look forward to a new LongPhase version for somatic variant calling!
Zhenxian
from clairs.
I apologize for not being clear earlier.
Q1. I searched for the confident heterozygous germline variants identified by ClairS in the normal.bam file, but found that they actually include homozygous variants (as shown in Figure 2).
Q3. This is our development goal. Additionally, I noticed that Clair3's Makefile uses LongPhase v1.3. Our current version is v1.6, and I suggest upgrading to it for improved accuracy and processing time.
from clairs.
For Q1, there seem to be no homozygous variants in the normal BAM in Figure 2. Are you referring to homozygous reference sites (that is, the same allele as the reference base)? Thanks for reporting this; we will check the details.
For Q3, thanks for the suggestion. We will update LongPhase to v1.6 in our next release.
from clairs.
Q1. Your understanding is correct. Thank you for your confirmation.
from clairs.