Comments (4)
Add an example
index | seqname | strand | clv | aclv | gene_name | gene_id | signed_dist_to_aclv | evidence_type | contig_ids | max_contig_len | max_contig_mapq | any_contig_is_hardclipped | num_suffix_contigs | num_bridge_contigs | num_link_contigs | num_blank_contigs | num_total_contigs | num_suffix_reads | max_suffix_contig_tail_len | num_bridge_reads | max_bridge_read_tail_len | num_link_reads | ctg_hex | ctg_hex_id | ctg_hex_pos | ctg_hex_dist | ref_hex | ref_hex_id | ref_hex_pos | ref_hex_dist | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
37378 | chr1 | + | 204489245 | 204501779 | MDM4 | ENSG00000198625 | -12534 | bridge | A0.R100625 | 3334 | 40 | True | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 6 | 4 | 0 | ACTAAA | 6 | 204489235 | 10 | ACTAAA | 6 | 204489238 | 7 | |
37379 | chr1 | + | 204489253 | 204501779 | MDM4 | ENSG00000198625 | -12526 | bridge | A0.R100625 | 3334 | 40 | True | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 35 | 6 | 0 | AATACA | 9 | 204489241 | 12 | AATACA | 9 | 204489244 | 9 | |
46052 | chr7 | - | 109555254 | 109599274 | EIF3IP1 | ENSG00000237064 | -44020 | bridge | A0.S61128 | 68 | 1 | True | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 6 | 0 | TTTAAA | 3 | 109555268 | 14 | AATAAA | 16 | 109555282 | 28 |
ctg_hex_dist and ref_hex_dist don't match because of a 3 bp insertion in A0.R100625.
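The 3 bp gap can be checked by hand from the rows in the table above. Here is a minimal sketch, assuming the distance is clv - hex_pos on the + strand and hex_pos - clv on the - strand. That convention is inferred from the numbers in the table, not taken from the kleat source, and `hex_dist` is a hypothetical helper name.

```python
def hex_dist(strand, clv, hex_pos):
    """Distance from a hexamer position to the cleavage site.

    Hypothetical helper; the sign convention is an assumption inferred
    from the table rows, not from kleat itself.
    """
    if strand == "+":
        return clv - hex_pos
    if strand == "-":
        return hex_pos - clv
    raise ValueError(f"unexpected strand: {strand!r}")


# (strand, clv, ctg_hex_pos, ref_hex_pos), copied from the table above
rows = [
    ("+", 204489245, 204489235, 204489238),
    ("+", 204489253, 204489241, 204489244),
    ("-", 109555254, 109555268, 109555282),
]

for strand, clv, ctg_pos, ref_pos in rows:
    ctg_d = hex_dist(strand, clv, ctg_pos)
    ref_d = hex_dist(strand, clv, ref_pos)
    # Columns printed: strand, ctg_hex_dist, ref_hex_dist, their difference
    print(strand, ctg_d, ref_d, ctg_d - ref_d)
```

For the two A0.R100625 rows the difference comes out as 3, which agrees with the insertion length. The third row reports different hexamers for contig and reference, so its difference says nothing about an indel.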
from kleat.
Actually, ctg_hex_dist could be correct; it's just that ctg_hex_pos is not consistent with genomic coordinates and should be dropped.
Insight: ctg_hex_pos isn't interpretable with respect to genome coordinates, but ctg_hex_dist is probably valid.
index | seqname | strand | clv | aclv | gene_name | gene_id | signed_dist_to_aclv | evidence_type | contig_ids | max_contig_len | max_contig_mapq | any_contig_is_hardclipped | num_suffix_contigs | num_bridge_contigs | num_link_contigs | num_blank_contigs | num_total_contigs | num_suffix_reads | max_suffix_contig_tail_len | num_bridge_reads | max_bridge_read_tail_len | num_link_reads | ctg_hex | ctg_hex_id | ctg_hex_pos | ctg_hex_dist | ref_hex | ref_hex_id | ref_hex_pos | ref_hex_dist | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
39789 | chr1 | + | 204489245 | 204501779 | MDM4 | ENSG00000198625 | -12534 | bridge | A0.R100625 | 3334 | 40 | True | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 6 | 4 | 0 | ACTAAA | 6 | 204489235 | 10 | ACTAAA | 6 | 204489238 | 7 | |
39790 | chr1 | + | 204489253 | 204501779 | MDM4 | ENSG00000198625 | -12526 | bridge | A0.R100625 | 3334 | 40 | True | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 35 | 6 | 0 | AATACA | 9 | 204489241 | 12 | AATACA | 9 | 204489244 | 9 | |
46413 | chr7 | - | 109555254 | 109599274 | EIF3IP1 | ENSG00000237064 | -44020 | bridge | A0.S61128 | 68 | 1 | True | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 6 | 0 | AATATA | 10 | 109555301 | 47 | AATAAA | 16 | 109555282 | 28 |
Happy to see the PAS hexamers finally match between contig and reference in the third case.
Conclusion:
ctg_hex_pos is not interpretable in terms of the reference genome, but I decided to leave it in for reference as a sanity check and an indication of an insertion. The same goes for the difference between ctg_hex_dist and ref_hex_dist.
ctg_hex_pos may become useful when the ref_hex is not found.
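As a rough illustration of using that difference as a sanity check, here is a sketch over the rows of the second table. `hex_dist_offset` is a hypothetical name, and comparing the distances only when both searches report the same hexamer is my own assumption, not something kleat does.

```python
def hex_dist_offset(row):
    """Return ctg_hex_dist - ref_hex_dist, or None if the hexamers differ.

    Hypothetical helper: a nonzero offset is only a hint that the contig
    carries an indel relative to the reference, not proof of one.
    """
    if row["ctg_hex"] != row["ref_hex"]:
        return None
    return row["ctg_hex_dist"] - row["ref_hex_dist"]


# Relevant columns copied from the second table
rows = [
    {"contig_ids": "A0.R100625", "ctg_hex": "ACTAAA", "ctg_hex_dist": 10,
     "ref_hex": "ACTAAA", "ref_hex_dist": 7},
    {"contig_ids": "A0.R100625", "ctg_hex": "AATACA", "ctg_hex_dist": 12,
     "ref_hex": "AATACA", "ref_hex_dist": 9},
    {"contig_ids": "A0.S61128", "ctg_hex": "AATATA", "ctg_hex_dist": 47,
     "ref_hex": "AATAAA", "ref_hex_dist": 28},
]

for row in rows:
    print(row["contig_ids"], hex_dist_offset(row))
```

Both A0.R100625 rows give an offset of 3, while the A0.S61128 row gives None because its contig and reference hexamers are different.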