tleonardi / bedparse
Python module and CLI tool to perform operations on BED files
Home Page: http://bedparse.rtfd.io
License: MIT License
convertChr should support BED files with extra fields (or produce a meaningful error message).
Hi,
I am having issues using bedparse to convert an Ensembl BED file to UCSC chromosome names. The BED file was made with nanocompore, and the alignments were made to gencode.v33_transcripts, which contains the transcript version numbers.
When I run the following command on a BED file with the header removed:
bedparse convertChr --assembly hg38 --target ucsc WT_v_KO_DRS.2_sig_sites_GMM_logit_pvalue_context_2_thr_0.01.bed
the script fails on the first line of the BED file and I get the following error:
Traceback (most recent call last):
  File "/home/samirwatson/miniconda3/envs/guitar/bin/bedparse", line 10, in <module>
    sys.exit(main())
  File "/home/samirwatson/miniconda3/envs/guitar/lib/python3.9/site-packages/bedparse/bedparse.py", line 250, in main
    args.func(args)
  File "/home/samirwatson/miniconda3/envs/guitar/lib/python3.9/site-packages/bedparse/bedparse.py", line 121, in convertChr
    translatedLine=bedline(line.split('\t')).translateChr(assembly=args.assembly, target=args.target, suppress=args.suppressMissing, ignore=args.allowMissing, patches=args.patches)
  File "/home/samirwatson/miniconda3/envs/guitar/lib/python3.9/site-packages/bedparse/bedline.py", line 524, in translateChr
    raise BEDexception("The chromosome of transcript "+self.name+" ("+self.chr+") can't be found in the DB.")
bedparse.BEDexception: The chromosome of transcript ENST00000368723.4_ACCTC (chr1) can't be found in the DB.
I have checked the bed file using the validateFormat function and it passed.
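The issue title points at the extra, non-standard columns as the likely trigger. One possible workaround, sketched under that assumption, is to trim each line to its first few standard columns before handing the file to bedparse. The helper names `trim_bed_fields` and `trim_bed_lines` are hypothetical and not part of bedparse, and the thread does not confirm that trimming resolves the error above.

```python
def trim_bed_fields(line, n_fields=6):
    """Keep only the first n_fields tab-separated columns of one BED line.

    Hypothetical helper for illustration; not part of bedparse.
    """
    fields = line.rstrip("\r\n").split("\t")
    return "\t".join(fields[:n_fields])


def trim_bed_lines(lines, n_fields=6):
    """Trim every non-empty line of an iterable of BED lines."""
    return [trim_bed_fields(line, n_fields) for line in lines if line.strip()]
```

For example, `trim_bed_lines(open("input.bed"))` returns the trimmed lines, which can then be written to a new file and passed to `bedparse convertChr`.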
For example, gtf2bed fails with the gtf file from XenBase:
ftp://ftp.xenbase.org/pub/Genomics/JGI/Xenla9.2/XL9_2_GCA.gff3
While not an issue, you may be interested in using something like sphinxcontrib-programoutput to have your CLI help command generated upon each commit so that it always stays up to date.
The one rub would be that you'd need to switch from Markdown to rst, but that should be easy using pandoc. If you're interested and run into any trouble, I'm happy to help out.
As per the JOSS review requirements.
Filtering is based on the 'transcript_type' field, but Ensembl GTF has a 'transcript_biotype' field.
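A minimal sketch of how a filter could accept either attribute name, assuming the usual GTF attribute syntax of `key "value";` pairs in the ninth column. The function names are hypothetical and this is not bedparse's actual implementation.

```python
import re

# Matches quoted attributes such as: transcript_biotype "protein_coding";
# Unquoted values (e.g. exon_number 1) are deliberately skipped.
_ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_gtf_attributes(attr_field):
    """Return the quoted key/value pairs of a GTF attribute column as a dict."""
    return dict(_ATTR_RE.findall(attr_field))


def transcript_type(attr_field, keys=("transcript_type", "transcript_biotype")):
    """Return the transcript type, trying each candidate key in order.

    Returns None if none of the keys is present.
    """
    attrs = parse_gtf_attributes(attr_field)
    for key in keys:
        if key in attrs:
            return attrs[key]
    return None
```

Trying `transcript_type` first keeps the existing behaviour and falls back to the Ensembl-style name only when the first key is absent.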
When converting a GTF file to BED, allow filtering by gene type.
This causes bedparse join to fail for the BED4 format, because the txName contains the newline and can't be matched with the annotation.
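The failure mode can be illustrated as follows: in a BED4 line the name is the last field, so splitting on tabs without first removing the line ending leaves the newline attached to it. This is a sketch of the general pattern, with a hypothetical helper name, not the actual bedparse code.

```python
def split_bed_line(line):
    """Split one BED line into fields, dropping the trailing line ending first."""
    return line.rstrip("\r\n").split("\t")


raw = "chr1\t100\t200\ttx1\n"

# Naive split: the last field keeps the newline, so "tx1\n" != "tx1".
naive_name = raw.split("\t")[3]

# Stripping the line ending first gives a name that matches the annotation key.
clean_name = split_bed_line(raw)[3]
```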
The function throws the following error when used on a BED6 record:
AttributeError: ("'bedline' object has no attribute 'exStarts'", 'occurred at index 0')
Add long_description in setup.py as in https://packaging.python.org/tutorials/packaging-projects/#setup-args
I tried using the BED3 file from the Ensembl example and am currently getting:
(tempenv-19e22026634fb) ~ ❯❯❯ cat test.bed
chr1 213941196 213942363
chr1 213942363 213943530
chr1 213943530 213944697
chr2 158364697 158365864
chr2 158365864 158367031
chr3 127477031 127478198
chr3 127478198 127479365
chr3 127479365 127480532
chr3 127480532 127481699
(tempenv-19e22026634fb) ~ ❯❯❯ bedparse cds test.bed
Traceback (most recent call last):
  File "/Users/BenjaminLee/.virtualenvs/tempenv-19e22026634fb/bin/bedparse", line 10, in <module>
    sys.exit(main())
  File "/Users/BenjaminLee/.virtualenvs/tempenv-19e22026634fb/lib/python3.6/site-packages/bedparse/bedparse.py", line 222, in main
    args.func(args)
  File "/Users/BenjaminLee/.virtualenvs/tempenv-19e22026634fb/lib/python3.6/site-packages/bedparse/bedparse.py", line 40, in cds
    utr=bedline(line.split('\t')).cds(ignoreCDSonly=args.ignoreCDSonly)
  File "/Users/BenjaminLee/.virtualenvs/tempenv-19e22026634fb/lib/python3.6/site-packages/bedparse/bedline.py", line 29, in __init__
    raise BEDexception("Only BED3,4,6,12 are supported. "+self.name+" is neither.")
bedparse.BEDexception: Only BED3,4,6,12 are supported. NoName is neither.
Is this behavior anticipated? Either way, could you provide some example files in the repository for experimentation?
The docs say that the subcommand reports the promoters of coding genes, but it prints the promoters of all transcripts in the input BED.
Not sure what it's called, but next to the description for the repo at the very top, there's a spot for a link. Would be nice to put the link to your docs up there.
See here for an example.
The print() and pprint() methods should be removed and their functionality implemented in __str__ and __repr__.
See thread in JOSS review
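A minimal sketch of what moving the printing logic into the special methods could look like. `BedRecord` is a hypothetical stand-in class used only for illustration; it is not bedparse's `bedline` class and makes no claim about how that class stores its fields.

```python
class BedRecord:
    """Hypothetical stand-in for a BED record; not bedparse's bedline class."""

    def __init__(self, fields):
        # Store every field as a string so the record can be joined directly.
        self.fields = [str(f) for f in fields]

    def __str__(self):
        # Human-readable form: the tab-separated BED line itself.
        return "\t".join(self.fields)

    def __repr__(self):
        # Unambiguous form for debugging and interactive sessions.
        return f"BedRecord({self.fields!r})"
```

With this in place, `print(record)` emits the BED line and the interactive interpreter shows the `repr`, so dedicated `print()` and `pprint()` methods are no longer needed.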