candig / htsget_app
Htsget API implementation based on the Htsget protocol
License: GNU Lesser General Public License v3.0
The htsget spec defines that the id should be passed as a URL path parameter, e.g., /data/<id>, but this application treats id as a query string parameter, i.e., /data?id=.
Parameters should be written in camelCase, such as referenceName, but the application spec writes them in snake_case, as reference_name.
The server should accept both chr1 and 1 as valid references to chromosome 1, yet it only accepts 1. The spec also indicates that chr is of type int, while it should be of type string.
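One way to accept both spellings is to normalize the reference name before comparing it with what is stored. The following is a minimal sketch, not the application's code: the function name normalize_reference_name is hypothetical, and it only strips a leading "chr" prefix (it does not map aliases such as chrM and MT).

```python
def normalize_reference_name(name):
    """Return a bare chromosome label so that "chr1" and "1" compare equal.

    Hypothetical helper: the value is always handled as a string, never an int.
    """
    text = str(name).strip()
    if text.lower().startswith("chr"):
        text = text[3:]
    return text


def same_reference(a, b):
    """True when two reference names point at the same chromosome."""
    return normalize_reference_name(a) == normalize_reference_name(b)
```

With this in place, same_reference("chr1", "1") is true, and non-numeric names such as "chrX" survive because nothing is converted to int.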
Following python_model_service, create a directory structure and a setup.py such that one can do the following steps:
virtualenv htsenv
source htsenv/bin/activate
python setup.py install
and have the htsget server installed in the local python environment. This will require #1 to be complete so that one can change configuration parameters without having to reinstall.
Following the python model service, register the repository with CodeFactor and PyUp to automatically track simple code quality issues and outdated dependencies.
Currently, if the requested region is smaller than the chunk size, the server returns the entire chromosome, while it should respect the input's start and end parameters.
This is not going to work.
E.g., when requesting a block of 10k bp with a chunk size of 100k bp, it returns the entire chromosome instead of a single 10k bp chunk.
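The expected behaviour can be sketched as a small chunking function that never leaves the requested region. This is an illustration under assumptions, not the server's implementation: split_region is a hypothetical name, and start and end are treated as a half-open interval [start, end).

```python
def split_region(start, end, chunk_size):
    """Split [start, end) into consecutive chunks of at most chunk_size bases.

    Every chunk stays inside the requested region, and no empty chunk
    (start == end) is ever produced.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = []
    position = start
    while position < end:
        chunk_end = min(position + chunk_size, end)
        chunks.append((position, chunk_end))
        position = chunk_end
    return chunks


# A region smaller than the chunk size yields exactly one chunk.
print(split_region(14099895, 14168318, 100000))  # [(14099895, 14168318)]
```

Because the loop stops as soon as position reaches end, a region that is an exact multiple of the chunk size does not produce a trailing zero-length chunk.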
In _execute, rather than calling the sqlite3 library, use SQLAlchemy; this will mean creating a database engine and connection on initialization, and then calling execute on the connection (see here: https://docs.sqlalchemy.org/en/13/core/tutorial.html). This modest change will mean that this service could also be used with back ends such as MySQL, Postgres, etc.
This may or may not be possible but is worth considering, as it would avoid having to clean up temporary files in the case of clients disconnecting or not following up on tickets.
With #6 completed we can start thinking about automated CI testing with DRS and Minio servers; however, this may require the use of Circle-CI rather than Travis, since it will involve 3 services plus a client running simultaneously. This will take some thought.
It looks like it doesn't support search for VCFs across multiple chromosomes; the implementation seems to assume that one VCF file can only contain records from 1 chromosome.
E.g., take the following sample VCF file (headers are stripped):
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 217-70-3296_sample_1
1 76569151 . C A . PASS . GT 0/0
1 82441079 . C T . PASS . GT 0/0
3 46018344 . G A . PASS . GT 1/1
21 34609505 . G A . PASS . GT 1/1
If you do a /variants search on this, it will return the referenceName as None, start as the position in the first row, which is 76569151, and end as the position in the last row, which is 34609505, even though they belong to different chromosomes.
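A per-chromosome summary avoids mixing positions from different chromosomes. The sketch below is only an illustration of that idea: ranges_by_chromosome is a hypothetical helper that works on plain (chromosome, position) pairs taken from the sample above, not on the application's actual VCF reader.

```python
def ranges_by_chromosome(records):
    """Map each chromosome to the (lowest, highest) position seen for it.

    records is an iterable of (chromosome, position) pairs; chromosomes are
    kept as strings so that names such as "X" work as well.
    """
    ranges = {}
    for chrom, pos in records:
        if chrom in ranges:
            low, high = ranges[chrom]
            ranges[chrom] = (min(low, pos), max(high, pos))
        else:
            ranges[chrom] = (pos, pos)
    return ranges


# The four rows of the sample VCF above, reduced to CHROM and POS.
records = [
    ("1", 76569151),
    ("1", 82441079),
    ("3", 46018344),
    ("21", 34609505),
]
summary = ranges_by_chromosome(records)
```

Here summary holds one entry per chromosome, so the range for chromosome 1 is (76569151, 82441079) and is never combined with the positions on chromosomes 3 or 21.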
Using pytest, create a small test suite using another client to successfully pull down slices off the packaged data:
Can you please confirm whether v0.1.4 of htsget_app is the latest version to be used for CanDIGv2? It looks like the stable branch is ahead, but I wanted to check whether we should switch to it instead of the v0.1.4 tagged version.
htsget should access the data file directly from minio
There are a number of ways this can be done, but a simple and common one is configparser: https://docs.python.org/3/library/configparser.html
Here's an example of combining configparser with argparse to allow the config parameters to be overridden on the command line: https://stackoverflow.com/questions/3609852/which-is-the-best-way-to-allow-configuration-options-be-overridden-at-the-comman
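A minimal sketch of that combination follows. The section name server, the keys port and chunk_size, their default values, and the load_settings function (including its config_text argument, which exists only to make the sketch easy to try) are all assumptions made for illustration; they are not taken from the application.

```python
import argparse
import configparser


def load_settings(argv=None, config_text=None):
    """Merge defaults, an INI file, and command-line overrides, in that order."""
    parser = argparse.ArgumentParser(description="htsget server settings (sketch)")
    parser.add_argument("--config", help="path to an INI file")
    parser.add_argument("--port", type=int)
    parser.add_argument("--chunk-size", type=int, dest="chunk_size")
    args = parser.parse_args(argv)

    config = configparser.ConfigParser()
    # Hypothetical defaults, used when neither file nor flag sets a value.
    config.read_dict({"server": {"port": "3000", "chunk_size": "1000000"}})
    if config_text is not None:
        config.read_string(config_text)
    if args.config:
        config.read(args.config)  # files that do not exist are silently skipped

    settings = {
        "port": config.getint("server", "port"),
        "chunk_size": config.getint("server", "chunk_size"),
    }
    # Command-line values win over the file when they were actually given.
    for key in settings:
        value = getattr(args, key)
        if value is not None:
            settings[key] = value
    return settings
```

Calling load_settings(["--port", "8080"], "[server]\nport = 5000\n") returns port 8080 and the default chunk size, since the flag overrides the file and the file overrides the defaults. Configuration can then change without reinstalling the package.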
The request
http://abc.ca:3333/htsget/v1/reads?id=abc.bam&format=BAM&reference_name=21&start=14099895&end=14168318
yields only the following URL, which spans the entire chromosome:
http://abc.ca:3333/htsget/v1/data?id=abc.bam&reference_name=21
In contrast, a request for a bigger range,
http://abc.ca:3333/htsget/v1/reads?id=abc&reference_name=21&start=0&end=10000000
yields
{
  "htsget": {
    "format": "BAM",
    "urls": [
      {
        "url": "http://abc.ca:3333/htsget/v1/data?id=abc&reference_name=21&start=0&end=10000000"
      },
      {
        "url": "http://abc.ca:3333/htsget/v1/data?id=abc&reference_name=21&start=10000000&end=10000000"
      }
    ]
  }
}
The second URL does not seem to be needed, but at least it includes the start and end in its response.
Currently, the /data endpoint defaults to returning plain text (equivalent to SAM). The endpoint should really allow for the option to return data in either binary or plain text format.
Additionally, the IGV browser expects binary (BAM) formatted data, not SAM.