htsget_app's People

Contributors

afrosimon, brouillette, daisieh, dependabot[bot], fcoralsasso, justin-ys, kcranston, ljdursi, mshadbolt, ordineu, pyup-bot, sealrealize, shaikh-rashid, zhengwin

Forkers

zhengwin

htsget_app's Issues

The application does not seem to be following some of the htsget specs

The htsget spec defines that:

  • the id should be passed as a URL path parameter, e.g., /data/<id>, but this application treats id as a query string parameter, i.e., /data?id=.

  • parameters should be written in camelCase, such as referenceName, but the application's spec writes it as reference_name.

  • the server should accept both chr1 and 1 as valid references to chromosome 1, yet the server only accepts 1. The application's spec also declares chr as type int, while it should be of type string.
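The second and third points could be handled with a small normalization step, sketched below in Python. Both function names are hypothetical and not taken from htsget_app, and stripping the "chr" prefix is only an assumption; the right direction depends on how the underlying files name their contigs.

```python
def normalize_reference_name(name):
    """Return the bare chromosome label, so "chr1", "1" and 1 compare equal."""
    # Treat the value as a string, since labels such as "X" or "MT" are not ints.
    label = str(name).strip()
    if label.lower().startswith("chr"):
        label = label[3:]
    return label


def read_reference_name(params):
    """Prefer the camelCase key; fall back to the legacy snake_case key."""
    raw = params.get("referenceName", params.get("reference_name"))
    return None if raw is None else normalize_reference_name(raw)
```

Accepting both spellings of the key for a transition period would let existing clients keep working while new ones follow the camelCase convention.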

Add python packaging for package

Following python_model_service, create a directory structure and a setup.py such that one can run the following steps:

  • virtualenv htsenv
  • source htsenv/bin/activate
  • python setup.py install

and have the htsget server installed in the local python environment. This will require #1 to be complete so that one can change configuration parameters without having to reinstall.

create_slice returns entire chromosome for 1 slice

It should instead respect the input's start and end parameters.

Currently, if the requested region is smaller than the chunk size, it returns the entire chromosome.

This is not going to work.

E.g., if a block of 10k bp is requested and the chunk size is 100k bp, it returns the entire chromosome instead of a 10k bp data chunk, which doesn't make sense.
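A minimal sketch of the expected behaviour, assuming half-open [start, end) coordinates as used by htsget. The function name chunk_ranges is hypothetical and this is not the project's create_slice; it only illustrates clamping every chunk to the requested region.

```python
def chunk_ranges(start, end, chunk_size):
    """Split the half-open interval [start, end) into pieces of at most chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if end < start:
        raise ValueError("end must not be smaller than start")
    ranges = []
    pos = start
    while pos < end:
        # Never run past the requested end, even when the chunk is larger.
        stop = min(pos + chunk_size, end)
        ranges.append((pos, stop))
        pos = stop
    return ranges


# A 10k request with a 100k chunk size stays a single 10k slice.
print(chunk_ranges(0, 10_000, 100_000))  # [(0, 10000)]
```

Because the loop stops as soon as pos reaches end, a region that is an exact multiple of the chunk size does not produce a trailing empty slice.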

Travis-CI tests

Once #2 and #3 are done we can easily add travis-CI testing for at least the local-file/sqlite3 service.

Dockerfile + Quay.io

After #1 and #2, create a Dockerfile for the htsget server (following the python model server), and we'll register the repository with Quay.io (after making it public) to have Docker images built automatically as part of the CI process.

Create a README.md for the repo

  • Create a README which describes this repo as an implementation of the HTSGET protocol (and link to it), with a very brief description of the features and how to run it. You can, but needn't, reference the python model service (https://github.com/CanDIG/python_model_service) and use its README as a model.
  • Add badges for travis, pyup, and code factor. You can see how that's done in the python model service README; I've activated this repo for those services.
  • When the docker container is successfully being built we can also include the badge for quay.io.

DRS+Minio CI testing?

With #6 completed we can start thinking about automated CI testing with DRS and Minio servers. This may require the use of Circle-CI rather than Travis, however, since it will involve 3 services plus a client running simultaneously. This will take some thought.

The /variants endpoint is not handling the request correctly

It looks like it doesn't support search for VCFs across multiple chromosomes; the implementation seems to assume that one VCF file can only contain records from 1 chromosome.

E.g., for the following sample VCF File (headers are stripped)

#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	217-70-3296_sample_1
1	76569151	.	C	A	.	PASS	.	GT	0/0
1	82441079	.	C	T	.	PASS	.	GT	0/0
3	46018344	.	G	A	.	PASS	.	GT	1/1
21	34609505	.	G	A	.	PASS	.	GT	1/1

If you do a /variants search on this file, it returns referenceName as None, start as the position in the first row (76569151), and end as the position in the last row (34609505), even though those rows belong to different chromosomes.
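One way to avoid mixing chromosomes is to track the position range per CHROM value rather than once per file. The sketch below uses the four sample rows above; the function name ranges_by_chromosome is hypothetical, and the input is assumed to be already-parsed (CHROM, POS) pairs rather than output from any particular VCF library.

```python
def ranges_by_chromosome(records):
    """Map each chromosome to the (lowest, highest) position seen for it.

    `records` is an iterable of (chrom, pos) pairs, e.g. taken from the
    CHROM and POS columns of a VCF.
    """
    ranges = {}
    for chrom, pos in records:
        if chrom in ranges:
            low, high = ranges[chrom]
            ranges[chrom] = (min(low, pos), max(high, pos))
        else:
            ranges[chrom] = (pos, pos)
    return ranges


sample = [("1", 76569151), ("1", 82441079), ("3", 46018344), ("21", 34609505)]
print(ranges_by_chromosome(sample))
# {'1': (76569151, 82441079), '3': (46018344, 46018344), '21': (34609505, 34609505)}
```

With per-chromosome ranges available, a response can report a referenceName alongside each start and end instead of a single range that spans unrelated chromosomes.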

Add some pytest tests

Using pytest, create a small test suite using another client to successfully pull down slices off the packaged data:

  • Should successfully pull some prescribed slices of both VCFs and BAMs, showing no difference from tabix-extracted subset
  • Should successfully pull the entire file if no start/end is given
  • Should fail with the expected error if request is given for a data file which doesn't exist
  • Should fail with the expected error on malformed request (e.g., end < start)

Tag latest version of htsget_app

Can you please confirm whether v0.1.4 of htsget_app is the latest version to be used for CanDIGv2? It looks like the stable branch is ahead, but I wanted to check whether we should switch to it instead of the v0.1.4 tagged version.

The application does not seem to support small-range searches

http://abc.ca:3333/htsget/v1/reads?id=abc.bam&format=BAM&reference_name=21&start=14099895&end=14168318

The request yields only the following URL, which spans the entire chromosome: "http://abc.ca:3333/htsget/v1/data?id=abc.bam&reference_name=21"

While a bigger range request

http://abc.ca:3333/htsget/v1/reads?id=abc&reference_name=21&start=0&end=10000000

yields

{
  "htsget": {
    "format": "BAM",
    "urls": [
      {
        "url": "http://abc.ca:3333/htsget/v1/data?id=abc&reference_name=21&start=0&end=10000000"
      },
      {
        "url": "http://abc.ca:3333/htsget/v1/data?id=abc&reference_name=21&start=10000000&end=10000000"
      }
    ]
  }
}

The second url does not seem to be needed, but at least it includes the start and end in its response.
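The redundant second URL has start equal to end, so it covers no bases. A small check like the following could flag or drop such entries from a ticket. It is only a sketch using the standard library: the function names are hypothetical, and the start, end and reference_name query parameters are taken from the responses quoted above.

```python
from urllib.parse import parse_qs, urlparse


def is_empty_range(url):
    """True when the URL carries both start and end and start >= end."""
    query = parse_qs(urlparse(url).query)
    if "start" not in query or "end" not in query:
        return False
    return int(query["start"][0]) >= int(query["end"][0])


def drop_empty_ranges(ticket):
    """Return a copy of the ticket without zero-length data URLs."""
    body = ticket["htsget"]
    kept = [entry for entry in body["urls"] if not is_empty_range(entry["url"])]
    return {"htsget": {**body, "urls": kept}}
```

Applied to the response above, this would keep only the first URL. Fixing the chunking logic so the empty range is never generated is still the better remedy; a filter like this is mainly useful as a test assertion.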
