
solvebio-python's People

Contributors

dandanxu, davecap, eyalfoni, jercoh, jhuttner, jsh2134, krivi95, kuzentio, markkaganovich, nikolamaric, nishanthmerwin, nrasulicpfm, oboforty, pgeez, rocky, spaugh, wesf

solvebio-python's Issues

Make the netrc path configurable

Currently, NETRC_PATH is always set to the user's home directory in this block. In the multiuser environment we run on, this block causes permission issues. Making NETRC_PATH configurable would let solvebio users point to a fixed path instead of the home directory, which differs from user to user.
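A minimal sketch of a configurable path, assuming an environment-variable override. SOLVEBIO_NETRC_PATH and get_netrc_path are hypothetical names, not existing solvebio settings, and the default assumes the ~/.solvebio/credentials file mentioned in the "Auth credentials error" issue:

```python
import os

# Assumed default location; mirrors the ~/.solvebio/credentials file.
DEFAULT_NETRC_PATH = os.path.join("~", ".solvebio", "credentials")


def get_netrc_path():
    """Return the credentials path, honoring a (hypothetical) env override."""
    # SOLVEBIO_NETRC_PATH is an illustrative name, not a real solvebio setting.
    path = os.environ.get("SOLVEBIO_NETRC_PATH", DEFAULT_NETRC_PATH)
    # Expand ~ only when the default (or an override using ~) is in effect.
    return os.path.abspath(os.path.expanduser(path))
```

In a shared environment, every user could then export the same fixed path instead of relying on their own home directory.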

"Cannot detect terminal column width" error in IPython notebooks

The error message "Cannot detect terminal column width" appears when running solvebio in IPython notebooks.

Output in IPython notebooks:

import solvebio
[SolveBio] Cannot detect terminal column width
WARNING:solvebio:Cannot detect terminal column width

It's only a minor error; everything else works fine.
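One way to avoid the warning is to fall back to a default width silently. A minimal sketch using the standard library's shutil.get_terminal_size (Python 3.3+); the helper name and the 80-column default are assumptions, not solvebio's actual code:

```python
import shutil

DEFAULT_WIDTH = 80  # assumed fallback width


def terminal_width(default=DEFAULT_WIDTH):
    """Return the terminal width, or `default` when it cannot be detected."""
    # get_terminal_size() checks $COLUMNS, then queries stdout, and returns
    # the fallback instead of raising when no terminal is attached
    # (for example, inside a notebook kernel).
    columns = shutil.get_terminal_size(fallback=(default, 24)).columns
    # Guard against terminals that report zero columns.
    return columns if columns > 0 else default
```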

support partial slice request

The client currently raises an exception if you pass a slice with an omitted bound, such as q[:50], q[50:], or q[:]. These should be supported, as they are for standard Python sequences.
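Python's slice.indices() already resolves omitted and negative bounds against a length, so a result wrapper can delegate to it. A hypothetical sketch: SliceableResults, total, and fetch(start, stop) stand in for the client's real query object and request:

```python
class SliceableResults:
    """Hypothetical result wrapper supporting q[:50], q[50:], q[:] and q[-1]."""

    def __init__(self, total, fetch):
        self._total = total   # number of matching records
        self._fetch = fetch   # fetch(start, stop) -> list of records

    def __len__(self):
        return self._total

    def __getitem__(self, key):
        if isinstance(key, slice):
            # indices() fills in None bounds and clamps to [0, total].
            start, stop, step = key.indices(self._total)
            if step != 1:
                raise ValueError("Only step-1 slices are supported")
            return self._fetch(start, max(start, stop))
        if isinstance(key, int):
            if key < 0:
                key += self._total
            if not 0 <= key < self._total:
                raise IndexError("result index out of range")
            return self._fetch(key, key + 1)[0]
        raise TypeError("indices must be integers or slices")
```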

List index out of range error when looping through query

This only happens on staging, not production. When I loop through the whole result set, iteration fails with an IndexError after the last record is returned.

for n in db.query().filter(gene_symbols='AGTR1'): print n

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/Users/dandanxu/Documents/solvebio/solvebio-python/solvebio/cli/ipython.pyc in <module>()
----> 1 for n in db.query().filter(gene_symbols='AGTR1'): print n

/Users/dandanxu/Documents/solvebio/solvebio-python/solvebio/query.pyc in next(self)
    433         self._cursor.advance()
    434 
--> 435         return self.results[_result_start]
    436 
    437     def _build_query(self, **kwargs):

IndexError: list index out of range

In context:

[SolveBio] In <22>: db.query().filter(gene_symbols='AGTR1')
           Out<22>: 

|                Fields | Data                             |
|-----------------------+----------------------------------|
|     alternate_alleles | [u'T']                           |
|       clinical_origin | [u'germline']                    |
| clinical_significance | Pathogenic                       |
|          gene_symbols | [u'AGTR1']                       |
|       hg18_chromosome | 3                                |
|              hg18_end | 149941888                        |
|            hg18_start | 149941888                        |
|       hg19_chromosome | 3                                |
|              hg19_end | 148459198                        |
|            hg19_start | 148459198                        |
|       hg38_chromosome | 3                                |
|              hg38_end | 148741411                        |
|            hg38_start | 148741411                        |
|                  hgvs | [u'NC_000003.12:g.148741411C>T'] |
|          rcvaccession | RCV000043469                     |
|  rcvaccession_version | 23                               |
|      reference_allele | C                                |
|                  rsid | rs397514687                      |
|                  type | SNV                              |

... 5 more results.

[SolveBio] In <23>: for n in db.query().filter(gene_symbols='AGTR1'): print n
{u'gene_symbols': [u'AGTR1'], u'hg18_chromosome': u'3', u'hg38_chromosome': u'3', u'hg19_start': 148459198, u'rcvaccession': u'RCV000043469', u'hg38_start': 148741411, u'hg18_end': 149941888, u'hg18_start': 149941888, u'reference_allele': u'C', u'hg38_end': 148741411, u'rcvaccession_version': 23, u'rsid': u'rs397514687', u'hg19_chromosome': u'3', u'hg19_end': 148459198, u'hgvs': [u'NC_000003.12:g.148741411C>T'], u'clinical_significance': u'Pathogenic', u'alternate_alleles': [u'T'], u'clinical_origin': [u'germline'], u'type': u'SNV'}
{u'gene_symbols': [u'AGTR1'], u'hg18_chromosome': u'3', u'hg38_chromosome': u'3', u'hg19_start': 148459667, u'rcvaccession': u'RCV000019690', u'hg38_start': 148741880, u'hg18_end': 149942357, u'hg18_start': 149942357, u'reference_allele': u'C', u'hg38_end': 148741880, u'rcvaccession_version': 25, u'rsid': u'rs104893677', u'hg19_chromosome': u'3', u'hg19_end': 148459667, u'hgvs': [u'NC_000003.12:g.148741880C>T'], u'clinical_significance': u'Pathogenic', u'alternate_alleles': [u'T'], u'clinical_origin': [u'germline'], u'type': u'SNV'}
{u'gene_symbols': [u'AGTR1'], u'hg18_chromosome': u'3', u'hg38_chromosome': u'3', u'hg19_start': 148414685, u'rcvaccession': u'RCV000093069', u'hg38_start': 148696898, u'hg18_end': 149897375, u'hg18_start': 149897375, u'reference_allele': u'G', u'hg38_end': 148696898, u'rcvaccession_version': 1, u'rsid': u'rs207463871', u'hg19_chromosome': u'3', u'hg19_end': 148414685, u'hgvs': [u'NC_000003.12:g.148696898G>C'], u'clinical_significance': u'other', u'alternate_alleles': [u'C'], u'clinical_origin': [u'somatic'], u'type': u'SNV'}
{u'gene_symbols': [u'AGTR1'], u'hg18_chromosome': u'3', u'hg38_chromosome': u'3', u'hg19_start': 148458931, u'rcvaccession': u'RCV000019689', u'hg38_start': 148741144, u'hg18_end': 149941621, u'hg18_start': 149941621, u'reference_allele': u'A', u'hg38_end': 148741144, u'rcvaccession_version': 27, u'rsid': u'rs387906577', u'hg19_chromosome': u'3', u'hg19_end': 148458931, u'hgvs': [u'NC_000003.12:g.148741145dupT'], u'clinical_significance': u'Pathogenic', u'alternate_alleles': [u'AT'], u'clinical_origin': [u'germline'], u'type': u'DIV'}
{u'gene_symbols': [u'AGTR1'], u'hg18_chromosome': u'3', u'hg38_chromosome': u'3', u'hg19_start': 148459988, u'rcvaccession': u'RCV000019688', u'hg38_start': 148742201, u'hg18_end': 149942678, u'hg18_start': 149942678, u'reference_allele': u'A', u'hg38_end': 148742201, u'rcvaccession_version': 2, u'rsid': u'rs5186', u'hg19_chromosome': u'3', u'hg19_end': 148459988, u'hgvs': [u'NC_000003.12:g.148742201A>C'], u'clinical_significance': u'other', u'alternate_alleles': [u'C'], u'clinical_origin': [u'germline'], u'type': u'SNV'}
{u'gene_symbols': [u'AGTR1'], u'hg18_chromosome': u'3', u'hg38_chromosome': u'3', u'hg19_start': 148459073, u'rcvaccession': u'RCV000043468', u'hg38_start': 148741286, u'hg18_end': 149941763, u'hg18_start': 149941763, u'reference_allele': u'G', u'hg38_end': 148741286, u'rcvaccession_version': 23, u'rsid': u'rs398122935', u'hg19_chromosome': u'3', u'hg19_end': 148459073, u'hgvs': [u'NC_000003.12:g.148741286G>A'], u'clinical_significance': u'Pathogenic', u'alternate_alleles': [u'A'], u'clinical_origin': [u'germline'], u'type': u'SNV'}
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/Users/dandanxu/Documents/solvebio/solvebio-python/solvebio/cli/ipython.pyc in <module>()
----> 1 for n in db.query().filter(gene_symbols='AGTR1'): print n

/Users/dandanxu/Documents/solvebio/solvebio-python/solvebio/query.pyc in next(self)
    433         self._cursor.advance()
    434 
--> 435         return self.results[_result_start]
    436 
    437     def _build_query(self, **kwargs):

IndexError: list index out of range

[SolveBio] In <24>: 

David Caplan (davecap) commented:
Thanks for catching this. It looks like it's a problem with the solvebio-python package, not the API. Can you move it over there?
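The traceback shows next() indexing into self.results right after advancing the cursor, which appears to fail once the cursor moves past the final record. A generator that stops when a page comes back short or empty never indexes past the end. This is a hypothetical sketch; fetch_page(offset, size) stands in for whatever request the client actually makes:

```python
class PagedResults:
    """Hypothetical iterator over paged query results."""

    def __init__(self, fetch_page, page_size=100):
        self._fetch_page = fetch_page  # fetch_page(offset, size) -> list
        self._page_size = page_size

    def __iter__(self):
        offset = 0
        while True:
            page = self._fetch_page(offset, self._page_size)
            if not page:
                return  # nothing left: end iteration cleanly
            for record in page:
                yield record
            offset += len(page)
            if len(page) < self._page_size:
                return  # a short page means this was the last one
```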

Python booleans not supported in filters

is_reference_allele=False doesn't work, but is_reference_allele='False' does:

[SolveBio] In <39>: kg.query().filter(allele_frequency_eur__gte='0.05',allele_frequency_eas__lte='0.0001',is_reference_allele=False)
           Out<39>: Query returned 0 results.

[SolveBio] In <40>: kg.query().filter(allele_frequency_eur__gte='0.05',allele_frequency_eas__lte='0.0001',is_reference_allele='False')
           Out<40>: 

|                   Fields | Data                                              |
|--------------------------+---------------------------------------------------|
|                   allele | A                                                 |
|             allele_count | 124                                               |
| allele_count_denominator | 5008                                              |
|         allele_frequency | 0.0247603834                                      |
|     allele_frequency_afr | 0.0030257186                                      |
|     allele_frequency_amr | 0.0360230548                                      |
|     allele_frequency_eas | 0.0                                               |
|     allele_frequency_eur | 0.0884691849                                      |
|     allele_frequency_sas | 0.0061349693                                      |
|         alternate_allele | [u'G']                                            |
|      genomic_coordinates | {u'start': 106857006, u'stop': 106857006, u'build'|
|         homozygote_count | 8                                                 |
|      is_reference_allele | True                                              |
|               population | [{u'allele_frequency': 0.0208333333, u'allele ... |
|         reference_allele | A                                                 |
|                    rs_id | rs703478                                          |
|               variant_id | rs703478                                          |

... 28,112 more results.

Weird error message after entering password at login

After this error message, if I open the solvebio shell, I'm logged in anyway. Not sure whether this is just a one-off on my end.


solvebio --api-host https://api.solvebio.com login
Email address: [email protected]
Password (typing will be hidden): 
Traceback (most recent call last):
  File "/Users/dandanxu/.virtualenvs/solvebio/bin/solvebio", line 9, in <module>
    load_entry_point('solvebio==1.5.2', 'console_scripts', 'solvebio')()
  File "/Users/dandanxu/.virtualenvs/solvebio/lib/python2.7/site-packages/solvebio/cli/main.py", line 93, in main
    args.func(args)
  File "/Users/dandanxu/.virtualenvs/solvebio/lib/python2.7/site-packages/solvebio/cli/auth.py", line 50, in login
    response = client.request('post', '/v1/auth/token', data)
  File "/Users/dandanxu/.virtualenvs/solvebio/lib/python2.7/site-packages/solvebio/client.py", line 157, in request
    _handle_request_error(e)
TypeError: _handle_request_error() takes exactly 2 arguments (1 given)

Queries are evaluated twice on failure in some cases

In the interactive shell, a query is sent twice if it raises an exception (for example, because of an invalid filter).

Example:

$ solvebio
>>> Dataset.retrieve('ClinVar/ClinVar').query().filter(review_status_star__in=[])
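The report doesn't say why the shell sends the query twice. Assuming the display hook evaluates the query object more than once, one defensive approach is to cache the outcome, including any exception, on first execution. A hypothetical sketch; CachedQuery and execute are illustrative names:

```python
class CachedQuery:
    """Run `execute` at most once; replay its result or exception afterwards."""

    def __init__(self, execute):
        self._execute = execute
        self._done = False
        self._result = None
        self._error = None

    def _run(self):
        if not self._done:
            try:
                self._result = self._execute()
            except Exception as exc:  # remember failures too
                self._error = exc
            self._done = True
        if self._error is not None:
            raise self._error
        return self._result

    def __repr__(self):
        return repr(self._run())
```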

Unicode displaying in dataset descriptions

  • When I look at, for example, clinvar_phenotypes.fields(), apostrophes are displayed as \u2019:
clinvar_phenotypes = solvebio.Dataset.retrieve('ClinVar/2.0.0-1/Phenotypes')
clinvar_phenotypes.fields()
{
      "class_name": "DatasetField", 
      "created_at": "2014-07-24T01:06:08Z", 
      "data_type": "string", 
      "dataset": "ClinVar/2.0.0-1/Phenotypes", 
      "dataset_id": 23, 
      "description": "A universally unique identifier containing the variant\u2019s RCV Accession Number and GRC human reference assembly.", 
      "facets_url": "https://api.solvebio.com/v1/dataset_fields/673/facets", 
      "full_name": "ClinVar/2.0.0-1/Phenotypes/uuid", 
      "id": 673, 
      "name": "uuid", 
      "updated_at": "2014-07-24T23:13:24.547Z", 
      "url": "https://api.solvebio.com/v1/dataset_fields/673"
    }
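The \u2019 escapes are what json.dumps produces by default, since ensure_ascii=True escapes every non-ASCII character. Assuming the fields() output is rendered with json.dumps, passing ensure_ascii=False keeps characters such as the right single quotation mark (U+2019) readable. A minimal illustration with a trimmed copy of the record above:

```python
import json

field = {
    "name": "uuid",
    "description": "A universally unique identifier containing the variant\u2019s "
                   "RCV Accession Number and GRC human reference assembly.",
}

# Default behavior: non-ASCII characters are escaped as \uXXXX.
escaped = json.dumps(field, indent=2)
# ensure_ascii=False leaves U+2019 as the actual character.
readable = json.dumps(field, indent=2, ensure_ascii=False)
```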

[Manifest] Add() not allowing my local filepath

manifest = solvebio.Manifest()
manifest.add(path='~/Desktop/data.json.gz')

is causing

ValueError: Paths in manifest must be valid URL (starting with http:// or https://) or a valid local filename, directory, or glob (such as: *.vcf)

Perhaps the code in /resource/manifest.py doesn't expand the tilde?
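Python does not expand ~ on its own; os.path.expanduser does. A minimal sketch of resolving local manifest paths, assuming glob support is wanted as the error message suggests; expand_local_path is a hypothetical name:

```python
import glob
import os


def expand_local_path(path):
    """Expand ~ and glob patterns; raise ValueError if nothing matches."""
    # '~/Desktop/data.json.gz' -> '/home/<user>/Desktop/data.json.gz'
    expanded = os.path.expanduser(path)
    matches = sorted(glob.glob(expanded))
    if not matches:
        raise ValueError("No local file matches: {}".format(path))
    return matches
```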

Easy way to pull out latest available datasets for retrieval in the API

Right now I still go back to the website (www.solvebio.com/library) every time I need the exact name of a new dataset I'm interested in.
I know that solvebio.Depository.all() exists, but its output is too bulky to scan quickly by eye.
I'd really like something like solvebio.Depository.latest()
that returns a concise, non-detailed list such as:
ClinVar/2.0.0-1/Variants
ClinVar/2.0.0-1/Submissions
ClinVar/2.0.0-1/Phenotypes
HGNC/1.0.0-1/HGNC
along with the dataset IDs or similar.

Related: would it be possible to add auto-completion for solvebio.Dataset.retrieve, so that typing solvebio.Dataset.retrieve('C and pressing Tab brings up ClinVar? Or is that too complicated?

OAuth2 app client_credentials support

Add a mechanism that exchanges client credentials (app ID and app secret) for a bearer access token. Support both secret user API keys and OAuth2 bearer tokens.
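The client_credentials grant is defined in RFC 6749 (section 4.4): the client POSTs grant_type=client_credentials to the token endpoint and authenticates with HTTP Basic, using the form-encoded app ID and secret. A minimal sketch that only builds the request; the token URL in the usage is a placeholder, not a documented SolveBio endpoint:

```python
import base64
from urllib.parse import quote_plus, urlencode
from urllib.request import Request


def build_client_credentials_request(token_url, client_id, client_secret, scope=None):
    """Build (but do not send) an OAuth2 client_credentials token request."""
    # RFC 6749 section 2.3.1: form-encode id and secret, then HTTP Basic auth.
    pair = "{}:{}".format(quote_plus(client_id), quote_plus(client_secret))
    basic = base64.b64encode(pair.encode("utf-8")).decode("ascii")

    form = {"grant_type": "client_credentials"}
    if scope:
        form["scope"] = scope

    return Request(
        token_url,
        data=urlencode(form).encode("ascii"),
        headers={
            "Authorization": "Basic " + basic,
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
```

The JSON response's access_token would then be sent as "Authorization: Bearer <token>" on API calls, alongside the existing API-key path.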

API Key Login not registering?

Example

solvebio.login(api_key='XYZABC')
solvebio.User.retrieve()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/hello/code/solvebio/solvebio-python/solvebio/resource/apiresource.py", line 114, in retrieve
    return super(SingletonAPIResource, cls).retrieve(None)
  File "/Users/hello/code/solvebio/solvebio-python/solvebio/resource/apiresource.py", line 25, in retrieve
    instance.refresh()
  File "/Users/hello/code/solvebio/solvebio-python/solvebio/resource/apiresource.py", line 29, in refresh
    self.refresh_from(self.request('get', self.instance_url()))
  File "/Users/hello/code/solvebio/solvebio-python/solvebio/resource/solveobject.py", line 72, in request
    response = client.request(method, url, **kwargs)
  File "/Users/hello/code/solvebio/solvebio-python/solvebio/client.py", line 213, in request
    _handle_api_error(response)
  File "/Users/hello/code/solvebio/solvebio-python/solvebio/client.py", line 32, in _handle_api_error
    raise SolveError(response=response)
solvebio.errors.SolveError: You do not have permission to perform this action.

solvebio login shows unusual email default

solvebio login
Email address (Namespace(api_host=None, api_key=None, func=<function login at 0x100c529b0>, subcommands='login')): Traceback (most recent call last):

This will be fixed soon.

initial double-querying

If you run the following query:

q = .query(limit=N)[N:]

the client queries the backend twice. However, it does not when you run:

q = .query(limit=N)[M:]  (where M < N)

Test suite & sanity checks

We need a "solvebio test" subcommand that runs sanity checks against the API and a core set of tests to ensure that everything is working.

add support for .count()

Similar to the Django ORM, support count(), which returns the total number of results produced by a query.

Caveat: count() should operate independently of the limit keyword.

For example, consider the following SQL. If

SELECT COUNT(*) FROM SOME_TABLE
>> 1,000,000

then

SELECT COUNT(*) FROM SOME_TABLE LIMIT 10
>> 1,000,000

whereas

SELECT COUNT(*) FROM (SELECT * FROM SOME_TABLE LIMIT 10)
>> 10
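The SQL semantics above can be checked with Python's built-in sqlite3 module. This uses an in-memory table of 1,000 rows instead of 1,000,000; the table name mirrors the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (id INTEGER)")
conn.executemany("INSERT INTO some_table VALUES (?)", [(i,) for i in range(1000)])

# COUNT(*) ignores an outer LIMIT: LIMIT applies to the single output row.
total = conn.execute("SELECT COUNT(*) FROM some_table").fetchone()[0]
total_with_limit = conn.execute(
    "SELECT COUNT(*) FROM some_table LIMIT 10").fetchone()[0]
# Counting a limited subquery is the only way to get the capped number.
limited = conn.execute(
    "SELECT COUNT(*) FROM (SELECT * FROM some_table LIMIT 10)").fetchone()[0]
```

By analogy, count() would behave like the first two queries and ignore limit.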

Add test coverage for changelog requests

Examples:

Get the changelog for a specific version of the ClinVar/ClinVar dataset compared to the previous version:
https://api.solvebio.com/v1/datasets/clinvar/3.7.0-2015-12-06/clinvar/changelog

Get the changelog for a specific version of ClinVar compared to a different previous version:
https://api.solvebio.com/v1/datasets/clinvar/3.7.0-2015-12-06/clinvar/changelog/3.6.0-2015-09-04

Get the changelog for a Depository Version compared to the previous (compares all datasets within):
https://api.solvebio.com/v1/depository_versions/clinvar/3.7.0-2015-12-06/changelog

The tests should use the changelog() method on the Dataset resource, with and without a previous version:
Dataset.retrieve('ClinVar/3.7.0-2015-12-06/ClinVar').changelog()
Dataset.retrieve('ClinVar/3.7.0-2015-12-06/ClinVar').changelog('3.6.0-2015-09-04')
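Tests can assert that changelog requests hit the URLs listed above. A hypothetical helper that reproduces those three URL shapes; it assumes path segments are lowercased, as in the examples, and only allows a previous version for datasets because that is the only form shown:

```python
API_HOST = "https://api.solvebio.com"


def changelog_url(depository, version, dataset=None, previous_version=None):
    """Build a changelog URL for a dataset or a whole depository version."""
    if dataset is None:
        if previous_version:
            raise ValueError("previous_version is only shown for datasets")
        path = "/v1/depository_versions/{}/{}/changelog".format(
            depository.lower(), version)
    else:
        path = "/v1/datasets/{}/{}/{}/changelog".format(
            depository.lower(), version, dataset.lower())
        if previous_version:
            path += "/" + previous_version
    return API_HOST + path
```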

Auth credentials error

Even when a valid .solvebio/credentials file is present, $SOLVEBIO_API_KEY must be set for the client to work.

The bug occurs because client.SolveTokenAuth._get_token_auth reads solvebio.api_token, but solvebio.api_token is only set to a non-None value if $SOLVEBIO_API_KEY is defined (see: solvebio/__init__.py)
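A fix would fall back to the credentials file when the environment variable is unset. A minimal sketch, assuming the credentials file uses netrc format with the API host as the machine name and the key stored as the password; resolve_api_key is a hypothetical name:

```python
import netrc
import os


def resolve_api_key(host="api.solvebio.com", credentials_path=None):
    """Prefer $SOLVEBIO_API_KEY, else read the key from a netrc-style file."""
    api_key = os.environ.get("SOLVEBIO_API_KEY")
    if api_key:
        return api_key

    path = credentials_path or os.path.expanduser("~/.solvebio/credentials")
    if not os.path.exists(path):
        return None
    try:
        # authenticators() returns (login, account, password) or None.
        auth = netrc.netrc(path).authenticators(host)
    except netrc.NetrcParseError:
        return None
    return auth[2] if auth else None
```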

Bulk query interface

The bulk query interface should allow users to query multiple datasets in a single request.

client-side warning for long-iterating queries

If a client naively queries a dataset and attempts to page through it, raise an exception when it tries to fetch the second page, and point the client to a query parameter (yet to be determined) that enables endless paging.
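A hypothetical sketch of the guard: the query would call before_fetch() ahead of each page request. PageGuard, PagingNotEnabledError, and the allow_paging flag are placeholders for the still-undecided parameter:

```python
class PagingNotEnabledError(Exception):
    """Raised when a query tries to fetch a second page without opting in."""


class PageGuard:
    def __init__(self, allow_paging=False):
        self.allow_paging = allow_paging  # placeholder for the real parameter
        self.pages_fetched = 0

    def before_fetch(self):
        # The first page is always allowed; later pages require opting in.
        if self.pages_fetched >= 1 and not self.allow_paging:
            raise PagingNotEnabledError(
                "This query spans multiple pages; enable paging explicitly "
                "to iterate through all results.")
        self.pages_fetched += 1
```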

fix `limit` semantics

In line with SQL, limit should be a hard limit on the number of rows retrieved. Period.

What does this do?

Dataset.query(limit=100)[0:1000]

Does it return a list of 100 results, or does it raise an IndexError?
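Plain Python lists answer this one way: slicing past the end clamps rather than raising, so list(range(100))[0:1000] has 100 items. A sketch of applying the same rule with limit as a hard cap; resolve_slice is hypothetical and total stands for the number of records matching the query:

```python
def resolve_slice(key, limit, total):
    """Clamp a slice to min(limit, total), the way list slicing clamps to len()."""
    upper = total if limit is None else min(limit, total)
    start, stop, step = key.indices(upper)
    if step != 1:
        raise ValueError("Only step-1 slices are supported")
    return start, max(start, stop)
```

Under this rule, Dataset.query(limit=100)[0:1000] would fetch rows 0 to 99 and return 100 results.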

easier way to logout

Instead of having to type solvebio.cli.auth.logout(''), just typing logout should log the user out.

Filter by number of items in a field

It would be great to be able to set a filter that returns only records with a certain number of items in a field. For example, I'm only interested in multi-allelic records in ClinVar/2.0.0-1/Variants, so I want to retrieve only records matching something like len(alternate_alleles) > 1.
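Until the API supports such a filter, one client-side workaround is to post-filter results. This still downloads every matching record, so it's only practical for small result sets. A sketch with hypothetical records; with_min_items is an illustrative name:

```python
def with_min_items(records, field, min_items=2):
    """Yield records whose list-valued `field` has at least `min_items` entries."""
    for record in records:
        # Missing or null fields count as empty lists.
        if len(record.get(field) or []) >= min_items:
            yield record
```

For multi-allelic variants this would be applied as with_min_items(results, "alternate_alleles").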

[Manifest] Add() not allowing URLs

[SolveBio] In <7>: manifest.add('https://molgenis26.target.rug.nl/downloads/gonl_public/variants/release5/gonl.chr1.snps_indels.r5.vcf.gz')
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/Library/Python/2.7/site-packages/solvebio/cli/ipython.pyc in <module>()
----> 1 manifest.add('https://molgenis26.target.rug.nl/downloads/gonl_public/variants/release5/gonl.chr1.snps_indels.r5.vcf.gz')

/Library/Python/2.7/site-packages/solvebio/resource/manifest.pyc in add(self, *args)
     63 
     64         for path in args:
---> 65             path = os.path.expandpath(path)
     66 
     67             if _is_url(path):

AttributeError: 'module' object has no attribute 'expandpath'

manifest.add_url() works perfectly fine for URLs
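The standard library has no os.path.expandpath; the function for expanding ~ is os.path.expanduser, and it should only be applied to local paths. A minimal sketch of checking for a URL first; _is_url here is a stand-in and may differ from the helper in manifest.py:

```python
import os
from urllib.parse import urlparse


def _is_url(path):
    return urlparse(path).scheme in ("http", "https")


def prepare_manifest_path(path):
    """Return URLs unchanged; expand ~ in local paths and make them absolute."""
    if _is_url(path):
        return path
    # os.path has no expandpath(); expanduser() is the standard-library call.
    return os.path.abspath(os.path.expanduser(path))
```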

Handle API rate-limiting in the client

The API will respond with 429 Too Many Requests when the client is rate-limited and a header Retry-After which contains the number of seconds to wait before retrying the request. The client should time.sleep(<Retry-After seconds>) and then retry the request.
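A minimal sketch of the retry loop, assuming Retry-After carries a number of seconds (HTTP also allows an HTTP-date, which this ignores). send() is a hypothetical callable returning (status, headers, body), and sleep is injectable so the loop can be tested without waiting:

```python
import time


def request_with_retry(send, max_retries=3, sleep=time.sleep):
    """Call send(), honoring 429 + Retry-After up to max_retries times."""
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status != 429 or attempt == max_retries:
            return status, headers, body
        # Assumes delay-seconds; defaults to 1s if the header is missing.
        # Header lookup here is case-sensitive (plain dict).
        delay = float(headers.get("Retry-After", 1))
        sleep(delay)
```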
