resolwe-bio's Introduction

Resolwe Bioinformatics

Bioinformatics pipelines for the Resolwe dataflow package for the Django framework.

Docs & Help

Read about getting started and how to write processes in the documentation.

To chat with developers or ask for help, join us on Slack.

Install

Prerequisites

Make sure you have Python 3.6 installed on your system. If you don't have it yet, follow these instructions.

Resolwe requires PostgreSQL (9.4+). Many Linux distributions already include the required version of PostgreSQL (e.g. Fedora 22+, Debian 8+, Ubuntu 15.04+), so you can simply install it via your distribution's package manager. Otherwise, follow these instructions.

Additionally, installing some (indirect) dependencies from PyPI will require having a C compiler (e.g. GCC) as well as Python development files installed on the system.

Note

The preferred way to install the C compiler and Python development files is to use your distribution's packages, if they exist. For example, on a Fedora/RHEL-based system, that would mean installing gcc and python3-devel packages.

Using PyPI

pip install resolwe-bio

To install a pre-release, use:

pip install --pre resolwe-bio

Using source on GitHub

pip install --pre https://github.com/genialis/resolwe-bio/archive/<git-tree-ish>.tar.gz

where <git-tree-ish> can represent any commit SHA, branch name, tag name, etc. in Resolwe Bioinformatics' GitHub repository. For example, to install the latest Resolwe Bioinformatics from the master branch, use:

pip install --pre https://github.com/genialis/resolwe-bio/archive/master.tar.gz

Contribute

We welcome new contributors. To learn more, read the Contributing section of the documentation.

resolwe-bio's People

Contributors

abukosek, acopar, alovse, anzelovse, dblenkus, gregorjerse, hadalin, jberci, jenkob, jkokosar, jurezmrzlikar, jvrakor, kostko, laraj1, lukaw3d, marcellevstek, markogresak, matyasfodor, miazganjar, miha-skalic, mstajdohar, mzganec, nkran, otonicarjan, plojyon, romunov, teja91, tjanez, tristanbrown

resolwe-bio's Issues

Sample API resource is inconsistent with the rest of the API

It would be more consistent with the rest of the API if the Sample resource were accessible via /api/sample, with annotated as a filter query argument. The queries would then be:

  • /api/sample?annotated=1 instead of /api/sample/annotated
  • /api/sample?annotated=0 instead of /api/sample/unannotated

This would make it more consistent for use in the GenJs frontend API.
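
As a rough illustration of the proposal, the sketch below treats annotated as an optional query argument on a single /api/sample resource. The helper name filter_samples and the shape of the sample records (dictionaries with a boolean "annotated" key) are assumptions made for this example; they are not taken from the Resolwe code base.

```python
from urllib.parse import parse_qs, urlsplit


def filter_samples(url, samples):
    """Filter hypothetical sample records by the 'annotated' query argument."""
    query = parse_qs(urlsplit(url).query)
    values = query.get("annotated")
    if values is None:
        # No filter given: return every sample.
        return list(samples)
    wanted = values[-1] == "1"
    return [sample for sample in samples if sample["annotated"] == wanted]


samples = [
    {"id": 1, "annotated": True},
    {"id": 2, "annotated": False},
]
print(filter_samples("/api/sample?annotated=1", samples))
print(filter_samples("/api/sample?annotated=0", samples))
```

With this shape, one endpoint serves both cases and an unfiltered request returns all samples.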

Remove obsolete Mongo escaping

There is some leftover escaping for MongoDB syntax from the time when Resolwe used MongoDB. It should be removed as it may cause problems.

Function in question, which is used in some places and should be removed:

def escape_mongokey(key):
    """Escape keys when serializing database entries."""
    return key.replace('$', u'\uff04').replace('.', u'\uff0e').replace(' ', '_')
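
One way to see how this escaping can cause problems is to run the function on a few keys. The snippet below copies the function quoted above and shows two effects: reserved characters are replaced with full-width look-alikes, and keys that differ only by a space versus an underscore become identical. The example keys are made up for illustration.

```python
def escape_mongokey(key):
    """Copy of the function quoted above."""
    return key.replace('$', u'\uff04').replace('.', u'\uff0e').replace(' ', '_')


# '$' and '.' are swapped for full-width Unicode characters,
# and the space becomes an underscore.
escaped = escape_mongokey('gene.id $x')
print(escaped == 'gene\uff0eid_\uff04x')  # True

# The transformation is lossy: two distinct keys collapse into one.
print(escape_mongokey('sample name') == escape_mongokey('sample_name'))  # True
```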

Feature query fails for many genes

Issue moved from genialis/resolwe-bio-py#78

To reproduce, run:

import resdk
res = resdk.Resolwe(url='https://qa.genialis.com')

res.feature.filter(source="NCBI", query=range(300)) # works
res.feature.filter(source="NCBI", query=range(400)) # fails

Elasticsearch traceback:

Traceback:

File "/srv/genialis/venv/lib/python2.7/site-packages/django/core/handlers/base.py" in get_response
  149.                     response = self.process_exception_by_middleware(e, request)

File "/srv/genialis/venv/lib/python2.7/site-packages/django/core/handlers/base.py" in get_response
  147.                     response = wrapped_callback(request, *callback_args, **callback_kwargs)

File "/srv/genialis/venv/lib/python2.7/site-packages/django/views/decorators/csrf.py" in wrapped_view
  58.         return view_func(*args, **kwargs)

File "/srv/genialis/venv/lib/python2.7/site-packages/rest_framework/viewsets.py" in view
  83.             return self.dispatch(request, *args, **kwargs)

File "/srv/genialis/venv/lib/python2.7/site-packages/rest_framework/views.py" in dispatch
  477.             response = self.handle_exception(exc)

File "/srv/genialis/venv/lib/python2.7/site-packages/rest_framework/views.py" in handle_exception
  437.             self.raise_uncaught_exception(exc)

File "/srv/genialis/venv/lib/python2.7/site-packages/rest_framework/views.py" in dispatch
  474.             response = handler(request, *args, **kwargs)

File "/srv/genialis/venv/lib/python2.7/site-packages/resolwe/elastic/viewsets.py" in list_with_post
  158.         return self.paginate_response(search)

File "/srv/genialis/venv/lib/python2.7/site-packages/resolwe/elastic/viewsets.py" in paginate_response
  120.         return Response(serializer.data)

File "/srv/genialis/venv/lib/python2.7/site-packages/rest_framework/serializers.py" in data
  725.         ret = super(ListSerializer, self).data

File "/srv/genialis/venv/lib/python2.7/site-packages/rest_framework/serializers.py" in data
  262.                 self._data = self.to_representation(self.instance)

File "/srv/genialis/venv/lib/python2.7/site-packages/rest_framework/serializers.py" in to_representation
  643.             self.child.to_representation(item) for item in iterable

File "/srv/genialis/venv/lib/python2.7/site-packages/elasticsearch_dsl/search.py" in __iter__
  233.         return iter(self.execute())

File "/srv/genialis/venv/lib/python2.7/site-packages/elasticsearch_dsl/search.py" in execute
  627.                     **self._params

File "/srv/genialis/venv/lib/python2.7/site-packages/elasticsearch/client/utils.py" in _wrapped
  69.             return func(*args, params=params, **kwargs)

File "/srv/genialis/venv/lib/python2.7/site-packages/elasticsearch/client/__init__.py" in search
  539.             doc_type, '_search'), params=params, body=body)

File "/srv/genialis/venv/lib/python2.7/site-packages/elasticsearch/transport.py" in perform_request
  327.                 status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)

File "/srv/genialis/venv/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py" in perform_request
  109.             self._raise_error(response.status, raw_data)

File "/srv/genialis/venv/lib/python2.7/site-packages/elasticsearch/connection/base.py" in _raise_error
  113.         raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)

Exception Type: TransportError at /api/kb/feature/search
Exception Value: TransportError(500, u'search_phase_execution_exception', u'maxClauseCount is set to 1024')
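
Until the server-side limit is addressed, a client could split a long query into smaller batches and merge the results. The sketch below is only a possible workaround, not the fix for this issue. The helper names chunked and filter_in_batches and the fetch callable are hypothetical; the default batch size of 300 is chosen only because that size is reported to work above.

```python
from itertools import islice


def chunked(items, size):
    """Yield consecutive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def filter_in_batches(fetch, query, batch_size=300):
    """Call `fetch` once per batch and concatenate the results.

    `fetch` is a hypothetical callable that takes a list of query
    terms and returns an iterable of matching features.
    """
    results = []
    for batch in chunked(query, batch_size):
        results.extend(fetch(batch))
    return results
```

Assuming the resdk call from the reproduction steps returns an iterable, it could be passed in as `lambda batch: res.feature.filter(source="NCBI", query=batch)`.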

Fix Pylint warning about overridden create methods in feature and mapping view sets

The newest version of Pylint (1.8.3) detects that the parameters of the overridden create() methods in FeatureViewSet and MappingViewSet differ from those of the method in DRF:

linters runtests: commands[1] | pylint resolwe_bio .scripts/check_large_files.py
Using config file /var/lib/jenkins/jobs/genialis-github/jobs/resolwe-bio/branches/PR-519/workspace/.pylintrc
************* Module resolwe_bio.kb.views
W:110, 4: Parameters differ from overridden 'create' method (arguments-differ)
W:184, 4: Parameters differ from overridden 'create' method (arguments-differ)

------------------------------------------------------------------
Your code has been rated at 9.99/10 (previous run: 9.99/10, -0.00)

Probably, the overridden methods should also handle *args and **kwargs?
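
A minimal sketch of that suggestion follows. To keep it runnable without Django installed, CreateModelMixin here is a stand-in class that only mirrors the create(self, request, *args, **kwargs) signature of DRF's mixin, and the body of the override is a placeholder rather than the actual view set code.

```python
import inspect


class CreateModelMixin:
    """Stand-in that mirrors the signature of DRF's create()."""

    def create(self, request, *args, **kwargs):
        return ("created", request, args, kwargs)


class FeatureViewSet(CreateModelMixin):
    """Placeholder view set; only the create() signature matters here."""

    def create(self, request, *args, **kwargs):
        # Custom handling would go here; extra arguments are passed through.
        return super().create(request, *args, **kwargs)


# The override now has the same parameters as the base method,
# which is what the arguments-differ check compares.
same = inspect.signature(FeatureViewSet.create) == inspect.signature(
    CreateModelMixin.create
)
print(same)  # True
```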

Make our custom HTML page template work with Sphinx 1.6.1+

Building documentation with the latest version of Sphinx (1.6.1) fails with:

$ python setup.py build_sphinx --fresh-env --warning-is-error
running build_sphinx
Running Sphinx v1.6.1
loading intersphinx inventory from https://docs.python.org/3/objects.inv...
loading intersphinx inventory from https://resolwe.readthedocs.io/en/latest/objects.inv...
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 10 source files that are out of date
updating environment: 10 added, 0 changed, 0 removed
reading sources... [100%] ref                                                                                                               
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
writing output... [ 10%] CHANGELOG                                                                                                          
Exception occurred:
  File "/home/tadej/.virtualenvs/resolwe-bio/lib/python3.5/site-packages/sphinx_rtd_theme/layout.html", line 45, in top-level template code
    {% for cssfile in css_files %}
TypeError: 'NoneType' object is not iterable
The full traceback has been saved in /tmp/sphinx-err-453cydcj.log, if you want to report the issue to the developers.
Please also report this if it was a user error, so that a better error message can be provided next time.
A bug report can be filed in the tracker at <https://github.com/sphinx-doc/sphinx/issues>. Thanks!

Full traceback in /tmp/sphinx-err-453cydcj.log is:

Traceback (most recent call last):
  File "/home/tadej/.virtualenvs/resolwe-bio/lib/python3.5/site-packages/sphinx/setup_command.py", line 192, in run
    app.build(force_all=self.all_files)
  File "/home/tadej/.virtualenvs/resolwe-bio/lib/python3.5/site-packages/sphinx/application.py", line 338, in build
    self.builder.build_update()
  File "/home/tadej/.virtualenvs/resolwe-bio/lib/python3.5/site-packages/sphinx/builders/__init__.py", line 328, in build_update
    'out of date' % len(to_build))
  File "/home/tadej/.virtualenvs/resolwe-bio/lib/python3.5/site-packages/sphinx/builders/__init__.py", line 394, in build
    self.write(docnames, list(updated_docnames), method)
  File "/home/tadej/.virtualenvs/resolwe-bio/lib/python3.5/site-packages/sphinx/builders/__init__.py", line 431, in write
    self._write_serial(sorted(docnames))
  File "/home/tadej/.virtualenvs/resolwe-bio/lib/python3.5/site-packages/sphinx/builders/__init__.py", line 440, in _write_serial
    self.write_doc(docname, doctree)
  File "/home/tadej/.virtualenvs/resolwe-bio/lib/python3.5/site-packages/sphinx/builders/html.py", line 556, in write_doc
    self.handle_page(docname, ctx, event_arg=doctree)
  File "/home/tadej/.virtualenvs/resolwe-bio/lib/python3.5/site-packages/sphinx/builders/html.py", line 940, in handle_page
    output = self.templates.render(templatename, ctx)
  File "/home/tadej/.virtualenvs/resolwe-bio/lib/python3.5/site-packages/sphinx/jinja2glue.py", line 176, in render
    return self.environment.get_template(template).render(context)
  File "/home/tadej/.virtualenvs/resolwe-bio/lib/python3.5/site-packages/jinja2/environment.py", line 1008, in render
    return self.environment.handle_exception(exc_info, True)
  File "/home/tadej/.virtualenvs/resolwe-bio/lib/python3.5/site-packages/jinja2/environment.py", line 780, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/home/tadej/.virtualenvs/resolwe-bio/lib/python3.5/site-packages/jinja2/_compat.py", line 37, in reraise
    raise value.with_traceback(tb)
  File "/home/tadej/Genialis/resolwe-bio/docs/_templates/page.html", line 3, in top-level template code
    {% set css_files = css_files + ["_static/css/custom.css"] %}
  File "/home/tadej/.virtualenvs/resolwe-bio/lib/python3.5/site-packages/sphinx/themes/basic/page.html", line 10, in top-level template code
    {%- extends "layout.html" %}
  File "/home/tadej/.virtualenvs/resolwe-bio/lib/python3.5/site-packages/sphinx_rtd_theme/layout.html", line 45, in top-level template code
    {% for cssfile in css_files %}
TypeError: 'NoneType' object is not iterable

It looks like something is wrong in our docs/_templates/page.html file on line:

{% set css_files = css_files + ["_static/css/custom.css"] %}

Write new GO enrichment process

New process slug: go-enrichment.

Inputs:

  • ontology (data:ontology:obo)
  • gene_ids (list:basic:string)
  • source (basic:string)
  • gaf (data:gaf, optional)
  • pvalue_threshold (basic:decimal, default=0.1)
  • genes_in_term_threshold (basic:integer, default=1)

Algorithm

You will have to use resdk to access the GAF data object and map genes.

If the gaf input is not given:

  • If the source input equals the source of any GAF data object on the platform (query the data resources): download the GAF file of the latest data:gaf object (use resdk), then run GOTEA.
  • If the source input has to be mapped: find out (by query) to which of the GAF sources the gene IDs map, map them, then run GOTEA.

If the gaf input is given:

  • If the gaf source equals the source input: run GOTEA.
  • If the gaf source does not equal the source input: try to map the gene_ids input to the gaf source.
    • If nothing maps, raise the error “Input genes did not map to GAF gene ids.”
    • If the mapping succeeds, run GOTEA.
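
The branching above can be sketched as a small function that decides which GAF source and which gene IDs to hand to GOTEA. Everything named here is hypothetical: resolve_gotea_inputs, the available_gaf_sources list, and the map_gene_ids callable stand in for the resdk queries that the real process would make. Raising the error when nothing maps in the "no gaf input" branch is also an assumption, since the description only specifies it for the other branch.

```python
NO_MAPPING_MESSAGE = "Input genes did not map to GAF gene ids."


def resolve_gotea_inputs(gene_ids, source, gaf_source, available_gaf_sources,
                         map_gene_ids):
    """Return (gaf_source, gene_ids) to run GOTEA with.

    `map_gene_ids(gene_ids, from_source, to_source)` is a hypothetical
    callable returning the mapped IDs (an empty list if none map).
    """
    if gaf_source is None:
        if source in available_gaf_sources:
            # A GAF object with the same source exists: no mapping needed.
            return source, list(gene_ids)
        # Otherwise look for a GAF source that the gene IDs map to.
        for candidate in available_gaf_sources:
            mapped = map_gene_ids(gene_ids, source, candidate)
            if mapped:
                return candidate, mapped
        raise ValueError(NO_MAPPING_MESSAGE)  # assumption, see lead-in

    if gaf_source == source:
        return gaf_source, list(gene_ids)

    mapped = map_gene_ids(gene_ids, source, gaf_source)
    if not mapped:
        raise ValueError(NO_MAPPING_MESSAGE)
    return gaf_source, mapped
```

Keeping the mapping behind a callable leaves the open question of how to implement it (for example, by queries) separate from the branching logic.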

How should we implement the mappings: with queries? I suggest Domen works on the mapping part of the process.

Improve and enable test_amplicon_report test

Currently, the resolwe_bio.tests.processes.test_generate_report.ReportProcessorTestCase.test_amplicon_report test is disabled because it requires a custom Docker image.

I think we need to improve the following:

  1. The test is quite slow. It takes ~35s on my machine. I profiled the test line-by-line and obtained the following result:

    Timer unit: 1e-06 s
    
    Total time: 35.5463 s
    File: /home/tadej/Genialis/resolwe-bio/resolwe_bio/tests/processes/test_generate_report.py
    Function: test_amplicon_report at line 8
    
    Line #      Hits         Time  Per Hit   % Time  Line Contents
    ==============================================================
         8                                               @skipDockerFailure("Processor requires a custom Docker image.")
         9                                               @profile
        10                                               def test_amplicon_report(self):
        11         1       933503 933503.0      2.6          template = self.run_process('upload-file', {'src': 'report_template.tex'})
        12         1       898266 898266.0      2.5          logo = self.run_process('upload-file', {'src': 'genialis_logo.pdf'})
        13                                           
        14         1       936879 936879.0      2.6          bam = self.run_process('upload-bam', {'src': '56GSID_10k_trimmed.bam'})
        15         1       924495 924495.0      2.6          bed = self.run_process('upload-bed', {'src': '56g_targets_small.bed'})
        16                                           
        17         1      2076697 2076697.0      5.8          coverage = self.run_process('coveragebed', {'alignment': bam.id, 'bed': bed.id})
        18                                           
        19         1      2835931 2835931.0      8.0          genome = self.run_process('upload-genome', {'src': 'hs_b37_chr22_frag.fasta.gz'})
        20         1       890908 890908.0      2.5          bed_picard = self.run_process('upload-bed', {'src': '56g_targets_picard_small.bed'})
        21                                           
        22         1            4      4.0      0.0          inputs = {'src': 'Mills_and_1000G_gold_standard.indels.b37.chr22_small.vcf.gz'}
        23         1       905177 905177.0      2.5          indels = self.run_process('upload-variants-vcf', inputs)
        24                                           
        25         1       869405 869405.0      2.4          dbsnp = self.run_process('upload-variants-vcf', {'src': 'dbsnp_138.b37.chr22_small.vcf.gz'})
        26                                           
        27                                                   inputs = {
        28         1            5      5.0      0.0              'alignment': bam.id,
        29         1            1      1.0      0.0              'bed': bed_picard.id,
        30         1            1      1.0      0.0              'genome': genome.id,
        31         1            1      1.0      0.0              'known_indels': [indels.id],
        32         1            3      3.0      0.0              'known_vars': [dbsnp.id]
        33                                                   }
        34                                           
        35         1     21350158 21350158.0     60.1          preprocess_bam = self.run_process('vc-preprocess-bam', inputs)
        36                                           
        37                                                   report_inputs = {
        38         1            2      2.0      0.0              'bam': preprocess_bam.id,
        39         1            0      0.0      0.0              'coverage': coverage.id,
        40         1            2      2.0      0.0              'template': template.id,
        41         1            1      1.0      0.0              'logo': logo.id
        42                                                   }
        43                                           
        44         1      2924870 2924870.0      8.2          self.run_process('amplicon-report', report_inputs)
    

    This revealed the following issues:

    • 60 % of the time is spent running the vc-preprocess-bam process which is already covered by the resolwe_bio.tests.processes.test_variant_calling.VariantCallingTestCase.test_vc_preprocess_bam test.
    • Only 8 % of the time is actually spent calling the amplicon-report process which means the test could be a lot faster.
  2. The test requires a custom Docker image only due to one of its prerequisites requiring a custom Docker image. The amplicon-report process doesn't actually require a custom Docker image.

I suggest you rewrite the test to only run the amplicon-report process and provide it with the 4 pre-computed inputs. This will speed up the test significantly and remove the requirement for a custom Docker image.

Fix resolwebio/utils Docker image

Currently, building this Docker image on Docker Hub fails with:

Downloading 'ncbi/sra-toolkit' version '2.8.2-1'...

Fetching 'https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/current/sratoolkit.current-ubuntu64.tar.gz'...

Verifying package...

ERROR: SHA256 digest mismatch.

Here is an example build log.
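
For context, a check of this kind typically hashes the downloaded file and compares the result with a pinned digest. The sketch below shows that general technique only; it is not the image's actual build script, and the function names are made up. Note that the log shows version '2.8.2-1' being fetched from a URL containing "current", so one possible (unconfirmed) cause is that the file behind that URL changed while the pinned digest did not.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_package(path, expected_hex):
    """Raise ValueError if the file does not match the pinned digest."""
    if sha256_of(path) != expected_hex.lower():
        raise ValueError("SHA256 digest mismatch.")
```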
