genialis / resolwe-bio
Bioinformatics pipelines for Resolwe
License: Apache License 2.0
Remove "Differential expression" from the data name.
New process slug: go-enrichment.
Inputs:
Algorithm
You will have to use resdk to access the GAF data object and map genes.
If the gaf input is not given:
  If the source input equals the source of any GAF data object on the platform (query the data resources), download the GAF file of the latest data:gaf object (use resdk), then RUN GOTEA.
  If the source input has to be mapped, find out which of the GAF sources the gene IDs map to (by query). Map them, then RUN GOTEA.
If the gaf input is given:
  If the gaf source equals the source input, RUN GOTEA.
  If the gaf source does not equal the source input, try to map the gene_ids input to the gaf source.
    If nothing maps, raise the error “Input genes did not map to GAF gene ids.”
    If the mapping is successful, RUN GOTEA.
How should we implement the mappings, do queries? I suggest Domen works on the mapping part of the process.
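The branching described above can be sketched as a small planning function. This is only an illustration: `plan_enrichment`, `platform_gaf_sources`, and `can_map` are hypothetical names, the actual resdk queries are left out, and raising the same error when no GAF input is given and nothing maps is an assumption, since the description only specifies the error for the case where a GAF input is given.

```python
from typing import Callable, Iterable, List, Optional

NO_MAPPING_ERROR = "Input genes did not map to GAF gene ids."


def plan_enrichment(
    source: str,
    gaf_source: Optional[str],
    platform_gaf_sources: Iterable[str],
    can_map: Callable[[str, str], bool],
) -> List[str]:
    """Return the ordered steps for one go-enrichment run (hypothetical helper).

    source: source of the input gene IDs.
    gaf_source: source of the given GAF object, or None if no GAF input is given.
    platform_gaf_sources: sources of the GAF objects found on the platform.
    can_map: callback telling whether gene IDs map from one source to another.
    """
    if gaf_source is None:
        sources = list(platform_gaf_sources)
        if source in sources:
            return ["download latest GAF for " + source, "run GOTEA"]
        for target in sources:
            if can_map(source, target):
                return [
                    "download latest GAF for " + target,
                    "map genes from {} to {}".format(source, target),
                    "run GOTEA",
                ]
        # Assumption: reuse the error specified for the GAF-given branch.
        raise ValueError(NO_MAPPING_ERROR)

    if gaf_source == source:
        return ["run GOTEA"]
    if can_map(source, gaf_source):
        return [
            "map genes from {} to {}".format(source, gaf_source),
            "run GOTEA",
        ]
    raise ValueError(NO_MAPPING_ERROR)
```

Keeping the decision logic separate from the queries would make it possible to test each branch without a running platform.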
It would be more consistent with the rest of the API if the Sample resource were accessible via /api/sample and annotated were a filter query argument. You would then query:
/api/sample?annotated=1 instead of /api/sample/annotated
/api/sample?annotated=0 instead of /api/sample/unannotated
This would make it more consistent for use in the GenJs frontend API.
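As a rough illustration of the proposed scheme, a client could build both URLs from one endpoint and an optional filter. The helper name `sample_url` is hypothetical and the sketch only constructs the URL strings; it does not call the API.

```python
from urllib.parse import urlencode


def sample_url(annotated=None, base="/api/sample"):
    """Build a Sample list URL, optionally filtered by annotation state."""
    if annotated is None:
        # No filter: list all samples.
        return base
    # Encode the boolean as 1 or 0, matching the proposed query argument.
    return base + "?" + urlencode({"annotated": int(bool(annotated))})


print(sample_url(annotated=True))
print(sample_url(annotated=False))
```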
This problem affects the resolwebio/base Docker image, from which our other images are derived. Consequently, many processes cannot be updated. This is a known problem: https://support.bioconductor.org/p/101833/. Even the official bioconductor/release-base Docker image uses an older version of R.
Processors:
create-geneset
upload-geneset
create-geneset-venn
Also add a warning if there are duplicated genes.
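A minimal sketch of the duplicate check, assuming the genes arrive as a plain list of identifiers. The function name and the wording of the message are hypothetical; how the processes would actually emit the warning is not specified here.

```python
from collections import Counter


def duplicated_genes_warning(genes):
    """Return a warning message if any gene occurs more than once, else None."""
    counts = Counter(genes)
    duplicated = sorted(gene for gene, count in counts.items() if count > 1)
    if not duplicated:
        return None
    return "Duplicated genes: " + ", ".join(duplicated)


message = duplicated_genes_warning(["GeneA", "GeneB", "GeneA", "GeneC"])
if message is not None:
    print(message)
```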
Currently, EnrichmentProcessorTestCase needs to access the gene knowledge base on an external API server, which makes the test prone to failures and inconsistencies.
Replace this by using Resolwe's custom LiveServerTestCase.
Prerequisites:
LiveServerTestCase.
In cuffnorm, one of the outputs is a boxplot. It should show sample names instead of replicate numbers.
The example is on BCM server: https://bcm.genialis.com/rna-seq/bioinformatics/collection/p4936_ms2_rna/data/slug/cuffnorm-p4936_rna_ms2_2hr_r2-p4936_rna_ms2_8hr_r3-p4936_rna_ms2_8hr_r2-p4936_rna_ms2_8hr_r1-2?_b=15c8ff39-d90b-487f-a40c-12912ba5c838&_s=36d81108-56f8-4d3d-8f88-995631216903
Currently, building this Docker image on Docker Hub fails with:
Downloading 'ncbi/sra-toolkit' version '2.8.2-1'...
Fetching 'https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/current/sratoolkit.current-ubuntu64.tar.gz'...
Verifying package...
ERROR: SHA256 digest mismatch.
Here is an example build log.
Currently, the resolwe_bio.tests.processes.test_generate_report.ReportProcessorTestCase.test_amplicon_report test is disabled because it requires a custom Docker image.
I think we need to improve the following:
The test is quite slow. It takes ~35s on my machine. I profiled the test line-by-line and obtained the following result:
Timer unit: 1e-06 s
Total time: 35.5463 s
File: /home/tadej/Genialis/resolwe-bio/resolwe_bio/tests/processes/test_generate_report.py
Function: test_amplicon_report at line 8
Line #  Hits       Time      Per Hit  % Time  Line Contents
==============================================================
     8                                        @skipDockerFailure("Processor requires a custom Docker image.")
     9                                        @profile
    10                                        def test_amplicon_report(self):
    11     1     933503     933503.0     2.6      template = self.run_process('upload-file', {'src': 'report_template.tex'})
    12     1     898266     898266.0     2.5      logo = self.run_process('upload-file', {'src': 'genialis_logo.pdf'})
    13
    14     1     936879     936879.0     2.6      bam = self.run_process('upload-bam', {'src': '56GSID_10k_trimmed.bam'})
    15     1     924495     924495.0     2.6      bed = self.run_process('upload-bed', {'src': '56g_targets_small.bed'})
    16
    17     1    2076697    2076697.0     5.8      coverage = self.run_process('coveragebed', {'alignment': bam.id, 'bed': bed.id})
    18
    19     1    2835931    2835931.0     8.0      genome = self.run_process('upload-genome', {'src': 'hs_b37_chr22_frag.fasta.gz'})
    20     1     890908     890908.0     2.5      bed_picard = self.run_process('upload-bed', {'src': '56g_targets_picard_small.bed'})
    21
    22     1          4          4.0     0.0      inputs = {'src': 'Mills_and_1000G_gold_standard.indels.b37.chr22_small.vcf.gz'}
    23     1     905177     905177.0     2.5      indels = self.run_process('upload-variants-vcf', inputs)
    24
    25     1     869405     869405.0     2.4      dbsnp = self.run_process('upload-variants-vcf', {'src': 'dbsnp_138.b37.chr22_small.vcf.gz'})
    26
    27                                            inputs = {
    28     1          5          5.0     0.0          'alignment': bam.id,
    29     1          1          1.0     0.0          'bed': bed_picard.id,
    30     1          1          1.0     0.0          'genome': genome.id,
    31     1          1          1.0     0.0          'known_indels': [indels.id],
    32     1          3          3.0     0.0          'known_vars': [dbsnp.id]
    33                                            }
    34
    35     1   21350158   21350158.0    60.1      preprocess_bam = self.run_process('vc-preprocess-bam', inputs)
    36
    37                                            report_inputs = {
    38     1          2          2.0     0.0          'bam': preprocess_bam.id,
    39     1          0          0.0     0.0          'coverage': coverage.id,
    40     1          2          2.0     0.0          'template': template.id,
    41     1          1          1.0     0.0          'logo': logo.id
    42                                            }
    43
    44     1    2924870    2924870.0     8.2      self.run_process('amplicon-report', report_inputs)
This revealed the following issues:
About 60% of the time is spent in the vc-preprocess-bam process, which is already covered by the resolwe_bio.tests.processes.test_variant_calling.VariantCallingTestCase.test_vc_preprocess_bam test.
Only about 8% of the time is spent in the amplicon-report process, which means the test could be a lot faster.
The test requires a custom Docker image only because one of its prerequisites requires one. The amplicon-report process doesn't actually require a custom Docker image.
I suggest rewriting the test to run only the amplicon-report process and providing it with the 4 pre-computed inputs. This will speed up the test significantly and remove the requirement for a custom Docker image.
resolwe-bio/resolwe_bio/processes/alignment/subread.yml
Lines 167 to 168 in f7a80b5
The second line should be
-p {{ PE_options.consensus_subreads }}
The newest version of Pylint (1.8.3) detects that the parameters of the overridden create() methods in FeatureViewSet and MappingViewSet differ from those of the method in DRF:
linters runtests: commands[1] | pylint resolwe_bio .scripts/check_large_files.py
Using config file /var/lib/jenkins/jobs/genialis-github/jobs/resolwe-bio/branches/PR-519/workspace/.pylintrc
************* Module resolwe_bio.kb.views
W:110, 4: Parameters differ from overridden 'create' method (arguments-differ)
W:184, 4: Parameters differ from overridden 'create' method (arguments-differ)
------------------------------------------------------------------
Your code has been rated at 9.99/10 (previous run: 9.99/10, -0.00)
Probably, the overridden methods should also accept *args and **kwargs?
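A minimal sketch of what matching the signature could look like. The base class below is only a stand-in that mirrors the `create(self, request, *args, **kwargs)` signature of DRF's `CreateModelMixin`; it is not the real DRF class, and the body of the override is a placeholder for the viewset's actual logic.

```python
class CreateModelMixin:
    """Stand-in that mirrors the signature of DRF's CreateModelMixin.create()."""

    def create(self, request, *args, **kwargs):
        return {"created": request}


class FeatureViewSet(CreateModelMixin):
    """Illustrative override; the real viewset logic is not reproduced here."""

    def create(self, request, *args, **kwargs):
        # Same parameters as the parent, so Pylint's arguments-differ
        # check has nothing to report.
        response = super().create(request, *args, **kwargs)
        response["viewset"] = "feature"  # placeholder for custom behaviour
        return response
```

Forwarding `*args` and `**kwargs` to `super().create()` also keeps the override working if the framework passes extra arguments from the URL configuration.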
Issue moved from genialis/resolwe-bio-py#78
To reproduce, run:
import resdk
res = resdk.Resolwe(url='https://qa.genialis.com')
res.feature.filter(source="NCBI", query=range(300)) # works
res.feature.filter(source="NCBI", query=range(400)) # fails
Elasticsearch traceback:
Traceback:
File "/srv/genialis/venv/lib/python2.7/site-packages/django/core/handlers/base.py" in get_response
149. response = self.process_exception_by_middleware(e, request)
File "/srv/genialis/venv/lib/python2.7/site-packages/django/core/handlers/base.py" in get_response
147. response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "/srv/genialis/venv/lib/python2.7/site-packages/django/views/decorators/csrf.py" in wrapped_view
58. return view_func(*args, **kwargs)
File "/srv/genialis/venv/lib/python2.7/site-packages/rest_framework/viewsets.py" in view
83. return self.dispatch(request, *args, **kwargs)
File "/srv/genialis/venv/lib/python2.7/site-packages/rest_framework/views.py" in dispatch
477. response = self.handle_exception(exc)
File "/srv/genialis/venv/lib/python2.7/site-packages/rest_framework/views.py" in handle_exception
437. self.raise_uncaught_exception(exc)
File "/srv/genialis/venv/lib/python2.7/site-packages/rest_framework/views.py" in dispatch
474. response = handler(request, *args, **kwargs)
File "/srv/genialis/venv/lib/python2.7/site-packages/resolwe/elastic/viewsets.py" in list_with_post
158. return self.paginate_response(search)
File "/srv/genialis/venv/lib/python2.7/site-packages/resolwe/elastic/viewsets.py" in paginate_response
120. return Response(serializer.data)
File "/srv/genialis/venv/lib/python2.7/site-packages/rest_framework/serializers.py" in data
725. ret = super(ListSerializer, self).data
File "/srv/genialis/venv/lib/python2.7/site-packages/rest_framework/serializers.py" in data
262. self._data = self.to_representation(self.instance)
File "/srv/genialis/venv/lib/python2.7/site-packages/rest_framework/serializers.py" in to_representation
643. self.child.to_representation(item) for item in iterable
File "/srv/genialis/venv/lib/python2.7/site-packages/elasticsearch_dsl/search.py" in __iter__
233. return iter(self.execute())
File "/srv/genialis/venv/lib/python2.7/site-packages/elasticsearch_dsl/search.py" in execute
627. **self._params
File "/srv/genialis/venv/lib/python2.7/site-packages/elasticsearch/client/utils.py" in _wrapped
69. return func(*args, params=params, **kwargs)
File "/srv/genialis/venv/lib/python2.7/site-packages/elasticsearch/client/__init__.py" in search
539. doc_type, '_search'), params=params, body=body)
File "/srv/genialis/venv/lib/python2.7/site-packages/elasticsearch/transport.py" in perform_request
327. status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
File "/srv/genialis/venv/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py" in perform_request
109. self._raise_error(response.status, raw_data)
File "/srv/genialis/venv/lib/python2.7/site-packages/elasticsearch/connection/base.py" in _raise_error
113. raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
Exception Type: TransportError at /api/kb/feature/search
Exception Value: TransportError(500, u'search_phase_execution_exception', u'maxClauseCount is set to 1024')
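One possible client-side workaround, which does not fix the server-side limit, is to split a long query list into smaller batches and merge the results. The sketch below is an assumption-laden illustration: `chunked` and `filter_in_batches` are hypothetical helpers, the batch size of 200 is a guess (the report only shows that 300 values work and 400 fail), and `search` stands for any callable that takes one batch and returns a list of hits.

```python
from itertools import islice


def chunked(values, size):
    """Yield consecutive lists of at most `size` items from `values`."""
    iterator = iter(values)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def filter_in_batches(search, ids, batch_size=200):
    """Call `search` once per batch and concatenate the results."""
    results = []
    for batch in chunked(ids, batch_size):
        results.extend(search(batch))
    return results
```

With resdk, `search` could be something like `lambda batch: res.feature.filter(source="NCBI", query=batch)`, so that each request stays below the clause limit.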
Building documentation with the latest version of Sphinx (1.6.1) fails with:
$ python setup.py build_sphinx --fresh-env --warning-is-error
running build_sphinx
Running Sphinx v1.6.1
loading intersphinx inventory from https://docs.python.org/3/objects.inv...
loading intersphinx inventory from https://resolwe.readthedocs.io/en/latest/objects.inv...
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 10 source files that are out of date
updating environment: 10 added, 0 changed, 0 removed
reading sources... [100%] ref
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
writing output... [ 10%] CHANGELOG
Exception occurred:
File "/home/tadej/.virtualenvs/resolwe-bio/lib/python3.5/site-packages/sphinx_rtd_theme/layout.html", line 45, in top-level template code
{% for cssfile in css_files %}
TypeError: 'NoneType' object is not iterable
The full traceback has been saved in /tmp/sphinx-err-453cydcj.log, if you want to report the issue to the developers.
Please also report this if it was a user error, so that a better error message can be provided next time.
A bug report can be filed in the tracker at <https://github.com/sphinx-doc/sphinx/issues>. Thanks!
The full traceback in /tmp/sphinx-err-453cydcj.log is:
Traceback (most recent call last):
File "/home/tadej/.virtualenvs/resolwe-bio/lib/python3.5/site-packages/sphinx/setup_command.py", line 192, in run
app.build(force_all=self.all_files)
File "/home/tadej/.virtualenvs/resolwe-bio/lib/python3.5/site-packages/sphinx/application.py", line 338, in build
self.builder.build_update()
File "/home/tadej/.virtualenvs/resolwe-bio/lib/python3.5/site-packages/sphinx/builders/__init__.py", line 328, in build_update
'out of date' % len(to_build))
File "/home/tadej/.virtualenvs/resolwe-bio/lib/python3.5/site-packages/sphinx/builders/__init__.py", line 394, in build
self.write(docnames, list(updated_docnames), method)
File "/home/tadej/.virtualenvs/resolwe-bio/lib/python3.5/site-packages/sphinx/builders/__init__.py", line 431, in write
self._write_serial(sorted(docnames))
File "/home/tadej/.virtualenvs/resolwe-bio/lib/python3.5/site-packages/sphinx/builders/__init__.py", line 440, in _write_serial
self.write_doc(docname, doctree)
File "/home/tadej/.virtualenvs/resolwe-bio/lib/python3.5/site-packages/sphinx/builders/html.py", line 556, in write_doc
self.handle_page(docname, ctx, event_arg=doctree)
File "/home/tadej/.virtualenvs/resolwe-bio/lib/python3.5/site-packages/sphinx/builders/html.py", line 940, in handle_page
output = self.templates.render(templatename, ctx)
File "/home/tadej/.virtualenvs/resolwe-bio/lib/python3.5/site-packages/sphinx/jinja2glue.py", line 176, in render
return self.environment.get_template(template).render(context)
File "/home/tadej/.virtualenvs/resolwe-bio/lib/python3.5/site-packages/jinja2/environment.py", line 1008, in render
return self.environment.handle_exception(exc_info, True)
File "/home/tadej/.virtualenvs/resolwe-bio/lib/python3.5/site-packages/jinja2/environment.py", line 780, in handle_exception
reraise(exc_type, exc_value, tb)
File "/home/tadej/.virtualenvs/resolwe-bio/lib/python3.5/site-packages/jinja2/_compat.py", line 37, in reraise
raise value.with_traceback(tb)
File "/home/tadej/Genialis/resolwe-bio/docs/_templates/page.html", line 3, in top-level template code
{% set css_files = css_files + ["_static/css/custom.css"] %}
File "/home/tadej/.virtualenvs/resolwe-bio/lib/python3.5/site-packages/sphinx/themes/basic/page.html", line 10, in top-level template code
{%- extends "layout.html" %}
File "/home/tadej/.virtualenvs/resolwe-bio/lib/python3.5/site-packages/sphinx_rtd_theme/layout.html", line 45, in top-level template code
{% for cssfile in css_files %}
TypeError: 'NoneType' object is not iterable
It looks like something is wrong in our docs/_templates/page.html file on this line:
{% set css_files = css_files + ["_static/css/custom.css"] %}
There is some left-over escaping for MongoDB syntax from the times that Resolwe used MongoDB. It should be removed as it may cause problems.
Function in question, which is used in some places and should be removed:
resolwe-bio/resolwe_bio/tools/utils.py
Lines 11 to 13 in 9518106
It is not recommended to use try/except in scenarios where failure is expected (handling exceptions takes much more time than other checks). It would be much faster to use a get_or_create query and check the second element of the returned tuple, which tells whether the object was created.
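The pattern can be illustrated without a database. The class below is a simplified in-memory stand-in, not Django's manager: Django's real `get_or_create` takes lookup keyword arguments plus `defaults`, while this sketch uses a single `key` argument. It only demonstrates branching on the `(object, created)` tuple instead of catching an exception.

```python
class InMemoryManager:
    """Simplified stand-in that mimics the (object, created) return value."""

    def __init__(self):
        self._objects = {}

    def get_or_create(self, key, defaults=None):
        if key in self._objects:
            # Existing object: nothing is created.
            return self._objects[key], False
        obj = dict(defaults or {}, key=key)
        self._objects[key] = obj
        return obj, True


manager = InMemoryManager()
feature, created = manager.get_or_create("gene-1", defaults={"name": "Gene 1"})
if not created:
    # Expected case handled by a plain check, no exception involved.
    print("Feature already exists:", feature["key"])
```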