ebivariation / eva-pipeline

Genomic variation pipeline for the European Variation Archive, implemented using Spring Batch

Home Page: http://www.ebi.ac.uk/eva

License: Apache License 2.0

Java 89.61% Perl 10.39%

eva-pipeline's Issues

BUILD FAILURE using v2.0 release code

Hi. I followed the instructions using the v2.0 source code.
When I ran the tests with mvn test package, I got the following results.

Results :

Failed tests:
  CreateDatabaseIndexesStepTest.testIndexesAreCreated:75 expected:<[{ "v" : [1 , "key" : { "_id" : 1} , "name" : "_id_" , "ns" : "ed18730b-debc-479c-b625-1b9893dcfa0c.features"}, { "v" : 1] , "key" : { "name" ...> but was:<[{ "v" : [2 , "key" : { "_id" : 1} , "name" : "_id_" , "ns" : "ed18730b-debc-479c-b625-1b9893dcfa0c.features"}, { "v" : 2] , "key" : { "name" ...>
  StatisticsMongoWriterTest.shouldCreateIndexesInCollection:138 expected:<[{ "v" : 1 , "key" : { "_id" : 1} , "name" : "_id_" , "ns" : "8d03943c-db27-4dea-b3e3-421bf090acbe.populationStatistics"}, { "v" : 1 , "unique" : true , "key" : { "chr" : 1 , "start" : 1 , "ref" : 1 , "alt" : 1 , "sid" : 1 , "cid" : 1} , "name" : "vscid" , "ns" : "8d03943c-db27-4dea-b3e3-421bf090acbe.populationStatistics"}]> but was:<[{ "v" : 2 , "key" : { "_id" : 1} , "name" : "_id_" , "ns" : "8d03943c-db27-4dea-b3e3-421bf090acbe.populationStatistics"}, { "v" : 2 , "unique" : true , "key" : { "chr" : 1 , "start" : 1 , "ref" : 1 , "alt" : 1 , "sid" : 1 , "cid" : 1} , "name" : "vscid" , "ns" : "8d03943c-db27-4dea-b3e3-421bf090acbe.populationStatistics"}]>

Tests run: 475, Failures: 2, Errors: 0, Skipped: 1

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  01:14 min
[INFO] Finished at: 2020-02-17T17:09:13+09:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test (default-test) on project eva-pipeline: There are test failures.
[ERROR]
[ERROR] Please refer to /home/kimoton/eva-pipeline/2.0/eva-pipeline-2.0/target/surefire-reports for the individual test results.
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException

Please give me some advice.

tools information

$ java -version
openjdk version "1.8.0_222"
OpenJDK Runtime Environment (build 1.8.0_222-8u222-b10-1ubuntu1~18.04.1-b10)
OpenJDK 64-Bit Server VM (build 25.222-b10, mixed mode)

$ mvn --version
Apache Maven 3.6.0
Maven home: /usr/share/maven
Java version: 1.8.0_222, vendor: Private Build, runtime: /usr/lib/jvm/java-8-openjdk-amd64/jre
Default locale: en, platform encoding: UTF-8
OS name: "linux", version: "4.4.0-18362-microsoft", arch: "amd64", family: "unix"

$ mongo --version
MongoDB shell version v3.6.17
git version: 3d6953c361213c5bfab23e51ab274ce592edafe6
OpenSSL version: OpenSSL 1.0.2n  7 Dec 2017
allocator: tcmalloc
modules: none
build environment:
    distmod: ubuntu1604
    distarch: x86_64
    target_arch: x86_64
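
Both failing assertions above differ from the expected value only in the index version field "v" (1 expected, 2 returned). MongoDB 3.4 and later create indexes with version 2 by default, which would be consistent with the 3.6.17 shell reported here, although the server version itself is not shown. One possible way to make such a test independent of the server's index version is to drop "v" before comparing. The sketch below is not the project's test code: the class and helper names are hypothetical, and plain maps stand in for the index documents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class IndexComparison {

    /** Returns copies of the index descriptions without the server-assigned "v" field. */
    static List<Map<String, Object>> withoutVersion(List<Map<String, Object>> indexes) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> index : indexes) {
            Map<String, Object> copy = new LinkedHashMap<>(index);
            copy.remove("v");
            result.add(copy);
        }
        return result;
    }

    /** Hypothetical helper that builds a minimal index description for the demo. */
    static Map<String, Object> index(int version, String name) {
        Map<String, Object> index = new LinkedHashMap<>();
        index.put("v", version);
        index.put("name", name);
        return index;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> expected = Arrays.asList(index(1, "_id_"), index(1, "vscid"));
        List<Map<String, Object>> actual = Arrays.asList(index(2, "_id_"), index(2, "vscid"));

        // A direct comparison fails because of "v"; the version-agnostic one succeeds.
        System.out.println(expected.equals(actual));
        System.out.println(withoutVersion(expected).equals(withoutVersion(actual)));
    }
}
```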

Integrate ETL pipeline with auto-mapped variation-commons models

variation-commons and eva-pipeline have different definitions of models and converters between Java classes and MongoDB documents. variation-commons has been recently reworked to facilitate reading from MongoDB.

Make eva-pipeline use the new classes from variation-commons.

Chunk size parameter should be optional

The application fails if the config.chunk.size parameter is not provided. A better approach would be to use a default value like 500, which allows the pipeline to progress without too much contention but at the same time ensures the memory usage is not too high.
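
A minimal sketch of the proposed fallback, assuming the parameter is read from a java.util.Properties object. The class name, the method name and the rejection of non-positive values are illustrative assumptions; only the config.chunk.size key and the default of 500 come from the description above.

```java
import java.util.Properties;

public class ChunkSizeConfig {

    static final String CHUNK_SIZE_KEY = "config.chunk.size";
    static final int DEFAULT_CHUNK_SIZE = 500;

    /** Returns the configured chunk size, or the default when the parameter is absent or blank. */
    static int chunkSize(Properties properties) {
        String value = properties.getProperty(CHUNK_SIZE_KEY);
        if (value == null || value.trim().isEmpty()) {
            return DEFAULT_CHUNK_SIZE;
        }
        int size = Integer.parseInt(value.trim());
        if (size <= 0) {
            // Assumption: a chunk must hold at least one item.
            throw new IllegalArgumentException(CHUNK_SIZE_KEY + " must be positive, got " + size);
        }
        return size;
    }

    public static void main(String[] args) {
        Properties withoutChunkSize = new Properties();
        System.out.println(chunkSize(withoutChunkSize));

        Properties withChunkSize = new Properties();
        withChunkSize.setProperty(CHUNK_SIZE_KEY, "1000");
        System.out.println(chunkSize(withChunkSize));
    }
}
```

In a Spring application the same effect is commonly obtained with a placeholder default such as ${config.chunk.size:500}; whether that fits the way this pipeline binds its parameters is not stated in the issue.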

Abort when duplicate sample names are found

Running a VCF with duplicate sample names through the pipeline causes issues when the samples are accessed through the API. The reason is that we use the sample name, not the column index it occupies in the file, as the accessor, so the API cannot differentiate between two occurrences of the same sample name.

In addition to this, the VCF specification doesn't allow samples with duplicate names to be listed in a file.

An example of a valid VCF header line would be:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE1 SAMPLE2 SAMPLE3

And an invalid VCF header line would be:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE1 SAMPLE2 SAMPLE1

The pipeline must abort when an input like the latter is provided, by modifying the class VcfHeaderReader.
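
The check could look roughly like the sketch below. It is not the actual VcfHeaderReader code: the class and method names are hypothetical, the exception type is an assumption, and the header is split on any whitespace so that the example lines above can be used directly (real VCF headers are tab-delimited).

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

public class DuplicateSampleCheck {

    // Columns 0-8 are #CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO and FORMAT.
    private static final int FIRST_SAMPLE_COLUMN = 9;

    /** Returns the sample names that occur more than once, in order of first repetition. */
    static Set<String> duplicatedSamples(String headerLine) {
        String[] columns = headerLine.trim().split("\\s+");
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (int i = FIRST_SAMPLE_COLUMN; i < columns.length; i++) {
            if (!seen.add(columns[i])) {
                duplicates.add(columns[i]);
            }
        }
        return duplicates;
    }

    /** Aborts by throwing when the header lists the same sample name twice. */
    static void requireUniqueSamples(String headerLine) {
        Set<String> duplicates = duplicatedSamples(headerLine);
        if (!duplicates.isEmpty()) {
            throw new IllegalArgumentException("Duplicate sample names in VCF header: " + duplicates);
        }
    }

    public static void main(String[] args) {
        requireUniqueSamples("#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE1 SAMPLE2 SAMPLE3");
        try {
            requireUniqueSamples("#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE1 SAMPLE2 SAMPLE1");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Reporting the offending names in the exception message lets the submitter fix the file without having to inspect the header by hand.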

Ignore VCF records that report non-variant sites

Since EVA is a variation archive, we should not load into the database any record from a VCF file for which at least one of the following conditions holds:

All the sample genotypes are non-variant (./., 0/0, 0|0)
The value of the INFO field AF is zero
The value of the INFO field AN is zero
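
The three conditions can be sketched as a single predicate. This is only an illustration under stated assumptions: the class and method names are hypothetical, the INFO field is assumed to be already parsed into a map, and a genotype is treated as non-variant when every allele is 0 or missing, which covers ./., 0/0 and 0|0 but also generalizes to other ploidies. Values of AF or AN that are not a single number (for example a comma-separated list) are not treated as zero here.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class NonVariantFilter {

    /** True when every allele of the genotype is the reference ("0") or missing ("."). */
    static boolean isNonVariantGenotype(String genotype) {
        for (String allele : genotype.split("[/|]")) {
            if (!allele.equals("0") && !allele.equals(".")) {
                return false;
            }
        }
        return true;
    }

    /** True when the value is present and parses as a single number equal to zero. */
    static boolean isZero(String value) {
        if (value == null) {
            return false;
        }
        try {
            return Double.parseDouble(value) == 0.0;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    /** True when the record reports a non-variant site and should not be loaded. */
    static boolean shouldSkip(Map<String, String> info, List<String> genotypes) {
        boolean allNonVariant = !genotypes.isEmpty()
                && genotypes.stream().allMatch(NonVariantFilter::isNonVariantGenotype);
        return allNonVariant || isZero(info.get("AF")) || isZero(info.get("AN"));
    }

    public static void main(String[] args) {
        Map<String, String> noInfo = Collections.emptyMap();
        System.out.println(shouldSkip(noInfo, Arrays.asList("0/0", "./.", "0|0")));
        System.out.println(shouldSkip(noInfo, Arrays.asList("0/0", "0/1")));
        System.out.println(shouldSkip(Collections.singletonMap("AF", "0"), Arrays.asList("0/1")));
    }
}
```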

Try to install EVA-pipeline

I followed the installation instructions:
git clone https://github.com/EBIvariation/opencga.git
cd opencga && mvn clean install -DskipTests

The build failed with the following errors:

[INFO] 11 errors
[INFO] -------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] opencga ........................................... SUCCESS [0.149s]
[INFO] opencga-lib ....................................... SUCCESS [3.248s]
[INFO] opencga-storage ................................... SUCCESS [0.004s]
[INFO] opencga-storage-core .............................. FAILURE [1.250s]
[INFO] opencga-catalog ................................... SKIPPED
[INFO] opencga-analysis .................................. SKIPPED
[INFO] opencga-storage-mongodb ........................... SKIPPED
[INFO] opencga-storage-app ............................... SKIPPED
[INFO] opencga-app ....................................... SKIPPED
[INFO] opencga-account ................................... SKIPPED
[INFO] opencga-storage-hbase ............................. SKIPPED
[INFO] opencga-server .................................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 5.332s
[INFO] Finished at: Fri Aug 11 07:46:39 UTC 2017
[INFO] Final Memory: 23M/252M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.2:compile (default-compile) on project opencga-storage-core: Compilation failure: Compilation failure:
[ERROR] /mnt/speedSeq_data/opencga/opencga-storage/opencga-storage-core/src/main/java/org/opencb/opencga/storage/core/variant/annotation/VepVariantAnnotator.java:[7,39] package org.opencb.cellbase.core.client does not exist
[ERROR] /mnt/speedSeq_data/opencga/opencga-storage/opencga-storage-core/src/main/java/org/opencb/opencga/storage/core/variant/annotation/CellBaseVariantAnnotator.java:[12,32] cannot find symbol
[ERROR] symbol: class CellBaseConfiguration
[ERROR] location: package org.opencb.cellbase.core
[ERROR] /mnt/speedSeq_data/opencga/opencga-storage/opencga-storage-core/src/main/java/org/opencb/opencga/storage/core/variant/annotation/CellBaseVariantAnnotator.java:[13,39] package org.opencb.cellbase.core.client does not exist
[ERROR] /mnt/speedSeq_data/opencga/opencga-storage/opencga-storage-core/src/main/java/org/opencb/opencga/storage/core/variant/annotation/CellBaseVariantAnnotator.java:[14,44] cannot find symbol
[ERROR] symbol: class CellbaseConfiguration
[ERROR] location: package org.opencb.cellbase.core.common.core
[ERROR] /mnt/speedSeq_data/opencga/opencga-storage/opencga-storage-core/src/main/java/org/opencb/opencga/storage/core/variant/annotation/CellBaseVariantAnnotator.java:[44,13] cannot find symbol
[ERROR] symbol: class CellBaseClient
[ERROR] location: class org.opencb.opencga.storage.core.variant.annotation.CellBaseVariantAnnotator
[ERROR] /mnt/speedSeq_data/opencga/opencga-storage/opencga-storage-core/src/main/java/org/opencb/opencga/storage/core/variant/annotation/CellBaseVariantAnnotator.java:[82,37] cannot find symbol
[ERROR] symbol: class CellBaseClient
[ERROR] location: class org.opencb.opencga.storage.core.variant.annotation.CellBaseVariantAnnotator
[ERROR] /mnt/speedSeq_data/opencga/opencga-storage/opencga-storage-core/src/main/java/org/opencb/opencga/storage/core/variant/annotation/CellBaseVariantAnnotator.java:[100,13] cannot find symbol
[ERROR] symbol: class CellBaseClient
[ERROR] location: class org.opencb.opencga.storage.core.variant.annotation.CellBaseVariantAnnotator
[ERROR] /mnt/speedSeq_data/opencga/opencga-storage/opencga-storage-core/src/main/java/org/opencb/opencga/storage/core/variant/annotation/CellBaseVariantAnnotator.java:[103,38] cannot find symbol
[ERROR] symbol: class CellBaseClient
[ERROR] location: class org.opencb.opencga.storage.core.variant.annotation.CellBaseVariantAnnotator
[ERROR] /mnt/speedSeq_data/opencga/opencga-storage/opencga-storage-core/src/main/java/org/opencb/opencga/storage/core/variant/annotation/CellBaseVariantAnnotator.java:[243,35] package CellBaseClient does not exist
[ERROR] /mnt/speedSeq_data/opencga/opencga-storage/opencga-storage-core/src/main/java/org/opencb/opencga/storage/core/variant/annotation/CellBaseVariantAnnotator.java:[244,35] package CellBaseClient does not exist
[ERROR] /mnt/speedSeq_data/opencga/opencga-storage/opencga-storage-core/src/main/java/org/opencb/opencga/storage/core/variant/annotation/CellBaseVariantAnnotator.java:[246,35] package CellBaseClient does not exist
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :opencga-storage-core

What should I do?

Thank you,
Note
