ebivariation / eva-pipeline

Genomic variation pipeline for the European Variation Archive, implemented using Spring Batch

Home Page: http://www.ebi.ac.uk/eva

License: Apache License 2.0

Java 89.61% Perl 10.39%

eva-pipeline's People

Contributors

andresfsilva, apriltuesday, cyenyxe, jmmut, jorizci, junaidnz97, nitin-ebi, nitrozyna, pabarcgar, poggio84, sundarvenkata-ebi, tcezard, tomdcsmith, tskir

eva-pipeline's Issues

BUILD FAILURE using v2.0 release code

Hi. I followed the instructions using the v2.0 source code.
When running the tests with mvn test package, I got the following results.

Results :

Failed tests:
  CreateDatabaseIndexesStepTest.testIndexesAreCreated:75 expected:<[{ "v" : [1 , "key" : { "_id" : 1} , "name" : "_id_" , "ns" : "ed18730b-debc-479c-b625-1b9893dcfa0c.features"}, { "v" : 1] , "key" : { "name" ...> but was:<[{ "v" : [2 , "key" : { "_id" : 1} , "name" : "_id_" , "ns" : "ed18730b-debc-479c-b625-1b9893dcfa0c.features"}, { "v" : 2] , "key" : { "name" ...>
  StatisticsMongoWriterTest.shouldCreateIndexesInCollection:138 expected:<[{ "v" : 1 , "key" : { "_id" : 1} , "name" : "_id_" , "ns" : "8d03943c-db27-4dea-b3e3-421bf090acbe.populationStatistics"}, { "v" : 1 , "unique" : true , "key" : { "chr" : 1 , "start" : 1 , "ref" : 1 , "alt" : 1 , "sid" : 1 , "cid" : 1} , "name" : "vscid" , "ns" : "8d03943c-db27-4dea-b3e3-421bf090acbe.populationStatistics"}]> but was:<[{ "v" : 2 , "key" : { "_id" : 1} , "name" : "_id_" , "ns" : "8d03943c-db27-4dea-b3e3-421bf090acbe.populationStatistics"}, { "v" : 2 , "unique" : true , "key" : { "chr" : 1 , "start" : 1 , "ref" : 1 , "alt" : 1 , "sid" : 1 , "cid" : 1} , "name" : "vscid" , "ns" : "8d03943c-db27-4dea-b3e3-421bf090acbe.populationStatistics"}]>

Tests run: 475, Failures: 2, Errors: 0, Skipped: 1

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  01:14 min
[INFO] Finished at: 2020-02-17T17:09:13+09:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test (default-test) on project eva-pipeline: There are test failures.
[ERROR]
[ERROR] Please refer to /home/kimoton/eva-pipeline/2.0/eva-pipeline-2.0/target/surefire-reports for the individual test results.
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException

Please give me some advice.

Tool information:

$ java -version
openjdk version "1.8.0_222"
OpenJDK Runtime Environment (build 1.8.0_222-8u222-b10-1ubuntu1~18.04.1-b10)
OpenJDK 64-Bit Server VM (build 25.222-b10, mixed mode)

$ mvn --version
Apache Maven 3.6.0
Maven home: /usr/share/maven
Java version: 1.8.0_222, vendor: Private Build, runtime: /usr/lib/jvm/java-8-openjdk-amd64/jre
Default locale: en, platform encoding: UTF-8
OS name: "linux", version: "4.4.0-18362-microsoft", arch: "amd64", family: "unix"

$ mongo --version
MongoDB shell version v3.6.17
git version: 3d6953c361213c5bfab23e51ab274ce592edafe6
OpenSSL version: OpenSSL 1.0.2n  7 Dec 2017
allocator: tcmalloc
modules: none
build environment:
    distmod: ubuntu1604
    distarch: x86_64
    target_arch: x86_64

Chunk size parameter should be optional

The application fails if the config.chunk.size parameter is not provided. A better approach would be to use a default value like 500, which allows the pipeline to progress without too much contention but at the same time ensures the memory usage is not too high.
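
The fallback described above can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the class ChunkSizeConfig and the method resolveChunkSize are hypothetical names, not part of eva-pipeline, and the sketch reads the value from a java.util.Properties object rather than from the pipeline's actual Spring configuration. Only the parameter name config.chunk.size and the default of 500 come from the issue text.

```java
import java.util.Properties;

class ChunkSizeConfig {
    // Default suggested in the issue text; the class and method names are hypothetical.
    static final int DEFAULT_CHUNK_SIZE = 500;

    /** Returns the value of config.chunk.size if present, otherwise the default. */
    public static int resolveChunkSize(Properties properties) {
        String raw = properties.getProperty("config.chunk.size");
        if (raw == null || raw.trim().isEmpty()) {
            // Parameter not provided: fall back instead of failing.
            return DEFAULT_CHUNK_SIZE;
        }
        int chunkSize = Integer.parseInt(raw.trim());
        if (chunkSize <= 0) {
            throw new IllegalArgumentException(
                    "config.chunk.size must be positive, got " + chunkSize);
        }
        return chunkSize;
    }

    public static void main(String[] args) {
        Properties withoutValue = new Properties();
        Properties withValue = new Properties();
        withValue.setProperty("config.chunk.size", "1000");

        System.out.println(resolveChunkSize(withoutValue)); // prints 500
        System.out.println(resolveChunkSize(withValue));    // prints 1000
    }
}
```

An explicitly provided value still has to be a positive integer; only a missing or blank value triggers the default.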

Abort when duplicate sample names are found

Running a VCF with duplicate sample names through the pipeline causes issues when the samples are accessed through the API. The reason is that we use the sample name, not the column index the sample occupies in the file, as the accessor, so two occurrences of the same sample name cannot be told apart.

In addition to this, the VCF specification doesn't allow samples with duplicate names to be listed in a file.

An example of a valid VCF header line would be:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE1 SAMPLE2 SAMPLE3

And an invalid VCF header line would be:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE1 SAMPLE2 SAMPLE1

The pipeline must abort when an input like the latter is provided. This can be implemented by modifying the class VcfHeaderReader.
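
A minimal sketch of such a check is shown below. It is not the actual VcfHeaderReader code: the class VcfSampleNameCheck and its methods are hypothetical, and the sketch assumes the nine fixed columns (#CHROM to FORMAT) come first and that columns can be split on whitespace, which matches the examples in this issue but would not handle sample names containing spaces in a tab-delimited file.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

class VcfSampleNameCheck {
    // Number of fixed columns (#CHROM .. FORMAT) that precede the sample names.
    static final int FIXED_COLUMNS = 9;

    /** Returns the sample names that appear more than once in the header line. */
    public static Set<String> findDuplicateSamples(String headerLine) {
        // Assumption: splitting on any whitespace; real VCF files are tab-delimited.
        String[] columns = headerLine.trim().split("\\s+");
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (int i = FIXED_COLUMNS; i < columns.length; i++) {
            // Set.add returns false when the name was already present.
            if (!seen.add(columns[i])) {
                duplicates.add(columns[i]);
            }
        }
        return duplicates;
    }

    /** Throws if the header line lists any sample name twice. */
    public static void assertNoDuplicateSamples(String headerLine) {
        Set<String> duplicates = findDuplicateSamples(headerLine);
        if (!duplicates.isEmpty()) {
            throw new IllegalArgumentException(
                    "Duplicate sample names in VCF header: " + duplicates);
        }
    }

    public static void main(String[] args) {
        String valid = "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE1 SAMPLE2 SAMPLE3";
        String invalid = "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE1 SAMPLE2 SAMPLE1";

        assertNoDuplicateSamples(valid); // passes silently
        try {
            assertNoDuplicateSamples(invalid);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Throwing while the header is being read stops the job before any variant is written, which is the abort behaviour requested here.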

Ignore VCF records that report non-variant sites

Since EVA is a variation archive, we shall not load into the database those records from a VCF file where one of the following conditions occurs:

  • All the sample genotypes are non-variant (./., 0/0, 0|0)
  • The value of the INFO field AF is zero
  • The value of the INFO field AN is zero
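
The three conditions above could be expressed as a single predicate, sketched below. The class NonVariantFilter and its method names are hypothetical and do not reflect the pipeline's actual data model: the record is reduced to a map of INFO fields and a list of genotype strings. The sketch also assumes that a missing, non-numeric, or multi-valued AF/AN (for example "." or "0,0.5") does not count as zero, and that a record without genotypes is judged on AF and AN alone.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class NonVariantFilter {
    // Genotypes treated as non-variant, exactly as listed in the issue.
    static final Set<String> NON_VARIANT_GENOTYPES =
            new HashSet<>(Arrays.asList("./.", "0/0", "0|0"));

    /** Returns true if the record should be skipped instead of loaded. */
    public static boolean isNonVariant(Map<String, String> info, List<String> genotypes) {
        if (isZero(info.get("AF")) || isZero(info.get("AN"))) {
            return true;
        }
        // Assumption: with no genotypes at all, only AF and AN decide.
        return !genotypes.isEmpty() && NON_VARIANT_GENOTYPES.containsAll(genotypes);
    }

    /** True only for a single numeric value equal to zero. */
    static boolean isZero(String value) {
        if (value == null) {
            return false;
        }
        try {
            return Double.parseDouble(value.trim()) == 0.0;
        } catch (NumberFormatException e) {
            // Missing value "." or a comma-separated list: not treated as zero here.
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, String> noInfo = Collections.emptyMap();
        System.out.println(isNonVariant(noInfo, Arrays.asList("0/0", "./.")));  // prints true
        System.out.println(isNonVariant(noInfo, Arrays.asList("0/0", "0/1")));  // prints false
        System.out.println(isNonVariant(
                Collections.singletonMap("AF", "0"), Arrays.asList("0/1")));    // prints true
    }
}
```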

Try to install EVA-pipeline

I followed the installation instructions:
git clone https://github.com/EBIvariation/opencga.git
cd opencga && mvn clean install -DskipTests

The build failed with the following errors:

[INFO] 11 errors
[INFO] -------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] opencga ........................................... SUCCESS [0.149s]
[INFO] opencga-lib ....................................... SUCCESS [3.248s]
[INFO] opencga-storage ................................... SUCCESS [0.004s]
[INFO] opencga-storage-core .............................. FAILURE [1.250s]
[INFO] opencga-catalog ................................... SKIPPED
[INFO] opencga-analysis .................................. SKIPPED
[INFO] opencga-storage-mongodb ........................... SKIPPED
[INFO] opencga-storage-app ............................... SKIPPED
[INFO] opencga-app ....................................... SKIPPED
[INFO] opencga-account ................................... SKIPPED
[INFO] opencga-storage-hbase ............................. SKIPPED
[INFO] opencga-server .................................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 5.332s
[INFO] Finished at: Fri Aug 11 07:46:39 UTC 2017
[INFO] Final Memory: 23M/252M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.2:compile (default-compile) on project opencga-storage-core: Compilation failure: Compilation failure:
[ERROR] /mnt/speedSeq_data/opencga/opencga-storage/opencga-storage-core/src/main/java/org/opencb/opencga/storage/core/variant/annotation/VepVariantAnnotator.java:[7,39] package org.opencb.cellbase.core.client does not exist
[ERROR] /mnt/speedSeq_data/opencga/opencga-storage/opencga-storage-core/src/main/java/org/opencb/opencga/storage/core/variant/annotation/CellBaseVariantAnnotator.java:[12,32] cannot find symbol
[ERROR] symbol: class CellBaseConfiguration
[ERROR] location: package org.opencb.cellbase.core
[ERROR] /mnt/speedSeq_data/opencga/opencga-storage/opencga-storage-core/src/main/java/org/opencb/opencga/storage/core/variant/annotation/CellBaseVariantAnnotator.java:[13,39] package org.opencb.cellbase.core.client does not exist
[ERROR] /mnt/speedSeq_data/opencga/opencga-storage/opencga-storage-core/src/main/java/org/opencb/opencga/storage/core/variant/annotation/CellBaseVariantAnnotator.java:[14,44] cannot find symbol
[ERROR] symbol: class CellbaseConfiguration
[ERROR] location: package org.opencb.cellbase.core.common.core
[ERROR] /mnt/speedSeq_data/opencga/opencga-storage/opencga-storage-core/src/main/java/org/opencb/opencga/storage/core/variant/annotation/CellBaseVariantAnnotator.java:[44,13] cannot find symbol
[ERROR] symbol: class CellBaseClient
[ERROR] location: class org.opencb.opencga.storage.core.variant.annotation.CellBaseVariantAnnotator
[ERROR] /mnt/speedSeq_data/opencga/opencga-storage/opencga-storage-core/src/main/java/org/opencb/opencga/storage/core/variant/annotation/CellBaseVariantAnnotator.java:[82,37] cannot find symbol
[ERROR] symbol: class CellBaseClient
[ERROR] location: class org.opencb.opencga.storage.core.variant.annotation.CellBaseVariantAnnotator
[ERROR] /mnt/speedSeq_data/opencga/opencga-storage/opencga-storage-core/src/main/java/org/opencb/opencga/storage/core/variant/annotation/CellBaseVariantAnnotator.java:[100,13] cannot find symbol
[ERROR] symbol: class CellBaseClient
[ERROR] location: class org.opencb.opencga.storage.core.variant.annotation.CellBaseVariantAnnotator
[ERROR] /mnt/speedSeq_data/opencga/opencga-storage/opencga-storage-core/src/main/java/org/opencb/opencga/storage/core/variant/annotation/CellBaseVariantAnnotator.java:[103,38] cannot find symbol
[ERROR] symbol: class CellBaseClient
[ERROR] location: class org.opencb.opencga.storage.core.variant.annotation.CellBaseVariantAnnotator
[ERROR] /mnt/speedSeq_data/opencga/opencga-storage/opencga-storage-core/src/main/java/org/opencb/opencga/storage/core/variant/annotation/CellBaseVariantAnnotator.java:[243,35] package CellBaseClient does not exist
[ERROR] /mnt/speedSeq_data/opencga/opencga-storage/opencga-storage-core/src/main/java/org/opencb/opencga/storage/core/variant/annotation/CellBaseVariantAnnotator.java:[244,35] package CellBaseClient does not exist
[ERROR] /mnt/speedSeq_data/opencga/opencga-storage/opencga-storage-core/src/main/java/org/opencb/opencga/storage/core/variant/annotation/CellBaseVariantAnnotator.java:[246,35] package CellBaseClient does not exist
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :opencga-storage-core

What should I do?

Thank you,
Note
