dockstore-cgpmap's People

Contributors

gitter-badger, gregoryleeman, keiranmraine

dockstore-cgpmap's Issues

Is unaligning BAMs necessary?

With dockstore-cgpmap_3.3.0.sif, is it necessary to unalign BAM files before mapping, or is this adequately catered for by your pipeline? Empirically it does not seem to be required. Please can you clarify this, and whether it is true for both bwamem and bwamem2, with or without bwakit? Thanks, Marian

cgpmap on singularity

Dear Colleagues

I hope you are all well.

I would like to run 'cgpmap' on a compute cluster using singularity, using multiple cores to align genomes faster. Even if I specify multiple cores with the "--threads" parameter (e.g. 16), only one core is used when bwa is fired. The first few steps of 'cgpmap' spawn multiple processes that collectively use more than one CPU, but no single process uses more than 100% of a CPU. Bwa ends up using only one core and runs for a long time. On a standalone machine with Docker, I had no problem running bwa with 16 cores.

Have you encountered similar behavior with singularity before, and is there something I can do to use more cores for mapping?

Best
Dominik
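
One thing that may be worth ruling out (an assumption, not a confirmed diagnosis of this report) is that the cluster scheduler has restricted the job to a single core, in which case any number of bwa threads would share that core. A minimal Python sketch, run inside the same job allocation, compares the CPUs present on the host with the CPUs the current process is allowed to use. The function name `cpu_report` is illustrative only and is not part of cgpmap.

```python
import os


def cpu_report():
    """Compare the CPUs on the host with the CPUs this process may run on."""
    total = os.cpu_count() or 1
    # sched_getaffinity is only available on some platforms (e.g. Linux);
    # fall back to the host total where it is missing.
    if hasattr(os, "sched_getaffinity"):
        usable = len(os.sched_getaffinity(0))
    else:
        usable = total
    return {"total": total, "usable": usable}


report = cpu_report()
print(f"CPUs on host: {report['total']}, usable by this process: {report['usable']}")
```

If the usable count is 1 while the host total is larger, the limit is probably imposed by the job submission rather than by bwa itself.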

Segmentation faults

Dear Colleagues,

I am recurrently getting errors when mapping ~80x WGS samples, using the singularity image. I wonder if you could advise any checks I could do on the process or parameters.

I am running the image similarly to:

export CGPMAP_VER=3.0.0
singularity pull docker://quay.io/wtsicgp/dockstore-cgpmap:$CGPMAP_VER
singularity exec \
  --workdir /.../workspace \
  --home /.../workspace:/home \
  --bind /.../ref/human:/var/spool/ref:ro \
  --bind /.../example_data/cgpmap/insilico_21:/var/spool/data:ro \
  dockstore-cgpmap-${CGPMAP_VER}.simg \
  ds-cgpmap.pl \
  -r /var/spool/ref/core_ref_GRCh37d5.tar.gz \
  -i /var/spool/ref/bwa_idx_GRCh37d5.tar.gz \
  -s SOMENAME \
  -t 12 \
  /var/spool/data/*.fastq

The errors I get are similar to this one:
more PCAP_Bwa_bwa_mem.2.err
+ set -o pipefail
+ /opt/wtsi-cgp/bin/bwa mem -v 1 -p -R \"@RG\tID:2\tSM:M00026\" -t 6 /home/reference_files/genome.fa /home/tmpMap_M00026/split/2/i.fastq.gz
+ /opt/wtsi-cgp/bin/reheadSQ -d /home/reference_files/genome.fa.dict
+ /opt/wtsi-cgp/biobambam2/bin/bamsort fixmate=1 inputformat=sam level=1 tmpfile=/home/tmpMap_M00026/bamsort.2_tmp O=/home/tmpMap_M00026/sorted/2_i.fastq.gz_sorted.bam outputthreads=5 calmdnm=1 calmdnmrecompindetonly=1 calmdnmreference=/home/reference_files/genome.fa
[V] Reading alignments from source.
[V] 1M
[V] 2M
[V] 3M
[V] 4M
[V] 5M
[V] 6M
[V] 7M
/home/tmpMap_M00026/logs/PCAP_Bwa_bwa_mem.2.sh: line 3: 19166 Broken pipe /opt/wtsi-cgp/bin/bwa mem -v 1 -p -R \"@RG\tID:2\tSM:M00026\" -t 6 /home/reference_files/genome.fa /home/tmpMap_M00026/split/2/i.fastq.gz
19167 | /opt/wtsi-cgp/bin/reheadSQ -d /home/reference_files/genome.fa.dict
19168 Segmentation fault (core dumped) | /opt/wtsi-cgp/biobambam2/bin/bamsort fixmate=1 inputformat=sam level=1 tmpfile=/home/tmpMap_M00026/bamsort.2_tmp O=/home/tmpMap_M00026/sorted/2_i.fastq.gz_sorted.bam outputthreads=5 calmdnm=1 calmdnmrecompindetonly=1 calmdnmreference=/home/reference_files/genome.fa
Command exited with non-zero status 139
3594.45user 12.26system 9:55.32elapsed 605%CPU (0avgtext+0avgdata 6201544maxresident)k

On smaller fastq files the jobs complete sometimes with and sometimes without errors. On larger fastq files they consistently give errors, but after a variable amount of time.
I also previously successfully aligned a WGS 40x case using this code.

Any advice on how to successfully execute the mapping process is really appreciated. Kind regards
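
As a reading aid for the log above: by shell convention, an exit status above 128 means the process was terminated by signal number (status - 128). The following Python sketch decodes the reported status 139 on that basis. The helper name `describe_exit_status` is made up for illustration, and the sketch only explains the number; it says nothing about why bamsort crashed.

```python
import signal


def describe_exit_status(status):
    """Return the signal name encoded in a shell exit status, or None."""
    # Statuses above 128 conventionally mean "killed by signal (status - 128)".
    if status > 128:
        try:
            return signal.Signals(status - 128).name
        except ValueError:
            return None  # no signal with that number on this platform
    return None


print(describe_exit_status(139))  # SIGSEGV on Linux (139 - 128 = 11)
```

This matches the "Segmentation fault" reported for the bamsort process. The "Broken pipe" reported for bwa is most likely a consequence of a downstream process in the pipe exiting, rather than a separate fault.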

alpine build

Attempt building on Alpine for a secure and small image.

Alpine may not be possible due to underlying software used but worth a try.

Mate pairs and RNEXT, PNEXT fields in the result bam file

Hello Colleagues,

I realise that the source of my downstream problem is that the bam files aligned with cgpmap always have zeros in RNEXT and PNEXT.
Eg A00125:128:HG5TCDSXX:1:1101:11388:1141 0 X 74361818 60 151M * 0 0 ATAGAATATAATTAACAT...

I am guessing bwa does not see the read pairs from the two fastq files as pairs. I checked the read names in the two fastq files, and they match. Eg matching lines from the two fastq files are:
Fastq1
@a00155:110:HGCVKDSXX:1:1101:16107:7435 1:N:0:ACGCACCT+GGTGAAGG
TTGGA...
Fastq2
@a00155:110:HGCVKDSXX:1:1101:16107:7435 2:N:0:ACGCACCT+GGTGAAGG
ATCCA

Could there be something wrong with my fastq files? Alternatively, the way I am running Cgpmap (shown below)?

I appreciate your help. Dominik

export CGPMAP_VER=3.0.0
singularity exec \
  --workdir /home/workspace \
  --home /home/dglodzik/:/home \
  --bind /home/dglodzik/referenceData:/var/spool/ref:ro \
  --bind /home/dglodzik/testData:/var/spool/data:ro \
  dockstore-cgpmap-${CGPMAP_VER}.simg \
  ds-cgpmap.pl \
  -r /var/spool/ref/core_ref_GRCh37d5.tar.gz \
  -i /var/spool/ref/bwa_idx_GRCh37d5.tar.gz \
  -s testSample \
  -t 6 \
  /var/spool/data/test_*.fastq.gz
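
The manual check of read names described in this report can be automated. The Python sketch below walks two FASTQ streams in step and reports the first record whose names differ. The function names are hypothetical and the demo records are modelled on the headers quoted in the report, with placeholder qualities. It only verifies that the names agree; it does not show how cgpmap decides which files form a pair. A separate issue in this list reports that paired FASTQ file names are expected to follow a `_1`/`_2` suffix convention, so the file names may be worth checking too.

```python
import io
from itertools import islice, zip_longest


def read_names(handle):
    """Yield each FASTQ record name without '@', comment, or /1 /2 suffix."""
    # FASTQ records are four lines long; the header is the first of each four.
    for header in islice(handle, 0, None, 4):
        fields = header.split()
        if not fields:
            continue  # tolerate a blank line in header position
        name = fields[0].lstrip("@")
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        yield name


def first_mismatch(fq1, fq2):
    """Return the 1-based index of the first record whose names differ, else None."""
    pairs = zip_longest(read_names(fq1), read_names(fq2))
    for index, (name1, name2) in enumerate(pairs, start=1):
        if name1 != name2:
            return index
    return None


# Placeholder records: only the header lines come from the report.
r1 = io.StringIO("@a00155:110:HGCVKDSXX:1:1101:16107:7435 1:N:0:ACGCACCT+GGTGAAGG\nTTGGA\n+\nFFFFF\n")
r2 = io.StringIO("@a00155:110:HGCVKDSXX:1:1101:16107:7435 2:N:0:ACGCACCT+GGTGAAGG\nATCCA\n+\nFFFFF\n")
print(first_mismatch(r1, r2))  # None, meaning the names agree
```

For gzipped inputs, a handle from `gzip.open(path, "rt")` can be passed in the same way. A file that runs out of records early is also reported as a mismatch, because `zip_longest` pads the shorter stream with `None`.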

samtools error

error while loading shared libraries: libhts.so.2: cannot open shared object file: No such file or directory

Please suggest how to deal with this error.

Following is the complete error file.

+ /opt/wtsi-cgp/bin/samtools view -F 2816 -T /home/reference_files/genome.fa -u /home/tmpMap_Hu_333/links/Hu_333_normal.bam
+ /opt/wtsi-cgp/biobambam2/bin/bamtofastq exclude=QCFAIL,SECONDARY,SUPPLEMENTARY tryoq=1 gz=1 level=1 outputperreadgroup=1 outputperreadgroupsuffixF=_i.fq outputperreadgroupsuffixF2=_i.fq T=/home/tmpMap_Hu_333/bamtofastq.1 outputdir=/home/tmpMap_Hu_333/split/1 split=2000000000000
/opt/wtsi-cgp/bin/samtools: error while loading shared libraries: libhts.so.2: cannot open shared object file: No such file or directory
BgzfInflateBase::readData(): unexpected eof

/opt/wtsi-cgp/biobambam2/bin/../lib/libmaus2.so.2(_ZN8libmaus24util10StackTraceC1Ev+0x4c) [0x7f8224ff9f8c]
/opt/wtsi-cgp/biobambam2/bin/bamtofastq(_ZN8libmaus29exception16LibMausExceptionC1Ev+0x20) [0x41a680]
/opt/wtsi-cgp/biobambam2/bin/bamtofastq() [0x43e2c2]
/opt/wtsi-cgp/biobambam2/bin/bamtofastq() [0x43e3f4]
/opt/wtsi-cgp/biobambam2/bin/../lib/libstdc++.so.6(_ZNSt15basic_streambufIcSt11char_traitsIcEE5uflowEv+0x26) [0x7f822445d416]
/opt/wtsi-cgp/biobambam2/bin/../lib/libstdc++.so.6(_ZNSi3getEv+0x66) [0x7f8224437f76]
/opt/wtsi-cgp/biobambam2/bin/bamtofastq() [0x42fd1f]
/opt/wtsi-cgp/biobambam2/bin/bamtofastq() [0x480721]
/opt/wtsi-cgp/biobambam2/bin/bamtofastq() [0x481124]
/opt/wtsi-cgp/biobambam2/bin/bamtofastq() [0x48772c]
/opt/wtsi-cgp/biobambam2/bin/bamtofastq() [0x48e205]
/opt/wtsi-cgp/biobambam2/bin/bamtofastq() [0x415f73]
/opt/wtsi-cgp/biobambam2/bin/bamtofastq() [0x4169fe]
/opt/wtsi-cgp/biobambam2/bin/bamtofastq() [0x41210a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f82219b4830]
/opt/wtsi-cgp/biobambam2/bin/bamtofastq() [0x412912]

Command exited with non-zero status 1
0.01user 0.10system 0:00.20elapsed 61%CPU (0avgtext+0avgdata 5360maxresident)k
6890inputs+16outputs (42major+2100minor)pagefaults 0swaps

Bulk of container build will be removed

In the near future the elements specific to PCAP-core will be built in a separate container which this one will add the "helper scripts" to. This is to aid in our internal testing cycle.

Pre-release for fragment+readQC

Create feature/fragementQc to link to pre-release of PCAP-core

  • Link relevant version
  • Fix #6
  • Fix #7
  • Do #10
  • Do #11
  • Build process needs updating to work with new HTSlib + Bio::DB::HTS
  • Create pre-release
  • Trigger Quay build
  • Link pre-release to dockstore
  • Test with standard dataset

Looks like parameter file needs update past 2.0.3

{
  "reference": {
    "path": "/path/to/core_ref_GrCh38.tar.gz",
    "class": "File"
  },
  "bwa_idx": {
    "path": "/path/to/bwa_idx_GrCh38.tar.gz",
    "class": "File"
  },
  "sample": "sim",
  "seq_in": [
    {
      "path": "/path/to/read_1.fastq",
      "class": "File"
    },
    {
      "path": "/path/to/read_2.fastq",
      "class": "File"
    }
  ],
  "out_bam": {
    "path": "/path/to/mapped.bam",
    "class": "File"
  },
  "bwa": " -Y -K 100000000",
  "mmqc": false,
  "mmqcfrac": 0.05
}

See conversation at https://gitter.im/ga4gh/dockstore?at=5d8beae234a7236bf5bef131

no intro on how to use the group meta yaml file and the tool does not validate yaml properly

When the input sequencing files are in fastq format, working out how to use the yaml file is tricky. For example, the paired fastq file names have to end with _1.fq.gz and _2.fq.gz, but such conventions are not mentioned anywhere.

Our code does not validate the file properly either. A few cases:

  1. If the SM tag is missing from the yaml file, cgpmap runs without complaint, but ignores the RG ID in the yaml file and generates a random one.
  2. Identical fastq file names in the file do not trigger any error or warning.
  3. When an input file is missing from the file, it does not complain.

I think it would be better to have a flag specifically for single-ended fastqs. Cgpmap would then assume inputs are paired-ended and complain if they are neither interleaved nor paired; if an input really is single-ended, the user would need to label it as such explicitly. Currently it just carries on silently with its own assumptions.
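
The kind of up-front validation asked for here could look roughly like the Python sketch below. It is a hypothetical illustration, not cgpmap's code: the function name `check_fastq_inputs` is invented, and the suffixes `_1.fq.gz` and `_2.fq.gz` are assumed from the convention described in this issue. It reports duplicate entries, files that do not match the pairing convention, and files whose mate is absent, instead of continuing silently.

```python
from collections import Counter


def check_fastq_inputs(paths, suffix1="_1.fq.gz", suffix2="_2.fq.gz"):
    """Return a list of human-readable problems found in a list of FASTQ paths."""
    problems = []
    # Case: the same file listed more than once.
    for path, count in sorted(Counter(paths).items()):
        if count > 1:
            problems.append(f"duplicate input: {path} listed {count} times")
    unique = set(paths)
    for path in sorted(unique):
        if path.endswith(suffix1):
            mate = path[: -len(suffix1)] + suffix2
        elif path.endswith(suffix2):
            mate = path[: -len(suffix2)] + suffix1
        else:
            # Neither suffix: would need an explicit single-ended label.
            problems.append(f"not recognised as paired: {path}")
            continue
        if mate not in unique:
            problems.append(f"missing mate for {path}: expected {mate}")
    return problems


for problem in check_fastq_inputs(["s_1.fq.gz", "s_1.fq.gz", "t.fastq"]):
    print(problem)
```

Returning a list of problems rather than raising on the first one lets a caller report everything wrong with the input in a single pass.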

cgpBigWig bugfix

0.4.2 fixes problem in handling contig names including : as found in GRCh38
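
To illustrate why a colon inside a contig name is awkward: region strings are commonly written as contig:start-end, so splitting on the first colon truncates the contig name. The Python sketch below splits on the last colon instead. It is an illustration of the general problem only, not the actual cgpBigWig fix, and the HLA-style contig name is an example chosen for the demo.

```python
def parse_region(region):
    """Split 'contig:start-end' into (contig, start, end), allowing ':' in the contig."""
    # rpartition splits on the LAST colon, so colons in the name are preserved.
    contig, _, span = region.rpartition(":")
    start, _, end = span.partition("-")
    return contig, int(start), int(end)


region = "HLA-A*01:01:01:01:100-200"
print(region.split(":")[0])   # HLA-A*01 (naive split truncates the name)
print(parse_region(region))   # ('HLA-A*01:01:01:01', 100, 200)
```

This approach still cannot tell a bare contig name ending in a numeric field from a name followed by a range, so a robust parser would also check candidates against the list of known contigs.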

cram files as inputs

I want to use CRAM files as inputs. Where do I give CGPMAP the reference that the CRAM was mapped to? I cannot find this in your documentation, I'm sure I'm being very inobservant.
thanks,
Marian

Add docs for CPU env variable

Add info for use of CPU to override the actual CPU count when running on a cluster rather than in a docker instance.

Add cores option to top level script

To enable better support for running on traditional compute environments like singularity on LSF.

It is now possible to specify the maximum CPU count with the environment variable CPU. Otherwise all CPU resources are potentially used, which is not farm friendly.
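
The override behaviour described here can be sketched as follows. This is a hypothetical Python illustration of the logic, not the wrapper script's actual implementation; only the environment variable name CPU comes from the issue, and `resolve_cpu_limit` is an invented name.

```python
import os


def resolve_cpu_limit(environ=None):
    """Return the CPU count to use: the CPU variable if set, else all detected CPUs."""
    env = os.environ if environ is None else environ
    value = env.get("CPU")
    if value:
        limit = int(value)
        if limit < 1:
            raise ValueError(f"CPU must be a positive integer, got {value!r}")
        return limit
    # No override: fall back to every CPU the host reports.
    return os.cpu_count() or 1


print(resolve_cpu_limit({"CPU": "8"}))  # 8
```

On a shared cluster the fallback branch is the one to avoid, since the host total usually exceeds what the scheduler has granted to the job.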
