
icgc-tcga-pancancer / pcap-core


Legacy, see cancerit/PCAP-core: NGS reference implementations and helper code for the ICGC/TCGA Pan-Cancer Analysis Project

License: GNU General Public License v2.0

Perl 72.34% Makefile 0.93% C 22.82% Shell 3.03% Visual Basic 0.88%
legacy-support icgc pcap perl

pcap-core's Introduction

ICGC-TCGA-PCAP

NGS reference implementations and helper code for the ICGC/TCGA Pan-Cancer Analysis Project.


This repository contains code to run genomic alignments of paired end data and subsequent calling algorithms.

The intention is to provide reference implementations and simple to execute wrappers that are useful for the scientific community who may have little IT/bioinformatic support.

Please see the wiki for further details.


###Dependencies/Install

Please install the following before running setup.sh:

  • cgpBigWig
  • Additional OS packages required by kentsrc (Ubuntu package names):
    • unzip
    • libpng12-dev (for libpng-config)

Dependencies installed by setup.sh:

And various perl modules.

Please see the respective licence for each before use.

Please be aware that this expects basic C compilation libraries and tools to be available; most are listed in INSTALL.


###Programs

Please see the wiki for details of programs.


##Creating a release

####Preparation

  • Commit/push all relevant changes.
  • Pull a clean version of the repo and use this for the following steps.

####Cutting the release

  1. Update lib/PCAP.pm to the correct version.
  2. Ensure upgrade path for new version number is added to lib/PCAP.pm.
  3. Update CHANGES.md to show major items.
  4. Run ./prerelease.sh
  5. Check all tests and coverage reports are acceptable.
  6. Commit the updated docs tree and updated module/version.
  7. Push commits.
  8. Use the GitHub tools to draft a release.

pcap-core's People

Contributors

briandoconnor, jenniferliddle, jwhsanger, keiranmraine, mcast, tonydebat


pcap-core's Issues

PCAP-core tests

The following errors were encountered while running "make test"

t/3_external_progs.t .. 
...
    not ok 1 - Expect version 0.0.189 for bamcollate2
    not ok 2 - Expect version 0.0.189 for bammarkduplicates
    not ok 3 - Expect version 0.0.189 for bamsort

I think this is simply because LD_LIBRARY_PATH is not updated and libmaus cannot be resolved.

The second is more complicated.

t/2_pl_compile.t ...... 
ok 1 - Compilation check: /opt/gridware/apps/cancerit/src/PCAP-core-1.6.1/t/../bin/diff_bams.pl
ok 2 - Compilation check: /opt/gridware/apps/cancerit/src/PCAP-core-1.6.1/t/../bin/bamToBw.pl
ok 3 - Compilation check: /opt/gridware/apps/cancerit/src/PCAP-core-1.6.1/t/../bin/bwa_aln.pl
not ok 4 - Compilation check: /opt/gridware/apps/cancerit/src/PCAP-core-1.6.1/t/../bin/xml_to_bas.pl

#   Failed test 'Compilation check: /opt/gridware/apps/cancerit/src/PCAP-core-1.6.1/t/../bin/xml_to_bas.pl'
#   at t/2_pl_compile.t line 40.
ok 5 - Compilation check: /opt/gridware/apps/cancerit/src/PCAP-core-1.6.1/t/../bin/bwa_mem.pl
not ok 6 - Compilation check: /opt/gridware/apps/cancerit/src/PCAP-core-1.6.1/t/../bin/gnos_pull.pl

#   Failed test 'Compilation check: /opt/gridware/apps/cancerit/src/PCAP-core-1.6.1/t/../bin/gnos_pull.pl'
#   at t/2_pl_compile.t line 40.
ok 7 - Compilation check: /opt/gridware/apps/cancerit/src/PCAP-core-1.6.1/t/../bin/bam_to_sra_sub.pl
ok 8 - Compilation check: /opt/gridware/apps/cancerit/src/PCAP-core-1.6.1/t/../bin/monitor.pl
ok 9 - Compilation check: /opt/gridware/apps/cancerit/src/PCAP-core-1.6.1/t/../bin/bam_stats.pl
1..9
# Looks like you failed 2 tests of 9.
Dubious, test returned 2 (wstat 512, 0x200)
Failed 2/9 subtests 

If I understand correctly, these tests simply run "perl -c <perl_script>".
The two failing scripts cannot compile because they cannot find PCAP.pm. Adding

BEGIN {
  use Cwd qw(abs_path);
  use File::Basename;
  unshift (@INC,dirname(abs_path($0)).'/../lib');
};

(which the other scripts already have) fixes this problem, but "xml_to_bas.pl" still has a compilation error:
Type of arg 1 to keys must be hash (not hash element) at (around line 85)

  my @columns = sort keys $first_record->{'metrics'};

Can you please advise if this is important or can be ignored?

Use of absolute paths prevents names being fixed via symlinks

There is a lot of data in CGhub with bad naming conventions. Locally it is sensible to correct these via symlinks rather than altering the original data files. Unfortunately, bwa_mem.pl currently forces absolute paths.

This should be changed to prepend the executing directory to any path that does not begin with /.
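
The proposed behaviour can be sketched as follows. This is a minimal illustration in shell, not the Perl used by bwa_mem.pl, and the function name `abs_keep_links` is hypothetical. It assumes the goal is only to prefix the current directory and to leave symlink names unresolved.

```shell
# Hypothetical helper: make a path absolute WITHOUT resolving symlinks,
# so that a corrected symlink name is preserved.
abs_keep_links() {
  case $1 in
    /*) printf '%s\n' "$1" ;;        # already absolute: leave untouched
    *)  printf '%s\n' "$PWD/$1" ;;   # relative: prefix the executing directory
  esac
}
```

For example, `abs_keep_links fixed_name.bam` run from a directory of symlinks yields a path that still ends in `fixed_name.bam`, whatever the link points at.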

fastq to bam file conversion issue

I tried to convert whole-genome fastq files to an unaligned bam file using
PCAP-CORE installed on Centos-6.4,
but I got the following error:

[user1@cs-12 1532SD]$ fastqtobam I=C278CACXX_lane5.1532SD_1.fastq I=C278CACXX_lane5.1532SD_2.fastq RGID=SYNTEKA:1532SD_5 RGCN=SYNTEKA RGDT=2014-04-19T21:27:00.+00:00 RGLB=WGS:SYNTEKA:2 RGPI=393 RGPL=ILLUMINA RGPM=Illumina HiSeq 2000 RGPU=SYNTEKA:1532SD_5 RGSM=5b902da2-1588-4b3c-b0d5-dd22e68cb15e > SNU_WGS_01_G0_lane5.bam
[D] warning, ignoring additional input files past first two
name HWI-ST1218:287:C278CACXX:5:1101:1898:1960 1:N:0:CGATGT does not look like a first mate read name

/usr/local/lib/libmaus.so.0(_ZN7libmaus4util10StackTraceC1Ev+0x4c) [0x7fbe0c9bcfbc:??:0]
fastqtobam(_ZN7libmaus9exception16LibMausExceptionC1Ev+0x2e) [0x41938e:]
fastqtobam(_Z14fastqtobamPairIN7libmaus6bambam9BamWriterEEvRSiS3_RT_iRKSs+0x9c4) [0x42fe34:]
fastqtobam(_Z10fastqtobamRKN7libmaus4util7ArgInfoE+0x16f4) [0x412744:]
fastqtobam(main+0xc1e) [0x41389e:]
/lib64/libc.so.6(__libc_start_main+0xfd) [0x3ea601ecdd:??:0]
fastqtobam() [0x410a59:]

Does this mean that our sample has a problem, or could there be another reason for
this error?

Any kind of help would be appreciated.

Regards
Jonghui

Module::Build version

Minor thing, on our base VMs the Module::Build version is too old, so the install fails. As a quick fix, just added --reinstall to the cpanm calls in setup.sh, not sure if there's a better way?

For reference, here's the error:

Building BioPerl-1.006923 ... Module::Build version 0.42 required--this is only version 0.38 at ./Build line 43.
FAIL
! Installing Bio::Perl failed. See /root/.cpanm/build.log for details.
! Bailing out the installation for PCAP-0.2.0. Retry with --prompt or --force.

When bam_to_sra_sub.pl is re-run with the same output directory, it fails

When bam_to_sra_sub.pl is re-run a second time with the same output directory, it fails due to a file permission issue. The reason is that the generated shell script is read-only:

-r-x------ 1 ubuntu ubuntu 1576 Apr  4 03:47 auto_upload.sh

Also, previously generated GNOS analysis objects are still there. Although they do not present any problem if the user chooses to use the auto_upload.sh script to submit, they may cause confusion if the user submits manually.

Would it be fine to clear the output directory first? Or should the user be warned if the output directory is not empty before proceeding?
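
A pre-flight check along these lines could implement the warning. This is only a sketch in shell (bam_to_sra_sub.pl itself is Perl), and both function names are hypothetical. It assumes a missing directory counts as clean, and that any entry, including dot-files, counts as leftover output.

```shell
# Hypothetical check: succeed only if the output directory is missing
# or contains no entries at all (dot-files included).
outdir_is_clean() {
  [ ! -e "$1" ] && return 0
  [ -d "$1" ] || return 1          # exists but is not a directory
  [ -z "$(ls -A "$1")" ]
}

# Hypothetical wrapper: warn on stderr and return non-zero when not clean.
warn_if_outdir_dirty() {
  if ! outdir_is_clean "$1"; then
    echo "WARNING: output directory '$1' is not empty" >&2
    return 1
  fi
}
```

Whether to stop, prompt, or clear the directory after the warning is a policy decision left to the caller.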

Install issue in Centos-6.4

Thanks to help from people here, I was finally able to install PCAP-Core on Centos-6.4.

However, the final stage of the installation, "Building PCAP", fails. The log file shows:

  PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-MTest::Harness" "-e" "undef *Test::Harness::Switches; test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/1_pm_compile.t ...... ok
t/2_pl_compile.t ...... ok

    #   Failed test 'Expect version 0.0.129 for bamcollate2'
    #   at t/3_external_progs.t line 51.

    #   Failed test 'Expect version 0.0.129 for bammarkduplicates'
    #   at t/3_external_progs.t line 51.

    #   Failed test 'Expect version 0.0.129 for bamsort'
    #   at t/3_external_progs.t line 51.
    # Looks like you failed 3 tests of 4.

#   Failed test 'External programs have expected version'
#   at t/3_external_progs.t line 53.
# Looks like you failed 1 test of 2.
t/3_external_progs.t ..
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/2 subtests
t/pcap.t .............. ok
t/pcapBam.t ........... ok
t/pcapBamStats.t ...... ok
t/pcapBwa.t ........... ok
t/pcapBwaMeta.t ....... ok
t/pcapCli.t ........... ok
t/pcapSra.t ........... ok
t/pcapThreaded.t ...... ok

Test Summary Report
-------------------
t/3_external_progs.t (Wstat: 256 Tests: 2 Failed: 1)
  Failed test:  2
  Non-zero exit status: 1
Files=11, Tests=51,  5 wallclock secs ( 0.10 usr  0.03 sys +  4.60 cusr  0.44 csys =  5.17 CPU)
Result: FAIL
Failed 1/11 test programs. 1/51 subtests failed.
make: *** [test_dynamic] Error 255

It seems that there is a problem in the 'make test' step of Building PCAP.

Could this cause problems in further use of PCAP-Core?

I also tried installing PCAP-core on Ubuntu, and that went well.

Comparing the bin folders of the Ubuntu and Centos installations, I found that the following binaries and Perl scripts are missing from the Centos installation:

config_data
instmodsh
prove
pwhich
stag-autoschema.pl
stag-db.pl
stag-diff.pl
stag-drawtree.pl
stag-filter.pl
stag-findsubtree.pl
stag-flatten.pl
stag-grep.pl
stag-handle.pl
stag-itext2simple.pl
stag-itext2sxpr.pl
stag-itext2xml.pl
stag-join.pl
stag-merge.pl
stag-mogrify.pl
stag-parse.pl
stag-query.pl
stag-splitter.pl
stag-view.pl
stag-xml2itext.pl

What could be the problem?

Regards
Jonghui

Parse additional info from @CO

Additional information about the sample/donor is to be encoded in @CO headers, to populate information that may not be available from DCC. This is to be added as key/value pairs in analysis.xml:

<ANALYSIS_SET>
  <ANALYSIS>
    <ANALYSIS_ATTRIBUTES>
      <ANALYSIS_ATTRIBUTE>
        <TAG>KEY</TAG>
        <VALUE>VALUE</VALUE>
        <UNITS>[what defines the value e.g. https://tcga-data.nci.nih.gov/datareports/codeTablesReport.htm Disease study]</UNITS>
      </ANALYSIS_ATTRIBUTE>
      ...
    </ANALYSIS_ATTRIBUTES>
  </ANALYSIS>
</ANALYSIS_SET>

We've identified these to be the following existing fields:

legacy_sample_id
    Check if EGA identifier exists otherwise not defined
participant_id 
    DCC Donor, or UUID if unknown
sample_id
    DCC sample or UUID if unknown (should match the UUID in SAMPLE_DESCRIPTOR)
disease_abbr
    see 'Disease study: Study Abbreviation': https://tcga-data.nci.nih.gov/datareports/codeTablesReport.htm
tss_id
    see 'Tumour Source Site: TSS Code': https://tcga-data.nci.nih.gov/datareports/codeTablesReport.htm
analyte_code
    see 'Portion Analyte: Code': https://tcga-data.nci.nih.gov/datareports/codeTablesReport.htm
sample_type
    see 'Sample Type: Code': https://tcga-data.nci.nih.gov/datareports/codeTablesReport.htm

And the following additional fields to make the data easier to work with for individual sites:

submitter_participant_id
    Site identifier, ours will match what we've previously sent to DCC, e.g. CGP_donor_1199131
use_cntl
    uuid or DCC identifier for 'normal data' that this 'tumour data is to be compared against'
    Only defined if DISEASE_ABBR != CNTL

Mapped sequence stat

The current implementation uses the mapped start and end positions, which has a couple of issues:

  1. Needs +1 to get correct length
  2. Need to deduct del and ref skip operations from length
  3. Need to add insert to length

Items 2 and 3 may be better served by just interrogating the CIGAR string.
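
One reading of the three points above is that the statistic should equal the number of aligned plus inserted query bases, which can be summed straight from the CIGAR. The sketch below assumes that reading. The function name `cigar_mapped_bases` is hypothetical, and it is written in shell for illustration rather than in the project's Perl.

```shell
# Hypothetical helper: count mapped query bases from a CIGAR string.
# M, = and X (aligned) and I (inserted) contribute; D, N, S, H and P do not.
cigar_mapped_bases() {
  cmb_rest=$1
  cmb_total=0
  while [ -n "$cmb_rest" ]; do
    cmb_num=${cmb_rest%%[!0-9]*}           # leading run of digits
    cmb_rest=${cmb_rest#"$cmb_num"}
    cmb_op=${cmb_rest%"${cmb_rest#?}"}     # single operation character
    cmb_rest=${cmb_rest#?}
    if [ -z "$cmb_num" ] || [ -z "$cmb_op" ]; then
      echo "malformed CIGAR: $1" >&2
      return 1
    fi
    case $cmb_op in
      M|=|X|I) cmb_total=$((cmb_total + cmb_num)) ;;
      D|N|S|H|P) ;;
      *) echo "unknown CIGAR operation '$cmb_op' in $1" >&2; return 1 ;;
    esac
  done
  echo "$cmb_total"
}
```

For `5S10M2I3D20M1N4M` the reference span (end - start + 1) is 38; deducting the deletion (3) and reference skip (1) and adding the insertion (2) gives 36, which is what the function prints.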

CV lookup files don't work once installed

The code was written so that it can run from an uninstalled checkout. It made a poor assumption about which paths would be present, so cv_tables is not available when the installed version of the script is run.

ega_sample_accession error

Hi - I'm trying to use bam_to_sra_sub.pl (dev branch) on BAM files, which as far as I can tell follow the SOP, and I get the error:

ERROR: Previously parsed data for 25c76a8f-77c0-4650-bddf-45ed0c10a2e6 has entry for ega_sample_accession, bam[.info] '2893595918.cleaned.bam' does not

Ideas?

PCAP::BAM->sample_name should die if no sample name found

The method doesn't check if it manages to find a sample name:

PCAP-core/lib/PCAP/Bam.pm

Lines 161 to 172 in 902c4dc

sub sample_name {
  my $bam = shift;
  my $sam = sam_ob($bam);
  my $header = $sam->header->text;
  my $sample;
  while($header =~ m/\tSM:([^\t\n]+)/xmsg) {
    my $new_sample = $1;
    die "BAM file appears to contain data for multiple samples, not supported: \n\n$header\n" if(defined $sample && $sample ne $new_sample);
    $sample = $new_sample;
  }
  return ($sample, $sam); # also return the SAM object
}

Modify it to issue a warning by default, but add a parameter that forces it to fail, e.g.

my ($bam, $die_no_sample) = @_;
...
die "Failed to find samplename in RG headers:\n\n$header\n" if(defined $die_no_sample && $die_no_sample != 0);

New dependency - GD Graphics Library

To install the beta version I had to run: apt-get install libgd2-xpm-dev

The error was:

**UNRECOVERABLE ERROR**
Could not find gdlib-config in the search path. Please install libgd 2.0.28 or higher.
If you want to try to compile anyway, please rerun this script with the option --ignore_missing_gd.
N/A
! Configure failed for GD-2.53. See /root/.cpanm/build.log for details.
! Bailing out the installation for PCAP-0.2.99. Retry with --prompt or --force.

How do I check if PCAP-core is installed?

Hi,

I used the following commands to install PCAP-core:

$ git clone https://github.com/ICGC-TCGA-PanCancer/PCAP-core
$ ./setup.sh /path/to/PCAP-core/1.10.0

I then added the bin to the PATH variable:
export PATH=/path/to/PCAP-core/1.10.0/bin:$PATH

and added the following to the PERL5LIB variable:
export PERL5LIB=/path/to/PCAP-core/1.10.0/lib/perl5:$PERL5LIB
export PERL5LIB=/path/to/PCAP-core/1.10.0/lib/perl5/x86_64-linux-thread-multi:$PERL5LIB

Now, when I try to install cgpBattenberg, it gives me an error:
$ ./setup.sh /path/to/cgpBattenberg/1.4.0
PREREQUISITE: Please install PCAP-core before proceeding:
https://github.com/ICGC-TCGA-PanCancer/PCAP-core/releases

I checked the setup.sh file and ran line 80 in my terminal:
perl -le 'eval "require $ARGV[0]" and print $ARGV[0]->VERSION' PCAP
It returns the correct version number, i.e. 1.10.0.

But at line 54 the script unsets PERL5LIB. When I replicate this by unsetting PERL5LIB and re-running the above command, it returns nothing.

snappy needs to be included in distribution

Snappy no longer has a direct, sensibly accessible download (only a Google drive). They claim migration to GitHub but the current main release has not been tagged or released.

Will have to manually include in distro until this is resolved.

MACHTYPE in setup.sh

The following condition (line 121)

if [[ $MACHTYPE == x86_64 ]] ; then

should be replaced by

if [[ `uname -m` == x86_64 ]] ; then

On the machines I tested, MACHTYPE is always longer, e.g. "x86_64-suse-linux" or "x86_64-redhat-linux-gnu". The condition is therefore false and the script just stops.
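
As an alternative sketch, a prefix match accepts both the bare architecture name and the longer triplets quoted above. The helper name `is_x86_64` is hypothetical and not part of setup.sh. It takes the string to test as an argument, so it works with either `$MACHTYPE` or the output of `uname -m`.

```shell
# Hypothetical helper: true when the machine string denotes x86_64,
# whether bare ("x86_64") or a full triplet ("x86_64-redhat-linux-gnu").
is_x86_64() {
  case $1 in
    x86_64|x86_64-*) return 0 ;;
    *) return 1 ;;
  esac
}
```

It could be called as `if is_x86_64 "$(uname -m)"; then ... fi`, which keeps the suggested `uname -m` source while also tolerating triplet-style values.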

bash script generated by bam_to_sra_xml.pl

If the permission file is incorrect the content of the cgsubmit log file is more than a single word of text. This causes an error when you try to resume following correction of the permissions file in use.

installation problem on Red Hat Enterprise Linux Server release 6.3

Hi,
I installed previously v1.0.0 without any problems.
I tried to upgrade to 1.0.1 (due to fastqtobam issue) but now I get the following error when I try to install:

PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-MTest::Harness" "-e" "undef *Test::Harness::Switches; test_harness(0, 'blib/lib', 'blib/arch')" t/*.t

Failed test 'use PCAP::Bam::Bas;'

at t/1_pm_compile.t line 35.

Tried to use 'PCAP::Bam::Bas'.

Error: Type of arg 1 to keys must be hash (not hash element) at /home/gmsioan/PCAP-core-1.0.2/blib/lib/PCAP/Bam/Bas.pm line 84, near "})"

Compilation failed in require at t/1_pm_compile.t line 35.

BEGIN failed--compilation aborted at t/1_pm_compile.t line 35.

Bailout called. Further testing stopped: Unable to 'use' module PCAP::Bam::Bas

Tests were run but no plan was declared and done_testing() was not seen.

Looks like your test exited with 255 just after 7.

FAILED--Further testing stopped: Unable to 'use' module PCAP::Bam::Bas
make: *** [test_dynamic] Error 255

Any help on this will be appreciated. Thank you!

xml_to_bas.pl - detect readgroup id clashes and attempt to reconcile

If ID field of RG header is not unique within the sample prior to merging it is modified during the merge step. In most cases in PanCancer this can be resolved by correlating on PU tag instead.

Example of bad XML:

https://gtrepo-ebi.annailabs.com/cghub/metadata/analysisFull/0d8605fc-6510-4b3d-91e9-11c7c771d2f3

bwa_mem.pl - fails when source BAM has RGIDs containing '

In some cases CRAM files are generated where RGIDs are compressed to the smallest possible value (only stored in the CRAM header, so minimal saving there). If these files are per-lane and then converted to BAM before merging, you can end up with RGIDs being suffixed with ' to prevent a clash when other data indicates they are not from the same experiment.

This results in problems when splitting merged BAMs back to lanes for re-mapping process.

bwa_mem.pl - paired fastq input

There seems to be a problem when using paired fastq as input. It looks like it may be related to executing with more threads than mapping requires. It seems that if 12 threads are passed when only 1 mapping step is required, a second mapping step is triggered and the inputs are mixed up, resulting in a very odd BAM file.

Install issue in Centos 6.4

Hi, my name is Jong hui Hong.

When I tried to install a recent version of PCAP-CORE (maybe version 3.0), I ran into an installation problem.

It stops at building libmaus:

[root@cs-11 PCAP-core-master]# ./setup.sh ../tools
Max compilation CPUs set to 6
Installing build prerequisite File::ShareDir... done.
Installing build prerequisite File::ShareDir::Install... done.
Installing build prerequisite Const::Fast... done.
Building BWA ... previously installed ... done.
Building snappy ... previously installed ... done.
Building io_lib ... previously installed ... done.
Building libmaus ...[root@cs-11 PCAP-core-master]#

The setup.log shows:

g++ -DHAVE_CONFIG_H -I. -I.. -O3 -rdynamic -std=gnu++0x -pthread -fopenmp -MT testthreadpool.o -MD -MP -MF .deps/testthreadpool.Tpo -c -o testthreadpool.o `test -f 'test/testthreadpool.cpp' || echo './'`test/testthreadpool.cpp
test/testthreadpool.cpp: In constructor 'libmaus::parallel::DummyThreadWorkPackage::DummyThreadWorkPackage()':
test/testthreadpool.cpp:41: error: no matching function for call to 'std::shared_ptr<libmaus::parallel::PosixMutex>::shared_ptr(int)'
/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/bits/shared_ptr.h:1263: note: candidates are: std::shared_ptr<_Tp>::shared_ptr(std::shared_ptr<_Tp>&&) [with _Tp = libmaus::parallel::PosixMutex]
/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/bits/shared_ptr.h:1238: note: std::shared_ptr<_Tp>::shared_ptr() [with _Tp = libmaus::parallel::PosixMutex]
/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/bits/shared_ptr.h:1236: note: std::shared_ptr<libmaus::parallel::PosixMutex>::shared_ptr(const std::shared_ptr<libmaus::parallel::PosixMutex>&)
make[2]: *** [testthreadpool.o] Error 1
make[2]: *** Waiting for unfinished jobs....
mv -f .deps/testbammergecoordinate.Tpo .deps/testbammergecoordinate.Po
mv -f .deps/testvalidatebamindex.Tpo .deps/testvalidatebamindex.Po
mv -f .deps/testbammergequeryname.Tpo .deps/testbammergequeryname.Po
mv -f .deps/testbammergecollate-testbammergecollate.Tpo .deps/testbammergecollate-testbammergecollate.Po
mv -f .deps/testbamcat.Tpo .deps/testbamcat.Po
make[2]: Leaving directory `/usr/local/pcap1/PCAP-core-master/install_tmp/libmaus/src'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/usr/local/pcap1/PCAP-core-master/install_tmp/libmaus'
make: *** [all] Error 2

Any ideas for solving the problem would be welcome.

Thanks
