
hla-polysolver's Introduction

This is NOT the distribution site for Polysolver software. This is a modification of v1.0 made for pipeline incorporation. If you're looking for Polysolver, please see:

http://archive.broadinstitute.org/cancer/cga/polysolver

Shukla SA, Rooney MS, Rajasagi M, Tiao G, Dixon PM, Lawrence MS, Stevens J, Lane WJ, Dellagatta JL, Steelman S, Sougnez C, Cibulskis K, Kiezun A, Hacohen N, Brusic V, Wu CJ, Getz G. Comprehensive analysis of cancer-associated somatic mutations in class I HLA genes. Nat Biotechnol. 2015 Nov;33(11):1152-8. PubMed PMID: 26372948; PubMed Central PMCID: PMC4747795.

https://www.ncbi.nlm.nih.gov/pubmed/26372948

⚠ Please note that this fork of polysolver does not produce exactly the same HLA haplotype calls as the example file included with polysolver. I observed one difference in the haplotype, but I think it was in one of the more minor subtypes of one of the haplotypes.

The most current version of polysolver I know of is available through Docker at https://hub.docker.com/r/sachet/polysolver/tags/ (these images have been maintained by the first author of Polysolver). I recommend using it if at all possible.

If you are using this fork, hla-polysolver, as part of a pipeline, be sure to cite the above Polysolver software and publication. This is a modification under Polysolver's "BSD-style License" with the purpose of maintaining a stable platform for supplying the files required to work with the LOHHLA pipeline (McGranahan N., et al. https://doi.org/10.1016/j.cell.2017.10.001) and other pipelines. The mutual requirements include legacy versions of dependencies that are not described in the manuals, so this hla-polysolver fork is intended to provide a clear connection to those undocumented dependencies and ease their deployment in conda and docker environments. This fork was generated from Polysolver v1.0 and is not affiliated with Polysolver or the Broad Institute. Please see LICENSE for the particular requirements, respect the license requirements of the dependencies, and make adjustments or update genomes/annotations as necessary. My work on this fork is free to use, reuse, and modify under the Apache 2.0 license, but please pay attention to the included LICENSE notice for Polysolver and its dependencies: there are a lot of boutique licenses to pay attention to, including Broad licenses and the commercial Novoalign license.

hla-polysolver

Changes in hla-polysolver 1.0.0 from polysolver v1.0
  • Added a build recipe for conda
  • Removed absolute path references that break the run
  • Used the install step of build.sh so that environment variables are set automatically when run within Conda
  • Reduced use of environment variables
  • Removed the Novoalign index from the source code. It is now pre-built when building the conda environment and is included along with other necessary data in the conda environment. If you are doing a local run and not using Conda, see build.sh for how to create this data file.
  • Changed from hardcoded perl and bash to whichever the user has installed, via /usr/bin/env. This is better for non-conda runs and has the added benefit of using Conda's perl when running.
  • Installed the shell scripts for HLA typing, mutation calling, and annotation in the Conda environment so they are in the PATH of the Conda environment.
  • Cleaned up the command calls and piping, which allows running the installed scripts from outside of the source directory.
  • Removed hardcoded author paths
  • Hardcoded the temporary directory to /tmp. Not a great thing, but it should work on most Linux systems, and I plan to fix this soon.
  • Added old picard tools dependency (likely what polysolver referred to as GATK)
  • Updated data to include the fastas necessary to complete the mutation calling pipeline (part 2 of polysolver)

Running in Conda

hla-polysolver is mainly intended for use by adding it to a conda environment.

You can do this by setting your ~/.condarc file to include

channels:
  - defaults
  - bioconda
  - conda-forge
  - vacation

Then you can install the usual conda way. I recommend creating a separate environment for it because of its many ancient dependencies. You probably don't want them in your everyday working environment.

$ conda create -n polysolver -c vacation hla-polysolver

Then, after the environment is created, you can activate it.

$ source activate polysolver

After activating the environment you need to set the perl5 path (thanks smangul1).

(polysolver)$ export PERL5LIB="$CONDA_PREFIX/lib/perl5/5.22.0/"
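If the Perl bundled in your environment is not version 5.22.0, the path above will not exist. The following is a minimal sketch of a way around that, assuming the version directories sit directly under $CONDA_PREFIX/lib/perl5 as in the command above. The helper name find_perl5lib is hypothetical and is not part of hla-polysolver.

```shell
# Sketch: locate the perl5 version directory inside a conda prefix instead of
# hardcoding 5.22.0. find_perl5lib is a hypothetical helper, not part of
# hla-polysolver. If several 5.* directories exist, the first match is used.
find_perl5lib() {
    prefix=$1
    for dir in "$prefix"/lib/perl5/5.*; do
        if [ -d "$dir" ]; then
            printf '%s/\n' "$dir"
            return 0
        fi
    done
    echo "no perl5 version directory under $prefix/lib/perl5" >&2
    return 1
}

# Only export PERL5LIB when running inside an activated conda environment
# and a version directory was actually found.
if [ -n "${CONDA_PREFIX:-}" ]; then
    if perl_lib=$(find_perl5lib "$CONDA_PREFIX"); then
        export PERL5LIB="$perl_lib"
    fi
fi
```

Run this after activating the environment. If no directory is found, PERL5LIB is left untouched and a message is printed to standard error.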

And then you can run polysolver as described in the testing description. If you clone the GitHub repository you will have access to the testing data.

(polysolver)$ git clone https://github.com/jason-weirather/hla-polysolver.git

And you can run the test.

(polysolver)$ shell_call_hla_type hla-polysolver/test/test.bam Unknown 1 hg19 STDFQ 0 output
(polysolver)$ shell_call_hla_mutations_from_type hla-polysolver/test/test.bam hla-polysolver/test/test.tumor.bam output/winners.hla.txt hg19 STDFQ output
(polysolver)$ shell_annotate_hla_mutations indiv output

This will produce results in a folder called output.

Building the Conda Environment yourself

Some requirements for building the Linux conda environment did not seem to be picked up from conda. They revolved around the Strelka caller and possibly its vcftools. You may need to have these in the environment in order to build:

  • zlib1g-dev
  • g++

TABLE OF CONTENTS

  1. Description
     1.1 POLYSOLVER
     1.2 POLYSOLVER-based mutation detection
     1.3 Annotation of mutations
  2. Installation
  3. Testing
     3.1 POLYSOLVER
     3.2 POLYSOLVER-based mutation detection
     3.3 Annotation of mutations
  4. Running
     4.1 POLYSOLVER
     4.2 POLYSOLVER-based mutation detection
     4.3 Annotation of mutations

1. Description

This software package consists of 3 main tools:

1.1 POLYSOLVER (POLYmorphic loci reSOLVER)

This tool can be used for HLA typing based on an input exome BAM file. It currently infers alleles for the three major MHC class I genes (HLA-A, -B, -C).

Script: shell_call_hla_type

Input parameters:

-bam: path to the BAM file to be used for HLA typing
-race: ethnicity of the individual (Caucasian, Black, Asian or Unknown)
-includeFreq: flag indicating whether population-level allele frequencies should be used as priors (0 or 1)
-build: reference genome used in the BAM file (hg18 or hg19)
-format: fastq format (STDFQ, ILMFQ, ILM1.8 or SLXFQ; see Novoalign documentation)
-insertCalc: flag indicating whether empirical insert size distribution should be used in the model (0 or 1)
-outDir: output directory
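Because the script takes its parameters positionally, a misplaced value is easy to miss. The sketch below checks the seven values against the allowed settings listed above before anything is launched. The function name validate_hla_type_args is hypothetical, and the argument order is assumed from the test command (bam, race, includeFreq, build, format, insertCalc, outDir).

```shell
# Sketch: validate the positional arguments for shell_call_hla_type.
# validate_hla_type_args is a hypothetical helper, not part of the package.
# Assumed order: bam race includeFreq build format insertCalc outDir
validate_hla_type_args() {
    if [ "$#" -ne 7 ]; then
        echo "expected 7 arguments, got $#" >&2
        return 1
    fi
    bam=$1; race=$2; include_freq=$3; build=$4; format=$5; insert_calc=$6

    case $race in
        Caucasian|Black|Asian|Unknown) ;;
        *) echo "invalid race: $race" >&2; return 1 ;;
    esac
    case $include_freq in
        0|1) ;;
        *) echo "includeFreq must be 0 or 1" >&2; return 1 ;;
    esac
    case $build in
        hg18|hg19) ;;
        *) echo "build must be hg18 or hg19" >&2; return 1 ;;
    esac
    case $format in
        STDFQ|ILMFQ|ILM1.8|SLXFQ) ;;
        *) echo "invalid fastq format: $format" >&2; return 1 ;;
    esac
    case $insert_calc in
        0|1) ;;
        *) echo "insertCalc must be 0 or 1" >&2; return 1 ;;
    esac
    if [ ! -f "$bam" ]; then
        echo "BAM file not found: $bam" >&2
        return 1
    fi
    return 0
}
```

For example, `validate_hla_type_args test/test.bam Unknown 1 hg19 STDFQ 0 output && shell_call_hla_type test/test.bam Unknown 1 hg19 STDFQ 0 output` only starts the typing run when every value passes.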

Output:

winners.hla.txt: file containing the two inferred alleles for each of HLA-A, HLA-B and HLA-C
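As a rough illustration of how this file can be consumed downstream, the sketch below prints one line per locus and marks whether the two reported alleles are identical. It assumes each line holds three whitespace-separated fields (locus, first allele, second allele). The function name summarize_winners is hypothetical.

```shell
# Sketch: summarize a winners.hla.txt file.
# Assumes whitespace-separated lines of the form:
#   HLA-A  hla_a_24_02_01_01  hla_a_24_02_01_01
# summarize_winners is a hypothetical helper, not part of the package.
summarize_winners() {
    awk 'NF >= 3 {
        # Mark loci where both reported alleles are the same string.
        status = ($2 == $3) ? "identical" : "distinct"
        printf "%s\t%s\t%s\t%s\n", $1, $2, $3, status
    }' "$1"
}
```

Calling `summarize_winners output/winners.hla.txt` writes a four-column, tab-separated table to standard output. Lines with fewer than three fields are skipped.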

1.2 POLYSOLVER-based mutation detection

This tool works on a tumor/normal pair of exome BAM files and infers mutations in the tumor file. It assumes that POLYSOLVER has already been run on the normal BAM.

Script: shell_call_hla_mutations_from_type

Input parameters:

-normal_bam_hla: path to the normal BAM file
-tumor_bam_hla: path to the tumor BAM file
-hla: inferred HLA allele file from POLYSOLVER (winners.hla.txt or winners.hla.nofreq.txt)
-build: reference genome used in the BAM file (hg18 or hg19)
-format: fastq format (STDFQ, ILMFQ, ILM1.8 or SLXFQ; see Novoalign documentation)
-outDir: output directory

Output:

call_stats.$allele.out: Mutect output for each inferred allele in winners.hla.txt
$allele.all.somatic.indels.vcf: Strelka output for each inferred allele in winners.hla.txt
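Since one pair of files is written per inferred allele, a quick way to confirm that a run completed is to derive the expected file names from winners.hla.txt and look for them in the output directory. The sketch below does this under the assumption that winners.hla.txt has three whitespace-separated fields per line. Both function names are hypothetical.

```shell
# Sketch: list the per-allele output files expected from the mutation
# detection step, and report any that are missing.
# Both helpers are hypothetical and not part of the package.
list_expected_outputs() {
    # Fields 2 and 3 hold the alleles; sort -u drops duplicated alleles.
    awk 'NF >= 3 { print $2; print $3 }' "$1" | sort -u |
    while read -r allele; do
        printf 'call_stats.%s.out\n' "$allele"
        printf '%s.all.somatic.indels.vcf\n' "$allele"
    done
}

report_missing_outputs() {
    winners_file=$1
    out_dir=$2
    missing=0
    # Allele-derived names contain no whitespace, so word splitting is safe.
    for name in $(list_expected_outputs "$winners_file"); do
        if [ ! -e "$out_dir/$name" ]; then
            echo "missing: $out_dir/$name"
            missing=1
        fi
    done
    return "$missing"
}
```

`report_missing_outputs output/winners.hla.txt output` returns a non-zero status and names each absent file when the run did not produce everything.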

1.3 Annotation of mutations

This tool annotates the predicted mutations from 1.2 with gene compartment and amino acid change information.

Script: shell_annotate_hla_mutations

Input parameters:

-indiv: individual ID, used as prefix for output files
-dir: directory containing the raw call files (Mutect: call_stats*, Strelka: *all.somatic.indels.vcf). Also the output directory

Output:

(a). Mutect

$indiv.mutect.unfiltered.nonsyn.annotated - list of all unfiltered mutations
$indiv.mutect.filtered.nonsyn.annotated - list of cleaned non-synonymous mutations
$indiv.mutect.filtered.syn.annotated - list of cleaned synonymous changes
$indiv.mutect.ambiguous.annotated - list of ambiguous calls. This will generally be empty (save for the header). It will be populated if the same mutation (e.g. p.A319E) is found in two or more alleles in the individual, with the same allele fractions. In such cases one allele is randomly chosen and included in the .nonsyn.annotated file, while the complete list of alleles is given in the .ambiguous.annotated file. If the ethnicity of the individual is known, an alternate method would be to pick the allele with the highest frequency.

(b). Strelka

$indiv.mutect.unfiltered.nonsyn.annotated - list of all unfiltered indels (as detected by Strelka)
$indiv.strelka_indels.filtered.annotated - list of cleaned indels (as detected by Strelka)
$indiv.strelka_indels.ambiguous.annotated - see the description of $indiv.mutect.ambiguous.annotated in (a) above

2. Installation

The POLYSOLVER suite of tools depends upon the following packages and utilities:

Samtools (http://samtools.sourceforge.net/)
GATK (https://www.broadinstitute.org/gatk/download)
Novoalign (http://www.novocraft.com/main/downloadpage.php)
Perl modules (http://www.cpan.org/modules/INSTALL.html)

Also make changes to the config.sh file to set up the following environment variables:

-PSHOME: POLYSOLVER home directory
-SAMTOOLS_DIR: directory containing the samtools executable
-JAVA_DIR: directory containing the JAVA executable
-NOVOALIGN_DIR: directory containing the Novoalign executables
-GATK_DIR: directory containing the GATK jar files
-MUTECT_DIR: directory containing the Mutect executable (for POLYSOLVER-based mutation detection only)
-STRELKA_DIR: directory containing Strelka (for POLYSOLVER-based mutation detection only)

The following command should make the necessary changes prior to running the tools (assuming the tcsh shell):

source scripts/config.sh
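After sourcing the configuration it can help to confirm that every variable is set and points at a real directory. The sketch below is one way to do that. It is written for a POSIX shell such as bash rather than tcsh, and the function name check_polysolver_env is hypothetical. MUTECT_DIR and STRELKA_DIR are only needed for mutation detection, so a failure on those two can be ignored when only HLA typing is run.

```shell
# Sketch: verify that the environment variables listed above are set and
# point at existing directories. check_polysolver_env is a hypothetical
# helper, not part of the package. Written for POSIX sh/bash, not tcsh.
check_polysolver_env() {
    status=0
    for var in PSHOME SAMTOOLS_DIR JAVA_DIR NOVOALIGN_DIR GATK_DIR \
               MUTECT_DIR STRELKA_DIR; do
        # Indirect lookup of the variable named in $var.
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "unset: $var" >&2
            status=1
        elif [ ! -d "$value" ]; then
            echo "not a directory: $var=$value" >&2
            status=1
        fi
    done
    return "$status"
}
```

Every problem is reported before the function returns, so a single call shows all variables that still need attention.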

3. Testing

Your installation can be tested by running the following commands from $PSHOME:

3.1 POLYSOLVER

scripts/shell_call_hla_type test/test.bam Unknown 1 hg19 STDFQ 0 test

If successful, the following command should not yield any differences:

diff test/winners.hla.txt test/orig.winners.hla.txt

3.2 POLYSOLVER-based mutation detection

scripts/shell_call_hla_mutations_from_type test/test.bam test/test.tumor.bam test/winners.hla.txt hg19 STDFQ test

If successful, the following command should not yield any differences:

diff test/call_stats.hla_b_39_01_01_02l.out test/orig.call_stats.hla_b_39_01_01_02l.out

3.3 Annotation of mutations

scripts/shell_annotate_hla_mutations indiv test

If successful, the following command should not yield any differences:

diff test/indiv.mutect.filtered.nonsyn.annotated test/orig.indiv.mutect.filtered.nonsyn.annotated
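The three comparisons above follow the same pattern: each reference file is named orig.<name> and sits next to the freshly generated <name>. A minimal sketch that checks every such pair in a directory in one pass is shown below. The function name compare_with_reference is hypothetical, and it assumes the orig. prefix convention holds for all reference files in the directory.

```shell
# Sketch: compare each orig.<name> reference file in a directory against the
# newly generated <name>. compare_with_reference is a hypothetical helper,
# not part of the package.
compare_with_reference() {
    dir=$1
    status=0
    for ref in "$dir"/orig.*; do
        [ -e "$ref" ] || continue      # no reference files present
        name=${ref##*/}                # strip the directory part
        name=${name#orig.}             # strip the orig. prefix
        if [ ! -e "$dir/$name" ]; then
            echo "MISSING $name"
            status=1
        elif cmp -s "$ref" "$dir/$name"; then
            echo "OK $name"
        else
            echo "DIFF $name"
            status=1
        fi
    done
    return "$status"
}
```

Running `compare_with_reference test` from $PSHOME prints one status line per reference file and returns non-zero if any output is missing or differs.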

4. Running

The tools can be run using the following commands:

4.1 POLYSOLVER

$PSHOME/scripts/shell_call_hla_type </path/to/bam> <race> <includeFreq> <build> <format> <insertCalc> </path/to/output_directory>

example:

$PSHOME/scripts/shell_call_hla_type test/test.bam Unknown 1 hg19 STDFQ 0 test

4.2 POLYSOLVER-based mutation detection

$PSHOME/scripts/shell_call_hla_mutations_from_type </path/to/normal_bam> </path/to/tumor_bam> </path/to/winners.hla.txt> <build> <format> </path/to/output_directory>

example:

$PSHOME/scripts/shell_call_hla_mutations_from_type test/test.bam test/test.tumor.bam test/winners.hla.txt hg19 STDFQ test

4.3 Annotation of mutations

$PSHOME/scripts/shell_annotate_hla_mutations <prefix_to_use> </path/to/directory_with_mutation_detection_output>

example:

$PSHOME/scripts/shell_annotate_hla_mutations indiv test
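The three commands are normally run in sequence for a tumor/normal pair, with the winners.hla.txt from the first step feeding the second. The sketch below chains them and stops at the first failure. The function names and the DRY_RUN switch are hypothetical, and the fixed settings (Unknown, 1, hg19, STDFQ, 0) are simply copied from the examples above and should be adjusted to your data.

```shell
# Sketch: chain the three tools for one tumor/normal pair.
# run_step, run_polysolver_pair and DRY_RUN are hypothetical names.
# With DRY_RUN=1 the commands are printed instead of executed.
run_step() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

run_polysolver_pair() {
    normal=$1
    tumor=$2
    indiv=$3
    out=$4
    scripts=${PSHOME:?PSHOME must be set}/scripts

    # Settings below are taken from the examples; change them as needed.
    run_step "$scripts/shell_call_hla_type" \
        "$normal" Unknown 1 hg19 STDFQ 0 "$out" &&
    run_step "$scripts/shell_call_hla_mutations_from_type" \
        "$normal" "$tumor" "$out/winners.hla.txt" hg19 STDFQ "$out" &&
    run_step "$scripts/shell_annotate_hla_mutations" "$indiv" "$out"
}
```

Setting `DRY_RUN=1` before calling `run_polysolver_pair test/test.bam test/test.tumor.bam indiv test` shows the exact commands without running them.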

hla-polysolver's Issues

Getting wrong winner for test bams

Hi! Thank you for the hla-polysolver program! When I ran the test BAM files, here's what I got: winners1 hla_a_01_01_01_01 hla_b_07_02_01 hla_c_01_02_01 for both alleles. I followed exactly the steps in your instructions. Could you suggest what might have gone wrong?

Getting same winner HLAs every time

I get the same alleles hla a_01_01_01, hla b_07_02_01 and hla c_01_02_01 when I run WES bam files from different individuals. When I run the test.bam, only hla c second winner did not match with the original winners. Any help would be greatly appreciated. Thanks

HLAtyping))chr6region.1.fastq with 0 size

I ran "shell_call_hla_type" for HLA typing. There were no errors in the log, and the file named "winners.hla.txt" was created successfully.

winners.hla.txt : HLA-A hla_a_24_02_01_01 hla_a_33_03_01 HLA-B hla_b_35_01_01_01 hla_b_44_03_01 HLA-C hla_c_03_03_01 hla_c_14_03

However, because chr6region.1.fastq and chr6region.2.fastq file were created with no contents, I have doubts about my result.
Is it okay that these files are created with no contents?

Thank you for developing this tool.

strelka is not well tested

It doesn't look like strelka is covered in the test cases included with polysolver. We don't have any current errors in it, but it was not a straightforward build, since vcftools never seemed to get where it was supposed to go on its own, and we are running off the bioconda perl vcftools.

HLA fasta file for hg38

Hello,

I am interested in running polysolver on data aligned to hg38. Is abc_complete.fasta applicable to hg38? If not, could you please advise how to extract the equivalent HLA reference file from hg38?

Thank you!

Error when running

Hi Jason,

Thanks for doing a great job making a USABLE version of the software.

I followed your instructions. Unfortunately, I am still getting an error:

"perl: symbol lookup error: /PHShome/sv188/perl5/lib/perl5/x86_64-linux-thread-multi/auto/List/MoreUtils/XS/XS.so: undefined symbol: Perl_Istack_sp_ptr"

"perl: symbol lookup error: /PHShome/sv188/perl5/lib/perl5/x86_64-linux-thread-multi/auto/List/Util/Util.so: undefined symbol: Perl_Istack_sp_ptr
"

I am not familiar with perl, so I was unable to figure out the error. I was just wondering if you know what could cause the problem.

Thanks,
Serghei

chr region to bam file

Hi. I'm trying to run polysolver on bam files already aligned by someone else.

I'm getting the following error:
[bam_parse_region] fail to determine the sequence name.
[main_samview] region "6:29941260-29945884" specifies an unknown reference name. Continue anyway.
[bam_parse_region] fail to determine the sequence name.
[main_samview] region "6:31353872-31357187" specifies an unknown reference name. Continue anyway.
[bam_parse_region] fail to determine the sequence name.
[main_samview] region "6:31268749-31272105" specifies an unknown reference name. Continue anyway.
[samopen] SAM header is present: 194 sequences.
[sam_read1] reference 'SN:chrUn_GL000218v1 LN:161147

Which I'm guessing is arising from the bam files having the "chr" prefix and the region being extracted not. Is there a setting to deal with this? I know it was aligned to hg38 for sure.

Failure in test for difference between orig.winners.hla.txt and winners.hla.txt

Hi,
I want to use polysolver for HLA typing. Thank you very much for developing the conda version of polysolver! (I am not a root user, so I cannot install it using docker as suggested by the Broad Institute.) I installed polysolver following your guide, and all steps seemed to work well. But when I test on test.bam, there is a difference between my output and the standard output.
(polysolver) [shiyang@statcomp hla-polysolver-master]$ diff output/winners.hla.txt test/orig.winners.hla.txt
2,3c2,3
< HLA-B hla_b_39_01_01_02l hla_b_39_01_01_03
< HLA-C hla_c_07_01_05 hla_c_07_01_05
---
> HLA-B hla_b_39_01_01_02l hla_b_39_01_01_02l
> HLA-C hla_c_07_01_05 hla_c_06_02_01_01
And below is a section of my standard output that seems wrong when I run test script:
[samopen] SAM header is present: 6597 sequences.
Tue Jul 10 22:05:17 CST 2018
get first winners
Tue Jul 10 22:05:17 CST 2018
rm: cannot remove 'output/counts1.R0k6': No such file or directory
Tue Jul 10 22:05:26 CST 2018
calculating lik2
winners1 hla_a_24_02_01_01 hla_b_39_01_01_02l hla_c_07_01_05
Tue Jul 10 22:05:28 CST 2018
get second winners
rm: cannot remove 'output/counts2.R0k6': No such file or directory
winners1 hla_a_24_02_01_01 hla_b_39_01_01_02l hla_c_07_01_05
winners2 hla_a_24_02_01_01 hla_b_39_01_01_03 hla_c_07_01_05
cleanup
Do you know what causes this discrepancy? Could you kindly give me some advice?
Thanks!

Yang

Call Stats file not produced

Hi, I'm trying to run the test files given as part of the repo. The HLA alleles get typed right but after that when I run shell_call_hla_mutations_from_type it does not return the call_stats file as expected. Any help to workaround this issue would be appreciated.

HLA typing testing discordance

Hi Jason,

First of all, thank you for creating this great conda environment.

I followed the instructions and tried testing with the test.bam included in the polysolver package.
However, I obtained a different result, as follows.

winners1 hla_a_24_02_01_01 hla_b_39_01_01_02l hla_c_07_01_05
winners2 hla_a_24_02_01_01 hla_b_39_01_01_03 hla_c_06_02_08

HLA-A alleles are okay, but one of the HLA-B and HLA-C alleles does not match the ones from "orig.winners.hla.txt".

Have you tested this testing functionality and did your result match?

I appreciate your help!

Won-Chul
