
isown's People

Contributors

ikalatskaya, qtrinh


isown's Issues

Error about dbSNP

I have data from 13 lung cancer samples, some of which have no matched normal, and I hope to use this software to find somatic mutations.
When I ran the program, the following error occurred. From the README instructions I did not understand whether I need to re-download dbSNP 142 or use your dbSNP download link:

perl ${ISOWN_HOME}/bin/database_annotation.pl PM2018122708.sofstric.watson.vcf PM2018122708.sofstric.watson.vcf.test.vcf

annotating input file with ANNOVAR ...NOTICE: Output files were written to PM2018122708.sofstric.watson.vcf.test.vcf.temp.annovar.vcf.temp.convert2annovar.variant_function, PM2018122708.sofstric.watson.vcf.test.vcf.temp.annovar.vcf.temp.convert2annovar.exonic_variant_function
NOTICE: Reading gene annotation from /gpfs/home//software/ISOWN/bin/../external_tools/annovar_2012-03-08/humandb/hg19_refGene.txt ... Done with 52068 transcripts (including 11837 without coding sequence annotation) for 26464 unique genes
NOTICE: Processing next batch with 2995 unique variants in 2995 input lines
NOTICE: Reading FASTA sequences from /gpfs/home/zhaohongqiang/software/ISOWN/bin/../external_tools/annovar_2012-03-08/humandb/hg19_refGeneMrna.fa ... Done with 1579 sequences
WARNING: A total of 356 sequences will be ignored due to lack of correct ORF annotation

The dbSNP 142 file is not found. Please correct the path in /gpfs/home/software/ISOWN//bin/database_annotation.pl and try again - see path below:

    /gpfs/home/software/ISOWN//bin/../external_databases/dbSNP142_All_20141124.vcf.gz.modified.vcf.gz

core dumped

When I use ISOWN to analyze tumor-only somatic data, it is interrupted at step 1. Checking the cause, I found that it breaks when running qpipeline tabix:

command: /software/ISOWN/ISOWN/bin/qpipeline tabix -m 2020 -d /software/ISOWN/ISOWN/bin/../external_databases/dbSNP142_All_20141124.vcf.gz.modified.vcf.gz -A -E -p dbSNP142_All_20141124 -i HCC1143_anno.vcf.temp.annovar.vcf -f /database/hg19/ucsc.hg19.fasta > HCC1143_anno.vcf.temp.dbSNP.vcf
error: Segmentation fault (core dumped)

How can I solve this error?
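As a first sanity check (a minimal sketch; the path is copied verbatim from the error message above), one could confirm that the file the script expects is actually present, together with the tabix index it needs:

    use strict;
    use warnings;

    # Minimal sketch: verify the dbSNP file database_annotation.pl expects
    # exists; the matching .tbi index must sit next to it for tabix to work.
    my $db = "/gpfs/home/software/ISOWN//bin/../external_databases/"
           . "dbSNP142_All_20141124.vcf.gz.modified.vcf.gz";
    for my $f ($db, "$db.tbi") {
        print((-e $f ? "found" : "missing"), ": $f\n");
    }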

The dbSNP 142 file is not found.

Dear sir,
Thanks for your good job!
I have just finished installing the pipeline, but when I run ISOWN with the test data you provide, there is an error:
annotating input file with ANNOVAR ...

The dbSNP 142 file is not found. Please correct the path in /software/pipeline/somatic_call/ISOWN/ISOWN/bin/database_annotation.pl and try again - see path below:

/software/pipeline/somatic_call/ISOWN/ISOWN/bin/../external_databases/dbSNP142_All_20141124.vcf.gz.modified.vcf.gz

I am sure I have followed your instructions exactly, and I have tried downloading dbSNP and reformatting it again,
but I still cannot find dbSNP142_All_20141124.vcf.gz.modified.vcf.gz.

Any help is appreciated.

Best wishes,
Shang

Some problems

Hi

When I run the program, an error occurs. Could you tell me why?

qpipeline tabix -m 2021 -d /mnt/sdb/ISOWN1/ISOWN/external_databases/COSMIC_v69.vcf.gz -A -E -p COSMIC_v69 -i test.vcf.temp.annovar.vcf -f /mnt/sdb/ISOWN1/ISOWN/external_databases/hg19_random/genome.fa > db.vcf
Segmentation fault

Mutation assessor link is dead

Hello,

I am trying to set up all the dependencies. The link for downloading the Mutation Assessor scores is not working. I downloaded the rel3 version from http://mutationassessor.org/r3/MA_scores_rel3_hg19_full.tar.gz, but the Perl script did not work with this version. Can you please fix this issue?

Best,
Bekir

Error in cosmic_format_index.pl with absolute paths

Hello, I've been trying to run this Perl program and I've found an error when running it with absolute paths.

This is the command I've performed:

perl /home/idibell/opt/ISOWN/bin/cosmic_format_index.pl ~/opt/ISOWN/external_databases/CosmicCodingMuts.vcf.gz ~/opt/ISOWN/external_databases/CosmicNonCodingVariants.vcf.gz

The error appears at the combine-headers step and is a "file not found" error; this is the error trace:

"Combine headers ...cat: **_/home/idibell/opt/ISOWN/external_databases/CosmicCodingMuts.vcf.gz.header:** No existe el fichero o el directorio cat: _/home/idibell/opt/ISOWN/external_databases/CosmicNonCodingVariants.vcf.gz.header: No existe el fichero o el directorio "

I've checked the code, and the origin of the error is the zcat header-extraction step: an underscore is prepended to the beginning of the file location, which crashes when the file is given as an absolute path:

printf "\nExtract headers from ${CODING_FILE} and ${NON_CODING_FILE} ...";
system "zcat $CODING_FILE | head -500 | grep \"^#\" | grep -v CHROM > _${CODING_FILE}.header";
system "zcat $NON_CODING_FILE | head -500 | grep \"^#\" > _${NON_CODING_FILE}.header";
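
A possible fix (a sketch, not an official patch) is to prepend the underscore to the file name only, keeping any directory part intact, e.g. with File::Basename:

    use File::Basename;

    # Build each temporary header file next to its input, prefixing only
    # the file name with "_" so absolute paths remain valid.
    my $coding_header = dirname($CODING_FILE) . "/_" . basename($CODING_FILE) . ".header";
    system "zcat $CODING_FILE | head -500 | grep \"^#\" | grep -v CHROM > $coding_header";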

Regards,
Luis.

Error while reformatting and indexing files

ISOWN]$ perl bin/exac_format_index.pl ExAC.r0.3.1.sites.vep.vcf.gz ExAC.r0.3.1.database.vcf

Reformat ExAC.r0.3.1.sites.vep.vcf.gz ...
gzip: ExAC.r0.3.1.sites.vep.vcf.gz: invalid compressed data--crc error

gzip: ExAC.r0.3.1.sites.vep.vcf.gz: invalid compressed data--length error

qpipeline segmentation fault

I'm trying to use ISOWN for somatic/germline prediction.

After installation, when I tested it, I got this error message. (I erased the absolute paths of files.)

$cwd/qpipeline tabix -m 2020 -d 00-All.modified.vcf.gz -A -E -p dbSNP142_All_20141124 -i test1 -f hg19_random.fa > test2
sh: line 1: 11178 Segmentation fault (core dumped) $cwd/qpipeline tabix -m 2020 -d 00-All.modified.vcf.gz -A -E -p dbSNP142_All_20141124 -i test1 -f hg19_random.fa > test2

I think there might be an invalid memory access in the qpipeline code, but I cannot identify it because qpipeline is provided only as an executable. Could you check this?

Segmentation fault when computing flanking regions

I ran into a segmentation fault when calculating flanking regions:

ISOWN/bin/qpipeline_internal tabix -m 9555 -i somatic.vcf.temp.sequence.context.vcf -f hg19_random.fa

The file somatic.vcf.temp.sequence.context.vcf seems OK.

If I try to skip this step and run directly
ISOWN/bin/qpipeline_internal tabix -m 9503 -i somatic.vcf.temp.sequence.context.vcf
the resulting VCF contains no variants, so it seems that calculating flanking regions is unavoidable.

Solution #3 (ulimit -s 65536) helped only partially: without it, even the -v option didn't output any variants.

Attached are the first lines of my somatic.vcf.temp.sequence.context.vcf and the output of the tabix -m 9555 command with the -v option.

somatic_test.vcf.temp.sequence.context.vcf.txt
v_command_output.txt

Can't run classifier.

Dear Author,

I got the message as shown below when trying to run the final step run_isown.pl:

perl /workplace/Software/ISOWN/bin/run_isown.pl 181023001/ 181023001/181023001.isown.txt "-trainingSet /workplace/Software/ISOWN/training_data/UCEC_100_TrainSet.arff -sanityCheck false -classifier nbc"


Reformat files in '181023001' to emaf ...

WARNING: 18 variants with unknown annotation were removed
Total number of variants after filtering 3770

Running prediction using file '181023001/181023001.isown.txt.emaf' ...

...
Your working directory is 181023001
...
This file was chosen for classifier training: /workplace/Software/ISOWN/training_data/UCEC_100_TrainSet.arff
...
Total number of samples in your set is 1
...
Number of loaded nonsilent coding variants in test set is 808
...


Naive Bayes Classifier:
Option: supervised discretization (SD) is true
10-fold cross-validation


F1-measure: 98.12%.
Recall: 97.817%.
Precision: 98.425%.
False positive rate: 1.565%.
AUC: 99.77%.


Can't run classifier.
java.io.IOException: nominal value not declared in header, read Token[null], line 19
at weka.core.converters.ArffLoader$ArffReader.errorMessage(ArffLoader.java:240)
at weka.core.converters.ArffLoader$ArffReader.getInstanceFull(ArffLoader.java:578)
at weka.core.converters.ArffLoader$ArffReader.getInstance(ArffLoader.java:423)
at weka.core.converters.ArffLoader$ArffReader.readInstance(ArffLoader.java:391)
at weka.core.converters.ArffLoader$ArffReader.readInstance(ArffLoader.java:376)
at weka.core.converters.ArffLoader$ArffReader.<init>(ArffLoader.java:138)
at weka.core.Instances.<init>(Instances.java:126)
at main.Prediction.runClassifier(Prediction.java:233)
at main.runISOWN.main(runISOWN.java:90)

...
Total number of predicted somatic mutations 0
Final results are saved here: 181023001/181023001.isown.txt
...

Done

Interestingly, I got no errors running database_annotation.pl and run_isown.pl with the two VCF files provided in the test_data/ directory ...

I googled "nominal value not declared in header" and some said it has something to do with Weka, so I checked:

java -jar /workplace/Software/ISOWN/bin/weka.jar

Exception in thread "main" java.lang.ExceptionInInitializerError
Caused by: java.awt.HeadlessException:
No X11 DISPLAY variable was set, but this program performed an operation which requires it.
at java.awt.GraphicsEnvironment.checkHeadless(GraphicsEnvironment.java:204)
at java.awt.Window.<init>(Window.java:536)
at java.awt.Frame.<init>(Frame.java:420)
at javax.swing.JFrame.<init>(JFrame.java:233)
at weka.gui.LogWindow.<init>(LogWindow.java:252)
at weka.gui.GUIChooser.<clinit>(GUIChooser.java:215)

So did I miss anything? By the way, the weka.jar was already in the bin/ directory when I installed ISOWN, so I did not replace it, since check_dependencies.pl said everything was installed.
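
For what it's worth, a quick scan of the generated ARFF file for data values that a nominal attribute's header does not declare can locate the offending token. A minimal sketch (the script name and column index are hypothetical; quoted values and sparse ARFF are not handled):

    use strict;
    use warnings;

    # Sketch: report data values in ARFF column COLUMN (0-based) that are
    # not declared in that attribute's nominal specification in the header.
    # Usage: perl check_arff.pl file.arff COLUMN
    my ($file, $col) = @ARGV;
    my (%declared, $attr_idx);
    open my $fh, '<', $file or die "cannot open $file: $!";
    while (<$fh>) {
        chomp;
        if (/^\@attribute\s+\S+\s+(.*)/i) {
            my $spec = $1;
            $attr_idx++;
            if ($attr_idx == $col + 1 and $spec =~ /^\{(.*)\}/) {
                for my $v (split /,/, $1) {
                    $v =~ s/^\s+|\s+$//g;   # trim whitespace around each value
                    $declared{$v} = 1;
                }
            }
        } elsif (/^\@data/i) {
            last;
        }
    }
    while (<$fh>) {
        chomp;
        next if /^\s*$/ or /^%/;            # skip blank and comment lines
        my @f = split /,/;
        print "line $.: undeclared value '$f[$col]'\n"
            if %declared and defined $f[$col] and not $declared{ $f[$col] };
    }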

Thank you very much!

Pang

Mutation Assessor file not available

I found an issue and am trying to fix it. Please let me know if I am right.

MA.scores.hg19.tar.bz2 is not available anymore. They have released MA_scores_rel3_hg19_full.tar (rel 3). After unpacking, there are many CSV files under the folder MA_scores_rel3_hg19_full.

Then:
Reformat and index Mutation Assessor by using the following script:
perl ${ISOWN_HOME}/bin/mutation-assessor_format_index_vcf.pl MA_scores_rel3_hg19_full 2013_12_11_MA.vcf

Does your Perl code work with the new release? (I have never used Perl before.)

Thanks
Sasi

dependencies file missing

Dear Q, the link to the dependencies file is broken. I am also having a problem in which the ANNOVAR annotations (variant function and exonic variant function) are not added to the output file. I was wondering whether it is because my ANNOVAR version is newer than the 2012 one.

Mutation Assessor file missing

Hi,
After downloading Mutation Assessor and uncompressing it, I find that all the files are in CSV format.

wget http://mutationassessor.org/r2/MA.scores.hg19.tar.bz2 --no-passive-ftp

I cannot continue with the following step; is there any other link I can use?

perl ${ISOWN_HOME}/bin/mutation-assessor_format_index_vcf.pl MA.hg19 2013_12_11_MA.vcf  

Thank you.

qpipeline source code needed

My OS does not support glibc 2.14, and we cannot update glibc to version 2.14 or later. I found that you have modified qpipeline on GitHub specifically for ISOWN and compiled it under glibc 2.14 or later, so I have no way to run it.
Could you please share a copy of the qpipeline source code, or recompile a version under GLIBC_2.12?

Can we use ISOWN for oral cancer data?

I am working on RNA-seq data from oral cancer patients. Can I use ISOWN for somatic mutation detection with the VCF files obtained from this RNA-seq data?

Required databases

Hello,

Is it possible to omit any of the databases from the list of required ones (COSMIC, dbSNP, ExAC, PolyPhen WHESS, Mutation Assessor)? In my research only the first three are needed.

Thank you.

Documentation of external_tools

Hi,

when running your command I received the following error:

annotating input file with ANNOVAR ...Can't open perl script "~/software/ISOWN/bin/../external_tools/annovar_2012-03-08/annotate_variation.pl": No such file or directory

I assume I need to download ANNOVAR and put it in that directory, but it might be good to document that, or to add ANNOVAR to the list of dependencies in the README.

databases in database_annotation.pl do not match README

Hi,

lines 14-19 of database_annotation.pl do not currently match the README docs; for example, ExAC is:

my $ExAC="$cwd/../external_databases/ExAC.r0.3.sites.vep.vcf.20150421.vcf.gz";

but should be:

my $ExAC="$cwd/../external_databases/ExAC.r0.3.1.database.vcf";

if following the example from the README. Maybe these paths should not be hardcoded, or, if they are, the format_index commands should name the output files themselves instead of leaving that to the user? A sketch of an alternative follows.
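
As an illustration only (hypothetical; not the script's current interface), the hardcoded locations could at least be made overridable from the environment, with the current values kept as defaults:

    # Hypothetical: let an environment variable override the hardcoded
    # default so the script and the README cannot silently drift apart.
    # (The defined-or operator // needs Perl 5.10+.)
    my $ExAC = $ENV{ISOWN_EXAC_DB}
        // "$cwd/../external_databases/ExAC.r0.3.1.database.vcf";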

Also, the file ../external_databases/hg19_random.fa (line 14) does not exist and is not mentioned in the docs as far as I can tell. Is this the entire hg19 build assembly?

Testing data

Hi,
There are training data for seven individual cancer types here, but I cannot find the corresponding testing data.
Is there any way I can get or generate them?

Thanks

Error when converting VCF file to emaf

Dear Irina,

I got an error in the third step, when converting VCF files to an emaf file.

I used this command:
perl ${ISOWN_HOME}/bin/run_isown.pl ./annovar_results ./isown_results/test.output.txt " -trainingSet ${ISOWN_HOME}/training_data/BRCA_100_TrainSet.arff -sanityCheck false -classifier nbc"

And I got this error:

Reformat files in '/home/ISOWN/annovar_results' to emaf ...
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
at com.Processing.processVcf(Processing.java:119)
at com.runReformating.main(runReformating.java:39)

As I didn't get any more information about the error, it is difficult to know what we can do.
I would appreciate any feedback.
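
For reference, and only as a guess given how little the trace reveals: one common cause of index errors when parsing VCFs is a data line with missing fields. A minimal structural check (a sketch, not ISOWN code; assumes an uncompressed VCF):

    use strict;
    use warnings;

    # Sketch: flag VCF data lines with fewer than the 8 mandatory
    # tab-separated columns.
    my $vcf = shift @ARGV or die "usage: $0 file.vcf\n";
    open my $fh, '<', $vcf or die "cannot open $vcf: $!";
    while (<$fh>) {
        next if /^#/;                      # skip header lines
        my @cols = split /\t/;
        print "line $.: only ", scalar(@cols), " columns\n" if @cols < 8;
    }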

Thanks,
All the best,
Ibon

Segmentation fault (core dumped)

Dear madam/sir,

I used ISOWN to identify somatic mutations and got the following error message during annotation:

bin/qpipeline tabix -m 2020 -d bin/../external_databases/00-ALL.modified.vcf.gz -A -E -p dbSNP142_All_20141124 -i output/sample.temp.annovar.vcf -f bin/../external_databases/hg19_random.fa > output/sample.temp.dbSNP.vcf
sh: line 1: 36641 Segmentation fault (core dumped) bin/qpipeline tabix -m 2020 -d bin/../external_databases/00-ALL.modified.vcf.gz -A -E -p dbSNP142_All_20141124 -i output/sample.temp.annovar.vcf -f bin/../external_databases/hg19_random.fa > output/sample.temp.dbSNP.vcf

Then I used gdb to debug the core file; here is the result:

$ gdb bin/qpipeline core.36641
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-110.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/wfang/bin/ISOWN/bin/qpipeline...done.
[New LWP 36641]
Core was generated by 'bin/qpipeline tabix -m 2020 -d bin/../external_databases/00-ALL.modified.vcf.gz'.
Program terminated with signal 11, Segmentation fault.
#0 0x00000000004250db in tabix_annotation_against_NCBI_VCF_dbSNP ()
Missing separate debuginfos, use: debuginfo-install zlib-1.2.7-17.el7.x86_64
(gdb)

How can I resolve it?

LU training sets

Hi there, I was wondering whether in the meantime you have made lung cancer training sets? If so, could you please provide them for download?
Cheers

Issue about different reference human assemblies

Hi,

I'm new to this wonderful tool. When I look at 00-All.vcf.gz, I think it uses GRCh38.p7 as the reference assembly, while the other databases use hg19. I'd like to know whether it is right to use the liftOver utility to convert the genome coordinates from GRCh38 to hg19.

ArrayIndexOutOfBoundsException when generating training set

Hello again,

sorry for the insistence. My "16:10 issue" is related to the generation of the training set. I am trying to run generate_training_set on my dataset, and Java fails with this trace:

Reformat files in '/home/idibell/Documentos/pipelines/8.pujana/20180716.pdx.mutect/results/20180725.install.isown.db/2.vcf2' to emaf ...

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
at com.Processing.processVcf(Processing.java:119)
at com.runReformating.main(runReformating.java:39)

Generate training data set for '/home/idibell/Documentos/pipelines/8.pujana/20180716.pdx.mutect/results/20180725.install.isown.db/2.vcf2' ...

Exception in thread "main" java.lang.NullPointerException
at helper.Headers.(Headers.java:41)
at main.Generator.getVariant2samples(Generator.java:243)
at main.Generator.loadVariants(Generator.java:19)
at main.runGenerator.main(runGenerator.java:47)

What am I doing wrong?

My JAVA_HOME variable, by the way, is "/usr/lib/jvm/jre-1.8.0-openjdk-1.8.0.171-8.b10.el7_5.x86_64".

Regards,
Luis.

Tabix fails to index Mutation Assessor VCF file

In the file bin/mutation-assessor_format_index_vcf.pl, change the following line:

system "bgzip $OUTPUT_FILE ; tabix -s 1 -b 2 -e 3 ${OUTPUT_FILE}.gz";

to

system "bgzip $OUTPUT_FILE ; tabix -p vcf ${OUTPUT_FILE}.gz";

The -p vcf preset indexes on chromosome (column 1) and position (column 2), whereas -e 3 points at the ID column of a VCF.

Edit: the tabix command also needs to be changed in the bin/polyphen-whess_format_index.pl file.

Segmentation fault (core dumped) error

Hi, I got the following error when I tried to annotate the test data file provided in the test_data folder. How can I resolve it?

annotating input file with ANNOVAR ...NOTICE: Output files were written to test1data_final.temp.annovar.vcf.temp.convert2annovar.variant_function, test1data_final.temp.annovar.vcf.temp.convert2annovar.exonic_variant_function
NOTICE: Reading gene annotation from /home/akanksha/Documents/ISOWN/bin/../external_tools/annovar_2012-03-08/humandb/hg19_refGene.txt ... Done with 63481 transcripts (including 15216 without coding sequence annotation) for 27720 unique genes
NOTICE: Processing next batch with 40066 unique variants in 40066 input lines
NOTICE: Reading FASTA sequences from /home/akanksha/Documents/ISOWN/bin/../external_tools/annovar_2012-03-08/humandb/hg19_refGeneMrna.fa ... Done with 18834 sequences
WARNING: A total of 405 sequences will be ignored due to lack of correct ORF annotation

/home/akanksha/Documents/ISOWN/bin/qpipeline tabix -m 2020 -d /home/akanksha/Documents/ISOWN/bin/../external_databases/00-All.modified.vcf.gz -A -E -p dbSNP142_All_20141124 -i test1data_final.vcf.temp.annovar.vcf -f /home/akanksha/Documents/ISOWN/bin/../external_databases/hg19_random.fa > test1data_final.vcf.temp.dbSNP.vcf
Segmentation fault (core dumped)

annotating input file with COSMIC ...

/home/akanksha/Documents/ISOWN/bin/qpipeline tabix -m 2020 -d /home/akanksha/Documents/ISOWN/bin/../external_databases/CosmicAll.gz -A -E -p COSMIC_69 -i test1data_final.vcf.temp.dbSNP.vcf -f /home/akanksha/Documents/ISOWN/bin/../external_databases/hg19_random.fa > test1data_final.vcf.temp.cosmic.vcf
Segmentation fault (core dumped)

annotating input file with ExAC ...

/home/akanksha/Documents/ISOWN/bin/qpipeline tabix -m 2020 -d /home/akanksha/Documents/ISOWN/bin/../external_databases/ExAC.r0.3.1.database.vcf.gz -A -E -p ExAC.r0.3_20150421 -i test1data_final.vcf.temp.cosmic.vcf -f /home/akanksha/Documents/ISOWN/bin/../external_databases/hg19_random.fa > test1data_final.vcf.temp.exac.vcf
Segmentation fault (core dumped)

annotating input file with MutationAccessor ...

/home/akanksha/Documents/ISOWN/bin/qpipeline tabix -m 2020 -d /home/akanksha/Documents/ISOWN/bin/../external_databases/2013_12_11_MA.vcf.gz -A -E -p 2013_12_11_MA -i test1data_final.vcf.temp.exac.vcf -f /home/akanksha/Documents/ISOWN/bin/../external_databases/hg19_random.fa > test1data_final.vcf.temp.ma.vcf
Segmentation fault (core dumped)

annotating input file with PolyPhen ...

annotating input file with sequence context ...

calculating flanking region ...Segmentation fault (core dumped)

final reformatting ...Segmentation fault (core dumped)

cleanup: deleting temporary files ( test1data_final.vcf*.temp.* ) ...

failed to load tabix

Hi
I am getting the error "fail to load tabix file" after ANNOVAR runs correctly. The same error occurs with the following databases: dbSNP, ExAC, and so on. Does anyone have experience with this?
Thanks!
