ntm / grexome-timc-secondary

exome pipeline from TIMC - secondary analyses (GVCF to analysis-ready TSVs)

License: GNU General Public License v3.0

Perl 100.00%

grexome-timc-secondary's Issues

Final result file folders Empty ----E 8_extractTranscripts.pl: couldn't find one of HV/HET/OCHV/OCHET for OM

@ntm
I ran the VEP command from 3_runVEP.pl by hand, on a VCF file chopped out of the whole one the way the script does it, and it produced result files. So I am really confused by the empty final result. How can I find the step that goes wrong? Thanks again for your help!

test.zip
vepStats.zip

Here is the log file. So is the problem caused by 8_extractSamples.pl? I just used the example sample.xls and changed the sample names, keeping the original column names.
sampleCLN.zip

During the process, the tmpdir is not empty.

I 2022-10-10 05:12:13: 8_extractSamples.pl - starting to run
Use of uninitialized value $header in scalar chomp at /home/data/zlsz_01/CLN_dir/grexomePIP/grexome-TIMC-Secondary/8_extractSamples.pl line 161.
Use of uninitialized value $header in split at /home/data/zlsz_01/CLN_dir/grexomePIP/grexome-TIMC-Secondary/8_extractSamples.pl line 162.
E: 8_extractSamples.pl - couldn't find OMF_HV or OMF_OTHERCAUSE_HV in header of infile OMF.csv
I 2022-10-10 05:12:14: 8_extractSamples.pl - ALL DONE, completed successfully!
I 2022-10-10 05:12:15: 8_extractTranscripts.pl - starting to run
Use of uninitialized value $header in scalar chomp at /home/data/zlsz_01/CLN_dir/grexomePIP/grexome-TIMC-Secondary/8_extractTranscripts.pl line 213.
Use of uninitialized value $header in split at /home/data/zlsz_01/CLN_dir/grexomePIP/grexome-TIMC-Secondary/8_extractTranscripts.pl line 214.
E 8_extractTranscripts.pl: couldn't find one of HV/HET/OCHV/OCHET for OM

Thanks again for your help!
Best wishes,
Chris
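
For readers hitting the same warnings: "Use of uninitialized value $header in scalar chomp" is what Perl prints when a line read from a filehandle returns undef, which is consistent with the script being handed an empty input. A minimal sketch of a guard that fails early instead; `readHeader` is a hypothetical helper, not a function from the pipeline, and the tab delimiter is an assumption:

```perl
use strict;
use warnings;

# Hypothetical helper: read the header line of a tab-separated stream and
# die with a clear message if the stream is empty, instead of letting
# chomp/split warn about an undefined value later on.
sub readHeader {
    my ($fh, $name) = @_;
    my $header = <$fh>;
    defined($header) || die "E: $name is empty, no header line found\n";
    chomp($header);
    return split(/\t/, $header);
}

# Demo on in-memory filehandles (no pipeline files are touched)
my $goodData = "POSITION\tREF\tALT\n";
open(my $good, '<', \$goodData) or die "cannot open in-memory data: $!";
my @titles = readHeader($good, "demo");
print scalar(@titles), " columns\n";

my $emptyData = "";
open(my $empty, '<', \$emptyData) or die "cannot open in-memory data: $!";
eval { readHeader($empty, "emptyDemo") };
print "caught: $@";
```

With a check like this the first script that receives an empty file would stop with an explicit error, which makes it easier to see that the real problem lies in an earlier step.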

use CADD v1.6 (AKA CADD-Splice) for upgrading splice variants

dbNSFP v4.2a contains CADD v1.6 (AKA CADD-Splice), which is supposedly good for predicting the impact of variants on splicing. We want to integrate the CADD score in the LOW->MODHIGH algorithm for splicing variants, in 4_vcf2tsv.pl. Unfortunately currently the dbNSFP VEP plugin only annotates missense variants.
TODO: patch the dbNSFP VEP plugin. It may be sufficient to update %INCLUDE_SO at the top of:
https://github.com/Ensembl/VEP_plugins/blob/release/104/dbNSFP.pm

command line for 0_coverage.pl

@ntm
I am trying to follow the instructions in 0_coverage.pl to make coverage files.
I am very new to Perl, and perl 0_coverage.pl --help prints nothing. I don't know how to pass the samples xlsx, the candidatesFiles, the $transcriptsFile, the $gvcf (must be tabix-indexed), and an $outDir.

In your script, it says:
(@ARGV == 5) ||
die "E $0: needs 5 args: a samples file, a comma-separated list of candidatesFiles, a tsv.gz, a GVCF and an outDir\n";
my ($samplesFile, $candidatesFiles, $transcriptsFile, $gvcf, $outDir) = @ARGV;

What should I do? There is also a line like lib "/home/nthierry/Software/VariantEffectPredictor/ensembl-vep/". Should I modify it?
Thanks again!!

Best regards,
Chris
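
The quoted check suggests that the script takes exactly five positional arguments, in the order given by its own error message, rather than named options. A minimal sketch of that argument handling; `parseCoverageArgs` is a hypothetical helper, all file names are made up, and splitting the candidates list on commas is an assumption based on the phrase "comma-separated list":

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the quoted check: exactly five positional
# arguments, in the order named by the script's error message.
sub parseCoverageArgs {
    my @args = @_;
    (@args == 5) ||
        die "E: needs 5 args: a samples file, a comma-separated list of candidatesFiles, a tsv.gz, a GVCF and an outDir\n";
    my ($samplesFile, $candidatesFiles, $transcriptsFile, $gvcf, $outDir) = @args;
    # assumption: the candidates argument is one comma-separated string
    my @candidates = split(/,/, $candidatesFiles);
    return { samples => $samplesFile, candidates => \@candidates,
             transcripts => $transcriptsFile, gvcf => $gvcf, outDir => $outDir };
}

# Example with made-up file names (none of these come from the repository)
my $parsed = parseCoverageArgs("samples.xlsx", "cand1.xlsx,cand2.xlsx",
                               "transcripts.tsv.gz", "merged.g.vcf.gz", "coverageOut/");
print "candidates: @{$parsed->{candidates}}\n";
```

Under these assumptions an invocation would look like `perl 0_coverage.pl samples.xlsx cand1.xlsx,cand2.xlsx transcripts.tsv.gz merged.g.vcf.gz coverageOut/`. As for the `lib` line: in Perl, `use lib "..."` adds a directory to the module search path, so a path under /home/nthierry/ presumably needs to point at the local ensembl-vep installation instead.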

dbNSFP VEP plugin error "transcript_match parameter specified but..." -> due to corrupt dbNSFP4.3a.zip

@ntm
Thanks for your great work!

I am trying to use your pipeline on a local system. However, when I ran the secondary part, I got:
WARNING: Failed to instantiate plugin dbNSFP: ERROR: transcript_match parameter specified but transcript-specific field detection failed at /xxxxx/ensembl-vep-107.0-0/dbNSFP.pm line 299.

These were my steps.
wget ftp://dbnsfp:[email protected]/dbNSFP4.3a.zip
unzip dbNSFP4.3a.zip
zcat dbNSFP4.3a_variant.chr1.gz | head -n1 > h
zgrep -h -v ^#chr dbNSFP4.3a_variant.chr* | sort -T /path/to/tmp_folder -k1,1 -k2,2n - | cat h - | bgzip -c > dbNSFP4.3a_grch38.gz
tabix -s 1 -b 2 -e 2 dbNSFP4.3a_grch38.gz
mv dbNSFP4.3a_grch38.gz dbNSFP4.3a.gz
mv dbNSFP4.3a_grch38.gz.tbi dbNSFP4.3a.gz.tbi

And I used Homo sapiens cache 107_GRCh38.

I just can't figure out the problem.
Looking forward to your help.

Best wishes,
Chris
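
The issue title traces this error to a corrupt dbNSFP4.3a.zip download. A minimal sketch of an integrity check that could be run before the unzip/sort/bgzip steps, assuming the `unzip` command-line tool is installed; `zipLooksIntact` is a hypothetical helper, not part of the pipeline:

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if `unzip -t` reports no errors, else 0.
# It also returns 0 if unzip itself cannot be run.
sub zipLooksIntact {
    my ($zipFile) = @_;
    # a missing or zero-byte file cannot be a valid archive
    (-s $zipFile) || return 0;
    # list form of system() avoids shell quoting issues;
    # -t tests every archive member, -q keeps the output short
    return (system("unzip", "-tq", $zipFile) == 0) ? 1 : 0;
}

my $zip = "dbNSFP4.3a.zip";
print "$zip: ", (zipLooksIntact($zip) ? "looks intact" : "missing or corrupt, download it again"), "\n";
```

A truncated download can still unpack some members, so testing the whole archive first is cheaper than discovering the problem after the long sort and bgzip run.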

How to edit &subCohorts() in config.pm? W grexome-TIMC-secondary.pl: sub-cohort file defined in &subCohorts() but this file doesn't exist. Skipping this sub-cohort

@ntm
I have no subcohort file.
So I modified config.pm as follows:

sub subCohorts {
my %subCohorts = ("" => "");
return(%subCohorts);
}

After I ran secondary.pl, the log file shows: W grexome-TIMC-secondary.pl: sub-cohort file defined in &subCohorts() but this file doesn't exist. Skipping this sub-cohort.

Is there a better way to modify config.pm?
Thanks again for your attention and great work!

Best regards,
Chris
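
One note on the snippet above: in Perl, `("" => "")` is not an empty hash. It has one entry whose key and value are both the empty string, so it still declares one sub-cohort, with an empty file name, which would plausibly trigger the warning. A minimal sketch of a version that declares no sub-cohorts at all, assuming the pipeline simply iterates over the keys of the returned hash (not verified against the repository):

```perl
use strict;
use warnings;

# Sketch of a subCohorts() that declares no sub-cohorts.
# Assumption: the caller loops over the keys of the returned hash,
# so an empty hash leaves nothing to look up and nothing to warn about.
sub subCohorts {
    my %subCohorts = ();
    return(%subCohorts);
}

my %none = subCohorts();
my %oneEmptyEntry = ("" => "");
print "empty hash: ", scalar(keys %none), " entries\n";
print "(\"\" => \"\"): ", scalar(keys %oneEmptyEntry), " entry\n";
```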

generate updated list of canonical transcripts

@ntm
Nicolas,
Thanks again for your great work.
I was trying to generate an updated list of canonical transcripts following your scripts. However, I got ERROR 2003 (HY000): Can't connect to MySQL server on 'ensembldb.ensembl.org:3306' (110).
I am new to Linux. I have tried googling for a solution, but I just can't figure it out, and I don't have root access.
Could you give more details on 'Just run it from eg nicofree'? Or could you provide the listCanonicalTranscripts_22017.tsv.gz file?
Thanks again for your help!

Best regards,
Chris

debug mode on, step3 failed: 256 at sec_.pl line 283.

@ntm
Nicolas,
Thanks for your patient reply!
I fixed the dbNSFP and transcript problems with your help. Really exciting!
However, when I run secondary.pl, the four final result folders are empty.
I don't have a subcohort, so I just deleted the step 9 subcohort part in secondary.pl. My test file is the merged GATK GVCF from primary.pl, for two patients.
The log file shows no problem for steps 1-6. However, it shows something going wrong starting from step 7.
My config only has ovary for 7_filterAndReorderAll.pl.
I also tried the debug mode, and it shows:
E clnsec_.pl: debug mode on, step3 failed: 256 at clnsec_.pl line 283.

# step 3
$com .= " | perl $RealBin/3_runVEP.pl --cacheFile=".&vepCacheFile()." --genome=".&refGenome()." --dataDir=".&vepPluginDataPath()." --tmpDir=$tmpdir/runVepTmpDir/ ";
($debugVep) && ($com .= "--debug ");
if ($debug) {
$com .= "2> $outDir/step3.err > $outDir/step3.out";
system($com) && die "E $0: debug mode on, step3 failed: $?";
$com = "cat $outDir/step3.out ";
}

# decompress infile and step 1
my $com = "$bgzip $inFile | perl $RealBin/1_filterBadCalls.pl --samplesFile=$samples --tmpdir=$tmpdir/FilterTmp/ --jobs $numJobs1 ";
if ($debug) {
# specific logfile from step and save its output
$com .= "2> $outDir/step1.err > $outDir/step1.out";
system($com) && die "E $0: debug mode on, step1 failed: $?";
# next step will read this step's output
$com = "cat $outDir/step1.out ";
}
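
A note on the "256" in the error message: the quoted code prints `$?`, which after `system()` holds the raw wait status rather than the exit code. The exit code is in the high byte, so 256 corresponds to exit code 1. A minimal sketch of the decoding; `decodeStatus` is a hypothetical helper name:

```perl
use strict;
use warnings;

# Decode a raw wait status such as the $? set by system().
# Returns (exit code, signal number, core-dump flag).
sub decodeStatus {
    my ($status) = @_;
    return ($status >> 8, $status & 127, ($status & 128) ? 1 : 0);
}

my ($exit, $signal, $core) = decodeStatus(256);
print "exit=$exit signal=$signal core=$core\n";
```

Since the debug branch redirects stderr with `2> $outDir/step3.err`, the actual error text from the failing command should be in that file rather than in the main log.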

Here is my step 1 output file.
step1.zip

Here is my command and Log file:

perl clnsec_.pl --samples=sampleCLN.xlsx --infile=grexomes_gatk_merged_221005.g.vcf.gz --outdir=SecondaryAnalyses_TEST --config=sec_config.pm 2> grexomeTIMCsec_TEST.log

2022-10-09 09:44:17: clnsec_.pl - starting to run
I clnsec_.pl: variant-caller id GATK will be appended to all filenames
I 2022-10-09 09:44:18: 4_vcf2tsv.pl - starting to run
I 2022-10-09 09:44:18: 2_sampleData2genotypes.pl - starting to run
I 2022-10-09 09:44:18: 5_addGTEX.pl - starting to run
I 2022-10-09 09:44:18: 3_runVEP.pl - starting to run
I 2022-10-09 09:44:19: 6_extractCohorts.pl - starting to run
I 2022-10-09 09:44:19: 1_filterBadCalls.pl - starting to run
I 2022-10-09 10:08:48: 3_runVEP.pl - finished parsing/processing chrom chr1
I 2022-10-09 10:28:05: 3_runVEP.pl - finished parsing/processing chrom chr2
I 2022-10-09 10:30:08: 6_extractCohorts.pl - done processing batch 10
I 2022-10-09 10:34:37: 3_runVEP.pl - finished parsing/processing chrom chr3
I 2022-10-09 10:50:26: 3_runVEP.pl - finished parsing/processing chrom chr4
I 2022-10-09 10:54:30: 6_extractCohorts.pl - done processing batch 20
I 2022-10-09 11:02:14: 3_runVEP.pl - finished parsing/processing chrom chr5
I 2022-10-09 11:20:21: 3_runVEP.pl - finished parsing/processing chrom chr6
I 2022-10-09 11:25:54: 6_extractCohorts.pl - batchNum=30, adjusting batchSize down to 2403
I 2022-10-09 11:26:10: 6_extractCohorts.pl - done processing batch 30
I 2022-10-09 11:26:16: 6_extractCohorts.pl - batchNum=45, adjusting batchSize up to 15728
I 2022-10-09 11:26:21: 6_extractCohorts.pl - done processing batch 40
I 2022-10-09 11:30:53: 3_runVEP.pl - finished parsing/processing chrom chr7
I 2022-10-09 11:38:35: 3_runVEP.pl - finished parsing/processing chrom chr8
I 2022-10-09 11:42:04: 6_extractCohorts.pl - done processing batch 50
I 2022-10-09 11:49:35: 3_runVEP.pl - finished parsing/processing chrom chr9
I 2022-10-09 12:04:59: 3_runVEP.pl - finished parsing/processing chrom chr10
I 2022-10-09 12:11:53: 6_extractCohorts.pl - batchNum=60, adjusting batchSize down to 2873
I 2022-10-09 12:12:14: 6_extractCohorts.pl - done processing batch 60
I 2022-10-09 12:12:26: 6_extractCohorts.pl - done processing batch 70
I 2022-10-09 12:12:33: 6_extractCohorts.pl - batchNum=75, adjusting batchSize up to 10342
I 2022-10-09 12:22:39: 3_runVEP.pl - finished parsing/processing chrom chr11
I 2022-10-09 12:33:51: 6_extractCohorts.pl - done processing batch 80
I 2022-10-09 12:42:20: 3_runVEP.pl - finished parsing/processing chrom chr12
I 2022-10-09 12:52:59: 6_extractCohorts.pl - batchNum=90, adjusting batchSize down to 2131
I 2022-10-09 12:53:07: 6_extractCohorts.pl - done processing batch 90
I 2022-10-09 12:53:26: 6_extractCohorts.pl - batchNum=105, adjusting batchSize up to 11365
I 2022-10-09 12:53:28: 6_extractCohorts.pl - done processing batch 100
I 2022-10-09 12:53:54: 3_runVEP.pl - finished parsing/processing chrom chr13
I 2022-10-09 13:05:22: 3_runVEP.pl - finished parsing/processing chrom chr14
I 2022-10-09 13:08:34: 6_extractCohorts.pl - done processing batch 110
I 2022-10-09 13:18:19: 3_runVEP.pl - finished parsing/processing chrom chr15
I 2022-10-09 13:23:15: 6_extractCohorts.pl - batchNum=120, adjusting batchSize down to 3176
I 2022-10-09 13:23:32: 6_extractCohorts.pl - done processing batch 120
I 2022-10-09 13:31:43: 3_runVEP.pl - finished parsing/processing chrom chr16
I 2022-10-09 13:36:54: 6_extractCohorts.pl - done processing batch 130
I 2022-10-09 13:36:58: 6_extractCohorts.pl - batchNum=135, adjusting batchSize down to 1929
I 2022-10-09 13:37:15: 6_extractCohorts.pl - done processing batch 140
I 2022-10-09 13:37:21: 6_extractCohorts.pl - batchNum=150, adjusting batchSize up to 12077
I 2022-10-09 13:37:47: 6_extractCohorts.pl - done processing batch 150
I 2022-10-09 13:45:20: 3_runVEP.pl - finished parsing/processing chrom chr17
I 2022-10-09 13:58:15: 6_extractCohorts.pl - done processing batch 160
I 2022-10-09 13:58:23: 3_runVEP.pl - finished parsing/processing chrom chr18
I 2022-10-09 14:04:31: 6_extractCohorts.pl - batchNum=165, adjusting batchSize down to 3704
I 2022-10-09 14:17:30: 3_runVEP.pl - finished parsing/processing chrom chr19
I 2022-10-09 14:22:20: 6_extractCohorts.pl - done processing batch 170
I 2022-10-09 14:22:35: 6_extractCohorts.pl - batchNum=180, adjusting batchSize down to 1708
I 2022-10-09 14:22:42: 6_extractCohorts.pl - done processing batch 180
I 2022-10-09 14:22:51: 6_extractCohorts.pl - batchNum=195, adjusting batchSize up to 15372
I 2022-10-09 14:22:53: 6_extractCohorts.pl - done processing batch 190
I 2022-10-09 14:23:07: 3_runVEP.pl - finished parsing/processing chrom chr20
I 2022-10-09 14:30:01: 3_runVEP.pl - finished parsing/processing chrom chr21
I 2022-10-09 14:34:29: 6_extractCohorts.pl - done processing batch 200
I 2022-10-09 14:34:31: 3_runVEP.pl - finished parsing/processing chrom chr22
I 2022-10-09 14:43:06: 1_filterBadCalls.pl - ALL DONE, completed successfully!
I 2022-10-09 14:43:06: 3_runVEP.pl - finished parsing/processing chrom chrX
I 2022-10-09 14:43:06: 2_sampleData2genotypes.pl - ALL DONE, completed successfully!
I 2022-10-09 14:49:24: 3_runVEP.pl - finished parsing/processing chrom chrY
I 2022-10-09 14:49:43: 3_runVEP.pl - finished parsing/processing chrom chrM
I 2022-10-09 14:50:19: 3_runVEP.pl - ALL DONE, completed successfully!
I 2022-10-09 14:50:20: 4_vcf2tsv.pl - ALL DONE, completed successfully!
I 2022-10-09 14:50:20: 5_addGTEX.pl - ALL DONE, completed successfully!
I 2022-10-09 14:50:36: 6_extractCohorts.pl - ALL DONE, completed successfully!
I 2022-10-09 14:50:37: 7_filterAndReorderAll.pl - starting to run
E 7_reorderColumns.pl: some newOrder titles were not found: GTEX_testis_RATIO
I 2022-10-09 14:50:38: 7_filterAndReorderAll.pl - ALL DONE, completed successfully!
I 2022-10-09 14:50:39: 8_extractSamples.pl - starting to run
Use of uninitialized value $header in scalar chomp at /home/data/zlsz_01/CLN_dir/grexomePIP/grexome-TIMC-Secondary/8_extractSamples.pl line 161.
Use of uninitialized value $header in split at /home/data/zlsz_01/CLN_dir/grexomePIP/grexome-TIMC-Secondary/8_extractSamples.pl line 162.
E: 8_extractSamples.pl - couldn't find OMF_HV or OMF_OTHERCAUSE_HV in header of infile OMF.csv
I 2022-10-09 14:50:40: 8_extractSamples.pl - ALL DONE, completed successfully!
I 2022-10-09 14:50:42: 8_extractTranscripts.pl - starting to run
Use of uninitialized value $header in scalar chomp at /home/data/zlsz_01/CLN_dir/grexomePIP/grexome-TIMC-Secondary/8_extractTranscripts.pl line 213.
Use of uninitialized value $header in split at /home/data/zlsz_01/CLN_dir/grexomePIP/grexome-TIMC-Secondary/8_extractTranscripts.pl line 214.
E 8_extractTranscripts.pl: couldn't find one of HV/HET/OCHV/OCHET for OMF
I 2022-10-09 14:50:43: 8_addPatientIDs.pl - starting to run
I 2022-10-09 14:50:43: 8_addPatientIDs.pl - ALL DONE, completed successfully!
I 2022-10-09 14:50:43: 9_requireUndiagnosed.pl - starting to run
Use of uninitialized value $header in scalar chomp at /home/data/zlsz_01/CLN_dir/grexomePIP/grexome-TIMC-Secondary/9_requireUndiagnosed.pl line 66.
Use of uninitialized value $header in split at /home/data/zlsz_01/CLN_dir/grexomePIP/grexome-TIMC-Secondary/9_requireUndiagnosed.pl line 67.
E: 9_requireUndiagnosed.pl couldn't find one of HV/HET for OMF
I 2022-10-09 14:50:45: 8_addPatientIDs.pl - starting to run
I 2022-10-09 14:50:46: 8_addPatientIDs.pl - ALL DONE, completed successfully!
I 2022-10-09 14:50:46: 7_filterAndReorderAll.pl - starting to run
Use of uninitialized value $header in scalar chomp at /home/data/zlsz_01/CLN_dir/grexomePIP/grexome-TIMC-Secondary/7_filterVariants.pl line 69.
Use of uninitialized value $header in concatenation (.) or string at /home/data/zlsz_01/CLN_dir/grexomePIP/grexome-TIMC-Secondary/7_filterVariants.pl line 70.
Use of uninitialized value $header in split at /home/data/zlsz_01/CLN_dir/grexomePIP/grexome-TIMC-Secondary/7_filterVariants.pl line 73.
E 7_filterVariants.pl: title CANONICAL required by script but missing, some VEP columns changed?
I 2022-10-09 14:50:47: 7_filterAndReorderAll.pl - ALL DONE, completed successfully!
I 2022-10-09 14:50:49: 10_qc_checkCausal.pl - starting to run
I 2022-10-09 14:50:49: 10_qc_checkCausal.pl - ALL DONE, completed successfully!
I 2022-10-09 14:50:49: clnsec_.pl - ALL DONE, completed successfully!

Looking forward to your reply! Thanks a lot for your help with the VEP and transcript problems!

Best wishes,
Chris
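
When a log like the one above ends in a cascade of errors, the first error line is usually the most informative one. A minimal sketch that picks it out, assuming error lines start with "E" followed by a space or a colon, as they do in this log; `firstErrorLine` is a hypothetical helper:

```perl
use strict;
use warnings;

# Return the first line flagged as an error, or undef if there is none.
# Assumption: error lines start with "E " or "E:", as in the quoted log.
sub firstErrorLine {
    my @lines = @_;
    foreach my $line (@lines) {
        return $line if ($line =~ /^E[ :]/);
    }
    return;
}

# Three lines copied from the log above
my @log = (
    "I 2022-10-09 14:50:37: 7_filterAndReorderAll.pl - starting to run",
    "E 7_reorderColumns.pl: some newOrder titles were not found: GTEX_testis_RATIO",
    "E: 8_extractSamples.pl - couldn't find OMF_HV or OMF_OTHERCAUSE_HV in header of infile OMF.csv",
);
print firstErrorLine(@log), "\n";
```

In this log the first error comes from 7_reorderColumns.pl, which asks for a GTEX_testis_RATIO column while the config reportedly lists only ovary. That mismatch may be what leaves the later steps with empty input, but this is a reading of the log, not a confirmed diagnosis.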
