ntm / grexome-timc-secondary
exome pipeline from TIMC - secondary analyses (GVCF to analysis-ready TSVs)
License: GNU General Public License v3.0
@ntm
I tried the VEP command from 3_runVEP.pl on a VCF file chopped from the whole input, the way the script runs it, and it produced result files. So I am really confused about the empty final result. How can I find the step that went wrong? Thanks again for your help!
Here is the log file. So is the problem caused by 8_extractSamples.pl? I just used the example sample.xls and modified the sample names, keeping the original column names.
sampleCLN.zip
During the process, the tmpdir is not empty.
I 2022-10-10 05:12:13: 8_extractSamples.pl - starting to run
Use of uninitialized value $header in scalar chomp at /home/data/zlsz_01/CLN_dir/grexomePIP/grexome-TIMC-Secondary/8_extractSamples.pl line 161.
Use of uninitialized value $header in split at /home/data/zlsz_01/CLN_dir/grexomePIP/grexome-TIMC-Secondary/8_extractSamples.pl line 162.
E: 8_extractSamples.pl - couldn't find OMF_HV or OMF_OTHERCAUSE_HV in header of infile OMF.csv
I 2022-10-10 05:12:14: 8_extractSamples.pl - ALL DONE, completed successfully!
I 2022-10-10 05:12:15: 8_extractTranscripts.pl - starting to run
Use of uninitialized value $header in scalar chomp at /home/data/zlsz_01/CLN_dir/grexomePIP/grexome-TIMC-Secondary/8_extractTranscripts.pl line 213.
Use of uninitialized value $header in split at /home/data/zlsz_01/CLN_dir/grexomePIP/grexome-TIMC-Secondary/8_extractTranscripts.pl line 214.
E 8_extractTranscripts.pl: couldn't find one of HV/HET/OCHV/OCHET for OM
Thanks again for your help!
Best wishes,
Chris
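In Perl, reading a line from an empty input returns an undefined value, so the "Use of uninitialized value $header" warnings in the log above are consistent with a script being handed an empty file. A minimal sketch for listing zero-byte files under a results directory follows; the function name `list_empty_files` is hypothetical and not part of the pipeline, and it only detects plain (uncompressed) empty files.

```shell
# Hypothetical helper: list every zero-byte regular file under a directory.
# An empty file here usually means an upstream step produced no output.
list_empty_files() {
    find "$1" -type f -size 0 | sort
}
```

It could be called as `list_empty_files SecondaryAnalyses_TEST` to see which intermediate or final files were written without any content.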
dbNSFP v4.2a contains CADD v1.6 (AKA CADD-Splice), which is supposedly good for predicting the impact of variants on splicing. We want to integrate the CADD score in the LOW->MODHIGH algorithm for splicing variants, in 4_vcf2tsv.pl. Unfortunately currently the dbNSFP VEP plugin only annotates missense variants.
TODO: patch the dbNSFP VEP plugin. It may be sufficient to update %INCLUDE_SO at the top of:
https://github.com/Ensembl/VEP_plugins/blob/release/104/dbNSFP.pm
@ntm
I am trying to follow your instructions in 0_coverage.pl to make coverage files.
I am very new to Perl, and perl 0_coverage.pl --help prints nothing. I don't know how to pass the samples xlsx, the candidatesFiles, the $transcriptsFile, the $gvcf (must be tabix-indexed), and an $outDir.
The script contains:
(@ARGV == 5) ||
die "E $0: needs 5 args: a samples file, a comma-separated list of candidatesFiles, a tsv.gz, a GVCF and an outDir\n";
my ($samplesFile, $candidatesFiles, $transcriptsFile, $gvcf, $outDir) = @ARGV;
What should I do? There is also something like lib "/home/nthierry/Software/VariantEffectPredictor/ensembl-vep/". Should I modify it?
Thanks again!!
Best regards,
Chris
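Going by the `die` message and the `@ARGV` assignment quoted above, 0_coverage.pl appears to take five positional arguments in a fixed order, with no named options. A minimal sketch of an invocation, written as a dry run that only prints the command; every file name below is a hypothetical placeholder:

```shell
# All file names are hypothetical placeholders; substitute your own.
samplesFile="samples.xlsx"                           # samples metadata file
candidatesFiles="candidates1.xlsx,candidates2.xlsx"  # comma-separated, no spaces
transcriptsFile="transcripts.tsv.gz"                 # the tsv.gz of transcripts
gvcf="merged.g.vcf.gz"                               # must be tabix-indexed
outDir="CoverageOut"                                 # output directory

# Positional order follows the script's own assignment from @ARGV.
cmd="perl 0_coverage.pl $samplesFile $candidatesFiles $transcriptsFile $gvcf $outDir"
echo "$cmd"
```

Once the printed command looks right, it can be run directly. If the GVCF has no index yet, `tabix -p vcf merged.g.vcf.gz` creates one for a bgzip-compressed file.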
@ntm
Thanks for your great work!
I am trying to use your pipeline on a local system. However, when I run the secondary part, I get:
WARNING: Failed to instantiate plugin dbNSFP: ERROR: transcript_match parameter specified but transcript-specific field detection failed at /xxxxx/ensembl-vep-107.0-0/dbNSFP.pm line 299.
These were my steps.
wget ftp://dbnsfp:[email protected]/dbNSFP4.3a.zip
unzip dbNSFP4.3a.zip
zcat dbNSFP4.3a_variant.chr1.gz | head -n1 > h
zgrep -h -v ^#chr dbNSFP4.3a_variant.chr* | sort -T /path/to/tmp_folder -k1,1 -k2,2n - | cat h - | bgzip -c > dbNSFP4.3a_grch38.gz
tabix -s 1 -b 2 -e 2 dbNSFP4.3a_grch38.gz
mv dbNSFP4.3a_grch38.gz dbNSFP4.3a.gz
mv dbNSFP4.3a_grch38.gz.tbi dbNSFP4.3a.gz.tbi
And I used the Homo sapiens cache 107_GRCh38.
I just can't figure out the problem.
Looking forward to your help.
Best wishes,
Chris
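The steps above merge the per-chromosome files into one and put the header back on top. A quick sanity check of the result is sketched below: it confirms that the merged file starts with a "#chr" header line and contains no stray header lines further down. This is only a check of the merge itself, under the assumption that a misplaced header could confuse the plugin; it is not a confirmed explanation of the transcript_match error. The function name `check_single_header` is hypothetical.

```shell
# Hypothetical helper: succeed only if the first line of a gzip/bgzip file
# starts with "#chr" and no later line does.
check_single_header() {
    gzip -dc "$1" | awk '
        NR == 1 && !/^#chr/ { bad = 1 }   # header missing from line 1
        NR > 1  &&  /^#chr/ { bad = 1 }   # header repeated further down
        END { exit bad }'
}
```

For example, `check_single_header dbNSFP4.3a.gz && echo "header OK"` prints the message only when the header is in place. Note that this reads the whole file, so it can take a while on the full dbNSFP data.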
@ntm
I have no sub-cohort file.
So I modified config.pm as follows:
sub subCohorts {
my %subCohorts = ("" => "");
return(%subCohorts);
}
After I run secondary.pl, the log file shows: W grexome-TIMC-secondary.pl: sub-cohort file defined in &subCohorts() but this file doesn't exist. Skipping this sub-cohort.
Is there a better way to modify config.pm?
Thanks again for your attention and great work!
Best regards,
Chris
@ntm
Nicolas,
Thanks again for your great work.
I was trying to generate an updated list of canonical transcripts by following your scripts. However, I got ERROR 2003 (HY000): Can't connect to MySQL server on 'ensembldb.ensembl.org:3306' (110).
I am new to Linux. I have tried googling for a solution, but I just can't figure it out, and I don't have root access.
Could you give more details on 'Just run it from eg nicofree'? Or could you provide the listCanonicalTranscripts_22017.tsv.gz?
Thanks again for your help!
Best regards,
Chris
@ntm
Nicolas,
Thanks for your patient reply!
I fixed the dbNSFP and transcript problems with your help. Really exciting!
However, when I run secondary.pl, the four final result folders are empty.
I don't have sub-cohorts, so I just deleted step 9 (sub-cohorts) in secondary.pl. My test file is the merged GATK GVCF produced by primary.pl for two patients.
The log file shows no problem for steps 1-6. However, something goes wrong starting from step 7.
My config only has ovary for 7_filterAndReorderAll.pl.
I also tried the debug mode. And it shows:
E clnsec_.pl: debug mode on, step3 failed: 256 at clnsec_.pl line 283.
step 3
$com .= " | perl $RealBin/3_runVEP.pl --cacheFile=".&vepCacheFile()." --genome=".&refGenome()." --dataDir=".&vepPluginDataPath()." --tmpDir=$tmpdir/runVepTmpDir/ ";
($debugVep) && ($com .= "--debug ");
if ($debug) {
$com .= "2> $outDir/step3.err > $outDir/step3.out";
system($com) && die "E
$com = "cat $outDir/step3.out ";
}
decompress infile and step 1
my $com = "$bgzip $inFile | perl $RealBin/1_filterBadCalls.pl --samplesFile=$samples --tmpdir=$tmpdir/FilterTmp/ --jobs $numJobs1 ";
if ($debug) {
# specific logfile from step and save its output
$com .= "2> $outDir/step1.err > $outDir/step1.out";
system($com) && die "E
# next step will read this step's output
$com = "cat $outDir/step1.out ";
}
Here is my step 1 output file.
step1.zip
Here is my command and Log file:
perl clnsec_.pl --samples=sampleCLN.xlsx --infile=grexomes_gatk_merged_221005.g.vcf.gz --outdir=SecondaryAnalyses_TEST --config=sec_config.pm 2> grexomeTIMCsec_TEST.log
2022-10-09 09:44:17: clnsec_.pl - starting to run
I clnsec_.pl: variant-caller id GATK will be appended to all filenames
I 2022-10-09 09:44:18: 4_vcf2tsv.pl - starting to run
I 2022-10-09 09:44:18: 2_sampleData2genotypes.pl - starting to run
I 2022-10-09 09:44:18: 5_addGTEX.pl - starting to run
I 2022-10-09 09:44:18: 3_runVEP.pl - starting to run
I 2022-10-09 09:44:19: 6_extractCohorts.pl - starting to run
I 2022-10-09 09:44:19: 1_filterBadCalls.pl - starting to run
I 2022-10-09 10:08:48: 3_runVEP.pl - finished parsing/processing chrom chr1
I 2022-10-09 10:28:05: 3_runVEP.pl - finished parsing/processing chrom chr2
I 2022-10-09 10:30:08: 6_extractCohorts.pl - done processing batch 10
I 2022-10-09 10:34:37: 3_runVEP.pl - finished parsing/processing chrom chr3
I 2022-10-09 10:50:26: 3_runVEP.pl - finished parsing/processing chrom chr4
I 2022-10-09 10:54:30: 6_extractCohorts.pl - done processing batch 20
I 2022-10-09 11:02:14: 3_runVEP.pl - finished parsing/processing chrom chr5
I 2022-10-09 11:20:21: 3_runVEP.pl - finished parsing/processing chrom chr6
I 2022-10-09 11:25:54: 6_extractCohorts.pl - batchNum=30, adjusting batchSize down to 2403
I 2022-10-09 11:26:10: 6_extractCohorts.pl - done processing batch 30
I 2022-10-09 11:26:16: 6_extractCohorts.pl - batchNum=45, adjusting batchSize up to 15728
I 2022-10-09 11:26:21: 6_extractCohorts.pl - done processing batch 40
I 2022-10-09 11:30:53: 3_runVEP.pl - finished parsing/processing chrom chr7
I 2022-10-09 11:38:35: 3_runVEP.pl - finished parsing/processing chrom chr8
I 2022-10-09 11:42:04: 6_extractCohorts.pl - done processing batch 50
I 2022-10-09 11:49:35: 3_runVEP.pl - finished parsing/processing chrom chr9
I 2022-10-09 12:04:59: 3_runVEP.pl - finished parsing/processing chrom chr10
I 2022-10-09 12:11:53: 6_extractCohorts.pl - batchNum=60, adjusting batchSize down to 2873
I 2022-10-09 12:12:14: 6_extractCohorts.pl - done processing batch 60
I 2022-10-09 12:12:26: 6_extractCohorts.pl - done processing batch 70
I 2022-10-09 12:12:33: 6_extractCohorts.pl - batchNum=75, adjusting batchSize up to 10342
I 2022-10-09 12:22:39: 3_runVEP.pl - finished parsing/processing chrom chr11
I 2022-10-09 12:33:51: 6_extractCohorts.pl - done processing batch 80
I 2022-10-09 12:42:20: 3_runVEP.pl - finished parsing/processing chrom chr12
I 2022-10-09 12:52:59: 6_extractCohorts.pl - batchNum=90, adjusting batchSize down to 2131
I 2022-10-09 12:53:07: 6_extractCohorts.pl - done processing batch 90
I 2022-10-09 12:53:26: 6_extractCohorts.pl - batchNum=105, adjusting batchSize up to 11365
I 2022-10-09 12:53:28: 6_extractCohorts.pl - done processing batch 100
I 2022-10-09 12:53:54: 3_runVEP.pl - finished parsing/processing chrom chr13
I 2022-10-09 13:05:22: 3_runVEP.pl - finished parsing/processing chrom chr14
I 2022-10-09 13:08:34: 6_extractCohorts.pl - done processing batch 110
I 2022-10-09 13:18:19: 3_runVEP.pl - finished parsing/processing chrom chr15
I 2022-10-09 13:23:15: 6_extractCohorts.pl - batchNum=120, adjusting batchSize down to 3176
I 2022-10-09 13:23:32: 6_extractCohorts.pl - done processing batch 120
I 2022-10-09 13:31:43: 3_runVEP.pl - finished parsing/processing chrom chr16
I 2022-10-09 13:36:54: 6_extractCohorts.pl - done processing batch 130
I 2022-10-09 13:36:58: 6_extractCohorts.pl - batchNum=135, adjusting batchSize down to 1929
I 2022-10-09 13:37:15: 6_extractCohorts.pl - done processing batch 140
I 2022-10-09 13:37:21: 6_extractCohorts.pl - batchNum=150, adjusting batchSize up to 12077
I 2022-10-09 13:37:47: 6_extractCohorts.pl - done processing batch 150
I 2022-10-09 13:45:20: 3_runVEP.pl - finished parsing/processing chrom chr17
I 2022-10-09 13:58:15: 6_extractCohorts.pl - done processing batch 160
I 2022-10-09 13:58:23: 3_runVEP.pl - finished parsing/processing chrom chr18
I 2022-10-09 14:04:31: 6_extractCohorts.pl - batchNum=165, adjusting batchSize down to 3704
I 2022-10-09 14:17:30: 3_runVEP.pl - finished parsing/processing chrom chr19
I 2022-10-09 14:22:20: 6_extractCohorts.pl - done processing batch 170
I 2022-10-09 14:22:35: 6_extractCohorts.pl - batchNum=180, adjusting batchSize down to 1708
I 2022-10-09 14:22:42: 6_extractCohorts.pl - done processing batch 180
I 2022-10-09 14:22:51: 6_extractCohorts.pl - batchNum=195, adjusting batchSize up to 15372
I 2022-10-09 14:22:53: 6_extractCohorts.pl - done processing batch 190
I 2022-10-09 14:23:07: 3_runVEP.pl - finished parsing/processing chrom chr20
I 2022-10-09 14:30:01: 3_runVEP.pl - finished parsing/processing chrom chr21
I 2022-10-09 14:34:29: 6_extractCohorts.pl - done processing batch 200
I 2022-10-09 14:34:31: 3_runVEP.pl - finished parsing/processing chrom chr22
I 2022-10-09 14:43:06: 1_filterBadCalls.pl - ALL DONE, completed successfully!
I 2022-10-09 14:43:06: 3_runVEP.pl - finished parsing/processing chrom chrX
I 2022-10-09 14:43:06: 2_sampleData2genotypes.pl - ALL DONE, completed successfully!
I 2022-10-09 14:49:24: 3_runVEP.pl - finished parsing/processing chrom chrY
I 2022-10-09 14:49:43: 3_runVEP.pl - finished parsing/processing chrom chrM
I 2022-10-09 14:50:19: 3_runVEP.pl - ALL DONE, completed successfully!
I 2022-10-09 14:50:20: 4_vcf2tsv.pl - ALL DONE, completed successfully!
I 2022-10-09 14:50:20: 5_addGTEX.pl - ALL DONE, completed successfully!
I 2022-10-09 14:50:36: 6_extractCohorts.pl - ALL DONE, completed successfully!
I 2022-10-09 14:50:37: 7_filterAndReorderAll.pl - starting to run
E 7_reorderColumns.pl: some newOrder titles were not found: GTEX_testis_RATIO
I 2022-10-09 14:50:38: 7_filterAndReorderAll.pl - ALL DONE, completed successfully!
I 2022-10-09 14:50:39: 8_extractSamples.pl - starting to run
Use of uninitialized value $header in scalar chomp at /home/data/zlsz_01/CLN_dir/grexomePIP/grexome-TIMC-Secondary/8_extractSamples.pl line 161.
Use of uninitialized value $header in split at /home/data/zlsz_01/CLN_dir/grexomePIP/grexome-TIMC-Secondary/8_extractSamples.pl line 162.
E: 8_extractSamples.pl - couldn't find OMF_HV or OMF_OTHERCAUSE_HV in header of infile OMF.csv
I 2022-10-09 14:50:40: 8_extractSamples.pl - ALL DONE, completed successfully!
I 2022-10-09 14:50:42: 8_extractTranscripts.pl - starting to run
Use of uninitialized value $header in scalar chomp at /home/data/zlsz_01/CLN_dir/grexomePIP/grexome-TIMC-Secondary/8_extractTranscripts.pl line 213.
Use of uninitialized value $header in split at /home/data/zlsz_01/CLN_dir/grexomePIP/grexome-TIMC-Secondary/8_extractTranscripts.pl line 214.
E 8_extractTranscripts.pl: couldn't find one of HV/HET/OCHV/OCHET for OMF
I 2022-10-09 14:50:43: 8_addPatientIDs.pl - starting to run
I 2022-10-09 14:50:43: 8_addPatientIDs.pl - ALL DONE, completed successfully!
I 2022-10-09 14:50:43: 9_requireUndiagnosed.pl - starting to run
Use of uninitialized value $header in scalar chomp at /home/data/zlsz_01/CLN_dir/grexomePIP/grexome-TIMC-Secondary/9_requireUndiagnosed.pl line 66.
Use of uninitialized value $header in split at /home/data/zlsz_01/CLN_dir/grexomePIP/grexome-TIMC-Secondary/9_requireUndiagnosed.pl line 67.
E: 9_requireUndiagnosed.pl couldn't find one of HV/HET for OMF
I 2022-10-09 14:50:45: 8_addPatientIDs.pl - starting to run
I 2022-10-09 14:50:46: 8_addPatientIDs.pl - ALL DONE, completed successfully!
I 2022-10-09 14:50:46: 7_filterAndReorderAll.pl - starting to run
Use of uninitialized value $header in scalar chomp at /home/data/zlsz_01/CLN_dir/grexomePIP/grexome-TIMC-Secondary/7_filterVariants.pl line 69.
Use of uninitialized value $header in concatenation (.) or string at /home/data/zlsz_01/CLN_dir/grexomePIP/grexome-TIMC-Secondary/7_filterVariants.pl line 70.
Use of uninitialized value $header in split at /home/data/zlsz_01/CLN_dir/grexomePIP/grexome-TIMC-Secondary/7_filterVariants.pl line 73.
E 7_filterVariants.pl: title CANONICAL required by script but missing, some VEP columns changed?
I 2022-10-09 14:50:47: 7_filterAndReorderAll.pl - ALL DONE, completed successfully!
I 2022-10-09 14:50:49: 10_qc_checkCausal.pl - starting to run
I 2022-10-09 14:50:49: 10_qc_checkCausal.pl - ALL DONE, completed successfully!
I 2022-10-09 14:50:49: clnsec_.pl - ALL DONE, completed successfully!
Looking forward to your reply! Thanks a lot for your help with the VEP and transcript problems!
Best wishes,
Chris
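Because the steps run as a pipeline and each one logs "ALL DONE" even after an error, the quickest way to locate the failing step is to pull out only the error-like lines with their line numbers. In the log above, the earliest such line is the 7_reorderColumns.pl complaint about GTEX_testis_RATIO, and all the empty-header warnings come after it, so that step is a reasonable place to start looking; this is an inference from the log ordering, not a confirmed diagnosis. A minimal sketch, where the function name `list_log_errors` is hypothetical and the pattern assumes errors start with "E " or "E:" as they do in this log:

```shell
# Hypothetical helper: print error-like lines of a pipeline log with line numbers.
# Matches the "E " / "E:" prefix seen in this log, plus Perl's
# "Use of uninitialized value" warnings.
list_log_errors() {
    grep -n -E '^(E[ :]|Use of uninitialized value)' "$1"
}
```

Running `list_log_errors grexomeTIMCsec_TEST.log | head -n 1` shows the first error the pipeline reported.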