ripkelab / ricopili Goto Github PK

View Code? Open in Web Editor NEW

42.0 27.0 22.0 1.55 MB

Main ricopili repo for public releases

Perl 97.30% Shell 2.45% Roff 0.25%

ricopili's People

Contributors

Stargazers

Watchers

ricopili's Issues

rp_config doesn't create clumper_info

The list of logs created by ./rp_config (line 1491) is missing the clumper_info log used by clump_nav3. Adding $conf{loloc}/clumper_info to that line of rp_config should fix that issue.

Relatedly, clump_navi references $homedir/clumper_info. Since this specifies the home directory, it may or may not end up being the same file clump_nav3's log. Not clear whether it's desired to have these two clumping scripts use the same log? In either case, the log for clump_navi should probably exist in the designated $loloc rather than the home directory.

(Credit @ce-carey for encountering this issue.)

Imputation crashes if genomic chunks are not quite empty

When dividing data into the ~950 genomic chunks for imputation, it is possible to end up with chunks that contain very few genotyped markers (i.e. <5), particularly near centromeres and at the ends of chromosomes. As long as at least 1 genotyped SNP is in the chunk, ricopili will try to phase and impute. During pre-phasing, if there are any individuals that are missing for all SNPs in a chunk, shapeit will exit with an error. When there are very few SNPs in the chunk it is likely that at least one individual will be completely missing and phasing will fail. Ricopili will eventually error out once it realizes it is stuck trying to phase those chunks.

Current workaround:

Identify failing chunks from the job list of the last phasing attempt (e.g. ./pi_sub/preph_mu1.job_list.s) and exclude them from imputation with --refiex flag. The cause of the error can be verified in shapeit log file ./pi_sub/haps_*/plink.*.chrX_XXX_XXX.shape.log for relevant chunk.

Possible solutions:

set minimum number of SNPs in chunk to count as not empty for imputation (maybe 10?)
directly check that no IDs are missing for all SNPs in chunk

wget 'illegal option' in my.wget

Hi, I noticed a likely bug in the most recent ricopili (2019_Jun_25.001). The my.wget script calls wget with a flag that used to be -nv (for --no-verbose), but now is just -n. Since there is no -n option in wget, it gets joined with the first letter of the URL, which was 'h' for me, into -nh. Since wget knows no -nh option either, it then stops.

ricopili/rp_bin/my.wget

Line 18 in 757fb9d

my $sc =system ("wget -n $ftp_name/$vcf_name");

Change genome build of standard ricopili daner format?

I am working on generating polygenic risk scores from base data that is in daner format with a hg19 genome build. My target data genome build is hg38. My base data daner file is smaller so I was hoping to lift it to hg38 to match the builds. Is there an essay way to do this for the daner format? I've been trying to find something that will work with the daner format.

trio flag broken for pcaer

pcaer_20 reliably fails at step me300 when specifying --trio.

The cause of the error is that the trio flag defines a trio_ex_file to be used with as ie_file (at line 978-981) which is passed to bcomb at line 1088, but the code to create that file (line 1049-1054) has been commented out by the if(0) on the block starting line 1022.

Currently unclear to me whether the fix is to re-activate the code creating the trio_ex_file or to stop that argument from being passed to bcomb.

Long SNP names crash PCA

Excessively long SNP names are not supported by eigensoft/smartpca, meaning smartpca - and thus pcaer_20 - will crash if long SNP names are present (e.g. >20 characters). This is entirely analogous to the limits on lengths of FID/IID, just much rarer since SNP names tends to be relatively compact and don't have extra length from the ricopili id tager.

This crash is currently being observed with unusually named indels (e.g. 1:6638843-CGCTGCTGCT-CGCTGCTGCTGCTGCTGCT). This permits at least 2 plausible solutions:

Remove indels from consideration when running PCA. This has the advantage of being a relatively simple filter to add as part of the "strict qc" filters before running smartpca, e.g. by adding --snps-only no-DI to the plink call at line 1178 of pcaer_20.
Do a temporary substitution/shortening of SNP names prior to running smartpca. This is more work, but allows indels to stay in the data so that the overall PCA workflow remains the same. Shortening of the SNP names could be done at lines 400-415 of eigen_pca3 while cM distances are being removed. I'm not sure any output that would need translation back (e.g. SNP weights from PCA solution) is being kept currently.

(thanks to @chiayenchen for identifying this issue)

clump_nav3 treats SNPs with lower case alleles as indels

If clump_nav3 is run with the --noindel flag set, SNPs with lower case alleles will be excluded. This occurs because indels are identified by checking whether the lengths of A1 and A2 are 1 and whether A1 and A2 are each one of A,C,G, or T (lines 719-724).

This likely doesn't make a difference for the usage of clump_nav3 within ricopili, but may impact users running clump_nav3 as a stand-alone.

Proposed fix:
Change the regexes in lines 722 and 723 to check =~ /[ACGTacgt]/ rather than =~ /[ACGT]/.

PCA summary plots rely on fam file over phenotype

Plots generated by pcaer (e.g. MDS plots, summary plot of PC association to phenotype) use phenotypes from the individuals FID tag (cas, con, mis) rather than from the fam file. This means that any updates to the phenotype after initial QC (e.g. for alternative outcomes, updated phenotyping, etc) that are applied to the fam file will not be reflected in the PCA output.

This is particularly problematic for the summary pdf, since the associations between the original phenotype (indicated in the FID) and the PCs will be reported rather than association with the updated/alternative phenotype, which may be misleading when using the pcaer output to identify necessary PC covariates for GWAS with the updated phenotype.

Affected code is in pca_plot_5 with the creation of the $mds_ow_file and subsequent use of the CC column in R for plotting (lines 350-600).

Cannot tell if RP install succeeded...expected files not written

Hello,

I'm trying to install Ricopili on a linux HPC cluster using the directions found here:
https://docs.google.com/document/d/14aa-oeT5hF541I8hHsDAL_42oyvlHRC5FWR7gir4xco

I've gotten to the point of running "rp_test_navi" and waiting for output. The pipeline completes but doesn't write expected files. The main error seems to be that it's trying to submit a job from a compute node and it also can't understand "batch_job_output_jid" in my custom config file (honestly, I can't understand what this should be either). See below for details on these errors.

Any help is appreciated. I think I'm close to having RP working, but I need some guidance. Thanks!

This is what I get after running the test module (I removed the boilerplate "RICOPILI" text for clarity):

mchiment@argon-login-1: test_2$ rp_test_navi
Config_file: /Users/mchiment/ricopili.conf

rp_test_navi - module of ricopili pipeline
version: 2019_Feb_18.001

https://sites.google.com/a/broadinstitute.org/ricopili/home
Stephan Ripke: [email protected]

testing email program
!!Warning!! : No mutt command available, trying mail
mail found in /usr/bin

also test scratchdir, ldscore, starting R

touch.1.finished not found, need to start job
touch.2.finished not found, need to start job
touch.3.finished not found, need to start job
touch.4.finished not found, need to start job
touch.5.finished not found, need to start job
touch.6.finished not found, need to start job
touch.7.finished not found, need to start job
touch.8.finished not found, need to start job
touch.9.finished not found, need to start job
touch.10.finished not found, need to start job
touch.11.finished not found, need to start job
touch.12.finished not found, need to start job
touch.13.finished not found, need to start job
touch.14.finished not found, need to start job
touch.15.finished not found, need to start job
touch.16.finished not found, need to start job
touch.17.finished not found, need to start job
touch.18.finished not found, need to start job
touch.19.finished not found, need to start job
touch.20.finished not found, need to start job
Config_file: /Users/mchiment/ricopili.conf
starting job_array, j.heavycomp.test
batch_taskid: $SGE_TASK_ID
this is the job_array sent: my.start_job --parn 16 -n $SGE_TASK_ID --jobfile heavycomp.job_list
Config_file: /Users/mchiment/ricopili.conf
stdout from array submission: Your job-array 7447606.1-2:1 ("heavycomp.test") has been submitted
dependent job ID:7447606
starting motherscript, depending on 7447606

20 jobs successfully submitted
please see tail of /Users/mchiment/ricopili/test_info for regular updates
also check bjobs -w for running jobs
possibly different command on different computer cluster: e.g. qstat -u USER
you will be informed via email if errors or successes occur

This looks OK, I think...but the manual says I should see these files:

rp_test_forest_join-nup.pdf (a merge of all PDFs)
rp_text.xls (a mock excel file)
ldsc.PGC_meta.r4.gz.tar.gz (the results from the LDScore analyses

My output (working) directory looks like this after running the test:

drwxr-x--- 2 mchiment its-rs-user 8 Apr 4 13:51 errandout
-rw-r--r-- 1 mchiment its-rs-user 271 Apr 4 13:50 heavycomp.job_list
-rwxr--r-- 1 mchiment its-rs-user 252 Apr 4 13:50 heavycomp.job_list.sub1.sh
-rwxr--r-- 1 mchiment its-rs-user 69 Apr 4 13:50 heavycomp.job_list.sub2.sh
-rw-r--r-- 1 mchiment its-rs-user 123 Apr 4 13:50 j.heavycomp.test
-rw-r--r-- 1 mchiment its-rs-user 67 Apr 4 13:50 j.heavycomp.test.id
-rw-r--r-- 1 mchiment its-rs-user 192 Apr 4 13:50 j.heavycomp.test.log
-rw-r--r-- 1 mchiment its-rs-user 120 Apr 4 13:51 j.qqplot.test
-rw-r--r-- 1 mchiment its-rs-user 0 Apr 4 13:51 j.qqplot.test.id
-rw-r--r-- 1 mchiment its-rs-user 186 Apr 4 13:51 j.qqplot.test.log
-rw-r--r-- 1 mchiment its-rs-user 69 Apr 4 13:50 j._te_test
-rw-r--r-- 1 mchiment its-rs-user 49 Apr 4 13:50 j._te_test.id
-rw-r--r-- 1 mchiment its-rs-user 185 Apr 4 13:50 j._te_test.log
-rw-r--r-- 1 mchiment its-rs-user 33K Apr 4 13:50 PGC_cohort1.ch.fl.r4.gz
-rw-r--r-- 1 mchiment its-rs-user 34K Apr 4 13:50 PGC_cohort2.ch.fl.r4.gz
-rw-r--r-- 1 mchiment its-rs-user 34K Apr 4 13:50 PGC_cohort3.ch.fl.r4.gz
-rw-r--r-- 1 mchiment its-rs-user 34K Apr 4 13:50 PGC_cohort4.ch.fl.r4.gz
-rw-r--r-- 1 mchiment its-rs-user 54K Apr 4 13:50 PGC_meta.r4.gz
-rw-r--r-- 1 mchiment its-rs-user 717 Apr 4 13:51 qqplot.job_list
-rw-r--r-- 1 mchiment its-rs-user 8 Apr 4 13:51 touch.10.finished
-rw-r--r-- 1 mchiment its-rs-user 8 Apr 4 13:51 touch.11.finished
-rw-r--r-- 1 mchiment its-rs-user 8 Apr 4 13:51 touch.12.finished
-rw-r--r-- 1 mchiment its-rs-user 8 Apr 4 13:51 touch.13.finished
-rw-r--r-- 1 mchiment its-rs-user 8 Apr 4 13:51 touch.14.finished
-rw-r--r-- 1 mchiment its-rs-user 8 Apr 4 13:51 touch.15.finished
-rw-r--r-- 1 mchiment its-rs-user 8 Apr 4 13:51 touch.16.finished
-rw-r--r-- 1 mchiment its-rs-user 8 Apr 4 13:51 touch.17.finished
-rw-r--r-- 1 mchiment its-rs-user 8 Apr 4 13:51 touch.18.finished
-rw-r--r-- 1 mchiment its-rs-user 8 Apr 4 13:51 touch.19.finished
-rw-r--r-- 1 mchiment its-rs-user 8 Apr 4 13:51 touch.1.finished
-rw-r--r-- 1 mchiment its-rs-user 8 Apr 4 13:51 touch.20.finished
-rw-r--r-- 1 mchiment its-rs-user 8 Apr 4 13:51 touch.2.finished
-rw-r--r-- 1 mchiment its-rs-user 8 Apr 4 13:51 touch.3.finished
-rw-r--r-- 1 mchiment its-rs-user 8 Apr 4 13:51 touch.4.finished
-rw-r--r-- 1 mchiment its-rs-user 8 Apr 4 13:51 touch.5.finished
-rw-r--r-- 1 mchiment its-rs-user 8 Apr 4 13:51 touch.6.finished
-rw-r--r-- 1 mchiment its-rs-user 8 Apr 4 13:51 touch.7.finished
-rw-r--r-- 1 mchiment its-rs-user 8 Apr 4 13:51 touch.8.finished
-rw-r--r-- 1 mchiment its-rs-user 8 Apr 4 13:51 touch.9.finished

Looking in "errandout" we have from the logs:

mchiment@argon-login-1: errandout$ cat _te_test.e7447609
Unable to run job: denied: host "argon-lc-g11-16.hpc" is not a submit host
Exiting

AND

mchiment@argon-login-1: errandout$ cat _te_test.o7447609
Config_file: /Users/mchiment/ricopili.conf

.......testing email program....
!!Warning!! : No mutt command available, trying mail
mail found in /usr/bin

also test scratchdir, ldscore, starting R

succ: touch.1.finished
succ: touch.2.finished
succ: touch.3.finished
succ: touch.4.finished
succ: touch.5.finished
succ: touch.6.finished
succ: touch.7.finished
succ: touch.8.finished
succ: touch.9.finished
succ: touch.10.finished
succ: touch.11.finished
succ: touch.12.finished
succ: touch.13.finished
succ: touch.14.finished
succ: touch.15.finished
succ: touch.16.finished
succ: touch.17.finished
succ: touch.18.finished
succ: touch.19.finished
succ: touch.20.finished
PGC_cohort1.ch.fl.r4.gz.out-qq.pdf not found, need to start job
PGC_cohort2.ch.fl.r4.gz.out-qq.pdf not found, need to start job
PGC_cohort3.ch.fl.r4.gz.out-qq.pdf not found, need to start job
PGC_cohort4.ch.fl.r4.gz.out-qq.pdf not found, need to start job
Config_file: /Users/mchiment/ricopili.conf
starting job_array, j.qqplot.test
batch_taskid: $SGE_TASK_ID
this is the job_array sent: my.start_job --parn 16 -n $SGE_TASK_ID --jobfile qqplot.job_list
qsub -l h_vmem=2g -l h_rt=2:00:00 -tc 0 -t 1-1 -o /Users/mchiment/temp/test_ricopili/test_2/errandout -e /Users/mchiment/temp/test_ricopili/test_2/errandout -N qqplot.test j.qqplot.test > j.qqplot.test.id
->system call failed: 1
Config_file: /Users/mchiment/ricopili.conf
stdout from array submission:
dependent job ID:

Error: something seems wrong with entry batch_job_output_jid in your custom file
please revisit installation process
Something is very strange since the array submission output is empty as well

5 jobs successfully submitted
please see tail of /Users/mchiment/ricopili/test_info for regular updates
also check bjobs -w for running jobs
possibly different command on different computer cluster: e.g. qstat -u USER
you will be informed via email if errors or successes occur

Config custom file:

mchiment@argon-login-1: rp_bin$ cat rp_config.custom.txt

for details please refer to https://docs.google.com/document/d/14aa-oeT5hF541I8hHsDAL_42oyvlHRC5FWR7gir4xco/edit?usp=sharing

and https://docs.google.com/spreadsheets/d/1LhNYIXhFi7yXBC17UkjI1KMzHhKYz0j2hwnJECBGZk4/edit?usp=sharing

variable_name variable_value

rp_dependencies_dir /Users/mchiment/ricopili/dependencies
R_packages_dir /Users/mchiment/R/x86_64-pc-linux-gnu-library/3.5
starting_R module_SPACE_load_SPACE_R
path_to_Perlmodules /Users/mchiment/ricopili/dependencies
path_to_scratchdir /localscratch/Users/mchiment
starting_ldsc source_SPACE_activate_SPACE_ldsc;SPACE_python_SPACE/Users/mchiment/ricopili/dependencies/ldsc
ldsc_reference /Users/mchiment/ricopili/dependencies/ldsc
rp_user_initials msc
rp_user_email [email protected]
rp_logfiles ~/ricopili

---- jobarray and queueing parameters:

batch_jobcommand qsub
batch_memory_request -l_SPACE_h_vmem=XXXg
batch_walltime -l_SPACE_h_rt=HH:MM:SS
batch_array -t_SPACE_1-XXX
batch_max_parallel_jobs_per_one_array -tc_SPACE_YYY
batch_jobfile XXX
batch_name -N_SPACE_XXX
batch_stdout -o_SPACE_XXX
batch_stderr -e_SPACE_XXX
batch_job_dependency -hold_jid_SPACE_XXX
batch_array_task_id $SGE_TASK_ID
batch_other_job_flags NONE
batch_job_output_jid Your_SPACE_job-array_SPACE_XXX.1-YYY:1_SPACE("ZZZ")_SPACE_has_SPACE_been_SPACE_submitted
batch_ncores_per_node 56
batch_mem_per_node 32

Unknown file type used for job bcftools_impute2 in script my.prepvcf_imp2

Using latest rp_bins (2019_Jun_25.001).
When running refdir_navi for building a reference, (after fixing #63 and #62 manually) in the job bcftools_impute2 my.prepvcf_imp2 is called.

my.prepvcf_imp2 then tries to call
[root@06f061df1b71 1KGPv3_REFERENCE_PANEL]# /ricopili/rp_dep/bcftools/bcftools-1.9_bin/bcftools convert --vcf-ids -h ALL_v5a.20130502.chr22_1KG_0517.impute loc.ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz.filtnorm.gz.reform.gz

That command fails with
Failed to open loc.ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz.filtnorm.gz.reform.gz: unknown file type Error systemcode: 65280

daner Direction field potentially unreliable

Summary

In some cases metaber7 flips the sign of the meta-analysis result to keep the effect for the intended A1/A2, but does not also flip signs in the Direction field.

Details

The workflow for metaber7 is roughly:

Built a metal script for the meta-analysis
Read through each input GWAS file to build an index of descriptive info on each variant (e.g. chr/bp, a1/a2, info, freq). For the sake of tracking frequency, A1/A2 are taken from the first time the variant is encountered.
Create a "clean" copy of each input GWAS (excludes extreme ORs and SNPs with A1/A2 discordant with the indexed A1/A2.
Run meta-analysis using metal with the clean GWASs
Read through metal results, and write daner output with the meta-analysis results plus the info that metal isn't set to track (case/control freq, info, sample size, etc).

For step 5, it is possible that the order of A1/A2 reported by metal is not the order of A1/A2 indexed by metaber7 in step 2 (see Background below). In order to ensure that the allele frequency information added to the final output is correct, metaber7 checks for this swap. If the alleles have been swapped, metaber7 flips the sign of the effect in the meta-analysis results to match the A1/A2 order indexed in step 2. This maintains A1/A2 in the order indexed from the output GWAS, and reports the corresponding freq and effect size.

This is all reasonable, but the meta-analysis results also include a "Direction" field that reports the sign of the effect from each cohort in the meta-analysis. This field gets passed through directly from the metal output, and thus always reports the sign for the A1/A2 ordering in metal even when the the final output is reported a swapped order with the sign of the effect size flipped.

Example

Adapted from observed data, with numbers rounded/obscured to protect actual results

Dataset 1:

CHR SNP BP A1 A2 FRQ_A_500 FRQ_U_10000 INFO OR SE P ngt
1 rs123456789 87654321 G T 0.01000000000000000 0.02000000000000000 0.90000 1.0100000000000 0.3000000 0.9000000 0

Dataset 2:

CHR SNP BP A1 A2 FRQ_A_1500 FRQ_U_10000 INFO OR SE P ngt
1 rs123456789 87654321 G T 0.01000000000000000 0.02000000000000000 0.90000 0.9500000000000 0.2000000 0.7000000 0

Metal output:

MarkerName	Allele1	Allele2	Effect	StdErr	P-value	Direction	HetISq	HetChiSq	HetDf	HetPVal
rs123456789	t	g	0.0900	0.2000	0.500	-+	0.0	0.100	1	0.7000

Note: effect of T allele is positive, with Direction "-+" in the two cohorts, matching the above input with A1/A2 swapped.

Daner output:

CHR	SNP	BP	A1	A2	FRQ_A_5000	FRQ_U_10000	INFO	OR	SE	P	ngt	Direction	HetISqt	HetDf	HetPVa	Nca	Nco	Neff
1	rs123456789	87654321	G	T	0.01000	0.02000	0.900	0.9000	0.2000	0.5000	0	-+	0.0	1	0.7000	2000	20000	8000.00

Note: A1/A2 swapped to match input GWAS, and reports negative effect (OR < 1) of G allele. Direction field still reports "-+", which is result for T allele rather than G allele.

Possible Solutions

The relevant code block in metaber7 for step 5 is lines 923-1056. Should be relatively straight-forward to fix in one of two ways:

Update the Direction field when flipping the effect size from metal. This update could be done along the effect size flip at line 956. It would require parsing the Direction field string to make the relevant substitutions. Printing this updated string would occur at line 1019 (in place of $cells[6]).
Take A1/A2 order as reported by metal, and flip the indexed allele frequency to match. In place of the effect size flip at 956, frequencies fca{}/nca{} and fco{}/nco{} currently printed lins 996-1000 wound need to be updated. Also, the A1/A2 from metal (e.g. $a1_name) would need to be printed rather than the indexed alleles (e.g. $a1{$snp_name}) at line 975-976.

Not sure which of these might be preferred based on any impacts on downstream analysis/processing.

Additional Background

This issue arises in part due to strange allele swapping behavior from metal. While metal documentation states that the reported Allele1 is the first allele from the first occurrence of the marker in the input files, it appears this isn't actually the case. This alternative instruction file instead says the "lowest numeric reference allele" with be used if the input files differ, which seems closer to accurate, but still not quite right.

The relevant code block from the metal source code is:

bool FlipAlleles(String & al1, String & al2, double & effect, double & freq)
   {
   al1.ToLower();
   al2.ToLower();

   if (al1 > al2)
      {
      al1.Swap(al2);
      effect *= -1.0;
      freq = 1.0 - freq;
      }

   if (al1 == "a" || al1 == "c" && al2 == "g")
      return false;

   FlipAllele(al1);
   FlipAllele(al2);

   if (al1 > al2)
      {
      al1.Swap(al2);
      effect *= -1.0;
      freq = 1.0 - freq;
      }

   return true;
   }

This gets run for the first occurrence of every marker.

If it returns true, then FlipAllele(al1) and FlipAllele(al2) are run again prior to final output. Because this code actually alters al1 and al2, any switching from al1.Swap(al2) will propagate though to metal results files.

By my read, this means markers where the first occurrence has alleles C/A, C/T, G/A, G/C, G/T, and T/A markers will have their allele order swapped, while A/C, A/G, A/T, C/G, T/C, and T/G markers will have their allele orders preserved. Looking through some results from metal seems to confirm this. It's also consistent with the motivating example above that swaps G/T to T/G.

Imputation changes missing phenotypes to controls

In haps2dos4 lines 237-248, the fam file is processed to force all phenotypes to 1 or 2. The reason for this is unclear, but regardless this altered phenotype is propagated through the remainder of imputation.

As a result any individuals with non-case/control phenotypes, importantly including missing phenotypes of 0/-9, are recoded to cases. Unless an alternative phenotype is specified, this changed phenotype will also be used for GWAS.

danscore_multiplot_2 expects configfile at $HOME/ricopili.conf and not at $RPHOME/ricopili.conf

danscore_multiplot_2 expects the config file to be located at $HOME/ricopili.conf

ricopili/rp_bin/danscore_multiplot_2

Line 9 in 757fb9d

my $conf_file = $ENV{HOME}."/ricopili.conf";

Instead it should be located at $RPHOME/ricopili.conf.
my $conf_file = $ENV{RPHOME}."/ricopili.conf";

Syntax error in filtnorm_filter

A perl syntax error was recently introduced in filtnorm_filter here:

ricopili/rp_bin/filtnorm_filter

Line 147 in 757fb9d

print "$icc lines read\n" if ($icc % 10000 == 0) if ($debug);

Either of
print "$icc lines read\n" if ($icc % 10000 == 0 && $debug);
or
do {print "$icc lines read\n" if ($icc % 10000 == 0)} if ($debug);
should work better.

No access to dependencies on LISA

Hi,

I'm trying to add a fresh install of ricopili on LISA, but I am running into permission errors for accessing the dependencies in /home/pgcdac/rp_bin_1018/dependencies.

~~This is true on multiple accounts, including on accounts where ricopili is already installed~~

Please can you check whether previous permission settings on this account have been lost? Do users need to be members of a specific permission group to be able to use these dependencies on LISA?

Thanks

*EDIT: Existing installation does work correctly (error in the config had slipped in).

Error when running preimp_dir

Hello,
I was able to run rp_test_navi successfully, but I get an error when I attempt the following command to analyze a single data set (PLINK files). I try to running the following command:
preimp_dir --dis nan --pop mix --out GO_Quad_5removed_ATGC_filtered --maf 0.01
The first time I run the command it produces nan.names, for which I then change the value in the first column to "pndc1" and save it. I then run the same command as above once again:
preimp_dir --dis nan --pop mix --out GO_Quad_5removed_ATGC_filtered --maf 0.01

And it causes an error, here is the command line output of the command

-----------------------------------------------------
switched on SERIAL mode because of configuration file

       _                 _ _ _
  _ __(_) ___ ___  _ __ (_) (_)
 | '__| |/ __/ _ \| '_ \| | | |
 | |  | | (_| (_) | |_) | | | |
 |_|  |_|\___\___/| .__/|_|_|_|
                  |_|

#######################################################################
#######################################################################
##                                                                  ###
##   preimp_dir    - module of ricopili pipeline                    ###
##                      version: 2019_Oct_15.001                     ###
##                                                                  ###
## https://sites.google.com/a/broadinstitute.org/ricopili/home      ###
## Stephan Ripke: [email protected]                         ###
##                                                                  ###
#######################################################################
#######################################################################



Warning:
--------
GO_Quad_5removed_ATGC_filtered.fam: less than 10 cases and/or less than 10 controls, so will do only restricted QC (without association)

Running job: plague
-----------------------------------------------------
switched on SERIAL mode because of configuration file

       _                 _ _ _
  _ __(_) ___ ___  _ __ (_) (_)
 | '__| |/ __/ _ \| '_ \| | | |
 | |  | | (_| (_) | |_) | | | |
 |_|  |_|\___\___/| .__/|_|_|_|
                  |_|

#######################################################################
#######################################################################
##                                                                  ###
##   preimp_dir    - module of ricopili pipeline                    ###
##                      version: 2019_Oct_15.001                     ###
##                                                                  ###
## https://sites.google.com/a/broadinstitute.org/ricopili/home      ###
## Stephan Ripke: [email protected]                         ###
##                                                                  ###
#######################################################################
#######################################################################



Warning:
--------
GO_Quad_5removed_ATGC_filtered.fam: less than 10 cases and/or less than 10 controls, so will do only restricted QC (without association)

nan_pndc1_mix_PR with limited QC (no association)
Running job: qc
restricted QC
-------------
Value "-0.4,0.4" invalid for option xlimright (number expected)
Illegal division by zero at /data/Peter/Ricopili/rp_bin/meta2flags line 174.
meta2flags --meta pofi_nan_pndc1_mix_PR.meta

->system call failed: 255 at /data/Peter/Ricopili/rp_bin/rep_qc2_14 line 482.
-----------------------------------------------------
switched on SERIAL mode because of configuration file

       _                 _ _ _
  _ __(_) ___ ___  _ __ (_) (_)
 | '__| |/ __/ _ \| '_ \| | | |
 | |  | | (_| (_) | |_) | | | |
 |_|  |_|\___\___/| .__/|_|_|_|
                  |_|

#######################################################################
#######################################################################
##                                                                  ###
##   preimp_dir    - module of ricopili pipeline                    ###
##                      version: 2019_Oct_15.001                     ###
##                                                                  ###
## https://sites.google.com/a/broadinstitute.org/ricopili/home      ###
## Stephan Ripke: [email protected]                         ###
##                                                                  ###
#######################################################################
#######################################################################



Warning:
--------
GO_Quad_5removed_ATGC_filtered.fam: less than 10 cases and/or less than 10 controls, so will do only restricted QC (without association)

nan_pndc1_mix_PR with limited QC (no association)
Running job: qc
##################################################################
##### Error:
##### step qc has been done repeatedly without any progress
##### preimp pipeline stopped
##### /data/PNC/Ricopili_cleaning/phg000381.v2.NIMH_NeurodevelopmentalGenomics.genotype-calls-matrixfmt.Human610-Quadv1_B.c1.GRU-NPU/preimputation  preimp_dir --dis nan --pop mix --out GO_Quad_5removed_ATGC_filtered --maf 0.01
##### if reason does not appear obvious
##### have a look at the wiki page
##### https://sites.google.com/a/broadinstitute.org/ricopili/
##### or contact the developers
##### version: 2019_Oct_15.001
##################################################################

The PLINK data set I use has "-9" for all values in the phenotype column. It also has a value of "0" for all family, mother, and father IDs. Also it has "0" for some alleles in the BIM file. However, I do not think those are the issues because I tried changing the family IDs to real values and removing the SNPs with "0" as alleles from the data and the same problem continues.

Building a docker image with ricopili

The current installation process seems to demand interactive input from the user, but this makes it impossible to automatically install it in a docker image.
Would it be possible to add a configuration option that predefines all paths and other requirements to make it possible to install ricopili into a docker image? It could be something like "./rp_config --use-config config-file.conf" where config-file.conf contains everything that would otherwise have been provided manually by the person installing it. What do you think?

Running multiple jobs prevents detection of stuck jobs

The motherscripts detect failing/stuck tasks by comparing the current call in send_jobarray to that last task queued (as recorded in that master log file for that motherscript). If it's submitting the same job attempted previously, it errors and stops as intended.

This comparison relies on reading the last line of the master log file to match against. If 2+ jobs with the same master log file are running simultaneously (e.g. postimp with different phenotypes, pca with and without reference population samples, etc), the previous task submission may no longer be on the last line of the log file if one of the other jobs has submitted a task in the interim.

In the worst case scenario, 2+ jobs that are all failing can cycle on the same task infinitely, never stopping since they don't see their own previous task on the last line of the log file, and depending on the task potentially generating 100,000s of temp files in the process.

Same log below based on real-world example:

/path/to/datadir    postimp_navi_17 --mds file.mds_cov --coco 1,2,3,4 --out gwas_w_cov --addcov cov.txt danerlong.000921    Thu_Dec_17_13:00:22_2015
/path/to/datadir    postimp_navi_17 --mds file.mds_cov --coco 1,2,3,4 --out gwas_wo_cov danerlong.000921    Thu_Dec_17_13:02:02_2015
/path/to/datadir    postimp_navi_17 --mds file.mds_cov --coco 1,2,3,4 --out gwas_w_cov --addcov cov.txt danerlong.000921    Thu_Dec_17_13:03:37_2015
/path/to/datadir    postimp_navi_17 --mds file.mds_cov --coco 1,2,3,4 --out gwas_wo_cov danerlong.000921    Thu_Dec_17_13:05:05_2015
/path/to/datadir    postimp_navi_17 --mds file.mds_cov --coco 1,2,3,4 --out gwas_w_cov --addcov cov.txt danerlong.000921    Thu_Dec_17_13:06:21_2015
/path/to/datadir    postimp_navi_17 --mds file.mds_cov --coco 1,2,3,4 --out gwas_wo_cov danerlong.000921    Thu_Dec_17_13:07:39_2015
/path/to/datadir    postimp_navi_17 --mds file.mds_cov --coco 1,2,3,4 --out gwas_w_cov --addcov cov.txt danerlong.000921    Thu_Dec_17_13:09:01_2015
/path/to/datadir    postimp_navi_17 --mds file.mds_cov --coco 1,2,3,4 --out gwas_wo_cov danerlong.000921    Thu_Dec_17_13:10:18_2015

install fails for "custom" cluster

Current install is broken for me. It gets stuck while setting the sloc variable.

The latest commit changed the use of the %clusters variable on line 278 of rp_config, and setting $cluster directly.

This messed with the scratch directory creation logic. First, its not recognizing the path_to_scratchdir variable. It looks like this relies on $clusters{custom} == 1. For similar reasons, I get dumped into an infinite loop at line 1324 when bottoming out at the conditional statement there.

I think the fix should be just setting my %clusters = (..."custom", 1) on line 283.

postimp_navi script fails with --gwclump flag because of inconsistent current working directory

When executing the postimp_navi script with the --gwclump flag here

ricopili/rp_bin/postimp_navi

Lines 4579 to 4580 in 757fb9d

    
           if ($gwclump) { 
        
               chdir "$distdir";

the current working directory is changed to $distdir.
However the $numbers_script is then later executed in the $distdir as it does not change to a different directory itself.

ricopili/rp_bin/postimp_navi

Lines 4635 to 4648 in 757fb9d

    
           my $numbers_file = "basic.$outname.num.xls"; 
        
           unless (-e "$distdir/$numbers_file") { 
        
               my $sys = "$numbers_script basic.$outname $daner_all @daner_single"; 
        
           #    print "####################################### DEBUG ###############\n"; 
        
           #    print "$sys\n"; 
        
           #    print "$numbers_file\n"; 
        
           #    print "####################################### DEBUG ###############\n"; 
        
           #    sleep (4); 
        
               &mysystem ($sys); 
        
               &mysystem ("cp $numbers_file $distdir"); 
        
           }

This is a problem because the $numbers_script (more specifically the arguments as these are paths to files) expect to still be in $reportdir. This makes the $numbers_script fail because it cannot find the files it was given as arguments.
I propose always executing chdir "$reportdir"; like this:
`my $numbers_file = "basic.$outname.num.xls";

unless (-e "$distdir/$numbers_file") {

chdir "$reportdir";

my $sys = "$numbers_script basic.$outname $daner_all @daner_single\n";`

QC with a Trio dataset that only has cases

Line 573 dies: die "$! ($prefix.tdt)" unless open FILE, "< $prefix.tdt";

No tdt test is run because of lines 904-906
if ($noassoc) {
$assotdt = "";
}

Fix: Line 925 needs to be moved under the unless ($noassoc) { block

	my $numbers_file = "basic.$outname.num.xls";
	unless (-e "$distdir/$numbers_file") {
	my $sys = "$numbers_script basic.$outname $daner_all @daner_single";

	# print "####################################### DEBUG ###############\n";
	# print "$sys\n";
	# print "$numbers_file\n";
	# print "####################################### DEBUG ###############\n";
	# sleep (4);

	&mysystem ($sys);
	&mysystem ("cp $numbers_file $distdir");

	}

ripkelab / ricopili Goto Github PK

ricopili's People

Contributors

Stargazers

Watchers

Forkers

ricopili's Issues

testing email program !!Warning!! : No mutt command available, trying mail mail found in /usr/bin

also test scratchdir, ldscore, starting R

20 jobs successfully submitted please see tail of /Users/mchiment/ricopili/test_info for regular updates also check bjobs -w for running jobs possibly different command on different computer cluster: e.g. qstat -u USER you will be informed via email if errors or successes occur

.......testing email program.... !!Warning!! : No mutt command available, trying mail mail found in /usr/bin

also test scratchdir, ldscore, starting R

Error: something seems wrong with entry batch_job_output_jid in your custom file please revisit installation process Something is very strange since the array submission output is empty as well

5 jobs successfully submitted please see tail of /Users/mchiment/ricopili/test_info for regular updates also check bjobs -w for running jobs possibly different command on different computer cluster: e.g. qstat -u USER you will be informed via email if errors or successes occur

for details please refer to https://docs.google.com/document/d/14aa-oeT5hF541I8hHsDAL_42oyvlHRC5FWR7gir4xco/edit?usp=sharing

and https://docs.google.com/spreadsheets/d/1LhNYIXhFi7yXBC17UkjI1KMzHhKYz0j2hwnJECBGZk4/edit?usp=sharing

variable_name variable_value

---- jobarray and queueing parameters:

Summary

Details

Example

Possible Solutions

Additional Background

Recommend Projects

Recommend Topics

Recommend Org

testing email program
!!Warning!! : No mutt command available, trying mail
mail found in /usr/bin

20 jobs successfully submitted
please see tail of /Users/mchiment/ricopili/test_info for regular updates
also check bjobs -w for running jobs
possibly different command on different computer cluster: e.g. qstat -u USER
you will be informed via email if errors or successes occur

.......testing email program....
!!Warning!! : No mutt command available, trying mail
mail found in /usr/bin

Error: something seems wrong with entry batch_job_output_jid in your custom file
please revisit installation process
Something is very strange since the array submission output is empty as well

5 jobs successfully submitted
please see tail of /Users/mchiment/ricopili/test_info for regular updates
also check bjobs -w for running jobs
possibly different command on different computer cluster: e.g. qstat -u USER
you will be informed via email if errors or successes occur