Test data is needed to develop and test against when updating the clinical dictionaries or services.
This repository hosts test data to initialize a program with both clinical and molecular data.
The test data has been designed to cover several different use cases for both molecular and clinical data. The following chart summarizes the clinical and molecular data states being tested for each donor. This dataset can be found in the ARGO QA environment, with a summary of the processed data available in the QA RDPC.
Donor ID | Primary Site | Vital Status | Gender | Clinical Complete | T/N Status | # T | # N | # Primary Diagnosis | # Treatments | # Follow Ups |
---|---|---|---|---|---|---|---|---|---|---|
Donor-1 | Brain | Alive | M | Yes | Paired | 2 | 1 | 1 | 1 | 3 |
Donor-2 | Breast | Alive | F | No | Single | 1 | 0 | 1 | 5 | 5 |
Donor-3 | Esophagus | Alive | F | No | Single | 0 | 1 | 1 | 3 | 3 |
Donor-4 | Esophagus | Alive | M | Yes | Paired | 1 | 1 | 1 | 1 (has multiple) | 4 |
Donor-5 | Pancreas | Deceased | M | Yes | Paired | 1 | 1 | 1 | 1 | 1 |
Donor-6 | Pancreas | Deceased | M | Yes | Paired | 1 | 1 | 1 | 2 | 2 |
Donor-7 | Pancreas | Deceased | M | No | Single | 1 | 0 | 1 | 3 (has multiple) | 3 |
Donor-8 | Pancreas | Alive | F | Yes | Paired | 1 | 1 | 1 | 1 | 2 |
Donor-9 | Colon | Alive | M | No | Single | 0 | 1 | 1 | 1 | |
Donor-10 | Colon | Deceased | M | No | Paired | 1 | 1 | 2 | 2 | 3 |
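
The T/N Status column appears to follow from the tumour and normal sample counts: a donor is Paired when it has at least one of each, and Single otherwise. The sketch below encodes that reading and checks it against the rows of the chart. The rule is inferred from the chart rather than documented, and the name `tn_status` is illustrative.

```python
# Assumed rule, inferred from the chart above: "Paired" means at least one
# tumour and one normal sample; anything else is "Single".
def tn_status(num_tumour: int, num_normal: int) -> str:
    return "Paired" if num_tumour >= 1 and num_normal >= 1 else "Single"


# (donor, # T, # N, T/N Status) copied from the chart.
ROWS = [
    ("Donor-1", 2, 1, "Paired"),
    ("Donor-2", 1, 0, "Single"),
    ("Donor-3", 0, 1, "Single"),
    ("Donor-4", 1, 1, "Paired"),
    ("Donor-5", 1, 1, "Paired"),
    ("Donor-6", 1, 1, "Paired"),
    ("Donor-7", 1, 0, "Single"),
    ("Donor-8", 1, 1, "Paired"),
    ("Donor-9", 0, 1, "Single"),
    ("Donor-10", 1, 1, "Paired"),
]

# Donors whose listed status disagrees with the assumed rule.
mismatches = [donor for donor, t, n, status in ROWS if tn_status(t, n) != status]
print(mismatches)  # prints []
```

Note that Donor-10 is Paired but not clinically complete, so the two columns vary independently.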
- Replace the values in the `program_id` column in all clinical files with the Program Code of the program you are working with.
- Register the samples using `sample_registration.tsv`.
- Submit all clinical files to the Clinical Submission UI.
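
The `program_id` replacement can be scripted rather than done by hand. The following is a minimal sketch, assuming the clinical files are tab-separated with a header row and sit together in one directory; the function names and the directory layout are illustrative, not part of this repository.

```python
import csv
from pathlib import Path


def set_program_id(tsv_path: Path, program_code: str) -> int:
    """Overwrite the program_id column of one TSV in place; return rows changed."""
    with tsv_path.open(newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        fieldnames = reader.fieldnames or []
        rows = list(reader)
    if "program_id" not in fieldnames:
        return 0  # files without the column are left untouched
    for row in rows:
        row["program_id"] = program_code
    with tsv_path.open("w", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


def set_program_id_in_dir(directory: Path, program_code: str) -> dict:
    """Apply set_program_id to every *.tsv file in a directory."""
    return {
        path.name: set_program_id(path, program_code)
        for path in sorted(directory.glob("*.tsv"))
    }
```

For example, `set_program_id_in_dir(Path("clinical"), "TEST-QA")` would rewrite every TSV under a hypothetical `clinical` directory. Work on a copy of the files, since the rewrite happens in place.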
To submit molecular data to a program:
- The program must exist as a Song study.
- You must have upload permissions for that study.
- Configure your Song/Score clients using the correct Song/Score URLs for your environment, and an API Key with either system permissions or upload permissions to your study. A guide to configuration can be found on the ARGO docs site.
- If needed, update the directory paths in the `upload.sh` script. By default, this script is configured to work with the `TEST-QA` data set, with the test data in the structure defined in this repository.
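
Before uploading, it can help to confirm that the first prerequisite holds. The sketch below asks a Song server whether a study exists. It assumes Song exposes `GET <song_url>/studies/<study_id>` and answers 404 for an unknown study; check this against the API documentation of your Song deployment. The function names are illustrative.

```python
import urllib.error
import urllib.request


def study_url(song_url: str, study_id: str) -> str:
    # Assumed endpoint shape: <song_url>/studies/<study_id>
    return f"{song_url.rstrip('/')}/studies/{study_id}"


def study_exists(song_url: str, study_id: str, opener=urllib.request.urlopen) -> bool:
    """Return True if the server answers 200 for the study, False on 404."""
    try:
        with opener(study_url(song_url, study_id)) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # other errors (auth, server faults) are surfaced to the caller
```

A call such as `study_exists("https://song.rdpc-qa.cancercollaboratory.org", "TEST-QA")` would then act as a quick preflight check. The `opener` argument exists only so the function can be exercised without network access.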
Alignment Parameters:
Workflow URL: https://github.com/icgc-argo/dna-seq-processing-wfs.git
Sample Workflow Parameters:
{
  "analysis_id": "c52c6e97-2b13-451c-ac6e-972b13751c86",
  "study_id": "ROSI-RU",
  "score_url": "https://score.rdpc-qa.cancercollaboratory.org",
  "song_url": "https://song.rdpc-qa.cancercollaboratory.org",
  "ref_genome_fa": "/nfs-dev-1-vol-qa-1/reference/GRCh38_hla_decoy_ebv/GRCh38_hla_decoy_ebv.fa",
  "download": {
    "song_cpus": 2,
    "song_mem": 2,
    "score_cpus": 4,
    "score_mem": 10,
    "score_url": "https://submission-score.rdpc-qa.cancercollaboratory.org",
    "song_url": "https://submission-song.rdpc-qa.cancercollaboratory.org"
  },
  "cpu": 6,
  "mem": 18
}
Workflow Engine Parameters:
{
  "revision": "1.5.1"
}
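
To run the alignment workflow for a different study, the sample parameters presumably need a new `study_id` and `analysis_id`, while the URLs and resource settings can stay as they are. A minimal sketch of that substitution follows; `retarget` is an illustrative helper, the dictionary is an abbreviated copy of the sample above, and the study and analysis ID passed in are placeholders.

```python
import copy
import json

# Abbreviated copy of the sample alignment parameters above.
SAMPLE_PARAMS = {
    "analysis_id": "c52c6e97-2b13-451c-ac6e-972b13751c86",
    "study_id": "ROSI-RU",
    "score_url": "https://score.rdpc-qa.cancercollaboratory.org",
    "song_url": "https://song.rdpc-qa.cancercollaboratory.org",
    "download": {
        "score_url": "https://submission-score.rdpc-qa.cancercollaboratory.org",
        "song_url": "https://submission-song.rdpc-qa.cancercollaboratory.org",
    },
    "cpu": 6,
    "mem": 18,
}


def retarget(params: dict, study_id: str, analysis_id: str) -> dict:
    """Return a copy of params pointed at another study and analysis."""
    updated = copy.deepcopy(params)  # leave the template untouched
    updated["study_id"] = study_id
    updated["analysis_id"] = analysis_id
    return updated


# Placeholder values; substitute a real study and analysis ID.
mine = retarget(SAMPLE_PARAMS, "TEST-QA", "00000000-0000-0000-0000-000000000000")
print(json.dumps(mine, indent=2))
```

The nested `download` block is copied unchanged, so the submission Song/Score URLs it contains are preserved alongside the top-level RDPC ones.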
Workflow URL: https://github.com/icgc-argo/sanger-wxs-variant-calling
Sample Workflow Parameters:
{
  "study_id": "ROSI-RU",
  "tumour_aln_analysis_id": "228a611f-2fff-4e3d-8a61-1f2fffbe3d69",
  "normal_aln_analysis_id": "36585e13-6082-4553-985e-136082a55336",
  "max_retries": 3,
  "first_retry_wait_time": 5,
  "cleanup": true,
  "song_url": "https://song.rdpc-qa.cancercollaboratory.org",
  "score_url": "https://score.rdpc-qa.cancercollaboratory.org",
  "download": {
    "song_url": "https://song.rdpc-qa.cancercollaboratory.org",
    "song_cpus": 2,
    "song_mem": 2,
    "score_url": "https://score.rdpc-qa.cancercollaboratory.org",
    "score_cpus": 3,
    "score_mem": 8
  },
  "sangerWxsVariantCaller": {
    "cpus": 4,
    "mem": 10,
    "exclude": "chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr22,chrX,chrY,chrUn%,HLA%,%_alt,%_random,chrM,chrEBV",
    "vagrent_annot": "/nfs-dev-1-vol-qa-1/reference/sanger-variant-calling/VAGrENT_ref_GRCh38_hla_decoy_ebv_ensembl_91.tar.gz",
    "ref_genome_tar": "/nfs-dev-1-vol-qa-1/reference/sanger-variant-calling/core_ref_GRCh38_hla_decoy_ebv.tar.gz",
    "ref_snv_indel_tar": "/nfs-dev-1-vol-qa-1/reference/sanger-variant-calling/SNV_INDEL_ref_GRCh38_hla_decoy_ebv-fragment.tar.gz"
  },
  "generateBas": {
    "cpus": 2,
    "mem": 8,
    "ref_genome_fa": "/nfs-dev-1-vol-qa-1/reference/GRCh38_hla_decoy_ebv/GRCh38_hla_decoy_ebv.fa"
  },
  "repackSangerResults": {
    "cpus": 2,
    "mem": 4
  },
  "cavemanVcfFix": {
    "cpus": 2,
    "mem": 4
  },
  "prepSangerSupplement": {
    "cpus": 2,
    "mem": 4
  },
  "prepSangerQc": {
    "cpus": 2,
    "mem": 4
  },
  "extractSangerCall": {
    "cpus": 2,
    "mem": 4
  },
  "payloadGenVariantCall": {
    "cpus": 2,
    "mem": 4
  },
  "uploadVariant": {
    "cpus": 2,
    "mem": 4
  }
}
Workflow Engine Parameters:
{
  "projectDir": "/nfs-dev-1-vol-qa-1/test-projects",
  "revision": "main"
}
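
The `exclude` value in the `sangerWxsVariantCaller` block lists every primary chromosome except one. The sketch below works out which contig is left, assuming that `%` acts as a wildcard for any run of characters (as in SQL `LIKE`) and that entries without `%` must match exactly; neither assumption is confirmed by this repository.

```python
import re

# Copied from the "exclude" parameter above.
EXCLUDE = (
    "chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,"
    "chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr22,chrX,chrY,chrUn%,HLA%,"
    "%_alt,%_random,chrM,chrEBV"
)


def like_to_regex(pattern: str) -> re.Pattern:
    # Assumption: "%" matches any run of characters; everything else is literal.
    body = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.compile(f"^{body}$")


def is_excluded(contig: str, exclude: str = EXCLUDE) -> bool:
    return any(like_to_regex(p).match(contig) for p in exclude.split(","))


PRIMARY = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY", "chrM"]
remaining = [contig for contig in PRIMARY if not is_excluded(contig)]
print(remaining)  # prints ['chr21']
```

Under these assumptions the sample parameters restrict variant calling to chr21, which keeps the test run small.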
Workflow URL: https://github.com/icgc-argo/gatk-mutect2-variant-calling
Sample Workflow Parameters:
{
  "study_id": "ROSI-RU",
  "tumour_aln_analysis_id": "2abd861d-39fc-4f9f-bd86-1d39fc9f9f6f",
  "normal_aln_analysis_id": "a9ba1bc5-ac14-40d7-ba1b-c5ac1490d784",
  "song_url": "https://song.rdpc-qa.cancercollaboratory.org",
  "score_url": "https://score.rdpc-qa.cancercollaboratory.org",
  "publish_dir": "",
  "max_retries": 3,
  "first_retry_wait_time": 5,
  "perform_bqsr": false,
  "ref_fa": "/nfs-dev-1-vol-qa-1/reference/GRCh38_hla_decoy_ebv/GRCh38_hla_decoy_ebv.fa",
  "mutect2_scatter_interval_files": "/nfs-dev-1-vol-qa-1/reference/gatk-resources/mutect2.scatter_by_chr/chr*.interval_list",
  "germline_resource_vcfs": [
    "/nfs-dev-1-vol-qa-1/reference/gatk-resources/af-only-gnomad.pass-only.hg38.vcf.gz"
  ],
  "panel_of_normals": "/nfs-dev-1-vol-qa-1/reference/gatk-resources/1000g_pon.hg38.vcf.gz",
  "contamination_variants": "/nfs-dev-1-vol-qa-1/reference/gatk-resources/af-only-gnomad.pass-only.biallelic.snp.hg38.vcf.gz",
  "mem": 40,
  "cpus": 8
}
Workflow Engine Parameters:
{
  "projectDir": "/nfs-dev-1-vol-qa-1/test-projects",
  "revision": "main"
}
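
When adapting the variant calling parameters by hand, a quick sanity check can catch copy-paste mistakes before a run is started. This is a sketch only: `check_params` is an illustrative helper, and the rules it applies (analysis IDs shaped like UUIDs, HTTPS URLs, positive integer resources, distinct tumour and normal IDs) are assumptions drawn from the sample values above, not requirements stated by the workflows.

```python
import uuid
from urllib.parse import urlparse


def check_params(params: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    id_keys = ("tumour_aln_analysis_id", "normal_aln_analysis_id")
    for key in id_keys:
        try:
            uuid.UUID(str(params.get(key, "")))
        except ValueError:
            problems.append(f"{key} is not a UUID")
    if params.get(id_keys[0]) == params.get(id_keys[1]):
        problems.append("tumour and normal analysis IDs are identical")
    for key in ("song_url", "score_url"):
        parsed = urlparse(str(params.get(key, "")))
        if parsed.scheme != "https" or not parsed.netloc:
            problems.append(f"{key} is not an https URL")
    for key in ("cpus", "mem"):
        value = params.get(key)
        if not isinstance(value, int) or value <= 0:
            problems.append(f"{key} must be a positive integer")
    return problems


# Abbreviated copy of the Mutect2 sample parameters above.
sample = {
    "study_id": "ROSI-RU",
    "tumour_aln_analysis_id": "2abd861d-39fc-4f9f-bd86-1d39fc9f9f6f",
    "normal_aln_analysis_id": "a9ba1bc5-ac14-40d7-ba1b-c5ac1490d784",
    "song_url": "https://song.rdpc-qa.cancercollaboratory.org",
    "score_url": "https://score.rdpc-qa.cancercollaboratory.org",
    "mem": 40,
    "cpus": 8,
}
print(check_params(sample))  # prints []
```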