hknahal / icgc17_dcc_submissions Goto Github PK
View Code? Open in Web Editor NEWScripts to migrate previously valid Release 16 submission files to Release 17
Scripts to migrate previously valid Release 16 submission files to Release 17
Migration of valid Release 16 Metadata and Primary Files to Release 17 (Dictionary version 0.9a) IMPORTANT NOTES: ================ - These scripts will overwrite the original submission files - These scripts can only be used on valid Release 16 submission files - One submission files are processed, please upload to DCC submission system and validate. - Files must be valid against current dictionary in order to be included in Release 17 Contents: ========= After archive is decompressed, a folder will be created called "migrate_icgc16_to_icgc17" with the following contents: 1. remove_columns.sh A simple bash script that will modify validated Release 16 submission files to structurally conform to Release 17 data specifications by removing columns no longer required: - Remove 'analyzed_sample_type' and 'analyzed_sample_type_other' fields from 'sample' file - Remove 'note' column from all files For more information, please see Release Notes available at http://docs.icgc.org/dictionary 2. specimen_type_mapping.py This Python script will map specimen_type DCC terms used in Release 16 to the current specimen_type codelist in Release 17. 3. mapping_specimen_type.pl An alternative Perl script that will perform the same function as the Python specimen_type_mapping.py script. 4. oldDCC_to_newDCC_specimen_type_mapping.txt A tab-delimited file consisting of a mapping between the old DCC specimen_type terms and the current DCC specimen_type terms. For more details, please review documentation at http://docs.icgc.org/specimen-type-mapping Requirements: ============= - Linux - If using Python script specimen_type_mapping.py: - at least Python 2.7 Python Modules: - gzip https://docs.python.org/2/library/gzip.html - bz2 https://docs.python.org/2/library/bz2.html - fnmatch https://docs.python.org/2/library/fnmatch.html - collections https://docs.python.org/2/library/collections.html - If using Perl script mapping_specimen_type.pl: Perl Modules: - PerlIO::gzip http://search.cpan.org/~majensen/PerlIO-via-gzip-0.021/lib/PerlIO/via/gzip.pm - PerlIO::via::Bzip2 http://search.cpan.org/~arjen/PerlIO-via-Bzip2-0.02/lib/PerlIO/via/Bzip2.pm Instructions: ============= 1. Run remove_columns.sh bash script to remove columns no longer needed To execute: If running the script from the same folder as submission files (only Release 16 files should be in this folder): > ./remove_columns.sh If running script outside of the submission file directory, you can use the "-i" flag to specify the directory where the Release 16 files reside: > ./remove_columns.sh -i <name of directory where Release 16 submission files are located> 2. Convert old DCC specimen_type terms to new DCC specimen_type terms. Both a Perl version and a Python version are offered to perform this conversion - you only need to use one! (ie. if you are a Perl user, use can use the Perl version) To execute: - If using Python script (specimen_type_mapping.py): > python specimen_type_mapping.py -i <name of input directory> - Alternatively, if you prefer the Perl script (mapping_specimen_type.pl): > perl mapping_specimen_type.pl <name of input directory> IMPORTANT NOTES: In most cases there is a one-to-one mapping between Release 16 specimen_type terms and Release 17 specimen_type terms. However, there are two previously used terms ("bone marrow" and "lymph node") which are ambigious (could be either normal or tumour) and will require the submitter to specify/convert these manually. Such cases will be reported in the DCC_specimen_type_mapping.log log file. Example: If the specimen_type was specified as "7" (lymph node) in Release 16, you will need to chagne it to a more specific specimen type, either 107 (Normal - lymph node) or 119 (Metastatic tumour - lymph node) _______________________________________________________________________________________________________________________ | | | | | | Old specimen_type: | Old DCC codelist ID: | New specimen_type: | New DCC Codelist ID: | |=======================|=======================|===============================================|======================| | lymph_node | 7 | Normal – lymph node | 107 | | | | Metastatic tumour – lymph node | 119 | |_______________________|_______________________|_______________________________________________|______________________| | bone marrow | 6 | Normal – bone marrow | 103 | | | | Primary tumour – blood derived (bone marrow) | 111 | | | | Recurrent tumour – blood derived (bone marrow)| 116 | |_______________________|_______________________|_______________________________________________|______________________| 3. Once files are converted, upload submission files to submission system at http://submissions.dcc.icgc.org and validate. If files are valid, please sign-off on your submission for it be included in Release 17 (** before submission deadline August 15th). For questions or concerns, please contact [email protected]
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.