IBVL

Repo organization

This repo is intended for Wasserman lab members working on data processing for IBVL.

The repo is organized in sub-folders depending on the different aspects of the data processing.

Metadata tracking

This folder concerns metadata tracking (tracking of the information associated with each sample).

One of the tools under consideration for tracking metadata is OpenCGA; refer to the openCGA folder for more information.

Nextflow Scripts

This folder concerns the scripts used to generate the IBVL.

The Nextflow wrapper is used to allow traceability and reproducibility. To review or comment on the scripts, refer to the script folder.

Import directory

How to run an import:

  1. Copy the import/.env-sample file to import/.env and set the values appropriately.
  2. (Optional) If needed, run python tables.py to create the tables (the database should be empty before this).
  3. Run python orchestrate.py to kick off the migration.
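Step 1 can be sketched in Python as follows. This is a minimal sketch, not part of the repo: the helper name `prepare_env_file` is hypothetical, and it assumes the sample file sits in the directory passed in. It only copies the sample when no .env exists yet, so values you have already set are not overwritten.

```python
import shutil
from pathlib import Path


def prepare_env_file(import_dir):
    """Copy .env-sample to .env unless .env already exists (hypothetical helper)."""
    import_dir = Path(import_dir)
    sample = import_dir / ".env-sample"
    target = import_dir / ".env"
    if not target.exists():
        # Only create the file; editing the values is still a manual step.
        shutil.copyfile(sample, target)
    return target
```

After the values in .env are set, steps 2 and 3 are run from the command line as listed above.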

The script creates a directory called "jobs" and, inside it, a directory called "1" the first time, "2" the second time, etc.
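The numbering scheme described above could look roughly like this. It is an illustrative sketch, not the code in orchestrate.py; the function name `next_job_dir` is an assumption, as is the choice to take the highest existing number plus one.

```python
from pathlib import Path


def next_job_dir(jobs_root):
    """Create and return the next numbered job directory (hypothetical helper)."""
    jobs_root = Path(jobs_root)
    jobs_root.mkdir(parents=True, exist_ok=True)
    # Consider only sub-directories whose names are plain integers.
    existing = [
        int(p.name) for p in jobs_root.iterdir() if p.is_dir() and p.name.isdigit()
    ]
    job_dir = jobs_root / str(max(existing, default=0) + 1)
    job_dir.mkdir()
    return job_dir
```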

Each of these job folders holds the working data for the migration and two output logs (one for errors, one for progress). The working data consists, for each model, of a file with the latest primary key and a reverse lookup map from entity id (e.g. gene, variant or transcript id) to primary key.
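As a rough illustration of this working data, the sketch below stores one model's latest primary key and its id-to-primary-key map in a job folder and reads them back. The file names (`<model>_last_pk.txt`, `<model>_map.json`) and both function names are assumptions made for the example; the actual script may lay the files out differently.

```python
import json
from pathlib import Path


def save_model_state(job_dir, model, last_pk, id_to_pk):
    """Persist the latest primary key and the entity-id map for one model."""
    job_dir = Path(job_dir)
    # Hypothetical file names, chosen only for this sketch.
    (job_dir / f"{model}_last_pk.txt").write_text(str(last_pk))
    (job_dir / f"{model}_map.json").write_text(json.dumps(id_to_pk))


def load_model_state(job_dir, model):
    """Read back what save_model_state wrote."""
    job_dir = Path(job_dir)
    last_pk = int((job_dir / f"{model}_last_pk.txt").read_text())
    id_to_pk = json.loads((job_dir / f"{model}_map.json").read_text())
    return last_pk, id_to_pk
```

Keeping the map keyed by entity id means a later model can resolve a foreign key with a single dictionary lookup instead of querying the database.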

Import environment vars

  • ORACLE_TABLE_PATH - (not necessarily just for Oracle destinations) the full path to the directory containing the pipeline output files.
  • COPY_MAPS_FROM_JOB - The script maintains maps in order to resolve primary keys. They are persisted as JSON to the job directory (named using an incrementing number). If a job fails and you want to reuse the maps from a previous run, enter that run's job folder number as the value of this environment variable.
  • SCHEMA_NAME - for an Oracle destination database, the schema name goes here.
  • START_AT_MODEL - to pick up where a previous migration run left off, enter the model name here and the script will skip to that model (it runs in the order of the keys as defined in the model_import_actions map).
  • (START_AT_FILE) - for convenience, you can also skip to a particular file in the first model directory imported, using natural sorting. Be very careful when using this in production, as it will lead to false duplicates unless the primary key for new row insertions is corrected.
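The resume behaviour of START_AT_MODEL and START_AT_FILE can be sketched as below. This is a sketch under stated assumptions rather than the script's actual logic: the function names are hypothetical, the named model or file is assumed to be included in the resumed run, and `natural_key` is one common way to implement natural sorting (so that "chr2" comes before "chr10").

```python
import os
import re


def natural_key(name):
    """Sort key that compares digit runs as numbers: 'chr2' < 'chr10'."""
    return [
        int(tok) if tok.isdigit() else tok.lower()
        for tok in re.split(r"(\d+)", name)
    ]


def models_to_run(model_order, start_at_model=None):
    """Return the models from start_at_model onward, keeping the given order."""
    model_order = list(model_order)
    if not start_at_model:
        return model_order
    if start_at_model not in model_order:
        raise ValueError(f"unknown model: {start_at_model}")
    return model_order[model_order.index(start_at_model):]


def files_to_run(filenames, start_at_file=None):
    """Naturally sort the files, then drop everything before start_at_file."""
    ordered = sorted(filenames, key=natural_key)
    if not start_at_file:
        return ordered
    return ordered[ordered.index(start_at_file):]


if __name__ == "__main__":
    # Model names here are placeholders, not the keys of model_import_actions.
    models = ["genes", "transcripts", "variants"]
    print(models_to_run(models, os.environ.get("START_AT_MODEL")))
```

Note that skipping files this way does nothing about primary keys, which is why the warning above about false duplicates still applies.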

Contributors

scorreard, brittanyhewitson, bradwbradw, melsiddieg
