
ngs's People

Contributors

jshallcr, safisher

ngs's Issues

Species names getting confusing in PIPELINE

PIPELINE uses the same species name argument for BLAST, STAR, RUM, and HTSEQ. This causes problems if the user hasn't named their libraries identically or if the species name doesn't match the expected names hard-coded in the BLAST module.

Module versioning

Need to track version information for each module. Currently we just track the version of ngs.sh (incremented when modules change) and the external programs used.

STATS

Add a flag to STATS that will run stats on all subdirectories (i.e., all samples in the current directory) and output the stats to a specified xls file.

Update args processing in all modules

Modules should process args with the loop below rather than "while true". This stops the processing of args when only one is left, since the last arg must be the sample name. Alternatively, we should add a test after the "while true" loop to make sure exactly one argument remains.

while test $# -gt 1
do
    # process one option; the last remaining argument is the sample name
    echo "$1"
    shift
done
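
Both ideas above can be combined: loop while more than one argument remains, then confirm that exactly one is left. The following is a minimal sketch only. The function name parse_args, the -v flag, and the error messages are hypothetical and not taken from ngs.sh.

```shell
# Hypothetical sketch: parse options, then require exactly one trailing
# argument, which is treated as the sample name.
parse_args() {
    verbose=0
    while test $# -gt 1
    do
        case "$1" in
            -v) verbose=1 ;;    # hypothetical flag, for illustration only
            *)  echo "unknown option: $1" >&2; return 1 ;;
        esac
        shift
    done
    # Exactly one argument must remain: the sample name.
    if test $# -ne 1
    then
        echo "missing sample name" >&2
        return 1
    fi
    sample=$1
}
```

Calling parse_args with no arguments fails the final test instead of silently continuing with an empty sample name.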

Python version / location

Change Python scripts, removing the hard-coded location of Python. Default to the version of Python that is in the user's path.
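
On the shell side, one way to avoid a hard-coded interpreter location is to look Python up on the user's PATH at run time. This is a sketch under that assumption; the function name find_python is hypothetical. In the Python scripts themselves, the usual equivalent is an "#!/usr/bin/env python" first line.

```shell
# Hypothetical helper: print the location of the first "python" on PATH,
# or fail with a message if none is found.
find_python() {
    # "command -v" prints the path of the command the shell would run.
    if ! command -v python
    then
        echo "python not found in PATH" >&2
        return 1
    fi
}
```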

Trim location output

Need a way to label the columns in the output files that contain the list of trim locations.

PIPELINE should pull modules from file

PIPELINE should pull list of modules and module args from a user-specified file. The file should be copied into a 'sampleID/pipeline' directory with a time stamp.
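
A rough sketch of how this might work, assuming a file format the issue does not specify: one module per line followed by its args, with blank lines and lines starting with "#" ignored. The function names and the time-stamp format are also hypothetical.

```shell
# Hypothetical sketch: copy the module file into <sample>/pipeline with a
# time stamp in the name, and print the path of the copy.
archive_module_file() {
    # $1 = user-specified module file, $2 = sample directory
    am_dest="$2/pipeline"
    mkdir -p "$am_dest" || return 1
    am_copy="$am_dest/$(date '+%Y%m%d-%H%M%S').$(basename "$1")"
    cp "$1" "$am_copy" && printf '%s\n' "$am_copy"
}

# Print each module line, skipping blank lines and "#" comments
# (an assumed file format).
list_modules() {
    while IFS= read -r lm_line || [ -n "$lm_line" ]
    do
        case "$lm_line" in
            ''|'#'*) continue ;;
        esac
        printf '%s\n' "$lm_line"
    done < "$1"
}
```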

Sample Name trailing slash

Check the sample name for a trailing "/" and remove it if present. Removing the slash is a cosmetic fix.
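
Shell parameter expansion can do this without an external command. A minimal sketch, with a hypothetical function name:

```shell
# Hypothetical helper: strip a single trailing "/" from a sample name.
strip_trailing_slash() {
    # ${1%/} removes one "/" from the end of $1 if it is there,
    # and leaves the value unchanged otherwise.
    printf '%s\n' "${1%/}"
}
```

For example, strip_trailing_slash "sample1/" prints sample1, and a name without a trailing slash is printed as is.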

PIPELINE error testing

Currently there is no testing for failed runs, meaning PIPELINE will continue to run even if a module fails. At the most basic level we should test for empty files in BLAST (raw.fq), TRIM (unaligned_1.fq), and HTSEQ (*.cnts.txt).
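
The basic empty-file test could be built on the shell's "-s" file test, which is true only for a file that exists and has a size greater than zero. The helper below is a sketch; its name and error message are hypothetical.

```shell
# Hypothetical helper: succeed only if every named file exists and is
# non-empty; report the first file that fails the check.
check_nonempty() {
    for cn_file in "$@"
    do
        if [ ! -s "$cn_file" ]
        then
            echo "ERROR: $cn_file is missing or empty" >&2
            return 1
        fi
    done
    return 0
}
```

PIPELINE could then stop after a module with something like check_nonempty "$SAMPLE/blast/raw.fq" || exit 1, where the directory layout shown is an assumption.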

Generate module-specific version numbers

Each module should return a version number, with the pipeline being just another module. Users should be able to track which version of which module was used in each pipeline run.

Journal output

Journal output should be optional and based on parameter value.

Update analysis.log

Create a log subdirectory in sampleID to store log files. Break up analysis.log so that every time a module runs a new log file is saved in this directory using (time-stamp + module name) for file name.

When running PIPELINE, the logs for all modules within the PIPELINE run would be stored in a single PIPELINE log.

Rework the log output so that the timestamps are comments and the log file is effectively a bash file and can be run without modification.
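
One possible shape for such a log writer is sketched below. The function name log_cmd and the time-stamp format are hypothetical. Note that joining the arguments with "$*" loses the quoting of arguments that contain spaces, so a real implementation would need to handle that.

```shell
# Hypothetical helper: append a command to a log file, with the time stamp
# written as a "#" comment so the log stays a runnable shell script.
log_cmd() {
    lc_logfile=$1
    shift
    printf '# %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" >> "$lc_logfile"
    # "$*" joins the remaining arguments with single spaces.
    printf '%s\n' "$*" >> "$lc_logfile"
}
```

A log written this way can be replayed with sh followed by the log file name.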

RUM output directory

Save RUM output in rum.$SPECIES instead of rum.trim. This would have implications for rumalign, rumstatus, htseq, stats, and post.

Decouple directories and module names

Each directory should include a module file that contains the name of the module used to create that directory. This information could be added to the SAMPLE_ID.versions file. The file containing the module information can be used by STATS to determine which module is used to generate the stats. In this case STATS would be provided with a list of directories rather than a list of modules.

This will allow us to decouple the directory name from the module name and allow for more flexibility in running modules repeatedly. For example, STAR could be run twice on two different genome versions, or HTSEQ could be run repeatedly on different transcriptomes. This will also allow for meta-modules and more overall granularity in modules. For example, we could run HTSEQ on exons, then on introns, and use another (meta-)module to combine the exon and intron counts.

BLAST hits not counted

BLAST should output a file that lists all hits not counted and the species mapped (one line per read). This would make it easier for users to determine if there was a specific contaminant.

Continuing PIPELINE run

Allow PIPELINE to pick up where it left off. We need some way to flag when a module completes, so we know where to begin. We could look for the last created directory and rerun that module. For example, if the INIT, FASTQC, and BLAST directories exist, then we should rerun BLAST and go from there (not expecting the BLAST module to have completed).
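
An alternative to inferring progress from the last created directory is an explicit completion marker. The sketch below assumes each module writes an empty ".done" file into its own directory on success. The marker name, the function names, and the directory layout are all hypothetical.

```shell
# Hypothetical sketch: record that a module finished successfully.
mark_done() {
    # $1 = sample directory, $2 = module name
    mkdir -p "$1/$2" && : > "$1/$2/.done"
}

# Print the first module in the given order that has no completion
# marker; return 1 if every module has completed.
first_incomplete() {
    fi_sample=$1
    shift
    for fi_module in "$@"
    do
        if [ ! -e "$fi_sample/$fi_module/.done" ]
        then
            printf '%s\n' "$fi_module"
            return 0
        fi
    done
    return 1
}
```

With markers for init and fastqc but only a bare blast directory, first_incomplete "$SAMPLE" init fastqc blast trim prints blast, which matches the rerun-BLAST behavior described above.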

Document API for modules

  • init function
  • inclusion of an error checking function
  • stats function
  • printing of version information

TRIM version information

Need to capture poly-A/T trimming parameters and other trimReads.py command line options somewhere.

Should we also update TRIM to include trimming parameters in versions files: each of the contaminants trimming options and the contaminants string?

Hardcoded parameters

Allow for setting of ngs.sh parameters via environment variables: debug, journal file, repo resource directory, and executable locations.
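
Default-assigning parameter expansion is one way to do this: a value already set in the environment wins, and a built-in default is used otherwise. In the sketch below, every variable name and default value is a hypothetical placeholder, not something taken from ngs.sh.

```shell
# Hypothetical variable names and placeholder defaults.
load_settings() {
    # ":=" assigns the default only when the variable is unset or empty,
    # so values exported by the user take precedence.
    : "${NGS_DEBUG:=false}"
    : "${NGS_JOURNAL:=journal.txt}"
    : "${NGS_REPO_DIR:=/path/to/repo}"
    : "${NGS_PYTHON:=python}"
}
```

A user could then run, for example, NGS_DEBUG=true ngs.sh followed by the usual arguments, without editing the script.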

Change HTSEQ version information

Adjust the version output to use only the file name of the library file, rather than the library path.
Possibly include the library path as a separate output.
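
The standard basename utility gives the file name without its directory. The sketch below prints the name and the full path as two separate tab-delimited fields. The function name and the field labels are hypothetical, and the actual layout of the versions file may differ.

```shell
# Hypothetical helper: report the library file name and, separately,
# its full path.
print_library_version() {
    # $1 = full path to the library file
    printf 'library\t%s\n' "$(basename "$1")"
    printf 'library_path\t%s\n' "$1"
}
```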
