onecode's Introduction

The ONEcode Data Framework

ONEcode is a powerful general data representation framework with a growing collection of associated software. It was originally designed for genomic data in the context of the Vertebrate Genomes Project (VGP).

Data is represented in a very simple ASCII file format that is easy for both humans and programs to read and interpret. Moreover, there is a corresponding compressed and indexed binary version of each ASCII file, so that production tools built with the ONEcode library ONElib are efficient both in time and in the size of the data files they manipulate. All fields are strongly typed, and a specific collection of data types constitutes a schema that is normally encoded at the top of the file, itself in ONEcode format.
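
As a rough illustration of what "strongly typed fields described by a schema" can mean in practice, here is a minimal Python sketch. It is not the ONEcode grammar: the toy schema, the line-type letters, and the convention that a string is preceded by its length are all assumptions made for this example. TEST/small.seq and ONElib.h remain the references for the real format.

```python
# Toy illustration only. TOY_SCHEMA and the line layout below are assumptions
# made for this sketch; they are not taken from the ONEcode specification.
TOY_SCHEMA = {
    "S": ("STRING",),      # hypothetical line type: one string field
    "Q": ("INT", "REAL"),  # hypothetical line type: an integer, then a real
}


def parse_line(line, schema=TOY_SCHEMA):
    """Parse one text line into (line_type, typed_fields) using the schema."""
    line_type, _, rest = line.partition(" ")
    if line_type not in schema:
        raise ValueError(f"unknown line type {line_type!r}")
    fields = []
    for field_type in schema[line_type]:
        if field_type == "INT":
            token, _, rest = rest.partition(" ")
            fields.append(int(token))    # raises ValueError on a non-integer
        elif field_type == "REAL":
            token, _, rest = rest.partition(" ")
            fields.append(float(token))  # raises ValueError on a non-number
        elif field_type == "STRING":
            # Assumed layout: the string is preceded by its length.
            length_text, _, rest = rest.partition(" ")
            length = int(length_text)
            fields.append(rest[:length])
            rest = rest[length:].lstrip(" ")
        else:
            raise ValueError(f"unsupported field type {field_type!r}")
    return line_type, fields


print(parse_line("S 10 acgtacgtac"))
print(parse_line("Q 3 0.5"))
```

Because every field is converted according to the schema, a line whose content does not match its declared types fails at parse time rather than silently producing a string.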

A generic viewer, ONEview, allows one to view ASCII and binary files and to convert between them; it also supports viewing a subset of the objects in a file. Another core tool, ONEstat, provides various checks, including validating a file against a schema given in a separate file.

To build the library and command-line tools, just type make in this top-level directory:

make
make install   # copies executables to ~/bin

To see a simple annotated ONEcode file and some example usage:

cat TEST/small.seq
make test

The package has no dependencies on other software. The .md files contain documentation, ONElib.c and ONElib.h contain the C library for developers, and ONEview.c and ONEstat.c implement their respective programs.

The subdirectory SEQUENCE_UTILITIES contains a set of sequence utilities to interconvert between (compressed) fasta/fastq, ONEcode, and BAM/SAM, report statistics, and flexibly extract sequences. It contains its own README.md file.

The documents describing the framework, generic tools, and development library are as follows:

In addition, specific technical documentation can be found within individual source files, in particular ONElib.h.

Authors: Richard Durbin & Gene Myers

Date: October 2022
Updated: July 2024

onecode's People

Contributors

richarddurbin, thegenemyers, regevs, rchikhi

onecode's Issues

API changes

  • listLength (in C++) gives the length of the result of getIntList, getRealList and getString
  • getDNAchar should return a string
  • Wrap getDNAchar and getDNA; the length of their result is listLength
  • Don't wrap getDNA2bit
  • If the field type is STRING, oneLen is the string length; if it is STRING_LIST, it is the length of the list.
  • Think about a Pythonic way to support a string list and iterate through it (oneString, oneLen, oneNextString), as in the ONElib.h comment below oneNextString; we prefer a list of strings
  • getComment should return a string
  • writeLine should accept a numpy array, work out the type and length of the array, and delegate to the C++
  • writeLine is overloaded: sometimes with a numpy array, once with just the line type, and once with a string
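
The "list of strings" preference above could look roughly like the following on the Python side. This is only a sketch under stated assumptions: split_string_list is a hypothetical helper, not part of any existing binding, and it assumes the C side hands over the string list as NUL-terminated strings packed back to back, together with the number of strings (what oneLen reports for a STRING_LIST). The real layout should be checked against the comment in ONElib.h.

```python
def split_string_list(buffer, count):
    """Return `count` strings from a buffer of packed NUL-terminated strings.

    Hypothetical helper: the packed layout is an assumption for this sketch,
    not a documented guarantee of ONElib.
    """
    strings = []
    start = 0
    for _ in range(count):
        end = buffer.index(b"\0", start)  # ValueError if a terminator is missing
        strings.append(buffer[start:end].decode("ascii"))
        start = end + 1                   # step past the terminator to the next string
    return strings


names = split_string_list(b"chr1\0chr2\0chrX\0", 3)
print(names)       # a plain Python list of str
print(len(names))  # the list length, matching the count passed in
```

Returning an ordinary list means callers can use len() and a for loop directly, instead of mirroring the oneString / oneNextString stepping pattern in Python.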
