Giter Site home page Giter Site logo

htslib-crypt4gh's Introduction

samtools

Build Status Build status Github All Releases

This is the official development repository for samtools.

The original samtools package has been split into three separate but tightly coordinated projects:

  • htslib: C-library for handling high-throughput sequencing data
  • samtools: mpileup and other tools for handling SAM, BAM, CRAM
  • bcftools: calling and other tools for handling VCF, BCF

See also http://github.com/samtools/

Building Samtools

See INSTALL for complete details. Release tarballs contain generated files that have not been committed to this repository, so building the code from a Git repository requires extra steps:

autoheader            # Build config.h.in (this may generate a warning about
                      # AC_CONFIG_SUBDIRS - please ignore it).
autoconf -Wno-syntax  # Generate the configure script
./configure           # Needed for choosing optional functionality
make
make install

By default, this will build against an HTSlib source tree in ../htslib. You can alter this to a source tree elsewhere or to a previously-installed HTSlib by configuring with --with-htslib=DIR.

Citing

Please cite this paper when using SAMtools for your publications.

Twelve years of SAMtools and BCFtools
Petr Danecek, James K Bonfield, Jennifer Liddle, John Marshall, Valeriu Ohan, Martin O Pollard, Andrew Whitwham, Thomas Keane, Shane A McCarthy, Robert M Davies, Heng Li
GigaScience, Volume 10, Issue 2, February 2021, giab008, https://doi.org/10.1093/gigascience/giab008

@article{10.1093/gigascience/giab008,
    author = {Danecek, Petr and Bonfield, James K and Liddle, Jennifer and Marshall, John and Ohan, Valeriu and Pollard, Martin O and Whitwham, Andrew and Keane, Thomas and McCarthy, Shane A and Davies, Robert M and Li, Heng},
    title = "{Twelve years of SAMtools and BCFtools}",
    journal = {GigaScience},
    volume = {10},
    number = {2},
    year = {2021},
    month = {02},
    abstract = "{SAMtools and BCFtools are widely used programs for processing and analysing high-throughput sequencing data. They include tools for file format conversion and manipulation, sorting, querying, statistics, variant calling, and effect analysis amongst other methods.The first version appeared online 12 years ago and has been maintained and further developed ever since, with many new features and improvements added over the years. The SAMtools and BCFtools packages represent a unique collection of tools that have been used in numerous other software projects and countless genomic pipelines.Both SAMtools and BCFtools are freely available on GitHub under the permissive MIT licence, free for both non-commercial and commercial use. Both packages have been installed \\>1 million times via Bioconda. The source code and documentation are available from https://www.htslib.org.}",
    issn = {2047-217X},
    doi = {10.1093/gigascience/giab008},
    url = {https://doi.org/10.1093/gigascience/giab008},
    note = {giab008},
    eprint = {https://academic.oup.com/gigascience/article-pdf/10/2/giab008/36332246/giab008.pdf},
}

htslib-crypt4gh's People

Contributors

daviesrob avatar

Stargazers

 avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

htslib-crypt4gh's Issues

Memory leaks when failing to init

Neither init_for_write nor init_for_read doesn't free the memory allocate to the various members of hFILE_crypt4gh, upon encountering an error.

Bound for the skip lengths in the edit lists

I feel that the specs should include something about the possible range for the skip lengths, if an edit list is used.

What we would like to avoid is carrying a segment in the data portion, which is in fact entirely skipped when decrypting, or reencrypting the header for another destination. Of course, we are not interested in producing such files, but the spec doesn't prevent it either. we don't want to be carrying malicious segments, which after edit list manipulation are not ignored anymore.

I am not sure if the following suggestion is enough to cover all cases properly, but I think that a skip length s should be bound as follows.

Let S be the segment size, and p be the current position within any segment.
Let p0 = p - (p % S) (ie, the start of the given segment).
Then, p + s < p0 + 2 * S.
In other words, for whatever position within a segment p%S, we have p%S + s < 2*S.

|----------------------|-----------------------| (2 segments)
           ô----------- s ------------ô
           |
           p

That way, even if p is the last byte of a segment, the next byte read must be in the adjacent segment (and we therefore never carry an "unused" segment).
Note: the current position p is calculated as the some of all lengths prior to s (and I didn't want to overload the issue with sums and subscripts)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.