Giter Site home page Giter Site logo

bigly's Introduction

bigly: a pileup library that embraces the huge

Build Status GoDoc

bigly is an API and a command-line app (binaries here). It is similar to samtools mpileup but it reports a huge number of additional variables that are useful for structural variant calling and visualization.

For each requested position, the struct below is filled by the appropriate position in any overlapping alignment that meets the requested filters:

// Pile holds the information about a single base.
type Pile struct {
	Chrom                  string
	Pos                    int
	Depth                  int    // count of reads passing filters.
	RefBase                byte   // Reference base a this position.
	MisMatches             uint32 // number of mismatches .
	ProperPairs            int    // count of reads with paired flag
	SoftStarts             uint32 // counts of base preceding an 'S' cigar op
	SoftEnds               uint32 // ...  following ...
	HardStarts             uint32 // counts of base preceding an 'H' cigar op
	HardEnds               uint32
	InsertionStarts        uint32 // counts of base preceding an 'I' cigar op
	InsertionEnds          uint32
	Deletions              uint32  // counts of deletions 'D' at this base
	Heads                  uint32  // counts of starts of reads at this base
	Tails                  uint32  // counts of ends of reads at this base
	Splitters              uint32  // count of non-secondary reads with SA tags.
	Splitters1             uint32  // count of non-secondary reads with exactly 1 SA tag.
	Bases                  []byte  // All bases from reads covering this position
	Quals                  []uint8 // All quals from reads covering this position
	MeanInsertSizeLP       uint32  // Calculated with left-most of pair
	MeanInsertSizeRM       uint32  // Calculated with right-most of pair
	OrientationPlusPlus    uint32  // Paired reads mapped in +/+ orientation
	OrientationMinusMinus  uint32  // Paired reads mapped in -/- orientation
	OrientationMinusPlus   uint32  // Paired reads mapped in -/+ orientation
    OrientationSplitter    uint32  // Count of +/- or -/+ splitters.
	Discordant             uint32  // Number of reads with insert size > ConcordantCutoff
	DiscordantChrom        uint32  // Number of reads mapping on different chroms
	DiscordantChromEntropy float32 // high value means all discordants came from same chrom.
	GC65                   uint32
	GC257                  uint32
	Duplicity65            float32 // measure of lack of sequence entropy.
	Duplicity257           float32 // measure of lack of sequence entropy.
	SplitterPositions      []int
	SplitterStrings        []string
}

The program in cmd/bigly/main.go is distributed as an example program of what one can do with this library--namely make an enhanced pileup in a few lines of code.

Usage

At this time, the usage of the example program is very simple. Default exclude flags are (sam.Unmapped | sam.QCFail | sam.Duplicate)

bigly $bam $chrom:$start-$end > o

if a reference is specified with -r it will report statistics about GC content in windows surrounding each base.

help:

bigly 0.2.0
usage: bigly [--minbasequality MINBASEQUALITY] [--minmappingquality MINMAPPINGQUALITY] [--excludeflag EXCLUDEFLAG] [--includeflag INCLUDEFLAG] [--mincliplength MINCLIPLENGTH] [--includebases] [--splitterverbosity SPLITTERVERBOSITY] [--reference REFERENCE] BAMPATH REGION

positional arguments:
  bampath
  region

options:
  --minbasequality MINBASEQUALITY, -q MINBASEQUALITY
                         base quality threshold [default: 10]
  --minmappingquality MINMAPPINGQUALITY, -Q MINMAPPINGQUALITY
                         mapping quality threshold [default: 5]
  --excludeflag EXCLUDEFLAG, -F EXCLUDEFLAG
  --includeflag INCLUDEFLAG, -f INCLUDEFLAG
  --mincliplength MINCLIPLENGTH, -c MINCLIPLENGTH
                         only count H/S clips of at least this length [default: 15]
  --includebases, -b     output each base and base quality score
  --splitterverbosity SPLITTERVERBOSITY, -s SPLITTERVERBOSITY
                         0-only count; 1:count and single most frequent; 2:all SAs; 3:dont shorten positions
  --reference REFERENCE, -r REFERENCE
                         optional path to reference fasta.
  --help, -h             display this help and exit
  --version              display version and exit

API

GoDoc Documentation is here: ![GoDoc] (https://godoc.org/github.com/brentp/bigly?status.png)

With a supporting library easing regional bam queries here: ![GoDoc] (https://godoc.org/github.com/brentp/bigly/bamat?status.png)

View the tests for more examples. bigly uses biogo library for bam access.

Plotting

python scripts/plotter.py bigly.output

will make a plot with python+matplotlib on output from bigly.

An example image looks like: Example

This is a homozygous deletion easily seen from the depth, but we can see that the deletion is nicely delineated by soft-clips and we see aberrant insert sizes bounding the deletion. In most libraries, we would also see splitters flanking the region.

TODO

  • Track strand of bases.
  • Report the 5th and 95th percentile of insert size.
  • More efficient insert-size calc

Credits

Ryan Layer's svv was the inspiration for the plotter.

Obviously, samtools is the reference pileup implementation.

bigly's People

Contributors

brentp avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar

bigly's Issues

features for webserver

  • coordinate y-axes across samples

  • add per-sample spinner.

  • cancel previous request when new one is made

  • make legend click work on all plots (currently just top plot)

  • add "weird orientation line"

  • bigly: split weird count into +/+, -/-, -/+

  • don't show soft clips < 4.

  • add simple repeats + micro satellites track

  • make IGV bounds match those from the plot exactly.

  • report insert size as number of std-devs above/below the mean calculated by sampling the bam.

  • change bigly to accept a bam.Reader

Did I use bamat or biogo correctly?

Hi Brent,

I was trying to fetch records from a bam file with golang and bamat seems to be the most straight-forward solution I have ever seen. But when I did some tests, biogo/hts or bamat seemed to give unexpected results. Did I use biogo/hts or bamat correctly?

Here is my code:

First I tested with samtools:

# this command gives only one record
# samtools/htslib version: 1.11
./samtools view ./test.bam  chr1:100000-100200

Then I tried biogo/hts & bamat

import (
	"bufio"
	"github.com/biogo/hts/bam"
	"github.com/biogo/hts/sam"
	"os"
)

/*
The entire bamat source codes are copied here
*/

// this program gives 2000+ records
// the position of the first record starts from the beginning of bam
func main() {

	bamat, _ := New("test.bam")
	defer bamat.Close()

	iter, _ := bamat.Query("chr1", 100000, 100200)
	for iter.Next() {
		rec := iter.Record()
		println(rec.Name)
	}
}

Additional information:
system: centos7

  1. test.bam is the alignment result of a paired-end sequencing experient.
  2. test.bam is sorted by chromosome and position
  3. test.bam.bai is built with samtools index command

I also printed the information of offsets and chunks:

// Query the BamAt with 0-base half-open interval.
func (b *BamAt) Query(chrom string, start int, end int) (*bam.Iterator, error) {
	if chrom == "" { // stdin
		return bam.NewIterator(b.Reader, nil)
	}
	ref := b.Refs[chrom]
	if end <= 0 {
		end = ref.Len() - 1
	}
	chunks, err := b.idx.Chunks(ref, start, end)
	for _, c := range chunks {
		fmt.Printf("Chunk, c.Begin.File: %v, c.Begin.Block: %v \n", c.Begin.File, c.Begin.Block)
		fmt.Printf("Chunk, c.End.File: %v, c.End.Block: %v \n", c.End.File, c.End.Block)
	}
	if err != nil {
		return nil, err
	}
	return bam.NewIterator(b.Reader, chunks)
}

output:

Chunk, c.Begin.File: 1157, c.Begin.Block: 0
Chunk, c.End.File: 152443, c.End.Block: 61430
Chunk, c.Begin.File: 305256, c.Begin.Block: 45799
Chunk, c.End.File: 369546, c.End.Block: 49973
Chunk, c.Begin.File: 4495641, c.Begin.Block: 6375
Chunk, c.End.File: 4495641, c.End.Block: 7201
Chunk, c.Begin.File: 4678143, c.Begin.Block: 22248
Chunk, c.End.File: 4678143, c.End.Block: 22521
Chunk, c.Begin.File: 5535256, c.Begin.Block: 32237
Chunk, c.End.File: 5535256, c.End.Block: 33167
Chunk, c.Begin.File: 8076117, c.Begin.Block: 22784
Chunk, c.End.File: 8076117, c.End.Block: 24440
Chunk, c.Begin.File: 8319574, c.Begin.Block: 22863
Chunk, c.End.File: 8319574, c.End.Block: 24508
Chunk, c.Begin.File: 54542702, c.Begin.Block: 55586
Chunk, c.End.File: 54542702, c.End.Block: 62429
Chunk, c.Begin.File: 59486419, c.Begin.Block: 10427
Chunk, c.End.File: 59486419, c.End.Block: 10975

This is my python code which gives the correct record:

import pysam

def query_bam(region, bamfile):

    samfile = pysam.AlignmentFile(bamfile, "rb")
    iterable = samfile.fetch(region=region)

    record_count = 0
    for record in iterable:
        print(record.query_name)
        record_count += 1

    samfile.close()
    return record_count

if __name__ == "__main__":
    bam_file = "test.bam"
    region = "chr1:100000-100200"
    total_records = query_bam(region, bam_file)
    print(f"Total records found: {total_records}")

headless systems

Hi,

nice, I like the plotting function very much.

Any chance the plotting function could take an argument to save an image directly to disk instead of requiring x11 ?

Then it would be a nice way to script lots of images without manual work for example, and supply them to users.

Thanks,
Colin

Track strand of bases

Hi,
I am using mpileup for supporting reads from forward or reverse strand, but it is very slow. If bigly can track it as well, I would like to give it a try.

Thanks!
Tommy

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.