gia's Introduction

Hello!

I'm Noam Teyssier, a Bioinformatics PhD Candidate in the Kampmann Lab and Goodarzi Lab at UCSF.

I work at the intersection of functional genomics, machine learning, dynamical modeling, and systems biology in the context of neurodegenerative diseases.

I highly value open source and try to share all the tools I've developed during my own research.

gia's People

Contributors

mrvollger, noamteyssier

gia's Issues

Mix file formats when applicable

Need to be able to mix file formats (e.g. bed3, bed6, bed12) when they are provided as inputs to operations requiring 2 files.

Multiple Files

  • closest
  • intersect
  • subtract

Single Files

  • complement
  • extend
  • get_fasta
  • merge
  • sample
  • sort

Can close #65 once done

performance claims

I meant to do some testing on my own, but I may never get there. I'm one of the authors of BEDOPS. It is not easy to imagine a 6x or so improvement in runtimes, as these are linear (or n log n for sorting) time algorithms in bedops/closest-features utilities.

There are a couple of things that stand out to me in the bioRxiv paper. Mainly, timed tests are at most 1 second for the slowest tool, which indicates very, very small inputs (Figures 1 and 2). If the trend held with large inputs, that would be far more interesting and impressive. Right now, the differences might be attributable to factors that do not generalize beyond 1-second runs.

The memory overhead shown for bedops (Figure 5) makes me think that they used the "megarow" build of BEDOPS. That build is meant for very large sequencing results (Nanopore and PacBio). It scales to those much larger datasets at the cost of some small memory overhead but also considerable time overhead. It would be worth measuring time/memory against that larger build but also against the more popular (and default) build for utilities in BEDOPS.

You can use the switch-BEDOPS-binary-type utility to switch between typical (default) and megarow builds of utilities in BEDOPS.

Retain BED6 format?

Hi Noam!

I'm running a lot of bedtools intersect commands that I would love to replace with gia, but I was relying on the information in the bed6 format being retained.

e.g. fileA.bed
chr1 29300 29400

e.g. fileB.bed
chr1 29301 29400 CTAACTTTCCTATCAT-1 41 +
chr1 29328 29427 CTAACTTTCCTATCAT-1 40 -

e.g. output I need with the cell barcode.
chr1 29301 29400 CTAACTTTCCTATCAT-1 41 +

In this case, would I need to use bedrs instead of gia & create an interval type with my additional field?
-- Amanda
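
To make the request concrete, here is a minimal sketch of an intersect that carries the extra BED6 columns through to the output. It is not gia or bedrs code: the function name `contained_b_records` is hypothetical, and it assumes the desired behavior is to keep fileB records that are fully contained in a fileA interval, which is only one reading of the example above (it is the reading under which only the first fileB record is kept).

```python
def contained_b_records(a_records, b_records):
    """Return the B records fully contained in some A interval.

    Each record is a tuple of strings (chrom, start, end, ...).
    Every column of a kept B record is returned unchanged, so BED6
    fields such as the name (cell barcode), score, and strand survive.
    """
    kept = []
    for b in b_records:
        b_chrom, b_start, b_end = b[0], int(b[1]), int(b[2])
        for a in a_records:
            # Assumption: "contained" means A.start <= B.start and B.end <= A.end.
            if a[0] == b_chrom and int(a[1]) <= b_start and b_end <= int(a[2]):
                kept.append(b)
                break  # report each B record at most once
    return kept


# Records copied from the example in this issue.
file_a = [("chr1", "29300", "29400")]
file_b = [
    ("chr1", "29301", "29400", "CTAACTTTCCTATCAT-1", "41", "+"),
    ("chr1", "29328", "29427", "CTAACTTTCCTATCAT-1", "40", "-"),
]

for record in contained_b_records(file_a, file_b):
    print("\t".join(record))
```

The nested loop is quadratic and only meant to show the column handling; a real tool would work on sorted or indexed intervals.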

Stranded Methods

This issue tracks the implementation of stranded methods.

  • Closest
  • Complement
  • Extend
  • Get Fasta
  • Intersect
  • Merge
  • Random
  • Sample
  • Sort
  • Subtract

Incorporate Streamed Methods

Streamable Methods

  • Closest
  • Complement
  • Extend
  • Intersect
  • Subtract

Named Streamable Methods

  • Closest
  • Complement
  • Extend
  • Intersect
  • Subtract

Autodetermine file format

Right now the default is to read everything in as BED3 unless an alternative format is provided via the -T flag.

Ideally, the format should be determined automatically from the number of columns in the input, falling back to BED3 if detection fails.

There should also be two flags, one for the left format and one for the right format, in case the two inputs need to be set separately.
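
A minimal sketch of the detection logic described above, assuming the format can be inferred from the first data line alone. This is illustrative Python, not the gia implementation; `detect_bed_format`, `resolve_formats`, and the override parameters are hypothetical names standing in for the proposed left and right format flags.

```python
# Column counts for the BED flavors mentioned in these issues.
KNOWN_FORMATS = {3: "bed3", 4: "bed4", 6: "bed6", 12: "bed12"}


def detect_bed_format(line, default="bed3"):
    """Guess the BED format from the number of columns in one line.

    Falls back to `default` (BED3) for blank lines, comment lines,
    and column counts that match no known format.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return default
    # Prefer tab-delimited parsing; otherwise split on any whitespace.
    fields = stripped.split("\t") if "\t" in stripped else stripped.split()
    return KNOWN_FORMATS.get(len(fields), default)


def resolve_formats(left_line, right_line, left_override=None, right_override=None):
    """Pick a format for each input; an explicit override wins over detection."""
    left = left_override or detect_bed_format(left_line)
    right = right_override or detect_bed_format(right_line)
    return left, right


print(detect_bed_format("chr1\t29300\t29400"))
print(detect_bed_format("chr1 29301 29400 CTAACTTTCCTATCAT-1 41 +"))
```

Keeping the overrides separate per input means a mixed pair, such as a BED3 file against a BED6 file, can still be forced explicitly when detection guesses wrong.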

Support for intersect -wo flag

Write the original A and B entries plus the number of base pairs of overlap between the two features.
  - Overlaps restricted by -f and -r.
    Only A features with overlap are reported.
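
The behavior quoted above can be sketched as follows, assuming half-open BED coordinates and ignoring the -f and -r restrictions. This is a plain Python illustration with hypothetical function names, not code from gia or bedtools.

```python
def overlap_bp(a_start, a_end, b_start, b_end):
    """Number of overlapping base pairs between two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def intersect_wo(a_records, b_records):
    """Pair each A record with every overlapping B record.

    Records are tuples of strings (chrom, start, end, ...). Each output
    row is the full A record, then the full B record, then the overlap
    in base pairs. A records with no overlap produce no rows.
    """
    rows = []
    for a in a_records:
        for b in b_records:
            if a[0] != b[0]:
                continue  # different chromosomes never overlap
            n = overlap_bp(int(a[1]), int(a[2]), int(b[1]), int(b[2]))
            if n > 0:
                rows.append(tuple(a) + tuple(b) + (n,))
    return rows


a_records = [("chr1", "29300", "29400")]
b_records = [
    ("chr1", "29301", "29400", "CTAACTTTCCTATCAT-1", "41", "+"),
    ("chr1", "29328", "29427", "CTAACTTTCCTATCAT-1", "40", "-"),
]

for row in intersect_wo(a_records, b_records):
    print("\t".join(str(field) for field in row))
```

With these records the two output rows end in 99 and 72, the overlap of each B interval with the single A interval.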

GIA 0.2.0

Matching development of bedrs-0.2

  • Convert all instances of Containers into static structs of IntervalContainer
  • Convert all numeric instances of Bed3, Bed4, Bed6, Bed12 into bedrs structs
  • Handle mixed file formats and combinatorics with dispatch methods

Reproduce bedtools methods

  • intersect
  • window
  • closest
  • coverage
  • map
  • genomecov
  • merge
  • cluster
  • complement
  • shift
  • subtract
  • slop
  • flank
  • sort
  • random
  • shuffle
  • sample
  • spacing
  • unionbedg

bed12 support, -wo flag support

Thanks so much for providing this tool to the community! I am finding it much faster than the existing toolkits.
Would it be possible to provide support for the bed12 format? Additionally, would it be possible to extend functionality to additional bedtools intersect flags such as -wo?
For people working in the single-molecule sequencing space these additions would be massively helpful.

Thanks again!
