bwtool's Introduction

Getting started

See the wiki.

bwtool's People

Contributors

ghuls, yhoogstrate

bwtool's Issues

random doesn't work

It seems to hang, possibly in an infinite loop. Never mind whether the output looks random: there is no output at all.

Fatal error: Exit code 1

With one of our 250 MB bigwig test files we get this error message:

Fatal error: Exit code 1 ()
errAbort re-entered due to out-of-memory condition. Exiting.

What does it mean? This process was able to allocate 30GB of memory.

Clustering bug

Seems to affect both matrix and aggregate. Not yet reproducible in small examples. It's a bit worrying. Perhaps a rollback to an earlier cluster.c is necessary.

chromosome order in "bwtool window"

What determines the chromosome order in "bwtool window"? For example I get this order on my bigwigs (just look at the first column):
GL456396.1 21200 21225 111
GL456354.1 195950 195975 2
GL456382.1 23125 23150 2
JH584298.1 184150 184175 2
GL456367.1 42025 42050 0
GL456216.1 66625 66650 8
GL456381.1 25825 25850 2
JH584297.1 205750 205775 0
GL456366.1 47025 47050 0
GL456394.1 24275 24300 14
GL456379.1 72350 72375 2
JH584296.1 199325 199350 4
MT 16250 16275 33
GL456393.1 55675 55700 2
GL456378.1 31575 31600 6
JH584295.1 1950 1975 0
GL456392.1 23600 23625 63
GL456350.1 227925 227950 4
GL456213.1 39300 39325 2
JH584294.1 191875 191900 0
JH584304.1 114425 114450 842
GL456212.1 153575 153600 2
JH584293.1 207925 207950 4
JH584303.1 158050 158075 0
GL456239.1 40025 40050 48
GL456389.1 28725 28750 0
GL456390.1 24625 24650 29
GL456211.1 241700 241725 0
19 61431525 61431550 0
18 90702600 90702625 0
17 94987225 94987250 0
16 98207725 98207750 0
15 104043650 104043675 0
14 124902200 124902225 0
13 120421600 120421625 0
12 120128975 120129000 0
11 122082500 122082525 0
10 130694950 130694975 0
JH584292.1 14900 14925 4
JH584302.1 155800 155825 1
GL456210.1 169700 169725 0
JH584301.1 259850 259875 0
GL456359.1 22925 22950 8
GL456360.1 31675 31700 6
GL456387.1 24650 24675 0
JH584300.1 182300 182325 0
GL456372.1 28625 28650 14
GL456221.1 206925 206950 4
GL456385.1 35200 35225 2
GL456219.1 175925 175950 0
GL456370.1 26725 26750 0
Y 91744650 91744675 0
X 171031250 171031275 0
GL456233.1 336900 336925 0
9 124595075 124595100 0
8 129401175 129401200 0
7 145441425 145441450 0
6 149736500 149736525 0
5 151834650 151834675 0
4 156508075 156508100 0
3 160039650 160039675 0
2 182113175 182113200 0
1 195471925 195471950 0

The order seems a bit random. My problem is that I just want to use the "bwtool window" output from column 4 to the last column, and for example concatenate this output from multiple files. But then I need to be sure about the ordering of the regions/chromosomes. Can I be sure that for similar bigwigs the chromosome order is always the same?
Thanks!

man pages

There are none. Just the command summary when no arguments are used.

License Question

Hello,

Just a question to clarify the license:
It seems that the resulting binary program will fall under both GPL3 (which specifically requires that the program may be used for all purposes, including commercial use) and Jim Kent's license (which forbids commercial use).
So basically, it looks like an impossible license... (since libjkweb is an integral part of the program).

summary sum of squares and sum

Jakob also suggested this for summary. With the sum of squares, the variance can be calculated. This perhaps makes the standard deviation redundant.

SAX page

  • Introduction
  • Usage
  • Example using small data
  • Example changing the alphabet size
  • Example with bedGraph output.
  • Example changing the -mean and/or -std defaults.
  • Show a diagram of how the discretization roughly follows the data.

chromgraph page

  • Introduction
  • Usage
  • Example showing basic usage with a genome-wide bigWig.
  • The same example with a different -every option.

lift chrom sizes

The chrom sizes handed to the bigWig writer are the ones from the source bigWig and not the destination. Big problem.

-wig option

Maybe allow bigWig-writing programs to stop at the wig-writing step.

GSL for random

I should put this back in ifdef'd based on whether GSL is linked in or not.

find page

  • Introduction
  • Usage
  • Example showing thresholding
  • Example showing local extrema
  • Example showing local extrema with min distance separation (greedy)

semi-bug -decimals without number

I think perhaps there are several other options that expect an argument and if one isn't given it fails similarly. The current error message makes sense to me but probably wouldn't for a typical user.

extract page

  • Introduction
  • Usage
  • bed example
  • jsp example
  • a minus-strand example

lift page

In principle for the first milestone it's done.

Double comma after NA from extractOutBed()

Hi,

A tiny bug:

bwtool extract generates double commas after an NA
(and -tabs does not suppress all commas)

This is due to line 71 in extractOutBed() from extract.c:
fprintf(out, "NA,%c", (tabs) ? '\t' : ',');

... which should be:
fprintf(out, "NA%c", (tabs) ? '\t' : ',');

Thanks,

Guy

"Address boundary error" with bwtool matrix -cluster

On some input files I have the following crash with bwtool matrix -cluster=10

“./bwtool matrix -keep-be…” terminated by signal SIGSEGV (Address boundary error)

I traced the error to this code in beato/cluster.c

static int *k_means(struct cluster_bed_matrix *cbm, double t)
{
    /* output cluster label for each data point */
    int *labels; /* Labels for each cluster (size n) */
    int h, i, j; /* loop counters, of course :) */
    double old_error;
    double error = DBL_MAX; /* sum of squared euclidean distance */
    double **tmp_centroids; /* centroids and temp centroids (size k x m) */   
    int n = cbm->n;
    int m = cbm->m;
    int k = cbm->k;
    AllocArray(labels, n);
    AllocArray(tmp_centroids, k);
printf("k_means: 0\n");
    for (i = 0; i < k; i++)
        AllocArray(tmp_centroids[i], m);
    /* assert(data && k > 0 && k <= n && m > 0 && t >= 0); /\* for debugging *\/ */
    /* initialization */
printf("k_means: 1\n");
    for (i = 0, h = cbm->num_na; i < k; h += (cbm->n-cbm->num_na) / k, i++)
    {
printf("k_means: 1:%d\n", i);
        /* pick k points as initial centroids */
        for (j = 0; j < m; j++) {
printf("k_means: 1:%d %d %d %d\n", i, j, m, h);
            cbm->centroids[i][j] = cbm->pbm->matrix[h][j];
        }
    }
...

For a working file:

do_kmeans_sort
do_kmeans_sort: 0
do_kmeans_sort float: 0.001000
k_means: 0
k_means: 1
k_means: 1:0
k_means: 1:0 0 10000 19982
k_means: 1:0 1 10000 19982
k_means: 1:0 2 10000 19982
k_means: 1:0 3 10000 19982
...
k_means: 1:9 9993 10000 19991
k_means: 1:9 9994 10000 19991
k_means: 1:9 9995 10000 19991
k_means: 1:9 9996 10000 19991
k_means: 1:9 9997 10000 19991
k_means: 1:9 9998 10000 19991
k_means: 1:9 9999 10000 19991
k_means: 2
k_means: 3
do_kmeans_sort: 1
do_kmeans_sort: 2
do_kmeans_sort: 3
do_kmeans_sort: 4
do_kmeans_sort: 5
output_matrix

For a non-working file:

do_kmeans_sort
do_kmeans_sort: 0
do_kmeans_sort float: 0.001000
k_means: 0
k_means: 1
k_means: 1:0
k_means: 1:0 0 10000 20000
Segmentation fault (core dumped)

I think h (=20000) is calculated wrongly:
I have a window size of 10000 and 20000 regions in my BED file.

aggregate multiple beds

I think there was a problem with the output when multiple beds are given as input. It seemed that the results for each bed were written consecutively, one after another, as opposed to side by side (3 columns). This is probably an easy fix.

-condense segfaults

I tried with lift on a big file and it dumped a 26 GB core file. Maybe find some smaller examples.

matrix page

  • Introduction
  • Usage
  • Simple example with illustrations.
  • Mention -starts and -ends in reference to aggregate.
  • Example with clustering
  • Clustering example with -keep-bed
  • Example with -long-form
  • Example with -tiled-averages

configure running twice

I don't know when it started doing that but it doesn't seem to really matter except that it's annoying to wait for it again. It's probably some extra line in the Makefile.am or configure.ac triggering it. Hmm. It's probably easiest to figure out from the generated Makefile.

distribution page

That's probably enough. Maybe later an example plot in R for the next milestone.

Something wrong when making bwtool

When I run the command "make" in the bwtool folder, something goes wrong and I don't know how to fix it. Could you please help me fix this problem?

$ make
/Applications/Xcode.app/Contents/Developer/usr/bin/make all-recursive
Making all in tests
make[2]: Nothing to be done for `all'.
gcc -DHAVE_CONFIG_H -I. -I/sw/include -MT aggregate.o -MD -MP -MF .deps/aggregate.Tpo -c -o aggregate.o aggregate.c
aggregate.c:7:26: fatal error: jkweb/common.h: No such file or directory
compilation terminated.
make[2]: *** [aggregate.o] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2

I think the build of libbeato went normally.

Thank you!

bwtool agg crashes on ERCC spike-in sequences

Hey Andy,
I'm trying to run "bwtool agg" on ERCC spike-in sequences but it crashes. ERCC spike-ins are a set of 92 synthetic, unspliced sequences that one adds to an RNA mix before making a cDNA library. Conceptually, I guess each of those 92 sequences could be considered a separate chromosome. Their sequences are known, so one can map RNAseq reads onto them and generate BigWigs.

Here's the command I've used, followed by its standard error:

$ ~/bin/bwtool/bwtool agg -long-form -header -expanded 0:100:0 hsAll_Cap1_all_bothAdapters.ERCC.bw ERCC.bed /dev/stdout
Segmentation fault (core dumped)

You can get the two input files here:
http://genome.crg.es/~jlagarde/tmp/ERCC.bed
http://genome.crg.es/~jlagarde/tmp/hsAll_Cap1_all_bothAdapters.ERCC.bw

If you could look into this issue it would be great!
Cheers

-clusters and -expanded in aggregate

This is sort of a bug. It doesn't crash, it's just that -expanded doesn't do anything with -clusters. At the least, a message should be provided that says this enhancement isn't available yet.

Trying to install bwtool but there is something wrong with the binary_format()

It seems that the function binary_format() in bwtool's metaBigBam.c clashes with the binary_format symbol declared in /usr/local/include/htslib/hts.h. How can I solve it?

make[1]: Entering directory `/home/user/repo/tools/libbeato/beato'
gcc -DHAVE_CONFIG_H -I. -I..    -Wall -Wformat -Wimplicit -Wreturn-type -Wuninitialized -fno-strict-aliasing -g -O0 -I/home/user/include -MT metaBigBam.o -MD -MP -MF .deps/metaBigBam.Tpo -c -o metaBigBam.o metaBigBam.c
metaBigBam.c:572:13: error: ‘binary_format’ redeclared as different kind of symbol
 static void binary_format(char *s, int num)
             ^
In file included from /usr/local/include/htslib/sam.h:30:0,
                 from ../beato/metaBig.h:20,
                 from metaBigBam.c:16:
/usr/local/include/htslib/hts.h:86:5: note: previous definition of ‘binary_format’ was here
     binary_format, text_format,
     ^
make[1]: *** [metaBigBam.o] Error 1

Removing zeros

This remains an issue because doubles are stored as signed values, and it's never clear if all the other bits are zero anyway. Casting tricks aren't so desirable, and care is needed to ensure that the cast-to integer type has the same number of words/bits.

bwtool remove and other places that ignore zero could be changed to use an epsilon value to approximate zero. I dunno.

window page

Pretty good already. More scripts or big examples could be useful.

paste/aggregate bigWig consistency check

This may be more universal than just pasting or using aggregate, but the features using multiple bigWigs should do a check to make sure the internal chromosome sizes are all consistent among multiple bigWigs in case someone uses bigWigs from different assemblies/species by accident.

Negative parameters not accepted

Any time a negative number is given on the command line it's interpreted as an option. I need to keep a central list of all the options so that this ambiguity can be resolved early in the program.
