randomstats not cleaning up (pybedtools issue, open)

brentp commented on July 28, 2024
randomstats not cleaning up

from pybedtools.

Comments (9)

daler commented on July 28, 2024

A while ago I did a major overhaul on the randomization stuff, implementing a new method (BedTool._randomintersection rather than BedTool.randomintersection) that fixed this.

Looks like I never made this method the default for BedTool.randomstats().

To use the new method, you can specify new=True and provide a genome_fn to BedTool.randomstats. To see the difference (both in syntax and cluttering of the temp dir), check out test/prevent_open_file_regression.

So for your example, this should do the trick:

gfn = pybedtools.chromsizes_to_file(pybedtools.chromsizes('hg19'))
res = bed.randomstats(loh.fn, 100, processes=25, new=True, genome_fn=gfn)

(side note: If you take a look at the leftover temp files, I think they should all be genome files)
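
One way to check that side note is to list what is left in the temp directory before and after a run. This is only a rough sketch using the standard library: the helper name leftover_temp_files is made up for illustration, and the "pybedtools.*.tmp" pattern is an assumption about how the temp files are named, so adjust it to whatever actually shows up in your temp dir.

```python
import glob
import os
import tempfile


def leftover_temp_files(directory=None, pattern="pybedtools.*.tmp"):
    """Return a sorted list of files in `directory` that match `pattern`.

    Hypothetical helper: both the name and the default pattern are
    assumptions, not part of pybedtools.
    """
    directory = directory or tempfile.gettempdir()
    return sorted(glob.glob(os.path.join(directory, pattern)))


# Compare the count before and after a randomization run.
before = set(leftover_temp_files())
# ... run bed.randomstats(...) here ...
after = set(leftover_temp_files())
new_files = sorted(after - before)
```

If new_files keeps growing across runs, something is not being cleaned up.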

brentp commented on July 28, 2024

that does the trick. can genome_fn be a required argument to avoid this?

daler commented on July 28, 2024

Yeah, that's probably best. I still need to do a little more cleaning up and "officially" deprecate the old randomstats method; when that happens the genome_fn will be required.

brentp commented on July 28, 2024

got it.

Would you consider adding an _orig_pool kwarg to random_op? It'd be nice to be able to keep re-using a pool if I'm running this across multiple pairs of BED files.

daler commented on July 28, 2024

Sure.

Implementation-wise, would you rather create your own pool and use it for various parallel calls like

mypool = multiprocessing.Pool(25)
bt.randomstats(_orig_pool=mypool, *args, **kwargs)
bt.random_op(_orig_pool=mypool, *args, **kwargs)
bt.random_jaccard(_orig_pool=mypool, *args, **kwargs)

or have a BedTool._pool instance variable that, if None, will initialize with n processes, but subsequent calls (when _orig_pool=True) re-use that auto-created one?

# initializes a pool, BedTool._pool = multiprocessing.Pool(25)
bt.randomstats(_orig_pool=True, processes=25, *args, **kwargs)

# subsequent calls re-use BedTool._pool
bt.randomstats(_orig_pool=True, processes=25, *args, **kwargs)

# set to None to re-initialize w/ different nprocs
bt._pool = None
bt.randomstats(_orig_pool=True, processes=500, *args, **kwargs)

brentp commented on July 28, 2024

I much prefer the former.
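
To make the caller-owned pool option concrete, here is a minimal, self-contained sketch of the pattern. It is not pybedtools code: random_op below is a toy stand-in that counts overlaps of random points with a query window, and only the _orig_pool and processes keyword names are taken from this thread. It uses multiprocessing.pool.ThreadPool so it runs anywhere without pickling concerns; a multiprocessing.Pool exposes the same map method.

```python
import random
from multiprocessing.pool import ThreadPool


def _one_randomization(job):
    """Toy stand-in for one shuffle-and-intersect round."""
    seed, n_points, genome_size, query = job
    rng = random.Random(seed)
    points = [rng.randrange(genome_size) for _ in range(n_points)]
    return sum(1 for p in points if query[0] <= p < query[1])


def random_op(query, iterations, processes=1, _orig_pool=None):
    """Hypothetical method: re-use the caller's pool if one is given."""
    pool = _orig_pool if _orig_pool is not None else ThreadPool(processes)
    try:
        jobs = [(seed, 50, 1000, query) for seed in range(iterations)]
        return pool.map(_one_randomization, jobs)
    finally:
        # Only tear down a pool this call created; the caller owns theirs.
        if _orig_pool is None:
            pool.close()
            pool.join()


# One pool shared across several calls (e.g. several pairs of files).
mypool = ThreadPool(4)
try:
    r1 = random_op((0, 100), 20, _orig_pool=mypool)
    r2 = random_op((500, 900), 20, _orig_pool=mypool)
finally:
    mypool.close()
    mypool.join()
```

The point of this design is that ownership stays obvious: whoever creates the pool closes it, and the method never hides a pool on the instance.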

brentp commented on July 28, 2024

Sorry for putting this in this thread, but it's another open-file error. If I stream, is it leaving the process open?

from pybedtools import BedTool

a = BedTool('chr1 1 2', from_string=True)
b = BedTool('chr1 1 2', from_string=True)

for i in range(10000):
    print(i)
    c = a.intersect(b, stream=True)

is that expected to leak?

daler commented on July 28, 2024

In this case, I think the answer is yes:

The way streaming bedtools are closed is by hitting a StopIteration (see cbedtools.IntervalIterator). Since c in this example is never iterated over, it never gets a chance to raise a StopIteration to close the stream.
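
The close-on-StopIteration behaviour can be illustrated with a toy class built only on the standard library. This is a sketch of the general pattern, not the cbedtools.IntervalIterator implementation: StreamingLines and its close method are invented for illustration, and the child process simply echoes its argument.

```python
import subprocess
import sys


class StreamingLines:
    """Toy streaming result: lines read lazily from a child process."""

    def __init__(self, text):
        child_code = "import sys; sys.stdout.write(sys.argv[1])"
        self.proc = subprocess.Popen(
            [sys.executable, "-c", child_code, text],
            stdout=subprocess.PIPE,
            text=True,
        )

    def __iter__(self):
        return self

    def __next__(self):
        line = self.proc.stdout.readline()
        if not line:
            # End of output: this is the only automatic cleanup point.
            self.close()
            raise StopIteration
        return line.rstrip("\n")

    def close(self):
        if not self.proc.stdout.closed:
            # Drain any remaining output, close the pipe, reap the child.
            self.proc.communicate()


consumed = StreamingLines("chr1\t1\t2\n")
rows = list(consumed)  # exhausting the iterator closes the pipe

unused = StreamingLines("chr1\t1\t2\n")
still_open = not unused.proc.stdout.closed  # never iterated, so still open
unused.close()  # explicit cleanup is the only thing that closes it
```

In the loop from the question, each un-iterated result is in the same state as unused before the explicit close call, so the open pipes accumulate.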

But it would be nice if the garbage collector saw that the streaming BedTool from iteration i-1 no longer has any references, and cleaned it up (would a __del__ method be called then?). But this starts to get to the reference counting part of Python & Cython that I don't have a handle on yet. Any ideas?

brentp commented on July 28, 2024

I tried a number of things, including __del__, but can't get it to work. It doesn't collect them until the program terminates...
Iterating over the streamed results does prevent the error in this case.
I'm getting another open-file-handles error that I haven't been able to create a small test case for.
