Giter Site home page Giter Site logo

Comments (9)

ctb avatar ctb commented on June 18, 2024

from sourmash.

lgautier avatar lgautier commented on June 18, 2024

May be drop __len__ altogether then. MinHash does not seem to implement of Python's sequence protocol anyways.

Note that PR #79 is fixing a bit of the C code to allow the use of getters/setters (which allow size or max_size to be properties).

from sourmash.

ctb avatar ctb commented on June 18, 2024

from sourmash.

lgautier avatar lgautier commented on June 18, 2024

May be drop __len__ altogether then. MinHash does not seem to implement of Python's sequence protocol anyways.

well, that could be fixed...?
__len__ is used for true/false or empty/not-empty calculations too, which I
like. Seems pythonic to have it work :)

Then if __len__ is desirable, it would make sense that get_mins() returns an object of identical length.

An alternative to that would be to have an attribute mins that is implementing __len__. Places where len(mh) is currently used would then become len(mh.mins) (keeping, length, and removing possible ambiguities).

from sourmash.

ctb avatar ctb commented on June 18, 2024

yes, from my perspective:

__len__ should return what len(mh.get_mins()) would return (for efficiency)

new attribute mh.max_size returns the minhash num

It is important to have a mh.get_mins() function so that we can pass
in with_abundance keyword. In that context I worry that mh.mins would
be confusing if we had both.

from sourmash.

lgautier avatar lgautier commented on June 18, 2024

What about the following as an alternative to mh.get_mins(with_abundance=True) ?

zip(mh.mins, mh.abundances)

from sourmash.

ctb avatar ctb commented on June 18, 2024

We would have to encode the logic in minhash_get_mins (in _minhash.cc) in Python then.

from sourmash.

ctb avatar ctb commented on June 18, 2024

This is probably OK and not too confusing, but it seems like something that should be done in fast C land if we already have the code working.

from sourmash.

lgautier avatar lgautier commented on June 18, 2024

I am uncertain the loss in performance would matter much (but depends on use cases). If C at all costs one could have a specialized method mh.mins_with_abundance() (reusing the code already implemented). An other way to look at might be that abundances are bolted onto a container of hash values and an alternative design pattern could be considered (e.g, additional class QuantifiedMinHash).

Not a priority and this can be revisited later, if at all.

from sourmash.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.