Comments (9)
from sourmash.
May be drop __len__
altogether then. MinHash
does not seem to implement of Python's sequence protocol anyways.
Note that PR #79 is fixing a bit of the C code to allow the use of getters/setters (which allow size
or max_size
to be properties).
from sourmash.
from sourmash.
May be drop
__len__
altogether then.MinHash
does not seem to implement of Python's sequence protocol anyways.
well, that could be fixed...?
__len__
is used for true/false or empty/not-empty calculations too, which I
like. Seems pythonic to have it work :)
Then if __len__
is desirable, it would make sense that get_mins()
returns an object of identical length.
An alternative to that would be to have an attribute mins
that is implementing __len__
. Places where len(mh)
is currently used would then become len(mh.mins)
(keeping, length, and removing possible ambiguities).
from sourmash.
yes, from my perspective:
__len__
should return what len(mh.get_mins())
would return (for efficiency)
new attribute mh.max_size
returns the minhash num
It is important to have a mh.get_mins()
function so that we can pass
in with_abundance
keyword. In that context I worry that mh.mins
would
be confusing if we had both.
from sourmash.
What about the following as an alternative to mh.get_mins(with_abundance=True)
?
zip(mh.mins, mh.abundances)
from sourmash.
We would have to encode the logic in minhash_get_mins
(in _minhash.cc
) in Python then.
from sourmash.
This is probably OK and not too confusing, but it seems like something that should be done in fast C land if we already have the code working.
from sourmash.
I am uncertain the loss in performance would matter much (but depends on use cases). If C at all costs one could have a specialized method mh.mins_with_abundance()
(reusing the code already implemented). An other way to look at might be that abundances are bolted onto a container of hash values and an alternative design pattern could be considered (e.g, additional class QuantifiedMinHash
).
Not a priority and this can be revisited later, if at all.
from sourmash.
Related Issues (20)
- GTDB accessions to LCAs? HOT 2
- rust beta lints are failing HOT 1
- If there is a command or way to annotate individual contigs without creating many files? HOT 4
- unexpected `sig grep`, `sig extract` behavior with identical md5s HOT 2
- update GTDB - R08-RS214 HOT 1
- (3rd party) docs for using sourmash for contamination analysis, etc.
- update `doc/` notebooks & notebook links
- links to tutorial notebooks are not working!?
- LINS tutorials HOT 1
- /.singularity.d/runscript: 3: exec: trim-low-abund.py: not found HOT 4
- `tax grep` doesn't search the identifiers!
- Transcriptome of interest, gather in metaTs signatures, repeat. Faster or directly to mapping? HOT 4
- Panic Error when using sourmash gather HOT 3
- `sig cat` selection relies on md5sum, which can pull in unwanted sigs with duplicated md5 HOT 1
- automated test builds are failing routinely (but sporadically) b/c of coverage HOT 1
- Comparing the taxonomic composition of multiple metagenomes HOT 2
- Finding best reference from NGS data HOT 4
- `sig check` very slow for large databases? HOT 1
- print empty results from `sourmash tax`?
- tutorial TODO: finding the best reference sequences for NGS data HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from sourmash.