Comments (7)
@koke
@jmalkin
DataSketches-hive 1.1.0-incubating has been released!
Please go to Downloads.
from datasketches-hive.
This is a bug in lots of places, as noted in the PR. I'll start working to fix them all (that might take a bit but I'll try to get it done soon).
Wrapping the larger buffer will be harmless in most cases. The specific issue here is that there was an older version of the specialized SingleItemSketch that didn't include a flag indicating that it followed the slightly different single item format. To handle that problem, the only workaround that could be identified was to use the buffer length. In general, the images shouldn't need to rely on buffer size (other than ensuring it's not too small) when being read.
Checks against size for the empty-sketch scenario should only be a performance optimization; the code should still work properly if it ends up parsing an actual empty sketch later. But we should fix this properly anyway.
My PR replaces all calls to getBytes() in the repo (aside from in the new code to wrap BytesWritable in a Memory). Had to copy bytes in a few places, but I was mostly able to just wrap using getLength() bytes. Just need it reviewed now.
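For anyone hitting this before a release, here is a minimal sketch of the general idea, not the code from the PR. `PaddedBytes` is a hypothetical stand-in for a writable like `BytesWritable`, whose backing array can be longer than its valid data, and `java.nio.ByteBuffer` stands in for `Memory` so the example runs without Hadoop or DataSketches on the classpath.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class PaddedBufferDemo {

    /**
     * Hypothetical stand-in for a reusable writable: the backing array grows
     * with slack and is never shrunk, so it can be longer than the valid data.
     */
    static final class PaddedBytes {
        private byte[] backing = new byte[0];
        private int length = 0;

        void set(byte[] src) {
            if (src.length > backing.length) {
                // Grow with extra capacity, as reusable buffers commonly do.
                backing = new byte[src.length * 3 / 2 + 1];
            }
            System.arraycopy(src, 0, backing, 0, src.length);
            length = src.length;
        }

        /** Returns the whole backing array, including padding and stale bytes. */
        byte[] getBytes() { return backing; }

        /** Returns the number of valid bytes at the start of the backing array. */
        int getLength() { return length; }
    }

    /** Wraps only the valid bytes without copying; the view's capacity equals getLength(). */
    static ByteBuffer wrapValid(PaddedBytes w) {
        return ByteBuffer.wrap(w.getBytes(), 0, w.getLength()).slice();
    }

    /** Copies only the valid bytes, for callers that need an exact-length array. */
    static byte[] copyValid(PaddedBytes w) {
        return Arrays.copyOf(w.getBytes(), w.getLength());
    }

    public static void main(String[] args) {
        PaddedBytes w = new PaddedBytes();
        w.set(new byte[] {1, 2, 3, 4, 5, 6, 7, 8});
        w.set(new byte[] {9, 9, 9}); // reuse: the backing array keeps its old size

        System.out.println("backing array length: " + w.getBytes().length);
        System.out.println("valid length:         " + w.getLength());
        System.out.println("wrapped capacity:     " + wrapValid(w).capacity());
        System.out.println("copied length:        " + copyValid(w).length);
    }
}
```

Wrapping avoids a copy and is enough when the reader only needs a bounded view; copying is the fallback when an API insists on a `byte[]` of exact length.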
@leerho Please revert that change and apply the PR that completely addresses the issue.
Thanks again @koke for finding this. We did add your unit test, while fixing the underlying issue everywhere in the repo so that future changes in other sketches won't risk triggering this problem again.
Is there a plan to do a new release with this? I just hit a new issue with the IntersectSketchUDF and had to replicate the workaround for that one.