Comments (7)
some rough stream of consciousness thoughts on this...
the simplest way to go about this would be to just add something like...
public static TDigest createDigestFromBytes(ByteBuffer buf) {
    // TODO: update this class name whenever createDigest is modified
    return AVLTreeDigest.fromBytes(buf);
}
...but this would break in cases of "rolling updates" to distributed systems, where one node might still be using tdigest-3.1 where AVLTreeDigest is the default, but another node has just upgraded to tdigest-3.42 where MergingDigest might be the new default.
A better approach might be to include in the binary data a prefix indicating which implementation is used, so that any of them could be supported generically, even if they are not the current "default" impl. This could be done either by breaking backcompat in all of the current asBytes/fromBytes methods, or by introducing a new static factory method for the serialization, in addition to the deserialization, and making them responsible for writing/reading an impl prefix before delegating to the existing static methods. This wouldn't be very polymorphic, but would ensure that as long as client code used the API consistently, it would continue to work even as future versions of TDigest add new impls and/or change the default impl.
half brained, ugly, example...
private static final Class<?>[] SERIALIZATION_SUPPORTED_IMPLS = new Class<?>[] {
    // NEVER CHANGE THE ORDER OF THIS ARRAY, ONLY ADD TO THE END
    // IF IMPLS ARE DELETED FROM CODE BASE, REPLACE WITH null TO PRESERVE INDEX OFFSETS
    TreeDigest.class,
    ArrayDigest.class,
    AVLTreeDigest.class,
    MergingDigest.class
};
/** only usable in conjunction with deserialize(ByteBuffer) */
public static void serialize(TDigest d, ByteBuffer buf) {
    int prefix = 0;
    for (Class<?> c : SERIALIZATION_SUPPORTED_IMPLS) {
        if (null != c && c.isInstance(d)) {
            // prefix is written as a non-positive int: 0, -1, -2, ...
            buf.putInt(0 - prefix);
            d.asSmallBytes(buf);
            return;
        }
        prefix++;
    }
    throw new IllegalArgumentException("TODO: unsupported impl error");
}
/** only usable in conjunction with serialize(TDigest, ByteBuffer) */
public static TDigest deserialize(ByteBuffer buf) {
    final int prefix = buf.getInt();
    assert prefix <= 0 : "TODO: better error handling";
    Class<?> c = SERIALIZATION_SUPPORTED_IMPLS[ 0 - prefix ];
    // TODO: this gets more hairy than serialize ...
    // either need to use reflection to call static fromBytes method in each class
    // or abandon the SERIALIZATION_SUPPORTED_IMPLS array and just go with a simple switch
    throw new UnsupportedOperationException("TODO: dispatch to fromBytes of " + c);
}
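To make the "simple switch" option concrete, here is a minimal, self-contained sketch of a prefix-tagged round trip. Everything in it (`PrefixedDigestCodec`, `Digest`, `AvlStub`, `MergeStub`) is a hypothetical stand-in invented for illustration, not a t-digest class; each stub carries a single double so the example runs without the library. The ids reuse the array positions from the snippet above, and the prefix is written as a non-positive int in the same way.

```java
import java.nio.ByteBuffer;

final class PrefixedDigestCodec {
    private PrefixedDigestCodec() {}

    /** Hypothetical stand-in for TDigest: only the method this sketch needs. */
    interface Digest {
        void asSmallBytes(ByteBuffer buf);
    }

    /** Stand-in for one implementation; a real digest would write its centroids. */
    static final class AvlStub implements Digest {
        final double value;
        AvlStub(double value) { this.value = value; }
        @Override public void asSmallBytes(ByteBuffer buf) { buf.putDouble(value); }
        static AvlStub fromBytes(ByteBuffer buf) { return new AvlStub(buf.getDouble()); }
    }

    /** Stand-in for a second implementation with its own fromBytes. */
    static final class MergeStub implements Digest {
        final double value;
        MergeStub(double value) { this.value = value; }
        @Override public void asSmallBytes(ByteBuffer buf) { buf.putDouble(value); }
        static MergeStub fromBytes(ByteBuffer buf) { return new MergeStub(buf.getDouble()); }
    }

    // Wire-format ids (assumed): never renumber, only add new ones.
    static final int AVL = 2;
    static final int MERGING = 3;

    /** Writes the impl id as a non-positive prefix, then delegates to the impl. */
    static void serialize(Digest d, ByteBuffer buf) {
        final int id;
        if (d instanceof AvlStub) {
            id = AVL;
        } else if (d instanceof MergeStub) {
            id = MERGING;
        } else {
            throw new IllegalArgumentException("unsupported impl: " + d.getClass().getName());
        }
        buf.putInt(-id);
        d.asSmallBytes(buf);
    }

    /** Reads the prefix and dispatches to the matching static fromBytes. */
    static Digest deserialize(ByteBuffer buf) {
        final int prefix = buf.getInt();
        if (prefix > 0) {
            throw new IllegalArgumentException("not a prefixed digest: " + prefix);
        }
        switch (-prefix) {
            case AVL:
                return AvlStub.fromBytes(buf);
            case MERGING:
                return MergeStub.fromBytes(buf);
            default:
                throw new IllegalArgumentException("unknown impl id: " + (-prefix));
        }
    }
}
```

The switch avoids reflection at the cost of one extra `case` per implementation. A node that has not yet learned a newer id fails with a clear error instead of misreading the bytes.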
from t-digest.
Yeah.... good ideas.
I think also that we have enough ideas across the different kinds of
t-digest that we could just as easily define a relatively universal
serialization format that can deserialize into any t-digest. This allows
rolling updates, specific deserialization or non-specific,
give-me-the-best-kind deserialization.
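One way such a universal format might look, purely as a sketch: store the compression and the list of (mean, weight) centroids, and let the reader replay them into whichever implementation it wants. This assumes every implementation can be rebuilt by re-adding weighted points, which the thread does not confirm. `UniversalDigestFormat`, `CentroidSink` and the byte layout are hypothetical names and choices made for this illustration only.

```java
import java.nio.ByteBuffer;
import java.util.function.DoubleFunction;

final class UniversalDigestFormat {
    private UniversalDigestFormat() {}

    /** Hypothetical target: any digest impl that can absorb a weighted point. */
    interface CentroidSink {
        void add(double mean, long weight);
    }

    /** Bytes for n centroids: compression (8) + count (4) + n * (mean 8 + weight 8). */
    static int byteSize(int n) {
        return 8 + 4 + n * 16;
    }

    /** Writes compression, centroid count, then each (mean, weight) pair. */
    static void write(double compression, double[] means, long[] weights, ByteBuffer buf) {
        if (means.length != weights.length) {
            throw new IllegalArgumentException("means/weights length mismatch");
        }
        buf.putDouble(compression);
        buf.putInt(means.length);
        for (int i = 0; i < means.length; i++) {
            buf.putDouble(means[i]);
            buf.putLong(weights[i]);
        }
    }

    /**
     * Reads the header, asks the factory for a sink of the caller's preferred
     * implementation (given the stored compression), and replays the centroids.
     */
    static CentroidSink read(ByteBuffer buf, DoubleFunction<CentroidSink> factory) {
        final double compression = buf.getDouble();
        final int n = buf.getInt();
        if (n < 0) {
            throw new IllegalArgumentException("negative centroid count: " + n);
        }
        final CentroidSink sink = factory.apply(compression);
        for (int i = 0; i < n; i++) {
            sink.add(buf.getDouble(), buf.getLong());
        }
        return sink;
    }
}
```

Because the factory is chosen at read time, the same bytes can feed a specific implementation or a "give me the current best" default, which is what makes rolling updates workable under these assumptions.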
(In reply to Hoss Man's comment of Fri, Apr 24, 2015 at 2:06 PM on #52.)
from t-digest.
I'll take your word for it that a universal serialization format is possible - i haven't looked at enough of the different impls to make sense of how they differ.
one other aspect of this that i just realized would be problematic is generalizing how the client knows how big to make the ByteBuffer - until/unless issue #53 has a solution that makes the cost of smallByteSize() equally "cheap" for all concrete impls, expecting clients to call that on an arbitrary TDigest isn't really a good idea.
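One possible workaround, sketched under assumptions: have the serialization helper own the buffer and grow it on overflow, so the client never has to ask for a size up front. This is not something the thread proposes. `GrowingBuffer` and `writeWithRetry` are hypothetical names, and the approach only works if the writer can safely be re-run from scratch and signals a too-small buffer with `BufferOverflowException`, as relative `ByteBuffer` puts do.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;
import java.util.function.Consumer;

final class GrowingBuffer {
    private GrowingBuffer() {}

    /**
     * Runs the writer against a freshly allocated buffer, doubling the capacity
     * and starting over whenever the writer overflows it. Returns the buffer
     * flipped and ready for reading.
     */
    static ByteBuffer writeWithRetry(Consumer<ByteBuffer> writer, int initialCapacity) {
        int capacity = Math.max(16, initialCapacity);
        while (true) {
            ByteBuffer buf = ByteBuffer.allocate(capacity);
            try {
                writer.accept(buf);
                buf.flip();
                return buf;
            } catch (BufferOverflowException e) {
                if (capacity > Integer.MAX_VALUE / 2) {
                    throw e; // cannot grow any further
                }
                capacity *= 2;
            }
        }
    }
}
```

With a real digest the call might look like `GrowingBuffer.writeWithRetry(digest::asSmallBytes, 1024)`. The trade-off is repeated serialization work on each retry, which may or may not be cheaper than an expensive size calculation.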
from t-digest.
I am addressing this in the current release. My strategy is:
- implement Serializable for all current digests (AVLTreeDigest and MergingDigest)
- you will be able to serialize any type of digest and then deserialize it into any other.
- there will be an easy and default way to deserialize to the "current best" concrete type.
from t-digest.
Well, 2 / 3 ain't bad.
Both AVLTreeDigest and MergingDigest are serializable, but there is currently no cross serialization. This meets goal 1. Goals 2 and 3 aren't there yet and I am going to punt on them for now.
from t-digest.
See #87 for future work.
from t-digest.
Closing this in favor of the work on #87
from t-digest.