Comments (24)
You can overload Java serialization process to save what you want, for example in my example, methods getBytes and fromBytes already implements in your Digest classes.
In this example, fields of object are ignored and only declared "digestEncodeWithBytes" field is declared :
public abstract class TDigest implements Serializable {
// For serialisation, overload ObjectInput/OutputStream reflection
private static final ObjectStreamField[] serialPersistentFields = {
new ObjectStreamField("digestEncodeWithBytes", byte[].class)
};
// Or use getBytes(ByteBuffer) instead
public abstract byte[] getBytes();
// Serialization create the object which implements this abstract class, so call a method to init it
protected abstract void initWithBytes(byte[] b);
/**
* Overload {@link ObjectOutputStream#defaultWriteObject()}
*
* @param oos object output stream use for write
* @throws IOException error during write process
*/
private void writeObject(ObjectOutputStream oos) throws IOException {
oos.putFields().put("digestEncodeWithBytes", getBytes());
oos.writeFields();
}
/**
* Overload {@link ObjectInputStream#defaultReadObject()}
*
* @param ois object input stream use for read
* @throws IOException error during read process
* @throws ClassNotFoundException class use in serialised stream unknown
*/
private void readObject(ObjectInputStream ois) throws IOException, ClassNotFoundException {
Object o = ois.readFields().get("digestEncodeWithBytes", null);
if (o != null) {
byte[] b = (byte[]) o;
initWithBytes(b);
}
}
}
You can also overwrite deserialisation with : link
class HackedObjectInputStream extends ObjectInputStream {
public HackedObjectInputStream(InputStream in) throws IOException {
super(in);
}
@Override
protected ObjectStreamClass readClassDescriptor() throws IOException, ClassNotFoundException {
ObjectStreamClass resultClassDescriptor = super.readClassDescriptor();
if (resultClassDescriptor.getName().equals("oldpackage.Clazz"))
resultClassDescriptor = ObjectStreamClass.lookup(newpackage.Clazz.class);
return resultClassDescriptor;
}
}
from t-digest.
Isn't it possible to add a Transcoder interface which has byte[] asBytes(SerializationModel)
and SerializationModel fromBytes(byte[])
methods? SerializationModel here being a java object that is good enough to represent any digest. TDigest interface can now have a create method that takes SerializationModel as input which all digests implement. Basically taking out byte-level concerns out of the digest impls.
You can provide a default transcoder which does serialization similar to how it's done today (with bugfixes), while keeping the flexibility of accepting PRs from others that implement transcoders for other formats. If this line of thought makes sense to you and more importantly if we can come up with a common model to represent all digests, I will be happy to contribute a PR!
A little background: we have a project that does jvm profiling for clusters and all our components talk Protobuf. We deal with latencies, scheduler delays and other metrics and streaming quantiles make a lot of sense. Hence the interest in adopting tdigest but it does not have protobuf support today nor the design is friendly enough to submit a PR for the same
from t-digest.
from t-digest.
No. Basically, I wanted to enable bring your own serialization logic with some sane ones provided by the lib. By SerializationModel, I am implying one java class something on the lines of:
class SerializationModel {
int encodingType; double min; double max; float compression; // ... and other members
}
class MergingDigest {
// ...
static MergingDigest fromSerializationModel(SerializationModel);
}
interface Transcoder {
byte[] asBytes(SerializationModel);
SerializationModel fromBytes(byte[]);
}
class ProtobufTranscoder implements Transcoder {
// ...
}
This keeps the byte magic out of digest implementations and they are responsible for generating a uniform on-heap representation (which is SerializationModel, I know this name sucks)
Users can instantiate any out of the box transcoder or write their own. A way to maintain backward compatibility would be for MergingDigest, etc to optionally take a transcoder as constructor param and use that (or default one if not provided) to keep existing asBytes, fromBytes method working.
If you have any other ideas on how tdigest can support multiple serialization formats (and hopefully you consider that as a feature to be added in the first place), would love to hear about them.
from t-digest.
from t-digest.
Or DigestModel? I see this as a dto representing digests.
Sure, I have some free time next week so expect a PR then.
from t-digest.
You mention interchangeability of digest implementations. Presently, MergingDigest and AvlTreeDigest serialize different values so their serialized representation is not equivalent. Will need your inputs on structure of DigestModel and what fields of MergingDigest/AvlTreeDigest map to DigestModel fields.
The straightforward ones are min, max, compression, mean[]. But rest differ across impls.
from t-digest.
from t-digest.
@tdunning Need your inputs on what DigestModel looks like. As mentioned in previous comment, presently different fields are serialized across digest implementations
from t-digest.
from t-digest.
Apart from bufferSize, MergingDigest serializes lastUsedCell
. For verbose encoding, this is deserialized and set as a property in digest. For small encoding, lastUsedCell additionally represents the values to read and assign in mean/weight array. Rest of the array seems to be zero-filled since n
(array size) is part of serde as well.
Also, oddly MergingDigest#asBytes serializes tempMean.length but MergingDigest#fromBytes (verbose) does not expect it to be present in buffer. Is it a known issue or am I missing something there?
from t-digest.
from t-digest.
@anvinjain Do you have a PR coming?
We need to cut the new major release. If you can't get this in soon, it will have to wait.
from t-digest.
from t-digest.
from t-digest.
This is being pushed beyond the 3.2 release.
from t-digest.
@tdunning I have raised a PR around this.
from t-digest.
from t-digest.
@tdunning Did you get a chance to look at the PR?
from t-digest.
from t-digest.
@tdunning Ping. Is there a timeline for next release? If you don't see this PR #99 making the cut without major changes, let me know.
from t-digest.
@tdunning looks like the PR was never merged. This is of interest to us too, so if there are any review comments we can take a look at addressing them too. Thanks in advance.
from t-digest.
from t-digest.
Here's my proposal: #162
- Serializations can be configured via
serialization.properties
file. - The serialization gets instatniated with a
java.util.Properties
object containing all properties form theserialization.properties
file - It can load custom implementations via classForName as long as they have been added to the classpath and configured in the
serialization.properties
file. It's up to the implementation what to do with those properties.
Example for serialization.properties
file
tdigest.serializerClass=org.example.CustomSerializer
someProperty=someValue
someOtherProperty=someOtherValue
This is the current implementation of TDigestDefaultSerializer
public class TDigestDefaultSerializer extends AbstractTDigestSerializer<TDigest, byte[]> {
public TDigestDefaultSerializer(Properties properties) {
super(properties);
}
@Override
public byte[] serialize(TDigest tDigest) throws TDigestSerializerException {
ByteArrayOutputStream baos = new ByteArrayOutputStream(5120);
try (ObjectOutputStream out = new ObjectOutputStream(baos)) {
out.writeObject(tDigest);
return baos.toByteArray();
} catch (IOException e) {
throw new TDigestSerializerException(e);
}
}
@Override
public TDigest deserialize(byte[] object) throws TDigestSerializerException {
try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(object))) {
return (TDigest) in.readObject();
} catch (ClassCastException | ClassNotFoundException | IOException e) {
throw new TDigestSerializerException(e);
}
}
}
This is the code for AbstractTDigestSerializer
public abstract class AbstractTDigestSerializer<T extends TDigest, O extends Object> implements TDigestSerializer<T, O> {
private Properties properties;
public AbstractTDigestSerializer(Properties properties) {
this.properties = properties;
}
public Properties getProperties() {
return properties;
}
}
Hope this is being seen as future proof.
There's still some work to do:
- code formatting (I use 🌑Eclipse) and review (especially how I do generics 😅)
- proper JavaDoc documentation in source code
- additional tests for custom serialization implementations and property file handling
from t-digest.
Related Issues (20)
- Mergeability of t-digest HOT 3
- Allow AVLTreeDigest's to be identical to another given the same set of inputs HOT 1
- Release notes for 3.3? HOT 1
- Will merging multiple t-digest preserve the exact value of min/max? HOT 3
- Behavior when compression ratio is 1 HOT 1
- TDigest objet serializable HOT 1
- tag missing problem HOT 2
- Decay TDigest HOT 4
- T-Digest (Re)Construction
- Merge implementation of MergingDigest HOT 2
- Question on quantile calculation logic HOT 3
- -deleted- HOT 1
- Add support for double weights HOT 2
- how to implement sliding windows quantile? HOT 1
- Determining quality HOT 3
- OpenTelemetry, Summaries and TDigests HOT 5
- Have `TDigest` implement `Consumer` HOT 1
- New release? HOT 3
- AssertionError if weight > 1 HOT 3
- Modifying T-digest that handle deletion HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from t-digest.