Comments (7)

hossman commented on May 25, 2024

some rough stream of consciousness thoughts on this...

the simplest way to go about this would be to just add something like...

public static TDigest createDigestFromBytes(ByteBuffer buf) {
  // TODO: update this class name whenever createDigest is modified
  return AVLTreeDigest.fromBytes(buf);
}

...but this would break in cases of "rolling updates" to distributed systems, where one node might still be using tdigest-3.1 where AVLTreeDigest is the default, but another node has just upgraded to tdigest-3.42 where MergingDigest might be the new default.

A better approach might be to include in the binary data a prefix indicating which implementation is used, so that any of them could be supported generically, even if they are not the current "default" impl. This could be done either by breaking backcompat in all of the current asBytes/fromBytes methods, or by introducing a new static factory method for the serialization, in addition to the deserialization, and making them responsible for writing/reading an impl prefix before delegating to the existing static methods. This wouldn't be very polymorphic, but would ensure that as long as client code used the API consistently, it would continue to work even as future versions of TDigest add new impls and/or change the default impl.

half brained, ugly, example...

// Class<?>[] because generic array creation (new Class<TDigest>[]) does not compile
private static final Class<?>[] SERIALIZATION_SUPPORTED_IMPLS = new Class<?>[] {
  // NEVER CHANGE THE ORDER OF THIS ARRAY, ONLY ADD TO THE END
  // IF IMPLS ARE DELETED FROM CODE BASE, REPLACE WITH null TO PRESERVE INDEX OFFSETS
  TreeDigest.class,
  ArrayDigest.class,
  AVLTreeDigest.class,
  MergingDigest.class
};

/** only usable in conjunction with deserialize(ByteBuffer) */
public static void serialize(TDigest d, ByteBuffer buf) {
  int prefix = 0;
  for (Class<?> c : SERIALIZATION_SUPPORTED_IMPLS) {
    if (null != c && c.isInstance(d)) {
      // array index i is written as -i
      buf.putInt(0 - prefix);
      d.asSmallBytes(buf);
      return;
    }
    prefix++;
  }
  // unchecked exception, so no throws clause is needed
  throw new IllegalArgumentException("TODO: unsupported impl error");
}

/** only usable in conjunction with serialize(TDigest, ByteBuffer) */
public static TDigest deserialize(ByteBuffer buf) {
  final int prefix = buf.getInt();
  assert prefix <= 0 : "TODO: better error handling";
  Class<?> c = SERIALIZATION_SUPPORTED_IMPLS[ 0 - prefix ];

  // TODO: this gets more hairy than serialize ...
  // either need to use reflection to call static fromBytes method in each class
  // or abandon the SERIALIZATION_SUPPORTED_IMPLS array and just go with a simple switch
  throw new UnsupportedOperationException("TODO: dispatch to fromBytes for " + c);
}
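
A minimal, self-contained sketch of the "simple switch" variant mentioned in the TODO above, showing a full round trip. Everything in it is hypothetical: Digest, ToyTreeDigest, ToyMergingDigest and the tag constants are stand-ins invented for illustration, not t-digest classes, and the tag is written as a plain non-negative int rather than negated as in the example above.

```java
import java.nio.ByteBuffer;

final class PrefixedSerializationSketch {
    /** Hypothetical stand-in for TDigest; only what the sketch needs. */
    interface Digest {
        void asSmallBytes(ByteBuffer buf);
    }

    /** Toy implementation A: its payload is a single double. */
    static final class ToyTreeDigest implements Digest {
        final double value;
        ToyTreeDigest(double value) { this.value = value; }
        @Override public void asSmallBytes(ByteBuffer buf) { buf.putDouble(value); }
        static ToyTreeDigest fromBytes(ByteBuffer buf) { return new ToyTreeDigest(buf.getDouble()); }
    }

    /** Toy implementation B: a different payload layout (a single float). */
    static final class ToyMergingDigest implements Digest {
        final float value;
        ToyMergingDigest(float value) { this.value = value; }
        @Override public void asSmallBytes(ByteBuffer buf) { buf.putFloat(value); }
        static ToyMergingDigest fromBytes(ByteBuffer buf) { return new ToyMergingDigest(buf.getFloat()); }
    }

    // Tags are part of the wire format: never renumber, only add new ones.
    static final int TAG_TREE = 0;
    static final int TAG_MERGING = 1;

    /** Writes the implementation tag, then delegates to the implementation's own encoding. */
    static void serialize(Digest d, ByteBuffer buf) {
        if (d instanceof ToyTreeDigest) {
            buf.putInt(TAG_TREE);
        } else if (d instanceof ToyMergingDigest) {
            buf.putInt(TAG_MERGING);
        } else {
            throw new IllegalArgumentException("unsupported implementation: " + d.getClass());
        }
        d.asSmallBytes(buf);
    }

    /** Reads the tag and dispatches to the matching static fromBytes method. */
    static Digest deserialize(ByteBuffer buf) {
        int tag = buf.getInt();
        switch (tag) {
            case TAG_TREE:
                return ToyTreeDigest.fromBytes(buf);
            case TAG_MERGING:
                return ToyMergingDigest.fromBytes(buf);
            default:
                throw new IllegalArgumentException("unknown implementation tag: " + tag);
        }
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(64);
        serialize(new ToyMergingDigest(2.5f), buf);
        buf.flip(); // switch the buffer from writing to reading
        Digest back = deserialize(buf);
        System.out.println(back.getClass().getSimpleName());
    }
}
```

The reader gets back the concrete type that was written, regardless of which implementation is the local default, which is the property the rolling-update scenario needs.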

from t-digest.

tdunning commented on May 25, 2024

Yeah.... good ideas.

I think also that we have enough ideas across the different kinds of
t-digest that we could just as easily define a relatively universal
serialization format that can deserialize into any t-digest. This allows
rolling updates, specific deserialization or non-specific,
give-me-the-best-kind deserialization.
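
One way such an implementation-independent format could look, as a rough sketch only: store the compression setting and the list of (mean, weight) centroids, and let the reader replay them into whichever implementation it prefers. The CentroidSink interface, Centroid, ListSink and the byte layout are assumptions made for illustration; none of them is the format t-digest actually uses.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.function.DoubleFunction;

final class UniversalFormatSketch {
    /** Hypothetical minimal contract: anything that can absorb a weighted centroid. */
    interface CentroidSink {
        void add(double mean, long weight);
    }

    /** Immutable (mean, weight) pair. */
    static final class Centroid {
        final double mean;
        final long weight;
        Centroid(double mean, long weight) { this.mean = mean; this.weight = weight; }
    }

    /** Toy target implementation that simply records what it is given. */
    static final class ListSink implements CentroidSink {
        final double compression;
        final List<Centroid> centroids = new ArrayList<>();
        ListSink(double compression) { this.compression = compression; }
        @Override public void add(double mean, long weight) { centroids.add(new Centroid(mean, weight)); }
        long totalWeight() {
            long sum = 0;
            for (Centroid c : centroids) sum += c.weight;
            return sum;
        }
    }

    /** Layout: compression (double), centroid count (int), then (double mean, long weight) pairs. */
    static void write(double compression, List<Centroid> centroids, ByteBuffer buf) {
        buf.putDouble(compression);
        buf.putInt(centroids.size());
        for (Centroid c : centroids) {
            buf.putDouble(c.mean);
            buf.putLong(c.weight);
        }
    }

    /** Builds a target from the stored compression, then replays every centroid into it. */
    static <T extends CentroidSink> T read(ByteBuffer buf, DoubleFunction<T> factory) {
        T target = factory.apply(buf.getDouble());
        int count = buf.getInt();
        for (int i = 0; i < count; i++) {
            double mean = buf.getDouble();
            long weight = buf.getLong();
            target.add(mean, weight);
        }
        return target;
    }

    public static void main(String[] args) {
        List<Centroid> centroids = new ArrayList<>();
        centroids.add(new Centroid(1.5, 2));
        centroids.add(new Centroid(9.0, 3));
        ByteBuffer buf = ByteBuffer.allocate(64);
        write(100.0, centroids, buf);
        buf.flip(); // switch the buffer from writing to reading
        ListSink restored = read(buf, ListSink::new);
        System.out.println(restored.centroids.size() + " centroids, total weight " + restored.totalWeight());
    }
}
```

Because the caller supplies the factory, the same bytes can feed a specific implementation or a "current best" default chosen by the library.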

(In reply to Hoss Man's comment of Fri, Apr 24, 2015 at 2:06 PM.)

hossman commented on May 25, 2024

I'll take your word for it that a universal serialization format is possible - i haven't looked at enough different impls to make sense of how they differ.

one other aspect of this that i just realized would be problematic is generalizing how the client knows how big to make the ByteBuffer - until/unless issue #53 has a solution that makes the cost of smallByteSize() equally "cheap" for all concrete impls, expecting clients to call that on an arbitrary TDigest isn't really a good idea.
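
One possible workaround that avoids asking the digest for its size at all, sketched under assumptions: start with a guess, and on BufferOverflowException retry with a buffer twice as large. The helper writeWithRetry is hypothetical, and it assumes the write can safely be repeated from scratch.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;
import java.util.function.Consumer;

final class GrowingBufferSketch {
    /**
     * Runs the writer against buffers of doubling capacity until it fits.
     * Returns the filled buffer, flipped and ready for reading.
     */
    static ByteBuffer writeWithRetry(Consumer<ByteBuffer> writer, int initialCapacity) {
        int capacity = Math.max(1, initialCapacity);
        while (true) {
            ByteBuffer buf = ByteBuffer.allocate(capacity);
            try {
                writer.accept(buf);
                buf.flip();
                return buf;
            } catch (BufferOverflowException e) {
                // Too small: discard the partial write and try again with twice the room.
                // multiplyExact throws ArithmeticException instead of silently overflowing.
                capacity = Math.multiplyExact(capacity, 2);
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for a digest whose serialized form needs 80 bytes.
        Consumer<ByteBuffer> writer = buf -> {
            for (int i = 0; i < 10; i++) {
                buf.putDouble(i);
            }
        };
        ByteBuffer result = writeWithRetry(writer, 16);
        System.out.println(result.remaining() + " bytes written, capacity " + result.capacity());
    }
}
```

With a real digest the writer would presumably be something like buf -> digest.asSmallBytes(buf). Each failed attempt is wasted work, so a reasonable initial guess still matters.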

tdunning commented on May 25, 2024

I am addressing this in the current release. My strategy is:

  1. implement Serializable for all current digests (AVLTreeDigest and MergingDigest)

  2. you will be able to serialize any type of digest and then deserialize it into any other.

  3. there will be an easy and default way to deserialize to the "current best" concrete type.

tdunning commented on May 25, 2024

Well, 2 / 3 ain't bad.

Both AVLTreeDigest and MergingDigest are serializable, but there is currently no cross serialization. This meets goal 1. Goals 2 and 3 aren't there yet and I am going to punt on them for now.

tdunning commented on May 25, 2024

See #87 for future work.

tdunning commented on May 25, 2024

Closing this in favor of the work on #87
