gorilla-tsc's People

Contributors

burmanm

gorilla-tsc's Issues

Reconsider ZigZag for storing longs

When storing long values, ZigZag encoding should be considered (as is done with timestamps). While this can be done manually outside the library, it could just as well be supported in the library itself.

This improves the compression ratio for series that fluctuate between negative and positive values (such as -1, 1, -1, 1).
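As a rough illustration of the idea, here is a minimal ZigZag sketch for 64-bit values. The class name `ZigZag` and its methods are hypothetical helpers for this example and are not part of the gorilla-tsc API.

```java
// Hypothetical helper for illustration; not part of the gorilla-tsc API.
final class ZigZag {
    private ZigZag() {}

    // Interleaves negative and positive values: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    static long encode(long n) {
        // The arithmetic shift (n >> 63) yields all ones for negative n, all zeros otherwise.
        return (n << 1) ^ (n >> 63);
    }

    // Inverse of encode().
    static long decode(long z) {
        return (z >>> 1) ^ -(z & 1);
    }
}
```

For the series -1, 1 the raw two's-complement values differ in 63 bits (`-1L ^ 1L` is `0xFFFFFFFFFFFFFFFE`), while the ZigZag-encoded values 1 and 2 differ in only 2 bits, which is why an XOR-based value compressor should benefit.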

Why must values be inserted in the increasing time order

In the data structure section of the README, it says "Values must be inserted in the increasing time order, out-of-order insertions are not supported". If I understand it correctly, this requires the library user to insert data points in strict time order (no out-of-order data points). Can you elaborate on the reasoning behind this? There seems to be no such limitation in the original paper, and I did not see such logic in your code either.

Thanks!

Move benchmarks to test part

The benchmark code still resides in the main/ section of the source tree, while it should be in the test section. As a result, the jmh libraries and all their subdependencies flow into programs that depend on this library.

Incorrect Maven dependency

The dependency group id in the README file is incorrect. Here is the correct one:

<dependency>
	<groupId>fi.iki.yak</groupId>
	<artifactId>compression-gorilla</artifactId>
	<version>1.1.0</version>
</dependency>

Enforce positive timestamps and maximal gaps

A timestamp of 0 is treated somewhat specially by the library: it is used as the initialiser for the field storedTimestamp in both compression and decompression.
This leads to a number of strange behaviours if some timestamp in the middle of a series is 0.

It might be a good idea to disallow all non-positive timestamps to protect users from those behaviours.

Also, the same could be applied for timestamps that are too far apart in time.
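A minimal sketch of such a guard, assuming it would run before each value is handed to the compressor. The class `TimestampGuard`, its `check` method and the `maxGap` parameter are hypothetical names for this example; the library does not provide them, and the sketch does not check insertion order.

```java
// Hypothetical validation helper; not part of the gorilla-tsc API.
final class TimestampGuard {
    private final long maxGap;     // largest allowed distance between consecutive timestamps
    private long previous;         // last accepted timestamp
    private boolean hasPrevious;   // false until the first timestamp is accepted

    TimestampGuard(long maxGap) {
        if (maxGap <= 0) {
            throw new IllegalArgumentException("maxGap must be positive");
        }
        this.maxGap = maxGap;
    }

    // Throws if the timestamp is non-positive or too far after the previous one.
    void check(long timestamp) {
        if (timestamp <= 0) {
            throw new IllegalArgumentException("timestamp must be positive: " + timestamp);
        }
        if (hasPrevious && timestamp - previous > maxGap) {
            throw new IllegalArgumentException("gap exceeds " + maxGap + ": " + (timestamp - previous));
        }
        previous = timestamp;
        hasPrevious = true;
    }
}
```

A rejected timestamp leaves the guard's state unchanged, so the caller can decide whether to drop the point or start a new block.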

Fix travis issues

Travis should not try to use all the maven build modules (such as gpg).

Create copy constructors for LongArrayOutput and ByteBufferBitOutput

Provide a mechanism to get an exact copy of each BitOutput implementation.

If a user keeps a reference to the BitOutput implementation passed to a Compressor/GorillaCompressor, this will allow an exact copy to be made, finalized (as if Compressor.close() was called), and passed to a Decompressor/GorillaDecompressor without affecting the original Compressor/GorillaCompressor.
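To illustrate what a copy constructor has to capture, here is a minimal sketch of a long-array-backed bit buffer. `SketchBitOutput` and all of its fields are assumptions made for this example; the real LongArrayOutput and ByteBufferBitOutput may hold different state.

```java
import java.util.Arrays;

// Hypothetical bit buffer for illustration; not the library's LongArrayOutput.
final class SketchBitOutput {
    private long[] words;   // completed 64-bit words
    private int position;   // index of the next word to store
    private long current;   // word currently being filled, from the top bit down
    private int bitsLeft;   // free bits remaining in current

    SketchBitOutput(int initialWords) {
        words = new long[Math.max(1, initialWords)];
        bitsLeft = 64;
    }

    // Copy constructor: deep-copies the array so the two instances diverge safely.
    SketchBitOutput(SketchBitOutput other) {
        words = Arrays.copyOf(other.words, other.words.length);
        position = other.position;
        current = other.current;
        bitsLeft = other.bitsLeft;
    }

    void writeBit(boolean bit) {
        if (bit) {
            current |= 1L << (bitsLeft - 1);
        }
        bitsLeft--;
        if (bitsLeft == 0) {
            flush();
        }
    }

    // Stores the filled word, growing the array first if it is full.
    private void flush() {
        if (position == words.length) {
            words = Arrays.copyOf(words, words.length * 2);
        }
        words[position++] = current;
        current = 0;
        bitsLeft = 64;
    }

    // Returns the completed words followed by the partially filled one.
    long[] snapshot() {
        long[] out = Arrays.copyOf(words, position + 1);
        out[position] = current;
        return out;
    }
}
```

With this shape, a caller could copy the output, write the end-of-stream bits to the copy only, and decompress from the copy while the original keeps accepting values.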

(pull request to follow)

Don't write end stream

There should be support for skipping the end-of-stream marker, as it consumes several bytes per chunk. For chunks with a small number of datapoints, this is a large and completely avoidable overhead. Perhaps make it configurable and, of course, backwards compatible with older chunks.

Ability to use DFCM predictor

While the last-value predictor used in the Gorilla library is often good for monitoring data, it is not a very good predictor for many patterns. The predictor should be configurable, with the DFCM predictor implemented as one option (and the current predictor turned into LastValuePredictor).

Also, sharing the predictor instance between blocks makes it possible to use larger prediction tables and to learn from previously stored values (instead of relearning from scratch whenever a new block is started). This of course requires rebuilding the predictor from multiple blocks (instead of keeping each block independent), but in theory it could allow a better compression ratio. This is up to the user to decide; the library should not make such decisions.
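A minimal sketch of a DFCM (differential finite context method) predictor over long values, to show the general shape such a component could take. The class name, the table size and the hash function are arbitrary illustrative choices, not something taken from the library or tuned for real data.

```java
// Hypothetical predictor for illustration; not part of the gorilla-tsc API.
final class DfcmPredictor {
    private final long[] table;  // last delta seen for each context hash
    private final int mask;      // table.length - 1, table size is a power of two
    private int hash;            // hash of the most recent deltas (the context)
    private long lastValue;      // previously observed value

    DfcmPredictor(int logSize) {
        table = new long[1 << logSize];
        mask = table.length - 1;
    }

    // Predicts the next value as the last value plus the delta stored for this context.
    long predict() {
        return table[hash] + lastValue;
    }

    // Records the actual value: stores its delta under the current context,
    // then folds the delta into the context hash.
    void update(long actual) {
        long delta = actual - lastValue;
        table[hash] = delta;
        hash = (int) (((hash << 5) ^ delta) & mask);
        lastValue = actual;
    }
}
```

A compressor would store `predict() ^ actual` instead of the XOR with the previous value and then call `update(actual)`; the decompressor must perform the same `update` calls in the same order. For a steadily increasing series such as 10, 20, 30 the sketch learns the stride and predicts 40, a case where a last-value predictor always leaves a non-zero residual.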

LongArrayOutput#flipByteWithoutExpandCheck runs into ArrayIndexOutOfBoundsException

private void checkAndFlipByte() {
    // Wish I could avoid this check in most cases...
    if(bitsLeft == 0) {
        flipByte();
    }
}

We only call flipByte() (and in turn expand the allocation) when bitsLeft is exactly 0, which might not always be the case. Also, when bits > bitsLeft, we call flipByteWithoutExpandCheck, and if position has already reached the long array size, we run into an ArrayIndexOutOfBoundsException.

Still being maintained? And a few PRs if so

I noticed this library is in use in a few OSS projects, but the issues are quite old.

I had a few PRs to make, but seeing as there hasn't been activity here in quite a while, I was wondering, @burmanm, if you were interested in passing the reins for maintenance elsewhere.

PRs I wanted to open:

  • Support for 5-bit leading zeros instead of 6 (consistent with the original paper and used in some other libraries)
  • Explicit Float32 support with 1 less bit

Thanks for the effort to create this library, it's been quite useful for us!

Support use of only timestamp or value compression

In some cases, the use of only timestamp or only the value compression is preferred. One of these cases could be the use of high precision in timestamps, where this compression method is not at its best.

Make nextTimestamp() and nextValue() public and allow the use of them directly.
