Giter Site home page Giter Site logo

vigna / sux4j Goto Github PK

View Code? Open in Web Editor NEW
149.0 11.0 23.0 22.01 MB

Sux4J is an effort to bring succinct data structures to Java.

License: GNU Lesser General Public License v2.1

Shell 0.37% Makefile 0.12% Java 96.17% HTML 0.04% C 3.30%
java ranking selection elias-fano minimal-perfect-hash succinct-data-structure

sux4j's Introduction

Welcome to Sux4J!

Sux4J is an effort to bring succinct data structures to Java. Presently it provides a number of related implementations covering ranking/selection over bit arrays, compressed lists and [[monotone] minimal perfect hash] functions.

Building

You need Ant and Ivy. Then, run ant ivy-setupjars jar.

seba (mailto:[email protected])

sux4j's People

Contributors

boldip avatar vigna avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

sux4j's Issues

Elias-Fano encoding fails to handle BitVectors with values greater than Integer.MAX_VALUE

The following test fails using sux4j 4.0.0 or 4.2.0:

    @Test
    public void testEliasFanoEncoding() {
        final long n = (long)Integer.MAX_VALUE + 1L;  // ok without + 1L

        // this works
        // final LongList longs = new LongArrayList(new long[]{n, n+1});                                                         

        // this fails
        final BitVector longs = LongArrayBitVector.ofLength(n + 1L);
        longs.set(n);
        assertTrue(longs.getBoolean(n));  // ok

        // final Select select = new HintedBsearchSelect(new Rank9(longs));  // this works
        final Select select = new SparseSelect(longs);
        assertEquals(n, select.select(0));
  }

The error is:

testEliasFanoEncoding(com.indeed.mph.TestEliasFanoEncoding)  Time elapsed: 0.125 sec  <<< ERROR!
java.lang.IndexOutOfBoundsException: Index (8589934591) is greater than or equal to length (3)
        at it.unimi.dsi.bits.AbstractBitVector.ensureRestrictedIndex(AbstractBitVector.java:58)
        at it.unimi.dsi.bits.LongArrayBitVector.set(LongArrayBitVector.java:377)
        at it.unimi.dsi.sux4j.util.EliasFanoMonotoneLongBigList.<init>(EliasFanoMonotoneLongBigList.java:254)
        at it.unimi.dsi.sux4j.util.EliasFanoMonotoneLongBigList.<init>(EliasFanoMonotoneLongBigList.java:224)
        at it.unimi.dsi.sux4j.bits.SparseSelect.<init>(SparseSelect.java:79)
        at it.unimi.dsi.sux4j.bits.SparseSelect.<init>(SparseSelect.java:65)
        at com.indeed.mph.TestEliasFanoEncoding.testEliasFanoEncoding(TestEliasFanoEncoding.java:31)

It works without error using the LongArrayList, and the HintedBsearchSelect works either way.

ASL 2.0

Is there a chance this can be licensed as ASL 2.0?

You provided no keys, but the bucketed hash store was not checked

I'm providing a bucketed hash store as shown below. The bucketed hash store size is 4480113. It works for a different set of keys.

Also, the program hangs after the exception, it's not being terminated correctly due to running threads?

 store = new BucketedHashStore<>(TransformationStrategies.rawByteArray(), outputPath.getParent().toFile());
 // adding 4480113 keys to store in a loop
 store.add(hash);


       
     mph = new GOVMinimalPerfectHashFunction.Builder<byte[]>()
                    .store(store)
                    .build();
Caused by: java.lang.IllegalStateException: You provided no keys, but the bucketed hash store was not checked
	at it.unimi.dsi.sux4j.mph.GOVMinimalPerfectHashFunction.<init>(GOVMinimalPerfectHashFunction.java:466)
	at it.unimi.dsi.sux4j.mph.GOVMinimalPerfectHashFunction$Builder.build(GOVMinimalPerfectHashFunction.java:271)

ArrayIndexOutOfBoundsException when using large set of keys with CHDMinimalPerfectHashFunction

Hi,

I'm using CHDMinimalPerfectHashFunction<byte[]> for about 1_00_000 keys. I see ArrayIndexOutOfBoundsException , while the same works if I use 5_00_00 keys.

version: sux4j:5.1.0
jdk 8

Stacktrace
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 0 out of bounds for length 0
at it.unimi.dsi.bits.LongArrayBitVector.getLong(LongArrayBitVector.java:488)
at it.unimi.dsi.sux4j.util.EliasFanoLongBigList.getLong(EliasFanoLongBigList.java:242)
at it.unimi.dsi.sux4j.mph.CHDMinimalPerfectHashFunction.getLong(CHDMinimalPerfectHashFunction.java:589)

I tried running GOVMinimalPerfectHashFunction, it takes several minutes to complete for the 5_00_00 keys, so I switched to CHDMinimalPerfectHashFunction which completed in under 5 seconds, but this fails for large keys.

ZFastTrie.pred seems broken

Here are a couple of examples using ZFastTrie.pred and ZFastTrie.succ on Strings and Long.
In both cases succ seems to have the expected behavior but pred results are unexpected.

Am I doing something wrong ?
I'm using version 5.2.3

Example is a Scala worksheet but I believe the code is easy enough to understand.
The comment below each line is the returned value.

val ints = 1.to(50, 5).map(_.toLong: java.lang.Long).toIterable
// ints: Iterable[java.lang.Long] = Vector(1L, 6L, 11L, 16L, 21L, 26L, 31L, 36L, 41L, 46L)

val predsInt = new ZFastTrie(
  ints.asJava,
  bits.TransformationStrategies.fixedLong()
)
// predsInt: ZFastTrie[java.lang.Long] = {1, 6, 11, 16, 21, 26, 31, 36, 41, 46}

predsInt.pred(21L)
// res7: java.lang.Long = 21L
predsInt.pred(22L)
// res8: java.lang.Long = 16L
predsInt.pred(23L)
// res9: java.lang.Long = 16L
predsInt.pred(24L)
// res10: java.lang.Long = 26L

predsInt.succ(0L)
// res11: java.lang.Long = 1L
predsInt.succ(3L)
// res12: java.lang.Long = 6L
predsInt.succ(22L)
// res13: java.lang.Long = 26L
predsInt.succ(23L)
// res14: java.lang.Long = 26L
predsInt.succ(24L)
// res15: java.lang.Long = 26L
predsInt.succ(99L)
// res16: java.lang.Long = null

val nums = List("a", "e", "i", "o", "u")
// nums: List[String] = List("a", "e", "i", "o", "u")
val predsNums = new ZFastTrie(nums.asJava, bits.TransformationStrategies.prefixFreeIso[String])
// predsNums: ZFastTrie[String] = {a, e, i, o, u}

predsNums.pred("b")
// res17: String = null
predsNums.pred("bb")
// res18: String = null
predsNums.pred("d")
// res19: String = "e"
predsNums.pred("e")
// res20: String = "e"
predsNums.pred("f")
// res21: String = "a"
predsNums.pred("j")
// res22: String = "e"
predsNums.pred("aa")
// res23: String = null

predsNums.succ("b")
// res24: String = "e"
predsNums.succ("bb")
// res25: String = "e"
predsNums.succ("d")
// res26: String = "e"
predsNums.succ("e")
// res27: String = "e"
predsNums.succ("f")
// res28: String = "i"
predsNums.succ("j")
// res29: String = "o"
predsNums.succ("aa")
// res30: String = "e"

List("bb", "b", "a", "aa", "d", "e").sorted
// res31: List[String] = List("a", "aa", "b", "bb", "d", "e")

GOVMinimalPerfectHashFunction, optimize the call to getLong

I use the hash function to create unique hashes for millions of unique keys. Each call to getLong() creates int[3] and long[3], which stresses the GC and could be avoided, imho.

So i can use getLongByTriple(long[]), then i have at least the long[3] gone. But it still requires me to subclass the function, so that i get access to the transforming strategy and globalSeed.

And it could be even nicer to have access to getLongByTripleNoCheck(long[], int[]), since i know the keys and do not need the signature check either. But it is a private method.

So what could be an alternative?

I understand, that i need to reset int[] and long[] to 0 with each call. Also you probably want to hide this kind of implementation detail.

Maybe one can provide another structure, like a buffer, that has int[] and long[], but is only visible to the function. And one only has to make sure, that this buffer is not shared within different threads.

GOVBuffer buffer = function.createBuffer();
for(String key : keys) {
   long hash = function.getLong(key, buffer);
}

public class GOVMinimalPerfectHashFunction  ... {
  ...
  public long getLong(T key, GOVBuffer buff) {
       ...
  }
  public static class GOVBuffer {
      private final int[] e = new int[3];
      private final long[] h = new int[3];
      private void reset() {
             int[0] = 0;
             ...
      }
  }
}

Cannot retrieve provided arbitrary values when querying GOVMinimalPerfectHashFunction

When creating a GOVMinimalPerfectHashFunction from a ChunkedHashStore, there is no difference in using add(<T>) versus add(<T>, long) when building the ChunkedHashStore. In other words, I cannot retrieve the values that I provided when querying the GOVMinimalPerfectHashFunction with getLong(<T>).

ChunkedHashStore<CharSequence> chunkedHashStore = new ChunkedHashStore<>(
                TransformationStrategies.utf16());
chunkedHashStore.add("one", 1);
chunkedHashStore.add("ten", 10);
chunkedHashStore.add("four", 4);
chunkedHashStore.check();
GOVMinimalPerfectHashFunction<CharSequence> mph = 
                new GOVMinimalPerfectHashFunction.Builder<CharSequence>()
                        .store(chunkedHashStore)
                        .build();
System.out.println("one: "+mph.getLong("one"));
System.out.println("ten: "+mph.getLong("ten"));
System.out.println("four: "+mph.getLong("four"));

The above code prints

one: 0
ten: 2
four: 1

instead of the expected

one: 1
ten: 10
four: 4

The javadoc for GOVMinimalPerfectHashFunction states that

at construction time you can pass a ChunkedHashStore containing the keys (associated with any value)

Did I miss something?
(I'm using sux4j-4.0.0 with Java 8.)

no way to disable debug messages

Running GOV4Function function is very verbose, I find it hard to to track progress. I just greb -v " DEBUG " to finish task.

maybe switching to log4j will make handling logs more easier))

...
00:21:30.155 [pool-1-thread-4] DEBUG it.unimi.dsi.sux4j.mph.solve.Modulo2System - Active variables: 206 (13%)
00:21:30.156 [pool-1-thread-4] DEBUG it.unimi.dsi.sux4j.mph.solve.Linear4SystemSolver - System is unsolvable
...

Can you show how to run an example?

Hi.
I have read your paper "Monotone Minimal Perfect Hashing: Searching a Sorted Table with O(1) Accesses". It is excellent work.

Can you show me some instruction of how to run an example for monotone minimal perfect hashing based on this code (e.g., compile which file)?
I have no idea you how to organize the project.

EliasFanoMonotoneBigLongBigList is accidentally limited by LongArrayBitVector function

Hi, love your libraries. I am attempting to process a large graph that uses EliasFanoMonotoneBigLongBigList (mmapped, if it matters). I've hit an issue where getLong is failing where lower bits = 9 and any long requested over 15.27 billion errors inside LongArrayBitVector.word().

It seems that the Big version of the EF list should be using the LongBigArrayBitVector version instead of the LongArrayBitVector, at least for the lowerbits. Because the upperbits is still a single array limited to 2^31, there should probably be two functions.

ZFastTrie - null pointer on add operation

Error message:

Exception in thread "main" java.lang.NullPointerException
at it.unimi.dsi.sux4j.util.ZFastTrie.addBefore(ZFastTrie.java:778)
at it.unimi.dsi.sux4j.util.ZFastTrie.add(ZFastTrie.java:1153)
at java.base/java.util.AbstractCollection.addAll(AbstractCollection.java:352)
at com.creanga.sux4jissues.ZFastTrieNullPointer.main(ZFastTrieNullPointer.java:17)

Sux4J version: 4.3.0

To reproduce the issue run the following class:

https://github.com/cornelcreanga/Sux4J-issues/blob/master/src/main/java/com/creanga/sux4jissues/ZFastTrieNullPointer.java

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.