Giter Site home page Giter Site logo

libcds's People

Contributors

fclaude avatar lyda avatar marioariasga avatar mpetri avatar rcanovas avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

libcds's Issues

Incorrect suffix array range returned by SuffixTreeY.Child() method

I constructed a suffix tree for the string in the following file www.csse.unimelb.edu.au/~kuruppu/testdata/REF.raw.gz

Then I was searching for the pattern 'tctcactaagata', repeatedly using the Child() method, and when it reached the last symbol, the suffix array range was [9879882, 9879885]. The correct suffix array range for this pattern is [9879882, 9879892]. I looked at how the Child() method was implemented and narrowed it down to the call to FChild() as the problem. FChild() should've returned the range [9879882, 9879892] but it returns [9879882, 9879885]. I didn't understand how the npr->find_RMQ() method was implemented to figure out what went wrong.

I constructed the suffix tree as follows:

SuffixTreeY *st = new SuffixTreeY(sequence, seqlen, DAC, CN_NPR, 32);

And I searched for the pattern as follows:

char p[] = "tctcactaagata";
size_t pl, pr, cl, cr;

st->Root(&pl, &pr);
size_t i = 0;
while (p[i])
{
    st->Child(pl, pr, p[i], &cl, &cr);
    if (cl == (size_t)-1 || cr == (size_t)-1)
        break;
    pl = cl;
    pr = cr;
    i++;
}

I hope this gives you enough information.

memory leak in WaveletTree

this little patch fixes a memory leak that has caused me endless headaches recently:

--- aa/src/static/sequence/WaveletTree.cpp  2012-06-29 22:56:16.405930063 +0200
+++ bb/src/static/sequence/WaveletTree.cpp  2012-06-29 22:56:38.841930596 +0200
@@ -96,8 +96,9 @@
     size_t WaveletTree::rank(uint symbol, size_t pos) const
     {
       uint * s = c->get_symbol(am->map(symbol));
-        return root->rank(s, pos, 0);
-   delete [] s;
+      size_t ret =  root->rank(s, pos, 0);
+      delete [] s;
+      return ret;
     }

     size_t WaveletTree::count(uint s) const

(sorry, no time for forking the repo etc.)

WaveletTreeNoptrs incorrectly assumes max element gets mapped to max element

Class WaveletTreeNoptrs allows me to specify a Mapper. I've implemented a custom mapper which maps most elements to 0 and a couple interesting ones to nonzero.

The constructor WaveletTreeNoptrs::WaveletTreeNoptrs(const Array & a, BitSequenceBuilder * bmb, Mapper * am) computes max_v = am->map(a.getMax()); and then sets height=bits(max_v);

When the maximum of the original values gets mapped to 0, the code sets the height of the tree to 0.

Perhaps I am missing the point but I suspect this is unintended?

Autotools problem

Running "./configure && make" gives the following error

Makefile:966 ***missing separator. Stop

Build error

When trying to compile on Mac OS X 10.9.2 with clang (Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn):

  CXX    static/coders/huff.lo
static/coders/huff.cpp:173:25: error: reference to 'bitset' is ambiguous
                        if ((code >> d) & 1) bitset(stream,ptr);
                                             ^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/c++/v1/bitset:639:29: note: candidate found by name lookup is 'std::__1::bitset'
class _LIBCPP_TYPE_VIS_ONLY bitset
                            ^
../include/libcdsBasics.h:122:14: note: candidate found by name lookup is 'cds_utils::bitset'
        inline void bitset(uint * e, size_t p) {
                    ^

Segmentation fault when traversing object of type SuffixTreeY

I was attempting to construct a compressed suffix tree for this DNA sequence.

The suffix array is constructed successfully (at least I think so) but as soon as you invoke any methods that requires traversing the tree, a segmentation fault occurs. For example, invoking the following method call

cst->Child(node_vl, node_vr, 'c', &next_vl, &next_vr);

where node_vl and node_vr are obtained from cst->Root(), causes a segmentation fault.

The following methods were invoked to construct the suffix tree.

SuffixTree *cst;
SuffixTreeY csty(text, length, DAC, CN_NPR, 32);
cst = &csty;

The stack trace from gdb is as follows:
#0 0x00000000004086fa in cds_utils::get_field_64 (A=0xfe2820, len=6, index=65836) at ../includes/libcdsTrees.h:26
#1 0x0000000000411354 in cds_static::NPR_CN::find_RMQ (this=0xfc77e0, x=1, y=11309, r=1, min_r=0x7fffa70200e0)

at static/suffixtree/NPR_CN.cpp:413

#2 0x00000000004112c9 in cds_static::NPR_CN::find_RMQ (this=0xfc77e0, x=1, y=361925, r=0, min_r=0x7fffa7020198)

at static/suffixtree/NPR_CN.cpp:408

#3 0x0000000000411793 in cds_static::NPR_CN::find_RMQ (this=0xfc77e0, x=1, y=11581659, csa=0xfe2a60, lcp=0xfc76b0)

at static/suffixtree/NPR_CN.cpp:348

#4 0x0000000000402c39 in cds_static::SuffixTreeY::SDepth (this=0xfc7680, vl=0, vr=11581659) at static/suffixtree/SuffixTreeY.cpp:91
#5 0x0000000000404eba in cds_static::SuffixTreeY::Child (this=0xfc7680, vl=0, vr=11581659, a=99 'c', child_l=0x7fffa7020308,

child_r=0x7fffa7020300) at static/suffixtree/SuffixTreeY.cpp:279

#6 0x00000000004022e9 in testSuffixTree (s1=0xfc7680) at testSuffixTree.cpp:52
#7 0x00000000004024fc in main (argc=2, argv=0x7fffa7020698) at testSuffixTree.cpp:92

Hope this is enough information to fix the problem.

additional features

Hey,

in my local fork of libcds I have some more features which I'm not sure are of interest as libcds2 is currently in development.

The features are:

  • 64bit wavelet tree implementation (was able to create a wt over 15gb of text)
  • build the wavelet tree level wise to save lots of space
  • parallel wavelet tree construction (using pthreads)
  • return the k most frequent items within a range.
  • wt_coder_hutucker()
  • faster BitSequenceRG based on http://sux.dsi.unimi.it/ (and take advantage of popcnt() cpu instruction if available)
  • other minor fixes

should I push these features or should I wait for libcds2 to be in reasonable state?

-Matthias

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.