fclaude / libcds Goto Github PK
View Code? Open in Web Editor NEWCompact Data Structures Library
Home Page: http://libcds.recoded.cl
License: GNU Lesser General Public License v2.1
Compact Data Structures Library
Home Page: http://libcds.recoded.cl
License: GNU Lesser General Public License v2.1
I get a DNS error now when trying to hit http://libcds.recoded.cl. Has it moved?
I constructed a suffix tree for the string in the following file www.csse.unimelb.edu.au/~kuruppu/testdata/REF.raw.gz
Then I was searching for the pattern 'tctcactaagata', repeatedly using the Child() method, and when it reached the last symbol, the suffix array range was [9879882, 9879885]. The correct suffix array range for this pattern is [9879882, 9879892]. I looked at how the Child() method was implemented and narrowed it down to the call to FChild() as the problem. FChild() should've returned the range [9879882, 9879892] but it returns [9879882, 9879885]. I didn't understand how the npr->find_RMQ() method was implemented to figure out what went wrong.
I constructed the suffix tree as follows:
SuffixTreeY *st = new SuffixTreeY(sequence, seqlen, DAC, CN_NPR, 32);
And I searched for the pattern as follows:
char p[] = "tctcactaagata";
size_t pl, pr, cl, cr;
st->Root(&pl, &pr);
size_t i = 0;
while (p[i])
{
st->Child(pl, pr, p[i], &cl, &cr);
if (cl == (size_t)-1 || cr == (size_t)-1)
break;
pl = cl;
pr = cr;
i++;
}
I hope this gives you enough information.
this little patch fixes a memory leak that has caused me endless headaches recently:
--- aa/src/static/sequence/WaveletTree.cpp 2012-06-29 22:56:16.405930063 +0200
+++ bb/src/static/sequence/WaveletTree.cpp 2012-06-29 22:56:38.841930596 +0200
@@ -96,8 +96,9 @@
size_t WaveletTree::rank(uint symbol, size_t pos) const
{
uint * s = c->get_symbol(am->map(symbol));
- return root->rank(s, pos, 0);
- delete [] s;
+ size_t ret = root->rank(s, pos, 0);
+ delete [] s;
+ return ret;
}
size_t WaveletTree::count(uint s) const
(sorry, no time for forking the repo etc.)
Class WaveletTreeNoptrs allows me to specify a Mapper. I've implemented a custom mapper which maps most elements to 0 and a couple interesting ones to nonzero.
The constructor WaveletTreeNoptrs::WaveletTreeNoptrs(const Array & a, BitSequenceBuilder * bmb, Mapper * am)
computes max_v = am->map(a.getMax());
and then sets height=bits(max_v);
When the maximum of the original values gets mapped to 0, the code sets the height of the tree to 0.
Perhaps I am missing the point but I suspect this is unintended?
Running "./configure && make" gives the following error
Makefile:966 ***missing separator. Stop
When trying to compile on Mac OS X 10.9.2 with clang (Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn):
CXX static/coders/huff.lo
static/coders/huff.cpp:173:25: error: reference to 'bitset' is ambiguous
if ((code >> d) & 1) bitset(stream,ptr);
^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/c++/v1/bitset:639:29: note: candidate found by name lookup is 'std::__1::bitset'
class _LIBCPP_TYPE_VIS_ONLY bitset
^
../include/libcdsBasics.h:122:14: note: candidate found by name lookup is 'cds_utils::bitset'
inline void bitset(uint * e, size_t p) {
^
I was attempting to construct a compressed suffix tree for this DNA sequence.
The suffix array is constructed successfully (at least I think so) but as soon as you invoke any methods that requires traversing the tree, a segmentation fault occurs. For example, invoking the following method call
cst->Child(node_vl, node_vr, 'c', &next_vl, &next_vr);
where node_vl and node_vr are obtained from cst->Root(), causes a segmentation fault.
The following methods were invoked to construct the suffix tree.
SuffixTree *cst;
SuffixTreeY csty(text, length, DAC, CN_NPR, 32);
cst = &csty;
The stack trace from gdb is as follows:
#0 0x00000000004086fa in cds_utils::get_field_64 (A=0xfe2820, len=6, index=65836) at ../includes/libcdsTrees.h:26
#1 0x0000000000411354 in cds_static::NPR_CN::find_RMQ (this=0xfc77e0, x=1, y=11309, r=1, min_r=0x7fffa70200e0)
at static/suffixtree/NPR_CN.cpp:413
#2 0x00000000004112c9 in cds_static::NPR_CN::find_RMQ (this=0xfc77e0, x=1, y=361925, r=0, min_r=0x7fffa7020198)
at static/suffixtree/NPR_CN.cpp:408
#3 0x0000000000411793 in cds_static::NPR_CN::find_RMQ (this=0xfc77e0, x=1, y=11581659, csa=0xfe2a60, lcp=0xfc76b0)
at static/suffixtree/NPR_CN.cpp:348
#4 0x0000000000402c39 in cds_static::SuffixTreeY::SDepth (this=0xfc7680, vl=0, vr=11581659) at static/suffixtree/SuffixTreeY.cpp:91
#5 0x0000000000404eba in cds_static::SuffixTreeY::Child (this=0xfc7680, vl=0, vr=11581659, a=99 'c', child_l=0x7fffa7020308,
child_r=0x7fffa7020300) at static/suffixtree/SuffixTreeY.cpp:279
#6 0x00000000004022e9 in testSuffixTree (s1=0xfc7680) at testSuffixTree.cpp:52
#7 0x00000000004024fc in main (argc=2, argv=0x7fffa7020698) at testSuffixTree.cpp:92
Hope this is enough information to fix the problem.
Hey,
in my local fork of libcds I have some more features which I'm not sure are of interest as libcds2 is currently in development.
The features are:
should I push these features or should I wait for libcds2 to be in reasonable state?
-Matthias
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.