
squeakr's People

Contributors

gmarcais, prashantpandey, rob-p, rtjohnso, sjackman

squeakr's Issues

segfault *after* kmer counts complete?

So I'm under the impression that

  1. mantis needs squeakr-exact to create .ser files
  2. these .ser files can then be merged for querying and
  3. squeakr-count is now multithreaded.

But this doesn't seem to work out so well in practice:

[tim.triche@node069 single]$ THREADS=`cat /proc/cpuinfo | grep proc | wc -l` 
[tim.triche@node069 single]$ echo $THREADS 
80
[tim.triche@node069 single]$ free -h
              total        used        free      shared  buff/cache   available
Mem:           250G         10G        239G         46M        907M        238G
Swap:           11G          0B         11G

OK, looks good. Now let's take a therapy-related AML patient's ancient RNA-seq data and index it:

[tim.triche@node069 single]$ squeakr-count -g -k 31 -s 31 -t $THREADS SRR621698.fastq.gz 
Reading from the fastq file and inserting in the QF
Total Time Elapsed: 184.994003seconds
Calc freq distribution: 
Total Time Elapsed: 8.228049seconds
Maximum freq: 329368
Num distinct elem: 312966013
Total num elems: 2172228383
Segmentation fault

Whoops? Any ideas for debugging and unit testing would be appreciated, since I'd like to scale this up for various search types. Thanks for a great tool and for your support in getting it to run smoothly :-)

squeakr-query "Not Find: 0"

Whenever I query for a specific sequence with squeakr-query, it returns "Not find: 0". I have tried sequences taken from the fastq files as well as sequences that should not occur in the genome, and I get the same result each time.

In addition, when I run squeakr-query from the exact branch with a sequence, it gives me the following error:
terminate called after throwing an instance of 'std::invalid_argument'
what(): stoll
Aborted
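The "what(): stoll" message is what libstdc++ prints when std::stoll throws std::invalid_argument, i.e. when a string that does not start with a number is parsed as an integer. That suggests the exact branch parses one of its arguments as a number; passing a DNA sequence in that position would produce this abort, though which argument it is remains an assumption here. The hypothetical helper try_parse_ll shows how such a parse can fail cleanly instead of terminating the program:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Parses `text` as a signed 64-bit integer with std::stoll, but reports failure
// instead of letting std::invalid_argument escape and abort the program
// ("terminate called after throwing ... what(): stoll").
// Stores the value in `out` and returns true only if the whole string is numeric.
bool try_parse_ll(const std::string& text, long long& out) {
    try {
        std::size_t consumed = 0;
        long long value = std::stoll(text, &consumed);
        if (consumed != text.size()) {
            return false;  // trailing junk, e.g. "12ACGT"
        }
        out = value;
        return true;
    } catch (const std::invalid_argument&) {
        return false;  // no leading number at all, e.g. "ACGTACGT"
    } catch (const std::out_of_range&) {
        return false;  // number too large for long long
    }
}
```

A caller can then print a usage message when try_parse_ll returns false, rather than aborting.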

Segmentation fault

Hello,

We are trying to run Squeakr on environmental metagenomic datasets.
It runs on a node with 256 GB of RAM and 128 cores.

Squeakr was run on the samples whose number of estimated kmers is given here https://github.com/pierrepeterlongo/kmtricks_benchmarks/tree/master/tara-metag-bacterial/data/estimated_kmer_counts_metaG_bact

The command is

squeakr count -k 20 -c 1 -s ${log_slots} -o output_${sample}/res ${input_files} -t 128;

log_slots value is in [34,36], depending on the datasets.
output_${sample} directory is created before the run.

squeakr prints "Reading from the fastq file and inserting in the CQF." and then stops with a segmentation fault, after anywhere from 5 to 20 minutes.

Max memory usage ranges between 2GB and 86GB depending on the sets.

Could this be related to this issue #32 ?

Any other ideas?

Thanks!
Pierre

Update release?

I have been unable to use v0.5 (currently on bioconda and the most recent release), but the current master branch works nicely (which I think is about 120 commits ahead of this).

Are you planning to make an updated release tagging this newer version? This would be really helpful re: bioconda packaging too.

Max Kmer length

Hi
I am analyzing k-mer counting tools, and squeakr is one of them. I am trying to find out the maximum k-mer length squeakr supports.

What is the maximum k-mer length for squeakr? Could you please help me out with this?

Regards
Tarang

Error opening file for serializing.: Is a directory

Thank you for developing this highly useful tool. Unfortunately, I am having an issue processing my ~500 MB fastq file with both v0.6 and v0.7.

$ squeakr count -e -k 15 -t 32 -o ./ my.fastq
[2022-08-21 03:08:48.703] [squeakr_console] [info] Reading from the fastq file and inserting in the CQF.
[2022-08-21 03:08:53.871] [squeakr_console] [info] Trying to compress the final CQF.
[2022-08-21 03:08:54.365] [squeakr_console] [info] Estimated size of the final CQF: 29
[2022-08-21 03:08:54.365] [squeakr_console] [info] Calculating frequency distribution:
[2022-08-21 03:09:00.170] [squeakr_console] [info] Iteration: Total Time Elapsed: 5.804785 seconds
Error opening file for serializing.: Is a directory

This happens if I use -s 20 -t 1 too. Any insight on how to get around this issue?
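The message matches what fopen() reports (EISDIR) when asked to open a directory for writing, and the run passed `-o ./`. Other reports on this page use `-o` with a file path (e.g. `-o data/tmp.squeakr`), so a plausible reading, and it is only an assumption, is that these versions expect `-o` to name an output file rather than a directory. The hypothetical helper path_is_directory shows the POSIX check a front end could use to reject such a path up front:

```cpp
#include <cassert>
#include <string>
#include <sys/stat.h>

// Returns true if `path` exists and is a directory. Opening such a path with
// fopen(path, "wb") fails with EISDIR ("Is a directory").
// POSIX-only sketch.
bool path_is_directory(const std::string& path) {
    struct stat st;
    if (stat(path.c_str(), &st) != 0) {
        return false;  // missing or inaccessible: not a directory
    }
    return S_ISDIR(st.st_mode);
}
```

With this check, `-o ./` would be rejected before counting starts, while a path such as `./my.squeakr` would pass.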

Segmentation fault + question

Hi there :)

So, I just downloaded the code of Squeakr, compiled it and ran squeakr-count on the test file: everything went fine.

Then, I tried squeakr-count on a different file: https://www.ncbi.nlm.nih.gov/sra/?term=ERR430991.
The Squeakr paper says "Squeakr takes the approximation of number of distinct k-mers as the number of slots to create the CQF". The number of distinct 31-mers in my file is roughly 30 million. Hence, I configured Squeakr with a CQF size parameter of 25 (roughly the log2 of 30 million).

So, I used the following command:

./squeakr-count 0 25 1 ERR430991.fastq

And the result is:

Reading from the fastq file and inserting in the QF
Segmentation fault (core dumped)

I tried again with 30 instead of 25 as CQF size parameter and this time, I obtained:

Reading from the fastq file and inserting in the QF
Total Time Elapsed: 51.129627seconds
Calc freq distribution:
Total Time Elapsed: 0.751258seconds
Maximum freq: 897
Num distinct elem: 28471250
Total num elems: 343450408

So, my first question is: am I doing something wrong when I pick the CQF size parameter? Should I overestimate the approximation of the number of distinct 31-mers in my file?
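Working through the numbers in this report: log2 of roughly 30 million is about 24.9, so rounding up gives 25, i.e. 2^25 = 33,554,432 slots, barely more than the number of distinct 31-mers. Because the CQF also spends slots on the counts of repeated k-mers, a table sized at exactly the next power of two can run short of room. The sketch below makes the rounding and a safety margin explicit; ceil_log2, suggest_log_slots and the one-bit margin are assumptions for illustration, not squeakr's own sizing rule.

```cpp
#include <cassert>
#include <cstdint>

// Smallest s such that 2^s >= n, for 1 <= n <= 2^63.
unsigned ceil_log2(std::uint64_t n) {
    assert(n >= 1 && n <= (std::uint64_t{1} << 63));
    unsigned s = 0;
    while ((std::uint64_t{1} << s) < n) {
        ++s;
    }
    return s;
}

// Suggested log2(number of slots) for a CQF holding `distinct` k-mers.
// `headroom_bits` is a hypothetical safety margin: each extra bit doubles the
// slot count, leaving space for the slots used to store counts of repeated k-mers.
unsigned suggest_log_slots(std::uint64_t distinct, unsigned headroom_bits) {
    return ceil_log2(distinct) + headroom_bits;
}
```

For the KMC3 count of 30,311,678 distinct 31-mers, suggest_log_slots(30311678, 0) gives 25 (the value that crashed) and suggest_log_slots(30311678, 1) gives 26.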

Also, I used KMC3 to compute k-mer counts from the same file: KMC3 reports 30311678 distinct 31-mers, while Squeakr reports 28471250. Any ideas about the difference of roughly 1.8 million distinct 31-mers?

Thank you for your time and help!

Guillaume

lognumslots.sh sometimes underestimates the required number of slots

I've noticed that on a small number of read sets (e.g. SRR522088), lognumslots.sh underestimates the number of slots needed in the CQF for squeakr-exact.

Here's my current workflow for gzipped fastq files

ntcard -k 20 -c 2 -t 10 -p $OUTPREFIX $INPUT
NUMSLOTS=$(lognumslots.sh $OUTPREFIX\_k20.hist)
squeakr-count -g -k 20 -s $NUMSLOTS -t 10 -o $OUTDIR/ $INPUT

In the case of SRR522088, the script computed 26 as the required number of slots, resulting in a segfault. When I set it to 27, it runs smoothly.

Since this script is only in the master branch, I was wondering if there's perhaps a version tuned for the exact branch that I may not be finding in the repo.
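Since the script works from ntCard's histogram, one way to investigate is to read the totals directly and compare ceil(log2(F0)) with what lognumslots.sh prints. The sketch assumes the usual ntCard layout (an "F1" line with the total number of k-mers, an "F0" line with the number of distinct k-mers, then one "<frequency>\t<count>" row per frequency); KmerHistogram and parse_kmer_hist are hypothetical names. If the script sizes the table from something smaller than F0, such as distinct k-mers minus singletons, a read set whose estimate lands just below a power of two could come out one bit short, which would fit the 26 vs 27 behaviour here. That is a hypothesis, not a reading of the script.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Totals read from an ntCard-style histogram (layout assumed, see above).
struct KmerHistogram {
    std::uint64_t total = 0;       // "F1" line: total k-mers
    std::uint64_t distinct = 0;    // "F0" line: distinct k-mers
    std::uint64_t singletons = 0;  // row with frequency 1
};

KmerHistogram parse_kmer_hist(const std::string& text) {
    KmerHistogram h;
    std::istringstream lines(text);
    std::string key;
    std::uint64_t value = 0;
    // Each line is "<key><whitespace><value>"; operator>> skips tabs and newlines.
    while (lines >> key >> value) {
        if (key == "F1") {
            h.total = value;
        } else if (key == "F0") {
            h.distinct = value;
        } else if (key == "1") {
            h.singletons = value;
        }
    }
    return h;
}
```

Comparing ceil(log2(h.distinct)) and ceil(log2(h.distinct - h.singletons)) for SRR522088 against the script's output of 26 would show which quantity it is effectively using.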

Assertion `new_value < current_remainder' failed.

Hi,

I run into the following error when running squeakr:
squeakr: src/gqf/gqf.c:1359: int insert1(QF*, __int128 unsigned, uint8_t): Assertion `new_value < current_remainder' failed. Aborted (core dumped)

My command is:
./squeakr count -k 20 -e -n -t 1 -o data/tmp.squeakr SRR1292579.fastq.gz

I also tried to not use the flags n and e and tried to use -c 50, but the error remained the same.

(SRR1292579.fastq.gz is downloaded from the sra.)

I am using the master branch, and the test case runs without an error. With k >= 22 counting also seems to work fine, but for k < 22 I get the error message above.

Do you have any idea what is causing this?

Command line parsing seems to be broken in exact branch

The exact branch has a different version of clipp.h than master and development do. It seems that the test for whether the command line parse was successful always fails.

I'm able to work around this by using clipp.h from the master branch.

Doc command lines don't help much

The command lines the docs refer to don't seem to exist in your current release?

Can you make a new release if things have changed. Needed for packaging in brew/conda.

%  squeakr-count  -h
./squeakr-count [OPTIONS]
file format   : 0 - plain fastq, 1 - gzip compressed fastq, 2 - bzip2 compressed fastq
CQF size      : the log of the number of slots in the CQF
num of threads: number of threads to count
file(s)       : "filename" or "dirname/*" for all the files in a directory

Missing libz/libbz2 static libraries?

I cloned the repo today and when I ran make it complained about not finding the libz/libbz2 static libs in the libs/ subdir. I didn't see instructions on how to make those.

g++ -std=c++11 -Wall -Ofast -m64 -I. -Wno-unused-result -Wno-strict-aliasing main.cc hashutil.cc threadsafe-gqf/gqf.c  -lpthread -lssl -lcrypto -lboost_system -lboost_thread libs/libbz2.a libs/libz.a -o main
g++: error: libs/libbz2.a: No such file or directory
g++: error: libs/libz.a: No such file or directory
make: *** [main] Error 1

What am I missing?

Thanks,
Chris

Can't install on OSX, expected identifier FREAD

I get the following error message when I try to install squeakr on my laptop.

I'm using macOS Sierra 10.12.6

vpn5-210:squeakr-master yeredh$ make squeakr
g++ -std=c++11 -Wall   -Ofast -msse4.2 -D__SSE4_2_ -m64 -I. -Iinclude  -c -o obj/count.o src/count.cc
In file included from src/count.cc:39:
include/gqf_cpp.h:34:2: error: expected identifier
        FREAD
        ^
/usr/include/sys/fcntl.h:110:16: note: expanded from macro 'FREAD'
#define FREAD           0x0001
                        ^
1 error generated.
make: *** [obj/count.o] Error 1
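The compiler note shows what happens: macOS's <sys/fcntl.h> defines FREAD as a macro, so the identifier FREAD in include/gqf_cpp.h is replaced with 0x0001 before the compiler sees it. Two common workarounds are renaming the enumerator or removing the macro before the declaration; the sketch shows the second. The enum is a hypothetical stand-in, since the actual enumerators in gqf_cpp.h are not shown here.

```cpp
#include <fcntl.h>  // on macOS this pulls in sys/fcntl.h, which may #define FREAD 0x0001

// With the macro in place, "FREAD" below would expand to "0x0001" and the
// enum declaration would fail with "expected identifier".
// Removing the macro is a local workaround; renaming the enumerator
// everywhere it is used (e.g. to QF_FREAD) avoids the clash entirely.
#ifdef FREAD
#undef FREAD
#endif

// Hypothetical stand-in for the enum in include/gqf_cpp.h.
enum readmode { MMAP, FREAD };
```

Note that after the #undef, any code in the same translation unit that relied on the system's FREAD flag would no longer see it, which is one reason renaming is the safer fix.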

segfault with fastq

Run as ./squeakr-count 1 20 4 u.fq.gz, with u.fq.gz attached here.
u.fq.gz

I get the backtrace:

(gdb) bt
#0  0x0000000000412d47 in shift_remainders (qf=0x7ffcfc33a1d0, start_index=65127, empty_index=294479)
    at threadsafe-gqf/gqf.c:720
#1  0x00000000004147b2 in insert1(QF *, __int128 unsigned, bool, bool) (qf=0x7ffcfc33a1d0, hash=14624935, lock=true, 
    spin=true) at threadsafe-gqf/gqf.c:1368
#2  0x0000000000415d89 in qf_insert (qf=0x7ffcfc33a1d0, key=14624935, value=0, count=1, lock=true, spin=true)
    at threadsafe-gqf/gqf.c:1822
#3  0x0000000000407bd4 in dump_local_qf_to_main (obj=0x2034c30) at main.cc:231
#4  0x0000000000408305 in reads_to_kmers (c=..., obj=0x2034c30) at main.cc:336
#5  0x00000000004085b2 in fastq_to_uint64kmers_prod (obj=0x2034c30) at main.cc:371
#6  0x0000000000411941 in boost::_bi::list1<boost::_bi::value<flush_object*> >::operator()<bool (*)(flush_object*), boost::_bi::list0> (this=0x2034e30, f=@0x2034e28: 0x408485 <fastq_to_uint64kmers_prod(flush_object*)>, a=...)
    at /usr/include/boost/bind/bind.hpp:253
#7  0x0000000000411594 in boost::_bi::bind_t<void, bool (*)(flush_object*), boost::_bi::list1<boost::_bi::value<flush_object*> > >::operator() (this=0x2034e28) at /usr/include/boost/bind/bind.hpp:893
#8  0x00000000004110be in boost::detail::thread_data<boost::_bi::bind_t<void, bool (*)(flush_object*), boost::_bi::list1<boost::_bi::value<flush_object*> > > >::run (this=0x2034c70) at /usr/include/boost/thread/detail/thread.hpp:116
#9  0x00007fc449e3d5d5 in ?? ()
#10 0x0000000000000000 in ?? ()

Segfault in shift_remainders

Hi,

I am getting a segfault for the SRA experiment SRR1660308 (with squeakr-exact). Unfortunately I could not reproduce with just the offending read.

GDB output:

gdb --args squeakr-count -f -k 20 -s 20 -t 1 -o cqfs/ raw/SRR1660308.fastq
...
(Debug output that I added to main.cc, printing kmers before inserting into the CQF)
...
kmer: TTCCGCTCCGCTACTGACGG int: 690553506639 hash: 591485291406
kmer: GTTCCGCTCCGCTACTGACG int: 997272097491 hash: 13331034665
kmer: GTCAGTAGCGGAGCGGAACA int: 970247229265 hash: 504142783357

Thread 2 "squeakr-count" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffae355700 (LWP 23423)]
0x000000000043f60a in shift_remainders (qf=0x7fffffffbd00, start_index=486398,
    empty_index=1058934) at threadsafe-gqf/gqf.c:716
716			0, bend, qf->metadata->bits_per_slot);
(gdb) bt
#0  0x000000000043f60a in shift_remainders (qf=0x7fffffffbd00, start_index=486398,
    empty_index=1058934) at threadsafe-gqf/gqf.c:716
#1  0x0000000000440f26 in insert1(QF *, __int128 unsigned, bool, bool) (
    qf=0x7fffffffbd00, hash=504142783357, lock=true, spin=false)
    at threadsafe-gqf/gqf.c:1362
#2  0x0000000000442378 in qf_insert (qf=0x7fffffffbd00, key=504142783357, value=0,
    count=1, lock=true, spin=false) at threadsafe-gqf/gqf.c:1816
#3  0x0000000000409821 in reads_to_kmers (c=..., obj=0x66e270) at main.cc:338
#4  0x0000000000409b1d in fastq_to_uint64kmers_prod (obj=0x66e270) at main.cc:379
#5  0x000000000043de3f in boost::_bi::list1<boost::_bi::value<flush_object*> >::operator()<bool (*)(flush_object*), boost::_bi::list0> (this=0x66f3d0,
    f=@0x66f3c8: 0x409a9f <fastq_to_uint64kmers_prod(flush_object*)>, a=...)
    at .../boost/bind/bind.hpp:259
#6  0x000000000043daad in boost::_bi::bind_t<void, bool (*)(flush_object*), boost::_bi::list1<boost::_bi::value<flush_object*> > >::operator() (this=0x66f3c8)
    at .../boost/bind/bind.hpp:1294
#7  0x000000000043d6ca in boost::detail::thread_data<boost::_bi::bind_t<void, bool (*)(flush_object*), boost::_bi::list1<boost::_bi::value<flush_object*> > > >::run (
    this=0x66f210)
    at .../boost/thread/detail/thread.hpp:116
#8  0x00007fffaf7abb99 in thread_proxy ()
   from .../lib/libboost_thread.so.1.66.0
#9  0x00007fffafbc6aa1 in start_thread () from /lib64/libpthread.so.0
#10 0x00007fffae911bcd in clone () from /lib64/libc.so.6

Let me know how I could help debugging this issue.
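The `int` column in the debug output can be reproduced with a 2-bit packing that maps C=0, A=1, T=2, G=3 and puts the first base in the most significant bits. The mapping was inferred by matching the printed values, not read from squeakr's source, and the `hash` column is not reproduced. With this mapping the complement of a base is 3 minus its code, and a 64-bit word holds at most 32 bases. base_code and encode_kmer are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// 2-bit code for one base (C=0, A=1, T=2, G=3); -1 for anything else, e.g. 'N'.
int base_code(char b) {
    switch (b) {
        case 'C': return 0;
        case 'A': return 1;
        case 'T': return 2;
        case 'G': return 3;
        default:  return -1;
    }
}

// Packs a k-mer (k <= 32) into a uint64_t, first base most significant.
// The validity checks are debug-only asserts in this sketch.
std::uint64_t encode_kmer(const std::string& kmer) {
    assert(kmer.size() <= 32);
    std::uint64_t value = 0;
    for (char b : kmer) {
        int code = base_code(b);
        assert(code >= 0);
        value = (value << 2) | static_cast<std::uint64_t>(code);
    }
    return value;
}
```

encode_kmer("TTCCGCTCCGCTACTGACGG") yields 690553506639, the value printed in the debug output above.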

Cheers!

Segmentation fault while counting human genome

I am trying to use squeakr to count kmers from the human genome assembly GRCh38. As suggested in #31 I have taken each chromosome and added dummy quality values to convert it into FASTQ.

I am currently on master 346f581 and I run the following command

./squeaker count -k 20 -t 1 -o ./output human.fq

where human.fq is the FASTQ converted file of the human genome assembly. The output that I get is:

[2019-01-07 11:22:17.136] [squeakr_console] [info] Reading from the fastq file and inserting in the CQF.
Segmentation fault (core dumped)

Any ideas for what I can do to fix this? Thanks!

How to set the number of slots?

Hello again,

If I understand right, the log number of slots given to squeakr-count should be log_2(approximate number of distinct k-mers).

So for the example data you estimated there would be 2^20 distinct kmers.

How robust is this parameter? Would you recommend adjusting it for every sequencing run with Mohamadi et al.'s ntCard? Should I just leave it at 20? Will Mantis even work with different slot values?

From your paper for reference:

Squeakr needs the number of distinct kmers (approximated to the next closest power of 2) as an input. Squeakr takes the approximation of number of distinct k-mers as the number of slots to create the CQF. We used Mohamadi et al.(2017) to estimate the number of distinct k-mers in datasets.

Cheers!

Plans to support FASTA?

Are there any plans to support FASTA files any time in the future?

Given a FASTA file as input, would it be sufficient to reformat it as a FASTQ file by flattening multi-line sequences to a single line and adding dummy quality scores?
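The reformatting described in the question can be sketched as follows. fasta_to_fastq is a hypothetical helper, and the dummy quality character 'I' (Phred 40 in Phred+33 encoding) is an arbitrary choice; whether squeakr accepts the result without further changes is an assumption.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Converts FASTA text to FASTQ text: each record's sequence lines are joined
// into one line and paired with a dummy quality string of the same length.
std::string fasta_to_fastq(const std::string& fasta, char dummy_quality = 'I') {
    std::istringstream in(fasta);
    std::ostringstream out;
    std::string line, header, seq;
    bool have_record = false;

    // Writes the record collected so far, if any.
    auto flush = [&]() {
        if (!have_record) return;
        out << '@' << header << '\n' << seq << "\n+\n"
            << std::string(seq.size(), dummy_quality) << '\n';
    };

    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
        if (line.empty()) continue;
        if (line[0] == '>') {
            flush();
            header = line.substr(1);
            seq.clear();
            have_record = true;
        } else {
            seq += line;  // flatten multi-line sequences
        }
    }
    flush();
    return out.str();
}
```

For a genome assembly this turns each chromosome into one very long "read", which is how the human-genome report on this page prepared its input.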

endless loop in /src/count.cc

/src/count.cc gets stuck in a livelock when num_files exceeds the node limit of the ip_files queue (i.e., its element count).

In detail:

The queue node limit for ip_files is hard coded at l.60

The queue and num_files gets populated in the for loop at l.286, but the return value of ip_files.push() (l.296) isn't evaluated (same for the push at l.211 btw).

So if the node limit gets exceeded (or the push is failing for any other reason) the file pointers get dropped silently but num_files still gets incremented (l.297).

Which makes the outer while loop at l.208 an endless loop (since it iterates over num_files, which gets decremented in the inner while loop, which iterates over the elements of ip_files).

Maybe I missed it, but neither the silent dropping of files and/or file parts (which, as far as I understand, can also happen at l.299, just without triggering the livelock) nor the existing node limit seems to be documented.

Increasing the node limit above my particular num_files count prevented the endless loop. (I guess setting the node limit dynamically to opts.filenames.size() would be the most elegant way, but boost::lockfree::queue's lock-free behavior seems to depend on dynamic memory allocation being disabled.)
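The failure mode described above comes down to counting a file as queued whether or not push() accepted it. A minimal single-threaded sketch of the checked version follows. BoundedQueue and enqueue_inputs are hypothetical names, and the queue only mimics the bool-returning push() of a fixed-capacity boost::lockfree::queue, without its thread safety.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical fixed-capacity queue: push() returns false instead of growing.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {}
    bool push(const T& item) {
        if (items_.size() - head_ >= capacity_) return false;  // full
        items_.push_back(item);
        return true;
    }
    bool pop(T& item) {
        if (head_ == items_.size()) return false;  // empty
        item = items_[head_++];
        return true;
    }
private:
    std::size_t capacity_;
    std::size_t head_ = 0;
    std::vector<T> items_;
};

// Counts only successful pushes, so a consumer that loops until it has seen
// `queued` items always terminates. Rejected names are returned so the caller
// can report them instead of dropping them silently.
std::size_t enqueue_inputs(BoundedQueue<std::string>& q,
                           const std::vector<std::string>& files,
                           std::vector<std::string>& rejected) {
    std::size_t queued = 0;
    for (const auto& f : files) {
        if (q.push(f)) {
            ++queued;
        } else {
            rejected.push_back(f);
        }
    }
    return queued;
}
```

In count.cc terms, num_files would be incremented only when push() returns true, and a non-empty `rejected` list would become a warning or an error.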

Sincerely tkranz

Illegal instruction error

I am trying to run squeakr count (version 0.7), but I keep getting an "Illegal instruction" error. I set the -s option after running ntCard and lognumslots. Any idea how to overcome this?

Here you can see the command I am using and the messages I am getting from squeakr

$ squeakr count -e -k 32 -c 1 -s 30 --no-counts -t 16 -o . file.fastq.gz
[2020-01-16 11:45:54.535] [squeakr_console] [info] Reading from the fastq file and inserting in the CQF.
Illegal instruction

thanks

threadsafe-gqf/gqf.c:2043:26: error: ‘UINT64_MAX’ was not declared in this scope

Since I don't have root permission, I have to install all the libraries in a local directory (i.e. a folder named "external"). I changed the Makefile to include the libraries in this folder as below:

LIBINCLUDE=external/include/
LIB=external/libs/
CXXFLAGS += -I$(LIBINCLUDE) ..........
LDFLAGS += -L$(LIB) .........

When I compile the code, it gives me the error line above. I am wondering, which package or library does "UINT64_MAX" belong to. What are the alternatives to install this code without root permission?
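UINT64_MAX is not provided by a separate package; it is a macro from the standard C header <stdint.h> (<cstdint> in C++). The Makefile in another report on this page compiles threadsafe-gqf/gqf.c with g++, and older C++ toolchains only expose the *_MAX macros from <stdint.h> when __STDC_LIMIT_MACROS is defined before that header is first included. Whether this is the cause on this machine is an assumption; the sketch shows the two usual fixes, and max_u64 is a hypothetical name.

```cpp
// Fix 1 (C++11 and later): include <cstdint>, which provides UINT64_MAX.
// Fix 2 (older toolchains): define __STDC_LIMIT_MACROS before the first
// include of <stdint.h>, for example by adding -D__STDC_LIMIT_MACROS to CXXFLAGS.
#ifndef __STDC_LIMIT_MACROS
#define __STDC_LIMIT_MACROS
#endif
#include <stdint.h>
#include <cstdint>

// Largest value a uint64_t can hold, usable in constant expressions.
constexpr uint64_t max_u64() { return UINT64_MAX; }
```

Neither fix needs root access: both are changes to the source or to the compiler flags in the local Makefile.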

Thanks!

qf_read usage

I have a program that loads a serialized QF from disk using qf_deserialize without issue. When I change qf_deserialize to qf_read the qf_read command succeeds, but the resulting QF doesn't find any kmers in a test where I know it should find all of them (and this test works with qf_deserialize).

Do I need to do some additional call to use the mmap'ed version?

/src/count.cc can silently drop input files

At l.299 a file reader gets silently deleted if reader::getFileReader() fails (at least when it does at l.192 in /include/reader.h).

I noticed this happening because I increased the ip_files limit/capacity to get around issue 42 and added an output at l.299 in count.cc.

In my particular case, the number of (bz2-compressed) files I gave to squeakr count exceeded the configured maximum number of open files (ulimit -n). So once that limit is reached, fopen() fails (another output I added shows "too many open files").

In that case (and, from what I understand, in every other case that prevents a file from being opened) false is returned and the file reader just gets deleted, with nothing indicating that this is happening!

I can't see any difference in reader.h l.184, l.187 (and l.198), so I think the same could happen with plain fastq and gz-compressed input (but I haven't tested it).

I think count.cc should respond with a warning or even an error after the delete at l.299 (including the error from fopen() would be nice too).

If I get the chance, I will test whether I can trigger this just by removing read permission from an input file.
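A sketch of the warning suggested above: every input is opened up front, and each failure is reported with strerror(errno) instead of being dropped. open_inputs and open_file_limit are hypothetical helpers, not squeakr functions; getrlimit(RLIMIT_NOFILE) is the POSIX call that reports the soft limit shown by ulimit -n.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>
#include <sys/resource.h>

// Opens every input and reports each failure with the reason from errno
// (e.g. "Too many open files"). Returns the number of failures; handles that
// opened successfully are appended to `opened`.
std::size_t open_inputs(const std::vector<std::string>& paths,
                        std::vector<std::FILE*>& opened) {
    std::size_t failures = 0;
    for (const auto& p : paths) {
        std::FILE* f = std::fopen(p.c_str(), "rb");
        if (f == nullptr) {
            std::fprintf(stderr, "warning: cannot open %s: %s\n",
                         p.c_str(), std::strerror(errno));
            ++failures;
            continue;
        }
        opened.push_back(f);
    }
    return failures;
}

// Soft limit on open file descriptors for this process (what `ulimit -n`
// shows), or 0 if it cannot be queried.
unsigned long long open_file_limit() {
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) return 0;
    return static_cast<unsigned long long>(rl.rlim_cur);
}
```

Comparing the number of inputs with open_file_limit() before opening anything would catch the too-many-files case early; the caller is responsible for closing the handles in `opened`.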

Sincerely
tkranz

Cannot compile on OS X

Hi! When I try to compile on OS X 10.12.6, I get the following error.

$ make
g++ -std=c++11 -Wall   -Ofast -msse4.2 -D__SSE4_2_ -m64 -I. -Wno-unused-result -Wno-strict-aliasing -Wno-unused-function  main.cc -c -o main.o
In file included from main.cc:46:
./hashutil.h:28:10: fatal error: 'openssl/evp.h' file not found
#include <openssl/evp.h>
         ^
1 error generated.
make: *** [main.o] Error 1

I've installed openssl with Homebrew, so I'm surprised that squeakr can't find the headers. Any tips?

Illegal hardware instruction

Hi,

I have compiled from source, but I get the following error when running

$ ./main 0 20 1 test.fastq
Reading from the fastq file and inserting in the QF
[1]    28352 illegal hardware instruction  ./main 0 20 1 test.fastq

I compiled with gcc-4.9. The following is the result of ldd main

    linux-vdso.so.1 =>  (0x00007ffe8a8f8000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fed2eee8000)
    libssl.so.1.0.0 => /lib/x86_64-linux-gnu/libssl.so.1.0.0 (0x00007fed2ec88000)
    libcrypto.so.1.0.0 => /lib/x86_64-linux-gnu/libcrypto.so.1.0.0 (0x00007fed2e8a8000)
    libboost_system.so.1.57.0 => /nfs/users/nfs_j/jl11/software/lib/libboost_system.so.1.57.0 (0x00007fed2e6a0000)
    libboost_thread.so.1.57.0 => /nfs/users/nfs_j/jl11/software/lib/libboost_thread.so.1.57.0 (0x00007fed2e480000)
    libstdc++.so.6 => /software/hgi/pkglocal/gcc-4.9.1/lib64/libstdc++.so.6 (0x00007fed2e170000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fed2de70000)
    libgcc_s.so.1 => /software/hgi/pkglocal/gcc-4.9.1/lib64/libgcc_s.so.1 (0x00007fed2dc58000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fed2d890000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fed2f108000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fed2d688000)
    libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fed2d470000)
    librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fed2d268000)

Any ideas why this is happening?
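One common cause of "illegal instruction" is a binary built with instruction-set extensions the running CPU does not have; the compile line in the OS X report above shows this Makefile passing -msse4.2. Whether that is what happens on this machine is an assumption. The sketch uses the GCC/Clang builtin __builtin_cpu_supports to check a few candidates; CpuFeatures and detect_cpu_features are hypothetical names, and bmi2 is included only as another extension worth checking, not because squeakr is known to require it.

```cpp
#include <cassert>

// Which of a few x86 extensions the running CPU supports. All fields stay
// false on non-x86 targets or compilers without the builtin.
struct CpuFeatures {
    bool sse2 = false;
    bool sse4_2 = false;
    bool popcnt = false;
    bool bmi2 = false;
};

CpuFeatures detect_cpu_features() {
    CpuFeatures f;
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  // only strictly needed in early constructors; harmless here
    f.sse2 = __builtin_cpu_supports("sse2");
    f.sse4_2 = __builtin_cpu_supports("sse4.2");
    f.popcnt = __builtin_cpu_supports("popcnt");
    f.bmi2 = __builtin_cpu_supports("bmi2");
#endif
    return f;
}
```

If sse4_2 comes back false on the machine that crashes, rebuilding without -msse4.2 would be the first thing to try; grepping the flags line of /proc/cpuinfo gives the same information without compiling anything.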

boost/thread/thread.hpp error installing on macOS

I have a problem installing the program on macOS Mojave 10.14.5.

Error message says:
src/count.cc:21:10: fatal error: 'boost/thread/thread.hpp' file not found
#include <boost/thread/thread.hpp>

Segmentation fault (core dumped)

./squeakr-count -f -k 28 -s 20 -t 1 -o ./ S008_20180206001-8_ffpedna_pan-cancer-v1_5717_S8_R2_001.fq
Reading from the fastq file and inserting in the QF
Segmentation fault (core dumped)

head -8 S008_20180206001-8_ffpedna_pan-cancer-v1_5717_S8_R1_001.fq
@NB551106:74:HG7CWBGX5:2:11106:12634:1554 1:N:0:AGTTCC
ACTCTGGCCTGGGTGACAGAGTGAGACTCGGGCTAAGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACAAAAAATAA
+
AAAAAAEEE////A/EAE/E///E6EA/A///<<EE//EEEEEEEEEEEEEEEEE6EEEEEEEEE6EEEE//AE/E<///<<EAAEA/EAAEE6/EEEEEEEEAA<AEEE//E///<<E<<///</E/E///A///A<<///////////
@NB551106:74:HG7CWBGX5:4:22601:22501:19465 1:N:0:AGTTCC
ACTCTGGCCTGGGTGACAGAGTGAGACTCGGGCTAAGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
AAAAA/AAE///AA/EAEAE///EAAAA<//<A/EE6EEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEE//<EAA//////<<AA///A//E//E/EEEEE/E<//E///A///////6////</A////////A///6///6///////

I have attached the FASTQ file S008_20180206001-8_ffpedna_pan-cancer-v1_5717_S8_R1_001.fq. Why does it throw a segmentation fault?

long reads?

Could this tool be used to count kb-long k-mers (k of several kb) in a genome and derive a mappability track?

gem-mappability works fine for short reads but apparently does not perform well for k-mers in the kb range, and the error rate of long reads makes exact k-mer matching futile.

I know I am asking a lot here, please be kind :-)

error: unknown type name gzFile_s on macOS 10.11 El Capitan

gzFile_s is not defined by zlib 1.2.5 on macOS 10.11 El Capitan.
It is defined by zlib 1.2.11 on Homebrew.
Better to use gzFile rather than gzFile_s* if possible.

❯❯❯ make
g++ -std=c++11 -Wall   -Ofast -msse4.2 -D__SSE4_2_ -m64 -I/Users/sjackman/.homebrew/include -I/Users/sjackman/.homebrew/opt/openssl/include -I. -Wno-unused-result -Wno-strict-aliasing -Wno-unused-function  main.cc -c -o main.o
In file included from main.cc:49:
./reader.h:35:21: error: unknown type name 'gzFile_s'; did you mean 'gzFile'?
                        reader(FILE *in, gzFile_s *in_gzip, BZFILE *in_b...
                                         ^~~~~~~~
                                         gzFile
/usr/include/zlib.h:1172:15: note: 'gzFile' declared here
typedef voidp gzFile;       /* opaque gzip file descriptor */
              ^
In file included from main.cc:49:

trying to understand why SSL is required

What functionality from SSL is being used in this package? Offhand the only thing I could think of is a hash function from the SSL crypto stuff. But the package uses one of the murmur hashes.

Digging through the source code, the only reference to SSL I could find was in hash_util.h where openssl/evp.h is #included. But I can't see where that is made use of.
(this is consistent with the guess that SSL is, or was, supplying a hash function)

So I commented out that SSL #include and removed -lssl and -lcrypto from the Makefile. As best I can tell, things built OK. Count and query appear to work fine on the test example (and another test I contrived). Inner-product segfaults for me (well, most of the time; occasionally it doesn't segfault but reports a result just below 2^64). I can't find anything that would attribute this to missing SSL.

FWIW I'm on a reasonably recent Mac (and I did read issue #10), and am building from yesterday's repo.

But getting back to my original question -- is SSL really used for anything?

What does squeakr-count count?

Hey,

I tried to understand which kmers are counted by squeakr-count. I noticed that only the first k bases from the input reads are counted, and the kmers further down the reads are discarded. Also sometimes the reverse complement kmer is counted.

Is this within the specifications of squeakr-count?
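In the debug output of the shift_remainders report on this page, each printed value is the larger of a 20-base window and its reverse complement under a C=0, A=1, T=2, G=3 packing. If squeakr-count follows that convention (an inference from three data points, not confirmed from its source), a counter would still visit every window of a read but store whichever orientation encodes to the larger value, which would explain why the reverse complement sometimes appears. The hypothetical canonical_kmers below enumerates all windows that way and skips windows containing non-ACGT bases; on the read segment behind that debug output it reproduces all three printed values.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// 2-bit code (C=0, A=1, T=2, G=3), so complement(code) == 3 - code; -1 otherwise.
int base_code(char b) {
    switch (b) {
        case 'C': return 0;
        case 'A': return 1;
        case 'T': return 2;
        case 'G': return 3;
        default:  return -1;
    }
}

// Reverse complement of a packed k-mer of length k (k <= 32).
std::uint64_t revcomp(std::uint64_t kmer, unsigned k) {
    std::uint64_t rc = 0;
    for (unsigned i = 0; i < k; ++i) {
        rc = (rc << 2) | (3 - (kmer & 3));  // complement of the current last base
        kmer >>= 2;
    }
    return rc;
}

// Every k-mer window of `read`, each stored as the larger of itself and its
// reverse complement (the convention inferred above). Windows containing a
// base other than A/C/G/T are skipped.
std::vector<std::uint64_t> canonical_kmers(const std::string& read, unsigned k) {
    assert(k >= 1 && k <= 32);
    const std::uint64_t mask =
        (k == 32) ? ~std::uint64_t{0} : ((std::uint64_t{1} << (2 * k)) - 1);
    std::vector<std::uint64_t> out;
    std::uint64_t fwd = 0;
    unsigned valid = 0;  // length of the current run of valid bases
    for (char b : read) {
        int code = base_code(b);
        if (code < 0) {
            valid = 0;
            fwd = 0;
            continue;
        }
        fwd = ((fwd << 2) | static_cast<std::uint64_t>(code)) & mask;
        if (++valid >= k) {
            std::uint64_t rc = revcomp(fwd, k);
            out.push_back(fwd > rc ? fwd : rc);
        }
    }
    return out;
}
```

A read of length L yields L - k + 1 windows when it contains no N, so only the first k bases being counted would point to a problem in how reads are passed in rather than to the canonicalization step.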

Cheers!

segfault when run on a FASTA file

❯❯❯ ./main 0 20 1 test.fa
Reading from the fastq file and inserting in the QF
[1]    62215 segmentation fault  ./main 0 20 1 test.fa
