splatlab / squeakr
Squeakr: An Exact and Approximate k-mer Counting System
License: BSD 3-Clause "New" or "Revised" License
Building on macOS has a new issue:
g++ -std=c++11 -Wall -Ofast -msse4.2 -D__SSE4_2_ -m64 -I. -Iinclude -c -o obj/partitioned_counter.o src/gqf/partitioned_counter.c
src/gqf/partitioned_counter.c:15:10: fatal error: 'sys/sysinfo.h' file not found
#include <sys/sysinfo.h>
^~~~~~~~~~~~~~~
1 error generated.
See: https://stackoverflow.com/questions/12523704/mac-os-x-equivalent-header-file-for-sysinfo-h-in-linux
A tagged stable release is a requirement to add a new formula to Homebrew/science. Thanks!
So I'm under the impression that
But this doesn't seem to work out so well in practice:
[tim.triche@node069 single]$ THREADS=`cat /proc/cpuinfo | grep proc | wc -l`
[tim.triche@node069 single]$ echo $THREADS
80
[tim.triche@node069 single]$ free -h
total used free shared buff/cache available
Mem: 250G 10G 239G 46M 907M 238G
Swap: 11G 0B 11G
OK, looks good. Now let's take a therapy-related AML patient's ancient RNAseq data and index it:
[tim.triche@node069 single]$ squeakr-count -g -k 31 -s 31 -t $THREADS SRR621698.fastq.gz
Reading from the fastq file and inserting in the QF
Total Time Elapsed: 184.994003seconds
Calc freq distribution:
Total Time Elapsed: 8.228049seconds
Maximum freq: 329368
Num distinct elem: 312966013
Total num elems: 2172228383
Segmentation fault
Whoops? Any ideas for debugging and unit testing are appreciated, since I'd like to scale this up for various search types. Thanks for a great tool and your support in getting it to run smoothly :-)
g++ -std=c++11 -Wall -Ofast -msse4.2 -D__SSE4_2_ -m64 -I. -Iinclude -c -o obj/gqf_file.o src/gqf/gqf_file.c
src/gqf/gqf_file.c:51:8: error: use of undeclared identifier 'posix_fallocate'
ret = posix_fallocate(qf->runtimedata->f_info.fd, 0, total_num_bytes);
^
1 error generated.
make: *** [obj/gqf_file.o] Error 1
make: *** Waiting for unfinished jobs....
https://stackoverflow.com/questions/11497567/fallocate-command-equivalent-in-os-x
Whenever I have tried to query for a specific sequence using squeakr-query, it returns with "Not find: 0". I have tried sequences that are from the fastq files as well as sequences that should not be found in the genome, but I get the same result each time.
In addition, when I try to use squeakr-query using a sequence from the exact branch, it gives me the following error:
terminate called after throwing an instance of 'std::invalid_argument'
what(): stoll
Aborted
Please document the dependencies in README.md
Hello,
We are trying to run Squeakr on environmental metagenomic datasets.
It is run on a node having 256 GB RAM, and 128 cores.
Squeakr was run on the samples whose number of estimated kmers is given here https://github.com/pierrepeterlongo/kmtricks_benchmarks/tree/master/tara-metag-bacterial/data/estimated_kmer_counts_metaG_bact
The command is
squeakr count -k 20 -c 1 -s ${log_slots} -o output_${sample}/res ${input_files} -t 128;
log_slots value is in [34,36], depending on the datasets.
output_${sample} directory is created before the run.
squeakr prints "Reading from the fastq file and inserting in the CQF."
and then stops with a segmentation fault. This happens after 5 to 20 minutes of run time.
Max memory usage ranges between 2GB and 86GB depending on the sets.
Could this be related to issue #32?
Any other ideas?
Thanks!
Pierre
I have been unable to use v0.5 (currently on bioconda and the most recent release), but the current master branch works nicely (which I think is about 120 commits ahead of this).
Are you planning to make an updated release tagging this newer version? This would be really helpful re: bioconda packaging too.
Hi
I am analyzing k-mer counting tools, and Squeakr is one of them. I am trying to find out the maximum k-mer length that Squeakr supports.
What is the maximum k-mer length for Squeakr? Can you please help me out with this?
Regards
Tarang
Thank you for developing this highly useful tool. Unfortunately, I am having an issue while processing my ~500 MB fastq file in both v0.6 and v0.7.
$ squeakr count -e -k 15 -t 32 -o ./ my.fastq
[2022-08-21 03:08:48.703] [squeakr_console] [info] Reading from the fastq file and inserting in the CQF.
[2022-08-21 03:08:53.871] [squeakr_console] [info] Trying to compress the final CQF.
[2022-08-21 03:08:54.365] [squeakr_console] [info] Estimated size of the final CQF: 29
[2022-08-21 03:08:54.365] [squeakr_console] [info] Calculating frequency distribution:
[2022-08-21 03:09:00.170] [squeakr_console] [info] Iteration: Total Time Elapsed: 5.804785 seconds
Error opening file for serializing.: Is a directory
This also happens if I use -s 20 -t 1. Any insight on how to get around this issue?
% squeakr-count
<wait 3 seconds>
Segmentation fault (core dumped)
squeakr help -v
version 1.0
but release is 0.6 ?
This would be good to have:
% squeakr version
squeakr 0.6
Relates to #27
Hi there :)
So, I just downloaded the code of Squeakr, compiled it and run squeakr-count on the test file: everything went fine.
Then, I tried squeakr-count on a different file: https://www.ncbi.nlm.nih.gov/sra/?term=ERR430991.
The Squeakr paper says "Squeakr takes the approximation of number of distinct k-mers as the number of slots to create the CQF". The number of distinct 31-mers in my file is roughly 30 million. Hence, I configured Squeakr with a CQF size parameter of 25 (roughly the log2 of 30 million).
So, I used the following command:
./squeakr-count 0 25 1 ERR430991.fastq
And the result is:
Reading from the fastq file and inserting in the QF
Segmentation fault (core dumped)
I tried again with 30 instead of 25 as CQF size parameter and this time, I obtained:
Reading from the fastq file and inserting in the QF
Total Time Elapsed: 51.129627seconds
Calc freq distribution:
Total Time Elapsed: 0.751258seconds
Maximum freq: 897
Num distinct elem: 28471250
Total num elems: 343450408
So, my first question is: am I doing something wrong when I pick the CQF size parameter? Should I overestimate the approximation of the number of distinct 31-mers in my file?
Also, I used KMC3 to compute k-mer counts from the same file, and KMC3 reports 30311678 distinct 31-mers while Squeakr says there are 28471250 distinct 31-mers. Any ideas on the difference of about 1.8 million distinct 31-mers?
Thank you for your time and help!
Guillaume
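The sizing rule discussed in this issue can be sketched in a few lines. This is only an illustration, not the logic of Squeakr or lognumslots.sh: the function name log_num_slots, the max_load default of 0.95, and the extra_bits headroom knob are all assumptions introduced here.

```python
import math


def log_num_slots(distinct_kmers, max_load=0.95, extra_bits=0):
    """Return s such that 2**s slots hold `distinct_kmers` at `max_load` occupancy.

    max_load and extra_bits are illustrative assumptions, not Squeakr parameters.
    """
    if distinct_kmers <= 0:
        raise ValueError("distinct_kmers must be positive")
    if not 0.0 < max_load <= 1.0:
        raise ValueError("max_load must be in (0, 1]")
    if extra_bits < 0:
        raise ValueError("extra_bits must be non-negative")
    # Slots required so the filter is at most `max_load` full.
    needed = math.ceil(distinct_kmers / max_load)
    # (needed - 1).bit_length() is ceil(log2(needed)) in exact integer arithmetic.
    return (needed - 1).bit_length() + extra_bits


if __name__ == "__main__":
    estimate = 30_000_000  # approximate distinct 31-mers reported in this issue
    print(log_num_slots(estimate))
    print(log_num_slots(estimate, extra_bits=1))
```

For an estimate of 30 million this returns 25, the value that crashed above, so a plain log2 of the distinct count may be too tight in practice. Raising extra_bits is a blunt way to add headroom; how much headroom Squeakr actually needs is not established in this thread.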
I've noticed that on a small number of read sets (e.g. SRR522088), lognumslots.sh underestimates the number of slots needed in the CQF for squeakr-exact.
Here's my current workflow for gzipped fastq files
ntcard -k 20 -c 2 -t 10 -p $OUTPREFIX $INPUT
NUMSLOTS=$(lognumslots.sh $OUTPREFIX\_k20.hist)
squeakr-count -g -k 20 -s $NUMSLOTS -t 10 -o $OUTDIR/ $INPUT
In the case of SRR522088, the script computed 26 as the required number of slots, resulting in a segfault. When I set it to 27, it runs smoothly.
Since this script is only in the master branch, I was wondering if there's perhaps a version tuned for the exact branch that I may not be finding in the repo.
Hi,
I run into the following error when running squeakr:
squeakr: src/gqf/gqf.c:1359: int insert1(QF*, __int128 unsigned, uint8_t): Assertion `new_value < current_remainder' failed.
Aborted (core dumped)
My command is:
./squeakr count -k 20 -e -n -t 1 -o data/tmp.squeakr SRR1292579.fastq.gz
I also tried to not use the flags n and e and tried to use -c 50, but the error remained the same.
(SRR1292579.fastq.gz is downloaded from the sra.)
I am using the master branch, and the test case runs through without an error. With k >= 22 it also seems to work fine, but for k < 22 I get the error message above.
Do you have any idea what is causing this?
A comment near the top of gqf.c, "Can be 0 ... 8, 16, etc." looks like it's describing something that has moved.
It probably refers to BITS_PER_SLOT, which is #defined over in gqf.h.
The exact branch has a different version of clipp.h than master and development do. It seems that the test for whether the command line parse was successful always fails.
I'm able to work around this by using clipp.h from the master branch.
There don't seem to be command-line options in your current release, despite the docs referring to them.
Can you make a new release if things have changed? It is needed for packaging in brew/conda.
% squeakr-count -h
./squeakr-count [OPTIONS]
file format : 0 - plain fastq, 1 - gzip compressed fastq, 2 - bzip2 compressed fastq
CQF size : the log of the number of slots in the CQF
num of threads: number of threads to count
file(s) : "filename" or "dirname/*" for all the files in a directory
I cloned the repo today and when I ran make it complained about not finding the libz/libbz2 static libs in the libs/ subdir. I didn't see instructions on how to make those.
g++ -std=c++11 -Wall -Ofast -m64 -I. -Wno-unused-result -Wno-strict-aliasing main.cc hashutil.cc threadsafe-gqf/gqf.c -lpthread -lssl -lcrypto -lboost_system -lboost_thread libs/libbz2.a libs/libz.a -o main
g++: error: libs/libbz2.a: No such file or directory
g++: error: libs/libz.a: No such file or directory
make: *** [main] Error 1
What am I missing?
Thanks,
Chris
I get the following error message when I try to install squeakr on my laptop.
I'm using macOS Sierra 10.12.6
vpn5-210:squeakr-master yeredh$ make squeakr
g++ -std=c++11 -Wall -Ofast -msse4.2 -D__SSE4_2_ -m64 -I. -Iinclude -c -o obj/count.o src/count.cc
In file included from src/count.cc:39:
include/gqf_cpp.h:34:2: error: expected identifier
FREAD
^
/usr/include/sys/fcntl.h:110:16: note: expanded from macro 'FREAD'
#define FREAD 0x0001
^
1 error generated.
make: *** [obj/count.o] Error 1
% squeakr --version
squeakr 0.6.1
(to stdout, not stderr)
run as: ./squeakr-count 1 20 4 u.fq.gz
with the u.fq.gz attached here.
u.fq.gz
I get the backtrace:
(gdb) bt
#0 0x0000000000412d47 in shift_remainders (qf=0x7ffcfc33a1d0, start_index=65127, empty_index=294479)
at threadsafe-gqf/gqf.c:720
#1 0x00000000004147b2 in insert1(QF *, __int128 unsigned, bool, bool) (qf=0x7ffcfc33a1d0, hash=14624935, lock=true,
spin=true) at threadsafe-gqf/gqf.c:1368
#2 0x0000000000415d89 in qf_insert (qf=0x7ffcfc33a1d0, key=14624935, value=0, count=1, lock=true, spin=true)
at threadsafe-gqf/gqf.c:1822
#3 0x0000000000407bd4 in dump_local_qf_to_main (obj=0x2034c30) at main.cc:231
#4 0x0000000000408305 in reads_to_kmers (c=..., obj=0x2034c30) at main.cc:336
#5 0x00000000004085b2 in fastq_to_uint64kmers_prod (obj=0x2034c30) at main.cc:371
#6 0x0000000000411941 in boost::_bi::list1<boost::_bi::value<flush_object*> >::operator()<bool (*)(flush_object*), boost::_bi::list0> (this=0x2034e30, f=@0x2034e28: 0x408485 <fastq_to_uint64kmers_prod(flush_object*)>, a=...)
at /usr/include/boost/bind/bind.hpp:253
#7 0x0000000000411594 in boost::_bi::bind_t<void, bool (*)(flush_object*), boost::_bi::list1<boost::_bi::value<flush_object*> > >::operator() (this=0x2034e28) at /usr/include/boost/bind/bind.hpp:893
#8 0x00000000004110be in boost::detail::thread_data<boost::_bi::bind_t<void, bool (*)(flush_object*), boost::_bi::list1<boost::_bi::value<flush_object*> > > >::run (this=0x2034c70) at /usr/include/boost/thread/detail/thread.hpp:116
#9 0x00007fc449e3d5d5 in ?? ()
#10 0x0000000000000000 in ?? ()
Hi,
I am getting a segfault for the SRA experiment SRR1660308 (with squeakr-exact). Unfortunately I could not reproduce with just the offending read.
GDB output:
gdb --args squeakr-count -f -k 20 -s 20 -t 1 -o cqfs/ raw/SRR1660308.fastq
...
(Debug output that I added to main.cc, printing kmers before inserting into the CQF)
...
kmer: TTCCGCTCCGCTACTGACGG int: 690553506639 hash: 591485291406
kmer: GTTCCGCTCCGCTACTGACG int: 997272097491 hash: 13331034665
kmer: GTCAGTAGCGGAGCGGAACA int: 970247229265 hash: 504142783357
Thread 2 "squeakr-count" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffae355700 (LWP 23423)]
0x000000000043f60a in shift_remainders (qf=0x7fffffffbd00, start_index=486398,
empty_index=1058934) at threadsafe-gqf/gqf.c:716
716 0
, bend, qf->metadata->bits_per_slot);
(gdb) bt
#0 0x000000000043f60a in shift_remainders (qf=0x7fffffffbd00, start_index=486398,
empty_index=1058934) at threadsafe-gqf/gqf.c:716
#1 0x0000000000440f26 in insert1(QF *, __int128 unsigned, bool, bool) (
qf=0x7fffffffbd00, hash=504142783357, lock=true, spin=false)
at threadsafe-gqf/gqf.c:1362
#2 0x0000000000442378 in qf_insert (qf=0x7fffffffbd00, key=504142783357, value=0,
count=1, lock=true, spin=false) at threadsafe-gqf/gqf.c:1816
#3 0x0000000000409821 in reads_to_kmers (c=..., obj=0x66e270) at main.cc:338
#4 0x0000000000409b1d in fastq_to_uint64kmers_prod (obj=0x66e270) at main.cc:379
#5 0x000000000043de3f in boost::_bi::list1<boost::_bi::value<flush_object*> >::operator()<bool (*)(flush_object*), boost::_bi::list0> (this=0x66f3d0, f=@0x66f3c8: 0x409a9f <fastq_to_uint64kmers_prod(flush_object*)>, a=...)
at .../boost/bind/bind.hpp:259
#6 0x000000000043daad in boost::_bi::bind_t<void, bool (*)(flush_object*), boost::_bi::list1<boost::_bi::value<flush_object*> > >::operator() (this=0x66f3c8)
at .../boost/bind/bind.hpp:1294
#7 0x000000000043d6ca in boost::detail::thread_data<boost::_bi::bind_t<void, bool (*)(flush_object*), boost::_bi::list1<boost::_bi::value<flush_object*> > > >::run (this=0x66f210)
at .../boost/thread/detail/thread.hpp:116
#8 0x00007fffaf7abb99 in thread_proxy ()
from .../lib/libboost_thread.so.1.66.0
#9 0x00007fffafbc6aa1 in start_thread () from /lib64/libpthread.so.0
#10 0x00007fffae911bcd in clone () from /lib64/libc.so.6
Let me know how I could help debugging this issue.
Cheers!
I am trying to use squeakr to count kmers from the human genome assembly GRCh38. As suggested in #31, I have taken each chromosome and added dummy quality values to convert it into FASTQ.
I am currently on master 346f581 and I run the following command
./squeakr count -k 20 -t 1 -o ./output human.fq
where human.fq is the FASTQ-converted file of the human genome assembly. The output that I get is:
[2019-01-07 11:22:17.136] [squeakr_console] [info] Reading from the fastq file and inserting in the CQF.
Segmentation fault (core dumped)
Any ideas for what I can do to fix this? Thanks!
Hello again,
If I understand right, the log number of slots given to squeakr-count should be log_2(approximate number of distinct k-mers).
So for the example data you estimated there would be 2^20 distinct kmers.
How robust is this parameter? Would you recommend adjusting this parameter for every sequencing run with Mohamadi et al.'s ntCard? Should I just leave it at 20? Will Mantis even work with different slot values?
From your paper for reference:
Squeakr needs the number of distinct kmers (approximate to next closet power of 2) as an input. Squeakr takes the approximation of number of distinct k-mers as the number of slots to create the CQF. We used Mohamadi et al.(2017) to estimate the number of distinct k-mers in datasets.
Cheers!
Are there any plans to support FASTA files any time in the future?
Given a FASTA file as input, would it be sufficient to reformat it as a FASTQ file by flattening multi-line sequences to a single line and adding dummy quality scores?
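The conversion described in this question can be sketched as follows. It is a minimal illustration: the function names read_fasta and fasta_to_fastq and the dummy quality character "I" are assumptions introduced here, and whether Squeakr handles the resulting records (especially very long ones) is not confirmed in this thread.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining multi-line sequences into one string."""
    name, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def fasta_to_fastq(lines: Iterable[str], dummy_quality: str = "I") -> Iterator[str]:
    """Yield one four-line FASTQ record per FASTA record, with a constant quality string."""
    for name, seq in read_fasta(lines):
        yield f"@{name}\n{seq}\n+\n{dummy_quality * len(seq)}\n"


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python fasta_to_fastq.py < genome.fa > genome.fq
    for record in fasta_to_fastq(sys.stdin):
        sys.stdout.write(record)
```

The quality line only needs to match the sequence length, so any valid quality character would do in place of "I".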
/src/count.cc gets stuck in a livelock when num_files exceeds the ip_files queue node limit (or its element count).
In detail:
The queue node limit for ip_files is hard-coded at l.60.
The queue and num_files get populated in the for loop at l.286, but the return value of ip_files.push() (l.296) isn't evaluated (the same goes for the push at l.211).
So if the node limit is exceeded (or the push fails for any other reason), the file pointers are dropped silently but num_files is still incremented (l.297).
This turns the outer while loop at l.208 into an endless loop, since it iterates over num_files, which is only decremented in the inner while loop, which in turn iterates over the elements of ip_files.
Maybe I missed it, but neither the silent dropping of files and/or file parts (which, as far as I understand, can also happen at l.299, just without triggering the livelock) nor the existing node limit seems to be documented.
Increasing the node limit to exceed my particular num_files count prevented the endless loop. I guess setting the node limit dynamically to opts.filenames.size() would be the most elegant way, but it seems boost::lockfree::queue's lock-free behavior depends on dynamic memory allocation being disabled.
Sincerely tkranz
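The pattern suggested in this report, checking whether each push succeeded and counting only accepted items, can be sketched as follows. It uses the standard library's bounded queue.Queue as a stand-in for boost::lockfree::queue. The function name enqueue_files is hypothetical, and this is not the code in count.cc.

```python
import queue


def enqueue_files(filenames, capacity):
    """Push filenames into a bounded queue and report what did not fit.

    Returns (pending, accepted, rejected). Only successful pushes are counted,
    so a consumer looping until `accepted` items are processed cannot wait
    forever for entries that were never enqueued.
    """
    pending = queue.Queue(maxsize=capacity)
    accepted = 0
    rejected = []
    for name in filenames:
        try:
            pending.put_nowait(name)  # raises queue.Full instead of dropping silently
        except queue.Full:
            rejected.append(name)
            continue
        accepted += 1
    return pending, accepted, rejected


if __name__ == "__main__":
    pending, accepted, rejected = enqueue_files(["a.fq", "b.fq", "c.fq"], capacity=2)
    for name in rejected:
        print(f"warning: queue full, not scheduled: {name}")
    print(f"{accepted} file(s) scheduled")
```

Keeping the counter tied to successful pushes is the key point; whether rejected files should be reported as warnings or treated as a fatal error is a separate design choice.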
I am trying to run squeakr count (version 0.7), but I keep getting an illegal instruction error. I set the -s option after running ntCard and lognumslots. Any idea how to overcome this?
Here you can see the command I am using and the messages I am getting from squeakr
$ squeakr count -e -k 32 -c 1 -s 30 --no-counts -t 16 -o . file.fastq.gz
[2020-01-16 11:45:54.535] [squeakr_console] [info] Reading from the fastq file and inserting in the CQF.
Illegal instruction
thanks
Since I don't have root permission, I have to install all the libraries in a local directory (ie a folder named "external"). I change the Makefile to include the libraries in this folder as below:
LIBINCLUDE=external/include/
LIB=external/libs/
CXXFLAGS += -I$(LIBINCLUDE) ..........
LDFLAGS += -L$(LIB) .........
When I compile the code, it gives me the error line above. I am wondering which package or library "UINT64_MAX" belongs to. What are the alternatives for installing this code without root permission?
Thanks!
I have a program that loads a serialized QF from disk using qf_deserialize without issue. When I change qf_deserialize to qf_read, the qf_read command succeeds, but the resulting QF doesn't find any kmers in a test where I know it should find all of them (and this test works with qf_deserialize).
Do I need to do some additional call to use the mmap'ed version?
At l.299 a file reader gets silently deleted if reader::getFileReader() fails (at least when it does so at l.192 in /include/reader.h).
I noticed this because I increased the ip_files limit/capacity to get around issue 42 and added an output at l.299 in count.cc.
In my particular case the number of (bz2 compressed) files I gave to squeakr count exceeded the configured maximum number of open files (ulimit -n). After reaching that limit, fopen() fails (with "too many open files", as I saw after adding another output).
In that case (and, as far as I understand, in every other case that would prevent a file from being opened) false is returned and the file reader just gets deleted, with nothing really indicating that this is happening!
I can't see any difference in reader.h l.184, l.187 (and l.198), so I think the same could happen with plain fastq and gz compressed input (but I haven't tested it).
I think count.cc should respond with a warning or even an error after the delete at l.299 (reporting the error from fopen() would be nice too).
If I get the chance, I will test whether I can trigger this by just removing the read permission of an input file.
Sincerely
tkranz
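The behavior requested in this report, surfacing a message whenever an input cannot be opened instead of discarding it silently, can be sketched as follows. The function name open_inputs and its opener parameter are hypothetical and exist only for illustration. This is not how reader.h or count.cc are written.

```python
def open_inputs(paths, opener=open):
    """Try to open every path and collect a message for each failure.

    Returns (handles, problems). Callers can print `problems` as warnings or
    abort when the list is non-empty, so no input is dropped without a trace.
    `opener` defaults to the built-in open and is replaceable for testing.
    """
    handles = []
    problems = []
    for path in paths:
        try:
            handles.append(opener(path, "rb"))
        except OSError as err:
            # strerror carries the OS message, e.g. "Too many open files".
            problems.append(f"cannot open {path}: {err.strerror or err}")
    return handles, problems


if __name__ == "__main__":
    import sys

    handles, problems = open_inputs(sys.argv[1:])
    for message in problems:
        print(f"warning: {message}", file=sys.stderr)
    for handle in handles:
        handle.close()
    sys.exit(1 if problems else 0)
```

Reporting the operating system's own message makes causes such as an exhausted ulimit -n or missing read permission visible to the user.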
Hi! When I try to compile on OS 10.12.6 I get the following error.
$ make
g++ -std=c++11 -Wall -Ofast -msse4.2 -D__SSE4_2_ -m64 -I. -Wno-unused-result -Wno-strict-aliasing -Wno-unused-function main.cc -c -o main.o
In file included from main.cc:46:
./hashutil.h:28:10: fatal error: 'openssl/evp.h' file not found
#include <openssl/evp.h>
^
1 error generated.
make: *** [main.o] Error 1
I've installed openssl with Homebrew, so I'm surprised that squeakr can't find the headers. Any tips?
Hi,
I have compiled from source, but I get the following error when running
$ ./main 0 20 1 test.fastq
Reading from the fastq file and inserting in the QF
[1] 28352 illegal hardware instruction ./main 0 20 1 test.fastq
I compiled with gcc-4.9. The following is the result of ldd main
linux-vdso.so.1 => (0x00007ffe8a8f8000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fed2eee8000)
libssl.so.1.0.0 => /lib/x86_64-linux-gnu/libssl.so.1.0.0 (0x00007fed2ec88000)
libcrypto.so.1.0.0 => /lib/x86_64-linux-gnu/libcrypto.so.1.0.0 (0x00007fed2e8a8000)
libboost_system.so.1.57.0 => /nfs/users/nfs_j/jl11/software/lib/libboost_system.so.1.57.0 (0x00007fed2e6a0000)
libboost_thread.so.1.57.0 => /nfs/users/nfs_j/jl11/software/lib/libboost_thread.so.1.57.0 (0x00007fed2e480000)
libstdc++.so.6 => /software/hgi/pkglocal/gcc-4.9.1/lib64/libstdc++.so.6 (0x00007fed2e170000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fed2de70000)
libgcc_s.so.1 => /software/hgi/pkglocal/gcc-4.9.1/lib64/libgcc_s.so.1 (0x00007fed2dc58000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fed2d890000)
/lib64/ld-linux-x86-64.so.2 (0x00007fed2f108000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fed2d688000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fed2d470000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fed2d268000)
Any ideas why this is happening?
I have a problem installing the program on macOS Mojave 10.14.5.
Error message says:
src/count.cc:21:10: fatal error: 'boost/thread/thread.hpp' file not found
#include <boost/thread/thread.hpp>
./squeakr-count -f -k 28 -s 20 -t 1 -o ./ S008_20180206001-8_ffpedna_pan-cancer-v1_5717_S8_R2_001.fq
Reading from the fastq file and inserting in the QF
Segmentation fault (core dumped)
head -8 S008_20180206001-8_ffpedna_pan-cancer-v1_5717_S8_R1_001.fq
@NB551106:74:HG7CWBGX5:2:11106:12634:1554 1:N:0:AGTTCC
ACTCTGGCCTGGGTGACAGAGTGAGACTCGGGCTAAGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACAAAAAATAA
+
AAAAAAEEE////A/EAE/E///E6EA/A///<<EE//EEEEEEEEEEEEEEEEE6EEEEEEEEE6EEEE//AE/E<///<<EAAEA/EAAEE6/EEEEEEEEAA<AEEE//E///<<E<<///</E/E///A///A<<///////////
@NB551106:74:HG7CWBGX5:4:22601:22501:19465 1:N:0:AGTTCC
ACTCTGGCCTGGGTGACAGAGTGAGACTCGGGCTAAGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
AAAAA/AAE///AA/EAEAE///EAAAA<//<A/EE6EEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEE//<EAA//////<<AA///A//E//E/EEEEE/E<//E///A///////6////</A////////A///6///6///////
I provided the fastq file S008_20180206001-8_ffpedna_pan-cancer-v1_5717_S8_R1_001.fq. Why did it throw a segmentation fault?
Could this tool be used to compute kb-long k-mers (k-mer lengths of several kb) in a genome and derive a mappability track?
Although it works fine for short reads, gem-mappability apparently does not perform well for k-mers in the kb range; in addition, the error rate of long reads makes exact k-mer matching futile.
I know I am asking a lot here, please be kind :-)
gzFile_s is not defined by zlib 1.2.5 on macOS 10.11 El Capitan.
It is defined by zlib 1.2.11 on Homebrew.
Better to use gzFile rather than gzFile_s* if possible.
❯❯❯ make
g++ -std=c++11 -Wall -Ofast -msse4.2 -D__SSE4_2_ -m64 -I/Users/sjackman/.homebrew/include -I/Users/sjackman/.homebrew/opt/openssl/include -I. -Wno-unused-result -Wno-strict-aliasing -Wno-unused-function main.cc -c -o main.o
In file included from main.cc:49:
./reader.h:35:21: error: unknown type name 'gzFile_s'; did you mean 'gzFile'?
reader(FILE *in, gzFile_s *in_gzip, BZFILE *in_b...
^~~~~~~~
gzFile
/usr/include/zlib.h:1172:15: note: 'gzFile' declared here
typedef voidp gzFile; /* opaque gzip file descriptor */
^
In file included from main.cc:49:
What functionality from SSL is being used in this package? Offhand the only thing I could think of is a hash function from the SSL crypto stuff. But the package uses one of the murmur hashes.
Digging through the source code, the only reference to SSL I could find was in hashutil.h, where openssl/evp.h is #included. But I can't see where that is made use of.
(this is consistent with the guess that SSL is, or was, supplying a hash function)
So I commented out that SSL #include and removed -lssl and -lcrypto from the Makefile. As best I can tell, things built OK. Count and query appear to work fine on the test example (and another test I contrived). Inner-product segfaults for me (well, most of the time; occasionally it doesn't segfault but reports a result just below 2^64). I can't find anything that would attribute this to missing SSL.
FWIW I'm on a reasonably recent Mac (and I did read issue #10), and am building from yesterday's repo.
But getting back to my original question -- is SSL really used for anything?
Hey,
I tried to understand which kmers are counted by squeakr-count. I noticed that only the first k bases from the input reads are counted, and the kmers further down the reads are discarded. Also sometimes the reverse complement kmer is counted.
Is this within the specifications of squeakr-count?
Cheers!
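For comparison, here is a minimal sketch of what many k-mer counters do: slide a window over the whole read and count each k-mer in canonical form, meaning whichever of the k-mer and its reverse complement sorts first. Whether squeakr-count follows exactly this convention is not confirmed in this thread. The helper names and the lexicographic tie-break are assumptions introduced here.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Return the reverse complement of an uppercase DNA string."""
    return kmer.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def kmers(read, k):
    """Yield every length-k window of the read, skipping windows with non-ACGT bases."""
    for start in range(len(read) - k + 1):
        window = read[start:start + k]
        if set(window) <= set("ACGT"):
            yield window


def count_canonical_kmers(reads, k):
    """Count canonical k-mers over all reads."""
    counts = Counter()
    for read in reads:
        counts.update(canonical(window) for window in kmers(read, k))
    return counts


if __name__ == "__main__":
    for kmer, count in sorted(count_canonical_kmers(["ACGTACGT", "TTTT"], 4).items()):
        print(kmer, count)
```

Under this convention, seeing a reverse-complement k-mer in the output is expected, while a read of length n contributing fewer than n - k + 1 windows (apart from windows with ambiguous bases) would not be.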
❯❯❯ ./main 0 20 1 test.fa
Reading from the fastq file and inserting in the QF
[1] 62215 segmentation fault ./main 0 20 1 test.fa