anyks / asc
ANYKS Spell-Checker
License: MIT License
Hello
Best LM ever. I feel very lucky that I decided to search for something other than KenLM/SRILM.
Thanks for your work!
I have 10 GB of corpora. But when I tried to load just a 58 MB one (1,262,705 one-line entries), I got a 30-minute estimate for loading the text corpus. Why is that?
And how could I process my 10 GB?
Here is the command:
asc -alphabet "abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюя" -size 3 -smoothing wittenbell -method train -debug 1 -w-arpa ./lm.arpa -w-map ./lm.map -w-vocab ./lm.vocab -w-ngram ./lm.ngrams -allow-unk -interpolate -corpus ./corpus.txt
P.S. I honestly don't get why you have so few stars on GitHub and zero exposure.
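To put the numbers above in perspective, here is a rough back-of-the-envelope sketch. It assumes that loading time scales linearly with corpus size, which may not hold for asc in practice. The helper name `extrapolate_hours` is made up for illustration.

```python
def extrapolate_hours(sample_mb, sample_minutes, target_mb):
    """Linearly scale a measured loading time to a larger corpus (assumption: linear scaling)."""
    minutes_per_mb = sample_minutes / sample_mb
    return target_mb * minutes_per_mb / 60


# 58 MB took an estimated 30 minutes; the full corpus is 10 GB (taken as 10 * 1024 MB).
hours = extrapolate_hours(sample_mb=58, sample_minutes=30, target_mb=10 * 1024)
print(f"Estimated loading time for 10 GB: about {hours:.0f} hours")
```

Under that assumption the full corpus would need roughly 88 hours just to load, which is why the 30-minute estimate for the small file is a concern.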
/usr/bin/ld: cannot find -ltcmalloc
$ apt search tcmalloc
Sorting... Done
Full Text Search... Done
libtcmalloc-minimal4/hirsute,now 2.9.1-0ubuntu1 amd64 [installed,automatic]
efficient thread-caching malloc
$ locate tcmalloc
/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4
/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4.5.9
/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal_debug.so.4
/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal_debug.so.4.5.9
...
CMake does not check for the tcmalloc dependency, so the failure only shows up at link time.
When I try to install the Python version of the project on Windows 10 with Python 3.9.6, I run into the following error:
C:\Users\david>pip install anyks-sc
Collecting anyks-sc
Using cached anyks-sc-1.2.4.tar.gz (549 kB)
ERROR: Command errored out with exit status 1:
command: 'c:\users\david\appdata\local\programs\python\python39\python.exe' -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\david\\AppData\\Local\\Temp\\pip-install-i93g79m6\\anyks-sc_0d79c8a2239c4f54a0ea206090b228bd\\setup.py'"'"'; __file__='"'"'C:\\Users\\david\\AppData\\Local\\Temp\\pip-install-i93g79m6\\anyks-sc_0d79c8a2239c4f54a0ea206090b228bd\\setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base 'C:\Users\david\AppData\Local\Temp\pip-pip-egg-info-j03wzy3a'
cwd: C:\Users\david\AppData\Local\Temp\pip-install-i93g79m6\anyks-sc_0d79c8a2239c4f54a0ea206090b228bd\
Complete output (7 lines):
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\david\AppData\Local\Temp\pip-install-i93g79m6\anyks-sc_0d79c8a2239c4f54a0ea206090b228bd\setup.py", line 11, in <module>
description = fh.read()
File "c:\users\david\appdata\local\programs\python\python39\lib\encodings\cp1251.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x98 in position 7669: character maps to <undefined>
I don't even know how to approach it.
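The traceback suggests that `setup.py` reads a text file with the system default codec (cp1251 here), and byte 0x98 is undefined in that code page. A minimal sketch of the usual remedy, assuming the file being read is UTF-8 encoded; the file name `README.md` and the helper `read_long_description` are hypothetical and not taken from the package:

```python
import tempfile
from pathlib import Path


def read_long_description(path):
    """Read the description file as UTF-8 regardless of the system locale."""
    with open(path, encoding="utf-8") as fh:
        return fh.read()


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical README containing Cyrillic text; "И" encodes to bytes D0 98 in UTF-8.
    readme = Path(tmp) / "README.md"
    readme.write_bytes("ANYKS Spell-Checker: Исправление опечаток".encode("utf-8"))

    # Explicit UTF-8 decoding works on any platform.
    text = read_long_description(readme)

    # Decoding the same bytes as cp1251 reproduces the reported failure.
    try:
        readme.read_text(encoding="cp1251")
        cp1251_failed = False
    except UnicodeDecodeError:
        cp1251_failed = True

print(text)
print("cp1251 decoding failed:", cp1251_failed)
```

Since the fix belongs in the package's own `setup.py`, a possible user-side workaround is to enable Python's UTF-8 mode (for example by setting the `PYTHONUTF8=1` environment variable) before running pip.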
When I try to install the project with pip (pip install anyks-sc), I run into the following error:
Building wheels for collected packages: anyks-sc
Building wheel for anyks-sc (setup.py) ... error
ERROR: Command errored out with exit status 1:
command: /home/dale/p3/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-v4p9f9r4/anyks-sc_0d2fbb58093c43628ca7c7c33f923dd3/setup.py'"'"'; __file__='"'"'/tmp/pip-install-v4p9f9r4/anyks-sc_0d2fbb58093c43628ca7c7c33f923dd3/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-nbapg_gq
cwd: /tmp/pip-install-v4p9f9r4/anyks-sc_0d2fbb58093c43628ca7c7c33f923dd3/
Complete output (71 lines):
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.7
creating build/lib.linux-x86_64-3.7/asc_pkg
copying asc_pkg/__init__.py -> build/lib.linux-x86_64-3.7/asc_pkg
running egg_info
writing anyks_sc.egg-info/PKG-INFO
writing dependency_links to anyks_sc.egg-info/dependency_links.txt
writing top-level names to anyks_sc.egg-info/top_level.txt
reading manifest file 'anyks_sc.egg-info/SOURCES.txt'
writing manifest file 'anyks_sc.egg-info/SOURCES.txt'
running build_ext
building 'asc' extension
creating build/temp.linux-x86_64-3.7
creating build/temp.linux-x86_64-3.7/asc_pkg
creating build/temp.linux-x86_64-3.7/asc_pkg/alm
creating build/temp.linux-x86_64-3.7/asc_pkg/alm/src
creating build/temp.linux-x86_64-3.7/asc_pkg/alm/contrib
creating build/temp.linux-x86_64-3.7/asc_pkg/alm/contrib/src
creating build/temp.linux-x86_64-3.7/asc_pkg/alm/contrib/src/cityhash
creating build/temp.linux-x86_64-3.7/asc_pkg/alm/contrib/src/bigint
creating build/temp.linux-x86_64-3.7/asc_pkg/asc
creating build/temp.linux-x86_64-3.7/asc_pkg/asc/src
creating build/temp.linux-x86_64-3.7/asc_pkg/asc/contrib
creating build/temp.linux-x86_64-3.7/asc_pkg/asc/contrib/src
creating build/temp.linux-x86_64-3.7/asc_pkg/asc/contrib/src/bloom
creating build/temp.linux-x86_64-3.7/asc_pkg/asc/contrib/src/hnswlib
gcc -pthread -B /home/dale/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I./asc_pkg -I./asc_pkg/alm -I./asc_pkg/asc -I./asc_pkg/alm/include -I./asc_pkg/asc/include -I./asc_pkg/alm/contrib/include -I./asc_pkg/asc/contrib/include -I/home/dale/p3/lib/python3.7/site-packages/pybind11/include -I/home/dale/p3/include -I/home/dale/anaconda3/include/python3.7m -c ./asc_pkg/alm/src/idw.cpp -o build/temp.linux-x86_64-3.7/./asc_pkg/alm/src/idw.o -std=c++11 -O2 -fno-permissive -Wno-pedantic -Wno-unknown-attributes -DNOPYTHON
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
cc1plus: warning: unrecognized command line option ‘-Wno-unknown-attributes’
gcc -pthread -B /home/dale/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I./asc_pkg -I./asc_pkg/alm -I./asc_pkg/asc -I./asc_pkg/alm/include -I./asc_pkg/asc/include -I./asc_pkg/alm/contrib/include -I./asc_pkg/asc/contrib/include -I/home/dale/p3/lib/python3.7/site-packages/pybind11/include -I/home/dale/p3/include -I/home/dale/anaconda3/include/python3.7m -c ./asc_pkg/alm/src/nwt.cpp -o build/temp.linux-x86_64-3.7/./asc_pkg/alm/src/nwt.o -std=c++11 -O2 -fno-permissive -Wno-pedantic -Wno-unknown-attributes -DNOPYTHON
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
cc1plus: warning: unrecognized command line option ‘-Wno-unknown-attributes’
gcc -pthread -B /home/dale/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I./asc_pkg -I./asc_pkg/alm -I./asc_pkg/asc -I./asc_pkg/alm/include -I./asc_pkg/asc/include -I./asc_pkg/alm/contrib/include -I./asc_pkg/asc/contrib/include -I/home/dale/p3/lib/python3.7/site-packages/pybind11/include -I/home/dale/p3/include -I/home/dale/anaconda3/include/python3.7m -c ./asc_pkg/alm/src/arpa.cpp -o build/temp.linux-x86_64-3.7/./asc_pkg/alm/src/arpa.o -std=c++11 -O2 -fno-permissive -Wno-pedantic -Wno-unknown-attributes -DNOPYTHON
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
cc1plus: warning: unrecognized command line option ‘-Wno-unknown-attributes’
gcc -pthread -B /home/dale/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I./asc_pkg -I./asc_pkg/alm -I./asc_pkg/asc -I./asc_pkg/alm/include -I./asc_pkg/asc/include -I./asc_pkg/alm/contrib/include -I./asc_pkg/asc/contrib/include -I/home/dale/p3/lib/python3.7/site-packages/pybind11/include -I/home/dale/p3/include -I/home/dale/anaconda3/include/python3.7m -c ./asc_pkg/alm/src/python.cpp -o build/temp.linux-x86_64-3.7/./asc_pkg/alm/src/python.o -std=c++11 -O2 -fno-permissive -Wno-pedantic -Wno-unknown-attributes -DNOPYTHON
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
cc1plus: warning: unrecognized command line option ‘-Wno-unknown-attributes’
gcc -pthread -B /home/dale/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I./asc_pkg -I./asc_pkg/alm -I./asc_pkg/asc -I./asc_pkg/alm/include -I./asc_pkg/asc/include -I./asc_pkg/alm/contrib/include -I./asc_pkg/asc/contrib/include -I/home/dale/p3/lib/python3.7/site-packages/pybind11/include -I/home/dale/p3/include -I/home/dale/anaconda3/include/python3.7m -c ./asc_pkg/alm/src/alphabet.cpp -o build/temp.linux-x86_64-3.7/./asc_pkg/alm/src/alphabet.o -std=c++11 -O2 -fno-permissive -Wno-pedantic -Wno-unknown-attributes -DNOPYTHON
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
cc1plus: warning: unrecognized command line option ‘-Wno-unknown-attributes’
gcc -pthread -B /home/dale/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I./asc_pkg -I./asc_pkg/alm -I./asc_pkg/asc -I./asc_pkg/alm/include -I./asc_pkg/asc/include -I./asc_pkg/alm/contrib/include -I./asc_pkg/asc/contrib/include -I/home/dale/p3/lib/python3.7/site-packages/pybind11/include -I/home/dale/p3/include -I/home/dale/anaconda3/include/python3.7m -c ./asc_pkg/alm/src/alm.cpp -o build/temp.linux-x86_64-3.7/./asc_pkg/alm/src/alm.o -std=c++11 -O2 -fno-permissive -Wno-pedantic -Wno-unknown-attributes -DNOPYTHON
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
cc1plus: warning: unrecognized command line option ‘-Wno-unknown-attributes’
gcc -pthread -B /home/dale/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I./asc_pkg -I./asc_pkg/alm -I./asc_pkg/asc -I./asc_pkg/alm/include -I./asc_pkg/asc/include -I./asc_pkg/alm/contrib/include -I./asc_pkg/asc/contrib/include -I/home/dale/p3/lib/python3.7/site-packages/pybind11/include -I/home/dale/p3/include -I/home/dale/anaconda3/include/python3.7m -c ./asc_pkg/alm/src/alm1.cpp -o build/temp.linux-x86_64-3.7/./asc_pkg/alm/src/alm1.o -std=c++11 -O2 -fno-permissive -Wno-pedantic -Wno-unknown-attributes -DNOPYTHON
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
cc1plus: warning: unrecognized command line option ‘-Wno-unknown-attributes’
gcc -pthread -B /home/dale/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I./asc_pkg -I./asc_pkg/alm -I./asc_pkg/asc -I./asc_pkg/alm/include -I./asc_pkg/asc/include -I./asc_pkg/alm/contrib/include -I./asc_pkg/asc/contrib/include -I/home/dale/p3/lib/python3.7/site-packages/pybind11/include -I/home/dale/p3/include -I/home/dale/anaconda3/include/python3.7m -c ./asc_pkg/alm/src/alm2.cpp -o build/temp.linux-x86_64-3.7/./asc_pkg/alm/src/alm2.o -std=c++11 -O2 -fno-permissive -Wno-pedantic -Wno-unknown-attributes -DNOPYTHON
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
cc1plus: warning: unrecognized command line option ‘-Wno-unknown-attributes’
gcc -pthread -B /home/dale/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I./asc_pkg -I./asc_pkg/alm -I./asc_pkg/asc -I./asc_pkg/alm/include -I./asc_pkg/asc/include -I./asc_pkg/alm/contrib/include -I./asc_pkg/asc/contrib/include -I/home/dale/p3/lib/python3.7/site-packages/pybind11/include -I/home/dale/p3/include -I/home/dale/anaconda3/include/python3.7m -c ./asc_pkg/alm/src/tokenizer.cpp -o build/temp.linux-x86_64-3.7/./asc_pkg/alm/src/tokenizer.o -std=c++11 -O2 -fno-permissive -Wno-pedantic -Wno-unknown-attributes -DNOPYTHON
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
cc1plus: warning: unrecognized command line option ‘-Wno-unknown-attributes’
gcc -pthread -B /home/dale/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I./asc_pkg -I./asc_pkg/alm -I./asc_pkg/asc -I./asc_pkg/alm/include -I./asc_pkg/asc/include -I./asc_pkg/alm/contrib/include -I./asc_pkg/asc/contrib/include -I/home/dale/p3/lib/python3.7/site-packages/pybind11/include -I/home/dale/p3/include -I/home/dale/anaconda3/include/python3.7m -c ./asc_pkg/alm/src/toolkit.cpp -o build/temp.linux-x86_64-3.7/./asc_pkg/alm/src/toolkit.o -std=c++11 -O2 -fno-permissive -Wno-pedantic -Wno-unknown-attributes -DNOPYTHON
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
cc1plus: warning: unrecognized command line option ‘-Wno-unknown-attributes’
gcc -pthread -B /home/dale/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I./asc_pkg -I./asc_pkg/alm -I./asc_pkg/asc -I./asc_pkg/alm/include -I./asc_pkg/asc/include -I./asc_pkg/alm/contrib/include -I./asc_pkg/asc/contrib/include -I/home/dale/p3/lib/python3.7/site-packages/pybind11/include -I/home/dale/p3/include -I/home/dale/anaconda3/include/python3.7m -c ./asc_pkg/alm/src/levenshtein.cpp -o build/temp.linux-x86_64-3.7/./asc_pkg/alm/src/levenshtein.o -std=c++11 -O2 -fno-permissive -Wno-pedantic -Wno-unknown-attributes -DNOPYTHON
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
cc1plus: warning: unrecognized command line option ‘-Wno-unknown-attributes’
gcc -pthread -B /home/dale/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I./asc_pkg -I./asc_pkg/alm -I./asc_pkg/asc -I./asc_pkg/alm/include -I./asc_pkg/asc/include -I./asc_pkg/alm/contrib/include -I./asc_pkg/asc/contrib/include -I/home/dale/p3/lib/python3.7/site-packages/pybind11/include -I/home/dale/p3/include -I/home/dale/anaconda3/include/python3.7m -c ./asc_pkg/alm/src/ablm.cpp -o build/temp.linux-x86_64-3.7/./asc_pkg/alm/src/ablm.o -std=c++11 -O2 -fno-permissive -Wno-pedantic -Wno-unknown-attributes -DNOPYTHON
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from ./asc_pkg/alm/include/ablm.hpp:25:0,
from ./asc_pkg/alm/src/ablm.cpp:9:
./asc_pkg/alm/include/aspl.hpp:27:10: fatal error: openssl/md5.h: No such file or directory
#include <openssl/md5.h>
^~~~~~~~~~~~~~~
compilation terminated.
error: command 'gcc' failed with exit status 1
----------------------------------------
ERROR: Failed building wheel for anyks-sc
There is probably another missing dependency, this time on the C/C++ side: the OpenSSL headers (openssl/md5.h).
I use Python 3.7.4 on Ubuntu 18.04.5 LTS.
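A small pre-flight check along these lines could confirm whether the header is present before running pip. This is only a sketch: the include directories listed are assumed defaults for a typical Linux system, and `find_header` is a name made up for illustration.

```python
import os

# Assumed default header locations on a typical Linux system (not taken from the build log).
DEFAULT_INCLUDE_DIRS = ("/usr/include", "/usr/local/include")


def find_header(relative_path, include_dirs=DEFAULT_INCLUDE_DIRS):
    """Return the first existing path to the header, or None if it is not found."""
    for directory in include_dirs:
        candidate = os.path.join(directory, relative_path)
        if os.path.isfile(candidate):
            return candidate
    return None


location = find_header("openssl/md5.h")
if location is None:
    print("openssl/md5.h not found; on Ubuntu it is provided by the libssl-dev package")
else:
    print(f"found {location}")
```

On Ubuntu, installing the libssl-dev package should provide the missing header.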
When trying to install the Python version (with pip install anyks-sc), I run into the following error:
Collecting anyks-sc
Downloading anyks-sc-1.2.4.tar.gz (549 kB)
|████████████████████████████████| 549 kB 1.9 MB/s
ERROR: Command errored out with exit status 1:
command: /home/dale/p3/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-d20041xb/anyks-sc_ea24e87779224313b08ac1dc45c32e3a/setup.py'"'"'; __file__='"'"'/tmp/pip-install-d20041xb/anyks-sc_ea24e87779224313b08ac1dc45c32e3a/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-5_tv200w
cwd: /tmp/pip-install-d20041xb/anyks-sc_ea24e87779224313b08ac1dc45c32e3a/
Complete output (5 lines):
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-d20041xb/anyks-sc_ea24e87779224313b08ac1dc45c32e3a/setup.py", line 6, in <module>
import pybind11
ModuleNotFoundError: No module named 'pybind11'
Apparently, pybind11 should be declared as a Python dependency.
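Until the package declares it, installing pybind11 by hand before anyks-sc works around the error. A minimal sketch of a check for build-time modules, where the helper name `missing_modules` is hypothetical:

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# setup.py imports pybind11 at the top, so it must be installed before pip runs it.
missing = missing_modules(["pybind11"])
if missing:
    print("Install first: pip install " + " ".join(missing))
else:
    print("All build-time modules are available")
```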