asc's People

Contributors

anyks

asc's Issues

Slow corpus reading

Hello

Best LM ever. I feel very lucky that I decided to look for something other than KenLM/SRILM.
Thanks for your work!

I have 10 GB of corpora. But when I tried to load just one 58 MB file (1,262,705 one-line sentences), I got a 30-minute estimate for loading the text corpus. Why is that?

And how can I process the full 10 GB?

Here is the command:

asc -alphabet "abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюя" -size 3 -smoothing wittenbell -method train -debug 1 -w-arpa ./lm.arpa -w-map ./lm.map -w-vocab ./lm.vocab -w-ngram ./lm.ngrams -allow-unk -interpolate -corpus ./corpus.txt
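
To put the numbers above in perspective, here is a rough back-of-the-envelope sketch. It assumes that loading time scales linearly with corpus size, which is only an assumption and not something measured with asc; the function name `extrapolate_minutes` is made up for this illustration.

```python
def extrapolate_minutes(sample_mb, sample_minutes, target_mb):
    """Scale a load-time estimate linearly with corpus size (an assumption)."""
    return sample_minutes * (target_mb / sample_mb)


# 58 MB took an estimated 30 minutes; extrapolate to 10 GB (10 * 1024 MB).
minutes = extrapolate_minutes(sample_mb=58, sample_minutes=30, target_mb=10 * 1024)
print(f"{minutes / 60:.1f} hours")  # prints "88.3 hours"
```

If the linear assumption holds, the full 10 GB would need several days of loading alone, which is why the reading speed matters here.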

P.S. I honestly don't get why you have so few stars on GitHub and zero exposure.

Cannot link on Ubuntu 21.04

/usr/bin/ld: cannot find -ltcmalloc

$ apt search tcmalloc
Sorting... Done
Full Text Search... Done
libtcmalloc-minimal4/hirsute,now 2.9.1-0ubuntu1 amd64 [installed,automatic]
  efficient thread-caching malloc
$ locate tcmalloc
/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4
/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4.5.9
/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal_debug.so.4
/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal_debug.so.4.5.9
...

CMake does not detect the missing tcmalloc dependency, so the problem only shows up at link time.
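
A possible reading of the output above: `-ltcmalloc` makes the linker look for an unversioned `libtcmalloc.so` (or a static `libtcmalloc.a`), while the listing only shows versioned runtime files of the minimal variant. The sketch below checks that against the lines that were pasted (the listing is truncated, so this covers only what is shown); `linker_can_resolve` is a hypothetical helper, not part of asc or CMake.

```python
import posixpath


def linker_can_resolve(lib_name, paths):
    """True if a path is the unversioned .so or the .a that `-l<lib_name>` needs."""
    wanted = {f"lib{lib_name}.so", f"lib{lib_name}.a"}
    return any(posixpath.basename(p) in wanted for p in paths)


# Paths copied from the `locate tcmalloc` output in this issue.
located = [
    "/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4",
    "/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4.5.9",
    "/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal_debug.so.4",
    "/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal_debug.so.4.5.9",
]

print(linker_can_resolve("tcmalloc", located))  # prints "False"
```

On Ubuntu the unversioned symlinks normally come from a development package (to my knowledge `libgoogle-perftools-dev`), so installing it may be enough to get past this error.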

cannot install the project on windows: an encoding problem

When I try to install the Python version of the project on Windows 10 with Python 3.9.6, I run into the following error:

C:\Users\david>pip install anyks-sc
Collecting anyks-sc
  Using cached anyks-sc-1.2.4.tar.gz (549 kB)
    ERROR: Command errored out with exit status 1:
     command: 'c:\users\david\appdata\local\programs\python\python39\python.exe' -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\david\\AppData\\Local\\Temp\\pip-install-i93g79m6\\anyks-sc_0d79c8a2239c4f54a0ea206090b228bd\\setup.py'"'"'; __file__='"'"'C:\\Users\\david\\AppData\\Local\\Temp\\pip-install-i93g79m6\\anyks-sc_0d79c8a2239c4f54a0ea206090b228bd\\setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base 'C:\Users\david\AppData\Local\Temp\pip-pip-egg-info-j03wzy3a'
         cwd: C:\Users\david\AppData\Local\Temp\pip-install-i93g79m6\anyks-sc_0d79c8a2239c4f54a0ea206090b228bd\
    Complete output (7 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\david\AppData\Local\Temp\pip-install-i93g79m6\anyks-sc_0d79c8a2239c4f54a0ea206090b228bd\setup.py", line 11, in <module>
        description = fh.read()
      File "c:\users\david\appdata\local\programs\python\python39\lib\encodings\cp1251.py", line 23, in decode
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x98 in position 7669: character maps to <undefined>

I don't even know how to approach it.
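
One way to approach it: the traceback shows `setup.py` calling `fh.read()` and Python decoding with `cp1251`, which is what `open()` falls back to on this machine when no encoding is given. Byte 0x98 is undefined in cp1251 but is a normal continuation byte in UTF-8, so the file being read is probably UTF-8. The sketch below reproduces the failure on a temporary file and shows that passing `encoding="utf-8"` explicitly avoids it; the file name and sample text are made up for the demonstration.

```python
import os
import tempfile

text = "Исправление"  # 'И' is encoded as bytes D0 98 in UTF-8

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "README.md")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)

    # Reading with the legacy code page fails on byte 0x98.
    try:
        with open(path, encoding="cp1251") as fh:
            fh.read()
        legacy_failed = False
    except UnicodeDecodeError:
        legacy_failed = True

    # Reading with an explicit UTF-8 encoding works regardless of locale.
    with open(path, encoding="utf-8") as fh:
        description = fh.read()

print(legacy_failed, description == text)  # prints "True True"
```

Until the package itself specifies the encoding, a possible workaround is to enable Python's UTF-8 mode before installing (for example `set PYTHONUTF8=1` in the same console); whether that is sufficient for this package has not been verified here.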

An error when compiling the project using pip

When I try to install the project with pip (pip install anyks-sc), I run into the following error:

Building wheels for collected packages: anyks-sc
  Building wheel for anyks-sc (setup.py) ... error
  ERROR: Command errored out with exit status 1:
   command: /home/dale/p3/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-v4p9f9r4/anyks-sc_0d2fbb58093c43628ca7c7c33f923dd3/setup.py'"'"'; __file__='"'"'/tmp/pip-install-v4p9f9r4/anyks-sc_0d2fbb58093c43628ca7c7c33f923dd3/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-nbapg_gq
       cwd: /tmp/pip-install-v4p9f9r4/anyks-sc_0d2fbb58093c43628ca7c7c33f923dd3/
  Complete output (71 lines):
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-3.7
  creating build/lib.linux-x86_64-3.7/asc_pkg
  copying asc_pkg/__init__.py -> build/lib.linux-x86_64-3.7/asc_pkg
  running egg_info
  writing anyks_sc.egg-info/PKG-INFO
  writing dependency_links to anyks_sc.egg-info/dependency_links.txt
  writing top-level names to anyks_sc.egg-info/top_level.txt
  reading manifest file 'anyks_sc.egg-info/SOURCES.txt'
  writing manifest file 'anyks_sc.egg-info/SOURCES.txt'
  running build_ext
  building 'asc' extension
  creating build/temp.linux-x86_64-3.7
  creating build/temp.linux-x86_64-3.7/asc_pkg
  creating build/temp.linux-x86_64-3.7/asc_pkg/alm
  creating build/temp.linux-x86_64-3.7/asc_pkg/alm/src
  creating build/temp.linux-x86_64-3.7/asc_pkg/alm/contrib
  creating build/temp.linux-x86_64-3.7/asc_pkg/alm/contrib/src
  creating build/temp.linux-x86_64-3.7/asc_pkg/alm/contrib/src/cityhash
  creating build/temp.linux-x86_64-3.7/asc_pkg/alm/contrib/src/bigint
  creating build/temp.linux-x86_64-3.7/asc_pkg/asc
  creating build/temp.linux-x86_64-3.7/asc_pkg/asc/src
  creating build/temp.linux-x86_64-3.7/asc_pkg/asc/contrib
  creating build/temp.linux-x86_64-3.7/asc_pkg/asc/contrib/src
  creating build/temp.linux-x86_64-3.7/asc_pkg/asc/contrib/src/bloom
  creating build/temp.linux-x86_64-3.7/asc_pkg/asc/contrib/src/hnswlib
  gcc -pthread -B /home/dale/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I./asc_pkg -I./asc_pkg/alm -I./asc_pkg/asc -I./asc_pkg/alm/include -I./asc_pkg/asc/include -I./asc_pkg/alm/contrib/include -I./asc_pkg/asc/contrib/include -I/home/dale/p3/lib/python3.7/site-packages/pybind11/include -I/home/dale/p3/include -I/home/dale/anaconda3/include/python3.7m -c ./asc_pkg/alm/src/idw.cpp -o build/temp.linux-x86_64-3.7/./asc_pkg/alm/src/idw.o -std=c++11 -O2 -fno-permissive -Wno-pedantic -Wno-unknown-attributes -DNOPYTHON
  cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
  cc1plus: warning: unrecognized command line option ‘-Wno-unknown-attributes’
  gcc -pthread -B /home/dale/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I./asc_pkg -I./asc_pkg/alm -I./asc_pkg/asc -I./asc_pkg/alm/include -I./asc_pkg/asc/include -I./asc_pkg/alm/contrib/include -I./asc_pkg/asc/contrib/include -I/home/dale/p3/lib/python3.7/site-packages/pybind11/include -I/home/dale/p3/include -I/home/dale/anaconda3/include/python3.7m -c ./asc_pkg/alm/src/nwt.cpp -o build/temp.linux-x86_64-3.7/./asc_pkg/alm/src/nwt.o -std=c++11 -O2 -fno-permissive -Wno-pedantic -Wno-unknown-attributes -DNOPYTHON
  cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
  cc1plus: warning: unrecognized command line option ‘-Wno-unknown-attributes’
  gcc -pthread -B /home/dale/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I./asc_pkg -I./asc_pkg/alm -I./asc_pkg/asc -I./asc_pkg/alm/include -I./asc_pkg/asc/include -I./asc_pkg/alm/contrib/include -I./asc_pkg/asc/contrib/include -I/home/dale/p3/lib/python3.7/site-packages/pybind11/include -I/home/dale/p3/include -I/home/dale/anaconda3/include/python3.7m -c ./asc_pkg/alm/src/arpa.cpp -o build/temp.linux-x86_64-3.7/./asc_pkg/alm/src/arpa.o -std=c++11 -O2 -fno-permissive -Wno-pedantic -Wno-unknown-attributes -DNOPYTHON
  cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
  cc1plus: warning: unrecognized command line option ‘-Wno-unknown-attributes’
  gcc -pthread -B /home/dale/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I./asc_pkg -I./asc_pkg/alm -I./asc_pkg/asc -I./asc_pkg/alm/include -I./asc_pkg/asc/include -I./asc_pkg/alm/contrib/include -I./asc_pkg/asc/contrib/include -I/home/dale/p3/lib/python3.7/site-packages/pybind11/include -I/home/dale/p3/include -I/home/dale/anaconda3/include/python3.7m -c ./asc_pkg/alm/src/python.cpp -o build/temp.linux-x86_64-3.7/./asc_pkg/alm/src/python.o -std=c++11 -O2 -fno-permissive -Wno-pedantic -Wno-unknown-attributes -DNOPYTHON
  cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
  cc1plus: warning: unrecognized command line option ‘-Wno-unknown-attributes’
  gcc -pthread -B /home/dale/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I./asc_pkg -I./asc_pkg/alm -I./asc_pkg/asc -I./asc_pkg/alm/include -I./asc_pkg/asc/include -I./asc_pkg/alm/contrib/include -I./asc_pkg/asc/contrib/include -I/home/dale/p3/lib/python3.7/site-packages/pybind11/include -I/home/dale/p3/include -I/home/dale/anaconda3/include/python3.7m -c ./asc_pkg/alm/src/alphabet.cpp -o build/temp.linux-x86_64-3.7/./asc_pkg/alm/src/alphabet.o -std=c++11 -O2 -fno-permissive -Wno-pedantic -Wno-unknown-attributes -DNOPYTHON
  cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
  cc1plus: warning: unrecognized command line option ‘-Wno-unknown-attributes’
  gcc -pthread -B /home/dale/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I./asc_pkg -I./asc_pkg/alm -I./asc_pkg/asc -I./asc_pkg/alm/include -I./asc_pkg/asc/include -I./asc_pkg/alm/contrib/include -I./asc_pkg/asc/contrib/include -I/home/dale/p3/lib/python3.7/site-packages/pybind11/include -I/home/dale/p3/include -I/home/dale/anaconda3/include/python3.7m -c ./asc_pkg/alm/src/alm.cpp -o build/temp.linux-x86_64-3.7/./asc_pkg/alm/src/alm.o -std=c++11 -O2 -fno-permissive -Wno-pedantic -Wno-unknown-attributes -DNOPYTHON
  cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
  cc1plus: warning: unrecognized command line option ‘-Wno-unknown-attributes’
  gcc -pthread -B /home/dale/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I./asc_pkg -I./asc_pkg/alm -I./asc_pkg/asc -I./asc_pkg/alm/include -I./asc_pkg/asc/include -I./asc_pkg/alm/contrib/include -I./asc_pkg/asc/contrib/include -I/home/dale/p3/lib/python3.7/site-packages/pybind11/include -I/home/dale/p3/include -I/home/dale/anaconda3/include/python3.7m -c ./asc_pkg/alm/src/alm1.cpp -o build/temp.linux-x86_64-3.7/./asc_pkg/alm/src/alm1.o -std=c++11 -O2 -fno-permissive -Wno-pedantic -Wno-unknown-attributes -DNOPYTHON
  cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
  cc1plus: warning: unrecognized command line option ‘-Wno-unknown-attributes’
  gcc -pthread -B /home/dale/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I./asc_pkg -I./asc_pkg/alm -I./asc_pkg/asc -I./asc_pkg/alm/include -I./asc_pkg/asc/include -I./asc_pkg/alm/contrib/include -I./asc_pkg/asc/contrib/include -I/home/dale/p3/lib/python3.7/site-packages/pybind11/include -I/home/dale/p3/include -I/home/dale/anaconda3/include/python3.7m -c ./asc_pkg/alm/src/alm2.cpp -o build/temp.linux-x86_64-3.7/./asc_pkg/alm/src/alm2.o -std=c++11 -O2 -fno-permissive -Wno-pedantic -Wno-unknown-attributes -DNOPYTHON
  cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
  cc1plus: warning: unrecognized command line option ‘-Wno-unknown-attributes’
  gcc -pthread -B /home/dale/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I./asc_pkg -I./asc_pkg/alm -I./asc_pkg/asc -I./asc_pkg/alm/include -I./asc_pkg/asc/include -I./asc_pkg/alm/contrib/include -I./asc_pkg/asc/contrib/include -I/home/dale/p3/lib/python3.7/site-packages/pybind11/include -I/home/dale/p3/include -I/home/dale/anaconda3/include/python3.7m -c ./asc_pkg/alm/src/tokenizer.cpp -o build/temp.linux-x86_64-3.7/./asc_pkg/alm/src/tokenizer.o -std=c++11 -O2 -fno-permissive -Wno-pedantic -Wno-unknown-attributes -DNOPYTHON
  cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
  cc1plus: warning: unrecognized command line option ‘-Wno-unknown-attributes’
  gcc -pthread -B /home/dale/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I./asc_pkg -I./asc_pkg/alm -I./asc_pkg/asc -I./asc_pkg/alm/include -I./asc_pkg/asc/include -I./asc_pkg/alm/contrib/include -I./asc_pkg/asc/contrib/include -I/home/dale/p3/lib/python3.7/site-packages/pybind11/include -I/home/dale/p3/include -I/home/dale/anaconda3/include/python3.7m -c ./asc_pkg/alm/src/toolkit.cpp -o build/temp.linux-x86_64-3.7/./asc_pkg/alm/src/toolkit.o -std=c++11 -O2 -fno-permissive -Wno-pedantic -Wno-unknown-attributes -DNOPYTHON
  cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
  cc1plus: warning: unrecognized command line option ‘-Wno-unknown-attributes’
  gcc -pthread -B /home/dale/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I./asc_pkg -I./asc_pkg/alm -I./asc_pkg/asc -I./asc_pkg/alm/include -I./asc_pkg/asc/include -I./asc_pkg/alm/contrib/include -I./asc_pkg/asc/contrib/include -I/home/dale/p3/lib/python3.7/site-packages/pybind11/include -I/home/dale/p3/include -I/home/dale/anaconda3/include/python3.7m -c ./asc_pkg/alm/src/levenshtein.cpp -o build/temp.linux-x86_64-3.7/./asc_pkg/alm/src/levenshtein.o -std=c++11 -O2 -fno-permissive -Wno-pedantic -Wno-unknown-attributes -DNOPYTHON
  cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
  cc1plus: warning: unrecognized command line option ‘-Wno-unknown-attributes’
  gcc -pthread -B /home/dale/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I./asc_pkg -I./asc_pkg/alm -I./asc_pkg/asc -I./asc_pkg/alm/include -I./asc_pkg/asc/include -I./asc_pkg/alm/contrib/include -I./asc_pkg/asc/contrib/include -I/home/dale/p3/lib/python3.7/site-packages/pybind11/include -I/home/dale/p3/include -I/home/dale/anaconda3/include/python3.7m -c ./asc_pkg/alm/src/ablm.cpp -o build/temp.linux-x86_64-3.7/./asc_pkg/alm/src/ablm.o -std=c++11 -O2 -fno-permissive -Wno-pedantic -Wno-unknown-attributes -DNOPYTHON
  cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
  In file included from ./asc_pkg/alm/include/ablm.hpp:25:0,
                   from ./asc_pkg/alm/src/ablm.cpp:9:
  ./asc_pkg/alm/include/aspl.hpp:27:10: fatal error: openssl/md5.h: No such file or directory
   #include <openssl/md5.h>
            ^~~~~~~~~~~~~~~
  compilation terminated.
  error: command 'gcc' failed with exit status 1
  ----------------------------------------
  ERROR: Failed building wheel for anyks-sc

The build stops because openssl/md5.h cannot be found, so there is probably another undeclared dependency, this time a C/C++ one (the OpenSSL development headers).

I use Python 3.7.4 on Ubuntu 18.04.5 LTS.
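
A quick way to confirm the diagnosis is to check whether the header exists in the usual include directories. The sketch below does that; the directory list is an assumption about a typical Linux layout, and `find_header` is a hypothetical helper.

```python
import os


def find_header(header, include_dirs):
    """Return the first existing path for `header` under `include_dirs`, or None."""
    for directory in include_dirs:
        candidate = os.path.join(directory, header)
        if os.path.isfile(candidate):
            return candidate
    return None


# Typical default include directories on Linux (an assumption; adjust as needed).
DEFAULT_DIRS = ["/usr/include", "/usr/local/include"]

found = find_header("openssl/md5.h", DEFAULT_DIRS)
print(found or "openssl/md5.h not found in the default include directories")
```

If the header is missing, on Ubuntu it is normally provided by the `libssl-dev` package. Note that the gcc command in the log also passes Anaconda include paths, so a header installed inside the conda environment might be picked up as well.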

implicit dependency on pybind11

When trying to install the Python version (with pip install anyks-sc), I run into the following error:

Collecting anyks-sc
  Downloading anyks-sc-1.2.4.tar.gz (549 kB)
     |████████████████████████████████| 549 kB 1.9 MB/s 
    ERROR: Command errored out with exit status 1:
     command: /home/dale/p3/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-d20041xb/anyks-sc_ea24e87779224313b08ac1dc45c32e3a/setup.py'"'"'; __file__='"'"'/tmp/pip-install-d20041xb/anyks-sc_ea24e87779224313b08ac1dc45c32e3a/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-5_tv200w
         cwd: /tmp/pip-install-d20041xb/anyks-sc_ea24e87779224313b08ac1dc45c32e3a/
    Complete output (5 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-d20041xb/anyks-sc_ea24e87779224313b08ac1dc45c32e3a/setup.py", line 6, in <module>
        import pybind11
    ModuleNotFoundError: No module named 'pybind11'

Apparently, pybind11 should be included as a Python dependency.
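
The traceback shows `import pybind11` at the top of `setup.py`, so the import runs before pip has had a chance to install anything. A common pattern for this situation is to defer the import until the include path is actually needed. The sketch below illustrates that idea only; `LazyInclude` is a hypothetical helper and not the project's actual code, and whether a given setuptools version accepts such an object in `include_dirs` is an assumption to verify.

```python
import importlib


class LazyInclude:
    """Resolve a package's header directory only when converted to str."""

    def __init__(self, module_name):
        self.module_name = module_name

    def __str__(self):
        # The import happens here, not when setup.py is first loaded.
        module = importlib.import_module(self.module_name)
        return module.get_include()


# Creating the object does not import pybind11; str(pybind_include) would.
pybind_include = LazyInclude("pybind11")
```

The declarative alternative is to list pybind11 as a build requirement in `pyproject.toml`, so that pip installs it before running `setup.py`. Until either change is made upstream, running `pip install pybind11` before `pip install anyks-sc` should get past this particular error.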
