texta-tk / texta
Terminology EXtraction and Text Analytics (TEXTA) Toolkit
Home Page: https://git.texta.ee/texta/texta-rest
License: GNU General Public License v3.0
Hi there and thanks for this great initiative!
Unfortunately, I can't import any data.
When I try to do so at http://localhost:8000/dataset_importer/ (by the way, there is no explicit link to this page in the interface) by
choosing simple documents or archives,
selecting the appropriate file in Input data,
naming the dataset,
setting whether to overwrite the dataset,
the job is submitted correctly but processing does not happen.
The following error is logged:
[25/Jun/2018 15:26:04] "GET /static/base/img/bg.jpg HTTP/1.1" 200 34998
Exception ignored in: <module 'threading' from '/usr/local/lib/python3.5/threading.py'>
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/threading.py", line 1351, in _after_fork
    thread._stop()
TypeError: 'Event' object is not callable
[25/Jun/2018 15:26:19] "POST /dataset_importer/import HTTP/1.1" 200 0
Process Process-6:
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/multiprocessing/process.py", line 252, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/texta/dataset_importer/importer/importer.py", line 296, in _import_dataset
    parameter_dict['file_path'] = download(parameter_dict['url'], parameter_dict['directory'])
KeyError: 'url'
[25/Jun/2018 15:26:19] "GET /dataset_importer/reload_table HTTP/1.1" 200 4436
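The KeyError in the traceback comes from indexing parameter_dict['url'] when the submitted job carries no URL. A minimal sketch of one possible guard follows. It rests on assumptions that are not confirmed by the TEXTA source: that an uploaded file's path is already stored under 'file_path', and that download can be passed in as a callable taking (url, directory) as in the traceback. The helper name resolve_file_path is hypothetical.

```python
def resolve_file_path(parameter_dict, download):
    """Return a local file path for an import job (illustrative sketch only).

    `download` is injected so the sketch stays self-contained; the traceback
    shows it being called as download(url, directory).
    """
    # .get() returns None instead of raising KeyError when no URL was submitted.
    url = parameter_dict.get('url')
    if url:
        return download(url, parameter_dict['directory'])
    if 'file_path' in parameter_dict:
        # Assumption: an uploaded file was already saved and its path recorded here.
        return parameter_dict['file_path']
    raise ValueError("import parameters contain neither 'url' nor 'file_path'")
```

With a guard of this kind, a file-upload job without a URL would either reuse the stored path or fail with an explicit message rather than a bare KeyError inside the worker process.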
Searcher doesn't highlight facts if several are listed within one constraint
If Elasticsearch dies after startup with:
ERROR: [1] bootstrap checks failed
max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
one should fix it with something like:
sysctl -w vm.max_map_count=262144
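Before running docker-compose up, the host setting can be checked against the threshold quoted in the error. The sketch below is a hypothetical helper, not part of TEXTA; it assumes a Linux host where the value is exposed at /proc/sys/vm/max_map_count.

```python
REQUIRED_MAX_MAP_COUNT = 262144  # threshold quoted in the bootstrap check error


def read_max_map_count(path="/proc/sys/vm/max_map_count"):
    """Read the raw kernel setting as text (Linux only)."""
    with open(path) as handle:
        return handle.read()


def max_map_count_ok(raw_value, required=REQUIRED_MAX_MAP_COUNT):
    """Return True if the raw setting is at least the required value."""
    return int(raw_value.strip()) >= required
```

On the affected host, max_map_count_ok(read_max_map_count()) would return False until the sysctl command above is run. Note that sysctl -w does not survive a reboot; adding vm.max_map_count=262144 to /etc/sysctl.conf makes the change persistent.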
git clone https://github.com/texta-tk/texta.git
cd texta/docker/
docker-compose pull
docker-compose up
Linux texta-test-2 4.4.0-141-generic #167-Ubuntu SMP Wed Dec 5 10:40:15 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Docker version 18.09.1, build 4c52b90
docker-compose version 1.23.2, build 1110ad01
texta-elastic | [2019-01-16T13:34:13,668][INFO ][o.e.d.DiscoveryModule ] [TEXTA-1] using discovery type [zen]
texta-elastic | [2019-01-16T13:34:14,317][INFO ][o.e.n.Node ] [TEXTA-1] initialized
texta-elastic | [2019-01-16T13:34:14,318][INFO ][o.e.n.Node ] [TEXTA-1] starting ...
texta-elastic | [2019-01-16T13:34:14,512][INFO ][o.e.t.TransportService ] [TEXTA-1] publish_address {192.168.16.2:9300}, bound_addresses {0.0.0.0:9300}
texta-elastic | [2019-01-16T13:34:14,527][INFO ][o.e.b.BootstrapChecks ] [TEXTA-1] bound or publishing to a non-loopback address, enforcing bootstrap checks
texta-elastic | ERROR: [1] bootstrap checks failed
texta-elastic | [1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
texta-elastic | [2019-01-16T13:34:14,538][INFO ][o.e.n.Node ] [TEXTA-1] stopping ...
texta-elastic | [2019-01-16T13:34:14,612][INFO ][o.e.n.Node ] [TEXTA-1] stopped
texta-elastic | [2019-01-16T13:34:14,612][INFO ][o.e.n.Node ] [TEXTA-1] closing ...
texta-elastic | [2019-01-16T13:34:14,627][INFO ][o.e.n.Node ] [TEXTA-1] closed
texta-elastic exited with code 78
Hi there,
TEXTA looks like a great package. However, when I try to install it in my Python 2.7 virtualenv on Ubuntu, pip informs me that there is no Django 2.0.2 for Python 2.7, although this version is explicitly required in requirements.txt.
Would it make sense to try running it with an earlier Django version? Or with Python 3.5? Or what am I missing?
Thanks for a hint and best regards,
Stefan
Hello, FYI,
your project uses the dictor library. The latest dictor version (0.1.1) removes the eval() function from the dictor code for security reasons and includes other changes (see the readme).
The newest version also performs better when parsing large JSON lookups.
Importing a custom context processor throws an import error in production mode (Django 1.10). The error is not present in development mode (tested with Django 1.9 & 1.10). 1434f54#diff-bdf3ecebd8379ca98cc89e545fc90899
Training language model results in traceback:
File " /texta/task_manager/tasks/workers/language_model_worker.py", line 50, in run
iter=int(num_passes)
File " /anaconda3/envs/texta-toolkit/lib/python3.5/site-packages/gensim/models/word2vec.py", line 748, in __init__
fast_version=FAST_VERSION)
File " /anaconda3/envs/texta-toolkit/lib/python3.5/site-packages/gensim/models/base_any2vec.py", line 633, in __init__
end_alpha=self.min_alpha, compute_loss=compute_loss)
File " /anaconda3/envs/texta-toolkit/lib/python3.5/site-packages/gensim/models/word2vec.py", line 856, in train
queue_factor=queue_factor, report_delay=report_delay, compute_loss=compute_loss, callbacks=callbacks)
File " /anaconda3/envs/texta-toolkit/lib/python3.5/site-packages/gensim/models/base_any2vec.py", line 938, in train
queue_factor=queue_factor, report_delay=report_delay, compute_loss=compute_loss, callbacks=callbacks)
File " /anaconda3/envs/texta-toolkit/lib/python3.5/site-packages/gensim/models/base_any2vec.py", line 421, in train
total_words=total_words, **kwargs)
File " /anaconda3/envs/texta-toolkit/lib/python3.5/site-packages/gensim/models/base_any2vec.py", line 1044, in _check_training_sanity
raise RuntimeError("you must first build vocabulary before training the model")
Debugging suggests it might be caused by data being discarded in EsIterator.
First,
response = self.es_m.scroll()
is called.
Then another scroll is called, overwriting the results obtained previously:
response = self.es_m.scroll(scroll_id=scroll_id)
https://github.com/texta-tk/texta/blob/master/task_manager/tools/data_manager.py#L75-L83
To reproduce, add a dataset with fewer than ES_SCROLL_SIZE rows.
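To illustrate the suspected problem, here is a minimal sketch of a scroll loop that handles each page of hits before requesting the next one, so the first page is never overwritten. It is not the EsIterator implementation: FakeEsManager is a hypothetical stand-in for es_m, and the response layout ('_scroll_id', 'hits' -> 'hits') follows the usual Elasticsearch scroll response shape.

```python
class FakeEsManager:
    """Hypothetical stand-in for the ES manager; serves documents in pages."""

    def __init__(self, docs, page_size):
        self._docs = docs
        self._page_size = page_size
        self._pos = 0

    def scroll(self, scroll_id=None):
        # Return the next page; an empty page signals the end of the scroll.
        page = self._docs[self._pos:self._pos + self._page_size]
        self._pos += len(page)
        return {'_scroll_id': 'fake-scroll-id', 'hits': {'hits': page}}


def iter_hits(es_m):
    """Yield every hit, consuming each page before asking for the next."""
    response = es_m.scroll()
    hits = response['hits']['hits']
    while hits:
        for hit in hits:
            yield hit
        # Only now request the next page, after the current one was consumed.
        response = es_m.scroll(scroll_id=response['_scroll_id'])
        hits = response['hits']['hits']
```

With a dataset smaller than the page size, all documents arrive in the first response. A loop that scrolls a second time before reading that response would see only the empty second page, which would match the "you must first build vocabulary" error above.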
The documentation should be updated to better represent the project.
1 - Project Logo
2 - Requirements
3 - Logos of companies/entities using TTK