texta-tk / texta

32.0 6.0 8.0 107.67 MB

Terminology EXtraction and Text Analytics (TEXTA) Toolkit

Home Page: https://git.texta.ee/texta/texta-rest

License: GNU General Public License v3.0

Python 99.63% Shell 0.16% Dockerfile 0.22%

ai artificial-intelligence natural-language-processing nlp nlp-machine-learning textanalytics django python

texta's People

Contributors

cbentes laurii o-github-o taagdev alvarlaigna dawn4seren wmramadan sankeerthrao

texta's Issues

Dataset Importer Error

Hi there and thanks for this great initiative!

Unfortunately, I can't import any data.

When trying to do so on http://localhost:8000/dataset_importer/ (there is no explicit link to this page in the interface BTW), by
choosing simple documents or archives,
selecting the appropriate file in Input data,
naming the dataset,
setting overwrite dataset or not,
the job is correctly submitted but processing does not happen.

The following error is logged:

[25/Jun/2018 15:26:04] "GET /static/base/img/bg.jpg HTTP/1.1" 200 34998
Exception ignored in: <module 'threading' from '/usr/local/lib/python3.5/threading.py'>
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/threading.py", line 1351, in _after_fork
    thread._stop()
TypeError: 'Event' object is not callable
[25/Jun/2018 15:26:19] "POST /dataset_importer/import HTTP/1.1" 200 0
Process Process-6:
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/multiprocessing/process.py", line 252, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/texta/dataset_importer/importer/importer.py", line 296, in _import_dataset
    parameter_dict['file_path'] = download(parameter_dict['url'], parameter_dict['directory'])
KeyError: 'url'
[25/Jun/2018 15:26:19] "GET /dataset_importer/reload_table HTTP/1.1" 200 4436
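
The KeyError in the log suggests that _import_dataset reads parameter_dict['url'] even when no URL was submitted with the import request. Below is a minimal sketch of a guard, not the project's actual fix. The helper name resolve_file_path is hypothetical, and it assumes (unconfirmed by the log) that the path of an uploaded file is already stored under 'file_path'.

```python
def resolve_file_path(parameter_dict, download):
    """Return the local path of the file to import.

    Hypothetical helper: `download` stands in for the importer's own
    download(url, directory) function seen in the traceback.
    """
    # .get() avoids the KeyError when the request carried no URL.
    url = parameter_dict.get('url')
    if url:
        return download(url, parameter_dict['directory'])

    # Assumption: an uploaded file's path is already stored under 'file_path'.
    file_path = parameter_dict.get('file_path')
    if file_path is None:
        raise ValueError("import request has neither 'url' nor 'file_path'")
    return file_path
```

With a guard like this, a request without a URL either falls back to the uploaded file or fails with an explicit message instead of a bare KeyError inside the worker process.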

Fact highlight

The Searcher does not highlight facts when several are listed within one constraint.

update docs: if in docker-compose elastic is looping in start-error

If Elasticsearch dies right after start with:

ERROR: [1] bootstrap checks failed
max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

one should fix it with something like:

sysctl -w vm.max_map_count=262144
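
Note that sysctl -w only changes the value until the next reboot. As a rough sketch, the host setting could be checked before running docker-compose up with something like the following. The function names are illustrative, and the threshold is simply the number quoted in the Elasticsearch error.

```python
REQUIRED_MAX_MAP_COUNT = 262144  # minimum quoted in the Elasticsearch error


def max_map_count_ok(raw_value, required=REQUIRED_MAX_MAP_COUNT):
    """Return True if the raw sysctl value meets the required minimum."""
    return int(raw_value.strip()) >= required


def read_max_map_count(path="/proc/sys/vm/max_map_count"):
    """Read the current value from procfs (Linux hosts only)."""
    with open(path) as handle:
        return handle.read()
```

On a Linux host, max_map_count_ok(read_max_map_count()) returning False means the bootstrap check will fail again.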

git clone https://github.com/texta-tk/texta.git
cd texta/docker/
docker-compose pull
docker-compose up

Linux texta-test-2 4.4.0-141-generic #167-Ubuntu SMP Wed Dec 5 10:40:15 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Docker version 18.09.1, build 4c52b90
docker-compose version 1.23.2, build 1110ad01

texta-elastic | [2019-01-16T13:34:13,668][INFO ][o.e.d.DiscoveryModule ] [TEXTA-1] using discovery type [zen]
texta-elastic | [2019-01-16T13:34:14,317][INFO ][o.e.n.Node ] [TEXTA-1] initialized
texta-elastic | [2019-01-16T13:34:14,318][INFO ][o.e.n.Node ] [TEXTA-1] starting ...
texta-elastic | [2019-01-16T13:34:14,512][INFO ][o.e.t.TransportService ] [TEXTA-1] publish_address {192.168.16.2:9300}, bound_addresses {0.0.0.0:9300}
texta-elastic | [2019-01-16T13:34:14,527][INFO ][o.e.b.BootstrapChecks ] [TEXTA-1] bound or publishing to a non-loopback address, enforcing bootstrap checks
texta-elastic | ERROR: [1] bootstrap checks failed
texta-elastic | [1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
texta-elastic | [2019-01-16T13:34:14,538][INFO ][o.e.n.Node ] [TEXTA-1] stopping ...
texta-elastic | [2019-01-16T13:34:14,612][INFO ][o.e.n.Node ] [TEXTA-1] stopped
texta-elastic | [2019-01-16T13:34:14,612][INFO ][o.e.n.Node ] [TEXTA-1] closing ...
texta-elastic | [2019-01-16T13:34:14,627][INFO ][o.e.n.Node ] [TEXTA-1] closed
texta-elastic exited with code 78

django 2.0.2 doesn't seem to be available for python2.7

Hi there,
texta looks like a great package. However, when trying to install it in my Python 2.7 virtual env on Ubuntu, pip informs me that there is no Django 2.0.2 for Python 2.7, although this version is explicitly required in requirements.txt.

Would it make sense to try running it with an earlier Django version? Or with Python 3.5? Or what am I missing?

Thanks for a hint and best regards,
Stefan

dictor lib

Hello, FYI,

your project uses the dictor library. The latest dictor version (0.1.1) removes the eval() function from the dictor code for security reasons, along with other changes (see the readme).

The newest version also performs better when parsing large JSON lookups.
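
To illustrate why eval() is not needed for this kind of lookup, here is a minimal sketch of walking a dot-separated path through nested dicts and lists. It is not dictor's code or API. The function name lookup and its parameters are hypothetical.

```python
def lookup(data, path, default=None):
    """Follow a dot-separated path through nested dicts/lists without eval()."""
    current = data
    for key in path.split('.'):
        if isinstance(current, dict) and key in current:
            current = current[key]
        elif (isinstance(current, (list, tuple))
              and key.isdigit() and int(key) < len(current)):
            # Numeric path segments index into sequences.
            current = current[int(key)]
        else:
            return default
    return current
```

Because each path segment is only ever used as a dict key or a list index, a crafted path string cannot execute code.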

context processor import error in production

Importing the custom context processor throws an import error in production mode (Django 1.10). The error is not present in development mode (tested with Django 1.9 and 1.10). 1434f54#diff-bdf3ecebd8379ca98cc89e545fc90899

Error training language model

Training language model results in traceback:

  File "   /texta/task_manager/tasks/workers/language_model_worker.py", line 50, in run
    iter=int(num_passes)
  File "   /anaconda3/envs/texta-toolkit/lib/python3.5/site-packages/gensim/models/word2vec.py", line 748, in __init__
    fast_version=FAST_VERSION)
  File "   /anaconda3/envs/texta-toolkit/lib/python3.5/site-packages/gensim/models/base_any2vec.py", line 633, in __init__
    end_alpha=self.min_alpha, compute_loss=compute_loss)
  File "   /anaconda3/envs/texta-toolkit/lib/python3.5/site-packages/gensim/models/word2vec.py", line 856, in train
    queue_factor=queue_factor, report_delay=report_delay, compute_loss=compute_loss, callbacks=callbacks)
  File "   /anaconda3/envs/texta-toolkit/lib/python3.5/site-packages/gensim/models/base_any2vec.py", line 938, in train
    queue_factor=queue_factor, report_delay=report_delay, compute_loss=compute_loss, callbacks=callbacks)
  File "   /anaconda3/envs/texta-toolkit/lib/python3.5/site-packages/gensim/models/base_any2vec.py", line 421, in train
    total_words=total_words, **kwargs)
  File "   /anaconda3/envs/texta-toolkit/lib/python3.5/site-packages/gensim/models/base_any2vec.py", line 1044, in _check_training_sanity
    raise RuntimeError("you must first build vocabulary before training the model")

Debugging suggests this may be caused by data that is discarded in EsIterator.
First,
response = self.es_m.scroll()
is called.
Then another scroll is called before the first response is processed, overwriting the results obtained previously:
response = self.es_m.scroll(scroll_id=scroll_id)
https://github.com/texta-tk/texta/blob/master/task_manager/tools/data_manager.py#L75-L83

To reproduce, add a dataset with fewer than ES_SCROLL_SIZE rows. All rows then fit into the first response, which is thrown away, so no vocabulary can be built.
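
One way to avoid dropping the first page is to consume each response before requesting the next one. The sketch below is not the EsIterator code. The generator name iter_scroll_hits is hypothetical, and it assumes a scroll callable that behaves like self.es_m.scroll and returns standard Elasticsearch responses with '_scroll_id' and 'hits' -> 'hits'.

```python
def iter_scroll_hits(scroll):
    """Yield every hit from a scroll, including those in the first response.

    `scroll` is assumed to accept an optional scroll_id keyword and to
    return an Elasticsearch-style response dict.
    """
    response = scroll()
    while response['hits']['hits']:
        # Process the current page before asking for the next one.
        for hit in response['hits']['hits']:
            yield hit
        response = scroll(scroll_id=response['_scroll_id'])
```

With a dataset smaller than ES_SCROLL_SIZE, the first response holds all rows and the second is empty, so the loop yields everything once and stops.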

[Documentation] Update presentation

The documentation should be updated to present the project better:

1 - Project Logo
2 - Requirements
3 - Logos of companies/entities using TTK
