texta-tk / texta

Terminology EXtraction and Text Analytics (TEXTA) Toolkit

Home Page: https://git.texta.ee/texta/texta-rest

License: GNU General Public License v3.0

Python 99.63% Shell 0.16% Dockerfile 0.22%
ai artificial-intelligence natural-language-processing nlp nlp-machine-learning textanalytics django python

texta's People

Contributors

asula, erikjyrmann, githubuser88442, gpaimla, helehh, jussuf, lindafr, mrkkollo, ranetp, rsirel


texta's Issues

Dataset Importer Error

Hi there and thanks for this great initiative!

Unfortunately, I can't import any data.

When I try to import on http://localhost:8000/dataset_importer/ (there is no explicit link to this page in the interface, BTW) by
choosing simple documents or archives,
selecting the appropriate file in Input data,
naming the dataset,
setting overwrite dataset or not,
the job is submitted correctly but processing never happens.

The following error is logged:

[25/Jun/2018 15:26:04] "GET /static/base/img/bg.jpg HTTP/1.1" 200 34998
Exception ignored in: <module 'threading' from '/usr/local/lib/python3.5/threading.py'>
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/threading.py", line 1351, in _after_fork
    thread._stop()
TypeError: 'Event' object is not callable
[25/Jun/2018 15:26:19] "POST /dataset_importer/import HTTP/1.1" 200 0
Process Process-6:
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/multiprocessing/process.py", line 252, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/texta/dataset_importer/importer/importer.py", line 296, in _import_dataset
    parameter_dict['file_path'] = download(parameter_dict['url'], parameter_dict['directory'])
KeyError: 'url'
[25/Jun/2018 15:26:19] "GET /dataset_importer/reload_table HTTP/1.1" 200 4436
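
The KeyError in the traceback comes from indexing parameter_dict['url'] when no URL was submitted. A minimal sketch of a guard follows. It assumes, without confirmation from the TEXTA source, that an uploaded file would already be recorded under 'file_path'; resolve_file_path is a hypothetical helper, and download is passed in as a stand-in for the importer's own function.

```python
def resolve_file_path(parameter_dict, download):
    """Return a local path for the import, downloading only when a URL is given.

    Hypothetical helper: `download` stands in for the importer's own function,
    called as download(url, directory) like in the traceback.
    """
    # Assumption: an uploaded file is already stored under 'file_path'.
    if parameter_dict.get("file_path"):
        return parameter_dict["file_path"]

    # Use .get() so a missing 'url' no longer raises KeyError.
    url = parameter_dict.get("url")
    if url:
        return download(url, parameter_dict["directory"])

    raise ValueError("import parameters contain neither 'file_path' nor 'url'")
```

Raising a descriptive ValueError instead of a bare KeyError would at least make the failed job easier to diagnose from the log.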

Fact highlight

The Searcher doesn't highlight facts if several are listed within one constraint.

Update docs: Elasticsearch loops on a start error under docker-compose

If Elasticsearch dies right after starting with:

ERROR: [1] bootstrap checks failed
max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

one should fix it with something like:

sysctl -w vm.max_map_count=262144


git clone https://github.com/texta-tk/texta.git
cd texta/docker/
docker-compose pull
docker-compose up


Linux texta-test-2 4.4.0-141-generic #167-Ubuntu SMP Wed Dec 5 10:40:15 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Docker version 18.09.1, build 4c52b90
docker-compose version 1.23.2, build 1110ad01


texta-elastic | [2019-01-16T13:34:13,668][INFO ][o.e.d.DiscoveryModule ] [TEXTA-1] using discovery type [zen]
texta-elastic | [2019-01-16T13:34:14,317][INFO ][o.e.n.Node ] [TEXTA-1] initialized
texta-elastic | [2019-01-16T13:34:14,318][INFO ][o.e.n.Node ] [TEXTA-1] starting ...
texta-elastic | [2019-01-16T13:34:14,512][INFO ][o.e.t.TransportService ] [TEXTA-1] publish_address {192.168.16.2:9300}, bound_addresses {0.0.0.0:9300}
texta-elastic | [2019-01-16T13:34:14,527][INFO ][o.e.b.BootstrapChecks ] [TEXTA-1] bound or publishing to a non-loopback address, enforcing bootstrap checks
texta-elastic | ERROR: [1] bootstrap checks failed
texta-elastic | [1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
texta-elastic | [2019-01-16T13:34:14,538][INFO ][o.e.n.Node ] [TEXTA-1] stopping ...
texta-elastic | [2019-01-16T13:34:14,612][INFO ][o.e.n.Node ] [TEXTA-1] stopped
texta-elastic | [2019-01-16T13:34:14,612][INFO ][o.e.n.Node ] [TEXTA-1] closing ...
texta-elastic | [2019-01-16T13:34:14,627][INFO ][o.e.n.Node ] [TEXTA-1] closed
texta-elastic exited with code 78
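
The failing bootstrap check can also be detected before starting the containers. The sketch below is only an illustration: the function names are hypothetical, the threshold is the one quoted in the error message, and reading /proc/sys/vm/max_map_count assumes a Linux host.

```python
import re
from pathlib import Path

# Threshold quoted in the bootstrap-check message above.
REQUIRED_MAX_MAP_COUNT = 262144


def parse_bootstrap_error(line):
    """Return (current, required) from the vm.max_map_count message, or None."""
    match = re.search(
        r"vm\.max_map_count \[(\d+)\] is too low, increase to at least \[(\d+)\]",
        line,
    )
    return (int(match.group(1)), int(match.group(2))) if match else None


def current_max_map_count(path="/proc/sys/vm/max_map_count"):
    """Read the live kernel setting (Linux only); None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def is_sufficient(value, required=REQUIRED_MAX_MAP_COUNT):
    """True when the setting is known and meets the required minimum."""
    return value is not None and value >= required
```

Calling is_sufficient(current_max_map_count()) before docker-compose up would tell whether the sysctl change is still needed on that host.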

Django 2.0.2 doesn't seem to be available for Python 2.7

Hi there,
texta looks like a great package. But when I try to install it in my Python 2.7 virtual env on Ubuntu, pip informs me that there is no Django 2.0.2 for Python 2.7, although this version is explicitly required in requirements.txt.

Would it make sense to try running it with an earlier Django version? Or with Python 3.5? Or what am I missing?

Thanks for a hint and best regards,
Stefan
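
For context, the Django 2.0 series supports only Python 3 (3.4 or newer), which is why pip finds no 2.0.2 release for Python 2.7. A minimal sketch of a pre-install check follows; can_install_django_2 is a hypothetical helper name, not part of TEXTA or Django.

```python
import sys

# Django 2.0 dropped Python 2 support; 3.4 is its minimum Python version.
MIN_PYTHON_FOR_DJANGO_2 = (3, 4)


def can_install_django_2(version_info=sys.version_info):
    """Return True if the given interpreter version can run Django 2.0.x."""
    return tuple(version_info[:2]) >= MIN_PYTHON_FOR_DJANGO_2
```

Running this inside the target virtual env shows whether the env needs to be recreated with a Python 3 interpreter before installing the requirements.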

dictor lib

Hello, FYI,

Your project uses the dictor library. The latest dictor version (0.1.1) removes the eval() function from the dictor code for security reasons and includes other changes as well (see the readme).

The newest version also performs better when parsing large JSON lookups.
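
To illustrate why dropping eval() matters, here is a minimal sketch of a dotted-path lookup that walks the structure key by key, so no part of the path is ever executed as code. This is not dictor's implementation; lookup is a hypothetical function written only for this example.

```python
def lookup(data, path, default=None, sep="."):
    """Walk nested dicts and lists along a dotted path without using eval()."""
    current = data
    for key in path.split(sep):
        try:
            if isinstance(current, list):
                # List elements are addressed by integer position.
                current = current[int(key)]
            else:
                current = current[key]
        except (KeyError, IndexError, ValueError, TypeError):
            # Missing key, bad index, or a non-container value: fall back.
            return default
    return current
```

For example, lookup({"a": {"b": [10, 20]}}, "a.b.1") walks into the dict, then the list, and returns the element at index 1.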

Error training language model

Training language model results in traceback:

  File "   /texta/task_manager/tasks/workers/language_model_worker.py", line 50, in run
    iter=int(num_passes)
  File "   /anaconda3/envs/texta-toolkit/lib/python3.5/site-packages/gensim/models/word2vec.py", line 748, in __init__
    fast_version=FAST_VERSION)
  File "   /anaconda3/envs/texta-toolkit/lib/python3.5/site-packages/gensim/models/base_any2vec.py", line 633, in __init__
    end_alpha=self.min_alpha, compute_loss=compute_loss)
  File "   /anaconda3/envs/texta-toolkit/lib/python3.5/site-packages/gensim/models/word2vec.py", line 856, in train
    queue_factor=queue_factor, report_delay=report_delay, compute_loss=compute_loss, callbacks=callbacks)
  File "   /anaconda3/envs/texta-toolkit/lib/python3.5/site-packages/gensim/models/base_any2vec.py", line 938, in train
    queue_factor=queue_factor, report_delay=report_delay, compute_loss=compute_loss, callbacks=callbacks)
  File "   /anaconda3/envs/texta-toolkit/lib/python3.5/site-packages/gensim/models/base_any2vec.py", line 421, in train
    total_words=total_words, **kwargs)
  File "   /anaconda3/envs/texta-toolkit/lib/python3.5/site-packages/gensim/models/base_any2vec.py", line 1044, in _check_training_sanity
    raise RuntimeError("you must first build vocabulary before training the model")

Debugging suggests the cause may be data that is discarded in EsIterator.
First,
response = self.es_m.scroll()
is called;
then another scroll is called, overwriting the results obtained previously:
response = self.es_m.scroll(scroll_id=scroll_id)
https://github.com/texta-tk/texta/blob/master/task_manager/tools/data_manager.py#L75-L83

To reproduce, add a dataset with fewer than ES_SCROLL_SIZE rows.
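
A scroll loop that consumes the first page before requesting the next one could look like the sketch below. It is not the EsIterator code: iterate_hits and FakeScroller are hypothetical, and the response shape (a "_scroll_id" plus "hits" -> "hits") is assumed to follow the standard Elasticsearch scroll response.

```python
def iterate_hits(es_m):
    """Yield every hit, including those returned by the initial scroll call."""
    response = es_m.scroll()
    while True:
        hits = response["hits"]["hits"]
        if not hits:
            break  # an empty page signals the end of the scroll
        for hit in hits:
            yield hit
        # Request the next page only after the current one has been consumed.
        response = es_m.scroll(scroll_id=response["_scroll_id"])


class FakeScroller:
    """Hypothetical stand-in for the ES manager, serving documents in pages."""

    def __init__(self, docs, page_size):
        self.pages = [docs[i:i + page_size] for i in range(0, len(docs), page_size)]
        self.position = 0

    def scroll(self, scroll_id=None):
        page = self.pages[self.position] if self.position < len(self.pages) else []
        self.position += 1
        return {"_scroll_id": "fake-scroll", "hits": {"hits": page}}
```

With fewer documents than the page size, the whole dataset arrives in the first response, which is exactly the case in which discarding that response leaves the model with an empty vocabulary.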

[Documentation] Update presentation

The documentation should be updated to better represent the project.

1 - Project logo
2 - Requirements
3 - Logos of companies/entities using TTK
