sergeyshk / ruts

103.0 3.0 17.0 4.23 MB

A library for extracting statistics from Russian-language texts.

Home Page: https://sergeyshk.github.io/ruTS/

License: MIT License

Python 98.69% Makefile 1.31%
nlp natural-language-processing computational-linguistics text-analytics russian-specific

ruts's People

Contributors

alexeyvatolin, dependabot[bot], sergeyshk, smekur

ruts's Issues

Difference between ruts.DiversityStats and the standalone functions

I noticed some strange behavior: all the values differ.
t1 = 'Бальзам хороший, но пришёл один а не два, как написано '

import ruts
ds = ruts.DiversityStats(t1)
ds.get_stats()

{'ttr': 1.0,
'rttr': 3.162277660168379,
'cttr': 2.23606797749979,
'httr': 1.0,
'sttr': 0,
'mttr': 0.0,
'dttr': 0,
'mattr': 1.0,
'msttr': 1.0,
'mtld': 0.0,
'mamtld': 1.0,
'hdd': -1,
'simpson_index': 0,
'hapax_index': 0}

vs

print('ttr' , ruts.diversity_stats.calc_ttr(t1))
print('rttr',ruts.diversity_stats.calc_rttr(t1))
print('cttr',ruts.diversity_stats.calc_cttr(t1))
print('httr',ruts.diversity_stats.calc_httr(t1))
print('sttr',ruts.diversity_stats.calc_sttr(t1))
print('mttr',ruts.diversity_stats.calc_mttr(t1))
print('dttr',ruts.diversity_stats.calc_dttr(t1))
print('mattr',ruts.diversity_stats.calc_mattr(t1))
print('msttr',ruts.diversity_stats.calc_msttr(t1))
print('mtld',ruts.diversity_stats.calc_mtld(t1))
print('mamtld',ruts.diversity_stats.calc_mamtld(t1))
print('hdd',ruts.diversity_stats.calc_hdd(t1))
print('simpson_index' , ruts.diversity_stats.calc_simpson_index(t1) )
print('hapax_index',ruts.diversity_stats.calc_hapax_index(t1) )

ttr 0.4
rttr 2.9664793948382653
cttr 2.0976176963403033
httr 0.7713465066366824
sttr 0.5314553128319692
mttr 0.1313826679597258
dttr 7.611354035728222
mattr 0.41
msttr 0.42
mtld 14.338133470257823
mamtld 12.708333333333334
hdd 0.4587105249530551
simpson_index 15.0
hapax_index 319.06649307394474
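
One possible reading of the mismatch, offered as an assumption rather than a confirmed diagnosis: the traceback in the following issue shows `calc_dttr(self.words)`, which suggests the standalone `calc_*` functions expect a list of words. If they receive a raw string, each character may end up counted as a token. The sketch below makes no ruts calls; `type_token_ratios` and the `\w+` tokenizer are hypothetical stand-ins, and TTR and RTTR are taken as unique/total and unique/sqrt(total).

```python
import math
import re

t1 = 'Бальзам хороший, но пришёл один а не два, как написано '

def type_token_ratios(tokens):
    """Return (TTR, RTTR) for any sequence of tokens (hypothetical helper)."""
    n_tokens = len(tokens)
    n_types = len(set(tokens))
    return n_types / n_tokens, n_types / math.sqrt(n_tokens)

# Assumption: a simple regex tokenizer stands in for the library's own word extraction.
words = re.findall(r'\w+', t1.lower())

print(type_token_ratios(words))  # word-level tokens
print(type_token_ratios(t1))     # character-level tokens: the string iterated as-is
```

With this input the word-level pair comes out at about (1.0, 3.162), in line with the `DiversityStats` output, and the character-level pair at about (0.4, 2.966), in line with the standalone calls (the string has 55 characters, 22 of them distinct). If that reading holds, passing a word list instead of `t1` to the standalone functions should bring the two sets of numbers together.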

Error computing statistics on short texts

A simple example:

ds = DiversityStats('саид, ты опять абдулле насолил?').get_stats()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/ruts/diversity_stats.py", line 163, in get_stats
    'dttr': self.dttr,
  File "/usr/local/lib/python3.6/dist-packages/ruts/diversity_stats.py", line 119, in dttr
    return calc_dttr(self.words)
  File "/usr/local/lib/python3.6/dist-packages/ruts/diversity_stats.py", line 300, in calc_dttr
    return log10(n_words)**2 / (log10(n_words) - log10(n_lexemes))
ZeroDivisionError: float division by zero

Tested on version 0.5.0.
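
The formula in the traceback divides by `log10(n_words) - log10(n_lexemes)`, which is zero whenever every word in the text is unique, a common case for short texts. A minimal sketch of a guard follows. `safe_dttr` is a hypothetical helper, not the library's code, and returning 0 in the degenerate case is an assumption (it matches the `'dttr': 0` shown in the `DiversityStats` output of the previous issue for a text with `ttr` 1.0).

```python
from math import log10

def safe_dttr(words, default=0.0):
    """Compute log10(N)**2 / (log10(N) - log10(V)) with a zero-denominator guard.

    Hypothetical helper, not part of ruts. `default` is returned when the
    input is empty or when every word is unique (N == V).
    """
    n_words = len(words)
    n_lexemes = len(set(words))
    if n_words == 0 or n_words == n_lexemes:
        return default
    return log10(n_words) ** 2 / (log10(n_words) - log10(n_lexemes))

# All five words are unique, so the guard applies instead of raising.
print(safe_dttr(['саид', 'ты', 'опять', 'абдулле', 'насолил']))
# One repeated word, so the formula is evaluated normally.
print(safe_dttr(['a', 'b', 'a', 'c']))
```

Whether 0, `None`, or NaN is the better sentinel depends on how the value is consumed downstream; the guard itself is the point of the sketch.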

[Feature request] "Normalization"/scaling option in Basic stats

I suggest adding an option to return most of the BasicStats() statistics as normalized (relative) values. Every word count except the total number of words would be divided by that total, and likewise for characters.
c_letters, c_syllables, n_complex_words, n_monosyllable_words, n_polysyllable_words, n_long_words, n_simple_words, n_unique_words would be divided by n_words.
n_letters, n_punctuations, n_spaces would be divided by n_chars.
It is more convenient to get these values directly in the output than to do the division yourself.
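
Until such an option exists, the requested scaling can be approximated on the caller's side. The sketch below works on a plain dictionary of statistics and makes no ruts calls. `normalize_basic_stats` and `_scale` are hypothetical names, the key lists are copied from the request above, and values are assumed to be either numbers or mappings of counts.

```python
WORD_KEYS = ('c_letters', 'c_syllables', 'n_complex_words', 'n_monosyllable_words',
             'n_polysyllable_words', 'n_long_words', 'n_simple_words', 'n_unique_words')
CHAR_KEYS = ('n_letters', 'n_punctuations', 'n_spaces')

def _scale(value, total):
    """Divide a number, or every value of a mapping, by total (0.0 if total is 0)."""
    if isinstance(value, dict):
        return {k: _scale(v, total) for k, v in value.items()}
    return value / total if total else 0.0

def normalize_basic_stats(stats):
    """Return a copy of stats with word-level and character-level counts made relative."""
    result = dict(stats)
    for key in WORD_KEYS:
        if key in stats:
            result[key] = _scale(stats[key], stats['n_words'])
    for key in CHAR_KEYS:
        if key in stats:
            result[key] = _scale(stats[key], stats['n_chars'])
    return result

# Made-up numbers for illustration only, not real library output.
example = {'n_words': 10, 'n_chars': 50, 'n_unique_words': 8,
           'n_letters': 40, 'n_spaces': 9, 'c_letters': {3: 4, 5: 6}}
print(normalize_basic_stats(example))
```

The totals `n_words` and `n_chars` are left untouched so that the absolute counts can still be recovered from the normalized output.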
