ReaderBench Python

Install

We recommend using virtual environments, as some packages require an exact version.
If you only want to use the package, do the following:

  1. sudo apt-get install python3-pip python3-venv python3-dev
  2. python3 -m venv rbenv (create a virtual environment named rbenv)
  3. source rbenv/bin/activate (activate virtual env)
  4. pip3 uninstall setuptools && pip3 install setuptools && pip3 install --upgrade pip && pip3 install --no-cache-dir rbpy-rb
  5. Use it as in: https://github.com/readerbench/ReaderBench/blob/master/usage.py
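
Because some packages require an exact version, it can help to confirm that the virtual environment is really active before running pip3. The following is a minimal sketch using only the standard library; the function name in_virtualenv is hypothetical and not part of ReaderBench.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the system installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    if in_virtualenv():
        print("Virtual environment active:", sys.prefix)
    else:
        print("No virtual environment detected; run: source rbenv/bin/activate")
```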

If you want to contribute to the package's code base:

  1. sudo apt-get install python3-pip python3-venv python3-dev
  2. git clone [email protected]:ReaderBench/readerbenchpy.git && cd readerbenchpy/
  3. python3 -m venv rbenv (create a virtual environment named rbenv)
  4. source rbenv/bin/activate (activate virtual env)
  5. pip3 uninstall setuptools && pip3 install setuptools && pip3 install --upgrade pip
  6. pip3 install -r requirements.txt
  7. python3 nltk_download.py
    Optional: pre-install the English model (otherwise most English processing will fail and ask you to run this command):
  8. python3 -m spacy download en_core_web_lg

If you also want spellchecking (hunspell), you need these non-Python libraries:

  1. sudo apt-get install libhunspell-1.6-0 libhunspell-dev hunspell-ro
  2. pip3 install hunspell

Usage

For usage (parsing, lemmatization, NER, wordnet, content words, indices etc.) see file usage.py from https://github.com/readerbench/ReaderBench

Tips

You may also need additional spaCy models, which are not installed automatically.
Download each one yourself with:
python3 -m spacy download name_of_the_model
The logger also reports which models you need and how to download them.
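
Models fetched with spacy download are installed as regular Python packages, so their presence can be checked before any processing starts. The sketch below uses only the standard library; the helper name missing_models is hypothetical, and en_core_web_lg is simply the model named in the install steps.

```python
import importlib.util


def missing_models(model_names):
    """Return the model packages that cannot be imported in this environment."""
    return [name for name in model_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Print the download command for every model that is not installed yet.
    for name in missing_models(["en_core_web_lg"]):
        print(f"python3 -m spacy download {name}")
```

This only checks that the package is importable; it does not verify that the model version matches the installed spaCy version.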

Developer instructions

How to use Bert

Our models are also available on the Hugging Face platform: https://huggingface.co/readerbench

You can use them directly from HuggingFace:

# tensorflow
from transformers import AutoTokenizer, TFAutoModel
tokenizer = AutoTokenizer.from_pretrained("readerbench/RoBERT-base")
model = TFAutoModel.from_pretrained("readerbench/RoBERT-base")
inputs = tokenizer("exemplu de propoziție", return_tensors="tf")
outputs = model(inputs)

# pytorch
from transformers import AutoModel, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("readerbench/RoBERT-base")
model = AutoModel.from_pretrained("readerbench/RoBERT-base")
inputs = tokenizer("exemplu de propoziție", return_tensors="pt")
outputs = model(**inputs)

or from ReaderBench:

from rb.core.lang import Lang
from rb.processings.encoders.bert import BertWrapper
from tensorflow import keras

bert_wrapper = BertWrapper(Lang.RO, max_seq_len=128)
inputs, bert_layer = bert_wrapper.create_inputs_and_model()
cls_output = bert_wrapper.get_output(bert_layer, "cls") # or "pool"

# Add decision layer and compile model
# eg. 
# hidden = keras.layers.Dense(..)(cls_output)
# output = keras.layers.Dense(..)(hidden)
# model = keras.Model(inputs=inputs, outputs=[output])
# model.compile(..)

bert_wrapper.load_weights()  # must be called after compile

# Process inputs for model
feed_inputs = bert_wrapper.process_input(["text1", "text2", "text3"])
# feed_output = ...
# model.fit(feed_inputs, feed_output, ...)

How to use the logger

In each file you have to initialize the logger:

from rb.utils.rblogger import Logger  
logger = Logger.get_logger() 
logger.info("info msg")
logger.warning("warning msg")  
logger.error("error msg")

How to push the wheel to PyPI

  1. rm -r dist/
  2. pip3 install twine wheel
  3. ./upload_to_pypi.sh

How to run rb/core/cscl/csv_parser.py

  1. Follow the installation steps for contributors above
  2. run pip3 install xmltodict
  3. run export PYTHONPATH=/add/path/to/repo/readerbenchpy/
  4. add json resources in a jsons directory in readerbenchpy/rb/core/cscl/
  5. run cd rb/core/cscl/ && python3 csv_parser.py

Supported Date Formats

ReaderBench is able to perform conversation analysis from chats and communities. Each utterance must have the time expressed in one of the following formats:

  • %Y-%m-%d %H:%M:%S.%f %Z
  • %Y-%m-%d %H:%M:%S %Z
  • %Y-%m-%d %H:%M %Z
  • %Y-%m-%d %H:%M:%S.%f
  • %Y-%m-%d %H:%M:%S
  • %Y-%m-%d %H:%M

The format codes are Python's date format codes.
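
To check whether a timestamp matches one of the formats above, each format can be tried in turn with datetime.strptime. This is a minimal sketch, not ReaderBench's own parser; the names SUPPORTED_FORMATS and parse_timestamp are hypothetical.

```python
from datetime import datetime

# The six formats listed above, most specific first.
SUPPORTED_FORMATS = [
    "%Y-%m-%d %H:%M:%S.%f %Z",
    "%Y-%m-%d %H:%M:%S %Z",
    "%Y-%m-%d %H:%M %Z",
    "%Y-%m-%d %H:%M:%S.%f",
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d %H:%M",
]


def parse_timestamp(text):
    """Return a datetime for the first supported format that matches text."""
    for fmt in SUPPORTED_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not match; try the next one
    raise ValueError(f"unsupported timestamp format: {text!r}")


if __name__ == "__main__":
    print(parse_timestamp("2021-03-04 10:20:30"))
```

Note that strptime accepts only a limited set of zone names for %Z (such as UTC, GMT and the local zone names) and returns a naive datetime, so any time zone handling beyond validation would need extra work.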
