
rasahq / rasa-nlu-examples


This repository contains examples of custom components for educational purposes.

Home Page: https://RasaHQ.github.io/rasa-nlu-examples/

License: Apache License 2.0

Python 99.62% Makefile 0.38%
rasa rasa-nlu

rasa-nlu-examples's Issues

Standardise Ways to Share Results

We'd love it if people could let us know which tools help them. With that in mind, it would be good to consider a page in the documentation where we can list results. It'd probably be best to separate the results per language and per project?

This thread is to collect ideas on this topic.

Non-English Namelists

This is a response to this question and this one. It seems clear that a pre-trained French model can't be expected to detect names that aren't from France very well. The questions on the forum are about French, but I can imagine this issue being relevant for other languages too.

So, as a next-best idea, we should host common names per country somewhere so folks can use them as a lookup list for regex. This needs to be a community effort, but it could be very helpful.
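A minimal sketch of how such a community name list could be used, assuming it is just a list of strings per country. Rasa can consume lists like this as lookup tables; the plain-Python version below only shows the underlying idea of compiling the list into one case-insensitive, word-bounded regex. `build_name_pattern` and `find_names` are hypothetical helpers, not part of Rasa or this library.

```python
import re
from typing import Iterable, List, Tuple


def build_name_pattern(names: Iterable[str]) -> "re.Pattern[str]":
    """Compile a list of names into one case-insensitive, word-bounded regex."""
    # Escape each name so characters like "-" or "." match literally, and put
    # longer names first so "Jean-Luc" is preferred over its prefix "Jean".
    escaped = sorted({re.escape(n.strip()) for n in names if n.strip()}, key=len, reverse=True)
    if not escaped:
        raise ValueError("the name list is empty")
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def find_names(text: str, pattern: "re.Pattern[str]") -> List[Tuple[str, int, int]]:
    """Return (matched text, start, end) for every name found in `text`."""
    return [(m.group(), m.start(), m.end()) for m in pattern.finditer(text)]


pattern = build_name_pattern(["Jean", "Jean-Luc", "Aïcha", "Mamadou"])
print(find_names("Bonjour, je suis Jean-Luc et voici aïcha.", pattern))
# [('Jean-Luc', 17, 25), ('aïcha', 35, 40)]
```

The word boundaries keep "Jean" from firing inside "Jeanne"; for very long lists a regex alternation gets slow, which is the same concern raised in the FlashText issue further down.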

Benchmarking Results

We want to add a section to the documentation where people can share results of the tools in this library. We'll gladly link to any blog post or GitHub project as well, but we'd love to hear about (and list) the scenarios in which our tools make a difference.

Anybody with results is free to ping @koaning here.

Support 2.0

Are you going to adapt the examples to Rasa 2.0?

Add Sentencepiece Tokeniser support

SentencePiece is commonly used to create byte-pair (subword) tokens for any language. As far as I can tell, there is no built-in support for this kind of tokenisation in Rasa. This library uses BPEmb, but only for pretrained embeddings, not for tokenisation. Since the WhitespaceTokenizer doesn't always perform well, I would like to have support for it. I am willing to open a PR for this, but I don't know the contribution steps here.
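One detail a SentencePiece-based tokenizer would have to handle: Rasa tokens carry character offsets into the message, while SentencePiece returns pieces in which "▁" marks the start of a word. A minimal sketch, assuming the pieces come from something like `SentencePieceProcessor.encode(text, out_type=str)` and that the model's normalization left the text unchanged; `pieces_to_spans` is a hypothetical helper.

```python
from typing import List, Tuple

WORD_BOUNDARY = "\u2581"  # "▁", the marker SentencePiece puts in front of word-initial pieces


def pieces_to_spans(text: str, pieces: List[str]) -> List[Tuple[str, int, int]]:
    """Map SentencePiece pieces back to (surface text, start, end) in the original text."""
    spans: List[Tuple[str, int, int]] = []
    cursor = 0
    for piece in pieces:
        surface = piece.replace(WORD_BOUNDARY, "")
        if not surface:
            continue  # a bare "▁" piece only stands for whitespace
        start = text.find(surface, cursor)
        if start == -1:
            # happens if SentencePiece normalization changed the text (assumed not to here)
            raise ValueError(f"piece {piece!r} not found after position {cursor}")
        end = start + len(surface)
        spans.append((surface, start, end))
        cursor = end
    return spans


print(pieces_to_spans("Hello world", ["\u2581Hello", "\u2581wor", "ld"]))
# [('Hello', 0, 5), ('wor', 6, 9), ('ld', 9, 11)]
```

Each span could then become one Rasa token; searching forward from a cursor keeps repeated subwords mapped to the right occurrence.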

deploy rasa-nlu-examples with docker

Hi everyone,
I trained a new NLU-only model with my own word2vec (w2v) model. Now I want to put it in a production/staging environment,
but the official Rasa Docker image does not support this project. Could you show me how to create a Dockerfile, based on the official Rasa Docker image, that uses my own w2v model?

missing GensimFeaturizer module after pip install git+

I put GensimFeaturizer in my Rasa NLU pipeline and got this message:

Exception: Failed to find class 'GensimFeaturizer' in module 'rasa_nlu_examples.featurizers.dense'.

I tried uninstalling and reinstalling from git with pip and with python's pip, cloning the master branch at the latest commit, and ... but the problem was not solved.
Finally, I found that the class's .py file was not present, so I copied gensim_featurizer.py into the installed package folder (/rasa_nlu_example/featurizer/dense) and it worked!
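Before copying files around, it can help to check whether the installed package actually ships the module. A minimal sketch using the standard library's `importlib.util.find_spec`; `module_available` is a hypothetical helper. Note that looking up a submodule imports its parent packages, so a parent `__init__.py` that fails on a missing dependency also makes this return `False`.

```python
import importlib.util


def module_available(dotted_name: str) -> bool:
    """True if `dotted_name` can be found by the import system, without importing it."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        # raised when a parent package is missing (or its __init__ hits a missing import)
        return False


print(module_available("rasa_nlu_examples.featurizers.dense.gensim_featurizer"))
```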

Experiment with `padatious`

There was a package suggested here that might be worth exploring. It is reported to work really well on very small datasets. The project can be found here.

Printer should use Rich

The Printer should print objects in a prettier way. The rich library might make the output a lot clearer.

Before an implementation happens, a path forward should be discussed here. Are we going to use tables? Colors?

Add LogisticRegression

I'd like to add another intent classification model here, similar to the naive Bayes model.
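To make the idea concrete, here is a dependency-free sketch of a multinomial (softmax) logistic regression intent classifier on bag-of-words features, trained with plain SGD. It is not this library's component: a real version would more likely wrap scikit-learn's `LogisticRegression` around the sparse features Rasa already computes. The class and helper names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List


def bag_of_words(text: str) -> Dict[str, float]:
    """Lower-cased whitespace tokens as counts, plus a constant bias feature."""
    feats: Dict[str, float] = defaultdict(float)
    for token in text.lower().split():
        feats[token] += 1.0
    feats["<bias>"] = 1.0
    return dict(feats)


class LogisticRegressionIntentClassifier:
    """Softmax regression over sparse features, trained with stochastic gradient descent."""

    def __init__(self, learning_rate: float = 0.5, epochs: int = 50, l2: float = 1e-4) -> None:
        self.learning_rate = learning_rate
        self.epochs = epochs
        self.l2 = l2
        self.weights: Dict[str, Dict[str, float]] = {}

    def _probabilities(self, feats: Dict[str, float]) -> Dict[str, float]:
        scores = {label: sum(w.get(f, 0.0) * v for f, v in feats.items())
                  for label, w in self.weights.items()}
        top = max(scores.values())  # subtract the max score for numerical stability
        exps = {label: math.exp(s - top) for label, s in scores.items()}
        total = sum(exps.values())
        return {label: e / total for label, e in exps.items()}

    def fit(self, texts: List[str], intents: List[str]) -> "LogisticRegressionIntentClassifier":
        self.weights = {label: defaultdict(float) for label in sorted(set(intents))}
        data = [(bag_of_words(t), y) for t, y in zip(texts, intents)]
        for _ in range(self.epochs):
            for feats, target in data:
                probs = self._probabilities(feats)
                for label, w in self.weights.items():
                    # gradient of the cross-entropy loss w.r.t. this label's score
                    error = probs[label] - (1.0 if label == target else 0.0)
                    # L2 shrinkage is only applied to features active in this example
                    for f, v in feats.items():
                        w[f] -= self.learning_rate * (error * v + self.l2 * w[f])
        return self

    def predict_proba(self, text: str) -> Dict[str, float]:
        return self._probabilities(bag_of_words(text))

    def predict(self, text: str) -> str:
        probs = self.predict_proba(text)
        return max(probs, key=probs.get)


clf = LogisticRegressionIntentClassifier().fit(
    ["hello there", "hi there", "goodbye now", "bye bye"],
    ["greet", "greet", "goodbye", "goodbye"],
)
print(clf.predict("hello there"), clf.predict("bye bye"))
```

Unlike naive Bayes, the weights are fitted jointly, so correlated words don't get double-counted; the price is an iterative training loop instead of simple counting.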

BlankSpacyTokenizer

spaCy has blank tokenizers for lots of languages. They are rule-based, so in principle they can handle cases, such as punctuation and language-specific exceptions, that our simple whitespace tokenizer doesn't.

Feature Request: spaCy POS tags

A lot of entities will be nouns. Even if we don't use spaCy as an entity detection engine, it does come with useful linguistic features, such as part-of-speech tags, that might help our pipeline. It would be worth an experiment.

NLU stopwords

Rasa's CountVectorsFeaturizer currently has a stopwords attribute, but unfortunately it only affects the sparse features. Word embeddings for stopwords are still generated. It might make sense to build a component that actually removes the stopwords from the message before other components handle it.
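A minimal sketch of the core of such a component, assuming it works on tokens that keep their character offsets rather than rewriting the message text. `whitespace_tokens` and `drop_stopwords` are hypothetical helpers, not Rasa APIs.

```python
import re
from typing import Iterable, List, Tuple

Token = Tuple[str, int, int]  # (text, start, end) in the original message


def whitespace_tokens(text: str) -> List[Token]:
    """Split on whitespace, keeping character offsets into the original text."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def drop_stopwords(tokens: List[Token], stopwords: Iterable[str]) -> List[Token]:
    """Remove stopword tokens (case-insensitively) but keep the original offsets of the rest."""
    stop = {word.lower() for word in stopwords}
    return [tok for tok in tokens if tok[0].lower() not in stop]


tokens = whitespace_tokens("What is the weather in Berlin")
print(drop_stopwords(tokens, ["is", "the", "in"]))
# [('What', 0, 4), ('weather', 12, 19), ('Berlin', 23, 29)]
```

Filtering tokens instead of deleting words from the text means entity annotations, which point at character offsets, still line up with the remaining tokens.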

Speed Up Tests

Currently we're using the CLI runner to run smoke tests. This is a good idea, but it is incredibly slow: it has to shut down and reload TensorFlow on every pass, and this is slowing down CI on GitHub.

It's probably much better to drop the CLI and instead call the function that the CLI command triggers directly from within Python.

FlashTextEntityExtractor

Regex can be slow. Instead, it might help to have an entity extractor based on flashtext. This can really help with long name lists, for example.
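flashtext's speed comes from storing keywords in a trie and scanning the text once, instead of trying a huge regex alternation. The flashtext package's own `KeywordProcessor` implements this; below is a minimal word-level sketch of the same idea that returns Rasa-style entity dicts with `entity`, `value`, `start` and `end`. `KeywordTrie` is a hypothetical name.

```python
import re
from typing import Dict, List

_END = object()  # sentinel key marking "a keyword ends at this node"


class KeywordTrie:
    """Word-level trie for longest-match lookup of many keywords in one pass."""

    def __init__(self, case_sensitive: bool = False) -> None:
        self.root: Dict = {}
        self.case_sensitive = case_sensitive

    def _norm(self, word: str) -> str:
        return word if self.case_sensitive else word.lower()

    def add(self, phrase: str, entity: str) -> None:
        node = self.root
        for word in re.findall(r"\w+", phrase):
            node = node.setdefault(self._norm(word), {})
        node[_END] = entity

    def extract(self, text: str) -> List[Dict]:
        tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]
        found = []
        i = 0
        while i < len(tokens):
            node, match = self.root, None
            j = i
            # walk as deep as the trie allows, remembering the longest complete keyword
            while j < len(tokens) and self._norm(tokens[j][0]) in node:
                node = node[self._norm(tokens[j][0])]
                if _END in node:
                    match = (node[_END], j)
                j += 1
            if match is None:
                i += 1
                continue
            entity, last = match
            start, end = tokens[i][1], tokens[last][2]
            found.append({"entity": entity, "value": text[start:end], "start": start, "end": end})
            i = last + 1  # resume after the match, so matches never overlap
        return found


trie = KeywordTrie()
for city in ["New York", "New York City", "Amsterdam"]:
    trie.add(city, "city")
print(trie.extract("I flew from new york city to Amsterdam."))
```

Lookup cost grows with the length of the message, not the number of names, which is why this scales better than regex for long name lists.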

Doc2Vec instead of Word2Vec model for Gensim featurizer?

I'm wondering whether a doc2vec model, instead of a word2vec model, can be trained and used with the Gensim featurizer?

Also, if HFTransformer is chosen as the language model, so that the corresponding tokenizer and featurizer are used in the NLU pipeline, can the Gensim featurizer still be added to the pipeline to improve domain-specific processing?

Thanks.

Dependency Management

It might be better to make it the users' responsibility to handle the dependencies and not have them installed immediately via pip. I don't think we can use the [all] extras syntax when installing from GitHub, but we could explore that.

In any case, we really don't want folks to download the Thai-language tools or PyTorch if they're only using the byte-pair embeddings.
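One way to get there is to import heavy, optional dependencies only when the component that needs them is created, with an error that says what to install. A minimal sketch; `require` is a hypothetical helper, and the pip package names are assumptions per component.

```python
import importlib
from types import ModuleType


def require(module_name: str, pip_name: str) -> ModuleType:
    """Import an optional dependency lazily, with an actionable error if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as err:
        raise ModuleNotFoundError(
            f"This component needs '{module_name}', which is not installed. "
            f"Try `pip install {pip_name}`."
        ) from err


# Hypothetical usage inside a component, instead of `import fasttext` at module level:
# class FastTextFeaturizer:
#     def __init__(self, path):
#         fasttext = require("fasttext", "fasttext")
#         self.model = fasttext.load_model(path)
```

Because nothing heavy is imported at package import time, someone using only the byte-pair featurizer never touches fasttext, Thai tooling or PyTorch.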

Adding Stanza

Where FastText vectors wrapped in spaCy are of no use, Stanza comes in handy: it can give us the necessary POS tags and lemmatization. This is at least the case for Estonian, and it should also hold for Finnish, Hebrew, Hindi, Hungarian, Indonesian, Irish, Korean, Latvian, Persian, Telugu, Urdu, etc.

Feature Request: Gensim Key-Value Pairs

Gensim probably offers the easiest way to train your own embeddings, which would allow users to use their own if they have a reliable corpus. I've understood that Wikipedia is not a reliable source of online slang for many languages.
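Gensim's `KeyedVectors.save_word2vec_format(path, binary=False)` writes the plain-text word2vec format: a header line with the vocabulary size and dimension, then one `word v1 v2 ...` line per word, which is essentially a key-value file. A minimal sketch of reading that format and averaging token vectors into a sentence feature; the helpers are hypothetical and are not gensim's API.

```python
import io
from typing import Dict, List, TextIO


def load_word2vec_text(handle: TextIO) -> Dict[str, List[float]]:
    """Read the plain-text word2vec format: a 'count dim' header, then 'word v1 ... vdim' lines."""
    vocab_size, dim = (int(x) for x in handle.readline().split())
    vectors: Dict[str, List[float]] = {}
    for line_number, line in enumerate(handle, start=2):
        parts = line.split()
        if not parts:
            continue  # tolerate blank lines
        word, values = parts[0], [float(x) for x in parts[1:]]
        if len(values) != dim:
            raise ValueError(f"line {line_number}: expected {dim} values, got {len(values)}")
        vectors[word] = values
    if len(vectors) != vocab_size:
        raise ValueError(f"header promised {vocab_size} words, found {len(vectors)}")
    return vectors


def mean_vector(tokens: List[str], vectors: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the vectors of known tokens; all zeros if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


vectors = load_word2vec_text(io.StringIO("2 3\nhello 1.0 0.0 2.0\nworld 3.0 2.0 0.0\n"))
print(mean_vector(["hello", "world", "unseen"], vectors, dim=3))  # [2.0, 1.0, 1.0]
```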

Investigate CLTK

I found this project on GitHub; it might offer interesting tools for Hindi, among other languages. I do not know whether the tokenizers are of high quality, but they are documented here.

zemberek

Suggested at the TensorFlow Turkey meetup. Let's investigate whether we can add it here.

Turkish NLU data

I'd like to add Turkish NLU data if there isn't anyone else doing it at the moment.

Unclear error message when file is not found

Currently when the file is not found at the specified cache_dir in the config.yml file, the error message is very opaque:

ValueError: /path/to/file/wiki.es.bin cannot be opened for loading!

The problem was that the file wiki.es.bin does not exist at /path/to/file, though this is not at all obvious from the error message.
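A minimal sketch of a clearer check, assuming the featurizer is configured with a `cache_dir` and a file name as in the error above; `resolve_embedding_file` is a hypothetical helper that would run before the embeddings are loaded.

```python
from pathlib import Path


def resolve_embedding_file(cache_dir: str, filename: str) -> Path:
    """Return the embedding file path, or raise an error that says exactly what is missing."""
    directory = Path(cache_dir).expanduser()
    if not directory.is_dir():
        raise FileNotFoundError(f"cache_dir '{directory}' does not exist or is not a directory.")
    path = directory / filename
    if not path.is_file():
        available = sorted(p.name for p in directory.iterdir() if p.is_file())
        raise FileNotFoundError(
            f"File '{filename}' was not found in cache_dir '{directory}'. "
            f"Files present there (first 10): {available[:10] or 'none'}."
        )
    return path
```

Listing what is in the directory usually makes typos like `wiki.es.bin` vs `wiki.es.vec` obvious at a glance.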

CustomPythonComponent

Goal

To make hacking around easier for our research department, it might make sense to have a component that simply applies a function to a message.

The idea is that you, as a user, can define a file, say custom_component.py, like this:

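# custom_component.py
# Proposed API: CustomPythonComponent does not exist in rasa_nlu_examples yet.
from rasa_nlu_examples.meta import CustomPythonComponent

# placeholder: load whatever model your function needs once, at import time
model = load_fasttext()

def fasttext(message, setting_a=1, setting_b=2):
    """this is pseudocode"""
    model.process(message)  # attach extra features to the message
    return message  # this message now has extra features attached

# wrap the function; settings from config.yml would be passed as keyword arguments
MyFastTextTool = CustomPythonComponent(fasttext, setting_a=1, setting_b=2)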

Once such a file is in your project, it'd be cool if you could use it in your config.yml like this:

language: en

pipeline:
- name: WhitespaceTokenizer
- name: LexicalSyntacticFeaturizer
- name: rasa_nlu_examples.meta.Printer
  alias: before count vectors
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: rasa_nlu_examples.meta.Printer
  alias: after count vectors
- name: custom_component.MyFastTextTool
  setting_a: 1
  setting_b: 2
- name: DIETClassifier
  epochs: 100

This should make it much easier to add a featurizer. We shouldn't implement internal tools with it, but it might allow for some experimentation with actual Rasa tools as opposed to Jupyter notebooks.
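A standalone sketch of what the wrapper itself could look like, deliberately not tied to Rasa's component interface: it stores a function plus default settings, lets config values override them, and applies the function to a message. A plain dict stands in for a Rasa `Message`, and `shout` is a toy function; all names here are hypothetical.

```python
from typing import Any, Callable, Dict


class CustomPythonComponent:
    """Wrap a plain function plus default settings (standalone sketch, not Rasa's Component API)."""

    def __init__(self, func: Callable[..., Any], **defaults: Any) -> None:
        self.func = func
        self.defaults: Dict[str, Any] = dict(defaults)

    def with_settings(self, **settings: Any) -> "CustomPythonComponent":
        """Return a copy whose settings are overridden, e.g. by values from config.yml."""
        unknown = set(settings) - set(self.defaults)
        if unknown:
            raise ValueError(f"unknown settings: {sorted(unknown)}")
        return CustomPythonComponent(self.func, **{**self.defaults, **settings})

    def process(self, message: Any) -> Any:
        """Apply the wrapped function to a message using the current settings."""
        return self.func(message, **self.defaults)


def shout(message: Dict[str, str], suffix: str = "!", times: int = 1) -> Dict[str, str]:
    """Toy stand-in for a featurizer: edits the message text in place and returns it."""
    message["text"] = message["text"] + suffix * times
    return message


MyShoutTool = CustomPythonComponent(shout, suffix="!", times=1)
print(MyShoutTool.with_settings(times=3).process({"text": "hi"}))  # {'text': 'hi!!!'}
```

Rejecting unknown settings catches typos in config.yml early instead of silently ignoring them.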

Tokenizers for Less Common Languages

The WhitespaceTokenizer in Rasa is focused on Western languages. If some languages would benefit from a different tokenizer, we can explore alternatives in this thread.

Github Workflow does not work with BytePairFeaturizer anymore, because FastTextFeaturizer can't be found

Hey there, I have been using a CI/CD pipeline on GitHub for a while that installs rasa nlu examples, trains the model and tests it.
It worked without any problems.
Today the workflow fails and I get this error:

ComponentNotFoundException: Failed to load the component 'rasa_nlu_examples.featurizers.dense.BytePairFeaturizer'. Failed to find module 'rasa_nlu_examples.featurizers.dense'. Either your pipeline configuration contains an error or the module you are trying to import is broken (e.g. the module is trying to import a package that is not installed). Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/rasa/nlu/registry.py", line 121, in get_component_class
    return rasa.shared.utils.common.class_from_module_path(component_name)
  File "/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/rasa/shared/utils/common.py", line 20, in class_from_module_path
    m = importlib.import_module(module_name)
  File "/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/rasa_nlu_examples/featurizers/dense/__init__.py", line 1, in <module>
    from .fasttext_featurizer import FastTextFeaturizer
  File "/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/rasa_nlu_examples/featurizers/dense/fasttext_featurizer.py", line 5, in <module>
    import fasttext
ModuleNotFoundError: No module named 'fasttext'

Error: Process completed with exit code 1.

ButtonConfirmAction

It might be nice to open source an action that, when the model's confidence falls below a threshold, asks the user via buttons which intent they actually meant.
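A minimal sketch of the decision logic, assuming it reads Rasa's `intent_ranking` (a list of `{"name", "confidence"}` dicts on the latest message) and returns buttons whose `/intent_name` payloads trigger that intent directly. Inside a custom action the result could be passed to `dispatcher.utter_message(text=..., buttons=...)`; `confirmation_buttons` and its defaults are hypothetical.

```python
from typing import Any, Dict, List, Optional


def confirmation_buttons(
    intent_ranking: List[Dict[str, Any]],
    threshold: float = 0.6,
    max_buttons: int = 3,
) -> Optional[List[Dict[str, str]]]:
    """Return buttons for the top intents if the best one is below `threshold`, else None."""
    if not intent_ranking:
        return None
    ranked = sorted(intent_ranking, key=lambda item: item["confidence"], reverse=True)
    if ranked[0]["confidence"] >= threshold:
        return None  # confident enough, no need to ask
    return [
        {"title": item["name"].replace("_", " "), "payload": f"/{item['name']}"}
        for item in ranked[:max_buttons]
    ]


ranking = [
    {"name": "greet", "confidence": 0.45},
    {"name": "goodbye", "confidence": 0.40},
    {"name": "ask_weather", "confidence": 0.15},
]
print(confirmation_buttons(ranking, max_buttons=2))
```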

Tensorflow error while running benchmarking guide

Hi @koaning

I followed the benchmarking guide here:
https://rasahq.github.io/rasa-nlu-examples/benchmarking/

but got this error:

(binus) Wellys-MacBook-Pro:rasa-demo wellytambunan$ rasa test nlu --config basic-bytepair.config.yml           --cross-validation --runs 1 --folds 2           --out gridresults/basic-bytepair-config
2020-08-14 10:22:31 INFO     rasa.cli.test  - Test model using cross validation.
Traceback (most recent call last):
  File "/Users/wellytambunan/opt/anaconda3/envs/binus/bin/rasa", line 8, in <module>
    sys.exit(main())
  File "/Users/wellytambunan/opt/anaconda3/envs/binus/lib/python3.6/site-packages/rasa/__main__.py", line 92, in main
    cmdline_arguments.func(cmdline_arguments)
  File "/Users/wellytambunan/opt/anaconda3/envs/binus/lib/python3.6/site-packages/rasa/cli/test.py", line 147, in run_nlu_test
    perform_nlu_cross_validation(config, nlu_data, output, vars(args))
  File "/Users/wellytambunan/opt/anaconda3/envs/binus/lib/python3.6/site-packages/rasa/test.py", line 243, in perform_nlu_cross_validation
    data, folds, nlu_config, output, **kwargs
  File "/Users/wellytambunan/opt/anaconda3/envs/binus/lib/python3.6/site-packages/rasa/nlu/test.py", line 1354, in cross_validate
    trainer = Trainer(nlu_config)
  File "/Users/wellytambunan/opt/anaconda3/envs/binus/lib/python3.6/site-packages/rasa/nlu/model.py", line 142, in __init__
    components.validate_requirements(cfg.component_names)
  File "/Users/wellytambunan/opt/anaconda3/envs/binus/lib/python3.6/site-packages/rasa/nlu/components.py", line 46, in validate_requirements
    from rasa.nlu import registry
  File "/Users/wellytambunan/opt/anaconda3/envs/binus/lib/python3.6/site-packages/rasa/nlu/registry.py", line 13, in <module>
    from rasa.nlu.classifiers.diet_classifier import DIETClassifier
  File "/Users/wellytambunan/opt/anaconda3/envs/binus/lib/python3.6/site-packages/rasa/nlu/classifiers/diet_classifier.py", line 9, in <module>
    import tensorflow_addons as tfa
  File "/Users/wellytambunan/opt/anaconda3/envs/binus/lib/python3.6/site-packages/tensorflow_addons/__init__.py", line 21, in <module>
    from tensorflow_addons import activations
  File "/Users/wellytambunan/opt/anaconda3/envs/binus/lib/python3.6/site-packages/tensorflow_addons/activations/__init__.py", line 21, in <module>
    from tensorflow_addons.activations.gelu import gelu
  File "/Users/wellytambunan/opt/anaconda3/envs/binus/lib/python3.6/site-packages/tensorflow_addons/activations/gelu.py", line 24, in <module>
    get_path_to_datafile("custom_ops/activations/_activation_ops.so"))
  File "/Users/wellytambunan/opt/anaconda3/envs/binus/lib/python3.6/site-packages/tensorflow/python/framework/load_library.py", line 58, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: dlopen(/Users/wellytambunan/opt/anaconda3/envs/binus/lib/python3.6/site-packages/tensorflow_addons/custom_ops/activations/_activation_ops.so, 6): Symbol not found: __ZN10tensorflow11GetNodeAttrERKNS_9AttrSliceEN4absl11string_viewEPb
  Referenced from: /Users/wellytambunan/opt/anaconda3/envs/binus/lib/python3.6/site-packages/tensorflow_addons/custom_ops/activations/_activation_ops.so
  Expected in: /Users/wellytambunan/opt/anaconda3/envs/binus/lib/python3.6/site-packages/tensorflow/python/../libtensorflow_framework.2.dylib
 in /Users/wellytambunan/opt/anaconda3/envs/binus/lib/python3.6/site-packages/tensorflow_addons/custom_ops/activations/_activation_ops.so
(binus) Wellys-MacBook-Pro:rasa-demo wellytambunan$ 
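"Symbol not found" errors from `tensorflow_addons` custom ops usually point to a tensorflow_addons build that doesn't match the installed tensorflow, so the first thing worth comparing is the installed versions. A minimal sketch using `importlib.metadata`, which needs Python 3.8+ (on the Python 3.6 environment above, `pip show` or the `importlib_metadata` backport would do the same job); `installed_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it isn't installed."""
    result: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


for name, found in installed_versions(["rasa", "tensorflow", "tensorflow-addons"]).items():
    print(f"{name}: {found or 'not installed'}")
```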

SparseSpacyFeaturizer

If you have a look at all the attributes that spaCy generates for their tokens then you can imagine that some of these features can be useful for machine learning pipelines. To name a few:

  • is_oov: is the token part of the vocabulary/does it have a vector?
  • is_stop: is the token a stopword?
  • lemma_: what is the lemma of the token?
  • pos_/tag_: coarse/fine-grained part-of-speech information
  • morphological features
  • dep_: grammatical dependency

These all have a discrete representation and could be added to a Rasa pipeline as sparse features.
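A minimal sketch of turning such attributes into sparse one-hot features, assuming the attribute values have already been read off spaCy tokens into plain dicts. A real featurizer would probably emit a scipy sparse matrix; here each token simply gets the list of column indices that are set to 1. The helper names are hypothetical.

```python
from typing import Dict, List, Sequence

TokenAttrs = Dict[str, object]  # e.g. {"pos_": "NOUN", "is_stop": False}, read off a spaCy Token


def build_vocabulary(docs: Sequence[Sequence[TokenAttrs]], attributes: Sequence[str]) -> Dict[str, int]:
    """Give every observed 'attribute=value' pair its own column index."""
    keys = sorted({f"{attr}={token[attr]}" for doc in docs for token in doc for attr in attributes})
    return {key: index for index, key in enumerate(keys)}


def sparse_rows(doc: Sequence[TokenAttrs], attributes: Sequence[str], vocab: Dict[str, int]) -> List[List[int]]:
    """For each token, list the column indices that are 1; values unseen in training are ignored."""
    rows = []
    for token in doc:
        keys = (f"{attr}={token[attr]}" for attr in attributes)
        rows.append(sorted(vocab[key] for key in keys if key in vocab))
    return rows


train = [[{"pos_": "PRON", "is_stop": True}, {"pos_": "VERB", "is_stop": False}]]
vocab = build_vocabulary(train, ["pos_", "is_stop"])
print(sparse_rows(train[0], ["pos_", "is_stop"], vocab))  # [[1, 2], [0, 3]]
```

Keying columns on `attribute=value` strings keeps the vocabulary readable and makes it trivial to add or drop attributes via configuration.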

Missing `prepare_everything.py`

The `make install` command ends with:

python tests/prepare_everything.py
python: can't open file 'tests/prepare_everything.py': [Errno 2] No such file or directory
make: *** [Makefile:4: install] Error 2

Also, the documentation refers to this file, but it doesn't exist.
