rasahq / rasa-nlu-examples
This repository contains examples of custom components for educational purposes.
Home Page: https://RasaHQ.github.io/rasa-nlu-examples/
License: Apache License 2.0
We'd love it if people could let us know which tools help them. With that in mind, it would be good to consider a page in the documentation where we can list results. It would probably be best to separate the results per language and per project.
This thread is to collect ideas on this topic.
This is a response to this question and this one. It seems clear that a pre-trained French model can't be expected to detect names that aren't from France very well. The questions on the forum are about French but I can certainly imagine this issue also being relevant for other languages too.
So, as a next-best idea, we should host common names per country somewhere so folks can use them as a lookup for Regex. This needs to be a community effort, but it can be very helpful.
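A minimal sketch of how such a name list could feed a regex lookup. The names and the helpers `build_name_pattern` and `find_names` are made up for illustration; a real list would be loaded from a shared, community-maintained file.

```python
import re

# Hypothetical sample; a real list would come from a shared per-country file.
NAMES = ["Amara", "Chidi", "Ngozi", "Jean-Pierre"]


def build_name_pattern(names):
    """Compile one case-insensitive pattern that matches any listed name."""
    # Longest names first so longer alternatives win over shorter prefixes.
    escaped = [re.escape(n) for n in sorted(names, key=len, reverse=True)]
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def find_names(text, pattern):
    """Return (start, end, matched_text) for every name found in text."""
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]


pattern = build_name_pattern(NAMES)
matches = find_names("I spoke with ngozi and Jean-Pierre today.", pattern)
```

The character offsets are kept because an entity extractor would typically need them alongside the matched text.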
Something to keep an eye on:
https://anoopkunchukuttan.github.io/indic_nlp_library/
If we receive feedback that this tool could be useful to add here, we should.
I've just been told it might work well for Turkish.
We want to add a portion to the documentation where people can share some results of the tools in this library. We'll gladly link to any blogpost/github project as well but we'd love to hear/list in which scenarios our tools make a difference.
Anybody with results is free to ping @koaning here.
Are you going to adapt examples to 2.0 version?
SentencePiece is generally used to create byte pairs in any language, and as far as I can tell there is no built-in support for this kind of tokenisation in Rasa. This library uses BPEmb, but only for pretrained embeddings and not for tokenisation. Since the whitespace tokeniser doesn't always perform well, I would like to have support for it. I am willing to open a PR for this, but I don't know the contribution steps here.
Might be good to map numeric modifiers to entities.
https://forum.rasa.com/t/how-to-map-multiple-items-and-their-quantities-rasa-x/38571/11
Hi everyone,
I trained a new NLU-only model with my own w2v model. Now I want to put it in a production/staging environment,
but the official Rasa Docker image does not support this project. Could you show me how to create a Dockerfile based on the official Rasa Docker image so that I can use my own w2v model?
I put GensimFeaturizer in the Rasa NLU pipeline and got this message:
Exception: Failed to find class 'GensimFeaturizer' in module 'rasa_nlu_examples.featurizers.dense'.
I tried uninstalling and reinstalling with pip and git, and cloning the master branch at the latest commit, but the problem was not solved.
Finally, I found that the class's .py file was not present, so I copied gensim_featurizer.py into the installed package folder (/rasa_nlu_example/featurizer/dense) and it worked!
The printer should print pretty objects. The rich library might be able to make a difference. It might make the output a lot clearer.
Before an implementation happens a path forward should be discussed here. Are we going to use tables? Colors?
I'd like to add another intent classification model here. Similar to the naive bayes model.
Spacy has lots of blank tokenizers. They are rule-based, and therefore they should also technically have features that our simple white-space tokenizer doesn't.
A lot of entities will be nouns. Even if we don't use spaCy as an entity detection engine, it does come with useful language detection features that might be useful for our pipeline. Would be worth an experiment.
The Rasa Countvectorizer currently has a stopwords attribute, but it unfortunately only works for the sparse features. Any word embeddings that belong to a stopword are still generated. It might make sense to build a component that actually removes the stopwords from the message before it is handled by other components.
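As a rough sketch of the core step, here is a plain function that strips stopwords from text before anything else sees it. It is not a Rasa component: the stopword set and the name `remove_stopwords` are assumptions for illustration, and wiring it into a pipeline component is left out.

```python
# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "is", "to"}


def remove_stopwords(text, stopwords=STOPWORDS):
    """Drop whitespace-separated tokens whose lowercase form is a stopword."""
    kept = [tok for tok in text.split() if tok.lower() not in stopwords]
    return " ".join(kept)


cleaned = remove_stopwords("The flight to Berlin is late")
```

Because the words are removed from the text itself, any later featurizer, sparse or dense, would never produce features for them.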
Currently we're using the cli runner to run smoke-tests. This is a good idea but it is incredibly slow. It has to close down and load up tensorflow at every pass and this is slowing down the CI on github.
It's probably much better to drop the CLI and run the command that the CLI triggers directly from within Python.
Regex can be slow. Instead, it might help to have an entity extractor that is based on flashtext. This can really help for long name-lists for example.
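To illustrate why a lookup-based extractor can scale better than one large regex, here is a sketch that uses a plain dictionary over token n-grams. This is a stand-in for the idea and not flashtext itself; `build_lookup` and `extract_names` are hypothetical names, and punctuation handling is ignored.

```python
def build_lookup(names):
    """Map each lowercased, whitespace-tokenised name to its original form."""
    return {tuple(name.lower().split()): name for name in names}


def extract_names(text, lookup):
    """Scan tokens left to right, preferring the longest matching name."""
    tokens = text.split()
    max_len = max((len(key) for key in lookup), default=0)
    found, i = [], 0
    while i < len(tokens):
        # Try the longest candidate window first, then shrink it.
        for size in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + size])
            if key in lookup:
                found.append(lookup[key])
                i += size
                break
        else:
            i += 1  # no name starts at this token
    return found


lookup = build_lookup(["New York", "York", "Amsterdam"])
cities = extract_names("flights from new york to amsterdam", lookup)
```

The work per token depends on the length of the longest name, not on how many names are in the list, which is the property that matters for long name lists.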
I've noticed that we might only be interested in printing information about certain properties in the pipeline so far. It'd be nice if we could configure the component to allow for that.
Wondering whether a doc2vec model, instead of a word2vec model, can be trained and used with the Gensim featurizer?
Also, if HFTransformer is chosen as the language model, so that the corresponding tokenizer and featurizer are used in the NLU pipeline, can the Gensim featurizer still be added to the pipeline to improve domain-specific processing?
Thanks.
It might be better to make it the users' responsibility to handle the dependencies and not have them install immediately via pip. I don't think we can use the [all] syntax when we're using GitHub, but we could explore that.
In any case, we really don't want folks to download the thai-language tools or pytorch if they're only using the bytepair embeddings.
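One way this could look, as a sketch only: a component checks for its optional packages up front and reports what is missing in plain terms. The helpers `missing_modules` and `require` are hypothetical and not part of this library.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found here."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


def require(module_names, component):
    """Raise a readable error before a component tries to import its extras."""
    missing = missing_modules(module_names)
    if missing:
        raise ImportError(
            f"{component} needs these optional packages: {', '.join(missing)}. "
            "Install them with pip before adding the component to the pipeline."
        )
```

A component would call something like `require(["fasttext"], "FastTextFeaturizer")` only when it is actually used, so that users of other components never need that package.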
Gensim probably offers the easier way to train your own embeddings which might allow for users to use their own if they have a corpus that is reliable. I've understood that wikipedia is not a reliable source of online-slang for many languages.
Fasttext has a tool for this. Might be nice to check if we can use it in a component.
Could it trigger a RulePolicy?
As reported in RasaHQ/rasalit#37, it's something that needs fixing.
Issue described in detail here.
I'm wondering if the issue here isn't propagated to our WhitespaceTokenizer. It'd be good to confirm it's not an issue.
japanese tools https://www.aclweb.org/anthology/2020.nlposs-1.7/
From Tensorflow Turkey meetup. Let's investigate if we can add it here.
I'd like to add Turkish NLU data if there isn't anyone else doing it at the moment.
As per the discussion on Slack here (summary: we've decided against exposing token_pattern and use analyzer instead) we should remove these occurrences and replace them with the appropriate analyzer argument (found in benchmarking.md and readme.md).
Currently when the file is not found at the specified cache_dir in the config.yml file, the error message is very opaque:
ValueError: /path/to/file/wiki.es.bin cannot be opened for loading!
The problem was that the file wiki.es.bin does not exist at /path/to/file, though this is not at all obvious from the error message.
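A clearer message could come from checking the path before the model is loaded. The sketch below assumes a hypothetical helper `check_model_file`; it is not the component's actual code.

```python
from pathlib import Path


def check_model_file(cache_dir, file_name):
    """Return the model path, or raise an error that names what is missing."""
    directory = Path(cache_dir)
    path = directory / file_name
    if not directory.is_dir():
        raise FileNotFoundError(f"cache_dir '{directory}' does not exist.")
    if not path.is_file():
        raise FileNotFoundError(
            f"Expected '{file_name}' in cache_dir '{directory}' but it is not "
            "there. Check the settings in config.yml."
        )
    return path
```

Running this check first means the user sees which directory or file is missing instead of the loader's generic ValueError.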
Goal
To make hacking around easier for our research department, it might make sense to have a component that can just apply a function on a message.
The idea is that you as a user can define a file, say custom_component.py, like:
# custom_component.py
from rasa_nlu_examples.meta import CustomPythonComponent

model = load_fasttext()

def fasttext(message, setting_a=1, setting_b=2):
    """this is pseudocode"""
    model.process(message)
    return message  # this message now has extra features attached

MyFastTextTool = CustomPythonComponent(fasttext, setting_a=1, setting_b=2)
Once such a file is around your project, it'd be cool if you could do;
language: en
pipeline:
- name: WhitespaceTokenizer
- name: LexicalSyntacticFeaturizer
- name: rasa_nlu_examples.meta.Printer
  alias: before count vectors
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: rasa_nlu_examples.meta.Printer
  alias: after count vectors
- name: custom_component.MyFastTextTool
  setting_a: 1
  setting_b: 2
- name: DIETClassifier
  epochs: 100
This should make it much easier to add a featurizer. We shouldn't implement internal tools with it, but this might allow for some experimentation with actual rasa tools as opposed to jupyter notebooks.
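The wrapping idea itself can be sketched without any Rasa code. The class `FunctionComponent` below is hypothetical, and messages are plain dicts here instead of Rasa's message objects; it only shows how a function and its settings could be bound together.

```python
class FunctionComponent:
    """Hypothetical wrapper: binds a function and settings into one object."""

    def __init__(self, func, **settings):
        self.func = func
        self.settings = settings

    def process(self, message):
        # Call the wrapped function with the stored settings.
        return self.func(message, **self.settings)


def add_length(message, scale=1):
    """Toy function: attach a scaled text length to the message dict."""
    message["length"] = len(message["text"]) * scale
    return message


component = FunctionComponent(add_length, scale=2)
result = component.process({"text": "hello"})
```

A real version would also have to satisfy whatever interface the Rasa pipeline expects of a component, which this sketch does not attempt.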
The whitespace tokenizer in Rasa is focused on Western languages. If there are languages that would benefit from a different tokenizer, then we might explore alternatives in this thread.
Hey there, I have been using a CI/CD pipeline on GitHub for a while that installs rasa-nlu-examples and then trains and tests the model.
It worked without any problems.
Today the workflow failed and I got this error:
ComponentNotFoundException: Failed to load the component 'rasa_nlu_examples.featurizers.dense.BytePairFeaturizer'. Failed to find module 'rasa_nlu_examples.featurizers.dense'. Either your pipeline configuration contains an error or the module you are trying to import is broken (e.g. the module is trying to import a package that is not installed). Traceback (most recent call last):
File "/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/rasa/nlu/registry.py", line 121, in get_component_class
return rasa.shared.utils.common.class_from_module_path(component_name)
File "/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/rasa/shared/utils/common.py", line 20, in class_from_module_path
m = importlib.import_module(module_name)
File "/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/rasa_nlu_examples/featurizers/dense/__init__.py", line 1, in <module>
from .fasttext_featurizer import FastTextFeaturizer
File "/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/rasa_nlu_examples/featurizers/dense/fasttext_featurizer.py", line 5, in <module>
import fasttext
ModuleNotFoundError: No module named 'fasttext'
Error: Process completed with exit code 1.
It might make the tests a whole lot faster but right now SklearnIntentClassifier only works with dense features.
It might be nice to open source an action that, given a threshold of uncertainty, asks the user with buttons which intent was actually queried.
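The decision logic for such an action might look roughly like this. The sketch assumes the intent ranking is a list of dicts with `name` and `confidence` keys and that buttons carry a `title` and a `payload`; the function name, threshold, and button limit are illustrative choices.

```python
def disambiguation_buttons(intent_ranking, threshold=0.7, max_buttons=3):
    """Return button payloads when the top intent is below the threshold."""
    ranked = sorted(
        intent_ranking, key=lambda item: item["confidence"], reverse=True
    )
    if not ranked or ranked[0]["confidence"] >= threshold:
        return []  # confident enough (or nothing to offer): no question needed
    return [
        {"title": item["name"], "payload": f"/{item['name']}"}
        for item in ranked[:max_buttons]
    ]


ranking = [
    {"name": "greet", "confidence": 0.40},
    {"name": "goodbye", "confidence": 0.35},
]
buttons = disambiguation_buttons(ranking)
```

An empty list means the prediction can be used as-is; a non-empty list is what the action would present to the user.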
I should add a guide on how to use these embeddings:
It might be worth it to also investigate this library. https://polyglot.readthedocs.io/en/latest/NamedEntityRecognition.html
I should add a guide on how to use these embeddings:
Hi @koaning
I followed the benchmarking guide here
https://rasahq.github.io/rasa-nlu-examples/benchmarking/
but got this error
(binus) Wellys-MacBook-Pro:rasa-demo wellytambunan$ rasa test nlu --config basic-bytepair.config.yml --cross-validation --runs 1 --folds 2 --out gridresults/basic-bytepair-config
2020-08-14 10:22:31 INFO rasa.cli.test - Test model using cross validation.
Traceback (most recent call last):
File "/Users/wellytambunan/opt/anaconda3/envs/binus/bin/rasa", line 8, in <module>
sys.exit(main())
File "/Users/wellytambunan/opt/anaconda3/envs/binus/lib/python3.6/site-packages/rasa/__main__.py", line 92, in main
cmdline_arguments.func(cmdline_arguments)
File "/Users/wellytambunan/opt/anaconda3/envs/binus/lib/python3.6/site-packages/rasa/cli/test.py", line 147, in run_nlu_test
perform_nlu_cross_validation(config, nlu_data, output, vars(args))
File "/Users/wellytambunan/opt/anaconda3/envs/binus/lib/python3.6/site-packages/rasa/test.py", line 243, in perform_nlu_cross_validation
data, folds, nlu_config, output, **kwargs
File "/Users/wellytambunan/opt/anaconda3/envs/binus/lib/python3.6/site-packages/rasa/nlu/test.py", line 1354, in cross_validate
trainer = Trainer(nlu_config)
File "/Users/wellytambunan/opt/anaconda3/envs/binus/lib/python3.6/site-packages/rasa/nlu/model.py", line 142, in __init__
components.validate_requirements(cfg.component_names)
File "/Users/wellytambunan/opt/anaconda3/envs/binus/lib/python3.6/site-packages/rasa/nlu/components.py", line 46, in validate_requirements
from rasa.nlu import registry
File "/Users/wellytambunan/opt/anaconda3/envs/binus/lib/python3.6/site-packages/rasa/nlu/registry.py", line 13, in <module>
from rasa.nlu.classifiers.diet_classifier import DIETClassifier
File "/Users/wellytambunan/opt/anaconda3/envs/binus/lib/python3.6/site-packages/rasa/nlu/classifiers/diet_classifier.py", line 9, in <module>
import tensorflow_addons as tfa
File "/Users/wellytambunan/opt/anaconda3/envs/binus/lib/python3.6/site-packages/tensorflow_addons/__init__.py", line 21, in <module>
from tensorflow_addons import activations
File "/Users/wellytambunan/opt/anaconda3/envs/binus/lib/python3.6/site-packages/tensorflow_addons/activations/__init__.py", line 21, in <module>
from tensorflow_addons.activations.gelu import gelu
File "/Users/wellytambunan/opt/anaconda3/envs/binus/lib/python3.6/site-packages/tensorflow_addons/activations/gelu.py", line 24, in <module>
get_path_to_datafile("custom_ops/activations/_activation_ops.so"))
File "/Users/wellytambunan/opt/anaconda3/envs/binus/lib/python3.6/site-packages/tensorflow/python/framework/load_library.py", line 58, in load_op_library
lib_handle = py_tf.TF_LoadLibrary(library_filename)
**tensorflow.python.framework.errors_impl.NotFoundError:** dlopen(/Users/wellytambunan/opt/anaconda3/envs/binus/lib/python3.6/site-packages/tensorflow_addons/custom_ops/activations/_activation_ops.so, 6): **Symbol not found:** __ZN10tensorflow11GetNodeAttrERKNS_9AttrSliceEN4absl11string_viewEPb
Referenced from: /Users/wellytambunan/opt/anaconda3/envs/binus/lib/python3.6/site-packages/tensorflow_addons/custom_ops/activations/_activation_ops.so
Expected in: /Users/wellytambunan/opt/anaconda3/envs/binus/lib/python3.6/site-packages/tensorflow/python/../libtensorflow_framework.2.dylib
in /Users/wellytambunan/opt/anaconda3/envs/binus/lib/python3.6/site-packages/tensorflow_addons/custom_ops/activations/_activation_ops.so
(binus) Wellys-MacBook-Pro:rasa-demo wellytambunan$
There's a guide here: https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4
We might be able to fetch a tokenizer from here and make it compatible for Rasa.
If you have a look at all the attributes that spaCy generates for their tokens then you can imagine that some of these features can be useful for machine learning pipelines. To name a few:
is_oov: is the token out of vocabulary (does it lack a vector)?
is_stop: is the token a stopword?
lemma_: what is the lemma of the token?
pos/tag: coarse/fine-grained part of speech information
These can all have a discrete representation and could be added in general to a Rasa pipeline.
The make install command ends with
python tests/prepare_everything.py
python: can't open file 'tests/prepare_everything.py': [Errno 2] No such file or directory
make: *** [Makefile:4: install] Error 2
Also, the documentation refers to this file, but it doesn't exist.