sasafrass / straattaal
Use an RNN to generate Dutch slang and interpret your newly created words!
License: GNU General Public License v2.0
If these two notebooks are no longer needed for research / future reference purposes, we might as well dispose of them. Nit: they make the repository look like a Jupyter-notebook repo.
Allow users to edit database entries for their word: changing the meaning, changing the star, changing the groups to which it is shared, removing an entry. Linked to: #81
Current code (and my PR) can only use batch size = 1.
A larger batch size would improve training speed and make the final models better.
There are probably torch or torchtext tools that could help us deal with varying sequence lengths. (Or we can just manually pad everything and discard the padding.)
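The built-in utilities in `torch.nn.utils.rnn` handle exactly this. A minimal sketch of batching variable-length sequences with padding and packing (the toy data and layer sizes are assumptions, not the project's actual configuration):

```python
import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

# Toy batch of variable-length character-index sequences (hypothetical data)
seqs = [torch.tensor([1, 2, 3, 4]), torch.tensor([5, 6]), torch.tensor([7, 8, 9])]
lengths = torch.tensor([len(s) for s in seqs])

# Pad to a rectangular (batch, max_len) tensor; index 0 serves as the padding index
padded = pad_sequence(seqs, batch_first=True, padding_value=0)

embed = torch.nn.Embedding(num_embeddings=10, embedding_dim=8, padding_idx=0)
lstm = torch.nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

# Pack so the LSTM skips the padded positions entirely
packed = pack_padded_sequence(embed(padded), lengths,
                              batch_first=True, enforce_sorted=False)
out_packed, _ = lstm(packed)

# Unpack back to a padded tensor plus the true lengths, in original batch order
out, out_lengths = pad_packed_sequence(out_packed, batch_first=True)
```

With `enforce_sorted=False` the batch does not need to be pre-sorted by length, and `pad_packed_sequence` restores the original batch order.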
The environment.yml should be updated for conda users to match the more up-to-date requirements.txt.
Multiple options here, since there is only a finite number of words to generate (especially if the temperature is low).
Let me know what you think.
See #38 (comment)
Right now, the model loader reconstructs the model from the state_dict by using the size of lstm.hh, which is not awful but also not too beautiful if the architecture changes (e.g. to two layers), or if the model parameter names change slightly (e.g. lstm to rnn).
This could definitely be improved, even though it is not an issue for the current models.
(The most generic solution would be to save the model directly (as a "binary") rather than a state_dict, which makes the model instantly torch.load-able, but it introduces dependencies on the exact RNN implementation, the location of its definition, the torch version, etc., which is why it is not the recommended way to save.)
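A middle ground is to save the state_dict together with the hyperparameters needed to rebuild the model, so the loader never has to infer sizes from parameter names. A sketch, where `CharRNN` and its config keys are hypothetical stand-ins for the project's actual RNN:

```python
import io
import torch
import torch.nn as nn

# Hypothetical stand-in for the project's RNN; names are assumptions
class CharRNN(nn.Module):
    def __init__(self, vocab_size: int, hidden_size: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.lstm = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, x):
        out, _ = self.lstm(self.embed(x))
        return self.fc(out)

model = CharRNN(vocab_size=30, hidden_size=64)

# Save weights plus the config needed to reconstruct the architecture
buf = io.BytesIO()  # stands in for a file on disk
torch.save(
    {"config": {"vocab_size": 30, "hidden_size": 64},
     "state_dict": model.state_dict()},
    buf,
)

# Loading rebuilds from the saved config, not from parameter-name inspection
buf.seek(0)
ckpt = torch.load(buf)
restored = CharRNN(**ckpt["config"])
restored.load_state_dict(ckpt["state_dict"])
```

This keeps the portability of state_dict saving while making the loader robust to renamed parameters or extra layers, as long as the config keys stay in sync with the constructor.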
Update the requirements.txt and environment.yml such that new users now install PyTorch==1.9.x, partially because of its native implementation of pad packed sequence and pack padded sequence.
This should probably be optimized: load the current model/dataset only once.
But I don't know how to do it in a Flask-Pythonic way ;)
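One common Flask-friendly pattern is a memoized loader: route handlers call a getter, and `functools.lru_cache` guarantees the expensive load runs once per worker process. A minimal sketch; the dummy loader body and the `LOAD_CALLS` counter are illustrative, not the app's real code:

```python
from functools import lru_cache

LOAD_CALLS = 0  # instrumentation only, to show the loader runs exactly once

@lru_cache(maxsize=1)
def get_model():
    """Load the model (and dataset) once; subsequent calls hit the cache."""
    global LOAD_CALLS
    LOAD_CALLS += 1
    # In the real app this would call e.g. load_model(path) -- hypothetical name
    return {"name": "slang-rnn"}

# Every route handler would call get_model() instead of loading directly
m1 = get_model()
m2 = get_model()
```

Note that with multiple WSGI workers each process keeps its own cached copy, which is usually what you want for an in-memory model.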
Implement the word meaning functionality. Allow people to generate a word, add their interpretation of the meaning of that word, and persist this to database and make it available on their profile.
When a user goes to the profile page or index, display (a table with) all words and corresponding meanings.
The number of letters used for the loaded model is currently hardcoded. Find a better way to set this depending on the corpus we're dealing with. A first solution would be to make it at least variable for the slang corpus.
File and line:
For clarity and maintainability we should add return types to the load_model function in https://github.com/Sasafrass/straattaal/blob/9bf261746a020569a0dcf7ecc2c3a26dec1fc376/app/ml_models/rnn/loaded_rnn_model.py
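Purely illustrative, since the real return types live in loaded_rnn_model.py: if we assume load_model returns the reconstructed model and a char-to-index vocabulary, the annotation could look like this (all names and shapes here are assumptions):

```python
# Illustrative sketch only; the real signature is in loaded_rnn_model.py.
def load_model(model_path: str) -> tuple[object, dict[str, int]]:
    """Return the reconstructed model and its character vocabulary."""
    model = object()          # placeholder for the reconstructed RNN
    vocab = {"a": 0, "b": 1}  # placeholder char-to-index mapping
    return model, vocab

model, vocab = load_model("model.pt")
```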
The create group functionality should immediately add the user to the newly created group as well.
Doing an API call here is overkill. Rewrite the following line as an internal function call; this may necessitate restructuring some of the code.
Line 14 in 506881c
Currently we seem to be reloading the full CSS on every GET request (not sure if that's entirely wrong), which appears to be making the app quite slow.
As a user, I am often disgruntled by needing to register/log in before being able to use the (local) app.
Maybe we can have a guest account as the default, where you can generate words but don't get the fancy options registered users have.
For those who just want to play around.
Enable inviting other people to your group so you can share words.
For better CSRF protection in case the loading of environment variables fails, fall back to a randomly generated string rather than a hard-coded string in:
Line 10 in 506881c
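A minimal sketch of that fallback, using the standard library's `secrets` module (the `SECRET_KEY` environment variable name is an assumption about the app's config):

```python
import os
import secrets

# Fall back to a per-process random key when the env var is missing,
# instead of a hard-coded string anyone can read on GitHub.
SECRET_KEY = os.environ.get("SECRET_KEY") or secrets.token_hex(32)
```

One caveat: a freshly generated key invalidates existing sessions and CSRF tokens on every restart, so this is a safety net, not a replacement for setting the environment variable properly.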
RNN created when loading from model state now has a hardcoded architecture.
In the README we should add/update generated slang words with their corresponding meaning.
Keep generating a new word, up to N times, until its edit distance to every item in the dataset is non-zero. This increases the probability that we return a word that is not identical to something in the dataset.
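A sketch of that retry loop with a plain dynamic-programming Levenshtein distance (the function names and the `min_distance` knob are assumptions; a threshold above 1 would additionally reject near-duplicates):

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def is_novel(word, dataset, min_distance=1):
    """True when the word is at least min_distance away from every known word."""
    return all(edit_distance(word, known) >= min_distance for known in dataset)

def generate_novel_word(sample, dataset, max_tries=10, min_distance=1):
    """Resample up to max_tries times; return None if no novel word was found."""
    for _ in range(max_tries):
        word = sample()
        if is_novel(word, dataset, min_distance):
            return word
    return None
```

Note that for `min_distance=1` a simple set-membership check (`word not in dataset`) is equivalent and much cheaper; the edit distance only pays off once we want to reject near-identical words too.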
Write some of the arguments more generically as argparse arguments that are subsequently passed to the function.
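A minimal sketch of that pattern with `argparse`; the flag names and defaults here are assumptions, not the training script's real interface:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical flags; the real script would define its own
    parser = argparse.ArgumentParser(description="Train the slang RNN.")
    parser.add_argument("--hidden-size", type=int, default=128)
    parser.add_argument("--batch-size", type=int, default=1)
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--temperature", type=float, default=1.0)
    return parser

# Parse an explicit list here for demonstration; normally parse_args() reads sys.argv
args = build_parser().parse_args(["--hidden-size", "256"])

# vars(args) can then be splatted straight into the function:
# train(**vars(args))  -- hypothetical function name
```

argparse converts dashes in flag names to underscores on the namespace, so `--hidden-size` becomes `args.hidden_size`.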
For inference or re-loading a model, vocabulary file should be loaded instead of a whole dataset object.
Be able to select which words you want to share with a group, and subsequently be able to actually share those words with the group.
Allow users to star a word. This will allow them to save the word, for instance when they think the word is just cool but haven't come up with a meaning yet.
Move vocabulary functionality into one coherent vocabulary class, e.g. char_to_idx, idx_to_char, and sampling from the vocabulary.
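A sketch of what such a class could look like; the method names beyond the ones listed in the issue (encode/decode) are assumptions:

```python
import random

class Vocabulary:
    """Bundle char_to_idx, idx_to_char, and sampling in one coherent place."""

    def __init__(self, chars):
        self.idx_to_char = sorted(set(chars))
        self.char_to_idx = {c: i for i, c in enumerate(self.idx_to_char)}

    def __len__(self):
        return len(self.idx_to_char)

    def encode(self, word):
        """Map a word to a list of character indices."""
        return [self.char_to_idx[c] for c in word]

    def decode(self, indices):
        """Map a list of character indices back to a word."""
        return "".join(self.idx_to_char[i] for i in indices)

    def sample(self, rng=random):
        """Draw one character uniformly from the vocabulary."""
        return rng.choice(self.idx_to_char)
```

This gives the dataset, model loader, and sampler a single object to share instead of passing two loose dicts around.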
Rewrite the /api/generate_slang route to use the internal functionality rather than defining the functionality itself. That way, any update to how we generate words internally takes effect here immediately, with no need to update this route manually: fewer bugs.
More midnight issue ideas
We could keep a list of (convincing) generated words per category and create a sort of 'which word is a real word' quiz, where either
More of a philosophical question. Any papers/blog posts on this are welcome!
Generated words are "in Dutch" and are probably most fun for Dutch speakers.
As a (parody of a) user, I am confused: should I be entering a definition in English?
In the future, we could provide the app in English/Dutch, depending on the language of the current model / user preference.
Each type of model should get its own associated database model, so that people can save their desired words to a different table.