sasafrass / straattaal
Use an RNN to generate Dutch slang and interpret your newly created words!
License: GNU General Public License v2.0
If these two notebooks are no longer needed for research / future reference purposes, we might as well dispose of them. Nit: they make the repository look like a Jupyter-notebook repo.
Allow users to edit database entries for their word: changing the meaning, changing the star, changing the groups to which it is shared, removing an entry. Linked to: #81
Current code (and my PR) can only use batch size = 1.
A larger batch size would improve training speed and make the final models better.
There are probably torch or torchtext tools that could help us deal with varying sequence lengths. (Or we can just manually pad everything and discard the padding.)
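The built-in utilities in `torch.nn.utils.rnn` handle exactly this. A minimal sketch of batching variable-length sequences with padding and packing (the toy data and layer sizes are assumptions, not the project's actual configuration):

```python
import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

# Toy batch of variable-length character-index sequences (hypothetical data)
seqs = [torch.tensor([1, 2, 3, 4]), torch.tensor([5, 6]), torch.tensor([7, 8, 9])]
lengths = torch.tensor([len(s) for s in seqs])

# Pad to a rectangular (batch, max_len) tensor; index 0 serves as the padding index
padded = pad_sequence(seqs, batch_first=True, padding_value=0)

embed = torch.nn.Embedding(num_embeddings=10, embedding_dim=8, padding_idx=0)
lstm = torch.nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

# Pack so the LSTM skips the padded positions entirely
packed = pack_padded_sequence(embed(padded), lengths,
                              batch_first=True, enforce_sorted=False)
out_packed, _ = lstm(packed)

# Unpack back to a padded tensor plus the true lengths, in original batch order
out, out_lengths = pad_packed_sequence(out_packed, batch_first=True)
```

With `enforce_sorted=False` the batch does not need to be pre-sorted by length, and `pad_packed_sequence` restores the original batch order.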
The environment.yml should be updated for conda users to match the more up-to-date requirements.txt.
Multiple options here, since there is only a finite number of words to generate (especially if the temperature is low).
Let me know what you think.
See #38 (comment)
Right now, the model loader reconstructs the model from the state_dict by using the size of lstm.hh, which is not awful but also not too beautiful if the architecture changes (e.g. to two layers), or if the model parameter names change slightly (e.g. lstm to rnn).
This could definitely be improved, even though it is not an issue for the current models.
(The most generic solution would be to save the model directly (as a "binary") rather than a state_dict, which makes the model instantly torch.load-able, but it introduces dependencies on the exact RNN implementation, the location of its definition, the torch version, etc., which is why it is not the recommended way to save.)
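A middle ground is to save the state_dict together with the hyperparameters needed to rebuild the model, so the loader never has to infer sizes from parameter names. A sketch, where `CharRNN` and its config keys are hypothetical stand-ins for the project's actual RNN:

```python
import io
import torch
import torch.nn as nn

# Hypothetical stand-in for the project's RNN; names are assumptions
class CharRNN(nn.Module):
    def __init__(self, vocab_size: int, hidden_size: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.lstm = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, x):
        out, _ = self.lstm(self.embed(x))
        return self.fc(out)

model = CharRNN(vocab_size=30, hidden_size=64)

# Save weights plus the config needed to reconstruct the architecture
buf = io.BytesIO()  # stands in for a file on disk
torch.save(
    {"config": {"vocab_size": 30, "hidden_size": 64},
     "state_dict": model.state_dict()},
    buf,
)

# Loading rebuilds from the saved config, not from parameter-name inspection
buf.seek(0)
ckpt = torch.load(buf)
restored = CharRNN(**ckpt["config"])
restored.load_state_dict(ckpt["state_dict"])
```

This keeps the portability of state_dict saving while making the loader robust to renamed parameters or extra layers, as long as the config keys stay in sync with the constructor.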
Update the requirements.txt and environment.yml such that new users now install PyTorch==1.9.x, partially because of its native implementation of pad packed sequence and pack padded sequence.
This should probably be optimized: load the current model/dataset only once.
But I don't know how to do it in a Flask-Pythonic way ;)
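One common Flask-friendly pattern is a memoized loader: route handlers call a getter, and `functools.lru_cache` guarantees the expensive load runs once per worker process. A minimal sketch; the dummy loader body and the `LOAD_CALLS` counter are illustrative, not the app's real code:

```python
from functools import lru_cache

LOAD_CALLS = 0  # instrumentation only, to show the loader runs exactly once

@lru_cache(maxsize=1)
def get_model():
    """Load the model (and dataset) once; subsequent calls hit the cache."""
    global LOAD_CALLS
    LOAD_CALLS += 1
    # In the real app this would call e.g. load_model(path) -- hypothetical name
    return {"name": "slang-rnn"}

# Every route handler would call get_model() instead of loading directly
m1 = get_model()
m2 = get_model()
```

Note that with multiple WSGI workers each process keeps its own cached copy, which is usually what you want for an in-memory model.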
Implement the word meaning functionality. Allow people to generate a word, add their interpretation of the meaning of that word, and persist this to database and make it available on their profile.
When a user goes to the profile page or index, display (a table with) all words and corresponding meanings.
The number of letters used for the loaded model is currently hardcoded. Find a better way to set this depending on the corpus we're dealing with. A first solution would be to make it at least variable for the slang corpus.
File and line:
For clarity and maintainability we should add return types to the load_model function in https://github.com/Sasafrass/straattaal/blob/9bf261746a020569a0dcf7ecc2c3a26dec1fc376/app/ml_models/rnn/loaded_rnn_model.py
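Purely illustrative, since the real return types live in loaded_rnn_model.py: if we assume load_model returns the reconstructed model and a char-to-index vocabulary, the annotation could look like this (all names and shapes here are assumptions):

```python
# Illustrative sketch only; the real signature is in loaded_rnn_model.py.
def load_model(model_path: str) -> tuple[object, dict[str, int]]:
    """Return the reconstructed model and its character vocabulary."""
    model = object()          # placeholder for the reconstructed RNN
    vocab = {"a": 0, "b": 1}  # placeholder char-to-index mapping
    return model, vocab

model, vocab = load_model("model.pt")
```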
The create group functionality should immediately add the user to the newly created group as well.
Doing an API call here is overkill. Rewrite the following line as an internal function call; this may necessitate restructuring some of the code.
Line 14 in 506881c
Currently we seem to be reloading the full CSS on every GET request (not sure if that's entirely wrong), which appears to be making the app quite slow.
As a user, I am often disgruntled by needing to register/log in before being able to use the (local) app.
Maybe we can have a guest account as the default, where you can generate words but don't get the fancy options registered users have.
For those who just want to play around.
Enable inviting other people to your group so you can share words.
For better CSRF protection in case the loading of environment variables fails, fall back to a randomly generated string rather than a hard-coded string in:
Line 10 in 506881c
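A minimal sketch of that fallback, using the standard library's `secrets` module (the `SECRET_KEY` environment variable name is an assumption about the app's config):

```python
import os
import secrets

# Fall back to a per-process random key when the env var is missing,
# instead of a hard-coded string anyone can read on GitHub.
SECRET_KEY = os.environ.get("SECRET_KEY") or secrets.token_hex(32)
```

One caveat: a freshly generated key invalidates existing sessions and CSRF tokens on every restart, so this is a safety net, not a replacement for setting the environment variable properly.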
RNN created when loading from model state now has a hardcoded architecture.
In the README we should add/update generated slang words with their corresponding meaning.
Keep generating a new word, up to N times, until its edit distance to every item in the dataset is non-zero. This increases the probability that we return a word that is not identical to something in the dataset.
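A sketch of that retry loop with a plain dynamic-programming Levenshtein distance (the function names and the `min_distance` knob are assumptions; a threshold above 1 would additionally reject near-duplicates):

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def is_novel(word, dataset, min_distance=1):
    """True when the word is at least min_distance away from every known word."""
    return all(edit_distance(word, known) >= min_distance for known in dataset)

def generate_novel_word(sample, dataset, max_tries=10, min_distance=1):
    """Resample up to max_tries times; return None if no novel word was found."""
    for _ in range(max_tries):
        word = sample()
        if is_novel(word, dataset, min_distance):
            return word
    return None
```

Note that for `min_distance=1` a simple set-membership check (`word not in dataset`) is equivalent and much cheaper; the edit distance only pays off once we want to reject near-identical words too.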
Write some of the arguments more generically as argparse arguments that are subsequently passed to the function.
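A minimal sketch of that pattern with `argparse`; the flag names and defaults here are assumptions, not the training script's real interface:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical flags; the real script would define its own
    parser = argparse.ArgumentParser(description="Train the slang RNN.")
    parser.add_argument("--hidden-size", type=int, default=128)
    parser.add_argument("--batch-size", type=int, default=1)
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--temperature", type=float, default=1.0)
    return parser

# Parse an explicit list here for demonstration; normally parse_args() reads sys.argv
args = build_parser().parse_args(["--hidden-size", "256"])

# vars(args) can then be splatted straight into the function:
# train(**vars(args))  -- hypothetical function name
```

argparse converts dashes in flag names to underscores on the namespace, so `--hidden-size` becomes `args.hidden_size`.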
For inference or re-loading a model, vocabulary file should be loaded instead of a whole dataset object.
Be able to select which words you want to share with a group, and subsequently be able to actually share those words with the group.
Allow users to star a word. This will allow them to save the word, for instance when they think the word is just cool but haven't come up with a meaning yet.
Move vocabulary functionality into one coherent vocabulary class, e.g. char_to_idx, idx_to_char, and sampling from the vocabulary.
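A sketch of what such a class could look like; the method names beyond the ones listed in the issue (encode/decode) are assumptions:

```python
import random

class Vocabulary:
    """Bundle char_to_idx, idx_to_char, and sampling in one coherent place."""

    def __init__(self, chars):
        self.idx_to_char = sorted(set(chars))
        self.char_to_idx = {c: i for i, c in enumerate(self.idx_to_char)}

    def __len__(self):
        return len(self.idx_to_char)

    def encode(self, word):
        """Map a word to a list of character indices."""
        return [self.char_to_idx[c] for c in word]

    def decode(self, indices):
        """Map a list of character indices back to a word."""
        return "".join(self.idx_to_char[i] for i in indices)

    def sample(self, rng=random):
        """Draw one character uniformly from the vocabulary."""
        return rng.choice(self.idx_to_char)
```

This gives the dataset, model loader, and sampler a single object to share instead of passing two loose dicts around.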
Rewrite the /api/generate_slang route to use the internal functionality rather than defining the functionality itself. That way, any update to how we generate words internally takes effect here immediately, with no need to update this route manually: fewer bugs.
More midnight issue ideas
We could keep a list of (convincing) generated words per category and create a sort of 'which word is a real word' quiz, where either
More of a philosophical question. Any papers/blog posts on this are welcome!
Generated words are "in Dutch" and are probably most fun for Dutch speakers.
As a (parody of a) user, I am confused: should I be entering a definition in English?
In the future, we could provide the app in English/Dutch, depending on the language of the current model / user preference.
Each type of model should get its own associated database model, so that people can save their desired words to a different table.