gagan3012 / keytotext

Keywords to Sentences

Home Page: https://share.streamlit.io/gagan3012/keytotext/UI/app.py

License: MIT License

Jupyter Notebook 78.37% Python 21.21% Makefile 0.27% Dockerfile 0.15%

sentences keywords t5 huggingface-transformers nlp streamlit keytotext docker api

keytotext's Introduction

keytotext

The idea is to build a model that takes keywords as input and generates sentences as output.

Potential use cases include:

Marketing
Search Engine Optimization
Topic generation etc.
Fine tuning of topic modeling models

Model:

Keytotext is based on the Amazing T5 Model:

k2t: Model
k2t-base: Model
mrm8488/t5-base-finetuned-common_gen (by Manuel Romero): Model

Training Notebooks can be found in the Training Notebooks Folder

Note: To add your own model to keytotext, please read the Models Documentation.

Usage:

Example usage:

Example Notebooks can be found in the Notebooks Folder

pip install keytotext

Trainer:

Keytotext now has a trainer class that can be used to train and fine-tune any T5-based model on new data. Updated Trainer docs here: Docs

Trainer example here:

from keytotext import trainer

UI:

pip install streamlit-tags

This uses a custom streamlit component built by me: GitHub

API:

The API runs in a Docker container and can be started quickly. Follow the instructions below to get started:

docker pull gagan30/keytotext

docker run -dp 8000:8000 gagan30/keytotext

This starts the API on port 8000. Visit the URL below to get results:

http://localhost:8000/api?data=["India","Capital","New Delhi"]

Note: The Hosted API is only available on demand
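
The example URL above passes the keywords as a JSON list in the data query parameter. A minimal sketch of building such a URL safely from Python follows; the helper name build_api_url is hypothetical, and it assumes the server accepts a percent-encoded form of the same JSON list, which the README does not state explicitly.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def build_api_url(keywords, base_url="http://localhost:8000/api"):
    """Build a request URL with the keywords as a JSON list in `data`.

    `build_api_url` is a hypothetical helper; only the /api path and the
    `data` parameter come from the README example.
    """
    # Compact JSON, matching the shape of the README example.
    payload = json.dumps(keywords, separators=(",", ":"))
    # urlencode percent-encodes quotes, brackets, commas and spaces.
    return f"{base_url}?{urlencode({'data': payload})}"


url = build_api_url(["India", "Capital", "New Delhi"])

# Round-trip check: decoding the query string recovers the keyword list.
decoded = json.loads(parse_qs(urlsplit(url).query)["data"][0])
print(url)
print(decoded)
```

Encoding the parameter avoids problems with spaces and quotes in keywords such as "New Delhi" when the URL is used outside a browser.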

BibTex:

To cite keytotext, please use this citation:

@misc{bhatia, 
      title={keytotext},
      url={https://github.com/gagan3012/keytotext}, 
      journal={GitHub}, 
      author={Bhatia, Gagan}
}

References

https://github.com/Shivanandroy/simpleT5 (Shivanand Roy)
https://github.com/patil-suraj/question_generation (Suraj Patil)
https://github.com/MathewAlexander/T5_nlg (Mathew Alexander)

Articles about keytotext:

https://towardsdatascience.com/data-to-text-generation-with-t5-building-a-simple-yet-advanced-nlg-model-b5cce5a6df45 (Mathew Alexander)
Amazing Video by 1LittleCoder here: https://www.youtube.com/watch?v=I0iBzP-SxFY about keytotext
https://medium.com/mlearning-ai/generating-sentences-from-keywords-using-transformers-in-nlp-e89f4de5cf6b (Prakhar Mishra)


keytotext's Issues

Create a pipeline for k2t

Eg: https://github.com/patil-suraj/question_generation/blob/master/pipelines.py

OSError - It seems like model does not exist anymore through their website

raise EnvironmentError(
OSError: gagan3012/k2t is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo with use_auth_token or log in with huggingface-cli login and pass use_auth_token=True.

Finetune created sentences only.

Describe the solution you'd like
I'd love to be able to fine-tune the style/grammar of the resulting sentences without needing to have sentence and keyword pairs--only sentences.

I'm experimenting with using AI to create or modify quotes similar to famous, historic texts. For example, I'm currently working on fine-tuning T5 to convert text from the 17th-century to modern English. I'm doing this through datasets of KJV and modern Bible translation verse pairs. This is working very well, and preliminary models are on HuggingFace already.

Describe alternatives you've considered
I've considered:

Trying to create or find a dataset where each Bible verse has keywords.
Using other AI models to create keywords for each verse, then using the resulting dataset.

New models trainer

Language:

Model Link:

On HFhub

Comments:

Need more training power

Yestts

Describe the bug
A clear and concise description of what the bug is.

To Reproduce
Steps to reproduce the behavior:

Go to '...'
Click on '....'
Scroll down to '....'
See error

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

OS: [e.g. iOS]
Browser [e.g. chrome, safari]
Version [e.g. 22]

Smartphone (please complete the following information):

Device: [e.g. iPhone6]
OS: [e.g. iOS8.1]
Browser [e.g. stock browser, safari]
Version [e.g. 22]

Additional context
Add any other context about the problem here.

New TypeError: __init__() got an unexpected keyword argument 'progress_bar_refresh_rate'

I have imported the model and the necessary libraries, and I am getting the error below in Google Colab. I used this model a few months ago and it worked fine. This is a new issue that I have started seeing recently with the same code.

TypeError: __init__() got an unexpected keyword argument 'progress_bar_refresh_rate'

Imported libraries:

!pip install keytotext --upgrade
!sudo apt-get install git-lfs

from keytotext import trainer

Training Model:

model = trainer()
model.from_pretrained(model_name="t5-small")
model.train(train_df=df_train_final, test_df=df_test, batch_size=3, max_epochs=5,use_gpu=True)
model.save_model()

Have attached error screenshot

OS: Windows
Browser Chrome

Trainer typo + no grad_fn

Describe the bug

The trainer() class passes keyword arguments that no longer exist in the PyTorch Lightning Trainer class (such as gpus).
Even when these errors are corrected, attempting to train the model causes the following errors:
(TPU) Exception in device=TPU:6: element 0 of tensors does not require grad and does not have a grad_fn
(GPU) RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

To Reproduce
Steps to reproduce the behavior:
Try to train the model.

Expected behavior
To train the model :)
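
The "unexpected keyword argument" failures reported here come from passing arguments that the target constructor no longer accepts. As a rough workaround sketch (not keytotext's or PyTorch Lightning's code), one can check keyword arguments against the constructor's signature before calling it. The helper filter_supported_kwargs and the class NewStyleTrainer are hypothetical stand-ins.

```python
import inspect


def filter_supported_kwargs(callable_obj, kwargs):
    """Split kwargs into (accepted, rejected) for the given callable.

    Hypothetical helper: it only inspects the signature and does not
    know what the removed arguments used to do.
    """
    params = inspect.signature(callable_obj).parameters
    # A callable that declares **kwargs accepts every keyword.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs), {}
    accepted = {k: v for k, v in kwargs.items() if k in params}
    rejected = {k: v for k, v in kwargs.items() if k not in params}
    return accepted, rejected


class NewStyleTrainer:
    """Hypothetical stand-in for a trainer whose constructor dropped arguments."""

    def __init__(self, max_epochs=1, accelerator="cpu"):
        self.max_epochs = max_epochs
        self.accelerator = accelerator


# Arguments written for an older constructor signature.
legacy_kwargs = {"max_epochs": 5, "gpus": 1, "progress_bar_refresh_rate": 5}

accepted, rejected = filter_supported_kwargs(NewStyleTrainer, legacy_kwargs)
trainer_obj = NewStyleTrainer(**accepted)
print("dropped:", sorted(rejected))
```

Dropping an argument silently changes behavior (for example, a removed gpus setting means no device is selected), so the rejected names should be reviewed and replaced with their current equivalents rather than ignored. This sketch does not address the separate grad_fn error.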

Proper names

Is your feature request related to a problem? Please describe.
I am not sure whether this is a bug or a new feature, because I don't know if you considered this when building the model. Running some tests locally, I found that KeyToText has trouble building a logical sentence when a proper name (a person's name, like John or Paul) is passed in the keywords. For example, if I pass the keywords [John, have, dog, cat], it builds "A cat and a dog are having a play date.". If I replace John with man ([man, have, dog, cat]), it builds something much better ("A man has a cat and a dog").

Describe the solution you'd like
Maybe there is already an option that I failed to find, but it would be nice if the model could be used with proper names as well.

Describe alternatives you've considered
Assuming that the model does have this issue, my first guess would be the training dataset, but I am no ML expert =)

Add Citations

Is your feature request related to a problem? Please describe.
Inspirations: https://towardsdatascience.com/data-to-text-generation-with-t5-building-a-simple-yet-advanced-nlg-model-b5cce5a6df45

Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

Pipeline error on fresh install

Hi I'm getting this on a first run and fresh install

Global seed set to 42
Traceback (most recent call last):
  File "C:\Users\skint\PycharmProjects\spacynd2\testdata.py", line 1, in <module>
    from keytotext import pipeline
  File "C:\Users\skint\venv\lib\site-packages\keytotext\__init__.py", line 11, in <module>
    from .dataset import make_dataset
  File "C:\Users\skint\venv\lib\site-packages\keytotext\dataset.py", line 1, in <module>
    from cv2 import randShuffle
ModuleNotFoundError: No module named 'cv2'

Local model problem

I downloaded the model and saved it locally with this code:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Download the tokenizer and model from the Hub, then save both to a local folder.
namefolder = 'mrm8488-t5-base'
tokenizer = AutoTokenizer.from_pretrained('mrm8488/t5-base-finetuned-common_gen')
model = AutoModelForSeq2SeqLM.from_pretrained('mrm8488/t5-base-finetuned-common_gen')
tokenizer.save_pretrained("./" + namefolder)
model.save_pretrained("./" + namefolder)

I can not run the model. Error:
site-packages\keytotext\pipeline.py", line 75, in pipeline
task, list(SUPPORTED_TASKS.keys())
KeyError: "Unknown task E:\PH\_MODEL_TRANSFORMERS\mrm8488-t5-base, available tasks are ['k2t', 'k2t-base', 'mrm8488/t5-base-finetuned-common_gen', 'k2t- new']"

Thanks
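
The KeyError above suggests that the pipeline treats its argument strictly as a key into a table of supported tasks, so a folder path is rejected. A minimal sketch of a lookup that also accepts a local folder is shown below. The registry KNOWN_MODELS and the function resolve_model are hypothetical and are not part of keytotext; only the task names and the gagan3012/k2t identifier appear in the reports on this page.

```python
import os
import tempfile

# Hypothetical registry: short task names mapped to Hugging Face Hub identifiers.
KNOWN_MODELS = {
    "k2t": "gagan3012/k2t",
    "mrm8488/t5-base-finetuned-common_gen": "mrm8488/t5-base-finetuned-common_gen",
}


def resolve_model(name_or_path):
    """Return a Hub identifier for a known task, or an absolute local path."""
    if name_or_path in KNOWN_MODELS:
        return KNOWN_MODELS[name_or_path]
    # Fall back to a local folder, e.g. one written by save_pretrained().
    if os.path.isdir(name_or_path):
        return os.path.abspath(name_or_path)
    raise KeyError(
        f"Unknown task or missing folder: {name_or_path!r}; "
        f"known tasks are {sorted(KNOWN_MODELS)}"
    )


resolved_hub = resolve_model("k2t")

# Demonstrate the local-folder branch with a temporary directory.
with tempfile.TemporaryDirectory() as local_dir:
    resolved_local = resolve_model(local_dir)
    local_ok = os.path.isdir(resolved_local)

print(resolved_hub, local_ok)
```

With a resolver like this, the returned string could be handed to the same from_pretrained calls used to download the model, since those accept either a Hub identifier or a local directory.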

Update Readme

Is your feature request related to a problem? Please describe.
There are errors in the README that need to be fixed.

Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

Models in German

Language: ge

Model Link:

Comments:

Test issue

Test

Adding Trainer API for Keytotext

Is your feature request related to a problem? Please describe.
Adding the Keytotext trainer API

ValueError: transformers.models.auto.__spec__ is None

from keytotext import pipeline

Running the line above raises this error: "ValueError: transformers.models.auto.__spec__ is None"
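
The wording of this error matches the message that Python's importlib.util.find_spec raises when a module is already present in sys.modules but its __spec__ attribute is None. The sketch below reproduces that message with a throwaway module; the module name fake_k2t_dependency is made up, and whether this is the actual cause inside transformers or keytotext is an assumption.

```python
import importlib.util
import sys
import types


def spec_error_for(name):
    """Return the ValueError message find_spec raises for `name`, or None."""
    try:
        importlib.util.find_spec(name)
    except ValueError as exc:
        return str(exc)
    return None


# A module created by hand has __spec__ set to None.
fake = types.ModuleType("fake_k2t_dependency")
sys.modules["fake_k2t_dependency"] = fake
try:
    message = spec_error_for("fake_k2t_dependency")
finally:
    # Clean up so the fake module does not leak into later imports.
    del sys.modules["fake_k2t_dependency"]

print(message)
```

If this is what happens in practice, the usual suspects are a module that was replaced or stubbed in sys.modules, or mismatched package versions, so checking the installed transformers version is a reasonable first step.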

Inference API for Keytotext

Is your feature request related to a problem? Please describe.
It is difficult to host the UI on streamlit without API

Describe the solution you'd like
Inference API

"Oh no." ?

"Error running app. If this keeps happening, please file an issue."

Ok,...sure? I know nothing about this app.

Just saw your tweet, clicked the link to this repo, then clicked the link on the side. Got that message. Now what?

Chrome browser, Linux.

Importing Pipeline issue

Whenever I try to import pipeline after installing keytotext, I get the following error:
ValueError: transformers.models.auto.__spec__ is None

Models in French

Language: fr

Model Link: TBA

Comments: TBA

Training notebook fails from pytorch-lightning "unexpected keyword argument"

Describe the bug
The provided Google Colab notebook for the trainer fails.

To Reproduce
Steps to reproduce the behavior:

Go to the Trainer Google Colab
Execute the cells
2nd cell fails at model.train(train_df=train_df[:100], test_df=test_df[:50], batch_size=2, max_epochs=3,use_gpu=True)
See error

Expected behavior
An initial execution to succeed.

Screenshots

Create Evaluation pipeline for k2t

Hi, I notice that, given the same input keywords, the generated text is identical across runs, even when I set different seeds with 'pl.seed_everything(..)'.
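
A likely explanation is that the text is produced by deterministic decoding (greedy or beam search), in which case the seed has no effect; in Hugging Face Transformers, generate() only samples when do_sample=True is passed. Whether keytotext's pipeline uses deterministic decoding by default is an assumption. The toy sketch below illustrates the difference with a made-up next-token distribution and no model at all.

```python
import random

# Toy next-token distribution; tokens and probabilities are made up.
NEXT_TOKEN_PROBS = {"mountain": 0.4, "sky": 0.3, "ski": 0.3}


def greedy_decode(length):
    """Always pick the most probable token; no randomness is involved."""
    best = max(NEXT_TOKEN_PROBS, key=NEXT_TOKEN_PROBS.get)
    return [best] * length


def sample_decode(length, seed):
    """Draw each token from the distribution using a seeded generator."""
    rng = random.Random(seed)
    tokens = list(NEXT_TOKEN_PROBS)
    weights = list(NEXT_TOKEN_PROBS.values())
    return rng.choices(tokens, weights=weights, k=length)


# Greedy decoding gives one distinct output no matter how often it runs.
greedy_runs = {tuple(greedy_decode(5)) for _ in range(10)}

# Sampling gives different outputs for different seeds.
sampled_runs = {tuple(sample_decode(5, seed)) for seed in range(50)}

print(len(greedy_runs), len(sampled_runs))
```

Under this reading, varied outputs require enabling sampling in the generation parameters; the seed then makes a given sampled output reproducible.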

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

Why is cv2 required?

keytotext/keytotext/dataset.py

Line 1 in 6f807b9

from cv2 import randShuffle

I'm using this framework to generate text from a knowledge graph. The Python interpreter keeps throwing a "cv2 not installed" exception. It looks like the pip package doesn't list cv2 as a dependency. I tried deleting this line in the source code, and the model works well. Is this line necessary for this project? Would you consider adding opencv to the pip package's dependencies? Thanks for your attention.
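
Since the reporter found that the package works with the import removed, one hedged option is to make the import optional and rely on the standard library for shuffling. The sketch below is not the project's code: shuffle_rows is a hypothetical helper, and how dataset.py actually uses randShuffle (if at all) is unknown.

```python
import random

try:
    # Only available when the opencv-python package is installed.
    from cv2 import randShuffle
except ImportError:
    randShuffle = None  # The rest of the module must not depend on it.


def shuffle_rows(rows, seed=None):
    """Return a shuffled copy of `rows` using only the standard library."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    return shuffled


rows = ["ski", "mountain", "sky", "snow"]
print(shuffle_rows(rows, seed=0))
print("cv2 available:", randShuffle is not None)
```

Guarding the import this way keeps a missing OpenCV install from breaking "from keytotext import pipeline" on a fresh environment.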

Create docs for keytotext

Is your feature request related to a problem? Please describe.
Create docs

Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

Remove k2t tiny

Describe the bug
k2t tiny is not producing good results and we will be removing it from k2t

How to achieve text after training model

@gagan3012 I have run the code in your Trainer.ipynb file.

The command in your Colab notebook is this:

keywords=["ski", "mountain", "sky"]
model.predict(keywords)

But I would like to predict text for the same.

How do I do it after training, as you did?

Remove Notebooks and move to different repo

Describe the bug
Move the experimental notebooks to notebooks repo

support Chinese?

Adding new models to keytotext

Is your feature request related to a problem? Please describe.
Adding new models to keytotext: https://huggingface.co/mrm8488/t5-base-finetuned-common_gen

Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

Create Better UI

Is your feature request related to a problem? Please describe.
The current UI is not functional. It needs to be fixed.

Describe the solution you'd like
Better UI with a nicer design

Create Keytotext logo

TODO: Create logo

PyTorch Version Mismatch Error

Describe the bug
No package versions are specified in the configuration file. When using PyTorch>=1.8, a "cannot import name 'SAVE_STATE_WARNING'" error appears. After downgrading to PyTorch 1.4.0, I found that the pre-trained model is not compatible with different versions. here

It seems not very friendly to 3090 users : )

but came across the following :

ERROR: Could not find a version that satisfies the requirement keytotext (from versions: none)
ERROR: No matching distribution found for keytotext

My pip version is the latest. However, the above works just fine in colab. Please guide me through the fix?
