gagan3012 / keytotext

Keywords to Sentences

Home Page: https://share.streamlit.io/gagan3012/keytotext/UI/app.py

License: MIT License

keytotext's Introduction

The idea is to build a model that takes keywords as input and generates sentences as output.

Potential use cases include:

  • Marketing
  • Search engine optimization
  • Topic generation
  • Fine-tuning of topic modeling models

Model:

Keytotext is based on the amazing T5 model: HuggingFace

  • k2t: Model
  • k2t-base: Model
  • mrm8488/t5-base-finetuned-common_gen (by Manuel Romero): Model

Training notebooks can be found in the Training Notebooks folder.

Note: To add your own model to keytotext, please read the Models Documentation.

Usage:

Example usage: Open In Colab

Example notebooks can be found in the Notebooks folder.

pip install keytotext
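
A minimal usage sketch follows. It assumes that `pipeline` can be imported from `keytotext` (as the tracebacks in the issues below suggest), that it accepts one of the model names listed above such as "k2t", and that the returned object can be called with a list of keywords. The helper name `generate_sentence` is made up for illustration.

```python
def generate_sentence(keywords, task="k2t"):
    """Return a sentence generated from a list of keywords.

    Assumption: keytotext exposes `pipeline(task)` and the returned
    object is callable with a list of keyword strings.
    """
    # Imported lazily so this file loads even when keytotext is not installed.
    from keytotext import pipeline

    nlp = pipeline(task)  # loads the model on first use
    return nlp(keywords)


if __name__ == "__main__":
    print(generate_sentence(["India", "Capital", "New Delhi"]))
```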

Trainer:

Keytotext now has a trainer class that can be used to train and fine-tune any T5-based model on new data. Updated Trainer docs here: Docs

Trainer example here: Open In Colab

from keytotext import trainer
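
The calls below are pieced together from the snippets quoted in the issues further down (`from_pretrained`, `train`, `predict`), so treat this as a sketch rather than the reference workflow. The wrapper `finetune_and_predict` is a hypothetical name, and the expected layout of the two DataFrames is not specified here; check the Trainer docs for that.

```python
def finetune_and_predict(train_df, test_df, keywords):
    """Fine-tune t5-small on the given DataFrames, then generate text.

    Assumption: the DataFrames already have the layout the trainer expects.
    """
    # Imported lazily so this file loads even when keytotext is not installed.
    from keytotext import trainer

    model = trainer()
    model.from_pretrained(model_name="t5-small")
    model.train(
        train_df=train_df,
        test_df=test_df,
        batch_size=2,
        max_epochs=3,
        use_gpu=True,  # set to False on a machine without a GPU
    )
    return model.predict(keywords)
```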

UI:

UI: Streamlit App

pip install streamlit-tags

This uses a custom Streamlit component built by me: GitHub

API:

API: API Call Docker Call

The API is packaged as a Docker container and can be started quickly. Follow the instructions below to get started:

docker pull gagan30/keytotext

docker run -dp 8000:8000 gagan30/keytotext

This starts the API on port 8000. Visit the URL below to get the results:

http://localhost:8000/api?data=["India","Capital","New Delhi"]
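
When calling the API from code, the keyword list needs to be URL-encoded. The sketch below builds the same URL with the standard library only. It assumes the server reads the `data` query parameter as a JSON list, which is what the example URL suggests; `build_api_url` is a hypothetical helper name.

```python
import json
from urllib.parse import urlencode


def build_api_url(keywords, base="http://localhost:8000/api"):
    """Build the request URL for a list of keywords."""
    # Serialize the list as JSON, then percent-encode it for the query string.
    query = urlencode({"data": json.dumps(keywords)})
    return f"{base}?{query}"


url = build_api_url(["India", "Capital", "New Delhi"])
print(url)
```

The resulting string can be passed to any HTTP client once the container is running.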

Note: The hosted API is only available on demand.

BibTeX:

To cite keytotext, please use this citation:

@misc{bhatia, 
      title={keytotext},
      url={https://github.com/gagan3012/keytotext}, 
      journal={GitHub}, 
      author={Bhatia, Gagan}
}

keytotext's People

Contributors

anath2110benten, deepsource-autofix[bot], deepsourcebot, gagan3012, jrieke

keytotext's Issues

Proper names

Is your feature request related to a problem? Please describe.
I am not sure whether this is a bug or a new feature, because I don't know if you thought about this case when building the model. Running some tests locally, I found that keytotext has trouble building a logical sentence when the keywords include a proper name (a person's name, like John or Paul). For example, if I pass the keywords [John, have, dog, cat], it builds "A cat and a dog are having a play date.". If I replace John with man ([man, have, dog, cat]), it builds something much better ("A man has a cat and a dog").

Describe the solution you'd like
Maybe there is already an option that I was too dumb to find, but it would be nice if the model could be used with proper names as well.

Describe alternatives you've considered
Assuming that the model does have this issue, my first guess would be the training data, but I am no ML expert =)

New TypeError: __init__() got an unexpected keyword argument 'progress_bar_refresh_rate'

I have imported the model and the necessary libraries, and I am getting the error below in Google Colab. I used this model a few months ago and it worked fine. This is a new issue that I have started seeing recently with the same code.


TypeError: __init__() got an unexpected keyword argument 'progress_bar_refresh_rate'

Imported libraries:

!pip install keytotext --upgrade
!sudo apt-get install git-lfs

from keytotext import trainer

Training Model:

model = trainer()
model.from_pretrained(model_name="t5-small")
model.train(train_df=df_train_final, test_df=df_test, batch_size=3, max_epochs=5, use_gpu=True)
model.save_model()

I have attached a screenshot of the error.

  • OS: Windows
  • Browser: Chrome
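
This error looks like a version mismatch: newer PyTorch Lightning releases no longer accept `progress_bar_refresh_rate` in the `Trainer` constructor, so code that still passes it fails. Pinning an older pytorch-lightning release is one workaround. Another, sketched below, is for the calling code to drop keyword arguments the target no longer accepts. `StubTrainer` and `supported_kwargs` are hypothetical stand-ins, not keytotext or Lightning code.

```python
import inspect


def supported_kwargs(func, kwargs):
    """Return only the keyword arguments that `func` accepts by name."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter means every keyword is accepted.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {name: value for name, value in kwargs.items() if name in params}


class StubTrainer:
    """Stand-in for a trainer class whose constructor lost an argument."""

    def __init__(self, max_epochs=1, accelerator="cpu"):
        self.max_epochs = max_epochs
        self.accelerator = accelerator


requested = {"max_epochs": 5, "progress_bar_refresh_rate": 5}
accepted = supported_kwargs(StubTrainer, requested)
stub = StubTrainer(**accepted)  # no TypeError: the stale argument was dropped
```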

Create Better UI

Is your feature request related to a problem? Please describe.
The current UI is not functional. It needs to be fixed.

Describe the solution you'd like
Better UI with a nicer design

Finetune created sentences only.

Describe the solution you'd like
I'd love to be able to fine-tune the style/grammar of the resulting sentences without needing to have sentence and keyword pairs--only sentences.

I'm experimenting with using AI to create or modify quotes similar to famous, historic texts. For example, I'm currently working on fine-tuning T5 to convert text from the 17th-century to modern English. I'm doing this through datasets of KJV and modern Bible translation verse pairs. This is working very well, and preliminary models are on HuggingFace already.

Describe alternatives you've considered
I've considered:

  • Trying to create or find a dataset where each Bible verse has keywords.
  • Using other AI models to create keywords for each verse, then using the resulting dataset.

How to achieve text after training model

@gagan3012 I have run the code in your Trainer.ipynb file.

The command in your Colab notebook is this:

keywords=["ski", "mountain", "sky"]
model.predict(keywords)

But I would like to predict text for the same.

How do I do it after training, as you did?

Adding new models to keytotext

Is your feature request related to a problem? Please describe.
Adding new models to keytotext: https://huggingface.co/mrm8488/t5-base-finetuned-common_gen

Add Citations

Is your feature request related to a problem? Please describe.
Inspirations: https://towardsdatascience.com/data-to-text-generation-with-t5-building-a-simple-yet-advanced-nlg-model-b5cce5a6df45

Hi, I notice that given the same input keywords, across different runs, the generated text are the same, even setting different seeds by 'pl.seed_everything(..)'.
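
One possible explanation, not confirmed anywhere in this thread: if generation uses deterministic decoding (greedy or beam search), no random numbers are drawn, so changing the seed cannot change the output. The toy sketch below illustrates the difference between picking the most likely token and sampling from the distribution. The function names and probabilities are invented for illustration and are not keytotext code.

```python
import random


def greedy_pick(probs):
    """Deterministic: always return the most likely token."""
    return max(probs, key=probs.get)


def sample_pick(probs, rng):
    """Stochastic: draw a token according to its probability."""
    tokens = list(probs)
    weights = [probs[token] for token in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


probs = {"dog": 0.5, "cat": 0.3, "bird": 0.2}

# Greedy decoding ignores the seed entirely.
greedy_results = {greedy_pick(probs) for _seed in range(20)}

# Sampling depends on the random generator, hence on the seed.
sampled_results = {sample_pick(probs, random.Random(seed)) for seed in range(20)}

print(greedy_results, sampled_results)
```

With Hugging Face `generate`, sampling is switched on with `do_sample=True`; whether keytotext forwards that option is something to check in its documentation.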

Create docs for keytotext

Is your feature request related to a problem? Please describe.
Create docs

Remove k2t tiny

Describe the bug
k2t tiny is not producing good results and we will be removing it from k2t

ERROR: Could not find a version that satisfies the requirement keytotext (from versions: none)

Hi,

I tried to install keytotext via pip install keytotext --upgrade on my local machine,

but came across the following:

ERROR: Could not find a version that satisfies the requirement keytotext (from versions: none)
ERROR: No matching distribution found for keytotext

My pip version is the latest. However, the same command works fine in Colab. Could you please guide me through the fix?

Add finetuning model to keytotext

Is your feature request related to a problem? Please describe.
It's difficult to use without fine-tuning on a new corpus, so we need to build a script to fine-tune it on a new corpus.

Local model problem

I downloaded the model and saved it on a local PC with this code:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Download the tokenizer and model from the Hub, then save both to a local folder.
namefolder = 'mrm8488-t5-base'
tokenizer = AutoTokenizer.from_pretrained('mrm8488/t5-base-finetuned-common_gen')
model = AutoModelForSeq2SeqLM.from_pretrained('mrm8488/t5-base-finetuned-common_gen')
tokenizer.save_pretrained("./" + namefolder)
model.save_pretrained("./" + namefolder)

I cannot run the model. Error:
site-packages\keytotext\pipeline.py", line 75, in pipeline
task, list(SUPPORTED_TASKS.keys())
KeyError: "Unknown task E:\PH\_MODEL_TRANSFORMERS\mrm8488-t5-base, available tasks are ['k2t', 'k2t-base', 'mrm8488/t5-base-finetuned-common_gen', 'k2t- new']"

Thanks
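
Judging by the error message, `pipeline` looks the task name up in a fixed table, so a local folder path is rejected. One way around it, sketched here without going through keytotext at all, is to load the saved folder directly with transformers. Joining the keywords with spaces is an assumption about the input format this checkpoint expects, and both function names are hypothetical.

```python
def join_keywords(keywords):
    """Join keywords into one input string (assumed format: space-separated)."""
    return " ".join(keywords)


def generate_from_local(folder, keywords, max_length=32):
    """Generate a sentence using a model saved with save_pretrained()."""
    # Imported lazily so this file loads even when transformers is not installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(folder)
    model = AutoModelForSeq2SeqLM.from_pretrained(folder)
    inputs = tokenizer(join_keywords(keywords), return_tensors="pt")
    output_ids = model.generate(**inputs, max_length=max_length)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    print(generate_from_local("./mrm8488-t5-base", ["ski", "mountain", "sky"]))
```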

"Oh no." ?

"Error running app. If this keeps happening, please file an issue."

Ok,...sure? I know nothing about this app.

Just saw your tweet, clicked the link to this repo, then clicked the link on the side. Got that message. Now what?

Chrome browser, Linux.

Inference API for Keytotext

Is your feature request related to a problem? Please describe.
It is difficult to host the UI on Streamlit without an API.

Describe the solution you'd like
Inference API

Importing Pipeline issue

Whenever I try to import pipeline after installing keytotext, I get the following error:
ValueError: transformers.models.auto.__spec__ is None

Training notebook fails from pytorch-lightning "unexpected keyword argument"

Describe the bug
The Google Colab notebook provided for the trainer fails.

To Reproduce
Steps to reproduce the behavior:

  1. Go to the Trainer Google Colab
  2. Execute the cells
  3. 2nd cell fails at model.train(train_df=train_df[:100], test_df=test_df[:50], batch_size=2, max_epochs=3,use_gpu=True)
  4. See error

Expected behavior
An initial execution to succeed.

Screenshots
image

Yestts

Pipeline error on fresh install

Hi, I'm getting this on a first run after a fresh install:

Global seed set to 42
Traceback (most recent call last):
  File "C:\Users\skint\PycharmProjects\spacynd2\testdata.py", line 1, in <module>
    from keytotext import pipeline
  File "C:\Users\skint\venv\lib\site-packages\keytotext\__init__.py", line 11, in <module>
    from .dataset import make_dataset
  File "C:\Users\skint\venv\lib\site-packages\keytotext\dataset.py", line 1, in <module>
    from cv2 import randShuffle
ModuleNotFoundError: No module named 'cv2'

Why is cv2 required?

from cv2 import randShuffle

I'm using this framework to generate text from a knowledge graph. The Python interpreter keeps throwing a "cv2 not installed" exception. It looks like the pip package doesn't list cv2 as a dependency. I tried deleting this line in the source code, and the model works well. Is this line necessary for this project? If so, could opencv be added as a dependency of the pip package? Thanks for your attention.
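
To check ahead of time whether a module such as cv2 is importable in the current environment, a small standard-library test is enough. This is a generic sketch with a hypothetical helper name, not part of keytotext.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


if not module_available("cv2"):
    print("cv2 is not installed; 'from cv2 import randShuffle' would fail here")
```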

401 Client Error.

Describe the bug
image

Additional context
It just broke today, yesterday I used it normally

Trainer typo + no grad_fn

Describe the bug

  1. The trainer() class contains keyword arguments that no longer exist in the PyTorch Lightning Trainer class (such as gpus).
  2. Even when these errors are corrected, attempting to train the model causes the following error:
     (TPU) Exception in device=TPU:6: element 0 of tensors does not require grad and does not have a grad_fn
     (GPU) Runtime Error: element 0 of tensors does not require grad and does not have a grad_fn

To Reproduce
Steps to reproduce the behavior:
Try to train the model.

Expected behavior
To train the model :)

Update Readme

Is your feature request related to a problem? Please describe.
There are errors in the README that need to be fixed.
