raghakot / keras-text
Text Classification Library in Keras
Home Page: https://raghakot.github.io/keras-text/
License: MIT License
Hi,
Do you have any notebook examples for a real text dataset that I can refer to?
Thanks.
Selva
Calling the following code fails in Python 3+:
self.embeddings_index.values()[0]
Reason
In Python 3, dict.values() returns a view object rather than a list, so the following error is raised:
dict_values does not support indexing
Solution
For Python 3+, the line should be updated as follows:
list(self.embeddings_index.values())[0].shape[-1]
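The behaviour described above can be reproduced without the library. This is a minimal sketch: the dictionary below is a hypothetical stand-in for embeddings_index, with plain lists instead of arrays. It also shows next(iter(...)) as an alternative that avoids copying all values into a list.

```python
# Hypothetical stand-in for an embeddings index: token -> vector.
embeddings_index = {"the": [0.1, 0.2, 0.3], "cat": [0.4, 0.5, 0.6]}

# In Python 3, dict.values() is a view and cannot be indexed.
try:
    embeddings_index.values()[0]
    indexing_error = None
except TypeError as exc:
    indexing_error = exc

# Option 1 (the fix suggested in this issue): materialize the view as a list.
first_via_list = list(embeddings_index.values())[0]

# Option 2: take the first value without copying the whole view.
first_via_iter = next(iter(embeddings_index.values()))

# Length of one vector, analogous to .shape[-1] on an array.
embedding_dim = len(first_via_iter)
```

Both options read only one value; the second avoids building a full list when the index is large.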
In the following files:
This may be related to the other Python 3 issues around the repo, but a simple import fails:
from keras_text.models import TokenModelFactory
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-8-96307cb1e937> in <module>()
1 # Will automagically handle padding for models that require padding (Ex: Yoon Kim CNN)
----> 2 from keras_text.models import TokenModelFactory
3 from keras_text.models import YoonKimCNN, AttentionRNN, StackedRNN
4 factory = TokenModelFactory(1, tokenizer.token_index, max_tokens=100, embedding_type='glove.6B.100d')
5 word_encoder_model = YoonKimCNN()
~/miniconda/envs/deeplearn/lib/python3.6/site-packages/keras_text/models/__init__.py in <module>()
----> 1 from token_model import TokenModelFactory
2 from sentence_model import SentenceModelFactory
3 from sequence_encoders import *
keras-text/keras_text/models/__init__.py
Lines 1 to 3 in b74247a
__init__.py of models:
from .token_model import TokenModelFactory
from .sentence_model import SentenceModelFactory
from .sequence_encoders import *
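The traceback above shows the installed package using "from token_model import ...", while the snippet uses "from .token_model import ...". The difference matters because Python 3 no longer resolves sibling modules implicitly. The sketch below builds two throwaway packages to show this; the package and module names (demo_implicit, demo_explicit, kt_demo_helper) are hypothetical and not part of keras-text.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_package(root, name, init_line):
    """Create a tiny package with one helper module and the given __init__ line."""
    pkg = Path(root) / name
    pkg.mkdir()
    (pkg / "kt_demo_helper.py").write_text("VALUE = 42\n")
    (pkg / "__init__.py").write_text(init_line + "\n")


def try_import(name):
    """Return the package's VALUE, or the exception class name if import fails."""
    try:
        return importlib.import_module(name).VALUE
    except ImportError as exc:
        return type(exc).__name__


with tempfile.TemporaryDirectory() as root:
    # Python 2 style: sibling module imported as if it were top-level.
    make_package(root, "demo_implicit", "from kt_demo_helper import VALUE")
    # Python 3 style: explicit relative import with a leading dot.
    make_package(root, "demo_explicit", "from .kt_demo_helper import VALUE")

    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        implicit_result = try_import("demo_implicit")
        explicit_result = try_import("demo_explicit")
    finally:
        sys.path.remove(root)

print(implicit_result, explicit_result)
```

If an installed copy still has the dot-less form, that suggests it was built before the relative-import change and may need reinstalling from the current source.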
Importing Dataset leads to an error:
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-35-a263e6a09a43> in <module>()
5 warnings.filterwarnings('ignore')
6
----> 7 from keras_text.data import Dataset
~/miniconda/envs/deeplearn/lib/python3.6/site-packages/keras_text/data.py in <module>()
4 import numpy as np
5
----> 6 from .import utils
7 from .import sampling
8
~/miniconda/envs/deeplearn/lib/python3.6/site-packages/keras_text/utils.py in <module>()
3 import numpy as np
4 import pickle
----> 5 import joblib
6 import jsonpickle
7
ModuleNotFoundError: No module named 'joblib'
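The traceback shows that keras_text/utils.py imports joblib and jsonpickle, so both need to be present in the environment. A quick way to check which modules are missing before importing the package is sketched below; find_missing is a hypothetical helper name, and the second entry in the demo list is deliberately made up.

```python
import importlib.util


def find_missing(module_names):
    """Return the names that cannot be located as importable top-level modules."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# "json" ships with Python; the second name is intentionally nonexistent.
demo_missing = find_missing(["json", "kt_demo_module_that_does_not_exist"])
print(demo_missing)

# For the case in this issue, the check would be:
#   find_missing(["joblib", "jsonpickle"])
# and anything reported can be installed with pip (for example: pip install joblib).
```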
After running the following code, I receive 'ModuleNotFoundError: No module named 'token_model'':
`
with open('tweets1k.txt', 'r') as infile:
    tweets = infile.readlines()
tokenizer = WordTokenizer()
tokenizer.build_vocab(tweets)
ds = Dataset(tweets, emojis, tokenizer=tokenizer)
ds.update_test_indices(test_size=0.2)
ds.save('dataset')
factory = TokenModelFactory(1, tokenizer.token_index, max_tokens=100, embedding_type='glove.6B.100d')
word_encoder_model = YoonKimCNN()
model = factory.build_model(token_encoder_model=word_encoder_model)
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.summary()
`
How can I solve this, please?
TypeError Traceback (most recent call last)
<ipython-input-41-09a65e684086> in <module>()
4 print(texts[0])
5 tokenizer = WordTokenizer()
----> 6 tokenizer.build_vocab(texts)
7
8 #ds = Dataset(X, y, tokenizer=tokenizer)
~/miniconda/envs/deeplearn/lib/python3.6/site-packages/keras_text/processing.py in build_vocab(self, texts, verbose, **kwargs)
381 count_tracker.finalize()
382 self._counts = count_tracker.counts
--> 383 progbar.update(len(texts), force=True)
384
385 def get_counts(self, i):
TypeError: update() got an unexpected keyword argument 'force'
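The traceback indicates that the installed Keras Progbar.update no longer accepts a force keyword. One possible workaround, sketched here with stand-in classes rather than real Keras objects, is to pass only the keyword arguments that the target method declares. The helper name update_progress is hypothetical and not part of keras-text or Keras.

```python
import inspect


def update_progress(progbar, current, **optional):
    """Call progbar.update(current, ...) with only the keywords it accepts."""
    params = inspect.signature(progbar.update).parameters
    takes_any_kwarg = any(p.kind is inspect.Parameter.VAR_KEYWORD
                          for p in params.values())
    supported = {k: v for k, v in optional.items()
                 if takes_any_kwarg or k in params}
    return progbar.update(current, **supported)


class OldStyleBar:
    """Stand-in for a progress bar whose update() still has `force`."""
    def update(self, current, values=None, force=False):
        self.seen = (current, force)


class NewStyleBar:
    """Stand-in for a progress bar whose update() dropped `force`."""
    def update(self, current, values=None):
        self.seen = (current,)


old_bar, new_bar = OldStyleBar(), NewStyleBar()
update_progress(old_bar, 10, force=True)   # force is forwarded
update_progress(new_bar, 10, force=True)   # force is silently dropped
```

Simply deleting force=True from the call at line 383 of processing.py would have the same effect for newer Keras versions; the helper only matters if both old and new versions must be supported.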
I couldn't find any repo to install this from with conda install. Has anyone been able to install it in an Anaconda environment?
Hi guys,
I have successfully trained a classifier using attention with a context mechanism, but I'm struggling with how to call the function get_attention_tensor. Do you have any clues on how to make it work?
Thanks !
Léo
@raghakot, can you add an example of training a model (any model) so that it will be easier for us to figure out the necessary steps? Thanks.
Hi,
A silly question: I'm following along with the tutorial to build the model, but I'm having trouble performing inference on new data.
For example, if the model is trained on an IMDB dataset with 0/1 labels, I want to run inference on a new sentence, and I want to do it the proper way. Right now I'm taking the raw text ("I loved this movie.") and feeding it to the tokenizer.encode_texts() method, then using the tokenizer's embeddings_index to attach the embeddings, etc.
But I'm pretty sure I'm doing it wrong. Is there an example of out-of-sample inference after training the model? Thank you!
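As a general pattern (not the keras-text API), out-of-sample inference usually means reusing the vocabulary built at training time, mapping unseen words to an unknown-token index, and padding to the same max_tokens the model was built with. If the model contains its own embedding layer, which is an assumption here, the embeddings do not need to be attached by hand. The sketch below uses hypothetical helpers (build_token_index, encode) and a whitespace tokenizer purely for illustration.

```python
PAD, UNK = 0, 1  # hypothetical reserved indices for padding and unknown tokens


def build_token_index(texts):
    """Assign an integer id (starting at 2) to each token seen in training."""
    index = {}
    for text in texts:
        for token in text.lower().split():
            index.setdefault(token, len(index) + 2)
    return index


def encode(texts, token_index, max_tokens):
    """Map texts to fixed-length id rows using the training vocabulary."""
    rows = []
    for text in texts:
        ids = [token_index.get(tok, UNK) for tok in text.lower().split()]
        ids = ids[:max_tokens]                       # truncate long inputs
        rows.append(ids + [PAD] * (max_tokens - len(ids)))  # pad short ones
    return rows


train_texts = ["i loved this movie", "i hated this movie"]
token_index = build_token_index(train_texts)

# New text is encoded with the *training* vocabulary; "film" was never seen.
new_rows = encode(["I loved this film"], token_index, max_tokens=6)
print(new_rows)
```

The resulting rows have the shape the token-level model was trained on, so they could be converted to an array and passed to model.predict.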
3 tokenizer = SentenceWordTokenizer()
----> 4 tokenizer.build_vocab(X)
~/anaconda3/envs/msa/lib/python3.6/site-packages/keras_text/processing.py in build_vocab(self, texts, verbose, **kwargs)
381 count_tracker.finalize()
382 self._counts = count_tracker.counts
--> 383 progbar.update(len(texts), force=True)
384
385
TypeError: update() got an unexpected keyword argument 'force'
Hi,
I used the package to train a model successfully. Now the question is: how can I use the trained model to make predictions? I give the input of my training and testing sets as an array of strings:
test_x = ['cat fat hat', 'lorem ipsum pretorium', ... ,'this is a list']
The Dataset routine essentially also creates numpy arrays from the lists of strings, so I assumed it should work with similar arrays. I tried model.predict(test_x), but I get the following error:
Error when checking input: expected input_4 to have 3 dimensions, but got array with shape (20000, 1)
Any advice?
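The error says the model expects a 3-dimensional input, while an array of raw strings has only one meaningful dimension. One reading, assuming a sentence-level model whose input is (documents, sentences, tokens), is that the strings must first be encoded to token ids and padded on both inner axes. The sketch below shows only the padding step on already-encoded ids; nested_shape and pad_3d are hypothetical helper names.

```python
def nested_shape(data):
    """Shape of a regularly nested list; strings count as scalars."""
    shape = []
    while isinstance(data, list):
        shape.append(len(data))
        data = data[0] if data else None
    return tuple(shape)


def pad_3d(docs, max_sents, max_tokens, pad=0):
    """Pad or truncate documents -> sentences -> token ids to a fixed shape."""
    padded = []
    for doc in docs:
        sents = []
        for sent in doc[:max_sents]:
            sent = sent[:max_tokens]
            sents.append(sent + [pad] * (max_tokens - len(sent)))
        # Add all-padding sentences if the document is too short.
        sents += [[pad] * max_tokens for _ in range(max_sents - len(sents))]
        padded.append(sents)
    return padded


raw = ['cat fat hat', 'this is a list']   # what was passed to predict
encoded_docs = [[[2, 3], [4]], [[5, 6, 7]]]  # made-up token ids per sentence
batch = pad_3d(encoded_docs, max_sents=2, max_tokens=3)

print(nested_shape(raw), nested_shape(batch))
```

The token ids themselves would have to come from the same tokenizer that was used to build the training Dataset.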
I get a ValueError when I try to make a dataset with strings as input. I want to assign 1 out of 5 classes to each string. I get this error:
ValueError: Found input variables with inconsistent numbers of samples: [21643, 108215]
even though my labels array has length 21643, just like my input array.
When I change to a two-class problem, there is no problem.
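The two numbers in the error are related: 108215 is exactly 21643 × 5, the number of samples times the number of classes. One possible explanation, offered as an assumption rather than a confirmed diagnosis, is that the labels were expanded to one column per class and then flattened before being compared with the inputs. The sketch below reproduces the arithmetic with a tiny hypothetical example.

```python
def one_hot(labels, num_classes):
    """Expand integer labels into one 0/1 column per class."""
    return [[1 if c == label else 0 for c in range(num_classes)]
            for label in labels]


labels = [0, 3, 4, 1]                      # 4 samples, 5 possible classes
encoded = one_hot(labels, num_classes=5)   # 4 rows of 5 columns
flattened = [value for row in encoded for value in row]

# Flattening multiplies the apparent sample count by the class count.
print(len(labels), len(flattened))

# Same ratio as in the reported error message.
reported_inputs, reported_labels = 21643, 108215
ratio = reported_labels / reported_inputs
```

This would also be consistent with the two-class case working, since a binary label can be stored in a single column and flattening a single column leaves the sample count unchanged.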
Using this code:
from keras_text.processing import WordTokenizer
tokenizer = WordTokenizer()
tokenizer.build_vocab(["this is a text", "an other "])
I get an error:
TypeError Traceback (most recent call last)
<ipython-input-12-a4643a71418a> in <module>()
1 from keras_text.processing import WordTokenizer
2 tokenizer = WordTokenizer()
----> 3 tokenizer.build_vocab(["this is a text hello", "an other "])
~/venvs/srPrimaryPredFull/lib/python3.6/site-packages/keras_text-0.1-py3.6.egg/keras_text/processing.py in build_vocab(self, texts, verbose, **kwargs)
367 self._num_texts = len(texts)
368
--> 369 for token_data in self.token_generator(texts, **kwargs):
370 indices, token = token_data[:-1], token_data[-1]
371 count_tracker.update(indices)
~/venvs/srPrimaryPredFull/lib/python3.6/site-packages/keras_text-0.1-py3.6.egg/keras_text/processing.py in token_generator(self, texts, **kwargs)
549 }
550
--> 551 for text_idx, doc in enumerate(nlp.pipe(texts, **kwargs)):
552 for word in doc:
553 processed_word = self._apply_options(word)
TypeError: pipe() got an unexpected keyword argument 'entity'
It seems to me that the code is not compatible with spaCy 2.0.3, the latest version.
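In spaCy 1.x, nlp.pipe took boolean flags such as tag, parse, and entity; in spaCy 2.x these were replaced by a disable list of pipeline component names. A possible adaptation, sketched here as plain Python with no spaCy import, is to translate the legacy flags before calling pipe. The helper name split_pipe_kwargs is hypothetical and this is not the library's own fix.

```python
# Legacy spaCy 1.x flag -> spaCy 2.x pipeline component name.
LEGACY_FLAG_TO_COMPONENT = {"tag": "tagger", "parse": "parser", "entity": "ner"}


def split_pipe_kwargs(kwargs):
    """Separate legacy boolean flags from keyword arguments that can pass through.

    Returns (disable, passthrough): component names to disable, and the
    remaining keyword arguments with the legacy flags removed.
    """
    disable = [component
               for flag, component in LEGACY_FLAG_TO_COMPONENT.items()
               if kwargs.get(flag) is False]
    passthrough = {k: v for k, v in kwargs.items()
                   if k not in LEGACY_FLAG_TO_COMPONENT}
    return disable, passthrough


disable, passthrough = split_pipe_kwargs(
    {"entity": False, "parse": True, "batch_size": 500})
print(disable, passthrough)

# With spaCy 2.x the call would then look like:
#   nlp.pipe(texts, disable=disable, **passthrough)
```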
Calling the function "apply_encoding_options" in Python 3 raises the following error:
AttributeError: 'filter' object has no attribute 'sort'
Reason:
The following two lines cause the issue:
token_counts = filter(lambda x: x[1] >= min_token_count, token_counts)
token_counts.sort(key=lambda x: x[1], reverse=True)
In Python 3, the filter function returns a lazy filter object, which has no sort method.
In Python 2.7, the filter function returns a list, so the code works correctly there.
Suggested Solution:
Edit the function 'apply_encoding_options' inside 'processing.py' so that it filters and sorts without relying on the filter object, as follows:
token_counts = sorted((x for x in token_counts if x[1] >= min_token_count), reverse=True, key=lambda x: x[1])
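The difference can be checked on a small example. The token counts below are made up for illustration; the last statement is the one-line replacement suggested above.

```python
# Hypothetical (token, count) pairs.
token_counts = [("the", 10), ("cat", 2), ("sat", 5), ("rare", 1)]
min_token_count = 2

# Python 3: filter() returns a lazy iterator with no .sort() method.
filtered = filter(lambda x: x[1] >= min_token_count, token_counts)
has_sort = hasattr(filtered, "sort")

# Suggested replacement: filter with a generator expression, then sort
# by count in descending order in a single step.
kept = sorted((x for x in token_counts if x[1] >= min_token_count),
              reverse=True, key=lambda x: x[1])
print(has_sort, kept)
```

Because sorted() accepts any iterable and always returns a list, the same line also works unchanged on Python 2.7.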
I run these imports:
import torch
import matplotlib.pyplot as plt
import numpy as np
import argparse
import pickle
import os
from torchvision import transforms
from build_vocab import Vocabulary
from model import EncoderCNN, DecoderRNN
from PIL import Image
ModuleNotFoundError Traceback (most recent call last)
in ()
6 import os
7 from torchvision import transforms
----> 8 from build_vocab import Vocabulary
9 from model import EncoderCNN, DecoderRNN
10 from PIL import Image
ModuleNotFoundError: No module named 'build_vocab'
Could anyone please help me find a solution for this error?
Thanks !