Comments (8)
Okay, I was curious and spent some time figuring it out...
The reshape operation
np_gray_img = np.reshape(gray_img, (1,256,32,1))
fails in your case (it runs, but scrambles the pixel layout). Try something like the following:
np_gray_img = gray_img.T[None,:,:,None]
from ssd_detectors.
The code in the data_*.py files is dataset specific and derives a GTUtility class. Objects of the GTUtility class are only pickled to avoid the long preprocessing time of the datasets. The gt.mat file is specific to the SynthText dataset.
The attributes image_names, data and text of the GTUtility class are lists with as many elements as samples in the dataset.
image_names contains strings of the image file names.
data contains numpy arrays, where each row corresponds to a text instance and contains the vertices (x1, y1, x2, y2, x3, y3, x4, y4) of the oriented bounding box normalized by the image size, followed by a one-hot encoding of the classification, which in the text case is always (0, 1).
text contains lists with the text strings associated with the text instances and is used as ground truth for the recognition stage.
If you only want to do prediction, you can proceed as with the real world images in SL_predict.ipynb
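To make the row layout of data concrete, here is a minimal sketch that splits such rows into polygons and labels. The sample values, the helper name split_row_layout and the image size are made up for illustration, and it assumes that x is normalized by the image width and y by the image height; the repository code may handle details differently.

```python
import numpy as np

# Hypothetical sample: one image with two text instances.
# Each row: x1, y1, x2, y2, x3, y3, x4, y4 (normalized), then the one-hot class (0, 1).
sample = np.array([
    [0.10, 0.20, 0.50, 0.20, 0.50, 0.30, 0.10, 0.30, 0.0, 1.0],
    [0.60, 0.50, 0.90, 0.55, 0.88, 0.65, 0.58, 0.60, 0.0, 1.0],
])

def split_row_layout(rows, img_width, img_height):
    """Return (polygons in pixels, shape (n, 4, 2)) and (one-hot labels, shape (n, 2)).

    Assumption: x values are normalized by width, y values by height.
    """
    rows = np.asarray(rows, dtype=float)
    # First 8 columns are four (x, y) vertices; scale them back to pixels.
    polygons = rows[:, :8].reshape(-1, 4, 2) * np.array([img_width, img_height])
    # Remaining columns are the one-hot class encoding.
    one_hot = rows[:, 8:]
    return polygons, one_hot

polygons, one_hot = split_row_layout(sample, img_width=640, img_height=480)
print(polygons[0])
print(one_hot)
```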
Thank you for the detailed explanation. Let me try the predictions and get back to you if there are any issues.
Cheers,
Krishna
Another question: how do I test the CRNN part with my own data? I see that the pre-trained CRNN models use the .pkl code again. Is there any code to test with my own data, similar to what you have for cropping box prediction? (I understand that the CRNN takes the bounding boxes, cropped after detection by the first tbpp NN.) I have looked through all the files in the repo and can't find any such code yet.
Specifically, do I need to dump all the bounding-box detection results into a .pkl file for the CRNN to use for prediction?
Thanks.
Krishna
CRNN_train.ipynb, SL_end2end_predict.ipynb and sl_videotest.py may be relevant for you.
In general, the input of the CRNN model is a batch of 32x256 grayscale images... The rest is up to you ;)
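As a rough sketch of what building such a batch could look like: the helper below stacks grayscale crops that are already resized to 32x256 (height x width) into the (n, 256, 32, 1) layout implied by the input_shape = (input_width, input_height, 1) used elsewhere in this thread. The function name make_crnn_batch is hypothetical, and any value scaling or normalization the pretrained weights may expect is not covered here.

```python
import numpy as np

def make_crnn_batch(gray_crops):
    """Stack grayscale crops of shape (32, 256) into a batch of shape (n, 256, 32, 1)."""
    batch = []
    for crop in gray_crops:
        crop = np.asarray(crop)
        if crop.shape != (32, 256):
            raise ValueError(f"expected a (32, 256) crop, got {crop.shape}")
        # Swap height and width, then add the trailing channel axis.
        batch.append(crop.T[:, :, None])
    return np.stack(batch, axis=0)

# Dummy crops standing in for resized, grayscale text boxes.
crops = [np.zeros((32, 256), dtype=np.uint8), np.ones((32, 256), dtype=np.uint8)]
batch = make_crnn_batch(crops)
print(batch.shape)  # (2, 256, 32, 1)
```

The resulting array could then be passed to model_pred.predict, with the crops coming from whatever detector or cropping step you use.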
Hello,
I am trying to use your pretrained CRNN model (with LSTM or GRU) to recognize text in my images. I am using images from the ICDAR2015 scene text dataset. For this I am using a short script:
import numpy as np
import matplotlib.pyplot as plt
import os
import editdistance
import pickle
import time
from keras.optimizers import SGD, Adam
from keras.callbacks import ModelCheckpoint
from crnn_model import CRNN
from crnn_data import InputGenerator
from crnn_utils import decode
from ssd_training import Logger, ModelSnapshot
import cv2
from crnn_utils import alphabet87 as alphabet
##Model
input_width = 256
input_height = 32
batch_size = 128
input_shape = (input_width, input_height, 1)
model, model_pred = CRNN(input_shape, len(alphabet), gru=False)
experiment = 'crnn_lstm_synthtext'
path_to_weights = './checkpoints/201806162129_crnn_lstm_synthtext/weights.300000.h5'
#path_to_weights = './checkpoints/201806190711_crnn_gru_synthtext/weights.300000.h5'
model_pred.load_weights(path_to_weights)
path_to_cropped_text = "" # path to my cropped text
my_img = cv2.imread(path_to_cropped_text)
resized_img = cv2.resize(my_img, (256,32))
gray_img = cv2.cvtColor(resized_img, cv2.COLOR_BGR2GRAY)
np_gray_img = np.reshape(gray_img, (1,256,32,1))
prediction = model_pred.predict(np_gray_img)
##Decode predictions
chars = [alphabet[c] for c in np.argmax(prediction[0], axis=1)]
res_str = decode(chars)
Unfortunately, the result is almost always "N", as if my text were the single letter N. I don't know why this is happening; maybe I am making a mistake in how I use your code.
My original image had shape (69, 256, 3). I resized it to be compatible with the input shape and converted it to grayscale. I checked the image after this transformation and the text is still clearly readable (no distortions), so I was wondering what I am doing wrong.
Any help is greatly appreciated!
Thanks a lot mate, that was indeed the problem. I am very curious as to why the operation you did (gray_img.T[None,:,:,None]) is different from my operation (np.reshape(gray_img, (1,256,32,1))). In the end they both give arrays with the same shape (1, 256, 32, 1). Can you elaborate on how they differ, please?
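OpenCV is not always as intuitive as it could be: cv2.resize takes the target size as (width, height), but the returned array has shape (height, width), here (32, 256), so you need the transpose. np.reshape does not swap the axes; it only re-cuts the same elements in their existing row-major order, so the two results have the same shape but a different pixel layout.
For more details, please see the NumPy help.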
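A tiny example may help to see the difference. The 2x3 array below is just a stand-in for a (height, width) grayscale image; the same reasoning applies to (32, 256).

```python
import numpy as np

# Tiny stand-in for a (height, width) image: 2 rows, 3 columns.
img = np.arange(6).reshape(2, 3)
# [[0 1 2]
#  [3 4 5]]

# reshape keeps the flat order 0, 1, 2, 3, 4, 5 and only re-cuts it into rows of 2.
reshaped = np.reshape(img, (3, 2))
# transpose swaps the axes: element [i, j] moves to [j, i].
transposed = img.T

print(reshaped)
# [[0 1]
#  [2 3]
#  [4 5]]
print(transposed)
# [[0 3]
#  [1 4]
#  [2 5]]
print(reshaped.shape == transposed.shape)      # True
print(np.array_equal(reshaped, transposed))    # False
```

With reshape, each new row mixes pixels from different parts of the original image rows, which is why the network sees a scrambled image even though the shape is accepted.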