
Testing on my own data (ssd_detectors, closed)

mvoelk commented on July 30, 2024
Testing on my own data.

from ssd_detectors.

Comments (8)

mvoelk commented on July 30, 2024

Okay, I was curious and spent some time figuring it out...

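The reshape operation

np_gray_img = np.reshape(gray_img, (1,256,32,1))

does not do what you want in your case. It only regroups the pixels of the (32, 256) image in memory order instead of swapping the axes, so the text reaches the model scrambled. Try something like the following:

np_gray_img = gray_img.T[None,:,:,None]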

mvoelk commented on July 30, 2024

The code in the data_*.py files is dataset-specific, and each file derives its own GTUtility class. Objects of the GTUtility class are pickled only to avoid the long preprocessing time of the datasets. The gt.mat file is specific to the SynthText dataset.

The attributes image_names, data and text of the GTUtility class are lists with as many elements as there are samples in the dataset.
image_names contains the image file names as strings.
data contains NumPy arrays in which each row corresponds to a text instance and contains the vertices (x1, y1, x2, y2, x3, y3, x4, y4) of the oriented bounding box, normalized by the image size, followed by a one-hot encoding of the class, which for text is always (0, 1).
text contains lists with the text strings associated with the text instances and is used as ground truth for the recognition stage.
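To make the layout of a data row concrete, here is a small sketch that builds one row from a quadrilateral given in pixel coordinates and maps it back. It assumes that x coordinates are divided by the image width and y coordinates by the image height; the helper names encode_row and decode_row are hypothetical and not part of ssd_detectors.

```python
import numpy as np

def encode_row(quad_px, img_width, img_height):
    """Pixel quad [(x1, y1), ..., (x4, y4)] -> 10-element row (hypothetical helper).

    Assumption: x is normalized by the image width and y by the image height,
    followed by the one-hot class vector (0, 1) that is used for text.
    """
    quad = np.asarray(quad_px, dtype=np.float32) / [img_width, img_height]
    return np.concatenate([quad.reshape(-1), [0.0, 1.0]]).astype(np.float32)

def decode_row(row, img_width, img_height):
    """Inverse of encode_row: returns the four vertices in pixel coordinates."""
    return row[:8].reshape(4, 2) * [img_width, img_height]

# One text instance in a 640x480 image.
quad = [(100, 50), (300, 50), (300, 90), (100, 90)]
row = encode_row(quad, 640, 480)
data = row[None, :]  # one row per text instance -> shape (1, 10)
print(data.shape)
print(decode_row(data[0], 640, 480))
```

Each image in the dataset would then contribute one such (number_of_instances, 10) array to the data list.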

If you only want to do prediction, you can proceed as with the real-world images in SL_predict.ipynb.

krish240574 commented on July 30, 2024

Thank you for the detailed explanation. Let me try the predictions, and I'll get back to you if I run into any issues.
Cheers,
Krishna

krish240574 commented on July 30, 2024

Another question: how do I test the CRNN part with my own data? I see that the pre-trained CRNN models again rely on the .pkl files. Is there any code for testing with my own data, similar to what you have for bounding-box prediction? (I understand that the CRNN takes the bounding boxes, cropped after detection by the first TBPP network.) I have looked through all the files in the repo but can't find any such code yet.

Specifically, do I need to dump all the bounding-box detection results into a .pkl file for the CRNN to use for prediction?
Thanks.
Krishna

mvoelk commented on July 30, 2024

CRNN_train.ipynb, SL_end2end_predict.ipynb and sl_videotest.py may be relevant for you.

In general, the input of the CRNN model is a batch of 32x256 grayscale images... The rest is up to you ;)
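Following that description, here is a sketch of how several cropped word images of different sizes could be packed into one batch for model_pred.predict. It assumes grayscale crops (color crops would first go through cv2.cvtColor as in the code further down) and the (batch, 256, 32, 1) layout implied by input_shape = (256, 32, 1) in that code. The nearest-neighbour resize is a plain-NumPy stand-in for cv2.resize so the example has no OpenCV dependency, and the function names are hypothetical. Whether pixel values need rescaling depends on how the model was trained, so none is applied here.

```python
import numpy as np

INPUT_WIDTH, INPUT_HEIGHT = 256, 32  # values from the CRNN code in this thread

def resize_nearest(img, width, height):
    """Nearest-neighbour resize of a 2-D array to shape (height, width).

    Stand-in for cv2.resize(img, (width, height)): OpenCV takes the target
    size as (width, height) but returns an array of shape (height, width).
    """
    rows = np.arange(height) * img.shape[0] // height
    cols = np.arange(width) * img.shape[1] // width
    return img[rows[:, None], cols[None, :]]

def crops_to_batch(gray_crops):
    """Stack 2-D grayscale crops into a (N, 256, 32, 1) batch (hypothetical helper)."""
    batch = np.zeros((len(gray_crops), INPUT_WIDTH, INPUT_HEIGHT, 1), dtype=np.float32)
    for i, crop in enumerate(gray_crops):
        resized = resize_nearest(crop, INPUT_WIDTH, INPUT_HEIGHT)  # shape (32, 256)
        batch[i, :, :, 0] = resized.T  # transpose to (256, 32), not reshape
    return batch

# Two synthetic "word crops" of different sizes.
crops = [np.random.randint(0, 256, size=(69, 256), dtype=np.uint8),
         np.random.randint(0, 256, size=(40, 120), dtype=np.uint8)]
batch = crops_to_batch(crops)
print(batch.shape)  # (2, 256, 32, 1)
# With a loaded model: predictions = model_pred.predict(batch)
```

So the detections do not have to go through a .pkl file; the crops only need to end up in this array layout.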

sniper0110 commented on July 30, 2024

Hello,

I am trying to use your pretrained CRNN model (with LSTM or GRU) to recognize text in my images. I am using images from the ICDAR2015 scene text dataset. For this I am using the following code:

import numpy as np
import matplotlib.pyplot as plt
import os
import editdistance
import pickle
import time

from keras.optimizers import SGD, Adam
from keras.callbacks import ModelCheckpoint

from crnn_model import CRNN
from crnn_data import InputGenerator
from crnn_utils import decode
from ssd_training import Logger, ModelSnapshot
import cv2
from crnn_utils import alphabet87 as alphabet


##Model
input_width = 256
input_height = 32
batch_size = 128
input_shape = (input_width, input_height, 1)

model, model_pred = CRNN(input_shape, len(alphabet), gru=False)
experiment = 'crnn_lstm_synthtext'
path_to_weights = './checkpoints/201806162129_crnn_lstm_synthtext/weights.300000.h5'
#path_to_weights = './checkpoints/201806190711_crnn_gru_synthtext/weights.300000.h5'
model_pred.load_weights(path_to_weights)


path_to_cropped_text = "" # path to my cropped text

my_img = cv2.imread(path_to_cropped_text)
resized_img = cv2.resize(my_img, (256,32))
gray_img = cv2.cvtColor(resized_img, cv2.COLOR_BGR2GRAY)
np_gray_img = np.reshape(gray_img, (1,256,32,1))

prediction = model_pred.predict(np_gray_img)


##Decode predictions
chars = [alphabet[c] for c in np.argmax(prediction[0], axis=1)]
res_str = decode(chars)

Unfortunately, I almost always get "N" as the result, as if my text were the single letter N. I don't know why this is happening; maybe I am using your code incorrectly.

My original image had shape (69, 256, 3); I resized it to be compatible with the input shape and, of course, converted it to grayscale. I checked the image after this transformation and the text is still clearly legible (no distortions), so I was wondering what I am doing wrong.

Any help is greatly appreciated!

sniper0110 commented on July 30, 2024

Thanks a lot mate, that was indeed the problem. I am very curious why your operation (gray_img.T[None,:,:,None]) is different from mine (np.reshape(gray_img, (1,256,32,1))). In the end, both give arrays of the same shape (1, 256, 32, 1). Could you elaborate on how they differ, please? I am very curious!

mvoelk commented on July 30, 2024

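OpenCV is not always as intuitive as it could be. cv2.resize takes the target size as (width, height), so cv2.resize(my_img, (256, 32)) returns an array of shape (32, 256), i.e. (height, width), and the other cv functions keep that shape. The model expects (256, 32), so you need the transpose. np.reshape only regroups the pixels in their memory order and does not swap the axes, which is why your text arrived scrambled.

For more details, see the NumPy documentation of numpy.reshape and numpy.transpose.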
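A tiny illustration of the difference on a 2x3 array: np.reshape keeps the elements in their row-major (C) order and only changes how that sequence is split into rows, whereas .T swaps the axes so each new row is an old column. The [None, :, :, None] indexing afterwards only adds the batch and channel axes of length 1 and does not move any data.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])  # shape (2, 3), like (height, width)

reshaped = np.reshape(a, (3, 2))  # same element order, regrouped into rows of 2
transposed = a.T                  # axes swapped: old columns become rows

print(reshaped.tolist())    # [[1, 2], [3, 4], [5, 6]]
print(transposed.tolist())  # [[1, 4], [2, 5], [3, 6]]

# Adding batch and channel axes only changes the shape.
print(transposed[None, :, :, None].shape)  # (1, 3, 2, 1)
```

With the real (32, 256) crop the same thing happens at a larger scale: reshape fills each 32-pixel column of the model input with consecutive pixels from one image row, so the model sees noise instead of characters.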
