Comments (14)
They are logits instead of probabilities; softmax should be applied to get probabilities.
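As a quick illustration (a minimal sketch, not the repo's code; `preds` here is a random stand-in for the network output):

```python
import torch
import torch.nn.functional as F

# stand-in for the network output: (seq_len, batch, num_classes)
preds = torch.randn(26, 1, 37)

# softmax over the class dimension turns the logits into probabilities
probs = F.softmax(preds, dim=2)

# each timestep now holds a distribution over the 37 classes, summing to 1
assert torch.allclose(probs.sum(dim=2), torch.ones(26, 1))
```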
from crnn.pytorch.
@hellbago, why is your pred dim 38 * 1 * 37? Mine is 26 * 1 * 37.
Hey @hellbago,
If I understand correctly, for each prediction, let's say
a----v--a-i-l-a-bb-l-e-- => available
when we print(preds)
we get the following vector
(0 ,.,.) =
-106.3455 -115.3943 -114.5584 ... -115.6788 -110.2145 -112.2794
(1 ,.,.) =
-67.3953 -92.3248 -92.7227 ... -88.7459 -81.5368 -88.8212
(2 ,.,.) =
-56.8008 -89.8197 -92.8852 ... -85.0180 -77.4713 -85.3732
...
(35,.,.) =
-38.8606 -79.6700 -81.3100 ... -71.8229 -57.3992 -68.8093
(36,.,.) =
-39.6410 -75.7699 -75.7648 ... -70.3662 -55.7602 -68.3655
(37,.,.) =
-45.2289 -77.6819 -77.0921 ... -73.9425 -59.1527 -70.6211
[torch.cuda.FloatTensor of size 38x1x37 (GPU 0)]
where 37 is the length of `alphabet = "0123456789abcdefghijklmnopqrstuvwxyz"` + blank.
My questions are as follows:
since the prediction is "available", with len("available") = 9,
- what do the following vectors represent for the prediction "available"?
(0 ,.,.) = -106.3455 -115.3943 -114.5584 ... -115.6788 -110.2145 -112.2794
(1 ,.,.) = -67.3953 -92.3248 -92.7227 ... -88.7459 -81.5368 -88.8212
(2 ,.,.) = -56.8008 -89.8197 -92.8852 ... -85.0180 -77.4713 -85.3732
.
.
.
(37,.,.) = -45.2289 -77.6819 -77.0921 ... -73.9425 -59.1527 -70.6211
- I don't understand the dimension [torch.cuda.FloatTensor of size 38x1x37 (GPU 0)]: what is 38?
How should I read 38x1x37?
Thanks a lot, @hellbago, for your comment and answer.
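For what it's worth, the collapsing rule in the example above (merge repeated characters, then drop blanks) can be sketched like this; `ctc_collapse` is a hypothetical helper, not a function from the repo:

```python
def ctc_collapse(raw, blank='-'):
    """Collapse a raw CTC path: merge adjacent repeats, then drop blanks."""
    out = []
    prev = None
    for ch in raw:
        if ch != prev and ch != blank:
            out.append(ch)
        prev = ch
    return ''.join(out)

print(ctc_collapse('a----v--a-i-l-a-bb-l-e--'))  # available
```

Note that 'bb' collapses to a single 'b' because the repeats are merged before blanks are removed; a genuine double letter would need a blank between the two occurrences.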
Hi @meijieru ,
the values of the vectors :
(0 ,.,.) = -106.3455 -115.3943 -114.5584 ... -115.6788 -110.2145 -112.2794
(1 ,.,.) = -67.3953 -92.3248 -92.7227 ... -88.7459 -81.5368 -88.8212
(2 ,.,.) = -56.8008 -89.8197 -92.8852 ... -85.0180 -77.4713 -85.3732
.
.
.
(37,.,.) = -45.2289 -77.6819 -77.0921 ... -73.9425 -59.1527 -70.6211
represent the output of the B-LSTM, don't they?
If yes, then these values are logits (the inverse of the sigmoid). The CTC layer takes these values and applies softmax to get probabilities.
However, I can't find where I can print these probabilities from the CTC class:
```python
class _CTC(Function):
    def forward(self, acts, labels, act_lens, label_lens):
        is_cuda = True if acts.is_cuda else False
        acts = acts.contiguous()
        loss_func = warp_ctc.gpu_ctc if is_cuda else warp_ctc.cpu_ctc
        grads = torch.zeros(acts.size()).type_as(acts)
        minibatch_size = acts.size(1)
        costs = torch.zeros(minibatch_size)
        loss_func(acts,
                  grads,
                  labels,
                  label_lens,
                  act_lens,
                  minibatch_size,
                  costs)
        self.grads = grads
        self.costs = torch.FloatTensor([costs.sum()])
        return self.costs

    def backward(self, grad_output):
        return self.grads, None, None, None


class CTCLoss(Module):
    def __init__(self):
        super(CTCLoss, self).__init__()

    def forward(self, acts, labels, act_lens, label_lens):
        """
        acts: Tensor of (seqLength x batch x outputDim) containing output from network
        labels: 1 dimensional Tensor containing all the targets of the batch in one sequence
        act_lens: Tensor of size (batch) containing size of each output sequence from the network
        label_lens: Tensor of (batch) containing label length of each example
        """
        _assert_no_grad(labels)
        _assert_no_grad(act_lens)
        _assert_no_grad(label_lens)
        return _CTC()(acts, labels, act_lens, label_lens)
```
Hey @hellbago ,
let me first thank you for your tag.
How did you get these discrete values when doing print(preds)?
What do these discrete values represent?
Your tensor is of length 26, but it is supposed to be 27: alphabet = 26 + blank.
Cheers
Hi @ahmedmazari-dhatim. As @meijieru said in a previous comment, the values inside preds represent logits. Since they are negative, they correspond to probabilities of less than 0.5. You can see these values by debugging the code in demo.py and inspecting the content of preds after the instruction 'preds = model(image)'. The tensor that I obtain has dimension 38x1x37: 37 is the size of the alphabet + 1 (blank), while 38 is the length of the sequence of feature vectors that is fed into the recurrent layers. I obtain 38 rather than the standard 26 because I rescale the image so as to keep the aspect ratio, so the width of the input image can vary while the height is fixed at 32.
Hi @hellbago ,
Thank you for your answer.
-
So 38 represents the output of the CNN and the input of the RNN; what, then, is the dimension of your input image to the CNN?
-
I am stuck on understanding the meaning of the prediction vector.
Let's say that our model predicts the following:
a----v--a-i-l-a-bb-l-e-- => available
How can I read these values given the predicted value "available"?
38x1x37
(0 ,.,.) =
-106.3455 -115.3943 -114.5584 ... -115.6788 -110.2145 -112.2794
.
.
(37,.,.) =
-45.2289 -77.6819 -77.0921 ... -73.9425 -59.1527 -70.6211
For instance, how can I read the first value `(0 ,.,.) = -106.3455` and the last `(37,.,.) = -45.2289` with respect to "available"?
Are these the predicted values for each character?
a
v
a
i
l
a
b
l
e
Thank you again
Hi @ahmedmazari-dhatim ,
I have only a little idea about the workings and implementation of CTCLoss, but from reading the comments of @meijieru and @hellbago, this is what I inferred.
The recognised word is "a-----v--a-i-l-a-bb-l-e--- => available".
The alphabet is "0123456789abcdefghijklmnopqrstuvwxyz", with the first position being the blank "-".
The preds variable stores what looks like random data, but len(preds) is always 26, and so is
len("a-----v--a-i-l-a-bb-l-e---"). This is true for every other word:
THE RECOGNISED WORD LENGTH IS ALWAYS 26.
-
Now, have a look at preds[0][0]: it gives you 37 different numbers, and the important one is the highest. For preds[0][0] the highest number is -88.9130, which is at the 12th position in this list. Hence the first character of the recognised word is the 12th character of the alphabet "0123456789abcdefghijklmnopqrstuvwxyz" (the blank being the first character), which is 'a'.
-
Consider the 7th character of "a-----v--a-i-l-a-bb-l-e---", which is 'v'. Have a look at preds[7][0]: the maximum number is the 32nd one. Hence the 7th character in the recognised string is the 32nd character of the alphabet, which is 'v'.
To get the probabilities of each recognised character ("a", "v", "a", "i", "l", "a", "b", "l", "e"), I made the following changes in my demo.py: I calculated the softmax and multiplied by a constant to increase precision.
Note: here I am printing a scaled probability value, not the raw probability.
```python
m = torch.nn.Softmax()
model.eval()
preds = model(image)
temp = preds
_, preds = preds.max(2)
preds = preds.squeeze(2)
preds = preds.transpose(1, 0).contiguous().view(-1)
preds_size = Variable(torch.IntTensor([preds.size(0)]))
raw_pred = converter.decode(preds.data, preds_size.data, raw=True)
sim_pred = converter.decode(preds.data, preds_size.data, raw=False)
print('%-20s => %-20s' % (raw_pred, sim_pred))
# print('after dict - ' + spellchecker.suggest(sim_pred)[0])
arr = preds.data.numpy()
for i in range(0, len(temp)):
    if arr[i] != 0:
        prob = torch.max(m(temp[i]) * 100000)
        print(prob)
```
But I am getting very high probability values. Can you please tell me whether my approach is correct?
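For reference, here is how I think the same confidence computation could be written with softmax applied along the class dimension and no scaling (a sketch with a toy input, assuming preds has shape (seq_len, batch, num_classes) and class 0 is the blank; `char_confidences` is a hypothetical helper, not part of demo.py):

```python
import torch
import torch.nn.functional as F

def char_confidences(logits, blank=0):
    # softmax along the class dimension gives real probabilities in [0, 1]
    probs = F.softmax(logits, dim=2)
    best_prob, best_idx = probs.max(dim=2)  # greedy choice per timestep
    return [(best_idx[t, 0].item(), best_prob[t, 0].item())
            for t in range(logits.size(0))
            if best_idx[t, 0].item() != blank]

# toy input: 3 timesteps, classes {0: blank, 1, 2, 3}
toy = torch.full((3, 1, 4), -10.0)
toy[0, 0, 2] = 10.0   # timestep 0 confidently predicts class 2
toy[1, 0, 0] = 10.0   # timestep 1 predicts blank
toy[2, 0, 3] = 10.0   # timestep 2 confidently predicts class 3
print(char_confidences(toy))  # two (index, probability) pairs, blanks skipped
```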
Hi @random123user ,
Thanks a lot for your answer. I'll give it a try and let you know (about your code).
I have a question for you:
we have len("a-----v--a-i-l-a-bb-l-e---") = 26,
and we assume that each preds[t] gives 37 different numbers, the important one being the highest. Then,
from preds[0, :] up to preds[25, :],
the highest value in each vector maps to "a-----v--a-i-l-a-bb-l-e---".
- What about the remaining preds[26, :] up to preds[36, :]? The length of the word is 26. @meijieru, you said that "len(preds) is always 26"; I am not sure about that. I can have a word of length 35, 42, 50, whatever. How do you deal with words of these lengths?
Thanks a lot
Hey @ahmedmazari-dhatim ,
Thanks for the reply. I really have no idea what will happen if the length of the word goes beyond 26.
I am trying to use this code to improve accuracy on the ICDAR 2015 recognition dataset, but that dataset contains some vertical and inverted words. For inverted text, my initial approach was to get the recognition confidence in the two possible rotations (0 deg and 180 deg), compare them, and pick the correct orientation.
But since the confidence values are close, I cannot derive any comparison criterion from them. Sorry for asking a question different from the one under discussion, but can you please tell me if there is any way to do this comparison? Or is there another repository that deals with vertical/inverted text recognition?
Hey @random123user @hellbago @meijieru ,
Starting from your question, I think the most straightforward way is to apply a rotation so as to get the sequence into horizontal format, respecting height = 32.
Otherwise, you have to adapt the CRNN to a variable height.
Coming back to my first question:
even if the length of the word is less than or equal to 26, I'm wondering what the remaining preds[26, :] up to preds[36, :] represent.
My question relates to your answer, quoted below:
> Now, have a look at preds[0][0] - It will give you 37 different numbers. Important is one which is highest. For preds[0][0] the highest number is -88.9130 which is at 12th position in this list. Hence in recognized word first character is 12th character of alphabet '0123456789abcdefghijklmnopqrstuvwxyz" (space being the first character of alphabet) which is 'a'.
> consider 7th character of "a-----v--a-i-l-a-bb-l-e---" which is 'v'. Have a look at preds[7][0]. The maximum number is the 32nd number. Hence, 7th character in recognized string is the 32nd character of alphabet which is "v".
I would be very grateful if you could add extra information and correct me if I'm wrong.
Thank you
Hi @meijieru ,
Why are the logits all negative? It means that the highest probability value we can get is 0.5, which matches logit = 0. Isn't 0.5 too low for the most probable sequence?
@hellbago , @random123user :
for CTC loss, please look at this tutorial. It explains well how CTC works:
https://github.com/SeanNaren/warp-ctc/blob/pytorch_bindings/torch_binding/TUTORIAL.md
Thank you
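As a side note, newer PyTorch (>= 1.0) ships a built-in `torch.nn.CTCLoss` that can replace the warp-ctc binding. A minimal sketch with random logits; the target indices for "available" follow my reading of this repo's convention (blank = 0, '0' = 1, ..., 'a' = 11, ..., 'z' = 36), so treat them as illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

seq_len, batch, num_classes = 26, 1, 37
logits = torch.randn(seq_len, batch, num_classes)

# CTCLoss expects log-probabilities of shape (T, N, C); blank index is 0 by default
log_probs = F.log_softmax(logits, dim=2)

# "available" as class indices under the assumed convention above
targets = torch.tensor([11, 32, 11, 19, 22, 11, 12, 22, 15])
input_lengths = torch.tensor([seq_len])
target_lengths = torch.tensor([targets.numel()])

ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
print(loss.item())  # a positive scalar for random logits
```

Note it wants log-probabilities (log_softmax), not raw logits and not softmax probabilities.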
To answer your question about why you got high probabilities: I tried your code and got the following. What's wrong with my model?
```python
m = torch.nn.Softmax()
model.eval()
preds = model(image)
temps = preds.cpu()
prob = torch.max(m(temps) * 100)
```
Error on the prob line:
```
assert input.dim() == 2, 'Softmax requires a 2D tensor as input'
AssertionError: Softmax requires a 2D tensor as input
```
Or did you use:
```python
preds = model(image)
preds = preds[:, 0, :]  # to get a 2-D tensor?
temps = preds
```
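If it helps, that assertion comes from the old 2-D-only `torch.nn.Softmax`; in newer PyTorch you can pass `dim` explicitly and skip the slicing. A sketch with a random stand-in for preds:

```python
import torch
import torch.nn.functional as F

preds = torch.randn(26, 1, 37)  # stand-in for model(image)

# apply softmax along the class axis of the full 3-D tensor...
probs_3d = F.softmax(preds, dim=2)

# ...which matches slicing out the batch dimension first, as above
probs_2d = F.softmax(preds[:, 0, :], dim=1)

assert torch.allclose(probs_3d[:, 0, :], probs_2d)
```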
By the way, where are the values that control the number of output labels?
For clarification, I understand that the alphabet has 37 labels.
I am curious as to why 26 outputs are given by the system in the default configuration.
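To check my own understanding of where the 26 comes from: it is the width of the CNN feature map, not a configured value. The sketch below traces the width through the pooling/conv stack as I read it from the repo's model.py; the kernel/stride/padding values are my assumption, so verify them against the code:

```python
def out_w(w, k, s, p):
    # standard output-size formula for a conv/pool along the width axis
    return (w + 2 * p - k) // s + 1

def crnn_seq_len(img_w):
    w = out_w(img_w, 2, 2, 0)  # MaxPool 2x2, stride 2
    w = out_w(w, 2, 2, 0)      # MaxPool 2x2, stride 2
    w = out_w(w, 2, 1, 1)      # MaxPool (2,2), stride (2,1), padding (0,1) on width
    w = out_w(w, 2, 1, 1)      # same pooling again
    w = out_w(w, 2, 1, 0)      # final conv 2x2, stride 1, no padding
    return w

print(crnn_seq_len(100))  # 26: the default imgW=100 gives 26 output steps
print(crnn_seq_len(148))  # 38: consistent with the 38x1x37 tensor discussed above
```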
Related Issues (20)
- The loss value is very large and does not seem to decrease; how do I solve this problem?
- Height of image to 1
- Why is my CTC loss always NaN or Inf? HOT 6
- Error: RuntimeError: CUDA error: an illegal memory access was encountered HOT 3
- Not converging; the loss drops to around 15 and then just oscillates HOT 3
- What is the difference between warp_ctc_pytorch.ctcloss and torch.nn.ctcloss? HOT 3
- Training on icdar2015
- Error when training on my own data: an illegal memory access was encountered HOT 1
- Utility of method oneHot in utils.py
- Is it normal for the model to predict the same string for the whole train batch and val?
- raise UnidentifiedImageError( PIL.UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x0000024631A5E360> HOT 1
- CRNN overfitting: The Problem of Overfitting
- The Problem of Overfitting
- The loss becomes NaN; what is the reason? HOT 1
- About the BN layers as the last two layers of the model structure HOT 1
- Why is the pooling layer in the network different from the 1 * 2 pooling in the original paper?
- Does this project support Python 3?
- What dataset is used for training? HOT 1
- RuntimeError: CUDA error: an illegal memory access was encountered HOT 1
- during multi-language model training, accuracy of English/numbers is worse than Chinese characters, how to improve?