
Comments (14)

meijieru commented on June 15, 2024

They are logits rather than probabilities; apply softmax to get probabilities.
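As a minimal sketch of that suggestion, assuming `preds` is laid out as frames x batch x classes (like the 38x1x37 tensor discussed in this thread), softmax over the last dimension turns each frame's scores into a probability distribution. The random tensor is a hypothetical stand-in for `model(image)`:

```python
import torch

# Hypothetical stand-in for `preds = model(image)`: 26 frames x batch 1 x 37 classes,
# scaled to large negative values like the printed tensors in this thread.
torch.manual_seed(0)
preds = torch.randn(26, 1, 37) * 20 - 80

# Softmax over the class dimension (dim=2): each frame's 37 scores become a distribution.
probs = torch.softmax(preds, dim=2)

best_prob, best_class = probs.max(dim=2)   # most likely class per frame and its probability
print(probs.shape)                         # torch.Size([26, 1, 37])
print(probs.sum(dim=2).squeeze())          # every frame sums to 1 (up to rounding)
```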

from crnn.pytorch.

wanhaipeng commented on June 15, 2024

@hellbago, why is your preds dimension 38x1x37? Mine is 26x1x37.


ahmedmazari-dhatim commented on June 15, 2024

Hey @hellbago,

If I understand correctly, for a prediction such as

`a----v--a-i-l-a-bb-l-e-- => available`

printing `preds` gives the following tensor:

```
(0 ,.,.) =
-106.3455 -115.3943 -114.5584 ... -115.6788 -110.2145 -112.2794

(1 ,.,.) =
-67.3953 -92.3248 -92.7227 ... -88.7459 -81.5368 -88.8212

(2 ,.,.) =
-56.8008 -89.8197 -92.8852 ... -85.0180 -77.4713 -85.3732
...

(35,.,.) =
-38.8606 -79.6700 -81.3100 ... -71.8229 -57.3992 -68.8093

(36,.,.) =
-39.6410 -75.7699 -75.7648 ... -70.3662 -55.7602 -68.3655

(37,.,.) =
-45.2289 -77.6819 -77.0921 ... -73.9425 -59.1527 -70.6211
[torch.cuda.FloatTensor of size 38x1x37 (GPU 0)]
```

where 37 is the length of the alphabet `"0123456789abcdefghijklmnopqrstuvwxyz"` plus the blank.

My questions are as follows. The prediction is "available", and len("available") = 9.

  1. What do the following vectors represent for the prediction "available"?

     ```
     (0 ,.,.) = -106.3455 -115.3943 -114.5584 ... -115.6788 -110.2145 -112.2794
     (1 ,.,.) = -67.3953 -92.3248 -92.7227 ... -88.7459 -81.5368 -88.8212
     (2 ,.,.) = -56.8008 -89.8197 -92.8852 ... -85.0180 -77.4713 -85.3732
     ...
     (37,.,.) = -45.2289 -77.6819 -77.0921 ... -73.9425 -59.1527 -70.6211
     ```

  2. I don't understand the dimension `[torch.cuda.FloatTensor of size 38x1x37 (GPU 0)]`. What is 38, and how should 38x1x37 be read?

Thanks a lot, @hellbago, for your comment and answer.


ahmedmazari-dhatim commented on June 15, 2024

Hi @meijieru,
the values of the vectors

```
(0 ,.,.) = -106.3455 -115.3943 -114.5584 ... -115.6788 -110.2145 -112.2794
(1 ,.,.) = -67.3953 -92.3248 -92.7227 ... -88.7459 -81.5368 -88.8212
(2 ,.,.) = -56.8008 -89.8197 -92.8852 ... -85.0180 -77.4713 -85.3732
...
(37,.,.) = -45.2289 -77.6819 -77.0921 ... -73.9425 -59.1527 -70.6211
```

represent the output of the BLSTM, don't they?
If so, these values are logits (inverse of sigmoid), and the CTC layer takes them and applies softmax to get probabilities.
However, I can't find where to print these probabilities in the CTC class:

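```python
import torch
from torch.autograd import Function
from torch.nn import Module

import warpctc_pytorch as warp_ctc  # SeanNaren's warp-ctc PyTorch bindings


def _assert_no_grad(tensor):
    # Only `acts` receives gradients; labels and lengths must not require them.
    assert not tensor.requires_grad, \
        "gradients only computed for acts - please mark other tensors as not requiring gradients"


# Old-style autograd Function (PyTorch 0.x); newer versions require static forward/backward.
class _CTC(Function):
    def forward(self, acts, labels, act_lens, label_lens):
        is_cuda = acts.is_cuda
        acts = acts.contiguous()
        loss_func = warp_ctc.gpu_ctc if is_cuda else warp_ctc.cpu_ctc
        # warp-ctc applies softmax to `acts` internally and writes the gradients
        # into `grads` and the per-example losses into `costs`; the probabilities
        # themselves are never returned.
        grads = torch.zeros(acts.size()).type_as(acts)
        minibatch_size = acts.size(1)
        costs = torch.zeros(minibatch_size)
        loss_func(acts,
                  grads,
                  labels,
                  label_lens,
                  act_lens,
                  minibatch_size,
                  costs)
        self.grads = grads
        self.costs = torch.FloatTensor([costs.sum()])
        return self.costs

    def backward(self, grad_output):
        # Gradients only for `acts`; labels and lengths get None.
        return self.grads, None, None, None


class CTCLoss(Module):
    def __init__(self):
        super(CTCLoss, self).__init__()

    def forward(self, acts, labels, act_lens, label_lens):
        """
        acts: Tensor of (seqLength x batch x outputDim) containing output from network
        labels: 1 dimensional Tensor containing all the targets of the batch in one sequence
        act_lens: Tensor of size (batch) containing size of each output sequence from the network
        label_lens: Tensor of (batch) containing label length of each example
        """
        _assert_no_grad(labels)
        _assert_no_grad(act_lens)
        _assert_no_grad(label_lens)
        return _CTC()(acts, labels, act_lens, label_lens)
```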
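warp-ctc applies the softmax inside `loss_func` and only hands back costs and gradients, so there is nothing to print from `_CTC`. For comparison, a sketch using PyTorch's built-in `torch.nn.CTCLoss` (added in PyTorch 1.0, after this thread), which takes log-probabilities, so the normalization happens in your own code where you can inspect it. The label encoding (blank at 0, character at alphabet index + 1) is an assumption that mirrors crnn.pytorch's converter, and the random logits are a hypothetical stand-in for the network output:

```python
import torch
import torch.nn as nn

T, B, C = 26, 1, 37                  # frames, batch size, classes (36 characters + blank)
torch.manual_seed(0)
logits = torch.randn(T, B, C, requires_grad=True)   # stand-in for the BLSTM output

# nn.CTCLoss expects log-probabilities, so the normalization is an explicit step.
log_probs = logits.log_softmax(dim=2)

# Assumed label encoding: blank = 0, alphabet[i] -> i + 1.
alphabet = "0123456789abcdefghijklmnopqrstuvwxyz"
targets = torch.tensor([[alphabet.index(ch) + 1 for ch in "available"]])  # 1 x 9
input_lengths = torch.tensor([T])
target_lengths = torch.tensor([targets.size(1)])

ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()                      # gradients flow back to `logits`
print(loss.item())

# The per-frame probabilities are available right here, unlike inside warp-ctc.
probs = log_probs.exp().detach()     # T x B x C, each frame sums to 1
```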


ahmedmazari-dhatim commented on June 15, 2024

Hey @hellbago,
first, let me thank you for tagging me.

How did you get these discrete values when doing print(preds)?
What do these discrete values represent?
Your tensor has length 26, but it is supposed to be 27 (alphabet = 26 + blank).

Cheers


hellbago commented on June 15, 2024

Hi @ahmedmazari-dhatim. As @meijieru said in a previous comment, the values inside preds are logits. Since they are negative, they correspond to probabilities less than 0.5. You can see these values by debugging demo.py and inspecting preds after the instruction `preds = model(image)`. The tensor I obtain has dimension 38x1x37: 37 is the size of the alphabet plus 1 (the blank), while 38 is the length of the sequence of feature vectors fed into the recurrent layers. I obtain 38 rather than the standard 26 because I rescale the image to keep its aspect ratio, so the width of the input image can vary while the height is fixed at 32.
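To make the 26 vs 38 explanation concrete: the output has shape frames x batch x classes (the middle 1 is the batch size), and the number of frames depends only on the input width. The sketch below computes it under an assumption, namely the default CRNN convolutional stack as I read it in crnn.pytorch's `models/crnn.py`, listed in the docstring. `crnn_frames` is a hypothetical helper, so check the layer list against your copy before relying on it.

```python
def crnn_frames(width: int) -> int:
    """Number of output frames T for an input of the given width (height 32).

    Assumed layer stack, width axis only:
      * 3x3 convs with padding 1 keep the width unchanged;
      * two MaxPool2d(2, 2) layers halve it (rounding down);
      * two MaxPool2d((2, 2), (2, 1), (0, 1)) layers use stride 1 and
        padding 1 along the width, so each adds one column;
      * a final 2x2 conv without padding removes one column.
    """
    w = width
    for _ in range(2):        # MaxPool2d(2, 2): floor((w - 2) / 2) + 1
        w = (w - 2) // 2 + 1
    for _ in range(2):        # kernel 2, stride 1, padding 1: (w + 2 - 2) // 1 + 1
        w = (w + 2 - 2) // 1 + 1
    w = w - 2 + 1             # 2x2 conv, stride 1, no padding
    return w


for width in (100, 148):
    print(width, "->", crnn_frames(width))
```

Under that assumption T = W // 4 + 1, so a 100-pixel-wide input gives 26 frames and any width from 148 to 151 gives 38.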


ahmedmazari-dhatim commented on June 15, 2024

Hi @hellbago,
Thank you for your answer.

  1. So 38 corresponds to the output of the CNN and the input of the RNN. What is the size of your input image to the CNN?

  2. I'm stuck on understanding the meaning of the prediction vector.
    Let's say that our model predicts the following:
    a----v--a-i-l-a-bb-l-e-- => available

How can I read these values (38x1x37) with respect to the predicted word "available"?

```
(0 ,.,.) =
-106.3455 -115.3943 -114.5584 ... -115.6788 -110.2145 -112.2794
...
(37,.,.) =
-45.2289 -77.6819 -77.0921 ... -73.9425 -59.1527 -70.6211
```

For instance, how should I read the first value of `(0 ,.,.)`, -106.3455, and the first value of `(37,.,.)`, -45.2289, with respect to "available"? Are they the predicted values for each character: a, v, a, i, l, a, b, l, e?

Thank you again


random123user commented on June 15, 2024

Hi @ahmedmazari-dhatim,

I have only a rough idea of how CTCLoss works and is implemented, but from reading the comments of @meijieru and @hellbago, this is what I inferred.

The recognized word is "a-----v--a-i-l-a-bb-l-e--- => available".
The alphabet is "0123456789abcdefghijklmnopqrstuvwxyz", with the blank "-" in the first position.

The preds variable stores what looks like random data. But len(preds) is always 26, and so is len("a-----v--a-i-l-a-bb-l-e---"). This is true for every other word.

The raw recognized sequence (before repeats and blanks are removed) always has length 26.

  1. Now, have a look at preds[0][0]. It gives you 37 different numbers, and the important one is the highest. For preds[0][0] the highest number is -88.9130, at the 12th position in this list. Hence the first character of the recognized word is the 12th character of the alphabet (counting the blank as the first character), which is 'a'.

  2. Consider the 7th character of "a-----v--a-i-l-a-bb-l-e---", which is 'v'. Have a look at preds[6][0] (the 7th frame). The maximum is the 33rd number. Hence the 7th character of the recognized string is the 33rd character of the alphabet (again counting the blank), which is 'v'.
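The reading described in these two steps is best-path (greedy) CTC decoding. A minimal sketch, assuming crnn.pytorch's convention that class 0 is the blank and class k >= 1 is `alphabet[k - 1]`: it builds a fake `preds` tensor whose per-frame winners spell the raw string above, then takes the argmax of each frame, merges consecutive repeats and drops blanks. `best_path`, `raw_string` and `greedy_decode` are hypothetical helpers, not functions from the repository; the merge-then-drop rule mirrors what the `raw=False` decoding in demo.py appears to do.

```python
import torch

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"  # class k >= 1 is ALPHABET[k - 1]
BLANK = 0                                           # class 0 is the CTC blank "-"


def best_path(preds):
    """Argmax class index for every frame of a T x 1 x C score tensor."""
    return preds[:, 0, :].argmax(dim=1).tolist()


def raw_string(indices):
    """Frame-by-frame string, with '-' for the blank."""
    return "".join("-" if k == BLANK else ALPHABET[k - 1] for k in indices)


def greedy_decode(indices):
    """CTC best-path decoding: merge consecutive repeats, then drop blanks."""
    chars, prev = [], BLANK
    for k in indices:
        if k != BLANK and k != prev:
            chars.append(ALPHABET[k - 1])
        prev = k
    return "".join(chars)


# Fake scores whose per-frame winner spells the raw string from the comment.
raw = "a-----v--a-i-l-a-bb-l-e---"
preds = torch.full((len(raw), 1, len(ALPHABET) + 1), -100.0)
for t, ch in enumerate(raw):
    k = BLANK if ch == "-" else ALPHABET.index(ch) + 1
    preds[t, 0, k] = -40.0  # the least negative score wins the argmax

idx = best_path(preds)
print(idx[0], idx[6])                              # 11 32
print(raw_string(idx), "=>", greedy_decode(idx))   # a-----v--a-i-l-a-bb-l-e--- => available
```

Python indexing is 0-based, so the 7th frame is `preds[6]` and 'v' is class index 32 (the 33rd entry). Note also that the doubled "bb" collapses to one 'b'; a genuine double letter needs a blank frame in between, as in "l-l".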

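To get the probability of each recognized character ("a", "v", "a", "i", "l", "a", "b", "l", "e"), I made the following changes in my demo.py. I apply softmax and multiply by 100 to express the result as a percentage.
Note: the printed value is 100 times the probability.

```python
import torch
from torch.autograd import Variable

# `model`, `image` and `converter` are set up earlier in crnn.pytorch's demo.py.
m = torch.nn.Softmax(dim=1)  # normalize over the 37 classes of a single frame

model.eval()
preds = model(image)         # T x 1 x 37 scores
temp = preds                 # keep the raw scores for the softmax below
_, preds = preds.max(2)      # best class index per frame

# Flatten to a 1-D sequence of T class indices. (Older code also called
# preds.squeeze(2) here; it is not needed, and fails once max() drops the dim.)
preds = preds.transpose(1, 0).contiguous().view(-1)

preds_size = Variable(torch.IntTensor([preds.size(0)]))
raw_pred = converter.decode(preds.data, preds_size.data, raw=True)
sim_pred = converter.decode(preds.data, preds_size.data, raw=False)
print('%-20s => %-20s' % (raw_pred, sim_pred))
# print('after dict - ' + spellchecker.suggest(sim_pred)[0])

arr = preds.data.cpu().numpy()
for i in range(len(temp)):
    if arr[i] != 0:  # skip frames whose best class is the blank (index 0)
        prob = torch.max(m(temp[i]) * 100)  # 100 x probability of the best class
        print(prob)
```

But I am getting very high probability values. Can you please tell me whether my approach is correct?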
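On the very high values: when the winning score of a frame exceeds the others by several units or more, softmax puts nearly all of the mass on it, so probabilities close to 1 (close to 100 after scaling) are what you should expect for confident frames. A sketch with a hypothetical helper, `char_confidences`, that reports one probability per decoded character (taken from the first frame of each run, one choice among several) plus their product as a crude word-level confidence. The blank-at-0 class layout is assumed, as above.

```python
import torch

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"  # class k >= 1 is ALPHABET[k - 1]
BLANK = 0


def char_confidences(preds):
    """(character, probability) for each decoded character of a T x 1 x C tensor.

    Uses the probability of the first frame of each run; taking the max
    over the run would be an equally reasonable choice.
    """
    probs = torch.softmax(preds[:, 0, :], dim=1)  # T x C, each row sums to 1
    best_p, best_k = probs.max(dim=1)
    out, prev = [], BLANK
    for p, k in zip(best_p.tolist(), best_k.tolist()):
        if k != BLANK and k != prev:
            out.append((ALPHABET[k - 1], p))
        prev = k
    return out


preds = torch.full((3, 1, len(ALPHABET) + 1), -100.0)
preds[0, 0, ALPHABET.index("a") + 1] = -40.0  # 'a' wins by 60: probability ~1
preds[1, 0, BLANK] = -40.0                    # blank frame, skipped
preds[2, 0, ALPHABET.index("b") + 1] = -40.0  # 'b' wins by only 1 over 'd'
preds[2, 0, ALPHABET.index("d") + 1] = -41.0

confs = char_confidences(preds)
sequence_conf = 1.0
for _, p in confs:
    sequence_conf *= p                        # crude whole-word confidence
print([(c, round(100 * p, 1)) for c, p in confs], round(100 * sequence_conf, 1))
```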


ahmedmazari-dhatim commented on June 15, 2024

Hi @random123user,
Thanks a lot for your answer. I'll give your code a try and let you know.

I have a question for you:
we have len("a-----v--a-i-l-a-bb-l-e---") = 26.
We assume that preds gives 37 different numbers and that the important one is the highest. Then, from preds[0, :] up to preds[25, :], it returns the values (the highest in each vector) that map to a-----v--a-i-l-a-bb-l-e---.

  1. What about the remaining preds[26, :] up to preds[36, :]? The length of the word is 26. @meijieru
  2. You said that "len(preds) is always 26"; I am not sure about that. I can have a word of length 35, 42, 50, whatever. How do you deal with words of these lengths?

Thanks a lot


random123user commented on June 15, 2024

Hey @ahmedmazari-dhatim,

Thanks for the reply. I really have no idea what happens if the length of the word goes beyond 26.

I am trying to use this code to improve accuracy on the ICDAR 2015 recognition dataset. But the dataset contains some vertical and inverted words. For inverted text, my initial approach was to get the recognition confidence for two possible rotations (0 deg and 180 deg), compare them, and determine the correct orientation.

But since the confidence values are close, I can't find any criterion for comparing them. Sorry for asking a different question from the one being discussed, but can you please tell me whether there is any way to do this comparison? Or is there another repository that deals with vertical/inverted text recognition?
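One heuristic you could try for the 0 vs 180 degree comparison (a suggestion, not something crnn.pytorch provides): score each rotation by the mean log-probability of its best path, since log space spreads apart values that all look close to 1 as raw probabilities. `best_path_score`, `pick_orientation` and `toy_model` are hypothetical; `toy_model` only stands in for the CRNN so the sketch runs on its own.

```python
import torch


def best_path_score(preds):
    """Mean per-frame log-probability of the argmax path (higher = more confident).

    `preds` is a T x B x C tensor of scores; returns one value per batch item.
    """
    log_probs = torch.log_softmax(preds, dim=2)
    return log_probs.max(dim=2).values.mean(dim=0)


def pick_orientation(model, image):
    """Return 0 or 180: the rotation whose best path the model scores higher.

    `image` is B x 1 x H x W with B = 1; a 180-degree rotation flips both spatial axes.
    """
    with torch.no_grad():
        score_0 = best_path_score(model(image)).item()
        score_180 = best_path_score(model(torch.flip(image, dims=[2, 3]))).item()
    return 0 if score_0 >= score_180 else 180


def toy_model(image):
    """Hypothetical stand-in: confident only when the top-left pixel is bright."""
    preds = torch.zeros(26, 1, 37)
    preds[:, 0, 0] = 10.0 * image[0, 0, 0, 0]
    return preds


image = torch.zeros(1, 1, 32, 100)
image[0, 0, -1, -1] = 1.0               # bright pixel ends up top-left only after rotating
print(pick_orientation(toy_model, image))  # 180
```

With a real model you would call `pick_orientation(model, image)` on the normalized 1 x 1 x 32 x W crop. Whether the scores separate well enough on ICDAR 2015 is untested here.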


ahmedmazari-dhatim commented on June 15, 2024

Hey @random123user @hellbago @meijieru,

Regarding your question, I think the simplest way is to rotate the image so that the text is horizontal, keeping height = 32.
Otherwise, you have to adapt the CRNN to a variable height.

Coming back to my first question:
even if the length of the word is less than or equal to 26, I'm wondering what the remaining preds[26,:] up to preds[36,:] represent.

My question relates to your answer:

> Now, have a look at preds[0][0]. It gives you 37 different numbers, and the important one is the highest. For preds[0][0] the highest number is -88.9130, at the 12th position in this list. Hence the first character of the recognized word is the 12th character of the alphabet (counting the blank as the first character), which is 'a'.
>
> Consider the 7th character of "a-----v--a-i-l-a-bb-l-e---", which is 'v'. Have a look at preds[6][0] (the 7th frame). The maximum is the 33rd number. Hence the 7th character of the recognized string is the 33rd character of the alphabet (again counting the blank), which is 'v'.

I would be very grateful if you could add extra information and correct me if I'm wrong.

Thank you


ahmedmazari-dhatim commented on June 15, 2024

Hi @meijieru,

Why are the logits all negative? It means that the highest probability we can get is 0.5, which matches logit = 0. Isn't 0.5 rather low for the most probable sequence?
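The 0.5 reading applies to a single logit passed through a sigmoid. Here the 37 scores of a frame go through a softmax, which depends only on the differences between them, so adding the same constant to every score leaves the probabilities unchanged. All-negative scores can therefore still give a probability close to 1, as this small check (with made-up scores) shows:

```python
import torch

scores = torch.tensor([-40.0, -60.0, -80.0])    # all negative, like the printed rows
probs = torch.softmax(scores, dim=0)
shifted = torch.softmax(scores + 100.0, dim=0)  # same scores moved to positive values

print(probs)                                    # first entry is essentially 1
print(torch.allclose(probs, shifted))           # only differences between scores matter
print(torch.sigmoid(torch.tensor(0.0)))         # 0.5 is a sigmoid property of a single logit
```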

@hellbago, @random123user:
for CTC loss, please look at this tutorial. It explains well how CTC works:

https://github.com/SeanNaren/warp-ctc/blob/pytorch_bindings/torch_binding/TUTORIAL.md

Thank you


ahmedmazari-dhatim commented on June 15, 2024

Hi @random123user,

To answer your question about why you got high probabilities, I tried your code and got the following. What's wrong with my model?

```python
m = torch.nn.Softmax()
model.eval()
preds = model(image)
temps = preds.cpu()
prob = torch.max(m(temps) * 100)
```

The `prob` line raises an error:

```
assert input.dim() == 2, 'Softmax requires a 2D tensor as input'
AssertionError: Softmax requires a 2D tensor as input
```

Or did you use:

```python
preds = model(image)
preds = preds[:, 0, :]  # to get a 2-D tensor?
temps = preds
```
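Slicing with `preds[:, 0, :]` does give a 2-D (frames x classes) tensor, which satisfies the old `Softmax` assertion. A small check, using a random stand-in for `model(image)`, that this matches applying softmax over the class dimension of the full 3-D tensor in newer PyTorch versions, where `nn.Softmax` takes a `dim` argument:

```python
import torch
import torch.nn as nn

preds = torch.randn(38, 1, 37)                 # stand-in for model(image): T x batch x classes

# Drop the batch dimension: T x 37, then normalize each row (one row per frame).
probs_2d = nn.Softmax(dim=1)(preds[:, 0, :])

# Or normalize the class dimension of the 3-D tensor directly.
probs_3d = nn.Softmax(dim=2)(preds)

print(probs_2d.shape)                          # torch.Size([38, 37])
print(torch.allclose(probs_2d, probs_3d[:, 0, :]))  # True
```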


rohun-tripathi commented on June 15, 2024

By the way, where are the values that control the number of output labels?
To clarify, I understand that the alphabet has 37 labels.
I am curious why the system gives 26 outputs in the default configuration.
