Comments (21)

xingyizhou avatar xingyizhou commented on May 28, 2024 8

Hi all @dongzhuoyao @gdwei @djangogo @salihkaragoz @Ben-Park @gdjmck @wisp5 @rockeyben,
Thanks for the report! In short, downgrading the PyTorch version to 0.1.12 will resolve this bug, although that is not an elegant fix. I have investigated this bug for some time (see xingyizhou/pytorch-pose-hg-3d#16) but it is still unresolved. Recently I found that the bug also occurs in architectures other than HourglassNet (but still with dense output) on PyTorch versions > 0.1.12, while v0.1.12 always works fine. I also found a similar bug report at https://discuss.pytorch.org/t/model-eval-gives-incorrect-loss-for-model-with-batchnorm-layers/7561/2, which also uses an MSE/L1 loss.

A natural conjecture is that the issue comes from a bug in the PyTorch BN implementation after 0.1.12 and shows up when a network with dense output and BN layers is used (though it is not reliably reproducible). The bug is less likely to come from the data processing or the hourglass implementation, since this repo and my implementation (https://github.com/xingyizhou/pytorch-pose-hg-3d/tree/2D) are independent. Please correct me if you have a counterexample or have made more progress on this bug. You are welcome to discuss it further with me by dropping me an email at [email protected] . Thanks!
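For anyone trying to reproduce reports of this kind, here is a minimal sketch (assuming PyTorch >= 0.4; the toy model and sizes are made up for illustration, this is not the original poster's code) that compares a dense-output BN network in train() vs eval() mode, which is the shape of the "model.eval() gives incorrect loss" reports above:

# Minimal repro sketch (illustrative only). Some gap between the two modes is
# expected because eval() uses running statistics, but on the affected
# PyTorch/cuDNN setups the eval() output was reported to be wildly off.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),   # dense (per-pixel) output
).cuda()

x = torch.randn(6, 3, 64, 64).cuda()

model.train()
with torch.no_grad():
    y_train = model(x)

model.eval()
with torch.no_grad():
    y_eval = model(x)

print((y_train - y_eval).abs().mean().item())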


gdjmck avatar gdjmck commented on May 28, 2024 5

I used the same pretrained model and tested it with different testing batch sizes, and surprisingly got very different precision rates. The smaller the testing batch size, the higher the precision rate: with a batch size of 2 I got 80+%, but only 50+% when the batch size was 6 or 8. This is very strange to me, as I thought the batch size shouldn't matter.


xizero00 avatar xizero00 commented on May 28, 2024 3

Thanks @xingyizhou
Let me make it concrete.
For newer PyTorch versions (such as 0.4.0 or 0.4.1), go to Python's package directory on your system and change the batch_norm function in functional.py.
On Windows:
PYTHONDIR/Lib/site-packages/torch/nn/functional.py
On Linux:
/usr/lib/python2.7/dist-packages/torch/nn/functional.py
or /usr/lib/python3.5/dist-packages/torch/nn/functional.py

def batch_norm(input, running_mean, running_var, weight=None, bias=None,
               training=False, momentum=0.1, eps=1e-5):
    r"""Applies Batch Normalization for each channel across a batch of data.

    See :class:`~torch.nn.BatchNorm1d`, :class:`~torch.nn.BatchNorm2d`,
    :class:`~torch.nn.BatchNorm3d` for details.
    """
    if training:
        size = list(input.size())
        if reduce(mul, size[2:], size[0]) == 1:
            raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
    return torch.batch_norm(
        input, weight, bias, running_mean, running_var,
        training, momentum, eps, torch.backends.cudnn.enabled
    )

change to

def batch_norm(input, running_mean, running_var, weight=None, bias=None,
               training=False, momentum=0.1, eps=1e-5):
    r"""Applies Batch Normalization for each channel across a batch of data.

    See :class:`~torch.nn.BatchNorm1d`, :class:`~torch.nn.BatchNorm2d`,
    :class:`~torch.nn.BatchNorm3d` for details.
    """
    if training:
        size = list(input.size())
        if reduce(mul, size[2:], size[0]) == 1:
            raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
    return torch.batch_norm(
        input, weight, bias, running_mean, running_var,
        training, momentum, eps, False
    )

Hope it helps others. (The only change is the last argument of torch.batch_norm: torch.backends.cudnn.enabled becomes False, which forces the non-cuDNN batch-norm kernel.)
I also provide a bash script to patch PyTorch automatically (if you use Python 3.x, just change the path):

#!/usr/bin/env bash
set -e
# patch for BN for pytorch after v0.1.12

PYPATH=/usr/local/lib/python2.7/dist-packages
PYTORCH_VERSION=`python -c "import torch as t; print(t.__version__)"`

if [ ! -e "${PYPATH}/torch/nn/functional.py.bak" ] && [ -e "${PYPATH}/torch/nn/functional.py" ]; then
    # backup
    sudo cp ${PYPATH}/torch/nn/functional.py ${PYPATH}/torch/nn/functional.py.bak
    # patch pytorch
    if [ "${PYTORCH_VERSION}" == "0.4.0" ]; then
        # for pytorch v0.4.0
        sudo sed -i "1194s/torch\.backends\.cudnn\.enabled/False/g" ${PYPATH}/torch/nn/functional.py
    elif [ "${PYTORCH_VERSION}" == "0.4.1" ]; then
        # for pytorch v0.4.1
        sudo sed -i "1254s/torch\.backends\.cudnn\.enabled/False/g" ${PYPATH}/torch/nn/functional.py
    fi
    echo "patch pytorch ${PYTORCH_VERSION} successfully"
else
    echo "You have patched the pytorch!"
fi
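If you would rather not edit the installed functional.py (the hard-coded sed line numbers above only match 0.4.0/0.4.1), a runtime monkey-patch in your own main.py achieves the same effect. The wrapper below is an assumption-based sketch, not part of the original patch; it simply flips the cudnn flag around each batch_norm call so the original function passes False to torch.batch_norm:

# Hypothetical runtime alternative (not the original poster's patch):
# wrap F.batch_norm so cuDNN is disabled only while batch norm executes,
# leaving cuDNN on for convolutions. Place this before building the model.
import torch
import torch.nn.functional as F

_orig_batch_norm = F.batch_norm

def _batch_norm_no_cudnn(input, running_mean, running_var, weight=None,
                         bias=None, training=False, momentum=0.1, eps=1e-5):
    prev = torch.backends.cudnn.enabled
    torch.backends.cudnn.enabled = False  # original reads this flag at call time
    try:
        return _orig_batch_norm(input, running_mean, running_var,
                                weight, bias, training, momentum, eps)
    finally:
        torch.backends.cudnn.enabled = prev

F.batch_norm = _batch_norm_no_cudnn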


xingyizhou avatar xingyizhou commented on May 28, 2024 2

Hi all,
As pointed out by @leoxiaobin, turning off cuDNN for the BN layers resolves the issue. You can either set torch.backends.cudnn.enabled = False in main.py, which disables cuDNN for all layers and slows training down by about 1.5x, or re-build PyTorch from source and hack cuDNN out of the BN layers: https://github.com/pytorch/pytorch/blob/e8536c08a16b533fe0a9d645dd4255513f9f4fdd/aten/src/ATen/native/Normalization.cpp#L46 .
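In code, the first (simpler) option is a one-liner near the top of the training script; a minimal sketch, assuming main.py is the entry point:

# Simplest workaround: disable cuDNN globally before training starts.
# Costs roughly 1.5x training time, as noted above.
import torch
torch.backends.cudnn.enabled = False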


gdwei avatar gdwei commented on May 28, 2024 1

Hey, I've met exactly the same problem as you. I tried to train a 2-stack HG network with the original code and params. I guess this is caused by strong over-fitting, but I am not sure why the over-fitting occurs. Do you have any idea?
Here's the log:
log.txt


bearpaw avatar bearpaw commented on May 28, 2024

Hmm, it seems this problem cannot be reproduced. Would you mind training again and seeing whether everything goes well?


xizero00 avatar xizero00 commented on May 28, 2024

Hi, @gdwei @bearpaw
I have run into the same problem on my side. I used a model with 8 hourglass modules.

Epoch   LR      Train Loss      Val Loss        Train Acc       Val Acc
1.000000        0.000250        0.006231        0.008109        0.194155        0.332968
2.000000        0.000250        0.005188        0.006057        0.387743        0.477342
3.000000        0.000250        0.004838        0.005032        0.502596        0.584106
4.000000        0.000250        0.004606        0.004787        0.562090        0.629260
5.000000        0.000250        0.004426        0.004789        0.600115        0.638421
6.000000        0.000250        0.004286        0.004692        0.627019        0.674266
7.000000        0.000250        0.004173        0.004733        0.649596        0.681682
8.000000        0.000250        0.004089        0.005544        0.662832        0.644043
9.000000        0.000250        0.004001        0.005081        0.680730        0.703755
10.000000       0.000250        0.003925        0.005816        0.692677        0.705782
11.000000       0.000250        0.003865        0.005736        0.702876        0.713184
12.000000       0.000250        0.003804        0.007214        0.713316        0.689739
13.000000       0.000250        0.003744        0.009516        0.722215        0.716273
14.000000       0.000250        0.003682        0.016769        0.731847        0.655829
15.000000       0.000250        0.003640        0.026813        0.735956        0.637782
16.000000       0.000250        0.003587        0.033836        0.743873        0.287533
17.000000       0.000250        0.003552        0.055812        0.747483        0.110421
18.000000       0.000250        0.003506        0.090679        0.754163        0.026939
19.000000       0.000250        0.003469        0.246852        0.760248        0.052983
20.000000       0.000250        0.003439        0.478084        0.763902        0.020653

This is my log.


gdwei avatar gdwei commented on May 28, 2024

@djangogo for me, setting a smaller learning rate helps; for example, you could set it to around 1e-5. Other tricks for adjusting the learning rate could also be helpful, as in the sketch below.
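A rough sketch of that change, assuming the training script builds an RMSprop optimizer (the actual optimizer and schedule in the repo may differ):

# Illustrative only: lower initial learning rate plus a step decay.
# RMSprop is an assumption about the training script, not a confirmed detail.
import torch.nn as nn
import torch.optim as optim

model = nn.Conv2d(3, 16, 3)  # stand-in for the hourglass network
optimizer = optim.RMSprop(model.parameters(), lr=1e-5)  # instead of 2.5e-4
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(60):
    # ... run one training epoch with `optimizer` here ...
    scheduler.step()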


salihkaragoz avatar salihkaragoz commented on May 28, 2024

Hello, it looks like I also have the same problem.

Processing |################################| (944/944) Data: 0.000173s | Batch: 0.141s | Total: 0:02:12 | ETA: 0:00:01 | Loss: 4.7316 | Acc:  0.0011

Epoch: 3 | LR: 0.00025000
Processing |################################| (22333/22333) Data: 0.001675s | Batch: 0.322s | Total: 2:39:51 | ETA: 0:00:01 | Loss: 0.0036 | Acc:  0.5417
Processing |################################| (944/944) Data: 0.000312s | Batch: 0.147s | Total: 0:02:18 | ETA: 0:00:01 | Loss: 5096.5776 | Acc:  0.0000

Epoch: 4 | LR: 0.00025000
Processing |################################| (22333/22333) Data: 0.001971s | Batch: 0.333s | Total: 2:39:24 | ETA: 0:00:01 | Loss: 0.0036 | Acc:  0.5267
Processing |################################| (944/944) Data: 0.000199s | Batch: 0.148s | Total: 0:02:19 | ETA: 0:00:01 | Loss: 53171.3798 | Acc:  0.0021

Epoch: 5 | LR: 0.00025000
Processing |################################| (22333/22333) Data: 0.000199s | Batch: 0.324s | Total: 2:40:41 | ETA: 0:00:01 | Loss: 0.0035 | Acc:  0.5406
Processing |################################| (944/944) Data: 0.000270s | Batch: 0.147s | Total: 0:02:18 | ETA: 0:00:01 | Loss: 166093.0824 | Acc:  0.0000

Epoch: 6 | LR: 0.00025000
Processing |################################| (22333/22333) Data: 0.001795s | Batch: 0.326s | Total: 2:39:21 | ETA: 0:00:01 | Loss: 0.0035 | Acc:  0.5556
Processing |################################| (944/944) Data: 0.000197s | Batch: 0.144s | Total: 0:02:16 | ETA: 0:00:01 | Loss: 808754.2019 | Acc:  0.0001

Epoch: 7 | LR: 0.00025000
Processing |################################| (22333/22333) Data: 0.002017s | Batch: 0.342s | Total: 2:40:26 | ETA: 0:00:01 | Loss: 0.0035 | Acc:  0.5202
Processing |################################| (944/944) Data: 0.000228s | Batch: 0.147s | Total: 0:02:18 | ETA: 0:00:01 | Loss: 377698.3226 | Acc:  0.0000

Epoch: 8 | LR: 0.00025000
Processing |##########                      | (7420/22333) Data: 0.002047s | Batch: 0.424s | Total: 0:53:10 | ETA: 1:47:54 | Loss: 0.0038 | Acc:  0.3522^CProcess Process-15:


dongzhuoyao avatar dongzhuoyao commented on May 28, 2024

@gdwei could you report your result when you set lr=1e-5?


Ben-Park avatar Ben-Park commented on May 28, 2024

In our lab, two researchers tried to run the code, but only one of them had this problem. We have kept training over and over, but the result is similar. We used the same code (without modifying anything) and the same data (copied from one environment to the other). Does anybody have an idea?

Environment with the problem:
Ubuntu 16.04, Python 2.7.13, PyTorch 0.3.0, OpenCV 3.3.0, 1080Ti

Environment without the problem:
Ubuntu 16.04, Python 2.7.12, PyTorch 0.3.0, OpenCV 3.3.0, GTX TITAN

(added)
In the environment that produces the problem, the hourglass trained successfully with learning rate 1e-4. Please check.


rockeyben avatar rockeyben commented on May 28, 2024

Hey guys, I have the same problem too. If I fine-tune the 8-stack HG network on a new dataset (about 30K images), the validation accuracy drops dramatically after only 4 epochs.

I used lr = 2.5e-4.

I am trying 1e-4 now; let's see what happens.

I use 2.5e-5 now, and I only restore the bottom 4 stacks to start the 8-stack training. It seems the over-fitting still exists, but I think the problem is actually not as serious as the printed accuracy suggests. I suspect the validation accuracy drops so disastrously because this code uses a threshold when calculating accuracy:

def dist_acc(dists, thr=0.5):
    ''' Return percentage below threshold while ignoring values with a -1 '''
    # dists: per-joint distances; entries equal to -1 are ignored
    if dists.ne(-1).sum() > 0:
        # fraction of the non-(-1) distances that fall below the threshold
        return dists.le(thr).eq(dists.ne(-1)).sum()*1.0 / dists.ne(-1).sum()


Ben-Park avatar Ben-Park commented on May 28, 2024

The solution "xingyizhou" metioned works for me. On the environment where Pytorch 0.4.0 makes problem, i reinstall Pytorch 0.2.0, and then the learning finally works at the end without any problem.

However, i still cannot understand why Pytorch 0.4.0 doesnt work with certain computing environment with 1080Ti, TITAN X.


wpeebles avatar wpeebles commented on May 28, 2024

I'm experiencing the same issue running the MPII example with PyTorch 0.3.1, a 1080, and default parameters except stacks=1.


moizsaifee avatar moizsaifee commented on May 28, 2024

Huge thanks for the fix posted by @xizero00 above. I was running into this issue as well, and the fix seems to help.


stickOverCarrot avatar stickOverCarrot commented on May 28, 2024

Hello everyone, I ran into a problem when training the model from scratch. The model is a 2-stack HG network with the original code and params, and the lr is 2.5e-4. Can somebody help me solve this? Thanks!

Epoch LR Train Loss Val Loss Train Acc Val Acc
1.000000 0.000250 0.000820 0.001804 0.012718 0.016598
2.000000 0.000250 0.000605 0.001752 0.023349 0.018282
3.000000 0.000250 0.000601 0.003543 0.026636 0.016567
4.000000 0.000250 0.000601 0.002009 0.030169 0.023730
5.000000 0.000250 0.000605 0.007554 0.021984 0.024761
6.000000 0.000250 0.000593 0.001323 0.021573 0.023211
6.000000 0.000250 0.000581 0.000750 0.030056 0.037204
7.000000 0.000250 0.000579 0.001724 0.062914 0.042289
8.000000 0.000250 0.000574 0.006119 0.078612 0.029493
9.000000 0.000250 0.000568 0.002073 0.092811 0.032917
10.000000 0.000250 0.000565 0.002764 0.103415 0.082355
11.000000 0.000250 0.000559 0.004456 0.118935 0.069083
12.000000 0.000250 0.000554 0.001235 0.136579 0.111532
13.000000 0.000250 0.000551 0.001291 0.157845 0.139160
14.000000 0.000250 0.000546 0.000833 0.172080 0.187071
15.000000 0.000250 0.000540 0.000677 0.188202 0.137926
16.000000 0.000250 0.000536 0.000822 0.204400 0.126236
17.000000 0.000250 0.000529 0.007549 0.223867 0.023203
18.000000 0.000250 0.000514 0.001865 0.248268 0.100743
19.000000 0.000250 0.000500 0.001187 0.281679 0.162482
20.000000 0.000250 0.000491 0.002932 0.311082 0.045916
21.000000 0.000250 0.000484 0.001115 0.335782 0.107354
22.000000 0.000250 0.000476 0.009399 0.356727 0.008605
23.000000 0.000250 0.000470 0.000646 0.369132 0.005161
24.000000 0.000250 0.000463 0.003118 0.386070 0.022706
25.000000 0.000250 0.000457 0.000577 0.399309 0.018974
26.000000 0.000250 0.000451 0.000582 0.417991 0.019046
27.000000 0.000250 0.000446 0.001388 0.432328 0.010517
28.000000 0.000250 0.000441 0.000769 0.444523 0.019780
29.000000 0.000250 0.000436 0.000739 0.456251 0.014726
30.000000 0.000250 0.000432 0.001276 0.469130 0.056828
31.000000 0.000250 0.000428 0.001579 0.478094 0.023356
32.000000 0.000250 0.000423 0.000569 0.491334 0.006764
33.000000 0.000250 0.000420 0.000907 0.499913 0.045504
34.000000 0.000250 0.000416 0.000600 0.508063 0.101544
35.000000 0.000250 0.000412 0.000581 0.516998 0.077281
36.000000 0.000250 0.000408 0.000618 0.525647 0.047941
37.000000 0.000250 0.000404 0.000635 0.534216 0.036322
38.000000 0.000250 0.000402 0.000749 0.539956 0.002505
39.000000 0.000250 0.000401 0.000553 0.542420 0.070047
40.000000 0.000250 0.000396 0.000551 0.552550 0.038140
41.000000 0.000250 0.000393 0.000577 0.558081 0.025742
42.000000 0.000250 0.000390 0.000560 0.564162 0.031878
43.000000 0.000250 0.000387 0.000560 0.569910 0.008823
44.000000 0.000250 0.000384 0.000576 0.575056 0.007596
45.000000 0.000250 0.000381 0.000549 0.581056 0.003241
46.000000 0.000250 0.000379 0.000550 0.584639 0.000000
47.000000 0.000250 0.000377 0.000561 0.589435 0.026145
48.000000 0.000250 0.000374 0.000548 0.593368 0.020744


DNALuo avatar DNALuo commented on May 28, 2024

Hi @gdjmck, did you resolve the batch-size issue you described above? I am still suffering from it when fine-tuning and only get around 60+% validation accuracy.


gdjmck avatar gdjmck commented on May 28, 2024

Oh, it was something about the accuracy method: it only returned 1 or 0 for the whole batch sent in, so with a small batch size you have a higher probability of getting every sample right and scoring 1; otherwise you score 0.
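A toy sketch of why such an all-or-nothing batch score favours small batches (purely illustrative, not the repo's accuracy code): if each sample is correct with probability p and a batch only scores 1 when every sample in it is correct, the expected score shrinks as the batch grows, which matches the 80+% vs 50+% gap reported above.

# Toy illustration (hypothetical metric, not the repo's code): a batch-level
# 0/1 score rewards small batches.
import random

def batch_level_score(batch_size, p=0.9, n_batches=10000):
    hits = 0
    for _ in range(n_batches):
        if all(random.random() < p for _ in range(batch_size)):
            hits += 1
    return hits / float(n_batches)

for bs in (2, 6, 8):
    print(bs, batch_level_score(bs))  # score drops as batch size grows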


DNALuo avatar DNALuo commented on May 28, 2024

So it was caused by a bug in your evaluation code, and not by anything else?


gdjmck avatar gdjmck commented on May 28, 2024

Yeah, I was using the evaluation code from the repo back then; maybe it conflicted with the PyTorch version, because it worked fine once I fixed it. Does your performance on the test set differ with different batch sizes?


DNALuo avatar DNALuo commented on May 28, 2024

Yes, but I am fine-tuning the hourglass model on another dataset. It works well on MPII but considerably worse when fine-tuned on that dataset, at least a big gap from the results reported by other researchers. Their code is written in Torch, so I am now searching for anything that can help.
