
input size and crop about u-2-net (closed, 10 comments)

xuebinqin commented on July 16, 2024
input size and crop

from u-2-net.

Comments (10)

xuebinqin commented on July 16, 2024


dinoByteBr commented on July 16, 2024

Thanks for your detailed answer. I'm starting to worry that I'm completely missing the point here, so sorry if this question is too dumb.
As far as I can see, it doesn't matter what size RescaleT is called with during training: the input to net() is always the crop size (288) in net(inputs_v), since the crop happens after the rescale.

I understand the crop is for data augmentation, but shouldn't the test size then be the same as the crop size?
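The pipeline being described can be sketched as follows. `rescale_t` and `random_crop` here are simplified, shape-only stand-ins for the `RescaleT` and `RandomCrop` transform classes in U-2-Net's data loader (the real ones operate on sample dicts of image/label arrays):

```python
import random

# Simplified stand-ins for U-2-Net's RescaleT and RandomCrop transforms.
# Only the output shape matters for this illustration.

def rescale_t(shape, output_size=320):
    """Resize any input to output_size x output_size."""
    return (output_size, output_size)

def random_crop(shape, output_size=288):
    """Take a random output_size x output_size crop."""
    h, w = shape
    top = random.randint(0, h - output_size)    # random crop origin
    left = random.randint(0, w - output_size)   # (unused here; shape-only)
    return (output_size, output_size)

# Training: rescale to 320, then crop to 288 -> the net always sees 288x288.
train_shape = random_crop(rescale_t((1080, 1920)))
# Testing: only rescale -> the net sees 320x320.
test_shape = rescale_t((1080, 1920))
print(train_shape, test_shape)  # (288, 288) (320, 320)
```

Whatever size `rescale_t` is given, the crop runs last in training, so the network input is always the crop size, which is exactly the observation in the question above.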


xuebinqin commented on July 16, 2024


dinoByteBr commented on July 16, 2024

Thanks, now everything is clear, and I was able to reproduce good results with arbitrary sizes!
Although I still wonder how the model can handle the same objects when they are just further away or closer, if it's not scale invariant. Anyway, I won't bother you any longer; thanks again!


mgstar1021 commented on July 16, 2024

Thanks for your great work!

I am using this in an iOS application and have converted the model to an MLModel. When I use the MLModel, I want to pass inputs of arbitrary size, but it seems to support only square sizes; portrait or landscape shapes such as 240x320 or 320x300 fail.

I am getting an error with arbitrary sizes. What is the solution? Is there a problem with the conversion?
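One common workaround when a converted model only accepts a fixed square input is to letterbox the arbitrary-sized image into the 320x320 square, run inference, then undo the padding on the output mask. This is a general sketch, not something from the U-2-Net repo; `letterbox_params` is a hypothetical helper computing only the geometry:

```python
def letterbox_params(w, h, target=320):
    """Scale (w, h) to fit inside a target x target square, padding the rest.

    Returns (new_w, new_h, pad_left, pad_top): resize the image to
    new_w x new_h, then pad to target x target before inference; crop the
    padding back off (and resize) on the predicted mask afterwards.
    """
    scale = target / max(w, h)
    new_w, new_h = round(w * scale), round(h * scale)
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return new_w, new_h, pad_left, pad_top

print(letterbox_params(240, 320))  # (240, 320, 40, 0)
print(letterbox_params(320, 300))  # (320, 300, 0, 10)
```

Because the image is scaled uniformly and the rest is padding, the aspect ratio of the content is preserved, unlike a direct stretch to 320x320.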


xuebinqin commented on July 16, 2024


mgstar1021 commented on July 16, 2024

It is safer to resize all inputs to 320x320, which will theoretically give better results. Since there are several downsample and upsample operations, your size may trigger errors in those parts. So it would be good to show the error; otherwise we can't give an exact solution.


Thanks for your reply. Is it better to use a square image than a portrait or landscape image with one side (height or width) set to 320? Would it be difficult to support that?
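The down/upsample issue mentioned in the reply above can be illustrated with a quick size check. This is a simplification (the exact number of pooling stages depends on the RSU blocks), but it shows why 320 works while a side like 300 can break skip connections:

```python
def encoder_sizes(s, stages=5):
    """Spatial size after each 2x max-pool (floor division, as in PyTorch)."""
    sizes = [s]
    for _ in range(stages):
        s = s // 2
        sizes.append(s)
    return sizes

def decoder_mismatch(s, stages=5):
    """True if doubling back up ever disagrees with the encoder feature
    map it must be concatenated with (a typical shape-mismatch error)."""
    enc = encoder_sizes(s, stages)
    up = enc[-1]
    for e in reversed(enc[:-1]):
        up *= 2
        if up != e:
            return True
    return False

print(encoder_sizes(320))  # [320, 160, 80, 40, 20, 10] -- halves cleanly
print(encoder_sizes(300))  # [300, 150, 75, 37, 18, 9]  -- odd sizes appear
print(decoder_mismatch(320), decoder_mismatch(300))  # False True
```

A side of 320 survives five halvings exactly, so every upsampled map lines up with its encoder counterpart; 300 produces an odd size (75), after which the decoder's doubled sizes no longer match.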


xuebinqin commented on July 16, 2024


rdutta1999 commented on July 16, 2024

"but should't be then the test size the same as used for crop?" RES: No. The networks are usually (theoretically) translation invariant but not scale invariant. The cropping mainly changes the translation. But it doesn't change the receptive fields. In both training and test, keeping the scaling consistent is necessary, while cropping isn't. Because most of the networks are not scale invariant. Besides, cropping in testing will introduce another problem. How can we achieve the complete prediction map of the whole input image in the testing process.


Hey, first of all, thank you for your work. Eagerly waiting to see your new paper (and model).

Regarding the fact that the input sizes are different for training (288x288 after cropping) and testing (320x320 after resizing), you say that scaling has to be consistent since models are generally not scale-invariant.
This brings me to the following questions:

  1. Since cropping is applied after resizing, the model gets 288x288 input images during training, whereas during testing the input images are 320x320. Since the images are of different sizes, aren't their scales different (thus leading to a case of scale variance)?
  2. Wouldn't it be better to apply percentage-based cropping (to account for different dataset images) and then resize to 320x320? If this were done, the model would have the same input size (320x320) during both training and testing, keeping the scaling consistent.

Once again, thanks a lot for your work. This model is a godsend.
I have been using it for my own background removal module (trained on 720x720 images with an L2 loss to predict an alpha matte).
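The percentage-based augmentation proposed in point 2 above could be sketched like this. `pct_crop_then_resize` is a hypothetical helper, shape-only, not code from the repository:

```python
import random

def pct_crop_then_resize(w, h, min_pct=0.8, target=320):
    """Crop a random fraction (>= min_pct) of each side, then resize the
    crop to target x target, so the network input size is fixed at
    320x320 in both training and testing."""
    pct = random.uniform(min_pct, 1.0)
    crop_w, crop_h = round(w * pct), round(h * pct)
    # ...actual pixel cropping at (crop_w, crop_h) would happen here...
    return (target, target)  # shape after resizing the crop

print(pct_crop_then_resize(640, 480))  # (320, 320)
```

Under this scheme the crop still provides translation (and mild scale) jitter for augmentation, but the final resize guarantees the network always sees 320x320.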


xiemeilong commented on July 16, 2024

"but should't be then the test size the same as used for crop?" RES: No. The networks are usually (theoretically) translation invariant but not scale invariant. The cropping mainly changes the translation. But it doesn't change the receptive fields. In both training and test, keeping the scaling consistent is necessary, while cropping isn't. Because most of the networks are not scale invariant. Besides, cropping in testing will introduce another problem. How can we achieve the complete prediction map of the whole input image in the testing process.

On Mon, May 18, 2020 at 3:39 PM dinoByteBr @.***> wrote: thanks for your detailed answer, I start to be afraid completely miss the point here, sorry if this question is too dumb. as far as I see it, it doesn't matter with what size RescaleT is called in training, the inputs for net() is always the crop size (288) in net(inputs_v) -> crop happens after scale. I understood crop is for data aug, but should't be then the test size the same as used for crop? — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#22 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADSGORNLKWQZMJCHGAWXVWTRSGTIJANCNFSM4NEKY46A .
-- Xuebin Qin PhD Department of Computing Science University of Alberta, Edmonton, AB, Canada Homepage:https://webdocs.cs.ualberta.ca/~xuebin/

Hey, first of all, thank you for your work. Eagerly waiting to see your new paper (and model).

Regarding the fact that the input sizes are different for training (288x288 after cropping) and training (320x320 after resizing), you say that scaling has to be consistent since generally models are not scale-invariant. This brings me to the following questions:-

  1. Since cropping is being applied after resizing, the model gets 288x288 sized input images during training, whereas during testing, the input images are 320x320. Since the images are of different shapes, aren't their scales different (thus leading to a case of scale-variance)?
  2. Wouldn't it be better to apply a percentage based cropping (to account for different dataset images) and then resizing them to 320x320? If this is done, the model would have the same input size (320x320) during both training and testing, thus keeping the scaling consistent.

Once again, thanks a lot for your work. This model is a godsend. I have been using it for my own background removal module (trained on 720x720 images and L2 loss to predict an alpha matte).

@xuebinqin I have the same doubts

