
input size and crop about u-2-net (closed, 10 comments)

xuebinqin commented on July 16, 2024
input size and crop

from u-2-net.

Comments (10)

xuebinqin commented on July 16, 2024


dinoByteBr commented on July 16, 2024

Thanks for your detailed answer. I'm starting to worry that I'm completely missing the point here, so sorry if this question is too dumb.
As far as I can see, it doesn't matter what size RescaleT is called with during training: the input to net() is always the crop size (288) in net(inputs_v), since the crop happens after the rescale.

I understand the crop is for data augmentation, but shouldn't the test size then be the same as the crop size?
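The pipeline being described can be sketched as follows. `rescale_t` and `random_crop` here are simplified, shape-only stand-ins for the `RescaleT` and `RandomCrop` transform classes in U-2-Net's data loader (the real ones operate on sample dicts of image/label arrays):

```python
import random

# Simplified stand-ins for U-2-Net's RescaleT and RandomCrop transforms.
# Only the output shape matters for this illustration.

def rescale_t(shape, output_size=320):
    """Resize any input to output_size x output_size."""
    return (output_size, output_size)

def random_crop(shape, output_size=288):
    """Take a random output_size x output_size crop."""
    h, w = shape
    top = random.randint(0, h - output_size)    # random crop origin
    left = random.randint(0, w - output_size)   # (unused here; shape-only)
    return (output_size, output_size)

# Training: rescale to 320, then crop to 288 -> the net always sees 288x288.
train_shape = random_crop(rescale_t((1080, 1920)))
# Testing: only rescale -> the net sees 320x320.
test_shape = rescale_t((1080, 1920))
print(train_shape, test_shape)  # (288, 288) (320, 320)
```

Whatever size `rescale_t` is given, the crop runs last in training, so the network input is always the crop size, which is exactly the observation in the question above.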


xuebinqin commented on July 16, 2024


dinoByteBr commented on July 16, 2024

Thanks, now everything is clear, and I was able to reproduce good results with arbitrary sizes!
Although I still wonder how the model can handle the same objects when they are just further away or closer, if it's not scale invariant. Anyway, I won't bother you any longer; thanks again!


mgstar1021 commented on July 16, 2024

Thanks for your great work!

I am using this in an iOS application and have converted the model to an MLModel. When I use the MLModel, I want to pass inputs of arbitrary size, but it seems to support only square sizes; portrait or landscape shapes such as 240x320 or 320x300 fail.

I am getting an error with arbitrary sizes. What is the solution? Is there a problem with the conversion?
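One common workaround when a converted model only accepts a fixed square input is to letterbox the arbitrary-sized image into the 320x320 square, run inference, then undo the padding on the output mask. This is a general sketch, not something from the U-2-Net repo; `letterbox_params` is a hypothetical helper computing only the geometry:

```python
def letterbox_params(w, h, target=320):
    """Scale (w, h) to fit inside a target x target square, padding the rest.

    Returns (new_w, new_h, pad_left, pad_top): resize the image to
    new_w x new_h, then pad to target x target before inference; crop the
    padding back off (and resize) on the predicted mask afterwards.
    """
    scale = target / max(w, h)
    new_w, new_h = round(w * scale), round(h * scale)
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return new_w, new_h, pad_left, pad_top

print(letterbox_params(240, 320))  # (240, 320, 40, 0)
print(letterbox_params(320, 300))  # (320, 300, 0, 10)
```

Because the image is scaled uniformly and the rest is padding, the aspect ratio of the content is preserved, unlike a direct stretch to 320x320.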


xuebinqin commented on July 16, 2024


mgstar1021 commented on July 16, 2024

It is safer to resize all inputs to 320x320, which will theoretically give better results. Since there are several downsample and upsample operations, your size may trigger errors in those parts. So it would be good to show the error; otherwise we can't give an exact solution.


Thanks for your reply. Is it better to use a square image than a portrait or landscape image with one side (height or width) set to 320? Would it be difficult to support that?
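The down/upsample issue mentioned in the reply above can be illustrated with a quick size check. This is a simplification (the exact number of pooling stages depends on the RSU blocks), but it shows why 320 works while a side like 300 can break skip connections:

```python
def encoder_sizes(s, stages=5):
    """Spatial size after each 2x max-pool (floor division, as in PyTorch)."""
    sizes = [s]
    for _ in range(stages):
        s = s // 2
        sizes.append(s)
    return sizes

def decoder_mismatch(s, stages=5):
    """True if doubling back up ever disagrees with the encoder feature
    map it must be concatenated with (a typical shape-mismatch error)."""
    enc = encoder_sizes(s, stages)
    up = enc[-1]
    for e in reversed(enc[:-1]):
        up *= 2
        if up != e:
            return True
    return False

print(encoder_sizes(320))  # [320, 160, 80, 40, 20, 10] -- halves cleanly
print(encoder_sizes(300))  # [300, 150, 75, 37, 18, 9]  -- odd sizes appear
print(decoder_mismatch(320), decoder_mismatch(300))  # False True
```

A side of 320 survives five halvings exactly, so every upsampled map lines up with its encoder counterpart; 300 produces an odd size (75), after which the decoder's doubled sizes no longer match.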


xuebinqin commented on July 16, 2024


rdutta1999 commented on July 16, 2024

"but should't be then the test size the same as used for crop?" RES: No. The networks are usually (theoretically) translation invariant but not scale invariant. The cropping mainly changes the translation. But it doesn't change the receptive fields. In both training and test, keeping the scaling consistent is necessary, while cropping isn't. Because most of the networks are not scale invariant. Besides, cropping in testing will introduce another problem. How can we achieve the complete prediction map of the whole input image in the testing process.


Hey, first of all, thank you for your work. Eagerly waiting to see your new paper (and model).

Regarding the fact that the input sizes are different for training (288x288 after cropping) and testing (320x320 after resizing), you say that scaling has to be consistent since models are generally not scale-invariant.
This brings me to the following questions:

  1. Since cropping is applied after resizing, the model gets 288x288 input images during training, whereas during testing the input images are 320x320. Since the images are of different sizes, aren't their scales different (thus leading to a case of scale variance)?
  2. Wouldn't it be better to apply percentage-based cropping (to account for different dataset images) and then resize to 320x320? If this were done, the model would have the same input size (320x320) during both training and testing, keeping the scaling consistent.

Once again, thanks a lot for your work. This model is a godsend.
I have been using it for my own background removal module (trained on 720x720 images with an L2 loss to predict an alpha matte).
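The percentage-based augmentation proposed in point 2 above could be sketched like this. `pct_crop_then_resize` is a hypothetical helper, shape-only, not code from the repository:

```python
import random

def pct_crop_then_resize(w, h, min_pct=0.8, target=320):
    """Crop a random fraction (>= min_pct) of each side, then resize the
    crop to target x target, so the network input size is fixed at
    320x320 in both training and testing."""
    pct = random.uniform(min_pct, 1.0)
    crop_w, crop_h = round(w * pct), round(h * pct)
    # ...actual pixel cropping at (crop_w, crop_h) would happen here...
    return (target, target)  # shape after resizing the crop

print(pct_crop_then_resize(640, 480))  # (320, 320)
```

Under this scheme the crop still provides translation (and mild scale) jitter for augmentation, but the final resize guarantees the network always sees 320x320.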


xiemeilong commented on July 16, 2024

"but should't be then the test size the same as used for crop?" RES: No. The networks are usually (theoretically) translation invariant but not scale invariant. The cropping mainly changes the translation. But it doesn't change the receptive fields. In both training and test, keeping the scaling consistent is necessary, while cropping isn't. Because most of the networks are not scale invariant. Besides, cropping in testing will introduce another problem. How can we achieve the complete prediction map of the whole input image in the testing process.

On Mon, May 18, 2020 at 3:39 PM dinoByteBr @.***> wrote: thanks for your detailed answer, I start to be afraid completely miss the point here, sorry if this question is too dumb. as far as I see it, it doesn't matter with what size RescaleT is called in training, the inputs for net() is always the crop size (288) in net(inputs_v) -> crop happens after scale. I understood crop is for data aug, but should't be then the test size the same as used for crop? — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#22 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADSGORNLKWQZMJCHGAWXVWTRSGTIJANCNFSM4NEKY46A .
-- Xuebin Qin PhD Department of Computing Science University of Alberta, Edmonton, AB, Canada Homepage:https://webdocs.cs.ualberta.ca/~xuebin/

Hey, first of all, thank you for your work. Eagerly waiting to see your new paper (and model).

Regarding the fact that the input sizes are different for training (288x288 after cropping) and training (320x320 after resizing), you say that scaling has to be consistent since generally models are not scale-invariant. This brings me to the following questions:-

  1. Since cropping is being applied after resizing, the model gets 288x288 sized input images during training, whereas during testing, the input images are 320x320. Since the images are of different shapes, aren't their scales different (thus leading to a case of scale-variance)?
  2. Wouldn't it be better to apply a percentage based cropping (to account for different dataset images) and then resizing them to 320x320? If this is done, the model would have the same input size (320x320) during both training and testing, thus keeping the scaling consistent.

Once again, thanks a lot for your work. This model is a godsend. I have been using it for my own background removal module (trained on 720x720 images and L2 loss to predict an alpha matte).

@xuebinqin I have the same doubts

