Architecture Insight (u-2-net, open issue)

xuebinqin commented on August 15, 2024
Architecture Insight

from u-2-net.

Comments (3)

xuebinqin commented on August 15, 2024

Thanks for your efforts in exploring it. Two main factors contribute to the segmentation of the tiny 'gaps' or fine 'structures': 1) the relatively "high" resolution of the feature maps, and 2) the global and local feature extraction capabilities of the network modules.

The first factor, relatively "high" resolution, is easy to understand: the higher the resolution, the more detail we can perceive. Although 320x320 is not that high, the tiny structures are recognizable when viewed zoomed in. But when we focus on the tiny 'gaps' or fine 'structures', we usually ignore an important fact: we are actually inferring the tiny gaps from large-scale/global context. For example, we can recognize hairs because we know they are the hairs of a girl; if the face and body of the girl were covered by other objects, it would be difficult to recognize the hairs (this might not be a good example). So high resolution alone is not enough, and large-scale or global information is important in segmenting these tiny 'gaps'.

Most other networks have only a single receptive field in each stage. Although they can also provide high-resolution feature maps just before the prediction, the global context is missing because they apply small convolution filters to relatively high-resolution feature maps. The RSU blocks in each stage of the encoder and decoder capture both global and local context, which enables the segmentation of tiny 'gaps'. Another advantage of the RSU blocks over PSP or inception-like blocks is that they downsample the feature maps to obtain larger-scale information and upsample to recover the resolution. Both the downsampling and the upsampling are done gradually, which avoids the feature degradation caused by drastic downsampling and upsampling.
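To make the receptive-field argument concrete, here is a rough sketch that compares a plain stack of 3x3 convolutions at full resolution with a U-shaped path that pools between convolutions. It is only an illustration: the layer lists, the `receptive_field` helper, and the choice of six convolutions are assumptions made for this example and are not the exact RSU configuration from the U-2-Net code.

```python
def receptive_field(layers):
    """Theoretical receptive field of a chain of conv/pool layers.

    layers: list of (kernel, stride, dilation) tuples, ordered input to output.
    """
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent outputs
    for kernel, stride, dilation in layers:
        rf += (kernel - 1) * dilation * jump
        jump *= stride
    return rf


CONV = (3, 1, 1)  # 3x3 convolution, stride 1, no dilation
POOL = (2, 2, 1)  # 2x2 max pooling, stride 2

# Hypothetical comparison, both using six 3x3 convolutions:
plain = [CONV] * 6                 # all convolutions at full resolution
u_path = [CONV, POOL] * 5 + [CONV]  # pool between convolutions (U-shaped encoder)

print("plain stack :", receptive_field(plain))
print("U-shaped    :", receptive_field(u_path))
```

With the same number of convolutions, the pooled path sees a far larger input region (158 pixels wide versus 13 in this toy setup), which is the sense in which a U-shaped block can mix global and local context inside a single stage.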

bluesky314 commented on August 15, 2024

High-level information is needed for low-level processing to extract finer features. You're right, this does happen in a regular U-Net, but the operations are very far apart. This reminds me of the cross-scale connections used in EfficientDet (BiFPN) and Path Aggregation Networks, where the idea is very similar but implemented differently. There, the same global features are propagated, whereas here each block has its own local/global features. How do you think this would differ from that? And how do you contrast this with HRNet, which has concurrent multi-resolution pathways so all levels can talk to each other?

bluesky314 commented on August 15, 2024

@Nathanua Would appreciate your thoughts
