Comments (15)
Zero padding means padding the image with zeros and then cropping it back to the original size. It's equivalent to translating the image and filling the vacated side with zeros. I think from this view it makes more sense.
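For concreteness, the pad-then-crop augmentation described above can be sketched in NumPy (a minimal sketch; the function name and the 4-pixel pad are illustrative, matching the usual CIFAR setup):

```python
import numpy as np

def pad_and_crop(img, pad=4, rng=None):
    """Zero-pad by `pad` pixels on each side, then take a random crop of the
    original size. Equivalent to a random translation of up to `pad` pixels,
    with the vacated border filled with zeros."""
    rng = rng or np.random.default_rng()
    h, w = img.shape[:2]
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="constant")
    top = int(rng.integers(0, 2 * pad + 1))
    left = int(rng.integers(0, 2 * pad + 1))
    return padded[top:top + h, left:left + w]

img = np.arange(32 * 32 * 3, dtype=np.float32).reshape(32, 32, 3)
out = pad_and_crop(img, pad=4)
print(out.shape)  # (32, 32, 3)
```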
from densenet.
Oops, I see you've actually published the # of params in your paper.
Mind, btw, that Zagoruyko's numbers are wrong. He messed up the dataset szagoruyko/wide-residual-networks#17 (comment)
You can see that DenseNet performed worse on SVHN with 7M params vs. WideResNet with 2.7M. From your paper I see that's because Sergey didn't mess up the data in that one. Though you both only did [0..1] scaling, where mean/std standardization would probably have done better. Can't say for certain, but likely.
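For reference, the difference between plain [0..1] scaling and per-channel mean/std standardization mentioned above can be sketched as (NumPy sketch; the function names are illustrative):

```python
import numpy as np

def channel_stats(train_images):
    """Per-channel mean/std over the whole training set (N, H, W, C uint8)."""
    x = train_images.astype(np.float32) / 255.0
    return x.mean(axis=(0, 1, 2)), x.std(axis=(0, 1, 2))

def preprocess(img, mean=None, std=None):
    """Plain [0..1] scaling if no stats are given, else standardization
    with the training-set statistics."""
    x = img.astype(np.float32) / 255.0
    if mean is None:
        return x
    return (x - mean) / std

rng = np.random.default_rng(0)
train = rng.integers(0, 256, size=(64, 32, 32, 3), dtype=np.uint8)
mean, std = channel_stats(train)
z = preprocess(train[0], mean, std)  # standardized: ~zero mean, ~unit std
```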
On the other hand, one can always just go ahead and test the speed oneself. =) Though that's a last-resort measure, considering that info really belongs in tables/figures.
Mind, btw, that Zagoruyko's numbers are wrong. He messed up the dataset szagoruyko/wide-residual-networks#17 (comment)
Thanks for the reminder. Yeah, we read the Wide ResNet paper and knew that its preprocessing (whitened instead of only normalized) and data augmentation (reflect-padding instead of zero-padding) are both different from, and slightly heavier than, ours. But ours is more widely used and follows most publications (see the references in our paper).
We cannot rerun every baseline method, and we think it's fair to compare our model with Wide ResNet under the setting above.
We appreciate your reminder about #parameters and #computation. We'll consider including #computation in our next update.
For training time reference, our Densenet(L=40, k=12) with batch size 64 and 300 epochs takes about 7 hours to finish on one TITAN X GPU. This includes about 0.5 hour test time.
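As a rough sanity check on the #parameters side, a weight count for the basic DenseNet (no bottleneck or compression) can be sketched as follows. This is an estimate only: it ignores BN parameters and biases, and assumes the paper's setup of three dense blocks of (L-4)/3 layers, a 16-channel initial conv, and 1x1 transition convs that keep the channel count:

```python
def densenet_param_estimate(L=40, k=12, init_ch=16, classes=10):
    """Rough conv + classifier weight count for the basic DenseNet:
    each layer is a 3x3 conv from all accumulated features to k new ones;
    transitions are 1x1 convs; BN parameters and biases are ignored."""
    n = (L - 4) // 3                     # layers per dense block
    params = 3 * 3 * 3 * init_ch         # initial 3x3 conv from RGB
    ch = init_ch
    for block in range(3):
        for _ in range(n):
            params += 3 * 3 * ch * k     # 3x3 conv: ch inputs -> k outputs
            ch += k                      # new features are concatenated
        if block < 2:
            params += ch * ch            # 1x1 transition conv
    params += ch * classes               # global-pooled features -> classifier
    return params

print(densenet_param_estimate())  # 1001616, i.e. ~1.0M for L=40, k=12
```

This lands at roughly the 1.0M figure reported for DenseNet (L=40, k=12).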
Also, as you said, it's important to keep other settings the same when comparing different architectures. So we keep every hyperparameter and setting the same as the official ResNet implementation, and use standard preprocessing and data augmentation. We'd be interested in a comparison of your architecture with DenseNets under the same setting. But note that a set of hyperparameters and settings may be good for one architecture and bad for another.
Yes, of course you wouldn't rerun every model =) I'm just saying that, as you now know, some re-evaluation of how good DenseNet is vs. ResNet is needed, or at least will soon be. Zagoruyko is going to retest his model and, I'm guessing, update his paper accordingly.
Regarding mirroring/reflections, fb.resnet.torch uses those too: https://github.com/facebook/fb.resnet.torch/blob/master/datasets/cifar10.lua Actually, for images of such types it's the most default kind of preprocessing, much more usual than zero-padding. For images of numbers and letters, on the other hand, it's of course generally not used at all, except maybe for hand-selected parts of a dataset.
our Densenet(L=40, k=12) with batch size 64 and 300 epochs takes about 7 hours to finish on one TITAN X GPU
Yes, I actually saw that info on the main page. =) It's just that it would be more interesting to see it in a figure with dots mapped to accuracy and #parameters (#computation in another one). I'll suggest that to Sergey too.
Selecting proper hyperparameters is indeed important, and one should select the best hyperparameters when evaluating a model's performance. Though it's also part of a model's evaluation to tell how critically its performance depends on having some very exact hyperparameters.
for images of such types it's the most default kind of preprocessing, much more usual than zero-padding
It seems to me that fb.resnet.torch's code uses zero-padding instead of reflect-padding: https://github.com/facebook/fb.resnet.torch/blob/master/datasets/transforms.lua
And zero-padding is more common than reflection-padding (see the references in our paper).
For the SVHN dataset we followed Wide ResNet's preprocessing and data augmentation (i.e., no data augmentation).
Thanks for the information. Just to clarify: we didn't play any tricks and tried to keep the comparison as fair as possible.
https://github.com/facebook/fb.resnet.torch/blob/master/datasets/cifar10.lua
Notice
t.HorizontalFlip(0.5),
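The quoted transform has a straightforward NumPy counterpart (a sketch only; the function name is made up):

```python
import numpy as np

def horizontal_flip(img, p=0.5, rng=None):
    """Python counterpart of fb.resnet.torch's t.HorizontalFlip(0.5):
    mirror the image left-right with probability p."""
    rng = rng or np.random.default_rng()
    return img[:, ::-1] if rng.random() < p else img

img = np.arange(12).reshape(2, 2, 3)
flipped = horizontal_flip(img, p=1.0)  # force the flip for demonstration
print(np.array_equal(flipped, img[:, ::-1]))  # True
```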
As for which method is more popular: to tell the truth, I honestly don't know. Reflections just seem a more natural thing for this type of image, and for convolutional networks with a large pooling layer at the end, as in all modern architectures. You see, for some images the crop can remove an important part of the image, and the padding you add introduces informational noise. I don't even understand how padding actually improves anything, other than maybe giving more attention to the central part of the image and maybe helping the network learn to classify a cut part of an object as that object.
If you get a dataset where the object to classify is not located in the image center, but is often near the edges, my guess is zero-padding will do you no good.
Meanwhile, reflections can only hurt in some very rare cases, when the object is not symmetric and a mirror image of it should classify as a different object. In real life that's almost completely restricted to symbols and characters. So it's the least risky and most obvious type of augmentation.
Scaling and rotation seem to me much more meaningful augmentations than zero-padding for convolutional networks, but I guess for CIFAR they don't work very well, as the images there are tiny. They should work to some extent if you upscale the images first, though that would make the network much slower and probably defeat the purpose, as CIFAR is more of a quick playground to test ideas than actual data.
If people restricted themselves to testing on purely mirrored CIFAR without zero-padding, that would be of great help, IMHO: there are plenty of ways to zero-pad the dataset, and picking one makes no sense when the only purpose of CIFAR is to compare ideas, not to win some competition. I'd keep reflections, though, because they definitely can't hurt and are a tiny, unambiguous type of augmentation that helps networks that would otherwise tend to overfit very fast but are not inherently bad; actual datasets are usually much bigger and allow rotation, scaling and other better augmentations. Meanwhile, my guess is that zero-padding helps to a much smaller degree and is not a very fair type of augmentation, as in CIFAR the combined crops almost completely eliminate everything in the picture except the actual object to classify.
Okay, I just understood that we were talking about different things =D Reflection-padding means padding with non-zero pixels; now I get it. Yeah, reflection-padding is likely not a very good type of augmentation, IMO: you're feeding the net a lot of garbage data. With zero-padding you do that as well, but the net quickly learns that that data is garbage; with reflection-padding, I don't think it's that easy.
I just confused it with horizontal image mirroring.
Yeah, I just found that in WRN, and so in my code, it seems we used reflection padding. Lol.
I'll continue my investigations into CNNs on CIFAR without using any type of padding.
As for how zero-padding influences learning: I guess it tells the network that patterns from the cut-out parts should generally be ignored, and that the most important part of the object is in the patterns located in the middle. Quite a lot of additional info.
@ibmua regarding zero vs. reflection padding, you might also have a look at this opinion: https://twitter.com/karpathy/status/720622989289644033
As far as I see it, with zero-padding you're effectively adding gray color in place of an unimportant part of the image, thereby making the classifier more indifferent to the cut-out part and to such large gray patches. With reflection-padding you're adding very risky info: you still reap the benefit of making the classifier more indifferent to the cut-out part, but the info you add gets impressed into the weights, likely much more than with zero-padding. Maybe for some cases it works better, because it adds indifference to some background-ish patterns. But it's risky, IMHO.
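The two padding modes under discussion are easy to compare in one dimension with NumPy's `pad` (illustrative only):

```python
import numpy as np

row = np.array([1, 2, 3, 4, 5])
# zero-padding: the border is filled with a constant "non-image" value
print(np.pad(row, 2, mode="constant"))  # [0 0 1 2 3 4 5 0 0]
# reflection-padding: the border mirrors real pixels back into the image
print(np.pad(row, 2, mode="reflect"))   # [3 2 1 2 3 4 5 4 3]
```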
Anyway, not cutting the image at all is more interesting to me from the perspective of evaluating NN quality. IMHO, cropping is probably only reasonable if you're cooking the final net for production. It's pretty orthogonal to the network model itself; it's just a way of using your knowledge about the exact dataset to make it universally easier to grasp by any type of regression model.
from densenet.