
Comments (14)

kozistr commented on May 23, 2024

Hello!

The loss is calculated only once in your code, but SAM needs two forward & backward passes. In the SAM optimizer part, the loss is actually computed in only one place (loss = m_backbone_loss).

To fix the code:

    with torch.cuda.device(CUDA_VISIBLE_DEVICES):
        inputs = data[0].cuda()
        labels = data[1].cuda()

    iters += 1
    optimizers.zero_grad()

    # I don't know which criterion you used, but if `criterion` is the
    # PyTorch built-in cross-entropy function, it is better to use
    # `reduction='mean'`. 1) and 2) below are equivalent:
    # 1) loss = criterion(scores, labels, reduction='mean')  # output is a scalar
    # 2) target_loss = criterion(scores, labels)  # with reduction='none'
    #    loss = torch.sum(target_loss) / target_loss.size(0)

    # ----------------- SAM Optimizer -------------------
    # first forward-backward pass
    # need to assign the result of `criterion` to `loss`
    loss = criterion(models(inputs)[0], labels, reduction='mean')
    loss.backward(retain_graph=True)
    optimizers.first_step(zero_grad=True)

    # second forward-backward pass
    # need to assign the result of `criterion` to `loss`
    loss = criterion(models(inputs)[0], labels, reduction='mean')
    loss.backward(retain_graph=True)
    optimizers.second_step(zero_grad=True)
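
For completeness, a minimal sketch of how the SAM optimizer itself is constructed in pytorch_optimizer (the hyperparameter values here are placeholders; the setup matches the main() shown later in this thread):

    import torch
    from pytorch_optimizer import SAM

    # SAM wraps a base optimizer; extra kwargs are forwarded to it
    base_optimizer = torch.optim.SGD
    optimizers = SAM(models.parameters(), base_optimizer, lr=0.1, momentum=0.9)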


kozistr commented on May 23, 2024

> IndexError: dimension specified as 0 but tensor has no dimensions

Could you upload the whole error message?

Also, you should comment out the code like below; the SAM optimizer part already does the forward & backward passes twice!

+) What is the shape of models(inputs)[0]? It should be (batch_size, num_classes) logits!

    #scores, _, features = models(inputs)   
    # scores = models(inputs)     
    # target_loss = criterion(scores, labels)
    # loss = torch.sum(target_loss) / target_loss.size(0)       
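
A quick way to check this inside the loop (a minimal sketch; models, inputs, and labels are the variables from your training code):

    # logits should be 2-D (batch_size, num_classes), labels 1-D (batch_size,)
    logits = models(inputs)[0]
    print(logits.shape, labels.shape)
    assert logits.dim() == 2 and logits.size(0) == labels.size(0)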


kozistr commented on May 23, 2024

> reduction

Oh, sorry for the confusing code.

I meant: set the reduction parameter to 'mean' in the definition, like in your code: criterion = nn.CrossEntropyLoss(reduction='mean').

Then removing the reduction keyword from the call should fix it: change loss = criterion(models(inputs)[0], labels, reduction='mean')
to loss = criterion(models(inputs)[0], labels).
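
In other words, the pattern should look like this (a minimal sketch; only the functional F.cross_entropy accepts reduction as a call-time argument):

    import torch.nn as nn

    # reduction is fixed once, in the constructor ('mean' is also the default) ...
    criterion = nn.CrossEntropyLoss(reduction='mean')

    # ... so the call takes only (input, target); passing reduction= here
    # raises TypeError: forward() got an unexpected keyword argument 'reduction'
    loss = criterion(models(inputs)[0], labels)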


manza-ari commented on May 23, 2024

Yeah! You are right. I understood you earlier and tried without (reduction='mean') in train_epoch(), but another error was raised:

File "/home/kanza/workspace/WithoutDictionary/RandomFixedSAM/main.py", line 120, in
train(models, criterion, optimizers, schedulers, dataloaders, args.no_of_epochs, EPOCHL)
File "/home/kanza/workspace/WithoutDictionary/RandomFixedSAM/train_test.py", line 64, in train
loss = train_epoch(models, criterion, optimizers, dataloaders)
File "/home/kanza/workspace/WithoutDictionary/RandomFixedSAM/train_test.py", line 48, in train_epoch
loss = criterion(models(inputs)[0], labels)
File "/home/kanza/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/kanza/anaconda3/lib/python3.9/site-packages/torch/nn/modules/loss.py", line 1164, in forward
return F.cross_entropy(input, target, weight=self.weight,
File "/home/kanza/anaconda3/lib/python3.9/site-packages/torch/nn/functional.py", line 3014, in cross_entropy
return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: size mismatch (got input: [100], target: [10])


kozistr commented on May 23, 2024

I guess the RuntimeError: size mismatch (got input: [100], target: [10]) error means that the output of the model has 100 classes (the model's num_classes is 100), but the labels have only 10 classes.

So correcting num_classes should solve the issue.
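
For reference, a standalone sketch of the shape contract nn.CrossEntropyLoss expects (the sizes here are illustrative):

    import torch
    import torch.nn as nn

    criterion = nn.CrossEntropyLoss()

    logits = torch.randn(10, 100)          # (batch_size=10, num_classes=100)
    labels = torch.randint(0, 100, (10,))  # (batch_size=10,) with class ids in [0, 100)

    loss = criterion(logits, labels)       # OK: batch sizes match, ids < num_classes
    print(loss.item())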


kozistr commented on May 23, 2024

> Yeah, I changed the number of classes to 100 in ResNet but .... I don't know
>
> [Screenshot from 2022-08-20 19-38-06]

Then the label data you have has 10 classes in total (num_classes is 10), so change the model's num_classes to 10!

num_classes must equal the number of classes in the dataset labels : )

For example, the MNIST dataset has 10 classes (0 ~ 9), so the output dimension of the model (num_classes) should be 10.


kozistr commented on May 23, 2024

Oh, you are using the CIFAR100 dataset. Then, as you said, num_classes = 100 is correct. Your transformation code also looks fine to me.

How about checking the number of classes in data_train? Perhaps it doesn't actually load the CIFAR100 dataset but CIFAR10 instead (my rough guess); see the sketch below.

I guess there's an issue with the dataset part!
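
One quick check (a minimal sketch; data_train is the dataset built in load_dataset):

    # torchvision's CIFAR datasets expose class names and integer targets
    print(len(data_train.classes))                            # expect 100 for CIFAR100
    print(min(data_train.targets), max(data_train.targets))  # expect 0 and 99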


manza-ari commented on May 23, 2024

I think there was some bug; it was not working for any of the datasets, so I made the changes again in a new folder, and everything is working fine now. I cannot thank you enough for helping me today.
I have a question!
Let's go back to the original train_epoch() of the original repository, where the loss is calculated as follows:

    target_loss = criterion(scores, labels)
    loss = torch.sum(target_loss) / target_loss.size(0)

and now we are doing it like this; I hope we are not doing anything mathematically or logically wrong:

    for data in tqdm(dataloaders['train'], leave=False, total=len(dataloaders['train'])):
        with torch.cuda.device(CUDA_VISIBLE_DEVICES):
            inputs = data[0].cuda()
            labels = data[1].cuda()
        iters += 1
        optimizers.zero_grad()
        scores, _, features = models(inputs)
        #target_loss = criterion(scores, labels)
        #m_backbone_loss = torch.sum(target_loss) / target_loss.size(0)
        #loss = m_backbone_loss

        # ----------------- SAM Optimizer -------------------
        # first forward-backward pass
        loss = criterion(models(inputs)[0], labels)
        loss.backward(retain_graph=True)
        optimizers.first_step(zero_grad=True)

        # second forward-backward pass
        loss = criterion(models(inputs)[0], labels)
        loss.backward(retain_graph=True)
        optimizers.second_step(zero_grad=True)


kozistr commented on May 23, 2024

> and now we are doing it like this, I hope we are not doing anything mathematically or logically wrong. [...]

I'm glad you solved the issue :)

Regarding the loss value, there is no difference between

criterion(scores, labels) with reduction='mean' and target_loss = criterion(scores, labels) with reduction='none'; loss = torch.sum(target_loss) / target_loss.size(0).

Both yield a single scalar value, e.g. 0.1234, so the extra sum-and-divide does not change the loss; see the check below.

Regarding the SAM part, what matters is that the final loss is the one that gets backpropagated via loss.backward(retain_graph=True).

But in your original code, the final loss (target_loss = criterion(scores, labels); loss = torch.sum(target_loss) / target_loss.size(0)) was never passed to .backward(); only the freshly computed criterion(scores, labels) inside the SAM block was.
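
A standalone check of that equivalence (a minimal sketch with made-up shapes):

    import torch
    import torch.nn.functional as F

    logits = torch.randn(8, 100)
    labels = torch.randint(0, 100, (8,))

    mean_loss = F.cross_entropy(logits, labels, reduction='mean')

    per_sample = F.cross_entropy(logits, labels, reduction='none')  # shape (8,)
    manual_mean = torch.sum(per_sample) / per_sample.size(0)

    assert torch.allclose(mean_loss, manual_mean)  # same value either way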

I hope this helps!


manza-ari commented on May 23, 2024

Yeah, I am using "Cross Entropy". Okay, I edited it as per your recommendation, but now it says IndexError: dimension specified as 0 but tensor has no dimensions:
    def train_epoch(models, criterion, optimizers, dataloaders):
        models.train()

        global iters
        for data in tqdm(dataloaders['train'], leave=False, total=len(dataloaders['train'])):
            with torch.cuda.device(CUDA_VISIBLE_DEVICES):
                inputs = data[0].cuda()
                labels = data[1].cuda()
            iters += 1
            optimizers.zero_grad()
            #scores, _, features = models(inputs)
            scores = models(inputs)
            target_loss = criterion(scores, labels)
            loss = torch.sum(target_loss) / target_loss.size(0)

            # ----------------- SAM Optimizer -------------------

            # first forward-backward pass
            loss = criterion(models(inputs)[0], labels, reduction='mean')
            loss.backward(retain_graph=True)
            optimizers.first_step(zero_grad=True)

            # second forward-backward pass
            loss = criterion(models(inputs)[0], labels, reduction='mean')
            loss.backward(retain_graph=True)
            optimizers.second_step(zero_grad=True)


manza-ari commented on May 23, 2024

So in main() I made everything like this:

    # Loss, criterion and scheduler (re)initialization
    criterion = nn.CrossEntropyLoss(reduction='mean')
    base_optimizer = torch.optim.SGD
    optim_backbone = SAM(models.parameters(), base_optimizer, lr=LR, momentum=MOMENTUM, weight_decay=WDECAY)

    sched_backbone = lr_scheduler.MultiStepLR(optim_backbone, milestones=MILESTONES)
    optimizers = optim_backbone
    schedulers = sched_backbone

and train_epoch() is:

    def train_epoch(models, criterion, optimizers, dataloaders):
        models.train()

        global iters
        for data in tqdm(dataloaders['train'], leave=False, total=len(dataloaders['train'])):
            with torch.cuda.device(CUDA_VISIBLE_DEVICES):
                inputs = data[0].cuda()
                labels = data[1].cuda()
            iters += 1
            optimizers.zero_grad()
            #scores, _, features = models(inputs)
            #scores = models(inputs)
            #target_loss = criterion(scores, labels)
            #loss = torch.sum(target_loss) / target_loss.size(0)

            # ----------------- SAM Optimizer -------------------

            # first forward-backward pass
            loss = criterion(models(inputs)[0], labels, reduction='mean')
            loss.backward(retain_graph=True)
            optimizers.first_step(zero_grad=True)

            # second forward-backward pass
            loss = criterion(models(inputs)[0], labels, reduction='mean')
            loss.backward(retain_graph=True)
            optimizers.second_step(zero_grad=True)

and now the error is:

    Train a Model.
    /home/kanza/anaconda3/lib/python3.9/site-packages/torch/optim/lr_scheduler.py:131: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
      warnings.warn("Detected call of lr_scheduler.step() before optimizer.step(). "
    Finished.
    Trial 1/1 || Cycle 1/10 || Label set size 5: Test acc 1.04
    Train a Model.
    Traceback (most recent call last):
      File "/home/kanza/workspace/WithoutDictionary/RandomFixedSAM/main.py", line 120, in <module>
        train(models, criterion, optimizers, schedulers, dataloaders, args.no_of_epochs, EPOCHL)
      File "/home/kanza/workspace/WithoutDictionary/RandomFixedSAM/train_test.py", line 64, in train
        loss = train_epoch(models, criterion, optimizers, dataloaders)
      File "/home/kanza/workspace/WithoutDictionary/RandomFixedSAM/train_test.py", line 48, in train_epoch
        loss = criterion(models(inputs)[0], labels, reduction='mean')
      File "/home/kanza/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
        return forward_call(*input, **kwargs)
    TypeError: forward() got an unexpected keyword argument 'reduction'


manza-ari commented on May 23, 2024

Yeah, I changed the number of classes to 100 in ResNet but .... I don't know

[Screenshot from 2022-08-20 19-38-06]


manza-ari commented on May 23, 2024

I changed the number of classes in the backbone. I am using CIFAR100 and changed the number of classes:

    class ResNet(nn.Module):
        def __init__(self, block, num_blocks, num_classes=100):
            super(ResNet, self).__init__()
            self.in_planes = 64

            self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
            self.bn1 = nn.BatchNorm2d(64)
            self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
            self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
            self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
            self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
            self.linear = nn.Linear(512*block.expansion, num_classes)
            # self.linear2 = nn.Linear(1000, num_classes)

        def _make_layer(self, block, planes, num_blocks, stride):
            strides = [stride] + [1]*(num_blocks-1)
            layers = []
            for stride in strides:
                layers.append(block(self.in_planes, planes, stride))
                self.in_planes = planes * block.expansion
            return nn.Sequential(*layers)

        def forward(self, x):
            out = F.relu(self.bn1(self.conv1(x)))
            out1 = self.layer1(out)
            out2 = self.layer2(out1)
            out3 = self.layer3(out2)
            out4 = self.layer4(out3)
            out = F.avg_pool2d(out4, 4)
            outf = out.view(out.size(0), -1)
            # outl = self.linear(outf)
            out = self.linear(outf)
            #return out, outf, [out1, out2, out3, out4]
            return out

    def ResNet18(num_classes=100):
        return ResNet(BasicBlock, [2,2,2,2], num_classes)
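
A quick sanity check of this model's output shape (a minimal sketch; assumes the imports and the BasicBlock definition from the same file). The indexing note at the end is only a guess at what may relate to the earlier size-mismatch error:

    model = ResNet18(num_classes=100)
    x = torch.randn(4, 3, 32, 32)  # fake batch of CIFAR-sized inputs
    out = model(x)
    print(out.shape)     # torch.Size([4, 100])

    # since forward() now returns a single tensor, model(x)[0] is the first
    # row of the logits (shape [100]), not the logits themselves
    print(out[0].shape)  # torch.Size([100])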

load_dataset() also sets the number of classes:

    def load_dataset(dataset):
        train_transform = T.Compose([
            T.RandomHorizontalFlip(),
            T.RandomCrop(size=32, padding=4),  # ImageNet 224
            T.ToTensor(),
            T.Normalize([0.4914, 0.4822, 0.4465], [0.2023, 0.1994, 0.2010])  # T.Normalize((0.5071, 0.4867, 0.4408), (0.2675, 0.2565, 0.2761)) # CIFAR-100
        ])

        test_transform = T.Compose([
            T.ToTensor(),
            T.Normalize([0.4914, 0.4822, 0.4465], [0.2023, 0.1994, 0.2010])  # T.Normalize((0.5071, 0.4867, 0.4408), (0.2675, 0.2565, 0.2761)) # CIFAR-100
        ])

        if dataset == 'cifar100':
            data_train = CIFAR100('../cifar100', train=True, download=True, transform=train_transform)
            data_unlabeled = MyDataset(dataset, True, test_transform)
            data_test = CIFAR100('../cifar100', train=False, download=True, transform=test_transform)
            NO_CLASSES = 100
            adden = ADDENDUM
            no_train = NUM_TRAIN


kozistr commented on May 23, 2024

I'll close this issue (along with your previous one). If you have any questions, feel free to reopen it or create another issue :)

best regards
