Comments (14)
hello!
Maybe there's only one loss calculation, but SAM needs two forward & backward passes in your code. In the SAM Optimizer part, the loss is actually only computed here (`loss = m_backbone_loss`).

To fix the code:
```python
with torch.cuda.device(CUDA_VISIBLE_DEVICES):
    inputs = data[0].cuda()
    labels = data[1].cuda()
    iters += 1

optimizers.zero_grad()

# i don't know which criterion you used,
# but, in case `criterion` is the pytorch built-in cross-entropy function,
# it could be better to use `reduction='mean'`.
# the two variants below, 1) and 2), are equivalent:
# 1) loss = criterion(scores, labels, reduction='mean')  # output is a scalar
# 2) target_loss = criterion(scores, labels)
#    loss = torch.sum(target_loss) / target_loss.size(0)

# ----------------- SAM Optimizer -------------------
# first forward-backward pass
# need to assign the result of `criterion` to `loss`
loss = criterion(models(inputs)[0], labels, reduction='mean')
loss.backward(retain_graph=True)
optimizers.first_step(zero_grad=True)

# second forward-backward pass
# need to assign the result of `criterion` to `loss`
loss = criterion(models(inputs)[0], labels, reduction='mean')
loss.backward(retain_graph=True)
optimizers.second_step(zero_grad=True)
```
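(For reference, a minimal self-contained sketch of the same two-step pattern; the `pytorch_optimizer.SAM` import and the toy model here are illustrative assumptions, not the repo's code:)

```python
import torch
import torch.nn as nn
from pytorch_optimizer import SAM

model = nn.Linear(10, 2)  # toy model standing in for `models`
criterion = nn.CrossEntropyLoss(reduction='mean')
optimizer = SAM(model.parameters(), torch.optim.SGD, lr=0.1, momentum=0.9)

inputs = torch.randn(4, 10)
labels = torch.randint(0, 2, (4,))

# first forward-backward pass: gradients at the current weights,
# then perturb the weights toward the locally sharpest point
criterion(model(inputs), labels).backward()
optimizer.first_step(zero_grad=True)

# second forward-backward pass: gradients at the perturbed weights,
# then step the base optimizer from the original weights
criterion(model(inputs), labels).backward()
optimizer.second_step(zero_grad=True)
```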
> IndexError: dimension specified as 0 but tensor has no dimensions

Could you upload the whole error message?

Also, you should comment out the code like below! The SAM Optimizer part already does forward & backward twice.

+) What is the size of `models(inputs)[0]`? It should be `(bs, num_classes)` logits!
```python
# scores, _, features = models(inputs)
# scores = models(inputs)
# target_loss = criterion(scores, labels)
# loss = torch.sum(target_loss) / target_loss.size(0)
```
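(Here's a self-contained sketch of why that shape check matters; the toy model is hypothetical:)

```python
import torch
import torch.nn as nn

class ToyModel(nn.Module):
    # stand-in for `models`: returns (logits, features) like the repo's model
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 5)

    def forward(self, x):
        return self.fc(x), x

models = ToyModel()
inputs = torch.randn(4, 8)

out = models(inputs)
print(type(out))     # tuple -> models(inputs)[0] is the (bs, num_classes) logits
print(out[0].shape)  # torch.Size([4, 5])

# but if the model returns a plain tensor, [0] picks the FIRST SAMPLE instead:
plain = models.fc(inputs)  # shape (4, 5)
print(plain[0].shape)      # torch.Size([5]): logits of one sample, not the batch!
```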
Oh, sorry for the code. I meant: set the `reduction` parameter to `'mean'` in the definition, like in your code: `criterion = nn.CrossEntropyLoss(reduction='mean')`.

Removing the `reduction` parameter from the call should then be fine! (i.e. change `loss = criterion(models(inputs)[0], labels, reduction='mean')` to `loss = criterion(models(inputs)[0], labels)`)
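(The distinction, in a minimal sketch with made-up shapes: `nn.CrossEntropyLoss` fixes `reduction` at construction time, while only the functional `F.cross_entropy` accepts it per call:)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

scores = torch.randn(4, 10)          # (bs, num_classes) logits
labels = torch.randint(0, 10, (4,))  # (bs,) class indices

# module form: reduction is fixed when the criterion is constructed
criterion = nn.CrossEntropyLoss(reduction='mean')
loss = criterion(scores, labels)                # OK, returns a scalar
# criterion(scores, labels, reduction='mean')   # TypeError: forward() got an
#                                               # unexpected keyword argument

# functional form: reduction is a per-call keyword argument
loss = F.cross_entropy(scores, labels, reduction='mean')
```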
Yeah! You are right. I understood you earlier and tried without `(reduction='mean')` in `train_epoch()`, but another error was raised that says:
File "/home/kanza/workspace/WithoutDictionary/RandomFixedSAM/main.py", line 120, in
train(models, criterion, optimizers, schedulers, dataloaders, args.no_of_epochs, EPOCHL)
File "/home/kanza/workspace/WithoutDictionary/RandomFixedSAM/train_test.py", line 64, in train
loss = train_epoch(models, criterion, optimizers, dataloaders)
File "/home/kanza/workspace/WithoutDictionary/RandomFixedSAM/train_test.py", line 48, in train_epoch
loss = criterion(models(inputs)[0], labels)
File "/home/kanza/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/kanza/anaconda3/lib/python3.9/site-packages/torch/nn/modules/loss.py", line 1164, in forward
return F.cross_entropy(input, target, weight=self.weight,
File "/home/kanza/anaconda3/lib/python3.9/site-packages/torch/nn/functional.py", line 3014, in cross_entropy
return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: size mismatch (got input: [100], target: [10])
I guess the `RuntimeError: size mismatch (got input: [100], target: [10])` error means that the output of the model has 100 classes (`num_classes` of the model is 100), but the labels have 10 classes.

So, correcting `num_classes` should solve the issue.
> Yeah, I changed the number of classes to 100 in ResNet but .... I don't know

Then the label data you have has 10 classes in total (`num_classes` is 10), so change the `num_classes` of the model to 10!

`num_classes` must be equal to the number of dataset (label) classes : ) For example, the MNIST dataset has 10 classes (0 ~ 9), so the output of the model (`num_classes`) should be 10.
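(As a shape sketch with hypothetical sizes: `F.cross_entropy` expects `(bs, num_classes)` logits against `(bs,)` integer targets whose values are below `num_classes`:)

```python
import torch
import torch.nn.functional as F

bs, num_classes = 16, 10
logits = torch.randn(bs, num_classes)           # model output: (bs, num_classes)
targets = torch.randint(0, num_classes, (bs,))  # labels: (bs,), values in [0, num_classes)
loss = F.cross_entropy(logits, targets)         # scalar, since reduction defaults to 'mean'
```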
Oh, you are using the CIFAR100 dataset. Then, as you said, `num_classes` of 100 is correct. Also, your transformation code seems fine, I think.

How about checking the number of classes of `data_train`? Perhaps it doesn't actually load the CIFAR100 dataset, but CIFAR10 instead (my rough guess). I guess there's an issue with the dataset part!
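(A minimal sanity check, assuming torchvision's `CIFAR100` and its `targets` attribute:)

```python
import numpy as np
from torchvision.datasets import CIFAR100
import torchvision.transforms as T

data_train = CIFAR100('../cifar100', train=True, download=True, transform=T.ToTensor())
print(len(np.unique(data_train.targets)))  # should print 100 for CIFAR-100
```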
I think there was some bug; it was not working for any of the datasets, so I made the changes again in a new folder, and everything is working fine now. I cannot thank you enough for helping me today.

I have a question! Let's go back to the original `train_epoch()` of the original repository, where the loss is calculated like this:

```python
target_loss = criterion(scores, labels)
loss = torch.sum(target_loss) / target_loss.size(0)
```

Now we are doing it like below instead. I hope we are not doing anything mathematically or logically wrong.
```python
for data in tqdm(dataloaders['train'], leave=False, total=len(dataloaders['train'])):
    with torch.cuda.device(CUDA_VISIBLE_DEVICES):
        inputs = data[0].cuda()
        labels = data[1].cuda()
        iters += 1

    optimizers.zero_grad()
    scores, _, features = models(inputs)
    # target_loss = criterion(scores, labels)
    # m_backbone_loss = torch.sum(target_loss) / target_loss.size(0)
    # loss = m_backbone_loss

    # ----------------- SAM Optimizer -------------------
    # first forward-backward pass
    loss = criterion(models(inputs)[0], labels)
    loss.backward(retain_graph=True)
    optimizers.first_step(zero_grad=True)

    # second forward-backward pass
    loss = criterion(models(inputs)[0], labels)
    loss.backward(retain_graph=True)
    optimizers.second_step(zero_grad=True)
```
I'm glad you solved the issue :)

In terms of the loss value, there's no difference between `criterion(scores, labels)` and `target_loss = criterion(scores, labels); loss = torch.sum(target_loss) / target_loss.size(0)`, because `criterion(scores, labels)` returns a single scalar value (e.g. 0.1234), so `torch.sum(target_loss) / target_loss.size(0)` won't affect the loss!

In terms of the SAM part, the final loss must be the one backpropagated via `loss.backward(retain_graph=True)` (assuming `torch.sum(target_loss) / target_loss.size(0)` did affect the loss). But in your original code, the final loss (`target_loss = criterion(scores, labels); loss = torch.sum(target_loss) / target_loss.size(0)`) wasn't the one passed to `.backward()`; only the `criterion(scores, labels)` part was.

I hope this helps!
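(A quick sketch of that equivalence, with made-up shapes; it assumes the original repository's criterion used `reduction='none'`, so that `target_loss` is per-sample:)

```python
import torch
import torch.nn.functional as F

scores = torch.randn(8, 100)          # (bs, num_classes) logits
labels = torch.randint(0, 100, (8,))  # (bs,) class indices

# the original repository's style: per-sample losses, then a manual mean
target_loss = F.cross_entropy(scores, labels, reduction='none')
manual_mean = torch.sum(target_loss) / target_loss.size(0)

# the built-in mean reduction
builtin_mean = F.cross_entropy(scores, labels, reduction='mean')

assert torch.allclose(manual_mean, builtin_mean)
```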
Yeah, I am using "Cross Entropy". Okay, I edited as per your recommendation, but now it says:

> IndexError: dimension specified as 0 but tensor has no dimensions
```python
def train_epoch(models, criterion, optimizers, dataloaders):
    models.train()
    global iters

    for data in tqdm(dataloaders['train'], leave=False, total=len(dataloaders['train'])):
        with torch.cuda.device(CUDA_VISIBLE_DEVICES):
            inputs = data[0].cuda()
            labels = data[1].cuda()
            iters += 1

        optimizers.zero_grad()
        # scores, _, features = models(inputs)
        scores = models(inputs)
        target_loss = criterion(scores, labels)
        loss = torch.sum(target_loss) / target_loss.size(0)

        # ----------------- SAM Optimizer -------------------
        # first forward-backward pass
        loss = criterion(models(inputs)[0], labels, reduction='mean')
        loss.backward(retain_graph=True)
        optimizers.first_step(zero_grad=True)

        # second forward-backward pass
        loss = criterion(models(inputs)[0], labels, reduction='mean')
        loss.backward(retain_graph=True)
        optimizers.second_step(zero_grad=True)
```
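(As a side note, this IndexError can be reproduced in isolation: a `'mean'`-reduced criterion returns a 0-dim scalar, so `.size(0)` has no dimension to index. A minimal sketch:)

```python
import torch

t = torch.tensor(0.1234)  # 0-dim scalar, like the output of a 'mean'-reduced loss
t.size(0)                 # IndexError: Dimension specified as 0 but tensor has no dimensions
```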
So in `main()` I made everything like this:

```python
# Loss, criterion and scheduler (re)initialization
criterion = nn.CrossEntropyLoss(reduction='mean')
base_optimizer = torch.optim.SGD
optim_backbone = SAM(models.parameters(), base_optimizer, lr=LR, momentum=MOMENTUM, weight_decay=WDECAY)
sched_backbone = lr_scheduler.MultiStepLR(optim_backbone, milestones=MILESTONES)

optimizers = optim_backbone
schedulers = sched_backbone
```
and `train_epoch()` is:

```python
def train_epoch(models, criterion, optimizers, dataloaders):
    models.train()
    global iters

    for data in tqdm(dataloaders['train'], leave=False, total=len(dataloaders['train'])):
        with torch.cuda.device(CUDA_VISIBLE_DEVICES):
            inputs = data[0].cuda()
            labels = data[1].cuda()
            iters += 1

        optimizers.zero_grad()
        # scores, _, features = models(inputs)
        # scores = models(inputs)
        # target_loss = criterion(scores, labels)
        # loss = torch.sum(target_loss) / target_loss.size(0)

        # ----------------- SAM Optimizer -------------------
        # first forward-backward pass
        loss = criterion(models(inputs)[0], labels, reduction='mean')
        loss.backward(retain_graph=True)
        optimizers.first_step(zero_grad=True)

        # second forward-backward pass
        loss = criterion(models(inputs)[0], labels, reduction='mean')
        loss.backward(retain_graph=True)
        optimizers.second_step(zero_grad=True)
```
and now the error is:

```
Train a Model.
/home/kanza/anaconda3/lib/python3.9/site-packages/torch/optim/lr_scheduler.py:131: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
Finished.
Trial 1/1 || Cycle 1/10 || Label set size 5: Test acc 1.04
Train a Model.
Traceback (most recent call last):
  File "/home/kanza/workspace/WithoutDictionary/RandomFixedSAM/main.py", line 120, in
    train(models, criterion, optimizers, schedulers, dataloaders, args.no_of_epochs, EPOCHL)
  File "/home/kanza/workspace/WithoutDictionary/RandomFixedSAM/train_test.py", line 64, in train
    loss = train_epoch(models, criterion, optimizers, dataloaders)
  File "/home/kanza/workspace/WithoutDictionary/RandomFixedSAM/train_test.py", line 48, in train_epoch
    loss = criterion(models(inputs)[0], labels, reduction='mean')
  File "/home/kanza/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
TypeError: forward() got an unexpected keyword argument 'reduction'
```
I changed the number of classes in the backbone. I am using CIFAR100 and changed the number of classes:
```python
class ResNet(nn.Module):
    def __init__(self, block, num_blocks, num_classes=100):
        super(ResNet, self).__init__()
        self.in_planes = 64

        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
        self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
        self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
        self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
        self.linear = nn.Linear(512 * block.expansion, num_classes)
        # self.linear2 = nn.Linear(1000, num_classes)

    def _make_layer(self, block, planes, num_blocks, stride):
        strides = [stride] + [1] * (num_blocks - 1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_planes, planes, stride))
            self.in_planes = planes * block.expansion
        return nn.Sequential(*layers)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out1 = self.layer1(out)
        out2 = self.layer2(out1)
        out3 = self.layer3(out2)
        out4 = self.layer4(out3)
        out = F.avg_pool2d(out4, 4)
        outf = out.view(out.size(0), -1)
        # outl = self.linear(outf)
        out = self.linear(outf)
        # return out, outf, [out1, out2, out3, out4]
        return out


def ResNet18(num_classes=100):
    return ResNet(BasicBlock, [2, 2, 2, 2], num_classes)
```
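(A quick shape check, assuming the usual `BasicBlock` from the CIFAR-style ResNet this is based on is in scope:)

```python
import torch

model = ResNet18(num_classes=100)
out = model(torch.randn(2, 3, 32, 32))  # a dummy CIFAR-sized batch
print(out.shape)                        # should be torch.Size([2, 100])
```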
`load_dataset()` also uses the same number of classes:
```python
def load_dataset(dataset):
    train_transform = T.Compose([
        T.RandomHorizontalFlip(),
        T.RandomCrop(size=32, padding=4),  # ImageNet 224
        T.ToTensor(),
        T.Normalize([0.4914, 0.4822, 0.4465], [0.2023, 0.1994, 0.2010]),
        # T.Normalize((0.5071, 0.4867, 0.4408), (0.2675, 0.2565, 0.2761))  # CIFAR-100
    ])
    test_transform = T.Compose([
        T.ToTensor(),
        T.Normalize([0.4914, 0.4822, 0.4465], [0.2023, 0.1994, 0.2010]),
        # T.Normalize((0.5071, 0.4867, 0.4408), (0.2675, 0.2565, 0.2761))  # CIFAR-100
    ])

    if dataset == 'cifar100':
        data_train = CIFAR100('../cifar100', train=True, download=True, transform=train_transform)
        data_unlabeled = MyDataset(dataset, True, test_transform)
        data_test = CIFAR100('../cifar100', train=False, download=True, transform=test_transform)
        NO_CLASSES = 100
        adden = ADDENDUM
        no_train = NUM_TRAIN
```
I'll close this issue (along with your previous issue). If you have any questions, feel free to reopen it or create another issue :)

Best regards