Neural architecture search (NAS)
I measured the speed on my own device and retrained with the new speed.txt file, but an error appeared:
Traceback (most recent call last):
File "/workspace/fbnet-pytorch/train_cifar10.py", line 105, in
speed_f=config.speed_f)
File "/workspace/fbnet-pytorch/model.py", line 84, in __init__
self._speed = torch.tensor(self._speed, requires_grad=False)
ValueError: expected sequence of length 8 at dim 1 (got 9)
What should I do?
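The error says the tensor constructor found a row of length 9 where it expected 8, so one row of the new speed.txt most likely has an extra field. A minimal sketch to locate the bad row, assuming speed.txt is a whitespace-separated table with one row per layer (the helper name is mine):

```python
def check_speed_rows(lines, expected_len=8):
    """Return (line_number, actual_length) for every row whose
    field count differs from expected_len. Blank lines are skipped."""
    bad = []
    for lineno, line in enumerate(lines, 1):
        fields = line.split()
        if fields and len(fields) != expected_len:
            bad.append((lineno, len(fields)))
    return bad
```

Running it as `check_speed_rows(open("speed.txt"))` should point directly at the offending line; either remove the extra entry or adjust the number of candidate blocks to match.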
Thank you for sharing this great code!
I wonder if you have tested the released architectures of FBNet-A, B, and C? I calculated the parameters of FBNet-B, but they don't match the original paper.
in_channels | out_channels | kernel | expansion | groups | stride | input | params |
---|---|---|---|---|---|---|---|
3 | 16 | 3 | 1 | 1 | 2 | 224 | 432 |
16 | 16 | 3 | 1 | 1 | 1 | 112 | 656 |
16 | 24 | 3 | 6 | 1 | 2 | 112 | 4704 |
24 | 24 | 5 | 1 | 1 | 1 | 56 | 1752 |
24 | 24 | 3 | 1 | 1 | 1 | 56 | 1368 |
24 | 24 | 3 | 1 | 1 | 1 | 56 | 1368 |
24 | 32 | 5 | 6 | 1 | 2 | 56 | 11664 |
32 | 32 | 5 | 3 | 1 | 1 | 28 | 8544 |
32 | 32 | 3 | 6 | 1 | 1 | 28 | 14016 |
32 | 32 | 5 | 6 | 1 | 1 | 28 | 17088 |
32 | 64 | 5 | 6 | 1 | 2 | 28 | 23232 |
64 | 64 | 5 | 1 | 1 | 1 | 14 | 9792 |
64 | 64 | 0 | 0 | 0 | 0 | 14 | 0 |
64 | 64 | 5 | 3 | 1 | 1 | 14 | 29376 |
64 | 112 | 5 | 6 | 1 | 1 | 14 | 77184 |
112 | 112 | 3 | 1 | 1 | 1 | 14 | 26096 |
112 | 112 | 5 | 1 | 1 | 1 | 14 | 27888 |
112 | 112 | 5 | 3 | 1 | 1 | 14 | 83664 |
112 | 184 | 5 | 6 | 1 | 2 | 14 | 215712 |
184 | 184 | 5 | 1 | 1 | 1 | 7 | 72312 |
184 | 184 | 5 | 6 | 1 | 1 | 7 | 433872 |
184 | 184 | 5 | 6 | 1 | 1 | 7 | 433872 |
184 | 352 | 3 | 6 | 1 | 1 | 7 | 601680 |
352 | 1504 | 1 | 1 | 1 | 1 | 7 | 529408 |
1504 | 1000 | 1 | 1 | 1 | 1 | 1 | 1504000 |
Total | | | | | | | 4129680 |
The calculated parameter count is 4.1M, which is lower than the 4.5M reported in the paper. The FLOPs are not consistent either. Have you calculated the number of parameters or FLOPs?
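For reference, the per-row numbers in the table above appear consistent with counting only convolution weights (no biases, no BatchNorm parameters) of an inverted-bottleneck block: a 1x1 expansion, a kxk depthwise convolution, and a 1x1 projection. A sketch of that count (the function name is mine, not from the repo):

```python
def fbnet_block_params(cin, cout, k, expansion, groups=1):
    """Weight count of an inverted-bottleneck block:
    1x1 expand -> kxk depthwise -> 1x1 project (no bias, no BN)."""
    hidden = cin * expansion
    expand = (cin * hidden) // groups    # 1x1 pointwise expansion
    depthwise = hidden * k * k           # kxk conv grouped per channel
    project = (hidden * cout) // groups  # 1x1 pointwise projection
    return expand + depthwise + project
```

For example, the 16 -> 16, k=3, e=1 row gives 256 + 144 + 256 = 656, matching the table. BatchNorm parameters and biases are omitted by this count and would account for part of the gap to 4.5M, though probably not all of it; the remainder may come from a differing block configuration.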
Besides:
Did you test a BN layer in the implementation? From the paper and your implementation, I didn't find BN in the middle of the block; however, it is used in ShuffleNet v2. Have you tested its effect? Will performance be better if you remove the BN layer in the middle?
Have you ever searched architectures with latency measured on GPU? I first evaluated latency on my Titan XP GPU and searched for FBNet architectures, and found the search tends to select the highest-parameter module in every layer, with almost no variance. However, in the original paper the authors say they sampled 6 different architectures after training. I wonder whether your architectures searched with GPU/CPU latency show that the learned distribution can generate both large models like FBNet-C and lightweight models like FBNet-A.
Why do you use 1984 as the FC layer width? Tab. 1 in the paper has both 1504 and 1984, which confuses me.
There are three branches in your project; which one is the latest?
The size of CIFAR-10 images is (32, 32), and ImageNet images are (224, 224). Why is input_shape (1, 3, 108, 108)?
def measure(blocks, input_shape = (1, 3, 108, 108), result_path='speed_custom.txt'):
@JunrQ
Put lat in the loss list, i.e. loss = [..., stop_grad(lat)],
and print it with print(outputs[-1].asnumpy()).
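In PyTorch terms, the same idea can be sketched as follows: keep the latency term in the training loss, and carry along a detached copy purely for logging (alpha and the tensor values here are illustrative assumptions, not the repo's settings):

```python
import torch

def combined_loss(ce, lat, alpha=0.2):
    """Return the training loss plus a detached latency for logging.

    detach() plays the role of stop_grad: the returned latency can be
    printed each step but contributes no gradient of its own.
    """
    return ce + alpha * lat, lat.detach()

ce = torch.tensor(2.3, requires_grad=True)
lat = torch.tensor(280.0, requires_grad=True)
loss, lat_log = combined_loss(ce, lat)
loss.backward()
print(lat_log.item())  # logged latency; no gradient flows through it
```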
Thanks for your code! I think the 'and' in line 224 should be changed to 'or'.
Line 224 in d430c32
What is the model size (FLOPs and number of parameters) for the CIFAR-10 trained model? How should we constrain the FLOPs of the final searched model?
Test issues
Parameters should be initialized with the same array on different GPUs. @JunrQ
It seems that 1e-9 is not suitable for your new code. In my experiment the output cost is about 6e-04, and the accompanying loss is about 2.414636e+00 during the early steps.
NAS/snas/snas/train_cifar10.py
Line 31 in f5b0f25
According to your pytorch-cifar10.alpha.0.01.init_lat.438.log, the final accuracy is 0.84164 and the latency is 283.62738 ms. Unfortunately, I couldn't reproduce it: strangely, my accuracy was only 0.67671 and my latency was 553.25839 ms. Additionally, the lat_loss kept increasing when I watched it on TensorBoard.
Look forward to your reply.
Hello,
I trained a supernet on CIFAR to get a suitable theta, then tried to retrain a sampled net, only to find the output of every layer becomes NaN after just 3 batches. Have you met this situation before? What should I do?
Thanks.
You didn't use kernel size in class FBnetblock.
What's wrong with BN?
Hi JunrQ,
Thanks for your work, it helps a lot. When I trained FBNet on CIFAR-10, the accuracy began to drop quickly at epoch 66, and by epoch 71 the accuracy was about 0.1 with both loss and ce being NaN. Is something wrong, or is this normal?
By the way, the lowest loss was about 8.5 at epoch 27, and it got higher afterwards.
Thanks!
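Divergence to NaN mid-training like this often traces back to a learning-rate or numerical issue; one way to catch it at the first bad step rather than epochs later is to guard the optimizer step with a finiteness check. A minimal sketch, with illustrative names and a clipping threshold that is an assumption, not a repo setting:

```python
import torch

def guarded_step(loss, optimizer, max_grad_norm=5.0):
    """Backprop with a finiteness check and gradient clipping.

    Raises as soon as the loss turns NaN/Inf, so the exact step
    where training diverges is visible in the traceback.
    """
    if not torch.isfinite(loss):
        raise RuntimeError(f"non-finite loss: {loss.item()}")
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(
        (p for group in optimizer.param_groups for p in group["params"]),
        max_grad_norm,
    )
    optimizer.step()
```

If the guard fires early, lowering the learning rate or the Gumbel temperature schedule is the usual first thing to try.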
In the original paper of SNAS, after the best architecture is found, the model is retrained on the training set. Could you please upload the code for retraining? Thank you so much!
Line 42 in 5c23276
Hi, JunrQ:
Thanks for your work, it is really quite helpful!
I have a question: I found that in your FBNet source code you generate batch_size models for the batch_size samples in each batch, yet the total loss is summed and a single loss.backward() is called. How does this backward() apply: to a single model, or to all batch_size models? Besides, I wonder why you use this method for FBNet, while in the SNAS code a single model is generated, loss.backward() is called, and then two .step() functions are applied.
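For what it's worth, when the batch_size sampled models all share one set of weights and architecture parameters, summing the per-sample losses and calling backward() once simply accumulates every sample's gradient into those shared tensors. A toy sketch of that mechanism (not the repo's code; softmax over noised logits stands in for Gumbel-softmax here):

```python
import torch

# One shared architecture parameter used by several "sampled" paths.
theta = torch.zeros(3, requires_grad=True)

total_loss = 0.0
for _ in range(4):  # 4 samples, each drawing its own soft architecture
    weights = torch.nn.functional.softmax(theta + torch.randn(3), dim=-1)
    total_loss = total_loss + (weights * torch.arange(3.0)).sum()

total_loss.backward()  # one backward accumulates all 4 samples' grads
assert theta.grad is not None and theta.grad.shape == (3,)
```

So the single backward() is effectively applied to all sampled models at once, with each sample contributing its share to the shared parameters' gradients.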
Have you reproduced the results in the original paper?
In my understanding, MAC means "multiply-add cost", but the following code seems to calculate the memory access cost of DilConv.
```python
MAC1 = (self.width * self.height) * (op.op[1].in_channels + op.op[1].in_channels)
MAC1 += (op.op[1].kernel_size[0] ** 2 * op.op[1].in_channels * op.op[1].out_channels) / op.op[1].groups
MAC2 = (self.width * self.height) * (op.op[2].in_channels + op.op[2].in_channels)
MAC2 += (op.op[2].kernel_size[0] ** 2 * op.op[2].in_channels * op.op[2].out_channels) / op.op[2].groups
MAC = MAC1 + MAC2
```
Line 55 in 5c23276
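As a point of comparison, the memory access cost of a grouped convolution in the ShuffleNet v2 sense counts feature-map reads/writes plus weight reads, which is the shape of the quoted expression. A standalone version (variable names are mine):

```python
def conv_mac(h, w, cin, cout, k, groups=1):
    """Memory access cost of a k x k grouped conv on an h x w map:
    input + output feature maps, plus the weight tensor."""
    feature_maps = h * w * (cin + cout)   # activations read and written
    weights = k * k * cin * cout // groups  # weight tensor size
    return feature_maps + weights
```

Note that the quoted code adds in_channels twice in the feature-map term, where this formulation would use in_channels + out_channels; that may be intentional for DilConv's structure, but it seems worth checking.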
Hi, thanks for your code. But when I trained on the ImageNet dataset, I found some bugs; for example, the code cannot find self.samples in data.py. Did you check your ImageNet code before uploading it?
Sometimes nn.functional.gumbel_softmax returns NaN when computed on GPU; this does not happen when computed on CPU.
test code:
```python
import torch
import torch.nn as nn
import math

if __name__ == "__main__":
    batch_size = 128
    temperature = 5.0
    theta = torch.FloatTensor([1.753356814384460449, 1.898535370826721191, 0.6992630958557128906,
                               0.2227068245410919189, 0.6384450793266296387, 1.431323885917663574,
                               -0.05012089386582374573, -0.06672633439302444458])
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    t_gpu = theta.repeat(batch_size, 1).to(device)
    max_num = 1000000
    nan_num = 0
    for i in range(max_num):
        weight = nn.functional.gumbel_softmax(t_gpu, temperature)
        if math.isnan(torch.sum(weight)):
            nan_num += 1
    print("GPU: nan {:.3f}% probability happen, tot {}".format(100.0 * nan_num / max_num, nan_num))

    nan_num = 0
    t_cpu = theta.repeat(batch_size, 1)
    for i in range(max_num):
        weight = nn.functional.gumbel_softmax(t_cpu, temperature)
        if math.isnan(torch.sum(weight)):
            nan_num += 1
    print("CPU: nan {:.3f}% probability happen, tot {}".format(100.0 * nan_num / max_num, nan_num))
```
got results:
GPU: nan 0.004% probability happen, tot 38
CPU: nan 0.000% probability happen, tot 0
I'm not sure whether this is a bug in PyTorch, a bug in gumbel_softmax, or some restriction on the values of theta.
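One pragmatic workaround, if this matters in practice, is to resample whenever a NaN slips through. The helper below is my own sketch and does not fix the underlying cause; it only keeps a rare bad draw from poisoning the training step:

```python
import torch
import torch.nn.functional as F

def safe_gumbel_softmax(logits, tau, max_tries=5):
    """Retry gumbel_softmax on NaN; fall back to a plain softmax."""
    for _ in range(max_tries):
        weight = F.gumbel_softmax(logits, tau)
        if torch.isfinite(weight).all():
            return weight
    # Extremely unlikely fallback: deterministic softmax at the same tau.
    return F.softmax(logits / tau, dim=-1)
```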
Line 50 in 5c23276