nas's Issues

I measured the speed on my device

I measured the speed on my device, and I retrained on the new speed.txt file.
But an error appeared:
Traceback (most recent call last):
  File "/workspace/fbnet-pytorch/train_cifar10.py", line 105, in <module>
    speed_f=config.speed_f)
  File "/workspace/fbnet-pytorch/model.py", line 84, in __init__
    self._speed = torch.tensor(self._speed, requires_grad=False)
ValueError: expected sequence of length 8 at dim 1 (got 9)

What should I do?
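A minimal reproduction of why this error appears (a sketch, not the repository's code): torch.tensor() requires every row of a nested list to have the same length, so a speed file in which one line has 9 latencies while another has 8 fails exactly this way. Presumably each line of the custom speed.txt needs the same number of entries, one latency per candidate block of that layer.

import torch

# One "layer" below has an extra latency value, which triggers
# "ValueError: expected sequence of length 8 at dim 1 (got 9)".
speed = [[1.0] * 8, [1.0] * 9]
try:
    torch.tensor(speed, requires_grad=False)
except ValueError as e:
    print(e)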

Params, BN, learned distribution and FC layer width in FBNet and your implementation

Thank you for sharing this great code!

I wonder if you have tested the released architectures of FBNet-A, B and C? I calculated the parameters of FBNet-B, but they don't match the original paper.

in_channels out_channels kernel expansion groups stride input params
3 16 3 1 1 2 224 432
16 16 3 1 1 1 112 656
16 24 3 6 1 2 112 4704
24 24 5 1 1 1 56 1752
24 24 3 1 1 1 56 1368
24 24 3 1 1 1 56 1368
24 32 5 6 1 2 56 11664
32 32 5 3 1 1 28 8544
32 32 3 6 1 1 28 14016
32 32 5 6 1 1 28 17088
32 64 5 6 1 2 28 23232
64 64 5 1 1 1 14 9792
64 64 0 0 0 0 14 0
64 64 5 3 1 1 14 29376
64 112 5 6 1 1 14 77184
112 112 3 1 1 1 14 26096
112 112 5 1 1 1 14 27888
112 112 5 3 1 1 14 83664
112 184 5 6 1 2 14 215712
184 184 5 1 1 1 7 72312
184 184 5 6 1 1 7 433872
184 184 5 6 1 1 7 433872
184 352 3 6 1 1 7 601680
352 1504 1 1 1 1 7 529408
1504 1000 1 1 1 1 1 1504000
Total params: 4129680

The calculated parameter count is 4.1M, which is lower than the 4.5M reported in the paper. The FLOPs are also inconsistent. I wonder if you have calculated the number of parameters or FLOPs?
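For reference, the per-block numbers in the table can be reproduced with a short sketch like the one below (not the repository's code), assuming each searched block is the usual FBNet structure of a 1x1 expansion conv, a kxk depthwise conv, and a 1x1 projection conv, with BN and bias parameters omitted. The stem, the final 1x1 conv, and the FC layer are plain kernel² · c_in · c_out counts.

def block_params(c_in, c_out, kernel, expansion, groups=1):
    """Parameter count of one FBNet-style block (no BN, no bias)."""
    if kernel == 0:                        # skip / identity row
        return 0
    hidden = c_in * expansion
    expand = c_in * hidden // groups       # 1x1 pointwise expansion
    depthwise = kernel * kernel * hidden   # kxk depthwise conv
    project = hidden * c_out // groups     # 1x1 pointwise projection
    return expand + depthwise + project

print(block_params(16, 24, 3, 6))  # row "16 24 3 6 1 2 112" -> 4704

Part of the gap to the paper's 4.5M may simply be BN and bias parameters, which this count (and the table above) omits.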

Besides:

  1. Did you test the BN layer in the implementation? From the paper and your implementation, I didn't find BN in the middle; however, it is used in ShuffleNet v2. I wonder if you have tested its effect? Will the performance be better if you remove the BN layer in the middle?

  2. Have you ever searched architectures with latency measured on a GPU? I first evaluated the latency on my Titan XP GPU, searched for FBNet architectures, and found that the search tends to select the module with the most parameters, with almost no variance. However, in the original paper the author says that 6 different architectures were sampled after training. I wonder whether, in your searches with GPU/CPU latency, the learned distribution could generate both large models like FBNet-C and lightweight models like FBNet-A.

  3. Why do you use 1984 as the FC layer width? Table 1 in the paper has both 1504 and 1984, which confuses me.

Some questions about measure_speed.py

The size of CIFAR-10 images is (32, 32), and the size of ImageNet images is (224, 224). Why is the input_shape (1, 3, 108, 108)?
def measure(blocks, input_shape = (1, 3, 108, 108), result_path='speed_custom.txt'):
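In case it helps others reading this issue, per-block latency measurement typically looks roughly like the sketch below (a hypothetical helper, not the repository's measure(); the warm-up and run counts are made up):

import time
import torch

def measure_block(block, input_shape=(1, 3, 108, 108), warmup=10, runs=100):
    """Average forward latency of one block in milliseconds."""
    x = torch.randn(*input_shape)
    block.eval()
    with torch.no_grad():
        for _ in range(warmup):   # warm-up iterations are not timed
            block(x)
        start = time.time()
        for _ in range(runs):
            block(x)
    return (time.time() - start) / runs * 1000.0

# e.g. time a single conv layer on the same input shape:
print(measure_block(torch.nn.Conv2d(3, 16, 3, padding=1)))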

How to reproduce your result?

According to your pytorch-cifar10.alpha.0.01.init_lat.438.log, the final acc is 0.84164 and the latency is 283.62738 ms. Unfortunately, I couldn't reproduce it. Strangely, my acc was just 0.67671 and my latency was 553.25839 ms. Additionally, the lat_loss was increasing when I viewed it on TensorBoard.
[TensorBoard screenshot]
Look forward to your reply.

My config parameters are shown below:
[config screenshot]

I tried to train a sampled net

Hello,
I trained a supernet on CIFAR to get suitable thetas, then I tried to retrain a sampled net, only to find the output of each layer becomes NaN after just 3 batches. Have you met this situation before? What should I do?
Thanks.

training FBNet on CIFAR-10

Hi JunrQ,

Thanks for your work, it helps a lot. When I trained FBNet on CIFAR-10, the accuracy began to drop quickly at epoch 66. By epoch 71, the accuracy was about 0.1 and both the loss and ce were NaN. Is something wrong, or is this normal?

By the way, the lowest loss was about 8.5 at epoch 27, and then it increased.

Thanks!

Forward and backward for FBNet

Hi, JunrQ:

Thanks for your work, it is really quite helpful!
I have a question: I found that in your FBNet source code you generate batch_size models for the batch_size samples in each batch; however, the total loss is summed and loss.backward() is called once. So how is this backward() applied, to a single model or to batch_size models? Besides, I wonder why you use this method for FBNet, while in the SNAS code a single model is generated, loss.backward() is called, and then two .step() functions are applied.
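For what it is worth, here is a minimal sketch (not the repository's code; the names are made up) of the pattern being asked about: every sample in the batch gets its own Gumbel-softmax draw, but all draws are functions of the same shared theta, so summing the per-sample losses and calling backward() once accumulates gradients from every sampled architecture into that single theta.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMixedOp(nn.Module):
    """Two candidate convs mixed by per-sample Gumbel-softmax weights."""
    def __init__(self, channels, num_ops=2, temperature=5.0):
        super().__init__()
        self.ops = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=1) for _ in range(num_ops))
        self.theta = nn.Parameter(torch.zeros(num_ops))  # shared architecture params
        self.temperature = temperature

    def forward(self, x):
        b = x.size(0)
        # One Gumbel-softmax sample per input: batch_size "models", one theta.
        w = F.gumbel_softmax(self.theta.repeat(b, 1), tau=self.temperature)  # (B, num_ops)
        outs = torch.stack([op(x) for op in self.ops], dim=1)  # (B, num_ops, C, H, W)
        return (w.view(b, -1, 1, 1, 1) * outs).sum(dim=1)

op = ToyMixedOp(channels=8)
loss = op(torch.randn(4, 8, 16, 16)).mean()  # per-sample losses reduced to one scalar
loss.backward()                              # gradients from all 4 sampled architectures
print(op.theta.grad)                         # accumulated into the single shared theta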

What does MAC mean in MixedOp?

In my understanding, MAC means "Multiply-Add Cost", but the following code seems to calculate the memory cost of DilConv.

MAC1 = (self.width * self.height) * (op.op[1].in_channels + op.op[1].in_channels)
MAC1 += (op.op[1].kernel_size[0] ** 2 * op.op[1].in_channels * op.op[1].out_channels) / op.op[1].groups
MAC2 = (self.width * self.height) * (op.op[2].in_channels + op.op[2].in_channels)
MAC2 += (op.op[2].kernel_size[0] ** 2 * op.op[2].in_channels * op.op[2].out_channels) / op.op[2].groups

MAC = MAC1 + MAC2
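For comparison, the memory access cost (MAC) accounting in ShuffleNet v2 counts feature-map reads/writes plus weight reads rather than multiply-adds; the snippet above follows the same shape, except that its feature-map term adds in_channels twice where the ShuffleNet v2 formula uses c_in + c_out. A hypothetical helper (not from this repository) for a single conv layer:

import torch.nn as nn

def conv_mac(conv: nn.Conv2d, h: int, w: int) -> float:
    """Memory access cost: input/output feature maps plus weights (bias ignored)."""
    feature_maps = h * w * (conv.in_channels + conv.out_channels)
    weights = (conv.kernel_size[0] * conv.kernel_size[1]
               * conv.in_channels * conv.out_channels / conv.groups)
    return feature_maps + weights

# e.g. the depthwise + pointwise convs of a separable op on a 28x28 feature map:
dw = nn.Conv2d(32, 32, 3, padding=1, groups=32)
pw = nn.Conv2d(32, 64, 1)
print(conv_mac(dw, 28, 28) + conv_mac(pw, 28, 28))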

ImageNet training code bug

Hi, thanks for your code. But when I train on the ImageNet dataset, I find some bugs. For example, the code cannot find self.samples in data.py. Did you check your ImageNet code before uploading it?

Got NaN when calculating gumbel_softmax

Sometimes nn.functional.gumbel_softmax returns NaN when computed on the GPU. This does not happen when computing on the CPU.

test code:

import torch
import torch.nn as nn
import math

if __name__ == "__main__":
    batch_size = 128
    temperature = 5.0
    theta = torch.FloatTensor([1.753356814384460449,1.898535370826721191,0.6992630958557128906,
                                0.2227068245410919189,0.6384450793266296387,1.431323885917663574,
                                -0.05012089386582374573, -0.06672633439302444458])
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    t_gpu = theta.repeat(batch_size, 1).to(device)
    max_num = 1000000
    nan_num = 0
    for i in range(max_num):
        weight = nn.functional.gumbel_softmax(t_gpu, temperature)
        if math.isnan(torch.sum(weight)):
            nan_num+=1
    print("GPU: nan {:.3f}% probability happen, tot {}".format(100.0 * nan_num / max_num, nan_num))
    nan_num = 0
    t_cpu = theta.repeat(batch_size, 1)
    for i in range(max_num):
        weight = nn.functional.gumbel_softmax(t_cpu, temperature)
        if math.isnan(torch.sum(weight)):
            nan_num+=1
    print("CPU: nan {:.3f}% probability happen, tot {}".format(100.0 * nan_num / max_num, nan_num))

Results:

GPU: nan 0.004% probability happen, tot 38
CPU: nan 0.000% probability happen, tot 0

I'm not sure whether this is a bug in PyTorch, a bug in gumbel_softmax, or whether there are restrictions on the values of theta.
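One possible workaround (just a sketch, not a verified fix) is to re-draw the sample whenever the result contains NaN, since the failure appears to be rare:

import torch
import torch.nn as nn

def safe_gumbel_softmax(logits, tau, max_tries=10):
    """Retry nn.functional.gumbel_softmax until the sample is NaN-free."""
    for _ in range(max_tries):
        weight = nn.functional.gumbel_softmax(logits, tau)
        if not torch.isnan(weight).any():
            return weight
    raise RuntimeError("gumbel_softmax returned NaN {} times in a row".format(max_tries))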
