bayeswatch / nas-without-training
Code for Neural Architecture Search without Training (ICML 2021)
Hi, I notice that you backpropagate from the forward output rather than from a loss, yet the target argument and the criterion are still in your code. I wonder why? Did you try backpropagating a loss computed from the target and find that it gave poor results?
Related code:
nas-without-training/search.py
Line 58 in a98f872
nas-without-training/search.py
Line 50 in a98f872
Hi! I have a question about the score calculation. In your code, you sum over all modules that contain a ReLU to calculate network.K, i.e. K_{H}. You then calculate logdet(network.K) without normalization (normalization would set the diagonal entries to 1). However, the total number of modules with a ReLU differs between networks, so without normalizing network.K the score seems to prefer networks that have more ReLU modules. I would like to ask whether normalizing network.K is necessary. In my opinion, normalization may be the right choice. I hope you can give me an answer. Thanks!
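To make the question concrete, here is a minimal numerical sketch of what normalization would change. It assumes K_H is built from binary activation codes as agreement counts (units active in both inputs plus units inactive in both), so every diagonal entry equals the number of ReLU units N_A. The names `codes` and `n_units` are illustrative and not taken from the repository.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n_units = 8, 64

# Hypothetical binary activation codes: one row per input, one column per ReLU unit.
codes = (rng.random((batch, n_units)) > 0.5).astype(float)

# Agreement-count kernel: diagonal entries are all equal to n_units.
K = codes @ codes.T + (1.0 - codes) @ (1.0 - codes).T

# Normalised kernel: diagonal entries become 1.
K_norm = K / n_units

_, logdet_raw = np.linalg.slogdet(K)
_, logdet_norm = np.linalg.slogdet(K_norm)

# Dividing a (batch x batch) matrix by n_units shifts logdet by batch * log(n_units).
offset = batch * np.log(n_units)
print(logdet_raw - logdet_norm, offset)
```

Under this assumption the two scores differ only by batch * log(N_A). That offset is the same for all networks with the same number of ReLU units, but it grows with N_A, so it can change the ranking between networks with different unit counts.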
In both 'search.py' and 'plot_histograms.py', the shape of Jacob is [batch_size, CHW] and the shape of the correlation matrix is [batch_size, batch_size]. However, the correct shape of the correlation matrix should be [CHW, CHW], where C=3, H=32, W=32. Therefore, the shape of Jacob should be [CHW, batch_size]. This leads to wrong correlation matrix histograms and an incorrect architecture score.
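For anyone following the shape argument, this small sketch shows how np.corrcoef treats the two layouts. It does not settle which one the method intends; it only shows what each call returns. The spatial size is reduced to 8x8 to keep the example light, and the random `jacob` is a stand-in for a real Jacobian.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, c, h, w = 4, 3, 8, 8      # reduced H and W, for illustration only
chw = c * h * w

# Stand-in Jacobian: one row per input image, one column per input dimension.
jacob = rng.standard_normal((batch_size, chw))

# Default rowvar=True: each ROW is a variable, so inputs are correlated with each other.
per_input = np.corrcoef(jacob)

# rowvar=False: each COLUMN is a variable, so input dimensions are correlated.
per_dimension = np.corrcoef(jacob, rowvar=False)

print(per_input.shape, per_dimension.shape)
```

With the [batch_size, CHW] layout, the default call yields a [batch_size, batch_size] matrix, i.e. correlations between the Jacobians of different inputs.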
Hi
Which IDE are you using?
Best regards,
PeterPham
I'm having a hard time sifting through the code to find the answer.
Should one compute the NASWOT score on a network in train or eval mode?
I get that we are trying to measure how much the network creates these linear separations relative to a dataset. To me this suggests that something like dropout should not be considered when computing the score, which speaks for eval mode. Batchnorm, on the other hand, seems like it should be included, since we are normalizing the data (which speaks for train mode). But if the batchnorm is not yet fitted, then eval is possibly better.
Input would be appreciated.
Firstly, it is very nice of you to share this project with us. However, I have some issues with the histograms of the correlation matrix in your paper.
I also tried to plot the histograms, and my implementation steps are as follows:
The histograms I plot are as follows:
Compared to the histograms reported in your paper, the above histograms have two main differences:
So is there anything wrong in my implementation? Looking forward to your reply.
I want to measure almost everything related to the "best" arch found, for example the model size in MB, how many bytes are used during inference, etc.
I only know how to do this with the TensorFlow API, so is there any way to import the final model, or any architecture, and convert it into a TF model?
Just out of interest, did you also measure the scores when actually training the networks after each epoch? How do they develop?
I was wondering if the scores are even more correlated to the final model's accuracy when they are measured not directly after initialization but after 1 or 2 epochs.
I do understand that the eigenvalues of an uncorrelated matrix (i.e. the identity matrix) would all be ones.
But I don't understand how the KL divergence ends up being expressed as np.log(eigval) + 1/eigval.
My understanding is that KL divergence = -sum[ f(x) log( g(x) / f(x) ) ]
Maybe I'm just not understanding how f(x) and g(x) are expressed in terms of eigval. I hope for some enlightenment. Thanks.
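One possible reading, offered as a sketch rather than the authors' derivation: take f and g to be zero-mean Gaussians, f = N(0, I) and g = N(0, Sigma), where Sigma is the correlation matrix. The closed-form KL divergence between them is 0.5 * (trace(Sigma^-1) - n + log det Sigma), and writing the trace and determinant in terms of the eigenvalues gives 0.5 * (sum(log(eigval) + 1/eigval) - n). The check below confirms numerically that both forms agree; `sigma` is a random stand-in correlation matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Stand-in correlation matrix (symmetric, positive definite, unit diagonal).
samples = rng.standard_normal((n, 3 * n))
sigma = np.corrcoef(samples)

eigval = np.linalg.eigvalsh(sigma)

# Closed-form KL( N(0, I) || N(0, sigma) ) for zero-mean Gaussians.
_, logdet = np.linalg.slogdet(sigma)
kl_closed = 0.5 * (np.trace(np.linalg.inv(sigma)) - n + logdet)

# Same quantity in terms of eigenvalues:
# trace(sigma^-1) = sum(1 / eigval) and log det(sigma) = sum(log(eigval)).
kl_eig = 0.5 * (np.sum(np.log(eigval) + 1.0 / eigval) - n)

print(kl_closed, kl_eig)
```

Under this reading, np.log(eigval) + 1/eigval is the KL divergence up to a constant factor and offset, and it is minimised when every eigenvalue equals 1.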
Hello!
First, thanks a lot for your paper and your code.
As for the issue, I found an inconsistency in your implementation of NAS-Bench-101.
The number of convolution filters reported in the NAS-Bench-101 paper is 128.
However, as far as I can judge from your code, 16 filters are implemented instead:
parser.add_argument('--stem_out_channels', default=16, type=int, help='output channels of stem convolution (nasbench101)')
This change made a big difference for my trainless metric.
While I suppose this modification originates from the memory limitation problem,
I believe it should be explicitly mentioned both in the code and in your paper.
Cheers,
Ekaterina
Hello,
I'm posting this question here because I don't know anywhere else to ask it but I can take it down if it's not the proper medium to ask it.
Looking at the paper, it seemed to me that the relatively large variation in score due to different initialisations could mean that averaging the scores from n different initialisations would give a much more consistent score.
However, after implementing a quick version of that idea, the results showed no significant improvement.
Can someone explain to me why I was wrong?
Thanks in advance
First, thanks for your nice work!
But I have a question: when you calculate the Jacob, you multiply the output by an all-ones matrix to get a scalar.
For example, when batch_size=1, the input shape is 32x32x1 (flattened to 1024) and the output shape is 10. By definition, we should compute the gradient of each output with respect to the input, so we get a Jacob matrix of shape 10x1024. However, when you multiply the output by an all-ones matrix to get a scalar, you can only get a Jacob matrix of shape 1x1024.
Hence, what does this scalar mean? And are these two matrices (10x1024 vs 1x1024) equivalent?
Best wishes and look forward to your reply.
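A small sketch that may help with the 10x1024 versus 1x1024 question. Backpropagating an all-ones vector through the output computes a vector-Jacobian product, which equals the gradient of the sum of the outputs, i.e. the sum of the rows of the full Jacobian. The toy network below (a single tanh layer with random weights `W`) is purely illustrative and is not the model from the repository.

```python
import numpy as np

rng = np.random.default_rng(0)
n_out, n_in = 10, 1024

# Toy "network": y = tanh(W @ x), with weights scaled to avoid saturation.
W = rng.standard_normal((n_out, n_in)) / np.sqrt(n_in)
x = rng.standard_normal(n_in)

def net(v):
    return np.tanh(W @ v)

y = net(x)

# Full Jacobian dy/dx, shape (10, 1024): one row per output.
full_jacobian = (1.0 - y ** 2)[:, None] * W

# Backward pass with an all-ones vector: ones^T @ J, shape (1024,).
vjp = np.ones(n_out) @ full_jacobian

# Central finite differences of sum(y) with respect to x, for comparison.
eps = 1e-5
grad_sum = np.empty(n_in)
for i in range(n_in):
    step = np.zeros(n_in)
    step[i] = eps
    grad_sum[i] = (net(x + step).sum() - net(x - step).sum()) / (2 * eps)

print(full_jacobian.shape, vjp.shape)
```

So the two matrices are not equivalent: the 1x1024 result collapses the ten rows of the full Jacobian into their sum.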
Hi, thanks for your interesting work. I notice that the determinant is used in this code rather than a norm. Is it better than the latter?
I would like to find commonalities between well performing architectures by parsing info of InferCell. I'm struggling to decipher the description of the cell structure, though.
example:
info :: nodes=4, inC=64, outC=64,
[1<-(I0-L0) | 2<-(I0-L1,I1-L2) | 3<-(I0-L3,I1-L4,I2-L5)],
|nor_conv_3x3~0|+|nor_conv_3x3~0|none~1|+|nor_conv_1x1~0|skip_connect~1|skip_connect~2|
Could you please break down how to interpret this? E.g. how to figure out which layers the skip connection links? Or point me to a documentation of TinyNetwork?
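In case it helps, here is a small parser based on my reading of the architecture string, which I believe follows the NAS-Bench-201 convention: node groups are separated by `+`, the k-th group lists the incoming edges of node k, and `op~j` means that `op` is applied to the output of node j. This matches the `3<-(I0-L3,I1-L4,I2-L5)` part of the info line, but treat it as an assumption. `parse_arch` is a hypothetical helper, not part of the repository.

```python
def parse_arch(arch_str):
    """Return a list of (op, source_node, target_node) edges."""
    edges = []
    # Each '+'-separated group describes the inputs of one node (1, 2, 3, ...).
    for target, node_str in enumerate(arch_str.split('+'), start=1):
        for item in node_str.strip('|').split('|'):
            op, source = item.split('~')
            edges.append((op, int(source), target))
    return edges


arch = ('|nor_conv_3x3~0|+|nor_conv_3x3~0|none~1|+'
        '|nor_conv_1x1~0|skip_connect~1|skip_connect~2|')
edges = parse_arch(arch)

for op, source, target in edges:
    print(f'node {source} -> node {target}: {op}')
```

Read this way, the two skip connections in the example link node 1 to node 3 and node 2 to node 3.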
@jack-willturner, I was trying this methodology with standard classifiers. I am not getting anything useful: the scores I get appear to be a multiple of the batch size.
Models chosen -
"alexnet", "resnet18", "resnet34", "resnet50", "vgg11", "vgg13", "densenet121", "squeezenet1_0", "squeezenet1_1"
Scores of 3 runs with Batch Size 512 -
[-515.8197304827107, -518.2625130150999, -516.5254794393328, -516.4521991392903, -513.0445545389762, -513.0351409136783, -531.7337148363374, -526.7972858922, -523.973769767945]
[-515.9147741416524, -517.8983410664996, -515.5328530338215, -515.9739907753669, -513.055884116389, -513.0808931741735, -533.8211941483576, -518.7453811789933, -518.6898948738232]
[-515.8902849753072, -518.636564681304, -516.60644909449, -515.057577734371, -513.0407785346342, -513.0632672108508, -527.9074191760042, -518.9160167714085, -521.2741464876093]
Scores of 3 runs with Batch Size 128 -
[-128.26608670187613, -128.9030387211241, -128.50543259689053, -128.45200275464504, -128.06485022283965, -128.06361606092634, -131.5393815597839, -130.2799460305505, -129.34274766217487]
[-128.27500716523485, -128.8883337356769, -128.3406425853629, -128.5692902076405, -128.06446535511876, -128.06785661838, -132.06972535717804, -128.77940184757026, -128.60743606369488]
[-128.28348337334037, -129.04774963648924, -128.53128977454088, -128.25740881809102, -128.0661722689121, -128.06880782087086, -130.62699720044384, -128.84707498551833, -128.83386057804955]
Scores of 3 runs with Batch Size 16 -
[-16.00536177981646, -16.023735408663864, -16.00541814906714, -16.01275948633266, -16.001070293337335, -16.000959487618033, -16.088079533081366, -16.00829194903181, -16.008265848684317]
[-16.005564368012045, -16.037812752585545, -16.01270083749682, -16.007351551415354, -16.000882548413536, -16.000910242701053, -16.246709654891255, -16.06660858188009, -16.014516625366127]
[-16.005648553602278, -16.020132359932354, -16.011393278637634, -16.006370556147218, -16.001099750108267, -16.000646158922933, -16.20028038506129, -16.014573365687735, -16.004068435407078]
Any idea about this?
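A possible explanation, sketched under assumptions: if the score has the eigenvalue form discussed in another thread here, i.e. the negative sum of np.log(eigval) + 1/eigval over the eigenvalues of the correlation matrix (with a small constant `k` for stability, whose value below is assumed), then a correlation matrix close to the identity gives a score of almost exactly minus the batch size. `eig_score` is an illustrative name.

```python
import numpy as np


def eig_score(corrs, k=1e-5):
    # Assumed score form: -sum(log(v + k) + 1 / (v + k)) over the eigenvalues v.
    v = np.linalg.eigvalsh(corrs)
    return -np.sum(np.log(v + k) + 1.0 / (v + k))


# An identity correlation matrix has all eigenvalues equal to 1,
# and log(1) + 1/1 = 1, so each eigenvalue contributes about -1.
scores = {n: eig_score(np.eye(n)) for n in (16, 128, 512)}
for n, s in scores.items():
    print(n, s)
```

Since log(v) + 1/v has its minimum value of 1 at v = 1, this form of the score can never be noticeably above minus the batch size. Values sitting just below -16, -128 and -512 would then mean the correlation matrices are nearly uncorrelated for these models, and only the small deviations carry information.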
Hi, I am unsure what happens if the input size becomes larger, e.g. 224. In that case, the dimension of the Jacob matrix becomes 3x224x224 (after flattening).
Does your method fail when the Jacob matrix dimension becomes very large?
Hi, I observe that you have modified the method in ICLR 2021, and there are two main differences:
I also tried to reproduce the results based on your repo, but I failed. I mainly modified two parts of the code, as follows:
class RandomErasing:
    def __init__(self, batch_size=256):
        self.batch_size = batch_size
        self.random_erasing = torchvision.transforms.RandomErasing(p=0.9, scale=(0.02,0.04))
    def __call__(self, image):
        images = []
        for i in range(self.batch_size):
            image_erase = self.random_erasing(image)
            images.append(image_erase)
        images_out = torch.stack(images)
        return images_out
def get_datasets(name, root, cutout, random_erasing=True):
    if name == 'cifar10':
        mean = [x / 255 for x in [125.3, 123.0, 113.9]]
        std = [x / 255 for x in [63.0, 62.1, 66.7]]
    elif name == 'cifar100':
        mean = [x / 255 for x in [129.3, 124.1, 112.4]]
        std = [x / 255 for x in [68.2, 65.4, 70.4]]
    elif name.startswith('imagenet-1k'):
        mean, std = [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]
    elif name.startswith('ImageNet16'):
        mean = [x / 255 for x in [122.68, 116.66, 104.01]]
        std = [x / 255 for x in [63.22, 61.26 , 65.09]]
    else:
        raise TypeError("Unknow dataset : {:}".format(name))
    # Data Argumentation
    if name == 'cifar10' or name == 'cifar100':
        lists = [transforms.RandomHorizontalFlip(), transforms.RandomCrop(32, padding=4), transforms.ToTensor(), transforms.Normalize(mean, std)]
        if cutout > 0 : lists += [CUTOUT(cutout)]
        if random_erasing: lists += [RandomErasing()]
        train_transform = transforms.Compose(lists)
        test_transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize(mean, std)])
        xshape = (1, 3, 32, 32)
    elif name.startswith('ImageNet16'):
        lists = [transforms.RandomHorizontalFlip(), transforms.RandomCrop(16, padding=2), transforms.ToTensor(), transforms.Normalize(mean, std)]
        if cutout > 0 : lists += [CUTOUT(cutout)]
        if random_erasing: lists += [RandomErasing()]
        train_transform = transforms.Compose(lists)
        test_transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize(mean, std)])
        xshape = (1, 3, 16, 16)
    elif name == 'tiered':
        lists = [transforms.RandomHorizontalFlip(), transforms.RandomCrop(80, padding=4), transforms.ToTensor(), transforms.Normalize(mean, std)]
        if cutout > 0 : lists += [CUTOUT(cutout)]
        if random_erasing: lists += [RandomErasing()]
        train_transform = transforms.Compose(lists)
        test_transform = transforms.Compose([transforms.CenterCrop(80), transforms.ToTensor(), transforms.Normalize(mean, std)])
        xshape = (1, 3, 32, 32)
    elif name.startswith('imagenet-1k'):
        normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
        if name == 'imagenet-1k':
            xlists = [transforms.RandomResizedCrop(224)]
            xlists.append(
                transforms.ColorJitter(
                    brightness=0.4,
                    contrast=0.4,
                    saturation=0.4,
                    hue=0.2))
            xlists.append( Lighting(0.1))
        elif name == 'imagenet-1k-s':
            xlists = [transforms.RandomResizedCrop(224, scale=(0.2, 1.0))]
        else: raise ValueError('invalid name : {:}'.format(name))
        xlists.append( transforms.RandomHorizontalFlip(p=0.5) )
        xlists.append( transforms.ToTensor() )
        xlists.append( normalize )
        train_transform = transforms.Compose(xlists)
        test_transform = transforms.Compose([transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(), normalize])
        xshape = (1, 3, 224, 224)
    else:
        raise TypeError("Unknow dataset : {:}".format(name))
    if name == 'cifar10':
        train_data = dset.CIFAR10 (root, train=True , transform=train_transform, download=True)
        test_data = dset.CIFAR10 (root, train=False, transform=test_transform , download=True)
        assert len(train_data) == 50000 and len(test_data) == 10000
    elif name == 'cifar100':
        train_data = dset.CIFAR100(root, train=True , transform=train_transform, download=True)
        test_data = dset.CIFAR100(root, train=False, transform=test_transform , download=True)
        assert len(train_data) == 50000 and len(test_data) == 10000
    elif name.startswith('imagenet-1k'):
        train_data = dset.ImageFolder(osp.join(root, 'train'), train_transform)
        test_data = dset.ImageFolder(osp.join(root, 'val'), test_transform)
        assert len(train_data) == 1281167 and len(test_data) == 50000, 'invalid number of images : {:} & {:} vs {:} & {:}'.format(len(train_data), len(test_data), 1281167, 50000)
    elif name == 'ImageNet16':
        train_data = ImageNet16(root, True , train_transform)
        test_data = ImageNet16(root, False, test_transform)
        assert len(train_data) == 1281167 and len(test_data) == 50000
    elif name == 'ImageNet16-120':
        train_data = ImageNet16(root, True , train_transform, 120)
        test_data = ImageNet16(root, False, test_transform , 120)
        assert len(train_data) == 151700 and len(test_data) == 6000
    elif name == 'ImageNet16-150':
        train_data = ImageNet16(root, True , train_transform, 150)
        test_data = ImageNet16(root, False, test_transform , 150)
        assert len(train_data) == 190272 and len(test_data) == 7500
    elif name == 'ImageNet16-200':
        train_data = ImageNet16(root, True , train_transform, 200)
        test_data = ImageNet16(root, False, test_transform , 200)
        assert len(train_data) == 254775 and len(test_data) == 10000
    else: raise TypeError("Unknow dataset : {:}".format(name))
    class_num = Dataset2Class[name]
    return train_data, test_data, xshape, class_num
def eval_score(jacob, labels=None):
    corrs = np.corrcoef(jacob)
    corrs = np.where((corrs>0.0)&(corrs<0.25), 1.0, 0.0)
    score = np.sum(corrs)
    return score
The rest of the code is unchanged. However, I obtain 85.75% CIFAR-10 test accuracy with N=10, runs=500, seed=1. Could you please give me some tips for reproducing the results in your latest paper?
Hi @jack-willturner, I came across the code in search.py and noticed that the sampled architecture indices are generated with np.random.randint, which can produce duplicate indices.
This should not be a big issue when the sample size is relatively small. However, I think it is better to replace it with np.random.choice (with replace=False) to avoid evaluating the same network twice.
Thanks,
Bill
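A minimal sketch of the difference, using NumPy's Generator API. The legacy calls np.random.randint and np.random.choice behave the same way with respect to duplicates, and note that choice only avoids duplicates when replace=False is passed. The values of `n_archs` and `n_samples` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_archs, n_samples = 100, 50

# Sampling with replacement: the same architecture index can appear more than once.
with_replacement = rng.integers(0, n_archs, size=n_samples)

# Sampling without replacement: every index is distinct
# (requires n_samples <= n_archs).
without_replacement = rng.choice(n_archs, size=n_samples, replace=False)

print(len(set(with_replacement.tolist())), len(set(without_replacement.tolist())))
```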
I provide my solution here:
In nasspace.py insert:
    if len(config['bot_muls']) > 0:
        if config['bot_muls'][0] == 0:
            config['bot_muls'][0] = 1
after line 280
Thanks for the interesting work.
Did you happen to experiment on the full Imagenet? Are results consistent for that dataset as well?
Can anyone tell me how to download the ImageNet16-120 dataset? Or is it created from the original ImageNet dataset? If it is created from the original ImageNet dataset, can anyone tell me if there is a script that I can use to create the ImageNet16-120 dataset?
Thank you
Is the method in the paper suitable for the mobilenetv2-based search space?
I needed 'cifar-split.txt' to test 'sh scorehook.sh'.
I couldn't find the file in this project.
After googling, I got 'cifar-split.txt' from the PRDARTS project (https://github.com/salesforce/PR-DARTS/tree/main/PRDARTS_search/configs).
I'm posting this for anyone who needs help with this.
Did anyone else get this message today? I'm not entirely 100% sure what has changed.
Does this mean all reqHistoricalData, data feed subscriptions and order entry commands were previously in lots of 100. And now they must all be either given or will be returned in actual shares. So previously shares might have been shown as 100, and now it will be shown as 10,000?
"Effective in TWS version 985 and later, for US stocks the bid, ask, and last size quotes are shown in shares (not in lots). API users have the option to configure the TWS API to work in compatibility mode for older programs, but we recommend migrating to "quotes in shares" at your earliest convenience. To use compatibility mode, from the Global Configuration menu select API followed by the Settings page. Once there, check "Bypass US Stocks market data in shares warning for API orders.""
According to the paper:
KH in these plots is normalised so that the diagonal entries are 1.
But I cannot find any code that normalizes KH. Am I missing anything? Looking forward to your reply.
I set up the conda environment as per the instructions, but it seems PyTorch is not being installed with GPU support.
torch.cuda.is_available() returns False.
In the counting_forward_hook function, it sums up K and K2, but according to the paper we should calculate N_A - dist(c_i, c_j).
Is there any relationship between K + K2 and N_A - dist(c_i, c_j)?
Thank you very much
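A numerical check that may answer this, under the assumption that K counts units that are active for both inputs (x @ x.T on binary codes) and K2 counts units that are inactive for both ((1 - x) @ (1 - x).T). The units where two codes agree are exactly the units where they do not differ, so K + K2 equals N_A minus the Hamming distance. The `codes` array is random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n_units = 6, 32

# Hypothetical binary activation codes c_i: one row per input, one column per ReLU unit.
codes = (rng.random((batch, n_units)) > 0.5).astype(float)

# K[i, j]: number of units active for both input i and input j.
K = codes @ codes.T
# K2[i, j]: number of units inactive for both input i and input j.
K2 = (1.0 - codes) @ (1.0 - codes).T

# Hamming distance: number of units where the two codes differ.
hamming = (codes[:, None, :] != codes[None, :, :]).sum(axis=-1)

print(np.array_equal(K + K2, n_units - hamming))
```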