facebookresearch / msn
Masked Siamese Networks for Label-Efficient Learning (https://arxiv.org/abs/2204.07141)
License: Other
How to install the right cyanure version?
Could you add vit-small-8 and vit-base-4 config files?
I used the default vits-16 config to train on ImageNet-1k end to end, but I found that the loss converged to 2.492 after one epoch. Is that normal?
If so, how does performance improve, given that the loss no longer seems to decrease over the next several hundred epochs?
If not, is there anything I did wrong? The config I used is as follows:
criterion:
  ent_weight: 0.0
  final_sharpen: 0.25
  me_max: true
  memax_weight: 1.0
  num_proto: 1024
  start_sharpen: 0.25
  temperature: 0.1
  batch_size: 32
  use_ent: true
  use_sinkhorn: true
data:
  color_jitter_strength: 0.5
  pin_mem: true
  num_workers: 10
  image_folder: /gruntdata6/xinshulin/data/imagenet/new_train/1
  label_smoothing: 0.0
  patch_drop: 0.15
  rand_size: 224
  focal_size: 96
  rand_views: 1
  focal_views: 10
  root_path: /gruntdata6/xinshulin/data/imagenet/new_train
logging:
  folder: checkpoint/msn_os_logs4/
  write_tag: msn-experiment-1
meta:
  bottleneck: 1
  copy_data: false
  drop_path_rate: 0.0
  hidden_dim: 2048
  load_checkpoint: false
  model_name: deit_small
  output_dim: 256
  read_checkpoint: null
  use_bn: true
  use_fp16: false
  use_pred_head: false
optimization:
  clip_grad: 3.0
  epochs: 800
  final_lr: 1.0e-06
  final_weight_decay: 0.4
  lr: 0.001
  start_lr: 0.0002
  warmup: 15
  weight_decay: 0.04
Hi, amazing work. Any thoughts on releasing the code under a more permissive license?
The pretrained weights seem to be wrong.
For example, the vit_base has a dimension of 1024.
Could you upload the correct version? Thanks
I am confused: you use either a random mask or a focal mask, but why not try a block-wise mask, which has been proven effective?
Hi,
Thanks for contributing such awesome work! Could you please release the vit-b-16 config file?
Thanks,
Ziyu Jiang
Hello, the interfaces in the current main branch of cyanure are different from the ones you used (e.g. there is no MultiClassifier function).
I greatly enjoyed reading your paper but I'm curious about the segmentation performance of such models. Do MSNs share the same segmentation properties as DINO?
Hi, very interesting work! How was the lambda regularization parameter for logistic regression set for the results obtained in tables 1 and 2 in the paper? I get subpar performance values in some cases when I set it fixed to 0.075 or 0.0025 for all models (as mentioned in the repo and in another issue). Did you set it to a fixed value for all the models and subsets? If so, can you share this lambda value? Or did you do a sweep over a set of values and choose the best values? If so, can you share the set of lambdas that you did the search over? Thanks in advance!
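In case it helps frame the question, here is a minimal sketch of the kind of sweep being asked about. fit_and_score is a hypothetical helper (it would wrap the same cyanure fit call shown later in this thread plus a validation-accuracy evaluation), and the candidate values are illustrative, not the ones used in the paper:
# Hypothetical sweep over the l2-regularization strength for the logistic-regression probe.
# train_embs/train_labs/val_embs/val_labs are assumed to be pre-extracted features and labels,
# and fit_and_score is an assumed helper that trains with the given lambd and returns val accuracy.
candidate_lambdas = [0.1, 0.075, 0.05, 0.025, 0.01, 0.0075, 0.005, 0.0025, 0.001]
scores = {lambd: fit_and_score(train_embs, train_labs, val_embs, val_labs, lambd)
          for lambd in candidate_lambdas}
best_lambd = max(scores, key=scores.get)
print(f'best lambda: {best_lambd} (val acc: {scores[best_lambd]:.4f})')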
Hi, thank you for providing such awesome self-supervised learning research!
I'm wondering how much the performance will degrade when using a small batch size, e.g. between 128 and 512, for the DeiT-Base model.
If we cannot use a large batch size (e.g. 1024) with the base model, is it better to use a smaller model with a large batch size?
Thanks in advance!!
Hi, fantastic work! Thank you for your code!
There seems to be a small omission on line 327 of linear_eval.py.
linear_eval.py #L327
return encoder, opt, sched, epoch, best_acc
should be
return encoder, linear_classifier, opt, sched, epoch, best_acc
When running the main.py script on a local machine with 4 GPUs, I get the following error:
Duplicate GPU detected : rank 0 and rank 1 both on CUDA device 40
nvidia-smi shows 4 different GPUs, and the issue persists after reboot.
Hello, thanks for sharing your work.
I was a little confused about your 1% IN1k semi-supervised evaluation. You said in the paper that the results come from logistic regression on the extracted representations. However, with the same ViT, I found that this evaluation in iBOT comes from end-to-end full fine-tuning (see here), and SwAV et al. fine-tuned the entire ResNet-50 encoder.
Hello. Thank you for your great work!
I have some questions about the "AllReduce" class defined here.
Lines 226 to 241 in 4388dc1
It is used to gather the probs when computing the me-max regularization.
Lines 70 to 72 in 4388dc1
I wonder why you do not use "dist.all_reduce(x)" directly. It seems that using "AllReduce" multiplies the gradient by "world_size".
I want to know whether I am correct and why this makes sense.
Thanks!
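For readers without the repo open, here is a rough sketch of the kind of autograd wrapper being asked about (an illustration, not a verbatim copy of the class at the lines referenced above): the forward pass averages a tensor across ranks, while the backward pass returns the incoming gradient unchanged on each rank, whereas a bare dist.all_reduce call is a collective that is not itself part of the autograd graph.
import torch
import torch.distributed as dist

class AllReduceSketch(torch.autograd.Function):
    # Differentiable cross-rank averaging, sketched from the usage described above.

    @staticmethod
    def forward(ctx, x):
        if dist.is_available() and dist.is_initialized() and dist.get_world_size() > 1:
            x = x.contiguous() / dist.get_world_size()
            dist.all_reduce(x)  # after the division, x holds the mean over ranks
        return x

    @staticmethod
    def backward(ctx, grads):
        # Identity backward: each rank receives the incoming gradient unchanged.
        return grads

# In the me-max regularizer the usage would look roughly like:
#   avg_probs = AllReduceSketch.apply(torch.mean(probs, dim=0))
#   rloss = -torch.sum(torch.log(avg_probs ** (-avg_probs)))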
No such file or directory: '/checkpoint/msn_os_logs/params-msn-train.yaml'
No such file or directory: '/checkpoint/msn_os_logs/msn-experiment-1_r0.csv'
I couldn't find an official code release for the paper arxiv.org/abs/2210.07277, which proposes an extension to MSN that allows arbitrary feature priors.
It looks like the main difference is a change of a single term in the loss function. Is that correct? How would I implement the changes mentioned in the PMSN paper?
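Not an official answer, but a hedged reading of arXiv:2210.07277 is that the single changed term is the regularizer: MSN's mean-entropy-maximization term is replaced by a KL divergence between the mean prediction and a chosen prior (a power-law distribution over the prototypes in that paper's experiments). A minimal sketch, with power_law_prior as a hypothetical helper:
import torch

def power_law_prior(num_proto, tau):
    # Hypothetical helper: a power-law (Zipf-like) distribution over the prototypes,
    # parameterized by the exponent tau.
    ranks = torch.arange(1, num_proto + 1, dtype=torch.float)
    p = ranks.pow(-tau)
    return p / p.sum()

def msn_me_max_term(avg_probs):
    # MSN-style regularizer written as a loss: the negative entropy of the mean prediction.
    return torch.sum(avg_probs * torch.log(avg_probs))

def pmsn_prior_term(avg_probs, prior):
    # PMSN-style replacement: KL(avg_probs || prior). With a uniform prior this reduces
    # to the me-max term above up to an additive constant.
    return torch.sum(avg_probs * (torch.log(avg_probs) - torch.log(prior)))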
Hey thanks so much for making this code public. I'm trying to use this in a project and would love to be able to continue the training process. Could you please upload a checkpoint with all of the optimizer state? Thanks so much for your help!
Hey, after a few attempts, I would like to get some help :)
Thank you!
Hi! First of all, thank you for sharing such interesting research.
I have a question about this part of the source code.
# -- momentum schedule
_start_m, _final_m = 0.996, 1.0
_increment = (_final_m - _start_m) / (ipe * num_epochs * 1.25)
momentum_scheduler = (_start_m + (_increment*i) for i in range(int(ipe*num_epochs*1.25)+1))
# -- sharpening schedule
_increment_T = (_final_T - _start_T) / (ipe * num_epochs * 1.25)
sharpen_scheduler = (_start_T + (_increment_T*i) for i in range(int(ipe*num_epochs*1.25)+1))
Why do we need to multiply num_epochs by 1.25? The paper says "with a momentum value of 0.996, and linearly increase this value to 1.0 by the end of training", but in this case the momentum can only increase to 0.9992.
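A quick check of the arithmetic behind that observation, assuming the scheduler is stepped once per iteration and training stops after ipe * num_epochs iterations:
# With the 1.25 stretch, only the first 1 / 1.25 = 80% of the schedule is traversed
# by the end of training, so the final momentum value is:
start_m, final_m, stretch = 0.996, 1.0, 1.25
m_end = start_m + (final_m - start_m) / stretch
print(m_end)  # 0.9992, matching the value quoted above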
I only get 66.9% accuracy with the released ViT-S checkpoint, which is lower than the reported 67.2%. I use the provided default settings for logistic regression:
cyan.preprocess(embs, normalize=normalize, columns=False, centering=True)
classifier = cyan.MultiClassifier(loss='multiclass-logistic', penalty=penalty, fit_intercept=False)
classifier.fit(embs, labs, it0=10, lambd=lambd, lambd2=lambd, nthreads=-1, tol=1e-3, solver='auto', seed=0, max_epochs=300)
In addition, I set --blocks=1, --lambd=0.0025, --penalty=l2, --normalize=True.
Is there something wrong?
Hi, would you be interested in adding msn to Hugging Face? The Hub offers free hosting, and it would make your work more accessible and visible to the rest of the ML community. Models, datasets, and Spaces (web demos) can be added to a user account or organization, similar to GitHub.
Example from other organizations:
Keras: https://huggingface.co/keras-io
Microsoft: https://huggingface.co/microsoft
Facebook: https://huggingface.co/facebook
Example spaces with repos:
github: https://github.com/salesforce/BLIP
Spaces: https://huggingface.co/spaces/salesforce/BLIP
github: https://github.com/facebookresearch/omnivore
Spaces: https://huggingface.co/spaces/akhaliq/omnivore
Here are guides for adding Spaces/models/datasets to your org:
How to add a Space: https://huggingface.co/blog/gradio-spaces
How to add models: https://huggingface.co/docs/hub/adding-a-model
Uploading a dataset: https://huggingface.co/docs/datasets/upload_dataset.html
Please let us know if you would be interested and if you have any questions, we can also help with the technical implementation.
@MidoAssran and team, great work!!!
I would like to use this with my custom dataset to build a linear classification model. I started pre-training on my custom dataset by modifying configs/pretrain/msn_vits16.yaml:
criterion:
  ent_weight: 0.0
  final_sharpen: 0.25
  me_max: true
  memax_weight: 1.0
  num_proto: 1024
  start_sharpen: 0.25
  temperature: 0.1
  batch_size: 64
  use_ent: true
  use_sinkhorn: true
data:
  color_jitter_strength: 0.5
  pin_mem: true
  num_workers: 10
  image_folder: custom_db/
  label_smoothing: 0.0
  patch_drop: 0.15
  rand_size: 224
  focal_size: 96
  rand_views: 1
  focal_views: 10
  root_path: dataset/
logging:
  folder: saved_models/msn_os_logs/
  write_tag: msn-experiment-1
meta:
  bottleneck: 1
  copy_data: false
  drop_path_rate: 0.0
  hidden_dim: 2048
  load_checkpoint: false
  model_name: deit_small
  output_dim: 256
  read_checkpoint: null
  use_bn: true
  use_fp16: false
  use_pred_head: false
optimization:
  clip_grad: 3.0
  epochs: 350
  final_lr: 1.0e-06
  final_weight_decay: 0.4
  lr: 0.001
  start_lr: 0.0002
  warmup: 15
  weight_decay: 0.04
and started the model pre-training using this command:
python main.py --fname configs/pretrain/msn_vits16.yaml --devices cuda:0
I was able to start the model training, as seen in the screenshots below.
Question: I am confused about how to start the downstream task of image classification.
Any help will be appreciated, thank you very much.
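Not an official answer, but here is a minimal sketch of one way to run the downstream linear-classification step on a frozen pre-trained encoder. It is not the repo's linear_eval.py or logistic_eval.py; the checkpoint filename, the 'target_encoder' state-dict key, the deit.deit_small factory, and the labeled class-subfolder dataset layout are all assumptions inferred from the config above.
import torch
import torch.nn.functional as F
import torchvision.transforms as T
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder

import src.deit as deit  # model definitions shipped with this repo (config: model_name: deit_small)

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# 1. Load the frozen pre-trained encoder from the pre-training checkpoint (path and key are assumed).
ckpt = torch.load('saved_models/msn_os_logs/msn-experiment-1-latest.pth.tar', map_location='cpu')
encoder = deit.deit_small(patch_size=16).to(device).eval()
state = {k.replace('module.', ''): v for k, v in ckpt['target_encoder'].items()}
encoder.load_state_dict(state, strict=False)

# 2. Extract features for a labeled dataset (ImageFolder assumes one sub-directory per class).
transform = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor(),
                       T.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))])
loader = DataLoader(ImageFolder('dataset/custom_db/', transform=transform),
                    batch_size=64, num_workers=10)
feats, labels = [], []
with torch.no_grad():
    for imgs, labs in loader:
        feats.append(encoder(imgs.to(device)).cpu())
        labels.append(labs)
feats, labels = torch.cat(feats), torch.cat(labels)

# 3. Fit a simple linear head on the frozen features (a stand-in for the cyanure-based
#    logistic regression used by the repo's evaluation scripts).
head = torch.nn.Linear(feats.shape[1], int(labels.max()) + 1)
opt = torch.optim.SGD(head.parameters(), lr=0.01, momentum=0.9)
for epoch in range(100):
    opt.zero_grad()
    loss = F.cross_entropy(head(feats), labels)
    loss.backward()
    opt.step()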