kakaobrain / coyo-vit

ViT trained on COYO-Labeled-300M dataset

License: Apache License 2.0

coyo-vit's Introduction

VisionTransformer

This repository is an attempt to reproduce ViT using the COYO-Labeled-300M dataset.

The model was pre-trained on COYO-Labeled-300M, the largest labeled dataset used for any publicly released classification ViT.

We provide the code for pretraining and finetuning in TensorFlow 2.

We will also work with HuggingFace to provide the weight files and make them usable in PyTorch and JAX through the HuggingFace platform.

Training

We trained and evaluated on TPU v3 with bfloat16.
The pretraining weight we provide is the last_checkpoint trained on COYO-Labeled-300M.
The finetuning weights we provide are the best_checkpoint trained on ImageNet.
We used the hyperparameter search below to find the best weights during finetuning.
```
learning_rate:  [0.06, 0.03, 0.01]
steps:          [20_000, 40_000]
```
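
The search space above is small enough to enumerate exhaustively. The following is a minimal sketch of that enumeration, not the authors' search code; `run_finetune` is a hypothetical callable supplied by the caller that launches one finetuning run and returns its validation accuracy.

```python
from itertools import product

# Search space copied from the README.
LEARNING_RATES = [0.06, 0.03, 0.01]
STEPS = [20_000, 40_000]


def grid(learning_rates=LEARNING_RATES, steps=STEPS):
    """Return every (learning_rate, steps) combination as a list of dicts."""
    return [
        {"learning_rate": lr, "steps": s}
        for lr, s in product(learning_rates, steps)
    ]


def pick_best(run_finetune, candidates):
    """Run each candidate and return (accuracy, candidate) for the best one.

    `run_finetune` is a hypothetical stand-in, not part of this repository.
    """
    scored = [(run_finetune(**c), c) for c in candidates]
    return max(scored, key=lambda pair: pair[0])


if __name__ == "__main__":
    for candidate in grid():
        print(candidate)
```

With three learning rates and two step budgets, this yields six finetuning runs per resolution.
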
We provide weights trained in bfloat16, and we have confirmed that the measured performance changes when they are evaluated in float32. (ImageNet-ReaL, however, was evaluated in float32.)
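
One reason precision can affect evaluation is that bfloat16 keeps only 7 explicit mantissa bits, so values that differ in float32 can collapse to the same number. The sketch below emulates bfloat16 rounding with the standard library to illustrate this; it is not the authors' analysis, and the logit values are made up.

```python
import struct


def to_bfloat16(x: float) -> float:
    """Round x to the nearest bfloat16 value and return it as a Python float.

    bfloat16 is the top 16 bits of a float32 (1 sign, 8 exponent,
    7 mantissa bits). NaN and overflow to infinity are not handled here.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, ties to even, on the 16 bits being dropped.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


if __name__ == "__main__":
    logits = [3.001, 3.002]  # made-up values for illustration
    print([to_bfloat16(v) for v in logits])
```

Both example logits round to 3.0 in bfloat16, so a top-1 decision between two such classes can differ from the float32 result.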

The code in this repository can be run on GPU as well as TPU.

# configs/trainer.yaml
---
runtime:
  strategy: 'tpu' # one of ['cpu', 'tpu', 'gpu', 'gpu_multinode', 'gpu_multinode_async']
  use_mixed_precision: true
  tpu:
    version: 2.8.0
    name: ???
    zone: 'europe-west4-a'
    type: 'v3-32'
---
To run on GPU, change the above to:
---
runtime:
  strategy: 'gpu'
  use_mixed_precision: true
---
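
Since the commands in this README already pass Hydra `key=value` overrides on the command line, the same switch could presumably be made without editing the file. The helper below is a hypothetical sketch that flattens a nested dict into such overrides. Whether the trainer ignores the remaining `runtime.tpu` keys when `strategy` is `gpu` is an assumption that has not been verified here.

```python
def to_overrides(config: dict, prefix: str = "") -> list:
    """Flatten a nested dict into Hydra-style `dotted.key=value` strings."""
    overrides = []
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            overrides.extend(to_overrides(value, path))
        elif isinstance(value, bool):
            # YAML booleans are written in lowercase.
            overrides.append(f"{path}={str(value).lower()}")
        else:
            overrides.append(f"{path}={value}")
    return overrides


if __name__ == "__main__":
    gpu_runtime = {"runtime": {"strategy": "gpu", "use_mixed_precision": True}}
    print("python3 -m trainer " + " ".join(to_overrides(gpu_runtime)))
```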

To train, set the path to your dataset in the following config file.

# configs/dataset/coyo300m.yaml
train:
  cache: false
  supervised_key: 'labels'
  builder:
    - tfds_name: null
      tfds_data_dir: {your dir}
      tfds_split: 'train'

validation:
  cache: false
  supervised_key: 'labels'
  builder:
    - tfds_name: null
      tfds_data_dir: {your dir}
      tfds_split: 'validation[:50000]' # We validated on part of the ImageNet-21k dataset; alternatively, use a subset of COYO-Labeled-300M.
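
The `{your dir}` placeholders must be replaced with a real path. Left as-is, YAML reads `{your dir}` as a one-key mapping rather than a string; the config printed in the issue log further down shows exactly this (`tfds_data_dir:` followed by `your dir: null`). The following is a small sketch for catching leftover placeholders before launching a run; `find_placeholders` is a hypothetical helper, not part of the repository.

```python
import re

# Matches a `key: {something}` line where the braces contain no colon,
# i.e. a placeholder such as `{your dir}` rather than a real inline mapping.
PLACEHOLDER = re.compile(r"^\s*(?:-\s*)?([\w.]+):\s*\{([^{}:]*)\}\s*(?:#.*)?$")


def find_placeholders(yaml_text: str) -> list:
    """Return (line_number, key, placeholder) for each unfilled value."""
    hits = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        match = PLACEHOLDER.match(line)
        if match:
            hits.append((number, match.group(1), match.group(2).strip()))
    return hits


if __name__ == "__main__":
    sample = "builder:\n  - tfds_name: null\n    tfds_data_dir: {your dir}\n"
    print(find_placeholders(sample))
```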

Results

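| Model | Upstream Dataset | Resolution | ImageNet (downstream) | ImageNet-ReaL (downstream) | Public |
|---|---|---|---|---|---|
| ViT-L/16 | JFT-300M | 512 | 87.76 | 90.54 | No |
| ViT-L/16 | COYO-Labeled-300M | 512 | 87.24 (-0.52) | 90.03 (-0.51) | Yes |
| ViT-L/16 | JFT-300M | 384 | 87.12 | 89.99 | No |
| ViT-L/16 | COYO-Labeled-300M | 384 | 86.72 (-0.40) | 89.84 (-0.15) | Yes |

Values in parentheses are the difference from the JFT-300M model at the same resolution.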
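
The gaps to the JFT-300M baseline can be recomputed directly from the table. This is only an arithmetic check on the reported numbers, with the values copied from the rows above.

```python
# (resolution, dataset) -> (ImageNet, ImageNet-ReaL), copied from the table.
RESULTS = {
    (512, "JFT-300M"): (87.76, 90.54),
    (512, "COYO-Labeled-300M"): (87.24, 90.03),
    (384, "JFT-300M"): (87.12, 89.99),
    (384, "COYO-Labeled-300M"): (86.72, 89.84),
}


def gap(resolution: int) -> tuple:
    """Return COYO minus JFT for both benchmarks, rounded to two decimals."""
    coyo = RESULTS[(resolution, "COYO-Labeled-300M")]
    jft = RESULTS[(resolution, "JFT-300M")]
    return tuple(round(c - j, 2) for c, j in zip(coyo, jft))


if __name__ == "__main__":
    for resolution in (512, 384):
        print(resolution, gap(resolution))
```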

Checkpoints

| Model | Upstream Dataset | Downstream Dataset | Resolution | Link |
|---|---|---|---|---|
| ViT-L/16 | COYO-Labeled-300M | - | 224 | link |
| ViT-L/16 | COYO-Labeled-300M | ImageNet | 384 | link |
| ViT-L/16 | COYO-Labeled-300M | ImageNet | 512 | link |

Requirements

We have tested our code in the following environment:
python==3.7.3 / tensorflow==2.8.0 / tensorflow-datasets==4.5.0
Run the following command to install the necessary dependencies:
```
pip install -r requirements.txt
```

Commands

We use Hydra to manage the configuration. For detailed usage, see here.

Pretraining

python3 -m trainer trainer=vit_l16_coyo300m \
runtime.tpu.name={your_tpu_name} \
runtime.tpu.type={your_tpu_type} \
experiment.debug=false experiment.save_dir={your_save_dir}

Finetuning

python3 -m trainer trainer=vit_l16_i1k_downstream \
  runtime.tpu.name={your_tpu_name} \
  runtime.tpu.type={your_tpu_type} \
  experiment.debug=false \
  experiment.save_dir={your_save_dir} \
  trainer.backbone.pretrained={your_pretrained_weight}

Also, you can experiment by changing the configuration as follows.

python3 -m trainer trainer=vit_l16_i1k_downstream \
  runtime.tpu.name={your_tpu_name} \
  runtime.tpu.type={your_tpu_type} \
  experiment.debug=false experiment.save_dir={your_save_dir} \
  trainer.backbone.pretrained={your_pretrained_weight} \
  trainer.epochs=16 \
  trainer.learning_rate.base_lr=3e-2
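
The `trainer.epochs` override and the `steps` values in the hyperparameter search appear to describe the same training budget. The rough check below assumes the ImageNet-1k training split of 1,281,167 images and the `global_batch_size: 512` printed in the issue log further down. Whether the trainer derives its step count exactly this way is an assumption.

```python
IMAGENET_TRAIN_IMAGES = 1_281_167  # ImageNet-1k (ILSVRC-2012) training set size
GLOBAL_BATCH_SIZE = 512            # from the printed downstream config


def steps_for(epochs: int,
              examples: int = IMAGENET_TRAIN_IMAGES,
              batch_size: int = GLOBAL_BATCH_SIZE) -> int:
    """Number of full batches seen in `epochs` passes over the data."""
    return epochs * examples // batch_size


if __name__ == "__main__":
    for epochs in (8, 16):
        print(epochs, steps_for(epochs))
```

Under these assumptions, 8 epochs come to 20,018 steps and 16 epochs to 40,036 steps, which lines up with the 20_000 and 40_000 step settings.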

Evaluation

python3 -m trainer trainer=vit_l16_i1k_downstream \
  runtime.tpu.name={your_tpu_name} \
  runtime.tpu.type={your_tpu_type} \
  experiment.debug=false \
  experiment.save_dir={your_weight_path} \
  experiment.mode='eval'

Citation

@misc{kakaobrain2022coyo-vit,
  title         = {COYO-ViT},
  author        = {Lee, Sungjun and Park, Beomhee},
  year          = {2022},
  howpublished  = {\url{https://github.com/kakaobrain/coyo-vit}},
}

@misc{kakaobrain2022coyo-700m,
  title         = {COYO-700M: Image-Text Pair Dataset},
  author        = {Byeon, Minwoo and Park, Beomhee and Kim, Haecheon and Lee, Sungjun and Baek, Woonhyuk and Kim, Saehoon},
  year          = {2022},
  howpublished  = {\url{https://github.com/kakaobrain/coyo-dataset}},
}

@misc{dosovitskiy2020image,
    title   = {An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
    author  = {Alexey Dosovitskiy and Lucas Beyer and Alexander Kolesnikov and Dirk Weissenborn and Xiaohua Zhai and Thomas Unterthiner and Mostafa Dehghani and Matthias Minderer and Georg Heigold and Sylvain Gelly and Jakob Uszkoreit and Neil Houlsby},
    year    = {2020},
    eprint  = {2010.11929},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}

People

Sungjun Lee (@justhungryman)
Beomhee Park (@beomheepark)

Contact

This project is released as open source in the hope that it will be helpful to research institutes and startups for research purposes.

[email protected]

License

The source code is licensed under the Apache 2.0 License.

coyo-vit's Issues

The tutorial code doesn't work

Thank you for sharing the pretrained model.

I tried running the code in the tutorial after adding the path of the ImageNet validation dataset and checkpoint of vit-l/16 (downloaded from the huggingface page).

I placed the downloaded checkpoint in ./outputs/checkpoint, as set in the trainer.yaml file, but I got the error message "Failed to find any matching files for ./outputs/checkpoint" (shown at the bottom of the error output below). So, I think something went wrong with the checkpoint.

Could you please help me with this issue?

Thank you in advance.

Here is the trainer.yaml I edited.

hydra:
  run:
    dir: ./outputs/checkpoint


defaults:
  - trainer: vit_b16_i1k

runtime:
  strategy: 'gpu' # one of ['cpu', 'tpu', 'gpu', 'gpu_multinode', 'gpu_multinode_async']
  use_mixed_precision: true

experiment:
  mode: eval  # 'train', 'train_eval', 'eval'
  debug: false
  save_dir: ${hydra:run.dir}
  comment: ???

Here is the bash code I tried.

python3 -m trainer trainer=vit_l16_i1k_downstream \
experiment.debug=false \
experiment.mode='eval'

And, here is the error message below.

~:$ source test.sh
/home/masaru-sasaki/.pyenv/versions/mambaforge-22.9.0-3/lib/python3.10/site-packages/tensorflow_addons/utils/tfa_eol_msg.py:23: UserWarning: 

TensorFlow Addons (TFA) has ended development and introduction of new features.
TFA has entered a minimal maintenance and release mode until a planned end of life in May 2024.
Please modify downstream libraries to take dependencies from other repositories in our TensorFlow community (e.g. Keras, Keras-CV, and Keras-NLP). 

For more information see: https://github.com/tensorflow/addons/issues/2807 

  warnings.warn(
/home/masaru-sasaki/.pyenv/versions/mambaforge-22.9.0-3/lib/python3.10/site-packages/tensorflow_addons/utils/ensure_tf_install.py:53: UserWarning: Tensorflow Addons supports using Python ops for all Tensorflow versions above or equal to 2.13.0 and strictly below 2.16.0 (nightly versions are not supported). 
 The versions of TensorFlow you are currently using is 2.10.1 and is not supported. 
Some things might work, some things might not.
If you were to encounter a bug, do not file an issue.
If you want to make sure you're using a tested and supported configuration, either change the TensorFlow version or the TensorFlow Addons's version. 
You can find the compatibility matrix in TensorFlow Addon's readme:
https://github.com/tensorflow/addons
  warnings.warn(
/home/masaru-sasaki/work_space/coyo-vit/trainer.py:323: UserWarning: 
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
  @hydra.main(config_path="configs", config_name="trainer")
/home/masaru-sasaki/.pyenv/versions/mambaforge-22.9.0-3/lib/python3.10/site-packages/hydra/_internal/defaults_list.py:251: UserWarning: In 'trainer': Defaults list is missing `_self_`. See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/default_composition_order for more information
  warnings.warn(msg, UserWarning)
/home/masaru-sasaki/.pyenv/versions/mambaforge-22.9.0-3/lib/python3.10/site-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
[2023-12-19 22:23:12,639][__main__][INFO] - Training with the following config:
trainer:
  dataset:
    train:
      cache: true
      supervised_key: label
      builder:
      - tfds_name: imagenet2012:5.0.0
        tfds_data_dir:
          your dir: null
        tfds_split: train
      dtype: bfloat16
      image_size: 384
      mixup_alpha: 0.0
      cutmix_alpha: 0.0
      preprocess:
      - type: InceptionCrop
        params:
          size: 384
      - type: random_hflip
      - type: normalize
        params:
          mean: 127.5
          std: 127.5
    validation:
      cache: true
      supervised_key: label
      builder:
      - tfds_name: imagenet2012:5.0.0
        tfds_data_dir: /mnt/disk202208/common-data/ImageNet/ILSVRC2012_img_val/
        tfds_split: validation
      dtype: bfloat16
      image_size: 384
      mixup_alpha: 0.0
      cutmix_alpha: 0.0
      preprocess:
      - type: resize
        params:
          size:
          - 384
          - 384
      - type: normalize
        params:
          mean: 127.5
          std: 127.5
  backbone:
    backbone_name: vit-l/16
    backbone_params:
      image_size: 384
      representation_size: 0
      attention_dropout_rate: 0.0
      dropout_rate: 0.0
      channels: 3
    dropout_rate: 0.0
    cls_kernel_init:
      type: zeros
    cls_bias_init:
      type: zeros
    pretrained: null
  loss:
    class_name: CategoricalCrossentropy
    config:
      from_logits: true
      label_smoothing: 0.0
    l2_weight_decay: 0.0
  learning_rate:
    schedule_name: vit/cosine
    init_lr: 0.0
    base_lr: 0.06
    end_learning_rate: 0
    warmup_steps: 500
  optimizer:
    class_name: SGD
    config:
      momentum: 0.9
      global_clipnorm: 1.0
    moving_average_decay: 0.0
  metrics:
    metrics_list:
    - class_name: TopKCategoricalAccuracy
      config:
        k: 1
        name: top1_acc
    - class_name: TopKCategoricalAccuracy
      config:
        k: 5
        name: top5_acc
    - class_name: CategoricalAccuracy
  global_batch_size: 512
  local_batch_size: null
  epochs: 8
runtime:
  strategy: gpu
  use_mixed_precision: true
experiment:
  mode: eval
  debug: false
  save_dir: ${hydra:run.dir}
  comment: ???

INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2', '/job:localhost/replica:0/task:0/device:GPU:3')
[2023-12-19 22:23:15,087][tensorflow][INFO] - Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2', '/job:localhost/replica:0/task:0/device:GPU:3')
INFO:tensorflow:Mixed precision compatibility check (mixed_float16): OK
Your GPUs will likely run quickly with dtype policy mixed_float16 as they all have compute capability of at least 7.0
[2023-12-19 22:23:15,090][tensorflow][INFO] - Mixed precision compatibility check (mixed_float16): OK
Your GPUs will likely run quickly with dtype policy mixed_float16 as they all have compute capability of at least 7.0
[2023-12-19 22:23:15,091][__main__][INFO] - strategy: <tensorflow.python.distribute.mirrored_strategy.MirroredStrategy object at 0x7efe2a178310>
[2023-12-19 22:23:15,092][__main__][INFO] - num_workers: 4
[2023-12-19 22:23:15,092][__main__][INFO] - local_batch_size: 128, global_batch_size: 512
[2023-12-19 22:23:15,092][root][INFO] - evaluate checkpoint: ./outputs/checkpoint
[2023-12-19 22:23:15,093][__main__][INFO] - Build dataset (is_training=False)
[2023-12-19 22:23:15,093][__main__][INFO] -    [{'tfds_name': 'imagenet2012:5.0.0', 'tfds_data_dir': '/mnt/disk202208/common-data/ImageNet/ILSVRC2012_img_val/', 'tfds_split': 'validation'}]
[2023-12-19 22:23:15,093][root][INFO] - use TFDS: imagenet2012:5.0.0[validation]
[2023-12-19 22:23:15,636][absl][INFO] - Load pre-computed DatasetInfo (eg: splits, num examples,...) from GCS: imagenet2012/5.0.0
[2023-12-19 22:23:16,232][absl][INFO] - Load dataset info from /tmp/tmp8_aju2t8tfds
[2023-12-19 22:23:16,237][absl][INFO] - Field info.description from disk and from code do not match. Keeping the one from code.
[2023-12-19 22:23:16,238][absl][INFO] - Field info.release_notes from disk and from code do not match. Keeping the one from code.
[2023-12-19 22:23:16,238][absl][INFO] - Field info.citation from disk and from code do not match. Keeping the one from code.
[2023-12-19 22:23:16,238][absl][INFO] - Field info.splits from disk and from code do not match. Keeping the one from code.
[2023-12-19 22:23:16,238][absl][INFO] - Field info.supervised_keys from disk and from code do not match. Keeping the one from code.
[2023-12-19 22:23:16,238][absl][INFO] - Field info.module_name from disk and from code do not match. Keeping the one from code.
[2023-12-19 22:23:16,239][root][INFO] - stacking dataset imagenet2012:5.0.0[validation] -> updated info: {'num_examples': 50000, 'num_shards': 64, 'num_classes': 1000}
[2023-12-19 22:23:16,575][__main__][INFO] - Build backbone (name=vit-l/16)
Model: "vision_transformer"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 pos_drop (Dropout)          multiple                  0         
                                                                 
 embedding (Conv2D)          multiple                  787456    
                                                                 
 encoderblock_0 (Transformer  multiple                 12596224  
 Block)                                                          
                                                                 
 encoderblock_1 (Transformer  multiple                 12596224  
 Block)                                                          
                                                                 
 encoderblock_2 (Transformer  multiple                 12596224  
 Block)                                                          
                                                                 
 encoderblock_3 (Transformer  multiple                 12596224  
 Block)                                                          
                                                                 
 encoderblock_4 (Transformer  multiple                 12596224  
 Block)                                                          
                                                                 
 encoderblock_5 (Transformer  multiple                 12596224  
 Block)                                                          
                                                                 
 encoderblock_6 (Transformer  multiple                 12596224  
 Block)                                                          
                                                                 
 encoderblock_7 (Transformer  multiple                 12596224  
 Block)                                                          
                                                                 
 encoderblock_8 (Transformer  multiple                 12596224  
 Block)                                                          
                                                                 
 encoderblock_9 (Transformer  multiple                 12596224  
 Block)                                                          
                                                                 
 encoderblock_10 (Transforme  multiple                 12596224  
 rBlock)                                                         
                                                                 
 encoderblock_11 (Transforme  multiple                 12596224  
 rBlock)                                                         
                                                                 
 encoderblock_12 (Transforme  multiple                 12596224  
 rBlock)                                                         
                                                                 
 encoderblock_13 (Transforme  multiple                 12596224  
 rBlock)                                                         
                                                                 
 encoderblock_14 (Transforme  multiple                 12596224  
 rBlock)                                                         
                                                                 
 encoderblock_15 (Transforme  multiple                 12596224  
 rBlock)                                                         
                                                                 
 encoderblock_16 (Transforme  multiple                 12596224  
 rBlock)                                                         
                                                                 
 encoderblock_17 (Transforme  multiple                 12596224  
 rBlock)                                                         
                                                                 
 encoderblock_18 (Transforme  multiple                 12596224  
 rBlock)                                                         
                                                                 
 encoderblock_19 (Transforme  multiple                 12596224  
 rBlock)                                                         
                                                                 
 encoderblock_20 (Transforme  multiple                 12596224  
 rBlock)                                                         
                                                                 
 encoderblock_21 (Transforme  multiple                 12596224  
 rBlock)                                                         
                                                                 
 encoderblock_22 (Transforme  multiple                 12596224  
 rBlock)                                                         
                                                                 
 encoderblock_23 (Transforme  multiple                 12596224  
 rBlock)                                                         
                                                                 
 encoder_nrom (LayerNormaliz  multiple                 2048      
 ation)                                                          
                                                                 
 extract_token (Lambda)      multiple                  0         
                                                                 
 pre_logits (Identity)       multiple                  0         
                                                                 
=================================================================
Total params: 303,690,752
Trainable params: 303,690,752
Non-trainable params: 0
_________________________________________________________________
[2023-12-19 22:23:23,693][__main__][INFO] - Compile the model...
[2023-12-19 22:23:23,694][__main__][INFO] - optimizer: <class 'keras.optimizers.optimizer_v2.gradient_descent.SGD'>
[2023-12-19 22:23:23,694][__main__][INFO] -     name: SGD
[2023-12-19 22:23:23,694][__main__][INFO] -     global_clipnorm: 1.0
[2023-12-19 22:23:23,694][__main__][INFO] -     learning_rate: 0.01
[2023-12-19 22:23:23,694][__main__][INFO] -     decay: 0.0
[2023-12-19 22:23:23,694][__main__][INFO] -     momentum: 0.9
[2023-12-19 22:23:23,694][__main__][INFO] -     nesterov: False
[2023-12-19 22:23:23,694][__main__][INFO] - Build loss: <class 'keras.losses.CategoricalCrossentropy'>
[2023-12-19 22:23:23,694][__main__][INFO] - Build metrics...
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[2023-12-19 22:23:23,700][tensorflow][INFO] - Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[2023-12-19 22:23:23,705][tensorflow][INFO] - Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[2023-12-19 22:23:23,709][tensorflow][INFO] - Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[2023-12-19 22:23:23,710][tensorflow][INFO] - Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[2023-12-19 22:23:23,715][tensorflow][INFO] - Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[2023-12-19 22:23:23,716][tensorflow][INFO] - Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[2023-12-19 22:23:23,720][tensorflow][INFO] - Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[2023-12-19 22:23:23,721][tensorflow][INFO] - Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[2023-12-19 22:23:23,725][tensorflow][INFO] - Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[2023-12-19 22:23:23,726][tensorflow][INFO] - Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[2023-12-19 22:23:23,736][__main__][INFO] - Build callbacks...
Error executing job with overrides: ['trainer=vit_l16_i1k_downstream', 'experiment.debug=false', 'experiment.mode=eval']
Traceback (most recent call last):
  File "/home/masaru-sasaki/.pyenv/versions/mambaforge-22.9.0-3/lib/python3.10/site-packages/tensorflow/python/training/py_checkpoint_reader.py", line 92, in NewCheckpointReader
    return CheckpointReader(compat.as_bytes(filepattern))
RuntimeError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for ./outputs/checkpoint

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/masaru-sasaki/.pyenv/versions/mambaforge-22.9.0-3/lib/python3.10/site-packages/tensorflow/python/checkpoint/checkpoint.py", line 2563, in restore
    status = self.read(save_path, options=options)
  File "/home/masaru-sasaki/.pyenv/versions/mambaforge-22.9.0-3/lib/python3.10/site-packages/tensorflow/python/checkpoint/checkpoint.py", line 2441, in read
    result = self._saver.restore(save_path=save_path, options=options)
  File "/home/masaru-sasaki/.pyenv/versions/mambaforge-22.9.0-3/lib/python3.10/site-packages/tensorflow/python/checkpoint/checkpoint.py", line 1448, in restore
    reader = py_checkpoint_reader.NewCheckpointReader(save_path)
  File "/home/masaru-sasaki/.pyenv/versions/mambaforge-22.9.0-3/lib/python3.10/site-packages/tensorflow/python/training/py_checkpoint_reader.py", line 96, in NewCheckpointReader
    error_translator(e)
  File "/home/masaru-sasaki/.pyenv/versions/mambaforge-22.9.0-3/lib/python3.10/site-packages/tensorflow/python/training/py_checkpoint_reader.py", line 31, in error_translator
    raise errors_impl.NotFoundError(None, None, error_message)
tensorflow.python.framework.errors_impl.NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for ./outputs/checkpoint

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/masaru-sasaki/work_space/coyo-vit/trainer.py", line 340, in train_main
    trainer.eval(config.experiment.save_dir)
  File "/home/masaru-sasaki/work_space/coyo-vit/trainer.py", line 311, in eval
    checkpoint.restore(ckpt)
  File "/home/masaru-sasaki/.pyenv/versions/mambaforge-22.9.0-3/lib/python3.10/site-packages/tensorflow/python/checkpoint/checkpoint.py", line 2567, in restore
    raise errors_impl.NotFoundError(
tensorflow.python.framework.errors_impl.NotFoundError: Error when restoring from checkpoint or SavedModel at ./outputs/checkpoint: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for ./outputs/checkpoint
Please double-check that the path is correct. You may be missing the checkpoint suffix (e.g. the '-1' in 'path/to/ckpt-1').

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
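
The final error message hints at one possible cause: TensorFlow restores from a checkpoint prefix (for example `path/to/ckpt-1`, stored on disk as `ckpt-1.index` plus `ckpt-1.data-*` shards), not from the directory that contains those files. The following standard-library sketch lists candidate prefixes in a directory. The helper name and directory layout are assumptions, and whether `experiment.save_dir` expects the prefix or the directory was not confirmed in this thread.

```python
from pathlib import Path


def checkpoint_prefixes(directory: str) -> list:
    """Return checkpoint prefixes found in `directory`.

    A TensorFlow checkpoint named `ckpt-1` is stored as `ckpt-1.index`
    together with one or more `ckpt-1.data-XXXXX-of-YYYYY` shards, so every
    `*.index` file with at least one matching shard yields one prefix.
    """
    root = Path(directory)
    prefixes = []
    for index_file in sorted(root.glob("*.index")):
        prefix = index_file.with_suffix("")  # strip ".index"
        if any(root.glob(prefix.name + ".data-*")):
            prefixes.append(str(prefix))
    return prefixes


if __name__ == "__main__":
    print(checkpoint_prefixes("./outputs/checkpoint"))
```

If this prints an empty list, the directory holds no complete checkpoint; otherwise one of the returned prefixes is the path to try.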

OverflowError: Python int too large to convert to C long

When trying to train the model following the fine-tuning instruction, I got this error:
"{path_to_my_local_folder}\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow_datasets\vision_language\wit\wit.py", line 25, in
csv.field_size_limit(sys.maxsize)
OverflowError: Python int too large to convert to C long
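
The path in the traceback suggests a Windows machine, where a C long is 32 bits even under 64-bit Python, so `sys.maxsize` does not fit when `csv.field_size_limit` converts it. A common workaround pattern is to lower the value until it is accepted, as sketched below. The failing line lives inside the installed tensorflow-datasets package, so applying this there is an assumption about what would help; it was not confirmed in this thread.

```python
import csv
import sys


def raise_csv_field_limit(start: int = sys.maxsize) -> int:
    """Set the largest CSV field size limit the platform accepts.

    `csv.field_size_limit` stores the value in a C long, so a value that is
    too large raises OverflowError; halve it until it fits.
    """
    limit = start
    while True:
        try:
            csv.field_size_limit(limit)
            return limit
        except OverflowError:
            limit //= 2  # halve until the value fits in a C long


if __name__ == "__main__":
    print(raise_csv_field_limit())
```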
