/usr/local/lib/python3.8/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:2224: UserWarning: Failed to clone() tensor with name _fsdp_wrapped_module._fpw_module.model.layers.30.self_attn.q_proj.weight. This may mean that this state_dict entry could point to invalid memory regions after returning from state_dict() call if this parameter is managed by FSDP. Please check clone implementation of _fsdp_wrapped_module._fpw_module.model.layers.30.self_attn.q_proj.weight. Error: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 6; 39.42 GiB total capacity; 36.01 GiB already allocated; 16.31 MiB free; 37.66 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
warnings.warn(
[The same UserWarning repeats for GPUs 3, 2, 0, and 4; only the reported memory figures differ.]
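The warnings above show every rank hitting CUDA OOM while FSDP tried to clone() the gathered full parameters during the state_dict() call at checkpoint save time, so the shards that were written are suspect. The lowest-risk mitigation is the allocator hint quoted in the warning text itself (a sketch, not a guaranteed fix; the 64 MiB value mirrors the failed allocation and may need tuning):

```shell
# Allocator hint from the warning: cap the split size to reduce
# fragmentation, set before launching the training/saving script.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:64
```

A more direct fix (an assumption about the FSDP API in this torch build) is to gather the full state dict with `FullStateDictConfig(offload_to_cpu=True, rank0_only=True)` under `FSDP.state_dict_type(...)`, so the clone happens in host memory instead of on each already-full GPU.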
File /usr/local/lib/python3.8/dist-packages/transformers/modeling_utils.py:415, in load_state_dict(checkpoint_file)
414 try:
--> 415 return torch.load(checkpoint_file, map_location="cpu")
416 except Exception as e:
File /usr/local/lib/python3.8/dist-packages/torch/serialization.py:789, in load(f, map_location, pickle_module, weights_only, **pickle_load_args)
788 raise pickle.UnpicklingError(UNSAFE_MESSAGE + str(e)) from None
--> 789 return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
790 if weights_only:
File /usr/local/lib/python3.8/dist-packages/torch/serialization.py:1131, in _load(zip_file, map_location, pickle_module, pickle_file, **pickle_load_args)
1130 unpickler.persistent_load = persistent_load
-> 1131 result = unpickler.load()
1133 torch._utils._validate_loaded_sparse_tensors()
File /usr/local/lib/python3.8/dist-packages/torch/_utils.py:153, in _rebuild_tensor_v2(storage, storage_offset, size, stride, requires_grad, backward_hooks)
150 def _rebuild_tensor_v2(
151 storage, storage_offset, size, stride, requires_grad, backward_hooks
152 ):
--> 153 tensor = _rebuild_tensor(storage, storage_offset, size, stride)
154 tensor.requires_grad = requires_grad
File /usr/local/lib/python3.8/dist-packages/torch/_utils.py:147, in _rebuild_tensor(storage, storage_offset, size, stride)
146 t = torch.tensor([], dtype=storage.dtype, device=storage.untyped().device)
--> 147 return t.set_(storage.untyped(), storage_offset, size, stride)
RuntimeError: Trying to resize storage that is not resizable
During handling of the above exception, another exception occurred:
UnicodeDecodeError Traceback (most recent call last)
File /usr/local/lib/python3.8/dist-packages/transformers/modeling_utils.py:419, in load_state_dict(checkpoint_file)
418 with open(checkpoint_file) as f:
--> 419 if f.read(7) == "version":
420 raise OSError(
421 "You seem to have cloned a repository without having git-lfs installed. Please install "
422 "git-lfs and run `git lfs install` followed by `git lfs pull` in the folder "
423 "you cloned."
424 )
File /usr/lib/python3.8/codecs.py:322, in BufferedIncrementalDecoder.decode(self, input, final)
321 data = self.buffer + input
--> 322 (result, consumed) = self._buffer_decode(data, self.errors, final)
323 # keep undecoded input until the next call
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 128: invalid start byte
During handling of the above exception, another exception occurred:
OSError Traceback (most recent call last)
Cell In[1], line 32
28 model_config.pad_token_id = tokenizer.pad_token_id
29 #model = OpenLlamaForCausalLM(model_config).cuda()
---> 32 model= OpenLlamaForCausalLM.from_pretrained(model_name, from_tf= False).cuda()
File /usr/local/lib/python3.8/dist-packages/transformers/modeling_utils.py:2647, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
2637 if dtype_orig is not None:
2638 torch.set_default_dtype(dtype_orig)
2640 (
2641 model,
2642 missing_keys,
2643 unexpected_keys,
2644 mismatched_keys,
2645 offload_index,
2646 error_msgs,
-> 2647 ) = cls._load_pretrained_model(
2648 model,
2649 state_dict,
2650 loaded_state_dict_keys, # XXX: rename?
2651 resolved_archive_file,
2652 pretrained_model_name_or_path,
2653 ignore_mismatched_sizes=ignore_mismatched_sizes,
2654 sharded_metadata=sharded_metadata,
2655 _fast_init=_fast_init,
2656 low_cpu_mem_usage=low_cpu_mem_usage,
2657 device_map=device_map,
2658 offload_folder=offload_folder,
2659 offload_state_dict=offload_state_dict,
2660 dtype=torch_dtype,
2661 load_in_8bit=load_in_8bit,
2662 keep_in_fp32_modules=keep_in_fp32_modules,
2663 )
2665 model.is_loaded_in_8bit = load_in_8bit
2667 # make sure token embedding weights are still tied if needed
File /usr/local/lib/python3.8/dist-packages/transformers/modeling_utils.py:2956, in PreTrainedModel._load_pretrained_model(cls, model, state_dict, loaded_keys, resolved_archive_file, pretrained_model_name_or_path, ignore_mismatched_sizes, sharded_metadata, _fast_init, low_cpu_mem_usage, device_map, offload_folder, offload_state_dict, dtype, load_in_8bit, keep_in_fp32_modules)
2954 if shard_file in disk_only_shard_files:
2955 continue
-> 2956 state_dict = load_state_dict(shard_file)
2958 # Mistmatched keys contains tuples key/shape1/shape2 of weights in the checkpoint that have a shape not
2959 # matching the weights in the model.
2960 mismatched_keys += _find_mismatched_keys(
2961 state_dict,
2962 model_state_dict,
(...)
2966 ignore_mismatched_sizes,
2967 )
File /usr/local/lib/python3.8/dist-packages/transformers/modeling_utils.py:431, in load_state_dict(checkpoint_file)
426 raise ValueError(
427 f"Unable to locate the file {checkpoint_file} which is necessary to load this pretrained "
428 "model. Make sure you have saved the model properly."
429 ) from e
430 except (UnicodeDecodeError, ValueError):
--> 431 raise OSError(
432 f"Unable to load weights from pytorch checkpoint file for '{checkpoint_file}' "
433 f"at '{checkpoint_file}'. "
434 "If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True."
435 )
OSError: Unable to load weights from pytorch checkpoint file for '/data/gck/model_save_dir/hf_Open-Llama-V1_SFT_v1/checkpoint-200/pytorch_model-00001-of-00003.bin' at '/data/gck/model_save_dir/hf_Open-Llama-V1_SFT_v1/checkpoint-200/pytorch_model-00001-of-00003.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
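This final OSError means the shard on disk is not a valid torch zip archive. Given the OOM warnings during the FSDP save above, a truncated or corrupted shard is the likely cause here, not a TF checkpoint, so `from_tf=True` will not help. Before re-downloading or re-saving, a quick stdlib triage can distinguish the common cases; `diagnose_checkpoint` is a hypothetical helper, not a transformers or torch API:

```python
import zipfile


def diagnose_checkpoint(path: str) -> str:
    """Rough triage of a pytorch_model-*.bin shard (hypothetical helper).

    Returns one of: "lfs-pointer", "zip-archive", "legacy-or-corrupt".
    """
    with open(path, "rb") as f:
        head = f.read(7)
    if head == b"version":
        # A git-lfs pointer file: the real weights were never pulled.
        # This is the same probe transformers makes (`f.read(7) == "version"`).
        return "lfs-pointer"
    if zipfile.is_zipfile(path):
        # torch >= 1.6 saves a zip container; the archive may still be
        # internally truncated, but the container format is right.
        return "zip-archive"
    # Neither an LFS pointer nor a zip: legacy pickle format, or a file
    # truncated/corrupted by a failed save.
    return "legacy-or-corrupt"
```

If a shard reports "legacy-or-corrupt" (or torch.load on CPU still fails), the safest recovery is to fix the OOM during saving and write the checkpoint again, since all three shards came from the same failing FSDP save.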