Comments (5)
Hi, thanks for checking out our code!
What you describe is most likely triggered by another error that occurs during the initialization of the script. Please check the full stack trace and if that doesn't help, post it here.
from net2net.
Thanks for your reply.
These are the errors I got.
Is this a CUDA-library-related error?
I'm currently using a GeForce RTX 2080 with CUDA 10.2.
2021-01-08 16:07:22.553627: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-01-08 16:07:22.553653: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Note: Conditioning network uses batch-normalization. Make sure to train with a sufficiently large batch size
Missing keys in state-dict: ['encoder.resnet.1.num_batches_tracked', 'encoder.resnet.4.0.bn1.num_batches_tracked', 'encoder.resnet.4.0.bn2.num_batches_tracked', 'encoder.resnet.4.0.bn3.num_batches_tracked', 'encoder.resnet.4.0.downsample.1.num_batches_tracked', 'encoder.resnet.4.1.bn1.num_batches_tracked', 'encoder.resnet.4.1.bn2.num_batches_tracked', 'encoder.resnet.4.1.bn3.num_batches_tracked', 'encoder.resnet.4.2.bn1.num_batches_tracked', 'encoder.resnet.4.2.bn2.num_batches_tracked', 'encoder.resnet.4.2.bn3.num_batches_tracked', 'encoder.resnet.5.0.bn1.num_batches_tracked', 'encoder.resnet.5.0.bn2.num_batches_tracked', 'encoder.resnet.5.0.bn3.num_batches_tracked', 'encoder.resnet.5.0.downsample.1.num_batches_tracked', 'encoder.resnet.5.1.bn1.num_batches_tracked', 'encoder.resnet.5.1.bn2.num_batches_tracked', 'encoder.resnet.5.1.bn3.num_batches_tracked', 'encoder.resnet.5.2.bn1.num_batches_tracked', 'encoder.resnet.5.2.bn2.num_batches_tracked', 'encoder.resnet.5.2.bn3.num_batches_tracked', 'encoder.resnet.5.3.bn1.num_batches_tracked', 'encoder.resnet.5.3.bn2.num_batches_tracked', 'encoder.resnet.5.3.bn3.num_batches_tracked', 'encoder.resnet.6.0.bn1.num_batches_tracked', 'encoder.resnet.6.0.bn2.num_batches_tracked', 'encoder.resnet.6.0.bn3.num_batches_tracked', 'encoder.resnet.6.0.downsample.1.num_batches_tracked', 'encoder.resnet.6.1.bn1.num_batches_tracked', 'encoder.resnet.6.1.bn2.num_batches_tracked', 'encoder.resnet.6.1.bn3.num_batches_tracked', 'encoder.resnet.6.2.bn1.num_batches_tracked', 'encoder.resnet.6.2.bn2.num_batches_tracked', 'encoder.resnet.6.2.bn3.num_batches_tracked', 'encoder.resnet.6.3.bn1.num_batches_tracked', 'encoder.resnet.6.3.bn2.num_batches_tracked', 'encoder.resnet.6.3.bn3.num_batches_tracked', 'encoder.resnet.6.4.bn1.num_batches_tracked', 'encoder.resnet.6.4.bn2.num_batches_tracked', 'encoder.resnet.6.4.bn3.num_batches_tracked', 'encoder.resnet.6.5.bn1.num_batches_tracked', 'encoder.resnet.6.5.bn2.num_batches_tracked', 
'encoder.resnet.6.5.bn3.num_batches_tracked', 'encoder.resnet.6.6.bn1.num_batches_tracked', 'encoder.resnet.6.6.bn2.num_batches_tracked', 'encoder.resnet.6.6.bn3.num_batches_tracked', 'encoder.resnet.6.7.bn1.num_batches_tracked', 'encoder.resnet.6.7.bn2.num_batches_tracked', 'encoder.resnet.6.7.bn3.num_batches_tracked', 'encoder.resnet.6.8.bn1.num_batches_tracked', 'encoder.resnet.6.8.bn2.num_batches_tracked', 'encoder.resnet.6.8.bn3.num_batches_tracked', 'encoder.resnet.6.9.bn1.num_batches_tracked', 'encoder.resnet.6.9.bn2.num_batches_tracked', 'encoder.resnet.6.9.bn3.num_batches_tracked', 'encoder.resnet.6.10.bn1.num_batches_tracked', 'encoder.resnet.6.10.bn2.num_batches_tracked', 'encoder.resnet.6.10.bn3.num_batches_tracked', 'encoder.resnet.6.11.bn1.num_batches_tracked', 'encoder.resnet.6.11.bn2.num_batches_tracked', 'encoder.resnet.6.11.bn3.num_batches_tracked', 'encoder.resnet.6.12.bn1.num_batches_tracked', 'encoder.resnet.6.12.bn2.num_batches_tracked', 'encoder.resnet.6.12.bn3.num_batches_tracked', 'encoder.resnet.6.13.bn1.num_batches_tracked', 'encoder.resnet.6.13.bn2.num_batches_tracked', 'encoder.resnet.6.13.bn3.num_batches_tracked', 'encoder.resnet.6.14.bn1.num_batches_tracked', 'encoder.resnet.6.14.bn2.num_batches_tracked', 'encoder.resnet.6.14.bn3.num_batches_tracked', 'encoder.resnet.6.15.bn1.num_batches_tracked', 'encoder.resnet.6.15.bn2.num_batches_tracked', 'encoder.resnet.6.15.bn3.num_batches_tracked', 'encoder.resnet.6.16.bn1.num_batches_tracked', 'encoder.resnet.6.16.bn2.num_batches_tracked', 'encoder.resnet.6.16.bn3.num_batches_tracked', 'encoder.resnet.6.17.bn1.num_batches_tracked', 'encoder.resnet.6.17.bn2.num_batches_tracked', 'encoder.resnet.6.17.bn3.num_batches_tracked', 'encoder.resnet.6.18.bn1.num_batches_tracked', 'encoder.resnet.6.18.bn2.num_batches_tracked', 'encoder.resnet.6.18.bn3.num_batches_tracked', 'encoder.resnet.6.19.bn1.num_batches_tracked', 'encoder.resnet.6.19.bn2.num_batches_tracked', 
'encoder.resnet.6.19.bn3.num_batches_tracked', 'encoder.resnet.6.20.bn1.num_batches_tracked', 'encoder.resnet.6.20.bn2.num_batches_tracked', 'encoder.resnet.6.20.bn3.num_batches_tracked', 'encoder.resnet.6.21.bn1.num_batches_tracked', 'encoder.resnet.6.21.bn2.num_batches_tracked', 'encoder.resnet.6.21.bn3.num_batches_tracked', 'encoder.resnet.6.22.bn1.num_batches_tracked', 'encoder.resnet.6.22.bn2.num_batches_tracked', 'encoder.resnet.6.22.bn3.num_batches_tracked', 'encoder.resnet.7.0.bn1.num_batches_tracked', 'encoder.resnet.7.0.bn2.num_batches_tracked', 'encoder.resnet.7.0.bn3.num_batches_tracked', 'encoder.resnet.7.0.downsample.1.num_batches_tracked', 'encoder.resnet.7.1.bn1.num_batches_tracked', 'encoder.resnet.7.1.bn2.num_batches_tracked', 'encoder.resnet.7.1.bn3.num_batches_tracked', 'encoder.resnet.7.2.bn1.num_batches_tracked', 'encoder.resnet.7.2.bn2.num_batches_tracked', 'encoder.resnet.7.2.bn3.num_batches_tracked']
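For context on the "Missing keys" warning above, here is a minimal sketch (plain Python, abbreviated hypothetical key list) of why it is usually benign: every reported key is a BatchNorm `num_batches_tracked` buffer, a bookkeeping counter introduced in later PyTorch versions, which `load_state_dict(..., strict=False)` skips without affecting the actual weights.

```python
# Assumption: the warning above is harmless. All missing keys are BatchNorm
# 'num_batches_tracked' buffers (running-stat counters added in PyTorch 0.4.1);
# checkpoints saved by an older PyTorch simply lack them, and loading with
# strict=False skips exactly such keys.
missing_keys = [
    "encoder.resnet.1.num_batches_tracked",
    "encoder.resnet.4.0.bn1.num_batches_tracked",
    "encoder.resnet.7.2.bn3.num_batches_tracked",  # abbreviated list
]
assert all(k.endswith("num_batches_tracked") for k in missing_keys)
```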
Traceback (most recent call last):
File "translation.py", line 455, in <module>
trainer_kwargs["checkpoint_callback"] = instantiate_from_config(modelckpt_cfg)
File "translation.py", line 110, in instantiate_from_config
return get_obj_from_str(config["target"])(**config.get("params", dict()))
File "/home/hongiee/anaconda3/envs/Net2Net/lib/python3.8/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 190, in __init__
self.__validate_init_configuration()
File "/home/hongiee/anaconda3/envs/Net2Net/lib/python3.8/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 261, in __validate_init_configuration
raise MisconfigurationException(
pytorch_lightning.utilities.exceptions.MisconfigurationException: ModelCheckpoint(save_top_k=3, monitor=None) is not a valid configuration. No quantity for top_k to track.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "translation.py", line 531, in <module>
melk()
NameError: name 'melk' is not defined
Which version of pytorch-lightning are you using? This code still uses pl==0.9 and is not compatible with lightning versions >= 1.0.
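One way to confirm which version is installed is a small stdlib check (a sketch; `installed_version` is a made-up helper name, and `importlib.metadata` needs Python 3.8+):

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

# The repo targets pl==0.9.x; versions >= 1.0 changed the ModelCheckpoint API.
print(installed_version("pytorch-lightning"))
```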
Additionally, you can try to set save_top_k = 0, see
Line 443 in 5d2fe33
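For illustration, a hedged sketch of what the checkpoint-callback config consumed by `instantiate_from_config` might look like with that change. The `target`/`params` layout matches the traceback above; everything else (the monitor key in the comment, in particular) is an assumption, not taken from the repo:

```python
# Hypothetical config dict: either give ModelCheckpoint nothing to rank
# (save_top_k=0, so no monitored quantity is needed), or pair save_top_k
# with an explicit monitor, which avoids the MisconfigurationException.
modelckpt_cfg = {
    "target": "pytorch_lightning.callbacks.ModelCheckpoint",
    "params": {
        "save_top_k": 0,  # keep no "best-k" checkpoints -> no monitor required
        # alternatively: "save_top_k": 3, "monitor": "val/loss",
    },
}
assert "monitor" not in modelckpt_cfg["params"]
```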
Thanks,
I solved the problem by changing the pytorch-lightning version,
but there was another error while running SBERT-to-AE:
Traceback (most recent call last):
File "translation.py", line 521, in <module>
trainer.fit(model, data)
File "/home/ubuntu/anaconda3/envs/net2net/lib/python3.7/site-packages/pytorch_lightning/trainer/states.py", line 48, in wrapped_fn
result = fn(self, *args, **kwargs)
File "/home/ubuntu/anaconda3/envs/net2net/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1058, in fit
results = self.accelerator_backend.spawn_ddp_children(model)
File "/home/ubuntu/anaconda3/envs/net2net/lib/python3.7/site-packages/pytorch_lightning/accelerators/ddp_backend.py", line 123, in spawn_ddp_children
results = self.ddp_train(local_rank, mp_queue=None, model=model, is_master=True)
File "/home/ubuntu/anaconda3/envs/net2net/lib/python3.7/site-packages/pytorch_lightning/accelerators/ddp_backend.py", line 224, in ddp_train
results = self.trainer.run_pretrain_routine(model)
File "/home/ubuntu/anaconda3/envs/net2net/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1224, in run_pretrain_routine
self._run_sanity_check(ref_model, model)
File "/home/ubuntu/anaconda3/envs/net2net/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1257, in _run_sanity_check
eval_results = self._evaluate(model, self.val_dataloaders, max_batches, False)
File "/home/ubuntu/anaconda3/envs/net2net/lib/python3.7/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 333, in _evaluate
output = self.evaluation_forward(model, batch, batch_idx, dataloader_idx, test_mode)
File "/home/ubuntu/anaconda3/envs/net2net/lib/python3.7/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 661, in evaluation_forward
output = model(*args)
File "/home/ubuntu/anaconda3/envs/net2net/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/anaconda3/envs/net2net/lib/python3.7/site-packages/pytorch_lightning/overrides/data_parallel.py", line 174, in forward
output = self.module.validation_step(*inputs[0], **kwargs[0])
File "/home/ubuntu/hongiee/net2net/net2net/models/flows/flow.py", line 194, in validation_step
loss, log_dict = self.shared_step(batch, batch_idx, split="val")
File "/home/ubuntu/hongiee/net2net/net2net/models/flows/flow.py", line 181, in shared_step
x = self.get_input(self.first_stage_key, batch)
File "/home/ubuntu/hongiee/net2net/net2net/models/flows/flow.py", line 174, in get_input
x = x.permute(0, 3, 1, 2).to(memory_format=torch.contiguous_format)
TypeError: to() received an invalid combination of arguments - got (memory_format=torch.memory_format, ), but expected one of:
* (torch.device device, torch.dtype dtype, bool non_blocking, bool copy)
* (torch.dtype dtype, bool non_blocking, bool copy)
* (Tensor tensor, bool non_blocking, bool copy)
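The `to(memory_format=...)` overload was only added to `torch.Tensor` in a later PyTorch release, so the call in `get_input` fails on the older torch build pulled in here. A hedged workaround (a sketch, not the repo's code; `get_input_compat` is a made-up name) is to fall back to `.contiguous()`, which produces the same default layout:

```python
import torch

def get_input_compat(x):
    # NHWC -> NCHW permute, as in net2net/models/flows/flow.py get_input
    x = x.permute(0, 3, 1, 2)
    try:
        # Newer PyTorch: Tensor.to accepts a memory_format keyword
        return x.to(memory_format=torch.contiguous_format)
    except TypeError:
        # Older builds reject the keyword (the TypeError in the traceback
        # above); .contiguous() yields the same default contiguous layout
        return x.contiguous()

x = get_input_compat(torch.randn(2, 8, 8, 3))
assert tuple(x.shape) == (2, 3, 8, 8)
```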
> Thanks, I solved the problem by changing the pytorch-lightning version, but there was another error while running SBERT-to-AE (traceback as above)
I also encountered this problem. Which version of pytorch-lightning are you using? Looking forward to your reply.