Comments (12)
@starsong98 May I ask how you resolved it?
from diffusion-gan.
@aaaapineapple Hello, have you solved the problem?
Hi starsong98, I just tried this on my end. The checkpoint works well and can be resumed for further training. Have you made sure that your CelebA dataset is at 64x64 resolution?
Here is the command that I used:
python train.py --outdir=training-runs --data=/home/zdwang/datasets/celeba64.zip --gpus=2 --cfg stl10 --kimg 100000 --ts_dist uniform --resume pretrained/diffusion-stylegan2-celeba64.pkl
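For anyone wanting to double-check the dataset resolution asked about above, here is a minimal sketch that reads image sizes straight from a dataset zip. It assumes the zip stores its images as PNG files, which may not hold for every dataset. The helper names `png_size` and `zip_png_resolutions` are made up for this example and are not part of the repository.

```python
import struct
import zipfile

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(header: bytes):
    """Return (width, height) parsed from the first 24 bytes of a PNG file."""
    # Bytes 12..16 hold the chunk type of the first chunk, which must be IHDR.
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Width and height are two big-endian 32-bit unsigned integers.
    return struct.unpack(">II", header[16:24])


def zip_png_resolutions(zip_path):
    """Collect the distinct (width, height) pairs of all PNG members in a zip."""
    sizes = set()
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if name.lower().endswith(".png"):
                with zf.open(name) as f:
                    sizes.add(png_size(f.read(24)))
    return sizes
```

Under these assumptions, a correctly prepared 64x64 dataset should yield a set containing only the pair (64, 64). More than one entry would mean the zip mixes resolutions.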
I checked just now, it is indeed 64 x 64 resolution.
I think the model architecture just does not match, but I could be wrong.
Thank you! Your command did the trick. Looks like it was some of the other arguments instead that were interfering.
But now the "Evaluating metrics..." part is taking really long for some reason. Could it be because I do not have Tensorboard installed in that environment?
Hmm, TensorBoard should not affect the evaluation time. The PyTorch version can sometimes matter. Usually the first evaluation takes longer because the evaluation stage has to be set up, but later ones will be faster.
I seem to have gotten a sort of time out error message:
Traceback (most recent call last):
  File "train.py", line 533, in <module>
    main() # pylint: disable=no-value-for-parameter
  File "anaconda3/envs/diffusiongan2/lib/python3.7/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "anaconda3/envs/diffusiongan2/lib/python3.7/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "anaconda3/envs/diffusiongan2/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "anaconda3/envs/diffusiongan2/lib/python3.7/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "anaconda3/envs/diffusiongan2/lib/python3.7/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "train.py", line 528, in main
    torch.multiprocessing.spawn(fn=subprocess_fn, args=(args, temp_dir), nprocs=args.num_gpus)
  File "anaconda3/envs/diffusiongan2/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "anaconda3/envs/diffusiongan2/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
    while not context.join():
  File "anaconda3/envs/diffusiongan2/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 118, in join
    raise Exception(msg)
Exception:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "anaconda3/envs/diffusiongan2/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "Documents/ai504/project1/Diffusion-GAN-main/diffusion-stylegan2/train.py", line 359, in subprocess_fn
    training_loop.training_loop(rank=rank, **args)
  File "Documents/ai504/project1/Diffusion-GAN-main/diffusion-stylegan2/training/training_loop.py", line 437, in training_loop
    value = phase.start_event.elapsed_time(phase.end_event)
  File "anaconda3/envs/diffusiongan2/lib/python3.7/site-packages/torch/cuda/streams.py", line 177, in elapsed_time
    return super(Event, self).elapsed_time(end_event)
RuntimeError: Both events must be recorded before calculating elapsed time.
Does this mean that it took too long to compute the metrics?
No, this is because your --kimg is smaller than the number of kimg the checkpoint has already been trained for. You should be able to see how many images have been trained from the first line of output after training starts. --kimg is the total number of images (in thousands) to train on, not the number of additional images.
The current resume logic restores the number of images already trained, but you can modify the code to make it forget that count. Then --kimg will be the number of images to train further.
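To make the arithmetic concrete, here is a minimal sketch. It assumes, as described above, that --kimg is a total budget and that a resumed run starts counting from the checkpoint's image count. The function names and example numbers are hypothetical and do not come from the repository.

```python
def remaining_kimg(total_kimg: float, trained_kimg: float) -> float:
    """Thousands of images still to be trained, given a total budget.

    Assumption: training stops once the running count reaches total_kimg,
    and a resumed run starts counting at trained_kimg.
    """
    return max(total_kimg - trained_kimg, 0.0)


def total_kimg_for(extra_kimg: float, trained_kimg: float) -> float:
    """Value to pass as --kimg in order to train extra_kimg beyond the checkpoint."""
    return trained_kimg + extra_kimg


# Hypothetical example: a checkpoint that has already seen 25000 kimg.
trained = 25000

# A total budget below the checkpoint's count leaves nothing to train.
print(remaining_kimg(total_kimg=10000, trained_kimg=trained))

# To train 5000 kimg more, the total budget must include what is already done.
print(total_kimg_for(extra_kimg=5000, trained_kimg=trained))
```

Under this reading, a resumed run only has work to do when --kimg exceeds the count printed on the first line of output after training starts.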
I see. Thank you.
> No, this is because your --kimg is smaller than the number of kimg the checkpoint has already been trained for. You should be able to see how many images have been trained from the first line of output after training starts. --kimg is the total number of images to train on. The current resume logic restores the number of images already trained, but you can modify the code to make it forget that count. Then --kimg will be the number of images to train further.

Hello! Should I change the value of --kimg, or do I need to do some other processing?
@menyouhua1 Hello, I ran into the same problem. Have you solved it?
@aaaapineapple Hello, I ran into the same problem. Have you solved it?