Comments (5)
Hey @hayoung-jeremy, try reducing the value of global_step_period under val: in the train sample yaml file until it stops giving the error; that worked for me when I was training with 350 objects.
from openlrm.
Wow, you're my savior, thank you so much! I'll try it!
Thank you @kunalkathare, I've tried with the following config, with epochs and global_step_period modified:
```yaml
...
train:
  mixed_precision: bf16
  find_unused_parameters: false
  loss:
    pixel_weight: 1.0
    perceptual_weight: 1.0
    tv_weight: 5e-4
  optim:
    lr: 4e-4
    weight_decay: 0.05
    beta1: 0.9
    beta2: 0.95
    clip_grad_norm: 1.0
  scheduler:
    type: cosine
    warmup_real_iters: 3000
  batch_size: 16
  accum_steps: 1
  epochs: 100 # MODIFIED : 60 -> 100
  debug_global_steps: null

val:
  batch_size: 4
  global_step_period: 100 # MODIFIED : 1000 -> 100
  debug_batches: null
...
```
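For context, here is a quick back-of-the-envelope check of how many optimizer steps these values produce and how often validation fires. The dataset size of 350 objects is taken from the earlier comment and is an assumption about this setup; a single process without gradient accumulation beyond the configured accum_steps is also assumed.

```python
import math

# Assumed dataset size (350 objects, per the earlier comment) -- adjust to yours.
num_samples = 350
batch_size = 16           # train.batch_size from the config above
accum_steps = 1           # optimizer steps == batches when accum_steps is 1
epochs = 100              # train.epochs after the modification
global_step_period = 100  # val.global_step_period after the modification

# Batches (and, with accum_steps=1, optimizer steps) per epoch.
steps_per_epoch = math.ceil(num_samples / batch_size) // accum_steps
total_steps = steps_per_epoch * epochs
num_validations = total_steps // global_step_period

print(steps_per_epoch, total_steps, num_validations)
```

With the original global_step_period of 1000, validation would only run every ~45 epochs under these assumptions, which is why lowering it helps it trigger at all on small datasets.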
and successfully generated a checkpoint, with the following training log:
[TRAIN STEP]loss=0.642, loss_pixel=0.0695, loss_perceptual=0.572, loss_tv=0.7, lr=1.35e-5: 100%|███████████████████████████████████████████████| 100/100 [03:24<00:00, 5.10s/it]
But the loss value seems too high. What should I modify to decrease it?
Should I increase epochs to 1000?
And what is the ideal loss value for a successfully trained checkpoint?
Could you share your case with me?
Thank you so much for your help.
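For what it's worth, the logged total appears consistent with a weighted sum of the components under the config weights above, with the perceptual term dominating; so a total near 0.6 mostly reflects the perceptual loss, not the pixel loss. A minimal check, assuming the logged loss is pixel_weight * loss_pixel + perceptual_weight * loss_perceptual + tv_weight * loss_tv (an assumption about how OpenLRM combines them):

```python
# Weights from the train config; component values from the logged step.
pixel_weight, perceptual_weight, tv_weight = 1.0, 1.0, 5e-4
loss_pixel, loss_perceptual, loss_tv = 0.0695, 0.572, 0.7

total = (pixel_weight * loss_pixel
         + perceptual_weight * loss_perceptual
         + tv_weight * loss_tv)
print(round(total, 3))  # matches the logged loss=0.642
```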
The loss value decreases when the dataset is larger; you could also increase the epochs and see whether it helps.
Thank you for the kind reply @kunalkathare!
- I don't have a large enough dataset for now; can I just duplicate the same data to increase its size?
- And I've tried increasing epochs to 1000; it also generated a checkpoint, with a loss value of about 0.3. But the inference result quality from that checkpoint is not good, as you can see in this issue. So I'm going to try increasing epochs to 10000, is that okay? If it is, what values should I adjust in train_sample.yaml?

Really great help from you, many thanks for your assistance.
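One thing worth checking before pushing epochs higher: with warmup_real_iters: 3000 and a small dataset, training can end before the warmup finishes, so the learning rate never reaches its peak and the cosine decay never kicks in. Assuming a linear warmup from 0 to lr over warmup_real_iters (the logged lr=1.35e-5 near step 100 is roughly consistent with 4e-4 * 100 / 3000 ≈ 1.33e-5), a sketch; the exact OpenLRM schedule implementation may differ:

```python
import math

lr = 4e-4                  # train.optim.lr
warmup_real_iters = 3000   # train.scheduler.warmup_real_iters
total_steps = 2200         # e.g. ~22 steps/epoch * 100 epochs for 350 objects (assumption)

def lr_at(step):
    # Hypothetical linear-warmup + cosine-decay schedule matching the config's
    # `type: cosine` with `warmup_real_iters`; illustrative only.
    if step < warmup_real_iters:
        return lr * step / warmup_real_iters
    progress = (step - warmup_real_iters) / max(1, total_steps - warmup_real_iters)
    return 0.5 * lr * (1 + math.cos(math.pi * min(progress, 1.0)))

print(lr_at(100))   # ~1.33e-5, near the logged 1.35e-5
print(lr_at(2200))  # still mid-warmup: the run ends before warmup completes
```

If that matches how the scheduler behaves, then with more epochs (hence more total steps) the schedule would finally get past warmup, which may matter as much as the epoch count itself.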