
ValueError: math domain error (openlrm issue, open)

3dtopia commented on July 22, 2024
ValueError: math domain error


Comments (5)

kunalkathare commented on July 22, 2024

Hey @hayoung-jeremy, try reducing the value of global_step_period under val: in the sample training YAML file until the error stops. That worked for me when I was training with 350 objects.
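
A rough way to pick a starting value is to estimate how many optimizer steps the whole run will take and check that global_step_period fits inside that number. The sketch below is only an estimate under stated assumptions: a single process, one full pass over the dataset per epoch, and one "global step" per optimizer update. The function names are hypothetical and are not part of OpenLRM, and this thread does not confirm that a period larger than the step count is what triggers the error.

```python
import math


def total_optimizer_steps(num_samples, batch_size, accum_steps, epochs, num_processes=1):
    """Estimate optimizer updates for a run (assumption: one dataset pass per epoch)."""
    batches_per_epoch = math.ceil(num_samples / (batch_size * num_processes))
    steps_per_epoch = math.ceil(batches_per_epoch / accum_steps)
    return steps_per_epoch * epochs


def validation_runs(total_steps, global_step_period):
    """How many times a check that fires every `global_step_period` steps would run."""
    return total_steps // global_step_period


# Illustrative numbers only: 350 objects, batch size 16, 60 epochs.
steps = total_optimizer_steps(num_samples=350, batch_size=16, accum_steps=1, epochs=60)
print(steps, validation_runs(steps, 1000), validation_runs(steps, 100))
```

If the estimated step count is smaller than the period, validation would never fire under these assumptions, so lowering the period until it fits is a reasonable first thing to try.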


hayoung-jeremy commented on July 22, 2024

Wow, you're my savior, thank you so much! I'll try it!


hayoung-jeremy commented on July 22, 2024

Thank you @kunalkathare, I tried the following config, with epochs and global_step_period modified:

...

train:
    mixed_precision: bf16
    find_unused_parameters: false
    loss:
        pixel_weight: 1.0
        perceptual_weight: 1.0
        tv_weight: 5e-4
    optim:
        lr: 4e-4
        weight_decay: 0.05
        beta1: 0.9
        beta2: 0.95
        clip_grad_norm: 1.0
    scheduler:
        type: cosine
        warmup_real_iters: 3000
    batch_size: 16 
    accum_steps: 1
    epochs: 100  # MODIFIED : 60 -> 100
    debug_global_steps: null

val:
    batch_size: 4
    global_step_period: 100 # MODIFIED : 1000 -> 100
    debug_batches: null

...

and it successfully generated a checkpoint:

[TRAIN STEP]loss=0.642, loss_pixel=0.0695, loss_perceptual=0.572, loss_tv=0.7, lr=1.35e-5: 100%|███████████████████████████████████████████████| 100/100 [03:24<00:00,  5.10s/it]

But the loss still seems too high. What should I modify to bring it down?
Should I increase epochs to 1000?
What loss values would you expect from a successfully trained checkpoint?
Could you share the values from your case?
Thank you so much for your help.


kunalkathare commented on July 22, 2024

The loss goes down as the dataset gets larger, and I guess you can increase the epochs and see whether that helps.
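
One more thing that may be worth checking alongside the epoch count is the warmup length. The posted config sets warmup_real_iters: 3000 with lr: 4e-4, while the progress bar shows the run ending at 100/100. The sketch below assumes a plain linear warmup from zero, which is an assumption about the scheduler and not something confirmed in this thread; linear_warmup_lr is a hypothetical helper, not an OpenLRM function.

```python
def linear_warmup_lr(step, base_lr, warmup_iters):
    """Learning rate under linear warmup from 0 to base_lr (assumed schedule shape)."""
    return base_lr * min(step, warmup_iters) / warmup_iters


# Values taken from the posted config: lr 4e-4, warmup_real_iters 3000.
for step in (100, 1000, 3000):
    print(step, f"{linear_warmup_lr(step, 4e-4, 3000):.3g}")
```

Under that assumption, the rate around step 100 is roughly 1.3e-5, close to the lr=1.35e-5 in the log, which would mean the run stopped long before reaching the configured 4e-4. If so, a shorter warmup or a longer run may matter as much as the raw epoch count.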


hayoung-jeremy commented on July 22, 2024

Thank you for the kind reply, @kunalkathare!

  • I don't have enough data for now. Can I just duplicate the same data to increase the amount?
  • I've also tried increasing the epochs to 1000. That generated a checkpoint too, with a loss of about 0.3.
    But the inference quality from that checkpoint is not very good, as you can see in this issue.
    So I'm going to try increasing the epochs to 10000. Is that okay?
    If so, which values should I adjust in train_sample.yaml?

You've been a really great help, many thanks for your assistance.
