Comments (7)

z-fabian commented on August 21, 2024

VarNet only works with a batch size of 1 per GPU, because the slices vary in size. To increase the mini-batch size (the number of training examples averaged per gradient update), increase the number of GPUs used for training.
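
As a rough illustration of the two points above, the sketch below uses plain Python rather than PyTorch. The slice shapes and the helper names `can_stack` and `effective_batch_size` are assumptions made for this example, not part of fastMRI. It shows that slices of different sizes cannot be combined into one rectangular batch, and that with one slice per GPU the number of examples averaged per update equals the number of GPUs.

```python
def can_stack(shapes):
    """Return True if all slices share one shape and could form a single batch array."""
    return len(set(shapes)) <= 1


def effective_batch_size(num_gpus, per_gpu_batch_size=1):
    """Number of training examples averaged per gradient update in data-parallel training."""
    return num_gpus * per_gpu_batch_size


# Illustrative (height, width) shapes for two slices; assumed values.
slice_shapes = [(640, 368), (640, 372)]

print(can_stack(slice_shapes))    # the widths differ, so the slices cannot be stacked
print(effective_batch_size(32))   # one slice per GPU on 32 GPUs
print(effective_batch_size(2))    # one slice per GPU on 2 GPUs
```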

from fastmri.

mmuckley commented on August 21, 2024

This is a good point. I believe 32 GPUs were used.

z-fabian commented on August 21, 2024

Thank you for the info. Have you experimented with how to adjust the learning rate for different batch sizes, or with how much impact it has on reconstruction quality? Intuitively, I would lower the learning rate by a factor of N when using a factor of N fewer GPUs (a smaller batch size), but this would require some trial and error to figure out. I'm interested because I want to reproduce the results from the paper, but I don't have access to 32 GPUs. Thanks again.
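
The scaling rule proposed here could be sketched as follows. The base learning rate of 1e-3 is a placeholder, not the value from the paper, and `scaled_learning_rate` is a hypothetical helper. Whether this rule actually helps reconstruction quality is the open question in this thread.

```python
def scaled_learning_rate(base_lr, reference_gpus, num_gpus):
    """Scale the learning rate linearly with the number of GPUs.

    With one slice per GPU, the GPU count equals the effective batch size,
    so N times fewer GPUs gives an N times lower learning rate.
    """
    return base_lr * num_gpus / reference_gpus


base_lr = 1e-3        # placeholder value, assumed for illustration
reference_gpus = 32   # GPU count mentioned in this thread

for num_gpus in (32, 8, 2):
    lr = scaled_learning_rate(base_lr, reference_gpus, num_gpus)
    print(f"{num_gpus:>2} GPUs -> learning rate {lr:.2e}")
```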

mmuckley commented on August 21, 2024

I don't have much experience with this myself. While testing the repository refactor, I've gotten up to about 0.914 SSIM with two GPUs, the same learning rate, and a smaller model. Your strategy of lowering the learning rate by the GPU factor seems like a reasonable one.

adefazio commented on August 21, 2024

Keeping the learning rate the same would be my recommendation, but you should consider reducing the number of layers; otherwise it's not going to train in a reasonable amount of time.

z-fabian commented on August 21, 2024

Okay, thanks for the input. I am going to experiment with model size and with whether the number of GPUs has an effect on SSIM. I'm closing this issue now.

zhan4817 commented on August 21, 2024

However, when I tried to train VarNet with batch_size = 2, I got the following error during the validation sanity check:

...
237, in get_low_frequency_lines
while mask[..., r, :]:
RuntimeError: bool value of Tensor with more than one value is ambiguous

Do you have any idea? Thanks!
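
One likely reading of this error: with a batch size of 2, `mask[..., r, :]` holds one value per sample, so it cannot be used directly as a `while` condition. A possible workaround is to find the low-frequency region for each sample separately. The sketch below does that on plain Python lists standing in for mask rows. It assumes that `get_low_frequency_lines` scans outward from the center of the mask; `low_frequency_bounds` and the example masks are hypothetical and not taken from fastMRI.

```python
def low_frequency_bounds(mask_row):
    """Return (left, right) so that mask_row[left:right] is the contiguous
    block of sampled lines around the center of a single sample's mask."""
    center = len(mask_row) // 2
    if not mask_row[center]:
        return center, center  # center line not sampled: empty block

    right = center
    while right < len(mask_row) and mask_row[right]:
        right += 1

    left = center
    while left >= 0 and mask_row[left]:
        left -= 1

    return left + 1, right


# Two samples in one batch, each with its own sampling mask (assumed values).
batch_masks = [
    [1, 0, 0, 1, 1, 1, 1, 0, 1, 0],
    [0, 1, 0, 1, 1, 1, 0, 0, 0, 1],
]

# Handle each sample on its own instead of testing the whole batch at once.
bounds = [low_frequency_bounds(row) for row in batch_masks]
print(bounds)
```

Because each sample can yield different bounds, any code that consumes them would also need to work per sample, which is consistent with the earlier advice to keep the batch size at 1 per GPU.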
