
Comments (4)

ArrowLuo commented on August 16, 2024

Hi @HenryHZY,

  1. You can test on 4 GPUs instead of 8, or double --batch_size when using 8 GPUs. Then we can discuss the results. I am not sure what is affecting the performance right now.
  2. These three log lines are redundant and do not affect pretraining, training, or inference. Just ignore them, or treat them as noise.

Thanks.
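A quick sketch of the arithmetic behind doubling --batch_size, assuming (not confirmed in this thread) that the value is a global batch split evenly across GPUs. Under that assumption, moving from 4 to 8 GPUs at the same --batch_size halves each GPU's local batch. If the retrieval loss only contrasts samples within a local batch, this also halves the in-batch negatives. The base value of 128 is hypothetical.

```python
def per_gpu_batch(global_batch: int, n_gpu: int) -> int:
    """Samples each GPU sees per step, ASSUMING --batch_size is split evenly across GPUs."""
    if global_batch % n_gpu:
        raise ValueError("batch size must be divisible by the number of GPUs")
    return global_batch // n_gpu

base = 128  # hypothetical --batch_size used on 4 GPUs
print(per_gpu_batch(base, 4))      # 32
print(per_gpu_batch(base, 8))      # 16: half the local batch
print(per_gpu_batch(2 * base, 8))  # 32: doubling restores it
```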

from univl.

HenryHZY commented on August 16, 2024

@ArrowLuo Thanks for your quick reply!
I have also already tested with 4 A100 GPUs. I will run the doubled-batch_size experiment on 8 A100 GPUs later.

retrieval, FT-Align, 4 A100 GPUs
R@1: 0.2510 - R@5: 0.5780 - R@10: 0.7010 - Median R: 4.0
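For reference, R@K is the fraction of text queries whose ground-truth video appears in the top K results, and Median R is the median 1-based rank of the ground-truth video. Below is a minimal NumPy sketch of those standard definitions. It assumes a square text-to-video similarity matrix with the matching pairs on the diagonal, and it is not necessarily UniVL's own evaluation code.

```python
import numpy as np

def retrieval_metrics(sim: np.ndarray) -> dict:
    """sim[i, j] = similarity of text i to video j; the correct video for text i is video i."""
    # For each text query, sort video indices by descending similarity.
    order = np.argsort(-sim, axis=1)
    # 0-based position of the ground-truth video in each sorted row.
    ranks = np.where(order == np.arange(sim.shape[0])[:, None])[1]
    return {
        "R@1": float(np.mean(ranks < 1)),
        "R@5": float(np.mean(ranks < 5)),
        "R@10": float(np.mean(ranks < 10)),
        "Median R": float(np.median(ranks) + 1),  # reported as a 1-based rank
    }

# Example: 3 text queries vs 3 videos; text 0 ranks its own video second.
sim = np.array([[0.1, 0.9, 0.0], [0.0, 1.0, 0.5], [0.2, 0.3, 0.8]])
print(retrieval_metrics(sim))
```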

Maybe I need to tune some hyperparameters, such as epochs, batch_size, and lr, to get better results?

Do you have any other fine-tuning tips to share?
For example, like your answer in #18: increase batch_size as much as my GPUs allow.


ArrowLuo commented on August 16, 2024

Hi @HenryHZY, yes, epochs, batch_size, and lr are important for the retrieval tasks. I cannot recall other fine-tuning details or tricks right now, since it has been a long time.
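When adjusting these together, one common heuristic is to scale lr in proportion to batch_size. This is the linear scaling rule from large-batch training; it is not stated in this thread. The sketch below enumerates candidate settings under that assumption. The base lr, base batch size, and candidate lists are hypothetical values chosen for illustration.

```python
from itertools import product

def linear_scaled_lr(base_lr: float, base_batch: int, batch: int) -> float:
    # Linear scaling heuristic (an assumption, not UniVL guidance): keep lr / batch_size constant.
    return base_lr * batch / base_batch

def candidate_configs(base_lr, base_batch, batch_sizes, epoch_options):
    """Yield hypothetical (epochs, batch_size, lr) settings to try."""
    for epochs, batch in product(epoch_options, batch_sizes):
        yield {"epochs": epochs, "batch_size": batch,
               "lr": linear_scaled_lr(base_lr, base_batch, batch)}

# Hypothetical base: lr 1e-4 at batch size 128.
for cfg in candidate_configs(1e-4, 128, [128, 256], [5, 10]):
    print(cfg)
```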


HenryHZY commented on August 16, 2024

Hi @ArrowLuo. I would like to ask: is the input to UniVL a whole video with its sentences, a single clip with one sentence, or a clip with several sentences?

Following your instructions, I extracted the video features and text features.
Given a video_id_x spanning [0, m-1] seconds, the extracted video_id_x.npy is an np.array of shape [m, 1024].

Suppose video_id_x has n video clips with n corresponding sentences (defined in caption.pickle):

"video_id_x":{
		"start":[s_1, s_2, ..., s_n],
		"end":[e_1, e_2, ..., e_n],
		"text":["t_1", "t_2", ..., "t_n"]
	}
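If the model consumes clip-sentence pairs, the annotation above could be flattened into one record per clip. Whether UniVL does this is exactly what is being asked here, so treat it as an assumption. Below is a minimal sketch with toy values; the function name is hypothetical, and this is not UniVL's dataloader.

```python
import pickle

def clip_sentence_pairs(captions: dict) -> list:
    """Flatten {video_id: {"start": [...], "end": [...], "text": [...]}} into
    (video_id, start, end, text) tuples, one per clip (hypothetical helper)."""
    pairs = []
    for video_id, ann in captions.items():
        for start, end, text in zip(ann["start"], ann["end"], ann["text"]):
            pairs.append((video_id, start, end, text))
    return pairs

# Round-trip through pickle to mimic loading caption.pickle (toy data).
toy = {"video_id_x": {"start": [0, 5], "end": [4, 9], "text": ["t_1", "t_2"]}}
captions = pickle.loads(pickle.dumps(toy))
pairs = clip_sentence_pairs(captions)
print(pairs)
```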


Then what is the shape of the raw input tokens to UniVL? Is it a single video clip and its one sentence?
Take the time interval [s_1, e_1] of the first video clip as an example:

video tokens: [e_1-s_1+1, 1024]
text tokens: [tokens_sum_of_t_1, word_token_embedding_size]

Are all of the above shapes correct, including [m, 1024], [e_1-s_1+1, 1024], and [tokens_sum_of_t_1, word_token_embedding_size]?

Thanks for your time!
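The sketch below works through the shape arithmetic in the question under two assumptions: row i of video_id_x.npy covers second i, and start/end are integer seconds with an inclusive end. The text half assumes a BERT-style encoder. In that setup the raw input is a 1-D array of token ids, and the [num_tokens, hidden_size] matrix only appears after the model's embedding lookup. The vocabulary size, hidden size (768), and token ids are toy values. The real dataloader may also pad or subsample frames to a fixed maximum, which this does not model.

```python
import numpy as np

FEATURE_DIM = 1024  # per-second video feature size stated in the thread

def clip_features(video_feats: np.ndarray, start: int, end: int) -> np.ndarray:
    """Rows for seconds start..end inclusive, ASSUMING row i covers second i."""
    return video_feats[start:end + 1]

m = 30
video_feats = np.zeros((m, FEATURE_DIM), dtype=np.float32)  # stand-in for video_id_x.npy
s_1, e_1 = 3, 10
clip = clip_features(video_feats, s_1, e_1)
print(clip.shape)  # (8, 1024), i.e. [e_1 - s_1 + 1, 1024]

# Text (toy BERT-style setup): token ids in, embeddings produced inside the model.
vocab_size, hidden_size = 100, 768  # hypothetical sizes
embedding_table = np.random.default_rng(0).standard_normal((vocab_size, hidden_size))
token_ids = np.array([5, 17, 42, 9])  # hypothetical ids for the tokens of t_1
text_embeds = embedding_table[token_ids]
print(text_embeds.shape)  # (4, 768), i.e. [num_tokens, hidden_size]
```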

