The eclipse from genjib

Checkpoints

Great work! Would it be possible to share your final weights?

Thank you!

About audio sampling

Hi, I noticed that in the _get_audio function (quoted below), a spectrogram is generated for each sampled frame by signal.spectrogram(samples, samplerate, nperseg=512, noverlap=353). For efficiency, would it be possible to generate one large spectrogram per video and then sample from it? As long as memory is not a problem, this would save some preprocessing cost.

Thanks for releasing this nice work!

import glob

import numpy as np
import soundfile as sf
import torch
from scipy import signal


def _get_audio(self, idx, s, e):
    # 1 would mark a valid audio frame, 0 marks padding.
    audio_mask = torch.zeros((1, self.opt.max_audio_frames), dtype=torch.long)

    # Map the video's frame folder to its audio folder.
    audio_folder = self.video_dict[idx].split('.')[:-1][0].replace('frames', self.opt.audio_pt)
    audio_folder_bk = audio_folder.replace('audio_raw', 'VGGSound_Audio_features_10s_aligned')
    self.save_path = audio_folder_bk

    total_num_pt = len(glob.glob(audio_folder + '/*.pt'))

    total_fbank = []
    # NOTE: `or True` makes this condition always true, so as written only the
    # precomputed .pt branch runs; the on-the-fly spectrogram branch is unreachable.
    if self.my_len == 4816 or True:
        # Evenly sample max_audio_frames precomputed feature files.
        sample_indx = np.linspace(0, total_num_pt - 1, num=self.opt.max_audio_frames, dtype=int)
        for tmp_idx in sample_indx:
            fbank = torch.load(f'{audio_folder}/{tmp_idx:04d}.pt', map_location=torch.device('cpu'))
            total_fbank.append(fbank)
    else:
        seg_len = int(16000 * self.opt.yb_audio_length)
        for tmp_idx in range(self.opt.max_audio_frames):
            # Loader for VGGSound: every window is cut from the same 0000.wav.
            wav_path = audio_folder + '/0000.wav'
            try:
                samples, samplerate = sf.read(wav_path)

                if samples.shape[0] > 16000 * (self.opt.yb_audio_length + 0.1):
                    sample_indx = np.linspace(0, samples.shape[0] - 16000 * (self.opt.yb_audio_length + 0.1),
                                              num=self.opt.max_audio_frames, dtype=int)
                    samples = samples[sample_indx[tmp_idx]:sample_indx[tmp_idx] + seg_len]
                else:
                    # Repeat the clip in case the audio is too short.
                    samples = np.tile(samples, int(self.opt.yb_audio_length))[:seg_len]

                samples = np.clip(samples, -1.0, 1.0)

                _, _, spectrogram = signal.spectrogram(samples, samplerate, nperseg=512, noverlap=353)
                spectrogram = np.log(spectrogram + 1e-7)

                # Standardize each window's log-spectrogram independently.
                mean = np.mean(spectrogram)
                std = np.std(spectrogram)
                spectrogram = np.divide(spectrogram - mean, std + 1e-9)

                total_fbank.append(torch.tensor(spectrogram).unsqueeze(0).float())
            except Exception:
                print('Too short: ' + wav_path)
                continue

    total_fbank = torch.vstack(total_fbank)
    # NOTE: audio_mask is returned unchanged (all zeros); earlier commented-out
    # code filled it with audio_mask[0, :total_fbank.size(0)] = 1.
    return total_fbank, audio_mask
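A minimal sketch of the one-spectrogram-per-video idea, assuming a mono clip already loaded as a NumPy array (`sampled_spectrograms` is a hypothetical helper, not part of this repository). It computes one spectrogram for the whole clip with the same `nperseg=512, noverlap=353` settings, then slices evenly spaced windows of STFT columns and standardizes each slice on its own, mirroring the per-window normalization in the loop above.

```python
import numpy as np
from scipy import signal


def sampled_spectrograms(samples, samplerate, num_windows, win_len,
                         nperseg=512, noverlap=353):
    """Hypothetical helper: one STFT for the whole clip, then num_windows
    evenly spaced windows of win_len samples cut out of it."""
    hop = nperseg - noverlap  # 159 samples between consecutive STFT columns
    samples = np.clip(samples, -1.0, 1.0)
    # Single spectrogram over the entire clip instead of one per window.
    _, _, full = signal.spectrogram(samples, samplerate,
                                    nperseg=nperseg, noverlap=noverlap)
    # Column count a standalone spectrogram of win_len samples would have.
    cols = (win_len - noverlap) // hop
    if cols < 1 or full.shape[1] < cols:
        raise ValueError("clip or window is shorter than one STFT window")
    # Evenly spaced start columns; start column c corresponds to sample c * hop.
    starts = np.linspace(0, full.shape[1] - cols, num=num_windows, dtype=int)
    out = []
    for c in starts:
        spec = np.log(full[:, c:c + cols] + 1e-7)
        # Per-window standardization, as in the original loop.
        spec = (spec - spec.mean()) / (spec.std() + 1e-9)
        out.append(spec)
    return np.stack(out)  # shape: (num_windows, freq_bins, cols)
```

Because `signal.spectrogram` does not pad the input and detrends, windows and transforms each segment independently, a window whose start is a multiple of the hop (512 - 353 = 159 samples) gives the same columns as a separate per-window call. The trade-off is that window starts snap to that 159-sample grid instead of arbitrary sample offsets. Reading `0000.wav` once, rather than once per window as the loop above does, is a separate saving.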

Video-to-video retrieval?

Hi authors!

Thanks for the great work! I saw that this paper is evaluated on many kinds of video-to-text datasets. The CLIP model itself works pretty well for image-to-image retrieval, even though it is trained on image-text pairs. Would CLIP4Clip similarly work for video-to-video retrieval?
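One simple way to try video-to-video retrieval, assuming per-frame embeddings from a CLIP-style image encoder are already available (the encoder is not shown), is to mean-pool each video's frame embeddings into one unit-length vector and rank a gallery by cosine similarity. Mean pooling and the helper names `video_embedding` and `rank_videos` are assumptions for illustration, not necessarily how this repository aggregates frames.

```python
import numpy as np


def video_embedding(frame_embeddings):
    """Mean-pool per-frame embeddings into one L2-normalized video vector."""
    v = np.asarray(frame_embeddings, dtype=np.float64).mean(axis=0)
    return v / np.linalg.norm(v)


def rank_videos(query_frames, gallery):
    """Return gallery indices sorted best-first by cosine similarity, plus the scores."""
    q = video_embedding(query_frames)
    g = np.stack([video_embedding(frames) for frames in gallery])
    sims = g @ q  # cosine similarity, since every vector is unit-length
    return np.argsort(-sims), sims
```

Whether this works well in practice depends on how closely the video embeddings cluster by content rather than by caption style; that would need to be checked empirically.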

Will the pretrained weights be released?

Thank you for this great work!
I'd like to ask whether the authors are planning to release the pretrained weights.

Very impressive work!

Hi, this is the most exciting paper I've seen recently. Thank you for releasing the code. :)

Hello, may I ask whether any pre-trained parameters have not been released? Why is R@1 only 0.2 when I run the model on the MSR-VTT data? Is the cross_pytorch_model.bin file not released?
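For context on that number: R@1 is the percentage of queries whose correct match is ranked first. With N candidates, a random ranking puts the correct one first with probability 1/N, so chance-level R@1 on a 1,000-video test set is 0.1%; if the 0.2 above is a percentage, it is close to that level. A minimal sketch of the usual R@K computation, assuming a square query-by-candidate similarity matrix whose ground-truth match for query i is candidate i (`recall_at_k` is a hypothetical name, not this repository's function):

```python
import numpy as np


def recall_at_k(sim, ks=(1, 5, 10)):
    """R@K in percent; sim[i, j] scores query i against candidate j,
    and candidate i is assumed to be the correct match for query i."""
    sim = np.asarray(sim)
    order = np.argsort(-sim, axis=1)  # candidates sorted best-first per query
    # Position (0 = best) at which the ground-truth candidate appears.
    ranks = np.argmax(order == np.arange(sim.shape[0])[:, None], axis=1)
    return {k: 100.0 * np.mean(ranks < k) for k in ks}
```

Running this on one's own similarity matrix is a quick way to confirm that a near-chance score comes from the model weights rather than from a mismatch between query and candidate ordering.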

QVHighlights and YouCook2

Great work; it has helped me a lot.
However, when I download the QVHighlights dataset, the speed is very slow, about 20 kb/s.
How can I obtain this dataset more easily?
Could you upload QVHighlights to Google Drive or another cloud storage service for convenient download?
Looking forward to your reply. Thank you very much.
