liu-feng-deeplearning / coverhunter


Official PyTorch implementation of CoverHunter

Python 100.00%

coverhunter's Issues

Covers80 dataset is .mp3 files, not .wav files, and organized differently than you expect

Your current code cannot ingest the covers80 dataset as published at http://labrosa.ee.columbia.edu/projects/coversongs/covers80/ because:
a) The covers80 files are .mp3 files, but your code expects .wav files, as listed in your dataset.txt file. As a result, sox fails with this message in the sox.log output file: "sox FAIL formats: can't open input file `data/covers80/wav_16k/en_vogue+Funky_Divas+09-Yesterday.wav': No such file or directory"
b) The covers80 files use a different folder structure and naming than your code expects: covers80 places its .mp3 files in subfolders organized by song title, all inside a parent folder named "covers32k", whereas your code, as listed in your dataset.txt file, expects them in a single folder named "wav_16k".
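
One possible workaround is to flatten and convert the files before running the repository's scripts. The sketch below is not part of CoverHunter: the function names `plan_conversions` and `convert_all` are hypothetical, the use of ffmpeg (rather than sox) and the mono downmix are assumptions, and the source path `data/covers80/covers32k` is a guess at where the download was unpacked. Only the target folder `data/covers80/wav_16k` and the file naming come from the error message above.

```python
import subprocess
from pathlib import Path


def plan_conversions(src_root, dst_dir):
    """Pair every .mp3 found under src_root with a flat .wav path in dst_dir."""
    src_root, dst_dir = Path(src_root), Path(dst_dir)
    return [
        # Keep the original base name, e.g. en_vogue+Funky_Divas+09-Yesterday.wav
        (mp3, dst_dir / (mp3.stem + ".wav"))
        for mp3 in sorted(src_root.rglob("*.mp3"))
    ]


def convert_all(pairs, sample_rate=16000):
    """Decode each .mp3 to a mono .wav at sample_rate (assumes ffmpeg is on PATH)."""
    for src, dst in pairs:
        dst.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(
            ["ffmpeg", "-y", "-i", str(src),
             "-ar", str(sample_rate), "-ac", "1", str(dst)],
            check=True,  # stop on the first file that fails to decode
        )
```

Calling `convert_all(plan_conversions("data/covers80/covers32k", "data/covers80/wav_16k"))` would then produce the flat folder of 16 kHz .wav files that dataset.txt appears to expect. Two songs sharing a base name would overwrite each other, so it may be worth checking the planned pairs for duplicates first.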

The Impact of Input Length of Audio Segments on mAP

Thank you for your excellent work; it has been very helpful to me.

In my tests, longer input segment lengths (90s, 135s, 180s) give higher mAP on my local dataset, which consists mostly of short audio (10s-20s, maximum length < 90s). However, this confuses me because:

During training, the first and second stages used 15s and 15s-45s segments, respectively.

When the input length is long and the audio is short, a large amount of padding (value -100) is appended to the audio, which should not bring any benefit.

What do you think might be the possible causes of this result?

I would greatly appreciate your reply.
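
To make the padding concern concrete, here is a minimal sketch of how much of each input window would be filler for a short clip. It does not reproduce CoverHunter's actual padding code: `pad_to_length` and `padded_fraction` are hypothetical helpers, and the 15s clip length is an assumed example taken from the 10s-20s range mentioned above. The pad value -100 and the window lengths come from the issue.

```python
def pad_to_length(features, target_frames, pad_value=-100.0):
    """Right-pad a [frames][bins] feature matrix to exactly target_frames rows."""
    n_bins = len(features[0])
    kept = features[:target_frames]  # truncate if the clip is longer than the window
    padding = [[pad_value] * n_bins for _ in range(target_frames - len(kept))]
    return kept + padding


def padded_fraction(clip_seconds, window_seconds):
    """Share of the input window that is padding for a clip of the given duration."""
    return max(0.0, 1.0 - clip_seconds / window_seconds)


# Assumed 15s clip, evaluated at the training and test window lengths from the issue.
for window in (15, 45, 90, 135, 180):
    print(f"{window:>3}s window: {padded_fraction(15, window):.0%} padding")
```

This only quantifies how much of the window is padding; it does not explain why mAP improves with longer windows, which is the open question in this issue.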

Some thoughts on the Conformer self-attention module

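Hello, the data you feed into the Conformer has no channel dimension. If I have multi-channel data, could I apply self-attention to each channel separately and then concatenate the per-channel results before passing them to the next convolution layer? Is this idea reasonable?
I ask because many self-attention implementations I have seen require a fixed input length. Song lengths are not fixed, and I do not want to give up training on inputs of varying length as a form of data augmentation.
Could the same approach also be applied to the Time Domain Pool pooling?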
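
The idea in this issue can be sketched without any framework. The code below is an illustration only and is not CoverHunter's Conformer: it uses plain scaled dot-product self-attention with identity projections (Q = K = V, an assumption made to keep the example short), and `self_attention` and `per_channel_attention` are hypothetical names. It shows that the attention operation itself works for any number of frames, and that per-channel outputs can be concatenated along the feature axis.

```python
import math


def self_attention(seq):
    """Scaled dot-product self-attention over a [time][dim] sequence (Q = K = V = seq)."""
    dim = len(seq[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for query in seq:
        scores = [scale * sum(a * b for a, b in zip(query, key)) for key in seq]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * value[d] for w, value in zip(weights, seq)) / total
            for d in range(dim)
        ])
    return out


def per_channel_attention(x):
    """Attend within each channel, then concatenate channels per frame.

    x has shape [channels][time][dim]; the result has shape [time][channels * dim].
    """
    attended = [self_attention(channel) for channel in x]
    n_frames = len(x[0])
    return [
        [value for channel in attended for value in channel[t]]
        for t in range(n_frames)
    ]
```

Because the attention weights are computed from whatever frames are present, nothing here fixes the sequence length. In practice, fixed-length constraints usually come from other parts of a model, such as learned positional embeddings or batching, rather than from the attention operation.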

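Training details for the first stage (coarse training)

Hello, thank you very much for your excellent work; it has helped me a lot.

I am reproducing the paper from scratch (with some Chinese song data added). While training the first stage, some details are unclear to me, and I would like to ask you about them.

Does the code apply to the first stage? Do I only need to change the data and the chunk parameters?

Are the labels also song ids?

How many epochs did you train for?

What were the training hyperparameter settings, and could you share the training log?

I look forward to your reply!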


refer confused users to CoverHunterMPS fork?

Application inquiry

What are the query_path and ref_path for eval?

Datasets

Pre-trained model download
