
coverhunter's Issues

The covers80 dataset consists of .mp3 files, not .wav files, and is organized differently than your code expects

Your current code cannot ingest the covers80 dataset as published at http://labrosa.ee.columbia.edu/projects/coversongs/covers80/ for two reasons:
a) The covers80 files are .mp3 files, but your code expects .wav files, as listed in your dataset.txt file. As a result, sox fails with this message in the sox.log output file: "sox FAIL formats: can't open input file `data/covers80/wav_16k/en_vogue+Funky_Divas+09-Yesterday.wav': No such file or directory"
b) The covers80 files use a different folder structure and folder name than your code expects. covers80 places its .mp3 files in subfolders organized by song title, all within a parent folder named "covers32k", whereas your code, as listed in your dataset.txt file, expects them in a single folder named "wav_16k".
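One possible workaround for both points is to flatten the covers80 tree and decode each .mp3 to a 16 kHz .wav before running the existing pipeline. The sketch below is not part of the repository: the function names `plan_conversion`, `ffmpeg_command`, and `convert_all` are hypothetical, the use of ffmpeg (rather than sox) is an assumption, and so is the choice of mono output. It also assumes the covers80 file names are unique across song folders, so flattening does not cause collisions.

```python
import subprocess
from pathlib import Path


def plan_conversion(src_root, dst_dir):
    """Map every .mp3 under src_root (any depth) to a flat .wav path in dst_dir."""
    src_root, dst_dir = Path(src_root), Path(dst_dir)
    pairs = []
    for mp3 in sorted(src_root.rglob("*.mp3")):
        # Keep the original file name, change only the folder and the extension.
        pairs.append((mp3, dst_dir / (mp3.stem + ".wav")))
    return pairs


def ffmpeg_command(src, dst, sample_rate=16000):
    """Build an ffmpeg call that decodes src to a mono WAV at sample_rate (assumed settings)."""
    return ["ffmpeg", "-y", "-i", str(src), "-ac", "1", "-ar", str(sample_rate), str(dst)]


def convert_all(src_root, dst_dir):
    """Run the planned conversions one by one; requires ffmpeg on the PATH."""
    Path(dst_dir).mkdir(parents=True, exist_ok=True)
    for src, dst in plan_conversion(src_root, dst_dir):
        subprocess.run(ffmpeg_command(src, dst), check=True)
```

Calling `convert_all("data/covers80/covers32k", "data/covers80/wav_16k")` would then produce paths of the form that appears in the sox error message above, for example `data/covers80/wav_16k/en_vogue+Funky_Divas+09-Yesterday.wav`.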

The Impact of Input Length of Audio Segments on mAP

Thank you for your excellent work, it has been very helpful to me.

In my tests, longer input segments (90 s, 135 s, 180 s) give a higher mAP. My local dataset consists mostly of short audio (10-20 s), with a maximum length under 90 s. This confuses me because:

  1. During training, the first and second stages used segment lengths of 15 s and 15-45 s, respectively.
  2. With a long input length and short audio, a large amount of padding (value -100) is appended after the audio, which should not bring any benefit.

What do you think might be the possible causes of this result?

I would greatly appreciate your reply.
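To make the padding concern concrete, the sketch below right-pads a short feature matrix with -100 and reports how much of each window is padding. It is only an illustration: `pad_to_length` and `padded_fraction` are hypothetical helpers, and the frame rate (25 frames per second) and the 96 feature bins are assumed values, not taken from the repository.

```python
import numpy as np

PAD_VALUE = -100.0  # padding value mentioned in the question above


def pad_to_length(feat, target_frames, pad_value=PAD_VALUE):
    """Right-pad a (frames, bins) feature matrix with pad_value, or truncate it."""
    frames = feat.shape[0]
    if frames >= target_frames:
        return feat[:target_frames]
    pad = np.full((target_frames - frames, feat.shape[1]), pad_value, dtype=feat.dtype)
    return np.concatenate([feat, pad], axis=0)


def padded_fraction(feat, pad_value=PAD_VALUE):
    """Fraction of frames that consist only of padding."""
    return float(np.mean(np.all(feat == pad_value, axis=1)))


FRAMES_PER_SECOND = 25  # assumed frame rate, for illustration only
N_BINS = 96             # assumed number of feature bins

# A synthetic 15 s clip standing in for a real feature matrix.
clip = np.random.default_rng(0).normal(size=(15 * FRAMES_PER_SECOND, N_BINS)).astype(np.float32)
for window_s in (15, 45, 90, 180):
    padded = pad_to_length(clip, window_s * FRAMES_PER_SECOND)
    print(window_s, padded.shape, round(padded_fraction(padded), 3))
```

The padded share depends only on the ratio of clip length to window length, so a 15 s clip in a 180 s window is about 92% padding regardless of the frame rate assumed here.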

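Some thoughts on the Conformer self-attention module

Hello, the data you feed to the Conformer has no channel dimension. If my data is multi-channel, could I apply the self-attention mechanism to each channel separately and then concatenate the per-channel results before passing them to the next convolutional layer? Is this idea reasonable?
I ask because I have seen that many self-attention mechanisms require a fixed input length. Song lengths are not fixed, and I do not want to give up training with inputs of different lengths as a form of data augmentation.
Could the same approach also be used for the Time Domain Pool pooling?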
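The idea in this question can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not the repository's Conformer: it uses a single attention head, random weights shared across channels, no positional encoding, and a plain mean over time as a stand-in for the pooling step. All function names are hypothetical. It does show that scaled dot-product attention accepts any sequence length, since the learned weights act only on the feature dimension.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over a (time, dim) input."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (time, time)
    return softmax(scores, axis=-1) @ v      # (time, dim)


def per_channel_attention(x, wq, wk, wv):
    """Apply the same attention weights to each channel of a (channels, time, dim)
    input, then concatenate the channel outputs along the feature axis."""
    outputs = [self_attention(channel, wq, wk, wv) for channel in x]
    return np.concatenate(outputs, axis=-1)  # (time, channels * dim)


def time_mean_pool(x):
    """Collapse the time axis so any input length yields a fixed-size vector."""
    return x.mean(axis=0)


rng = np.random.default_rng(0)
dim, channels = 8, 2
wq, wk, wv = (rng.normal(size=(dim, dim)) for _ in range(3))
for time_steps in (50, 120):  # two different song lengths
    x = rng.normal(size=(channels, time_steps, dim))
    y = per_channel_attention(x, wq, wk, wv)
    print(y.shape, time_mean_pool(y).shape)
```

Concatenating along the feature axis multiplies the width seen by the next layer by the number of channels, so that layer has to be sized accordingly. Whether processing channels independently is preferable to mixing them earlier is a design choice this sketch does not settle.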

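Training details for the first stage (coarse training)

Hello, thank you very much for your excellent work; it has helped me a lot.

I am reproducing the paper from scratch (adding some Chinese song data). While training the first stage, I am unclear about some details and would like to ask you about them.

  1. Does the code work for the first stage? Do I only need to change the data and the chunk parameters?
  2. Are song ids also used as the labels?
  3. How many epochs did you train for?
  4. What were the training hyperparameter settings, and could you share the training log?

I look forward to your reply!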

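Application question

Hello, in practical applications most songs have no cover versions and only a few do. In this situation, how can songs without cover versions be filtered out? Thank you!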
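One common way to handle this, sketched here as an assumption rather than as the author's method, is to threshold the similarity of each query's nearest neighbour: a query whose best match falls below the threshold is treated as having no cover in the reference set. The function names and the threshold value are hypothetical, and the embeddings are toy vectors standing in for model outputs.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def find_covers(query_emb, ref_emb, threshold):
    """Return (query_idx, ref_idx, similarity) for queries whose best match clears threshold."""
    sims = l2_normalize(query_emb) @ l2_normalize(ref_emb).T
    best = sims.argmax(axis=1)
    best_sim = sims[np.arange(len(sims)), best]
    return [(int(q), int(best[q]), float(best_sim[q]))
            for q in range(len(sims)) if best_sim[q] >= threshold]


ref = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
query = np.array([[0.9, 0.1, 0.0],   # close to reference 0
                  [0.0, 0.0, 1.0]])  # unrelated to every reference
print(find_covers(query, ref, threshold=0.8))
```

In this toy case only the first query is kept. A usable threshold would have to be tuned on held-out data with known cover and non-cover pairs, and if the query and reference sets are the same, each item's match with itself has to be excluded before taking the maximum.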

What are the query_path and ref_path for eval?

I am trying to evaluate on covers80. When using the dataset.txt as query_path and ref_path I receive an error, since the "song_id" key is not included. Am I passing the wrong argument?
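A quick way to narrow this down is to check which records in a candidate file actually carry the "song_id" key before passing it as query_path or ref_path. The sketch below assumes the file holds one JSON object per line, which is an assumption about the format, and `missing_key_lines` is a hypothetical helper, not part of the repository.

```python
import json


def missing_key_lines(path, required=("song_id",)):
    """Return (line_number, missing_keys) for each JSON line lacking a required key."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            missing = [key for key in required if key not in record]
            if missing:
                problems.append((line_no, missing))
    return problems
```

If every line of dataset.txt is reported, the file is probably a raw listing, and the evaluation may expect a file written by a later preprocessing step that adds the key. That is a guess to verify against the repository's own scripts.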

Datasets

Hello, could you provide the CQT features for SHS100K? Alternatively, how can the CQT features for SHS100K be generated from the YouTube URLs?
