harritaylor / torchvggish Goto Github PK

PyTorch port of Google Research's VGGish model used for extracting audio features.

License: Apache License 2.0

Python 100.00%

audio-embedding audioset pytorch vggish

torchvggish's Issues

About the preprocess

Do you have a plan to migrate numpy-style preprocess to a torch-style one?

Diff between the pytorchvggish and tensorflowvggish

Hi @harritaylor ,
I fed piano.wav into the TensorFlow VGGish, but the PCA embedding differs from the one produced by torchvggish. Did you verify the output after the conversion?

import traceback

import vggish_input
import vggish_postprocess

# Methods of the poster's wrapper class. `pca_params` (the PCA parameters
# file) and the TensorFlow session attributes are defined elsewhere.
def get_vggish_input(self, wav_file):
    try:
        # Convert the WAV file into a batch of log-mel examples.
        examples_batch = vggish_input.wavfile_to_examples(wav_file)
        # Prepare a postprocessor to munge the model embeddings.
        pproc = vggish_postprocess.Postprocessor(pca_params)
        return examples_batch, pproc
    except Exception:
        traceback.print_exc()
        return None, None

def get_features(self, examples_batch, pproc):
    try:
        # Run inference and postprocessing.
        [embedding_batch] = self.sess.run(
            [self.embedding_tensor],
            feed_dict={self.features_tensor: examples_batch})
        return pproc.postprocess(embedding_batch)
    except Exception:
        traceback.print_exc()
        return None

Hi, thanks for your contribution and for sharing. I ran into some issues when I tried to use it locally. Could you provide an example of how to use it? Specifically, how do I load the model locally, since I need to use it without Internet access?

How to load raw audio file

Can you provide an example of how to load a raw audio file and use VGGish as a feature extractor in PyTorch?

convert the pca state dict to torch

torchvggish/torchvggish/vggish.py

Line 156 in 51bab6b

# TODO: Convert the state_dict to torch

Just need to upload the tensor as part of a release.

Missing activations.

I believe you're missing activations, i.e. ReLUs, for all of the layers.

Modify torchvggish.vggish_params.EXAMPLE_HOP_SECONDS after (or before) model load?

I would like to modify global variable torchvggish.vggish_params.EXAMPLE_HOP_SECONDS after (or before) loading the model.

However, I cannot import torchvggish.vggish_params because I don't have it installed on my system and it's tricky to install. There is no pypi module and no setup.py file.

What would be the simplest way to modify torchvggish.vggish_params.EXAMPLE_HOP_SECONDS?
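
For orientation, the effect of the hop on the number of embeddings can be worked out by hand. The sketch below is a minimal, dependency-free calculation, assuming the default VGGish parameters (16 kHz audio, 25 ms STFT window, 10 ms STFT hop, 0.96 s example window) and the framing rule of the reference preprocessing. `num_examples` and `_num_frames` are hypothetical helpers, not part of torchvggish.

```python
SAMPLE_RATE = 16000            # Hz, assumed VGGish default
STFT_WINDOW_SECONDS = 0.025    # assumed default
STFT_HOP_SECONDS = 0.010       # assumed default
EXAMPLE_WINDOW_SECONDS = 0.96  # assumed default


def _num_frames(length, window, hop):
    """Number of full windows of size `window` taken every `hop` steps."""
    return max(0, 1 + (length - window) // hop)


def num_examples(num_samples, example_hop_seconds=0.96):
    """Hypothetical helper: count the embeddings produced for one waveform."""
    stft_window = round(SAMPLE_RATE * STFT_WINDOW_SECONDS)   # 400 samples
    stft_hop = round(SAMPLE_RATE * STFT_HOP_SECONDS)         # 160 samples
    spectrogram_frames = _num_frames(num_samples, stft_window, stft_hop)

    frames_per_second = 1.0 / STFT_HOP_SECONDS               # 100 frames/s
    example_window = round(EXAMPLE_WINDOW_SECONDS * frames_per_second)
    example_hop = round(example_hop_seconds * frames_per_second)
    return _num_frames(spectrogram_frames, example_window, example_hop)
```

Under these assumptions, a 10-second clip yields 10 examples at the default hop and 19 with a 0.48 s hop, so halving EXAMPLE_HOP_SECONDS roughly doubles the number of output rows.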

Provide better documentation

Instead of notebooks, it would be better to provide simple documentation as part of the README, as the interface is significantly slimmed down now.

PCA post-processing removes gradient from embeddings

Because the TensorFlow code uses NumPy to apply PCA to the output embeddings, gradients cannot flow through the post-processing step when torchvggish is used as part of another network (the use case for this is relatively small). It would be useful to reimplement the PCA algorithm so that it can operate on Torch tensors.
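
As a rough illustration of what such a reimplementation has to reproduce, here is a dependency-free sketch of the post-processing arithmetic: subtract the PCA means, project with the PCA matrix, clip, and quantize to 8 bits. The clipping range of [-2, 2] and the truncating cast are assumptions based on the reference post-processor, and `pca_postprocess` is a hypothetical name. The centering and projection are affine, so they would stay differentiable if written with tensor operations; the clip and the integer quantization are the steps that block or zero out gradients.

```python
QUANTIZE_MIN, QUANTIZE_MAX = -2.0, 2.0   # clipping range (assumed)


def pca_postprocess(embeddings, pca_matrix, pca_means):
    """Center, project, clip, and quantize each embedding row to 0..255."""
    scale = 255.0 / (QUANTIZE_MAX - QUANTIZE_MIN)
    results = []
    for row in embeddings:
        # Affine part: (x - mean) projected onto each PCA component.
        centered = [x - m for x, m in zip(row, pca_means)]
        projected = [sum(w * c for w, c in zip(weights, centered))
                     for weights in pca_matrix]
        # Non-differentiable part: clip, then quantize to integers.
        clipped = [min(max(p, QUANTIZE_MIN), QUANTIZE_MAX) for p in projected]
        # Truncating cast; the exact rounding mode is an assumption.
        results.append([int((c - QUANTIZE_MIN) * scale) for c in clipped])
    return results
```

A version intended for use inside a larger network would likely stop after the projection and skip the quantization entirely.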

URL Error

Hello,

I just encountered this error today. Everything worked fine yesterday and now when I try to use the vggish embeddings I get this error:
urllib.error.URLError: <urlopen error [Errno 11001] getaddrinfo failed>

This is what I've tried and it worked before: vggish_model = torch.hub.load('harritaylor/torchvggish', 'vggish')

I am using Windows 10, PyCharm, and Python 3.8.

Here is the full Traceback in case that helps:

  File "C:\Program Files\Python38\lib\urllib\request.py", line 1319, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "C:\Program Files\Python38\lib\http\client.py", line 1230, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "C:\Program Files\Python38\lib\http\client.py", line 1276, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "C:\Program Files\Python38\lib\http\client.py", line 1225, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "C:\Program Files\Python38\lib\http\client.py", line 1004, in _send_output
    self.send(msg)
  File "C:\Program Files\Python38\lib\http\client.py", line 944, in send
    self.connect()
  File "C:\Program Files\Python38\lib\http\client.py", line 1392, in connect
    super().connect()
  File "C:\Program Files\Python38\lib\http\client.py", line 915, in connect
    self.sock = self._create_connection(
  File "C:\Program Files\Python38\lib\socket.py", line 787, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
  File "C:\Program Files\Python38\lib\socket.py", line 918, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 11001] getaddrinfo failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:/Bachelor Arbeit/dcase-2022-baseline-main/clotho_preprocessing.py", line 99, in <module>
    preprocess_dataset(config)
  File "D:/Bachelor Arbeit/dcase-2022-baseline-main/clotho_preprocessing.py", line 16, in preprocess_dataset
    vggish_model = torch.hub.load('harritaylor/torchvggish', 'vggish')
  File "C:\Users\tincu\AppData\Roaming\Python\Python38\site-packages\torch\hub.py", line 539, in load
    repo_or_dir = _get_cache_or_reload(repo_or_dir, force_reload, trust_repo, "load",
  File "C:\Users\tincu\AppData\Roaming\Python\Python38\site-packages\torch\hub.py", line 180, in _get_cache_or_reload
    repo_owner, repo_name, ref = _parse_repo_info(github)
  File "C:\Users\tincu\AppData\Roaming\Python\Python38\site-packages\torch\hub.py", line 134, in _parse_repo_info
    with urlopen(f"https://github.com/{repo_owner}/{repo_name}/tree/main/"):
  File "C:\Program Files\Python38\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Program Files\Python38\lib\urllib\request.py", line 525, in open
    response = self._open(req, data)
  File "C:\Program Files\Python38\lib\urllib\request.py", line 542, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "C:\Program Files\Python38\lib\urllib\request.py", line 502, in _call_chain
    result = func(*args)
  File "C:\Program Files\Python38\lib\urllib\request.py", line 1362, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
  File "C:\Program Files\Python38\lib\urllib\request.py", line 1322, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 11001] getaddrinfo failed>

Process finished with exit code 1
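
The message "[Errno 11001] getaddrinfo failed" indicates that the hostname lookup failed before any HTTP request was sent, which usually points at the local network, DNS, or proxy configuration rather than at the repository. A minimal check using only the standard library is sketched below; `can_resolve` is a hypothetical helper name.

```python
import socket


def can_resolve(host, port=443):
    """Return True if the system resolver can map `host` to an address."""
    try:
        socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM)
    except OSError:  # socket.gaierror is a subclass of OSError
        return False
    return True
```

Calling `can_resolve("github.com")` from the same interpreter that runs the script shows whether the failure is reproducible outside of torch.hub.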

VGGish should compose a VGG instead of inherit from it

torchvggish/torchvggish/vggish.py

Line 143 in 51bab6b

class VGGish(VGG):

Could you add a license file?

Thank you for your great work.
There is no license file.
Could you add a license file?

Output size

Hello,

I tried this implementation following the Usage section.
The size of the output tensor is [19, 128].
Do I need to fuse the output tensor in order to convert it from [19, 128] to [1, 128]?
Is the audio embedding obtained per audio file or per batch?

In my understanding, it is obtained per audio file.
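
For context, each row of the output corresponds to one 0.96 s example of the input audio, so a [19, 128] result presumably reflects a clip of roughly 18 seconds. If a single clip-level vector is wanted, one common option (an assumption about the goal here, not something the repository prescribes) is to average over the time axis. The sketch below does this with plain Python lists; `mean_pool` is a hypothetical helper.

```python
def mean_pool(embeddings):
    """Average a [num_examples, dim] list of rows into one [dim] vector."""
    if not embeddings:
        raise ValueError("need at least one embedding row")
    count = len(embeddings)
    # zip(*rows) iterates over columns, i.e. one embedding dimension at a time.
    return [sum(column) / count for column in zip(*embeddings)]


# Toy stand-in for a [19, 128] model output: 3 examples with 4 dimensions.
frames = [
    [1.0, 2.0, 3.0, 4.0],
    [3.0, 2.0, 1.0, 0.0],
    [2.0, 2.0, 2.0, 2.0],
]
clip_vector = mean_pool(frames)
```

With a torch tensor, the equivalent would be `embeddings.mean(dim=0, keepdim=True)`, which turns [19, 128] into [1, 128].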

The python process becomes zombie sometimes upon reaching the forward function

GPU version support?

Thank you for your work. Could you add a GPU support option to the torch.hub entry point? Another question: must the embedding size be fixed at 128, or is there a way to obtain 2048 dimensions?

'Tensor' object has no attribute 'T'

Hi @harritaylor,
Line 87 of vggish.py fails because 'Tensor' has no attribute 'T'. Which PyTorch version do you use?

Original vggish vs this..

Hey, doesn't the original TF implementation have only four convolution layers and two FC layers? This one has 6 and 3. Why the difference? How could the embeddings be identical then?
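
The discrepancy may come down to counting convention: as far as I can tell, Google's vggish_slim.py names four convolution blocks (conv1 to conv4) and two fully connected blocks (fc1, fc2), but conv3, conv4, and fc1 are each repeated twice. The sketch below counts the layers and the flattened feature size under that assumed layout; the configuration list and `summarize` are illustrative, not copied from either code base.

```python
# Assumed layout: numbers are 3x3 "same" convolutions (output channels),
# "M" is a 2x2 max-pool with stride 2.
CONV_CONFIG = [64, "M", 128, "M", 256, 256, "M", 512, 512, "M"]
FC_SIZES = [4096, 4096, 128]
INPUT_FRAMES, INPUT_BANDS = 96, 64   # log-mel patch: 96 frames x 64 mel bands


def summarize(config, height, width):
    """Return (conv layer count, pooling stage count, flattened feature size)."""
    conv_layers = 0
    pool_stages = 0
    channels = 1  # single-channel spectrogram input
    for item in config:
        if item == "M":
            pool_stages += 1
            height //= 2
            width //= 2
        else:
            conv_layers += 1
            channels = item
    return conv_layers, pool_stages, channels * height * width


conv_layers, pool_stages, flat_size = summarize(
    CONV_CONFIG, INPUT_FRAMES, INPUT_BANDS)
```

Under these assumptions there are 6 convolution layers in 4 pooled blocks and 3 fully connected layers, so "4 and 2" and "6 and 3" would describe the same network.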

Pre-activation as output of VGGish

Hello there,

When comparing this code to the one in tensorflow/models, I found that the two implementations use different layers as the output of the VGGish model (if the activation is counted as a separate layer).

yours:

torchvggish/torchvggish/vggish.py

Line 19 in 4670116

nn.ReLU(True))

google's: https://github.com/tensorflow/models/blob/f32dea32e3e9d3de7ed13c9b16dc7a8fea3bd73d/research/audioset/vggish/vggish_slim.py#L104-L106 (activation_fn=None)

Also, it's mentioned in README

Note that the embedding layer does not include a final non-linear activation, so the embedding value is pre-activation

Changing the output layer of VGGish in your implementation to the pre-activation one (without the ReLU) makes the embeddings (almost) equal in both cases, raw and PCA'd.

Thanks for porting though, great work!

How to use this code for training on my own dataset?

Hello, I have some .wav files, and I want to train a classification model on my own dataset.

How can I use this code? Should I extract embeddings and train a sequence model on them? Is it possible to fine-tune the VGGish feature extractor while training the classifier?

I'd appreciate any advice. Thank you!

how to feed model batch inputs ?

Hi, how can I feed the model batch inputs? I only know how to feed one audio file to the model. How do I feed a batch? Thanks.
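
One possible approach, assuming the network accepts input of shape [N, 1, 96, 64] where N counts 0.96 s examples rather than files: concatenate the examples of several files along the first axis, run a single forward pass, and split the result back per file. The sketch below shows only the bookkeeping, with plain Python lists standing in for tensors; `pack` and `unpack` are hypothetical helpers, and with tensors the equivalents would be `torch.cat` and `torch.split`.

```python
def pack(per_file_examples):
    """Flatten a list of per-file example lists; remember each file's count."""
    counts = [len(examples) for examples in per_file_examples]
    flat = [example for examples in per_file_examples for example in examples]
    return flat, counts


def unpack(flat_outputs, counts):
    """Split a flat list of outputs back into one list per file."""
    per_file, start = [], 0
    for count in counts:
        per_file.append(flat_outputs[start:start + count])
        start += count
    return per_file


# Stand-ins: file A has 2 examples, file B has 3 (strings replace real patches).
batch, counts = pack([["a0", "a1"], ["b0", "b1", "b2"]])
# A real model would map each example to a 128-d row; here we just tag them.
outputs = [f"emb({example})" for example in batch]
per_file_outputs = unpack(outputs, counts)
```

Keeping the per-file counts is what allows files of different lengths to share one batch without padding.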

questions about VGGISH_WEIGHTS

Thank you for your code. Are the VGGISH_WEIGHTS at the path in the code purely a conversion of Google's checkpoint, or the result of your own retraining?

Provide setup.py so pip install still works

The move to torch hub removed the ability to use this package with pip. It would be nice if that remains as an option.

The URL of the VGGish model weights no longer works

I am using the VGGish model as part of my model to extract features from input audio. However, I cannot open the URL posted on GitHub. Could you please update the URL or tell me how to import the weights of the pretrained VGGish model?

Thanks a lot.
