pyannote-database's People

Contributors

dependabot[bot], francescobonzi, frenchkrab, hbredin, paullerner, pkorshunov, wesbz

pyannote-database's Issues

Problem loading a custom protocol

Hello,

I am trying to use the pre-trained models on my own data. I have already followed the instructions to modify the AMI protocol file. However, when I follow the instructions for preparing the protocol on the AMI subset, I keep getting these errors: "KeyError: 'AMI'" and "ValueError: Could not find any protocol for "AMI" database".

Here are the instructions on the page that I followed:

from pyannote.database import FileFinder, get_protocol

preprocessors = {'audio': FileFinder()}
protocol = get_protocol('AMI.SpeakerDiarization.MixHeadset',
                        preprocessors=preprocessors)

I am not sure whether this is a directory issue. If so, how can I specify the proper directory when calling it? And what is the FileFinder() function used for here?

Thanks for your time!

FileFinder doesn't work without development subset

for method in methods:
    try:
        protocol.progress = False
        file_generator = getattr(protocol, method)()
        first_item = next(file_generator)
    except AttributeError as e:
        continue
    except NotImplementedError as e:
        continue

The first element in the methods list is 'development'. If the protocol does not define a 'development' subset, FileFinder doesn't work.
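
The fallback behaviour described above can be sketched as follows. This is a minimal, self-contained illustration and not the library's code: first_available_file and TrainOnlyProtocol are hypothetical names, and the order of the subset names is an assumption.

```python
def first_available_file(protocol, methods=("development", "train", "test")):
    """Return the first file of the first subset that the protocol implements."""
    for method in methods:
        subset = getattr(protocol, method, None)
        if subset is None:
            continue  # the protocol does not expose this subset at all
        try:
            return next(iter(subset()))
        except (NotImplementedError, StopIteration):
            continue  # subset is declared but not implemented, or is empty
    raise ValueError("protocol defines none of: " + ", ".join(methods))


class TrainOnlyProtocol:
    """Hypothetical stand-in for a protocol without a development subset."""

    def development(self):
        raise NotImplementedError()

    def train(self):
        yield {"uri": "file1"}


first_file = first_available_file(TrainOnlyProtocol())
```

Catching StopIteration as well means an empty subset is skipped instead of ending the search.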

Rename 'speakers' to 'labels'

protocol.stats(subset) returns a dictionary whose speakers key should be renamed to labels.

For now, this is specific to speaker identification, but it should also work for language identification, for instance.

problem with pyannote

Hi, I have a problem when I run the tutorial. The program returns this message:

FileNotFoundError: Could not find file "EN2002a.Mix-Headset" in the following location(s):

  • /home/gabriel/Desktop/My_tasks/IA-DEEPLEARNING/IC/Codigos/amicorpus/*/audio/EN2002a.Mix-Headset.wav

I don't know why this error happens. Can you help me?

Warning: Existing key "annotation" may have been modified.

Hi,

I keep getting this warning when training a pipeline (this issue might have to be transferred):

/mnt/beegfs/projects/plumcot/pyannote/pyannote-database/pyannote/database/protocol/protocol.py:128: UserWarning: Existing key "annotation" may have been modified.

This only happens during the first trial (i.e. the first iteration over the whole database subset) but it didn't happen before 4.0

Generic database interface

While making database interfaces, I ended up using code for the diarization and speaker spotting protocols that is very similar to what you implemented in the interface for the AMI dataset. This made me think that it may be possible to generalize this code into a generic database interface that would work for any database, given file lists formatted according to a predefined format.

Specify the PYANNOTE_DATABASE_CONFIG in python

It would be useful to be able to define the location of the database configuration from Python. This would of course not be compatible with the planned command line interface (#45), but it would make the library more flexible to use.

For example:

protocol = get_protocol('Debug.SpeakerDiarization.Debug', preprocessors={"audio": FileFinder()})
protocol = get_protocol('Debug.SpeakerDiarization.Debug', preprocessors={"audio": FileFinder()}, config="~/.pyannote")

This would be backwards compatible with previous versions of the method
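
One way the proposed config argument could be resolved is sketched below. It assumes a precedence of explicit argument, then the PYANNOTE_DATABASE_CONFIG environment variable, then ~/.pyannote/database.yml. The helper name resolve_config and this precedence are assumptions for illustration, not the library's behaviour.

```python
import os
from pathlib import Path


def resolve_config(config=None, environ=None):
    """Pick the configuration file: explicit argument, then environment, then default."""
    environ = os.environ if environ is None else environ
    if config is not None:
        return Path(config).expanduser()
    from_env = environ.get("PYANNOTE_DATABASE_CONFIG")
    if from_env:
        return Path(from_env).expanduser()
    # assumed default location
    return Path("~/.pyannote/database.yml").expanduser()


explicit = resolve_config("/tmp/database.yml", environ={})
from_environment = resolve_config(environ={"PYANNOTE_DATABASE_CONFIG": "/data/database.yml"})
```

Because config defaults to None, existing calls that do not pass it would keep their current behaviour.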

Support for custom progress message

Would be nice to have a way to print a custom progress message when iterating over subsets.

from pyannote.database import get_protocol
protocol = get_protocol('AMI.SpeakerDiarization.MixHeadset')
for current_file in protocol.development(msg='Loading data...'):
    pass

LABLoader import error

Packages installed with the pip install pyannote.database command do not include the latest updates, such as LABLoader:

cannot import name 'LABLoader' from 'pyannote.database.loader'

`LABLoader` raises ValueError("`path` must contain the {uri} placeholder.") even if the placeholder is configured correctly

Part of my configuration:

Databases:
  # tell pyannote.database where to find AMI wav files.
  # {uri} is a placeholder for the session name (eg. ES2004c).
  # you might need to update this line to fit your own setup.
  AMI: amicorpus/{uri}/audio/{uri}.Mix-Headset.wav
  AMI-SDM: amicorpus/{uri}/audio/{uri}.Array1-01.wav

Protocols:

  AMI-SDM:
    SpeakerDiarization:
      only_words:
        train:
            uri: ../lists/train.meetings.txt
            annotation: ../only_words/rttms/train/{uri}.rttm
            annotated: ../uems/train/{uri}.uem
            lab: ../only_words/labs/train/{uri}.lab
        development:
            uri: ../lists/dev.meetings.txt
            annotation: ../only_words/rttms/dev/{uri}.rttm
            annotated: ../uems/dev/{uri}.uem
            lab: ../only_words/labs/dev/{uri}.lab
        test:
            uri: ../lists/test.meetings.txt
            annotation: ../only_words/rttms/test/{uri}.rttm
            annotated: ../uems/test/{uri}.uem
            lab: ../only_words/labs/test/{uri}.lab

When I comment out these two lines, the program runs fine and file['lab'] returns an Annotation object, as expected:

if "uri" not in self.placeholders_:
    raise ValueError("`path` must contain the {uri} placeholder.")

It seems this sanity check is not working as expected. Also, other loaders (e.g. RTTMLoader) don't have this check (I guess the logic should be similar).
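
For reference, the placeholders of a path template can be extracted with the standard library, as in the sketch below. The function name placeholders is hypothetical, and this only shows how such a check could be computed, not how LABLoader actually computes placeholders_.

```python
import string


def placeholders(template):
    """Return the set of {field} names that appear in a path template."""
    parsed = string.Formatter().parse(str(template))
    return {field_name for _, field_name, _, _ in parsed if field_name}


lab_fields = placeholders("../only_words/labs/train/{uri}.lab")
list_fields = placeholders("../lists/train.meetings.txt")
```

With the configuration above, the lab template does contain uri, so a check based on the raw template string would be expected to pass.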

pyannote-database command line tool

  • List available databases
pyannote-database database
  • List available protocols (with optional --database and --task filters)
pyannote-database protocol [--database=<database>] [--task=<task>]
  • Get statistics about a protocol (with optional --subset filter)
pyannote-database stats <protocol> [--subset=<subset>]

Custom protocols cannot be pickled

>>> from pyannote.database import get_protocol
>>> protocol = get_protocol('Debug.SpeakerDiarization.Debug')

>>> import pickle
>>> pickle.dumps(protocol)
PicklingError: Can't pickle <class 'pyannote.database.custom.Debug'>: attribute lookup Debug on pyannote.database.custom failed

That is a blocking issue because it prevents multi-gpu training in pyannote.audio v2.

cc @mogwai

A bug in custom.py

Hi, I think there is a small bug in the database/custom.py file.

In lines 126 through 135, the current code looks like this:

# load annotations
    if file_rttm is not None:

        if file_rttm.suffix == '.rttm':
            annotations = load_rttm(file_rttm)
        elif file_rttm.suffix == '.mdtm':
            annotations = load_mdtm(file_mdtm)
        else:
            msg = f'Unsupported format in {file_rttm}: please use RTTM.'
            raise ValueError(msg)

For the *.mdtm file case,
annotations = load_mdtm(file_mdtm) should be annotations = load_mdtm(file_rttm)
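
A table-driven dispatch avoids this kind of copy-paste slip, because each branch no longer names its own variable. The sketch below is self-contained: load_rttm and load_mdtm are hypothetical stand-ins for the real loaders, and load_annotations is an illustrative name.

```python
from pathlib import Path


def load_rttm(path):
    """Hypothetical stand-in for the real RTTM loader."""
    return ("rttm", str(path))


def load_mdtm(path):
    """Hypothetical stand-in for the real MDTM loader."""
    return ("mdtm", str(path))


LOADERS = {".rttm": load_rttm, ".mdtm": load_mdtm}


def load_annotations(annotation_file):
    """Dispatch on the file suffix; every loader receives the same path."""
    annotation_file = Path(annotation_file)
    loader = LOADERS.get(annotation_file.suffix)
    if loader is None:
        msg = f"Unsupported format in {annotation_file}: please use RTTM or MDTM."
        raise ValueError(msg)
    return loader(annotation_file)


kind, _ = load_annotations("annotations/train.mdtm")
```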

A small bug in protocol

In the new version of pyannote.database, line 71 of pyannote-database/pyannote/database/protocol/protocol.py

item[key] = preprocessor(item)

should be

item[key] = preprocessor(**item)

Multiple preprocessor for same field

Hey,

For that (now cursed) VTC pipeline, if we want to keep things neat, I'll need support for lists of preprocessors for a given field. Let me explain: let's say I have 2 preprocessors for file["annotation"], LabelMapper and VoiceTypeClassifierPreprocessor. I need to be able to chain them, and the solution is pretty easy:

protocol = get_protocol("Db.ProtocolType.MyProtocol", preprocessors={
    "audio": FileFinder(),
    "annotation": [LabelMapper(), VoiceTypeClassifierPreprocessor(classes=..., unions=...)]
})

Obviously, the order of preprocessors in the list is important.

Would you be ok with me adding support for this in pyannote-db, while keeping the support for the current "single preprocessor" API?
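
A minimal sketch of how such chaining could work while keeping the single-preprocessor form is shown below. It assumes each preprocessor is a callable that takes the file dictionary and returns the new value for its key. apply_preprocessors and the two lambdas are hypothetical and only illustrate the ordering.

```python
def apply_preprocessors(current_file, preprocessors):
    """Apply one callable, or a list of callables in order, to each key."""
    result = dict(current_file)  # leave the caller's dictionary untouched
    for key, steps in preprocessors.items():
        if callable(steps):
            steps = [steps]  # single preprocessor: keep the current API
        for step in steps:
            # each step sees the output of the previous one under result[key]
            result[key] = step(result)
    return result


processed = apply_preprocessors(
    {"uri": "file1", "annotation": "SPK1"},
    {
        "annotation": [
            lambda f: f["annotation"].lower(),
            lambda f: f["annotation"].replace("spk", "speaker"),
        ]
    },
)
```

Swapping the two steps would give a different result here, which is why the order of the list matters.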

Using patterns for annotation files

Is it possible to use patterns for .rttm locations?
For example:

Protocols:
  MyDatabase:
    Protocol:
      MyProtocol:
        train:
            uri: lists/train.lst
            speaker: rttms/{uri}.rttm

Person naming conventions

To ensure consistent person naming across pyannote.database plugins, we should define a naming convention that could then be used by pyannote.database.get_label_identifier to ensure the same person is always labeled the same way.

One could use something like:

  • @first_name_last_name for people whose identity is clearly defined
  • {database}|person_{xx} otherwise

Two person_{xx} labels in the same {database} are supposed to refer to different people.
However, across databases, one cannot tell anything about them.
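
The convention could be sketched as a small helper like the one below. The function name person_label, the lowercasing, and the two-digit zero padding of {xx} are assumptions made for illustration. Only the two label shapes come from the proposal above.

```python
import re


def person_label(database, first_name=None, last_name=None, index=None):
    """Build a label following the proposed convention."""
    if first_name and last_name:
        # identity is known: @first_name_last_name
        name = f"{first_name} {last_name}".strip().lower()
        return "@" + re.sub(r"\s+", "_", name)
    if index is None:
        raise ValueError("either a full name or an index is required")
    # identity is unknown: label is scoped to the database
    return f"{database}|person_{index:02d}"


known = person_label("AMI", first_name="Jane", last_name="Doe")
unknown = person_label("AMI", index=3)
```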

ImportError: cannot import name 'registry' from 'pyannote.database'

Hello:
I am trying to use my own YAML file with the command: from pyannote.database import get_protocol
But it shows "ImportError: cannot import name 'registry' from 'pyannote.database' (/home/xuan/anaconda3/envs/pyannote/lib/python3.8/site-packages/pyannote/database/__init__.py)"
How can I solve this problem? Thanks ^_^

AttributeError: 'PosixPath' object has no attribute 'format'

I'm trying to train the model with my own data, but I cannot solve this problem.

To Reproduce
Steps to reproduce the behavior:

$pyannote-audio sad train --subset=train --to=200 --parallel=4 ${EXP_DIR} Test.SpeakerDiarization.OwnData

My database.yml :

Databases:
   Test: ./data_set_wav/{uri}.wav
   MUSAN: ./Pyannote/AMI/musan/{uri}.wav

Protocols:
   Test:
      SpeakerDiarization:
         OwnData:
           train:
              uri: ./Reference_files/validate/train.data.lst
              annotation: ./Reference_files/validate/train.data.rttm
              annotated: ./Reference_files/validate/train.data.uem
           development:
              uri: ./Pyannote/AMI/AMI/MixHeadset.development.lst
              annotation: ./Pyannote/AMI/AMI/MixHeadset.development.rttm
              annotated: ./Pyannote/AMI/AMI/MixHeadset.development.uem
           test:
              uri: ./Pyannote/AMI/AMI/MixHeadset.test.lst
              annotation: ./Pyannote/AMI/AMI/MixHeadset.test.rttm
              annotated: ./Pyannote/AMI/AMI/MixHeadset.test.uem
   MUSAN:
      Collection:
         BackgroundNoise:
            uri: ./Pyannote/AMI/musan/MUSAN/background_noise.txt
         Noise:
            uri: ./Pyannote/AMI/musan/MUSAN/noise.txt
         Music:
            uri: ./Pyannote/AMI/musan/MUSAN/music.txt
         Speech:
            uri: ./Pyannote/AMI/musan/MUSAN/speech.txt

pyannote environment

pyannote.audio==1.1.1
pyannote.core==4.3
pyannote.database==4.1.1
pyannote.metrics==3.1
pyannote.pipeline==1.5.2

My train.lst file (First 10 lines):

140471632__701151021998050417_272_2751_20210210_134647
127467523__701151985360118_272_2754_20210201_095339
135434197__701151031971345585_356_3972_20210205_165315
131519247__701151957207998_461_2881_20210203_152337
147034889__450198019993417681_93_3989_20210216_094739
130398174__701151091989826532_272_2754_20210203_084818
128654151__701151019988926593_356_3956_20210201_181652
138260478__01577999895159_147_2413_20210209_111043
146777865__701151021996438304_272_4027_20210215_183913
81808790__450198986402390_137_2680_20201217_085743

My train.rttm file (First 10 lines):

SPEAKER 80596681__701151988339062_105_2448_20201216_114632 1 174.301 1.25 <NA> <NA> Customer <NA> <NA>
SPEAKER 80596681__701151988339062_105_2448_20201216_114632 1 56.4404 0.9400000000000048 <NA> <NA> Customer <NA> <NA>
SPEAKER 80596681__701151988339062_105_2448_20201216_114632 1 20.6501 1.2001000000000026 <NA> <NA> Customer <NA> <NA>
SPEAKER 80596681__701151988339062_105_2448_20201216_114632 1 23.8802 1.110000000000003 <NA> <NA> Customer <NA> <NA>
SPEAKER 80596681__701151988339062_105_2448_20201216_114632 1 28.2202 0.6000000000000014 <NA> <NA> Customer <NA> <NA>
SPEAKER 80596681__701151988339062_105_2448_20201216_114632 1 3.63 0.9199999999999999 <NA> <NA> Customer <NA> <NA>
SPEAKER 80596681__701151988339062_105_2448_20201216_114632 1 5.63 0.41000000000000014 <NA> <NA> Customer <NA> <NA>
SPEAKER 80596681__701151988339062_105_2448_20201216_114632 1 7.4301 0.8999999999999995 <NA> <NA> Customer <NA> <NA>
SPEAKER 80596681__701151988339062_105_2448_20201216_114632 1 8.4901 0.6199999999999992 <NA> <NA> Customer <NA> <NA>
SPEAKER 80596681__701151988339062_105_2448_20201216_114632 1 9.7601 0.7100000000000009 <NA> <NA> Customer <NA> <NA>

Additional context
The example with the amicorpus data runs with no error, but when I try to train on my own data I always get this error.

AttributeError: 'NoneType' object has no attribute 'items'

Upon trying to run the following code to label preprocessors:

from pyannote.database import FileFinder
preprocessors = {'audio': FileFinder()}

I ran into the attribute error below:

AttributeError Traceback (most recent call last)
Input In [12], in <cell line: 2>()
1 from pyannote.database import FileFinder
----> 2 preprocessors = {'audio': FileFinder()}

File ~\Anaconda3\lib\site-packages\pyannote\database\util.py:94, in FileFinder.__init__(self, database_yml)
89 with open(self.database_yml, "r") as fp:
90 config = yaml.load(fp, Loader=yaml.SafeLoader)
92 self.config_: Dict[DatabaseName, Union[PathTemplate, List[PathTemplate]]] = {
93 str(database): path
---> 94 for database, path in config.get("Databases", dict()).items()
95 }

AttributeError: 'NoneType' object has no attribute 'items'

Could anyone please advise on how to overcome this issue? database config and database.yml contents below.

echo $PYANNOTE_DATABASE_CONFIG:
/Users/askrobola/Documents/GitHub/pyannote-audio/tutorials/Sandbox/database.yml

database.yml:
Databases:
AMI: /Users/askrobola/Documents/GitHub/pyannote-audio/tutorials/Sandbox/AMI/*/audio/{uri}.wav
MUSAN: /Users/askrobola/Documents/GitHub/pyannote-audio/tutorials/Sandbox/MUSAN/musan/music/{uri}.wav

Protocols:
AMI:
SpeakerDiarization:
MixHeadset:
train:
uri: /Users/askrobola/Documents/GitHub/pyannote-audio/tutorials/Sandbox/AMI/MixHeadset.train.lst
annotation: /Users/askrobola/Documents/GitHub/pyannote-audio/tutorials/Sandbox/AMI/MixHeadset.train.rttm
annotated: /Users/askrobola/Documents/GitHub/pyannote-audio/tutorials/Sandbox/AMI/MixHeadset.train.uem
development:
uri: /Users/askrobola/Documents/GitHub/pyannote-audio/tutorials/Sandbox/AMI/MixHeadset.development.lst
annotation: /Users/askrobola/Documents/GitHub/pyannote-audio/tutorials/Sandbox/AMI/MixHeadset.development.rttm
annotated: /Users/askrobola/Documents/GitHub/pyannote-audio/tutorials/Sandbox/AMI/MixHeadset.development.uem
test:
uri: /Users/askrobola/Documents/GitHub/pyannote-audio/tutorials/Sandbox/AMI/MixHeadset.test.lst
annotation: /Users/askrobola/Documents/GitHub/pyannote-audio/tutorials/Sandbox/AMI/MixHeadset.test.rttm
annotated: /Users/askrobola/Documents/GitHub/pyannote-audio/tutorials/Sandbox/AMI/MixHeadset.test.uem
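
One possible reading of the traceback, assuming the file on disk really has no indentation under Databases: (the listing above may simply have lost its indentation when pasted): a YAML key with no nested children parses to null, so the key exists with the value None, and dict.get only falls back to its default when the key is absent. The sketch below reproduces that with a plain dictionary. databases_section is a hypothetical helper, not part of pyannote.database.

```python
# What a YAML parser would return for a file whose "Databases:" key has no
# indented children: the key is present, but its value is None.
config = {"Databases": None, "AMI": "/path/to/AMI/{uri}.wav"}

# The default is ignored because the key exists; this is None, not {}.
section = config.get("Databases", dict())


def databases_section(config):
    """Return the Databases mapping, treating a missing or empty section as {}."""
    return config.get("Databases") or {}


safe_section = databases_section(config)
```

If this is the cause, indenting the entries under Databases: and Protocols: in database.yml should be enough.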

get_unique_identifier loads all protocol file values

Using the ** expression here leads to loading all the values of the protocol file, even those that are not needed

My traceback (only to ease the understanding, the error is unrelated and caused by spacy version)

Traceback (most recent call last):
  File "/people/lerner/anaconda3/envs/transformers/bin/named_id.py", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/people/lerner/pyannote/Prune/prune/named_id.py", line 900, in <module>
    shuffle=False))
  File "/people/lerner/pyannote/Prune/prune/named_id.py", line 523, in batchify
    current_audio_emb = audio_emb(current_file)
  File "/people/lerner/pyannote/pyannote-audio/pyannote/audio/features/wrapper.py", line 274, in __call__
    return self.scorer_(current_file)
  File "/people/lerner/pyannote/pyannote-audio/pyannote/audio/features/precomputed.py", line 205, in __call__
    path = Path(self.get_path(current_file))
  File "/people/lerner/pyannote/pyannote-audio/pyannote/audio/features/precomputed.py", line 73, in get_path
    uri = get_unique_identifier(item)
  File "/people/lerner/pyannote/pyannote-database/pyannote/database/util.py", line 205, in get_unique_identifier
    return IDENTIFIER.format(**item)
  File "/people/lerner/pyannote/pyannote-database/pyannote/database/protocol/protocol.py", line 122, in __getitem__
    # just imagine that this key is forbidden
    value = self.lazy[key](self)
  File "/people/lerner/pyannote/pyannote-database/pyannote/database/custom.py", line 100, in load
    return loader(current_file)
  File "/vol/work/lerner/pyannote-db-plumcot/Plumcot/loader/loader.py", line 169, in __call__
    attributes)
  File "/vol/work/lerner/pyannote-db-plumcot/Plumcot/loader/loader.py", line 201, in merge_transcriptions_entities
    # and that this raises : "Oh no, forbidden key !"
    _, one2one, _, _, one2multi = align(tokens, e_tokens)
ValueError: too many values to unpack (expected 5)
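
The effect can be reproduced without pyannote. Unpacking a mapping with ** reads every key, while str.format_map only looks up the fields that the template uses. In the sketch below, LazyFile is a hypothetical stand-in for a lazily evaluated protocol file, and the "{database}/{uri}" template is an assumption about what IDENTIFIER looks like.

```python
from collections.abc import Mapping


class LazyFile(Mapping):
    """Hypothetical lazy mapping that records which keys get evaluated."""

    def __init__(self, loaders):
        self._loaders = loaders
        self.accessed = []

    def __getitem__(self, key):
        self.accessed.append(key)
        return self._loaders[key]()

    def __iter__(self):
        return iter(self._loaders)

    def __len__(self):
        return len(self._loaders)


def make_file():
    return LazyFile({
        "database": lambda: "MyDatabase",
        "uri": lambda: "episode01",
        "transcription": lambda: "expensive to load",
    })


IDENTIFIER = "{database}/{uri}"  # assumed template

eager = make_file()
eager_id = IDENTIFIER.format(**eager)    # evaluates every key

lazy = make_file()
lazy_id = IDENTIFIER.format_map(lazy)    # evaluates only database and uri
```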

support for transcripts and entity linking annotation

Hi,

Related to PaulLerner/pyannote-db-plumcot#12.

How should I proceed?

In all cases, each token should have a 'speaker' field (which might be set to unavailable).

If the transcript is a '.aligned' file, several fields could be added:

  • token start time
  • token end time
  • token alignment confidence

If the entity annotation is provided, we could add (I guess those already exist in spaCy):

  • POS tag
  • dependency
  • entity type

Any suggestions?

Bug on database.yml

File "/home/tuenguyen/speech/speech_dia_@/env/lib/python3.8/site-packages/pyannote/audio/utils/protocol.py", line 31, in check_protocol
file = next(protocol.train())
File "/home/tuenguyen/speech/speech_dia_@/env/lib/python3.8/site-packages/pyannote/database/protocol/protocol.py", line 363, in subset_helper
yield self.preprocess(file)
File "/home/tuenguyen/speech/speech_dia_@/env/lib/python3.8/site-packages/pyannote/database/protocol/protocol.py", line 329, in preprocess
return ProtocolFile(current_file, lazy=self.preprocessors)
File "/home/tuenguyen/speech/speech_dia_@/env/lib/python3.8/site-packages/pyannote/database/protocol/protocol.py", line 87, in __init__
self._store[key] = precomputed[key]
File "/home/tuenguyen/speech/speech_dia_@/env/lib/python3.8/site-packages/pyannote/database/protocol/protocol.py", line 122, in __getitem__
value = self.lazy[key](self)
File "/home/tuenguyen/speech/speech_dia_@/env/lib/python3.8/site-packages/pyannote/database/loader.py", line 130, in __call__
loaded = load_rttm(self.path.format(**sub_file))
AttributeError: 'PosixPath' object has no attribute 'format'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "main.py", line 18, in <module>
scd= SpeakerChangeDetection(ami)
File "/home/tuenguyen/speech/speech_dia_@/env/lib/python3.8/site-packages/pyannote/audio/tasks/segmentation/speaker_change_detection.py", line 97, in __init__
super().__init__(
File "/home/tuenguyen/speech/speech_dia_@/env/lib/python3.8/site-packages/pyannote/audio/core/task.py", line 175, in __init__
self.protocol, self.has_validation = check_protocol(protocol)
File "/home/tuenguyen/speech/speech_dia_@/env/lib/python3.8/site-packages/pyannote/audio/utils/protocol.py", line 34, in check_protocol
raise ValueError(msg)
ValueError: Protocol AMI.SpeakerDiarization.MixHeadset does not define a training set.


I cloned the develop branch and installed pyannote.
After that, I created a database folder with this layout:

[DirData]/a.lst # uri
[DirData]/a.rttm # protocol
[DirData]/*.wav


+> list(glob.glob([DirData]/*.wav)) = [mix_0000001.wav]

+> files: a.lst

mix_0000001.wav

+> files: a.rttm

SPEAKER mix_0000001 1 3.60 1.80 speaker_0000004052
SPEAKER mix_0000001 1 6.04 2.28 speaker_0000004052
SPEAKER mix_0000001 1 14.48 2.52 speaker_0000004052
SPEAKER mix_0000001 1 35.03 3.57 speaker_0000004052
SPEAKER mix_0000001 1 64.78 3.36 speaker_0000004052
SPEAKER mix_0000001 1 81.12 2.73 speaker_0000004052
SPEAKER mix_0000001 1 98.48 4.62 speaker_0000004052
SPEAKER mix_0000001 1 106.24 3.78 speaker_0000004052
SPEAKER mix_0000001 1 120.34 3.57 speaker_0000004052
SPEAKER mix_0000001 1 124.89 9.00 speaker_0000004052
SPEAKER mix_0000001 1 134.73 2.52 speaker_0000004052
SPEAKER mix_0000001 1 146.15 3.78 speaker_0000004052
SPEAKER mix_0000001 1 154.14 3.36 speaker_0000004052
SPEAKER mix_0000001 1 202.48 4.62 speaker_0000004052
SPEAKER mix_0000001 1 216.95 3.99 speaker_0000004052
SPEAKER mix_0000001 1 232.39 3.57 speaker_0000004052
SPEAKER mix_0000001 1 244.00 3.72 speaker_0000004052
SPEAKER mix_0000001 1 4.67 3.90 speaker_0000007425
SPEAKER mix_0000001 1 11.09 4.74 speaker_0000007425
SPEAKER mix_0000001 1 17.90 4.56 speaker_0000007425

file database.yml
Databases:
AMI: [DirData]/{uri}.wav
Protocols:
AMI:
SpeakerDiarization:
MixHeadset:
train:
uri:[DirData]/a.lst
annotation:[DirData]/a.rttm


Note: DirData is an absolute path on my local machine.

I have already read many tutorials and the pyannote source code, but I can't figure out this problem.

Cannot combine several protocols from different databases into one

I am trying to combine multiple protocols into one as shown in the [Meta-protocols and requirements] section in the README. I created the following database.yml configuration file from the already existing protocol configuration files (both work fine).

Requirements:
  - /path/to/folder/aishell4/database.yml
  - /path/to/folder/ali/database.yml

Protocols:
  CombinedAll:
    SpeakerDiarization:
      MyMetaProtocol:
        train:
          AISHELL4.SpeakerDiarization.Custom: [train, ]
          ALI.SpeakerDiarization.Custom: [train, ]
        development:
          AISHELL4.SpeakerDiarization.Custom: [development, ]
          ALI.SpeakerDiarization.Custom: [development, ]
        test:
          AISHELL4.SpeakerDiarization.Custom: [test, ]
          ALI.SpeakerDiarization.Custom: [test, ]

And then trying to call it

from pyannote.database import registry, FileFinder

registry.load_database("path/to/combined/database.yml")
protocol = registry.get_protocol("CombinedAll.SpeakerDiarization.MyMetaProtocol", preprocessors={"audio": FileFinder()})

for file in protocol.train():
    pass

But it gives me an error:

File "/opt/miniconda3/envs/speaker_diar/lib/python3.10/site-packages/pyannote/database/protocol/protocol.py", line 374, in subset_helper
  for file in files:
File "/opt/miniconda3/envs/speaker_diar/lib/python3.10/site-packages/pyannote/database/custom.py", line 317, in subset_iter
  raise ValueError("Missing mandatory 'uri' entry in CombinedAll.SpeakerDiarization.MyMetaProtocol.train")

How can this combination of protocols be done?

No loader for file with '.rttm' suffix

Hi, I've been following the pyannote-audio data preparation tutorial, and am trying to understand how the database.yml files work by running code samples from the README.

For instance, using the sample database.yml file, the following code sample should print out some filenames:

from pyannote.database import get_protocol
protocol = get_protocol('AMI.SpeakerDiarization.MixHeadset')
for resource in protocol.train():
    print(resource["uri"])

Instead, I get an error message about there not being a loader for .rttm files:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-3540f768261e> in <module>()
      2 from pyannote.database import get_protocol
      3 protocol = get_protocol('AMI.SpeakerDiarization.MixHeadset')
----> 4 for resource in protocol.train():
      5     print(resource["uri"])

2 frames
/usr/local/lib/python3.6/dist-packages/pyannote/database/custom.py in gather_loaders(entries, database_yml)
    205             if path.suffix not in LOADERS:
    206                 msg = f"No loader for file with '{path.suffix}' suffix"
--> 207                 raise TypeError(msg)
    208 
    209             # load custom loader class

TypeError: No loader for file with '.rttm' suffix

Here's a public Google Colab file where you can see the issue. It only takes a minute or so to run.
https://colab.research.google.com/drive/1ErF8KOk-s11zUXjOEbZnRguvzj2SBafC

Really appreciate any advice on how to fix this. Thanks!

Error in dataloader : 'PosixPath' object has no attribute 'format'

Hi !
I have an error in the data loader :

Traceback (most recent call last):
  File "main.py", line 292, in <module>
    args.func(args)
  File "main.py", line 134, in run
    trainer.fit(model)
  File "/linkhome/rech/genini01/uzm31mf/.conda/envs/vtc2/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 740, in fit
    self._call_and_handle_interrupt(
  File "/linkhome/rech/genini01/uzm31mf/.conda/envs/vtc2/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/linkhome/rech/genini01/uzm31mf/.conda/envs/vtc2/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 777, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/linkhome/rech/genini01/uzm31mf/.conda/envs/vtc2/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1138, in _run
    self._call_setup_hook()  # allow user to setup lightning_module in accelerator environment
  File "/linkhome/rech/genini01/uzm31mf/.conda/envs/vtc2/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1439, in _call_setup_hook
    self.call_hook("setup", stage=fn)
  File "/linkhome/rech/genini01/uzm31mf/.conda/envs/vtc2/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1501, in call_hook
    output = model_fx(*args, **kwargs)
  File "/linkhome/rech/genini01/uzm31mf/.conda/envs/vtc2/lib/python3.8/site-packages/pyannote/audio/core/model.py", line 349, in setup
    self.task.setup()
  File "/linkhome/rech/genini01/uzm31mf/.conda/envs/vtc2/lib/python3.8/site-packages/pytorch_lightning/core/datamodule.py", line 474, in wrapped_fn
    fn(*args, **kwargs)
  File "/linkhome/rech/genini01/uzm31mf/.conda/envs/vtc2/lib/python3.8/site-packages/pyannote/audio/tasks/segmentation/mixins.py", line 51, in setup
    for f in self.protocol.train():
  File "/linkhome/rech/genini01/uzm31mf/.conda/envs/vtc2/lib/python3.8/site-packages/pyannote/database/protocol/protocol.py", line 363, in subset_helper
    yield self.preprocess(file)
  File "/linkhome/rech/genini01/uzm31mf/.conda/envs/vtc2/lib/python3.8/site-packages/pyannote/database/protocol/protocol.py", line 329, in preprocess
    return ProtocolFile(current_file, lazy=self.preprocessors)
  File "/linkhome/rech/genini01/uzm31mf/.conda/envs/vtc2/lib/python3.8/site-packages/pyannote/database/protocol/protocol.py", line 87, in __init__
    self._store[key] = precomputed[key]
  File "/linkhome/rech/genini01/uzm31mf/.conda/envs/vtc2/lib/python3.8/site-packages/pyannote/database/protocol/protocol.py", line 122, in __getitem__
    value = self.lazy[key](self)
  File "/linkhome/rech/genini01/uzm31mf/.conda/envs/vtc2/lib/python3.8/site-packages/pyannote/database/loader.py", line 128, in __call__
    loaded = load_rttm(self.path.format(**sub_file))
AttributeError: 'PosixPath' object has no attribute 'format'

My version of pyannote-database is :
pyannote.database 4.1.1

This error seems to happen while trying to load files that don't have annotations in the rttm file.
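
Independently of what triggers it, the AttributeError itself is easy to reproduce: pathlib paths have no format method, so a path template has to be turned into a string before formatting. The sketch below assumes that is what happens inside the loader. The helper render is hypothetical, and PurePosixPath is used only to keep the example platform-independent.

```python
from pathlib import PurePosixPath


def render(template, **fields):
    """Format a path template given either as a str or as a path object."""
    return str(template).format(**fields)


template = PurePosixPath("rttms/{uri}.rttm")
has_format = hasattr(template, "format")  # False: paths are not strings
rendered = render(template, uri="file1")
```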

pyannote-audio sad train fails

I am executing this command in pyannote 1.1.2 (I am pretty sure that is what I have):

export EXP_DIR=finetune1
pyannote-audio sad train --pretrained=sad_dihard --subset=train --to=5 ${EXP_DIR} headcam16.SpeakerDiarization.try1

it fails:

File "/ext3/miniconda3/envs/pyannote6/lib/python3.8/site-packages/pyannote/audio/labeling/tasks/base.py", line 294, in _load_metadata
current_file["annotated"] = get_annotated(current_file).crop(
AttributeError: 'NoneType' object has no attribute 'crop'

I do not have a uem file, I suspect this may be a problem.

I am happy to provide more info if that would be helpful, hoping you can easily tell me what the problem is...

Thanks
Michael

TypeError related to custom protocol name

Hi,

One of my protocols is named '24', which causes the following error: TypeError: type.__new__() argument 1 must be str, not int.
I guess it should be easily fixable by tweaking the YAML loading parameters or adding a str conversion somewhere, but I don't understand why the name can't be an int in the first place.

Full log below

Merry Christmas :)

Traceback (most recent call last):
  File "/people/lerner/anaconda3/envs/pyannote/bin/pyannote-speaker-embedding", line 11, in <module>
    load_entry_point('pyannote.audio', 'console_scripts', 'pyannote-speaker-embedding')()
  File "/people/lerner/anaconda3/envs/pyannote/lib/python3.7/site-packages/pkg_resources/__init__.py", line 489, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/people/lerner/anaconda3/envs/pyannote/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2852, in load_entry_point
    return ep.load()
  File "/people/lerner/anaconda3/envs/pyannote/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2443, in load
    return self.resolve()
  File "/people/lerner/anaconda3/envs/pyannote/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2449, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/people/lerner/pyannote/pyannote-audio/pyannote/audio/applications/speaker_embedding.py", line 180, in <module>
    from .base import Application
  File "/people/lerner/pyannote/pyannote-audio/pyannote/audio/applications/base.py", line 40, in <module>
    from pyannote.database import FileFinder
  File "/people/lerner/pyannote/pyannote-database/pyannote/database/__init__.py", line 62, in <module>
    DATABASES, TASKS = add_custom_protocols()
  File "/people/lerner/pyannote/pyannote-database/pyannote/database/custom.py", line 314, in add_custom_protocols
    {'__init__': get_init(register)})
TypeError: type.__new__() argument 1 must be str, not int
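
The error message comes from the built-in type(), which requires the class name to be a str. An unquoted 24 in YAML is read as an integer, so a str conversion at class-creation time would be one way to accept it. The sketch below uses a hypothetical Protocol base class and helper name, not the library's add_custom_protocols.

```python
class Protocol:
    """Hypothetical stand-in for the protocol base class."""


def make_protocol_class(name, base=Protocol):
    """Create a class dynamically, accepting non-string names such as 24."""
    # type(24, ...) raises TypeError; type("24", ...) is accepted
    return type(str(name), (base,), {})


numeric_protocol = make_protocol_class(24)
```

Quoting the name in database.yml ("24") should have the same effect without any code change.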

Help on the "proper way" to build a protocol for DIHARD database with conflicting URI

As you know, I'm building the database "extension" for DIHARD2. Their database format is a bit tricky, and I want the user to have the smallest possible amount of work to do before being able to use the extension.

Here's the setup: the DIHARD2 data is provided in two archives which, once unzipped, form two subfolders: dihard2/dev/ and dihard2/test/ (there isn't any train data, so this is where pyannote's other database extensions and the meta-protocol pattern will come in really handy).

Here's the catch: in both folders, the audio files (encoded in FLAC) share basically the same path and there are some conflicting URIs. Thus, a ~/.pyannote/database.yml pattern to get only the dev files would be
path/to/dihard_data/dev/data/single_channel/flac/{uri}.flac

and a pattern to match only the test files would be
path/to/dihard_data/test/data/single_channel/flac/{uri}.flac

The real catch is that some URIs are the same, yet do not reference the same audio file.

There is a simple yet ugly solution: have the user run a bash script, provided in the repo, that would somehow fix this. But it would also imply parsing the .rttm files and modifying the URIs in there as well. Not pretty, not optimal, not my style.

I've looked a bit into the FileFinder class, and it seems to use the format function to render the "matched" audio file's path: path = path_template.format(uri=uri, database=database, **kwargs)

My guess is that it may be possible, in your infrastructure, to use this kind of generic path in the ~/.pyannote/database.yml file:
path/to/dihard_data/{split}/data/single_channel/flac/{uri}.flac

And then, in my implementation of a protocol, have something like this:

class DIHARD2SingleChannelProtocol(SpeakerDiarizationProtocol):
    """DIHARD speaker diarization protocol """

    def dev_iter(self):
        for annot_filepath in self.load_RTTM_files():
            # parse stuff

            current_file = {
                'database': 'DIHARD2',
                'uri': uri,
                'channel': 1,
                'split': 'dev', # <======== added to be then used by the format function
                'annotated': ...,
                'annotation': ...}
            yield current_file

    def tst_iter(self):
        for test_file in test_data:
            current_file = {
                'database': 'DIHARD2',
                'uri': uri,
                'channel': 1,
                'split': 'test', # <======== added to be then used by the format function
                'annotated': ...,}
            yield current_file

Do you think that's feasible? Do you have any better way to solve this kind of problem since you have a much better understanding of pyannote?
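The string-formatting part of this idea can be checked in isolation. The following is a minimal sketch, not a statement about what FileFinder actually forwards to the template: it assumes every key of the protocol file is passed to str.format, and the URI "DH_0001" and the helper name resolve_audio_path are made up for illustration.

```python
# Template as it might appear in ~/.pyannote/database.yml (path taken from the issue).
path_template = "path/to/dihard_data/{split}/data/single_channel/flac/{uri}.flac"


def resolve_audio_path(template: str, current_file: dict) -> str:
    """Fill the template placeholders from the keys of a protocol file.

    Hypothetical helper: assumes all keys of the file are available
    to the template, which the issue only guesses at.
    """
    # str.format ignores unused keyword arguments, so extra keys such as
    # 'database' or 'channel' are harmless.
    return template.format(**current_file)


# Two files sharing the same (made-up) URI but coming from different splits.
dev_file = {"database": "DIHARD2", "uri": "DH_0001", "channel": 1, "split": "dev"}
test_file = {"database": "DIHARD2", "uri": "DH_0001", "channel": 1, "split": "test"}

print(resolve_audio_path(path_template, dev_file))
print(resolve_audio_path(path_template, test_file))
```

If the extra 'split' key does reach the template, the two conflicting URIs resolve to different files without renaming anything on disk.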

Speaker tag across rttm files

Should all the rttm files have unique speaker tags?

For example, suppose there are three audio files, each with two speakers, and no speaker appears in more than one file. Can the tags be (Speaker_00, Speaker_01) in all three files, or should they be (Speaker_00, Speaker_01) for the first file, (Speaker_02, Speaker_03) for the second file and (Speaker_04, Speaker_05) for the third file?

Feature: make paths in database.yml (optionally) relative

I think it would be nice to be able to specify relative paths in database.yml. This way, we could easily share the Plumcot corpus, for example.
We could easily implement it by testing whether the path (e.g. to an RTTM file) is_absolute() and, if it is not, concatenating it to the path of PYANNOTE_DATABASE_CONFIG.
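A minimal sketch of that check, assuming relative paths should be interpreted relative to the directory that contains the configuration file. The function name resolve_path and the example paths are hypothetical and not part of pyannote.database.

```python
from pathlib import Path


def resolve_path(path: str, config_file: str) -> Path:
    """Return `path` unchanged if absolute, else relative to the config file's directory."""
    candidate = Path(path)
    if candidate.is_absolute():
        return candidate
    # Assumption: "relative" means relative to the folder holding database.yml.
    return Path(config_file).parent / candidate


# Hypothetical locations, for illustration only.
config = "/home/user/corpus/database.yml"
print(resolve_path("rttm/{uri}.rttm", config))
print(resolve_path("/data/rttm/{uri}.rttm", config))
```

With this convention, a corpus folder containing its own database.yml could be moved or shared without editing the paths inside it.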

Faster RTTMLoader

The RTTMLoader class is extremely slow for large RTTM files containing annotations for many audio files (e.g. the VoxCeleb dataset).

We should make it faster!
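The issue does not say where the time is spent, so the following is only one possible direction: read the file in a single pass and group speaker turns by URI, so that looking up any one file afterwards is a dictionary access. The function name group_rttm_by_uri is hypothetical, and the sketch returns plain tuples rather than pyannote.core objects to stay dependency-free.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Turn = Tuple[float, float, str]  # (start, end, speaker label)


def group_rttm_by_uri(lines: Iterable[str]) -> Dict[str, List[Turn]]:
    """Parse RTTM lines once and group speaker turns by file URI."""
    turns = defaultdict(list)
    for line in lines:
        fields = line.split()
        # RTTM SPEAKER lines: type, uri, channel, start, duration, <NA>, <NA>, label, ...
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        uri, label = fields[1], fields[7]
        start, duration = float(fields[3]), float(fields[4])
        turns[uri].append((start, start + duration, label))
    return dict(turns)


# Made-up example content.
example = [
    "SPEAKER file1 1 0.5 1.5 <NA> <NA> spk_a <NA> <NA>",
    "SPEAKER file2 1 0.0 2.0 <NA> <NA> spk_b <NA> <NA>",
    "SPEAKER file1 1 3.0 1.0 <NA> <NA> spk_c <NA> <NA>",
]
grouped = group_rttm_by_uri(example)
print(grouped["file1"])
```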

Training the overlap detection : AttributeError: 'PosixPath' object has no attribute 'format'

from pyannote.audio.tasks import OverlappedSpeechDetection
ovl = OverlappedSpeechDetection(protocol, duration=2., batch_size=32, num_workers=4)
model = SimpleSegmentationModel(task=ovl)
trainer = pl.Trainer(max_epochs=1)
_ = trainer.fit(model)
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-23-7e3bd77348a4> in <module>
      3 model = SimpleSegmentationModel(task=ovl)
      4 trainer = pl.Trainer(max_epochs=1)
----> 5 _ = trainer.fit(model)

~/miniconda3/envs/pyannote/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in fit(self, model, train_dataloader, val_dataloaders, datamodule)
    456         # SET UP TRAINING
    457         # ----------------------------
--> 458         self.call_setup_hook(model)
    459         self.call_hook("on_before_accelerator_backend_setup", model)
    460         self.accelerator.setup(self, model)  # note: this sets up self.lightning_module

~/miniconda3/envs/pyannote/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in call_setup_hook(self, model)
   1062             called = self.datamodule.has_setup_test if self.testing else self.datamodule.has_setup_fit
   1063             if not called:
-> 1064                 self.datamodule.setup(stage_name)
   1065         self.setup(model, stage_name)
   1066         model.setup(stage_name)

~/miniconda3/envs/pyannote/lib/python3.8/site-packages/pytorch_lightning/core/datamodule.py in wrapped_fn(*args, **kwargs)
     90             obj._has_prepared_data = True
     91 
---> 92         return fn(*args, **kwargs)
     93 
     94     return wrapped_fn

~/miniconda3/envs/pyannote/lib/python3.8/site-packages/pyannote/audio/tasks/segmentation/mixins.py in setup(self, stage)
     51             self._train_metadata = dict()
     52 
---> 53             for f in self.protocol.train():
     54 
     55                 file = dict()

~/miniconda3/envs/pyannote/lib/python3.8/site-packages/pyannote/database/protocol/protocol.py in subset_helper(self, subset)
    361 
    362         for file in files:
--> 363             yield self.preprocess(file)
    364 
    365     def train(self) -> Iterator[ProtocolFile]:

~/miniconda3/envs/pyannote/lib/python3.8/site-packages/pyannote/database/protocol/protocol.py in preprocess(self, current_file)
    327 
    328     def preprocess(self, current_file: Union[Dict, ProtocolFile]) -> ProtocolFile:
--> 329         return ProtocolFile(current_file, lazy=self.preprocessors)
    330 
    331     def __str__(self):

~/miniconda3/envs/pyannote/lib/python3.8/site-packages/pyannote/database/protocol/protocol.py in __init__(self, precomputed, lazy)
     85             # 'precomputed' one (which is probably not the most efficient solution).
     86             for key in set(precomputed.lazy) & set(lazy):
---> 87                 self._store[key] = precomputed[key]
     88 
     89             # we use the union of 'precomputed' lazy keys and provided 'lazy' keys as lazy keys

~/miniconda3/envs/pyannote/lib/python3.8/site-packages/pyannote/database/protocol/protocol.py in __getitem__(self, key)
    120 
    121                 # apply preprocessor once and remove it
--> 122                 value = self.lazy[key](self)
    123                 del self.lazy[key]
    124 

~/miniconda3/envs/pyannote/lib/python3.8/site-packages/pyannote/database/loader.py in __call__(self, file)
    126         if uri not in self.loaded_:
    127             sub_file = {key: file[key] for key in self.placeholders_}
--> 128             loaded = load_rttm(self.path.format(**sub_file))
    129             if uri not in loaded:
    130                 loaded[uri] = Annotation(uri=uri)

AttributeError: 'PosixPath' object has no attribute 'format'
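The last frame of the traceback calls .format on self.path, which only works if that attribute is a string. The sketch below reproduces the error outside of pyannote and shows one possible workaround: convert the path to a string before formatting. The template path is made up, and this is not necessarily the fix adopted by the library.

```python
from pathlib import Path

# Hypothetical path template with a {uri} placeholder, stored as a Path object.
path = Path("/path/to/rttm/{uri}.rttm")

# pathlib.Path objects do not provide str.format, hence the AttributeError.
try:
    path.format(uri="file1")
except AttributeError as error:
    message = str(error)
    print(message)

# Workaround: format the string form, then wrap the result in a Path again.
resolved = Path(str(path).format(uri="file1"))
print(resolved)
```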

Training on Jamendo Corpus

Hi!

I would like to train one of your pyannote.audio models using the Jamendo Corpus dataset available here: https://zenodo.org/record/2585988#.Yh9QgBPMJhE

Unfortunately, I have some problems defining the custom data loader. Each audio track has a single label file in .lab format, where each line reads "start end label". This is different from what CTMLoader handles.

I wrote the following files.

database.yml:
Databases:
  Jamendo:
    - /path_to_jamendo/{uri}.mp3
    - /path_to_jamendo/{uri}.ogg
Protocols:
  Jamendo:
    Protocol:
      JamendoProtocol:
        train:
          uri: /path_to_jamendo/filelists/train
          annotation: /path_to_jamendo/labels/{uri}.lab
        development:
          uri: /path_to_jamendo/filelists/valid
          annotation: /path_to_jamendo/labels/{uri}.lab
        test:
          uri: /path_to_jamendo/filelists/test
          annotation: /path_to_jamendo/labels/{uri}.lab

setup.py:
from setuptools import setup, find_packages

setup(
    name="jamendo_lab_loader",
    packages=find_packages(),
    install_requires=[
        "pyannote.database >= 4.0",
    ],
    entry_points={
        "pyannote.database.loader": [
            ".lab = jamendo_lab_loader.loader:LabLoader",
        ],
    }
)

I don't know how to write the loader.py and how to use it. Do you have any suggestions?
Thank you for sharing this great pyannote work. Hope you can help me.
Francesco
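A possible starting point for loader.py, offered as a sketch under several assumptions. It assumes a loader is a class built with the path from database.yml and then called with the current file, which is the calling pattern visible in the loader.py traceback earlier on this page. It assumes each .lab line reads "start end label", as described above. To stay dependency-free it returns plain tuples; a real loader would probably build a pyannote.core.Annotation instead.

```python
from pathlib import Path
from typing import Dict, List, Tuple

Segment = Tuple[float, float, str]  # (start, end, label)


class LabLoader:
    """Sketch of a loader for "start end label" files (interface is an assumption)."""

    def __init__(self, path: Path):
        # Path template from database.yml; may contain a {uri} placeholder.
        self.path = Path(path)

    def __call__(self, current_file: Dict) -> List[Segment]:
        # Format the string form of the path: Path objects have no .format method.
        lab_path = Path(str(self.path).format(uri=current_file["uri"]))
        segments = []
        with open(lab_path) as lab_file:
            for line in lab_file:
                fields = line.split()
                if len(fields) != 3:
                    continue  # skip blank or malformed lines
                start, end, label = fields
                segments.append((float(start), float(end), label))
        return segments
```

With the entry point declared in setup.py above and the package installed (for instance with pip install -e .), files ending in .lab would be dispatched to this class by extension, assuming the entry-point mechanism works as that setup.py implies.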
