
pyannote-database's Introduction

pyannote-database

Reproducible experimental protocols for multimedia (audio, video, text) databases.

$ pip install pyannote.database

Definitions

In pyannote.database jargon, a resource can be any multimedia entity (e.g. an image, an audio file, a video file, or a webpage). In its simplest form, it is modeled as a pyannote.database.ProtocolFile instance (basically a dict on steroids) with a uri key (URI stands for unique resource identifier) that identifies the entity.

Metadata may be associated with a resource by adding keys to its ProtocolFile. For instance, one could add a label key to an image resource describing whether it depicts a chihuahua or a muffin.
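
For illustration, here is a minimal sketch of a resource carrying such metadata (the uri and label values are made up, and ProtocolFile is assumed to behave like a regular mutable mapping):

from pyannote.database import ProtocolFile

# a resource identified by its URI...
resource = ProtocolFile({"uri": "image0001"})

# ... to which metadata can be attached like regular dictionary keys
resource["label"] = "chihuahua"
print(resource["uri"], resource["label"])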

A database is a collection of resources of the same nature (e.g. a collection of audio files). It is modeled as a pyannote.database.Database instance.

An experimental protocol (pyannote.database.Protocol) usually defines three subsets:

  • a train subset (e.g. used to train a neural network),
  • a development subset (e.g. used to tune hyper-parameters),
  • a test subset (e.g. used for evaluation).

Configuration file

Experimental protocols are defined via YAML configuration files:

Protocols:
  MyDatabase:
    Protocol:
      MyProtocol:
        train:
            uri: /path/to/train.lst
        development:
            uri: /path/to/development.lst
        test:
            uri: /path/to/test.lst

where /path/to/train.lst contains the list of unique resource identifiers (URIs) of the files in the train subset:

# /path/to/train.lst
filename1
filename2

Since version 5.0, configuration files must be loaded into the registry like this:

from pyannote.database import registry
registry.load_database("/path/to/database.yml")

registry.load_database takes an optional mode keyword argument that controls what to do when loading a protocol whose name (e.g. MyDatabase.Protocol.MyProtocol) is already used by another protocol (see the example after the list below):

  • LoadingMode.OVERRIDE to override the existing protocol with the new one (default behavior);
  • LoadingMode.KEEP to keep the existing protocol;
  • LoadingMode.ERROR to raise an error when such a conflict occurs.
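
For example (a minimal sketch; the exact import location of LoadingMode is an assumption and may differ across versions):

from pyannote.database import registry
# assumption: LoadingMode lives in pyannote.database.registry
from pyannote.database.registry import LoadingMode

registry.load_database("/path/to/database.yml")
# keep the already-registered protocols in case of a name conflict
registry.load_database("/path/to/other_database.yml", mode=LoadingMode.KEEP)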

For backward compatibility with the 4.x branch, the following configuration files are loaded automatically when importing pyannote.database, in that order (see the example after the list):

  1. ~/.pyannote/database.yml
  2. database.yml in current working directory
  3. list of ;-separated path(s) in the PYANNOTE_DATABASE_CONFIG environment variable (e.g. /absolute/path.yml;relative/path.yml)
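
For instance, relying on the PYANNOTE_DATABASE_CONFIG environment variable boils down to setting it before the first import (paths are illustrative):

import os

# point pyannote.database to one or more configuration files
os.environ["PYANNOTE_DATABASE_CONFIG"] = "/absolute/path.yml;relative/path.yml"

import pyannote.database  # the configuration files listed above are loaded here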

Once loaded in the registry, protocols can be used in Python like this:

from pyannote.database import registry
registry.load_database("/path/to/database.yml")

protocol = registry.get_protocol('MyDatabase.Protocol.MyProtocol')
for resource in protocol.train():
    print(resource["uri"])
filename1
filename2

Paths defined in the configuration file can be absolute or relative to the directory containing the configuration file. For instance, the following file organization should work just fine:

.
├── database.yml
└── lists
    └── train.lst

with the content of database.yml as follows:

Protocols:
  MyDatabase:
    Protocol:
      MyProtocol:
        train:
            uri: lists/train.lst

Data loaders

The above MyDatabase.Protocol.MyProtocol protocol is not very useful as it only allows iterating over a list of resources with a single 'uri' key. Metadata can be added to each resource with the following syntax:

Protocols:
  MyDatabase:
    Protocol:
      MyProtocol:
        train:
            uri: lists/train.lst
            speaker: rttms/train.rttm
            transcription: ctms/{uri}.ctm

and the following directory structure:

.
├── database.yml
├── lists
|   └── train.lst
├── rttms
|   └── train.rttm
└── ctms
    ├── filename1.ctm
    └── filename2.ctm

Now, resources have both 'speaker' and 'transcription' keys:

from pyannote.database import registry
protocol = registry.get_protocol('MyDatabase.Protocol.MyProtocol')

for resource in protocol.train():
    assert "speaker" in resource
    assert isinstance(resource["speaker"], pyannote.core.Annotation)
    assert "transcription" in resource
    assert isinstance(resource["transcription"], spacy.tokens.Doc)

What happened exactly? Data loaders were automatically selected based on metadata file suffix:

  • pyannote.database.loader.RTTMLoader for the speaker entry (.rttm suffix),
  • pyannote.database.loader.CTMLoader for the transcription entry (.ctm suffix),

and used to populate speaker and transcription keys. In pseudo-code:

# instantiate loader registered with `.rttm` suffix
speaker = RTTMLoader('rttms/train.rttm')

# entries with {placeholders} serve as path templates
transcription_template = 'ctms/{uri}.ctm'

for resource in protocol.train():
    # unique resource identifier
    uri = resource['uri']

    # only select parts of `rttms/train.rttm` that are relevant to current resource,
    # convert it into a convenient data structure (here pyannote.core.Annotation), 
    # and assign it to `'speaker'` resource key 
    resource['speaker'] = speaker[uri]

    # replace placeholders in `transcription` path template
    ctm = transcription_template.format(uri=uri)

    # instantiate loader registered with `.ctm` suffix
    transcription = CTMLoader(ctm)

    # only select parts of the `ctms/{uri}.ctm` that are relevant to current resource
    # (here, most likely the whole file), convert it into a convenient data structure
    # (here spacy.tokens.Doc), and assign it to `'transcription'` resource key 
    resource['transcription'] = transcription[uri]

pyannote.database provides built-in data loaders for a limited set of file formats: RTTMLoader for .rttm files, UEMLoader for .uem files, and CTMLoader for .ctm files. See Custom data loaders section to learn how to add your own.

Preprocessors

When iterating over a protocol subset (e.g. using for resource in protocol.train()), resources are provided as instances of pyannote.database.ProtocolFile, which are basically dict instances whose values are computed lazily.

For instance, in the code above, the value returned by resource['speaker'] is only computed the first time it is accessed and then cached for all subsequent calls. See Custom data loaders section for more details.

Similarly, resources can be augmented (or modified) on-the-fly with the preprocessors option of get_protocol. In the example below, a dummy key is added that simply returns the length of the uri string:

from pyannote.database import ProtocolFile, registry

def compute_dummy(resource: ProtocolFile):
    print(f"Computing 'dummy' key")
    return len(resource["uri"])

protocol = registry.get_protocol('Etape.SpeakerDiarization.TV',
                                 preprocessors={"dummy": compute_dummy})
resource = next(protocol.train())
resource["dummy"]
Computing 'dummy' key

FileFinder

pyannote.database.FileFinder is a special kind of preprocessor meant to automatically locate the media file associated with the uri.

Say audio files are available at the following paths:

.
└── /path/to
    └── audio
        ├── filename1.wav
        ├── filename2.mp3
        ├── filename3.wav
        ├── filename4.wav
        └── filename5.mp3

The FileFinder preprocessor relies on a Databases: section that should be added to the database.yml configuration file, indicating where to look for media files (using resource key placeholders):

Databases:
  MyDatabase: 
    - /path/to/audio/{uri}.wav
    - /path/to/audio/{uri}.mp3

Protocols:
  MyDatabase:
    Protocol:
      MyProtocol:
        train:
            uri: lists/train.lst

Note that any pattern supported by pathlib.Path.glob can be used (but avoid ** as much as possible). Paths can also be relative to the location of database.yml. FileFinder will then do its best to locate the file at runtime:

from pyannote.database import registry
from pyannote.database import FileFinder
protocol = registry.get_protocol('MyDatabase.Protocol.MyProtocol',
                                 preprocessors={"audio": FileFinder()})
for resource in protocol.train():
    print(resource["audio"])
/path/to/audio/filename1.wav
/path/to/audio/filename2.mp3

Tasks

Collections

A raw collection of files (i.e. without any train/development/test split) can be defined using the Collection task:

# ~/database.yml
Protocols:
  MyDatabase:
    Collection:
      MyCollection:
        uri: /path/to/collection.lst
        any_other_key: ... # see custom loader documentation

where /path/to/collection.lst contains the list of identifiers of the files in the collection:

# /path/to/collection.lst
filename1
filename2
filename3

It can then be used in Python like this:

from pyannote.database import registry
collection = registry.get_protocol('MyDatabase.Collection.MyCollection')

for file in collection.files():
   print(file["uri"])
filename1
filename2
filename3

Segmentation

A (temporal) segmentation protocol can be defined using the Segmentation task:

Protocols:
  MyDatabase:
    Segmentation:
      MyProtocol:
        classes: 
          - speech
          - noise
          - music
        train:
            uri: /path/to/train.lst
            annotation: /path/to/train.rttm
            annotated: /path/to/train.uem

where /path/to/train.lst contains the list of identifiers of the files in the training set:

# /path/to/train.lst
filename1
filename2

/path/to/train.rttm contains the reference segmentation using RTTM format:

# /path/to/train.rttm
SPEAKER filename1 1 3.168 0.800 <NA> <NA> speech <NA> <NA>
SPEAKER filename1 1 5.463 0.640 <NA> <NA> speech <NA> <NA>
SPEAKER filename1 1 5.496 0.574 <NA> <NA> music <NA> <NA>
SPEAKER filename1 1 10.454 0.499 <NA> <NA> music <NA> <NA>
SPEAKER filename2 1 2.977 0.391 <NA> <NA> noise <NA> <NA>
SPEAKER filename2 1 18.705 0.964 <NA> <NA> noise <NA> <NA>
SPEAKER filename2 1 22.269 0.457 <NA> <NA> speech <NA> <NA>
SPEAKER filename2 1 28.474 1.526 <NA> <NA> speech <NA> <NA>

/path/to/train.uem describes the annotated regions using UEM format:

filename1 NA 0.000 30.000
filename2 NA 0.000 30.000
filename2 NA 40.000 70.000

It is recommended to provide the annotated key even if it covers the whole file. Any part of the annotation that lives outside of the provided annotated regions will be removed. annotated is also used by pyannote.metrics to remove un-annotated regions from the evaluation, and to prevent pyannote.audio from incorrectly considering empty un-annotated regions as negatives.
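
For illustration, the cropping behaviour can be reproduced with pyannote.core (a minimal sketch, not the actual library internals; segments and labels are made up):

from pyannote.core import Annotation, Segment, Timeline

annotation = Annotation(uri="filename1")
annotation[Segment(3.168, 3.968)] = "speech"
annotation[Segment(29.5, 31.0)] = "music"      # partly outside the annotated regions

annotated = Timeline([Segment(0.0, 30.0)], uri="filename1")

# anything outside `annotated` is removed (here, everything after t=30s)
annotation = annotation.crop(annotated)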

It can then be used in Python like this:

from pyannote.database import registry
protocol = registry.get_protocol('MyDatabase.Segmentation.MyProtocol')

for file in protocol.train():
   print(file["uri"])
   assert "annotation" in file
   assert "annotated" in file
filename1
filename2

Speaker diarization

A protocol can be defined specifically for speaker diarization using the SpeakerDiarization task:

Protocols:
  MyDatabase:
    SpeakerDiarization:
      MyProtocol:
        scope: file
        train:
            uri: /path/to/train.lst
            annotation: /path/to/train.rttm
            annotated: /path/to/train.uem

where /path/to/train.lst contains the list of identifiers of the files in the training set:

# /path/to/train.lst
filename1
filename2

/path/to/train.rttm contains the reference speaker diarization using RTTM format:

# /path/to/train.rttm
SPEAKER filename1 1 3.168 0.800 <NA> <NA> speaker_A <NA> <NA>
SPEAKER filename1 1 5.463 0.640 <NA> <NA> speaker_A <NA> <NA>
SPEAKER filename1 1 5.496 0.574 <NA> <NA> speaker_B <NA> <NA>
SPEAKER filename1 1 10.454 0.499 <NA> <NA> speaker_B <NA> <NA>
SPEAKER filename2 1 2.977 0.391 <NA> <NA> speaker_C <NA> <NA>
SPEAKER filename2 1 18.705 0.964 <NA> <NA> speaker_C <NA> <NA>
SPEAKER filename2 1 22.269 0.457 <NA> <NA> speaker_A <NA> <NA>
SPEAKER filename2 1 28.474 1.526 <NA> <NA> speaker_A <NA> <NA>

/path/to/train.uem describes the annotated regions using UEM format:

filename1 NA 0.000 30.000
filename2 NA 0.000 30.000
filename2 NA 40.000 70.000

It is recommended to provide the annotated key even if it covers the whole file. Any part of annotation that lives outside of the provided annotated will be removed. It is also used by pyannote.metrics to remove un-annotated regions from the evaluation, and to prevent pyannote.audio from incorrectly considering empty un-annotated regions as non-speech.

It can then be used in Python like this:

from pyannote.database import registry
protocol = registry.get_protocol('MyDatabase.SpeakerDiarization.MyProtocol')

for file in protocol.train():
   print(file["uri"])
   assert "annotation" in file
   assert "annotated" in file
filename1
filename2

The scope parameter indicates the scope of speaker labels:

  • file indicates that each file has its own set of speaker labels. There is no guarantee that speaker1 in filename1 is the same speaker as speaker1 in filename2.
  • database indicates that all files in the database share the same set of speaker labels. speaker1 in database1/filename1 is the same speaker as speaker1 in database1/filename2.
  • global indicates that the set of speaker labels is the same across all databases. speaker1 in database1 is the same speaker as speaker1 in database2.

scope is then directly accessible from file['scope'].
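
For illustration, here is a minimal sketch (assuming files expose the database, uri, scope and annotation keys described above) that makes file-scoped speaker labels unique across a whole protocol:

for file in protocol.train():
    if file["scope"] == "file":
        # prefix each label with database and uri so that labels never collide
        mapping = {label: f'{file["database"]}|{file["uri"]}|{label}'
                   for label in file["annotation"].labels()}
        file["annotation"] = file["annotation"].rename_labels(mapping=mapping)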

Speaker verification

A simple speaker verification protocol can be defined by adding a trial entry to a SpeakerVerification task:

Protocols:
  MyDatabase:
    SpeakerVerification:
      MyProtocol:
        train:
            uri: /path/to/train.lst
            duration: /path/to/duration.map
            trial: /path/to/trial.txt

where /path/to/train.lst contains the list of identifiers of the files in the training set:

# /path/to/train.lst
filename1
filename2
filename3
...

/path/to/duration.map contains the duration of the files:

filename1 30.000
filename2 30.000
...

/path/to/trial.txt contains a list of trials:

1 filename1 filename2
0 filename1 filename3
...

1 stands for target trials and 0 for non-target trials. In the example above, it means that the same speaker uttered files filename1 and filename2, and that filename1 and filename3 are from two different speakers.

It can then be used in Python like this:

from pyannote.database import registry
protocol = registry.get_protocol('MyDatabase.SpeakerVerification.MyProtocol')

for trial in protocol.train_trial():
   print(f"{trial['reference']} {trial['file1']['uri']} {trial['file2']['uri']}")
1 filename1 filename2
0 filename1 filename3

Note that speaker verification protocols (SpeakerVerificationProtocol) are a subclass of speaker diarization protocols (SpeakerDiarizationProtocol). As such, they also define regular {subset} methods.
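
For instance, the verification protocol defined above can also be iterated over like a diarization protocol:

from pyannote.database import registry
protocol = registry.get_protocol('MyDatabase.SpeakerVerification.MyProtocol')

# regular subsets are available on top of the *_trial methods
for file in protocol.train():
    print(file["uri"])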

Meta-protocols and requirements

pyannote.database provides a way to combine several protocols (possibly from different databases) into one.

This is achieved by defining those "meta-protocols" into the configuration file with the special X database:

Requirements:
  - /path/to/my/database/database.yml         # defines MyDatabase protocols
  - /path/to/my/other/database/database.yml   # defines MyOtherDatabase protocols

Protocols:
  X:
    Protocol:
      MyMetaProtocol:
        train:
          MyDatabase.Protocol.MyProtocol: [train, development]
          MyOtherDatabase.Protocol.MyOtherProtocol: [train, ]
        development:
          MyDatabase.Protocol.MyProtocol: [test, ]
          MyOtherDatabase.Protocol.MyOtherProtocol: [development, ]
        test:
          MyOtherDatabase.Protocol.MyOtherProtocol: [test, ]

The new X.Protocol.MyMetaProtocol combines the train and development subsets of MyDatabase.Protocol.MyProtocol with the train subset of MyOtherDatabase.Protocol.MyOtherProtocol to build a meta train subset.

This new "meta-protocol" can be used like any other protocol of the (fake) X database:

from pyannote.database import registry
protocol = registry.get_protocol('X.Protocol.MyMetaProtocol')

for resource in protocol.train():
    pass

Plugins

For more complex protocols, you can create (and share) your own pyannote.database plugin.

A number of pyannote.database plugins are already available (search for pyannote.db on PyPI).

API

Databases and tasks

Everything about databases is stored in the registry.

  from pyannote.database import registry

Any database can then be instantiated as follows:

database = registry.get_database("MyDatabase")

Some databases (especially multimodal ones) may be used for several tasks. One can get a list of tasks using the get_tasks method:

database.get_tasks()
["SpeakerDiarization"]

Custom data loaders

pyannote.database provides built-in data loaders for a limited set of file formats: RTTMLoader for .rttm files, UEMLoader for .uem files, and CTMLoader for .ctm files.

In case those are not enough, pyannote.database supports the addition of custom data loaders using the pyannote.database.loader entry point.

Defining custom data loaders

Here is an example of a Python package called your_package that defines two custom data loaders for files with .ext1 and .ext2 suffixes, respectively.

# ~~~~~~~~~~~~~~~~ YourPackage/your_package/loader.py ~~~~~~~~~~~~~~~~
from pathlib import Path
from typing import Text

from pyannote.database import ProtocolFile

class Ext1Loader:
    def __init__(self, ext1: Path):
        print(f'Initializing Ext1Loader with {ext1}')
        # your code should obviously do something smarter.
        # see pyannote.database.loader.RTTMLoader for an example.
        self.ext1 = ext1

    def __call__(self, current_file: ProtocolFile) -> Text:
        uri = current_file["uri"]
        print(f'Processing {uri} with Ext1Loader')
        # your code should obviously do something smarter.
        # see pyannote.database.loader.RTTMLoader for an example.
        return f'{uri}.ext1'

class Ext2Loader:
    def __init__(self, ext2: Path):
        print(f'Initializing Ext2Loader with {ext2}')
        # your code should obviously do something smarter.
        # see pyannote.database.loader.RTTMLoader for an example.
        self.ext2 = ext2

    def __call__(self, current_file: ProtocolFile) -> Text:
        uri = current_file["uri"]
        print(f'Processing {uri} with Ext2Loader')
        # your code should obviously do something smarter.
        # see pyannote.database.loader.RTTMLoader for an example.
        return f'{uri}.ext2'
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The __init__ method expects a single positional argument of type Path that provides the path to the data file in the custom data format.

__call__ expects a single positional argument of type ProtocolFile and returns the data for the given file.

It is recommended to make __init__ as fast and light as possible and delegate all the data filtering and formatting to __call__. For instance, RTTMLoader.__init__ uses pandas to load the full .rttm file as fast as possible into a DataFrame, while RTTMLoader.__call__ takes care of selecting rows that correspond to the requested file and converting them into a pyannote.core.Annotation.
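
As an illustration of this pattern, here is a hypothetical loader for the .map duration files used earlier in this document (the .map suffix has no built-in loader; the class name and file format are assumptions):

# fast __init__ (load everything once), filtering in __call__
from pathlib import Path

import pandas as pd

from pyannote.database import ProtocolFile

class MapLoader:
    def __init__(self, path: Path):
        # load the whole "uri duration" file in one go
        self.data_ = pd.read_csv(path, sep=r"\s+",
                                 names=["uri", "duration"], index_col="uri")

    def __call__(self, current_file: ProtocolFile) -> float:
        # only return the value relevant to the requested file
        return float(self.data_.loc[current_file["uri"], "duration"])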

Registering custom data loaders

At this point, pyannote.database has no idea of the existence of these new custom data loaders. They must be registered using the pyannote.database.loader entry point in your_package's setup.py, followed by installing the package with pip install your_package (or pip install -e YourPackage/ if it is not published on PyPI yet).

# ~~~~~~~~~~~~~~~~~~~~~~~ YourPackage/setup.py ~~~~~~~~~~~~~~~~~~~~~~~
from setuptools import setup, find_packages
setup(
    name="your_package",
    packages=find_packages(),
    install_requires=[
        "pyannote.database >= 4.0",
    ],
    entry_points={
        "pyannote.database.loader": [
            # load files with extension '.ext1' 
            # with your_package.loader.Ext1Loader
            ".ext1 = your_package.loader:Ext1Loader",
            # load files with extension '.ext2' 
            # with your_package.loader.Ext2Loader
            ".ext2 = your_package.loader:Ext2Loader",
        ],
    }
)
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Testing custom data loaders

Now that .ext1 and .ext2 data loaders are registered, they will be used automatically by pyannote.database when parsing the sample demo/database.yml custom protocol configuration file.

# ~~~~~~~~~~~~~~~~~~~~~~~~~~ demo/database.yml ~~~~~~~~~~~~~~~~~~~~~~~~~~
Protocols:
  MyDatabase:
    SpeakerDiarization:
      MyProtocol:
        train:
           uri: train.lst
           key1: train.ext1
           key2: train.ext2
# tell pyannote.database about the configuration file
>>> from pyannote.database import registry
>>> registry.load_database('demo/database.yml')

# load custom protocol
>>> protocol = registry.get_protocol('MyDatabase.SpeakerDiarization.MyProtocol')

# get first file of training set
>>> first_file = next(protocol.train())
Initializing Ext1Loader with file train.ext1
Initializing Ext2Loader with file train.ext2

# access its "key1" and "key2" keys.
>>> assert first_file["key1"] == 'fileA.ext1'
Processing fileA with Ext1Loader
>>> assert first_file["key2"] == 'fileA.ext2'
Processing fileA with Ext2Loader
# note how __call__ is only called now (and not before)
# this is why it is better to delegate all the filtering and formatting to __call__

>>> assert first_file["key1"] == 'fileA.ext1'
# note how __call__ is not called the second time thanks to ProtocolFile built-in cache

Protocols

An experimental protocol can be defined programmatically by creating a class that inherits from Protocol and implements at least one of the train_iter, development_iter and test_iter methods:

class MyProtocol(Protocol):
    def train_iter(self) -> Iterator[Dict]:
        yield {"uri": "filename1", "any_other_key": "..."}
        yield {"uri": "filename2", "any_other_key": "..."}

{subset}_iter should return an iterator of dictionaries with:

  • a mandatory "uri" key that provides a unique file identifier (usually the filename),
  • any other key that the protocol may provide.

It can then be used in Python like this:

protocol = MyProtocol()
for file in protocol.train():
    print(file["uri"])
filename1
filename2

Collections

A collection can be defined programmatically by creating a class that inherits from CollectionProtocol and implements the files_iter method:

class MyCollection(CollectionProtocol):
    def files_iter(self) -> Iterator[Dict]:
        yield {"uri": "filename1", "any_other_key": "..."}
        yield {"uri": "filename2", "any_other_key": "..."}
        yield {"uri": "filename3", "any_other_key": "..."}

files_iter should return an iterator of dictionaries with:

  • a mandatory "uri" key that provides a unique file identifier (usually the filename),
  • any other key that the collection may provide.

It can then be used in Python like this:

collection = MyCollection()
for file in collection.files():
   print(file["uri"])
filename1
filename2
filename3

Speaker diarization

A speaker diarization protocol can be defined programmatically by creating a class that inherits from SpeakerDiarizationProtocol and implements at least one of train_iter, development_iter and test_iter methods:

class MySpeakerDiarizationProtocol(SpeakerDiarizationProtocol):
    def train_iter(self) -> Iterator[Dict]:
        yield {"uri": "filename1",
               "annotation": Annotation(...),
               "annotated": Timeline(...)}
        yield {"uri": "filename2",
               "annotation": Annotation(...),
               "annotated": Timeline(...)}

{subset}_iter should return an iterator of dictionaries with:

  • "uri" key (mandatory) that provides a unique file identifier (usually the filename),
  • "annotation" key (mandatory for train and development subsets) that provides reference speaker diarization as a pyannote.core.Annotation instance,
  • "annotated" key (recommended) that describes which part of the file has been annotated, as a pyannote.core.Timeline instance. Any part of "annotation" that lives outside of the provided "annotated" will be removed. This is also used by pyannote.metrics to remove un-annotated regions from its evaluation report, and by pyannote.audio to not consider empty un-annotated regions as non-speech.
  • any other key that the protocol may provide.

It can then be used in Python like this:

protocol = MySpeakerDiarizationProtocol()
for file in protocol.train():
   print(file["uri"])
filename1
filename2

Speaker verification

A speaker verification protocol implements the {subset}_trial methods, used in the speaker verification validation process. Note that SpeakerVerificationProtocol is a subclass of SpeakerDiarizationProtocol. As such, it shares the same {subset}_iter methods and still needs at least one of them to be defined.

A speaker verification protocol can be defined programmatically by creating a class that inherits from SpeakerVerificationProtocol and implements at least one of the train_trial_iter, development_trial_iter and test_trial_iter methods:

class MySpeakerVerificationProtocol(SpeakerVerificationProtocol):
    def train_iter(self) -> Iterator[Dict]:
        yield {"uri": "filename1",
               "annotation": Annotation(...),
               "annotated": Timeline(...)}
        yield {"uri": "filename2",
               "annotation": Annotation(...),
               "annotated": Timeline(...)}
    def train_trial_iter(self) -> Iterator[Dict]:
        yield {"reference": 1,
               "file1": ProtocolFile(...),
               "file2": ProtocolFile(...)}
        yield {"reference": 0,
               "file1": {
                 "uri":"filename1",
                 "try_with":Timeline(...)
                  },
               "file1": {
                 "uri":"filename3",
                 "try_with":Timeline(...)
                 }
               }

{subset}_trial_iter should return an iterator of dictionaries with:

  • reference key (mandatory) that provides an int indicating whether file1 and file2 are uttered by the same speaker (1 means same, 0 means different),
  • file1 key (mandatory) that provides the first file,
  • file2 key (mandatory) that provides the second file.

Both file1 and file2 should be provided as dictionaries or pyannote.database.protocol.protocol.ProtocolFile instances with

  • uri key (mandatory),
  • try_with key (mandatory) that describes which part of the file should be used in the validation process, as a pyannote.core.Timeline instance.
  • any other key that the protocol may provide.

It can then be used in Python like this:

protocol = MySpeakerVerificationProtocol()
for trial in protocol.train_trial():
   print(f"{trial['reference']} {trial['file1']['uri']} {trial['file2']['uri']}")
1 filename1 filename2
0 filename1 filename3

pyannote-database's People

Contributors

dependabot[bot], francescobonzi, frenchkrab, hbredin, paullerner, pkorshunov, wesbz


pyannote-database's Issues

Training on Jamendo Corpus

Hi!

I would like to train one of your pyannote.audio models using the Jamendo Corpus dataset available here: https://zenodo.org/record/2585988#.Yh9QgBPMJhE

Unfortunately, I have some problems defining the custom data loader. Each audio track has a single label file in .lab format (one start end label entry per line), which is different from what CTMLoader expects.

I wrote the following files.

database.yml:
Databases:
  Jamendo:
    - /path_to_jamendo/{uri}.mp3
    - /path_to_jamendo/{uri}.ogg

Protocols:
  Jamendo:
    Protocol:
      JamendoProtocol:
        train:
          uri: /path_to_jamendo/filelists/train
          annotation: /path_to_jamendo/labels/{uri}.lab
        development:
          uri: /path_to_jamendo/filelists/valid
          annotation: /path_to_jamendo/labels/{uri}.lab
        test:
          uri: /path_to_jamendo/filelists/test
          annotation: /path_to_jamendo/labels/{uri}.lab

setup.py:
from setuptools import setup, find_packages
setup(
    name="jamendo_lab_loader",
    packages=find_packages(),
    install_requires=[
        "pyannote.database >= 4.0",
    ],
    entry_points={
        "pyannote.database.loader": [
            ".lab = jamendo_lab_loader.loader:LabLoader",
        ],
    }
)

I don't know how to write the loader.py and how to use it. Do you have any suggestions?
Thank you for sharing this great pyannote work. Hope you can help me.
Francesco

Warning: Existing key "annotation" may have been modified.

Hi,

I keep getting this warning when training a pipeline (this issue might have to be transferred) /mnt/beegfs/projects/plumcot/pyannote/pyannote-database/pyannote/database/protocol/protocol.py:128: UserWarning: Existing key "annotation" may have been modified.

This only happens during the first trial (i.e. the first iteration over the whole database subset) but it didn't happen before 4.0

Multiple preprocessor for same field

Hey,

For that (now cursed) VTC pipeline, if we want to keep things neat, I'll be needing the support for lists of preprocessors for a given field. Let me explain myself: let's say I have 2 preprocessors for file["annotation"], LabelMapper and VoiceTypeClassifierPreprocessor. I need to be able to chain them, the solution being pretty easy:

protocol = get_protocol("Db.ProtocolType.MyProtocol", preprocessors={
    "audio": FileFinder(),
    "annotation": [LabelMapper(), VoiceTypeClassifierPreprocessor(classes=..., unions=...)]
})

Obviously, the order of preprocessors in the list is important.

Would you be ok with me adding support for this in pyannote-db, while keeping the support for the current "single preprocessor" API?

LABLoader import error

Packages installed using the pip install pyannote.database command do not have the latest updates, such as LABLoader:

cannot import name 'LABLoader' from 'pyannote.database.loader'

Specify the PYANNOTE_DATABASE_CONFIG in python

It would be nice to have the possibility of defining the location of the database configuration from Python. This of course would not be compatible with the command line interface planned in #45, but would make the use of this library more flexible.

For example:

protocol = get_protocol('Debug.SpeakerDiarization.Debug', preprocessors={"audio": FileFinder()})
protocol = get_protocol('Debug.SpeakerDiarization.Debug', preprocessors={"audio": FileFinder()}, config="~/.pyannote")

This would be backwards compatible with previous versions of the method

get_unique_identifier loads all protocol file values

Using the ** expression here leads to loading all the values of the protocol file, even those that are not needed

My traceback (only to ease understanding; the error itself is unrelated and caused by the spacy version):

Traceback (most recent call last):
  File "/people/lerner/anaconda3/envs/transformers/bin/named_id.py", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/people/lerner/pyannote/Prune/prune/named_id.py", line 900, in <module>
    shuffle=False))
  File "/people/lerner/pyannote/Prune/prune/named_id.py", line 523, in batchify
    current_audio_emb = audio_emb(current_file)
  File "/people/lerner/pyannote/pyannote-audio/pyannote/audio/features/wrapper.py", line 274, in __call__
    return self.scorer_(current_file)
  File "/people/lerner/pyannote/pyannote-audio/pyannote/audio/features/precomputed.py", line 205, in __call__
    path = Path(self.get_path(current_file))
  File "/people/lerner/pyannote/pyannote-audio/pyannote/audio/features/precomputed.py", line 73, in get_path
    uri = get_unique_identifier(item)
  File "/people/lerner/pyannote/pyannote-database/pyannote/database/util.py", line 205, in get_unique_identifier
    return IDENTIFIER.format(**item)
  File "/people/lerner/pyannote/pyannote-database/pyannote/database/protocol/protocol.py", line 122, in __getitem__
    # just imagine that this key is forbidden
    value = self.lazy[key](self)
  File "/people/lerner/pyannote/pyannote-database/pyannote/database/custom.py", line 100, in load
    return loader(current_file)
  File "/vol/work/lerner/pyannote-db-plumcot/Plumcot/loader/loader.py", line 169, in __call__
    attributes)
  File "/vol/work/lerner/pyannote-db-plumcot/Plumcot/loader/loader.py", line 201, in merge_transcriptions_entities
    # and that this raises : "Oh no, forbidden key !"
    _, one2one, _, _, one2multi = align(tokens, e_tokens)
ValueError: too many values to unpack (expected 5)

`LABLoader` raises ValueError("`path` must contain the {uri} placeholder.") even if the placeholder is configured correctly

Part of my configuration:

Databases:
  # tell pyannote.database where to find AMI wav files.
  # {uri} is a placeholder for the session name (eg. ES2004c).
  # you might need to update this line to fit your own setup.
  AMI: amicorpus/{uri}/audio/{uri}.Mix-Headset.wav
  AMI-SDM: amicorpus/{uri}/audio/{uri}.Array1-01.wav

Protocols:

  AMI-SDM:
    SpeakerDiarization:
      only_words:
        train:
            uri: ../lists/train.meetings.txt
            annotation: ../only_words/rttms/train/{uri}.rttm
            annotated: ../uems/train/{uri}.uem
            lab: ../only_words/labs/train/{uri}.lab
        development:
            uri: ../lists/dev.meetings.txt
            annotation: ../only_words/rttms/dev/{uri}.rttm
            annotated: ../uems/dev/{uri}.uem
            lab: ../only_words/labs/dev/{uri}.lab
        test:
            uri: ../lists/test.meetings.txt
            annotation: ../only_words/rttms/test/{uri}.rttm
            annotated: ../uems/test/{uri}.uem
            lab: ../only_words/labs/test/{uri}.lab

When I comment out the two lines below, the program runs well and file['lab'] returns an Annotation object as expected:

if "uri" not in self.placeholders_:
raise ValueError("`path` must contain the {uri} placeholder.")

It seems this sanity check is not working as expected. Also, other loaders (e.g. RTTMLoader) don't have this check (I guess the logic should be similar).

pyannote-audio sad train fails

I am executing this command in pyannote 1.1.2 (I am pretty sure that is what I have):

export EXP_DIR=finetune1
pyannote-audio sad train --pretrained=sad_dihard --subset=train --to=5 ${EXP_DIR} headcam16.SpeakerDiarization.try1

it fails:

File "/ext3/miniconda3/envs/pyannote6/lib/python3.8/site-packages/pyannote/audio/labeling/tasks/base.py", line 294, in _load_metadata
current_file["annotated"] = get_annotated(current_file).crop(
AttributeError: 'NoneType' object has no attribute 'crop'

I do not have a uem file; I suspect this may be the problem.

I am happy to provide more info if that would be helpful, hoping you can easily tell me what the problem is...

Thanks
Michael

TypeError related to custom protocol name

Hi,

One of my protocols is named '24', which causes the following error: TypeError: type.__new__() argument 1 must be str, not int.
I guess it should be easily fixable by tweaking yaml loading parameters or adding a str conversion somewhere but I don't understand why the name can't be an int in the first place.

Full log below

Merry Christmas :)

Traceback (most recent call last):
  File "/people/lerner/anaconda3/envs/pyannote/bin/pyannote-speaker-embedding", line 11, in <module>
    load_entry_point('pyannote.audio', 'console_scripts', 'pyannote-speaker-embedding')()
  File "/people/lerner/anaconda3/envs/pyannote/lib/python3.7/site-packages/pkg_resources/__init__.py", line 489, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/people/lerner/anaconda3/envs/pyannote/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2852, in load_entry_point
    return ep.load()
  File "/people/lerner/anaconda3/envs/pyannote/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2443, in load
    return self.resolve()
  File "/people/lerner/anaconda3/envs/pyannote/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2449, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/people/lerner/pyannote/pyannote-audio/pyannote/audio/applications/speaker_embedding.py", line 180, in <module>
    from .base import Application
  File "/people/lerner/pyannote/pyannote-audio/pyannote/audio/applications/base.py", line 40, in <module>
    from pyannote.database import FileFinder
  File "/people/lerner/pyannote/pyannote-database/pyannote/database/__init__.py", line 62, in <module>
    DATABASES, TASKS = add_custom_protocols()
  File "/people/lerner/pyannote/pyannote-database/pyannote/database/custom.py", line 314, in add_custom_protocols
    {'__init__': get_init(register)})
TypeError: type.__new__() argument 1 must be str, not int

Cannot combine several protocols from different databases into one

I am trying to combine multiple protocols into one as shown in the [Meta-protocols and requirements] section of the README. I created the following database.yml configuration file referencing already existing protocol configuration files (both work fine).

Requirements:
  - /path/to/folder/aishell4/database.yml
  - /path/to/folder/ali/database.yml

Protocols:
  CombinedAll:
    SpeakerDiarization:
      MyMetaProtocol:
        train:
          AISHELL4.SpeakerDiarization.Custom: [train, ]
          ALI.SpeakerDiarization.Custom: [train, ]
        development:
          AISHELL4.SpeakerDiarization.Custom: [development, ]
          ALI.SpeakerDiarization.Custom: [development, ]
        test:
          AISHELL4.SpeakerDiarization.Custom: [test, ]
          ALI.SpeakerDiarization.Custom: [test, ]

And then I try to call it:

from pyannote.database import registry, FileFinder

registry.load_database("path/to/combined/database.yml")
protocol = registry.get_protocol("CombinedAll.SpeakerDiarization.MyMetaProtocol", preprocessors={"audio": FileFinder()})

for file in protocol.train():
    pass

But it gives me an error:

File "/opt/miniconda3/envs/speaker_diar/lib/python3.10/site-packages/pyannote/database/protocol/protocol.py", line 374, in subset_helper
  for file in files:
File "/opt/miniconda3/envs/speaker_diar/lib/python3.10/site-packages/pyannote/database/custom.py", line 317, in subset_iter
  raise ValueError("Missing mandatory 'uri' entry in CombinedAll.SpeakerDiarization.MyMetaProtocol.train")

How can this combination of protocols be done?

Rename 'speakers' to 'labels'

protocol.stats(subset) returns a dictionary whose speakers key should be renamed to labels.

For now, this is specific to speaker identification, but it should also work for language identification, for instance.

pyannote-database command line tool

  • List available databases
pyannote-database database
  • List available protocols (with optional --database and --task filters)
pyannote-database protocol [--database=<database>] [--task=<task>]
  • Get statistics about a protocol (with optional --subset filter)
pyannote-database stats <protocol> [--subset=<subset>]

ImportError: cannot import name 'registry' from 'pyannote.database'

Hello,
I try to apply my own YAML with the command: from pyannote.database import get_protocol
But it shows "ImportError: cannot import name 'registry' from 'pyannote.database' (/home/xuan/anaconda3/envs/pyannote/lib/python3.8/site-packages/pyannote/database/__init__.py)"
How can I solve this problem? Thanks ^_^

Faster RTTMLoader

RTTMLoader class is extremely slow for large RTTM files containing annotation of multiple audio files (e.g. VoxCeleb dataset).

We should make it faster!

A small bug in protocol

In the new version of pyannote.database, line 71 in pyannote-database/pyannote/database/protocol/protocol.py:

item[key] = preprocessor(item)

should be

item[key] = preprocessor(**item)

No loader for file with '.rttm' suffix

Hi, I've been following the pyannote-audio data preparation tutorial, and am trying to understand how the database.yml files work by running code samples from the README.

For instance, using the sample database.yml file, the following code sample should print out some filenames:

from pyannote.database import get_protocol
protocol = get_protocol('AMI.SpeakerDiarization.MixHeadset')
for resource in protocol.train():
    print(resource["uri"])

Instead, I get an error message about there not being a loader for .rttm files:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-3540f768261e> in <module>()
      2 from pyannote.database import get_protocol
      3 protocol = get_protocol('AMI.SpeakerDiarization.MixHeadset')
----> 4 for resource in protocol.train():
      5     print(resource["uri"])

2 frames
/usr/local/lib/python3.6/dist-packages/pyannote/database/custom.py in gather_loaders(entries, database_yml)
    205             if path.suffix not in LOADERS:
    206                 msg = f"No loader for file with '{path.suffix}' suffix"
--> 207                 raise TypeError(msg)
    208 
    209             # load custom loader class

TypeError: No loader for file with '.rttm' suffix

Here's a public Google Colab file where you can see the issue. It only takes a minute or so to run.
https://colab.research.google.com/drive/1ErF8KOk-s11zUXjOEbZnRguvzj2SBafC

Really appreciate any advice on how to fix this. Thanks!

A bug in custom.py

Hi, I think there is a small bug in the database/custom.py file.

In lines 126 through 135, the current code looks like the following:

# load annotations
    if file_rttm is not None:

        if file_rttm.suffix == '.rttm':
            annotations = load_rttm(file_rttm)
        elif file_rttm.suffix == '.mdtm':
            annotations = load_mdtm(file_mdtm)
        else:
            msg = f'Unsupported format in {file_rttm}: please use RTTM.'
            raise ValueError(msg)

For the *.mdtm file case,
annotations = load_mdtm(file_mdtm) should be annotations = load_mdtm(file_rttm)

Error in dataloader : 'PosixPath' object has no attribute 'format'

Hi!
I have an error in the data loader:

Traceback (most recent call last):
  File "main.py", line 292, in <module>
    args.func(args)
  File "main.py", line 134, in run
    trainer.fit(model)
  File "/linkhome/rech/genini01/uzm31mf/.conda/envs/vtc2/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 740, in fit
    self._call_and_handle_interrupt(
  File "/linkhome/rech/genini01/uzm31mf/.conda/envs/vtc2/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/linkhome/rech/genini01/uzm31mf/.conda/envs/vtc2/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 777, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/linkhome/rech/genini01/uzm31mf/.conda/envs/vtc2/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1138, in _run
    self._call_setup_hook()  # allow user to setup lightning_module in accelerator environment
  File "/linkhome/rech/genini01/uzm31mf/.conda/envs/vtc2/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1439, in _call_setup_hook
    self.call_hook("setup", stage=fn)
  File "/linkhome/rech/genini01/uzm31mf/.conda/envs/vtc2/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1501, in call_hook
    output = model_fx(*args, **kwargs)
  File "/linkhome/rech/genini01/uzm31mf/.conda/envs/vtc2/lib/python3.8/site-packages/pyannote/audio/core/model.py", line 349, in setup
    self.task.setup()
  File "/linkhome/rech/genini01/uzm31mf/.conda/envs/vtc2/lib/python3.8/site-packages/pytorch_lightning/core/datamodule.py", line 474, in wrapped_fn
    fn(*args, **kwargs)
  File "/linkhome/rech/genini01/uzm31mf/.conda/envs/vtc2/lib/python3.8/site-packages/pyannote/audio/tasks/segmentation/mixins.py", line 51, in setup
    for f in self.protocol.train():
  File "/linkhome/rech/genini01/uzm31mf/.conda/envs/vtc2/lib/python3.8/site-packages/pyannote/database/protocol/protocol.py", line 363, in subset_helper
    yield self.preprocess(file)
  File "/linkhome/rech/genini01/uzm31mf/.conda/envs/vtc2/lib/python3.8/site-packages/pyannote/database/protocol/protocol.py", line 329, in preprocess
    return ProtocolFile(current_file, lazy=self.preprocessors)
  File "/linkhome/rech/genini01/uzm31mf/.conda/envs/vtc2/lib/python3.8/site-packages/pyannote/database/protocol/protocol.py", line 87, in __init__
    self._store[key] = precomputed[key]
  File "/linkhome/rech/genini01/uzm31mf/.conda/envs/vtc2/lib/python3.8/site-packages/pyannote/database/protocol/protocol.py", line 122, in __getitem__
    value = self.lazy[key](self)
  File "/linkhome/rech/genini01/uzm31mf/.conda/envs/vtc2/lib/python3.8/site-packages/pyannote/database/loader.py", line 128, in __call__
    loaded = load_rttm(self.path.format(**sub_file))
AttributeError: 'PosixPath' object has no attribute 'format'

My version of pyannote-database is:
pyannote.database 4.1.1

This error seems to happen while trying to load files that don't have annotations in the rttm file.

Problem loading a custom protocol

Hello,

I am trying to leverage the pre-trained models on my own data. I already followed the instructions to modify the AMI protocol file. However, when I follow the instructions on preparing the protocol on the AMI subset, I keep getting errors such as 'KeyError: 'AMI'' and "ValueError: Could not find any protocol for "AMI" database".

Here are the instructions on the page that I followed:

preprocessors = {'audio': FileFinder()}
protocol = get_protocol('AMI.SpeakerDiarization.MixHeadset',
                        preprocessors=preprocessors)

Not sure if it was the directory issue. If so, how can I define the proper directory while calling it? And what is the function FileFinder() used for here?

Thanks for your time!

support for transcripts and entity linking annotation

Hi,

Related to PaulLerner/pyannote-db-plumcot#12.

How should I proceed?

In all cases, each token should have a 'speaker' field (which might be set to unavailable).

If the transcript is a '.aligned' file, several fields could be added:

  • token start time
  • token end time
  • token alignment confidence

If the entity annotation is provided, we could add (I guess those already exist in spaCy):

  • POS tag
  • dependency
  • entity type

Any suggestions?

AttributeError: 'PosixPath' object has no attribute 'format'

I'm trying to train the model with my own data, but I cannot solve this problem.

To Reproduce
Steps to reproduce the behavior:

$pyannote-audio sad train --subset=train --to=200 --parallel=4 ${EXP_DIR} Test.SpeakerDiarization.OwnData

My database.yml :

Databases:
   Test: ./data_set_wav/{uri}.wav
   MUSAN: ./Pyannote/AMI/musan/{uri}.wav

Protocols:
   Test:
      SpeakerDiarization:
         OwnData:
           train:
              uri: ./Reference_files/validate/train.data.lst
              annotation: ./Reference_files/validate/train.data.rttm
              annotated: ./Reference_files/validate/train.data.uem
           development:
              uri: ./Pyannote/AMI/AMI/MixHeadset.development.lst
              annotation: ./Pyannote/AMI/AMI/MixHeadset.development.rttm
              annotated: ./Pyannote/AMI/AMI/MixHeadset.development.uem
           test:
              uri: ./Pyannote/AMI/AMI/MixHeadset.test.lst
              annotation: ./Pyannote/AMI/AMI/MixHeadset.test.rttm
              annotated: ./Pyannote/AMI/AMI/MixHeadset.test.uem
   MUSAN:
      Collection:
         BackgroundNoise:
            uri: ./Pyannote/AMI/musan/MUSAN/background_noise.txt
         Noise:
            uri: ./Pyannote/AMI/musan/MUSAN/noise.txt
         Music:
            uri: ./Pyannote/AMI/musan/MUSAN/music.txt
         Speech:
            uri: ./Pyannote/AMI/musan/MUSAN/speech.txt

pyannote environment

pyannote.audio==1.1.1
pyannote.core==4.3
pyannote.database==4.1.1
pyannote.metrics==3.1
pyannote.pipeline==1.5.2

My train.lst file (First 10 lines):

140471632__701151021998050417_272_2751_20210210_134647
127467523__701151985360118_272_2754_20210201_095339
135434197__701151031971345585_356_3972_20210205_165315
131519247__701151957207998_461_2881_20210203_152337
147034889__450198019993417681_93_3989_20210216_094739
130398174__701151091989826532_272_2754_20210203_084818
128654151__701151019988926593_356_3956_20210201_181652
138260478__01577999895159_147_2413_20210209_111043
146777865__701151021996438304_272_4027_20210215_183913
81808790__450198986402390_137_2680_20201217_085743

My train.rttm file (First 10 lines):

SPEAKER 80596681__701151988339062_105_2448_20201216_114632 1 174.301 1.25 <NA> <NA> Customer <NA> <NA>
SPEAKER 80596681__701151988339062_105_2448_20201216_114632 1 56.4404 0.9400000000000048 <NA> <NA> Customer <NA> <NA>
SPEAKER 80596681__701151988339062_105_2448_20201216_114632 1 20.6501 1.2001000000000026 <NA> <NA> Customer <NA> <NA>
SPEAKER 80596681__701151988339062_105_2448_20201216_114632 1 23.8802 1.110000000000003 <NA> <NA> Customer <NA> <NA>
SPEAKER 80596681__701151988339062_105_2448_20201216_114632 1 28.2202 0.6000000000000014 <NA> <NA> Customer <NA> <NA>
SPEAKER 80596681__701151988339062_105_2448_20201216_114632 1 3.63 0.9199999999999999 <NA> <NA> Customer <NA> <NA>
SPEAKER 80596681__701151988339062_105_2448_20201216_114632 1 5.63 0.41000000000000014 <NA> <NA> Customer <NA> <NA>
SPEAKER 80596681__701151988339062_105_2448_20201216_114632 1 7.4301 0.8999999999999995 <NA> <NA> Customer <NA> <NA>
SPEAKER 80596681__701151988339062_105_2448_20201216_114632 1 8.4901 0.6199999999999992 <NA> <NA> Customer <NA> <NA>
SPEAKER 80596681__701151988339062_105_2448_20201216_114632 1 9.7601 0.7100000000000009 <NA> <NA> Customer <NA> <NA>

Additional context
The example with the amicorpus data runs with no error, but when I try to train my own data I always get this error.

AttributeError: 'NoneType' object has no attribute 'items'

Upon trying to run the following code to set up preprocessors:

from pyannote.database import FileFinder
preprocessors = {'audio': FileFinder()}

I ran into the attribute error below:

AttributeError Traceback (most recent call last)
Input In [12], in <cell line: 2>()
1 from pyannote.database import FileFinder
----> 2 preprocessors = {'audio': FileFinder()}

File ~\Anaconda3\lib\site-packages\pyannote\database\util.py:94, in FileFinder.__init__(self, database_yml)
89 with open(self.database_yml, "r") as fp:
90 config = yaml.load(fp, Loader=yaml.SafeLoader)
92 self.config_: Dict[DatabaseName, Union[PathTemplate, List[PathTemplate]]] = {
93 str(database): path
---> 94 for database, path in config.get("Databases", dict()).items()
95 }

AttributeError: 'NoneType' object has no attribute 'items'

Could anyone please advise on how to overcome this issue? My database config and database.yml contents are below.

echo $PYANNOTE_DATABASE_CONFIG:
/Users/askrobola/Documents/GitHub/pyannote-audio/tutorials/Sandbox/database.yml

database.yml:
Databases:
  AMI: /Users/askrobola/Documents/GitHub/pyannote-audio/tutorials/Sandbox/AMI/*/audio/{uri}.wav
  MUSAN: /Users/askrobola/Documents/GitHub/pyannote-audio/tutorials/Sandbox/MUSAN/musan/music/{uri}.wav

Protocols:
  AMI:
    SpeakerDiarization:
      MixHeadset:
        train:
          uri: /Users/askrobola/Documents/GitHub/pyannote-audio/tutorials/Sandbox/AMI/MixHeadset.train.lst
          annotation: /Users/askrobola/Documents/GitHub/pyannote-audio/tutorials/Sandbox/AMI/MixHeadset.train.rttm
          annotated: /Users/askrobola/Documents/GitHub/pyannote-audio/tutorials/Sandbox/AMI/MixHeadset.train.uem
        development:
          uri: /Users/askrobola/Documents/GitHub/pyannote-audio/tutorials/Sandbox/AMI/MixHeadset.development.lst
          annotation: /Users/askrobola/Documents/GitHub/pyannote-audio/tutorials/Sandbox/AMI/MixHeadset.development.rttm
          annotated: /Users/askrobola/Documents/GitHub/pyannote-audio/tutorials/Sandbox/AMI/MixHeadset.development.uem
        test:
          uri: /Users/askrobola/Documents/GitHub/pyannote-audio/tutorials/Sandbox/AMI/MixHeadset.test.lst
          annotation: /Users/askrobola/Documents/GitHub/pyannote-audio/tutorials/Sandbox/AMI/MixHeadset.test.rttm
          annotated: /Users/askrobola/Documents/GitHub/pyannote-audio/tutorials/Sandbox/AMI/MixHeadset.test.uem

Generic database interface

While making db interfaces, I ended up using code for diarization and speaker spotting protocols that is very similar to the one you have implemented in the interface for the AMI dataset. This made me think that maybe it would be possible to generalize this code into a generic database interface that would work for any database, given filelists formatted according to a pre-defined format.

problem with pyannote

Hi, I have a problem when I execute the tutorial; this message is returned after executing the program:

FileNotFoundError: Could not find file "EN2002a.Mix-Headset" in the following location(s):

  • /home/gabriel/Desktop/My_tasks/IA-DEEPLEARNING/IC/Codigos/amicorpus/*/audio/EN2002a.Mix-Headset.wav

I don't know why this error happens, can you help me?

Custom protocols cannot be pickled

>>> from pyannote.database import get_protocol
>>> protocol = get_protocol('Debug.SpeakerDiarization.Debug')

>>> import pickle
>>> pickle.dumps(protocol)
PicklingError: Can't pickle <class 'pyannote.database.custom.Debug'>: attribute lookup Debug on pyannote.database.custom failed

That is a blocking issue because it prevents multi-gpu training in pyannote.audio v2.

cc @mogwai

FileFinder doesn't work without development subset

       for method in methods:
            try:
                protocol.progress = False
                file_generator = getattr(protocol, method)()
                first_item = next(file_generator)
            except AttributeError as e:
                continue
            except NotImplementedError as e:
              continue

The first element in the methods list is 'development'. If 'development' is not defined in the protocol, FileFinder doesn't work.

Person naming conventions

To ensure consistent person naming across pyannote.database plugins, we should define naming conventions that could then be used by pyannote.database.get_label_identifier to ensure the same person is always labeled the same way.

One could use something like:

  • @first_name_last_name for people whose identity is clearly defined
  • {database}|person_{xx} otherwise

Two person_{xx} in the same {database} are supposed to be different from each other.
However, across databases, one cannot tell anything about them.
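
A minimal sketch of what such a normalization could look like (the function name and details are hypothetical; only the convention above is taken from this issue):

def canonical_label(database: str, label: str) -> str:
    # clearly identified people keep their @first_name_last_name label
    if label.startswith("@"):
        return label
    # otherwise, scope the label to its database, e.g. "MyDatabase|person_01"
    return f"{database}|{label}"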

Help on the "proper way" to build a protocol for DIHARD database with conflicting URI

As you know, I'm building the database "extension" for DiHard2. Their database format is a bit tricky, and I want the user to have as little work as possible to do before being able to use the extension.

Here's the setup: the dihard2 data is provided in two archives which, once unzipped, form two subfolders: dihard2/dev/ and dihard2/test/ (there isn't any train data, so this is where pyannote's other database extensions and the meta-protocol pattern will come in really handy).

Here's the catch: in both folders, the audio files (encoded in flac) share basically the same path and there are some conflicting uri's. Thus, a ~/.pyannote/database.yml pattern to get only the dev files would be
path/to/dihard_data/dev/data/single_channel/flac/{uri}.flac

and a pattern to match only the test files would be
path/to/dihard_data/test/data/single_channel/flac/{uri}.flac

The real catch is: there are some uri's that are the same, yet not referencing the same audio file

There is a simple yet ugly solution: have the user run a bash script (provided in the repo) that would somehow fix this, but it would imply also parsing the .rttm files and modifying the uri's in there as well. Not pretty, not optimal, not my style.

I've looked a bit into the FileFinder class, and it seems it uses the format function to render the "matched" audio file's path: path = path_template.format(uri=uri, database=database, **kwargs)

My guess is that it's maybe possible, in your infrastructure, to use this kind of generic patch in the ~/.pyannote/database.yml file:
path/to/dihard_data/{split}/data/single_channel/flac/{uri}.flac

And then, in my implementation of a protocol, have something like this:

class DIHARD2SingleChannelProtocol(SpeakerDiarizationProtocol):
    """DIHARD speaker diarization protocol"""

    def dev_iter(self):
        for annot_filepath in self.load_RTTM_files():
            # parse stuff

            current_file = {
                'database': 'DIHARD2',
                'uri': uri,
                'channel': 1,
                'split': 'dev',  # <======== added to be then used by the format function
                'annotated': ...,
                'annotation': ...}
            yield current_file

    def tst_iter(self):
        for test_file in test_data:
            current_file = {
                'database': 'DIHARD2',
                'uri': uri,
                'channel': 1,
                'split': 'test',  # <======== added to be then used by the format function
                'annotated': ...}
            yield current_file

Do you think that's feasible? Do you have any better way to solve this kind of problem since you have a much better understanding of pyannote?
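
For what it's worth, here is a quick check of the mechanism (a minimal sketch, assuming FileFinder keeps rendering the pattern with str.format; the URI is hypothetical): any extra key carried by the protocol file, such as 'split', is consumed by the pattern, and unused keys are simply ignored.

path_template = "path/to/dihard_data/{split}/data/single_channel/flac/{uri}.flac"
current_file = {"database": "DIHARD2", "uri": "some_uri", "channel": 1, "split": "dev"}
print(path_template.format(**current_file))
# path/to/dihard_data/dev/data/single_channel/flac/some_uri.flac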

Support for custom progress message

It would be nice to have a way to print a custom progress message when iterating over subsets.

from pyannote.database import get_protocol
protocol = get_protocol('AMI.SpeakerDiarization.MixHeadset')
for current_file in protocol.development(msg='Loading data...'):
    pass

Feature: make paths in database.yml (optionally) relative

I think it would be nice to be able to indicate relative paths in database.yml. This way, we could easily share the plumcot corpus, for example.
We could easily implement it by testing whether the path (to e.g. an RTTM file) is_absolute() and, if it is not, concatenating it to the directory of PYANNOTE_DATABASE_CONFIG.
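
A minimal sketch of that idea (a hypothetical helper, not the actual implementation): resolve a relative path against the directory containing the configuration file, and keep absolute paths untouched.

from pathlib import Path

def resolve(path: str, config_yml: str) -> Path:
    """Return `path` unchanged if absolute, else relative to the config file's directory."""
    p = Path(path)
    return p if p.is_absolute() else Path(config_yml).expanduser().parent / p

# resolve("rttms/train.rttm", "/home/me/.pyannote/database.yml")
#   -> /home/me/.pyannote/rttms/train.rttm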

Bug on database.yml

File "/home/tuenguyen/speech/speech_dia_@/env/lib/python3.8/site-packages/pyannote/audio/utils/protocol.py", line 31, in check_protocol
file = next(protocol.train())
File "/home/tuenguyen/speech/speech_dia_@/env/lib/python3.8/site-packages/pyannote/database/protocol/protocol.py", line 363, in subset_helper
yield self.preprocess(file)
File "/home/tuenguyen/speech/speech_dia_@/env/lib/python3.8/site-packages/pyannote/database/protocol/protocol.py", line 329, in preprocess
return ProtocolFile(current_file, lazy=self.preprocessors)
File "/home/tuenguyen/speech/speech_dia_@/env/lib/python3.8/site-packages/pyannote/database/protocol/protocol.py", line 87, in init
self.store[key] = precomputed[key]
File "/home/tuenguyen/speech/speech_dia
@/env/lib/python3.8/site-packages/pyannote/database/protocol/protocol.py", line 122, in getitem
value = self.lazykey
File "/home/tuenguyen/speech/speech_dia_@/env/lib/python3.8/site-packages/pyannote/database/loader.py", line 130, in call
loaded = load_rttm(self.path.format(**sub_file))
AttributeError: 'PosixPath' object has no attribute 'format'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 18, in <module>
    scd = SpeakerChangeDetection(ami)
  File "/home/tuenguyen/speech/speech_dia_@/env/lib/python3.8/site-packages/pyannote/audio/tasks/segmentation/speaker_change_detection.py", line 97, in __init__
    super().__init__(
  File "/home/tuenguyen/speech/speech_dia_@/env/lib/python3.8/site-packages/pyannote/audio/core/task.py", line 175, in __init__
    self.protocol, self.has_validation = check_protocol(protocol)
  File "/home/tuenguyen/speech/speech_dia_@/env/lib/python3.8/site-packages/pyannote/audio/utils/protocol.py", line 34, in check_protocol
    raise ValueError(msg)
ValueError: Protocol AMI.SpeakerDiarization.MixHeadset does not define a training set.


I cloned the develop branch and installed pyannote.
After that, I created a database folder with the following layout:

[DirData]/a.lst # uri
[DirData]/a.rttm # protocol
[DirData]/*.wav


=> list(glob.glob("[DirData]/*.wav")) = [mix_0000001.wav]

=> contents of a.lst:

mix_0000001.wav

=> contents of a.rttm:

SPEAKER mix_0000001 1 3.60 1.80 speaker_0000004052
SPEAKER mix_0000001 1 6.04 2.28 speaker_0000004052
SPEAKER mix_0000001 1 14.48 2.52 speaker_0000004052
SPEAKER mix_0000001 1 35.03 3.57 speaker_0000004052
SPEAKER mix_0000001 1 64.78 3.36 speaker_0000004052
SPEAKER mix_0000001 1 81.12 2.73 speaker_0000004052
SPEAKER mix_0000001 1 98.48 4.62 speaker_0000004052
SPEAKER mix_0000001 1 106.24 3.78 speaker_0000004052
SPEAKER mix_0000001 1 120.34 3.57 speaker_0000004052
SPEAKER mix_0000001 1 124.89 9.00 speaker_0000004052
SPEAKER mix_0000001 1 134.73 2.52 speaker_0000004052
SPEAKER mix_0000001 1 146.15 3.78 speaker_0000004052
SPEAKER mix_0000001 1 154.14 3.36 speaker_0000004052
SPEAKER mix_0000001 1 202.48 4.62 speaker_0000004052
SPEAKER mix_0000001 1 216.95 3.99 speaker_0000004052
SPEAKER mix_0000001 1 232.39 3.57 speaker_0000004052
SPEAKER mix_0000001 1 244.00 3.72 speaker_0000004052
SPEAKER mix_0000001 1 4.67 3.90 speaker_0000007425
SPEAKER mix_0000001 1 11.09 4.74 speaker_0000007425
SPEAKER mix_0000001 1 17.90 4.56 speaker_0000007425

file database.yml:

Databases:
  AMI: [DirData]/{uri}.wav

Protocols:
  AMI:
    SpeakerDiarization:
      MixHeadset:
        train:
          uri: [DirData]/a.lst
          annotation: [DirData]/a.rttm


Note: DirData is an absolute path on my local machine.

Please help: I have already read many tutorials and the pyannote source code, but I can't figure out this problem.


Speaker tags across RTTM files

Should all the RTTM files have unique speaker tags?

For example, if there are three audio files and each of them has two different speakers (different both within and across files), can the tags be (Speaker_00, Speaker_01) for all three files, or should they be (Speaker_00, Speaker_01) for the first file, (Speaker_02, Speaker_03) for the second, and (Speaker_04, Speaker_05) for the third?

Training the overlap detection: AttributeError: 'PosixPath' object has no attribute 'format'

import pytorch_lightning as pl  # needed for pl.Trainer below
from pyannote.audio.tasks import OverlappedSpeechDetection

# `protocol` and `SimpleSegmentationModel` are assumed to be defined earlier in the notebook
ovl = OverlappedSpeechDetection(protocol, duration=2., batch_size=32, num_workers=4)
model = SimpleSegmentationModel(task=ovl)
trainer = pl.Trainer(max_epochs=1)
_ = trainer.fit(model)
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-23-7e3bd77348a4> in <module>
      3 model = SimpleSegmentationModel(task=ovl)
      4 trainer = pl.Trainer(max_epochs=1)
----> 5 _ = trainer.fit(model)

~/miniconda3/envs/pyannote/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in fit(self, model, train_dataloader, val_dataloaders, datamodule)
    456         # SET UP TRAINING
    457         # ----------------------------
--> 458         self.call_setup_hook(model)
    459         self.call_hook("on_before_accelerator_backend_setup", model)
    460         self.accelerator.setup(self, model)  # note: this sets up self.lightning_module

~/miniconda3/envs/pyannote/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in call_setup_hook(self, model)
   1062             called = self.datamodule.has_setup_test if self.testing else self.datamodule.has_setup_fit
   1063             if not called:
-> 1064                 self.datamodule.setup(stage_name)
   1065         self.setup(model, stage_name)
   1066         model.setup(stage_name)

~/miniconda3/envs/pyannote/lib/python3.8/site-packages/pytorch_lightning/core/datamodule.py in wrapped_fn(*args, **kwargs)
     90             obj._has_prepared_data = True
     91 
---> 92         return fn(*args, **kwargs)
     93 
     94     return wrapped_fn

~/miniconda3/envs/pyannote/lib/python3.8/site-packages/pyannote/audio/tasks/segmentation/mixins.py in setup(self, stage)
     51             self._train_metadata = dict()
     52 
---> 53             for f in self.protocol.train():
     54 
     55                 file = dict()

~/miniconda3/envs/pyannote/lib/python3.8/site-packages/pyannote/database/protocol/protocol.py in subset_helper(self, subset)
    361 
    362         for file in files:
--> 363             yield self.preprocess(file)
    364 
    365     def train(self) -> Iterator[ProtocolFile]:

~/miniconda3/envs/pyannote/lib/python3.8/site-packages/pyannote/database/protocol/protocol.py in preprocess(self, current_file)
    327 
    328     def preprocess(self, current_file: Union[Dict, ProtocolFile]) -> ProtocolFile:
--> 329         return ProtocolFile(current_file, lazy=self.preprocessors)
    330 
    331     def __str__(self):

~/miniconda3/envs/pyannote/lib/python3.8/site-packages/pyannote/database/protocol/protocol.py in __init__(self, precomputed, lazy)
     85             # 'precomputed' one (which is probably not the most efficient solution).
     86             for key in set(precomputed.lazy) & set(lazy):
---> 87                 self._store[key] = precomputed[key]
     88 
     89             # we use the union of 'precomputed' lazy keys and provided 'lazy' keys as lazy keys

~/miniconda3/envs/pyannote/lib/python3.8/site-packages/pyannote/database/protocol/protocol.py in __getitem__(self, key)
    120 
    121                 # apply preprocessor once and remove it
--> 122                 value = self.lazy[key](self)
    123                 del self.lazy[key]
    124 

~/miniconda3/envs/pyannote/lib/python3.8/site-packages/pyannote/database/loader.py in __call__(self, file)
    126         if uri not in self.loaded_:
    127             sub_file = {key: file[key] for key in self.placeholders_}
--> 128             loaded = load_rttm(self.path.format(**sub_file))
    129             if uri not in loaded:
    130                 loaded[uri] = Annotation(uri=uri)

AttributeError: 'PosixPath' object has no attribute 'format'
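
The traceback suggests the RTTM path template ends up stored as a pathlib.Path, which has no format method. A minimal sketch of the underlying problem and of one possible direction (casting the template back to str before rendering it), assuming the loader keeps using str.format; the template and URI below are hypothetical:

from pathlib import Path

template = Path("rttms/{uri}.rttm")        # template accidentally stored as a Path
sub_file = {"uri": "mix_0000001"}

# template.format(**sub_file)              # AttributeError: 'PosixPath' object has no attribute 'format'
print(str(template).format(**sub_file))    # rttms/mix_0000001.rttm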

Using patterns for annotation files

Is it possible to use patterns for .rttm locations?
For example:

Protocols:
  MyDatabase:
    Protocol:
      MyProtocol:
        train:
            uri: lists/train.lst
            speaker: rttms/{uri}.rttm
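
If templated paths are not accepted in the configuration file itself, one possible direction is a callable preprocessor that loads the per-file RTTM on the fly. This is only a sketch: callable preprocessors and load_rttm are part of the pyannote.database API, but the 'speaker' key name and the rttms/ layout are just examples.

from pyannote.database import get_protocol
from pyannote.database.util import load_rttm

def load_speaker(current_file):
    # load rttms/<uri>.rttm and return the annotation for this very file
    uri = current_file["uri"]
    return load_rttm(f"rttms/{uri}.rttm")[uri]

protocol = get_protocol(
    "MyDatabase.Protocol.MyProtocol",
    preprocessors={"speaker": load_speaker},
)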
