pyannote / pyannote-database
Reproducible experimental protocols for multimedia (audio, video, text) databases
License: MIT License
Hi, I think there is a small bug in the database/custom.py file.
In lines 126 through 135, the current code looks like the following:
# load annotations
if file_rttm is not None:
if file_rttm.suffix == '.rttm':
annotations = load_rttm(file_rttm)
elif file_rttm.suffix == '.mdtm':
annotations = load_mdtm(file_mdtm)
else:
msg = f'Unsupported format in {file_rttm}: please use RTTM.'
raise ValueError(msg)
For the *.mdtm file case,
annotations = load_mdtm(file_mdtm)
should be
annotations = load_mdtm(file_rttm)
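With the fix applied, the dispatch would look like the sketch below; load_rttm and load_mdtm are stub stand-ins for pyannote's real loaders, used here only to make the suffix dispatch self-contained and testable:

```python
from pathlib import Path

# stubs standing in for pyannote.database's real loaders,
# only so the dispatch below can run on its own
def load_rttm(path):
    return {"format": "rttm", "path": str(path)}

def load_mdtm(path):
    return {"format": "mdtm", "path": str(path)}

def load_annotations(file_path: Path):
    """Dispatch on the file suffix, with the fixed load_mdtm argument."""
    if file_path.suffix == ".rttm":
        return load_rttm(file_path)
    elif file_path.suffix == ".mdtm":
        # fixed: pass file_path (the original passed the undefined file_mdtm)
        return load_mdtm(file_path)
    msg = f"Unsupported format in {file_path}: please use RTTM or MDTM."
    raise ValueError(msg)
```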
As you know, I'm building the database "extension" for DIHARD2. Their database format is a bit tricky, and I want users to have as little work as possible to do before being able to use the extension.
Here's the setup: the DIHARD2 data is provided in two archives which, once unzipped, form two subfolders: dihard2/dev/
and dihard2/test/
(there isn't any train data, so this is where pyannote's other database extensions and the meta-protocol pattern will come in really handy).
Here's the catch: in both folders, the audio files (encoded in FLAC) share basically the same path, and there are some conflicting URIs. Thus, a ~/.pyannote/database.yml pattern to get only the dev files would be
path/to/dihard_data/dev/data/single_channel/flac/{uri}.flac
and a pattern to match only the test
files would be
path/to/dihard_data/test/data/single_channel/flac/{uri}.flac
The real catch: some URIs are identical yet do not reference the same audio file.
There is a simple yet ugly solution: have the user run a bash script (provided in the repo) that would somehow fix this, but it would also imply parsing the .rttm files and modifying the URIs in there as well. Not pretty, not optimal, not my style.
I've looked a bit into the FileFinder class, and it seems it uses the format function to render the "matched" audio file's path: path = path_template.format(uri=uri, database=database, **kwargs)
My guess is that it's maybe possible, in your infrastructure, to use this kind of generic pattern in the ~/.pyannote/database.yml file:
path/to/dihard_data/{split}/data/single_channel/flac/{uri}.flac
And then, in my implementation of a protocol, have something like this:
class DIHARD2SingleChannelProtocol(SpeakerDiarizationProtocol):
    """DIHARD speaker diarization protocol"""

    def dev_iter(self):
        for annot_filepath in self.load_RTTM_files():
            # parse stuff
            current_file = {
                'database': 'DIHARD2',
                'uri': uri,
                'channel': 1,
                'split': 'dev',  # <======== added to be then used by the format function
                'annotated': ...,
                'annotation': ...}
            yield current_file

    def tst_iter(self):
        for test_file in test_data:
            current_file = {
                'database': 'DIHARD2',
                'uri': uri,
                'channel': 1,
                'split': 'test',  # <======== added to be then used by the format function
                'annotated': ...}
            yield current_file
Do you think that's feasible? Do you have any better way to solve this kind of problem since you have a much better understanding of pyannote?
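Not a maintainer's answer, but a quick sketch of why the {split} idea should work with str.format; the uri "DH_0001" is made up for illustration:

```python
# template with the proposed {split} placeholder
path_template = "path/to/dihard_data/{split}/data/single_channel/flac/{uri}.flac"

def resolve(template, current_file):
    # str.format fills each placeholder from the dict;
    # extra keyword arguments (e.g. 'database') are simply ignored
    return template.format(**current_file)

dev_path = resolve(path_template, {"database": "DIHARD2", "uri": "DH_0001", "split": "dev"})
test_path = resolve(path_template, {"database": "DIHARD2", "uri": "DH_0001", "split": "test"})
print(dev_path)   # path/to/dihard_data/dev/data/single_channel/flac/DH_0001.flac
print(test_path)  # path/to/dihard_data/test/data/single_channel/flac/DH_0001.flac
```

Two files sharing the same uri then resolve to different paths, as long as each protocol iterator sets its own 'split' key.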
Hi,
One of my protocols is named '24', and it causes the following error: TypeError: type.__new__() argument 1 must be str, not int
I guess it should be easily fixable by tweaking the YAML loading parameters or adding a str conversion somewhere, but I don't understand why the name can't be an int in the first place.
Full log below
Merry Christmas :)
Traceback (most recent call last):
File "/people/lerner/anaconda3/envs/pyannote/bin/pyannote-speaker-embedding", line 11, in <module>
load_entry_point('pyannote.audio', 'console_scripts', 'pyannote-speaker-embedding')()
File "/people/lerner/anaconda3/envs/pyannote/lib/python3.7/site-packages/pkg_resources/__init__.py", line 489, in load_entry_point
return get_distribution(dist).load_entry_point(group, name)
File "/people/lerner/anaconda3/envs/pyannote/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2852, in load_entry_point
return ep.load()
File "/people/lerner/anaconda3/envs/pyannote/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2443, in load
return self.resolve()
File "/people/lerner/anaconda3/envs/pyannote/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2449, in resolve
module = __import__(self.module_name, fromlist=['__name__'], level=0)
File "/people/lerner/pyannote/pyannote-audio/pyannote/audio/applications/speaker_embedding.py", line 180, in <module>
from .base import Application
File "/people/lerner/pyannote/pyannote-audio/pyannote/audio/applications/base.py", line 40, in <module>
from pyannote.database import FileFinder
File "/people/lerner/pyannote/pyannote-database/pyannote/database/__init__.py", line 62, in <module>
DATABASES, TASKS = add_custom_protocols()
File "/people/lerner/pyannote/pyannote-database/pyannote/database/custom.py", line 314, in add_custom_protocols
{'__init__': get_init(register)})
TypeError: type.__new__() argument 1 must be str, not int
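For what it's worth, the failure is reproducible with type() alone: YAML parses an unquoted 24 as a Python int, and type() requires a str name. A minimal sketch of the str() coercion suggested above (assuming it lands where the protocol class is created):

```python
# what yaml.load returns for a protocol named 24 (unquoted)
name = 24

try:
    type(name, (object,), {})
except TypeError as error:
    print(error)  # type.__new__() argument 1 must be str, not int

# coercing the name to str where the protocol class is built avoids it
protocol_class = type(str(name), (object,), {})
print(protocol_class.__name__)  # 24
```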
Hey,
@mmmaat has found this neat little lib: https://github.com/soundata/soundata
Just leaving this here as a potential idea: wouldn't it be nice if wrappers for this library were included in pyannote-database?
Packages installed using the pip install pyannote.database command do not have the latest updates, such as LABLoader:
cannot import name 'LABLoader' from 'pyannote.database.loader'
protocol.stats(subset) returns a dictionary whose speakers key should be renamed to labels.
For now, this is specific to speaker identification, but it should also work for language identification, for instance.
Would be nice to have a way to print a custom progress message when iterating over subsets.
from pyannote.database import get_protocol
protocol = get_protocol('AMI.SpeakerDiarization.MixHeadset')
for current_file in protocol.development(msg='Loading data...'):
pass
Hey,
For that (now cursed) VTC pipeline, if we want to keep things neat, I'll need support for lists of preprocessors for a given field. Let me explain myself: let's say I have two preprocessors for file["annotation"], LabelMapper and VoiceTypeClassifierPreprocessor. I need to be able to chain them, and the solution is pretty easy:
protocol = get_protocol("Db.ProtocolType.MyProtocol", preprocessors={
"audio": FileFinder(),
"annotation": [LabelMapper(), VoiceTypeClassifierPreprocessor(classes=..., unions=...)]
})
Obviously, the order of preprocessors in the list is important.
Would you be OK with me adding support for this in pyannote-database, while keeping support for the current "single preprocessor" API?
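A minimal sketch of how the list form could be normalized into one callable (chain and the toy preprocessors below are hypothetical, not pyannote API):

```python
def chain(key, preprocessors):
    # apply preprocessors in order; each one sees the result of the
    # previous one under `key` (a sketch, not pyannote's implementation)
    def chained(current_file):
        current_file = dict(current_file)
        for preprocessor in preprocessors:
            current_file[key] = preprocessor(current_file)
        return current_file[key]
    return chained

# two toy preprocessors standing in for LabelMapper and
# VoiceTypeClassifierPreprocessor
uppercase = lambda f: f["annotation"].upper()
exclaim = lambda f: f["annotation"] + "!"

chained = chain("annotation", [uppercase, exclaim])
print(chained({"annotation": "speech"}))  # SPEECH!
```

Reversing the list would change the result, which is exactly the order-sensitivity described above.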
pyannote-database database
pyannote-database protocol [--database=<database>] [--task=<task>]
pyannote-database stats <protocol> [--subset=<subset>]
Hi,
I think you should add expanduser here: https://github.com/pyannote/pyannote-database/blob/develop/pyannote/database/config.py#L61
Otherwise you cannot, e.g., provide the path to the database as a tilde path in preprocessors in pyannote-pipeline.
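A sketch of the suggested change, assuming the config value is a plain string at that point in config.py:

```python
from pathlib import Path

# expand a leading tilde before the path template is used anywhere else
raw_path = "~/corpora/ami/{uri}.wav"
expanded = Path(raw_path).expanduser()
# expanded now starts with the user's home directory instead of "~",
# while the {uri} placeholder is left untouched for later formatting
```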
Hi, speaker_verification.py is broken as it lacks the tqdm import.
Upon trying to run the following code to set up preprocessors:
from pyannote.database import FileFinder
preprocessors = {'audio': FileFinder()}
I ran into the attribute error below:
AttributeError Traceback (most recent call last)
Input In [12], in <cell line: 2>()
1 from pyannote.database import FileFinder
----> 2 preprocessors = {'audio': FileFinder()}
File ~\Anaconda3\lib\site-packages\pyannote\database\util.py:94, in FileFinder.__init__(self, database_yml)
89 with open(self.database_yml, "r") as fp:
90 config = yaml.load(fp, Loader=yaml.SafeLoader)
92 self.config_: Dict[DatabaseName, Union[PathTemplate, List[PathTemplate]]] = {
93 str(database): path
---> 94 for database, path in config.get("Databases", dict()).items()
95 }
AttributeError: 'NoneType' object has no attribute 'items'
Could anyone please advise on how to overcome this issue? database config and database.yml contents below.
echo $PYANNOTE_DATABASE_CONFIG:
/Users/askrobola/Documents/GitHub/pyannote-audio/tutorials/Sandbox/database.yml
database.yml:
Databases:
  AMI: /Users/askrobola/Documents/GitHub/pyannote-audio/tutorials/Sandbox/AMI/*/audio/{uri}.wav
  MUSAN: /Users/askrobola/Documents/GitHub/pyannote-audio/tutorials/Sandbox/MUSAN/musan/music/{uri}.wav
Protocols:
  AMI:
    SpeakerDiarization:
      MixHeadset:
        train:
          uri: /Users/askrobola/Documents/GitHub/pyannote-audio/tutorials/Sandbox/AMI/MixHeadset.train.lst
          annotation: /Users/askrobola/Documents/GitHub/pyannote-audio/tutorials/Sandbox/AMI/MixHeadset.train.rttm
          annotated: /Users/askrobola/Documents/GitHub/pyannote-audio/tutorials/Sandbox/AMI/MixHeadset.train.uem
        development:
          uri: /Users/askrobola/Documents/GitHub/pyannote-audio/tutorials/Sandbox/AMI/MixHeadset.development.lst
          annotation: /Users/askrobola/Documents/GitHub/pyannote-audio/tutorials/Sandbox/AMI/MixHeadset.development.rttm
          annotated: /Users/askrobola/Documents/GitHub/pyannote-audio/tutorials/Sandbox/AMI/MixHeadset.development.uem
        test:
          uri: /Users/askrobola/Documents/GitHub/pyannote-audio/tutorials/Sandbox/AMI/MixHeadset.test.lst
          annotation: /Users/askrobola/Documents/GitHub/pyannote-audio/tutorials/Sandbox/AMI/MixHeadset.test.rttm
          annotated: /Users/askrobola/Documents/GitHub/pyannote-audio/tutorials/Sandbox/AMI/MixHeadset.test.uem
Hi, I have a problem when I execute the tutorial; this message is returned after running the program:
FileNotFoundError: Could not find file "EN2002a.Mix-Headset" in the following location(s):
I don't know why this error happens. Can you help me?
To ensure consistent person naming across pyannote.database plugins, we should define a naming convention that could then be used by pyannote.database.get_label_identifier to ensure the same person is always labeled the same way.
One could use something like:
Two person_{xx} in the same {database} are supposed to be different from each other.
However, across databases, one cannot tell anything about them.
RTTM files sometimes contain a bunch of lines describing the list of speakers.
Those are marked with SPKR-INFO in the first field.
pyannote-database/pyannote/database/util.py
Line 279 in 11d8dcb
I am executing this command in pyannote 1.1.2 (I am pretty sure that is what I have):
export EXP_DIR=finetune1
pyannote-audio sad train --pretrained=sad_dihard --subset=train --to=5 ${EXP_DIR} headcam16.SpeakerDiarization.try1
it fails:
File "/ext3/miniconda3/envs/pyannote6/lib/python3.8/site-packages/pyannote/audio/labeling/tasks/base.py", line 294, in _load_metadata
current_file["annotated"] = get_annotated(current_file).crop(
AttributeError: 'NoneType' object has no attribute 'crop'
I do not have a .uem file; I suspect this may be the problem.
I am happy to provide more info if that would be helpful, hoping you can easily tell me what the problem is.
Thanks
Michael
The RTTMLoader class is extremely slow for large RTTM files containing annotations of multiple audio files (e.g. the VoxCeleb dataset).
We should make it faster!
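One possible direction (a sketch, not the actual fix): parse the RTTM once and group segments by uri, instead of re-scanning the whole file for every audio file. Field positions follow the RTTM layout shown elsewhere in this thread:

```python
from collections import defaultdict

def group_rttm_by_uri(lines):
    # single pass over the file: bucket (start, end, speaker) tuples by uri
    by_uri = defaultdict(list)
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "SPEAKER":
            continue  # skip blank lines and SPKR-INFO-style records
        uri = fields[1]
        start, duration = float(fields[3]), float(fields[4])
        speaker = fields[7]
        by_uri[uri].append((start, start + duration, speaker))
    return by_uri

rttm_lines = [
    "SPEAKER fileA 1 0.50 1.25 <NA> <NA> spk1 <NA> <NA>",
    "SPEAKER fileB 1 2.00 0.75 <NA> <NA> spk2 <NA> <NA>",
]
print(group_rttm_by_uri(rttm_lines)["fileA"])  # [(0.5, 1.75, 'spk1')]
```

After the single parse, per-file lookup becomes a dictionary access instead of a filter over the whole table.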
The possibility of defining the location of the database using Python. This would of course not be compatible with the command line interface that is planned in #45, but it would make the use of this library more flexible.
For example:
protocol = get_protocol('Debug.SpeakerDiarization.Debug', preprocessors={"audio": FileFinder()})
protocol = get_protocol('Debug.SpeakerDiarization.Debug', preprocessors={"audio": FileFinder()}, config="~/.pyannote")
This would be backwards compatible with previous versions of the method.
Hi!
I would like to train one of your pyannote.audio models using the Jamendo Corpus dataset available here: https://zenodo.org/record/2585988#.Yh9QgBPMJhE
Unfortunately, I have some problems defining the custom data loader. Each audio track has a single label file in .lab format, with one "start end label" line per segment, and this is different from the CTMLoader.
I wrote the following files.
database.yml:
Databases:
  Jamendo:
    - /path_to_jamendo/{uri}.mp3
    - /path_to_jamendo/{uri}.ogg
Protocols:
  Jamendo:
    Protocol:
      JamendoProtocol:
        train:
          uri: /path_to_jamendo/filelists/train
          annotation: /path_to_jamendo/labels/{uri}.lab
        development:
          uri: /path_to_jamendo/filelists/valid
          annotation: /path_to_jamendo/labels/{uri}.lab
        test:
          uri: /path_to_jamendo/filelists/test
          annotation: /path_to_jamendo/labels/{uri}.lab
setup.py:
from setuptools import setup, find_packages

setup(
    name="jamendo_lab_loader",
    packages=find_packages(),
    install_requires=[
        "pyannote.database >= 4.0",
    ],
    entry_points={
        "pyannote.database.loader": [
            ".lab = jamendo_lab_loader.loader:LabLoader",
        ],
    },
)
I don't know how to write the loader.py and how to use it. Do you have any suggestions?
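I can't answer for the maintainers, but the parsing half of such a loader might look like this stdlib-only sketch; in a real pyannote loader, each tuple would become a pyannote.core Segment inside an Annotation, and the sing/nosing labels are an assumption about the Jamendo .lab contents:

```python
def parse_lab(text):
    # one "start end label" triple per line
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        start, end, label = float(fields[0]), float(fields[1]), fields[2]
        segments.append((start, end, label))
    return segments

print(parse_lab("0.0 2.5 nosing\n2.5 4.0 sing\n"))
# [(0.0, 2.5, 'nosing'), (2.5, 4.0, 'sing')]
```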
Thank you for sharing this great pyannote work. Hope you can help me.
Francesco
Part of my configuration:
Databases:
# tell pyannote.database where to find AMI wav files.
# {uri} is a placeholder for the session name (eg. ES2004c).
# you might need to update this line to fit your own setup.
AMI: amicorpus/{uri}/audio/{uri}.Mix-Headset.wav
AMI-SDM: amicorpus/{uri}/audio/{uri}.Array1-01.wav
Protocols:
AMI-SDM:
SpeakerDiarization:
only_words:
train:
uri: ../lists/train.meetings.txt
annotation: ../only_words/rttms/train/{uri}.rttm
annotated: ../uems/train/{uri}.uem
lab: ../only_words/labs/train/{uri}.lab
development:
uri: ../lists/dev.meetings.txt
annotation: ../only_words/rttms/dev/{uri}.rttm
annotated: ../uems/dev/{uri}.uem
lab: ../only_words/labs/dev/{uri}.lab
test:
uri: ../lists/test.meetings.txt
annotation: ../only_words/rttms/test/{uri}.rttm
annotated: ../uems/test/{uri}.uem
lab: ../only_words/labs/test/{uri}.lab
When I comment out these two lines, the program runs well and file['lab'] returns exactly an Annotation object.
pyannote-database/pyannote/database/loader.py
Lines 260 to 261 in da5794b
It seems this sanity check is not working as expected. Also, other loaders (e.g. RTTMLoader) don't have this line (I guess the logic should be similar).
Hi !
I have an error in the data loader:
Traceback (most recent call last):
File "main.py", line 292, in <module>
args.func(args)
File "main.py", line 134, in run
trainer.fit(model)
File "/linkhome/rech/genini01/uzm31mf/.conda/envs/vtc2/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 740, in fit
self._call_and_handle_interrupt(
File "/linkhome/rech/genini01/uzm31mf/.conda/envs/vtc2/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/linkhome/rech/genini01/uzm31mf/.conda/envs/vtc2/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 777, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/linkhome/rech/genini01/uzm31mf/.conda/envs/vtc2/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1138, in _run
self._call_setup_hook() # allow user to setup lightning_module in accelerator environment
File "/linkhome/rech/genini01/uzm31mf/.conda/envs/vtc2/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1439, in _call_setup_hook
self.call_hook("setup", stage=fn)
File "/linkhome/rech/genini01/uzm31mf/.conda/envs/vtc2/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1501, in call_hook
output = model_fx(*args, **kwargs)
File "/linkhome/rech/genini01/uzm31mf/.conda/envs/vtc2/lib/python3.8/site-packages/pyannote/audio/core/model.py", line 349, in setup
self.task.setup()
File "/linkhome/rech/genini01/uzm31mf/.conda/envs/vtc2/lib/python3.8/site-packages/pytorch_lightning/core/datamodule.py", line 474, in wrapped_fn
fn(*args, **kwargs)
File "/linkhome/rech/genini01/uzm31mf/.conda/envs/vtc2/lib/python3.8/site-packages/pyannote/audio/tasks/segmentation/mixins.py", line 51, in setup
for f in self.protocol.train():
File "/linkhome/rech/genini01/uzm31mf/.conda/envs/vtc2/lib/python3.8/site-packages/pyannote/database/protocol/protocol.py", line 363, in subset_helper
yield self.preprocess(file)
File "/linkhome/rech/genini01/uzm31mf/.conda/envs/vtc2/lib/python3.8/site-packages/pyannote/database/protocol/protocol.py", line 329, in preprocess
return ProtocolFile(current_file, lazy=self.preprocessors)
File "/linkhome/rech/genini01/uzm31mf/.conda/envs/vtc2/lib/python3.8/site-packages/pyannote/database/protocol/protocol.py", line 87, in __init__
self._store[key] = precomputed[key]
File "/linkhome/rech/genini01/uzm31mf/.conda/envs/vtc2/lib/python3.8/site-packages/pyannote/database/protocol/protocol.py", line 122, in __getitem__
value = self.lazy[key](self)
File "/linkhome/rech/genini01/uzm31mf/.conda/envs/vtc2/lib/python3.8/site-packages/pyannote/database/loader.py", line 128, in __call__
loaded = load_rttm(self.path.format(**sub_file))
AttributeError: 'PosixPath' object has no attribute 'format'
My version of pyannote.database is:
pyannote.database 4.1.1
This error seems to happen while trying to load files that don't have annotations in the .rttm file.
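The error itself is easy to reproduce and work around: PosixPath has no .format method, so the template has to be a str (or cast with str()) before formatting. A minimal sketch:

```python
from pathlib import Path

template = Path("rttms/{uri}.rttm")

try:
    template.format(uri="fileA")
except AttributeError as error:
    print(error)  # 'PosixPath' object has no attribute 'format'

# workaround: cast to str before calling .format
resolved = str(template).format(uri="fileA")
print(resolved)  # rttms/fileA.rttm
```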
>>> from pyannote.database import get_protocol
>>> protocol = get_protocol('Debug.SpeakerDiarization.Debug')
>>> import pickle
>>> pickle.dumps(protocol)
PicklingError: Can't pickle <class 'pyannote.database.custom.Debug'>: attribute lookup Debug on pyannote.database.custom failed
That is a blocking issue because it prevents multi-gpu training in pyannote.audio v2.
cc @mogwai
In pyannote, cluster ids are enumerated from AA to ZZ; however, when reaching "NA", if one loads the RTTM file, the "NA" id will be loaded as nan.
This probably has something to do with pandas' read_csv function.
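If the culprit is indeed pandas, the usual workaround is keep_default_na=False, which stops read_csv from treating the literal string "NA" as missing (a sketch with a toy CSV, not pyannote's actual loading code):

```python
import io
import pandas as pd

csv = io.StringIO("uri,speaker\nfileA,NA\n")

# default behaviour turns "NA" into NaN; keep_default_na=False keeps
# the literal "NA" string as a label
df = pd.read_csv(csv, keep_default_na=False)
print(df.loc[0, "speaker"])  # NA
```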
Should all the RTTM files have unique speaker tags?
For example, if there are three audio files and each of them has two different speakers (within files and across files too), can the tags be (Speaker_00, Speaker_01) for all three files, or should it be (Speaker_00, Speaker_01) for the first file, (Speaker_02, Speaker_03) for the second, and (Speaker_04, Speaker_05) for the third?
Using the ** expression here leads to loading all the values of the protocol file, even those that are not needed.
My traceback (only to ease understanding; the error itself is unrelated and caused by the spacy version):
Traceback (most recent call last):
File "/people/lerner/anaconda3/envs/transformers/bin/named_id.py", line 7, in <module>
exec(compile(f.read(), __file__, 'exec'))
File "/people/lerner/pyannote/Prune/prune/named_id.py", line 900, in <module>
shuffle=False))
File "/people/lerner/pyannote/Prune/prune/named_id.py", line 523, in batchify
current_audio_emb = audio_emb(current_file)
File "/people/lerner/pyannote/pyannote-audio/pyannote/audio/features/wrapper.py", line 274, in __call__
return self.scorer_(current_file)
File "/people/lerner/pyannote/pyannote-audio/pyannote/audio/features/precomputed.py", line 205, in __call__
path = Path(self.get_path(current_file))
File "/people/lerner/pyannote/pyannote-audio/pyannote/audio/features/precomputed.py", line 73, in get_path
uri = get_unique_identifier(item)
File "/people/lerner/pyannote/pyannote-database/pyannote/database/util.py", line 205, in get_unique_identifier
return IDENTIFIER.format(**item)
File "/people/lerner/pyannote/pyannote-database/pyannote/database/protocol/protocol.py", line 122, in __getitem__
# just imagine that this key is forbidden
value = self.lazy[key](self)
File "/people/lerner/pyannote/pyannote-database/pyannote/database/custom.py", line 100, in load
return loader(current_file)
File "/vol/work/lerner/pyannote-db-plumcot/Plumcot/loader/loader.py", line 169, in __call__
attributes)
File "/vol/work/lerner/pyannote-db-plumcot/Plumcot/loader/loader.py", line 201, in merge_transcriptions_entities
# and that this raises : "Oh no, forbidden key !"
_, one2one, _, _, one2multi = align(tokens, e_tokens)
ValueError: too many values to unpack (expected 5)
I think it would be nice to be able to use relative paths in database.yml. This way, we could easily share the plumcot corpus, for example.
We could easily implement it by testing whether the path (to e.g. an RTTM file) is_absolute(), and concatenating it to the path of PYANNOTE_DATABASE_CONFIG if it is not.
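A sketch of that rule, resolving relative paths against the directory containing database.yml (the function name is hypothetical):

```python
from pathlib import Path

def resolve_path(path, config_yml):
    # keep absolute paths as-is; anchor relative ones at the
    # directory that holds database.yml
    path = Path(path)
    if path.is_absolute():
        return path
    return Path(config_yml).parent / path

print(resolve_path("rttms/train.rttm", "/home/user/.pyannote/database.yml"))
# /home/user/.pyannote/rttms/train.rttm
```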
for method in methods:
    try:
        protocol.progress = False
        file_generator = getattr(protocol, method)()
        first_item = next(file_generator)
    except AttributeError as e:
        continue
    except NotImplementedError as e:
        continue
The first element in the methods list is 'development'. If the protocol does not define a 'development' subset, FileFinder doesn't work.
I'm trying to train the model with my own data, but I cannot solve this problem.
To Reproduce
Steps to reproduce the behavior:
$pyannote-audio sad train --subset=train --to=200 --parallel=4 ${EXP_DIR} Test.SpeakerDiarization.OwnData
My database.yml:
Databases:
  Test: ./data_set_wav/{uri}.wav
  MUSAN: ./Pyannote/AMI/musan/{uri}.wav
Protocols:
  Test:
    SpeakerDiarization:
      OwnData:
        train:
          uri: ./Reference_files/validate/train.data.lst
          annotation: ./Reference_files/validate/train.data.rttm
          annotated: ./Reference_files/validate/train.data.uem
        development:
          uri: ./Pyannote/AMI/AMI/MixHeadset.development.lst
          annotation: ./Pyannote/AMI/AMI/MixHeadset.development.rttm
          annotated: ./Pyannote/AMI/AMI/MixHeadset.development.uem
        test:
          uri: ./Pyannote/AMI/AMI/MixHeadset.test.lst
          annotation: ./Pyannote/AMI/AMI/MixHeadset.test.rttm
          annotated: ./Pyannote/AMI/AMI/MixHeadset.test.uem
  MUSAN:
    Collection:
      BackgroundNoise:
        uri: ./Pyannote/AMI/musan/MUSAN/background_noise.txt
      Noise:
        uri: ./Pyannote/AMI/musan/MUSAN/noise.txt
      Music:
        uri: ./Pyannote/AMI/musan/MUSAN/music.txt
      Speech:
        uri: ./Pyannote/AMI/musan/MUSAN/speech.txt
pyannote environment
pyannote.audio==1.1.1
pyannote.core==4.3
pyannote.database==4.1.1
pyannote.metrics==3.1
pyannote.pipeline==1.5.2
My train.lst file (First 10 lines):
140471632__701151021998050417_272_2751_20210210_134647
127467523__701151985360118_272_2754_20210201_095339
135434197__701151031971345585_356_3972_20210205_165315
131519247__701151957207998_461_2881_20210203_152337
147034889__450198019993417681_93_3989_20210216_094739
130398174__701151091989826532_272_2754_20210203_084818
128654151__701151019988926593_356_3956_20210201_181652
138260478__01577999895159_147_2413_20210209_111043
146777865__701151021996438304_272_4027_20210215_183913
81808790__450198986402390_137_2680_20201217_085743
My train.rttm file (First 10 lines):
SPEAKER 80596681__701151988339062_105_2448_20201216_114632 1 174.301 1.25 <NA> <NA> Customer <NA> <NA>
SPEAKER 80596681__701151988339062_105_2448_20201216_114632 1 56.4404 0.9400000000000048 <NA> <NA> Customer <NA> <NA>
SPEAKER 80596681__701151988339062_105_2448_20201216_114632 1 20.6501 1.2001000000000026 <NA> <NA> Customer <NA> <NA>
SPEAKER 80596681__701151988339062_105_2448_20201216_114632 1 23.8802 1.110000000000003 <NA> <NA> Customer <NA> <NA>
SPEAKER 80596681__701151988339062_105_2448_20201216_114632 1 28.2202 0.6000000000000014 <NA> <NA> Customer <NA> <NA>
SPEAKER 80596681__701151988339062_105_2448_20201216_114632 1 3.63 0.9199999999999999 <NA> <NA> Customer <NA> <NA>
SPEAKER 80596681__701151988339062_105_2448_20201216_114632 1 5.63 0.41000000000000014 <NA> <NA> Customer <NA> <NA>
SPEAKER 80596681__701151988339062_105_2448_20201216_114632 1 7.4301 0.8999999999999995 <NA> <NA> Customer <NA> <NA>
SPEAKER 80596681__701151988339062_105_2448_20201216_114632 1 8.4901 0.6199999999999992 <NA> <NA> Customer <NA> <NA>
SPEAKER 80596681__701151988339062_105_2448_20201216_114632 1 9.7601 0.7100000000000009 <NA> <NA> Customer <NA> <NA>
Additional context
The example with the amicorpus data runs with no error, but when I try to train on my own data, I always get this error.
While making DB interfaces, I ended up using code for the diarization and speaker spotting protocols that is very similar to the one you implemented in the interface for the AMI dataset. This made me think that it might be possible to generalize this code into a generic database interface that would work for any database, given file lists formatted according to a pre-defined format.
Hey,
http://github.com/pyannote/pyannote-db-template is now DEPRECATED. I'm still using this system for custom DBs that don't really fit the RTTM/UEM/LST model, or that have custom ways of organizing files. What should be used instead of the deprecated repo?
For now, only speaker diarization meta-protocols are supported.
Is it possible to use patterns for .rttm locations?
For example:
Protocols:
  MyDatabase:
    Protocol:
      MyProtocol:
        train:
          uri: lists/train.lst
          speaker: rttms/{uri}.rttm
File "/home/tuenguyen/speech/speech_dia_@/env/lib/python3.8/site-packages/pyannote/audio/utils/protocol.py", line 31, in check_protocol
file = next(protocol.train())
File "/home/tuenguyen/speech/speech_dia_@/env/lib/python3.8/site-packages/pyannote/database/protocol/protocol.py", line 363, in subset_helper
yield self.preprocess(file)
File "/home/tuenguyen/speech/speech_dia_@/env/lib/python3.8/site-packages/pyannote/database/protocol/protocol.py", line 329, in preprocess
return ProtocolFile(current_file, lazy=self.preprocessors)
File "/home/tuenguyen/speech/speech_dia_@/env/lib/python3.8/site-packages/pyannote/database/protocol/protocol.py", line 87, in __init__
self._store[key] = precomputed[key]
File "/home/tuenguyen/speech/speech_dia_@/env/lib/python3.8/site-packages/pyannote/database/protocol/protocol.py", line 122, in __getitem__
value = self.lazy[key](self)
File "/home/tuenguyen/speech/speech_dia_@/env/lib/python3.8/site-packages/pyannote/database/loader.py", line 130, in __call__
loaded = load_rttm(self.path.format(**sub_file))
AttributeError: 'PosixPath' object has no attribute 'format'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "main.py", line 18, in <module>
scd = SpeakerChangeDetection(ami)
File "/home/tuenguyen/speech/speech_dia_@/env/lib/python3.8/site-packages/pyannote/audio/tasks/segmentation/speaker_change_detection.py", line 97, in __init__
super().__init__(
File "/home/tuenguyen/speech/speech_dia_@/env/lib/python3.8/site-packages/pyannote/audio/core/task.py", line 175, in __init__
self.protocol, self.has_validation = check_protocol(protocol)
File "/home/tuenguyen/speech/speech_dia_@/env/lib/python3.8/site-packages/pyannote/audio/utils/protocol.py", line 34, in check_protocol
raise ValueError(msg)
ValueError: Protocol AMI.SpeakerDiarization.MixHeadset does not define a training set.
I cloned the develop branch and installed pyannote.
After that, I created a database folder with this layout:
[DirData]/a.lst # uri list
[DirData]/a.rttm # annotations
[DirData]/*.wav
=> list(glob.glob('[DirData]/*.wav')) = [mix_0000001.wav]
=> contents of a.lst:
mix_0000001.wav
=> contents of a.rttm:
file database.yml
Databases:
  AMI: [DirData]/{uri}.wav
Protocols:
  AMI:
    SpeakerDiarization:
      MixHeadset:
        train:
          uri:[DirData]/a.lst
          annotation:[DirData]/a.rttm
Note: DirData is an absolute path on my local machine.
Please help; I have already read many tutorials and the pyannote source code, but I can't figure out this problem.
Hi, I've been following the pyannote-audio data preparation tutorial, and am trying to understand how the database.yml
files work by running code samples from the README.
For instance, using the sample database.yml file, the following code sample should print out some filenames:
from pyannote.database import get_protocol
protocol = get_protocol('AMI.SpeakerDiarization.MixHeadset')
for resource in protocol.train():
print(resource["uri"])
Instead, I get an error message about there not being a loader for .rttm
files:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-8-3540f768261e> in <module>()
2 from pyannote.database import get_protocol
3 protocol = get_protocol('AMI.SpeakerDiarization.MixHeadset')
----> 4 for resource in protocol.train():
5 print(resource["uri"])
2 frames
/usr/local/lib/python3.6/dist-packages/pyannote/database/custom.py in gather_loaders(entries, database_yml)
205 if path.suffix not in LOADERS:
206 msg = f"No loader for file with '{path.suffix}' suffix"
--> 207 raise TypeError(msg)
208
209 # load custom loader class
TypeError: No loader for file with '.rttm' suffix
Here's a public Google Colab file where you can see the issue. It only takes a minute or so to run.
https://colab.research.google.com/drive/1ErF8KOk-s11zUXjOEbZnRguvzj2SBafC
Really appreciate any advice on how to fix this. Thanks!
In the new version of pyannote.database, line 71 of pyannote-database/pyannote/database/protocol/protocol.py:
item[key] = preprocessor(item)
should be
item[key] = preprocessor(**item)
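To illustrate the difference between the two calls (the preprocessor below is a toy, not pyannote's):

```python
def preprocessor(uri=None, database=None, **kwargs):
    # a toy preprocessor expecting keyword arguments
    return f"{database}/{uri}"

item = {"uri": "fileA", "database": "AMI"}

# preprocessor(item) would pass the whole dict as the single positional
# `uri` argument; preprocessor(**item) unpacks it into keyword arguments
print(preprocessor(**item))  # AMI/fileA
```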
Hello,
I am trying to leverage the pre-trained models on my own data. I already followed the instructions to modify the AMI protocol file. However, when I try following the instructions on preparing the protocol for the AMI subset, I keep getting errors: 'KeyError: 'AMI'' and "ValueError: Could not find any protocol for "AMI" database".
Here are the instruction on the page that I followed:
preprocessors = {'audio': FileFinder()}
protocol = get_protocol('AMI.SpeakerDiarization.MixHeadset',
preprocessors=preprocessors)
Not sure if it is a directory issue. If so, how can I define the proper directory when calling it? And what is the function FileFinder() used for here?
Thanks for your time!
Hello,
I am trying to apply my own YAML with the command: from pyannote.database import get_protocol
But it shows "ImportError: cannot import name 'registry' from 'pyannote.database' (/home/xuan/anaconda3/envs/pyannote/lib/python3.8/site-packages/pyannote/database/__init__.py)"
How can I solve this problem? Thanks ^_^
from pyannote.audio.tasks import OverlappedSpeechDetection
ovl = OverlappedSpeechDetection(protocol, duration=2., batch_size=32, num_workers=4)
model = SimpleSegmentationModel(task=ovl)
trainer = pl.Trainer(max_epochs=1)
_ = trainer.fit(model)
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-23-7e3bd77348a4> in <module>
3 model = SimpleSegmentationModel(task=ovl)
4 trainer = pl.Trainer(max_epochs=1)
----> 5 _ = trainer.fit(model)
~/miniconda3/envs/pyannote/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in fit(self, model, train_dataloader, val_dataloaders, datamodule)
456 # SET UP TRAINING
457 # ----------------------------
--> 458 self.call_setup_hook(model)
459 self.call_hook("on_before_accelerator_backend_setup", model)
460 self.accelerator.setup(self, model) # note: this sets up self.lightning_module
~/miniconda3/envs/pyannote/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in call_setup_hook(self, model)
1062 called = self.datamodule.has_setup_test if self.testing else self.datamodule.has_setup_fit
1063 if not called:
-> 1064 self.datamodule.setup(stage_name)
1065 self.setup(model, stage_name)
1066 model.setup(stage_name)
~/miniconda3/envs/pyannote/lib/python3.8/site-packages/pytorch_lightning/core/datamodule.py in wrapped_fn(*args, **kwargs)
90 obj._has_prepared_data = True
91
---> 92 return fn(*args, **kwargs)
93
94 return wrapped_fn
~/miniconda3/envs/pyannote/lib/python3.8/site-packages/pyannote/audio/tasks/segmentation/mixins.py in setup(self, stage)
51 self._train_metadata = dict()
52
---> 53 for f in self.protocol.train():
54
55 file = dict()
~/miniconda3/envs/pyannote/lib/python3.8/site-packages/pyannote/database/protocol/protocol.py in subset_helper(self, subset)
361
362 for file in files:
--> 363 yield self.preprocess(file)
364
365 def train(self) -> Iterator[ProtocolFile]:
~/miniconda3/envs/pyannote/lib/python3.8/site-packages/pyannote/database/protocol/protocol.py in preprocess(self, current_file)
327
328 def preprocess(self, current_file: Union[Dict, ProtocolFile]) -> ProtocolFile:
--> 329 return ProtocolFile(current_file, lazy=self.preprocessors)
330
331 def __str__(self):
~/miniconda3/envs/pyannote/lib/python3.8/site-packages/pyannote/database/protocol/protocol.py in __init__(self, precomputed, lazy)
85 # 'precomputed' one (which is probably not the most efficient solution).
86 for key in set(precomputed.lazy) & set(lazy):
---> 87 self._store[key] = precomputed[key]
88
89 # we use the union of 'precomputed' lazy keys and provided 'lazy' keys as lazy keys
~/miniconda3/envs/pyannote/lib/python3.8/site-packages/pyannote/database/protocol/protocol.py in __getitem__(self, key)
120
121 # apply preprocessor once and remove it
--> 122 value = self.lazy[key](self)
123 del self.lazy[key]
124
~/miniconda3/envs/pyannote/lib/python3.8/site-packages/pyannote/database/loader.py in __call__(self, file)
126 if uri not in self.loaded_:
127 sub_file = {key: file[key] for key in self.placeholders_}
--> 128 loaded = load_rttm(self.path.format(**sub_file))
129 if uri not in loaded:
130 loaded[uri] = Annotation(uri=uri)
AttributeError: 'PosixPath' object has no attribute 'format'
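The traceback boils down to `self.path` being a `pathlib.PosixPath`, which has no `.format()` method (only `str` does). A possible workaround — a hypothetical sketch, not the upstream fix — is to cast the path template to `str` before substituting the `{uri}` placeholder, then wrap the result back into a `Path`:

```python
from pathlib import Path

# Hypothetical workaround: pathlib.Path has no .format() method, so cast
# the template to str first, substitute placeholders, then re-wrap.
def format_path(template, **fields):
    return Path(str(template).format(**fields))

rttm = format_path(Path("/data/rttm/{uri}.rttm"), uri="session1")
# rttm is PosixPath('/data/rttm/session1.rttm')
```

The same one-line cast (`str(self.path).format(**sub_file)`) would work inside the loader shown in the traceback.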
Hi,
Related to PaulLerner/pyannote-db-plumcot#12.
How should I proceed?
SpeakerDiarizationProtocol?
In all cases, each token should have a 'speaker' field (which might be set to unavailable).
If the transcript is a '.aligned' file, several fields could be added:
If the entity annotation is provided, we could add (I guess those already exist in spaCy):
Any suggestions ?
This might prove useful when generating training data.
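To make the proposal concrete, a token could hypothetically look like this (all field names are assumptions for illustration, not an existing pyannote API):

```python
# Hypothetical token structure (field names are assumptions, not an
# existing pyannote API). 'speaker' is always present, possibly set to
# a sentinel when unavailable; 'start'/'end' would come from a
# '.aligned' transcript; 'entity' from entity annotation (as in spaCy).
UNAVAILABLE = "<NA>"

token = {
    "text": "Carlton",
    "speaker": "spk01",
    "start": 1.23,
    "end": 1.58,
    "entity": "PERSON",
}
orphan = {"text": "uh", "speaker": UNAVAILABLE}
```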
This would allow sharing a database.yml that works out of the box.
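An out-of-the-box database.yml could, for instance, rely on paths relative to the YAML file itself, so the whole folder can be shipped as-is. A hypothetical sketch (the database name, list file, and folder layout are made up for illustration):

```yaml
# Hypothetical layout: relative paths resolved against this file's location.
Databases:
  MyDatabase: ./audio/{uri}.wav

Protocols:
  MyDatabase:
    SpeakerDiarization:
      MyProtocol:
        train:
          uri: ./lists/train.lst
          annotation: ./rttm/{uri}.rttm
```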
Related to #38.
annotation_duration should not consider parts that are not annotated, right?
https://github.com/pyannote/pyannote-database/blob/develop/pyannote/database/protocol/speaker_diarization.py#L194
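The expected behaviour can be sketched in pure Python (hypothetical (start, end) tuples, not the actual pyannote.core objects): the duration should be summed over annotated regions only, ignoring un-annotated gaps.

```python
# Sketch (not the actual pyannote implementation): sum durations over
# annotated regions only; gaps between regions contribute nothing.
# Regions are hypothetical (start, end) tuples, in seconds.
def annotated_duration(annotated_regions):
    return sum(end - start for start, end in annotated_regions)

annotated_duration([(0.0, 10.0), (25.0, 30.0)])  # 15.0: the 10-25 gap is ignored
```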
Hi,
I keep getting this warning when training a pipeline (this issue might have to be transferred): /mnt/beegfs/projects/plumcot/pyannote/pyannote-database/pyannote/database/protocol/protocol.py:128: UserWarning: Existing key "annotation" may have been modified.
This only happens during the first trial (i.e. the first iteration over the whole database subset), but it didn't happen before 4.0.
Hey Team,
Thanks for providing this common interface. We've been using it at @SEERNET for creating plugins for our internal datasets. We also created a db-plugin for the CABank-CallHome dataset here.
Would definitely love to push it upstream into the pyannote org for the community.
Cheers!
/cc @venkatesh-1729
I am trying to combine multiple protocols into one as shown in the [Meta-protocols and requirements] section of the README. I created the following database.yml configuration file from the already existing protocol configuration files (both work fine).
Requirements:
  - /path/to/folder/aishell4/database.yml
  - /path/to/folder/ali/database.yml

Protocols:
  CombinedAll:
    SpeakerDiarization:
      MyMetaProtocol:
        train:
          AISHELL4.SpeakerDiarization.Custom: [train, ]
          ALI.SpeakerDiarization.Custom: [train, ]
        development:
          AISHELL4.SpeakerDiarization.Custom: [development, ]
          ALI.SpeakerDiarization.Custom: [development, ]
        test:
          AISHELL4.SpeakerDiarization.Custom: [test, ]
          ALI.SpeakerDiarization.Custom: [test, ]
And then I try to call it:
from pyannote.database import registry, FileFinder
registry.load_database("path/to/combined/database.yml")
protocol = registry.get_protocol("CombinedAll.SpeakerDiarization.MyMetaProtocol", preprocessors={"audio": FileFinder()})
for file in protocol.train():
pass
But it gives me an error:
File "/opt/miniconda3/envs/speaker_diar/lib/python3.10/site-packages/pyannote/database/protocol/protocol.py", line 374, in subset_helper
for file in files:
File "/opt/miniconda3/envs/speaker_diar/lib/python3.10/site-packages/pyannote/database/custom.py", line 317, in subset_iter
raise ValueError("Missing mandatory 'uri' entry in CombinedAll.SpeakerDiarization.MyMetaProtocol.train")
How can this combination of protocols be done?
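For what it's worth, the error hints that one of the chained sub-protocols yields files without a 'uri' entry. A meta-protocol subset behaves, roughly, like chaining the corresponding subsets of each sub-protocol — a hypothetical sketch (`meta_train` is made up for illustration, not pyannote's actual implementation):

```python
from itertools import chain

# Hypothetical sketch of what a meta-protocol subset does under the hood:
# the combined train() chains the train() of each sub-protocol, so every
# file yielded by the sub-protocols must already carry a 'uri' entry.
def meta_train(*subset_iterables):
    for file in chain(*subset_iterables):
        if "uri" not in file:
            raise ValueError("Missing mandatory 'uri' entry")
        yield file

uris = [f["uri"] for f in meta_train([{"uri": "a"}], [{"uri": "b"}])]  # ['a', 'b']
```

So a first debugging step would be to iterate each sub-protocol's train() on its own and check that every file exposes 'uri'.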