fave's Introduction

FAVE toolkits

This is a repository for the FAVE-Align and FAVE-extract toolkits. The first commit here represents the toolkit as it was available on the FAVE website as of October 21, 2013. The extractFormants code in the JoFrhwld/FAAV repository represents an earlier version of the code.

Getting started

You can install FAVE using pip by running the following:

python3 -m pip install fave

While FAVE can align transcripts to audio data, we recommend using the Montreal Forced Aligner for alignment because it is more recent and better maintained than the HTK library used by FAVE's aligner.

When you have an aligned TextGrid and the matching audio, you can extract acoustic measures by running the following:

fave-extract AudioFileName.wav TextGridFileName.TextGrid OutputFileName

Where AudioFileName.wav is the path to the audio file to measure, TextGridFileName.TextGrid is the path to the aligned TextGrid, and OutputFileName is the name of the file where you want your measurements to be output.

Documentation

Current documentation for installation and usage is available on the GitHub wiki: https://github.com/JoFrhwld/FAVE/wiki

FAVE website

The interactive FAVE website hosted at the University of Pennsylvania is no longer available. The DARLA site hosted by Dartmouth can be used to run the Montreal Forced Aligner and FAVE-extract: http://darla.dartmouth.edu

Support

You can find user support for installing and using the FAVE toolkits at the FAVE Users' Group.

Contributing to FAVE

For the most part, we'll be utilizing the fork-and-pull paradigm (see Using Pull Requests). Please send pull requests to the dev branch.

You can file a bug report on the issues tab.

There may be a delay between when a bug is reported and when a bug is resolved. Developers prioritize bugs based on difficulty, importance, and other factors, so bug reports are usually not handled in the order they are received.

Attribution

From v1.1.3 onwards, releases from this repository will have a DOI associated with them through Zenodo. The DOI for the current release is 10.5281/zenodo.22281. We recommend the following citation:

Rosenfelder, Ingrid; Fruehwald, Josef; Brickhouse, Christian; Evanini, Keelan; Seyfarth, Scott; Gorman, Kyle; Prichard, Hilary; Yuan, Jiahong; 2022. FAVE (Forced Alignment and Vowel Extraction) Program Suite v2.0.0 /zenodo.

Use of the interactive online interface should continue to cite:

Rosenfelder, Ingrid; Fruehwald, Josef; Evanini, Keelan and Jiahong Yuan. 2011. FAVE (Forced Alignment and Vowel Extraction) Program Suite. http://fave.ling.upenn.edu.

fave's People

Contributors

cgross95, chrisbrickhouse, dermoehre, hilaryp, jofrhwld, kylebgorman, scjs

fave's Issues

maxFormant reports incorrectly and globals

maxFormant is reported incorrectly in the .formantlog when using formantPredictionMethod=mahalanobis. This is because with this setting, getVowelMeasurement() assigns a different, local maxFormant variable based on speaker.sex and then uses that variable when calling praat, while the global maxFormant value isn't changed. The global value is reported at the end.

This can be fixed by declaring global maxFormant at the top of getVowelMeasurement(), but maybe there should also be a warning that the config maxFormant value is already being overridden when they use the mahalanobis setting?
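
The shadowing described above can be reproduced in a few lines. This is a minimal sketch, not FAVE's code: the function names and the 5000/5500 values are made up for illustration, and only the name maxFormant comes from the issue.

```python
# Illustrative only: the name maxFormant mirrors the issue, the values are made up.
maxFormant = 5000  # module-level value, as if read from the config


def measure_shadowed(sex):
    # Assigning here creates a *local* maxFormant; the global is untouched,
    # so anything that later reports the global logs a stale value.
    maxFormant = 5500 if sex == "f" else 5000
    return maxFormant


def measure_global(sex):
    # Declaring the name global makes the assignment visible to the logger.
    global maxFormant
    maxFormant = 5500 if sex == "f" else 5000
    return maxFormant


used = measure_shadowed("f")
print(used, maxFormant)  # prints "5500 5000": value used vs. value that would be logged
used = measure_global("f")
print(used, maxFormant)  # prints "5500 5500"
```

Returning the value that was actually used and logging that, instead of relying on a global, would avoid the problem without the global declaration.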

Rewrite README

README.md is out of date.

  • Link to ReadTheDocs instead of the wiki
  • Remove link to apparently defunct FAVE Dev Group mailing list
  • Add bug reporting and bug triage to section on how to contribute
  • Would be nice to implement badges for:
    • License
    • Python version
    • Build pass or fail

Alignment fails when transcript TSV has a final newline

Some programs add a newline to the end of a text file, meaning that some transcript files end with a blank line. Currently FAVE fails when these files are input because the blank line is seen as an error, but users may not find this easy to diagnose.

Current behavior

  • main() in FAAValign.py calls aligner.check_transcript()
  • aligner.check_transcript() calls aligner.TranscriptProcessor.check_transcription_file()
  • for each line, L in the transcript file, ....check_transcription_file() calls ....check_transcription_format(L)
  • if L contains only white space, that line is marked as needing deletion (ln 284)
  • when aligner.align() is called, if there are any lines needing deletion, the alignment fails raising a ValueError (ln 164)

Desired behavior

  • Ideally the alignment should not fail if the final line is blank. If anything, we may want to just (optionally?) remove lines containing only white space, since they're already identified.
  • The output should be more helpful in debugging what lines need to be removed. The program should at least give the (approx) line number to remove, and should be more helpful in its user-facing messaging. See #64
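
The desired behavior could look roughly like the following. It is a sketch under assumptions: drop_blank_lines is a hypothetical helper, not part of FAVE's TranscriptProcessor, and the sample transcript row is invented.

```python
def drop_blank_lines(lines):
    """Return (kept_lines, removed_line_numbers) for a transcript.

    Line numbers are 1-based so they match what a text editor shows.
    """
    kept, removed = [], []
    for number, line in enumerate(lines, start=1):
        if line.strip():
            kept.append(line)
        else:
            removed.append(number)
    return kept, removed


# Invented sample: one real row followed by two whitespace-only lines.
transcript = ["spk1\tAnna\t0.0\t1.2\thello there\n", "  \n", "\n"]
kept, removed = drop_blank_lines(transcript)
if removed:
    # Tell the user which lines were skipped instead of failing later.
    print(f"Ignoring whitespace-only line(s): {removed}")
```

Reporting the line numbers gives users something concrete to check even if the lines are dropped automatically.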

extractFormants module imports don't work out-of-the-box

Was troubleshooting fave/extractFormants.py on a fresh install of Fave2.0.0 and found an issue with the import fave instructions. If the module is installed via pip (or pip -e) then this import should function fine. If users are running it as a script from a downloaded repo (like they would in Fave1.X), the script fails with a ModuleImportError because it is inside the fave directory and cannot find the module locally.

The workaround we found is to move ./fave/extractFormants.py up the tree to ./extractFormants.py so that the relative import works. I'm not sure this is a viable solution in terms of packaging though. An alternative might be to change it to a relative import using import ., or modifying the script to fail gracefully and fall back on a relative import.

extractFormants.py crashes when writing log if not in git repo

This code block is outside the preceding try statement where changes is defined.

FAVE/fave/extractFormants.py

Lines 1986 to 1990 in f514fde

    if changes:
        f.write("Uncommitted changes when run:\n")
        f.write(str(changes,'utf-8'))

If not in a git repo, changes is never defined so logging crashes with

UnboundLocalError: local variable 'changes' referenced before assignment

Since the same code is already included in the preceding try statement after changes is defined:

FAVE/fave/extractFormants.py

Lines 1972 to 1985 in f514fde

    try:
        subprocess.run(['git', 'rev-parse', '--is-inside-work-tree'], check=True, capture_output=True)
        try:
            check_changes = subprocess.Popen(
                ["git", "diff", "--stat"], stdout=subprocess.PIPE)
            changes, err = check_changes.communicate() # pylint: disable=unused-variable
        except OSError:
            changes = ''
        if changes:
            f.write("Uncommitted changes when run:\n")
            f.write(str(changes, 'utf-8'))
    except subprocess.CalledProcessError:
        pass

I believe L1986-L1990 should just be removed. I can verify that after removing this block, logging completes and there is no crash.
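
A more defensive variant is to give changes a value before any git call is made, so that later code can always test it. The sketch below is not the code in extractFormants.py; uncommitted_changes is a hypothetical helper that wraps the same two git commands.

```python
import subprocess


def uncommitted_changes():
    """Return `git diff --stat` output, or '' when it cannot be determined."""
    changes = ""  # defined up front, so callers can always test it
    try:
        # Fails with CalledProcessError outside a git work tree.
        subprocess.run(
            ["git", "rev-parse", "--is-inside-work-tree"],
            check=True, capture_output=True,
        )
        result = subprocess.run(
            ["git", "diff", "--stat"], check=True, capture_output=True,
        )
        changes = result.stdout.decode("utf-8", errors="replace")
    except (subprocess.CalledProcessError, OSError):
        # Not a git work tree, or git is not installed.
        pass
    return changes
```

The logging code can then write the result only when it is non-empty, whether or not the script runs inside a repository.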

Alignment fails if the transcript file has a header row

At line 175 in aligner.align() the program loops through the lines of the transcript TSV. If the file has a header row, the alignment fails at line 192 because the headers (strings) cannot be coerced into time stamps (floats).

  • This failure mode should be avoided if possible. We should detect whether a header row is present, and skip it if so. This check-and-skip functionality should probably be added in TranscriptProcessor, and might already be built-in to the csv reader.
  • Logging and error reporting could be improved here. Currently the error message says that something couldn't be converted to a float, but where and what it is are not reported. FAVE should report the line and contents which would make identification of transcript problems easier.
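
A header check could be sketched as follows. The column positions of the two time stamps (indices 2 and 3) are an assumption for this example, and is_header and read_transcript are hypothetical names. The standard library also offers csv.Sniffer().has_header(), which applies its own heuristic.

```python
import csv
import io


def is_header(row, time_columns=(2, 3)):
    """Guess that a row is a header if its time fields are not numbers."""
    if len(row) <= max(time_columns):
        return False  # malformed row; leave it for the normal format checks
    try:
        for index in time_columns:
            float(row[index])
    except ValueError:
        return True
    return False


def read_transcript(handle):
    """Yield (line_number, row), skipping a leading header row if present."""
    reader = csv.reader(handle, delimiter="\t")
    for line_number, row in enumerate(reader, start=1):
        if line_number == 1 and is_header(row):
            continue
        yield line_number, row


# Invented sample with a header row followed by one data row.
sample = "id\tname\tstart\tend\ttext\ns1\tAnna\t0.0\t1.2\thello\n"
rows = list(read_transcript(io.StringIO(sample)))
```

Carrying the line number alongside each row also makes it possible to name the offending line when a later float conversion fails.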

specify praat/sox paths in config file

Users are currently required to modify their PATH system environment variable to enable some of the FAVE scripts to call praat and sox. This could cause issues with their system if done incorrectly. Instead, FAVE could require users to specify paths to praat/sox in the config file, which is safer and easier to do. This would also simplify the checks to confirm that these programs are accessible.

There's currently an undocumented option to specify praat and sox paths when calling extractFormants.extractFormants(), but it might be better to have all of the options configurable in the config file.
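
A lookup that prefers a configured path and only then falls back to PATH might look like this. It is a sketch: resolve_tool and the config keys praat_path and sox_path are hypothetical and are not defined by FAVE.

```python
import os
import shutil


def resolve_tool(name, configured_path=None):
    """Return a usable path for `name`, preferring the configured one."""
    if configured_path:
        if os.path.isfile(configured_path) and os.access(configured_path, os.X_OK):
            return configured_path
        raise FileNotFoundError(
            f"{name}: configured path {configured_path!r} is not an executable file"
        )
    found = shutil.which(name)  # no configured path: search PATH instead
    if found is None:
        raise FileNotFoundError(
            f"{name} not found on PATH; set its location in the config file"
        )
    return found


# Hypothetical config keys; FAVE does not define these names.
config = {"praat_path": None, "sox_path": None}
```

With this shape, the accessibility check and the error message live in one place for both programs.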

Parallelize

Update to allow parallel processing. The best candidate is pp (Parallel Python).

[RfC] Config file format

Due to an oversight, fave-extract no longer accepts a config file; all user configuration is done via command line arguments. This is not user friendly to say the least and a configuration file should be added as a new feature in 2.1.0, but a question worth asking is what format this should take.

Proposals

Design considerations

Based on our current command line flags, we have 4 types of configuration option:

  • Flags (false unless present)
  • Single entry values (provide one value)
  • Multiple entry values (provide a list of values)
  • Filenames (similar to single entry values, but the file contains multiple values so best treated separately)

We want a format that is tailored to those, so a heavy emphasis on key:value systems is important. We only have one option where a list is important (--stopWords, multiple entry values) so this can be a second-class feature.

Additionally, we want the configuration file to be:

  • Easy to read
  • Simple to create

This rules out a format like JSON, which requires a lot of extra typing and is not very readable.

Given the major version bump we are not required to maintain backwards compatibility, but it would be nice if possible. Unfortunately, this is unlikely as the old config format was just a wrapper around the command line arguments and so not very user friendly.

Evaluation

YAML

  • ✔️ Good key-value support
  • ✔️ Good list support
  • ✔️ Easy to read
  • ❓ Simple to create
  • ❌ Backwards compatible (but we'd maintain the old version to avoid a 3.0 bump)

TSV

  • ✔️ Good key-value support
  • ❌ Good list support
  • ✔️ Easy to read
  • ✔️ Simple to create
  • ❓ Backwards compatible (if we build own parser)

Old

  • ✔️ Good key-value support
  • ❌ Good list support
  • ❌ Easy to read
  • ❓ Simple to create
  • ✔️ Backwards compatible

FAVE-align on Windows

Needs documentation and possibly changes for installing and using FAVE-align on Windows

logging is broken with --multipleFiles

If the --multipleFiles option is being used, then the following global variables are cumulative across all of the input files:

logtimes
count_vowels
count_analyzed
count_uncertain
count_overlaps
count_truncated
count_stopwords
count_unstressed
count_too_short

If there are multiple input files, this causes all .formantlog files after the first one to be incorrect.
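
One way to avoid the carry-over is to keep the counters in an object that is created fresh for each input file. A minimal sketch, assuming nothing about FAVE's internals beyond the variable names listed above; FileStats and process_files are hypothetical, and treating logtimes as a list is a guess.

```python
from dataclasses import dataclass, field


@dataclass
class FileStats:
    """Per-file counters; field names follow the globals listed above."""
    logtimes: list = field(default_factory=list)  # assumed to be a list
    count_vowels: int = 0
    count_analyzed: int = 0
    count_uncertain: int = 0
    count_overlaps: int = 0
    count_truncated: int = 0
    count_stopwords: int = 0
    count_unstressed: int = 0
    count_too_short: int = 0


def process_files(vowels_per_file):
    """Return one FileStats per input; `vowels_per_file` maps name -> count."""
    results = {}
    for name, n_vowels in vowels_per_file.items():
        stats = FileStats()  # fresh counters for every file
        stats.count_vowels += n_vowels
        results[name] = stats
    return results


results = process_files({"a.wav": 3, "b.wav": 5})
```

Each .formantlog would then be written from its own FileStats instance instead of from module-level globals.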

Improve logging in TranscriptProcessor.check_transcription_file()

The function aligner.TranscriptProcessor.check_transcription_file() marks lines which need removal but does not log these problem lines nor does it notify the user. Not dealing with these lines can cause the alignment to fail (see #63), making debugging the problem difficult even though FAVE knows exactly where the problem is. Logging should be improved so that the lines-at-issue can be tracked down and fixed easily.

intensity cutoff occurs before start of AY segment

I have an AY vowel token that causes getTimeOfF1Maximum() to raise a ValueError, because trimmedFormants returns an empty list.

I think this is happening because the maximum intensity is very early in the intensity curve, and the intensity drops off right away, so getIntensityCutoff() chooses a cutoff that only spans the first few intensity frames. This is an issue because the timestamps for those intensity frames are used to trim the formant frames, and the formant frames may start later than the intensity frames (there are no formant measurements at the very edges). In my token, the first formant frame is 7ms after the first intensity frame, so if the intensity cutoff occurs before the 7th 1ms intensity frame, no formant frames will be returned by trimFormants().

One way to fix this, which might also address #22, would be to take measurements in a way that intensity and formant frames are synced up. Or, a smaller fix could be made in faav(), modifyIntensityCutoff(), or getTimeOfF1Maximum() to make sure that the end_cutoff time is not before the first formant frame.
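
The smaller fix amounts to clamping the cutoff. The sketch below uses simplified stand-ins, not FAVE's functions: clamp_end_cutoff and trim_formants are hypothetical, frames are plain timestamps in seconds, and inclusive bounds are assumed.

```python
def clamp_end_cutoff(end_cutoff, formant_times):
    """Make sure at least the first formant frame survives trimming."""
    if not formant_times:
        raise ValueError("no formant frames to trim")
    return max(end_cutoff, formant_times[0])


def trim_formants(formant_times, start, end_cutoff):
    """Keep frames whose timestamps fall inside [start, end_cutoff]."""
    return [t for t in formant_times if start <= t <= end_cutoff]


# Intensity frames start at 0.000 s, formant frames only at 0.007 s.
formant_times = [0.007, 0.008, 0.009, 0.010]
raw_cutoff = 0.003  # falls before the first formant frame

print(trim_formants(formant_times, 0.0, raw_cutoff))  # prints []
safe_cutoff = clamp_end_cutoff(raw_cutoff, formant_times)
print(trim_formants(formant_times, 0.0, safe_cutoff))  # prints [0.007]
```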

verbose option

Add a verbose option that gives some of the feedback that's been commented out (e.g., when it encounters uncertain and stop word transcriptions; extraction progress on each vowel token).

config files are too sensitive to whitespace

extractFormants.py ends with the error unrecognized arguments if any line of the config file ends in non-newline white space, or if there is no trailing newline at the end of the config file.
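
If the config file is read through argparse's argument-file mechanism, the line handling can be made more forgiving by overriding ArgumentParser.convert_arg_line_to_args(). This sketch makes several assumptions: that the file is read this way at all, the "@" prefix character, and the two option names, which are only illustrative.

```python
import argparse


class ForgivingParser(argparse.ArgumentParser):
    """Argument-file lines are stripped; blank lines are ignored."""

    def convert_arg_line_to_args(self, arg_line):
        # argparse calls this once per line of the argument file.
        stripped = arg_line.strip()
        return [stripped] if stripped else []


# "@" and the option names below are illustrative assumptions.
parser = ForgivingParser(fromfile_prefix_chars="@")
parser.add_argument("--nFormants", type=int, default=5)
parser.add_argument("--verbose", action="store_true")
```

A file containing `--nFormants=4` followed by trailing spaces, a blank line, and `--verbose` with no final newline would then parse via `parser.parse_args(["@config.txt"])`.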

add format checking in praat.py for TextGrid files

Users sometimes provide a TextGrid input file to FAVE-extract that has a formatting error, such as the following (note the two newline characters after "HH):

        intervals [403]:
            xmin = 146.703
            xmax = 146.72091856991668
            text = "HH

"

praat.py crashes while trying to read in this file with the following error:

  File "/home/speech/Resources/Tools/FAVE/FAVE-extract/bin/praat.py", line 540, in read
    jmin = round(float(line.strip().split('=')[1].strip()), 3) # line reads "xmin = xxx.xxxxx"
IndexError: list index out of range

It would be more helpful to the user to print out the line that caused the crash so that the user doesn't need to look for it themselves after the crash. Even better would be for praat.py to check the format of the entire TextGrid file while reading it in and output information about any lines that have formatting errors.
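
A parsing helper that reports the offending line could look like this. It is only a sketch: parse_number_line is a hypothetical function, not part of praat.py, and it assumes the reader keeps track of the current line number.

```python
def parse_number_line(line, line_number, expected_key):
    """Parse a line such as 'xmin = 146.703' and return the rounded float."""
    key, sep, value = line.partition("=")
    if not sep or key.strip() != expected_key:
        # Report where the problem is and what was found there.
        raise ValueError(
            f"line {line_number}: expected '{expected_key} = <number>', "
            f"got {line.rstrip()!r}"
        )
    try:
        return round(float(value.strip()), 3)
    except ValueError:
        raise ValueError(
            f"line {line_number}: {value.strip()!r} is not a number"
        ) from None
```

For the malformed interval above, the stray line containing only a quotation mark would produce a message naming that line instead of an IndexError.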

TypeError due to accidental overwrite of internal dictionary

Lines 216-219 in fave/cmudictionary.py

                    if word not in add_dict:
                        add_dict = []
                    if t not in add_dict[word]:
                        add_dict[word].append(t)

The code path in line 217 resets add_dict (a dictionary) to an empty list. This leads to a problem on line 218 where add_dict is accessed as a dictionary which raises an error because it is now a list after the execution of line 217.
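
The likely intent is to create a new list for the word and leave the dictionary intact. A minimal sketch of that logic, wrapped in a hypothetical add_transcription function so it can run on its own:

```python
def add_transcription(add_dict, word, t):
    """Record transcription `t` for `word` without duplicating entries."""
    if word not in add_dict:
        add_dict[word] = []  # new list for this word; the dict stays a dict
    if t not in add_dict[word]:
        add_dict[word].append(t)


add_dict = {}
add_transcription(add_dict, "FAVE", "F EY1 V")
add_transcription(add_dict, "FAVE", "F EY1 V")  # duplicate, ignored
```

The first two lines of the function could equally be written as `add_dict.setdefault(word, [])`.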

unit tests

We need some kind of unit tests to try out new code.

Reimplement configuration files

In #61 we decided to reimplement config files using YAML. This is a tracking task for that goal.

Specs

  • Configuration file settings MUST overwrite defaults.
  • Command line flags MUST overwrite configuration file settings
  • The old method using argparse where flags were specified in a file MUST remain for backwards compatibility
  • An external YAML parser SHOULD be used (probably PyYAML)
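
The two precedence rules can be sketched with plain dictionaries. The setting names and default values here are illustrative assumptions, merge_settings is a hypothetical helper, and file_settings stands in for whatever a YAML parser (for example PyYAML's yaml.safe_load) would return.

```python
from collections import ChainMap

# Illustrative defaults; not FAVE's actual option set.
DEFAULTS = {"nFormants": 5, "verbose": False}


def merge_settings(defaults, file_settings, cli_settings):
    """Combine settings so that CLI > config file > defaults."""
    # Drop CLI entries left at None (flag not given) so they do not
    # mask values coming from the config file.
    given = {k: v for k, v in cli_settings.items() if v is not None}
    # ChainMap looks keys up left to right, which encodes the precedence.
    return dict(ChainMap(given, file_settings, defaults))


settings = merge_settings(
    DEFAULTS,
    {"nFormants": 4},                      # from the config file
    {"nFormants": None, "verbose": True},  # from the command line
)
```

This relies on command line options defaulting to None when absent, which is a design choice the argument parser would have to follow.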

To do

  • Write tests for the "must" specs above
  • Implement yaml config
  • write tutorial / documentation page

Mac OSX doesn't guarantee praat in PATH

While troubleshooting use of fave/extractFormants.py on a fresh install of Fave2.0.0, we ran into the problem of praat not being in PATH. The program was installed, but it seems to live in the Applications directory which is not normally in PATH. This means that the search on line 2090 fails, and the user (who has praat installed) is left with a confusing message about how the program cannot be found.

The workaround we found was to edit the user's ~/.bash_profile so that the applications directory is added to their PATH when starting their terminal session. I'd argue this is not ideal. Adding the whole applications folder to their PATH introduces a vulnerability, and adding the specific Praat folder to their PATH is fragile should praat ever change its directory structure or install location. It's probably also not the best idea to encourage end users to play around in their bash profile. This all goes double for asking them to symlink to praat from their /usr/local/bin

The best solution, I think, is a more intelligent search for praat. If the search fails (line 2097) it would make sense to search for an ~/Applications directory and see if we can find a praat executable in there (probably with a warning in case it's old or broken). If so, use it, and if not fail with some additional information. Thoughts?
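
A fallback search along these lines could be sketched with shutil.which. The function find_praat is hypothetical, trying more than one executable name is an added assumption, and the macOS directories listed are guesses at common install locations, not paths taken from FAVE.

```python
import os
import shutil


def find_praat(names=("praat", "Praat"), extra_dirs=()):
    """Return the first Praat executable found, or None."""
    # First try the regular PATH search.
    for name in names:
        found = shutil.which(name)
        if found:
            return found
    # Then fall back to explicitly listed directories.
    search_path = os.pathsep.join(extra_dirs)
    if search_path:
        for name in names:
            found = shutil.which(name, path=search_path)
            if found:
                return found
    return None


# Assumed locations of the executable inside a macOS application bundle.
MAC_DIRS = (
    "/Applications/Praat.app/Contents/MacOS",
    os.path.expanduser("~/Applications/Praat.app/Contents/MacOS"),
)
```

A caller could use `find_praat(extra_dirs=MAC_DIRS)` and print a warning when the result comes from the fallback directories.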

extractFormants.py can't find praat on Ubuntu

I got a path error when trying to run extractFormants.py due to these lines:

    elif os.name == 'posix':
        PRAATNAME = 'Praat'

Anyway, I had to run

    sudo ln -s /usr/bin/praat /usr/bin/Praat

to get things working.

FAVE-extract doesn't provide enough context

The output from FAVE-extract provides the word from which the vowel was extracted. It should also, minimally, provide

  • the preceding segment
  • the following segment
  • the word context (initial, internal or final)
  • the full transcription of the word
  • the index of the vowel within the word
  • the full transcription of the preceding word
  • the full transcription of the following word

For the preceding and following context, some heuristic should be applied to ignore small sp's

FAVE-align dictionary check output

Some transcript files are causing the FAVE-align dictionary check to produce garbage output text files, inserting null bytes like so:
[screenshot: dictionary check output containing null bytes]
(The rest of the file is like that, making it impossible to check most of the items.)

I'm not sure what's causing this problem; not all files I've run produce it. If anyone's interested in fixing this, I can send an example file.

Aligner ignores custom dictionary entries when running HTK

The HVite command is constructed at line 465 and part of the command specifies the dictionary file to be passed to HTK (line 477). Currently, the only dictionary passed to HTK is the CMU dictionary distributed with FAVE. If an alignment is attempted with a custom dictionary, HVite will fail saying that a word is not in the dictionary even if a transcription is given in the custom dictionary. This can be confusing for users to debug and fix.

FAVE should merge the default dictionary and any included dictionary, and then pass both as part of the HVite command so that custom dictionary inclusion works as intended.
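
The merge step could be sketched as below. It assumes each dictionary line has the form "WORD PH1 PH2 ...", does not handle comment lines, and uses hypothetical names (parse_dictionary, merge_dictionaries); writing the merged result to the file handed to HVite is left out.

```python
def parse_dictionary(lines):
    """Map WORD -> list of transcriptions from 'WORD PH1 PH2 ...' lines."""
    entries = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        word, phones = parts[0], " ".join(parts[1:])
        prons = entries.setdefault(word, [])
        if phones not in prons:
            prons.append(phones)
    return entries


def merge_dictionaries(default, custom):
    """Return a new mapping with custom entries added to the defaults."""
    merged = {word: list(prons) for word, prons in default.items()}
    for word, prons in custom.items():
        target = merged.setdefault(word, [])
        for pron in prons:
            if pron not in target:
                target.append(pron)
    return merged


# Invented sample entries.
default = parse_dictionary(["HELLO  HH AH0 L OW1"])
custom = parse_dictionary(["FAVE  F EY1 V", "HELLO  HH EH0 L OW1"])
merged = merge_dictionaries(default, custom)
```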

Formant Tracks are Problematic

The formant tracks returned by FAVE-extract are problematic. The tracks for /ay/ frequently don't include the point measurement, probably because of the extra padding.

ay: [figure: formant tracks]

ae: [figure: formant tracks]

Error using nmake /f htk_htklib_nt.mkf all command

Microsoft (R) Program Maintenance Utility Version 14.00.24210.0
Copyright (C) Microsoft Corporation. All rights reserved.

    cl /nologo /c /ML /W0 /GX /O2 /G5 /Ob2 /D "NDEBUG" /D "WIN32" /D "_WINDOWS" /I "." /D "WIN32_AUDIO"  /D ARCH=\"WIN32\" /D "PHNALG" esig_asc.c

'cl' is not recognized as an internal or external command,
operable program or batch file.
NMAKE : fatal error U1077: 'cl' : return code '0x1'
Stop.

FAVE-extract text output vowel class

The tab-delim text output of FAVE-extract contains the original ARPABET transcriptions for vowels, which ignores any later recoding done by FAVE-extract. For example, all short-a tokens are 'AE', even when run with the Phila system.

malloc.h -> stdlib.h on macOS Catalina

I was installing HTK on macOS Catalina. Following the steps outlined in HTK on OS X, an error arises at the make all step:

strarr.c:21:10: fatal error: 'malloc.h' file not found
#include <malloc.h>
         ^~~~~~~~~~
1 error generated.
make[1]: *** [strarr.o] Error 1
make: *** [HTKLib/HTKLib.a] Error 1

The fix, for me at least: replace line 21 (#include <malloc.h>) of ./HTKLib/strarr.c with #include <stdlib.h>

I can submit a pull request if that will help, just didn't think it's worth it for the small issue.

specify phones/words tier numbers

Compatibility could be improved by allowing users to manually specify the speaker's phone/word tier numbers, rather than defaulting to adjacent phone/word tiers.
