
abxpy's Introduction


ABX discrimination test

ABX discrimination is a term that is used for three stimuli presented on an ABX trial. The third is the focus. The first two stimuli (A and B) are standard, S1 and S2 in a randomly chosen order, and the subjects' task is to choose which of the two is matched by the final stimulus (X). (Glottopedia)

This package contains the operations necessary to initialize, calculate and analyse the results of an ABX discrimination task.
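
As a rough illustration of the trial described above, the sketch below decides which of A and B is matched by X by comparing distances. The function names, the Euclidean distance and the +1/0/-1 convention are assumptions made for this example; they are not taken from the package's own scoring code.

```python
def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


def abx_decision(dist_ax, dist_bx):
    """Return +1 if X is closer to A, -1 if closer to B, 0 on a tie.

    The +1/0/-1 convention is an assumption made for this sketch.
    """
    if dist_ax < dist_bx:
        return 1
    if dist_ax > dist_bx:
        return -1
    return 0


# Hypothetical stimuli represented as 2-dimensional feature vectors.
a, b, x = [0.0, 0.0], [3.0, 4.0], [0.5, 0.0]
decision = abx_decision(euclidean(a, x), euclidean(b, x))
print(decision)
```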

Check out the full documentation at https://docs.cognitive-ml.fr/ABXpy.

Organisation

The package is composed of three main modules and several submodules.

The features can be computed as numpy arrays with external tools and made compatible with this package through the h5features module, or computed directly with one of our tools, such as shennong.

The pipeline

In                                   Module     Out
data.item, parameters                task       data.abx
data.abx, data.features, distance    distance   data.distance
data.abx, data.distance              score      data.score
data.abx, data.score                 analyse    data.csv

See Files Format for a description of the files used as input and output.

The task

Depending on what you want to study, you need to characterise the ABX triplets accordingly. You can characterise your task along 3 axes: on, across and by a certain label.

An example of ABX triplet:

A B X
on_1 on_2 on_1
ac_1 ac_1 ac_2
by by by

A and X share the same 'on' attribute; A and B share the same 'across' attribute; A, B and X share the same 'by' attribute.
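
The constraints above can be sketched as a small filter over item triplets. This is only an illustration in plain Python: the item tuples, their labels and the function name are hypothetical, and the package's own triplet generation is not reproduced here.

```python
from itertools import permutations

# Hypothetical items: (name, 'on' label, 'across' label, 'by' label).
items = [
    ("i1", "a", "spk1", "ctx"),
    ("i2", "b", "spk1", "ctx"),
    ("i3", "a", "spk2", "ctx"),
    ("i4", "b", "spk2", "ctx"),
]


def is_valid_triplet(a, b, x):
    """Check the on/across/by pattern shown in the example triplet."""
    on_ok = a[1] == x[1] and a[1] != b[1]      # A and X share 'on', B differs
    across_ok = a[2] == b[2] and a[2] != x[2]  # A and B share 'across', X differs
    by_ok = a[3] == b[3] == x[3]               # all three share 'by'
    return on_ok and across_ok and by_ok


triplets = [
    (a[0], b[0], x[0])
    for a, b, x in permutations(items, 3)
    if is_valid_triplet(a, b, x)
]
print(triplets)
```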

Example of use

See examples/complete_run.sh for a command-line run and examples/complete_run.py for usage from Python.

Installation

The recommended installation on Linux and macOS is using conda:

conda install -c coml abx

Alternatively, you may want to install it from sources. First clone this repository and go to its root directory. Then run:

conda env create -n abx -f environment.yml
source activate abx
make install
make test

Build the documentation

To build the documentation in the folder ABXpy/build/doc/html, simply run:

make doc

Citation

If you use this software in your research, please cite:

ABX-discriminability measures and applications, Schatz T., Université Paris 6 (UPMC), 2016.

abxpy's People

Contributors

cdancette, dpx-fair, jubenjum, jukaradayi, louisabraham, mmmaat, rolt, syhw, thomas-schatz


abxpy's Issues

Huge git repo size

There is an enormous (33 MB) pack file that is always cloned, even when using git clone --depth 1.

The biggest files in the pack file seem to be:

  • MFCC.tar.gz
  • resources/english.item
  • HTKposteriors.tar.gz
  • resources/xitsonga.item
  • ABXpy/test/frozen_files/data.abx
  • resources/sample.item

Apart from ABXpy/test/frozen_files/data.abx, they all seem to have been deleted. Would it be possible to clean the pack file?

Undefined behavior

Some outputs are determined by an iteration order on sets / dicts.
They are guaranteed to be consistent for multiple executions, but not across versions.

I think the csv test in test_frozen_analyze is the only one concerned.

Do you prefer:

  • to correct the undefined behavior (more complicated and since the tests will be changed there is a risk of breaking something but it is ultimately cleaner)?
  • to make separate tests for different Python versions (simplest to implement)?
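
A minimal sketch of the first option, assuming the order-dependent output comes from iterating over a dict or set of labels: sorting the keys before writing makes the CSV rows independent of the iteration order. The function name and the data are hypothetical and not taken from the test suite.

```python
import csv
import io


def write_rows_sorted(labels_to_scores):
    """Write one CSV row per label, in sorted label order."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # sorted() fixes the order regardless of how the mapping was built.
    for label in sorted(labels_to_scores):
        writer.writerow([label, labels_to_scores[label]])
    return buffer.getvalue()


output = write_rows_sorted({"spk2": 0.5, "spk1": 1.0})
print(output)
```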

make test failed on branch master

hi,

I created a conda Python 2 environment, ran pip install cython and pip install git+https://github.com/bootphon/ABXpy, then make test, and got the following error:

 /Users/sameerkhurana/repos/ABXpy/ABXpy/task.py:828: DeprecationWarning:
  .ix is deprecated. Please use
  .loc for label based indexing or
  .iloc for positional indexing

  See the documentation here:
  http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated
    on_across_by_values = dict(db.ix[block[0]])

-- Docs: https://docs.pytest.org/en/latest/warnings.html
============================================================================ 9 failed, 12 passed, 269 warnings in 13.61 seconds ============================================================================
Closing remaining open files:test_items/data.abx...Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/Users/sameerkhurana/anaconda3/envs/zerospeech/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/Users/sameerkhurana/anaconda3/envs/zerospeech/lib/python2.7/site-packages/tables/file.py", line 134, in close_all
    fileh.close()
  File "/Users/sameerkhurana/anaconda3/envs/zerospeech/lib/python2.7/site-packages/tables/file.py", line 2732, in close
    self.root._f_close()
  File "/Users/sameerkhurana/anaconda3/envs/zerospeech/lib/python2.7/site-packages/tables/group.py", line 937, in _f_close
    self._g_close_descendents()
  File "/Users/sameerkhurana/anaconda3/envs/zerospeech/lib/python2.7/site-packages/tables/group.py", line 901, in _g_close_descendents
    node_manager.close_subtree(self._v_pathname)
  File "/Users/sameerkhurana/anaconda3/envs/zerospeech/lib/python2.7/site-packages/tables/file.py", line 543, in close_subtree
    self._close_nodes(paths, cache.pop)
  File "/Users/sameerkhurana/anaconda3/envs/zerospeech/lib/python2.7/site-packages/tables/file.py", line 516, in _close_nodes
    node._g_close()
  File "/Users/sameerkhurana/anaconda3/envs/zerospeech/lib/python2.7/site-packages/tables/group.py", line 910, in _g_close
    self._g_close_group()
  File "tables/hdf5extension.pyx", line 1090, in tables.hdf5extension.Group._g_close_group
HDF5ExtError: HDF5 error back trace

  File "H5G.c", line 812, in H5Gclose
    not a group

End of HDF5 error back trace
 File "/Users/sameerkhurana/anaconda3/envs/zerospeech/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/Users/sameerkhurana/anaconda3/envs/zerospeech/lib/python2.7/site-packages/tables/file.py", line 134, in close_all
    fileh.close()
  File "/Users/sameerkhurana/anaconda3/envs/zerospeech/lib/python2.7/site-packages/tables/file.py", line 2732, in close
    self.root._f_close()
  File "/Users/sameerkhurana/anaconda3/envs/zerospeech/lib/python2.7/site-packages/tables/group.py", line 937, in _f_close
    self._g_close_descendents()
  File "/Users/sameerkhurana/anaconda3/envs/zerospeech/lib/python2.7/site-packages/tables/group.py", line 901, in _g_close_descendents
    node_manager.close_subtree(self._v_pathname)
  File "/Users/sameerkhurana/anaconda3/envs/zerospeech/lib/python2.7/site-packages/tables/file.py", line 543, in close_subtree
    self._close_nodes(paths, cache.pop)
  File "/Users/sameerkhurana/anaconda3/envs/zerospeech/lib/python2.7/site-packages/tables/file.py", line 516, in _close_nodes
    node._g_close()
  File "/Users/sameerkhurana/anaconda3/envs/zerospeech/lib/python2.7/site-packages/tables/group.py", line 910, in _g_close
    self._g_close_group()
  File "tables/hdf5extension.pyx", line 1090, in tables.hdf5extension.Group._g_close_group
tables.exceptions.HDF5ExtError: HDF5 error back trace

  File "H5G.c", line 812, in H5Gclose
    not a group

End of HDF5 error back trace

Problems closing the Group feat_dbs
make: *** [test] Error 1

pip install is missing cython

@louisabraham thanks for your PR. I just tried it in a fresh conda environment. It works but needs Cython to be installed first. So shouldn't the installation section of the README be the following?

pip install cython
pip install git+https://github.com/bootphon/ABXpy

error during import

I receive the following error when i import ABXpy.distances.distances:

Traceback (most recent call last):
  File "abxpy_eval.py", line 5, in <module>
    import ABXpy.distances.distances as dis
  File "/home/szhi/miniconda3/envs/abx_eval/lib/python3.8/site-packages/ABXpy/distances/distances.py", line 8, in <module>
    import h5features
  File "/home/szhi/.local/lib/python3.8/site-packages/h5features/__init__.py", line 31, in <module>
    from .h5features import read
  File "/home/szhi/.local/lib/python3.8/site-packages/h5features/h5features.py", line 29, in <module>
    from .data import Data
  File "/home/szhi/.local/lib/python3.8/site-packages/h5features/data.py", line 22, in <module>
    from .features import Features, SparseFeatures
  File "/home/szhi/.local/lib/python3.8/site-packages/h5features/features.py", line 21, in <module>
    import scipy.sparse as sp
  File "/home/szhi/.local/lib/python3.8/site-packages/scipy/sparse/__init__.py", line 228, in <module>
    from .base import *
  File "/home/szhi/.local/lib/python3.8/site-packages/scipy/sparse/base.py", line 10, in <module>
    from .sputils import (isdense, isscalarlike, isintlike,
  File "/home/szhi/.local/lib/python3.8/site-packages/scipy/sparse/sputils.py", line 16, in <module>
    supported_dtypes = [np.typeDict[x] for x in supported_dtypes]
  File "/home/szhi/.local/lib/python3.8/site-packages/scipy/sparse/sputils.py", line 16, in <listcomp>
    supported_dtypes = [np.typeDict[x] for x in supported_dtypes]
  File "/home/szhi/miniconda3/envs/abx_eval/lib/python3.8/site-packages/numpy/__init__.py", line 284, in __getattr__
    raise AttributeError("module {!r} has no attribute "
AttributeError: module 'numpy' has no attribute 'typeDict'

It seems that the problem is caused by np.typeDict, which was deprecated in numpy 1.21 and removed in later releases. However, after downgrading numpy to 1.21 I received a different error: ImportError: numpy.core.multiarray failed to import. Is there an example environment (packages + exact versions) that works for abxpy, or another way I could resolve this issue?

An error occurred while computing the distances

Hi,

running python bin/sample_eval1.py MFCC feats, gives me the following error

MFCC is the sample MFCCs provided in this repo.

Processing task across_talkers
Preprocessing... Writing the features in h5 format
Computing the distances
An error occured during the computation
Traceback (most recent call last):
  File "bin/sample_eval1.py", line 358, in <module>
final_score = fullrun(task, args.features, args.distance, args.output, ncpus=ncpus, keepcsv=args.csv)
  File "bin/sample_eval1.py", line 277, in fullrun
distance_file, distancefun, n_cpu=ncpus)
  File "/data/sls/temp/sameerk/tools/anaconda3/envs/abx2/lib/python2.7/site-packages/ABXpy-0.1.0-py2.7-linux-x86_64.egg/ABXpy/distances/distances.py", line 265, in compute_distances
jobs = create_distance_jobs(pair_file, distance_file, n_cpu)
  File "/data/sls/temp/sameerk/tools/anaconda3/envs/abx2/lib/python2.7/site-packages/ABXpy-0.1.0-py2.7-linux-x86_64.egg/ABXpy/distances/distances.py", line 41, in create_distance_jobs
by_dsets = [by_dset for by_dset in fh['unique_pairs']]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/data/sls/temp/sameerk/tools/anaconda3/envs/abx2/lib/python2.7/site-packages/h5py-2.9.0-py2.7-linux-x86_64.egg/h5py/_hl/group.py", line 262, in __getitem__
oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open

what could be going wrong?

ABX does not like Unicode symbols

When trying to create a task file from an item file that contains Unicode symbols (IPA symbols), I got the following error:
UnicodeEncodeError: 'ascii' codec can't encode character '\u02d0' in position 1

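The error reported above can be reproduced outside the package with the standard library alone. The sketch below assumes the failure comes from an implicit ASCII encoding step somewhere in the task creation; where exactly that happens in ABXpy is not shown here.

```python
text = "a\u02d0"  # 'a' followed by the IPA length mark U+02D0

ascii_error = None
try:
    text.encode("ascii")  # fails: U+02D0 is outside the ASCII range
except UnicodeEncodeError as error:
    ascii_error = str(error)

# Encoding explicitly as UTF-8 round-trips without loss.
utf8_bytes = text.encode("utf-8")
print(ascii_error)
print(utf8_bytes.decode("utf-8") == text)
```
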
Lots of PyTables warnings

While converting the code to Python 3, I found strange warnings. In fact, the problem exists already at 0fe520e with Python 2 (before I started messing around with the encodings etc).

To reproduce:

  • Execute test_analyze() modified to comment the line shutil.rmtree('test_items')
  • In a Python interpreter:
>>> import pandas as pd
>>> store = pd.HDFStore('test_items/data.abx')
>>> store.info()

It will show a lot of warnings like:

/Users/louisabraham/miniconda3/envs/abxpy/lib/python2.7/site-packages/tables-3.4.2-py2.7-macosx-10.6-x86_64.egg/tables/group.py:1187: UserWarning: problems loading leaf ``/bys``::

  variable length strings are not supported yet

The leaf will become an ``UnImplemented`` node.
  % (self._g_join(childname), exc))
/Users/louisabraham/miniconda3/envs/abxpy/lib/python2.7/site-packages/tables-3.4.2-py2.7-macosx-10.6-x86_64.egg/tables/attributeset.py:299: DataTypeWarning: Unsupported type for attribute 'sorted' in node 'c2_v0'. Offending HDF5 class: 8
  value = self._g_getattr(self._v_node, name)
/Users/louisabraham/miniconda3/envs/abxpy/lib/python2.7/site-packages/tables-3.4.2-py2.7-macosx-10.6-x86_64.egg/tables/attributeset.py:299: DataTypeWarning: Unsupported type for attribute 'empty' in node 'c2_v0'. Offending HDF5 class: 8
  value = self._g_getattr(self._v_node, name)
/Users/louisabraham/miniconda3/envs/abxpy/lib/python2.7/site-packages/tables-3.4.2-py2.7-macosx-10.6-x86_64.egg/tables/attributeset.py:299: DataTypeWarning: Unsupported type for attribute 'sorted' in node 'c2_v1'. Offending HDF5 class: 8
  value = self._g_getattr(self._v_node, name)
/Users/louisabraham/miniconda3/envs/abxpy/lib/python2.7/site-packages/tables-3.4.2-py2.7-macosx-10.6-x86_64.egg/tables/attributeset.py:299: DataTypeWarning: Unsupported type for attribute 'empty' in node 'c2_v1'. Offending HDF5 class: 8
  value = self._g_getattr(self._v_node, name)
/Users/louisabraham/miniconda3/envs/abxpy/lib/python2.7/site-packages/tables-3.4.2-py2.7-macosx-10.6-x86_64.egg/tables/attributeset.py:299: DataTypeWarning: Unsupported type for attribute 'sorted' in node 'c2_v2'. Offending HDF5 class: 8
  value = self._g_getattr(self._v_node, name)
/Users/louisabraham/miniconda3/envs/abxpy/lib/python2.7/site-packages/tables-3.4.2-py2.7-macosx-10.6-x86_64.egg/tables/attributeset.py:299: DataTypeWarning: Unsupported type for attribute 'empty' in node 'c2_v2'. Offending HDF5 class: 8
  value = self._g_getattr(self._v_node, name)
/Users/louisabraham/miniconda3/envs/abxpy/lib/python2.7/site-packages/tables-3.4.2-py2.7-macosx-10.6-x86_64.egg/tables/group.py:1187: UserWarning: problems loading leaf ``/regressors/c2_v0/indexed_datasets``::

  variable length strings are not supported yet

make test fails on master

I followed the README procedure on RedHat Linux 7.

conda env create -n abx -f environment.yml
source activate abx
make install
make test

After make test I get "6 failed, 15 passed, 2 warnings in 17.17s".

More precisely the error is:

def dtw_cosine_distance(x, y, normalized):
    return dtw.dtw(x, y, cosine.cosine_distance, normalized)
E AttributeError: module 'ABXpy.distances.metrics.dtw' has no attribute 'dtw'

If I run abx-distance I get this error:

Job 1: computing distances for block 0 on 199
Traceback (most recent call last):
File "/gpfswork/rech/jvn/uul35qx/.conda/envs/ralg_env/bin/abx-distance", line 11, in
sys.exit(main())
File "/gpfswork/rech/jvn/uul35qx/.conda/envs/ralg_env/lib/python3.6/site-packages/ABXpy/distance.py", line 93, in main
distance=args.distance, njobs=args.njobs, group=args.group)
File "/gpfswork/rech/jvn/uul35qx/.conda/envs/ralg_env/lib/python3.6/site-packages/ABXpy/distance.py", line 46, in run
distancefun, normalized=normalized, n_cpu=njobs)
File "/gpfswork/rech/jvn/uul35qx/.conda/envs/ralg_env/lib/python3.6/site-packages/ABXpy/distances/distances.py", line 316, in compute_distances
feature_files, feature_groups, splitted_features, 1, normalized)
File "/gpfswork/rech/jvn/uul35qx/.conda/envs/ralg_env/lib/python3.6/site-packages/ABXpy/distances/distances.py", line 199, in run_distance_job
by_db = store['feat_dbs/' + by]
TypeError: must be str, not numpy.bytes_
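
The TypeError above is what Python 3 raises when a str is concatenated with a bytes-like value (numpy.bytes_ is a subclass of bytes). One possible workaround, sketched here with a hypothetical helper and a made-up label, is to decode such values before building the key; this is not the fix adopted by the maintainers.

```python
def as_text(value, encoding="utf-8"):
    """Decode bytes-like labels to str; leave str values untouched."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


by = b"block_0"  # hypothetical 'by' label read back as bytes
key = "feat_dbs/" + as_text(by)
print(key)
```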

Triplets computed when each item under a label is unique

It looks like triplets are computed - and scores calculated - even when each item has a different label, and that the abx score is computed "on" this label category. This should probably not happen as there can't be any triplet with A and X sharing the same label?
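
To make the expectation concrete, the sketch below counts the ordered (A, X) pairs of distinct items that share an 'on' label. When every item has its own label the count is zero, so no valid triplet should exist. The function name and labels are hypothetical.

```python
from collections import Counter


def count_ax_pairs(on_labels):
    """Count ordered pairs of distinct items sharing the same 'on' label."""
    counts = Counter(on_labels)
    # A label carried by n items yields n * (n - 1) ordered (A, X) pairs.
    return sum(n * (n - 1) for n in counts.values())


print(count_ax_pairs(["a", "b", "c"]))  # every label is unique
print(count_ax_pairs(["a", "a", "b"]))  # two items share the label 'a'
```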

Python 3 string/byte comparison error

When calling Task.generate_triplets, the following error occurred:

Writing ABX triplets to task file...
Traceback (most recent call last):
  File "/home/szhi/miniconda3/envs/abx_eval/lib/python3.8/pdb.py", line 1705, in main
    pdb._runscript(mainpyfile)
  File "/home/szhi/miniconda3/envs/abx_eval/lib/python3.8/pdb.py", line 1573, in _runscript
    self.run(statement)
  File "/home/szhi/miniconda3/envs/abx_eval/lib/python3.8/bdb.py", line 580, in run
    exec(cmd, globals, locals)
  File "<string>", line 1, in <module>
  File "/home/szhi/multimodal_learning/abxpy_eval.py", line 1, in <module>
    import argparse
  File "/home/szhi/multimodal_learning/abxpy_eval.py", line 96, in df2task
    t.generate_triplets(output=task_file)
  File "/home/szhi/miniconda3/envs/abx_eval/lib/python3.8/site-packages/ABXpy/task.py", line 773, in generate_triplets
    self._compute_triplets(
  File "/home/szhi/miniconda3/envs/abx_eval/lib/python3.8/site-packages/ABXpy/task.py", line 834, in _compute_triplets
    out_regs.write(regressors, indexed=True)
  File "/home/szhi/miniconda3/envs/abx_eval/lib/python3.8/site-packages/ABXpy/h5tools/h5io.py", line 196, in write
    self.__initialize_datasets__(sample_data)
  File "/home/szhi/miniconda3/envs/abx_eval/lib/python3.8/site-packages/ABXpy/h5tools/h5io.py", line 243, in __initialize_datasets__
    sample_data = self.__parse_input_data__(sample_data)
  File "/home/szhi/miniconda3/envs/abx_eval/lib/python3.8/site-packages/ABXpy/h5tools/h5io.py", line 226, in __parse_input_data__
    raise ValueError(
ValueError: It is necessary to write to all of the managed datasets simultaneously.
Uncaught exception. Entering post mortem debugging
Running 'cont' or 'step' will restart the program
> /home/szhi/miniconda3/envs/abx_eval/lib/python3.8/site-packages/ABXpy/h5tools/h5io.py(226)__parse_input_data__()
-> raise ValueError(
(Pdb) set(data.keys())
{'phone_2', 'phone_1'}
(Pdb) self.managed_datasets
[b'phone_1', b'phone_2']

The error was thrown because set(data.keys()) != set(self.managed_datasets), but the only difference is the string/byte encoding. I tried passing in the argument on=b'phone' instead of on='phone' to the Task object, but got AssertionError('ON attribute must be specified by a string'). I'm not sure if it's possible to edit my input or environment (while keeping python3) to fix this error.
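
The mismatch can be reproduced with the values shown in the debugger session. The sketch below normalises both sides to str before comparing; the helper name is hypothetical, and whether this normalisation belongs in h5io.py or earlier in the pipeline is left open.

```python
def normalise_names(names, encoding="utf-8"):
    """Return the names as a set of str, decoding any bytes entries."""
    return {
        name.decode(encoding) if isinstance(name, bytes) else name
        for name in names
    }


data_keys = {"phone_1", "phone_2"}
managed_datasets = [b"phone_1", b"phone_2"]

# In Python 3, str and bytes never compare equal, so the sets differ.
print(data_keys == set(managed_datasets))
# After decoding, the two collections describe the same dataset names.
print(data_keys == normalise_names(managed_datasets))
```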

cosine.cosine_distance: ComplexWarning: Casting complex values to real discards the imaginary part

Hello, when running:

import numpy as np
import ABXpy.distances.metrics.cosine as cosine

a = np.array([[1.17004299, 0.85545695, 1.00981605, 1.16844952, 0.63780457, 0.86987048]])
cosine.cosine_distance(a, a)

I get the following warning:

/home/username/anaconda3/lib/python3.7/site-packages/ABXpy/distances/metrics/cosine.py:25: ComplexWarning: Casting complex values to real discards the imaginary part
  d = np.array([[np.float64(np.lib.scimath.arccos(d[0, 0]) / np.pi)]])

I believe it's because the intermediate variable d is slightly greater than 1 due to floating-point rounding.
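
A common way to guard against this, sketched below in plain Python rather than with the package's numpy code, is to clamp the cosine to [-1, 1] before taking the arccosine. The function name is hypothetical; note that math.acos raises ValueError for out-of-range input, whereas np.lib.scimath.arccos returns a complex value.

```python
import math


def angular_cosine_distance(u, v):
    """Angle between u and v divided by pi, with the cosine clamped."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cosine = dot / (norm_u * norm_v)
    # Rounding can push the cosine slightly outside [-1, 1]; clamp it.
    cosine = max(-1.0, min(1.0, cosine))
    return math.acos(cosine) / math.pi


a = [1.17004299, 0.85545695, 1.00981605, 1.16844952, 0.63780457, 0.86987048]
print(angular_cosine_distance(a, a))
```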
