brain-score / brain-score

A framework for evaluating models on their alignment to brain and behavioral measurements (50+ benchmarks)

Home Page: http://brain-score.org

License: MIT License

brain-score's Introduction

Brain-Score is a platform to evaluate computational models of brain function on their match to brain measurements in primate vision. The intent of Brain-Score is to adopt many (ideally all) of the experimental benchmarks in the field for the purpose of model testing, falsification, and comparison. To that end, Brain-Score operationalizes experimental data into quantitative benchmarks that any model candidate following the BrainModel interface can be scored on.

Note that you can only access a limited set of public benchmarks when running locally. To score a model on all benchmarks, submit it via the brain-score.org website.

See the documentation for more details, e.g. for submitting a model or benchmark to Brain-Score. For a step-by-step walkthrough on submitting models to the Brain-Score website, see these web tutorials.

See these code examples on scoring models, retrieving data, and using and defining benchmarks and metrics. The previous examples may still be helpful, but they have been deprecated since the 2.0 update.

Brain-Score is made by and for the community. To contribute, please send in a pull request.

Local installation

You will need Python >= 3.7 and pip >= 18.1.

pip install git+https://github.com/brain-score/vision

Test if the installation is successful by scoring a model on a public benchmark:

from brainscore_vision.benchmarks import public_benchmark_pool

benchmark = public_benchmark_pool['dicarlo.MajajHong2015public.IT-pls']
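# my_model() stands in for any model implementing the BrainModel interface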
model = my_model()
score = benchmark(model)

# >  <xarray.Score ()>
# >  array(0.07637264)
# >  Attributes:
# >      error:                 <xarray.Score ()>\narray(0.00548197)
# >      raw:                   <xarray.Score ()>\narray(0.22545106)\nAttributes:\...
# >      ceiling:               <xarray.DataArray ()>\narray(0.81579938)\nAttribut...
# >      model_identifier:      my-model
# >      benchmark_identifier:  dicarlo.MajajHong2015public.IT-pls

Some steps may take minutes because data has to be downloaded during first-time use.

Environment Variables

Variable             Description
RESULTCACHING_HOME   directory to cache results (benchmark ceilings) in; ~/.result_caching by default (see https://github.com/brain-score/result_caching)
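For example, to redirect the cache to a larger disk, the variable can be set in Python before Brain-Score is imported (the path below is purely illustrative):

import os

# illustrative cache location; set it before importing Brain-Score in case the
# variable is read at import time
os.environ['RESULTCACHING_HOME'] = '/data/brainscore_cache'

import brainscore_vision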

License

MIT license

Troubleshooting

`ValueError: did not find HDF5 headers` during netcdf4 installation: pip seems to fail to properly set up the HDF5_DIR required by netcdf4. Install it via conda instead: `conda install netcdf4`.

Repeated runs of a benchmark / model do not change the outcome even though code was changed: results (scores, activations) are cached on disk using https://github.com/mschrimpf/result_caching. Delete the corresponding file or directory to clear the cache.

CI environment

Add CI-related build commands to test_setup.sh. The script is executed in the CI environment to run the unit tests.

References

If you use Brain-Score in your work, please cite "Brain-Score: Which Artificial Neural Network for Object Recognition is most Brain-Like?" (technical) and "Integrative Benchmarking to Advance Neurally Mechanistic Models of Human Intelligence" (perspective) as well as the respective benchmark sources.

@article{SchrimpfKubilius2018BrainScore,
  title={Brain-Score: Which Artificial Neural Network for Object Recognition is most Brain-Like?},
  author={Martin Schrimpf and Jonas Kubilius and Ha Hong and Najib J. Majaj and Rishi Rajalingham and Elias B. Issa and Kohitij Kar and Pouya Bashivan and Jonathan Prescott-Roy and Franziska Geiger and Kailyn Schmidt and Daniel L. K. Yamins and James J. DiCarlo},
  journal={bioRxiv preprint},
  year={2018},
  url={https://www.biorxiv.org/content/10.1101/407007v2}
}

@article{Schrimpf2020integrative,
  title={Integrative Benchmarking to Advance Neurally Mechanistic Models of Human Intelligence},
  author={Schrimpf, Martin and Kubilius, Jonas and Lee, Michael J and Murty, N Apurva Ratan and Ajemian, Robert and DiCarlo, James J},
  journal={Neuron},
  year={2020},
  url={https://www.cell.com/neuron/fulltext/S0896-6273(20)30605-X}
}

brain-score's People

Contributors

benlonnqvist, chengxuz, dapello, deirdre-k, dmayo, ernestobocini, fksato, franzigeiger, gaspto, jjpr-mit, kvfairchild, lchahuas, linus-md, mike-ferguson, mschrimpf, qbilius, samwinebrake, shehadak, stothe2, susanwys, tiagogmarques, yingtiandt, yudixie

brain-score's Issues

Move configuration external to code

A .ini or .yml (or similar) configuration file, plus infrastructure for making the configuration available throughout the project.

Make a config.example.yml and add config.yml to .gitignore so that users can add credentials without accidentally committing them.

Things to include (a loader sketch follows this list):

  • directory for local data cache (e.g. needs to be a larger mount than home on OpenMind)
  • SQLite file location
  • Postgres connection information
  • Logging level and location
  • AWS credentials and which profile to use
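A minimal sketch of what such a loader could look like (the file name, keys, and environment-variable overrides are hypothetical, not existing project code):

import os
import yaml  # pyyaml

def load_config(path='config.yml'):
    # load the user's config.yml if present, otherwise fall back to defaults
    config = {}
    if os.path.exists(path):
        with open(path) as f:
            config = yaml.safe_load(f) or {}
    # hypothetical environment-variable overrides, e.g. for CI or cluster runs
    config.setdefault('data_cache', os.getenv('BRAINSCORE_DATA_CACHE', '~/.brain-score'))
    config.setdefault('logging_level', os.getenv('BRAINSCORE_LOG_LEVEL', 'INFO'))
    return config

config = load_config()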

sqlite I/O error

Sometimes (!), mkgu raises an sqlite3 disk I/O error.
It does not occur every time; it seems to happen mostly when running jobs in batches. Perhaps concurrent access to the SQLite database does not work? (A possible mitigation is sketched after the traceback.)

Traceback (most recent call last):
  File "neural_metrics/compare.py", line 39, in main
    hvm = mkgu.get_assembly(name="HvM")
  File "/om/user/msch/miniconda3/envs/neural-metrics/lib/python3.6/site-packages/mkgu/__init__.py", line 11, in get_assembly
    return fetch.get_assembly(name)
  File "/om/user/msch/miniconda3/envs/neural-metrics/lib/python3.6/site-packages/mkgu/fetch.py", line 247, in get_assembly
    assy_record = get_lookup().lookup_assembly(name)
  File "/om/user/msch/miniconda3/envs/neural-metrics/lib/python3.6/site-packages/mkgu/fetch.py", line 129, in lookup_assembly
    cursor.execute(self.sql_lookup_assy, (name,))
sqlite3.OperationalError: disk I/O error
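If concurrent access is indeed the culprit, one possible mitigation (a sketch only; the lookup code would have to adopt it) is to open the connection with a busy timeout and write-ahead logging:

import sqlite3

connection = sqlite3.connect('lookup.db', timeout=30)  # wait up to 30s instead of failing on a locked database
connection.execute('PRAGMA journal_mode=WAL')  # allows concurrent readers alongside a writer; note WAL does not work on some network filesystems
cursor = connection.cursor()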

numpy.linalg.LinAlgError: SVD did not converge

For models such as CORnet-R with many zeros in their activations, PLS regression fails with a numpy.linalg.LinAlgError: SVD did not converge.
The error originates from NaNs in the regression weights which in turn stem from https://github.com/scikit-learn/scikit-learn/blob/a7a834bdb7a51ec260ff005715d50ab6ed01a16b/sklearn/cross_decomposition/pls_.py#L67 where x_score = 0 and thus y_weights = ... / 0 = NaN.

Solution approaches

  1. pad activations with epsilon -- failed due to scaling
  2. pad activations with random numbers -- yields a score of 0 due to the randomness
  3. drop neuroids with zero values (see the sketch after this list) -- failed as well, not sure why; maybe due to scaling again
  4. drop neuroids with zero values after scaling -- there seem to be only 42 unique values in the first place, and after this step only one remains
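A sketch of approaches (3)/(4), dropping constant neuroids before the regression (activations stands for a 2D presentation x neuroid array or DataAssembly; names are illustrative):

import numpy as np

def drop_constant_neuroids(activations):
    # constant (e.g. all-zero) columns make PLS produce x_score = 0 and NaN weights
    values = np.asarray(activations)
    keep = np.where(values.std(axis=0) > 0)[0]  # indices of neuroids with non-zero variance
    return activations[:, keep]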

specify git dependencies with PEP 508

PEP 508 allows the specification of install_requires requirements like so:

install_requires=[
"result_caching @ git+https://github.com/mschrimpf/result_caching", 
]

This functionality has been supported since pip 18.1 and removes the need for --process-dependency-links (which will be removed in pip 19).
We need to update at least our setup.py accordingly.

code-arrangement for Similarity utilities

A Similarity takes as input assembly1, assembly2 and outputs a Score object.

As of now, there are two kinds of similarities:

  1. non-parametric, such as the RDMSimilarity: compute similarity of two assemblies directly
  2. parametric, such as the NeuralFit: first fit on a training set, then predict on a test set and compute similarity based on the predictions

We also have additional utility on top of the simple case:

  1. compute the outer product over combinations of all adjacent dimensions (i.e. dimensions that are not used for the Similarity, such as region)
  2. cross-validate with several folds over a dimension that the Similarity is computed over, e.g. object_name as part of presentation

There are several ways to organize the code around this:

  1. sub-classing (this is the way it is organized now): there is a parent class OuterCrossValidationSimilarity that all Similarity classes need to inherit from. This parent class implements (1) and (2) from above and sub-classes only need to implement the simple case. Drawbacks:
    1.1 harder to test since all sub-classes drag the parent code with them
    1.2 we can't just implement apply in our sub-classes but need to adjust the method name
    1.3 we can't compute similarity without the extra baggage, i.e. everything is always cross-validated
  2. chaining: each Similarity class implements exactly one operation in apply. A chain operator then takes all these classes, applies them one after another, and outputs only the final result. The result here would be a list of assemblies, which are then fed into a Score in Similarity.__call__. However, I don't know how to represent both parametric and non-parametric similarities with this approach (one has to fit, predict, and compare predictions; the other just has to compare).
  3. extract the computation from Similarity: all the specialized handling ((1) and (2) from the utilities above) goes into Similarity sub-classes, while the operation on simplified assemblies goes into a Computor class. It is hard to separate the two cleanly, though; for instance, Similarity still needs to call fit, predict, etc.

For now, approach (1) works but after NIPS, I would like to revisit the structuring here.
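For reference, a minimal sketch of the two kinds of similarities as plain classes (class names follow the description above, but the bodies are illustrative, not the actual implementation):

import numpy as np
from scipy.stats import pearsonr

class Similarity:
    # takes two assemblies (here: 2D presentation x neuroid arrays) and returns a scalar score
    def __call__(self, assembly1, assembly2):
        raise NotImplementedError()

class RDMSimilarity(Similarity):
    # non-parametric: compare representational dissimilarity matrices directly
    def __call__(self, assembly1, assembly2):
        rdm1, rdm2 = 1 - np.corrcoef(assembly1), 1 - np.corrcoef(assembly2)
        triu = np.triu_indices_from(rdm1, k=1)  # compare upper triangles only
        return pearsonr(rdm1[triu], rdm2[triu])[0]

class NeuralFit(Similarity):
    # parametric: fit a mapping on a training split, then compare predictions on a test split
    def __init__(self, regression):
        self.regression = regression  # e.g. sklearn.cross_decomposition.PLSRegression

    def __call__(self, train_source, train_target, test_source, test_target):
        self.regression.fit(train_source, train_target)
        prediction = self.regression.predict(test_source)
        # median per-neuroid correlation between prediction and target
        return np.median([pearsonr(prediction[:, i], test_target[:, i])[0]
                          for i in range(test_target.shape[1])])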

lookup.db not copied on installation

Installing mkgu with python setup.py install does not copy the lookup.db file to the site-packages directory. As a result, SQLiteLookup cannot find the table names.
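A possible fix (a sketch; it assumes lookup.db lives inside the mkgu package directory) is to declare the file as package data in setup.py:

from setuptools import setup, find_packages

setup(
    name='mkgu',
    packages=find_packages(),
    # copy the SQLite lookup table into site-packages alongside the code on install
    package_data={'mkgu': ['lookup.db']},
)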

Neural Fit identity score low

Using the neural fit to map hvm onto itself (i.e. neural_fit(hvm, hvm)) yields a comparatively low score of around .78 (see this unit test).
Did others experience something similar (@qbilius)? Is that an issue with the regression?

automated API

Following up on #1, it would be great if we could get some documentation. Ideally we'd generate it automatically through readthedocs.io.

Various suggestions

Bugs

  • test_mkgu has two unused declarations of test_load

Style

  • type is a built-in name in Python (not a reserved word, but shadowing it is still best avoided); consider replacing it with kind
  • Benchmark.calculate, Metric.apply etc. might be better served by a __call__ method
  • Things like _fetcher_types might be better declared at the top and in capitals (FETCHER_TYPES)
  • I strongly suggest not using rst for formatting README and HISTORY. Hardly anybody uses this format; Markdown is much more common.
  • return 0 as a status/placeholder value (e.g., in metrics.py) is not idiomatic Python

Other

  • Docs don't exist yet but aren't they supposed to render automatically on ReadTheDocs?
  • Any chance lookup.db could be a simple csv file? Since it is unlikely to grow too big, there would be no performance penalty, but there would be the advantage of being able to quickly see dataset names and available assets.
  • A Jupyter Notebook with an example is much desired.

running metrics is slow

Ideas for improving this:

  1. parallelize, e.g. across cross-validation folds (see the sketch after this list)
  2. provide a "quick-and-dirty" way of evaluating, e.g. run the fit without cross-validation
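A sketch of idea (1) using joblib (not currently a dependency; the data and regression below are placeholders):

import numpy as np
from joblib import Parallel, delayed
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

source = np.random.rand(200, 50)   # placeholder model activations (presentation x unit)
target = np.random.rand(200, 20)   # placeholder neural recordings (presentation x neuroid)

def score_fold(train, test):
    # fit the mapping on the training split, evaluate on the held-out split
    regression = LinearRegression().fit(source[train], target[train])
    return regression.score(source[test], target[test])

# folds are independent, so they can safely run in separate processes
fold_scores = Parallel(n_jobs=-1)(
    delayed(score_fold)(train, test) for train, test in KFold(n_splits=10).split(source))
print(np.mean(fold_scores))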

test data alignment

The RDMs seem to differ after the recent assembly re-formatting, suggesting an error in the data alignment.
Make sure that the data is aligned properly.

Add StimulusSets

  • With lookup
  • With fetching
  • Add reference(s) to StimulusSets in DataAssembly class and in lookup
  • Do some verification that relevant coordinates in a DataAssembly are valid in the associated StimulusSet (see the sketch after this list)
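A sketch of the verification step (it assumes the assembly carries an image_id coordinate and the StimulusSet exposes an image_id column):

def validate_assembly(assembly, stimulus_set):
    # every image referenced by the assembly must exist in the attached StimulusSet
    assembly_ids = set(assembly['image_id'].values)
    stimulus_ids = set(stimulus_set['image_id'])
    missing = assembly_ids - stimulus_ids
    assert not missing, f"assembly references {len(missing)} image_ids missing from the StimulusSet"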

new version of xarray gives an error

(Submitting as requested.)
With the most recent xarray version, an error is raised that 'Score' does not have an attribute 'indexes'. Older xarray versions do not produce the error.

conda env create errors

Getting the following errors when creating the conda environment with conda env create -f environment.yml:

ResolvePackageNotFound:
  - netcdf4==1.2.4=np113py36_1

After removing that package:

UnsatisfiableError: The following specifications were found to be in conflict:
  - libnetcdf==4.4.1=1 -> jpeg=9
  - qt==5.6.2=2

Naming and formatting coordinates

Proposed coordinate names (a renaming sketch follows below):

  • presentation: presentation_id, image_id (currently it's called id; double-check it's not _id)
  • neuroid: neuroid_id

Further cleanup:

  • rename vars from 'V0' -> 0
  • filenames -> strip the full path
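A sketch of the renaming with xarray (the coordinate names follow the proposal above; the function is illustrative, not existing code):

import os

def normalize_coordinates(assembly):
    # presentation-level identifier: 'id' -> 'image_id'
    assembly = assembly.rename({'id': 'image_id'})
    # strip directories so filenames are comparable across machines
    filenames = [os.path.basename(f) for f in assembly['filename'].values]
    return assembly.assign_coords(filename=('presentation', filenames))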

necessity to configure AWS credentials

It seems that users still have to configure AWS credentials even when they only access public resources.
Can we get rid of that requirement, @jjpr-mit? If yes, how long would it take?
Alternatively, it would also be fine to put a note in the README detailing how to configure AWS.
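For public S3 resources, unsigned (anonymous) requests would avoid the need for credentials altogether; a sketch of what the fetching code could do (bucket and key names are illustrative):

import boto3
from botocore import UNSIGNED
from botocore.config import Config

# anonymous client: no AWS credentials required for publicly readable buckets
s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED))
s3.download_file('some-public-bucket', 'some/object/key.nc', 'local_file.nc')  # illustrative names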

Website not mobile-friendly

Table doesn't fit on screen yet there is no horizontal scrollbar.
Tested on Firefox.

Possible solution (wild guess):

body {
  overflow:auto;
}
