erikbern / ann-benchmarks

Benchmarks of approximate nearest neighbor libraries in Python

Home Page: http://ann-benchmarks.com

License: MIT License

Languages: Python 88.07%, HTML 2.42%, Dockerfile 9.51%
Topics: nearest-neighbors, benchmark, docker

ann-benchmarks's Introduction

Benchmarking nearest neighbors

Fast nearest-neighbor search in high-dimensional spaces is an increasingly important problem, yet there have been notably few empirical attempts to compare approaches in an objective way, despite a clear need for such comparisons to drive optimization forward.

This project contains tools to benchmark various implementations of approximate nearest neighbor (ANN) search for selected metrics. We have pre-generated datasets (in HDF5 format) and prepared Docker containers for each algorithm, as well as a test suite to verify function integrity.

Evaluated

Data sets

We have a number of precomputed data sets in HDF5 format. All data sets have been pre-split into train/test and include ground truth data for the top-100 nearest neighbors.

Dataset       | Dimensions | Train size | Test size | Neighbors | Distance  | Download
DEEP1B        | 96         | 9,990,000  | 10,000    | 100       | Angular   | HDF5 (3.6GB)
Fashion-MNIST | 784        | 60,000     | 10,000    | 100       | Euclidean | HDF5 (217MB)
GIST          | 960        | 1,000,000  | 1,000     | 100       | Euclidean | HDF5 (3.6GB)
GloVe         | 25         | 1,183,514  | 10,000    | 100       | Angular   | HDF5 (121MB)
GloVe         | 50         | 1,183,514  | 10,000    | 100       | Angular   | HDF5 (235MB)
GloVe         | 100        | 1,183,514  | 10,000    | 100       | Angular   | HDF5 (463MB)
GloVe         | 200        | 1,183,514  | 10,000    | 100       | Angular   | HDF5 (918MB)
Kosarak       | 27,983     | 74,962     | 500       | 100       | Jaccard   | HDF5 (33MB)
MNIST         | 784        | 60,000     | 10,000    | 100       | Euclidean | HDF5 (217MB)
MovieLens-10M | 65,134     | 69,363     | 500       | 100       | Jaccard   | HDF5 (63MB)
NYTimes       | 256        | 290,000    | 10,000    | 100       | Angular   | HDF5 (301MB)
SIFT          | 128        | 1,000,000  | 10,000    | 100       | Euclidean | HDF5 (501MB)
Last.fm       | 65         | 292,385    | 50,000    | 100       | Angular   | HDF5 (135MB)
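
Each file can be read directly with h5py. The snippet below is a minimal loading sketch; it assumes the layout used by these pre-generated files (datasets named train, test, neighbors and distances, plus a distance attribute), so double-check against the actual file if in doubt:

    # Minimal sketch: inspect a downloaded dataset file with h5py.
    # Assumes datasets named "train", "test", "neighbors" and "distances";
    # verify against the actual file layout.
    import h5py

    with h5py.File("glove-100-angular.hdf5", "r") as f:
        train = f["train"][:]          # vectors to be indexed
        test = f["test"][:]            # query vectors
        neighbors = f["neighbors"][:]  # ids of the true top-100 neighbors
        distances = f["distances"][:]  # distances to the true neighbors
        print(train.shape, test.shape, neighbors.shape, f.attrs.get("distance"))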

Results

These results are as of April 2023, with all benchmarks run on an r6i.16xlarge machine on AWS with --parallelism 31 and hyperthreading disabled. All benchmarks are single-CPU.

Plots are included for the following datasets: glove-100-angular, sift-128-euclidean, fashion-mnist-784-euclidean, nytimes-256-angular, gist-960-euclidean, glove-25-angular.

TODO: update plots on http://ann-benchmarks.com.

Install

The only prerequisites are Python (tested with 3.10.6) and Docker.

  1. Clone the repo.
  2. Run pip install -r requirements.txt.
  3. Run python install.py to build all the libraries inside Docker containers (this can take a while, like 10-30 minutes).

Running

  1. Run python run.py (this can take an extremely long time, potentially days).
  2. Run python plot.py or python create_website.py to plot results.
  3. Run python data_export.py --out res.csv to export all results into a CSV file for additional post-processing (a rough post-processing sketch follows below).
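
As a rough post-processing sketch (the column names "dataset", "algorithm" and "qps" are assumptions for illustration; check the actual header of res.csv):

    # Rough post-processing sketch for the exported CSV.
    # NOTE: the column names below are assumptions; inspect res.csv for the real header.
    import pandas as pd

    df = pd.read_csv("res.csv")
    print(df.columns.tolist())  # see which metrics were exported

    best = (df.groupby(["dataset", "algorithm"])["qps"]
              .max()
              .sort_values(ascending=False))
    print(best.head(20))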

You can customize the algorithms and datasets as follows:

  • Check that ann_benchmarks/algorithms/{YOUR_IMPLEMENTATION}/config.yml contains the parameter settings that you want to test
  • To run experiments on a single dataset, e.g. GloVe, invoke python run.py --dataset glove-100-angular. See python run.py --help for more information on possible settings. Note that experiments can take a long time.
  • To process the results, either use python plot.py --dataset glove-100-angular or python create_website.py. An example call: python create_website.py --plottype recall/time --latex --scatter --outputdir website/.

Including your algorithm

Add your algorithm in the folder ann_benchmarks/algorithms/{YOUR_IMPLEMENTATION}/ by providing

  • A small Python wrapper in module.py
  • A Dockerfile named Dockerfile
  • A set of hyper-parameters in config.yml
  • A CI test run by adding your implementation to .github/workflows/benchmarks.yml

Check the available implementations for inspiration.
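
For orientation, here is a minimal module.py sketch. The base-class import path and the fit/set_query_arguments/query method names are assumptions modelled on the existing wrappers; mirror a current implementation (e.g. the annoy or hnswlib folder) for the exact interface:

    # Minimal wrapper sketch, loosely modelled on the existing implementations.
    # The base-class import path and method names are assumptions; copy them
    # from an existing wrapper in ann_benchmarks/algorithms/.
    import numpy as np

    from ..base.module import BaseANN  # adjust to the actual base-class location


    class MyBruteForce(BaseANN):
        def __init__(self, metric):
            self._metric = metric
            self.name = "my-bruteforce(%s)" % metric

        def fit(self, X):
            # "Index" construction: here we just keep the data around.
            self._data = np.asarray(X, dtype=np.float32)

        def set_query_arguments(self, dummy):
            # Query-time hyper-parameters from config.yml would be applied here.
            pass

        def query(self, v, n):
            # Exact search as a stand-in for a real ANN index.
            if self._metric == "angular":
                data = self._data / np.linalg.norm(self._data, axis=1, keepdims=True)
                v = np.asarray(v) / np.linalg.norm(v)
                dists = 1.0 - data.dot(v)
            else:  # euclidean
                dists = np.linalg.norm(self._data - v, axis=1)
            return np.argsort(dists)[:n]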

Principles

  • Everyone is welcome to submit pull requests with tweaks and changes to how each library is being used.
  • In particular: if you are the author of any of these libraries, and you think the benchmark can be improved, consider making the improvement and submitting a pull request.
  • This is meant to be an ongoing project that represents the current state.
  • Make everything easy to replicate, including installing and preparing the datasets.
  • Try many different values of parameters for each library and ignore the points that are not on the precision-performance frontier.
  • Use high-dimensional datasets with approximately 100-1000 dimensions. This is challenging but also realistic. We avoid more than 1000 dimensions because those problems should probably be tackled by dimensionality reduction as a separate step.
  • Single queries are used by default. ANN-Benchmarks enforces that only one CPU is saturated during experimentation, i.e., no multi-threading. A batch mode is available that provides all queries to the implementations at once. Add the flag --batch to run.py and plot.py to enable batch mode.
  • Avoid extremely costly index building (more than several hours).
  • Focus on datasets that fit in RAM. For billion-scale benchmarks, see the related big-ann-benchmarks project.
  • We mainly support CPU-based ANN algorithms. GPU support exists for FAISS, but it has to be compiled with GPU support locally and experiments must be run using the flags --local --batch.
  • Use a proper train/test split between index data and query points.
  • Set-similarity datasets are treated as sparse: the set for each user is passed to the algorithms as a sorted array of integers (see the sketch after this list).
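
To make the last point concrete, here is a small illustrative sketch of Jaccard distance on that sparse representation (not the project's exact code):

    # Illustrative only: Jaccard distance between two sets, each represented
    # as a sorted array of integer ids (the sparse set-similarity format).
    import numpy as np

    def jaccard_distance(a, b):
        a, b = np.asarray(a), np.asarray(b)
        intersection = np.intersect1d(a, b, assume_unique=True).size
        union = a.size + b.size - intersection
        return 1.0 - intersection / union if union else 0.0

    print(jaccard_distance([1, 3, 7, 9], [3, 7, 8]))  # 0.6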

Authors

Built by Erik Bernhardsson with significant contributions from Martin Aumüller and Alexander Faithfull.

Related Publications

The design principles behind the benchmarking framework are described in the following publications:

Related Projects

ann-benchmarks's People

Contributors

aaalgo, aagnone3, ale-f, alexklibisz, ankane, benwtrent, ceh, cococo2000, erikbern, etiennedi, generall, guilhemn, hhy3, jeadie, khylonwong, kshivendu, lmcinnes, masajiro, maumueller, mohamed-ali, raoufchebri, sammymax, sdu-l, searchivarius, shikharj, stephenleo, tahle, thomasahle, toregge, yurymalkov

ann-benchmarks's Issues

Website Title Isn't Very Descriptive

Just a suggestion, I think having a more descriptive <title> tag on the website's homepage would be helpful. Right now, it's simply ANN-Benchmarks. It might be better as "ANN-Benchmarks: Approximate Nearest-Neighbors Algorithm Benchmarks" or something like that.

Colors

@erikbern Your choice of colors makes the charts really hard to read... and I cannot fathom what it would be like for a color-blind person. It would be great if you could pick something with more contrast and possibly avoid shades of blue/green/orange. Thanks!

New run May 2018

Have been running it on a c5.4xlarge for about 36h. It's still on glove-100-angular and it's mostly done:

[screenshot]

Not sure what's up with NGT... I think it may have timed out, and we should increase the timeout a bit, especially since we're now running many searches in the same container and the CPU is limited to 1. I'll double it from 2h to 4h, I think.

Random datasets too easy

The random blobs created with sklearn seem to be far too easy to solve; see the screenshot below. I propose bringing back the old random datasets that use https://github.com/maumueller/random-inputs/. For me the easiest way would be to create them, put them on a website, and add a function to datasets.py that fetches and transforms them correctly.

[screenshot]

Run ONNG for nytimes

@maumueller
I resolved the problem (#93) with the results, which was caused by invalid zero vectors in the queries. Sorry for troubling you, but could you rebuild NGT and recreate the results for the nytimes dataset?
Thank you in advance.

Check README or datasets

The README says that the GloVe datasets have around 60k queries, while the files contain 10k of them.

Is there any plan to compare with Rayuela.jl?

https://github.com/una-dinosauria/Rayuela.jl
This is the code for the paper: Julieta Martinez, Shobhit Zakhmi, Holger H. Hoos, James J. Little. LSQ++: Lower running time and higher recall in multi-codebook quantization. ECCV 2018.

Rayuela.jl is a package that implements non-orthogonal multi-codebook quantization methods (MCQ). These methods are useful for fast search of high-dimensional (d=100s-1000s) dense vectors. Rayuela is written in Julia.

Which libraries support incremental item additions?

Thanks for putting this project together. Is there any way a table or small list could be included describing which libraries support incremental item additions? I imagine for many this is an important criterion in real-time or near-real-time applications where new content is continuously added and needs to be indexed for look-ups.

Noob question: OS you used and requirements

Hello @erikbern and team,

First of all, let me congratulate you for your fine job, it is amazing :D

I'm trying to run your benchmark for K = 1 (it seems like performance changes with K). I'm using Ubuntu 16.04 and Python 3.5.

While installing (sudo python3 install.py) I noticed it installs lots of dependencies from Bionic Beaver. Is Bionic Beaver (18.04) or any other OS a requirement?

Are numerical results available as well?

Right now I can only see plots in the repo, which are pretty useless for "external" comparison as such.

In particular, is HNSW still taking hours for preprocessing? :) There does not seem to be a way to check now.

The module 'ann_benchmarks.algorithms.faiss_gpu' could not be loaded. How can I solve it?

python run.py --docker-tag ann-benchmarks-faiss --dataset gist-960-euclidean --run-disabled --local --batch

ann_benchmarks.algorithms.faiss_gpu.FaissGPU([4096, 1]): warning: the module 'ann_benchmarks.algorithms.faiss_gpu' could not be loaded; skipping
ann_benchmarks.algorithms.nmslib.NmslibReuseIndex([u'euclidean', 'sw-graph', {'NN': 16}, False]): warning: the module 'ann_benchmarks.algorithms.nmslib' could not be loaded; skipping
ann_benchmarks.algorithms.panns.PANNS([u'euclidean', 10, 50]): warning: the module 'ann_benchmarks.algorithms.panns' could not be loaded; skipping
ann_benchmarks.algorithms.nmslib.NmslibReuseIndex([u'euclidean', 'vptree', {'tuneK': 10, 'desiredRecall': 0.4}, False]): warning: the module 'ann_benchmarks.algorithms.nmslib' could not be loaded; skipping
ann_benchmarks.algorithms.nearpy.NearPy([u'euclidean', 16, 20]): warning: the module 'ann_benchmarks.algorithms.nearpy' could not be loaded; skipping
ann_benchmarks.algorithms.mrpt.MRPT([u'euclidean', 25, 2]): warning: the module 'ann_benchmarks.algorithms.mrpt' could not be loaded; skipping
ann_benchmarks.algorithms.nmslib.NmslibReuseIndex([u'euclidean', 'vptree', {'tuneK': 10, 'desiredRecall': 0.9}, False]): warning: the module 'ann_benchmarks.algorithms.nmslib' could not be loaded; skipping
ann_benchmarks.algorithms.hnswlib.HnswLib([u'euclidean', {'efConstruction': 500, 'M': 48}]): warning: the module 'ann_benchmarks.algorithms.hnswlib' could not be loaded; skipping
ann_benchmarks.algorithms.pynndescent.PyNNDescent([u'euclidean', 10, 4, 20]): warning: the module 'ann_benchmarks.algorithms.pynndescent' could not be loaded; skipping
ann_benchmarks.algorithms.faiss_gpu.FaissGPU([1024, 10]): warning: the module 'ann_benchmarks.algorithms.faiss_gpu' could not be loaded; skipping
ann_benchmarks.algorithms.faiss.FaissIVF([u'euclidean', 8192]): warning: the module 'ann_benchmarks.algorithms.faiss' could not be loaded; skipping
ann_benchmarks.algorithms.hnswlib.HnswLib([u'euclidean', {'efConstruction': 500, 'M': 8}]): warning: the module 'ann_benchmarks.algorithms.hnswlib' could not be loaded; skipping
ann_benchmarks.algorithms.nmslib.NmslibReuseIndex([u'euclidean', 'vptree', {'tuneK': 10, 'desiredRecall': 0.2}, False]): warning: the module 'ann_benchmarks.algorithms.nmslib' could not be loaded; skipping
ann_benchmarks.algorithms.faiss.FaissIVF([u'euclidean', 512]): warning: the module 'ann_benchmarks.algorithms.faiss' could not be loaded; skipping
ann_benchmarks.algorithms.hnswlib.HnswLib([u'euclidean', {'efConstruction': 500, 'M': 24}]): warning: the module 'ann_benchmarks.algorithms.hnswlib' could not be loaded; skipping
ann_benchmarks.algorithms.faiss.FaissIVF([u'euclidean', 1024]): warning: the module 'ann_benchmarks.algorithms.faiss' could not be loaded; skipping
ann_benchmarks.algorithms.flann.FLANN([u'euclidean', 0.9]): warning: the module 'ann_benchmarks.algorithms.flann' could not be loaded; skipping

plot.py does not plot

I used python plot.py --dataset=glove-100-angular to plot the results after the simulations were done. The results are in results/glove-100-angular/10,
but it does not plot; instead it says Exception: Nothing to plot.
I traced into compute_metrics in utils.py and found that res is empty.
Can you help me out?

float32 vs float64

It appears that the benchmark switched from float32 to float64 quite a while ago, and this has somehow contributed to a substantial performance regression for KGraph, which is only optimized for float32 (although float64 also runs).

Below are my justifications for using float32 in the context of approximate K-NN search:

  • The search process is usually approximate, and the feature data are usually noisy by nature, so typically the precision of float64 is not necessary.
  • The necessity of an index to speed up search presumes that data is big. So a factor of 2x in the cost just to store the data in memory is significant.
  • For some algorithms/implementations, using float32 means more data can fit into cache and the code runs much faster.

I believe the performance behavior of some libraries will be very different for float32 and float64 (though I believe the relative ranking of different libraries should roughly stay the same). If only one can be chosen, I feel it should be float32, so that the benchmark represents the typical use case.

Is there a particular reason to keep float64, or is it better supported by some of the libraries in the benchmark?
(There's another reason why KGraph is slow which I'm still investigating.)
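
As a quick illustration of the memory argument above (not part of the original issue):

    # Quick illustration of the 2x memory cost of float64 vs float32.
    import numpy as np

    n, d = 1_000_000, 128  # roughly SIFT-sized
    for dtype in (np.float32, np.float64):
        mib = np.dtype(dtype).itemsize * n * d / 2**20
        print(f"{np.dtype(dtype).name}: {mib:.0f} MiB")
    # float32: 488 MiB, float64: 977 MiB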

For help: code raises an exception

When running install.py, this line of code raises an exception:
subprocess.check_call('docker build --rm -t ann-benchmarks -f install/Dockerfile .', shell=True)
The exception is:
subprocess.CalledProcessError: Command 'docker build --rm -t ann-benchmarks -f install/Dockerfile .' returned non-zero exit status 1.
I hope you can help me.

disable nearpy, panns, dolphinnpy

They seem pretty bad. I don't mind keeping the code, but it probably makes sense to disable them by default. They add a bunch of noise to the plots and also take a disproportionate amount of time during benchmarks.

index construction time plotting

I plotted index construction time using the data saved in glove.txt and see a spread of almost 2 orders of magnitude (hnsw (nmslib) index construction might have been parallelized, according to the code). I'm most interested in the performance of sw-graph vs hnsw. It's very hard to say which one is better, because one might be able to improve the search performance of sw-graph by building more aggressive indices. I collaborated with Princeton folks on developing a hierarchical version of the algorithm behind KGraph, and the results were negative: initially there was a noticeable speedup at the same accuracy, but after carefully controlling the index construction setup, the speedup was gone.

I also suggest that we establish a standard hardware configuration to run the benchmark, preferably some type of AWS instance which can be easily reproduced. I think something close to your current setup should be good.

[attached plots: build, throughput]

Zero vectors in nytimes-256-angular

I found that the nytimes-256-angular and nytimes-16-angular datasets have hundreds of zero-valued vectors.
The problem is that the cosine distance to such vectors is undefined, making the search an ill-defined task.
Probably such vectors should be removed; another option is using unnormalized inner-product distance instead of cosine.
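
A possible clean-up along the lines suggested above (a sketch, not the project's actual preprocessing):

    # Sketch: drop zero-valued vectors before building an angular dataset,
    # since the cosine distance to them is undefined.
    import numpy as np

    def drop_zero_vectors(X, eps=1e-12):
        X = np.asarray(X)
        return X[np.linalg.norm(X, axis=1) > eps]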

Remove FALCONN from benchmarks

Please remove FALCONN from the benchmarks. It's not properly tuned for the new settings (100 neighbors instead of 10, etc.), and the results are confusing.

We might later add it back (the new, faster version), but meanwhile I don't want the results to be here.

benchmark running for too long

I've been running the benchmark on a machine that shouldn't be much slower than a c4.2xlarge (Core i7-2600K @ 3.4GHz). It's been 5 days and I only see 195 lines in sift.txt (the plot looks good, though). There are 321 lines in the standard glove.txt and 82 lines in the standard sift.txt, which I think is incomplete. So I estimate it will take 15-20 days for me to finish running everything. Is it supposed to take this long?

I'm attaching the SIFT plot I produced with the data I have so far, which is consistent with the GloVe plot I see on the project page. (KGraph bug fixed.)

[attached SIFT plot]

Nearpy distance is Cosine by default - Euclidean not applied

Hi,

I was checking some libraries and realized that in the case of NearPy, the Euclidean distance is not applied in the benchmark. As can be seen in the NearPy docs, the default is cosine, and the Euclidean distance is never actually applied. I will PR changes.
Thanks for the bench, it's great!

What if I only use the MNIST dataset?

Hi,
since the full project can take a long time to run, I just want to try a small dataset, and maybe MNIST would be okay.
How should I modify the code to do that?
Thanks!

About SPTAG test results

I tested it with the GIST data set. The query matrix has size 1000 x 960 and I return 100 neighbors, using the official default parameters. The index build time was 20562.084 s, the query time was 35.082 s, recall was 0.92669, and the time to add 100 vectors was 22.281 s. This feels very long; am I using it correctly? Its index construction and query times are longer than NSG's. Can you give reference results for the SPTAG tests?
My machine:
Intel(R) Xeon(R) Silver 4116 CPU @ 2.10GHz
Ubuntu 18.04
Python 3

Request: Provide nearest neighbors for training data

For datasets like SIFT 1M, it would be great if nearest neighbors were provided for the training data as well. From the perspective of computational cost, the test data is very small and users can compute its metrics themselves. But if you could provide the NNs for the 1M training points, that would help those who want to use the larger datasets.

Batch queries and single-threaded mode

There are currently two relics in the code which are no longer accessible, since the arguments for them have been removed. What shall we do with them?

  1. Running an algorithm with runner.py has a force_single option which avoids the thread-pooling mechanism altogether, independent of an algorithm's use_threads method. I find it nice for checking whether the ThreadPool decreases performance.

  2. Running an algorithm also has support for batch mode, and each algorithm already implements batch_query. However, only faiss_gpu.py currently implements an actual batch mode. The HNSW and FAISS implementations also have good batch-query implementations, and they could easily be included in the benchmark in a separate section.

Split up building and query parameters

Currently, we spend a lot of time basically building the same data structure over and over again. What we could do instead is split up build and query parameters. For example, Annoy is currently configured by

    annoy:
      constructor: Annoy
      base-args: ["@metric"]
      run-groups:
        annoy:
          args: [[100, 200, 400], [100, 200, 400, 1000, 2000, 4000, 10000,
                20000, 40000, 100000, 200000, 400000]] # number of trees x number of points to gather

It would save a lot of time, if one were to separate query and build parameters in the following way:

    annoy:
      constructor: Annoy
      base-args: ["@metric"]
      run-groups:
        annoy:
          build-args: [[100, 200, 400]] # number of trees 
          query-args: [[100, 200, 400, 1000, 2000, 4000, 10000,
                20000, 40000, 100000, 200000, 400000]] # number of points to gather during query

The idea would now be to build the data structure for each build-args group, and query it for all (combinations of) query-args. This should be rather straight-forward to do.
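
In rough Python, the proposed scheme amounts to something like the following (illustrative only, not the actual runner code):

    # Illustrative only: build each index once per build-args combination and
    # sweep every query-args combination against it, instead of rebuilding.
    import itertools

    def run_group(algo_cls, metric, train, queries, build_args, query_args, k=10):
        for build in itertools.product(*build_args):
            algo = algo_cls(metric, *build)       # e.g. Annoy(metric, n_trees)
            algo.fit(train)                       # built once per build combination
            for query_params in itertools.product(*query_args):
                algo.set_query_arguments(*query_params)  # e.g. number of points to gather
                for q in queries:
                    algo.query(q, k)              # timing / recall bookkeeping omitted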

Should be labeled as "enhancement".

Licence?

I would much appreciate the addition of a licence file.

10X performance difference between multi-thread and single thread tests

Good day!

I have measured the difference in performance when switching the use_threads parameter from True to False on hnsw (nmslib). As it turns out, the difference is really huge, ~10X on 1M SIFT at low recall (see the enclosed plots for a 6-core Intel CPU), and it is likely caused by the Python-side overhead of the tester scripts.

Multi-thread (default):
sift_mt
Single-thread (use_threads=False):
sift_st

This makes the current results questionable, especially when compared to FALCONN (which seems to be the only algorithm that takes advantage of not using multi-threaded tests). Multi-threaded tests also significantly change the shape of the recall-QPS curve (flat vs. rapidly decreasing QPS at low recall).

I suggest using the single-threaded version for all algorithms. Otherwise, in the regime of relatively high QPS (roughly >2k queries per second), we just measure the performance of the tester script.
