
cavalab / srbench


A living benchmark framework for symbolic regression

Home Page: https://cavalab.org/srbench/

License: GNU General Public License v3.0

Python 48.19% C++ 16.55% Makefile 0.35% Shell 3.99% Jupyter Notebook 15.54% R 0.44% Dockerfile 0.44% C 14.49%

srbench's People

Contributors

aboisbunon, athril, folivetti, foolnotion, hengzhe-zhang, jmmcd, kahlmeyer94, kartelj, lacava, marcovirgolin, milescranmer, ying531, yoshitomo-matsubara


srbench's Issues

this is terrible

I think this is terrible.

def model(est, X):
    mapping = {'x_'+str(i): k for i, k in enumerate(X.columns)}
    new_model = est.model_
    for k, v in mapping.items():
        new_model = new_model.replace(k, v)

What happens when replacing x_1 if x_11 also exists?

A simple fix is:

for k, v in reversed(mapping.items()):
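
A slightly more defensive variant of that fix (a sketch only, assuming est.model_ is a plain string): substitute the longer placeholder names first, so 'x_1' can never clobber part of 'x_11' regardless of the dict's iteration order.

def model(est, X):
    mapping = {'x_' + str(i): name for i, name in enumerate(X.columns)}
    new_model = est.model_
    # replace the longest keys first so 'x_11' is handled before 'x_1'
    for k, v in sorted(mapping.items(), key=lambda kv: len(kv[0]), reverse=True):
        new_model = new_model.replace(k, v)
    return new_model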

PySR parameters

@MilesCranmer can you specify the set of 6 hyperparameters you would like to use for benchmarking PySR? I'm going to start the runs and am hoping to have these by the end of tomorrow. They should match the original constraints:

  • max evals = 500k
  • 6 total combos
  • finish within 24 hrs with hyperparameter tuning, and within 8 hours with a single call to fit

Currently the hyperparameters are set small for testing, but evaluate_model.py now recognizes and shrinks some PySR parameters during testing, so the desired set should be specified directly in the model file. (In the updated version for the competition, you can specify test_params explicitly.)

This is the current version:

hyper_params = [
    {
        "annealing": (True,), # (True, False)
        "denoise": (True,), # (True, False)
        "binary_operators": (["+", "-", "*", "/"],),
        "unary_operators": (
            [],
            # poly_basis,
            # poly_basis + trig_basis,
            # poly_basis + exp_basis,
        ),
        "populations": (20,), # (40, 80),
        "alpha": (1.0,),
        "model_selection": ("best",)
        # "alpha": (0.01, 0.1, 1.0, 10.0),
        # "model_selection": ("accuracy", "best"),
    }
]

Scikit-learn version issue?

Hi, I am encountering the following ImportError:
ImportError: cannot import name 'HalvingGridSearchCV' from 'sklearn.model_selection'
But I checked, and my scikit-learn version is 0.24.1, as required in the environment...
Should I be using a different version?
Note: I also encountered errors when running conda env create -f environment.yml, which prevented aifeynman and operon from being installed, but I believe this is unrelated since the scikit-learn version is correct...
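
For reference, in scikit-learn 0.24.x the successive-halving search estimators are still experimental and cannot be imported until they are explicitly enabled; a minimal sketch of the imports that resolves this error on 0.24.1:

# HalvingGridSearchCV is experimental in scikit-learn 0.24 and must be
# enabled before it can be imported from sklearn.model_selection.
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingGridSearchCV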

scaling of input data

I have had a quick look at the benchmarking pipeline to better understand how the comparison is performed. During that review I noticed that scaling is always applied while reading the data files, using a RobustScaler from sklearn.

https://github.com/EpistasisLab/srbench/blob/1ad633974c9126a8eb6ce936873d2e9b3d40294c/experiment/read_file.py#L32

The actual model is generated in the evaluate_model script, which additionally has parameters scale_x and scale_y that determine whether the input data X and target y should be scaled.

https://github.com/EpistasisLab/srbench/blob/1ad633974c9126a8eb6ce936873d2e9b3d40294c/experiment/evaluate_model.py#L46-L61

This means that if scale_x is set to true, the input data is scaled twice when using the benchmarking pipeline. I don't know if this behavior is intended, but I suspect that the RobustScaler is an artifact from previous experimentation and should be removed. As it stands, even when I set the scale_x parameter to false, scaling is still performed while reading the data.
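
A sketch of keeping the scaling in a single place (the flag name and pipeline shape here are illustrative, not the actual srbench code): the reader would return raw data, and a scaler would only be prepended when scale_x is requested.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler

def build_pipeline(est, scale_x=False):
    # the scaler is added in exactly one place, controlled by scale_x
    steps = [('scaler', RobustScaler())] if scale_x else []
    steps.append(('est', est))
    return Pipeline(steps)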

ImportError: cannot import name 'itea_srbench' from 'ITEA'

I attempted to run ITEA within a Docker image; however, I encountered an error. I suspect that the issue arises from recent refactoring in ITEA. @folivetti It would be greatly appreciated if you could update the relevant code in srbench or adjust the Git version in the ITEA installation script.

Traceback (most recent call last):
  File "ITEARegressor.py", line 1, in <module>
    from ITEA import itea_srbench as itea
ImportError: cannot import name 'itea_srbench' from 'ITEA' (unknown location)

Methods install scripts

Hi,

I wanted to point out some issues with the environment/install scripts:

Bash scripts

  1. The install scripts should not require elevated privileges ("sudo"). Anaconda/Miniconda is normally installed in an unprivileged location (the home folder), so it shouldn't be necessary to do anything as root. This seems to trip things up in some cases.

  2. Some rudimentary way to check if a script succeeded would be useful. I would also redirect the compile output to a log file.

failed=()

# install all methods, recording the ones whose install script exits non-zero
for install_file in *.sh ; do
    echo "vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv"
    echo "Running $install_file"
    echo "////////////////////////////////////////////////////////////////////////////////"

    bash "$install_file"

    if [ $? -gt 0 ]
    then
        failed+=("$install_file")
    fi

    echo "////////////////////////////////////////////////////////////////////////////////"
    echo "Finished $install_file"
    echo "^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^"
done
echo "failed: ${failed[@]}"
  3. Running an install script again should probably clean the old src folders first.

Conda environment

  • pkg-config needs to be added to environment.yml, otherwise build scripts relying on it will fail (e.g. GP-GOMEA)

  • cxxopts is not necessary (operon only needs it for the command-line program, not for the python module)

  • on my Ubuntu machine (16.04), ellyn installs but fails to run
    (nevermind, it's not supposed to be called directly)

My conda info:

$ conda info

     active environment : srbench
    active env location : /home/bogdb/miniconda3/envs/srbench
            shell level : 2
       user config file : /home/bogdb/.condarc
 populated config files : 
          conda version : 4.10.1
    conda-build version : not installed
         python version : 3.9.1.final.0
       virtual packages : __linux=5.11.13=0
                          __glibc=2.23=0
                          __unix=0=0
                          __archspec=1=x86_64
       base environment : /home/bogdb/miniconda3  (writable)
      conda av data dir : /home/bogdb/miniconda3/etc/conda
  conda av metadata url : https://repo.anaconda.com/pkgs/main
           channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /home/bogdb/miniconda3/pkgs
                          /home/bogdb/.conda/pkgs
       envs directories : /home/bogdb/miniconda3/envs
                          /home/bogdb/.conda/envs
               platform : linux-64
             user-agent : conda/4.10.1 requests/2.25.1 CPython/3.9.1 Linux/5.11.13 ubuntu/16.04.7 glibc/2.23
                UID:GID : 1001:1001
             netrc file : None
           offline mode : False

Is there any future plan for supporting classification benchmarks?

In 2014, a paper published in JMLR reported the results of more than 100 classification algorithms on numerous classification benchmark datasets [1].
However, that paper does not consider genetic programming-based methods such as M4GP [2]. Would it be possible to develop a classification benchmark to further advance genetic programming, and even the broader machine learning domain?

[1]. Fernández-Delgado M, Cernadas E, Barro S, et al. Do we need hundreds of classifiers to solve real world classification problems?[J]. The journal of machine learning research, 2014, 15(1): 3133-3181.
[2]. La Cava W, Silva S, Danai K, et al. Multidimensional genetic programming for multiclass classification[J]. Swarm and evolutionary computation, 2019, 44: 260-272.

sympy-compatibility of final model strings

At the moment a lot of post-processing is done in experiment/symbolic_utils.py to convert the models returned by the different methods into a common, sympy-compatible format.

I would like to remove this post-processing step and, in the future, require methods to return sympy compatible strings. Steps:

  1. Move centralized model cleaning to the individual methods
  2. Have method developers update their codebases to return sympy-compatible strings

See the updated contribution guide.
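
A minimal sketch of what "sympy-compatible" could mean in practice (an assumed check, not the official contribution-guide test): the returned model string should parse with sympy using the dataset's feature names as symbols.

import sympy

def is_sympy_compatible(model_str, feature_names):
    """Return True if sympy can parse the model string."""
    local_dict = {name: sympy.Symbol(name) for name in feature_names}
    try:
        sympy.sympify(model_str, locals=local_dict)
        return True
    except (sympy.SympifyError, SyntaxError):
        return False

print(is_sympy_compatible("x_0*sin(x_1) + 0.5", ["x_0", "x_1"]))  # True
print(is_sympy_compatible("x_0*|x_1| + 0.5", ["x_0", "x_1"]))     # False: '|' is not sympy syntax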

Operon install script breaks version control

@foolnotion the operon install script now installs a bunch of packages with no version control. We need to pin them to stable versions.
It's causing the current CI build to fail and is probably going to cause more failures down the road. Can you add git checkout {version} to each clone, or use packaged versions of the dependencies in conda instead?

Suggestion: option to not use standardization

I've noticed that you're standardizing the predictors and target variable before fitting the regression models:

https://github.com/EpistasisLab/regression-benchmark/blob/967c66beec7e79ef53d6a3bb133190887224cfc1/experiment/evaluate_model.py#L47-L49

In my experience my algorithm works best without any normalization. I also tested a few datasets with KernelRidge with and without standardization, and it also seems to work better without this transformation. Maybe add an option to choose whether or not the standardized dataset is used.
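
A small sketch of that kind of comparison (the dataset, kernel, and metric here are illustrative assumptions, not the benchmark's settings):

from sklearn.datasets import load_diabetes
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
raw = KernelRidge(kernel="rbf", alpha=1.0)
scaled = make_pipeline(StandardScaler(), KernelRidge(kernel="rbf", alpha=1.0))

# compare cross-validated R^2 with and without standardizing the predictors
print("raw         :", cross_val_score(raw, X, y, cv=5, scoring="r2").mean())
print("standardized:", cross_val_score(scaled, X, y, cv=5, scoring="r2").mean())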

How to get the summarized experimental results?

I have executed the experiment script and it output several JSON files. However, there seems to be no script that aggregates these results into a single file. Is there any additional analysis tool affiliated with this benchmark package?

Quick start help needed - surely useful for other new users

Hi there,
Some quick start help needed:

After installing, how do I run the benchmarks on user-supplied data?
I'm struggling to get this to work and want to make sure there's nothing wrong with my SRBench install:

  • Could you please indicate some simple steps on how to achieve this?

Many thanks!

broken links on website

@folivetti I'm noticing that some of the links on the website are broken, probably because they are copied straight from the markdown files that use relative paths (e.g., https://epistasislab.github.io/srbench/CONTRIBUTING.md).

Do you have any free time to update them? Thank you!

instructions to clone pmlb do not work

The command git clone https://github.com/EpistasisLab/pmlb/tree/feynman does not work; it says

fatal: repository 'https://github.com/EpistasisLab/pmlb/tree/feynman/' not found

But using git clone https://github.com/EpistasisLab/pmlb.git seems to work ok. The fetch command downloads 299 files in total.
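
Presumably the /tree/feynman URL was meant to point at the feynman branch rather than a separate repository; assuming that branch still exists, something like git clone -b feynman https://github.com/EpistasisLab/pmlb.git would clone it directly.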

Use `micromamba` instead of `conda`

One other suggestion: use micromamba instead of conda (mamba is a C++ rewrite of conda; it uses the same package servers but is much faster). The Docker image mambaorg/micromamba, which has micromamba built in, seems to work if all conda calls are replaced with micromamba. It's much faster for me!

Originally posted by @MilesCranmer in #59 (comment)

competition code suggestions

Adding @janoPig other suggestions here (also ref #131 (comment))

Some suggestions for published code here -> https://github.com/janoPig/srbench/tree/srcomp

  • Fix running on a local machine; resolve the errors
    "ValueError: option names {'--ml'} already added"
    "ValueError: too many values to unpack (expected 3)"
  • check
  • Added scripts for running on a local computer: subit_stage[N]_local.sh METHOD_NAME

  • Fix the number of threads running per job (it always runs the default of 4). Add a new -n_threads parameter to the subit_stage[N].sh script

The submit scripts are there for reproducibility, i.e. they are fixed calls to python analyze.py [args]. analyze.py can be configured for both of these purposes directly.

  • Added a DataFrame=False parameter to the HROCH regressor so it runs correctly.

  • Added a test for the input data featureselection.csv. The data was created using the function '0.11x1^3 + 0.91x2x3 + 0.68x4x5 + 0.26x6^2x7 + 0.13x8x9x10', and feature_absence_score was evaluated for '0.11x1^3 + 0.91x3x5 + 0.68x7x9 + 0.26x11^2x13 + 0.13x15x17x19'

  • This should be checked with the code base release on the srcomp branch.

Is it possible to provide complete experimental results?

This project is awesome and I believe it can greatly promote the advancement of the symbolic regression domain. However, a critical issue is that we don't have enough computational resources to repeat the full experiment. Would it be possible to provide the complete experimental results so that we can use them directly in our paper? I would be grateful if such results could be provided.

standing leaderboard

Keep a standing leaderboard of methods in the documentation. As new results/methods are developed, add them to the leaderboard.

measure complexity

We should measure the complexity of the final models produced by the methods. Since they all vary quite a bit, this isn't trivial. For the SR methods, we should be able to count the number of nodes (operators and literals) in the solutions.
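
One possible way to count nodes (a sketch, not the benchmark's official definition of complexity), for any model that can be parsed by sympy:

import sympy

def n_nodes(model_str):
    """Count operators, variables, and literals in a sympy-parsable model string."""
    # note: sympy canonicalizes the expression, so this counts nodes of the parsed form
    expr = sympy.sympify(model_str)
    return sum(1 for _ in sympy.preorder_traversal(expr))

print(n_nodes("3*x0**2 - 2*x0 + 1"))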

Ground-truth datasets are broken?

Hi!

Thank you for your great work and framework!
I wanted to try the benchmarked methods for the ground-truth datasets (i.e., Feynman and Strogatz datasets) and followed the instructions in README.

Are the datasets perhaps not actually in gzip format?

However, the datasets fetched from the pmlb repository look broken. Here is one of the errors I got when running
python analyze.py -results ../results_sym_data -target_noise 0.0 "/data/pmlb/datasets/strogatz*" -sym_data -n_trials 10 -time_limit 9:00 -tuned --local
for the Strogatz datasets. (The same errors occurred for the Feynman datasets with "/data/pmlb/datasets/feynman_*" as well.)

========================================
Evaluating tuned.FEATRegressor on
/data/pmlb/datasets/strogatz_bacres1/strogatz_bacres1.tsv.gz
========================================
compression: gzip
filename: /data/pmlb/datasets/strogatz_bacres1/strogatz_bacres1.tsv.gz
Traceback (most recent call last):
  File "evaluate_model.py", line 291, in <module>
    **eval_kwargs)
  File "evaluate_model.py", line 39, in evaluate_model
    features, labels, feature_names = read_file(dataset)
  File "/opt/app/srbench/experiment/read_file.py", line 19, in read_file
    engine='python')
  File "/opt/conda/envs/srbench/lib/python3.7/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/envs/srbench/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 586, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/opt/conda/envs/srbench/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 482, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/opt/conda/envs/srbench/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 811, in __init__
    self._engine = self._make_engine(self.engine)
  File "/opt/conda/envs/srbench/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 1040, in _make_engine
    return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
  File "/opt/conda/envs/srbench/lib/python3.7/site-packages/pandas/io/parsers/python_parser.py", line 100, in __init__
    self._make_reader(self.handles.handle)
  File "/opt/conda/envs/srbench/lib/python3.7/site-packages/pandas/io/parsers/python_parser.py", line 203, in _make_reader
    line = f.readline()
  File "/opt/conda/envs/srbench/lib/python3.7/gzip.py", line 300, in read1
    return self._buffer.read1(size)
  File "/opt/conda/envs/srbench/lib/python3.7/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/opt/conda/envs/srbench/lib/python3.7/gzip.py", line 474, in read
    if not self._read_gzip_header():
  File "/opt/conda/envs/srbench/lib/python3.7/gzip.py", line 422, in _read_gzip_header
    raise OSError('Not a gzipped file (%r)' % magic)
OSError: Not a gzipped file (b've')

I also tried to manually gunzip the file, but the error message still says it's not in gzip format

$ gunzip /data/pmlb/datasets/strogatz_bacres1/strogatz_bacres1.tsv.gz
gzip: /data/pmlb/datasets/strogatz_bacres1/strogatz_bacres1.tsv.gz: not in gzip format
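
A quick diagnostic sketch (an assumption about the likely cause, not a confirmed fix): a real gzip file starts with the magic bytes 0x1f 0x8b, whereas a Git LFS pointer file is plain text beginning with "version https://git-lfs...", which would explain the b've' in the error above.

# Inspect the first bytes of the file to see what it actually contains.
path = "/data/pmlb/datasets/strogatz_bacres1/strogatz_bacres1.tsv.gz"
with open(path, "rb") as f:
    print(f.read(16))
# b'\x1f\x8b...'       -> genuinely gzipped data
# b'version https:...' -> an un-fetched Git LFS pointer instead of the dataset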

Could you please resolve this issue for both Feynman and Strogatz datasets?
Thank you!

Problems reproducing SRBench results [most methods fail, or only some runs complete]

Hello,
I followed the conda installation guidelines using Anaconda (Anaconda3-2021.11-Linux-x86_64).
Below is the conda env description.
I use a forked version of srbench, but unfortunately I am not able to push to it. I don't know whether the problem comes from me or from the initial repo: I did not add large files to it, yet I get the following error: "Account responsible for LFS bandwidth should purchase more data packs to restore access".
Maybe you could at least store the feather results files somewhere, as they are not heavy?

Thanks a lot.

     active environment : srbench
    active env location : /private/home/pakamienny/anaconda3/envs/srbench
            shell level : 3
       user config file : /private/home/pakamienny/.condarc
 populated config files : /private/home/pakamienny/.condarc
          conda version : 4.10.3
    conda-build version : 3.21.5
         python version : 3.9.7.final.0
       virtual packages : __cuda=11.4=0
                          __linux=5.4.0=0
                          __glibc=2.31=0
                          __unix=0=0
                          __archspec=1=x86_64
       base environment : /private/home/pakamienny/anaconda3  (writable)
      conda av data dir : /private/home/pakamienny/anaconda3/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /private/home/pakamienny/anaconda3/pkgs
                          /private/home/pakamienny/.conda/pkgs
       envs directories : /private/home/pakamienny/anaconda3/envs
                          /private/home/pakamienny/.conda/envs
               platform : linux-64
             user-agent : conda/4.10.3 requests/2.26.0 CPython/3.9.7 Linux/5.4.0-81-generic ubuntu/20.04.2 glibc/2.31
                UID:GID : 1185300944:1185300944
             netrc file : None
           offline mode : False

bsub/bjobs dependency?

Hi, we are trying to run the tool locally, but are running into a dependency error. We believe we followed the instructions correctly.

[screenshot of the error]

If you comment these lines out, we get the following.

[screenshot of the subsequent error]

Our intuition was that we were missing some Slurm dependency, but when we installed the client, the issue persisted.

Docker environment

Create a docker environment for installing and running srbench. Include an image with releases.

Docker isn't supported by the cluster I'm using, unfortunately. But I think it could be a good way to manage version control going forward.

Also: see #56 for an example starting point

Always the same wrong result, across different methods *within SRBench*, when trying to fit a simple ground truth equation

Hi there,

When attempting to fit the ground-truth equation "3x^2 - 2x + 1" over x values in [0.0, 10], with 60 data points, using different methods within SRBench, the result consistently evolves to something close to "0.297677x^2 + 0.951x - 0.298461".

Steps to reproduce:

Launch "evaluate_model.py" locally on a file with 60 data points whose x values range from 0.0 to 10, for example with Operon (other methods give a similar result):
python evaluate_model.py /pmlb/datasets/eq1/eq1.tsv -ml OperonRegressor -seed 42

Observe that, across several methods within SRBench, the result always evolves to something close to
"0.297677x^2 + 0.951x - 0.298461",
rather than the expected
"3x^2 - 2x + 1".

Specifically, the other methods either yield the above or "not implemented".
It seems we can rule out some sort of linear normalization going on.
And 60 points is quite enough for, say, TuringBot to find the correct solution on the same data.

  • Any ideas?

Many thanks!

PS: Relevant files at this dropbox folder,
http://bit.ly/3kICKKs

Dockerfile for SRBench

Hi @foolnotion
Here are the Dockerfile (zipped, since GitHub doesn't accept a Dockerfile attachment here) and the steps you requested in #55 that fail to install Operon.
Dockerfile.zip

Unzip the dockerfile

unzip Dockerfile.zip

Build docker image

docker build --pull --rm -f "Dockerfile" -t srbench:latest "."

Run docker

docker run --runtime=nvidia --gpus all \
    -it srbench /bin/bash

As suggested by @folivetti, remove the following from environment.yml:

  • pybind11=2.6.1
  • tbb-devel=2020.2

then add the following to environment.yml:

  • taskflow=3.1.0
  • pybind11=2.6.2

vi environment.yml

In srbench/experiments/methods/src/operon_install.sh remove the line:
git checkout 015d420944a64353a37e0493ae9be74c645b4198

vi ./experiments/methods/src/operon_install.sh

Finish conda setup

conda update conda -y
bash configure.sh
conda activate srbench
bash install.sh

You will then see that it fails to install Operon.

Thank you

Wall clock time benchmark

As discussed in #62 with @lacava (and discussed a bit in #24 by others last year), I think a wall clock time benchmark would be a really nice complement to comparing over a fixed number of evaluations.

I think fixing the number of evaluations is only one way of enforcing a level playing field. One could also fix:

  • The number of mutations
  • The number of subtree evaluations (i.e., count the # of total operators evaluated)
  • The number of FLOPS
  • The number of copies

or any other way of bottlenecking the internal processes of a search code. Measuring against only one of these artificially biases the comparison against algorithms that happen to use more of that particular resource, for whatever reason.

Some algorithms are sample-intensive, while others do a lot of work between steps. Comparing algorithms based only on the # of evaluations artificially biases any comparison against the sample-intensive ones.

An algorithm and its implementation are not easily separable, so I would argue that you really need to measure wall clock time to see the whole picture. Not only is this much more helpful from a user's point of view, but it also lets algorithms that are intrinsically designed for parallelism actually demonstrate their performance increase. The same can be said for other algorithmic sacrifices, like those required for rigid data structures, data batching, etc.

Of course, there is no single best solution and every different type of benchmark will provide additional info. So I think this wall clock benchmark should be included with the normal fixed-evaluation benchmark, using a separate set of tuned parameters, which would give a more complete picture of performance.

Finally, I note I have a conflict of interest since PySR/SymbolicRegression.jl are designed for parallelism and fast evaluation, but hopefully my above points are not too influenced by this!

Eager to hear others' thoughts.

How do I correctly run the statistical comparison?

I am glad that most of this repository is replicable. However, it appears that the statistical comparison part cannot deal correctly with the provided experimental results: there are several missing values for FEAT and MRGP in the provided file, so the Wilcoxon signed-rank test cannot run properly during the statistical comparison. How should we approach this problem?
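
A hedged sketch of one way around the missing entries (an assumption about the intended analysis, not the repository's own script): drop the datasets for which either method has a missing score before running the paired test.

import numpy as np
from scipy.stats import wilcoxon

def paired_wilcoxon(scores_a, scores_b):
    """Wilcoxon signed-rank test on the complete pairs only."""
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    mask = ~(np.isnan(a) | np.isnan(b))  # keep datasets scored by both methods
    return wilcoxon(a[mask], b[mask])

stat, p = paired_wilcoxon([0.91, np.nan, 0.83, 0.72, 0.65, 0.98],
                          [0.85, 0.60, np.nan, 0.70, 0.61, 0.90])
print(p)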

Discussion of methods parameters

Hi,

I am opening this issue to start a discussion about the settings and computational limits of the methods. I have added some of the email comments below; feel free to edit!

Evaluation budget

There will be a fixed evaluation budget that each method can expend in its own way. Suggested budget: 500,000 evaluations.

Things to consider:

  • implementations take different amounts of time (depending on tree size as well)
  • some methods use local search
  • some methods use minibatch sampling
  • some methods aren't EC-based

It seems reasonable to take into account local search iterations and adjust for minibatch sampling.

It might also be interesting to monitor the load average on the cluster, if you can isolate it on a per-method basis, maybe combine with other measurements like memory usage. This would give a more general measure of each method's computational requirements.

Hyper-parameters

  • six total combos, for consistency with the first benchmarks paper and to ensure reasonable computational costs.

Model complexity

Not much to comment on here; maybe just a minor nitpick: I noticed some methods also use the "AQ" (analytical quotient) symbol, which can be decomposed into basic math operations: aq(a,b) = a / sqrt(1 + b^2). What, then, is the complexity of the AQ symbol?
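
A small illustration of the two counting conventions behind that question (a sketch, not a rule from the benchmark): AQ treated as a single primitive contributes one node, whereas counting its decomposition gives the figures below.

import sympy

a, b = sympy.symbols("a b")
aq_decomposed = a / sympy.sqrt(1 + b**2)   # aq(a, b) written with basic operations

print(sympy.count_ops(aq_decomposed))                           # operation count of the decomposition
print(sum(1 for _ in sympy.preorder_traversal(aq_decomposed)))  # total node count of the decomposition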

To reproduce results for the ground-truth datasets

Thanks to @lacava for helping me resolve the dataset issue last time.

Based on the commands in the README, I tried to reproduce the results reported in Figure 3 of your recently accepted paper for both the Strogatz and Feynman datasets, and I have some concerns/questions.

1. How should we see the produced results?

For Strogatz dataset, I ran
python analyze.py -results ../results_sym_data -target_noise 0.0 "/path/to/pmlb/datasets/strogatz*" -sym_data -n_trials 10 -time_limit 9:00 -tuned --local

Following that, there were many json files produced. In strogatz_bacres1_tuned.FE_AFPRegressor_15795.json (AFP_FE), I found the following values:

>>> with open('strogatz_bacres1_tuned.FE_AFPRegressor_15795.json', 'r') as fp:
...    fe_afp_result = json.load(fp)
...
>>> fe_afp_result['r2_test']
0.9984022413915545

>>> fe_afp_result['symbolic_model']
'(log(((((((-0.567/cos(1.486))^2)/(x_1+(x_0/exp((cos(((0.069^2)*(x_0*
(x_0+cos((cos(((-0.065^2)*(x_0*x_0)))*x_1))))))^3)))))^3)^3)+exp(sin(log(((sqrt(|
(cos(((-0.064^2)*(x_0*x_0)))^3)|)/((0.286+
(x_1*0.017))^2))^2))))))*cos((log(x_0)/(log((x_0^3))-x_1))))'

>>> fe_afp_result['true_model']
' 20 - x - \\frac{x \\cdot y}{1+0.5 \\cdot x^2}$'

I think that a) r2_test is what the paper calls Accuracy, b) symbolic_model is the symbolic expression resulting from training on strogatz_bacres1, and c) the true symbolic expression is stored in true_model.

Is my understanding correct for all of a), b), and c)?
Also, is the above symbolic_model the expected output of AFP_FE for strogatz_bacres1? Since the method is the 2nd best for the ground-truth datasets shown in Fig. 3, I expected a cleaner expression.

2. How is the solution rate derived?

Could you please clarify how the solution rate in Fig. 3 is derived?
Did you manually compare the produced expression symbolic_model to the true expression true_model and consider it solved only when the produced expression exactly matches the true one?

Or, if it is fully based on Definition 4.1 (Symbolic Solution) in the paper, what values of a and b are used in Fig. 3?

3. Operon build failed

On Ubuntu 18.04 and 20.04, the Operon build with your provided install.sh failed due to a version discrepancy between libceres-dev (which expects Eigen 3.4.0) and libeigen3-dev (the latest available version is 3.3.7). I even tried to build Eigen v3.4.0 from source, but the build still failed.
Do you remember how you set up the dependencies for Operon?

4. Commands to reproduce the results in Fig. 3

Could you provide the exact commands to reproduce the results in Fig. 3?
For Strogatz datasets with target noise = 0.0, I think the following command was used
python analyze.py -results ../results_sym_data -target_noise 0.0 "/path/to/pmlb/datasets/strogatz*" -sym_data -n_trials 10 -time_limit 9:00 -tuned

but what about the Feynman datasets?
Also, how should we determine -time_limit?

5. Computing resource and estimated runtime

To estimate how long it will take to reproduce the results in Fig. 3, could you share the details of the computing resources used in the paper, e.g., how many machines with 24-28 core Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz chipsets and 250 GB of RAM were used, and the (rough) estimated runtime to get the results, if you remember?

On a machine with a 4-core CPU, 128 GB RAM and 2 GPUs, even strogatz_bacres1 (400 samples) is taking more than a day to complete python analyze.py -results ../results_sym_data -target_noise 0.0 /path/to/pmlb/datasets/strogatz_bacres1/strogatz_bacres1.tsv.gz -sym_data -n_trials 10 -time_limit 9:00 -tuned --local

Sorry for the many questions, but your responses would be greatly appreciated and helpful for using this great work in my research.

Thank you!

Feature renaming issue

Hi,

I believe you have an issue at this line:

for i,f in enumerate(features):

Here are the steps to reproduce it:

instance: feynman_I_11_19
candidate model: x0*x3 + x1*x4 + x2*x5
true model: x1*y1+x2*y2+x3*y3
yaml features order: x1, x2, x3, y1, y2, y3 == x0, x1, x2, x3, x4, x5

replacing feature 0 with x1
x1*x3+x1*x4+x2*x5

replacing feature 1 with x2
x2*x3+x2*x4+x2*x5

replacing feature 2 with x3
x3*x3+x3*x4+x3*x5

replacing feature 3 with y1
y1*y1+y1*x4+y1*x5

replacing feature 4 with y2
y1*y1+y1*y2+y1*x5

replacing feature 5 with y3
y1*y1+y1*y2+y1*y3

So, the candidate model x0*x3 + x1*x4 + x2*x5 is renamed to y1*y1+y1*y2+y1*y3, which is obviously not the same thing.

A possible fix would be to perform the renaming backwards:
for i,f in enumerate(features): --> for i,f in reversed(list(enumerate(features))):

Now, the steps will be:
replacing feature 5 with y3
x0*x3+x1*x4+x2*y3

replacing feature 4 with y2
x0*x3+x1*y2+x2*y3

replacing feature 3 with y1
x0*y1+x1*y2+x2*y3

replacing feature 2 with x3
x0*y1+x1*y2+x3*y3

replacing feature 1 with x2
x0*y1+x2*y2+x3*y3

replacing feature 0 with x1
x1*y1+x2*y2+x3*y3
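
As an alternative sketch (using sympy rather than string replacement), the renaming can also be done simultaneously, so that overlapping old and new names cannot clobber each other regardless of iteration order:

import sympy

features = ["x1", "x2", "x3", "y1", "y2", "y3"]   # the yaml feature order
candidate = sympy.sympify("x0*x3 + x1*x4 + x2*x5")

# map placeholder names x0..x5 to the dataset's feature names in one shot
mapping = {sympy.Symbol(f"x{i}"): sympy.Symbol(name)
           for i, name in enumerate(features)}
renamed = candidate.subs(mapping, simultaneous=True)
print(renamed)   # x1*y1 + x2*y2 + x3*y3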

Best regards,
Aleksandar

Is it appropriate to use halving grid search as a hyper-parameter tuning strategy?

As for the default hyper-parameter tuning strategy, I find that this benchmark uses the halving grid search method. To begin, I admit that a traditional grid search is impractical due to the prohibitively expensive computational cost. However, when we use halving grid search and the parameter grid is large, as in the case of XGBoost, the first few rounds of the hyper-parameter search only train on a few data points, and their results might be unreliable. So, is this truly a good method for hyper-parameter tuning, and is this tuning protocol sufficient to persuade reviewers?
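
One possible mitigation for that concern (a sketch with illustrative values, not the benchmark's configuration) is to raise min_resources so that even the first halving round sees a reasonable number of training samples:

from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingGridSearchCV
from sklearn.ensemble import GradientBoostingRegressor

search = HalvingGridSearchCV(
    GradientBoostingRegressor(),
    param_grid={"max_depth": [2, 4, 6], "learning_rate": [0.01, 0.1]},
    resource="n_samples",   # samples are the budgeted resource (the default)
    min_resources=500,      # first round trains on at least 500 samples
    factor=2,
)
# search.fit(X, y) would then run the halving rounds as usual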

Operon fails to launch from "evaluate_model.py" on Docker image

Hi there,

Trying to use Operon with "evaluate_model.py" on the Docker image returns the error:

» python evaluate_model.py /data/eq1.tsv -ml OperonRegressor -seed 42 -skip_tuning
(...)
ImportError: libpython3.9.so.1.0: cannot open shared object file: No such file or directory

whereas, for example:
python evaluate_model.py /data/eq1.tsv -ml sembackpropgp -seed 42 -skip_tuning

does launch the regressor.

  • Any ideas?

PS: Operon also fails to launch on the "192_vineyard" dataset, where other methods do not.

Many thanks!

Invalid parameter alpha for estimator PySRRegressor(equations=0.0).

I encountered the following error while running the PySR Regressor. It appears that PySR has undergone some changes to its parameters, and the alpha value is no longer supported. @MilesCranmer, could you please take a look at this? Thank you!

  File "/opt/conda/envs/srbench/lib/python3.7/site-packages/joblib/parallel.py", line 819, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/opt/conda/envs/srbench/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  File "/opt/conda/envs/srbench/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 597, in __init__
    self.results = batch()
  File "/opt/conda/envs/srbench/lib/python3.7/site-packages/joblib/parallel.py", line 289, in __call__
    for func, args, kwargs in self.items]
  File "/opt/conda/envs/srbench/lib/python3.7/site-packages/joblib/parallel.py", line 289, in <listcomp>
    for func, args, kwargs in self.items]
  File "/opt/conda/envs/srbench/lib/python3.7/site-packages/sklearn/utils/fixes.py", line 216, in __call__
    return self.function(*args, **kwargs)
  File "/opt/conda/envs/srbench/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 668, in _fit_and_score
    estimator = estimator.set_params(**cloned_parameters)
  File "/opt/conda/envs/srbench/lib/python3.7/site-packages/sklearn/base.py", line 248, in set_params
    "with `estimator.get_params().keys()`." % (key, self)
ValueError: Invalid parameter alpha for estimator PySRRegressor(equations=0.0). Check the list of available parameters with `estimator.get_params().keys()`.

Explore nix as conda alternative

Hi,

Since this issue does not require any changes in this repo other than flake.nix, I took the liberty of working against the dev branch.

This allows nix to be used as an alternative to conda, with a bunch of advantages:

  • conda lacks sane management of transitive deps leading to conflicts
  • nix is far more robust and allows complete reproducibility
  • faster
  • easy to build the environment (just type nix develop)
  • easy to generate docker images on the fly
    docker run -p 8888:8888 -ti --rm docker.nix-community.org/nixpkgs/nix-flakes nix develop github:cavalab/srbench/dev --no-write-lock-file
    
  • flake-enabled frameworks like pyoperon pull their own dependencies automatically (no need to keep adding things to an environment file)
  • flake.lock files can fix versions/revisions

This is obviously a low priority issue right now, but I've been using it to deploy srbench/operon without conda.
My frustration with conda began with not being able to add gcc/gxx-11.2.0 to the environment.

This issue is meant to track integration of other frameworks with nix. So far I have also integrated FEAT and Ellyn (wip). Other frameworks should be easily integrated as long as they use standard packaging.

There are some aspects that will need attention from other authors:

  • FEAT:

    • seems to be incompatible with latest numpy (this usually gets fixed upstream)
      >>> import feat
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "/nix/store/h3gfxkz8a31l60qmpzj15ryq12nqsspm-python3.9-feat_ml-0.5.2/lib/python3.9/site-packages/feat/__init__.py", line 1, in <module>
          from .feat import Feat, FeatRegressor, FeatClassifier
        File "/nix/store/h3gfxkz8a31l60qmpzj15ryq12nqsspm-python3.9-feat_ml-0.5.2/lib/python3.9/site-packages/feat/feat.py", line 12, in <module>
          from .pyfeat import PyFeat
        File "feat/pyfeat.pyx", line 1, in init feat.pyfeat
      ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
      
    • the install script setup.py seems to be really tailored to conda environments; it took some tricks with setting environment variables to get it to work
  • Ellyn

    • setup.py is hardcoded for Conda and does not work with nix (I will try to patch it)

Best,
Bogdan

Support for a better result report

Right now, it seems that the reports generated by those report scripts are too crude. Some important indicators in the AutoML domain, such as significance tests and mean accuracy across all tasks, are not presented. In my research domain, there is a benchmark project named "srbench" (https://github.com/EpistasisLab/srbench) that provides a good example of presenting the above-mentioned metrics. Consequently, I hope this benchmark project can also implement similar features.

How to compare the ground-truth expressions with the sympy string

Hi,

I am currently implementing my SR algorithm for the competition. Everything is fine, but I still have a question about one competition detail, even though I have read the Competition Guidelines.

Say a ground-truth model is -1.6+x**2 and my model produces the string log(|-0.2|)+x**2. According to the "regressor guide" in the Competition Guidelines, I should convert this string to a sympy-compatible string, here by removing '|' since sympy doesn't recognize it (as done in "submission/feat-example/regressor.py"):
sympy_str = est.model_.replace("|", "")

When I do this, the model string is converted to log(-0.2)+x**2, which is definitely not the same expression as my model: if I run f = sympy.simplify("log(-0.2)+x**2"), the result is "x**2 - 1.6094379124341 + I*pi".

So this result is totally different from the ground-truth model -1.6+x**2. My question is: will "simplify" be used for model comparison during the competition? If so, how should I deal with this problem?
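
A sketch of one workaround (assuming the '|...|' syntax only ever wraps a single, non-nested sub-expression): translate the bars into sympy's Abs() instead of deleting them, so the value of the model is preserved and the simplified form stays real.

import re
import sympy

model_str = "log(|-0.2|)+x**2"
# rewrite |expr| as Abs(expr); assumes no nested '|' pairs
sympy_str = re.sub(r"\|([^|]+)\|", r"Abs(\1)", model_str)

print(sympy_str)                  # log(Abs(-0.2))+x**2
print(sympy.simplify(sympy_str))  # x**2 - 1.6094379124341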

Bingo: `bingo.bingocpp.AGraph` has no attribute `_update`

@nightdr your submission didn't pass the tests on merge. Can you figure out what is going on?

evaluate_model.py:146: in evaluate_model
    results['symbolic_model'] = model(est, X_train_scaled)
methods/Bingo/regressor.py:68: in model
    model_str = str(est.get_best_individual())
/usr/share/miniconda3/envs/srcomp-Bingo/lib/python3.9/site-packages/bingo/symbolic_regression/symbolic_regressor.py:225: in get_best_individual
    return self.best_estimator_.get_best_individual()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = SymbolicRegressor(clo_threshold=1e-05, crossover_prob=0.3, max_time=350.0,
                  mutation_prob=0.45,
     ...', 'exp', 'log',
                             'sqrt'],
                  population_size=2500, use_simplification=True)

    def get_best_individual(self):
        if self.best_ind is None:
            print("Best individual is None, setting to X_0")
            from bingo.symbolic_regression import AGraph
            self.best_ind = AGraph()
            self.best_ind.command_array = np.array([[0, 0, 0]], dtype=int)  # X_0
>           self.best_ind._update()
E           AttributeError: 'bingo.bingocpp.AGraph' object has no attribute '_update'

/usr/share/miniconda3/envs/srcomp-Bingo/lib/python3.9/site-packages/bingo/symbolic_regression/symbolic_regressor.py:153: AttributeError

link:
https://github.com/cavalab/srbench/runs/6446882859?check_suite_focus=true#step:9:111

Originally posted by @lacava in #123 (comment)
