
cavalab / srbench


A living benchmark framework for symbolic regression

Home Page: https://cavalab.org/srbench/

License: GNU General Public License v3.0

Python 48.19% C++ 16.55% Makefile 0.35% Shell 3.99% Jupyter Notebook 15.54% R 0.44% Dockerfile 0.44% C 14.49%

srbench's People

Contributors

aboisbunon, athril, folivetti, foolnotion, hengzhe-zhang, jmmcd, kahlmeyer94, kartelj, lacava, marcovirgolin, milescranmer, ying531, yoshitomo-matsubara


srbench's Issues

this is terrible

I think this is terrible.

def model(est, X):
    mapping = {'x_'+str(i): k for i, k in enumerate(X.columns)}
    new_model = est.model_
    for k, v in mapping.items():
        new_model = new_model.replace(k, v)

What happens when replacing x_1 if x_11 also exists?

A simple fix is:

for k, v in reversed(mapping.items()):
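
A slightly more defensive variant of that fix (a sketch only, assuming est.model_ is a plain string): substitute the longer placeholder names first, so 'x_1' can never clobber part of 'x_11' regardless of the dict's iteration order.

def model(est, X):
    mapping = {'x_' + str(i): name for i, name in enumerate(X.columns)}
    new_model = est.model_
    # replace the longest keys first so 'x_11' is handled before 'x_1'
    for k, v in sorted(mapping.items(), key=lambda kv: len(kv[0]), reverse=True):
        new_model = new_model.replace(k, v)
    return new_model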

PySR parameters

@MilesCranmer can you specify the set of 6 hyperparameters you would like to use for benchmarking PySR? I'm going to start the runs and am hoping to have these by the end of tomorrow. They should match the original constraints:

  • max evals = 500k
  • 6 total combos
  • finish within 24 hrs with hyperparameter tuning, and within 8 hours with a single call to fit

Currently the hyperparameters are set small for testing, but evaluate_model.py now recognizes and shrinks some PySR parameters during testing, so the desired set should be specified directly in the model file. (In the updated version for the competition, you can specify test_params explicitly.)

This is the current version:

hyper_params = [
    {
        "annealing": (True,), # (True, False)
        "denoise": (True,), # (True, False)
        "binary_operators": (["+", "-", "*", "/"],),
        "unary_operators": (
            [],
            # poly_basis,
            # poly_basis + trig_basis,
            # poly_basis + exp_basis,
        ),
        "populations": (20,), # (40, 80),
        "alpha": (1.0,),
        "model_selection": ("best",)
        # "alpha": (0.01, 0.1, 1.0, 10.0),
        # "model_selection": ("accuracy", "best"),
    }
]

Scikit-learn version issue?

Hi, I am encountering the following ImportError:
ImportError: cannot import name 'HalvingGridSearchCV' from 'sklearn.model_selection'
But I checked, and my scikit-learn version is 0.24.1, as required in the environment...
Should I be using a different version?
Note: I also encountered errors when running conda env create -f environment.yml, which prevented aifeynman and operon from being installed, but I believe this is unrelated since the scikit-learn version is correct...
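
For reference, in scikit-learn 0.24.x the successive-halving search estimators are still experimental and cannot be imported until they are explicitly enabled; a minimal sketch of the imports that resolves this error on 0.24.1:

# HalvingGridSearchCV is experimental in scikit-learn 0.24 and must be
# enabled before it can be imported from sklearn.model_selection.
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingGridSearchCV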

scaling of input data

I have had a quick look at the benchmarking pipeline to better understand how the comparison is performed. During that review I noticed that scaling is always applied while reading the data files, using a RobustScaler from sklearn.

https://github.com/EpistasisLab/srbench/blob/1ad633974c9126a8eb6ce936873d2e9b3d40294c/experiment/read_file.py#L32

The actual model is generated in the evaluate_model script, which additionally has parameters scale_x and scale_y that determine whether the input data X and target y should be scaled.

https://github.com/EpistasisLab/srbench/blob/1ad633974c9126a8eb6ce936873d2e9b3d40294c/experiment/evaluate_model.py#L46-L61

This means that if scale_x is set to true, the input data is scaled twice when using the benchmarking pipeline. I don't know if this behavior is intended, but I suspect that the RobustScaler is an artifact from previous experimentation and should be removed. As it stands, even when I set the scale_x parameter to false, scaling is still performed while reading the data.
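
A sketch of keeping the scaling in a single place (the flag name and pipeline shape here are illustrative, not the actual srbench code): the reader would return raw data, and a scaler would only be prepended when scale_x is requested.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler

def build_pipeline(est, scale_x=False):
    # the scaler is added in exactly one place, controlled by scale_x
    steps = [('scaler', RobustScaler())] if scale_x else []
    steps.append(('est', est))
    return Pipeline(steps)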

ImportError: cannot import name 'itea_srbench' from 'ITEA'

I attempted to run ITEA within a Docker image; however, I encountered an error. I suspect that the issue arises from recent refactoring in ITEA. @folivetti It would be greatly appreciated if you could update the relevant code in srbench or adjust the Git version in the ITEA installation script.

Traceback (most recent call last):
  File "ITEARegressor.py", line 1, in <module>
    from ITEA import itea_srbench as itea
ImportError: cannot import name 'itea_srbench' from 'ITEA' (unknown location)

Methods install scripts

Hi,

I wanted to point out some issues with the environment/install scripts:

Bash scripts

  1. The install scripts should not require elevated privileges ("sudo"). Anaconda/Miniconda is normally installed in an unprivileged location (the home folder), so it shouldn't be necessary to do anything as root. This seems to trip things up in some cases.

  2. Some rudimentary way to check if a script succeeded would be useful. I would also redirect the compile output to a log file.

failed=()

# install all methods, recording the ones whose install script exits non-zero
for install_file in *.sh ; do
    echo "vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv"
    echo "Running $install_file"
    echo "////////////////////////////////////////////////////////////////////////////////"

    bash "$install_file"

    if [ $? -gt 0 ]
    then
        failed+=("$install_file")
    fi

    echo "////////////////////////////////////////////////////////////////////////////////"
    echo "Finished $install_file"
    echo "^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^"
done
echo "failed: ${failed[@]}"
  3. Running an install script again should probably clean the old src folders first.

Conda environment

  • pkg-config needs to be added to environment.yml, otherwise build scripts relying on it will fail (e.g. GP-GOMEA)

  • cxxopts is not necessary (operon only needs it for the command-line program, not for the python module)

  • on my Ubuntu machine (16.04), ellyn installs but fails to run
    (nevermind, it's not supposed to be called directly)

My conda info:

$ conda info

     active environment : srbench
    active env location : /home/bogdb/miniconda3/envs/srbench
            shell level : 2
       user config file : /home/bogdb/.condarc
 populated config files : 
          conda version : 4.10.1
    conda-build version : not installed
         python version : 3.9.1.final.0
       virtual packages : __linux=5.11.13=0
                          __glibc=2.23=0
                          __unix=0=0
                          __archspec=1=x86_64
       base environment : /home/bogdb/miniconda3  (writable)
      conda av data dir : /home/bogdb/miniconda3/etc/conda
  conda av metadata url : https://repo.anaconda.com/pkgs/main
           channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /home/bogdb/miniconda3/pkgs
                          /home/bogdb/.conda/pkgs
       envs directories : /home/bogdb/miniconda3/envs
                          /home/bogdb/.conda/envs
               platform : linux-64
             user-agent : conda/4.10.1 requests/2.25.1 CPython/3.9.1 Linux/5.11.13 ubuntu/16.04.7 glibc/2.23
                UID:GID : 1001:1001
             netrc file : None
           offline mode : False

Is there any future plan for supporting classification benchmarks?

In 2014, a paper published in JMLR reported the results of more than 100 classification algorithms on numerous classification benchmark datasets [1].
However, that paper does not consider genetic programming-based methods such as M4GP [2]. Would it be possible to develop a classification benchmark to further advance genetic programming, and even the broader machine learning domain?

[1]. Fernández-Delgado M, Cernadas E, Barro S, et al. Do we need hundreds of classifiers to solve real world classification problems?[J]. The journal of machine learning research, 2014, 15(1): 3133-3181.
[2]. La Cava W, Silva S, Danai K, et al. Multidimensional genetic programming for multiclass classification[J]. Swarm and evolutionary computation, 2019, 44: 260-272.

sympy-compatibility of final model strings

At the moment a lot of post-processing is done in experiment/symbolic_utils.py to convert the models returned by the different methods into a common, sympy-compatible format.

I would like to remove this post-processing step and, in the future, require methods to return sympy compatible strings. Steps:

  1. Move centralized model cleaning to the individual methods
  2. Have method developers update their codebases to return sympy-compatible strings

See the updated contribution guide.
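
A minimal sketch of what "sympy-compatible" could mean in practice (an assumed check, not the official contribution-guide test): the returned model string should parse with sympy using the dataset's feature names as symbols.

import sympy

def is_sympy_compatible(model_str, feature_names):
    """Return True if sympy can parse the model string."""
    local_dict = {name: sympy.Symbol(name) for name in feature_names}
    try:
        sympy.sympify(model_str, locals=local_dict)
        return True
    except (sympy.SympifyError, SyntaxError):
        return False

print(is_sympy_compatible("x_0*sin(x_1) + 0.5", ["x_0", "x_1"]))  # True
print(is_sympy_compatible("x_0*|x_1| + 0.5", ["x_0", "x_1"]))     # False: '|' is not sympy syntax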

Operon install script breaks version control

@foolnotion the operon install script now installs a bunch of packages with no version control. We need to pin them to stable versions.
It's causing the current CI build to fail and is probably going to cause more failures down the road. Can you add git checkout {version} to each clone, or use packaged versions of the dependencies in conda instead?

Suggestion: option to not use standardization

I've noticed that you're standardizing the predictors and target variable before fitting the regression models:

https://github.com/EpistasisLab/regression-benchmark/blob/967c66beec7e79ef53d6a3bb133190887224cfc1/experiment/evaluate_model.py#L47-L49

In my experience my algorithm works best without any normalization. I also tested a few datasets with KernelRidge with and without standardization, and it also seems to work better without this transformation. Maybe add an option to choose whether or not the standardized dataset is used.
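
A small sketch of that kind of comparison (the dataset, kernel, and metric here are illustrative assumptions, not the benchmark's settings):

from sklearn.datasets import load_diabetes
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
raw = KernelRidge(kernel="rbf", alpha=1.0)
scaled = make_pipeline(StandardScaler(), KernelRidge(kernel="rbf", alpha=1.0))

# compare cross-validated R^2 with and without standardizing the predictors
print("raw         :", cross_val_score(raw, X, y, cv=5, scoring="r2").mean())
print("standardized:", cross_val_score(scaled, X, y, cv=5, scoring="r2").mean())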

How to get the summarized experimental results?

I have executed the experiment script and it output several JSON files. However, there seems to be no script that aggregates these results into a single file. Is there any additional analysis tool affiliated with this benchmark package?

Quick start help needed - surely useful for other new users

Hi there,
Some quick start help needed:

After installing, how do I run the benchmarks on user-supplied data?
I'm struggling to get this to work and want to make sure there's nothing wrong with my SRBench install:

  • Could you please indicate some simple steps on how to achieve this?

Many thanks!

broken links on website

@folivetti I'm noticing that some of the links on the website are broken, probably because they are copied straight from the markdown files that use relative paths (e.g., https://epistasislab.github.io/srbench/CONTRIBUTING.md).

Do you have any free time to update them? Thank you!

instructions to clone pmlb do not work

The command git clone https://github.com/EpistasisLab/pmlb/tree/feynman does not work; it says

fatal: repository 'https://github.com/EpistasisLab/pmlb/tree/feynman/' not found

But using git clone https://github.com/EpistasisLab/pmlb.git seems to work ok. The fetch command downloads 299 files in total.
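
Presumably the /tree/feynman URL was meant to point at the feynman branch rather than a separate repository; assuming that branch still exists, something like git clone -b feynman https://github.com/EpistasisLab/pmlb.git would clone it directly.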

Use `micromamba` instead of `conda`

One other suggestion: use micromamba instead of conda (mamba is a C++ rewrite of conda; it uses the same package servers but is much faster). The Docker image mambaorg/micromamba, which has micromamba built in, seems to work if all conda calls are replaced with micromamba. It's much faster for me!

Originally posted by @MilesCranmer in #59 (comment)

competition code suggestions

Adding @janoPig other suggestions here (also ref #131 (comment))

Some suggestions for published code here -> https://github.com/janoPig/srbench/tree/srcomp

  • Fix running on a local machine; resolve the errors
    "ValueError: option names {'--ml'} already added"
    "ValueError: too many values to unpack (expected 3)"
  • check
  • Added scripts for running on a local computer: subit_stage[N]_local.sh METHOD_NAME

  • Fix the number of threads running per job (it always runs the default of 4). Add a new -n_threads parameter to the subit_stage[N].sh script

The submit scripts are there for reproducibility, i.e. they are fixed calls to python analyze.py [args]. analyze.py can be configured for both of these purposes directly.

  • Added a DataFrame=False parameter to the HROCH regressor so it runs correctly.

  • Added a test for the input data featureselection.csv. The data was created using the function '0.11x1^3 + 0.91x2x3 + 0.68x4x5 + 0.26x6^2x7 + 0.13x8x9x10', and feature_absence_score was evaluated for '0.11x1^3 + 0.91x3x5 + 0.68x7x9 + 0.26x11^2x13 + 0.13x15x17x19'

  • This should be checked with the code base release on the srcomp branch.

Is it possible to provide complete experimental results?

This project is awesome and I believe it can greatly promote the advancement of the symbolic regression domain. However, a critical issue is that we don't have enough computational resources to repeat the full experiment. Would it be possible to provide the complete experimental results so that we can use them directly in our paper? I would be grateful if such results could be provided.

standing leaderboard

Keep a standing leaderboard of methods in the documentation. As new results/methods are developed, add them to the leaderboard.

measure complexity

We should measure the complexity of the final models produced by the methods. Since they all vary quite a bit, this isn't trivial. For the SR methods, we should be able to count the number of nodes (operators and literals) in the solutions.
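
One possible way to count nodes (a sketch, not the benchmark's official definition of complexity), for any model that can be parsed by sympy:

import sympy

def n_nodes(model_str):
    """Count operators, variables, and literals in a sympy-parsable model string."""
    # note: sympy canonicalizes the expression, so this counts nodes of the parsed form
    expr = sympy.sympify(model_str)
    return sum(1 for _ in sympy.preorder_traversal(expr))

print(n_nodes("3*x0**2 - 2*x0 + 1"))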

Ground-truth datasets are broken?

Hi!

Thank you for your great work and framework!
I wanted to try the benchmarked methods for the ground-truth datasets (i.e., Feynman and Strogatz datasets) and followed the instructions in README.

Are the datasets perhaps not actually in gzip format?

However, the datasets fetched from the pmlb repository look broken. Here is one of the errors I got when running
python analyze.py -results ../results_sym_data -target_noise 0.0 "/data/pmlb/datasets/strogatz*" -sym_data -n_trials 10 -time_limit 9:00 -tuned --local
for the Strogatz datasets. (The same errors occurred for the Feynman datasets with "/data/pmlb/datasets/feynman_*" as well.)

========================================
Evaluating tuned.FEATRegressor on
/data/pmlb/datasets/strogatz_bacres1/strogatz_bacres1.tsv.gz
========================================
compression: gzip
filename: /data/pmlb/datasets/strogatz_bacres1/strogatz_bacres1.tsv.gz
Traceback (most recent call last):
  File "evaluate_model.py", line 291, in <module>
    **eval_kwargs)
  File "evaluate_model.py", line 39, in evaluate_model
    features, labels, feature_names = read_file(dataset)
  File "/opt/app/srbench/experiment/read_file.py", line 19, in read_file
    engine='python')
  File "/opt/conda/envs/srbench/lib/python3.7/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/envs/srbench/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 586, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/opt/conda/envs/srbench/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 482, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/opt/conda/envs/srbench/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 811, in __init__
    self._engine = self._make_engine(self.engine)
  File "/opt/conda/envs/srbench/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 1040, in _make_engine
    return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
  File "/opt/conda/envs/srbench/lib/python3.7/site-packages/pandas/io/parsers/python_parser.py", line 100, in __init__
    self._make_reader(self.handles.handle)
  File "/opt/conda/envs/srbench/lib/python3.7/site-packages/pandas/io/parsers/python_parser.py", line 203, in _make_reader
    line = f.readline()
  File "/opt/conda/envs/srbench/lib/python3.7/gzip.py", line 300, in read1
    return self._buffer.read1(size)
  File "/opt/conda/envs/srbench/lib/python3.7/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/opt/conda/envs/srbench/lib/python3.7/gzip.py", line 474, in read
    if not self._read_gzip_header():
  File "/opt/conda/envs/srbench/lib/python3.7/gzip.py", line 422, in _read_gzip_header
    raise OSError('Not a gzipped file (%r)' % magic)
OSError: Not a gzipped file (b've')

I also tried to manually gunzip the file, but the error message still says it's not in gzip format

$ gunzip /data/pmlb/datasets/strogatz_bacres1/strogatz_bacres1.tsv.gz
gzip: /data/pmlb/datasets/strogatz_bacres1/strogatz_bacres1.tsv.gz: not in gzip format
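
A quick diagnostic sketch (an assumption about the likely cause, not a confirmed fix): a real gzip file starts with the magic bytes 0x1f 0x8b, whereas a Git LFS pointer file is plain text beginning with "version https://git-lfs...", which would explain the b've' in the error above.

# Inspect the first bytes of the file to see what it actually contains.
path = "/data/pmlb/datasets/strogatz_bacres1/strogatz_bacres1.tsv.gz"
with open(path, "rb") as f:
    print(f.read(16))
# b'\x1f\x8b...'       -> genuinely gzipped data
# b'version https:...' -> an un-fetched Git LFS pointer instead of the dataset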

Could you please resolve this issue for both Feynman and Strogatz datasets?
Thank you!

Problems reproducing SRBench results [most methods fail, or only some runs complete]

Hello,
I followed the conda installation guidelines using Anaconda (Anaconda3-2021.11-Linux-x86_64).
Below is the conda env description.
I use a forked version of srbench, but unfortunately I am not able to push to it. I don't know whether the problem comes from me or from the initial repo: I did not add large files to it, yet I get the following error: "Account responsible for LFS bandwidth should purchase more data packs to restore access".
Maybe you could at least store the feather results files somewhere, as they are not heavy?

Thanks a lot.

     active environment : srbench
    active env location : /private/home/pakamienny/anaconda3/envs/srbench
            shell level : 3
       user config file : /private/home/pakamienny/.condarc
 populated config files : /private/home/pakamienny/.condarc
          conda version : 4.10.3
    conda-build version : 3.21.5
         python version : 3.9.7.final.0
       virtual packages : __cuda=11.4=0
                          __linux=5.4.0=0
                          __glibc=2.31=0
                          __unix=0=0
                          __archspec=1=x86_64
       base environment : /private/home/pakamienny/anaconda3  (writable)
      conda av data dir : /private/home/pakamienny/anaconda3/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /private/home/pakamienny/anaconda3/pkgs
                          /private/home/pakamienny/.conda/pkgs
       envs directories : /private/home/pakamienny/anaconda3/envs
                          /private/home/pakamienny/.conda/envs
               platform : linux-64
             user-agent : conda/4.10.3 requests/2.26.0 CPython/3.9.7 Linux/5.4.0-81-generic ubuntu/20.04.2 glibc/2.31
                UID:GID : 1185300944:1185300944
             netrc file : None
           offline mode : False

bsub/bjobs dependency?

Hi, we are trying to run the tool locally, but are running into a dependency error. We believe we followed the instructions correctly.

[screenshot of the error]

If you comment these lines out, we get the following.

[screenshot of the subsequent error]

Our intuition was that we were missing some Slurm dependency, but when we installed the client, the issue persisted.

Docker environment

Create a docker environment for installing and running srbench. Include an image with releases.

Docker isn't supported by the cluster I'm using, unfortunately. But I think it could be a good way to manage version control going forward.

Also: see #56 for an example starting point

Always the same wrong result, across different methods *within SRBench*, when trying to fit a simple ground truth equation

Hi there,

When attempting to fit the ground-truth equation "3x^2 - 2x + 1" over x values in [0.0, 10], with 60 data points, using different methods within SRBench, the result consistently evolves to something close to "0.297677x^2 + 0.951x - 0.298461".

Steps to reproduce:

Launch "evaluate_model.py" locally on a file with 60 data points whose x values range from 0.0 to 10, for example with Operon (other methods give a similar result):
python evaluate_model.py /pmlb/datasets/eq1/eq1.tsv -ml OperonRegressor -seed 42

Observe that, across several methods within SRBench, the result always evolves to something close to
"0.297677x^2 + 0.951x - 0.298461",
rather than the expected
"3x^2 - 2x + 1".

Specifically, the other methods either yield the above or "not implemented".
It seems we can rule out some sort of linear normalization going on.
And 60 points is quite enough for, say, TuringBot to find the correct solution on the same data.

  • Any ideas?

Many thanks!

PS: Relevant files at this dropbox folder,
http://bit.ly/3kICKKs

Dockerfile for SRBench

Hi @foolnotion
Here are the Dockerfile (zipped, since GitHub doesn't accept a Dockerfile attachment here) and the steps you requested in #55 that fail to install Operon.
Dockerfile.zip

Unzip the dockerfile

unzip Dockerfile.zip

Build docker image

docker build --pull --rm -f "Dockerfile" -t srbench:latest "."

Run docker

docker run --runtime=nvidia --gpus all \
    -it srbench /bin/bash

As suggested by @folivetti, remove the following from environment.yml:

  • pybind11=2.6.1
  • tbb-devel=2020.2

then add the following to environment.yml:

  • taskflow=3.1.0
  • pybind11=2.6.2

vi environment.yml

In srbench/experiments/methods/src/operon_install.sh remove the line:
git checkout 015d420944a64353a37e0493ae9be74c645b4198

vi ./experiments/methods/src/operon_install.sh

Finish conda setup

conda update conda -y
bash configure.sh
conda activate srbench
bash install.sh

You will then see that it fails to install Operon.

Thank you

Wall clock time benchmark

As discussed in #62 with @lacava (and discussed a bit in #24 by others last year), I think a wall clock time benchmark would be a really nice complement to comparing over a fixed number of evaluations.

I think fixing the number of evaluations is only one way of enforcing a level playing field. One could also fix:

  • The number of mutations
  • The number of subtree evaluations (i.e., count the # of total operators evaluated)
  • The number of FLOPS
  • The number of copies

or any other way of bottlenecking the internal processes of a search code. Measuring against only one of these artificially biases the comparison against algorithms that happen to use more of that particular resource, for whatever reason.

Some algorithms are sample-intensive, while others do a lot of work between steps. Comparing algorithms based only on the # of evaluations artificially biases any comparison against the sample-intensive ones.

An algorithm and its implementation are not easily separable, so I would argue that you really need to measure wall clock time to see the whole picture. Not only is this much more helpful from a user's point of view, but it also lets algorithms that are intrinsically designed for parallelism actually demonstrate their performance increase. The same can be said for other algorithmic sacrifices, like those required for rigid data structures, data batching, etc.

Of course, there is no single best solution and every different type of benchmark will provide additional info. So I think this wall clock benchmark should be included with the normal fixed-evaluation benchmark, using a separate set of tuned parameters, which would give a more complete picture of performance.

Finally, I note I have a conflict of interest since PySR/SymbolicRegression.jl are designed for parallelism and fast evaluation, but hopefully my above points are not too influenced by this!

Eager to hear others' thoughts.

How do I correctly run the statistical comparison?

I am glad that most of this repository is replicable. However, it appears that the statistical comparison part cannot deal correctly with the provided experimental results: there are several missing values for FEAT and MRGP in the provided file, so the Wilcoxon signed-rank test cannot run properly during the statistical comparison. How should we approach this problem?
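
A hedged sketch of one way around the missing entries (an assumption about the intended analysis, not the repository's own script): drop the datasets for which either method has a missing score before running the paired test.

import numpy as np
from scipy.stats import wilcoxon

def paired_wilcoxon(scores_a, scores_b):
    """Wilcoxon signed-rank test on the complete pairs only."""
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    mask = ~(np.isnan(a) | np.isnan(b))  # keep datasets scored by both methods
    return wilcoxon(a[mask], b[mask])

stat, p = paired_wilcoxon([0.91, np.nan, 0.83, 0.72, 0.65, 0.98],
                          [0.85, 0.60, np.nan, 0.70, 0.61, 0.90])
print(p)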

Discussion of methods parameters

Hi,

I am opening this issue to start a discussion about the settings and computational limits of the methods. I have added some of the email comments below; feel free to edit!

Evaluation budget

There will be a fixed evaluation budget that each method can expend in its own way. Suggested budget: 500,000 evaluations.

Things to consider:

  • implementations take different amounts of time (depending on tree size as well)
  • some methods use local search
  • some methods use minibatch sampling
  • some methods aren't EC-based

It seems reasonable to take into account local search iterations and adjust for minibatch sampling.

It might also be interesting to monitor the load average on the cluster, if you can isolate it on a per-method basis, maybe combine with other measurements like memory usage. This would give a more general measure of each method's computational requirements.

Hyper-parameters

  • six total combos, for consistency with the first benchmarks paper and to ensure reasonable computational costs.

Model complexity

Not much to comment on here; maybe just a minor nitpick: I noticed some methods also use the "AQ" (analytical quotient) symbol, which can be decomposed into basic math operations: aq(a,b) = a / sqrt(1 + b^2). What, then, is the complexity of the AQ symbol?
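
A small illustration of the two counting conventions behind that question (a sketch, not a rule from the benchmark): AQ treated as a single primitive contributes one node, whereas counting its decomposition gives the figures below.

import sympy

a, b = sympy.symbols("a b")
aq_decomposed = a / sympy.sqrt(1 + b**2)   # aq(a, b) written with basic operations

print(sympy.count_ops(aq_decomposed))                           # operation count of the decomposition
print(sum(1 for _ in sympy.preorder_traversal(aq_decomposed)))  # total node count of the decomposition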

To reproduce results for the ground-truth datasets

Thanks to @lacava for helping me resolve the dataset issue last time.

Based on the commands in the README, I tried to reproduce the results reported in Figure 3 of your recently accepted paper for both the Strogatz and Feynman datasets, and I have some concerns/questions.

1. How should we see the produced results?

For Strogatz dataset, I ran
python analyze.py -results ../results_sym_data -target_noise 0.0 "/path/to/pmlb/datasets/strogatz*" -sym_data -n_trials 10 -time_limit 9:00 -tuned --local

Following that, there were many json files produced. In strogatz_bacres1_tuned.FE_AFPRegressor_15795.json (AFP_FE), I found the following values:

>>> with open('strogatz_bacres1_tuned.FE_AFPRegressor_15795.json', 'r') as fp:
...    fe_afp_result = json.load(fp)
...
>>> fe_afp_result['r2_test']
0.9984022413915545

>>> fe_afp_result['symbolic_model']
'(log(((((((-0.567/cos(1.486))^2)/(x_1+(x_0/exp((cos(((0.069^2)*(x_0*
(x_0+cos((cos(((-0.065^2)*(x_0*x_0)))*x_1))))))^3)))))^3)^3)+exp(sin(log(((sqrt(|
(cos(((-0.064^2)*(x_0*x_0)))^3)|)/((0.286+
(x_1*0.017))^2))^2))))))*cos((log(x_0)/(log((x_0^3))-x_1))))'

>>> fe_afp_result['true_model']
' 20 - x - \\frac{x \\cdot y}{1+0.5 \\cdot x^2}$'

I think that a) r2_test is what the paper calls Accuracy, b) symbolic_model is the symbolic expression resulting from training on strogatz_bacres1, and c) the true symbolic expression is stored in true_model.

Is my understanding correct for all of a), b), and c)?
Also, is the above symbolic_model the expected output of AFP_FE for strogatz_bacres1? Since the method is the 2nd best for the ground-truth datasets shown in Fig. 3, I expected a cleaner expression.

2. How is the solution rate derived?

Could you please clarify how the solution rate in Fig. 3 is derived?
Did you manually compare the produced expression symbolic_model to the true expression true_model and consider it solved only when the produced expression exactly matches the true one?

Or, if it is fully based on Definition 4.1 (Symbolic Solution) in the paper, what values of a and b are used in Fig. 3?

3. Operon build failed

On Ubuntu 18.04 and 20.04, the Operon build with your provided install.sh failed due to a version discrepancy between libceres-dev (which expects Eigen 3.4.0) and libeigen3-dev (the latest available version is 3.3.7). I even tried to build Eigen v3.4.0 from source, but the build still failed.
Do you remember how you set up the dependencies for Operon?

4. Commands to reproduce the results in Fig. 3

Could you provide the exact commands to reproduce the results in Fig. 3?
For Strogatz datasets with target noise = 0.0, I think the following command was used
python analyze.py -results ../results_sym_data -target_noise 0.0 "/path/to/pmlb/datasets/strogatz*" -sym_data -n_trials 10 -time_limit 9:00 -tuned

but what about the Feynman datasets?
Also, how should we determine -time_limit?

5. Computing resource and estimated runtime

To estimate how long it will take to reproduce the results in Fig. 3, could you share the details of the computing resources used in the paper, e.g., how many machines with 24-28 core Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz chipsets and 250 GB of RAM were used, and the (rough) estimated runtime to get the results, if you remember?

On a machine with a 4-core CPU, 128 GB RAM and 2 GPUs, even strogatz_bacres1 (400 samples) is taking more than a day to complete python analyze.py -results ../results_sym_data -target_noise 0.0 /path/to/pmlb/datasets/strogatz_bacres1/strogatz_bacres1.tsv.gz -sym_data -n_trials 10 -time_limit 9:00 -tuned --local

Sorry for the many questions, but your responses would be greatly appreciated and helpful for using this great work in my research.

Thank you!

Feature renaming issue

Hi,

I believe you have an issue at this line:

for i,f in enumerate(features):

Here are the steps to reproduce it:

instance: feynman_I_11_19
candidate model: x0*x3 + x1*x4 + x2*x5
true model: x1*y1+x2*y2+x3*y3
yaml features order: x1, x2, x3, y1, y2, y3 == x0, x1, x2, x3, x4, x5

replacing feature 0 with x1
x1*x3+x1*x4+x2*x5

replacing feature 1 with x2
x2*x3+x2*x4+x2*x5

replacing feature 2 with x3
x3*x3+x3*x4+x3*x5

replacing feature 3 with y1
y1*y1+y1*x4+y1*x5

replacing feature 4 with y2
y1*y1+y1*y2+y1*x5

replacing feature 5 with y3
y1*y1+y1*y2+y1*y3

So, the candidate model x0*x3 + x1*x4 + x2*x5 is renamed to y1*y1+y1*y2+y1*y3, which is obviously not the same thing.

A possible fix would be to perform the renaming backwards:
for i,f in enumerate(features): --> for i,f in reversed(list(enumerate(features))):

Now, the steps will be:
replacing feature 5 with y3
x0*x3+x1*x4+x2*y3

replacing feature 4 with y2
x0*x3+x1*y2+x2*y3

replacing feature 3 with y1
x0*y1+x1*y2+x2*y3

replacing feature 2 with x3
x0*y1+x1*y2+x3*y3

replacing feature 1 with x2
x0*y1+x2*y2+x3*y3

replacing feature 0 with x1
x1*y1+x2*y2+x3*y3
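
As an alternative sketch (using sympy rather than string replacement), the renaming can also be done simultaneously, so that overlapping old and new names cannot clobber each other regardless of iteration order:

import sympy

features = ["x1", "x2", "x3", "y1", "y2", "y3"]   # the yaml feature order
candidate = sympy.sympify("x0*x3 + x1*x4 + x2*x5")

# map placeholder names x0..x5 to the dataset's feature names in one shot
mapping = {sympy.Symbol(f"x{i}"): sympy.Symbol(name)
           for i, name in enumerate(features)}
renamed = candidate.subs(mapping, simultaneous=True)
print(renamed)   # x1*y1 + x2*y2 + x3*y3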

Best regards,
Aleksandar

Is it appropriate to use halving grid search as a hyper-parameter tuning strategy?

As for the default hyper-parameter tuning strategy, I find that this benchmark uses the halving grid search method. To begin, I admit that a traditional grid search is impractical due to the prohibitively expensive computational cost. However, when we use halving grid search and the parameter grid is large, as in the case of XGBoost, the first few rounds of the hyper-parameter search only train on a few data points, and their results might be unreliable. So, is this truly a good method for hyper-parameter tuning, and is this tuning protocol sufficient to persuade reviewers?
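
One possible mitigation for that concern (a sketch with illustrative values, not the benchmark's configuration) is to raise min_resources so that even the first halving round sees a reasonable number of training samples:

from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingGridSearchCV
from sklearn.ensemble import GradientBoostingRegressor

search = HalvingGridSearchCV(
    GradientBoostingRegressor(),
    param_grid={"max_depth": [2, 4, 6], "learning_rate": [0.01, 0.1]},
    resource="n_samples",   # samples are the budgeted resource (the default)
    min_resources=500,      # first round trains on at least 500 samples
    factor=2,
)
# search.fit(X, y) would then run the halving rounds as usual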

Operon fails to launch from "evaluate_model.py" on Docker image

Hi there,

Trying to use Operon with "evaluate_model.py" on the Docker image returns the error:

» python evaluate_model.py /data/eq1.tsv -ml OperonRegressor -seed 42 -skip_tuning
(...)
ImportError: libpython3.9.so.1.0: cannot open shared object file: No such file or directory

whereas, for example:
python evaluate_model.py /data/eq1.tsv -ml sembackpropgp -seed 42 -skip_tuning

does launch the regressor.

  • Any ideas?

PS: Operon also fails to launch on the "192_vineyard" dataset, where other methods do not.

Many thanks!

Invalid parameter alpha for estimator PySRRegressor(equations=0.0).

I encountered the following error while running the PySR Regressor. It appears that PySR has undergone some changes to its parameters, and the alpha value is no longer supported. @MilesCranmer, could you please take a look at this? Thank you!

  File "/opt/conda/envs/srbench/lib/python3.7/site-packages/joblib/parallel.py", line 819, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/opt/conda/envs/srbench/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  File "/opt/conda/envs/srbench/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 597, in __init__
    self.results = batch()
  File "/opt/conda/envs/srbench/lib/python3.7/site-packages/joblib/parallel.py", line 289, in __call__
    for func, args, kwargs in self.items]
  File "/opt/conda/envs/srbench/lib/python3.7/site-packages/joblib/parallel.py", line 289, in <listcomp>
    for func, args, kwargs in self.items]
  File "/opt/conda/envs/srbench/lib/python3.7/site-packages/sklearn/utils/fixes.py", line 216, in __call__
    return self.function(*args, **kwargs)
  File "/opt/conda/envs/srbench/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 668, in _fit_and_score
    estimator = estimator.set_params(**cloned_parameters)
  File "/opt/conda/envs/srbench/lib/python3.7/site-packages/sklearn/base.py", line 248, in set_params
    "with `estimator.get_params().keys()`." % (key, self)
ValueError: Invalid parameter alpha for estimator PySRRegressor(equations=0.0). Check the list of available parameters with `estimator.get_params().keys()`.

Explore nix as conda alternative

Hi,

Since this issue does not require any changes in this repo other than flake.nix, I took the liberty of working against the dev branch.

This allows nix to be used as an alternative to conda, with a bunch of advantages:

  • conda lacks sane management of transitive deps leading to conflicts
  • nix is far more robust and allows complete reproducibility
  • faster
  • easy to build the environment (just type nix develop)
  • easy to generate docker images on the fly
    docker run -p 8888:8888 -ti --rm docker.nix-community.org/nixpkgs/nix-flakes nix develop github:cavalab/srbench/dev --no-write-lock-file
    
  • flake-enabled frameworks like pyoperon pull their own dependencies automatically (no need to keep adding things to an environment file)
  • flake.lock files can fix versions/revisions

This is obviously a low priority issue right now, but I've been using it to deploy srbench/operon without conda.
My frustration with conda began with not being able to add gcc/gxx-11.2.0 to the environment.

This issue is meant to track integration of other frameworks with nix. So far I have also integrated FEAT and Ellyn (wip). Other frameworks should be easily integrated as long as they use standard packaging.

There are some aspects that will need attention from other authors:

  • FEAT:

    • seems to be incompatible with latest numpy (this usually gets fixed upstream)
      >>> import feat
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "/nix/store/h3gfxkz8a31l60qmpzj15ryq12nqsspm-python3.9-feat_ml-0.5.2/lib/python3.9/site-packages/feat/__init__.py", line 1, in <module>
          from .feat import Feat, FeatRegressor, FeatClassifier
        File "/nix/store/h3gfxkz8a31l60qmpzj15ryq12nqsspm-python3.9-feat_ml-0.5.2/lib/python3.9/site-packages/feat/feat.py", line 12, in <module>
          from .pyfeat import PyFeat
        File "feat/pyfeat.pyx", line 1, in init feat.pyfeat
      ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
      
    • the install script setup.py seems to be really tailored to conda environments; it took some tricks with setting environment variables to get it to work
  • Ellyn

    • setup.py is hardcoded for Conda and does not work with nix (I will try to patch it)

Best,
Bogdan

Support for a better result report

Right now, it seems that the reports generated by those report scripts are too crude. Some important indicators in the AutoML domain, such as significance tests and mean accuracy across all tasks, are not presented. In my research domain, there is a benchmark project named "srbench" (https://github.com/EpistasisLab/srbench) that provides a good example of presenting the above-mentioned metrics. Consequently, I hope this benchmark project can also implement similar features.

How to compare the ground-truth expressions with the sympy string

Hi,

I am currently implementing my SR algorithm for the competition. Everything is fine, but I still have a question about one competition detail, even though I have read the Competition Guidelines.

Say a ground-truth model is -1.6+x**2 and my model produces the string log(|-0.2|)+x**2. According to the "regressor guide" in the Competition Guidelines, I should convert this string to a sympy-compatible string, here by removing '|' since sympy doesn't recognize it (as done in "submission/feat-example/regressor.py"):
sympy_str = est.model_.replace("|", "")

When I do this, the model string is converted to log(-0.2)+x**2, which is definitely not the same expression as my model: if I run f = sympy.simplify("log(-0.2)+x**2"), the result is "x**2 - 1.6094379124341 + I*pi".

So this result is totally different from the ground-truth model -1.6+x**2. My question is: will "simplify" be used for model comparison during the competition? If so, how should I deal with this problem?
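
A sketch of one workaround (assuming the '|...|' syntax only ever wraps a single, non-nested sub-expression): translate the bars into sympy's Abs() instead of deleting them, so the value of the model is preserved and the simplified form stays real.

import re
import sympy

model_str = "log(|-0.2|)+x**2"
# rewrite |expr| as Abs(expr); assumes no nested '|' pairs
sympy_str = re.sub(r"\|([^|]+)\|", r"Abs(\1)", model_str)

print(sympy_str)                  # log(Abs(-0.2))+x**2
print(sympy.simplify(sympy_str))  # x**2 - 1.6094379124341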

Bingo: `bingo.bingocpp.AGraph` has no attribute `_update`

@nightdr your submission didn't pass the tests on merge. Can you figure out what is going on?

evaluate_model.py:146: in evaluate_model
    results['symbolic_model'] = model(est, X_train_scaled)
methods/Bingo/regressor.py:68: in model
    model_str = str(est.get_best_individual())
/usr/share/miniconda3/envs/srcomp-Bingo/lib/python3.9/site-packages/bingo/symbolic_regression/symbolic_regressor.py:225: in get_best_individual
    return self.best_estimator_.get_best_individual()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = SymbolicRegressor(clo_threshold=1e-05, crossover_prob=0.3, max_time=350.0,
                  mutation_prob=0.45,
     ...', 'exp', 'log',
                             'sqrt'],
                  population_size=2500, use_simplification=True)

    def get_best_individual(self):
        if self.best_ind is None:
            print("Best individual is None, setting to X_0")
            from bingo.symbolic_regression import AGraph
            self.best_ind = AGraph()
            self.best_ind.command_array = np.array([[0, 0, 0]], dtype=int)  # X_0
>           self.best_ind._update()
E           AttributeError: 'bingo.bingocpp.AGraph' object has no attribute '_update'

/usr/share/miniconda3/envs/srcomp-Bingo/lib/python3.9/site-packages/bingo/symbolic_regression/symbolic_regressor.py:153: AttributeError

link:
https://github.com/cavalab/srbench/runs/6446882859?check_suite_focus=true#step:9:111

Originally posted by @lacava in #123 (comment)
