folivetti / itea
Interaction-Transformation Evolutionary Algorithm
License: GNU General Public License v3.0
Hello,
In the update to srbench, all methods need to return a sympy-compatible model string. I'm having trouble parsing the one that ITEA currently returns in srbench, which is formatted as a Python expression containing, e.g., np.tanh(x[:,0]). Could you clarify how to return sympy-compatible models?
Here's a test log: https://github.com/cavalab/srbench/actions/runs/5884085024/job/15958102513
The model() function here needs to be updated: https://github.com/cavalab/srbench/blob/separate-envs/algorithms/itea/regressor.py
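A conversion of this kind could be done by rewriting the numpy-style string before handing it to sympy. The sketch below is an illustrative assumption about how model() might be adapted, not ITEA's actual code; the helper name np_to_sympy is made up for this example.

```python
import re
import sympy

def np_to_sympy(model_str):
    """Convert a numpy-style expression string such as
    'np.tanh(x[:,0]) + 2.0*x[:,1]' into a sympy expression.
    Illustrative sketch, not ITEA's actual model() implementation."""
    # Rewrite column indexing x[:,i] into a plain symbol name x_i
    s = re.sub(r"x\[:,\s*(\d+)\]", r"x_\1", model_str)
    # Drop the 'np.' prefix; sympify resolves bare names like tanh, log, exp
    s = s.replace("np.", "")
    return sympy.sympify(s)

expr = np_to_sympy("np.tanh(x[:,0]) + 2.0*x[:,1]")
```

Sympify recognizes the common transformation names (tanh, log, exp, sqrt, sin, cos), so only the indexing syntax and the np. prefix need rewriting.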
We need the derivatives information to perform the Marginal Effects analysis.
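Once a sympy-compatible expression is available, the derivative information for a marginal-effects analysis can come from symbolic differentiation. A minimal sketch (the model below is just an example expression, not an actual ITEA output):

```python
import sympy

x0, x1 = sympy.symbols("x_0 x_1")
model = sympy.tanh(x0) + 2.0 * x1  # example expression only

# Marginal effect of each input = partial derivative of the model w.r.t. it
marginal_effects = {v: sympy.diff(model, v) for v in (x0, x1)}
# d/dx_0 tanh(x_0) = 1 - tanh(x_0)**2, and d/dx_1 (2.0*x_1) = 2.0
```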
In some situations ITEA may overfit the training data. In the latest version I have already implemented a training-validation split to alleviate this problem, but it is not enough in some cases.
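The split itself is the standard hold-out idea; a minimal numpy sketch of it (illustrative only, not ITEA's internal Haskell implementation) would be:

```python
import numpy as np

def train_val_split(X, y, val_fraction=0.25, seed=42):
    """Shuffle the rows and hold out a validation fraction.
    Illustrative sketch of the training-validation split idea."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_val = int(len(X) * val_fraction)
    val, train = idx[:n_val], idx[n_val:]
    return X[train], y[train], X[val], y[val]

X = np.arange(40, dtype=float).reshape(20, 2)
y = np.arange(20, dtype=float)
X_tr, y_tr, X_val, y_val = train_val_split(X, y)
```

Model selection on the validation part then guards against fitting noise in the training part, though as noted above a single split is not always enough.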
TODO:
Given an IT expression, parse it into Python and R scripts. This can be helpful for post-processing.
In some situations ITEA may create terms that are linearly related. There are several ways to measure and detect this; it would be interesting to test them on IT expressions in order to further reduce them.
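For the Python/R export, each IT term has the shape w * f(prod_j x_j^k_j), so rendering one term is mostly string assembly. The sketch below uses a hypothetical (weight, strengths, func) tuple as the in-memory representation — this is an assumption for illustration, not ITEA's internal data type:

```python
def it_term_to_python(weight, strengths, func):
    """Render one IT term, w * f(prod_j x_j**k_j), as a Python expression.
    (weight, strengths, func) is a hypothetical representation."""
    interaction = " * ".join(
        f"x[{j}]**{k}" for j, k in enumerate(strengths) if k != 0
    )
    return f"{weight} * np.{func}({interaction})"

def it_term_to_r(weight, strengths, func):
    """Same term rendered as an R expression (1-based indexing, ^ power)."""
    interaction = " * ".join(
        f"x[{j + 1}]^{k}" for j, k in enumerate(strengths) if k != 0
    )
    return f"{weight} * {func}({interaction})"

py_src = it_term_to_python(1.5, [2, 0, 1], "tanh")
r_src = it_term_to_r(1.5, [2, 0, 1], "tanh")
```

A full expression would then be the "+"-join of its rendered terms plus the intercept.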
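One simple way to detect linearly related terms — an assumption about the approach, not something ITEA currently implements — is to evaluate each term on the training data and inspect the pairwise correlations of the resulting columns:

```python
import numpy as np

def linearly_related_pairs(term_outputs, threshold=0.999):
    """term_outputs: (n_samples, n_terms) matrix, one column per IT term
    evaluated on the training data. Returns index pairs whose outputs are
    (almost) perfectly correlated, i.e. redundant terms."""
    corr = np.corrcoef(term_outputs, rowvar=False)
    n = corr.shape[0]
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(corr[i, j]) >= threshold]

rng = np.random.default_rng(0)
t0 = rng.normal(size=100)
# Second column is an affine function of the first -> redundant pair (0, 1)
terms = np.column_stack([t0, 3.0 * t0 + 1.0, rng.normal(size=100)])
redundant = linearly_related_pairs(terms)
```

Dropping one term from each flagged pair would shrink the expression without changing the function class it can represent.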
Hey, I'm still trying to install ITEA without sudo. I'm trying to use the conda stack package, although I get this error whether I get stack from there or from haskellstack.org. I updated the install script to the following:
git clone https://github.com/folivetti/ITEA.git
cd ITEA
stack build --allow-different-user
Without the "--allow-different-user" flag, I get a build permissions error. But even with it included, I run into a similar error when running the tests. Let me know if you have any ideas. Full output below:
============================================================== test session starts ===============================================================
platform linux -- Python 3.7.10, pytest-6.2.3, py-1.10.0, pluggy-0.13.1
rootdir: /media/bill/Drive/projects/symbolic-regression/analysis
collected 1 item
test_evaluate_model.py F [100%]
==================================================================== FAILURES ====================================================================
_______________________________________________________ test_evaluate_model[ITEARegressor] _______________________________________________________
ml = 'ITEARegressor'
@pytest.mark.parametrize("ml", MLs)
def test_evaluate_model(ml):
print('running test_evaluate_model with ml=',ml)
dataset = 'test/192_vineyard_small.tsv.gz'
results_path = 'tmp_results'
random_state = 42
algorithm = importlib.__import__('methods.'+ml,globals(),
locals(),
['est','hyper_params','complexity'])
print('algorithm imported:',algorithm)
evaluate_model(dataset,
results_path,
random_state,
ml,
algorithm.est,
algorithm.hyper_params,
algorithm.complexity,
algorithm.model,
> test=True # testing
)
test_evaluate_model.py:38:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
evaluate_model.py:107: in evaluate_model
grid_est.fit(X_train_scaled, y_train_scaled)
/home/bill/anaconda3/envs/srbench/lib/python3.7/site-packages/sklearn/model_selection/_search_successive_halving.py:213: in fit
super().fit(X, y=y, groups=None, **fit_params)
/home/bill/anaconda3/envs/srbench/lib/python3.7/site-packages/sklearn/utils/validation.py:63: in inner_f
return f(*args, **kwargs)
/home/bill/anaconda3/envs/srbench/lib/python3.7/site-packages/sklearn/model_selection/_search.py:880: in fit
self.best_estimator_.fit(X, y, **fit_params)
methods/src/ITEA/itea.py:111: in fit
df = pd.read_csv(f"{logname}/exprs.csv")
/home/bill/anaconda3/envs/srbench/lib/python3.7/site-packages/pandas/io/parsers.py:610: in read_csv
return _read(filepath_or_buffer, kwds)
/home/bill/anaconda3/envs/srbench/lib/python3.7/site-packages/pandas/io/parsers.py:462: in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
/home/bill/anaconda3/envs/srbench/lib/python3.7/site-packages/pandas/io/parsers.py:819: in __init__
self._engine = self._make_engine(self.engine)
/home/bill/anaconda3/envs/srbench/lib/python3.7/site-packages/pandas/io/parsers.py:1050: in _make_engine
return mapping[engine](self.f, **self.options) # type: ignore[call-arg]
/home/bill/anaconda3/envs/srbench/lib/python3.7/site-packages/pandas/io/parsers.py:1867: in __init__
self._open_handles(src, kwds)
/home/bill/anaconda3/envs/srbench/lib/python3.7/site-packages/pandas/io/parsers.py:1368: in _open_handles
storage_options=kwds.get("storage_options", None),
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
path_or_buf = '/tmp/tmpfqo_guee/tmp/exprs.csv', mode = 'r', encoding = 'utf-8', compression = None, memory_map = False, is_text = True
errors = 'replace', storage_options = None
def get_handle(
path_or_buf: FilePathOrBuffer,
mode: str,
encoding: Optional[str] = None,
compression: CompressionOptions = None,
memory_map: bool = False,
is_text: bool = True,
errors: Optional[str] = None,
storage_options: StorageOptions = None,
) -> IOHandles:
"""
Get file handle for given path/buffer and mode.
Parameters
----------
path_or_buf : str or file handle
File path or object.
mode : str
Mode to open path_or_buf with.
encoding : str or None
Encoding to use.
compression : str or dict, default None
If string, specifies compression mode. If dict, value at key 'method'
specifies compression mode. Compression mode must be one of {'infer',
'gzip', 'bz2', 'zip', 'xz', None}. If compression mode is 'infer'
and `filepath_or_buffer` is path-like, then detect compression from
the following extensions: '.gz', '.bz2', '.zip', or '.xz' (otherwise
no compression). If dict and compression mode is one of
{'zip', 'gzip', 'bz2'}, or inferred as one of the above,
other entries passed as additional compression options.
.. versionchanged:: 1.0.0
May now be a dict with key 'method' as compression mode
and other keys as compression options if compression
mode is 'zip'.
.. versionchanged:: 1.1.0
Passing compression options as keys in dict is now
supported for compression modes 'gzip' and 'bz2' as well as 'zip'.
memory_map : boolean, default False
See parsers._parser_params for more information.
is_text : boolean, default True
Whether the type of the content passed to the file/buffer is string or
bytes. This is not the same as `"b" not in mode`. If a string content is
passed to a binary file/buffer, a wrapper is inserted.
errors : str, default 'strict'
Specifies how encoding and decoding errors are to be handled.
See the errors argument for :func:`open` for a full list
of options.
storage_options: StorageOptions = None
Passed to _get_filepath_or_buffer
.. versionchanged:: 1.2.0
Returns the dataclass IOHandles
"""
# Windows does not default to utf-8. Set to utf-8 for a consistent behavior
encoding_passed, encoding = encoding, encoding or "utf-8"
# read_csv does not know whether the buffer is opened in binary/text mode
if _is_binary_mode(path_or_buf, mode) and "b" not in mode:
mode += "b"
# open URLs
ioargs = _get_filepath_or_buffer(
path_or_buf,
encoding=encoding,
compression=compression,
mode=mode,
storage_options=storage_options,
)
handle = ioargs.filepath_or_buffer
handles: List[Buffer]
# memory mapping needs to be the first step
handle, memory_map, handles = _maybe_memory_map(
handle, memory_map, ioargs.encoding, ioargs.mode, errors
)
is_path = isinstance(handle, str)
compression_args = dict(ioargs.compression)
compression = compression_args.pop("method")
if compression:
# compression libraries do not like an explicit text-mode
ioargs.mode = ioargs.mode.replace("t", "")
# GZ Compression
if compression == "gzip":
if is_path:
assert isinstance(handle, str)
handle = gzip.GzipFile(
filename=handle,
mode=ioargs.mode,
**compression_args,
)
else:
handle = gzip.GzipFile(
fileobj=handle, # type: ignore[arg-type]
mode=ioargs.mode,
**compression_args,
)
# BZ Compression
elif compression == "bz2":
handle = bz2.BZ2File(
handle, # type: ignore[arg-type]
mode=ioargs.mode,
**compression_args,
)
# ZIP Compression
elif compression == "zip":
handle = _BytesZipFile(handle, ioargs.mode, **compression_args)
if handle.mode == "r":
handles.append(handle)
zip_names = handle.namelist()
if len(zip_names) == 1:
handle = handle.open(zip_names.pop())
elif len(zip_names) == 0:
raise ValueError(f"Zero files found in ZIP file {path_or_buf}")
else:
raise ValueError(
"Multiple files found in ZIP file. "
f"Only one file per ZIP: {zip_names}"
)
# XZ Compression
elif compression == "xz":
handle = get_lzma_file(lzma)(handle, ioargs.mode)
# Unrecognized Compression
else:
msg = f"Unrecognized compression type: {compression}"
raise ValueError(msg)
assert not isinstance(handle, str)
handles.append(handle)
elif isinstance(handle, str):
# Check whether the filename is to be opened in binary mode.
# Binary mode does not support 'encoding' and 'newline'.
if ioargs.encoding and "b" not in ioargs.mode:
if errors is None and encoding_passed is None:
# ignore errors when no encoding is specified
errors = "replace"
# Encoding
handle = open(
handle,
ioargs.mode,
encoding=ioargs.encoding,
errors=errors,
> newline="",
)
E FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpfqo_guee/tmp/exprs.csv'
/home/bill/anaconda3/envs/srbench/lib/python3.7/site-packages/pandas/io/common.py:647: FileNotFoundError
-------------------------------------------------------------- Captured stdout call --------------------------------------------------------------
running test_evaluate_model with ml= ITEARegressor
algorithm imported: <module 'methods.ITEARegressor' from '/media/bill/Drive/projects/symbolic-regression/analysis/experiment/methods/ITEARegressor.py'>
========================================
Evaluating ITEARegressor on
test/192_vineyard_small.tsv.gz
========================================
compression: gzip
filename: test/192_vineyard_small.tsv.gz
scaling X
scaling y
X_train: (15, 3)
y_train: (15,)
test mode enabled
hyper_params set to {}
n_iterations: 1
n_required_iterations: 1
n_possible_iterations: 1
min_resources_: 15
max_resources_: 15
aggressive_elimination: False
factor: 3
----------
iter: 0
n_candidates: 1
n_resources: 15
Fitting 2 folds for each of 1 candidates, totalling 2 fits
[CV] END .................................................... total time= 0.0s
[CV] END .................................................... total time= 0.0s
-------------------------------------------------------------- Captured stderr call --------------------------------------------------------------
You are not the owner of '/media/bill/Drive/projects/symbolic-regression/analysis/experiment/methods/src/ITEA/.stack-work/'. Aborting to protect file permissions.
Retry with '--allow-different-user' to disable this precaution.
You are not the owner of '/media/bill/Drive/projects/symbolic-regression/analysis/experiment/methods/src/ITEA/.stack-work/'. Aborting to protect file permissions.
Retry with '--allow-different-user' to disable this precaution.
You are not the owner of '/media/bill/Drive/projects/symbolic-regression/analysis/experiment/methods/src/ITEA/.stack-work/'. Aborting to protect file permissions.
Retry with '--allow-different-user' to disable this precaution.
============================================================ short test summary info =============================================================
FAILED test_evaluate_model.py::test_evaluate_model[ITEARegressor] - FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpfqo_guee/t...
=============================================================== 1 failed in 1.08s ===============================================================
Create some post-analysis plots for the obtained results. Maybe using hvega for interactive plots.
Multithreading is broken in the current version. It may be a good idea to apply parallel evaluation during the genReport function call.
The idea is that, since the generation is a lazily evaluated list (it is an infinite list of populations), the evaluation would be forced during the search for the best individual throughout the evolution process.
Another possibility is to implement the parallel evaluation during the tournament selection.
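ITEA itself is Haskell, where forcing the lazily generated populations in parallel would correspond to a parallel map (e.g. parMap from Control.Parallel.Strategies) over each population. The idea can be sketched language-agnostically; the Python version below uses threads for simplicity (a CPU-bound fitness would use processes instead), and the fitness function is a stand-in, not ITEA's:

```python
from concurrent.futures import ThreadPoolExecutor

def fitness(individual):
    """Stand-in fitness: sum of squares. A real fitness would evaluate
    the IT expression against the training data."""
    return sum(v * v for v in individual)

def evaluate_in_parallel(population, workers=4):
    """Force evaluation of the whole population before tournament
    selection, spreading the fitness calls across workers.
    Illustrative sketch only."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fitness, population))

population = [[1.0, 2.0], [0.5, 0.5], [3.0, 0.0]]
scores = evaluate_in_parallel(population)
```

Evaluating eagerly before selection keeps the tournament itself sequential and cheap, which matches the second option described above.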