folivetti / itea


Interaction-Transformation Evolutionary Algorithm

License: GNU General Public License v3.0

Haskell 79.76% Python 17.74% Shell 1.28% Nix 1.23%

itea's Issues

return sympy compatible model

hello,

In the update to srbench, all methods need to return a sympy compatible model string. I'm having trouble parsing the one that ITEA in srbench currently returns, which is formatted as a Python expression with, e.g. np.tanh(x[:,0]) in it. Could you clarify how to return sympy models?

Here's a test log: https://github.com/cavalab/srbench/actions/runs/5884085024/job/15958102513

the model() function here needs to be updated: https://github.com/cavalab/srbench/blob/separate-envs/algorithms/itea/regressor.py
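One possible approach, sketched below, is to rewrite the NumPy-style string into plain names and hand it to sympy. This is only an illustration, not the fix adopted in srbench: the helper name numpy_expr_to_sympy and the x_0, x_1 variable naming are assumptions, and NumPy functions whose names differ from sympy's (for example np.arcsin versus asin) would need an extra mapping.

```python
import re

import sympy as sp


def numpy_expr_to_sympy(expr: str) -> sp.Expr:
    """Convert a NumPy-style expression string into a sympy expression.

    Hypothetical helper: assumes variables appear as x[:,i] and functions
    are prefixed with "np." and share their name with sympy.
    """
    # Column accesses such as x[:,0] become plain symbols such as x_0.
    expr = re.sub(r"x\[:\s*,\s*(\d+)\]", r"x_\1", expr)
    # Drop the "np." prefix so that np.tanh becomes tanh.
    expr = re.sub(r"\bnp\.", "", expr)
    return sp.sympify(expr)


model = numpy_expr_to_sympy("0.5*np.tanh(x[:,0]) + 2.0*x[:,1]**2")
print(model)
```

The returned object is an ordinary sympy expression, so str(model) gives a string that sympy can parse back.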

Calculate and store the derivatives of the final expression

We need the derivative information to perform the marginal effects analysis.
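As a rough illustration of what storing the derivatives could look like on the Python side, the sketch below differentiates a sympy expression with respect to each variable and compiles the result into a numeric function. The expression and the symbol names are made up for the example; ITEA itself is written in Haskell and may compute derivatives differently.

```python
import sympy as sp

# Example expression (an assumption, not an actual ITEA result).
x_0, x_1 = sp.symbols("x_0 x_1")
expr = 0.5 * sp.tanh(x_0) + 2.0 * x_0 * x_1**2

# Partial derivative of the expression with respect to each variable.
gradient = {v: sp.diff(expr, v) for v in (x_0, x_1)}

# Numeric function returning [d expr / d x_0, d expr / d x_1].
grad_fn = sp.lambdify((x_0, x_1), [gradient[x_0], gradient[x_1]], "numpy")

# Derivatives evaluated at a single point, e.g. the mean of the data.
print(grad_fn(0.0, 1.0))
```

Evaluating grad_fn at the sample mean, or averaging it over all samples, gives the two usual flavours of marginal effects.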

Handling overfit

In some situations ITEA may overfit the training data. In the last version, I've already implemented a training-validation split to alleviate this problem, but it is not enough in some cases.

TODO:

penalize model size (implemented but not tested)
add regularization to the fitting process
automatically increase the expression size limit and exponent range until we observe only small improvements
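For the regularization item, one common option is ridge (L2) regression on the matrix of evaluated terms. The sketch below is a generic closed-form ridge fit in NumPy, assuming the expression is linear in its weights; the function name ridge_fit, the matrix Z and the alpha parameter are illustrative and not taken from the ITEA code base. It also ignores the intercept, which is usually left unpenalized.

```python
import numpy as np


def ridge_fit(Z: np.ndarray, y: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Solve (Z^T Z + alpha * I) w = Z^T y for the weights w.

    Z is an (n_samples, n_terms) matrix whose columns hold the value of
    each term of the expression; alpha = 0 gives ordinary least squares.
    """
    n_terms = Z.shape[1]
    A = Z.T @ Z + alpha * np.eye(n_terms)
    return np.linalg.solve(A, Z.T @ y)


# Synthetic data to show the shrinkage effect.
rng = np.random.default_rng(0)
Z = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = Z @ w_true

w_ols = ridge_fit(Z, y, alpha=0.0)
w_ridge = ridge_fit(Z, y, alpha=10.0)
print(np.linalg.norm(w_ols), np.linalg.norm(w_ridge))
```

Larger values of alpha shrink the weights toward zero, trading some training accuracy for smoother models.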

create a parser of IT-expressions to Python

Given an IT expression, translate it into Python and R scripts. This can be helpful for post-processing.
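A minimal sketch of the Python side of such a translator follows. It assumes an IT expression can be represented as a bias plus a list of (weight, transformation name, exponents) triples, where each term is the transformation applied to a product of variables raised to the given exponents. This representation, the "id" name for the identity transformation and both function names are assumptions made for the example, not the format used by ITEA.

```python
from typing import List, Tuple

import numpy as np

# (weight, transformation name, one exponent per variable) -- assumed format.
Term = Tuple[float, str, List[int]]


def interaction_to_python(exponents: List[int]) -> str:
    """Render the product of x[:,i]**k_i, skipping zero exponents."""
    factors = []
    for i, k in enumerate(exponents):
        if k == 0:
            continue
        factors.append(f"x[:,{i}]" if k == 1 else f"x[:,{i}]**({k})")
    return " * ".join(factors) if factors else "1.0"


def it_expr_to_python(bias: float, terms: List[Term]) -> str:
    """Render bias + sum of weight * f(interaction) as a NumPy expression."""
    parts = [repr(float(bias))]
    for weight, func, exponents in terms:
        inner = interaction_to_python(exponents)
        body = inner if func == "id" else f"np.{func}({inner})"
        parts.append(f"{float(weight)!r}*({body})")
    return " + ".join(parts)


code = it_expr_to_python(0.5, [(2.0, "tanh", [1, 0]), (-1.5, "id", [2, -1])])
print(code)

# Evaluate the generated string; eval is only acceptable for trusted input.
x = np.array([[1.0, 2.0], [3.0, 4.0]])
y = eval(code, {"np": np, "x": x})
```

An R back end could reuse the same traversal and only change how variables and function calls are rendered.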

Implement mechanisms to handle multicollinearity

In some situations ITEA may create terms that are linearly related. There are several ways to measure and detect this. It would be interesting to test them on IT expressions in order to simplify the expressions further.
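One simple detection method is to look at pairwise correlations between the evaluated terms. The sketch below flags pairs of columns whose absolute correlation exceeds a threshold; the function name collinear_pairs, the matrix Z and the 0.99 threshold are illustrative choices. Pairwise correlation misses linear relations that involve three or more terms, and constant columns produce NaN correlations that are never flagged.

```python
from typing import List, Tuple

import numpy as np


def collinear_pairs(Z: np.ndarray, threshold: float = 0.99) -> List[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, of strongly correlated columns.

    Z is an (n_samples, n_terms) matrix with one column per term.
    """
    corr = np.corrcoef(Z, rowvar=False)
    n = corr.shape[0]
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if abs(corr[i, j]) >= threshold
    ]


# Synthetic example: the third column is an affine function of the first.
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 2))
Z = np.column_stack([base[:, 0], base[:, 1], 3.0 * base[:, 0] + 1.0])
print(collinear_pairs(Z))
```

For relations spanning several terms, the condition number of Z (np.linalg.cond) or variance inflation factors are common alternatives.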

permissions error

Hey, I'm still trying to install ITEA without sudo. I'm trying to use the conda stack package, although I get this error whether I get stack from there or from haskellstack.org. I updated the install script to the following:

git clone https://github.com/folivetti/ITEA.git

cd ITEA

stack build --allow-different-user

Without the "--allow-different-user" flag, I get a build permissions error. But even with it included, I run into a similar error when running the tests. Let me know if you have any ideas. Full output below:

============================================================== test session starts ===============================================================
platform linux -- Python 3.7.10, pytest-6.2.3, py-1.10.0, pluggy-0.13.1
rootdir: /media/bill/Drive/projects/symbolic-regression/analysis
collected 1 item                                                                                                                                 

test_evaluate_model.py F                                                                                                                   [100%]

==================================================================== FAILURES ====================================================================
_______________________________________________________ test_evaluate_model[ITEARegressor] _______________________________________________________

ml = 'ITEARegressor'

    @pytest.mark.parametrize("ml", MLs)
    def test_evaluate_model(ml):
        print('running test_evaluate_model with ml=',ml)
        dataset = 'test/192_vineyard_small.tsv.gz'
        results_path = 'tmp_results'
        random_state = 42
    
        algorithm = importlib.__import__('methods.'+ml,globals(),
                                         locals(),
                                       ['est','hyper_params','complexity'])
    
        print('algorithm imported:',algorithm)
        evaluate_model(dataset,
                       results_path,
                       random_state,
                       ml,
                       algorithm.est,
                       algorithm.hyper_params,
                       algorithm.complexity,
                       algorithm.model,
>                      test=True # testing
                      )

test_evaluate_model.py:38: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
evaluate_model.py:107: in evaluate_model
    grid_est.fit(X_train_scaled, y_train_scaled)
/home/bill/anaconda3/envs/srbench/lib/python3.7/site-packages/sklearn/model_selection/_search_successive_halving.py:213: in fit
    super().fit(X, y=y, groups=None, **fit_params)
/home/bill/anaconda3/envs/srbench/lib/python3.7/site-packages/sklearn/utils/validation.py:63: in inner_f
    return f(*args, **kwargs)
/home/bill/anaconda3/envs/srbench/lib/python3.7/site-packages/sklearn/model_selection/_search.py:880: in fit
    self.best_estimator_.fit(X, y, **fit_params)
methods/src/ITEA/itea.py:111: in fit
    df = pd.read_csv(f"{logname}/exprs.csv")
/home/bill/anaconda3/envs/srbench/lib/python3.7/site-packages/pandas/io/parsers.py:610: in read_csv
    return _read(filepath_or_buffer, kwds)
/home/bill/anaconda3/envs/srbench/lib/python3.7/site-packages/pandas/io/parsers.py:462: in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
/home/bill/anaconda3/envs/srbench/lib/python3.7/site-packages/pandas/io/parsers.py:819: in __init__
    self._engine = self._make_engine(self.engine)
/home/bill/anaconda3/envs/srbench/lib/python3.7/site-packages/pandas/io/parsers.py:1050: in _make_engine
    return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
/home/bill/anaconda3/envs/srbench/lib/python3.7/site-packages/pandas/io/parsers.py:1867: in __init__
    self._open_handles(src, kwds)
/home/bill/anaconda3/envs/srbench/lib/python3.7/site-packages/pandas/io/parsers.py:1368: in _open_handles
    storage_options=kwds.get("storage_options", None),
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

path_or_buf = '/tmp/tmpfqo_guee/tmp/exprs.csv', mode = 'r', encoding = 'utf-8', compression = None, memory_map = False, is_text = True
errors = 'replace', storage_options = None

    def get_handle(
        path_or_buf: FilePathOrBuffer,
        mode: str,
        encoding: Optional[str] = None,
        compression: CompressionOptions = None,
        memory_map: bool = False,
        is_text: bool = True,
        errors: Optional[str] = None,
        storage_options: StorageOptions = None,
    ) -> IOHandles:
        """
        Get file handle for given path/buffer and mode.
    
        Parameters
        ----------
        path_or_buf : str or file handle
            File path or object.
        mode : str
            Mode to open path_or_buf with.
        encoding : str or None
            Encoding to use.
        compression : str or dict, default None
            If string, specifies compression mode. If dict, value at key 'method'
            specifies compression mode. Compression mode must be one of {'infer',
            'gzip', 'bz2', 'zip', 'xz', None}. If compression mode is 'infer'
            and `filepath_or_buffer` is path-like, then detect compression from
            the following extensions: '.gz', '.bz2', '.zip', or '.xz' (otherwise
            no compression). If dict and compression mode is one of
            {'zip', 'gzip', 'bz2'}, or inferred as one of the above,
            other entries passed as additional compression options.
    
            .. versionchanged:: 1.0.0
    
               May now be a dict with key 'method' as compression mode
               and other keys as compression options if compression
               mode is 'zip'.
    
            .. versionchanged:: 1.1.0
    
               Passing compression options as keys in dict is now
               supported for compression modes 'gzip' and 'bz2' as well as 'zip'.
    
        memory_map : boolean, default False
            See parsers._parser_params for more information.
        is_text : boolean, default True
            Whether the type of the content passed to the file/buffer is string or
            bytes. This is not the same as `"b" not in mode`. If a string content is
            passed to a binary file/buffer, a wrapper is inserted.
        errors : str, default 'strict'
            Specifies how encoding and decoding errors are to be handled.
            See the errors argument for :func:`open` for a full list
            of options.
        storage_options: StorageOptions = None
            Passed to _get_filepath_or_buffer
    
        .. versionchanged:: 1.2.0
    
        Returns the dataclass IOHandles
        """
        # Windows does not default to utf-8. Set to utf-8 for a consistent behavior
        encoding_passed, encoding = encoding, encoding or "utf-8"
    
        # read_csv does not know whether the buffer is opened in binary/text mode
        if _is_binary_mode(path_or_buf, mode) and "b" not in mode:
            mode += "b"
    
        # open URLs
        ioargs = _get_filepath_or_buffer(
            path_or_buf,
            encoding=encoding,
            compression=compression,
            mode=mode,
            storage_options=storage_options,
        )
    
        handle = ioargs.filepath_or_buffer
        handles: List[Buffer]
    
        # memory mapping needs to be the first step
        handle, memory_map, handles = _maybe_memory_map(
            handle, memory_map, ioargs.encoding, ioargs.mode, errors
        )
    
        is_path = isinstance(handle, str)
        compression_args = dict(ioargs.compression)
        compression = compression_args.pop("method")
    
        if compression:
            # compression libraries do not like an explicit text-mode
            ioargs.mode = ioargs.mode.replace("t", "")
    
            # GZ Compression
            if compression == "gzip":
                if is_path:
                    assert isinstance(handle, str)
                    handle = gzip.GzipFile(
                        filename=handle,
                        mode=ioargs.mode,
                        **compression_args,
                    )
                else:
                    handle = gzip.GzipFile(
                        fileobj=handle,  # type: ignore[arg-type]
                        mode=ioargs.mode,
                        **compression_args,
                    )
    
            # BZ Compression
            elif compression == "bz2":
                handle = bz2.BZ2File(
                    handle,  # type: ignore[arg-type]
                    mode=ioargs.mode,
                    **compression_args,
                )
    
            # ZIP Compression
            elif compression == "zip":
                handle = _BytesZipFile(handle, ioargs.mode, **compression_args)
                if handle.mode == "r":
                    handles.append(handle)
                    zip_names = handle.namelist()
                    if len(zip_names) == 1:
                        handle = handle.open(zip_names.pop())
                    elif len(zip_names) == 0:
                        raise ValueError(f"Zero files found in ZIP file {path_or_buf}")
                    else:
                        raise ValueError(
                            "Multiple files found in ZIP file. "
                            f"Only one file per ZIP: {zip_names}"
                        )
    
            # XZ Compression
            elif compression == "xz":
                handle = get_lzma_file(lzma)(handle, ioargs.mode)
    
            # Unrecognized Compression
            else:
                msg = f"Unrecognized compression type: {compression}"
                raise ValueError(msg)
    
            assert not isinstance(handle, str)
            handles.append(handle)
    
        elif isinstance(handle, str):
            # Check whether the filename is to be opened in binary mode.
            # Binary mode does not support 'encoding' and 'newline'.
            if ioargs.encoding and "b" not in ioargs.mode:
                if errors is None and encoding_passed is None:
                    # ignore errors when no encoding is specified
                    errors = "replace"
                # Encoding
                handle = open(
                    handle,
                    ioargs.mode,
                    encoding=ioargs.encoding,
                    errors=errors,
>                   newline="",
                )
E               FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpfqo_guee/tmp/exprs.csv'

/home/bill/anaconda3/envs/srbench/lib/python3.7/site-packages/pandas/io/common.py:647: FileNotFoundError
-------------------------------------------------------------- Captured stdout call --------------------------------------------------------------
running test_evaluate_model with ml= ITEARegressor
algorithm imported: <module 'methods.ITEARegressor' from '/media/bill/Drive/projects/symbolic-regression/analysis/experiment/methods/ITEARegressor.py'>
========================================
Evaluating ITEARegressor on 
test/192_vineyard_small.tsv.gz
========================================
compression: gzip
filename: test/192_vineyard_small.tsv.gz
scaling X
scaling y
X_train: (15, 3)
y_train: (15,)
test mode enabled
hyper_params set to {}
n_iterations: 1
n_required_iterations: 1
n_possible_iterations: 1
min_resources_: 15
max_resources_: 15
aggressive_elimination: False
factor: 3
----------
iter: 0
n_candidates: 1
n_resources: 15
Fitting 2 folds for each of 1 candidates, totalling 2 fits
[CV] END .................................................... total time=   0.0s
[CV] END .................................................... total time=   0.0s
-------------------------------------------------------------- Captured stderr call --------------------------------------------------------------
You are not the owner of '/media/bill/Drive/projects/symbolic-regression/analysis/experiment/methods/src/ITEA/.stack-work/'. Aborting to protect file permissions.
Retry with '--allow-different-user' to disable this precaution.
You are not the owner of '/media/bill/Drive/projects/symbolic-regression/analysis/experiment/methods/src/ITEA/.stack-work/'. Aborting to protect file permissions.
Retry with '--allow-different-user' to disable this precaution.
You are not the owner of '/media/bill/Drive/projects/symbolic-regression/analysis/experiment/methods/src/ITEA/.stack-work/'. Aborting to protect file permissions.
Retry with '--allow-different-user' to disable this precaution.
============================================================ short test summary info =============================================================
FAILED test_evaluate_model.py::test_evaluate_model[ITEARegressor] - FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpfqo_guee/t...
=============================================================== 1 failed in 1.08s ===============================================================

Post-analysis plots

Create some post-analysis plots for the obtained results, possibly using hvega for interactive plots.

Fix multithread

Multithreading is broken in the current version. It may be a good idea to apply parallel evaluation during the genReport function call.

The idea is that, since the generation is a lazily evaluated list (it is an infinite list of populations), the evaluation would be forced during the search for the best individual throughout the evolution process.

Another possibility is to implement the parallel evaluation during the tournament selection.
