
PEtab Select

The repository for development of the PEtab extension for model selection, including the additional file formats and the Python 3 package.

Install

The Python 3 package provides both the Python 3 and command-line (CLI) interfaces, and can be installed from PyPI with `pip3 install petab-select`.

Documentation

Further documentation is available at http://petab-select.readthedocs.io/.

Examples

There are example Jupyter notebooks for usage of PEtab Select with

  • the command-line interface, and
  • the Python 3 interface,

in the doc/examples directory.

Supported features

Criterion

Model selection criteria used throughout this repository and its test cases include:

  • AIC (Akaike information criterion)
  • AICc (corrected AIC)
  • BIC (Bayesian information criterion)

Methods

Model selection methods include:

  • forward
  • backward
  • brute_force
  • FAMoS

Note that the directional methods (forward, backward) find models with the smallest step size (in terms of number of estimated parameters). For example, given the forward method and a predecessor model with 2 estimated parameters, if there are no models with 3 estimated parameters, but some models with 4 estimated parameters, then the search may return candidate models with 4 estimated parameters.

File formats

Column or key names that are surrounded by square brackets (e.g. [constraint_files]) are optional.

Selection problem

A YAML file with a description of the model selection problem.

format_version: [string]
criterion: [string]
method: [string]
model_space_files: [List of filenames]
[constraint_files]: [List of filenames]
[predecessor_model_files]: [List of filenames]
  • format_version: The version of the model selection extension format (e.g. 'beta_1')
  • criterion: The criterion by which models should be compared (e.g. 'AIC')
  • method: The method by which model candidates should be generated (e.g. 'forward')
  • model_space_files: The filenames of model space files.
  • constraint_files: The filenames of constraint files.
  • predecessor_model_files: The filenames of predecessor (initial) model files.
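
For example, a minimal selection problem file might look like the following sketch (the model space filename is hypothetical):

    format_version: beta_1
    criterion: AIC
    method: forward
    model_space_files:
      - model_space.tsv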

Model space

A TSV file with candidate models, in compressed or uncompressed format.

| model_subspace_id | petab_yaml | [sbml] | parameter_id_1 | ... | parameter_id_n |
|---|---|---|---|---|---|
| (Unique) [string] | [string] | [string] | [string/float] OR [; delimited list of string/float] | ... | [string/float] OR [; delimited list of string/float] |
  • model_subspace_id: An ID for the model subspace.
  • petab_yaml: The PEtab YAML filename that serves as the base for a model.
  • sbml: An SBML filename. If the PEtab YAML file specifies multiple SBML models, this can select a specific model by model filename.
  • parameter_id_1 ... parameter_id_n: Parameter IDs that are specified to take specific values or be estimated. Example valid values are:
    • uncompressed format:
      • 0.0
      • 1.0
      • estimate
    • compressed format:
      • 0.0;1.1;estimate (the parameter can take the values 0.0 or 1.1, or be estimated according to the PEtab problem)
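
As a sketch, a model space file with one compressed and one uncompressed subspace might look like the following (subspace IDs, the PEtab filename, and parameter IDs are hypothetical; columns are tab-separated):

    model_subspace_id    petab_yaml            k1        k2
    M_a                  petab_problem.yaml    0.0       0.0;estimate
    M_b                  petab_problem.yaml    estimate  estimate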

Constraints

A TSV file with constraints.

| petab_yaml | [if] | constraint |
|---|---|---|
| [string] | [SBML L3 Formula expression] | [SBML L3 Formula expression] |
  • petab_yaml: The filename of the PEtab YAML file that this constraint applies to.
  • if: As a single PEtab YAML file can relate to multiple models in the model space file, this expression ensures that the constraint is only applied to models that match it.
  • constraint: If a model violates this constraint, it is skipped during the model selection process and not optimized.
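
As a sketch (hypothetical filename and parameter IDs; columns are tab-separated), the following file skips models in which k1 exceeds k2:

    petab_yaml            constraint
    petab_problem.yaml    k1 <= k2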

Model(s) (Predecessor models / model interchange / report)

  • Predecessor models are used to initialize an appropriate model selection method. Model IDs should be unique here, and distinct from model IDs in any model space files.
  • Model interchange refers to the format used to transfer model information between PEtab Select and a PEtab-compatible calibration tool, during the model selection process.
  • Report refers to the final results of the model selection process, which may include calibration results from any calibrated models, or just the selected model.

Here, the format for a single model is shown. Multiple models can be specified as a YAML list of the same format.

The only required key is the PEtab YAML, as a model requires a PEtab problem. All other keys may be required for the different uses of the format (e.g., the report format should include estimated_parameters), or at different stages of the model selection process (e.g., the PEtab-compatible calibration tool should provide criteria for model comparison).

[criteria]: [Dictionary of criterion names and values]
[estimated_parameters]: [Dictionary of parameter IDs and values]
[model_hash]: [string]
[model_id]: [string]
[model_subspace_id]: [string]
[model_subspace_indices]: [List of indices]
[parameters]: [Dictionary of parameter IDs and values]
petab_yaml: [string]
[predecessor_model_hash]: [string]
[sbml]: [string]
  • criteria: At minimum, the value of the criterion by which model selection was performed; optionally, values for other criteria too.
  • estimated_parameters: Parameter estimates, not only of parameters specified to be estimated in a model space file, but also parameters specified to be estimated in the original PEtab problem of the model.
  • model_hash: The model hash, generated by the PEtab Select library.
  • model_id: The model ID.
  • model_subspace_id: Same as in the model space files.
  • model_subspace_indices: The indices that locate this model in its model subspace.
  • parameters: The parameters from the problem (either values or 'estimate') (a specific combination from a model space file, but uncalibrated).
  • petab_yaml: Same as in model space files.
  • predecessor_model_hash: The hash of the model that preceded this model during the model selection process.
  • sbml: Same as in model space files.
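
As a sketch, a report for a single calibrated model might look like the following (all IDs and values are hypothetical):

    criteria:
      AIC: 124.5
    estimated_parameters:
      k1: 0.2
    model_id: M_a_1
    parameters:
      k1: estimate
      k2: 0.0
    petab_yaml: petab_problem.yaml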

Test cases

Several test cases are provided to test the compatibility of a PEtab-compatible calibration tool with different PEtab Select features.

The test cases are available in the test_cases directory, and are provided in the model format described above.

| Test ID | Criterion | Method | Model space files | Compressed format | Constraints files | Predecessor (initial) models files |
|---|---|---|---|---|---|---|
| 0001 | (all) | (only one model) | 1 | | | |
| 0002¹ | AIC | forward | 1 | | | |
| 0003 | BIC | all | 1 | Yes | | |
| 0004 | AICc | backward | 1 | | 1 | |
| 0005 | AIC | forward | 1 | | | 1 |
| 0006 | AIC | forward | 1 | | | |
| 0007² | AIC | forward | 1 | | | |
| 0008² | AICc | backward | 1 | | | |
| 0009³ | AICc | FAMoS | 1 | Yes | | Yes |

1. Model M1_0 differs from M1_1 by three parameters, but only one additional estimated parameter. The effect of this on model selection criteria needs to be clarified. Test case 0006 is a duplicate of 0002 that doesn't have this issue.

2. The noise parameter is removed; noise is fixed to 1.

3. This is a computationally expensive problem to solve. Developers can try a model selection initialized with the provided predecessor model, which is a model start that reproducibly finds the expected model. To solve the problem reproducibly ab initio, on the order of 100 random model starts are required. This test case reproduces the model selection problem presented in https://doi.org/10.1016/j.cels.2016.01.002 .


Issues

Handle similar model criterion values

Models can have indistinguishable criterion values, such that choosing one over the other is determined by numerical noise.

For example, in one case during a forward search, two models with the same number of parameters were calibrated to the same likelihood to several decimal places. Repeating the forward search 100 times resulted in 2 different trajectories through model space, occurring approximately 50/50. This is because numerical noise determined which of the two similar models was chosen at this point during the forward search.

Options to handle:

  • emit a warning that it is unclear which model to select, when models are very similar
  • create branches in the model selection, when encountering models that are within some epsilon criterion of each other or the best model so far
  • allow users to restart model selection at specific points, to explore trajectories that might have been chosen given different numerical noise, or were within some epsilon criterion of the best model so far

User-friendly model hashes/IDs

A unique model hash/ID can be automatically generated from

  • model subspace ID
  • model subspace indices

Some helper function to convert such a hash/ID back into a model, given a PEtab Select problem, could then be written as well.
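
A minimal sketch of such a helper; the hash format, the model_subspaces mapping, and indices_to_model are assumptions here, not necessarily the current petab_select API:

    def model_from_hash(problem, model_hash: str):
        # Hypothetical hash format: '<model_subspace_id>-<index>-<index>-...'.
        model_subspace_id, *indices = model_hash.split('-')
        model_subspace = problem.model_space.model_subspaces[model_subspace_id]
        # Reconstruct the model from its location in its subspace
        # (`indices_to_model` is assumed).
        return model_subspace.indices_to_model([int(index) for index in indices])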

Check for plausible likelihoods in case of nested models

Maybe I missed it, but I don't think this is implemented yet:
It would be nice to have an option to automatically check whether, in the case of nested models and identical datasets, the supermodel has a likelihood at least as good as any of its submodels. If this is not the case, it is most likely due to non-converged optimizations, and this should be communicated to the user.
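
A minimal sketch of such a check, where `is_nested_in` is a hypothetical helper that decides nesting from the models' parameters:

    import warnings

    from petab_select import Criterion

    def check_nested_likelihoods(supermodel, submodels):
        # Warn if any nested submodel outperforms the supermodel.
        super_llh = supermodel.get_criterion(Criterion.LLH)
        for submodel in submodels:
            # `is_nested_in` is hypothetical: e.g., check that every
            # parameter estimated in the submodel is also estimated in
            # the supermodel.
            if not is_nested_in(submodel, supermodel):
                continue
            if submodel.get_criterion(Criterion.LLH) > super_llh:
                warnings.warn(
                    f'Submodel {submodel.model_id} has a better likelihood '
                    f'than its supermodel {supermodel.model_id}: the '
                    'optimizations may not have converged.'
                )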

sbmldocument leaking

Running a basic analysis like:

    problem = Problem.from_yaml(args.yaml_file_name)
    best = evaluate_problem(problem,
            temp_dir=args.output_dir,
            delete_temp_files=False)

    if args.output_yaml: 
        best.to_yaml(args.output_yaml)

where the `evaluate_problem` function is one of mine and does nothing with libsbml, and `Problem` is the petab_select one. This yields the warning message:

swig/python detected a memory leak of type 'SBMLDocument *', no destructor found.

issues with serialisation of `model.to_yaml`

Trying to compare some of my results with the expected ones in test_cases, I encountered an issue where the result files serialised by running to_yaml on the best model after the selection are rather cryptic YAML files. Where I was hoping for files with floating point numbers, as in the expected files, I receive files like:

criteria:
  AIC: !!python/object/apply:numpy.core.multiarray.scalar
  - &id001 !!python/object/apply:numpy.dtype
    args:
    - f8
    - false
    - true
    state: !!python/tuple
    - 3
    - <
    - null
    - null
    - null
    - -1
    - -1
    - 0
  - !!binary |
    bPioMii0tEA=
  AICc: !!python/object/apply:numpy.core.multiarray.scalar
  - *id001
  - !!binary |
    bPioMii1tEA=
  BIC: !!python/object/apply:numpy.core.multiarray.scalar
  - *id001
  - !!binary |
    Dpto4/KztEA=
  LLH: !!python/object/apply:numpy.core.multiarray.scalar
  - *id001
  - !!binary |
    bPioMiiypMA=

Is that on my end, or is it something that petab_select could do, i.e. cast the elements into basic types?

I use:

    test_model.set_criterion(Criterion.LLH, llh)

    test_model.compute_criterion(Criterion.AIC)
    test_model.compute_criterion(Criterion.AICC)
    test_model.compute_criterion(Criterion.BIC)
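
A possible user-side workaround, assuming llh is a NumPy scalar, is to cast it to a builtin float before storing it, so that the derived criteria are plain Python floats and serialise as ordinary numbers:

    # Cast the NumPy scalar to a builtin float before storing it; criteria
    # computed from it are then plain floats, so yaml.dump writes ordinary
    # numbers instead of !!python/object tags.
    test_model.set_criterion(Criterion.LLH, float(llh))
    test_model.compute_criterion(Criterion.AIC)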

potential issue with `write_summary_tsv`

In line 272 of ui.py, we have:

    # FIXME remove once MostDistantCandidateSpace exists...
    method = candidate_space.method
    if (
        candidate_space.governing_method == Method.FAMOS
        and candidate_space.predecessor_model.predecessor_model_hash is None
    ):

However, the `predecessor_model` can be either a Model or a str. In the case of a str, this code will fail with an AttributeError.
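
A possible fix, sketched here, is to guard the attribute access with an isinstance check (with Model being petab_select's model class):

    # Sketch: only access `predecessor_model_hash` when the predecessor
    # model is an actual Model, not a str placeholder.
    if (
        candidate_space.governing_method == Method.FAMOS
        and isinstance(candidate_space.predecessor_model, Model)
        and candidate_space.predecessor_model.predecessor_model_hash is None
    ):
        ...  # as before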

CI tests

Add the tests in the tests/ directory to GitHub CI.

petab_select.Model.to_petab not working

Wanting to try out the package, I've run into a snag, potentially a wrong petab version. The error message is:

  petab_select/petab_select/model.py in to_petab(self, output_path)
      330         if output_path is not None:
      331             petab_yaml_path = \
  --> 332                 petab_problem.to_files_generic(prefix_path=output_path)
      333 
      334         return (petab_problem, petab_yaml_path)

Maybe it should be `to_files` rather than `to_files_generic`.

CI code quality

Add code quality checks (e.g. flake8 and black) to GitHub CI and a git pre-commit hook.

Code review

Eventually all files, but this can be done one file at a time. Feedback can be provided by adding comments to the code, then opening a PR.

To be reviewed:

  • main code in petab_select/
  • tests in tests/
  • test cases in test_cases/
  • notebooks in doc/examples

Handle termination criteria for switching methods

FAMoS can start with forward, which may return 0 candidates if there are no feasible forward moves. In a normal forward search, this would reasonably end the search. However, since FAMoS can switch, the search should continue, but with a different method. A user observed that FAMoS does not continue, and instead the search is ended unreasonably early.

should `petab_select.ui.candidates` provide already visited models?

As mentioned during the TC, I'm using petab_select.ui.candidates instead of the neighbor function of the old version, to essentially visit all models from the candidate space. In the case of forward selection, this works fine. For brute force, I also retrieve models that have already been visited.

So, the question is, should the following loop terminate, or not?

    p = petab_select.Problem.from_yaml(TEST_SUITE_DIR + '/0001/petab_select_problem.yaml')
    cs = petab_select.ui.candidates(problem=p)
    test_models = cs.models
    while test_models:
        for model in test_models:
            model.set_criterion(p.criterion, 100)
        p.add_calibrated_models(test_models)
        m = p.get_best(test_models)
        petab_select.ui.candidates(
            problem=p,
            candidate_space=cs,
            predecessor_model=m,
        )
        test_models = cs.models

user defined criteria in enum

I was wondering whether you could add a user-defined criterion to the list of criteria a user can save with a model. For me this would be ideal, since I'd like to save the COPASI objective values (an SSR) for the individual steps. If you added a 'USER_DEFINED' or 'OTHER' or 'SSR' or anything really to the `class Criterion(str, Enum)` in your constants, it would make it easier for me to round-trip those fields, rather than tracking them myself outside of petab_select.
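
As a sketch of the request (not the actual petab_select code), the enum extension could look like:

    from enum import Enum

    class Criterion(str, Enum):
        AIC = 'AIC'
        AICC = 'AICc'
        BIC = 'BIC'
        LLH = 'LLH'
        # Hypothetical addition for tool-specific values, e.g. a COPASI SSR:
        SSR = 'SSR'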

Doc `petab_select.ui.candidates`

The result changes with each successive call, such that the first brute-force call will provide all models, and the next call will provide no models. This behaviour should be documented.

Visualization code

  • line graph of history of best model criterion at each iteration
  • network graph of all visited models
  • criterion vs. number of parameters plot (see the sketch below)

Other suggestions welcome
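
As a minimal sketch of the criterion vs. number of parameters plot, assuming a list of calibrated petab_select models and that criterion is a Criterion enum member:

    import matplotlib.pyplot as plt

    def plot_criterion_vs_n_parameters(models, criterion):
        # One point per calibrated model.
        n_estimated = [len(model.estimated_parameters) for model in models]
        values = [model.get_criterion(criterion) for model in models]
        plt.scatter(n_estimated, values)
        plt.xlabel('Number of estimated parameters')
        plt.ylabel(criterion.value)
        plt.show()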

Creating candidate space fails for version 0.1.12

With the latest version of petab-select, PEtab.jl errors. Running on Python 3.10, it boils down to:

    select_problem = petab_select.Problem.from_yaml("petab_select_problem.yaml")
    cs = petab_select.ui.candidates(problem=select_problem, criterion=select_problem.criterion)

Yielding the error:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/sebpe/anaconda3/envs/PeTab/lib/python3.10/site-packages/petab_select/ui.py", line 99, in candidates
        predecessor_model.get_criterion(
    AttributeError: 'NoneType' object has no attribute 'get_criterion'

Everything worked prior to version 0.1.12 (I noticed this once I managed to get PyCall working on GitHub CI).

I used the petab-select problem file from here.

Error when using limit argument with method 'brute_force'

We (me + @fgwieland) encountered an error in PEtab-select with test case 0003 when using the limit argument (5 in this case) in the CLI:

Traceback (most recent call last):
   File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 193, in _run_module_as_main
     "__main__", mod_spec)
   File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 85, in _run_code
     exec(code, run_globals)
   File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\Scripts\petab_select.exe\__main__.py", line 7, in <module>
   File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\click\core.py", line 1137, in __call__
     return self.main(*args, **kwargs)
   File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\click\core.py", line 1062, in main
     rv = self.invoke(ctx)
   File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\click\core.py", line 1668, in invoke
     return _process_result(sub_ctx.command.invoke(sub_ctx))
   File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\click\core.py", line 1404, in invoke
     return ctx.invoke(self.callback, **ctx.params)
   File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\click\core.py", line 763, in invoke
     return __callback(*args, **kwargs)
   File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\petab_select\cli.py", line 208, in candidates
     excluded_model_hashes=excluded_model_hashes,
   File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\petab_select\ui.py", line 79, in candidates
     problem.model_space.search(candidate_space, limit=limit)
   File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\petab_select\model_space.py", line 201, in search
     'An unknown error has occurred. Too many models were '
 ValueError: An unknown error has occurred. Too many models were generated. Requested limit: 5.0. Number of generated models: 18.

It seems like petab-select fails when using brute_force with a limit that is not equal to the number of all possible models. Is this intended?

Fixed parameters entering as measurements in calculation of AICc

M1_3 in test case 0004 has 4 parameters in total, of which 2 are fixed during parameter estimation. The expected AICc seems to be calculated with n_estimated = 2, n_priors = 2 (I guess the comment in l. 127 of petab_select/criteria.py is related).

From my understanding, it should be either n_estimated = 2, n_priors = 0, or n_estimated = 4, n_priors = 2 (considering fixed parameters as estimated ones, but with an infinitely narrow prior). However, these two approaches do not result in the same criterion value. Could somebody elaborate, or give a reference, on why priors are treated as measurements in the calculation of, e.g., the AICc?
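
For reference, assuming the standard definition of the AICc with $k$ estimated parameters and $n$ data points (where priors may or may not be counted towards $n$):

$$\mathrm{AICc} = -2\ln\hat{L} + 2k + \frac{2k(k+1)}{n - k - 1}$$

With $n$ fixed, moving from $k = 2$ to $k = 4$ changes both the $2k$ term and the correction term, which is why the two conventions give different values.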
