SymbolicRegression.jl searches for symbolic expressions that optimize a particular objective.
(Demo animation: sr_animation.mp4)
Check out PySR for a Python frontend. If you use this software, please cite it.
We are eager to welcome new contributors! If you have an idea for a new feature, don't hesitate to share it on the issues page or forums.
Mark Kittisopikul 💻 💡 🚇 📦 📣 👀 🔧 |
T Coxon 🐛 💻 🔌 💡 🚇 🚧 👀 🔧 |
Dhananjay Ashok 💻 🌍 💡 🚧 |
Johan Blåbäck 🐛 💻 💡 🚧 📣 👀 |
JuliusMartensen 🐛 💻 📖 🔌 💡 🚇 🚧 📦 📣 👀 🔧 📓 |
ngam 💻 🚇 📦 👀 🔧 |
Kaze Wong 🐛 💻 💡 🚇 🚧 📣 👀 🔬 📓 |
Christopher Rackauckas 🐛 💻 🔌 💡 🚇 📣 👀 🔬 🔧 |
Patrick Kidger 🐛 💻 📖 🔌 💡 🚧 📣 👀 🔬 🔧 |
Okon Samuel 🐛 💻 📖 🚧 💡 🚇 👀 |
William Booth-Clibborn 💻 🌍 📖 📓 🚧 👀 🔧 |
Pablo Lemos 🐛 💡 📣 👀 🔬 📓 |
Jerry Ling 🐛 💻 📖 🌍 💡 📣 👀 📓 |
Charles Fox 🐛 💻 💡 🚧 📣 👀 🔬 📓 |
Johann Brehmer 💻 📖 💡 📣 👀 🔬 |
Marius Millea 💻 💡 📣 👀 📓 |
Coba 🐛 💻 💡 👀 📓 |
Pietro Monticone 🐛 📖 💡 |
Mateusz Kubica 📖 💡 |
Jay Wadekar 🐛 💡 📣 🔬 |
Anthony Blaom, PhD 🚇 💡 👀 |
Jgmedina95 🐛 💡 👀 |
Michael Abbott 💻 💡 👀 🔧 |
Oscar Smith 💻 💡 |
Eric Hanson 💡 📣 📓 |
Henrique Becker 💻 💡 👀 |
qwertyjl 🐛 📖 💡 📓 |
Rik Huijzer 💡 🚇 |
Hongyu Wang 💡 📣 🔬 |
Saurav Maheshkar 🔧 |
Install in Julia with:
using Pkg
Pkg.add("SymbolicRegression")
The easiest way to use SymbolicRegression.jl is with MLJ. Let's see an example:
import SymbolicRegression: SRRegressor
import MLJ: machine, fit!, predict, report
# Dataset with two named features:
X = (a = rand(500), b = rand(500))
# and one target:
y = @. 2 * cos(X.a * 23.5) - X.b ^ 2
# with some noise:
y = y .+ randn(500) .* 1e-3
model = SRRegressor(
niterations=50,
binary_operators=[+, -, *],
unary_operators=[cos],
)
Now, let's create and train this model on our data:
mach = machine(model, X, y)
fit!(mach)
You will notice that expressions are printed using the column names of our table. If, instead of a table-like object, a simple array is passed (e.g., `X = randn(100, 2)`), the variable names `x1, ..., xn` will be used.
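For example, here is a minimal sketch of the same fit on a plain matrix (the `X_plain`/`y_plain` names are ours):

```julia
# Fitting on a plain matrix rather than a named table;
# printed expressions will refer to the features as x1 and x2.
X_plain = randn(100, 2)
y_plain = @. 2 * cos(X_plain[:, 1] * 23.5) - X_plain[:, 2]^2
mach_plain = machine(model, X_plain, y_plain)
fit!(mach_plain)
```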
Let's look at the expressions discovered:
report(mach)
Finally, we can make predictions with the expressions on new data:
predict(mach, X)
This will make predictions using the expression selected by the function passed to `selection_method`. By default, this selection is made using a mix of accuracy and complexity.
For example, we can make predictions using expression 2 with:
mach.model.selection_method = Returns(2)
predict(mach, X)
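You can also pass your own selection function. The following is a sketch only: the keyword-argument signature (`trees`, `losses`, `scores`, `complexities`) is assumed from the default `choose_best` and may differ between versions.

```julia
# Hypothetical custom selection: pick the simplest expression whose
# loss is within 10% of the best loss found (signature assumed).
function my_selection(; trees, losses, scores, complexities)
    threshold = 1.1 * minimum(losses)
    candidates = [i for i in eachindex(losses) if losses[i] <= threshold]
    return candidates[argmin(complexities[candidates])]
end

model_custom = SRRegressor(
    niterations=50,
    binary_operators=[+, -, *],
    unary_operators=[cos],
    selection_method=my_selection,
)
```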
For fitting multiple outputs, one can use `MultitargetSRRegressor`, as sketched below.
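A minimal sketch, assuming MLJ's usual convention of passing multiple targets as a table with one column per output:

```julia
import SymbolicRegression: MultitargetSRRegressor

# Two targets, passed as a table with one column per output:
Y = (y1 = sin.(X.a), y2 = X.b .^ 2)

model_multi = MultitargetSRRegressor(niterations=40, binary_operators=[+, -, *])
mach_multi = machine(model_multi, X, Y)
fit!(mach_multi)
```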
For a full list of options available to each regressor, see the API page.
The heart of SymbolicRegression.jl is the `equation_search` function. This takes a 2D array and attempts to model a 1D array using analytic functional forms. Note: unlike the MLJ interface, this assumes column-major input of shape [features, rows].
import SymbolicRegression: Options, equation_search
X = randn(2, 100)
y = 2 * cos.(X[2, :]) + X[1, :] .^ 2 .- 2
options = Options(
binary_operators=[+, *, /, -],
unary_operators=[cos, exp],
populations=20
)
hall_of_fame = equation_search(
X, y, niterations=40, options=options,
parallelism=:multithreading
)
You can view the resultant equations in the dominating Pareto front (best expression seen at each complexity) with:
import SymbolicRegression: calculate_pareto_frontier
dominating = calculate_pareto_frontier(hall_of_fame)
This is a vector of the `PopMember` type, each of which contains an expression along with its score.
We can get the expressions with:
trees = [member.tree for member in dominating]
Each of these equations is a `Node{T}` type for some constant type `T` (like `Float32`).
You can evaluate a given tree with:
import SymbolicRegression: eval_tree_array
tree = trees[end]
output, did_succeed = eval_tree_array(tree, X, options)
The `output` array will contain the result of the tree at each of the 100 rows. The `did_succeed` flag indicates whether the evaluation completed successfully, or whether it encountered any NaNs or Infs during calculation (from, e.g., `sqrt(-1)`).
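In practice, you should check this flag before trusting the output. A minimal sketch (the loss handling here is our own convention):

```julia
output, did_succeed = eval_tree_array(tree, X, options)
if did_succeed
    mse = sum(abs2, output .- y) / length(y)
else
    # Evaluation hit a NaN or Inf partway through;
    # treat this candidate expression as invalid.
    mse = Inf
end
```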
You can also manipulate and construct trees directly. For example:
import SymbolicRegression: Options, Node, eval_tree_array
options = Options(;
binary_operators=[+, -, *, ^, /], unary_operators=[cos, exp, sin]
)
x1, x2, x3 = [Node(; feature=i) for i=1:3]
tree = cos(x1 - 3.2 * x2) - x1^3.2
This tree has `Float64` constants, so the type of the entire tree will be promoted to `Node{Float64}`. We can convert all constants (recursively) to `Float32`:
float32_tree = convert(Node{Float32}, tree)
We can then evaluate this tree on a dataset:
X = rand(Float32, 3, 100)
output, did_succeed = eval_tree_array(float32_tree, X, options)
We can view the equations in the dominating Pareto frontier with:
dominating = calculate_pareto_frontier(hall_of_fame)
We can convert the best equation to SymbolicUtils.jl with the following function:
import SymbolicRegression: node_to_symbolic
import SymbolicUtils: simplify
eqn = node_to_symbolic(dominating[end].tree, options)
println(simplify(eqn*5 + 3))
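Conversion in the other direction is also possible; a sketch assuming the `symbolic_to_node` counterpart (check the API docs for your version):

```julia
import SymbolicRegression: symbolic_to_node
tree_again = symbolic_to_node(eqn, options)
```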
We can also print out the full Pareto frontier like so:
import SymbolicRegression: compute_complexity, string_tree
println("Complexity\tMSE\tEquation")
for member in dominating
complexity = compute_complexity(member, options)
loss = member.loss
eqn_string = string_tree(member.tree, options)
println("$(complexity)\t$(loss)\t$(eqn_string)")
end
SymbolicRegression.jl is organized roughly as follows. Rounded rectangles indicate objects, and rectangles indicate functions.
(if you can't see this diagram being rendered, try pasting it into mermaid-js.github.io/mermaid-live-editor)
flowchart TB
op([Options])
d([Dataset])
op --> ES
d --> ES
subgraph ES[equation_search]
direction TB
IP[sr_spawner]
IP --> p1
IP --> p2
subgraph p1[Thread 1]
direction LR
pop1([Population])
pop1 --> src[s_r_cycle]
src --> opt[optimize_and_simplify_population]
opt --> pop1
end
subgraph p2[Thread 2]
direction LR
pop2([Population])
pop2 --> src2[s_r_cycle]
src2 --> opt2[optimize_and_simplify_population]
opt2 --> pop2
end
pop1 --> hof
pop2 --> hof
hof([HallOfFame])
hof --> migration
pop1 <-.-> migration
pop2 <-.-> migration
migration[migrate!]
end
ES --> output([HallOfFame])
The `HallOfFame` objects store the expressions with the lowest loss seen at each complexity.
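As a sketch of this structure (the `members` and `exists` field names here are assumptions; check the source for your version), each complexity slot holds at most one best member:

```julia
# Assumed fields: `members` holds the best PopMember per complexity,
# and `exists[i]` marks whether an expression of complexity i was found.
for (complexity, member) in enumerate(hall_of_fame.members)
    hall_of_fame.exists[complexity] || continue
    println(complexity, "\t", member.loss, "\t", string_tree(member.tree, options))
end
```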
The dependency structure of the code itself is as follows:
stateDiagram-v2
AdaptiveParsimony --> Mutate
AdaptiveParsimony --> Population
AdaptiveParsimony --> RegularizedEvolution
AdaptiveParsimony --> SingleIteration
AdaptiveParsimony --> SymbolicRegression
CheckConstraints --> Mutate
CheckConstraints --> SymbolicRegression
Complexity --> CheckConstraints
Complexity --> HallOfFame
Complexity --> LossFunctions
Complexity --> Mutate
Complexity --> Population
Complexity --> SearchUtils
Complexity --> SingleIteration
Complexity --> SymbolicRegression
ConstantOptimization --> Mutate
ConstantOptimization --> SingleIteration
Core --> AdaptiveParsimony
Core --> CheckConstraints
Core --> Complexity
Core --> ConstantOptimization
Core --> HallOfFame
Core --> InterfaceDynamicExpressions
Core --> LossFunctions
Core --> Migration
Core --> Mutate
Core --> MutationFunctions
Core --> PopMember
Core --> Population
Core --> Recorder
Core --> RegularizedEvolution
Core --> SearchUtils
Core --> SingleIteration
Core --> SymbolicRegression
Dataset --> Core
HallOfFame --> SearchUtils
HallOfFame --> SingleIteration
HallOfFame --> SymbolicRegression
InterfaceDynamicExpressions --> LossFunctions
InterfaceDynamicExpressions --> SymbolicRegression
LossFunctions --> ConstantOptimization
LossFunctions --> HallOfFame
LossFunctions --> Mutate
LossFunctions --> PopMember
LossFunctions --> Population
LossFunctions --> SymbolicRegression
Migration --> SymbolicRegression
Mutate --> RegularizedEvolution
MutationFunctions --> Mutate
MutationFunctions --> Population
MutationFunctions --> SymbolicRegression
Operators --> Core
Operators --> Options
Options --> Core
OptionsStruct --> Core
OptionsStruct --> Options
PopMember --> ConstantOptimization
PopMember --> HallOfFame
PopMember --> Migration
PopMember --> Mutate
PopMember --> Population
PopMember --> RegularizedEvolution
PopMember --> SingleIteration
PopMember --> SymbolicRegression
Population --> Migration
Population --> RegularizedEvolution
Population --> SearchUtils
Population --> SingleIteration
Population --> SymbolicRegression
ProgramConstants --> Core
ProgramConstants --> Dataset
ProgressBars --> SearchUtils
ProgressBars --> SymbolicRegression
Recorder --> Mutate
Recorder --> RegularizedEvolution
Recorder --> SingleIteration
Recorder --> SymbolicRegression
RegularizedEvolution --> SingleIteration
SearchUtils --> SymbolicRegression
SingleIteration --> SymbolicRegression
Utils --> CheckConstraints
Utils --> ConstantOptimization
Utils --> Options
Utils --> PopMember
Utils --> SingleIteration
Utils --> SymbolicRegression
Bash command to generate the dependency structure from the `src` directory (requires `vim-stream`):
echo 'stateDiagram-v2'
IFS=$'\n'
for f in *.jl; do
for line in $(cat $f | grep -e 'import \.\.' -e 'import \.'); do
echo $(echo $line | vims -s 'dwf:d$' -t '%s/^\.*//g' '%s/Module//g') $(basename "$f" .jl);
done;
done | vims -l 'f a--> ' | sort
For a full list of search options, see https://astroautomata.com/SymbolicRegression.jl/stable/api/#Options.