fnoorian / gramEvol

Grammatical Evolution for R

gramevol's Issues

infinite loop

I managed to reproduce the memory problem I had. The following snippet triggers an infinite loop that ends up consuming all the available RAM:

library("gramEvol")

grammarDef <- CreateGrammar(list(
   expr  = grule(op(expr, expr), func(expr), var),
   func  = grule(sin, cos, log, sqrt),
   op    = grule(`+`, `-`, `*`),
   var   = grule(v, v^n, n, 1),
   n     = gvrule(2:4),
   v     = grule(x,y)
))

GrammarRandomExpression(grammarDef, 3)

I am using the version currently on GitHub.

how to add a poly and lag into list() when making a ruleDef?

If I use poly(x, n) in var = grule(), I get:
Error in [...], :
non-conformable arrays

For lag, I get: time-series/vector length mismatch

I have been looking for a Genetic Algorithm that uses integers for R and have found your package incredibly helpful.
When reviewing the code, I noticed that the ga.new.chromosome function introduces a bias away from the values of genomeMin and genomeMax.

This is the code in use:

ga.new.chromosome <- function(genomeLen, genomeMin, genomeMax, allowrepeat) {
  chromosome = round(runif(genomeLen) * (genomeMax - genomeMin) + genomeMin)
  
  if (!allowrepeat) {
    chromosome = ga.unique.maker(chromosome, genomeMin, genomeMax)
  }
  
  return (chromosome)
}

Due to usage of the round function, the probability of a random value becoming genomeMin or genomeMax is halved.
This effect can be visualized like so:

genomeLen = 400
genomeMin = 1
genomeMax = 18
hist(round(runif(genomeLen ) * (genomeMax - genomeMin ) + genomeMin ), xlim=c(0,20), breaks = c(-0.5:19.5), col="gray")

A potential solution is to use the floor function instead, while widening the range of possible numbers by one, such as

chromosome = floor(runif(genomeLen) * (genomeMax - (genomeMin - 1) ) + genomeMin)

In the same scenario as before we now get an even distribution.

genomeLen = 400
genomeMin = 1
genomeMax = 18
hist(floor(runif(genomeLen ) * (genomeMax - (genomeMin  - 1) ) + genomeMin ), xlim=c(0,20), breaks = c(-0.5:19.5), col="gray")

Note that, according to the documentation, "runif will not generate either of the extreme values unless max = min or max-min is small compared to min, and in particular not for the default arguments." So generating a value higher than genomeMax by chance will not occur.
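
The size of the bias can be checked empirically. The following is a minimal base-R sketch, not package code; the helper `rel_freq` and the sample size are assumptions made for illustration. It compares the relative frequency of each integer under the two formulas. With round, the two endpoints should each come out near 1/34 and interior values near 1/17; with floor, every value should be near 1/18.

```r
set.seed(42)                      # arbitrary seed, only for reproducibility
n         <- 200000               # number of simulated genes (illustrative)
genomeMin <- 1
genomeMax <- 18
u <- runif(n)

# Original formula (round) and proposed formula (floor with widened range)
rounded <- round(u * (genomeMax - genomeMin) + genomeMin)
floored <- floor(u * (genomeMax - genomeMin + 1) + genomeMin)

# Relative frequency of every integer in genomeMin:genomeMax
rel_freq <- function(v) {
  as.vector(table(factor(v, levels = genomeMin:genomeMax))) / length(v)
}
freq_round <- rel_freq(rounded)
freq_floor <- rel_freq(floored)

print(round(rbind(round = freq_round, floor = freq_floor), 3))
```

Another base-R option for drawing uniformly distributed integers is sample(genomeMin:genomeMax, genomeLen, replace = TRUE).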

I am unsure how important this bias is generally and whether it may even be intended, but for the way I am applying this genetic algorithm I changed it and I thought I would bring it up to you.

Best wishes,
Matthias Becker

Invalid cost function return value (NA or NaN)

I am getting:

Error in EvolutionStrategy.int(genomeLen = chromosomeLen, codonMin = 0, :
Invalid cost function return value (NA or NaN).

So I adapted the cost function as follows:

SymRegFitFunc <- function(expr) {
  result <- eval(expr)
  if (any(is.nan(result)) || any(is.na(result)))
    return(Inf)
  return(mean(log(1 + abs(y - result))))
}

Still the same issue. Any idea? Thanks!
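
A more defensive variant of the cost function can rule out a few remaining sources of NA or NaN. This is only a sketch under assumptions: `SafeSymRegFitFunc`, `x`, and `y` are illustrative names and data, and whether any of these cases is what triggers the error here is not known. It additionally guards against evaluation errors, zero-length results (mean() of an empty vector is NaN), and a non-finite final cost.

```r
# Illustrative data; in the real problem x and y come from the user's workspace.
x <- seq(1, 10, length.out = 50)
y <- 2 * x + 1

SafeSymRegFitFunc <- function(expr) {
  # Evaluation errors become NULL instead of aborting the search
  result <- tryCatch(suppressWarnings(eval(expr)), error = function(e) NULL)

  # Reject non-numeric, empty, NA, NaN, or infinite results
  if (!is.numeric(result) || length(result) == 0 || !all(is.finite(result))) {
    return(Inf)
  }

  cost <- mean(log(1 + abs(y - result)))

  # Last line of defence: y itself may contain NA
  if (!is.finite(cost)) {
    return(Inf)
  }
  cost
}

print(SafeSymRegFitFunc(expression(2 * x + 1)))  # perfect fit
print(SafeSymRegFitFunc(expression(log(-x))))    # NaN result is rejected
print(SafeSymRegFitFunc(expression(x[0])))       # empty result is rejected
```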

Examples of evolving higher order functions and functions with multiple arguments?

Hi! Very cool package! I had a question about whether it has support for higher order functions.

I often want to explore models using higher order functions like Box Cox transformations on parts of the feature space. So I write the higher order function bcox as below. Is it possible to encode these expressions into the grammar? e.g. so I could fit y ~ bcox(0.2)(x1) + bcox(-0.5)(x2) or more generally y ~ bcox(0.2)(x2 + 3*log(x1)) + bcox(-0.5 + exp(x1/5))(x2). I'm curious how I could represent that.

bcox = function(lambda, offset = 0) {
  function(x) {
    ((x + offset)^lambda - 1)/lambda
  }
}
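
As far as the R language is concerned, bcox(0.2)(x1) is an ordinary call whose function position is itself a call, so it can be parsed from a string and evaluated like any other candidate. The base-R sketch below shows only that; `x1`, `x2`, and their values are made up for illustration. Whether grule() accepts such nested calls directly is not confirmed here; a string rule in the style of gsrule("bcox(<lambda>)(<var>)") is one untested possibility.

```r
bcox <- function(lambda, offset = 0) {
  function(x) ((x + offset)^lambda - 1) / lambda
}

x1 <- c(1, 2, 4)   # illustrative feature values
x2 <- c(3, 5, 9)

# A candidate expression written as a string, as a string-based rule would emit it
candidate <- parse(text = "bcox(0.2)(x1) + bcox(-0.5)(x2)")

# The left operand is the call bcox(0.2)(x1); its function position is bcox(0.2)
inner <- candidate[[1]][[2]]
print(inner[[1]])

value <- eval(candidate[[1]])
print(value)
```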

how to change grammar (ruleDef) externally

I am ashamed to ask, but still I do not understand how I can change my grammar from the outside.

For example, I have a ruleDef like this:

library(gramEvol)
ruleDef <- list(expr = gsrule("<var><op><var>"),
                 op  = gsrule("+", "-"),
                 var = gsrule("A","B","C"))
grammarDef <- CreateGrammar(ruleDef)     
GrammarMap(c(0, 1, 1, 1), grammarDef)

result is
B - B

For example, I want to change the var from the outside

my_var <- c("A","B","C")
ruleDef <- list(expr = gsrule("<var><op><var>"),
                 op  = gsrule("+", "-"),
                 var = gvrule(my_var))  # My var
grammarDef <- CreateGrammar(ruleDef)     
GrammarMap(c(0, 1, 1, 1), grammarDef)

and my result is
"B" - "B"

How can I get the same result as in the first example? B - B
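
The quotes appear because the elements of my_var are character constants rather than names, and R deparses the two differently. The sketch below shows that difference in base R and then, only if gramEvol is installed, one possible workaround: spreading the external vector over gsrule() with do.call(), which should build the same call as gsrule("A", "B", "C") in the first example. This has not been verified against the package here.

```r
my_var <- c("A", "B", "C")

# A character constant deparses with quotes; a symbol does not.
as_strings <- call("-", my_var[2], my_var[2])
as_symbols <- call("-", as.name(my_var[2]), as.name(my_var[2]))
print(deparse(as_strings))   # quoted operands
print(deparse(as_symbols))   # bare operands

# Assumption: do.call(gsrule, as.list(my_var)) is equivalent to gsrule("A", "B", "C").
if (requireNamespace("gramEvol", quietly = TRUE)) {
  library(gramEvol)
  ruleDef <- list(expr = gsrule("<var><op><var>"),
                  op   = gsrule("+", "-"),
                  var  = do.call(gsrule, as.list(my_var)))
  grammarDef <- CreateGrammar(ruleDef)
  print(GrammarMap(c(0, 1, 1, 1), grammarDef))
}
```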

Is there a way to immediately get a character vector from a grammar

For example, I have a grammar like this

set.seed(123)
library(gramEvol)
ruleDef <- list(  
  RR = grule( c(R,R,R) ),
  R = grule( op(r,r)  ),
  r = grule( com( LAST , var) , com( var , var) ), 
  com = grule(">","<"),
  op = grule("&"),
  var = grule("A","B","C")  )

grammarDef <- CreateGrammar(ruleDef)
res <- GrammarRandomExpression(grammarDef)

> res
expression(c(LAST > "A" & LAST > "C", LAST < "B" & LAST < "B", "B" < "B" & LAST < "C"))

But I need a simple character vector with rules like this
c( "LAST > A & LAST > C" , "LAST < B & LAST < B" , "B < B & LAST < C" )

I can convert this to vector myself

res2 <- trimws(unlist(stringr::str_split(gsub( "[couple()\"]","",res),pattern = ",")))
> res2
[1] "LAST > A & LAST > C" "LAST < B & LAST < B" "B < B & LAST < C"

My question is: can I get such a vector directly, without the res2 step?
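
A regex-free alternative is to walk the expression object itself. This is a base-R sketch, not a gramEvol feature; `res` is rebuilt by hand here so the example is self-contained, and `rules` and `rule_strings` are illustrative names. It still post-processes the result, but does so on the parsed structure rather than on text.

```r
res <- expression(c(LAST > "A" & LAST > "C",
                    LAST < "B" & LAST < "B",
                    "B" < "B" & LAST < "C"))

# res[[1]] is the call c(...); dropping its first element leaves the three rules.
rules <- as.list(res[[1]])[-1]

rule_strings <- vapply(rules, function(r) {
  txt <- paste(deparse(r), collapse = " ")   # one string per rule
  gsub('"', "", txt, fixed = TRUE)           # drop the quotes around A, B, C
}, character(1))

print(rule_strings)
```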

genomeMin and genomeMax are not upheld in GeneticAlg.int

Code to reproduce:

test_fit_function<-function(l){
  if (any(codonmin > l) | any(l > codonmax)) {
   print(l)
  }   

  fitness<-abs(sum(l) - 700)
  return(fitness)
}

codonmax = c(15, 400, 1000)
codonmin = c(5, 100, 500)
nchromosomes = 100
niter = 1000

enalgres<-GeneticAlg.int(genomeLen=3, genomeMin=codonmin, genomeMax=codonmax, popSize=nchromosomes, mutationChance=0.2, elitism=floor(0.1*nchromosomes), iterations=niter, evalFunc=test_fit_function, allowrepeat=TRUE)

how to freeze part of an expression for several other expressions

I want to implement indexing for multiple expressions like this
X[ (i-n):i ], X[ (i-n):i ]
where every expression shares the same i and n. The expressions should look like this:

X[ (5-7):5 ],    X[ (5-7):5 ]

X[ (15-2):15 ],    X[ (15-2):15 ]

Here is what I have at the moment

library(gramEvol)
ruleDef <- list(
  xx = grule(c(x,x)),
  x  = gsrule("X[<ii>]"),
  
  ii = gsrule("(<i>-<n>):<i>"),                        
  i  = gvrule(1:20),
  n  = gvrule(1:100)
)
grammarDef <- CreateGrammar(ruleDef)
GrammarRandomExpression(grammarDef)

expression(c(X[(1 - 78):19], X[(15 - 34):2]))

I suspect the solution is very simple, maybe I just need coffee, but I can't think of anything.
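
In the grammar above each occurrence of <i> and <n> appears to be expanded independently, which would explain the mismatched indices. One way to think about the fix is to choose i and n once and bind them to every occurrence afterwards. The base-R sketch below illustrates that idea only; `template` and `bind_index` are hypothetical names, and wiring this into a grammar (for example by having the grammar emit a call to a helper that takes i and n as arguments) is left as an assumption.

```r
X <- 1:100   # illustrative data

# Template in which i and n are placeholders shared by both sub-expressions
template <- quote(c(X[(i - n):i], X[(i - n):i]))

# Bind one (i, n) pair to every occurrence at once
bind_index <- function(i, n) {
  do.call(substitute, list(template, list(i = i, n = n)))
}

frozen <- bind_index(15, 2)
print(frozen)        # both windows use the same i and n
print(eval(frozen))
```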

How to write this grammar correctly

Hi! I wrote this grammar:

library(gramEvol)
ruleDef <- list(
  id_blok = gsrule("<var>[i<op><id>]"),
  op = gsrule("+", "-"),
  id = gvrule(0:5),
  var = grule(op,hi,lo,cl)
)

grammarDef <- CreateGrammar(ruleDef)
GrammarRandomExpression(grammarDef, 2)

[[1]]
expression(lo[i + 5])

[[2]]
expression(hi[i - 2])

But if I run the code multiple times, I eventually get an error:

GrammarRandomExpression(grammarDef, 2)

Error in parse(text = unescape.gt.lt(result.string)) : 
  <text>:1:2: unexpected '['
1: +[
     ^
What am I doing wrong? Thanks!
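
The failing text "+[" suggests that <var> was expanded to "+". A likely cause, stated here as an assumption rather than confirmed package behaviour, is that the symbol op inside var = grule(op, hi, lo, cl) is read as a reference to the op rule instead of as a variable name. The base-R sketch below only reproduces the parse failure; `parses` is an illustrative helper.

```r
# Strings of the kind the grammar can emit. The second one arises if <var>
# expands through the `op` rule (assumed cause) and yields "+".
good <- "lo[i + 5]"
bad  <- "+[i + 5]"

# TRUE when the text is syntactically valid R
parses <- function(txt) {
  !inherits(tryCatch(parse(text = txt), error = function(e) e), "error")
}

print(parses(good))
print(parses(bad))
```

If the name clash is indeed the cause, renaming either the rule or the variable so that they no longer share the name op should avoid it.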

How to get a sequence from an expression

library(gramEvol)
ruleDef <- list(expr =  gsrule("<var><op><var>"),
                op   =  gsrule("+", "-", "*"),
                var  =  grule(A, B, C))
grammarDef <- CreateGrammar(ruleDef)

I can get an expression from a sequence using the GrammarMap function

sq <- c(0, 0, 1, 2)
expr <- GrammarMap(sq, grammarDef, verbose = F)

print(expr)
A - C

How do I do the opposite and get a sequence from an expression? The expression is a regular string.
Given
expr <- "A - C"
I want to get the sequence
c(0, 0, 1, 2)
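
For this particular flat grammar the inverse can be worked out by hand, because each codon is simply the zero-based position of the chosen alternative within its rule (this is inferred from the mapping shown above, where 0, 0, 1, 2 gives A - C). The sketch below is base R and is not a gramEvol function; `expr_to_seq` is a hypothetical helper that only handles strings of the form "var op var". Recursive grammars would need a real parser.

```r
# Choices of each rule, in the order they appear in ruleDef
ops  <- c("+", "-", "*")
vars <- c("A", "B", "C")

expr_to_seq <- function(txt) {
  tokens <- strsplit(trimws(txt), "\\s+")[[1]]   # "A - C" -> "A" "-" "C"
  stopifnot(length(tokens) == 3)
  codons <- c(0,                                 # expr has a single alternative
              match(tokens[1], vars) - 1,
              match(tokens[2], ops)  - 1,
              match(tokens[3], vars) - 1)
  if (anyNA(codons)) stop("expression is not derivable from this grammar")
  codons
}

print(expr_to_seq("A - C"))
```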

mutationChance argument in GeneticAlg.int() function not working

It appears that the mutationChance argument in GeneticAlg.int() does not work.

Borrowing the example code under the function's help information and modifying the mutationChance reveals this issue.

Code below for convenience:

evalfunc <- function(l) {
  odd <- seq(1, 20, 2)
  even <- seq(2, 20, 2)
  err <- sum(l[even]) - sum(l[odd])

  stopifnot(!any(duplicated(l))) # no duplication allowed

  return(err)
}

monitorFunc <- function(result) {
  cat("Best of gen: ", min(result$best$cost), "\n")
}

x <- GeneticAlg.int(genomeLen = 20, codonMin = 0, codonMax = 20,
                    allowrepeat = FALSE, terminationCost = -110, mutationChance = 0.25,
                    monitorFunc = monitorFunc, evalFunc = evalfunc, verbose = TRUE)

numExpr / multi-gene problem

Either I'm misunderstanding how it is supposed to work, or multiple expressions don't work.

Specifically, if I set numExpr = 2, the cost function receives 2 expressions only about half the time, and often just 1.

start searching for the optimal solution from the expression I specified

I have a huge grammar and I would like to narrow the search by supplying my own initial expression.
For example:

library(gramEvol)
ruleDef <- list(
  exp = grule(op(var,var) , op(var,exp) , op(exp,exp)),
  var = grule(a,b,c,e,f),
  op  = grule("+","-","/","*"))

grammarDef <- CreateGrammar(ruleDef)
GrammarRandomExpression(grammarDef, 5)

[[1]]
expression(c/(b + e * (c - c)))

[[2]]
expression(f * f + (c + f/e) + (f + b))

[[3]]
expression(e + e * b)

For example, can I start the search from this expression?
my_start_expr <- "f * f + (c + f/e) + (f + b)"
Of course, I don't know the genes that encode this expression; I can only write it as a string.

The question is: can I run the search algorithm starting from my own expression string?

return a pair

Is it possible to use a pair of expressions as the root in the expression tree? Something like :

grammarDef <- CreateGrammar(list(
  pair  = grule(couple(expr, expr)),
  expr = grule(op(expr, expr), func(expr), var),
  func = grule(sin, cos, log, sqrt),
  op    = grule(`+`, `-`, `*`), # define binary operators
  var   = grule(distance, distance^n, n),
  n      = gvrule(1:4) # this is shorthand for grule(1,2,3,4)
))

I tried using list(expr, expr) and c(expr, expr), but both caused a crash (the script ate all the memory). I also tried using a couple(expr, expr, member) function (which returns the expression indexed by member), but the program refused it because it relies on a closure.

can i use continuous optimization in gramEvol

As I understand it, the package only uses discrete (binary) optimization with integer values.

My question: would it be very wrong to use a continuous optimization algorithm and simply convert its output to integers?

out_real_val <- runif(20,1,5)
> out_real_val
 [1] 4.258845 4.062256 4.318080 1.446091 1.696332 3.841191 2.379701 1.795424
 [9] 1.815174 3.017192 2.872621 4.239992 1.847689 2.986606 1.420673 3.995540
[17] 4.822753 2.358454 4.808910 3.152754

convert for GrammarMap

to_bin_val <- round(out_real_val,0)
> to_bin_val
 [1] 4 4 4 1 2 4 2 2 2 3 3 4 2 3 1 4 5 2 5 3

I wonder if I can do this or not, and if not, why?
thanks.

Installation from github?

I was trying to get "suggestions" to work, but was getting an error. Looks like you actually fixed that problem so I wanted to install the latest version from github using devtools, but that doesn't seem to work. Says it is not a proper archive:

devtools::install_github("fnoorian/gramEvol")
Downloading GitHub repo fnoorian/gramEvol@master
tar: This does not look like a tar archive

gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now
tar: This does not look like a tar archive

gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now
Error in getrootdir(untar(src, list = TRUE)) :
length(file_list) > 0 is not TRUE
In addition: Warning messages:
1: In utils::untar(tarfile, ...) :
‘tar -xf '/tmp/RtmpDXZwtO/file1a724a1b5a3a.tar.gz' -C '/tmp/RtmpDXZwtO/remotes1a721b914521'’ returned error code 2
2: In system(cmd, intern = TRUE) :
running command 'tar -tf '/tmp/RtmpDXZwtO/file1a724a1b5a3a.tar.gz'' had status 2

GERule.Concat

Perhaps it makes sense to replace the function

GERule.Concat <- function(x,y) {
  lx = length(x)
  ly = length(y)
  for (i in 1:ly) {
    x[lx + i] = y[i]
  }
  x
}

with the usual
GERule.Concat <- function(x,y) c(x, y)
It will be much faster.
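
The two versions can be compared directly in base R. In the sketch below, `concat_loop` is a copy of the loop version quoted above and `concat_c` is the proposed replacement (both names are illustrative). They agree on ordinary input, but they differ when y is empty, because 1:0 counts down through 1 and 0 and the loop body still runs. Behaviour for named or list inputs is not checked here.

```r
concat_loop <- function(x, y) {   # the loop version quoted above
  lx <- length(x)
  ly <- length(y)
  for (i in 1:ly) {
    x[lx + i] <- y[i]
  }
  x
}
concat_c <- function(x, y) c(x, y)

# Ordinary input: both versions give the same vector
same <- identical(concat_loop(1:3, 4:6), concat_c(1:3, 4:6))
print(same)

# Empty y: the loop version fails, c() simply returns x
empty_loop <- tryCatch(concat_loop(1:3, integer(0)), error = function(e) e)
empty_c    <- concat_c(1:3, integer(0))
print(inherits(empty_loop, "error"))
print(empty_c)
```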

How to define floating point numbers as expressions in grammar

How do I define floating point numbers as valid expressions in the grammar? Thank you

Is there an easy way to use third party optimization libraries?

I would like to try other optimization packages, is there an easy way to do this?

how to create a sequence of rules

Hi!
I need some help: I want to create a sequence of rules, but I don't know how. Creating single rules is no problem,
for example:

ruleDef <- list(
  expr       = grule((expr) & (sub.expr), (expr) | (sub.expr), sub.expr),
  sub.expr   = grule(comparison(var, var)),
  comparison = grule(">=", "<="),
  var        = grule(x1, x2, x3, x4))

grammarDef <- CreateGrammar(ruleDef)
GrammarRandomExpression(grammarDef,numExpr = 1,max.depth = 3)

result

expression(((x2 <= x1) & (x4 <= x2)) | (x2 >= x4))

But how can I get something like this from grammarDef?

my.set.rules
      [,1]                                     [,2]
 [1,] "x4 <= x1"                               "next rule"
 [2,] "x3 <= x3"                               "next rule"
 [3,] "((x4 <= x1) & (x4 >= x4)) & (x3 >= x4)" "next rule"
 [4,] "x4 <= x3"                               "abort"
 [5,] "((x4 <= x1) & (x4 >= x4)) & (x1 >= x4)" "next rule"
 [6,] "x1 >= x1"                               "next rule"
 [7,] "((x1 <= x2) | (x4 <= x1)) & (x4 >= x4)" "next rule"
 [8,] "((x2 >= x2) | (x3 <= x2)) | (x4 >= x3)" "next rule"

mysterious error

After loading and running the package I constantly get the following message:

Error in isNamespaceLoaded(pkg) : 
  attempt to use zero-length variable name

Any ideas?

factors in formula

Thanks for creating this package. Just curious, do I have to create factors in formulas like this manually/explicitly using n in the grammar definition?

y = 10.3 + 2.3 * x

grammarDef <- CreateGrammar(list(
  expr = grule(op(expr, expr), func(expr), var),
  func = grule(exp),
  op   = grule(`+`, `-`, `*`),
  var  = grule(x, n),
  n    = gvrule(c(10.3, 2.3))
))

Hope this makes sense? Or could I say pick factors in a range from [-10 ... 10]?
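
Since gvrule() takes a vector of values (gvrule(1:4) is described elsewhere on this page as shorthand for grule(1,2,3,4)), one possible way to offer a range of factors is to pass it a grid of constants. The sketch below is an untested assumption about how the package treats non-integer vectors; `constants` and the 0.5 step are illustrative choices. The gramEvol part only runs if the package is installed.

```r
# Candidate constants: a grid from -10 to 10 (the 0.5 step is an arbitrary choice)
constants <- seq(-10, 10, by = 0.5)
print(length(constants))

if (requireNamespace("gramEvol", quietly = TRUE)) {
  library(gramEvol)
  grammarDef <- CreateGrammar(list(
    expr = grule(op(expr, expr), func(expr), var),
    func = grule(exp),
    op   = grule(`+`, `-`, `*`),
    var  = grule(x, n),
    n    = gvrule(constants)      # assumed to accept any numeric vector
  ))
}
```

The step size bounds the precision of any single constant: values such as 10.3 or 2.3 are not on a 0.5 grid, so they could only be reached by combining constants through the operators or by using a finer grid.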

Precedence over simpler expression with the same precision

Is there a precedence for shorter expressions?
For example, suppose you need to find the number 4.

This can be done in many ways, for example
1+1+1+1 or 3+1 or 2+2.
2+2 is better than 1+1+1+1 because it is shorter.
Is such a search priority built in? If not, is it possible to add it?
Thank you!
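
One common way to express such a preference, independent of whether the package has it built in, is to add a size penalty to the cost function so that, among equally accurate candidates, the smaller one scores better. The base-R sketch below is an illustration under assumptions: `count_nodes`, `penalized_cost`, and the weight `lambda` are hypothetical names and values.

```r
# Count calls and leaves in an expression; a rough size measure
count_nodes <- function(e) {
  if (is.expression(e)) {
    return(sum(vapply(as.list(e), count_nodes, numeric(1))))
  }
  if (is.call(e)) {
    return(1 + sum(vapply(as.list(e)[-1], count_nodes, numeric(1))))
  }
  1
}

target <- 4
lambda <- 0.01   # penalty weight; an arbitrary illustrative value

# Error term plus a small penalty proportional to expression size
penalized_cost <- function(expr) {
  abs(eval(expr) - target) + lambda * count_nodes(expr)
}

print(penalized_cost(expression(2 + 2)))
print(penalized_cost(expression(1 + 1 + 1 + 1)))
```

Both candidates evaluate to 4, so only the penalty separates them. The weight should stay small relative to the error scale, otherwise short but inaccurate expressions would win.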

Error in 1:possible.choices : argument of length 0

I can't run my grammar; I get an error:
Error in 1:possible.choices : argument of length 0

here is my grammar

library(gramEvol)
rules <- list(
              For = gsrule("for(i in 5:200) {  X[i,] <- <ex>  }"),
              
              ex  = gsrule("<x><op><x>"),
              x   = gsrule("p[i-<i>]","p[i]"),
              i   = gvrule(1:5),
              op  = gsrule("+","-","/","*")
              )
grammar <- CreateGrammar(rules)
gramEvol::GrammarRandomExpression(grammar)

The problem is in this line:
For = gsrule("for(i in 5:200) { X[i,] <- <ex> }")
Without the <- <ex> part everything works.
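
A plausible explanation, offered as an assumption and not verified against the package source, is that the "<" of the assignment arrow is picked up as the start of a "<name>" placeholder, so the grammar looks for a rule that does not exist. The base-R sketch below uses a simplified pattern of my own (`find_symbols` is hypothetical and not taken from gramEvol) to show how such a scan would misread the string, and how writing the assignment with = sidesteps it.

```r
rule_arrow  <- "for(i in 5:200) {  X[i,] <- <ex>  }"
rule_equals <- "for(i in 5:200) {  X[i,] = <ex>  }"

# Assumed, simplified pattern for "<name>" placeholders (not taken from gramEvol)
find_symbols <- function(txt) regmatches(txt, gregexpr("<[^>]+>", txt))[[1]]

print(find_symbols(rule_arrow))    # the arrow is swallowed into the placeholder
print(find_symbols(rule_equals))   # only the intended placeholder is found
```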