luca-scr / ga

An R package for optimization using genetic algorithms

Home Page: http://luca-scr.github.io/GA/

R 51.18% C++ 38.19% C 3.15% CSS 7.48%

genetic-algorithm optimisation r

ga's Issues

Trouble with sample() in 3.0.3

I get this error in 3.0.3, but not in 3.0.2:

Error in sample.int(length(x), size, replace, prob) :
too few positive probabilities

The code is below:

M = 100000;
costMatrix = as.matrix(rbind(
c( 0,12,10, M, M, M,12),
c(12, 0, 8,12, M, M, M),
c(10, 8, 0,11, 3, M, 9),
c( M,12,11, 0,11,10, M),
c( M, M, 3,11, 0, 6, 7),
c( M, M, M,10, 6, 0, 9),
c(12, M, 9, M, 7, 9, 0)));
numcities = 7;

# given a tour, calculate the total cost

tourCost <- function(tour, costMatrix) {
tour <- c(tour, tour[1])
route <- embed(tour, 2)[, 2:1]
sum(costMatrix[route])
}

# inverse of the total distance is the fitness

tspFitness <- function(tour, ...) 1/tourCost(tour, ...)
require(GA)
result <- ga(type = "permutation", fitness = tspFitness, costMatrix = costMatrix, min = 1,
max = numcities, popSize = 10, maxiter = 500, run = 100, pmutation = 0.2
, monitor = NULL)
soln <- as.vector(result@solution[1,]) # use first soln
tourCost(soln,costMatrix)
tour <- c(soln,result@solution[1]);
tour # approx best tour
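
To make the fitness computation concrete, here is a small worked example of how tourCost turns a tour into a cost, using the cost matrix from the report. The tour 1:7 is an arbitrary choice for illustration, not a solution produced by the GA, and the helper names `legs` and `leg_costs` are introduced only for this sketch.

```r
# Cost matrix from the report; M marks a link that is effectively forbidden.
M <- 100000
costMatrix <- rbind(
  c( 0, 12, 10,  M,  M,  M, 12),
  c(12,  0,  8, 12,  M,  M,  M),
  c(10,  8,  0, 11,  3,  M,  9),
  c( M, 12, 11,  0, 11, 10,  M),
  c( M,  M,  3, 11,  0,  6,  7),
  c( M,  M,  M, 10,  6,  0,  9),
  c(12,  M,  9,  M,  7,  9,  0))

tourCost <- function(tour, costMatrix) {
  tour <- c(tour, tour[1])          # return to the starting city
  route <- embed(tour, 2)[, 2:1]    # each row is a (from, to) pair
  sum(costMatrix[route])            # a two-column matrix indexes (row, col) cells
}

tour <- 1:7                                   # arbitrary illustrative tour
legs <- embed(c(tour, tour[1]), 2)[, 2:1]     # 1->2, 2->3, ..., 7->1
leg_costs <- costMatrix[legs]                 # 12 8 11 11 6 9 12
total <- tourCost(tour, costMatrix)           # 69
fitness_value <- 1 / total                    # what tspFitness would return
```

Any tour that uses a link marked M costs at least 100000, so its fitness is at most 1e-5, several orders of magnitude below that of a feasible tour.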

Discrete input parameters

What changes are needed so that the input parameters are treated as discrete (integer) values?

Adding high penalties to the fitness function is not the right option, because the penalty then becomes part of the iterative process.
Skipping the non-integer values has the same problem.

Something needs to be changed in:
ga(type = c("binary", "real-valued", "permutation"),
min, max, nBits, fitness, ..., )
to make the algorithm accept only integer/discrete parameters. I have four discrete parameters.

I can't figure it out :(
Thanks..
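
One approach that avoids both penalties and skipped evaluations is to let the GA search a continuous box and map every chromosome to integers before the objective is evaluated, so that every candidate is valid. The sketch below is only an illustration under assumptions: `to_integers`, `toy_objective` and the bounds are hypothetical names and values, and the GA call is guarded so that the decoding part runs even without the package installed.

```r
# Map a real-valued chromosome to integers in [int_lower, int_upper].
# The GA searches [int_lower, int_upper + 1), so each integer owns an
# interval of equal width.
to_integers <- function(x, int_lower, int_upper) {
  as.integer(pmax(pmin(floor(x), int_upper), int_lower))
}

int_lower <- c(1, 1, 1, 1)      # hypothetical bounds for four discrete parameters
int_upper <- c(10, 10, 10, 10)

# Hypothetical objective defined on integers only.
toy_objective <- function(p) -sum((p - c(3, 7, 1, 5))^2)

decoded <- to_integers(c(1, 5.7, 10.99, 11), int_lower, int_upper)

if (requireNamespace("GA", quietly = TRUE)) {
  res <- GA::ga(type = "real-valued",
                fitness = function(x) toy_objective(to_integers(x, int_lower, int_upper)),
                lower = int_lower, upper = int_upper + 1,
                popSize = 50, maxiter = 50, monitor = FALSE, seed = 1)
  best <- to_integers(res@solution[1, ], int_lower, int_upper)
}
```

The design choice here is that the fitness function owns the decoding, so the solution reported by the GA has to be decoded the same way before it is used.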

gaperm_oxCrossover_Rcpp(object, parents) : index error on apple M1

Dear Luca;

I am happy that you provide the GA package.

CRAN reports an index error for my recmap package using GA. Please see:
https://www.stats.ox.ac.uk/pub/bdr/M1mac/recmap.out

I also ran into the same error when executing the following code snippet
recmap::recmapGA(recmap::checkerboard(4))
on my brand-new Apple M1 device.

0 cp@leda:~ % R                                                                                                               21-05-28|9:56:04

R version 4.1.0 (2021-05-18) -- "Camp Pontanezen"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: aarch64-apple-darwin20 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

R> recmap::recmapGA(recmap::checkerboard(4))
GA | iter = 1 | Mean = 0.3971563 | Best = 0.9981481
Error in gaperm_oxCrossover_Rcpp(object, parents) : index error
In addition: There were 28 warnings (use warnings() to see them)
R>

R> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Big Sur 11.4

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib

locale:
[1] C/UTF-8/C/C/C/C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] recmap_1.0.9     sp_1.4-5         Rcpp_1.0.6      
[4] GA_3.2.1         iterators_1.0.13 foreach_1.5.1   

loaded via a namespace (and not attached):
[1] compiler_4.1.0   cli_2.5.0        crayon_1.4.1    
[4] codetools_0.2-18 grid_4.1.0       lattice_0.20-44 
R>

At the moment it is not urgent but it would be nice to investigate the reason.

Thanks and best wishes,

Christian

Miscellaneous Plotting Errors with the "GA" Library in R (Genetic Algorithm)

I am working with R. I am following this tutorial (https://cran.r-project.org/web/packages/GA/vignettes/GA.html) and am learning how to optimize functions using the "genetic algorithm".

The entire process is illustrated in the code below:

Part 1: Generate some sample data ("train_data")

Part 2: Define the "fitness function" : the objective of my problem is to generate 7 random numbers :

"random_1" (between 80 and 120)
"random_2" (between "random_1" and 120)
"random_3" (between 85 and 120)
"random_4" (between random_2 and 120)
"split_1" (between 0 and 1)
"split_2" (between 0 and 1)
"split_3" (between 0 and 1 )

and use these numbers to perform a series of data manipulation procedures on the train data. At the end of these data manipulation procedures, a "total" mean variable is calculated.

Part 3: The purpose of the "genetic algorithm" is to find the set of these 7 numbers that produce the largest value of the "total".

Below, I illustrate this entire process :

Part 1

#load libraries
library(dplyr)
library(GA)

# create some data for this example
a1 = rnorm(1000,100,10)
b1 = rnorm(1000,100,5)
c1 = sample.int(1000, 1000, replace = TRUE)
train_data = data.frame(a1,b1,c1)

Part 2

#define fitness function
fitness <- function(random_1, random_2, random_3, random_4, split_1, split_2, split_3) {

    #bin data according to random criteria
    train_data <- train_data %>% mutate(cat = ifelse(a1 <= random_1 & b1 <= random_3, "a", ifelse(a1 <= random_2 & b1 <= random_4, "b", "c")))
    
    train_data$cat = as.factor(train_data$cat)
    
    #new splits
    a_table = train_data %>%
        filter(cat == "a") %>%
        select(a1, b1, c1, cat)
    
    b_table = train_data %>%
        filter(cat == "b") %>%
        select(a1, b1, c1, cat)
    
    c_table = train_data %>%
        filter(cat == "c") %>%
        select(a1, b1, c1, cat)
    
    
    
    #calculate  quantile ("quant") for each bin
    
    table_a = data.frame(a_table%>% group_by(cat) %>%
                             mutate(quant = quantile(c1, prob = split_1)))
    
    table_b = data.frame(b_table%>% group_by(cat) %>%
                             mutate(quant = quantile(c1, prob = split_2)))
    
    table_c = data.frame(c_table%>% group_by(cat) %>%
                             mutate(quant = quantile(c1, prob = split_3)))
    
    
    
    
    #create a new variable ("diff") that measures if the quantile is bigger than the value of "c1"
    table_a$diff = ifelse(table_a$quant > table_a$c1,1,0)
    table_b$diff = ifelse(table_b$quant > table_b$c1,1,0)
    table_c$diff = ifelse(table_c$quant > table_c$c1,1,0)
    
    #group all tables
    
    final_table = rbind(table_a, table_b, table_c)
    # calculate the total mean: this is what needs to be optimized
    mean = mean(final_table$diff)
    
    
}

Part 3


#run the genetic algorithm (20 times to keep it short):
GA <- ga(type = "real-valued", 
         fitness = function(x)  fitness(x[1], x[2], x[3], x[4], x[5], x[6], x[7]),
         lower = c(80, 80, 80, 80, 0,0,0), upper = c(120, 120, 120, 120, 1,1,1), 
         popSize = 50, maxiter = 20, run = 20)

The above code (Part 1, Part 2, Part 3) all work fine.

Problem: Now, I am trying to produce some of the visual plots from the tutorial:

First Plot - This Works:

plot(GA)

But I can't seem to produce the other plots from the tutorial:

Second Plot: Does Not Work

lbound <- 80
ubound <- 120

curve(fitness, from = lbound, to = ubound, n = 1000)
points(GA@solution, GA@fitnessValue, col = 2, pch = 19)

 Error: Problem with `mutate()` column `cat`.
i `cat = ifelse(...)`.
x argument "random_3" is missing, with no default
Run `rlang::last_error()` to see where the error occurred. 

Error in xy.coords(x, y) : 'x' and 'y' lengths differ

Third Plot : Does Not Work

random_1 <- random_2 <- seq(80, 120, by = 0.1)
f <- outer(x1, x2, fitness)
persp3D(x1, x2, fitness, theta = 50, phi = 20, col.palette = bl2gr.colors)

Error: Problem with `mutate()` column `cat`.
i `cat = ifelse(...)`.
x argument "random_3" is missing, with no default
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
 Error: Problem with `mutate()` column `cat`.
i `cat = ifelse(...)`.
x argument "random_3" is missing, with no default
Run `rlang::last_error()` to see where the error occurred. 

Error in z[-1, -1] : object of type 'closure' is not subsettable

Fourth Plot: Does Not Work

filled.contour(random_1, random_2, fitness, color.palette = bl2gr.colors)

Error in min(x, na.rm = na.rm) : invalid 'type' (list) of argument

Can someone please show me how to fix these errors?

Thanks
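
The errors above are consistent with how the plotting helpers call their function argument: curve() passes a single vector of x values as the first argument, outer() passes two equally long vectors, and filled.contour() and persp3D() expect a numeric matrix of z values rather than a function. A seven-argument fitness therefore has to be reduced to a one- or two-dimensional slice first. The sketch below is one possible way to do that, with a hypothetical stand-in `toy_fitness` and hypothetical fixed values in `best` (in the real case these could come from GA@solution[1, ]).

```r
# Stand-in for the seven-argument fitness; like the original it returns
# one number per call and is not vectorised.
toy_fitness <- function(random_1, random_2, random_3, random_4,
                        split_1, split_2, split_3) {
  max(0, 1 - abs(random_1 - 95) / 40 - abs(random_2 - 110) / 40) *
    (split_1 + split_2 + split_3) / 3
}

# Hypothetical values at which the other parameters are held fixed.
best <- c(95, 110, 100, 115, 0.5, 0.5, 0.5)

# One-dimensional slice over random_1; Vectorize() loops over the inputs.
slice_1 <- Vectorize(function(x)
  toy_fitness(x, best[2], best[3], best[4], best[5], best[6], best[7]))

# Two-dimensional slice over random_1 and random_2.
slice_12 <- Vectorize(function(x, y)
  toy_fitness(x, y, best[3], best[4], best[5], best[6], best[7]))

x1 <- x2 <- seq(80, 120, by = 2)
z <- outer(x1, x2, slice_12)     # numeric matrix, one cell per (x1, x2) pair

if (interactive()) {
  curve(slice_1, from = 80, to = 120, n = 200)
  filled.contour(x1, x2, z)      # pass the matrix z, not the function
}
```

Each grid point costs one full fitness evaluation, so with the real data-manipulation fitness a coarse grid is a sensible starting point.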

Multi objective optimization using the GA Library

Hello,

Currently, is the GA library capable of performing multi objective optimization? For example, something like this?

https://stackoverflow.com/questions/56359838/argument-passing-in-r-to-functions-of-several-real-variables

Thanks

GA crashes

Hi Luca,

First of all, thank you for sharing and maintaining this very useful package.
Second, recently I started experiencing problems with it.
It used to work fine but then I changed my data.
Now, after a number of iterations, (see this file ga_monitor__M130_m18q1_2.pdf ), I get this message:

Error in { : task 31 failed - "replacement has length zero"
Calls: ga -> %DO% -> <Anonymous>
Execution halted

I am using GA version 3.0.2, on R version 3.4.1 (2017-06-30) -- "Single Candle"
running it on 15 cores of a computing cluster with platform: x86_64-pc-linux-gnu (64-bit)

The call in my program is

pdf(file = sprintf("ga_monitor_%s.pdf", filesuff))
assortmentOpt = ga(type="binary", fitness=expected_profit, 
                   assortmentmap=assortmentmap, N0l=N, M0l=M, Nl=Nfocal, Ml=Mfocal, Kl=K, Dl=D, 
                   maxnKl=maxnK, cindl=cindfocal, tauVl=tauVfocal, thetaVl=thetaVfocal, 
                   Xl=Xfocal, marginsVl=focaldata$margins, suggestions=initialConds,
                   nBits = nrow(assortmentbase), names=assortmentbase$key, pmutation = 0.25,
                   pcrossover = 0.75, seed = 1234,
                   maxiter = maxOptiter, parallel=runparallel, monitor=plot) 
dev.off()

the fitness function is

# compute profit for the entire chain
expected_profit = function(assortment, assortmentmap,
                           N0l, M0l, Nl, Ml, Kl, Dl, maxnKl, cindl, 
                           tauVl, thetaVl, Xl, marginsVl ) {

  # compute profit specific to a draw in the chain of estimates
  draw_profit = function(Nl, Ml, Kl, nKl, cindl, aindl, taul, thetal, Xl, marginsl ) {
    
    expu = matrix(NA, nrow = Nl, ncol = Kl)
    expuMargin = matrix(NA, nrow = Nl, ncol = Kl)
    # compute utilities for available alternatives
    for (n in 1:Nl){
      for (k in 1:nKl[n]){
        expu[n,k] <- exp( taul[cindl[n],aindl[n,k]] + 
                            t(Xl[n,aindl[n,k],]) %*% thetal[cindl[n],] + logiterrors[n,k])
        expuMargin[n,k] = expu[n,k] * marginsl[n,aindl[n,k]]
        # if price coeff is negative, omit observation
        if (thetal[cindl[n],1]<0) expuMargin[n,k]=0 
      }
    }
    expectedProfit = sum(apply(expuMargin,1,sum, na.rm=TRUE) / apply(expu,1,sum, na.rm=TRUE))
    return(expectedProfit)
  }
  
  ######################################
  ## RECONSTRUCT DATA STRUCTURES
  ######################################

  # expand gene to span all transactions; rows are observations, columns are alternatives
  tmp = matrix(assortment[assortmentmap], ncol=Kl, nrow=Nl, byrow=TRUE)
  maxnKl = max(apply(tmp,1,function(x) sum(!is.na(x))))
  aindl = matrix(0, ncol=maxnKl, nrow=Nl)
  nKl = numeric(Nl)
  if (sum(tmp==0)>0) {
    # construct list of available alternatives for each observation
    for (n in 1:Nl) {
      tmp2 = which(tmp[n,]==1)
      aindl[n, 1:length(tmp2)] = tmp2
      nKl[n] = length(tmp2)
    }
  } else {
    aindl = matrix(rep(1:Kl,Nl), ncol = Kl, nrow=Nl, byrow=TRUE)
    nKl = rep(K,Nl)
  }  
 
  marginsl = array(NA,c(Nl,Kl))
  for (n in 1:Nl){
    indices = ((n-1)*Kl+1):(n*Kl)
    marginsl[n,] = marginsVl[indices]
  }
  
  ######################################
  ## COMPUTE PROFITS
  ######################################
  profit = numeric(usedraws)
  for (draw in 1:usedraws){
    thetal = matrix(as.numeric(thetaVl[draw,]), nrow = Ml, ncol=Dl, byrow=FALSE)
    taul= matrix(as.numeric(tauVl[draw,]), nrow=Ml, ncol=Kl,  byrow=FALSE)
    profit[draw] = draw_profit(Nl, Ml, Kl, nKl, cindl, aindl, taul, thetal, Xl, marginsl )
  }                           
  
  tmp = sum(profit, na.rm=TRUE)
  if (length(tmp)==0 | is.na(tmp) | is.nan(tmp) | is.infinite(tmp))
    return(0)
  else
    return(tmp)  
}

So I am avoiding invalid values for the fitness function.

If you could offer any insight on what may be happening, I would greatly appreciate it.

Thanks in advance

Rafael
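
One possible source of "replacement has length zero" in code shaped like the fitness function above is the assignment aindl[n, 1:length(tmp2)] = tmp2. If a chromosome leaves some observation with no available alternative, tmp2 is empty and 1:length(tmp2) becomes 1:0, that is c(1, 0), rather than an empty index. Whether this is what happens in the reported run is an assumption; the sketch only reproduces the mechanism on a tiny matrix.

```r
aindl <- matrix(0, nrow = 2, ncol = 3)
tmp2 <- which(c(0, 0, 0) == 1)      # integer(0): no alternative selected

# 1:length(tmp2) is c(1, 0): one target cell, but nothing to put in it.
msg <- tryCatch({
  aindl[1, 1:length(tmp2)] <- tmp2
  "no error"
}, error = function(e) conditionMessage(e))

# seq_along(tmp2) is empty when tmp2 is empty, so the assignment is a no-op.
aindl[1, seq_along(tmp2)] <- tmp2
n_available <- length(tmp2)         # 0, which later loops must also handle
```

If this is the cause, the later loop for (k in 1:nKl[n]) would need the same care, since 1:0 iterates over 1 and 0 instead of being skipped; seq_len(nKl[n]) avoids that.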

Can not find function "parNames"

I want to add a progress bar to the GA::ga function, so I pasted the ga source code into my R script and modified it,
but it fails with the error "can not find function parNames", and I can't find the source code of the function "parNames".
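
A copy of the ga source loses access to the package's internal helpers, which is one plausible reason the copied code cannot find parNames. A progress bar may not require copying the source at all, because ga accepts a custom function through its monitor argument, and that function receives the GA object with its iter slot at every generation. The sketch below assumes this; `make_progress_monitor` is a hypothetical helper name, and the GA run is guarded so the helper can be defined without the package.

```r
# Build a monitor function that advances a text progress bar once per
# generation. `maxiter` should match the value passed to ga().
make_progress_monitor <- function(maxiter) {
  pb <- utils::txtProgressBar(min = 0, max = maxiter, style = 3)
  function(object, ...) {
    utils::setTxtProgressBar(pb, object@iter)
    invisible(object@iter)
  }
}

if (requireNamespace("GA", quietly = TRUE)) {
  Rastrigin <- function(x1, x2) {
    20 + x1^2 + x2^2 - 10 * (cos(2 * pi * x1) + cos(2 * pi * x2))
  }
  res <- GA::ga(type = "real-valued",
                fitness = function(x) -Rastrigin(x[1], x[2]),
                lower = c(-5.12, -5.12), upper = c(5.12, 5.12),
                popSize = 50, maxiter = 30, seed = 1,
                monitor = make_progress_monitor(30))
}
```

If the run stops early because of the run argument, the bar simply stays below 100%.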

Saving the population from each generation

Hello,

I'm wondering if there is a way to record the population (and their associated fitness) from each generation? As far as I can tell, one can only access the final population in the output of ga. I am specifically interested in this in order to visualize the parameter space.

I have managed a way to do this with a custom monitoring function, but it requires that I write the population to a text file in the global environment. I think this is a big no-no when making a function (at least when the goal is to include the function in another package).

I was wondering if you know of any other, more elegant ways of doing this. I thought of memoisation, but have been unable to access the cache.

Thanks in advance for your help. Below is an example of what I'm trying to do:

Example

library(GA)

Rastrigin <- function(x1, x2)
{
  20 + x1^2 + x2^2 - 10*(cos(2*pi*x1) + cos(2*pi*x2))
}

gaMonitor2a <- function (object, digits = getOption("digits"), ...){
  pop <- as.data.frame(object@population)
  names(pop) <- paste0("par", seq(ncol(pop)))
  pop$fitness <- object@fitness
  pop$iter <- object@iter

  if(object@iter == 1){
    write.table(x = pop, file = "gaMonitorObj.csv", sep = ",", append = FALSE, row.names = FALSE, col.names = TRUE)
  } else {
    write.table(x = pop, file = "gaMonitorObj.csv", sep = ",", append = TRUE, row.names = FALSE, col.names = FALSE)
  }

  fitness <- na.exclude(object@fitness)
  sumryStat <- c(mean(fitness), max(fitness))
  sumryStat <- format(sumryStat, digits = digits)
  cat(paste("GA | iter =", object@iter, "| Mean =",
      sumryStat[1], "| Best =", sumryStat[2]))
  cat("\n")
  flush.console()
}

GA2a <- ga(type = "real-valued",
  fitness =  function(x) -Rastrigin(x[1], x[2]),
  lower = c(-5.12, -5.12), upper = c(5.12, 5.12),
  popSize = 50, maxiter = 100,
  optim = TRUE, seed = 1,
  monitor = gaMonitor2a ### custom monitor function
)
summary(GA2a)

# load and plot
pop <- read.csv(file = "gaMonitorObj.csv")

n <- 60
X <- akima::interp(x = pop$par1, y = pop$par2, z = pop$fitness, duplicate = TRUE, nx = n, ny = n)
pal <- colorRampPalette(c("#352A87", "#3439A8", "#214DC8", "#0E5FDB", "#056EDE", "#0F79D9",
  "#1283D4", "#0D8FD1", "#089BCE", "#06A5C7", "#0BACBC", "#1CB1AE",
  "#33B7A0", "#4EBB91", "#6DBE81", "#8ABE75", "#A3BD6A", "#BBBC60",
  "#D1BA58", "#E6B94E", "#F9BD3F", "#FBC831", "#F8D626", "#F5E71A",
  "#F9FB0E"))
image(X, col = pal(100))
points(par2 ~ par1, pop, pch = ".", col = adjustcolor(1,0.2))
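
One way to avoid the text file is to let the monitor function write into a variable captured by a closure, so the history lives in memory and nothing is created in the global environment or on disk. This is a minimal sketch under the assumption that the monitor runs in the main R process; `make_recording_monitor` is a hypothetical helper, and it relies only on the population, fitness and iter slots already used in the example above.

```r
make_recording_monitor <- function() {
  history <- list()                       # private to this closure
  monitor <- function(object, ...) {
    pop <- as.data.frame(object@population)
    names(pop) <- paste0("par", seq_len(ncol(pop)))
    pop$fitness <- object@fitness
    pop$iter <- object@iter
    history[[length(history) + 1L]] <<- pop
    invisible(NULL)
  }
  list(monitor = monitor,
       result = function() do.call(rbind, history))
}

if (requireNamespace("GA", quietly = TRUE)) {
  Rastrigin <- function(x1, x2) {
    20 + x1^2 + x2^2 - 10 * (cos(2 * pi * x1) + cos(2 * pi * x2))
  }
  rec <- make_recording_monitor()
  GA2b <- GA::ga(type = "real-valued",
                 fitness = function(x) -Rastrigin(x[1], x[2]),
                 lower = c(-5.12, -5.12), upper = c(5.12, 5.12),
                 popSize = 50, maxiter = 20, seed = 1,
                 monitor = rec$monitor)
  pop <- rec$result()   # one row per individual per generation
}
```

The returned data frame has the same columns as the CSV in the example, so the plotting code can be reused unchanged.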

Upper value must be greater than lower value - Error

Hi,

First of all, thank you for the package. I am trying to learn how to use it properly and perhaps my doubt is a simple one.

Considering the code below:

# https://rpubs.com/karthy1988/TSP_GA

library(GA)
data("eurodist", package = "datasets")
D <- as.matrix(eurodist)

#Function to calculate tour length

tourLength <- function(tour, distMatrix) {
  tour <- c(tour, tour[1])
  route <- embed(tour, 2)[,2:1]
  sum(distMatrix[route])
}

#Fitness function to be maximized

tspFitness <- function(tour, ...) 1 / tourLength(tour, ...)

GA <- ga(type = "permutation", fitness = tspFitness, distMatrix = D,
         lower = 1, upper = attr(eurodist, "Size"), popSize = 50, maxiter = 5000,
         run = 500, pmutation = 0.2)

show(summary(GA))

It runs smoothly. But if I set the upper parameter to a value lower than 4, I get an error.


> GA <- ga(type = "permutation", fitness = tspFitness, distMatrix = D,
+          lower = 1, upper = 3, popSize = 50, maxiter = 5000,
+          run = 500, pmutation = 0.2)
GA | iter = 1 | Mean = 0.0001316829 | Best = 0.0001316829
Error in gaperm_oxCrossover_Rcpp(object, parents) : 
  Sample size must be <= n when not using replacement!

But my actual problem occurs with a slightly modified version of that program.

tourLength <- function(tour, distMatrix) {
  tour <- c(1, tour + 1) # I would like the tour to start at site 1 and not return to it at the end
  route <- embed(tour, 2)[, 2:1]
  sum(distMatrix[route])
}

GA <- ga(type = "permutation", fitness = tspFitness, distMatrix = D,
         lower = 1, upper = 2, popSize = 50, maxiter = 5000,
         run = 500, pmutation = 0.2)

> GA <- ga(type = "permutation", fitness = tspFitness, distMatrix = D,
+          lower = 1, upper = 2, popSize = 50, maxiter = 5000,
+          run = 500, pmutation = 0.2)
GA | iter = 1 | Mean = 0.0002247632 | Best = 0.0002335903
Error in gaperm_oxCrossover_Rcpp(object, parents) : 
  upper value must be greater than lower value

tourLength <- function(tour, distMatrix) {
  tour <- c(1, tour) # I would like the tour to start at site 1 and not return to it at the end
  route <- embed(tour, 2)[, 2:1]
  sum(distMatrix[route])
}

GA <- ga(type = "permutation", fitness = tspFitness, distMatrix = D,
         lower = 2, upper = 3, popSize = 50, maxiter = 5000,
         run = 500, pmutation = 0.2)

> GA <- ga(type = "permutation", fitness = tspFitness, distMatrix = D,
+          lower = 2, upper = 3, popSize = 50, maxiter = 5000,
+          run = 500, pmutation = 0.2)
GA | iter = 1 | Mean = 0.0002244101 | Best = 0.0002335903
Error in gaperm_oxCrossover_Rcpp(object, parents) : 
  upper value must be greater than lower value

Both versions work if the difference between the lower and upper bounds is greater than 2. If the difference is 1, I get the error mentioned above. If the difference is 2, the error is:

> GA2 <- ga(type = "permutation", fitness = tspFitness, distMatrix = D,
+          lower = 2, upper = 4, popSize = 50, maxiter = 5000,
+          run = 500, pmutation = 0.2)
GA | iter = 1 | Mean = 0.0002015671 | Best = 0.0002225684
Error in gaperm_oxCrossover_Rcpp(object, parents) : 
  Sample size must be <= n when not using replacement!

I would appreciate any help you could provide.

Best Regards, R

NaN in vars in fitness function

I don't know why I get NaN inside vars in the fitness function:
Browse[1]> vars
[1] 13.830 46.040 15.120 5.900 15.090 4.020 0.000 0.000 1289.053 234.117 NaN 999.000 NaN NaN 1050.661

GA <- ga(type = "real-valued", fitness = function(vars){
   browser()
  input_test[3:17]=vars
  -predict(object = model_rf_gros,newdata = input_test)
},
  crossover=gabin_uCrossover,
  lower =as.numeric( grid_input_test[1,]),
  upper =as.numeric( grid_input_test[nrow(grid_input_test),]),
  optim = F,
  pmutation = 0.5, # mutation rate prob
  popSize = nrow(grid_input_test) # the number of indivduals/solutions
 )

I used browser() to debug the function.
Any suggestions will be appreciated.

Variant Permutation Optimisation

Hi Luca,

Firstly, thanks for your GA package: I have used it a number of times in the past for optimising functions of real values and got great results.

I am presently working with an optimisation problem which looks like this: maximise a fitness metric which accepts a vector of 10 integer values. Those integers can vary between 1 and 4. So, for example, these are valid inputs to the fitness function:

1 1 1 1 1 1 1 1 1 1
1 2 3 4 1 2 3 4 1 2
3 2 4 1 2 3 1 2 4 2

Now I thought that I could handle this using ga() by specifying type = "permutation" and setting

min = rep(1, 10)
max = rep(4, 10)

However this doesn't work. I have dug into your source code and seen that for permutation problems you set

min <- as.vector(min)[1]
max <- as.vector(max)[1]

This effectively removes replication from the values of min and max, with the result that my fitness function is only called with a permutation of the integers 1, 2, 3 and 4.

Obviously this doesn't work! :(

Do you have any ideas for how I could pose this problem in a suitable way to be handled by GA? I'm sure that genetic algorithms are the best approach to this problem, which can be stated in words as "find the sequence of ten integers between 1 and 4 which maximises this function...".

Any suggestions would be appreciated!

Best regards,
Andrew.
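
Because each of the ten values takes one of exactly four levels, one possible encoding is a binary chromosome with two bits per value, which makes every bit string a valid input. The sketch below illustrates that idea only; `decode_genes`, `target` and `toy_fitness` are hypothetical names, and the real fitness metric would replace the toy one.

```r
# Decode a 0/1 vector into integers: two bits per gene give values 1..4.
decode_genes <- function(bits, n_genes = 10, bits_per_gene = 2) {
  m <- matrix(bits, nrow = n_genes, ncol = bits_per_gene, byrow = TRUE)
  weights <- 2^((bits_per_gene - 1):0)
  as.integer(m %*% weights) + 1L
}

# 00 -> 1, 01 -> 2, 10 -> 3, 11 -> 4
example_bits <- rep(c(0, 0, 0, 1, 1, 0, 1, 1), length.out = 20)
example_values <- decode_genes(example_bits)   # 1 2 3 4 1 2 3 4 1 2

# Hypothetical fitness: closeness to an arbitrary target sequence.
target <- c(3, 2, 4, 1, 2, 3, 1, 2, 4, 2)
toy_fitness <- function(bits) -sum((decode_genes(bits) - target)^2)

if (requireNamespace("GA", quietly = TRUE)) {
  res <- GA::ga(type = "binary", fitness = toy_fitness, nBits = 20,
                popSize = 50, maxiter = 50, monitor = FALSE, seed = 1)
  best_values <- decode_genes(res@solution[1, ])
}
```

This works cleanly here because the number of levels is a power of two; other ranges would need a mapping that handles the unused bit patterns.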

Multi-objective optimization support

I love how the GA package structures the workflow, which is nice and neat. I am trying to adopt the GA package in my recent research work. My work mainly focuses on optimizing architecture design options, which involves multi-objective optimization (MOO). Does GA currently support MOO? If not, are there any plans to support it? Thanks!

Argument names doesn't seem to work

Consider the following example:

library( GA )

Rastrigin <- function(x1, x2)
{
  print(str(x1))
  print(str(x2))
  20 + x1^2 + x2^2 - 10*(cos(2*pi*x1) + cos(2*pi*x2))
}

GA <- ga(type = "real-valued", 
         fitness =  function(x) -Rastrigin(x[1], x[2]),
         lower = c(-5.12, -5.12), upper = c(5.12, 5.12),
         names = c( "a", "b" ),
         popSize = 50, maxiter = 100,
         optim = TRUE)

I.e. x1 and x2 don't seem to be named, despite the names option being supplied.

Suggestions override min and max

When suggestions is used in ga, the results of ga can be outside the bounds set in min and max.
This could be considered a bug, as many users expect the results to be within the set boundaries.

After looking at the code, it is clear that when a suggestion is made, it is included in the initial population, and when a suggestion outside the bounds has high value, that value can be "inherited" by future generations.

Considering that this may be expected behavior, we could produce a warning instead of forcing the results to be within the min/max bounds.
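
Until the package itself warns, a caller can check or clamp the suggestions before passing them on. The helper below is a hypothetical sketch of that idea, not part of GA: it clips each column of the suggestions matrix to its bounds and warns when anything was changed.

```r
# Clip every suggestion (one per row) to [lower, upper], column by column.
clamp_suggestions <- function(suggestions, lower, upper) {
  s <- as.matrix(suggestions)
  # After t(), each column is one suggestion, so lower/upper recycle correctly.
  clamped <- t(pmin(pmax(t(s), lower), upper))
  if (any(clamped != s)) {
    warning("some suggestions were outside the bounds and were clamped")
  }
  clamped
}

lower <- c(-5, -5)
upper <- c(5, 5)
raw <- rbind(c(-10, 0),    # first variable below its lower bound
             c(3, 99))     # second variable above its upper bound
safe <- suppressWarnings(clamp_suggestions(raw, lower, upper))
```

The clamped matrix can then be supplied as the suggestions argument; whether clamping or rejecting out-of-bounds rows is preferable depends on the problem.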

Duplicated Sols in @bestSol

Using ga function, as:

GA <- ga(...)

Sometimes duplicated best solutions exist in GA@bestSol list.

> GA@bestSol
[[1]]
      [,1]     [,2]
[1,]   1.0     51.0

[[2]]
      [,1]     [,2]
[1,]   1.2     51.2
[2,]   1.2     51.2
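
As a workaround on the user side, duplicated rows can be dropped after the run, since unique() applied to a matrix removes repeated rows. The list below is a hand-built stand-in with the same shape as the output shown above, not real GA output.

```r
# Stand-in for GA@bestSol: a list with one matrix per iteration.
bestSol <- list(
  matrix(c(1.0, 51.0), nrow = 1),
  rbind(c(1.2, 51.2), c(1.2, 51.2))
)

# unique() on a matrix keeps the first occurrence of each distinct row.
dedup <- lapply(bestSol, unique)
rows_after <- vapply(dedup, nrow, integer(1))   # 1 1
```

This only removes rows that are exactly equal; solutions that differ in the last decimal places would need rounding first.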

GA Version 3.2 for Linux: gaIslands.R line 127

Hello,
I am trying to use 8 islands with gaisl.
It is not clear whether we need to replicate the initial solution row according to the number of islands.
Maybe 8 islands need 8 rows?

I created a "suggestion" (initial solution) with the right number of columns (upper and lower interval) and
8 rows (one per island?).
RStudio gives this error:

Error in suggestions[seq(popSize), , drop = FALSE] :
subscript out of bounds

Why is line 127 in gaIslands.R written as follows?
suggestions <- suggestions[seq(popSize),,drop=FALSE]

Maybe there is a bug?
How can we use the suggestions for, let's say, n islands?

Thanks in advance...

Non-uniform distribution over Initial Population

I have a situation where I know from context that I should start my initial population close to a certain point. Using the DEoptim package I can specify a truncated normal distribution instead of a uniform one for my initial population. Unfortunately, the problem is a constrained optimization problem, which your package can handle better. But I can't find a clear example of how to specify the initial population I want.

My function has over 200 variables and so can't afford to waste time away from my guess.

So is there a way to specify an initial population?

I tried by accessing the gaControl() but I don't understand how to modify it.

Many thanks,
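
One possible route is the suggestions argument of ga, which takes a matrix of starting individuals, one per row. The sketch below draws such a matrix from a normal distribution centred on a guess and truncated to the bounds, using the inverse-CDF method. `truncnorm_population`, the guess, the standard deviations and the toy fitness are all hypothetical, and the example uses five variables instead of 200 to stay short.

```r
# Draw n starting points from N(center, sd) truncated to [lower, upper].
truncnorm_population <- function(n, center, sd, lower, upper) {
  p <- length(center)
  out <- matrix(NA_real_, nrow = n, ncol = p)
  for (j in seq_len(p)) {
    p_lo <- pnorm(lower[j], mean = center[j], sd = sd[j])
    p_hi <- pnorm(upper[j], mean = center[j], sd = sd[j])
    u <- runif(n, min = p_lo, max = p_hi)          # uniform on the allowed mass
    x <- qnorm(u, mean = center[j], sd = sd[j])    # back to the variable scale
    out[, j] <- pmin(pmax(x, lower[j]), upper[j])  # guard against rounding
  }
  out
}

set.seed(1)
lower <- rep(-10, 5)
upper <- rep(10, 5)
guess <- c(1, -2, 0.5, 3, -1)                      # hypothetical prior guess
start <- truncnorm_population(50, guess, sd = rep(1, 5), lower, upper)

if (requireNamespace("GA", quietly = TRUE)) {
  toy_fitness <- function(x) -sum((x - guess)^2)   # hypothetical objective
  res <- GA::ga(type = "real-valued", fitness = toy_fitness,
                lower = lower, upper = upper,
                suggestions = start, popSize = 50,
                maxiter = 20, monitor = FALSE, seed = 1)
}
```

Supplying as many suggestion rows as popSize means the whole first generation comes from the chosen distribution; fewer rows would leave the remainder to the default initialisation.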

Confusing documentation of ga solution object

The documentation of the solution field of the object returned by a call to ga in GA_3.2 says "solution the value(s) of the decision variables giving the best fitness at the final iteration." But the solution object returned by ga seems to be a matrix, e.g.: "num [1:5, 1:5000]". The 2nd dimension of 5000 is the number of decision variables, but I don't understand what the first dimension of 5 means. Does the returned matrix contain several possible solutions?

Is the documentation in error or am I just confused?

Thanks.

Does type "binary" feed the fitness function with gray encoded individuals?

Hi, nice package by the way.

In the binary search solution of your "A quick tour of GA" document, why do you have to apply the function gray2binary to the "x" argument before moving on (as I understand it, this argument represents the individuals)? It looks like the ga function gives Gray-encoded individuals to the fitness function when the "binary" type is used, instead of plain binary-encoded individuals.
Thanks
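
For context, the difference between the two readings of the same bit string can be shown in a few lines of base R. As far as can be told from the question, the GA evolves plain 0/1 vectors and the fitness function decides how to interpret them, so using Gray code is a modelling choice; that reading is an assumption here. The helpers `gray_to_binary` and `bits_to_decimal` are written for this sketch and are not the package's own gray2binary and binary2decimal.

```r
# Convert a Gray-coded bit vector to standard binary:
# the first bit is copied, each later bit is the XOR of the previous
# binary bit and the current Gray bit.
gray_to_binary <- function(g) {
  b <- g
  for (i in seq_along(g)[-1]) {
    b[i] <- (b[i - 1] + g[i]) %% 2
  }
  b
}

# Interpret a bit vector as an unsigned integer, most significant bit first.
bits_to_decimal <- function(b) sum(b * 2^((length(b) - 1):0))

bits <- c(1, 1, 0)
as_plain_binary <- bits_to_decimal(bits)                 # 6
as_gray_code <- bits_to_decimal(gray_to_binary(bits))    # 4
```

Under Gray code, consecutive integers differ in exactly one bit, so a single bit-flip mutation can always move to a neighbouring value; that is the usual motivation for decoding this way.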

Allow Variables of Mixed Type

Hey,

I'd like to maximise or minimise over a search space where some variables are real-valued, some are binary and some are integers.

Thanks!

Is the default parallel execution prescheduled?

Greetings, I appreciate the work you have contributed to this package. The execution time of my fitness function varies greatly depending on the input values, and for parallel execution I think it would make the most sense not to preschedule the work done by my cluster. Is ga() prescheduled by default? If so, how would I go about forking work in an unscheduled way?

My understanding is that the preschedule=FALSE argument is supplied to foreach, but I'm not sure how to pass this to ga().

additional arguments not passed to the fitness function in de

Should the additional arguments be passed in the call to "ga"? Currently they are not passed:

object <- do.call("ga", c(args, list(fitness = fitness,
lower = lower, upper = upper, popSize = popSize)))

Error in documentation of binary2decimal function

The documentation for the binary2decimal function in GA_3.2 says "binary2binary converts a binary value, i.e. a vector of 0s and 1s, to a decimal representation." "binary2binary" should be "binary2decimal".

keepBest problem

Thank you for developing such a useful package. Your package is helping me a lot with my current project.

There seems to be an issue with the keepBest option.

After iteration 1, the bestSol list is all NULL.
After the last iteration k, the bestSol list is populated by k-1 elements.

I think that many will agree the expected behavior is that after iteration 1, the bestSol list contains 1 element and after iteration k, the bestSol list contains k elements.

It would be great if this could be updated accordingly. Thank you!

GA package reluctant to import data for dimensions containing interval in their title

Hi,

When trying to import the following data in R, R returns a table with the Handle User column missing.

Sessions.by.user.by.month <- ga$getData(id, batch=TRUE, metrics = "ga:Sessions", dimensions = "ga: Handle User, ga:Date")

Thank you in advance.
Tsvetan

Cannot Quiet GA run

Hello,
This is a rather minor problem but I am unable to quiet the output from the GA function. It does not respond to the invisible() command or implementation of the quiet=TRUE parameter. Any idea why this is occurring?
Thanks,
Sam

[Help wanted] Use GA with integer

Hello,
Thanks for your package. I have managed to use it efficiently with the type "real-valued", but I cannot find how to get the behavior of the "real-valued" type with a population of integers (I have a one-variable problem that does not require floating-point precision).
When I try the type "permutation" I get this error:

Error in sample.int(length(x), size, replace, prob) : 
  cannot take a sample larger than the population when 'replace = FALSE'

While debugging, I found that with type "permutation" it generates a matrix of popSize * (max - min), which is why the error is triggered; but I just need a 1D integer population of size popSize.

I am looking for a way to generate an integer population instead of rounding the variable inside the fitness function, in order to get the best performance.
My ultimate goal is to have a step variable in my function in order to generate fewer individuals.

For instance, my function searches for intervals by finding a cutoff, and it generates a bunch of solutions greater than 53, such as 53.04, 53.69, 53.14, ..., because it found the correct interval, which is [54, ... ]. Instead of computing all these redundant solutions, I want it to generate only integer solutions so that it is quicker and more accurate.

I hope I was clear and thanks in advance for your help.

How to debug GA function from GA package?

I ran the ga function from the R GA package for securities portfolio optimization. It works correctly for shares and bonds separately. I'd like to combine shares and bonds in one portfolio, so I combined both datasets into one. But I got an error about "as.vector(nBits)", although I have no such expression in my code. How can I find out which of my GA parameters is wrong?
Error in as.vector(nBits) : argument "nBits" is missing, with no default

I posted the question here too.
My issue is not a bug report but a help request.

How to generate population from pre-defined individuals?

Hi!

Suppose I have 100 solutions and have selected the five best ones. How can I generate 50 new individuals from the five best ones with pcrossover = 0.8 and pmutation = 0.1?

Thank you very much!

memoise(fitness) decreased the speed

Hello, Luca
I tried to use memoise(fitness) and got an unsuccessful result: GA became slower. How is that possible?
Here is the result:
             test replications elapsed relative average
1          ga_res            1   42.75    1.000   42.75
2 ga_res_memoised            1   49.08    1.148   49.08
My GA computes an optimised portfolio. The piece of code:

library(memoise)
fitness <- ga_params$ga_param_fitness
mfitness <- memoise(fitness)
is.memoised(fitness)
## [1] FALSE
is.memoised(mfitness)
## [1] TRUE
library(rbenchmark)

library("GA")
tab <- benchmark(
  ga_res = ga(
    type         = ga_params$ga_param_type,
    fitness      = fitness,
    lower        = ga_params$ga_param_lower,
    upper        = ga_params$ga_param_upper,
    popSize      = ga_params$ga_param_popSize,
    maxiter      = ga_params$ga_param_maxiter,
    run          = ga_params$ga_param_run,
    names        = ga_params$ga_param_names,
    suggestions  = ga_params$ga_param_suggestions,
    fitness_data = fitness_data,
    #keepBest=TRUE,
    parallel     = TRUE,
    monitor      = FALSE,
    seed         = 1,
    pcrossover   = 0.05
    #,optim = TRUE
  ),
  ga_res_memoised = ga(
    type         = ga_params$ga_param_type,
    fitness      = mfitness,
    lower        = ga_params$ga_param_lower,
    upper        = ga_params$ga_param_upper,
    popSize      = ga_params$ga_param_popSize,
    maxiter      = ga_params$ga_param_maxiter,
    run          = ga_params$ga_param_run,
    names        = ga_params$ga_param_names,
    suggestions  = ga_params$ga_param_suggestions,
    fitness_data = fitness_data,
    #keepBest=TRUE,
    parallel     = TRUE,
    monitor      = FALSE,
    seed         = 1,
    pcrossover   = 0.05
    #,optim = TRUE
  ),
  columns = c("test", "replications", "elapsed", "relative"),
  replications = 1
)
tab$average <- with(tab, elapsed/replications)
tab
forget(mfitness)

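
A cache only pays off when the same chromosome is evaluated more than once, and with real-valued chromosomes exact repeats are rare, so each call would pay the lookup overhead without getting a hit. That is one plausible explanation for the slowdown, not a diagnosis of this particular run; with parallel = TRUE it is also possible that each worker keeps its own separate cache. The sketch below measures hit rates with a small hand-written cache (`make_counting_cache` is hypothetical and much simpler than memoise).

```r
# Wrap f with a lookup table keyed on the printed chromosome and
# count how often a result is served from the table.
make_counting_cache <- function(f) {
  cache <- new.env()
  calls <- 0L
  hits <- 0L
  wrapped <- function(x) {
    calls <<- calls + 1L
    key <- paste(format(x, digits = 17), collapse = ",")
    if (exists(key, envir = cache, inherits = FALSE)) {
      hits <<- hits + 1L
      return(get(key, envir = cache))
    }
    value <- f(x)
    assign(key, value, envir = cache)
    value
  }
  list(f = wrapped, stats = function() c(calls = calls, hits = hits))
}

toy_fitness <- function(x) sum(x^2)      # hypothetical cheap fitness

set.seed(1)
real_cache <- make_counting_cache(toy_fitness)
for (i in 1:100) real_cache$f(runif(3))                        # real-valued inputs

bin_cache <- make_counting_cache(toy_fitness)
for (i in 1:100) bin_cache$f(sample(0:1, 3, replace = TRUE))   # binary inputs

real_stats <- real_cache$stats()
bin_stats <- bin_cache$stats()
```

With only eight distinct 3-bit strings the binary cache is hit on almost every call, while the real-valued one is essentially never hit; comparing the two counters on the actual portfolio fitness would show which regime applies.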

GA giving out an error after few iterations

Hi,

I'm trying to run GA with a custom fitness function. I'm working with a company and hence cannot give you a reproducible example but I wanted to know when and why does the following error occur ?

Error in ga_nlrSelection_Rcpp(object, q) : Probabilities must be finite and non-negative!

Thanks and Regards
Krishna Chaitanya Bandi
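
The message says that the selection step received invalid probabilities. One plausible cause (an assumption, since the issue has no reproducible example) is a fitness function that occasionally returns NA, NaN or Inf. A minimal sketch of a guard follows; `safe_fitness` and its `penalty` default are hypothetical names and values, not part of GA.

```r
# Hypothetical helper: wrap a fitness function so that it always returns
# a single finite number. `penalty` is an assumed value; choose one that is
# clearly worse than any feasible fitness in your problem.
safe_fitness <- function(fitness, penalty = -1e10) {
  function(x, ...) {
    # Treat errors inside the fitness function like a missing value.
    value <- tryCatch(fitness(x, ...), error = function(e) NA_real_)
    if (length(value) != 1L || !is.finite(value)) penalty else value
  }
}

# Toy fitness that is infinite at x = 0.
raw_fitness <- function(x) (1 - x[1]) / x[1]
guarded <- safe_fitness(raw_fitness)

guarded(0.5)  # ordinary value is passed through
guarded(0)    # Inf is replaced by the penalty
```

The wrapped function can then be passed as the `fitness` argument in place of the original one. If the error disappears, the original function was returning non-finite values somewhere in the search space.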

Ability to renew learning

Hi Luca!

GA is a super library!
Thank you very much!
I have become a fan of it :)
For example, here is how I use it :)
https://www.facebook.com/groups/rtalks/permalink/769814930347533/

I think:
It would be nice if it were possible to set the initial data when launching the algorithm.

Whenever we start the GA, it begins the search from random values.

However, if a run was stopped at some iteration, being able to restart the GA from the result already obtained would speed up the search for a solution.

That is my idea :)

Thanks again for the magnificent library!

In runif(object@popSize, min[j], max[j]) : NAs produced

Under certain conditions, the population is filled with NA values instead of being truncated.

I think it should simply drop the part of the population that it cannot sample;
so if popSize is set to 100 and runif(object@popSize, min[j], max[j]) can only produce 50 valid samples, it should drop the remaining 50 with a warning.

Summary of number of calls to fitness function

This is a suggestion to include in the summary how many times the fitness function was called. This is important when comparing experiments with different parameters.

By looking at the code it seems that currently, this information is not stored anywhere. Maybe have a counter and increment it somewhere between lines 225-242 of ga.R ?

Is there a workaround? I was thinking of opening a file inside the fitness function and writing a counter to it, but that would be very inefficient, and with parallel > 1 some synchronization issues could arise.
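
One possible workaround that avoids file access is to keep the counter in a closure. This is only a sketch: `counted_fitness` is a hypothetical helper, and it is reliable for serial runs only, because each parallel worker would increment its own copy of the counter.

```r
# Hypothetical helper: wrap a fitness function and count its evaluations.
# The counter lives in the enclosing environment of the wrapper, so it is
# only meaningful when the GA runs serially (parallel = FALSE).
counted_fitness <- function(fitness) {
  calls <- 0L
  wrapped <- function(x, ...) {
    calls <<- calls + 1L
    fitness(x, ...)
  }
  list(fitness = wrapped,
       count = function() calls,
       reset = function() calls <<- 0L)
}

f <- function(x) -sum(x^2)   # toy fitness for illustration
cf <- counted_fitness(f)
invisible(sapply(1:5, function(i) cf$fitness(c(i, i))))
cf$count()   # 5
```

Passing `cf$fitness` as the `fitness` argument and calling `cf$count()` after the run gives the number of evaluations for that experiment; `cf$reset()` clears it before the next one.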

Use GA for genetic programming

Is it possible to use GA for genetic programming?

decrease the accuracy of genome of real-valued GA

Hello!
I am calculating an investment portfolio with GA and have one problem: GA returns weights with too many decimal places. For example:
> ga_results$optimised_porfolio
                         sol_weights
[1,] "TQOB@SU26210RMFS3" "0.625122874975204"
[2,] "TQOB@SU26215RMFS2" "0.375124052166939"
[3,] "TQBR@AFKS"         "0.00508862733840942"
[4,] "TQBR@DSKY"         "0.000841811299324036"
How can I reduce the precision of the genome of a real-valued GA, for example by increasing the mutation step?
I need to get something like:
> ga_results$optimised_porfolio
                         sol_weights
[1,] "TQOB@SU26210RMFS3" "0.62"
[2,] "TQOB@SU26215RMFS2" "0.36"
[3,] "TQBR@AFKS"         "0.01"
[4,] "TQBR@DSKY"         "0.01"

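The mutation step changes how far the search moves, not how many decimals are reported. One approach, sketched here under assumptions, is to snap the weights to a grid yourself, both inside the fitness function (so the GA scores the weights you will actually use) and on the final solution. `round_weights` is a hypothetical helper, not a GA function; it uses largest-remainder rounding so the weights still sum to 1.

```r
# Hypothetical helper: snap a weight vector to multiples of `step` and
# keep the total equal to 1 (largest-remainder rounding).
round_weights <- function(w, step = 0.01) {
  units <- round(1 / step)            # number of grid units to distribute
  scaled <- w / sum(w) * units        # normalised weights in grid units
  base <- floor(scaled)
  leftover <- units - sum(base)       # units still to hand out
  if (leftover > 0) {
    # Give the remaining units to the largest fractional parts.
    idx <- order(scaled - base, decreasing = TRUE)[seq_len(leftover)]
    base[idx] <- base[idx] + 1
  }
  base * step
}

# Weights taken from the output shown in the issue.
w <- c(0.625122874975204, 0.375124052166939,
       0.00508862733840942, 0.000841811299324036)
round_weights(w)
```

Inside a fitness function this would be used as `w <- round_weights(x)` before the portfolio is evaluated. Note that the rounded result depends on the rounding rule, so it will not necessarily match the hand-written target values above.
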
print current solution in monitor function

Hello!
I'd like to debug my GA. To that end, I tried to print the current solution in the monitor function:

monitor <- function(obj) 
{ 
  #gaMonitor(obj)
  browser()
  print(c("iteration =", obj@iter , "solution =" , obj@solution))
}

but obj@solution is empty

Called from: monitor(object)
Browse[1]> obj@solution
<0 x 0 matrix>

Call:
ga(type = "real-valued", fitness = fitness, lower = rep(0, nrow(UsingInstr)), upper = rep(1, nrow(UsingInstr)), pcrossover = 0.05, maxiter = 50000, run = 100, names = files, suggestions = rep(1/nrow(UsingInstr), nrow(UsingInstr)), parallel = TRUE, monitor = monitor, seed = 1)
How can I print the solution at each iteration?
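
In the browser output above, obj@solution is a 0 x 0 matrix, which suggests that the slot is only filled once the run has finished (an assumption about the package internals). A possible alternative is to read the best individual from the current population. The sketch below assumes the object passed to the monitor has `iter`, `population` (one individual per row) and `fitness` slots; `best_monitor` is a hypothetical name.

```r
# Sketch of a monitor that reports the best individual of the current
# population instead of the (still empty) solution slot.
best_monitor <- function(obj) {
  best <- which.max(obj@fitness)      # GA maximises the fitness
  cat("iteration =", obj@iter,
      "| best fitness =", format(obj@fitness[best]),
      "| solution =", format(obj@population[best, ]), "\n")
  invisible(obj@population[best, ])
}
```

It would be passed to the call as `monitor = best_monitor`. With `parallel = TRUE` the monitor should still run in the main R process, since only the fitness evaluations are distributed, but that is worth checking on your setup.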

Choosing pairs for crossover from selection

How are the selected individuals paired for crossover after selection? Is each pair chosen randomly from the selected individuals, without replacement?

Position based crossover for permutation type problems not working as expected. Possible bug found.

Hi, while using the package and trying to understand the inner workings of each genetic operator used in my application, I found that the function gaperm_pbxCrossover_R doesn't seem to perform any crossover operation due to an indexing error in the for loop.

While I'm not allowed to share an example with the actual data I'm working on, I wrote a simple reproducible example of the issue I found. The only modification I propose is the change of the index -j to j in the indexing of the for loop of the function gaperm_pbxCrossover_R.

You can reproduce the following example in R to take a look at the problem if you wish.

speed up GA with GPU

I am working on a portfolio optimisation problem and have been using the GA package successfully. The genome length is about 40 and the type is real-valued. The fitness function is fairly complex and takes about 0.5-1 sec per step. The whole script runs for about 5-10 minutes. I used the GA "parallel" parameter and the "snow" package to speed up the script, and both methods work.

Now I want to speed it up with a GPU. What approach should be used? Rewriting the GA methods for the GPU with OpenCL or gpuR, or is there a simpler design?

I am using Ubuntu 18.04, CUDA 10.0, Jupyter and R.

Perhaps there is an error in the README file

Installed the package with devtools...done.
loaded the package with library(GA)
as suggested in the README I have tried to show the vignette...

> vignette("GA")
Warning message:
vignette ‘GA’ not found

differences in handling suggestions between ga and gaisl

The suggestions matrix in ga() does not need to have as many rows as popSize, but gaisl() requires exactly that because of the following code:

suggestions <- suggestions[seq(popSize), , drop = FALSE]

The help file for gaisl() has the same information as in ga(), which doesn't mention this constraint.

Non-Reproducible Parallel Results

I am working on a project looking at optimal variable selection and we are using out-of-sample testing as part of our fitness function. The code is reproducible with a seed when run in serial, but not when run in parallel.

A minimal example:

library(GA)

data("fat", package = "UsingR")
nms <- c('age', 'weight', 'height', 'neck', 'chest', 'abdomen', 'hip', 'thigh',
         'knee', 'ankle', 'bicep', 'forearm', 'wrist')

fitness <- function(string) {
  equation <- paste(c("body.fat.siri ~",
                      paste(nms[which(string == 1)], collapse = " + ")),
                    collapse = " ")
  # GA maximises the fitness, so the empty model gets a large negative value
  if (equation == "body.fat.siri ~ ") return(-.Machine$integer.max)
  nfold <- 10
  folds <- sample(1:nrow(fat)) %% nfold
  tmp <- sapply((1:nfold) - 1, function(fold) {
    indices <- folds %in% fold
    mod <- lm(equation, data = fat[!indices,])
    fat[indices, "body.fat.siri"] - predict(mod, fat[indices,])
  })
  -sum(unlist(tmp)^2)
}

eg <- function(par = FALSE) {
  ga("binary", fitness = fitness, nBits = length(nms), maxiter = 5,
     names = nms, monitor = NULL, seed = 20160505, parallel = par)
}

all(eg()@summary == eg()@summary)
# [1] TRUE
all(eg(2)@summary == eg(2)@summary)
# [1] FALSE

I am working on a patch for my team and will submit a pull request when it is working.
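
Until such a patch exists, one workaround (not the patch mentioned above) is to remove the randomness from the fitness function itself. The example calls sample() on every evaluation, and in parallel runs those calls happen on the workers, which is a likely source of the differences. The sketch below draws the folds once and captures them in a closure; `make_cv_fitness` is a hypothetical helper, and `mtcars` stands in for the `fat` data so the block runs without extra packages.

```r
# Hypothetical factory: build a cross-validated fitness function whose
# fold assignment is fixed once, so repeated evaluations of the same
# string give the same value on any worker.
make_cv_fitness <- function(data, response, predictors, nfold = 10, seed = 1) {
  set.seed(seed)
  folds <- sample(seq_len(nrow(data))) %% nfold   # fixed for the whole run
  function(string) {
    chosen <- predictors[string == 1]
    # GA maximises, so an empty model gets a very poor fitness.
    if (length(chosen) == 0) return(-.Machine$integer.max)
    form <- reformulate(chosen, response = response)
    errs <- lapply(seq_len(nfold) - 1, function(fold) {
      test <- folds == fold
      mod <- lm(form, data = data[!test, , drop = FALSE])
      data[test, response] - predict(mod, data[test, , drop = FALSE])
    })
    -sum(unlist(errs)^2)   # negative out-of-sample squared error
  }
}

# Stand-in data set for illustration.
fit <- make_cv_fitness(mtcars, "mpg", c("wt", "hp", "qsec"), nfold = 4)
identical(fit(c(1, 1, 0)), fit(c(1, 1, 0)))
```

Because the folds travel with the function's environment, they are shipped to the workers together with the fitness function and do not depend on any worker's random number stream.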

ga parallel does not work when the fitness function is written in Rcpp

Hi,

I want to use ga in parallel format for a function that is written in Rcpp.

When I put parallel = TRUE/parallel = "snow", I got this error:

Error in { : task 1 failed - "NULL value passed as symbol address"

This error usually occurs in "foreach" when we want to call a function from Rcpp in parallel mode. The solution is to write a package that has the Rcpp and then pass the package to the "foreach".

I do not know whether the same solution is possible in GA.

I would appreciate it if you could help me with this issue.

Thanks in advance.

Aden

Plotting GA object without burn-in period

Hi. I would like to run plot(GA , log="x") on the domain 100:maxiter, so ignoring the first 100 generations in the plot. Is there a way to do this through extracting information from the GA object directly or another way?

Thanks for your time.
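
One way to do this is to plot the per-generation summary yourself. The sketch below assumes that the object's `summary` slot is a matrix with one row per generation and columns named "max", "mean" and "median"; `plot_after_burnin` is a hypothetical helper, not a GA function.

```r
# Sketch: plot the per-generation fitness summary after discarding the
# first `burnin` generations. Extra arguments (e.g. log = "x") are passed
# on to matplot().
plot_after_burnin <- function(summ, burnin = 100, ...) {
  keep <- seq_len(nrow(summ)) > burnin
  iters <- which(keep)                       # original generation numbers
  cols <- c("max", "mean", "median")
  matplot(iters, summ[keep, cols, drop = FALSE], type = "l",
          lty = 1:3, col = 1:3,
          xlab = "Generation", ylab = "Fitness value", ...)
  legend("bottomright", legend = cols, lty = 1:3, col = 1:3)
  invisible(summ[keep, , drop = FALSE])
}
```

With a fitted object this would be called as `plot_after_burnin(GA@summary, burnin = 100, log = "x")`. The function also returns the retained rows invisibly, in case you want to build a different plot from them.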

Log messages in parallel mode

When using parallel=TRUE, absolutely no information is returned on the console from the fitness function itself. Is it somehow possible to save the messages of the function? I was thinking of something like the outfile option of the makeCluster, but unfortunately I can not set it from the outside in GA (if I understand everything correctly).

keepBest option not working

Hello,

I am trying to save the best solution of each iteration, but can't get the option to work.
Please can you help me with this?

###########################################################
library(GA)
packageDescription("GA")$Version
packageDescription("GA")$Date

f <- function(x) (x^2+x)*cos(x) # -10 < x < 10

monitor <- function(obj)
{
  cat(paste("iteration =", obj@iter, "solution =", obj@bestSol[[obj@iter]]), "\n")
}

GA <- ga(type = "real-valued", fitness = f, lower = -10, upper = 10,
         keepBest = TRUE, monitor = monitor, maxiter = 10)

summary(GA)$bestSol
##########################################################
output:

GA <- ga(type = "real-valued", fitness = f, lower = -10, upper = 10, keepBest = TRUE, monitor = monitor,maxiter = 10)
iteration = 1 solution =
iteration = 2 solution =
iteration = 3 solution =
iteration = 4 solution =
iteration = 5 solution =
iteration = 6 solution =
iteration = 7 solution =
iteration = 8 solution =
iteration = 9 solution =
iteration = 10 solution =
summary(GA)$bestSol
NULL

Parallel error

When trying to use the parallel option, it gives the following error:
Error in R process: simpleError : task 1 failed - "object of type 'closure' is not subsettable". Works without parallel option. Any ideas?

> traceback()
4: stop(simpleError(msg, call = expr))
3: e$fun(obj, substitute(ex), parent.frame(), e$data)
2: foreach(i. = seq_len(popSize), .combine = "c") %DO% {
       if (is.na(Fitness[i.])) 
           do.call(fitness, c(list(Pop[i., ]), callArgs))
       else Fitness[i.]
   }
1: ga(type = "permutation", fitness = eval2, lower = 1, upper = 1000, 
       popSize = 10, maxiter = 2, parallel = TRUE)

Persisting state for partial runs

I've been thinking about how one might go about saving partial state (say as of the last successful iteration) if the run is interrupted early (manually or otherwise, but mostly manually). It's often a shame to want or need to interrupt a long run only to have nothing to show for it. I'm not certain how that might best be implemented (or even if the current package already has a way to do this). Thoughts?
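
One way this might be approached with the existing interface is a monitor function that writes the current state to disk every few iterations. This is only a sketch: `checkpoint_monitor` is a hypothetical helper, and it assumes the object passed to the monitor has an `iter` slot.

```r
# Hypothetical factory: returns a monitor that saves the object it
# receives to `file` every `every` iterations.
checkpoint_monitor <- function(file, every = 10) {
  function(obj) {
    if (obj@iter %% every == 0) saveRDS(obj, file)
    invisible(NULL)
  }
}
```

It would be used as `monitor = checkpoint_monitor("ga_state.rds")`. After an interrupted run, `readRDS("ga_state.rds")` returns the last saved state, and its `population` slot could be passed as `suggestions` to a new call so the search does not start from scratch. The cost is one file write per checkpoint, so `every` should be chosen with the object size in mind.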

MPI Cluster Parallelization

Hi,

I'm trying to run GA in parallel on an MPI cluster, but GA does not seem to support parallelization via MPI clusters.

Apparently, the check (https://github.com/luca-scr/GA/blob/master/R/parallel.R#L12) in line 12 returns FALSE for MPI cluster, which will then be used in lines 157+ (e.g. https://github.com/luca-scr/GA/blob/master/R/ga.R#L157 and https://github.com/luca-scr/GA/blob/master/R/ga.R#L172) for the parallelization procedure.
MPI clusters have the classes ["nbmpicluster", "mpicluster", "dompicluster"], for which line 12 produces the following structure for the parallel object:

str(parallel)
 atomic [1:1] FALSE
 - attr(*, "type")= chr "multicore"
 - attr(*, "cores")= int 28

I was wondering if there is a way to allow for MPI Cluster parallelization.

Thanks.
