
Comments (30)

thieu1995 avatar thieu1995 commented on June 20, 2024 1

Hi @AxelThevenot ,

Yes, we spoke about refactoring the project. But for now we have decided to keep the current style and test the new features first.

from opfunu.

HeinrichWizardKreuser avatar HeinrichWizardKreuser commented on June 20, 2024 1

@thieu1995

You are 100% correct. I'm currently switching gears and building notebooks to show off what we can do. Here are the main things I want to showcase using the database:

  • Use the database to filter benchmarks that have a certain tag
  • Select some algorithms from the mealpy library
  • Run experiment(s) where each algorithm solves each of the selected benchmarks while storing the results
  • Programmatically plot the results so that it is clear how each algorithm performs on each benchmark, letting the developer conclude whether an algorithm is better suited to landscapes with one tag or another.
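
A minimal sketch of the first three steps, using only the standard library. Everything here is an assumption made for illustration: the two-record `DATABASE`, the field names, and the two toy optimizers, which merely stand in for algorithms that would really come from mealpy (no mealpy API is used or implied).

```python
import math
import random


def sphere(x):
    return sum(v * v for v in x)


def rastrigin(x):
    return 10 * len(x) + sum(v * v - 10 * math.cos(2 * math.pi * v) for v in x)


# Hypothetical mini-database: one dictionary per benchmark.
DATABASE = [
    dict(name="Sphere", tags=["continuous", "convex", "unimodal"],
         domain=[-5.12, 5.12], method=sphere),
    dict(name="Rastrigin", tags=["continuous", "multimodal"],
         domain=[-5.12, 5.12], method=rastrigin),
]


def filter_by_tag(database, tag):
    """Keep only the benchmarks that carry the given tag."""
    return [rec for rec in database if tag in rec["tags"]]


def random_search(func, domain, dim, iterations, rng):
    """Toy optimizer: sample uniformly and remember the best value seen."""
    lo, hi = domain
    best = float("inf")
    for _ in range(iterations):
        best = min(best, func([rng.uniform(lo, hi) for _ in range(dim)]))
    return best


def hill_climber(func, domain, dim, iterations, rng):
    """Toy optimizer: accept Gaussian perturbations that improve the value."""
    lo, hi = domain
    x = [rng.uniform(lo, hi) for _ in range(dim)]
    best = func(x)
    step = 0.1 * (hi - lo)
    for _ in range(iterations):
        cand = [min(hi, max(lo, v + rng.gauss(0, step))) for v in x]
        value = func(cand)
        if value < best:
            x, best = cand, value
    return best


ALGORITHMS = {"random_search": random_search, "hill_climber": hill_climber}


def run_experiment(benchmarks, algorithms, dim=2, iterations=200, seed=0):
    """Run every algorithm on every benchmark and store one row per pair."""
    results = []
    for rec in benchmarks:
        for algo_name, algo in algorithms.items():
            rng = random.Random(seed)  # same seed for each run, for repeatability
            best = algo(rec["method"], rec["domain"], dim, iterations, rng)
            results.append(dict(benchmark=rec["name"], algorithm=algo_name, best_fx=best))
    return results


results = run_experiment(filter_by_tag(DATABASE, "continuous"), ALGORITHMS)
for row in results:
    print(row)
```

The rows in `results` are plain dictionaries, so they can be handed to a plotting routine or converted to a dataframe for the last step.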

Will keep you posted.

from opfunu.

thieu1995 avatar thieu1995 commented on June 20, 2024

Hi @HeinrichWizardKreuser ,

Thank you so much for your awesome words. I love your idea, but I really don't have enough time to do it. As you know, it is a huge amount of work.

from opfunu.

HeinrichWizardKreuser avatar HeinrichWizardKreuser commented on June 20, 2024

Hi @thieu1995

I'm glad to hear that you like the idea. I am willing to do this myself in a pull request. I would just need you to supervise it and give feedback on it.

Would you please take the time to comment on each of these points and give feedback? After that, I will complete the database based on the set of functions that are already available in the opfunu library.

The shape of the datastructure

I think the current shape (a list of dictionaries) is a good start. It can easily be transformed into a pandas dataframe.

Where to put this code

I have a few options for this one

  1. A new module in the package called db. Then a file containing all of the data.
  2. The previous option, but with a file for each module of functions (e.g. dimension_based, cec/*/ etc.), plus one file that calls each module's database file and puts them into one data structure.
  3. Each module in the package (e.g. dimension_based, cec/*/ etc.) would have a dedicated db.py file containing a database of the functions in that module.
  4. Following on from the previous option, we can have another module db with methods to combine the contents of the different db.py files. The module could offer helper methods such as getting all functions that match a certain filter (e.g. all functions with a dimensionality of 1, or that fall under the 'convex' category). I do think the helper methods would have to be very specific, as broader helper methods would end up being direct calls to built-in pandas methods.
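
To make options 3 and 4 concrete, here is a rough sketch of what the combining module could look like. The two lists stand in for the contents of per-module db.py files, and the names `combine` and `select` are hypothetical helpers invented for this illustration, not existing opfunu functions.

```python
# Stand-ins for the lists that dimension_based/db.py and type_based/db.py might expose.
DIMENSION_BASED_DB = [
    dict(name="Booth", dimensions=2, tags=["continuous", "convex"]),
    dict(name="Sphere", dimensions=None, tags=["continuous", "convex"]),
]
TYPE_BASED_DB = [
    dict(name="Rastrigin", dimensions=None, tags=["continuous", "multimodal"]),
]

_ANY = object()  # sentinel, because None is itself a meaningful dimensions value


def combine(**module_dbs):
    """Merge per-module databases, remembering which module each record came from."""
    combined = []
    for module_name, records in module_dbs.items():
        for rec in records:
            combined.append({**rec, "module": module_name})
    return combined


def select(database, dimensions=_ANY, tags=()):
    """Return the records with the given dimensions (if specified) and all given tags."""
    selected = []
    for rec in database:
        if dimensions is not _ANY and rec["dimensions"] != dimensions:
            continue
        if not set(tags) <= set(rec["tags"]):
            continue
        selected.append(rec)
    return selected


db = combine(dimension_based=DIMENSION_BASED_DB, type_based=TYPE_BASED_DB)
print([rec["name"] for rec in select(db, tags=["convex"])])
print([rec["name"] for rec in select(db, dimensions=2)])
```

As noted in option 4, once the combined list is loaded into a dataframe most of `select` reduces to ordinary boolean indexing, so only the combining step really needs to live in the package.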

The fields to include and how to represent each of them

Are the fields I've added correct / useful? I added these fields based on what I found useful, but also based on what I could consistently add. Here I have listed them for you along with how I believe they ought to be represented (they are in alphabetical order)

  • dimensions
    • int representation of the dimensions e.g. 1, 2 etc.
    • if the dimensions can be anything (N), I was thinking of '*', 'd', or None. Any ideas?
  • domain
    • list of floats representing the domain of values for the input vector x.
    • e.g. [-1, 1]
  • latex
    • The latex string for this function
    • very useful when writing a report
  • links
    • list of strings
    • relevant URLs of where information on this function was obtained
  • method
    • pointer to the implementation of the function in the opfunu library
  • minima
    • I'm still hesitant on how to do this one. I need some advice
    • I was thinking of adding a list of dictionaries describing the minima (solutions) of the function
    • e.g. dict(fx=0.55, x=[1]) would mean that a minimum (f(x)) of 0.55 is achieved when plugging in 1 (as x) into the function.
      • the question is how this changes if the function is of d dimensions and the minimum changes based on the dimension
      • currently I'm not worrying too much about it since I'm just pulling the information straight from sources
    • e.g. dict(fx=0.55, x=[-1.51, -0.75]) would represent a minimum of 0.55 when plugging in x of either -1.51 or -0.75
  • name
    • The formal name of the function
  • references
    • academic references for this function
  • tags
    • This is probably the most useful field of them all - a list of tags allowing the user to categorize the function.
    • e.g. [ 'continuous', 'differentiable', 'separable', 'scalable', 'multi-modal' ]
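
For illustration, this is roughly what one record with these fields could look like, using the two-dimensional Booth function. The local `booth` function is a stand-in for the pointer to the opfunu implementation, the `references` list is left empty rather than guessed, and in this sketch each entry in `minima` treats `x` as a single input vector.

```python
def booth(x):
    """Stand-in for the library implementation that the method field would point to."""
    return (x[0] + 2 * x[1] - 7) ** 2 + (2 * x[0] + x[1] - 5) ** 2


booth_record = dict(
    dimensions=2,
    domain=[-10.0, 10.0],
    latex=r"f(x, y) = (x + 2y - 7)^2 + (2x + y - 5)^2",
    links=["https://www.sfu.ca/~ssurjano/optimization.html"],
    method=booth,
    minima=[dict(fx=0.0, x=[1.0, 3.0])],
    name="Booth",
    references=[],  # to be filled in from the sources
    tags=["continuous", "differentiable", "convex", "unimodal"],
)

REQUIRED_FIELDS = {
    "dimensions", "domain", "latex", "links", "method",
    "minima", "name", "references", "tags",
}


def check_record(record):
    """Basic sanity checks: all fields present and each minimum reproducible."""
    missing = REQUIRED_FIELDS - set(record)
    if missing:
        raise ValueError("missing fields: " + ", ".join(sorted(missing)))
    for minimum in record["minima"]:
        if abs(record["method"](minimum["x"]) - minimum["fx"]) > 1e-9:
            raise ValueError("minimum does not match the implementation")
    return True


print(check_record(booth_record))
```

A check like `check_record` is one cheap way to catch transcription errors in `minima`, since the stored value can be recomputed from the implementation.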

The correctness of each field

Although I can manually add these fields, or even write a webscraper to get many of these fields - both of those methods may result in errors. I don't know how to assert that they are correct (other than double checking), but I guess that's open-source for you. People can point out the mistakes and fix them.

from opfunu.

thieu1995 avatar thieu1995 commented on June 20, 2024

@HeinrichWizardKreuser ,

Oh, after re-reading everything you have written here, I remember that I once thought about writing documentation for opfunu with all of the properties you listed above. I tried readthedocs, because it would reduce the time needed to write the documentation, but I failed 2 or 3 times. You can see the resulting documentation here:
https://opfunu.readthedocs.io/en/latest/pages/cec_basic/cec_basic.html#opfunu-cec-basic-cec2014-module

I did use the required comment format, but the documentation wasn't showing on the website, so I gave up. But now I know what caused the problem, because I've successfully built another library with complete documentation:
https://mealpy.readthedocs.io/en/latest/pages/models/mealpy.bio_based.html#module-mealpy.bio_based.EOA

You can use search to find anything you want in there: any algorithm, tag, or property. So what do you think? Instead of adding such fields to opfunu, we can just update the comments and fix the bug with readthedocs. I still don't know what we can do with pandas functionality when adding a db.py to each module (type_based, multi_model,...). If users want to know about a function's characteristics, they can search on the docs website.

However, opfunu is still missing an important piece of functionality, which is drawing. I also tried it a long time ago but was not successful with 3D plotting. Now I have found a really good repository where the author implements the drawing functions and writes the code in a very clear way.
(You can see it here: https://github.com/AxelThevenot/Python_Benchmark_Test_Optimization_Function_Single_Objective/blob/main/pybenchfunction/function.py)
Yes, he uses some properties to draw each function. Maybe we can keep some properties as a Python dictionary for each function (class) to draw the figure, and put other properties such as latex, references, and links in the documentation instead. What do you think?

One more question on which I would like your suggestion. Currently, there are 2 programming styles in opfunu (functional and OOP classes).
This makes the repository confusing for new users. I'm thinking about removing the functional style and keeping the OOP style because it will reduce coding time. What do you suggest?

from opfunu.

HeinrichWizardKreuser avatar HeinrichWizardKreuser commented on June 20, 2024

Greetings @thieu1995 . I appreciate this conversation and hope that it will benefit the repository.

Docs vs the Database

I agree with your point on adding the details to the documentation, but I believe having it in physical code is also important. What it boils down to is being able to programmatically filter benchmark functions based on attributes, running simulations of each benchmark (with its own meta-parameters) and exporting results - all in one pipeline. Having all of the details - including the physical implementation - of each method will allow the users to programmatically run experiments and draw conclusions (something that I wish I had when working on my projects and writing papers).

But I think we should have both the database and the docs. Even better, we can have the database and the docs can be populated from it (thus we'd only need to update the database and the docs would automatically be updated).
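
As a rough sketch of how the docs could be populated from the database, the function below renders one record as a reStructuredText section (the markup Sphinx and readthedocs consume). The record layout and the name `render_rst` are assumptions made for this example, not part of opfunu.

```python
# Hypothetical record, shaped like the dictionaries proposed for the database.
record = dict(
    name="Booth",
    dimensions=2,
    domain=[-10.0, 10.0],
    latex=r"f(x, y) = (x + 2y - 7)^2 + (2x + y - 5)^2",
    tags=["continuous", "differentiable", "convex"],
)


def render_rst(rec):
    """Render one database record as a reStructuredText section."""
    title = rec["name"]
    dims = "any" if rec["dimensions"] is None else str(rec["dimensions"])
    lines = [
        title,
        "-" * len(title),  # section underline must be at least as long as the title
        "",
        ".. math::",
        "",
        "   " + rec["latex"],
        "",
        "* Dimensions: " + dims,
        "* Domain: [{}, {}]".format(*rec["domain"]),
        "* Tags: " + ", ".join(rec["tags"]),
    ]
    return "\n".join(lines) + "\n"


page = render_rst(record)
print(page)
```

Writing the concatenated output of `render_rst` for every record to a .rst file before each docs build would mean only the database has to be edited by hand.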

3D plotting

I was actually thinking of adding 3D plotting to opfunu next. I made lots of plots in my previous projects for my course on Computational Intelligence:

image

It was some simple matplotlib code, but it is rather specific so having it be built-in for the user's convenience would be good.

As for the library https://github.com/AxelThevenot/Python_Benchmark_Test_Optimization_Function_Single_Objective/blob/main/pybenchfunction/function.py, I agree, it does seem very useful. We can even incorporate the code for 3D plotting into opfunu (or invite them to add it).

I also think that the details that @AxelThevenot added to each method will be instrumental in speeding up adding new fields to the database and asserting its correctness.

Functional vs OOP

I would love to contribute my suggestion, but unfortunately, I don't have enough information. Could you perhaps post examples comparing the two?

from opfunu.

thieu1995 avatar thieu1995 commented on June 20, 2024

@HeinrichWizardKreuser

Docs vs Database

I get it now and agree with you. I'm not sure how to do it with one pipeline, but I think with your imagination we can do it.
If we can build a database, I think we can pull from it for the docs as well.

3D plotting.

I think I can spend time to re-structure opfunu as that guy did in his repository. So any function can pull out its 2D or 3D figures.

Functional vs OOP

You can see it in the readme.md file, I give an example of how to call function or class in opfunu. For example, the CEC-2014 module

  • If you want to use the functional style, you can import and call any function you want like this
import numpy as np
from opfunu.cec.cec2014.function import F1, F2  # ...and any other functions you need

problem_size = 10
solution = np.random.uniform(0, 1, problem_size)

print(F1(solution))             # Function style
  • If you want to use the OOP style, you can import the class
from opfunu.cec.cec2014.unconstraint import Model as MD
func = MD(problem_size)         # Object style: one object solves different problems with different functions
print(func.F1(solution))
print(func.F2(solution))

Anyway, it is just a way to structure the code and to call the functions.
You can call them from a module or from a class. But now I think each benchmark function should be a class, and it should inherit from a BaseClass that defines everything they have in common.

Where to put this code

I think we can start with option 3.
Option 4 sounds like the better one, but it may be hard to combine all of them in one DB since each module and each function has its own characteristics and properties.

The fields to include and how to represent each of them

  • dimensions: For N dimensions, I think we can put None there.

  • minima: I think you can just try it with your current idea.

  • other fields: I agree with them all

from opfunu.

HeinrichWizardKreuser avatar HeinrichWizardKreuser commented on June 20, 2024

@thieu1995

Docs vs Database

I'm glad we can agree on this. I'll implement the database first and then we can look at automatically generating the docs from there.

3D plotting

I agree with using his 3d plotting, but I am sceptical about restructuring the package to use OOP (see next point)

Functional vs OOP

The implementation of my database approach essentially creates a dictionary for each benchmark function where the dictionary contains "metadata" of the benchmark along with the actual benchmark python implementation which can simply be called using the __call__ attribute (commonly known as "calling the method" - using f() where f is the method to call).

If you wish to take the OOP approach, then my database implementation will introduce redundancy since the values in each dictionary (such as the latex formula and attributes such as convex etc) will likely also be in the OOP implementation.
For instance, where the OOP implementation would be

class Adjiman:
    name = 'Adjiman'
    latex_formula = r'f(x, y)=cos(x)sin(y) - \frac{x}{y^2+1}'

the database implementation would be

data = [
  dict(
    name='Adjiman',
    latex_formula=r'f(x, y)=cos(x)sin(y) - \frac{x}{y^2+1}',
  ),
  ...
]

Thus, both would have the fields name and latex_formula in this example, which is redundant. Ideally, you'd want one to simply inherit/collect the information from the other.

If you wish to use OOP for each benchmark, we can convert the database to loading the classes in memory and calling the metadata that python reveals for us such as __dict__. For instance:

>>> # retrieving from the class itself
... Adjiman.__dict__
mappingproxy({'__module__': '__main__',
              'name': 'Adjiman',
              'latex_formula': 'f(x, y)=cos(x)sin(y) - \\frac{x}{y^2+1}',
              '__dict__': <attribute '__dict__' of 'Adjiman' objects>,
              '__weakref__': <attribute '__weakref__' of 'Adjiman' objects>,
              '__doc__': None})
>>> # retrieving from an instance
... a = Adjiman()
... a.__class__.__dict__
mappingproxy({'__module__': '__main__',
              'name': 'Adjiman',
              'latex_formula': 'f(x, y)=cos(x)sin(y) - \\frac{x}{y^2+1}',
              '__dict__': <attribute '__dict__' of 'Adjiman' objects>,
              '__weakref__': <attribute '__weakref__' of 'Adjiman' objects>,
              '__doc__': None})

Of course, this approach is ugly as it also includes things such as __doc__ and __weakref__. We would just need to build methods that read the __dict__ attribute of each class and "clean" it (remove things like __doc__) and then build the database structure I originally designed.

In this approach of using OOP, it would be desirable for the classes to inherit from a base class (e.g. BaseBenchmark) since we can then call BaseBenchmark.__subclasses__() to retrieve each class that is currently loaded into memory (when you import a module/package) in a list. This list can then be iterated over and calling __dict__ on each subclass would be used to populate the database.
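
A small sketch of that idea follows. `BaseBenchmark`, `all_subclasses`, `clean_metadata` and `build_database` are hypothetical names used only for this illustration. Note that `__subclasses__()` returns direct subclasses only, so the sketch walks the hierarchy recursively, and that `vars(cls)` contains only attributes defined on the class itself, not inherited ones.

```python
import math


class BaseBenchmark:
    """Hypothetical base class; subclasses declare metadata as class attributes."""

    def __call__(self, x):
        raise NotImplementedError


class Adjiman(BaseBenchmark):
    name = 'Adjiman'
    latex_formula = r'f(x, y)=cos(x)sin(y) - \frac{x}{y^2+1}'

    def __call__(self, x):
        return math.cos(x[0]) * math.sin(x[1]) - x[0] / (x[1] ** 2 + 1)


def all_subclasses(cls):
    """Yield every loaded subclass of cls, including indirect ones."""
    for sub in cls.__subclasses__():
        yield sub
        yield from all_subclasses(sub)


def clean_metadata(cls):
    """Copy the class attributes, dropping dunder entries such as __doc__ and __module__."""
    return {
        key: value
        for key, value in vars(cls).items()
        if not key.startswith('__') and not callable(value)
    }


def build_database(base=BaseBenchmark):
    """One dictionary per benchmark class: cleaned metadata plus a callable instance."""
    return [dict(clean_metadata(cls), method=cls()) for cls in all_subclasses(base)]


database = build_database()
print(database[0]['name'], database[0]['method']([0.0, 0.0]))
```

Because the database is derived from the classes, the metadata is written once, which avoids the redundancy between the two representations described above.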

Continuous Development plan if we take the OOP approach

To conclude, I am open to the idea of using OOP. Personally, it doesn't matter to me. I do think OOP does give us more control over customizing a benchmark function, so perhaps we should go for it. We'd have to take a continuous development approach:

  1. Introducing OOP
    1.1. Restructure the codebase to use OOP.
    1.2. Implement a base class and perhaps build a more advanced hierarchy. For example, benchmarks that have parameters might have a different implementation from other non-parameterized benchmarks so they could inherit from a parent that contains common procedures. This parent class can then inherit from the base class.
    1.3. We should consider whether this is worth doing for all benchmarks, or whether we just do it for some new and/or regularly used benchmarks. opfunu is quite large and impressive when looking at the sheer amount of individual benchmarks. But the size might be out of proportion to do such a big change, or some of it would be pointless to update as some benchmarks might be rarely used versus their new versions (e.g. cec2005 vs the newer cec).
    1.4. We should also consider backwards compatibility. Some users may have already grown accustomed to using Functions._brown__() for instance. Perhaps we should keep these methods, but alter them to call our new implementation and simply raise a deprecation warning.
  2. Filling in the details that the database would introduce
    2.1. With our OOP implementation done, we should now start populating fields such as latex and convex etc. Here we can use https://github.com/AxelThevenot/Python_Benchmark_Test_Optimization_Function_Single_Objective/blob/main/pybenchfunction/function.py to already fill in many of the blanks.
  3. Implement the database
    3.1. Now we finally get to the purpose of my suggestion where we implement a database. Here I would implement methods that load the classes that inherit from a base class and build a database that users can convert to dataframes.
    3.2. I could also introduce some example jupyter notebooks that show off the experiment pipelines I was thinking of / see the database being a huge use case of.
  4. Updating docs
    4.1. This doesn't have to be done after 3 and can be done directly after 2. Here we try to build a pipeline that automatically generates the docs using the OOP fields
    4.2. I do think that this pipeline would actually use the code from 3 to help populate the docs.

from opfunu.

thieu1995 avatar thieu1995 commented on June 20, 2024

@HeinrichWizardKreuser ,

Ah, I see. Then I think we should keep the functional style. And now that we are using a database, I don't think we need to split the benchmark functions into multiple modules as I did (type_based or dimension_based). I think we can group them into a single module,
because lots of methods were re-implemented in both categories.
What do you think?

from opfunu.

HeinrichWizardKreuser avatar HeinrichWizardKreuser commented on June 20, 2024

@thieu1995 , I updated my message and I believe I finished the edit after you had already started formulating a response. I just want to confirm: is your message made with respect to the latest version of my message, which contains the "Continuous Development plan if we take the OOP approach" section?

from opfunu.

HeinrichWizardKreuser avatar HeinrichWizardKreuser commented on June 20, 2024

To be clear, I don't know whether functional approach or OOP is better. The easiest would be to just stick to opfunu's current functional approach and then add the database. The question is whether OOP offers a benefit. Does OOP give us some desired control over benchmark methods? Such as say parameterized versions of benchmarks?

from opfunu.

thieu1995 avatar thieu1995 commented on June 20, 2024

@HeinrichWizardKreuser

Lol, I only read the above part that you wrote.

That is the point of my suggestion: the repo above is already implemented in OOP style, and it can search the functions by properties, as you wish to do with the database.
For example, this code from his repo:

import pybenchfunction as bench
# get all the available functions accepting ANY dimension
any_dim_functions = bench.get_functions(None)

# get all the available continuous and non-convex functions accepting 2D
continous_nonconvex_2d_functions = bench.get_functions(
    2,  # dimension
    continuous=True,
    convex=False,
    separable=None,
    differentiable=None,
    mutimodal=None,
    randomized_term=None
)
print(len(any_dim_functions))  # --> 40
print(len(continous_nonconvex_2d_functions))  # --> 41 

Continuous Development plan if we take the OOP approach

Right now, I only consider the non-parameterized benchmark functions. But we may think to create a new module for parameterized functions in the future.

My suggestion

Yes, we should stick to the current functional style. But I still want to rename the functions so that they are public. For example:
Instead of calling "Functions._brown__()", users can call "Functions.brown()".
I coded it this way a long time ago, when I was a student (implementing them as private functions was a silly choice), and I hadn't thought about changing it until now.
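
One possible way to do such a rename without breaking existing callers is to keep the old name as a thin alias that emits a deprecation warning, as proposed earlier in the thread. The class below is a cut-down, hypothetical stand-in for the real `Functions` class, and the Brown formula in the body is the commonly stated form, included only so that the example runs.

```python
import warnings


class Functions:
    """Hypothetical, cut-down stand-in for the class that holds the benchmarks."""

    def brown(self, solution):
        # Brown function as commonly stated:
        # sum over i of (x_i^2)^(x_{i+1}^2 + 1) + (x_{i+1}^2)^(x_i^2 + 1)
        total = 0.0
        for a, b in zip(solution[:-1], solution[1:]):
            total += (a ** 2) ** (b ** 2 + 1) + (b ** 2) ** (a ** 2 + 1)
        return total

    def _brown__(self, solution):
        # Old name kept as an alias so that existing code keeps running.
        warnings.warn(
            "Functions._brown__() is deprecated, use Functions.brown() instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.brown(solution)


funcs = Functions()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    old_value = funcs._brown__([1.0, 1.0])

new_value = funcs.brown([1.0, 1.0])
print(old_value == new_value, caught[0].category.__name__)
```

The alias can then be removed in a later release once users have had time to switch.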

Also, can you try the database with type_based and dimension_based modules first? Leave the cec for later. I want to see how the database works with them first before moving to cec functions.

from opfunu.

HeinrichWizardKreuser avatar HeinrichWizardKreuser commented on June 20, 2024

Continuous Development plan if we take the OOP approach

I agree. Let's only consider parameterized benchmarks for a future OOP overhaul.

My suggestion

I'll leave the name change to you for later.

Right now I will write my suggested database approach with references to benchmarks in type_based and dimension_based first and then we can take it from there. Will post my progress here.

from opfunu.

thieu1995 avatar thieu1995 commented on June 20, 2024

@HeinrichWizardKreuser ,

What about my question above? Do you suggest grouping type_based and dimension_based into 1 module? As I said, several functions have been duplicated in both modules.

And please create a new branch when you want to push something new. The branch name should be "dev/feature_name" or "dev/your_name", I don't mind.

from opfunu.

HeinrichWizardKreuser avatar HeinrichWizardKreuser commented on June 20, 2024

@thieu1995, sorry for missing your question.

I think we can combine them, yes. Should I do it in this PR or leave it for later? I was thinking of leaving it for later.

Understood, will name the branch accordingly.

from opfunu.

thieu1995 avatar thieu1995 commented on June 20, 2024

@HeinrichWizardKreuser

I guess it depends on you. Do you want to create the database first and then group them or do you want to group them into a single module first and then design the database?

Besides, so as not to waste your time, you should create the database for only a few functions first and then test the pipeline or whatever else you want. If it works as you expect, you can apply it to the rest of the functions.

from opfunu.

HeinrichWizardKreuser avatar HeinrichWizardKreuser commented on June 20, 2024

@thieu1995

We want to group the functions in any case, so let's group them in a separate PR (or you can do it yourself). If we group them in this PR and end up scrapping this PR, then we have to group them together again or do some commit picking black magic to extract the grouping part of the PR.

I agree. I will make the database for the two files we discussed and then make some notebooks showing off use cases to ensure that they work in the way I believe the user would desire.

from opfunu.

thieu1995 avatar thieu1995 commented on June 20, 2024

@HeinrichWizardKreuser ,

Yeah, then let's leave it for later and for another PR.

from opfunu.

HeinrichWizardKreuser avatar HeinrichWizardKreuser commented on June 20, 2024

@thieu1995

I am populating the fields of each benchmark using the following criteria (in order)

  1. Reference paper: A Literature Survey of Benchmark Functions For Global Optimization Problems (2013): https://arxiv.org/pdf/1308.4008.pdf
  2. Search https://www.sfu.ca/~ssurjano/optimization.html
  3. Search http://infinity77.net/global_optimization/test_functions.html
  4. Use Thevenot's fields in https://github.com/AxelThevenot/Python_Benchmark_Test_Optimization_Function_Single_Objective/blob/main/pybenchfunction/function.py
  5. Search https://www.indusmic.com/

Question

If I cannot find a tag such as convex for a benchmark, does that automatically mean that it is non-convex? For example, the benchmark "Egg Holder" does not have a tag denoting its convexness.

Looking at Thevenot's implementation of EggHolder, I see that it has convex set to False: https://github.com/AxelThevenot/Python_Benchmark_Test_Optimization_Function_Single_Objective/blob/91c37d9d0f1f3366064004fdb3dd23e5c2681712/pybenchfunction/function.py#L981.

For now, I will assume the answer to this question is "yes".
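
One way to apply that assumption while keeping it reviewable is to record which tags were filled in by default. This is only a sketch: the field name `assumed_tags` and the helper `with_default_convexity` are made up for illustration.

```python
def with_default_convexity(record):
    """Return a copy of the record that is tagged non-convex when no convexity tag exists."""
    tags = list(record.get("tags", []))
    assumed = []
    if "convex" not in tags and "non-convex" not in tags:
        tags.append("non-convex")
        assumed.append("non-convex")  # remember that this tag was assumed, not sourced
    return {**record, "tags": tags, "assumed_tags": assumed}


egg_holder = dict(name="Egg Holder", tags=["continuous", "multimodal"])
resolved = with_default_convexity(egg_holder)
print(resolved["tags"], resolved["assumed_tags"])
```

Listing every record with a non-empty `assumed_tags` later gives a ready-made review queue if a source turns up that contradicts the default.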

from opfunu.

thieu1995 avatar thieu1995 commented on June 20, 2024

@HeinrichWizardKreuser

Yes, if it is not tagged convex, you can tag it non-convex. We can change it later if it turns out to be convex. Just do what you think is good.

from opfunu.

AxelThevenot avatar AxelThevenot commented on June 20, 2024

Hello 👋

I saw you were speaking about refactoring your project like mine in some ways

About the question of whether EggHolder is convex or not: it is possible I made a mistake

It was hard work, so there may be more than one mistake; do not take my parameters as if they were perfect :)

But I saw this https://www.researchgate.net/figure/Eggholder-function-a-non-convex-function-multimodal-and-with-a-large-number-of-local_fig4_332169500#:~:text=Commons%20Zero%201.0-,Eggholder%20function%3A%20a%20non%20convex%20function%20multimodal%20and%20with%20a,number%20of%20local%20pronounced%20bowls

from opfunu.

HeinrichWizardKreuser avatar HeinrichWizardKreuser commented on June 20, 2024

@thieu1995

Progress Update

I've realised how much manual labour this is and have written a web scraper to get the data from

I've successfully crawled the data from those two websites. My next goal is to parse the data from the markdown files in https://github.com/mazhar-ansari-ardeh/BenchmarkFcns/tree/gh-pages/benchmarkfcns.

After that, I will combine the cleanest combination of the data and then test the database in some notebooks where I run experiments to show how the database can be used.

If some of the data disagree with each other (e.g. one source claims that a method is convex while another says it is non-convex), I will flag it here so that you can advise.

from opfunu.

thieu1995 avatar thieu1995 commented on June 20, 2024

@AxelThevenot ,

Yeah, I appreciate the heads-up!

from opfunu.

HeinrichWizardKreuser avatar HeinrichWizardKreuser commented on June 20, 2024

Progress Update

Been doing other work the last week, but yesterday I finished crawling the markdown files in https://github.com/mazhar-ansari-ardeh/BenchmarkFcns/tree/gh-pages/benchmarkfcns.

I'm currently matching all the functions across the different sources. The next step is to find how I can best combine them (for instance, how do I decide which source's input domain to keep?). I'm making good progress.

from opfunu.

HeinrichWizardKreuser avatar HeinrichWizardKreuser commented on June 20, 2024

@thieu1995

Here is a preview of the data I've collected so far.
https://github.com/HeinrichWizardKreuser/mealpy-database/blob/master/nb/data.json

There is also a jupyter notebook in the same directory showcasing how I collected the data. Each item in the list is a dictionary where the keys are b for benchmarkfcns, s for sfu and i for infinity77. These keys represent where they were gotten from. The values for these keys are then the scraped data such as the latex, name etc.

Some dictionaries don't contain data for things like sfu, but have data from infinity77 and benchmarkfcns and vice versa.
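
With the data in this shape, disagreements between sources can be found mechanically. The sketch below assumes each item maps a source key (b, s or i) to a dictionary of scraped fields; the field names `convex` and `domain` and the sample item are invented for illustration and are not taken from data.json.

```python
def find_conflicts(item, fields):
    """For each field, return the per-source values when the sources disagree."""
    conflicts = {}
    for field in fields:
        values = {src: data[field] for src, data in item.items() if field in data}
        distinct = []
        for value in values.values():
            if value not in distinct:  # compares with ==, so lists work too
                distinct.append(value)
        if len(distinct) > 1:
            conflicts[field] = values
    return conflicts


# Invented example: two sources agree on the domain but disagree on convexity,
# and the third source has no entry for this function at all.
item = {
    "b": {"name": "Example function", "convex": True, "domain": [-5, 5]},
    "i": {"name": "Example function", "convex": False, "domain": [-5.0, 5.0]},
}

print(find_conflicts(item, ["convex", "domain"]))
```

Fields on which every available source agrees can then be merged automatically, leaving only the entries returned by `find_conflicts` for manual review.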

Have a look when you get the chance.
TIP: you might want to download the file and then open it in your browser if you want to easily view the json file (collapse and expand some parts etc)

from opfunu.

HeinrichWizardKreuser avatar HeinrichWizardKreuser commented on June 20, 2024

These are just data that overlap with each other from different sources, I still need to add the data that is from individual sources and then still map the data to benchmark functions that you've implemented. Then I need to find a way to concisely list them as a database.

from opfunu.

thieu1995 avatar thieu1995 commented on June 20, 2024

@HeinrichWizardKreuser ,

That is a really great job. But I think you should test with a few functions first, then build the database, functionality, and pipeline that you want. Don't spend too much time correcting each function's properties right now.

When the database and functionality that you design work as you expect, we can come back and finish all the other functions.

from opfunu.

thieu1995 avatar thieu1995 commented on June 20, 2024

Hi @HeinrichWizardKreuser ,

Any news on your progress?

from opfunu.

HeinrichWizardKreuser avatar HeinrichWizardKreuser commented on June 20, 2024

Hi @thieu1995

I haven't made any updates since my last comment - been busy with work and other hobby projects. But I've hit an obstacle with some of it, so I think it would be good to take my mind off of it and continue with my work here

from opfunu.

thieu1995 avatar thieu1995 commented on June 20, 2024

@HeinrichWizardKreuser,

Thanks for letting me know.

from opfunu.
