This project optimizes hyperparameters using Reinforcement Learning. Term Project for the Optimization course at Izmir University of Economics.
The agent makes decisions by choosing actions that are expected to maximize the cumulative reward over time. The agent is a neural network model that takes the state of the environment as input and outputs the action to be taken.
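As a minimal sketch of such an agent, the network below maps a state vector to one Q-value per action and acts greedily. The dimensions, the single hidden layer, and the untrained random weights are all illustrative assumptions, not details from the project:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: a 4-dim state (e.g. current hyperparameters
# plus the latest val_loss) and 3 discrete actions.
STATE_DIM, HIDDEN, N_ACTIONS = 4, 16, 3

# One hidden layer with ReLU; in practice these weights would be trained.
W1 = rng.normal(0.0, 0.1, (STATE_DIM, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0.0, 0.1, (HIDDEN, N_ACTIONS))
b2 = np.zeros(N_ACTIONS)

def q_values(state):
    """Forward pass: state -> one Q-value per action."""
    h = np.maximum(0.0, state @ W1 + b1)  # ReLU hidden layer
    return h @ W2 + b2

def act(state):
    """Greedy policy: choose the action with the largest Q-value."""
    return int(np.argmax(q_values(state)))

action = act(np.array([0.1, -0.2, 0.05, 0.3]))  # an index in {0, 1, 2}
```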
The objective function is to minimize the validation loss (val_loss), measured as the Mean Squared Error (MSE). Given a set of n samples, where for each sample i the predicted value is ŷᵢ and the actual value is yᵢ, the MSE is calculated as:

MSE = (1/n) · Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²
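The MSE above can be computed directly; the sample values below are made up for illustration:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error: the average of the squared residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Residuals are 0.5, 0.0, -1.0, so MSE = (0.25 + 0 + 1) / 3
loss = mse([3.0, 2.0, 4.0], [2.5, 2.0, 5.0])
```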
In the context of reinforcement learning, the Q-values that must be maximized represent the expected future reward for taking a certain action in a certain state. They satisfy the Bellman optimality equation:

Q(s, a) = r + γ · max_{a′} Q(s′, a′)

where:
● s is the current state,
● a is the action taken,
● r is the immediate reward received after taking action a in state s,
● s′ is the new state after taking action a,
● a′ ranges over the actions available in state s′,
● γ is the discount factor, which determines the present value of future rewards.
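The relation above can be used as the target of a tabular Q-learning update. The toy state/action counts and the learning rate α (a standard Q-learning ingredient not listed in the text) are assumptions for illustration:

```python
import numpy as np

# Hypothetical toy setting: 3 states, 2 actions.
n_states, n_actions = 3, 2
gamma = 0.9   # discount factor
alpha = 0.1   # learning rate (standard in Q-learning; not in the text)

Q = np.zeros((n_states, n_actions))

def q_update(s, a, r, s_next):
    """One tabular Q-learning step: move Q(s, a) toward the
    Bellman target r + gamma * max_a' Q(s', a')."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

# Starting from Q = 0, a reward of 1.0 moves Q(0, 1) to 0.1 * 1.0 = 0.1.
q_update(s=0, a=1, r=1.0, s_next=2)
```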
Let f: ℝⁿ → ℝ be the fitness or cost function to be minimized. Let x ∈ ℝⁿ denote a position or candidate solution in the search space. The basic Random Search algorithm can then be described as:
- Initialize x with a random position in the search space.
- Until a termination criterion is met (e.g. a maximum number of iterations performed, or adequate fitness reached), repeat the following:
  - Sample a new position y on the hypersphere of a given radius surrounding the current position x (see, e.g., Marsaglia's technique for sampling a hypersphere).
  - If f(y) < f(x), move to the new position by setting x = y.
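The steps above can be sketched as follows. The sphere cost function, radius, and iteration count are illustrative choices, not values from the project:

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_on_sphere(center, radius):
    """Marsaglia-style sampling: normalize a Gaussian vector to land
    on the sphere of the given radius around `center`."""
    v = rng.normal(size=center.shape)
    return center + radius * v / np.linalg.norm(v)

def random_search(f, x0, radius=0.5, iters=1000):
    """Basic Random Search: accept a sampled point only if it improves f."""
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    for _ in range(iters):
        y = sample_on_sphere(x, radius)
        fy = f(y)
        if fy < fx:          # move only on improvement
            x, fx = y, fy
    return x, fx

# Hypothetical cost: the sphere function, minimized at the origin.
sphere = lambda x: float(np.sum(x ** 2))
x_best, f_best = random_search(sphere, [2.0, -1.5])
# f_best is far below the starting cost of 6.25.
```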
You can access the tutorial and all inference results below.