Hi Fabian, the next task will be to stabilize the learning process b

Stabilize control about active_flow_control_past_cylinder_using_drl HOT 8 CLOSED

fabiangabriel commented on August 23, 2024

Stabilize control

from active_flow_control_past_cylinder_using_drl.

Comments (8)

FabianGabriel commented on August 23, 2024

Hi Andre,

thank you for these suggestions. I had already concluded that using the two gamma distributions was the way to go here and implemented a very similar solution. (For the record: the second parameter of the two gamma distributions doesn't have to be 1 as long as it is the same for both: Z ~ Beta(a,b) can be sampled from X ~ Gamma(a,c) and Y ~ Gamma(b,c), where c can be any positive number.)
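The independence from the common second parameter can be checked numerically. The following is a minimal Python sketch, not the code used in the boundary condition. It relies on `random.gammavariate`, whose second argument is the scale parameter; the helper names `sample_beta_via_gamma` and `sample_mean` are made up for illustration. Since the mean of Beta(a, b) is a / (a + b), the sample mean should come out the same for every common scale c.

```python
import random


def sample_beta_via_gamma(a, b, scale, rng):
    """Draw Z = X / (X + Y) with X ~ Gamma(a, scale) and Y ~ Gamma(b, scale)."""
    x = rng.gammavariate(a, scale)  # second argument of gammavariate is the scale
    y = rng.gammavariate(b, scale)
    return x / (x + y)


def sample_mean(a, b, scale, n=50_000, seed=0):
    """Average of n samples; should approach a / (a + b) for any positive scale."""
    rng = random.Random(seed)
    return sum(sample_beta_via_gamma(a, b, scale, rng) for _ in range(n)) / n


if __name__ == "__main__":
    a, b = 2.0, 5.0
    print("expected mean a / (a + b):", a / (a + b))
    for scale in (0.5, 1.0, 3.0):
        print(f"scale = {scale}: sample mean = {sample_mean(a, b, scale):.4f}")
```

The same check can be repeated for the variance, a b / ((a + b)^2 (a + b + 1)), if more confidence is needed.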

I have now implemented both the tanh-normal distribution and the beta distribution and started a training run with each. The results look very promising so far.

Best Regards,
Fabian

FabianGabriel commented on August 23, 2024

Hi Andre,

I have now implemented this code in both Python and C++, and while the outputs are in the prescribed range, the limiting of the log_std value is somewhat questionable.
Starting at a standard deviation of about 0.8, the distribution shifts towards the edges, making the mean less likely than other values:

At even higher values, the distribution shifts even more drastically to the edges, creating a shape like this:

For our machine learning case, this behaviour would be unintended and even counterproductive, right?
So to counteract this, I would suggest changing the log_std_max value in the boundary condition to -0.35 (approx. ln(0.7)).
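The shift towards the edges can be reproduced with a few lines of Python. This is only a sketch, and it assumes that the bounded action is obtained as y = tanh(x) with x ~ Normal(mean, std); rescaling to the action range only stretches the axis and does not change the shape. The function names are hypothetical. For mean = 0, expanding the log-density around y = 0 gives ln p(y) ≈ const + y^2 (1 - 1/(2 std^2)), so under this assumption the centre stops being a maximum once std exceeds 1/sqrt(2) ≈ 0.71, which is consistent with the observation above and with a limit of ln(0.7) ≈ -0.357.

```python
import math


def tanh_normal_pdf(y, mean, std):
    """Density of y = tanh(x) with x ~ Normal(mean, std), valid for -1 < y < 1."""
    x = math.atanh(y)
    normal_pdf = math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))
    return normal_pdf / (1.0 - y * y)  # divide by |dy/dx| = 1 - tanh(x)^2


def peak_is_at_centre(std, probe=0.1):
    """For mean = 0: is the density at y = 0 higher than slightly off-centre?"""
    return tanh_normal_pdf(0.0, 0.0, std) > tanh_normal_pdf(probe, 0.0, std)


if __name__ == "__main__":
    for std in (0.5, 0.7, 0.8, 1.0):
        centre = tanh_normal_pdf(0.0, 0.0, std)
        edge = tanh_normal_pdf(0.9, 0.0, std)
        print(f"std = {std}: p(0) = {centre:.4f}, p(0.9) = {edge:.4f}, "
              f"peak at centre: {peak_is_at_centre(std)}")
```

With a non-zero mean the picture is less clean, because the peak then moves towards one edge even for smaller standard deviations.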

Best Regards
Fabian

AndreWeiner commented on August 23, 2024

Hi Fabian,
that's indeed an important observation. Could you please create a notebook/script with the following three plots of the limited Gaussian distribution?

influence of std
influence of mean
combined influence of mean and std

Thanks!
Andre

FabianGabriel commented on August 23, 2024

Hi Andre,

I just uploaded a notebook here: notebook

Best Regards,
Fabian

AndreWeiner commented on August 23, 2024

Good job! Very interesting profiles but their potential impact on the training is not so easy to foresee (at least for me). Clipping too early might corrupt the gradient needed to update the log_std output of the policy network. Let's give the value you suggested a try and see if the learning still works.
Best, Andre

FabianGabriel commented on August 23, 2024

Hi Andre,

I still have some problems with calculating the entropy of the bounded distribution.
Implementing the sampling, the squashing with tanh, and the subsequent rescaling in the agentRotatingWallVelocityFvPatchVectorField.C file was pretty straightforward, and there is also a formula for the log-probability. But I can't seem to find a way to calculate the entropy of this transformed distribution.

The implementation in the get_predictions part was a little more challenging, as that function receives actions and states as its input, so I can't just sample new actions there. But as those actions are read from an existing trajectory.csv file, I can be sure that they are already bounded to the action space. To calculate the log-probability there, I had to "descale" the actions and invert the tanh function. Then I can use the formula from above as well, which should now also lead to the correct log-probabilities.
But I also have to calculate the entropies there, so I have essentially the same problem there too.
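The "descale, invert tanh, then apply the formula" steps described above can be sketched in plain Python as follows. This is an illustration under assumptions, not the code in get_predictions: the rescaling is assumed to be a = a_min + 0.5 (tanh(x) + 1) (a_max - a_min), and the names `squash`, `unsquash`, `log_prob_bounded`, `a_min`, and `a_max` are hypothetical. The log-probability of the bounded action is the Gaussian log-probability of the recovered sample x minus the log of the Jacobian of the tanh and rescaling steps.

```python
import math


def squash(x, a_min, a_max):
    """Assumed forward map: tanh squashing followed by rescaling to (a_min, a_max)."""
    return a_min + 0.5 * (math.tanh(x) + 1.0) * (a_max - a_min)


def unsquash(action, a_min, a_max, eps=1e-6):
    """Map a stored action in (a_min, a_max) back to the pre-tanh sample x."""
    y = 2.0 * (action - a_min) / (a_max - a_min) - 1.0  # undo the rescaling
    y = max(-1.0 + eps, min(1.0 - eps, y))  # keep atanh finite at the bounds
    return math.atanh(y)


def log_prob_bounded(action, mean, std, a_min, a_max):
    """Log-density of the bounded action under the squashed Normal(mean, std)."""
    x = unsquash(action, a_min, a_max)
    log_normal = -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2.0 * math.pi)
    # log |da/dx| = log(1 - tanh(x)^2) + log((a_max - a_min) / 2)
    log_jacobian = math.log(1.0 - math.tanh(x) ** 2) + math.log(0.5 * (a_max - a_min))
    return log_normal - log_jacobian


if __name__ == "__main__":
    a_min, a_max = -5.0, 5.0  # hypothetical action bounds
    action = squash(0.3, a_min, a_max)
    print("recovered x:", unsquash(action, a_min, a_max))
    print("log-probability:", log_prob_bounded(action, 0.0, 0.5, a_min, a_max))
```

The clamp by `eps` only matters for actions that sit exactly on a bound, for example after rounding when the trajectory file is written.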

Best Regards,
Fabian

FabianGabriel commented on August 23, 2024

Hi Andre,

on Friday we talked about the tanh-distribution and how its integral is not 1.
Well, it turns out I made a typo in the plotting notebook. The integral actually is 1.
Sorry about that; I have uploaded the fixed version of the notebook.

Best Regards,
Fabian

AndreWeiner commented on August 23, 2024

Hi @FabianGabriel,

great that you fixed the issue! I've tried to derive the entropy equation for the bounded normal distribution, but the outcome was too complex to be useful (there were a couple of terms without a closed-form solution). Maybe a more advanced tool like Mathematica can compute the integral. I've gone through the procedure once for the normal distribution, and I append the result here for completeness:

As I mentioned in the meeting, the exact shape of the entropy term is not crucial, as far as I understand. If you plot the differential entropy of the normal distribution, you see that this term in the policy loss function basically prevents the standard deviation from dropping to zero (the consequence of which would be no exploration).
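To make the role of the entropy term concrete, here is a small Python sketch. It uses the standard result for the differential entropy of a normal distribution, 0.5 ln(2 pi e std^2), which tends to minus infinity as std goes to zero. For the tanh-squashed case it does not attempt a closed form; instead it uses the change-of-variables identity H[tanh(X)] = H[X] + E[ln(1 - tanh(X)^2)] and estimates the expectation by sampling. The function names are hypothetical, the squashing is assumed to be a plain tanh, and a subsequent rescaling to an action range of width (a_max - a_min) would add the constant ln((a_max - a_min) / 2).

```python
import math
import random


def normal_entropy(std):
    """Differential entropy of Normal(mean, std): 0.5 * ln(2 * pi * e * std^2)."""
    return 0.5 * math.log(2.0 * math.pi * math.e * std * std)


def log_tanh_derivative(x):
    """Numerically stable ln(1 - tanh(x)^2) = 2 * (ln 2 - x - softplus(-2x))."""
    z = -2.0 * x
    softplus = max(z, 0.0) + math.log1p(math.exp(-abs(z)))
    return 2.0 * (math.log(2.0) - x - softplus)


def tanh_normal_entropy_mc(mean, std, n=100_000, seed=0):
    """Monte Carlo estimate of the entropy of tanh(x) with x ~ Normal(mean, std)."""
    rng = random.Random(seed)
    correction = sum(log_tanh_derivative(rng.gauss(mean, std)) for _ in range(n)) / n
    return normal_entropy(std) + correction


if __name__ == "__main__":
    for std in (1e-3, 1e-2, 1e-1, 0.5, 1.0):
        print(f"std = {std}: normal entropy = {normal_entropy(std):.4f}, "
              f"tanh-normal estimate = {tanh_normal_entropy_mc(0.0, std, n=20_000):.4f}")
```

Because ln(1 - tanh(x)^2) is never positive, the squashed entropy is always below the Gaussian one, but it still decreases without bound as std shrinks, so it would penalize a collapsing standard deviation in the same way.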

Regarding the implementation of the beta distribution in the OpenFOAM boundary condition:

this source shows how to use the gamma distribution in the C++ standard library
this source describes that one can sample Z ~ Beta(a,b) from X ~ Gamma(a,1) and Y ~ Gamma(b,1) as Z=X/(X+Y)

Here is a minimal working example (I scaled the output by ten to use the original plotting function):

// compile with
// g++ beta.C -o beta
#include <iostream>
#include <random>
#include <string>  // std::string is used to print the histogram bars

int main()
{
  const int nrolls=10000;  // number of experiments
  const int nstars=100;    // maximum number of stars to distribute

  std::default_random_engine generator;
  std::gamma_distribution<double> distribution_1(2.0, 1.0);
  std::gamma_distribution<double> distribution_2(2.0, 1.0);

  int p[10]={};

  for (int i=0; i<nrolls; ++i) {
    double number_1 = distribution_1(generator);
    double number_2 = distribution_2(generator);
    double sample = number_1 / (number_1+number_2) * 10;
    ++p[int(sample)];
  }

  std::cout << "beta distribution (2.0,2.0):" << std::endl;

  for (int i=0; i<10; ++i) {
    std::cout << i << "-" << (i+1) << ": ";
    std::cout << std::string(p[i]*nstars/nrolls,'*') << std::endl;
  }

  return 0;
}

If you need further suggestions for the implementation in OpenFOAM, let me know.
Best, Andre
