przechoj / gips Goto Github PK
View Code? Open in Web Editor NEWgips - Gaussian model Invariant by Permutation Symmetry
Home Page: https://przechoj.github.io/gips/
License: GNU General Public License v3.0
gips - Gaussian model Invariant by Permutation Symmetry
Home Page: https://przechoj.github.io/gips/
License: GNU General Public License v3.0
I've convinced myself that goal_function
works properly for two simple edge cases, where det_phi_part
is one or consists of a product of a single element. Unfortunately, I think the det_phi_part
is not calculated correctly for the general case.
I would love to see this det_phi_part
outsourced as a function with some unit tests.
We decide if the covariance matrix is invariant under permutation based on the a posteriori distribution in the Bayesian approach. We could then instruct users to use the statistical test to decide whether the found permutation is reasonable. Such tests are developed theoretically. IDK if someone implemented those in R, but it seems easy to do by ourselves...
When the user optimizes his S
matrix and then plots the results, he may conclude that he wants to run the optimization for longer, for example, for twice as many iterations. Then he plots the results and sees that gips forgot the original path, showing only the new one.
I think the good idea would be to make an additional parameter in find_gips()
called continue
. When a user sets it to TRUE
, then the optimization is performed in such a way to remember the previously made progress.
When the user calls find_gips(g, continue=FALSE)
on already optimized g
, there should be a warning that his process will be lost. Maybe even ask for confirmation?
I thought it would be nice to be able to call just heatmap(g)
, where g
is of gips
class. It would be implemented like heatmap(gips::project_matrix(attr(g, "S"), g[[1]]), Rowv = NA, Colv = NA)
.
Is it a good idea? Can it be useful?
The minor technical issue is that stats::heatmap
is not a generic function. We have two options now: make it generic in our package and set the heatmap.default
to stats::heatmap
; or call this function sth like heatmap_gips(g)
.
What do you think, @MrDomani ?
For now, everything works on a natural orthonormal basis
Users may be interested in getting the decomposition for different orthonormal basis.
library(gips)
library(rags2ridges)
GIPS_N_ITER <- 2
n_points <- 50
S <- createS(n=n_points, p=150)
D_coef <- 1000
delta <- 3
D_matrix <- diag(ncol(S)) * D_coef
g <- gips(S, n_points, delta = delta, D_matrix = D_matrix,
was_mean_estimated = TRUE)
g_MAP <- find_MAP(g, max_iter = GIPS_N_ITER, return_probabilities = TRUE,
optimizer = "MH", save_all_perms = TRUE)
validate_gips(g_MAP)
Right now we don't have any tests nor code. Once we have some, it will be useful to automatically run R CMD Check.
Execute
usethis::use_github_action_check_standard()
once ready.
In this issue, I will share my thoughts on optimizers in find_MAP()
.
For an introduction to the topic, see the vignette("Optimizers", package="gips")
also available as the pkgdown page.
We encourage anyone to comment on those.
We want to find the cyclic group. In the reference paper, the process of walking on groups is referred to as the First approach in 4.1.1. The process of walking on permutations is referred to as the Second approach in 4.1.2.
In the gips
package, the Second approach is implemented as the "MH" optimizer. We examined the First approach throughout and decided not to implement it. We don't reject it, though, and we may choose to do it in the future.
find_MAP(optimizer = "BF")
for perm_size < 10
calculates a posteriori function only for group generators, while for 10 <= perm_size
calculates a posteriori function for all permutations of a given size. This is faster to compute it for fewer perms, but we had to stop somewhere and decided to stop at perm_size == 10
. We can always later extend the perm_group_generators_list
object generated in data-raw/perm_group_generators_list.R
.
We have implemented other optimizers that we decided not to yet add to gips
. Those include:
Prepare for release:
git pull
urlchecker::url_check()
devtools::build_readme()
devtools::check(remote = TRUE, manual = TRUE)
devtools::check_win_devel()
revdepcheck::revdep_check(num_workers = 4)
cran-comments.md
git push
Submit to CRAN:
usethis::use_version('minor')
devtools::submit_cran()
Wait for CRAN...
usethis::use_github_release()
usethis::use_dev_version(push = TRUE)
For now, we will implement a function that will get the observations $Z = (Z_1, Z_2, ..., Z_n) $ or matrix $U = {\sum_{i=1}}^n Z_i^T \cdot Z_i $ and will find the best permutational group $\Gamma $ to project the matrix $U $ on
However, if
The projection $U $ on $\mathcal{P}_{\Gamma} $ will be the maximum likelihood estimator, iff
It also is, that
It may be that the found group
The goal function of the Metrop-Heist algorithm looks like this:
, where matrixes
, and function
I suspect the calculation of function may be very expensive.
Edit, the following part contains errors. See the following comment
Let's indicate the disjoint cycles of as
. Then notice that
depends only on
and
.
It may be a big computational profit to save the calculated and recall them in stead of calculating them every time. Keep in mind that in single M-H step we are going from
into
such that
, therefore
and
are different on the maximum 2 of theirs disjoint cycles:
plot.gips(type="heatmap")
will produce a ggplot2
graph. It would be nice for the other types to also support ggplot2
.
In non-interactive sessions, one should use is_installed
instead.
We have check_installed
in 2 places: for return_probabilities=TRUE
and for plot.gips
.
Both could be used in non-interactive sessions, the first one surely.
The values of get_structure_constants(perm, 5)
, where perm
r
vector should be c(1,1,1)
, but is c(1,1)
, therefore the bug is in calculate_r
function. The rest of get_structure_constants
works fine on this example.
The design and names used in the package have to be changed slightly.
gips()
function that will be the wrapper for optimization algorithms looking for the permutation. There are now two algorithms, the Metropolis-Hastings and the best growth, but we plan to introduce the simulated annealing and the evolutionary algorithm in the futureMH()
function will have changed the name to sth more meaningfulgoal_function()
, especially since now it is logarithm-basedn_number
, max_iter
(it is precisely the number of iterations for MH
, but will be the maximum number of iterations for simulated annealing and the evolutionary algorithm in the future)U
parameter from S
parameter The user can now set the maximum number of iterations for the optimization process. Worse, the time of every iteration is vastly different for each type
.
I think a good idea would be to add the user option to choose whether he wants to set the maximum optimization time.
We did not encounter such a feature before, which is surprising.
The user will run the M-H algorithm for a long time on his machine. The process has to be more informative and eye-pleasing than right now.
The test 'goal_function has the desired property' in the test_MH.R
file is designed randomly. It is supposed to show the desired property of the goal_function
that is not guaranteed. I am not sure whether it is a problem. The code below shows that this test will fail about 6% of the time.
I put the set.seed(1234)
in the test to make sure it will not be a false red flag while developing, but we should consider some other solution in the future.
For example, the test fails for set.seed(24)
or 29, 36, 39, 62, 66, 69.
library(gips)
set.seed(1234)
M <- 10000
progressBar <- utils::txtProgressBar(min = 0, max = M, initial = 1)
p <- 10
n <- 20
actual_permutation <- permutations::as.cycle(permutations::as.word(c(2:p, 1)))
id_perm <- permutations::id
mu <- numeric(p)
sigma_matrix <- matrix(numeric(p*p), nrow=p)
for(i in 1:p){
for(j in 1:p){
sigma_matrix[i,j] <- 1 - abs(i-j) / p
}
sigma_matrix[i,i] <- 1 + 1/p
}
bad_i <- 0
for(i in 1:M){
utils::setTxtProgressBar(progressBar, i)
Z <- MASS::mvrnorm(n, mu = mu, Sigma = sigma_matrix)
U <- t(Z) %*% Z
actual_permutation_function_value <- goal_function(actual_permutation,
n, U)
another_permutation_function_value <- goal_function(id_perm,
n, U)
if(actual_permutation_function_value <= another_permutation_function_value){
bad_i <- bad_i + 1
}
}
close(progressBar)
paste0(bad_i/M * 100, "% of times test failed") # 5.77% of times test failed
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.