So how do we train this network? The training procedure is actually quite simple and involves no gradient calculations:
1. Initialize the hidden neuron weights to small random values, or use PCA-based weight initialization
2. Feed a data row xᵢ to the input layer
3. Iterate through each neuron in the hidden layer and find the Best Matching Unit (BMU): the neuron whose weight vector has the smallest Euclidean distance (or other chosen metric) to the data row xᵢ
4. Apply a weight update to the BMU and its neighboring neurons; the BMU's neighbors are determined by a neighborhood function Φ
5. Shrink the neighborhood function Φ
6. Repeat steps 2 to 5 until the iteration limit is reached or the map converges (the average codebook-vector distance between all neurons and all data falls below some threshold)
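The steps above can be sketched in NumPy. This is a minimal illustration, not a production implementation: the grid size, learning-rate schedule, and Gaussian form of Φ (with radius σ shrinking over time) are assumptions chosen for concreteness, and `train_som` is a hypothetical helper name.

```python
import numpy as np

def train_som(data, grid_h=10, grid_w=10, n_iters=1000,
              lr0=0.5, sigma0=3.0, rng=None):
    """Minimal SOM training loop sketch -- no gradients involved."""
    rng = np.random.default_rng(rng)
    n_features = data.shape[1]
    # Step 1: initialize codebook vectors to small random values
    weights = rng.normal(scale=0.1, size=(grid_h, grid_w, n_features))
    # Grid coordinates of each neuron, used by the neighborhood function
    ys, xs = np.mgrid[0:grid_h, 0:grid_w]
    coords = np.stack([ys, xs], axis=-1).astype(float)

    for t in range(n_iters):
        # Step 5: shrink the neighborhood radius (and decay the learning
        # rate) as training progresses -- schedules here are assumptions
        frac = t / n_iters
        lr = lr0 * (1.0 - frac)
        sigma = sigma0 * np.exp(-3.0 * frac)

        # Step 2: feed one data row to the input layer
        x = data[rng.integers(len(data))]

        # Step 3: find the BMU (smallest Euclidean distance to x)
        dists = np.linalg.norm(weights - x, axis=-1)
        bmu = np.unravel_index(np.argmin(dists), dists.shape)

        # Step 4: Gaussian neighborhood Φ centered on the BMU's grid position
        grid_dist2 = np.sum((coords - np.array(bmu, dtype=float)) ** 2, axis=-1)
        phi = np.exp(-grid_dist2 / (2.0 * sigma ** 2))

        # Pull the BMU and its neighbors toward x, weighted by Φ
        weights += lr * phi[..., None] * (x - weights)

    return weights
```

Note that the update touches every neuron, but Φ decays with grid distance from the BMU, so distant neurons barely move; a fixed iteration limit stands in for the convergence check in step 6.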