There was an arguably small improvement in hubness and a decline in r-precision when homogenizing GMMs by distance from their global centroid. Since we saw early on in the "activation-gram" that certain components are only active (i.e., likely to represent a sample) for a relatively small number of frames, why not base the homogenization criterion on the activation itself, instead of an indirect correlate?
So, I looked at the median activation level for each component over the entirety of a song and simply dropped any component whose median activation did not meet a threshold (again, empirically derived).
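The pruning step can be sketched as follows. This is a minimal illustration, not the actual code used: I'm assuming the per-frame activations are available as an `(n_frames, n_components)` array of log likelihoods, and that dropped components' weights are simply renormalized (the function and argument names are mine).

```python
import numpy as np

def prune_components_by_activation(weights, means, covars, log_activations,
                                   threshold=-100.0):
    """Drop GMM components whose median per-frame activation falls below
    `threshold` (in log likelihood), then renormalize the mixture weights.

    log_activations: assumed (n_frames, n_components) array of per-component
    log activations over a song's frames.
    """
    medians = np.median(log_activations, axis=0)   # one median per component
    keep = medians >= threshold
    if not np.any(keep):                           # never drop every component
        keep[np.argmax(medians)] = True
    new_weights = weights[keep]
    new_weights = new_weights / new_weights.sum()  # weights must sum to 1
    return new_weights, means[keep], covars[keep]
```

With the thresholds below (-110 to -80), a lower threshold prunes fewer components, since more medians clear the bar.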
Below are the same plots used in the distance-based homogenization: first, hubness vs. number of components removed; second, hubness histograms.
From the first figure, we see that, again, homogenization indeed tends to affect anti-hubs more than hubs, as intended.
The number of hubs (more than 200 100-occurrences, i.e., songs appearing in more than 200 other songs' top-100 lists) for each threshold (-110, -100, -90, and -80 in log likelihood) was 162, 155, 160, and 153, compared to 156 with no homogenization. The number of anti-hubs for each run was 138, 110, 142, and 149, compared to 124 with no homogenization. It seems, and is clear from the histograms, that the only threshold that helps us (decreasing both hubs and anti-hubs) is -100. We saw over-homogenization adversely affect hubness in the distance-based method as well. I should look into this.
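For reference, here is how the hub and anti-hub counts above can be tallied from each song's 100-occurrence count. The hub cutoff (more than 200 occurrences) is the one used above; the anti-hub cutoff (zero occurrences, i.e., a song never retrieved in any top-100 list) is my assumption, since the post doesn't state one.

```python
import numpy as np

def hub_counts(occurrences, hub_cutoff=200, antihub_cutoff=0):
    """Count hubs and anti-hubs given each song's 100-occurrence count
    (how many other songs' top-100 lists it appears in).

    hub_cutoff of 200 follows the post; antihub_cutoff of 0 is an assumption.
    """
    occurrences = np.asarray(occurrences)
    n_hubs = int(np.sum(occurrences > hub_cutoff))
    n_antihubs = int(np.sum(occurrences <= antihub_cutoff))
    return n_hubs, n_antihubs
```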
Maximum hub values for each run were 601, 592, 576, and 570, compared to the original 580, so there is at least a monotonic decrease across thresholds.
Interestingly, the -100 threshold also yields a slightly higher r-precision (0.16339, compared to the non-homogenized 0.16169). The average r-precisions for the other thresholds (-110, -90, and -80) are 0.15947, 0.15767, and 0.15196 (I should learn to make tables). This is in contrast to distance-based homogenization, where hubness seemingly improved but r-precision suffered at all thresholds. Granted, the improvement is small and may not be statistically significant (more on this in a later post).
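Since r-precision is doing the evaluative work here, a quick sketch of how it's computed may help: for each query with R relevant items, it is the fraction of relevant items among the top R retrieved, averaged over queries. The function names below are mine, not from the actual evaluation code.

```python
import numpy as np

def r_precision(ranked_ids, relevant_ids):
    """R-precision for one query: fraction of relevant items among the
    top-R retrieved results, where R = number of relevant items."""
    R = len(relevant_ids)
    if R == 0:
        return 0.0
    top_R = ranked_ids[:R]
    return sum(1 for item in top_R if item in relevant_ids) / R

def mean_r_precision(all_ranked, all_relevant):
    """Average r-precision over all queries."""
    return float(np.mean([r_precision(ranked, relevant)
                          for ranked, relevant in zip(all_ranked, all_relevant)]))
```

So a move from 0.16169 to 0.16339 means that, on average, a fraction 0.0017 more of each query's top-R results are relevant.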
So, even when "manually" kicking out components that contribute little to the model (and that usually correspond to outlier musical sections), we don't see much overall improvement. I must look into this more.