Working from Adam Berenzweig's blogged experiments, I've found a somewhat strong correlation between hub songs and the overall spread of their GMM components. Hubs tend to have components tightly clustered around their centroid, whereas anti-hubs have components that lie far from each other. To verify this, I computed the median intra-component KL-divergence for each song model. The correlation between this value and the song's "hubness" (the number of times it occurs on other songs' top-100 lists, aka JJ's 100-occurrence measure) was -0.4179 (p-value = 1.21e-45). In other words, the stronger the hub, the more compact the GMM components are in MFCC space.
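For anyone who wants to try something similar, here is a rough sketch of how a per-song spread measure like this could be computed. It is not my exact code. It assumes diagonal-covariance components, reads the measure as the median over all component pairs, and symmetrises the KL divergence by averaging both directions. The function names and array layout are made up for illustration.

```python
import numpy as np

def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(p || q) for two Gaussians with diagonal covariance."""
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )

def median_pairwise_kl(means, variances):
    """Median symmetrised KL divergence over all pairs of GMM components.

    means, variances: arrays of shape (K, D), one row per component.
    Symmetrising by averaging both directions is an assumption of this sketch.
    """
    k = means.shape[0]
    divs = []
    for i in range(k):
        for j in range(i + 1, k):
            forward = kl_diag_gauss(means[i], variances[i], means[j], variances[j])
            backward = kl_diag_gauss(means[j], variances[j], means[i], variances[i])
            divs.append(0.5 * (forward + backward))
    return float(np.median(divs))
```

Running this once per song model gives one spread value per song, which can then be correlated against the hubness counts.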
Then I started looking at the activation of the individual GMM components over the MFCC frames of the songs and noticed that the more distant a component is from the GMM's centroid, the more likely it is to have come from a timbrally spurious section of the song. These sections can be as short as a few frames, but EM apparently still devotes components to them. Below is a good example from the GMM (16 components, diagonal covariance) of The Bloodhound Gang's "Right Turn Clyde" (hubness value of zero!). The activations are shown on the right and the Euclidean distance from the GMM centroid is on the left. It's clear that at least 7 of the 16 components are given to the short section in the middle, and that these components are the furthest from the model's centroid.
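The two quantities in that kind of plot can be sketched roughly as follows. This is only an illustration under my own assumptions: "activation" is taken to be the posterior probability of each component given a frame, and the "centroid" is taken to be the weight-averaged mean of the component means. The function names are hypothetical.

```python
import numpy as np

def component_activations(frames, weights, means, variances):
    """Posterior probability of each component for each MFCC frame.

    frames: (T, D); weights: (K,); means, variances: (K, D), diagonal covariance.
    Returns an array of shape (T, K) whose rows sum to 1.
    """
    diff = frames[:, None, :] - means[None, :, :]              # (T, K, D)
    log_norm = np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (K,)
    log_pdf = -0.5 * (np.sum(diff ** 2 / variances[None, :, :], axis=2) + log_norm)
    log_joint = log_pdf + np.log(weights)[None, :]
    # Subtract the per-frame maximum before exponentiating, for numerical stability.
    log_joint = log_joint - log_joint.max(axis=1, keepdims=True)
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)

def distance_from_centroid(weights, means):
    """Euclidean distance of each component mean from the GMM centroid.

    The centroid here is the weight-averaged component mean (an assumption).
    """
    centroid = weights @ means
    return np.linalg.norm(means - centroid, axis=1)
```

Plotting the activation matrix against time next to the per-component distances makes it easy to see whether the far-out components only fire during a short stretch of the song.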

