beyond bag of frames: Strengthening neighborhoods through homogenization

One concern with homogenization is that it may uniformly pull all songs toward a central point in timbral space. Intuitively, as Aucouturier nicely points out in his thesis, homogenized models, after a point, have lost the unique timbral subtleties that provide a system with its discriminative power (the models go "from representing a given song, down to a more global style of music, down to the even simpler fact that it is music"). We hope to homogenize just enough to throw out the outlier GMM components (typical of anti-hubs) so that these songs are introduced to the pool, thus decreasing hubness and improving recommendation.
So, ideally, we'd expect homogenization (at the right level) to not affect most songs' placement in the timbral space and just bring in the outliers. I see it as strengthening timbral neighborhoods: where nasty components were breaking up these spots before, keeping fine songs from ever getting too close, homogenization (hopefully) comes in to bring these tracks together, where they belong.
Since we can't rely on artists to be self-similar or consistent, any kind of metric involving intra- vs. inter-class relations is inherently flawed to some extent. I'd just like to see if songs are simply clustering better after homogenization. So, I looked at the average distance to the top k nearest neighbors for each song, before and after homogenization. The plots show most of these distances (truncated for clarity). x = before, y = after (using the distance-based method at a threshold of 15).
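For reference, here is a minimal sketch of the per-song measure described above, assuming the pairwise timbral distances are already available as a square matrix. The function name `mean_knn_distance` and the toy matrices are made up for illustration; they are not the actual code or data behind the plots.

```python
def mean_knn_distance(dist, k):
    """Return, for each song, the mean distance to its k nearest neighbors.

    dist is a square matrix (list of lists) where dist[i][j] is the distance
    between song i and song j. A song is never counted as its own neighbor.
    """
    n = len(dist)
    if not 0 < k < n:
        raise ValueError("k must be between 1 and n - 1")
    result = []
    for i, row in enumerate(dist):
        # All distances from song i to the other songs, smallest first.
        others = sorted(d for j, d in enumerate(row) if j != i)
        result.append(sum(others[:k]) / k)
    return result


# Toy distance matrices (hypothetical values, 4 songs).
before = [
    [0.0, 2.0, 4.0, 10.0],
    [2.0, 0.0, 3.0, 9.0],
    [4.0, 3.0, 0.0, 8.0],
    [10.0, 9.0, 8.0, 0.0],
]
after = [
    [0.0, 1.0, 3.0, 6.0],
    [1.0, 0.0, 2.0, 5.0],
    [3.0, 2.0, 0.0, 4.0],
    [6.0, 5.0, 4.0, 0.0],
]

x = mean_knn_distance(before, k=2)  # one value per song, before
y = mean_knn_distance(after, k=2)   # one value per song, after
```

Each (x[i], y[i]) pair is then one point in a before-vs-after scatter plot. With thousands of songs, a sorted scan per row like this is slow but easy to check.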
k = 20:

k = 100:

The relation seems to be more-or-less linear, with a significant y-offset (from the decrease in all distances). The slope seems to be fairly one-to-one (i.e. songs with lots of close neighbors still have lots of close neighbors). We see that songs with distant neighbors tend to be affected more by homogenization (i.e. they sit further from the imaginary regression line). Also, the distribution about this imaginary regression line doesn't seem even. How about a histogram of the differences?
k = 20:

The differences are before minus after homogenization, so positive values indicate a decrease in neighbor distances. We do see it's almost bell-like, but with a fatter tail on the right. This means that homogenization is bringing more songs closer to their neighbors than pulling them apart. In fact, the differences have a mean of 18.42, a median of 11.36, and a skewness of 17.23, verifying that long tail. (A t-test on the differences gave a p-value of ~0.)
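As a rough sketch of how summary numbers like these can be computed from the two lists of per-song mean neighbor distances: the helper name `summarize_differences` is hypothetical, the skewness is the plain moment-based estimate (third central moment over the 1.5 power of the second), and the t statistic assumes a paired test, i.e. a one-sample t-test of the differences against zero. The post doesn't say which variants were actually used, so treat both as assumptions.

```python
import math
import statistics


def summarize_differences(before, after):
    """Summarize before-minus-after differences in mean neighbor distance."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    mean = statistics.fmean(diffs)
    median = statistics.median(diffs)

    # Moment-based skewness: m3 / m2^(3/2). Positive means a long right tail.
    m2 = sum((d - mean) ** 2 for d in diffs) / n
    m3 = sum((d - mean) ** 3 for d in diffs) / n
    if m2 == 0:
        raise ValueError("all differences are identical; skewness is undefined")
    skewness = m3 / m2 ** 1.5

    # Paired t statistic (assumed test): mean difference over its standard error.
    t_stat = mean / (statistics.stdev(diffs) / math.sqrt(n))

    return {"mean": mean, "median": median, "skewness": skewness, "t": t_stat}


# Toy per-song values (hypothetical, not the real data).
before = [3.0, 2.5, 3.5, 8.5]
after = [2.0, 1.5, 2.5, 4.5]
stats = summarize_differences(before, after)
```

A mean above the median together with positive skewness is the same pattern as in the histogram: most songs move a little closer to their neighbors, and a smaller group moves a lot closer.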
So, we see that all songs are indeed getting closer to their neighbors, but a good portion more than others. Is this a sign that clusters are forming? Timbral neighborhood strengthening? Are these new neighbors good (i.e. perceptually valid) neighbors? If all songs were being pulled toward some global center by homogenization, what would we expect to see? More to come after I think.

Friday, February 8, 2008
