beyond bag of frames: February 2008

Wednesday, February 20, 2008

Nice Gaussian hubs

In trying to prove that anti-hubs are the root of all CBR evil (at least in my world), I've mainly looked at the fact that they have strange distributions, which typical modeling methods (i.e. GMMs!) tend to model in a way that isn't exactly helpful.
Recently, I've been looking at non-parametric modeling and through this got a nice visualization of the distributions of MFCC frames (probably something I should have done from the beginning!). Below I show the distributions for the first 6 MFCC dimensions (by row, so the first row has MFCCs 1 through 3), first for a prototypical hub (Carly Simon's "We Have No Secrets", 368 100-occurrences) and then for a prototypical anti-hub (Bloodhound Gang's "Your Only Friends Are Make Believe", 2 100-occurrences).

We see that, indeed, a hub has nice, relatively Gaussian distributions, while the anti-hub's are nasty and multi-modal. This further vindicates the rationale for homogenization: modes exist in the distribution of anti-hubs' frames that are perhaps not relevant to a good timbral model and we'd like to get rid of them. Homogenization pushed to its extreme, after all, would lead to a nice single Gaussian.
To see if this idea really generalizes, I looked at how well each distribution fit a single Gaussian distribution, parameterized to the distribution's mean and variance. Below is the scatter plot of hubness vs. the log-likelihood.

It's not as strong a correlation (rho = 0.0939, p-val = 0.0196) as I was expecting from just looking at the histograms. We do see that both the most and the least likely single Gaussians are anti-hubs. I think this could be explained by songs with lots of a single timbre (e.g. silence). Lots of samples would then fall near the mean and the variance would be very small, leading to high likelihood values for those frames, while all of the relevant frames (i.e. music) are far from this mean and receive low likelihood values. This is the case with our favorite subset of tracks: those with "hidden" tracks, like the previously mentioned Jamiroquai track and "Chris Cayton" by Goldfinger (2 100-occurrences, MFCC histograms below; check out the comments on its last.fm page).
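As a rough sketch of the single-Gaussian fit described above (not the exact code behind the plot), the following computes the average per-frame log-likelihood for one MFCC dimension under a Gaussian parameterized by that dimension's own mean and variance. The function name and the toy sample values are made up for illustration, and working one dimension at a time is an assumption.

```python
import math
from statistics import fmean, pvariance


def gaussian_avg_loglik(samples):
    """Average per-frame log-likelihood of `samples` under one Gaussian
    fitted to the samples' own mean and (population) variance."""
    mu = fmean(samples)
    var = pvariance(samples, mu)
    if var <= 0:
        raise ValueError("samples must have non-zero variance")
    logliks = [
        -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
        for x in samples
    ]
    return fmean(logliks)


# Hypothetical values for one MFCC dimension of two tracks (not real data).
unimodal = [-1.2, -0.5, -0.1, 0.0, 0.2, 0.4, 0.9, 1.3]
bimodal = [-5.1, -4.9, -5.0, -5.2, 4.8, 5.0, 5.1, 5.3]

print(gaussian_avg_loglik(unimodal))
print(gaussian_avg_loglik(bimodal))
```

Under these assumptions the average works out to -0.5 * (log(2 * pi * var) + 1), so it depends only on the fitted variance. That fits the explanation above: a track dominated by one low-variance timbre can score high no matter how poorly the rest of its frames are described.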

Friday, February 8, 2008

Strengthening neighborhoods through homogenization

One concern with homogenization is that it may uniformly pull all songs toward a central point in timbral space. Intuitively, as Aucouturier nicely points out in his thesis, homogenized models, after a point, have lost the unique timbral subtleties that provide a system with its discriminative power (the models go "from representing a given song, down to a more global style of music, down to the even simpler fact that it is music"). We hope to homogenize just enough to throw out the outlier GMM components (typical of anti-hubs) so that these songs are introduced to the pool, thus decreasing hubness and improving recommendation.
So, ideally, we'd expect homogenization (at the right level) to not affect most songs' placement in the timbral space and just bring in the outliers. I see it as strengthening timbral neighborhoods: where nasty components were breaking up these spots before, keeping fine songs from ever getting too close, homogenization (hopefully) comes in to bring these tracks together, where they belong.
Since we can't rely on artists to be self-similar or consistent, any kind of metric involving intra- vs. inter-class relations is inherently flawed to some extent. I'd just like to see if songs are simply clustering better after homogenization. So, I looked at the average distance to the top k nearest neighbors for each song, before and after homogenization. The plots show most of these distances (truncated for clarity). x = before, y = after (using the distance-based method at a threshold of 15).
k = 20:

k = 100:

The relation seems to be more-or-less linear, with a significant y-offset (from the decrease in all distances). The slope seems to be fairly one-to-one (i.e. songs with lots of close neighbors remain with lots of neighbors). We see that songs with distant neighbors tend to be affected more by homogenization (i.e. more off the center of the imaginary regression line). Also, the distribution about this imaginary regression line doesn't seem even. How about a histogram of the differences?
k = 20:

The differences are before minus after homogenization, so positive values indicate a decrease in neighbor distances. We do see it's almost bell-like, but with a fatter end on the right. This means that homogenization is bringing more songs closer to their neighbors than pulling them apart. In fact, the mean difference is 18.42, the median is 11.36, and the skewness is 17.23, verifying that long tail. (The differences passed a t-test, p-value = ~0.)
So, we see that all songs are indeed getting closer to their neighbors, but a good portion more than others. Is this a sign that clusters are forming? Timbral neighborhood strengthening? Are these new neighbors good (i.e. perceptually valid) neighbors? If all songs were being pulled toward some global center by homogenization, what would we expect to see? More to come after I think.
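The neighbor-distance comparison in this post can be sketched roughly as follows, assuming we have two full song-by-song distance matrices (before and after homogenization). The function names and the tiny 4-song matrices are hypothetical, and the skewness here is the plain third standardized moment, which may differ from whatever estimator produced the numbers above.

```python
from statistics import fmean, median


def mean_knn_distance(dist_row, self_index, k):
    """Mean distance from one song to its k nearest neighbors, excluding itself."""
    others = [d for j, d in enumerate(dist_row) if j != self_index]
    return fmean(sorted(others)[:k])


def knn_differences(before, after, k):
    """Per-song change (before minus after) in mean k-NN distance."""
    return [
        mean_knn_distance(before[i], i, k) - mean_knn_distance(after[i], i, k)
        for i in range(len(before))
    ]


def skewness(values):
    """Third standardized moment (population form)."""
    m = fmean(values)
    m2 = fmean([(v - m) ** 2 for v in values])
    m3 = fmean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5


# Toy distance matrices for 4 songs (made-up numbers, not uspop data).
before = [[0, 10, 20, 90], [10, 0, 15, 80], [20, 15, 0, 70], [90, 80, 70, 0]]
after = [[0, 9, 18, 50], [9, 0, 14, 45], [18, 14, 0, 40], [50, 45, 40, 0]]

diffs = knn_differences(before, after, k=2)
print(diffs, fmean(diffs), median(diffs), skewness(diffs))
```

In this toy case the one song with distant neighbors moves far more than the rest, which is the kind of right-heavy tail the histogram shows.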

Monday, February 4, 2008

Homogenization and artist distance

Continuing my look into artist distance, I wanted to see if homogenization has any effect on smoothing out the nastiness that would keep an artist from coming up as self-similar across songs. I decided a good metric would be looking at the same median intra-artist distances as well as inter-artist distances (distances between an artist's songs and every other artist's songs). Ideally, we'd expect an artist's songs to be tightly clustered in some area of the timbral space, reasonably distant from other artists' songs. So, I looked at the differences between base-line (no homogenization) intra- and inter-artist distances and these distances after homogenization (currently only looking at the distance-based method since it seemed better behaved). The plots below show these distance differences for each component-distance threshold across artists, with peaks labeled for fun (it can be easy to forget we're not just dealing with numbers).

The first plot shows the intra-artist distance differences. Since we'd like to see songs from the same artist move closer to each other (i.e. decrease in distance), we consider positive differences "successes". The opposite is true for the second plot of inter-artist difference, since we'd like to see artists move away from others, so smaller differences are considered "successful" here.
In general, both distance differences tend to be positive, indicating that while we are moving artists closer to themselves, we are also moving them closer to everyone else. In other words, homogenization seems to compact the entire collection in timbre space. So, when we discard outlier components from models, we are in effect making all models more similar. This effect also seems to monotonically increase with the severity of the homogenization, which makes some sense.
It's interesting to note the peaks. Certain artists (like Daft Punk, Bloodhound Gang, and Mike Oldfield) see strong improvements in intra-artist distance through homogenization. These artists also tend to be ones that are the least self-similar before homogenization, so we are helping the artists who seem to need it the most. But, these same artists tend to be also growing increasingly close to other artists, which may not be helpful.
To combine these measures of homogenization success, I next looked at the ratio between these distances for each artist. Using intra over inter, smaller values are better. I again looked at the difference between the base-line and each homogenization run. We'd like to see a positive difference since we'd like homogenization to lower the distance ratio.

We see differences here varying a lot, with no clear across-the-board tendency. We see some of the same artists whose intra-artist distances improved the most here, but not all. And we see homogenization hurts a handful of artists sharply. Both Tool and the Fugees seem to fare significantly worse after homogenization. The Fugees are near the middle of the list of "consistent" artists (by distance), but Tool is second to last, just whom we aimed to help with homogenization. Perhaps Tool's high distances between songs aren't a result of anti-hubness or bad modeling at all, so homogenization of this kind is of no consequence?
Since I'm just looking at median distances, it'd be interesting to get an idea of how these models are compacting in timbral space. We simply see distances decrease with increased homogenization; we don't know whether the songs are converging to a global center or to localized neighborhoods. Maybe a visualization of the timbral space projected into a lower-dimensional space is in order.
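For concreteness, here is a minimal sketch of the intra-artist median, inter-artist median, and intra-over-inter ratio used in this post, assuming a full song-by-song distance matrix and a parallel list of artist labels. The function name, the labels, and the two toy matrices are hypothetical.

```python
from statistics import median


def artist_distance_ratio(dist, artists, artist):
    """Return (median intra-artist distance, median inter-artist distance,
    intra / inter ratio) for one artist. Smaller ratios are better."""
    n = len(artists)
    # Each unordered pair of the artist's own songs, counted once.
    intra = [
        dist[i][j]
        for i in range(n)
        for j in range(i + 1, n)
        if artists[i] == artist and artists[j] == artist
    ]
    # Every pair of (artist's song, some other artist's song).
    inter = [
        dist[i][j]
        for i in range(n)
        for j in range(n)
        if artists[i] == artist and artists[j] != artist
    ]
    m_intra, m_inter = median(intra), median(inter)
    return m_intra, m_inter, m_intra / m_inter


artists = ["a", "a", "b", "b"]
# Toy matrices (made-up numbers): base-line and after homogenization.
base = [[0, 40, 100, 120], [40, 0, 110, 90], [100, 110, 0, 30], [120, 90, 30, 0]]
homog = [[0, 20, 80, 100], [20, 0, 90, 70], [80, 90, 0, 25], [100, 70, 25, 0]]

ratio_before = artist_distance_ratio(base, artists, "a")[2]
ratio_after = artist_distance_ratio(homog, artists, "a")[2]
print(ratio_before - ratio_after)  # positive means the ratio improved
```

In the toy data both medians shrink for artist "a", as in the plots above, but the ratio still improves because the intra-artist distance shrinks proportionally more.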

Friday, February 1, 2008

Intra-artist distance (or adventures in CBR)

Another way to perhaps more directly see the consistency of an artist is to look at the computed distances between songs. If our models are working, songs from the same artists should be relatively similar, so their distances should tend to be low. To contrast the r-precision fun we had in a previous post (and to keep those reading who aren't MIR obsessives entertained), I found the mean intra-artist distances.

Top 10!

ricky martin - 37.5232
smash mouth - 43.5782
steve winwood - 47.0957
third eye blind - 47.3322
korn - 48.7898
fleetwood mac - 49.2755
jennifer paige - 49.5098
sugar ray - 49.724
lionel richie - 51.0326
mya - 51.204

Bottom 10! (ignoring Westlife whose distances are inaccurately huge)

prince - 708.6668
jamiroquai - 673.2426
natalie imbruglia - 610.4856
oasis - 493.2065
bloodhound gang - 342.4461
daft punk - 231.6445
radiohead - 229.4721
tool - 198.714
miles davis - 197.7051
frank sinatra - 184.6662

We see a few repeats from the r-precision lists, but maybe not as many as we'd expect. There is a significantly sorta-strong (de)correlation between r-precision and mean intra-artist distance (-0.2373, p-value = 0.0153).
So, I dig deeper. The r-precision ranks were based on the top recommendations for each song, just what hubs (and anti-hubs) are best at mucking up. The lists above are based on means of distances, which are particularly sensitive to outliers (which we have seen are usually badly modeled anti-hubs).
Let's take Jamiroquai. Below is a visualization of his inter-song log-distances (red = distant).

Looks like we have an outlier, and its name is "Picture Of My Life" from the epic album "Funk Odyssey". This track's hubness (using 100-occurrences) is 2, so it's easily considered an anti-hub. Taking a look at the "activation-gram" we see a weird section about 3.5 minutes in.

Clearly at least 7 of the 32 components were trained to solely model this part of the track. What is this strange musical section, you may ask? It turns out to be the silence between the end of the song and the beginning of the "hidden track", "So Good To Feel Real". You can even see that the second song is also not as well modeled as the first. Oh, the joys of content-based recommendation!
After more looking around, it looks like most artists at the bottom of the list above have just one song in their set that, for some weird but reasonable reason (e.g. there's a Michael Jackson "song" which seems to just be a bonus voice-over included on the remastered edition of "Off The Wall"), doesn't fit with the others.
There's a statistic that's particularly good at weeding out these outlier songs: the median. It turns out, and makes sense, that the median intra-artist distances are more (de)correlated with the average r-precision: corr. coef. = -0.4195, p-value = ~0. And we indeed replace the suspect bottom of the above list with our familiar, typically inconsistent artists. So, the median is good and I am a fan (although without first using the mean I would have never listened to those hot Jamiroquai tracks).
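To illustrate why the median helps here, consider a hypothetical artist with five songs, one of which is a badly modeled outlier. That gives 10 pairwise distances, 4 of which involve the outlier. All numbers below are invented for the example.

```python
from statistics import fmean, median

# Pairwise distances among the four "normal" songs (6 pairs).
typical_pairs = [45.0, 50.0, 48.0, 52.0, 47.0, 49.0]
# Pairs involving the one outlier song (4 pairs), e.g. a track with a long silence.
outlier_pairs = [2500.0, 2600.0, 2400.0, 2550.0]

all_pairs = typical_pairs + outlier_pairs

mean_distance = fmean(all_pairs)
median_distance = median(all_pairs)

print(mean_distance)    # dragged far up by the four outlier pairs
print(median_distance)  # stays close to the typical distances
```

The median only breaks down once the outlier pairs make up half or more of all pairs, which for a single bad song requires an artist with very few songs in the set.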
I'd also like to counter what you may be asking: why not use better data? I could and have often thought about it, but the uspop collection is something of a standard and will be easier for anyone to cross-check my work against his. Besides, input problems like the ones shown here are realistic problems any good recommendation engine should be able to handle.
