Monday, February 4, 2008

Homogenization and artist distance

Continuing my look into artist distance, I wanted to see whether homogenization has any effect on smoothing out the nastiness that keeps an artist from coming up as self-similar across songs. I decided a good metric would be the same median intra-artist distances, along with inter-artist distances (distances between an artist's songs and every other artist's songs). Ideally, we'd expect an artist's songs to be tightly clustered in some area of the timbral space, reasonably distant from other artists' songs. So, I looked at the differences between the base-line (no homogenization) intra- and inter-artist distances and the same distances after homogenization (currently only the distance-based method, since it seemed better behaved). The plots below show these distance differences for each component-distance threshold across artists, with peaks labeled for fun (it can be easy to forget we're not just dealing with numbers).
The first plot shows the intra-artist distance differences. Since we'd like to see songs from the same artist move closer to each other (i.e. decrease in distance), we consider positive differences "successes". The opposite is true for the second plot of inter-artist differences: we'd like to see artists move away from the others (i.e. increase in distance), so negative differences are considered "successful" here.
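As a rough illustration of the bookkeeping described above, here is a minimal sketch in Python with NumPy. It assumes a precomputed, symmetric song-by-song distance matrix and a parallel list of artist labels. The function names, the toy matrices, and the artist labels "A" and "B" are all made up for the example and are not the code or data behind the plots.

```python
import numpy as np

def median_artist_distances(dist, artists):
    """Return {artist: (median_intra, median_inter)} from a song-by-song distance matrix."""
    dist = np.asarray(dist, dtype=float)
    artists = np.asarray(artists)
    result = {}
    for artist in np.unique(artists):
        own = artists == artist
        # Intra: distinct pairs of this artist's songs (upper triangle, no diagonal).
        block = dist[np.ix_(own, own)]
        iu = np.triu_indices(block.shape[0], k=1)
        intra = np.median(block[iu]) if iu[0].size else np.nan
        # Inter: this artist's songs against every other artist's songs.
        inter = np.median(dist[np.ix_(own, ~own)])
        result[str(artist)] = (float(intra), float(inter))
    return result

def distance_differences(baseline, homogenized):
    """Base-line minus homogenized, per artist, as (intra_diff, inter_diff)."""
    return {
        a: (baseline[a][0] - homogenized[a][0], baseline[a][1] - homogenized[a][1])
        for a in baseline
    }

# Toy data (hypothetical): two songs each for artists "A" and "B".
labels = ["A", "A", "B", "B"]
base = [[0, 4, 10, 10], [4, 0, 10, 10], [10, 10, 0, 2], [10, 10, 2, 0]]
homog = [[0, 2, 8, 8], [2, 0, 8, 8], [8, 8, 0, 2], [8, 8, 2, 0]]

diffs = distance_differences(
    median_artist_distances(base, labels),
    median_artist_distances(homog, labels),
)
print(diffs)
```

In this toy case artist "A" gets a positive intra-artist difference (its songs moved closer together) but also a positive inter-artist difference (it moved closer to "B"), so it counts as a success on the first measure and a failure on the second.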
In general, both distance differences tend to be positive, indicating that while we are moving artists closer to themselves, we are also moving them closer to everyone else. In other words, homogenization seems to compact the entire collection in timbre space. So, when we discard outlier components from models, we are in effect making all models more similar. This effect also seems to monotonically increase with the severity of the homogenization, which makes some sense.
It's interesting to note the peaks. Certain artists (like Daft Punk, Bloodhound Gang, and Mike Oldfield) see strong improvements in intra-artist distance through homogenization. These artists also tend to be the ones that are least self-similar before homogenization, so we are helping the artists who seem to need it the most. But these same artists also tend to grow increasingly close to other artists, which may not be helpful.
To combine these measures of homogenization success, I next looked at the ratio between these distances for each artist. Using intra over inter, smaller values are better. I again looked at the difference between the base-line and each homogenization run. We'd like to see a positive difference since we'd like homogenization to lower the distance ratio.
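The ratio comparison can be sketched the same way. This is only an illustration under assumptions: the per-artist medians are taken to be stored as `(intra, inter)` pairs, and the function name, artist labels, and numbers are invented for the example.

```python
def ratio_differences(baseline, homogenized):
    """Base-line minus homogenized intra/inter ratio per artist.

    Positive values mean homogenization lowered the ratio (the outcome we want).
    """
    diffs = {}
    for artist, (intra_b, inter_b) in baseline.items():
        intra_h, inter_h = homogenized[artist]
        diffs[artist] = intra_b / inter_b - intra_h / inter_h
    return diffs

# Hypothetical medians as (intra, inter) pairs.
baseline = {"A": (4.0, 10.0), "B": (2.0, 10.0)}
homogenized = {"A": (2.0, 8.0), "B": (2.0, 8.0)}

ratio_diffs = ratio_differences(baseline, homogenized)
print(ratio_diffs)  # "A" comes out positive, "B" negative
```

Artist "B" shows why the ratio is worth checking: its intra-artist distance did not change, but because everything else moved closer, its ratio got worse.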
The differences here vary a lot, with no clear across-the-board tendency. Some of the artists whose intra-artist distances improved the most show up here too, but not all of them. And homogenization hurts a handful of artists sharply. Both Tool and the Fugees seem to fare significantly worse after homogenization. The Fugees are near the middle of the list of "consistent" artists (by distance), but Tool is second to last, exactly the kind of artist we aimed to help with homogenization. Perhaps Tool's high distances between songs aren't a result of anti-hubness or bad modeling at all, so homogenization of this kind is of no consequence?
Since I'm just looking at median distances, it'd be interesting to get an idea of how these models are compacting in timbral space. We simply see distances decrease with increased homogenization; we don't know whether the songs are converging to a global center or to localized neighborhoods. Maybe a visualization of the timbral space projected into a lower-dimensional space is in order.
