Tuesday, March 18, 2008

Homogenization of NICM by covariance

I chose to homogenize by covariance since it looks like that's the main anti-hub correlate for this data. The plot below shows the log-determinant of the covariance for each component of each model (that's 32 x 897 components). I'm convinced that the super tiny variance components are just ones that collapsed in EM and should definitely be removed. So, I picked two thresholds for now: -300 to remove all the super tiny components, and -150 to remove most of the components outside of that massive band around -100.

[Plot: covariance log-determinant for each of the 32 x 897 components]

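The pruning itself is just a threshold on the component covariance log-determinants. A minimal numpy sketch, assuming each model is stored as plain (weights, means, covariances) arrays with full covariances; the function name, layout, and defaults here are just for illustration, not what my actual code looks like:

```python
import numpy as np

def prune_components(weights, means, covariances, log_det_threshold=-150.0):
    """Drop GMM components whose covariance log-determinant falls below a
    threshold, then renormalize the remaining mixture weights.

    weights:     (K,) component priors
    means:       (K, D) component means
    covariances: (K, D, D) full covariance matrices
    """
    # slogdet returns (sign, log|det|); covariances should be positive definite
    log_dets = np.array([np.linalg.slogdet(c)[1] for c in covariances])
    keep = log_dets > log_det_threshold
    w = weights[keep]
    return w / w.sum(), means[keep], covariances[keep], log_dets

# e.g. for the harsher cut: prune_components(w, mu, sigma, log_det_threshold=-300.0)
```
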
Artist R-precision increased for both homogenizations (I'm not attempting another table):
35.85% for -300
38.24% for -150

Compared to the un-homogenized 32.27% (different from the last post because of some meta-data clean-up). Both differences are significant under the Wilcoxon test (p-values ~ 0).
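
For the significance test I'm pairing up per-track R-precision scores before and after homogenization. Something along these lines, using scipy's Wilcoxon signed-rank implementation; the arrays below are placeholder data standing in for the real per-track scores:

```python
import numpy as np
from scipy.stats import wilcoxon

# per-track artist R-precision before and after homogenization
# (placeholder data; in practice these come from the retrieval runs over the 897 tracks)
rprec_base = np.random.rand(897)
rprec_homog = np.clip(rprec_base + 0.05 * np.random.rand(897), 0, 1)

stat, p = wilcoxon(rprec_base, rprec_homog)
print("Wilcoxon statistic = %.1f, p = %.3g" % (stat, p))
```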

Hubness seems to increase, though.
# of hubs (100-occurrences greater than 200):
105 with no homogenization
121 for -300
119 for -150

# of anti-hubs (100-occurrences less than 20):
121 with no homogenization
114 for -300
131 for -150

So, I guess we're trading a smoother hub distribution for precision.
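
For anyone counting along: the hub numbers above are just k-occurrence counts over the top-100 neighbor lists. A rough sketch, assuming a full pairwise distance matrix dist over the 897 tracks (the names and cutoffs here are illustrative):

```python
import numpy as np

def count_hubs(dist, k=100, hub_cut=200, antihub_cut=20):
    """Count hubs and anti-hubs from a full distance matrix.

    A track's k-occurrence is the number of other tracks whose top-k
    nearest-neighbor lists it appears in.
    """
    n = dist.shape[0]
    d = dist.copy()
    np.fill_diagonal(d, np.inf)            # never count a track as its own neighbor
    occurrences = np.zeros(n, dtype=int)
    for i in range(n):
        nn = np.argsort(d[i])[:k]          # top-k neighbors of track i
        occurrences[nn] += 1
    hubs = int(np.sum(occurrences > hub_cut))
    antihubs = int(np.sum(occurrences < antihub_cut))
    return hubs, antihubs, occurrences
```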

I'll look into the other homogenization methods soon.

Thursday, March 13, 2008

North Indian Classical Music dataset

To compare against results from the uspop dataset, we put together a set of North Indian classical music (NICM) and ran it through the same CBR fun. This was done with my advisor, Parag Chordia, who's done a lot of work with MIR and Indian music. In all, there are 897 tracks from 141 artists.
For ground truth, we can of course look at artist R-precision as before: it came out to 30.97% (random baseline of 2.3%), about the same as I was getting with the uspop set. Parag also labeled each artist with a primary instrument. With these labels we can see whether the modeling is matching songs based on the timbral characteristics of the main sound source present in the song, or locking onto more abstract qualities (like audio fidelity).
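
For concreteness, artist R-precision here is the usual thing: for each seed track, take its R nearest neighbors (R = number of other tracks by the same artist) and score the fraction that share the artist, then average over seeds. A sketch, assuming a full distance matrix dist and a per-track artists label array (hypothetical names; tie-breaking and other details may differ from what I actually ran):

```python
import numpy as np

def artist_r_precision(dist, artists):
    """Mean artist R-precision over all seed tracks.

    dist:    (N, N) pairwise distance matrix
    artists: (N,) numpy array of artist labels
    """
    n = len(artists)
    d = dist.copy()
    np.fill_diagonal(d, np.inf)            # exclude the seed itself
    scores = []
    for i in range(n):
        same = (artists == artists[i])
        same[i] = False
        r = int(same.sum())                # R = other tracks by the seed's artist
        if r == 0:
            continue                       # artist has only one track
        nn = np.argsort(d[i])[:r]
        scores.append(same[nn].mean())     # fraction of the R neighbors that match
    return float(np.mean(scores))
```
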
I used a k-NN classifier with leave-one-out cross-validation, in the same way Elias did in his thesis. The results are below and hopefully readable. The mean accuracies (as %) are shown for each number of nearest neighbors polled; they basically represent the average proportion of the k nearest neighbors that share the seed's primary instrument. For a baseline, I averaged the scores over 100 random kernels for each k level; it was about 23.1% at each level.
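
The scoring is simple enough to sketch. Assuming the same hypothetical dist matrix plus a per-track instruments label array (a numpy array of strings), the per-k mean is roughly:

```python
import numpy as np

def knn_instrument_accuracy(dist, instruments, ks=(1, 3, 5, 10, 20)):
    """Leave-one-out k-NN scores from a distance matrix: for each seed, the
    fraction of its k nearest neighbors sharing the seed's instrument label,
    averaged over all seeds and reported as a percentage."""
    d = dist.copy()
    np.fill_diagonal(d, np.inf)            # leave-one-out: the seed can't vote
    order = np.argsort(d, axis=1)          # every seed's neighbors, nearest first
    scores = {}
    for k in ks:
        match = instruments[order[:, :k]] == instruments[:, None]
        scores[k] = 100.0 * match.mean()
    return scores
```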

                        k=1     k=3     k=5     k=10    k=20
nicm kernel             81.05   74.96   70.68   64.97   58.12

Not bad, but to ensure the accuracy is based solely on instrument similarity, we apply an artist filter, as advocated by Elias. This removes any other songs by the seed's artist from the potential nearest-neighbor pool, eliminating the chance that neighbors match only because of other shared timbral traits (e.g. producer effect or audio fidelity). Guess what happens?

                        k=1     k=3     k=5     k=10    k=20
nicm kernel (with af)   58.86   57.30   54.69   50.85   46.91

The random baseline is about the same, at 21.4%. So, accuracy markedly decreases, but it's still significantly above random. It also doesn't fall off as fast with increasing k.
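
The artist filter amounts to masking out the seed's artist before ranking neighbors. Continuing the sketch above (same hypothetical dist, instruments, and artists arrays):

```python
import numpy as np

def knn_accuracy_artist_filter(dist, instruments, artists, ks=(1, 3, 5, 10, 20)):
    """Same leave-one-out k-NN scoring, but all tracks by the seed's artist are
    removed from the candidate pool before the neighbors are ranked."""
    n = len(instruments)
    scores = {k: [] for k in ks}
    for i in range(n):
        d = dist[i].copy()
        d[artists == artists[i]] = np.inf   # artist filter (also removes the seed)
        order = np.argsort(d)
        for k in ks:
            nn = order[:k]
            scores[k].append(np.mean(instruments[nn] == instruments[i]))
    return {k: 100.0 * np.mean(v) for k, v in scores.items()}
```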

Next, I'd like to homogenize the models and see if these scores improve.