For ground truth, we can of course look at artist R-precision as before: it came out to 30.97%, against a random baseline of 2.3%, which is about the same as I was getting with the uspop set. Parag also labeled each artist with a primary instrument name. With these labels we can check whether the model matches songs on the timbral characteristics of the main sound source in the song, or whether it's locking onto more abstract qualities (like audio fidelity).
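For readers who haven't seen artist R-precision before, here is a minimal sketch of how it can be computed. It assumes the model's output is a square similarity matrix (`sim`, larger means more similar) plus one artist label per song. Both the function name and that input format are my assumptions for illustration, not the exact code behind the 30.97% figure. For each seed, R is the number of other songs by the same artist, and the score is the fraction of the seed's top R neighbors that are by that artist.

```python
import numpy as np

def artist_r_precision(sim, artists):
    """Mean artist R-precision over all seeds (hypothetical helper).

    Assumes sim[i, j] is larger when songs i and j are more similar.
    """
    sim = np.asarray(sim, dtype=float)
    artists = np.asarray(artists)
    n = len(artists)
    scores = []
    for i in range(n):
        relevant = artists == artists[i]
        relevant[i] = False  # the seed is not its own match
        r = int(relevant.sum())
        if r == 0:
            continue  # no other songs by this artist, so R-precision is undefined
        candidates = np.delete(np.arange(n), i)
        # rank every other song by descending similarity to the seed
        order = candidates[np.argsort(-sim[i, candidates], kind="stable")]
        # precision within the top R neighbours
        scores.append(relevant[order[:r]].mean())
    return float(np.mean(scores))
```

Seeds whose artist has only one song are skipped, since there is nothing to retrieve for them.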
I used a k-NN classifier with leave-one-out cross-validation, in the same way Elias did in his thesis. The results are below and hopefully readable. The mean accuracies (in %) are shown for each number of nearest neighbors polled; each mean is essentially the average proportion of the k nearest neighbors that share the seed song's primary instrument. For a baseline, I averaged the scores over 100 random kernels at each k; it came out to about 23.1% at every k.
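The scoring described above can be sketched as follows. This is a minimal illustration under a few assumptions. The kernel is treated as a plain similarity matrix (larger means more similar). The score is the average fraction of the k nearest neighbors sharing the seed's label, as described above, rather than a majority-vote classification. A "random kernel" is assumed to be a symmetric matrix of uniform random values. The function names are hypothetical.

```python
import numpy as np

def knn_accuracy(sim, labels, ks):
    """Leave-one-out k-NN score, in %, for each k (hypothetical helper).

    For each seed, rank all other songs by similarity and take the fraction
    of the k nearest whose label matches the seed's label.
    """
    sim = np.asarray(sim, dtype=float)
    labels = np.asarray(labels)
    n = len(labels)
    totals = {k: 0.0 for k in ks}
    for i in range(n):
        others = np.delete(np.arange(n), i)  # leave the seed out
        order = others[np.argsort(-sim[i, others], kind="stable")]
        hits = labels[order] == labels[i]
        for k in ks:
            totals[k] += hits[:k].mean()
    return {k: 100.0 * totals[k] / n for k in ks}


def random_baseline(labels, ks, trials=100, seed=0):
    """Average knn_accuracy over `trials` random symmetric kernels (assumed baseline)."""
    rng = np.random.default_rng(seed)
    n = len(labels)
    sums = {k: 0.0 for k in ks}
    for _ in range(trials):
        r = rng.random((n, n))
        r = (r + r.T) / 2  # symmetric random similarity matrix
        acc = knn_accuracy(r, labels, ks)
        for k in ks:
            sums[k] += acc[k]
    return {k: sums[k] / trials for k in ks}
```

With this definition, the random baseline depends mainly on how the instrument labels are distributed, which is consistent with it staying roughly flat across k.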
Not bad, but to make sure the accuracy reflects instrument similarity alone, we apply an artist filter, as advocated by Elias. This removes every other song by the seed's artist from the pool of potential nearest neighbors, so neighbors can't match merely because of other shared timbral characteristics (e.g., a producer effect or audio fidelity). Guess what happens?
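Here is a minimal sketch of the artist filter applied to the same leave-one-out k-NN score. As before, it assumes the kernel is a similarity matrix and that each song has an instrument label and an artist label. The function name is hypothetical. The only change from an unfiltered score is the candidate pool: the seed and every other song by the seed's artist are dropped before ranking.

```python
import numpy as np

def knn_accuracy_artist_filtered(sim, labels, artists, ks):
    """Leave-one-out k-NN score, in %, with an artist filter (hypothetical helper)."""
    sim = np.asarray(sim, dtype=float)
    labels = np.asarray(labels)
    artists = np.asarray(artists)
    totals = {k: 0.0 for k in ks}
    counted = 0
    for i in range(len(labels)):
        # artist filter: the pool excludes the seed and all songs by its artist
        pool = np.flatnonzero(artists != artists[i])
        if pool.size == 0:
            continue  # no songs by other artists to compare against
        order = pool[np.argsort(-sim[i, pool], kind="stable")]
        hits = labels[order] == labels[i]
        for k in ks:
            totals[k] += hits[:k].mean()
        counted += 1
    return {k: 100.0 * totals[k] / counted for k in ks}
```

For example, if a seed's closest song is by the same artist, the filter skips it and the seed's top neighbor becomes the closest song by a different artist. This is why the filtered scores can only reflect similarity across artists.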
|nicm kernel (with artist filter)||58.86||57.30||54.69||50.85||46.91|
The random baseline is about the same, at 21.4%. So accuracy drops markedly, but it is still well above random, and it doesn't fall off as fast with increasing k.
Next, I'd like to homogenize the models and see if these scores improve.