Classification Accuracy Is Not Enough: On the Evaluation of Music Genre Recognition Systems, 2013

Authors: Bob L. Sturm
Type: Journal paper
Journal: Journal of Intelligent Information Systems
Year: 2013

Abstract: We argue that an evaluation of system behavior at the level of the music is required to usefully address the fundamental problems of music genre recognition (MGR), and indeed other tasks of music information retrieval, such as autotagging. A recent review of works in MGR since 1995 shows that most (82%) measure the capacity of a system to recognize genre by its classification accuracy. After reviewing evaluation in MGR, we show that neither classification accuracy, nor recall and precision, nor confusion tables, necessarily reflect the capacity of a system to recognize genre in musical signals. Hence, such figures of merit cannot be used to reliably rank, promote or discount the genre recognition performance of MGR systems if genre recognition (rather than identification by confounds) is the objective. This motivates the development of a richer experimental toolbox for evaluating any system designed to intelligently extract information from music signals.
