Categorising audio with K-Means: an extensible system for content-based music exploration

Papunen, Janne

Papunen, Janne (2023)

Restricted access

The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-202301201489

Abstract

Categorisation is a fundamental part of how humans organise information. Given the abundance of data today, manually organising large libraries of any kind of item is increasingly tedious. In the context of music, categorisation is typically performed on genres and moods; however, automatic classifiers have limitations, stemming from, e.g., differences in perception between people, the myriad of existing genres, and the relative difficulty of adequately assigning genres to tracks.
Music.Info Finland Oy, a company engaged in the digital music industry, commissioned the development of a system capable of music categorisation without any a priori knowledge about tags or other metadata. The aim of the system was to analyse raw audio files and categorise them with machine learning techniques, according to any discovered inter-track similarities. An important distinction from many existing systems was that the system was not to be constrained to categorisation by musical similarity only, but was to allow other types of categorisation as well, such as categorisation of recordings.
Over 40 low-level audio features per track were computed with Music Information Retrieval algorithms and grouped into clusters with K-Means. Two experiments were performed to assess the efficiency of the clustering: (1) track-segment categorisation, and (2) genre / artist categorisation. In the first experiment, 20 randomly selected tracks from the Music.Info Distribution Platform (MIDP) database were automatically segmented into parts, analysed, and categorised “back” into 20 clusters. The second experiment organised a larger set of 35 712 MIDP database tracks, analysed at two different sample rates (44.1 kHz and 11.025 kHz), into 12 arbitrary genre-families. Four measures were employed to evaluate the external fitness of the clusterings: Homogeneity, Completeness, V-Measure, and Cluster-Coherence, the last of which was conceived during the implementation of the system to assess the fitness of individual clusters.
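As a rough illustration of the pipeline described above, the following sketch clusters a handful of feature vectors with a plain K-Means (Lloyd's algorithm) and scores the result with Homogeneity, Completeness, and V-Measure in their standard entropy-based form. It is not the thesis implementation: the function names, the two-dimensional feature values, and the segment-to-track labels are all invented for the example, and Cluster-Coherence is left out because the abstract does not define it.

```python
import math
import random
from collections import Counter


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def entropy(labels):
    """Shannon entropy of a label sequence (natural logarithm)."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(targets, given):
    """H(targets | given), computed from joint label counts."""
    n = len(targets)
    joint = Counter(zip(targets, given))
    given_counts = Counter(given)
    return -sum((c / n) * math.log(c / given_counts[g]) for (_, g), c in joint.items())


def homogeneity_completeness_v(truth, predicted):
    """External clustering scores against known reference labels."""
    h_truth, h_pred = entropy(truth), entropy(predicted)
    homogeneity = 1.0 if h_truth == 0 else 1.0 - conditional_entropy(truth, predicted) / h_truth
    completeness = 1.0 if h_pred == 0 else 1.0 - conditional_entropy(predicted, truth) / h_pred
    total = homogeneity + completeness
    v_measure = 0.0 if total == 0 else 2 * homogeneity * completeness / total
    return homogeneity, completeness, v_measure


# Hypothetical data: six track segments, each reduced to two made-up features.
features = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [5.0, 5.1], [5.1, 5.0], [4.9, 5.2]]
track_of_segment = ["a", "a", "a", "b", "b", "b"]  # reference labels (source track)

clusters = kmeans(features, k=2)
h, c, v = homogeneity_completeness_v(track_of_segment, clusters)
```

Homogeneity is high when each cluster contains segments of a single track, completeness is high when all segments of a track land in the same cluster, and V-Measure is their harmonic mean. This corresponds to the setup of experiment (1), where segments are categorised "back" towards their source tracks.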
In general, the categorisations seemed more successful on recording aspects than on the musical properties of the tracks. Genre separation in experiment (2) performed especially poorly. This was not surprising, given that no mid- or high-level acoustic / musical features were computed in the track analysis phase. Experiment (1) found that parts of the same track tended to cluster together, suggesting an affinity for recording-based similarity detection, which was also corroborated to some extent by artist-to-cluster assignments in experiment (2). The lower sample rate was also found to produce slightly lower result values.
The developed system could find application in the quality detection of recordings, playlist generation, and music exploration in general.
