Evolution, Evaluation, and Challenges of Automatic Music Style Clustering Techniques: A Review from Handcrafted Features to Self-Supervised Learning
DOI:
https://doi.org/10.61173/axq15e97
Keywords:
Music style clustering, Self-supervised learning, Unsupervised representation, Music information retrieval, Cluster evaluation
Abstract
Automatic discovery and clustering of music styles is fundamental to music information retrieval, recommendation systems, and computational musicology. Recent advances in deep learning and self-supervised representation learning have substantially improved audio embeddings, enabling more effective unsupervised clustering by musical style and acoustic characteristics. This paper reviews representative approaches to music style clustering: traditional feature-based methods, two-stage frameworks that combine deep embeddings with clustering, and emerging self-supervised online clustering models that jointly optimize representation learning and cluster structure. It also summarizes the datasets and evaluation metrics commonly used in the field, reviews experimental findings from representative studies, and analyzes the advantages and limitations of each class of methods. Finally, it identifies open challenges, including inconsistent evaluation standards, difficulties in cross-cultural and multi-label style recognition, limited interpretability of clustering results, and scalability in large-scale streaming scenarios. The aim is to give researchers a clear, structured technical roadmap for developing and evaluating future music style clustering systems.
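As a concrete illustration of the two-stage framework and its evaluation, the following minimal sketch clusters fixed embeddings and then scores the result against reference style labels. It is not drawn from any of the reviewed systems: the toy 2-D vectors stand in for embeddings from a pretrained audio model, and k-means with normalized mutual information (NMI) is only one possible choice of clustering algorithm and metric, assumed here for illustration. The names `kmeans`, `nmi`, `embeddings`, and `style_labels` are hypothetical.

```python
import math
from collections import Counter


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    assign = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_assign = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_assign == assign:
            break  # converged: assignments no longer change
        assign = new_assign
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(labels_a, labels_b):
    """Normalized mutual information, normalized by the mean of the two entropies."""
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mi = sum(
        (nij / n) * math.log(nij * n / (count_a[a] * count_b[b]))
        for (a, b), nij in joint.items()
    )
    denom = (entropy(labels_a) + entropy(labels_b)) / 2
    return mi / denom if denom > 0 else 1.0


# Stage 1 (assumed done elsewhere): toy 2-D vectors standing in for audio embeddings.
embeddings = [
    (0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2),  # tracks of hypothetical style "a"
    (5.0, 5.1), (5.2, 4.9), (5.1, 5.3), (4.9, 5.0),  # tracks of hypothetical style "b"
]
style_labels = ["a"] * 4 + ["b"] * 4

# Stage 2: cluster the fixed embeddings, then score against the reference labels.
clusters = kmeans(embeddings, k=2)
print("cluster assignments:", clusters)
print("NMI vs. style labels:", nmi(clusters, style_labels))
```

Because NMI depends only on how the two partitions overlap, it is unaffected by how cluster indices are numbered, which is why label-based scores of this kind can be compared across clustering methods. An online clustering model would differ in that the embeddings are updated jointly with the cluster assignments rather than held fixed as in this sketch.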