Music Genre Classification

A comparative study of machine learning and deep learning architectures for automatic audio classification using the GTZAN dataset.

CSE342
Statistical Machine Learning
Best Accuracy (RCNN): 89%
Music Genres: 10
Audio Samples: 1,000
Models Compared: 5

Model Performance

Model                 Category         Accuracy
Decision Tree         Traditional ML   53%
Logistic Regression   Traditional ML   66%
KNN                   Traditional ML   67%
CNN                   Deep Learning    87%
RCNN                  Deep Learning    89%

Genre Spectrograms

Mel-spectrograms visualize how the frequency content of an audio clip changes over time. Three samples per genre are listed below.

blues: blues.00000.wav, blues.00001.wav, blues.00002.wav
classical: classical.00000.wav, classical.00001.wav, classical.00002.wav
country: country.00000.wav, country.00001.wav, country.00002.wav
disco: disco.00000.wav, disco.00001.wav, disco.00002.wav
hiphop: hiphop.00000.wav, hiphop.00001.wav, hiphop.00002.wav
jazz: jazz.00000.wav, jazz.00001.wav, jazz.00002.wav
metal: metal.00000.wav, metal.00001.wav, metal.00002.wav
pop: pop.00000.wav, pop.00001.wav, pop.00002.wav
reggae: reggae.00000.wav, reggae.00001.wav, reggae.00002.wav
rock: rock.00000.wav, rock.00001.wav, rock.00002.wav
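
For readers unfamiliar with the mel scale behind these spectrograms, here is a minimal sketch of the frequency warping involved. It assumes the common HTK-style formula; the project may use a different variant (for example via an audio library), and the function names and the 0 to 8000 Hz range are illustrative choices, not taken from the project.

```python
import math

def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (HTK-style formula, an assumption)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_edges(f_min, f_max, n_bands):
    """Return n_bands + 2 edge frequencies (Hz) equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]

# Illustrative range and band count only.
edges = mel_band_edges(0.0, 8000.0, 8)
```

Because the edges are evenly spaced in mels, the bands get wider in Hz as frequency rises, which is why a mel-spectrogram gives more resolution to low frequencies.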

Methodology

Dataset

The GTZAN Genre Collection contains 1,000 audio samples across 10 genres. Each sample is 30 seconds long and is segmented into 3-second clips.
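
The segmentation step can be sketched as follows. This is a minimal illustration that assumes non-overlapping clips and drops any trailing partial clip; the project's actual windowing is not stated. The function name `segment_clip` and the toy 100 Hz sample rate are hypothetical.

```python
def segment_clip(samples, sample_rate, clip_seconds=3.0):
    """Split a 1-D sequence of samples into non-overlapping fixed-length clips.

    Any trailing samples that do not fill a whole clip are discarded.
    """
    clip_len = int(round(clip_seconds * sample_rate))
    if clip_len <= 0:
        raise ValueError("clip length must be at least one sample")
    n_clips = len(samples) // clip_len
    return [samples[i * clip_len:(i + 1) * clip_len] for i in range(n_clips)]

# Toy stand-in for a 30-second track (hypothetical 100 Hz rate to keep it small).
sample_rate = 100
track = [0.0] * (30 * sample_rate)
clips = segment_clip(track, sample_rate)
```

Under these assumptions a 30-second track yields 10 clips, so 1,000 tracks would give 10,000 training examples.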

Feature Extraction

MFCCs, spectral centroid, zero-crossing rate, and tempo are extracted as features, along with mel-spectrograms as a visual representation of each clip.
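
Two of these features are simple enough to sketch directly. The example below uses only the Python standard library and a naive DFT for clarity; it is not the project's extraction code, which more likely relies on an audio library. The function names and the toy 10 Hz test tone are assumptions.

```python
import cmath
import math

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency in Hz, using a naive O(n^2) DFT."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total

# Toy input: one second of a 10 Hz sine sampled at 100 Hz (illustrative only).
sample_rate = 100
frame = [math.sin(2 * math.pi * 10 * t / sample_rate + 0.3) for t in range(sample_rate)]
zcr = zero_crossing_rate(frame)
centroid = spectral_centroid(frame, sample_rate)
```

For a pure tone the centroid sits at the tone's frequency. Noisy or percussive audio tends to push both the centroid and the zero-crossing rate higher, which is what makes them useful for separating genres.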

Models

The traditional ML models are KNN, Logistic Regression, and Decision Tree. The deep learning models are CNN and RCNN.
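
As an illustration of the simplest of these models, here is a minimal k-nearest-neighbours sketch in plain Python. It assumes Euclidean distance and majority voting; the project's distance metric, value of k, and feature scaling are not stated. The feature vectors and labels below are made up for the example.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Predict a label by majority vote among the k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    dists = sorted((math.dist(x, query), label) for x, label in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D feature vectors (e.g. scaled centroid and zero-crossing rate).
train_X = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15), (0.8, 0.9), (0.9, 0.8), (0.85, 0.85)]
train_y = ["classical", "classical", "classical", "metal", "metal", "metal"]

prediction = knn_predict(train_X, train_y, (0.82, 0.88), k=3)
```

KNN has no training phase beyond storing the data, so its accuracy depends heavily on how well the hand-crafted features separate the genres.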