Each of the 8 note patches is classified independently on three features, using a separate CNN for each feature. Each CNN takes a spectrogram patch as input and outputs a binary label.
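Keeping the classifiers separate lets each CNN specialize in its own visual pattern, allows each model to be debugged independently, and makes the system extensible: a new feature can be added as another classifier without retraining the existing ones.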
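A minimal sketch of the one-model-per-feature layout, assuming each trained CNN can be wrapped as a plain callable that maps a patch to True/False. The names `classify_patches`, `NoteResult`, and `REQUIRED_FEATURES` are hypothetical, and the patch type is a stand-in for whatever tensor the real models consume.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Stand-in for a spectrogram patch (rows = frequency bins, columns = frames).
Patch = Sequence[Sequence[float]]
# Stand-in for a trained binary CNN: patch in, True/False out.
Classifier = Callable[[Patch], bool]

REQUIRED_FEATURES = ("timing", "tuning", "timbre")


@dataclass
class NoteResult:
    index: int               # position of the note patch (0..7 for 8 patches)
    labels: Dict[str, bool]  # feature name -> binary decision


def classify_patches(
    patches: List[Patch], classifiers: Dict[str, Classifier]
) -> List[NoteResult]:
    """Run each per-feature classifier on each patch independently."""
    missing = [f for f in REQUIRED_FEATURES if f not in classifiers]
    if missing:
        raise ValueError(f"missing classifiers: {missing}")
    results = []
    for i, patch in enumerate(patches):
        # Every classifier sees only the raw patch, never another model's output.
        labels = {name: bool(clf(patch)) for name, clf in classifiers.items()}
        results.append(NoteResult(index=i, labels=labels))
    return results
```

Because each feature is just another entry in `classifiers`, a fourth model could be registered alongside the existing three without modifying them.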
Auto-labeling: tuning labels come from the cents deviation of the pitch estimated with librosa.pyin, timing labels from each onset's offset relative to the metronome grid, and timbre labels from a spectral noisiness measure.
CLASSIFIER 01 — TIMING
On time / Off time — onset position relative to the nearest metronome beat (threshold: ±50 ms)
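A minimal sketch of the timing auto-label, assuming a constant-tempo metronome grid described by a BPM and the time of the first click, and onset times already detected in seconds (for example with `librosa.onset.onset_detect(..., units="time")`). The function names and the `subdivisions` parameter are hypothetical.

```python
from typing import Tuple

TIMING_THRESHOLD_S = 0.050  # ±50 ms, the on-time / off-time boundary


def timing_offset(
    onset_s: float, bpm: float, grid_start_s: float = 0.0, subdivisions: int = 1
) -> float:
    """Signed offset in seconds from the nearest grid point.

    Assumes a constant tempo: grid points at grid_start_s + k * period,
    where period = 60 / bpm / subdivisions. Negative = early, positive = late.
    """
    period = 60.0 / bpm / subdivisions
    k = round((onset_s - grid_start_s) / period)  # index of the nearest grid point
    return onset_s - (grid_start_s + k * period)


def timing_label(
    onset_s: float,
    bpm: float,
    grid_start_s: float = 0.0,
    subdivisions: int = 1,
    threshold_s: float = TIMING_THRESHOLD_S,
) -> Tuple[str, float]:
    """Return ("on time" | "off time", signed offset in seconds)."""
    offset = timing_offset(onset_s, bpm, grid_start_s, subdivisions)
    label = "on time" if abs(offset) <= threshold_s else "off time"
    return label, offset
```

At 120 BPM the beat period is 0.5 s, so an onset at 1.02 s is 20 ms late and labeled on time, while one at 1.07 s (70 ms late) is labeled off time.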
CLASSIFIER 02 — TUNING
In tune / Out of tune — estimated fundamental frequency (F0) vs. target pitch (threshold: ±20 cents)
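A minimal sketch of the tuning auto-label, assuming the per-frame F0 track comes from `librosa.pyin` (which marks unvoiced frames as NaN by default) and that each note's target pitch is known as a MIDI note number. Summarizing the track with the median of voiced frames is an assumption, and the function names are hypothetical. The deviation in cents is $1200 \log_2(f_0 / f_{\text{target}})$.

```python
import math
import statistics
from typing import Iterable, Optional, Tuple

TUNING_THRESHOLD_CENTS = 20.0  # ±20 cents, the in-tune / out-of-tune boundary


def midi_to_hz(midi: float, a4_hz: float = 440.0) -> float:
    """Equal-tempered frequency of a MIDI note (69 = A4)."""
    return a4_hz * 2.0 ** ((midi - 69) / 12.0)


def cents_deviation(f0_hz: float, target_hz: float) -> float:
    """Signed deviation in cents; positive = sharp, negative = flat."""
    return 1200.0 * math.log2(f0_hz / target_hz)


def tuning_label(
    f0_track: Iterable[float],
    target_midi: float,
    threshold_cents: float = TUNING_THRESHOLD_CENTS,
) -> Optional[Tuple[str, float]]:
    """Return ("in tune" | "out of tune", cents), or None if no voiced frames."""
    voiced = [f for f in f0_track if not math.isnan(f) and f > 0]
    if not voiced:
        return None  # nothing to measure, e.g. a silent or unvoiced patch
    f0 = statistics.median(voiced)
    dev = cents_deviation(f0, midi_to_hz(target_midi))
    label = "in tune" if abs(dev) <= threshold_cents else "out of tune"
    return label, dev
```

A note held 10 cents sharp of A4 (about 442.5 Hz) falls inside the ±20 cent window; one 30 cents sharp does not.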
CLASSIFIER 03 — TIMBRE
Clear / Unclear — spectral noisiness, harmonic clarity, buzz detection
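A minimal sketch of one possible noisiness measure for the timbre auto-label, using spectral flatness (geometric mean over arithmetic mean of the power spectrum, the ratio behind `librosa.feature.spectral_flatness`) as a stand-in. It does not cover harmonic clarity or buzz detection, the threshold value is hypothetical and would need calibration on real recordings, and the function names are illustrative.

```python
import math
from typing import Sequence, Tuple

# Hypothetical cut-off: must be calibrated on labeled recordings.
FLATNESS_THRESHOLD = 0.2


def spectral_flatness(magnitudes: Sequence[float], eps: float = 1e-10) -> float:
    """Geometric mean / arithmetic mean of one frame's power spectrum.

    Close to 0 for a few strong harmonic peaks, close to 1 for noise.
    """
    if len(magnitudes) == 0:
        raise ValueError("empty frame")
    power = [max(m * m, eps) for m in magnitudes]  # clip so log() never sees 0
    geo_mean = math.exp(sum(math.log(p) for p in power) / len(power))
    arith_mean = sum(power) / len(power)
    return geo_mean / arith_mean


def timbre_label(
    frames: Sequence[Sequence[float]], threshold: float = FLATNESS_THRESHOLD
) -> Tuple[str, float]:
    """Average per-frame flatness over a note patch, then threshold it."""
    if len(frames) == 0:
        raise ValueError("empty patch")
    mean_flatness = sum(spectral_flatness(f) for f in frames) / len(frames)
    label = "clear" if mean_flatness <= threshold else "unclear"
    return label, mean_flatness
```

A perfectly flat (white-noise-like) spectrum has flatness 1.0 and is labeled unclear; a spectrum dominated by a single peak has flatness near 0 and is labeled clear.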