UNMIXX: Untangling Highly Correlated Vocals in Multiple Singing Voice Separation

Jihoo Jung, Ji-Hoon Kim, Doyeop Kwak, Junwon Lee, Juhan Nam, Joon Son Chung
Korea Advanced Institute of Science and Technology, South Korea

Abstract

We introduce UNMIXX, a novel framework for multiple singing voice separation (MSVS). While related to speech separation, MSVS faces unique challenges: data scarcity and the highly correlated nature of singing voice mixtures. To address these issues, we propose UNMIXX with three key components: (1) a musically informed mixing strategy that constructs highly correlated, music-like mixtures, (2) cross-source attention that drives the representations of the two singers apart via reverse attention, and (3) a magnitude penalty loss that penalizes erroneously assigned interfering energy. UNMIXX not only addresses data scarcity by simulating realistic training data, but also excels at separating highly correlated mixtures through cross-source interactions at both the architectural and loss levels. Our extensive experiments demonstrate that UNMIXX substantially improves performance, with SDRi gains exceeding 2.2 dB over prior work.
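For readers unfamiliar with the SDRi figure quoted above, the following is a minimal sketch of how an SDR improvement is commonly computed: the SDR of the separated estimate minus the SDR of the unprocessed mixture, both measured against the same ground-truth source. It assumes the plain energy-ratio definition of SDR. The evaluation toolkit, the exact SDR variant, and the handling of source permutation used for the reported results are not stated on this page, and the function names `sdr` and `sdr_improvement` are hypothetical.

```python
import numpy as np


def sdr(reference, estimate, eps=1e-8):
    """Signal-to-distortion ratio in dB (plain energy-ratio form, an assumption)."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    signal_power = np.sum(reference ** 2)
    error_power = np.sum((reference - estimate) ** 2)
    # eps guards against division by zero and log of zero for silent signals.
    return 10.0 * np.log10((signal_power + eps) / (error_power + eps))


def sdr_improvement(reference, estimate, mixture):
    """SDRi: gain of the estimate over simply using the mixture as the output."""
    return sdr(reference, estimate) - sdr(reference, mixture)


# Toy example with two synthetic "voices" (sine tones, not real singing).
t = np.arange(16000) / 16000.0
voice_a = np.sin(2 * np.pi * 220.0 * t)
voice_b = np.sin(2 * np.pi * 330.0 * t)
mixture = voice_a + voice_b

# A hypothetical separator output that still leaks 10% of the other voice.
estimate_a = voice_a + 0.1 * voice_b

# About 20 dB here, because the interfering amplitude dropped by a factor of 10.
print(f"SDRi: {sdr_improvement(voice_a, estimate_a, mixture):.2f} dB")
```

In a two-singer setting the output order is arbitrary, so a full evaluation would also pick the assignment of estimates to references that gives the better score; that step is omitted from this sketch.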

Audio Samples

MedleyVox Duet Samples

🎵 MedleyVox Duet - Sample 1

Mixture

Ground Truth

MedleyVox

TIGER

Proposed

🎵 MedleyVox Duet - Sample 2

Mixture

Ground Truth

MedleyVox

TIGER

Proposed

🎵 MedleyVox Duet - Sample 3

Mixture

Ground Truth

MedleyVox

TIGER

Proposed

🎵 MedleyVox Duet - Sample 4

Mixture

Ground Truth

MedleyVox

TIGER

Proposed

MedleyVox Unison Samples

🎵 MedleyVox Unison - Sample 1

Mixture

Ground Truth

MedleyVox

TIGER

Proposed

🎵 MedleyVox Unison - Sample 2

Mixture

Ground Truth

MedleyVox

TIGER

Proposed

🎵 MedleyVox Unison - Sample 3

Mixture

Ground Truth

MedleyVox

TIGER

Proposed

Additional Pop Samples

🎵 Jason Mraz, Colbie Caillat - Lucky

Mixture
Proposed

🎵 Kendrick Lamar, SZA - LUTHER

Mixture
Proposed

🎵 ROSÉ, Bruno Mars - APT.

Mixture
Proposed

🎵 Rumi, Jinu (K-Pop Demon Hunters) - Free

Mixture
Proposed