This is the official repository for: Simultaneous Music Separation and Generation Using Multi-Track Latent Diffusion Models.
The paper was published at ICASSP 2025.
Diffusion models have recently shown strong potential in both music generation and music source separation tasks. Although in early stages, a trend is emerging towards integrating these tasks into a single framework, as both involve generating musically aligned parts and can be seen as facets of the same generative process. In this work, we introduce a latent diffusion-based multi-track generation model capable of both source separation and multi-track music synthesis by learning the joint probability distribution of tracks sharing a musical context. Our model also enables arrangement generation by creating any subset of tracks given the others. We trained our model on the Slakh2100 dataset, compared it with an existing simultaneous generation and separation model, and observed significant improvements across objective metrics for source separation, music, and arrangement generation tasks.
Sound examples are available at https://msg-ld.github.io/
To install MSG-LD, follow these steps:
Clone the repository to your local machine
```bash
git clone https://github.com/chynggi/MSG-LD-Pytorch2
```

To run the code in this repository, you will need Python 3.9+.
Navigate to the project directory and install the required dependencies
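For example, assuming the directory name from the clone URL above (`requirements.txt` is the dependency list referenced later in this README):

```bash
cd MSG-LD-Pytorch2
pip install -r requirements.txt
```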
This project uses the Slakh2100 dataset by default.
Please follow the instructions for data download and setup given here:
https://github.com/gladia-research-group/multi-source-diffusion-models/blob/main/data/README.md
MSG-LD now ships with a dedicated multi-track dataloader for MUSDB18-HQ.
- Download and extract the official MUSDB18-HQ package so that you have `train/` and `test/` sub-directories containing stem files (`bass.wav`, `drums.wav`, `other.wav`, `vocals.wav`).
- Update the new configuration templates to point at those folders (a sketch follows this list):
  - `config/MSG-LD/multichannel_musicldm_musdb18_train.yaml`
  - `config/MSG-LD/multichannel_musicldm_musdb18_eval.yaml`
  - `config/MSG-LD/multichannel_musicldm_musdb18_train_discoder.yaml` (uses the DISCoder vocoder)
- Optionally adjust the `stems` list if you prepared custom subsets (the defaults match the canonical four-stem layout).
- The datamodule re-samples the 44.1 kHz stems to the internal 16 kHz rate automatically, so no additional preprocessing is required.
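For orientation, here is a hypothetical excerpt of what pointing a template at your local copy might look like; the key names (`path`, `stems`) are illustrative and may differ in the shipped templates:

```yaml
# Hypothetical excerpt — check the shipped template for the actual key names.
data:
  params:
    path: /data/musdb18hq                 # folder containing train/ and test/
    stems: [bass, drums, other, vocals]   # defaults match the four-stem layout
```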
The first-stage autoencoder now supports both HiFi-GAN and the DISCoder vocoder. HiFi-GAN remains the default, but you can opt into DISCoder by extending the `ddconfig` block:
```yaml
first_stage_config:
  params:
    ddconfig:
      # existing settings …
      hifigan_ckpt: lightning_logs/musicldm_checkpoints/hifigan-ckpt.ckpt
      vocoder:
        type: discoder
        repo_id: disco-eth/discoder   # use checkpoint/config paths for offline runs
        revision: main
        target_sample_rate: 16000     # resample DISCoder output to match the training rate
```

Note: DISCoder expects 128-bin mel spectrograms and natively operates at 44.1 kHz. The wrapper bundled with MSG-LD automatically interpolates 64-bin mels and rescales the decoded audio back to `target_sample_rate`, so you can reuse existing checkpoints without retraining.
If you prefer working offline, replace `repo_id`/`revision` with local `checkpoint_path` and `config_path` entries. Make sure you install the additional dependencies listed in the updated `requirements.txt` (`huggingface_hub`, `descript-audio-codec`, and the DISCoder repository).
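A minimal sketch of the offline variant, assuming the `checkpoint_path`/`config_path` keys described above (paths are placeholders):

```yaml
vocoder:
  type: discoder
  checkpoint_path: /models/discoder/model.ckpt   # placeholder local path
  config_path: /models/discoder/config.json      # placeholder local path
  target_sample_rate: 16000
```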
After the data and conda environment are set up properly, you will need to download the MusicLDM components that MSG-LD reuses:
```bash
# Download hifigan-ckpt.ckpt
wget https://zenodo.org/record/10643148/files/hifigan-ckpt.ckpt

# Download vae-ckpt.ckpt
wget https://zenodo.org/record/10643148/files/vae-ckpt.ckpt
```
After placing these checkpoints in a directory and updating the corresponding paths in the config file, train MSG-LD with one of the provided configurations. For example:
```bash
# Slakh2100 (default)
python train_musicldm.py --config config/MSG-LD/multichannel_musicldm_slakh_3d_train.yaml

# MUSDB18-HQ (new)
python train_musicldm.py --config config/MSG-LD/multichannel_musicldm_musdb18_train.yaml

# MUSDB18-HQ with DISCoder vocoder
python train_musicldm.py --config config/MSG-LD/multichannel_musicldm_musdb18_train_discoder.yaml
```

For separation and total generation, use the following command and adjust the `unconditional_guidance_scale` parameter as follows (a config sketch follows the list):
- Set `unconditional_guidance_scale` to `0` for total generation in unconditional mode.
- Set `unconditional_guidance_scale` to `1` or `2` for conditional generation, which performs separation.
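The scale is set in the evaluation config; a minimal, hypothetical sketch (consult the shipped eval config for the exact nesting):

```yaml
# Hypothetical excerpt — placement within the config may differ.
unconditional_guidance_scale: 2.0   # 0 = total generation; 1 or 2 = separation
```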
```bash
# Separation and Total Generation (Slakh):
python train_musicldm.py --config config/MSG-LD/multichannel_musicldm_slakh_3d_eval.yaml

# Separation and Total Generation (MUSDB18-HQ):
python train_musicldm.py --config config/MSG-LD/multichannel_musicldm_musdb18_eval.yaml
```

For arrangement generation, run the command below and specify the instrument(s) you want to generate in the `stems_to_inpaint` parameter (an example follows the command).
```bash
# Arrangement Generation:
python train_musicldm.py --config config/MSG-LD/multichannel_musicldm_slakh_3d_eval_inpaint.yaml
```
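For example, to regenerate bass and drums while keeping the remaining stems as conditioning, the inpainting config would carry something like this (the parameter name is from above; its placement in the YAML is an assumption):

```yaml
# Hypothetical excerpt — placement within the config is an assumption.
stems_to_inpaint: [bass, drums]   # generated stems; the others are given
```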
When you only need to demix a single mixture file (instead of running the full evaluation loop), use the lightweight helper in `scripts/separate_single_mixture.py`:
```bash
python scripts/separate_single_mixture.py \
    --config config/MSG-LD/multichannel_musicldm_musdb18_eval.yaml \
    --checkpoint /path/to/lightning_logs/.../checkpoints/last.ckpt \
    --mixture path/to/song.wav \
    --output-dir outputs/song
```

Key flags:
- `--overlap`: cross-fade ratio between consecutive segments (defaults to `0`).
- `--guidance-scale`: classifier-free guidance strength (same semantics as the Lightning evaluation entry point).
- `--use-plms`: switch from DDIM to PLMS sampling.
- `--batch-size`: number of chunks processed in parallel (useful when you have multiple GPUs).
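For example, a hypothetical invocation combining these flags (flag values are illustrative):

```bash
python scripts/separate_single_mixture.py \
    --config config/MSG-LD/multichannel_musicldm_musdb18_eval.yaml \
    --checkpoint /path/to/lightning_logs/.../checkpoints/last.ckpt \
    --mixture path/to/song.wav \
    --output-dir outputs/song \
    --overlap 0.25 \
    --guidance-scale 2.0 \
    --use-plms
```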
The script writes one WAV file per stem plus a reconstructed mixture to the requested output directory.
To demix an entire directory of songs in one go, run the batch helper. It reuses the same weights and parameters as the single-track script but iterates over every supported audio file in the folder (recursively if requested):
```bash
python scripts/separate_folder_mixtures.py \
    --config config/MSG-LD/multichannel_musicldm_musdb18_eval.yaml \
    --checkpoint /path/to/lightning_logs/.../checkpoints/last.ckpt \
    --input-dir path/to/mixtures \
    --output-dir outputs/batch
```

Useful optional flags:
- `--recursive`: descend into subfolders when searching for audio files.
- `--skip-existing`: continue from where you left off by skipping already rendered stem folders.
- `--extensions .wav .flac`: control which file endings count as mixtures.
- `--per-file-progress`: enable the inner diffusion progress bar if you want detailed feedback per song.
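For example, a hypothetical batch run combining these flags (values are illustrative):

```bash
python scripts/separate_folder_mixtures.py \
    --config config/MSG-LD/multichannel_musicldm_musdb18_eval.yaml \
    --checkpoint /path/to/lightning_logs/.../checkpoints/last.ckpt \
    --input-dir path/to/mixtures \
    --output-dir outputs/batch \
    --recursive \
    --skip-existing \
    --extensions .wav .flac
```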