Emotion-Conditioned Chiptune Music Generation Using a Hybrid PatchTST-LSTM Model

Jing Yuan Sun; Roy Ma

doi:10.31224/5562

Keywords:

Music Generation, Deep Learning, LSTM, PatchTST, Symbolic Music, Transformer Architecture

Abstract

We propose and evaluate a hybrid deep learning model that combines Patch Time Series Transformers (PatchTST) with Long Short-Term Memory (LSTM) networks for symbolic music generation conditioned on emotional states. Using the YM2413-MDB dataset of annotated chiptune music, we map emotions onto Russell’s circumplex model (valence-arousal space) and assess how well three models (vanilla PatchTST, vanilla LSTM, and our hybrid architecture) generate emotion-aligned music. Evaluation metrics include melodic coherence, rhythmic stability, harmonic richness, structural complexity, and a custom Emotion Alignment Score. Experimental results show that while the hybrid PatchTST-LSTM model achieves competitive performance, the vanilla LSTM slightly outperforms it in both validation loss and emotional alignment. The findings suggest that recurrent models remain highly effective for short symbolic music sequences, while Transformer-based approaches may require more complex datasets or longer compositions to demonstrate advantages. We discuss limitations of emotion encoding, evaluation methods, and dataset size, and outline directions for future research. Code is available at https://github.com/qwirty123/PatchTST-LSTM.
