LARA-Gen: Enabling Continuous Emotion Control for Music Generation Models via Latent Affective Representation Alignment Paper • 2510.05875 • Published Oct 7
SciTS: Scientific Time Series Understanding and Generation with LLMs Paper • 2510.03255 • Published Sep 26
PicoAudio2: Temporal Controllable Text-to-Audio Generation with Natural Language Description Paper • 2509.00683 • Published Aug 31
SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs Paper • 2410.09503 • Published Oct 12, 2024
MM-StoryAgent: Immersive Narrated Storybook Video Generation with a Multi-Agent Paradigm across Text, Image and Audio Paper • 2503.05242 • Published Mar 7
UniFlow-Audio: Unified Flow Matching for Audio Generation from Omni-Modalities Paper • 2509.24391 • Published Sep 29
T-CLAP: Temporal-Enhanced Contrastive Language-Audio Pretraining Paper • 2404.17806 • Published Apr 27, 2024
AudioTime: A Temporally-aligned Audio-text Benchmark Dataset Paper • 2407.02857 • Published Jul 3, 2024
Enhance Temporal Relations in Audio Captioning with Sound Event Detection Paper • 2306.01533 • Published Jun 2, 2023
Efficient Audio Captioning with Encoder-Level Knowledge Distillation Paper • 2407.14329 • Published Jul 19, 2024
A Detailed Audio-Text Data Simulation Pipeline using Single-Event Sounds Paper • 2403.04594 • Published Mar 7, 2024
PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation Paper • 2407.02869 • Published Jul 3, 2024
SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound Paper • 2405.00233 • Published Apr 30, 2024
A Large-scale Dataset for Audio-Language Representation Learning Paper • 2309.11500 • Published Sep 20, 2023