Xuenan Xu PRO

wsntxxn

https://wsntxxn.github.io

AI & ML interests

Text to Speech Synthesis, Text to Music Synthesis, Singing Voice Synthesis

Recent Activity

updated a Space 5 days ago

wsntxxn/UniFlow-Audio

updated a model 5 days ago

wsntxxn/UniFlow-Audio-small

updated a model 5 days ago

wsntxxn/UniFlow-Audio-medium

authored 2 papers about 1 month ago

LARA-Gen: Enabling Continuous Emotion Control for Music Generation Models via Latent Affective Representation Alignment

Paper • 2510.05875 • Published Oct 7

Bayesian Speech Synthesizers Can Learn from Multiple Teachers

Paper • 2510.24372 • Published Oct 28

authored 6 papers about 2 months ago

SciTS: Scientific Time Series Understanding and Generation with LLMs

Paper • 2510.03255 • Published Sep 26

PicoAudio2: Temporal Controllable Text-to-Audio Generation with Natural Language Description

Paper • 2509.00683 • Published Aug 31

Towards Weakly Supervised Text-to-Audio Grounding

Paper • 2401.02584 • Published Jan 5, 2024

SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs

Paper • 2410.09503 • Published Oct 12, 2024

MM-StoryAgent: Immersive Narrated Storybook Video Generation with a Multi-Agent Paradigm across Text, Image and Audio

Paper • 2503.05242 • Published Mar 7 • 1

UniFlow-Audio: Unified Flow Matching for Audio Generation from Omni-Modalities

Paper • 2509.24391 • Published Sep 29

authored 9 papers over 1 year ago

T-CLAP: Temporal-Enhanced Contrastive Language-Audio Pretraining

Paper • 2404.17806 • Published Apr 27, 2024

FakeSound: Deepfake General Audio Detection

Paper • 2406.08052 • Published Jun 12, 2024 • 2

AudioTime: A Temporally-aligned Audio-text Benchmark Dataset

Paper • 2407.02857 • Published Jul 3, 2024

Zero-Shot Audio Captioning Using Soft and Hard Prompts

Paper • 2406.06295 • Published Jun 10, 2024

Enhance Temporal Relations in Audio Captioning with Sound Event Detection

Paper • 2306.01533 • Published Jun 2, 2023

Efficient Audio Captioning with Encoder-Level Knowledge Distillation

Paper • 2407.14329 • Published Jul 19, 2024 • 5

A Detailed Audio-Text Data Simulation Pipeline using Single-Event Sounds

Paper • 2403.04594 • Published Mar 7, 2024

PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation

Paper • 2407.02869 • Published Jul 3, 2024 • 21

SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound

Paper • 2405.00233 • Published Apr 30, 2024 • 17

authored a paper about 2 years ago

A Large-scale Dataset for Audio-Language Representation Learning

Paper • 2309.11500 • Published Sep 20, 2023 • 10