Speech and Audio - an Ahalder Collection

Speech and Audio

Updated Sep 24

facebook/wav2vec2-base-960h

Automatic Speech Recognition • 94.4M • Updated Nov 14, 2022 • 1.93M • 383
ChatMusician: Understanding and Generating Music Intrinsically with LLM

Paper • 2402.16153 • Published Feb 25, 2024 • 60
EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer

Paper • 2409.10819 • Published Sep 17, 2024 • 18
jadechoghari/openmusic

Text-to-Audio • Updated Oct 10, 2024 • 140 • 71
SEE-2-SOUND

Runtime error • 8
Generate spatial audio from images (and optionally text)
SWivid/F5-TTS

Text-to-Speech • Updated Mar 21 • 778k • 1.13k
Paper Whisperer

Runtime error • 8
aiola/whisper-ner-v1

Automatic Speech Recognition • 2B • Updated Nov 21, 2024 • 176 • 24
Zyphra/Zonos-v0.1-transformer

Text-to-Speech • Updated Jun 3 • 37.9k • 419
Zyphra/Zonos-v0.1-hybrid

Text-to-Speech • Updated Jun 3 • 40.5k • 1.1k
innova-ai/AEROMamba

Updated Feb 2 • 10
herimor/voxtream

Text-to-Speech • Updated Sep 27 • 1.91k • 20