Ahalder's Collections
Agent
Time series
Embedding
College Project
SLM
Multimodal
Image Processing
Image generation
Dataset
NLP LLM
Speech and Audio
Games
Segmentation
Video generation
RAG & Querying
Recognition
papers

Speech and Audio

updated Sep 24

  • facebook/wav2vec2-base-960h

    Automatic Speech Recognition • 94.4M • Updated Nov 14, 2022 • 1.93M • 383

  • ChatMusician: Understanding and Generating Music Intrinsically with LLM

    Paper • 2402.16153 • Published Feb 25, 2024 • 60

  • EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer

    Paper • 2409.10819 • Published Sep 17, 2024 • 18

  • jadechoghari/openmusic

    Text-to-Audio • Updated Oct 10, 2024 • 140 • 71

  • SEE-2-SOUND 👀

    Runtime error • 8 • Generate spatial audio from images (and optionally text)


  • SWivid/F5-TTS

    Text-to-Speech • Updated Mar 21 • 778k • 1.13k

  • Paper Whisperer 📈

    Runtime error • 8


  • aiola/whisper-ner-v1

    Automatic Speech Recognition • 2B • Updated Nov 21, 2024 • 176 • 24

  • Zyphra/Zonos-v0.1-transformer

    Text-to-Speech • Updated Jun 3 • 37.9k • 419

  • Zyphra/Zonos-v0.1-hybrid

    Text-to-Speech • Updated Jun 3 • 40.5k • 1.1k

  • innova-ai/AEROMamba

    Updated Feb 2 • 10

  • herimor/voxtream

    Text-to-Speech • Updated Sep 27 • 1.91k • 20