@hexgrad on Hugging Face: "Wanted: Peak Data. I'm collecting audio data to train another TTS model: + AVM…"

hexgrad

posted an update Feb 7, 2025

Wanted: Peak Data. I'm collecting audio data to train another TTS model:
+ AVM data: ChatGPT Advanced Voice Mode audio & text from source
+ Professional audio: Permissive (CC0, Apache, MIT, CC-BY)

This audio should *impress* most native speakers, not just barely pass their audio Turing tests. Professional-caliber means S or A-tier, not your average bloke off the street. Traditional TTS may not make the cut. Absolutely no low-fi microphone recordings like Common Voice.

The bar is much higher than last time, so there are no timelines yet and I expect it may take longer to collect such mythical data. Raising the bar means evicting quite a bit of old data, and voice/language availability may decrease. The theme is *quality* over quantity. I would rather have 1 hour of A/S-tier than 100 hours of mid data.

I have nothing to offer but the north star of a future Apache 2.0 TTS model, so prefer data that you *already have* and that costs you *nothing extra* to send. Additionally, *all* the new data may be used to construct public, Apache 2.0 voicepacks, and if that arrangement doesn't work for you, no need to send any audio.

Last time I asked for horses; now I'm asking for unicorns. As of writing this post, I've got a few English & Chinese unicorns, but there is plenty of room in the stable. Find me over on Discord at rzvzn: https://discord.gg/QuGxSWBfQy

Siddharth

Feb 12, 2025

Hi, where do I send this?

holooo

Feb 15, 2025

He also mentioned the link to Discord at the bottom.

nxym

Feb 19, 2025

Hello, I have a lot of high-quality Greek audio from video productions, with male and female voices that I hold the rights to. Using only 10 minutes of audio from this set, I was able to create a small clone model of the male voice with impressive results, and I have many more hours of these sets available for professional training. My language is complicated, so it was a surprising result. Most models out there use a single robotic-sounding set and have poor datasets for creating a human-sounding Greek voice. Only ElevenLabs has better sets and open. I'd be glad to help your project, because it is surely the most promising out-of-the-box system for natural, fast, real-time voice processing, even on systems with low resources.

arreumb

Oct 23, 2025

Hi, I have the following, it's high quality, labelled data, but I don't know if it can be used.
Training Data for Speech Recognition and Speech Synthesis

Language: Italian (Italy)
Source: LibriVox (http://www.librivox.org/)
Total durations: 127h 40m
Female: 8h 23m
Lisa Caputo: 8h 23m
Male: 31h 45m
Riccardo Fasol: 31h 45m
Mixed: 87h 53m

This training data was created by THE TEAM @ MUNICH ARTIFICIAL INTELLIGENCE LABORATORIES GmbH
Munich, Germany

If you have any questions, please contact us at [email protected]

                                   LICENSE

Redistribution and use in any form, including any commercial use, with or without modification, are permitted provided that the following conditions are met:

Redistributions of source data must retain the above copyright notice, this list of conditions and the following disclaimer.
Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this downloaded data, source-code or binary-code without specific prior written permission.
ANY USE BY ANY UNIVERSITY, COLLEGE, RESEARCH INSTITUTE OR SIMILAR HIGHER EDUCATION INSTITUTION IN GERMANY, SWITZERLAND or AUSTRIA, INCLUDING BY MEMBERS OF SUCH INSTITUTIONS (including but not limited to the students, tutors and teachers at those institutions), REQUIRES A SEPARATE LICENSE AND IS NOT COVERED BY THIS LICENSE AGREEMENT. PLEASE CONTACT US FOR DETAILS AT [email protected].

THIS DATA IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE and/or DATA, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

The Copyright of the original speakers is held by the respective speakers or by LibriVox. LibriVox data is in the Public Domain.

We would highly appreciate it if you would let us know of your use of this data or mention us or our company if you have used this data in a successful project - but you don't have to :-)

Have fun
