Update README.md (#11)
- Update README.md (1483d2d5c668d7921c757fc7cd014fa2cbe7dc5a)
Co-authored-by: Tanel Alumäe <[email protected]>
README.md CHANGED
@@ -133,7 +133,7 @@ widget:
 
 ## Model description
 
-This is a spoken language recognition model trained on the VoxLingua107 dataset using SpeechBrain.
+This is a spoken language recognition model trained on the [VoxLingua107 dataset](https://cs.taltech.ee/staff/tanel.alumae/data/voxlingua107/) using SpeechBrain.
 The model uses the ECAPA-TDNN architecture that has previously been used for speaker recognition. However, it uses
 more fully connected hidden layers after the embedding layer, and cross-entropy loss was used for training.
 We observed that this improved the performance of extracted utterance embeddings for downstream tasks.
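As an aside on the description above: with SpeechBrain, a model like this is typically loaded through the pretrained `EncoderClassifier` interface and used for utterance-level language classification. The sketch below assumes a Hugging Face repo id of `speechbrain/lang-id-voxlingua107-ecapa` and a local `example.wav`; neither comes from this diff, and the model card's own "How to use" section remains the authoritative recipe.

```python
# Hedged sketch: spoken language identification with a pretrained
# SpeechBrain EncoderClassifier. The repo id and audio path are
# illustrative assumptions, not taken from this README change.
from speechbrain.pretrained import EncoderClassifier

language_id = EncoderClassifier.from_hparams(
    source="speechbrain/lang-id-voxlingua107-ecapa",  # assumed repo id
    savedir="pretrained_models/lang-id-voxlingua107-ecapa",
)

# classify_file loads the audio, runs the ECAPA-TDNN encoder and the
# fully connected classification head trained with cross-entropy, and
# returns (log-probabilities, best score, predicted index, predicted label).
out_prob, score, index, text_lab = language_id.classify_file("example.wav")
print(text_lab)
```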
@@ -259,7 +259,7 @@ The model has two uses:
 - use as an utterance-level feature (embedding) extractor, for creating a dedicated language ID model on your own data
 
 The model is trained on automatically collected YouTube data. For more
-information about the dataset, see [here](
+information about the dataset, see [here](https://cs.taltech.ee/staff/tanel.alumae/data/voxlingua107/).
 
 
 #### How to use
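Following up on the embedding-extractor use listed in the hunk above, the same interface exposes `encode_batch` for pulling utterance-level embeddings that can feed a language ID classifier trained on your own data. Again a hedged sketch, with the repo id and file name as illustrative assumptions:

```python
# Hedged sketch: using the model as an utterance-level embedding extractor.
# Repo id and audio path are illustrative assumptions.
from speechbrain.pretrained import EncoderClassifier

language_id = EncoderClassifier.from_hparams(
    source="speechbrain/lang-id-voxlingua107-ecapa",  # assumed repo id
    savedir="pretrained_models/lang-id-voxlingua107-ecapa",
)

signal = language_id.load_audio("example.wav")       # 1-D waveform tensor
emb = language_id.encode_batch(signal.unsqueeze(0))  # shape: (batch, 1, embedding_dim)
print(emb.shape)
```

These embeddings are what the model description refers to as "extracted utterance embeddings for downstream tasks".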
@@ -330,7 +330,7 @@ Since the model is trained on VoxLingua107, it has many limitations and biases,
 
 ## Training data
 
-The model is trained on [VoxLingua107](
+The model is trained on [VoxLingua107](https://cs.taltech.ee/staff/tanel.alumae/data/voxlingua107/).
 
 VoxLingua107 is a speech dataset for training spoken language identification models.
 The dataset consists of short speech segments automatically extracted from YouTube videos and labeled according to the language of the video title and description, with some post-processing steps to filter out false positives.