Common Voice results for OH-SMArt benchmark

opensource-spraakherkenning-nl · Jun 3, 2024 · 2b81eba
1 parent a5508fb
commit 2b81eba

Showing 2 changed files with 33 additions and 3 deletions.
diff --git a/UT/CommonVoice/cv.md b/UT/CommonVoice/cv.md
@@ -0,0 +1,29 @@
+[Back to homepage](../../index.md)
+
+<h2>Common Voice</h2>
+
+Common Voice is an open, crowdsourced initiative to create a multilingual speech dataset. Contributors record themselves speaking prompted utterances and review other contributors' recordings through an upvote/downvote system, which ensures that recordings correspond to the given prompts and are of sufficient audio quality. For more information about the project, see [the official website](https://commonvoice.mozilla.org).
+
+Mozilla releases a new version of the dataset every few months. This benchmark evaluates the **test** subset of version **17.0**, which contains around 15 hours of speech.
+
+<br>
+
+The table below reports the **WER** and the **time** each model/configuration spent transcribing the dataset:
+
+|Model|WER|Time|
+|---|---|---|
+|Kaldi_NL|20.7%|8h:15m:54s*|
+|faster-whisper v2|5.6%|1h:58m:37s|
+|*faster-whisper v3*|**4.3%**|1h:55m:20s|
+|faster-whisper v2 w/ VAD|5.6%|1h:58m:50s|
+|faster-whisper v3 w/ VAD|4.4%|2h:01m:33s|
+|XLS-R FT on Dutch|6.5%|1h:04m:00s|
+|*MMS - 102 languages*|13.4%|**0h:37m:50s**|
+|MMS - 1162 languages|9.5%|0h:53m:56s|
+
+\* Most of this time was spent running the speaker diarization module.
+
+
+### Normalization
+
+Text normalization was applied to both the reference and hypothesis files before calculating the WER. This includes normalization of numbers and characters, and of word variants that are acceptable spellings in the context of evaluation. For more details, see the [ASR-NL-benchmark](https://github.com/opensource-spraakherkenning-nl/ASR_NL_benchmark) repository, which also documents the format of the hypothesis/reference files.
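
As a rough illustration of how normalization feeds into the WER described above, the following Python sketch normalizes a reference and a hypothesis and then computes the word-level edit distance. It is not the ASR-NL-benchmark implementation: the `normalize` rules (lowercasing and punctuation stripping only) and both function names are assumptions made for this example, and the actual benchmark applies richer rules, for instance for numbers and spelling variants.

```python
import re


def normalize(text: str) -> list[str]:
    """Illustrative normalization only: lowercase and replace punctuation with spaces."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return text.split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (zit -> zat) and one deletion (de) over six reference words.
    print(word_error_rate("De kat zit op de mat.", "de kat zat op mat"))
```

This computes the WER for a single utterance pair. A corpus-level score is usually obtained by summing the errors over all utterances and dividing by the total number of reference words, rather than by averaging per-utterance rates.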
diff --git a/index.md b/index.md
@@ -1,13 +1,14 @@
-<h1>Dutch ASR Performance</h1>
+<h1>Dutch Open Speech Recognition Benchmark</h1>
 
-Welcome to the page where researchers and developers report performance of various ASR models on Dutch datasets.
+Welcome to the benchmark page where researchers and developers report performance of various ASR models on Dutch datasets.
 
-<h2>UT's Kaldi_NL vs. Whisper vs. others evaluation</h2>
+<h2>UT's evaluation</h2>
 
 *UT = University of Twente*
 
 - [Results for N-Best 2008 Dutch Evaluation corpus](./UT/N-Best/nbest_res.md)
 - [Results for Jasmin-CGN corpus](./UT/Jasmin/jasmin.md)
+- [Results for Common Voice](./UT/CommonVoice/cv.md)
 - [Hardware setup & model configurations](./UT/hardware.md)
 - [Why do the results differ between whisper-timestamped and faster-whisper?](./UT/analysis.md)