Common Voice results for OH-SMArt benchmark
greenw0lf committed Jun 3, 2024
1 parent a5508fb commit 2b81eba
Showing 2 changed files with 33 additions and 3 deletions.
29 changes: 29 additions & 0 deletions UT/CommonVoice/cv.md
@@ -0,0 +1,29 @@
[Back to homepage](../../index.md)

<h2>Common Voice</h2>

Common Voice is a crowdsourced, open initiative to create a multilingual speech dataset. Contributors record themselves speaking a given prompt and review other contributors' recordings through an upvote/downvote system, which ensures that each recording corresponds to its prompt and is of sufficient audio quality. For more information about the project, see [the official website](https://commonvoice.mozilla.org).

Mozilla releases a new version of the dataset every few months. This benchmark evaluates the **test** subset of version **17.0**, which contains around 15 hours of speech.

<br>

The table below reports the **WER** and the **time** each model/configuration spent transcribing the dataset:

|Model|WER|Time|
|---|---|---|
|Kaldi_NL|20.7%|8h:15m:54s*|
|faster-whisper v2|5.6%|1h:58m:37s|
|*faster-whisper v3*|**4.3%**|1h:55m:20s|
|faster-whisper v2 w/ VAD|5.6%|1h:58m:50s|
|faster-whisper v3 w/ VAD|4.4%|2h:01m:33s|
|XLS-R FT on Dutch|6.5%|1h:04m:00s|
|*MMS - 102 languages*|13.4%|**0h:37m:50s**|
|MMS - 1162 languages|9.5%|0h:53m:56s|

\* Most of this time was spent running the speaker diarization module.
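
To put the transcription times in perspective, they can be related to the length of the audio. The snippet below is a minimal sketch, not part of the benchmark tooling: it parses a few durations copied from the table and divides them by the audio length to get a real-time factor. It assumes the test set is exactly 15 hours long, whereas the text above only says "around 15 hours", so the resulting factors are rough estimates. The names `to_seconds` and `real_time_factor` are hypothetical helpers introduced for this illustration.

```python
import re

# Durations copied from the table above, in its "Hh:MMm:SSs" notation.
TIMES = {
    "Kaldi_NL": "8h:15m:54s",
    "faster-whisper v3": "1h:55m:20s",
    "MMS - 102 languages": "0h:37m:50s",
}

# Assumption: the test set is treated as exactly 15 hours of audio;
# the page only states "around 15 hours".
AUDIO_SECONDS = 15 * 3600


def to_seconds(duration: str) -> int:
    """Convert a duration such as '1h:55m:20s' to a number of seconds."""
    match = re.fullmatch(r"(\d+)h:(\d+)m:(\d+)s", duration)
    if match is None:
        raise ValueError(f"unexpected duration format: {duration!r}")
    hours, minutes, seconds = (int(part) for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def real_time_factor(duration: str, audio_seconds: int = AUDIO_SECONDS) -> float:
    """Processing time divided by audio length (lower is faster)."""
    return to_seconds(duration) / audio_seconds


if __name__ == "__main__":
    for model, duration in TIMES.items():
        print(f"{model}: RTF ~ {real_time_factor(duration):.2f}")
```

A factor below 1 means the configuration transcribed the audio faster than real time under the 15-hour assumption.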


### Normalization

When calculating the WER, text normalization was applied to both the reference and hypothesis files. This includes normalizing numbers and characters, and treating spelling variants of a word that are acceptable in the context of the evaluation as equivalent. For more details, see the [ASR-NL-benchmark](https://github.com/opensource-spraakherkenning-nl/ASR_NL_benchmark) repository, which also describes the format of the hypothesis/reference files.
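
The following minimal sketch illustrates why the same normalization has to be applied to both sides before scoring. The rules shown (lowercasing, punctuation removal, and two tiny lookup tables) are hypothetical examples, not the rules used by ASR-NL-benchmark, and the WER is computed with a plain word-level edit distance rather than with the benchmark's own tooling.

```python
import re
from typing import List

# Hypothetical normalization tables, for illustration only; these are NOT
# the rules used by ASR-NL-benchmark.
NUMBER_WORDS = {"1": "een", "2": "twee", "3": "drie"}
SPELLING_VARIANTS = {"oke": "oké", "okay": "oké"}


def normalize(text: str) -> List[str]:
    """Lowercase, drop punctuation, and map numbers/variants to one form."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    words = [NUMBER_WORDS.get(word, word) for word in text.split()]
    return [SPELLING_VARIANTS.get(word, word) for word in words]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


if __name__ == "__main__":
    # Without normalization, "2" vs. "twee" would count as a substitution.
    print(word_error_rate("Ik heb 2 katten.", "ik heb twee katten"))
```

Note that a corpus-level WER is normally obtained by summing the errors and the reference words over all utterances before dividing, not by averaging per-utterance scores.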
7 changes: 4 additions & 3 deletions index.md
@@ -1,13 +1,14 @@
-<h1>Dutch ASR Performance</h1>
+<h1>Dutch Open Speech Recognition Benchmark</h1>

-Welcome to the page where researchers and developers report performance of various ASR models on Dutch datasets.
+Welcome to the benchmark page where researchers and developers report performance of various ASR models on Dutch datasets.

-<h2>UT's Kaldi_NL vs. Whisper vs. others evaluation</h2>
+<h2>UT's evaluation</h2>

 *UT = University of Twente*

 - [Results for N-Best 2008 Dutch Evaluation corpus](./UT/N-Best/nbest_res.md)
 - [Results for Jasmin-CGN corpus](./UT/Jasmin/jasmin.md)
+- [Results for Common Voice](./UT/CommonVoice/cv.md)
 - [Hardware setup & model configurations](./UT/hardware.md)
 - [Why do the results differ between whisper-timestamped and faster-whisper?](./UT/analysis.md)
