Updating results with accurate time values
greenw0lf committed Jul 10, 2024
1 parent 5db0a9a commit b00d22c
Showing 2 changed files with 4 additions and 30 deletions.
17 changes: 2 additions & 15 deletions NISV/res_labelled.md
@@ -44,26 +44,13 @@ And a matrix with the **time** spent in total by each implementation **to load a
 |---|---|---|---|---|
 |[OpenAI](https://github.com/openai/whisper)|36m:06s|32m:41s|42m:08s|30m:25s|
 |[Huggingface (`transformers`)](https://huggingface.co/openai/whisper-large-v2#long-form-transcription)|21m:48s|19m:13s|23m:22s|22m:02s|
-|**[faster-whisper](https://github.com/SYSTRAN/faster-whisper/)**|**1m:51s**|**2m:08s**|**1m:50s**|**2m:12s**|
-|[WhisperX](https://github.com/m-bain/whisperX/)**\***|11m:17s|15m:54s|11m:29s|15m:05s|
+|[faster-whisper](https://github.com/SYSTRAN/faster-whisper/)|11m:40s|22m:27s|**11m:18s**|21m:56s|
+|**[WhisperX](https://github.com/m-bain/whisperX/)\***|**11m:17s**|**15m:54s**|11m:29s|**15m:05s**|
 
 \* For WhisperX, a separate alignment model based on wav2vec 2.0 has been applied in order to obtain word-level timestamps. Therefore, the time measured contains the time to load the model, time to transcribe, and time to align to generate timestamps. Speaker diarization has also been applied for WhisperX, which is measured separately and covered in [this section](./whisperx.md).
 
 <br>
-
-As well as the **time** spent in total by **faster-whisper** and **WhisperX** to **load, transcribe + save the output to files\***:
-
-|Load+transcribe+save output|large-v2 with `float16`|large-v2 with `float32`|large-v3 with `float16`|large-v3 with `float32`|
-|---|---|---|---|---|
-|[faster-whisper](https://github.com/SYSTRAN/faster-whisper/)|**11m:40s**|22m:27s|**11m:18s**|21m:56s|
-|[WhisperX](https://github.com/m-bain/whisperX/)\**|15m:45s|**20m:26s**|16m:01s|**19m:36s**|
-
-\* It has been noticed after benchmarking that, for these 2 implementations, saving the output takes unusually long.
-
-\** For WhisperX, this includes the entire pipeline (loading -> transcription -> alignment -> speaker diarization -> saving to file).
-
-<br>
 
 Finally, a matrix with the **maximum GPU memory consumption + maximum GPU power usage** of each implementation (**on average**):
 
 |Max. memory / Max. power|large-v2 with `float16`|large-v2 with `float32`|large-v3 with `float16`|large-v3 with `float32`|
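For context on the timings above: faster-whisper's `transcribe()` returns a lazy generator, so most of the decoding work only happens once the segments are iterated, for example while writing them to a file. The sketch below shows how such a load / transcribe-and-save split could be timed; it is a minimal illustration, not the benchmark script used in this repository, and the file names are placeholders.

```python
import time

from faster_whisper import WhisperModel

start = time.perf_counter()
# Load the model; compute_type matches the float16/float32 columns above.
model = WhisperModel("large-v2", device="cuda", compute_type="float16")
loaded = time.perf_counter()

# transcribe() returns a generator: little work happens until it is consumed.
segments, info = model.transcribe("audio.wav")

# Iterating the segments while saving is where most of the decoding time is
# actually spent, which is why a "load + transcribe" number and a
# "load + transcribe + save" number can differ so much.
with open("audio.txt", "w") as f:
    for segment in segments:
        f.write(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}\n")
finished = time.perf_counter()

print(f"load: {loaded - start:.1f}s, transcribe+save: {finished - loaded:.1f}s")
```
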
17 changes: 2 additions & 15 deletions NISV/res_unlabelled.md
@@ -23,23 +23,10 @@ Here's a matrix with the **time** spent in total by each implementation **to loa
 |---|---|---|---|---|
 |[OpenAI](https://github.com/openai/whisper)|1h:43m:47s|1h:20m:29s|1h:57m:06s|1h:28m:50s|
 |[Huggingface (`transformers`)](https://huggingface.co/openai/whisper-large-v2#long-form-transcription)|43m:05s|1h:05m:17s|41m:39s|1h:01m:45s|
-|**[faster-whisper](https://github.com/SYSTRAN/faster-whisper/)**|**4m:14s**|**4m:26s**|**3m:36s**|**5m:07s**|
-|[WhisperX](https://github.com/m-bain/whisperX/)*|26m:57s|31m:57s|27m:00s|31m:43s|
-
-\* For WhisperX, a separate alignment model based on wav2vec 2.0 has been applied in order to obtain word-level timestamps. Therefore, the time measured contains the time to load the model, time to transcribe, and time to align to generate timestamps. Speaker diarization has also been applied for WhisperX, which is measured separately and covered in a different section.
-
-<br>
-
-As well as the **time** spent in total by **faster-whisper** and **WhisperX** to **load, transcribe + save the output to files\***:
-
-|Load+transcribe+save output|large-v2 with `float16`|large-v2 with `float32`|large-v3 with `float16`|large-v3 with `float32`|
-|---|---|---|---|---|
 |[faster-whisper](https://github.com/SYSTRAN/faster-whisper/)|39m:59s|1h:18m:34s|40m:07s|1h:23m:07s|
-|**[WhisperX](https://github.com/m-bain/whisperX/)\*\***|**39m:25s**|**44m:01s**|**39m:21s**|**43m:52s**|
+|**[WhisperX](https://github.com/m-bain/whisperX/)\***|**26m:57s**|**31m:57s**|**27m:00s**|**31m:43s**|
 
-\* It has been noticed after benchmarking that, for these 2 implementations, saving the output takes unusually long.
-
-\** For WhisperX, this includes the entire pipeline (loading -> transcription -> alignment -> speaker diarization -> saving to file).
+\* For WhisperX, a separate alignment model based on wav2vec 2.0 has been applied in order to obtain word-level timestamps. Therefore, the time measured contains the time to load the model, time to transcribe, and time to align to generate timestamps. Speaker diarization has also been applied for WhisperX, which is measured separately and covered in a different section.
 
 <br>
 
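The WhisperX footnote describes a multi-stage pipeline (load -> transcribe -> align with wav2vec 2.0 -> diarize -> save). A rough sketch of that pipeline, loosely following the usage documented in the WhisperX README, is shown below; argument names can differ between WhisperX versions, the Hugging Face token and file paths are placeholders, and this is not the repository's actual benchmark code.

```python
import whisperx

device = "cuda"
audio = whisperx.load_audio("audio.wav")

# 1. Load the Whisper model and transcribe (batched faster-whisper backend).
model = whisperx.load_model("large-v2", device, compute_type="float16")
result = model.transcribe(audio, batch_size=16)

# 2. Align with a wav2vec 2.0 model to obtain word-level timestamps.
align_model, metadata = whisperx.load_align_model(
    language_code=result["language"], device=device
)
result = whisperx.align(result["segments"], align_model, metadata, audio, device)

# 3. Speaker diarization (requires a Hugging Face token for the pyannote models).
diarize_model = whisperx.DiarizationPipeline(use_auth_token="HF_TOKEN", device=device)
diarize_segments = diarize_model(audio)
result = whisperx.assign_word_speakers(diarize_segments, result)

# 4. Save the output. Wrapping each stage above in a timer is one way to
#    reproduce the per-stage split behind the WhisperX totals in these tables.
with open("audio_diarized.txt", "w") as f:
    for segment in result["segments"]:
        speaker = segment.get("speaker", "UNKNOWN")
        f.write(f"[{segment['start']:.2f}] {speaker}: {segment['text']}\n")
```
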
