Updating results with accurate time values
greenw0lf committed Jul 10, 2024
1 parent 5db0a9a commit b00d22c
Showing 2 changed files with 4 additions and 30 deletions.
17 changes: 2 additions & 15 deletions NISV/res_labelled.md
@@ -44,26 +44,13 @@ And a matrix with the **time** spent in total by each implementation **to load a
 |---|---|---|---|---|
 |[OpenAI](https://github.com/openai/whisper)|36m:06s|32m:41s|42m:08s|30m:25s|
 |[Huggingface (`transformers`)](https://huggingface.co/openai/whisper-large-v2#long-form-transcription)|21m:48s|19m:13s|23m:22s|22m:02s|
-|**[faster-whisper](https://github.com/SYSTRAN/faster-whisper/)**|**1m:51s**|**2m:08s**|**1m:50s**|**2m:12s**|
-|[WhisperX](https://github.com/m-bain/whisperX/)**\***|11m:17s|15m:54s|11m:29s|15m:05s|
+|[faster-whisper](https://github.com/SYSTRAN/faster-whisper/)|11m:40s|22m:27s|**11m:18s**|21m:56s|
+|**[WhisperX](https://github.com/m-bain/whisperX/)\***|**11m:17s**|**15m:54s**|11m:29s|**15m:05s**|
 
 \* For WhisperX, a separate alignment model based on wav2vec 2.0 has been applied in order to obtain word-level timestamps. Therefore, the time measured contains the time to load the model, time to transcribe, and time to align to generate timestamps. Speaker diarization has also been applied for WhisperX, which is measured separately and covered in [this section](./whisperx.md).
 
 <br>
-
-As well as the **time** spent in total by **faster-whisper** and **WhisperX** to **load, transcribe + save the output to files\***:
-
-|Load+transcribe+save output|large-v2 with `float16`|large-v2 with `float32`|large-v3 with `float16`|large-v3 with `float32`|
-|---|---|---|---|---|
-|[faster-whisper](https://github.com/SYSTRAN/faster-whisper/)|**11m:40s**|22m:27s|**11m:18s**|21m:56s|
-|[WhisperX](https://github.com/m-bain/whisperX/)\**|15m:45s|**20m:26s**|16m:01s|**19m:36s**|
-
-\* It has been noticed after benchmarking that, for these 2 implementations, saving the output takes unusually long.
-
-\** For WhisperX, this includes the entire pipeline (loading -> transcription -> alignment -> speaker diarization -> saving to file).
-
-<br>
 
 Finally, a matrix with the **maximum GPU memory consumption + maximum GPU power usage** of each implementation (**on average**):
 
 |Max. memory / Max. power|large-v2 with `float16`|large-v2 with `float32`|large-v3 with `float16`|large-v3 with `float32`|
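For context on the timings above: faster-whisper's `transcribe()` returns a lazy generator, so most of the decoding work only happens once the segments are iterated, for example while writing them to a file. The sketch below shows how such a load / transcribe-and-save split could be timed; it is a minimal illustration, not the benchmark script used in this repository, and the file names are placeholders.

```python
import time

from faster_whisper import WhisperModel

start = time.perf_counter()
# Load the model; compute_type matches the float16/float32 columns above.
model = WhisperModel("large-v2", device="cuda", compute_type="float16")
loaded = time.perf_counter()

# transcribe() returns a generator: little work happens until it is consumed.
segments, info = model.transcribe("audio.wav")

# Iterating the segments while saving is where most of the decoding time is
# actually spent, which is why a "load + transcribe" number and a
# "load + transcribe + save" number can differ so much.
with open("audio.txt", "w") as f:
    for segment in segments:
        f.write(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}\n")
finished = time.perf_counter()

print(f"load: {loaded - start:.1f}s, transcribe+save: {finished - loaded:.1f}s")
```
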
17 changes: 2 additions & 15 deletions NISV/res_unlabelled.md
@@ -23,23 +23,10 @@ Here's a matrix with the **time** spent in total by each implementation **to loa
 |---|---|---|---|---|
 |[OpenAI](https://github.com/openai/whisper)|1h:43m:47s|1h:20m:29s|1h:57m:06s|1h:28m:50s|
 |[Huggingface (`transformers`)](https://huggingface.co/openai/whisper-large-v2#long-form-transcription)|43m:05s|1h:05m:17s|41m:39s|1h:01m:45s|
-|**[faster-whisper](https://github.com/SYSTRAN/faster-whisper/)**|**4m:14s**|**4m:26s**|**3m:36s**|**5m:07s**|
-|[WhisperX](https://github.com/m-bain/whisperX/)*|26m:57s|31m:57s|27m:00s|31m:43s|
-
-\* For WhisperX, a separate alignment model based on wav2vec 2.0 has been applied in order to obtain word-level timestamps. Therefore, the time measured contains the time to load the model, time to transcribe, and time to align to generate timestamps. Speaker diarization has also been applied for WhisperX, which is measured separately and covered in a different section.
-
-<br>
-
-As well as the **time** spent in total by **faster-whisper** and **WhisperX** to **load, transcribe + save the output to files\***:
-
-|Load+transcribe+save output|large-v2 with `float16`|large-v2 with `float32`|large-v3 with `float16`|large-v3 with `float32`|
-|---|---|---|---|---|
 |[faster-whisper](https://github.com/SYSTRAN/faster-whisper/)|39m:59s|1h:18m:34s|40m:07s|1h:23m:07s|
-|**[WhisperX](https://github.com/m-bain/whisperX/)\*\***|**39m:25s**|**44m:01s**|**39m:21s**|**43m:52s**|
+|**[WhisperX](https://github.com/m-bain/whisperX/)\***|**26m:57s**|**31m:57s**|**27m:00s**|**31m:43s**|
 
-\* It has been noticed after benchmarking that, for these 2 implementations, saving the output takes unusually long.
-
-\** For WhisperX, this includes the entire pipeline (loading -> transcription -> alignment -> speaker diarization -> saving to file).
+\* For WhisperX, a separate alignment model based on wav2vec 2.0 has been applied in order to obtain word-level timestamps. Therefore, the time measured contains the time to load the model, time to transcribe, and time to align to generate timestamps. Speaker diarization has also been applied for WhisperX, which is measured separately and covered in a different section.
 
 <br>
 
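The WhisperX footnote describes a multi-stage pipeline (load -> transcribe -> align with wav2vec 2.0 -> diarize -> save). A rough sketch of that pipeline, loosely following the usage documented in the WhisperX README, is shown below; argument names can differ between WhisperX versions, the Hugging Face token and file paths are placeholders, and this is not the repository's actual benchmark code.

```python
import whisperx

device = "cuda"
audio = whisperx.load_audio("audio.wav")

# 1. Load the Whisper model and transcribe (batched faster-whisper backend).
model = whisperx.load_model("large-v2", device, compute_type="float16")
result = model.transcribe(audio, batch_size=16)

# 2. Align with a wav2vec 2.0 model to obtain word-level timestamps.
align_model, metadata = whisperx.load_align_model(
    language_code=result["language"], device=device
)
result = whisperx.align(result["segments"], align_model, metadata, audio, device)

# 3. Speaker diarization (requires a Hugging Face token for the pyannote models).
diarize_model = whisperx.DiarizationPipeline(use_auth_token="HF_TOKEN", device=device)
diarize_segments = diarize_model(audio)
result = whisperx.assign_word_speakers(diarize_segments, result)

# 4. Save the output. Wrapping each stage above in a timer is one way to
#    reproduce the per-stage split behind the WhisperX totals in these tables.
with open("audio_diarized.txt", "w") as f:
    for segment in result["segments"]:
        speaker = segment.get("speaker", "UNKNOWN")
        f.write(f"[{segment['start']:.2f}] {speaker}: {segment['text']}\n")
```
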
