View on GitHub

Dutch Open Speech Recognition Benchmark

Results of Dutch ASR models, collected by the community

N-Best 2008 Dutch

The N-Best 2008 Dutch Evaluation corpus is a corpus designed to evaluate Dutch/Flemish Speech Recognition systems in 2008. The corpus consists of 4 subsets:

bn_nl: Broadcast News programmes in the Netherlands;
cts_nl: Conversational Telephone Speech in the Netherlands;
bn_vl: Broadcast News programmes in Belgium;
cts_vl: Conversational Telephone Speech in Belgium.

For more details about the corpus, click here.

Here is a matrix with WER results of the baseline model, Kaldi_NL, as well as different end-to-end models tested on this corpus:

Model\Dataset	Broadcast News in NL	Conversational Speech in NL	Broadcast News in BE	Conversational Speech in BE
Kaldi_NL	12.6%	38.6%	21.2%	59.4%
Whisper v2	12.7%	25.9%	-	-
Whisper v3	13.7%	28.1%	-	-
Whisper v2 w/ VAD	11.6%	25.3%	-	-
Whisper v3 w/ VAD	14.1%	26.5%	-	-
faster-whisper v2	10.6%	24.1%	13.0%	38.5%
faster-whisper v3	12.5%	25.5%	14.9%	38.4%
faster-whisper v2 w/ VAD	10.0%	23.9%	13.6%	37.9%
faster-whisper v3 w/ VAD	12.3%	25.1%	14.6%	36.9%
XLS-R FT on Dutch	14.8%	33.5%	17.0%	51.7%
MMS - 102 languages	23.4%	49.0%	25.0%	64.2%
MMS - 1162 languages	18.5%	42.7%	19.4%	57.7%

And here are results for the same models on Broadcast News in the Netherlands with the foreign speech lines removed from the dataset:

Model\Dataset	WER
Kaldi_NL	12.1%
Whisper v2	12.3%
Whisper v3	13.6%
Whisper v2 w/ VAD	11.1%
Whisper v3 w/ VAD	13.9%

Something to note about this setup of the dataset is that Whisper v3 was already able to recognize foreign speech and ignore it. The only notable exception was for a German speaker, where I assume that it was still transcribed because German and Dutch are very similar as languages. So, Whisper v3’s hypothesis files were mostly unedited.

Here is also a matrix with the time spent in total by each model to evaluate the respective subset:

Model\Dataset	Broadcast News in NL	Conversational Speech in NL	Broadcast News in BE	Conversational Speech in BE
Kaldi_NL	0h:08m:58s	0h:14m:47s	0h:15m:57s	0h:20m:07s
Whisper v2	1h:11m:59s	0h:53m:55s	-	-
Whisper v3	1h:09m:00s	0h:40m:20s	-	-
Whisper v2 w/ VAD	0h:52m:03s	0h:40m:09s	-	-
Whisper v3 w/ VAD	1h:02m:13s	0h:37m:50s	-	-
faster-whisper v2	0h:11m:31s	0h:09m:30s	0h:10m:55s	0h:09m:39s
faster-whisper v3	0h:11m:21s	0h:09m:41s	0h:10m:36s	0h:09m:54s
faster-whisper v2 w/ VAD	0h:12m:13s	0h:09m:36s	0h:11m:01s	0h:09m:46s
faster-whisper v3 w/ VAD	0h:12m:25s	0h:09m:13s	0h:10m:45s	0h:09m:04s
XLS-R FT on Dutch	0h:07m:36s	0h:07m:52s	0h:08m:44s	0h:08m:21s
MMS - 102 languages	0h:04m:33s	0h:04m:23s	0h:04m:38s	0h:03m:54s
MMS - 1162 languages	0h:05m:26s	0h:04m:14s	0h:04m:37s	0h:03m:55s

Preprocessing, setup, and postprocessing

For more details, click here.