Dutch Open Speech Recognition Benchmark

Results of Dutch ASR models, collected by the community

Jasmin-CGN

Corpus description

Jasmin-CGN is a Dutch/Flemish corpus that contains speech from less-represented groups, such as the elderly, children, or non-natives.

The corpus is split according to three criteria:

Dutch/Flemish
Read/HMI speech
Speaker groups based on age and native/non-native

By speech type, this gives two components:

comp_p: Speech recordings where a human interacts with a machine (HMI) in a Wizard of Oz setup
comp_q: Speech recordings of people reading a text

The speaker groups are:

Native children (ages 7-11)
Native teenagers (ages 12-16)
Non-native children (ages 7-16)
Non-native adults (ages 18-60)
Native elderly (ages 65+)
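As a rough illustration of how the three criteria combine, the following Python sketch encodes the speaker-group age ranges listed above and enumerates the region × component × group subsets. The function names (`speaker_group`, `subsets`) and the labels are hypothetical and not part of the corpus or its tooling. The sketch also assumes integer ages, inclusive range boundaries, and that every combination is actually populated in the corpus, none of which it verifies.

```python
from itertools import product
from typing import List, Optional, Tuple

# Split criteria as listed above. Labels are illustrative, not official corpus codes.
REGIONS = ("Dutch", "Flemish")
COMPONENTS = {"comp_p": "HMI", "comp_q": "read"}
GROUPS = (
    "native children",
    "native teenagers",
    "non-native children",
    "non-native adults",
    "native elderly",
)


def speaker_group(age: int, native: bool) -> Optional[str]:
    """Map a speaker's age and native status to one of the five groups.

    Returns None for ages outside every listed range
    (e.g. a native speaker aged 30, or a non-native speaker aged 17).
    """
    if native:
        if 7 <= age <= 11:
            return "native children"
        if 12 <= age <= 16:
            return "native teenagers"
        if age >= 65:
            return "native elderly"
    else:
        if 7 <= age <= 16:
            return "non-native children"
        if 18 <= age <= 60:
            return "non-native adults"
    return None


def subsets() -> List[Tuple[str, str, str]]:
    """Enumerate every (region, component, group) combination.

    Assumption: every combination exists in the corpus (not checked here).
    Iterating over COMPONENTS yields its keys, i.e. the component names.
    """
    return list(product(REGIONS, COMPONENTS, GROUPS))


if __name__ == "__main__":
    print(speaker_group(9, native=True))
    print(speaker_group(30, native=False))
    print(len(subsets()))  # 2 regions x 2 components x 5 groups
```

Listing the subsets explicitly like this can make it easier to see which slices a per-group result table would need to cover.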

For more details about the corpus, see its documentation. The corpus, including its documentation, can be downloaded from here. The paper describing the corpus can be found here.