Dutch Open Speech Recognition Benchmark

Results of Dutch ASR models, collected by the community

Jasmin-CGN

Corpus description

Jasmin-CGN is a Dutch/Flemish corpus that contains speech from under-represented groups, such as the elderly, children, and non-native speakers.

The corpus is split mainly according to three criteria:

  1. Dutch/Flemish
  2. Read/HMI (human-machine interaction) speech
  3. Speaker groups based on age and native/non-native

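Thus, the first two criteria give four combinations (Dutch read, Dutch HMI, Flemish read, Flemish HMI), each of which is further divided into the following speaker groups: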

  1. Native children (ages 7-11)
  2. Native teenagers (ages 12-16)
  3. Non-native children (ages 7-16)
  4. Non-native adults (ages 18-60)
  5. Native elderly (ages 65+)
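
As a rough illustration of how the three criteria combine, the sketch below enumerates the nominal subsets as a cross product. The names `LANGUAGES`, `STYLES`, `GROUPS`, `Subset`, and `enumerate_subsets` are hypothetical labels chosen for this example, not identifiers used by the corpus or the benchmark, and the sketch makes no claim that every combination is populated in the released data.

```python
from itertools import product
from typing import NamedTuple

# Hypothetical labels for the three split criteria described above.
LANGUAGES = ("Dutch", "Flemish")
STYLES = ("read", "HMI")
GROUPS = (
    "native children (7-11)",
    "native teenagers (12-16)",
    "non-native children (7-16)",
    "non-native adults (18-60)",
    "native elderly (65+)",
)


class Subset(NamedTuple):
    """One nominal cell of the split: language x speech style x speaker group."""
    language: str
    style: str
    group: str


def enumerate_subsets() -> list[Subset]:
    # Cross product of the three criteria; some cells may be empty in practice.
    return [Subset(lang, style, group)
            for lang, style, group in product(LANGUAGES, STYLES, GROUPS)]


if __name__ == "__main__":
    for subset in enumerate_subsets():
        print(f"{subset.language} / {subset.style} / {subset.group}")
```

Keeping the criteria as separate tuples makes it straightforward to report results per language, per speech style, or per speaker group by filtering on a single field.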

For more details about the corpus, check the documentation. The corpus, including its documentation, can be downloaded from here. The paper released for the corpus can be found here.