Corpora

Currently available corpora (status as of April 2026).

Title | Full Name | Corpus Language(s) | Corpus Size | Lead Researcher
AURIS | Augsburg Corpus for Reference and Information Structure | German, various | - | Prof. Dr. Christian Chiarcos
CronIT | Cronache linguistiche italiane dal 1947 al 2017 | Italian | 939 documents - 1.5M tokens | PD Dr. habil. Franz Meier, Prof. Dr. em. Sabine Schwarze
MMM | Materia Médica Misionera (currently only available on the intranet) | Spanish, Guaraní | 126 documents - 99K tokens | Prof. Dr. Joachim Steffen
NMK | Nordmärkisch-Mittelmärkisches Korpus | Low German (Low Saxon) | 10 books - 300K tokens | Prof. Dr. Christian Chiarcos