Training Overview
Use this page when you need to retrain or evaluate GROBID models rather than just run the service.
When training makes sense
Train custom models if you need to:
- improve behavior on a specific document genre
- adapt to a new layout style or language
- correct recurring extraction failures with targeted examples
- experiment with CRF vs deep-learning model choices
If the default models already work for your workload, do not start here.
What can be trained
GROBID uses multiple task-specific models rather than one giant model.
Examples include:
affiliation-address, date, citation, header, name-citation, name-header, patent, segmentation, reference-segmenter, fulltext, figure, table, funding-acknowledgement
Model files live under grobid-home/models.
Where training data lives
Training data is organized under grobid-trainer/resources/dataset/<MODEL>/.
Typical layout:
- corpus/ for training data
- evaluation/ for held-out evaluation data when you manage the split manually
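As a concrete sketch, the documented layout for one model could be created and inspected like this (the model name `header` is only an example; any model follows the same `grobid-trainer/resources/dataset/<MODEL>/` convention):

```shell
# Sketch: materialize the documented dataset layout for one model.
# "header" is an example model name, not a requirement.
MODEL=header
BASE="grobid-trainer/resources/dataset/$MODEL"
mkdir -p "$BASE/corpus" "$BASE/evaluation"
ls "$BASE"   # lists: corpus, evaluation
```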
The practical philosophy
The project historically favors:
- smaller but high-quality manually corrected data
- iterative improvement based on real failures
- holdout or end-to-end evaluation instead of trusting training-set validation alone
That makes training slower to bootstrap, but usually more trustworthy.
Main training paths
The usual paths are:
- simple train-and-evaluate Gradle tasks such as ./gradlew train_header
- more flexible trainer-jar commands for train-only, eval-only, split-and-eval, or n-fold runs
- automatic pre-annotation through createTraining, followed by manual correction
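Hedged command templates for each path, assuming a standard local build from the repository root. Exact jar file names, version suffixes, and mode numbers depend on your checkout, so verify them locally before running:

```
# Path 1: Gradle train-and-evaluate task
./gradlew train_header

# Path 2: trainer jar; the leading number selects the mode
# (commonly 0 = train, 1 = evaluate, 2 = split-train-eval, 3 = n-fold)
java -jar grobid-trainer/build/libs/grobid-trainer-*onejar.jar 0 header -gH grobid-home

# Path 3: batch pre-annotation with createTraining, then manual correction
java -jar grobid-core/build/libs/grobid-core-*onejar.jar \
     -gH grobid-home -dIn ./pdfs -dOut ./training-out -exe createTraining
```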
Before you start
Make sure you have:
- a working local build
- the right dataset directory for the target model
- a clear evaluation plan before you overwrite any model outputs
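Part of that checklist can be automated. This is a minimal sketch under the dataset layout documented above; check_dataset is a hypothetical helper, not part of GROBID:

```shell
# Sketch of a pre-flight check: confirm the dataset directory for the
# target model exists before kicking off a training run.
# check_dataset is a hypothetical helper, not a GROBID command.
check_dataset() {
  base="grobid-trainer/resources/dataset/$1"
  if [ -d "$base/corpus" ]; then
    echo "ok: $base/corpus"
  else
    echo "missing: $base/corpus" >&2
    return 1
  fi
}
```

Run check_dataset header (or any other model name) from the repository root before overwriting trained model files.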