Model Selection

Use this page when you need to decide whether a model should stay on CRF or move to a DeLFT-backed deep-learning implementation.

Default recommendation

Keep CRF as the default unless you have a task-specific reason to change it.

Why:

  • lower runtime complexity
  • better throughput on commodity CPU hardware
  • easier deployment and scaling

Where deep learning is most worth considering

The repository history and older documentation point to the strongest deep-learning gains for:

  • citation
  • affiliation-address
  • reference-segmenter
  • header
  • funding-acknowledgement

The practical value is not equal across these tasks: citation is usually the clearest candidate when accuracy matters most.

Where CRF still makes sense

CRF remains a strong default for:

  • the overall default system configuration
  • tasks where the measured gain from deep learning is small
  • deployments where runtime predictability matters more than squeezing out a few extra F1 points
  • fulltext, which is still not a good fit for the same deep-learning approach because the input sequences are too large

Local configuration pattern

Model selection happens in grobid.yaml.

Example for moving citation to DeLFT:

models:
  - name: "citation"
    engine: "delft"
    architecture: "BidLSTM_CRF_FEATURES"

Typical recommendation:

  • use BidLSTM_CRF_FEATURES when layout features matter
  • use simpler architectures only when you have a specific reason

Transformer caution

Transformer-based options such as BERT_CRF can be configured, but they are not the practical default.

Why:

  • larger model size
  • extra training burden
  • ready-to-use transformer weights are not shipped for every model
  • real deployment wins are not universal
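
If you do experiment with a transformer anyway, the configuration follows the same pattern as the citation example above. This is a sketch only; verify the exact architecture name against the DeLFT architectures available in your installation:

models:
  - name: "citation"
    engine: "delft"
    architecture: "BERT_CRF"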

Best practical strategy

Use mixed mode.

That means:

  • keep most models on CRF
  • switch only the models that clearly improve your workload
  • validate runtime and memory impact on real documents, not synthetic expectations
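
The validation step above can be sketched as a small timing harness. Note that profile_batch and percentile are illustrative helpers, not part of GROBID; swap in your real processing call (for example, a request to your GROBID instance) for the process argument and run it once per candidate configuration on the same corpus.

```python
import time
import tracemalloc


def percentile(samples, pct):
    """Nearest-rank percentile of a list of numbers."""
    ordered = sorted(samples)
    rank = max(0, min(len(ordered) - 1, round(pct / 100 * len(ordered)) - 1))
    return ordered[rank]


def profile_batch(process, documents):
    """Run `process` over each document, recording wall time and peak memory."""
    latencies = []
    tracemalloc.start()
    for doc in documents:
        start = time.perf_counter()
        process(doc)
        latencies.append(time.perf_counter() - start)
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {
        "p50_s": percentile(latencies, 50),
        "p95_s": percentile(latencies, 95),
        "peak_mb": peak_bytes / 1e6,
    }
```

Comparing the p95 latency and peak memory of the CRF and DeLFT configurations on real documents gives you the runtime evidence this section asks for, rather than relying on synthetic expectations.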