GROBID Tutorial

Experimental tutorial — defer to the official documentation

This site is an experimental, community-maintained tutorial: a hands-on, beginner-friendly on-ramp that gets you from zero to your first successful PDF extraction as quickly as possible. It is not the official GROBID documentation. It may be incomplete or out of date, and it may differ from the canonical reference in places.

For authoritative information — every endpoint, every configuration parameter, every supported flag, every training option — always refer to the official GROBID documentation at grobid.readthedocs.io. When this tutorial and the official docs disagree, the official docs are correct.

Not sure which to read? If you've never run GROBID before, this tutorial is a good place to start — it's optimized for getting you unstuck quickly. If you're already running GROBID and looking up a specific flag, schema element, or behavior, go straight to grobid.readthedocs.io.

GROBID extracts structured data from scholarly PDFs (titles, authors, affiliations, references, citations, section structure, and full text) and returns the result as TEI XML.

Common capabilities include:

header extraction for titles, abstracts, authors, affiliations, and keywords
reference extraction and parsing from PDFs or raw citation strings
fulltext structuring into sections, paragraphs, figures, tables, notes, and citations
PDF coordinates for mapping extracted structures back onto the source document
metadata enrichment through CrossRef or biblio-glutton when consolidation is enabled
specialized processing flavors for non-standard document types

If you want to get productive quickly, start with Docker. The Docker Builder generates a safe docker run command, explains the important flags, and helps you avoid the most common setup mistakes reported in GitHub issues.

For most users, the shortest successful path is:

open the Docker Builder
start the service with the CRF image
verify localhost:8070
make your first REST API request
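The last two steps can be sketched in a few lines of Python using only the standard library. This is a minimal sketch, not the official client: it assumes the service listens on localhost:8070 as in the steps above, that the health check is exposed at /api/isalive, and that /api/processHeaderDocument accepts a multipart upload in a form field named input. The helper names (endpoint, is_alive, build_multipart, process_header) are hypothetical, and the official documentation remains the authority on endpoints and parameters.

```python
import urllib.request
import uuid

# Assumption: default host and port from the steps above.
GROBID_URL = "http://localhost:8070"


def endpoint(base_url: str, name: str) -> str:
    """Build the URL of a service endpoint under /api/."""
    return f"{base_url.rstrip('/')}/api/{name}"


def is_alive(base_url: str = GROBID_URL, timeout: float = 5.0) -> bool:
    """Return True if the health check answers 'true', False on any network error."""
    try:
        with urllib.request.urlopen(endpoint(base_url, "isalive"), timeout=timeout) as resp:
            return resp.status == 200 and resp.read().decode().strip() == "true"
    except OSError:  # URLError and HTTPError are subclasses of OSError
        return False


def build_multipart(field: str, filename: str, payload: bytes, boundary: str = ""):
    """Encode one file as multipart/form-data; returns (body, content_type)."""
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def process_header(pdf_path: str, base_url: str = GROBID_URL, timeout: float = 60.0) -> str:
    """POST a PDF to the header endpoint and return the response body as text."""
    with open(pdf_path, "rb") as fh:
        body, content_type = build_multipart("input", "input.pdf", fh.read())
    request = urllib.request.Request(
        endpoint(base_url, "processHeaderDocument"),
        data=body,
        headers={"Content-Type": content_type, "Accept": "application/xml"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Usage (requires a running service and a local file; paper.pdf is a placeholder):
#   if is_alive():
#       print(process_header("paper.pdf")[:500])
```

Checking is_alive before sending a document separates "the service did not start" from "the request was wrong", which are the two failure modes the troubleshooting pages treat separately.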

Start here

Recommended: Docker Builder

Best for most users on Windows, macOS, and Linux
Guides you through image choice, paths, ports, consolidation, and shell-specific command syntax
Prevents common mistakes like invalid config mounting or unsafe grobid-home overrides

Open the Docker Builder
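To make the idea of a generated command concrete, here is a rough Python sketch of the kind of assembly such a builder might perform. It is not the Docker Builder's actual logic. The function name docker_run_args is hypothetical, the image tag is supplied by the caller, and the in-container config path /opt/grobid/grobid-home/config/grobid.yaml is an assumption to verify against the official documentation. The sketch mounts a single config file read-only instead of overriding the whole grobid-home directory, and rejects relative host paths, which Docker would otherwise interpret as a named volume.

```python
import shlex
from pathlib import PurePosixPath, PureWindowsPath
from typing import List, Optional

# Assumption: location of the config file inside the container.
CONTAINER_CONFIG = "/opt/grobid/grobid-home/config/grobid.yaml"


def docker_run_args(
    image: str,
    host_port: int = 8070,
    config_path: Optional[str] = None,
) -> List[str]:
    """Assemble a docker run argument list for a GROBID image."""
    if not 1 <= host_port <= 65535:
        raise ValueError(f"invalid host port: {host_port}")

    # The container side of the port mapping stays at 8070.
    args = ["docker", "run", "--rm", "--init", "--ulimit", "core=0",
            "-p", f"{host_port}:8070"]

    if config_path is not None:
        absolute = (PurePosixPath(config_path).is_absolute()
                    or PureWindowsPath(config_path).is_absolute())
        if not absolute:
            raise ValueError("config_path must be an absolute host path")
        # Mount only the config file, read-only.
        args += ["-v", f"{config_path}:{CONTAINER_CONFIG}:ro"]

    args.append(image)
    return args


# Usage (the image name and tag are illustrative; take the real ones from the official docs):
#   print(shlex.join(docker_run_args("lfoppiano/grobid:0.8.0")))
```

Building the command as a list and quoting it at the end with shlex.join sidesteps most POSIX shell quoting problems. PowerShell and cmd.exe have different quoting rules, so there the list is better passed directly to subprocess.run.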

Quick path

If you already know you want Docker, go directly here:

Quick Start with Docker

Choose your path

I want GROBID running as fast as possible

Use Quick Start (Docker)
Then continue with REST API Usage
If startup fails, go directly to Docker Troubleshooting

I need help choosing Docker options

Use the Docker Builder
If startup or mounts fail, check Docker Troubleshooting
If requests fail after startup, check Troubleshooting

I need to understand the API or outputs

Start with REST API Usage
For the full endpoint catalog, see the GROBID Service reference
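As a first look at the output side, the sketch below reads a title and author surnames out of a TEI XML string with Python's standard library. The SAMPLE document is hand-written for illustration and is not real GROBID output; the element paths (titleStmt/title, and author/surname under sourceDesc) are assumptions based on common TEI header layout, and the function names are hypothetical. Check the TEI reference in the official documentation before relying on specific paths.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# TEI documents use this XML namespace; queries must be namespace-qualified.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hand-written illustration, NOT actual service output.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title level="a" type="main">A Sample Title</title></titleStmt>
      <sourceDesc><biblStruct><analytic>
        <author><persName><forename>Ada</forename><surname>Lovelace</surname></persName></author>
      </analytic></biblStruct></sourceDesc>
    </fileDesc>
  </teiHeader>
</TEI>"""


def extract_title(tei_xml: str) -> Optional[str]:
    """Return the first title found under titleStmt, or None if absent or empty."""
    root = ET.fromstring(tei_xml)
    node = root.find(".//tei:titleStmt/tei:title", TEI_NS)
    if node is None:
        return None
    text = "".join(node.itertext()).strip()
    return text or None


def extract_author_surnames(tei_xml: str) -> List[str]:
    """Return surnames of authors listed in the header's sourceDesc, in document order."""
    root = ET.fromstring(tei_xml)
    nodes = root.findall(".//tei:sourceDesc//tei:author//tei:surname", TEI_NS)
    return [n.text.strip() for n in nodes if n.text and n.text.strip()]


# Usage:
#   extract_title(SAMPLE)
#   extract_author_surnames(SAMPLE)
```

Restricting the author query to sourceDesc keeps authors of cited references, which appear elsewhere in a full-text result, out of the list.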

What users most often get stuck on

Triage of the project's GitHub issues shows a clear pattern:

Docker setup and shell-specific command syntax
Configuration files and consolidation setup
Error diagnosis and recovery
API request details and failure modes

The docs are therefore optimized around three early moves:

get the service running safely
recover quickly when startup or requests fail
make the first correct API request without reading a giant reference page

This documentation is organized to get you past those blockers early.

Documentation map

Tutorials

Learn by doing with short, guided outcomes
Start with Quick Start (Docker)

How-to guides

Solve a task you already know you need
Start with Docker Setup, Troubleshooting, or Configuration

Reference

For detailed reference documentation (API endpoints, configuration parameters, TEI encoding, training guidelines), see grobid.readthedocs.io.

About this tutorial

This tutorial site provides practical, task-oriented guides designed to get newcomers productive fast. For the full reference documentation — API endpoints, configuration parameters, TEI encoding, and training annotation rules — see grobid.readthedocs.io.
