Performance Tuning
Use this page when GROBID is already working and your next problem is speed, throughput, or stability under load.
Do not start here. First make sure you have:
- a clean startup
- one successful request
- a small stable batch run
Only then start tuning.
Tune in this order
The safest tuning order is:
- choose the right image
- tune concurrency
- tune client parallelism
- revisit memory
- only then revisit deeper config and model choices
Most performance problems come from getting the early decisions wrong, not from missing exotic flags.
1. Choose the right image first
CRF image (latest-crf)
Best for:
- CPU-only systems
- high throughput
- lower memory usage
- simpler and more predictable operations
This is usually the best tuning baseline.
Full image (latest-full)
Best for:
- users who explicitly want the deep-learning-enhanced path
- systems with enough resources, ideally GPU-backed where applicable
Trade-offs:
- larger image and memory footprint
- more operational complexity
- slower and less predictable on CPU-only hardware
If you have not proven that you need the full image, do not tune around it first.
2. Tune server-side concurrency
The main server-side knob is:
grobid:
  concurrency: 10
  poolMaxWait: 1
concurrency controls how many processing workers can run in parallel.
General guidance:
- for CRF on CPU, start around available thread count or slightly above
- for full image on CPU only, start lower and be more conservative
- if the service becomes unstable, reduce it before doing anything else
poolMaxWait is normally not the first thing to tune.
3. Tune client-side parallelism
Server tuning alone is not enough.
If the client sends too many simultaneous requests, you will still overload the service.
Practical guidance:
- keep client concurrency near the server's actual capacity
- increase gradually rather than jumping to large values
- treat repeated 503 responses as a sign to reduce pressure, not to panic
If you are using an official client, prefer adjusting its concurrency settings there instead of writing your own uncontrolled request flood.
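If you do write your own client, the key property is a hard upper bound on in-flight requests. The following is a minimal sketch of that idea, not the behavior of any official client. `run_bounded` and `worker` are hypothetical names: `worker` stands for whatever function sends one PDF to the service and returns its result.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_bounded(items, worker, max_workers):
    """Apply worker to every item with at most max_workers calls in flight.

    Returns a dict mapping each item to ("ok", result) or ("error", exception).
    Items must be hashable (for example, file paths as strings).
    """
    results = {}
    # The pool size is the client-side concurrency limit.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, item): item for item in items}
        for future in as_completed(futures):
            item = futures[future]
            try:
                results[item] = ("ok", future.result())
            except Exception as exc:  # one bad PDF should not stop the batch
                results[item] = ("error", exc)
    return results
```

Set `max_workers` to the client concurrency you have chosen for the server, and raise it in small steps while watching for 503 responses.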
4. Understand 503 correctly
503 in GROBID usually means the service is saturated and protecting itself from collapse.
That means:
- it is often a tuning signal, not a crash signal
- reduce concurrency first
- add backoff and retry second
- only then consider deeper changes
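The backoff-and-retry step can be sketched as follows. This is an illustration under stated assumptions rather than a recommended implementation: `call_with_backoff` is a hypothetical helper, `send` is a hypothetical zero-argument callable that performs one request and returns a `(status, body)` pair, and the delay values are arbitrary starting points.

```python
import random
import time


def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call send() until it returns a non-503 status or attempts run out.

    send is a hypothetical callable returning (status, body).
    Returns the last (status, body) pair, which may still be a 503.
    """
    status, body = send()
    for attempt in range(1, max_attempts):
        if status != 503:
            break
        # Nominal delay doubles each retry (1s, 2s, 4s, ...), capped at max_delay.
        delay = min(max_delay, base_delay * 2 ** (attempt - 1))
        # Jitter spreads retries out so parallel workers do not return in lockstep.
        sleep(delay * random.uniform(0.5, 1.0))
        status, body = send()
    return status, body
```

Retrying only helps once client concurrency is already at a sensible level. If most requests need several retries, lower the concurrency rather than lengthening the backoff.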
For endpoint-specific retry guidance, see Batch Processing.
5. Memory tuning
If the service is unstable under load, memory is one of the next things to revisit.
Main levers:
- Docker/container memory allocation
- server-side concurrency
- client-side concurrency
- pdfalto memory/timeouts in config
If you are memory constrained
Try this order:
- lower client concurrency
- lower server concurrency
- use or switch back to latest-crf
- increase available memory if possible
- only then change parser safety limits
This order avoids turning a resource problem into a hard-to-debug configuration problem.
6. PDF parser safety limits
Relevant config:
pdf:
  pdfalto:
    memoryLimitMb: 6096
    timeoutSec: 60
  blocksMax: 100000
  tokensMax: 1000000
These are safety controls, not routine performance knobs.
Use them when:
- very large PDFs time out repeatedly
- memory pressure causes parser instability
- you need tighter circuit breakers for a constrained environment
Do not lower them aggressively without evidence.
7. Model preload
grobid:
  modelPreload: true
modelPreload: true is a good default for service stability and predictable warm behavior.
Use lazy loading only if you intentionally prefer different startup/memory trade-offs and accept slower first requests.
8. CRF vs full image tuning heuristics
CRF on CPU
This is the easiest path to tune well.
Good default strategy:
- keep modelPreload: true
- tune concurrency upward gradually
- keep client concurrency only slightly above the service capacity
Full image with GPU
If you have a real GPU-backed setup, full-image tuning can be more forgiving than CPU-only full-image use.
Still:
- do not assume GPU removes every bottleneck
- watch request latency and throughput together
- keep the rest of the setup simple while validating GPU benefits
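One way to watch latency and throughput together is to reduce each test run to a few comparable numbers. The sketch below assumes you record per-request durations yourself. `summarize_run` is a hypothetical helper, not part of GROBID or its clients.

```python
import math
import statistics


def summarize_run(latencies_s, wall_clock_s):
    """Summarize one batch run.

    latencies_s: per-request durations in seconds (successful requests only).
    wall_clock_s: total elapsed time of the whole batch in seconds.
    """
    if not latencies_s:
        raise ValueError("no successful requests to summarize")
    ordered = sorted(latencies_s)
    # Nearest-rank 95th percentile; coarse, but adequate for comparing runs.
    p95_rank = math.ceil(95 * len(ordered) / 100)
    return {
        "requests": len(ordered),
        "throughput_per_s": len(ordered) / wall_clock_s,
        "median_latency_s": statistics.median(ordered),
        "p95_latency_s": ordered[p95_rank - 1],
    }
```

Compare the summaries of two runs that differ in a single setting. A change that raises throughput while the 95th-percentile latency climbs sharply is usually a sign of queuing rather than a real gain.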
Full image on CPU only
Be conservative.
Why:
- deep-learning inference can push CPU much harder
- memory pressure rises
- throughput becomes less regular
A safer pattern is to lower both server concurrency and client concurrency compared with the CRF path.
9. Consolidation and throughput
Consolidation changes performance behavior materially.
If you turn it on:
- request latency rises
- throughput drops
- external service behavior becomes part of your runtime
Practical rule:
- keep consolidation off until plain extraction is stable
- if you need scaling with enrichment, biblio-glutton is usually a better operational fit than CrossRef alone
See Consolidation for the trade-offs.
10. Practical baseline recommendations
Small CPU-only deployment
- image: latest-crf
- no consolidation initially
- moderate concurrency
- admin port enabled for diagnostics
As a practical starting point, keep server concurrency near your available thread count or only slightly above it, then keep client concurrency only slightly above that.
Higher-throughput CPU deployment
- image: latest-crf
- increase concurrency gradually
- use official clients with bounded parallelism
- tune memory only after concurrency is understood
On a machine with 16 available threads, a reasonable first pass is server concurrency around 16 to 20, with client concurrency around 20 to 24, then adjust based on observed 503 rates and throughput stability.
Accuracy-first deployment with stronger hardware
- full image only if you know why you need it
- validate resource headroom first
- keep batch pressure conservative at the start
If you run the full image on CPU only, start much lower than the CRF path. A safer first pass is to cut server concurrency roughly in half relative to available threads, then keep client concurrency near that reduced level.
If you also enable citation consolidation at scale, expect throughput to drop materially. CrossRef-heavy citation enrichment can become the limiting factor before GROBID itself does.
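The starting points in this section can be written down as a small calculation. The sketch below only encodes this page's rules of thumb. The function name, the profile labels "crf" and "full-cpu", and the exact ratios are assumptions chosen to land inside the ranges given above, not GROBID defaults.

```python
def starting_concurrency(threads, image="crf"):
    """Return (server_concurrency, client_concurrency) first-pass values.

    threads: number of CPU threads available to the service.
    image: "crf" or "full-cpu" (hypothetical labels for this sketch).
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    if image == "crf":
        server = threads                        # near the available thread count
        client = server + max(1, threads // 4)  # slightly above the server
    elif image == "full-cpu":
        server = max(1, threads // 2)           # roughly half the threads
        client = server                         # near the reduced level
    else:
        raise ValueError(f"unknown image profile: {image!r}")
    return server, client
```

For 16 threads this gives `(16, 20)` for the CRF profile and `(8, 8)` for the full image on CPU. If you derive `threads` with `os.cpu_count()` inside a container, check the result: it may report the host's CPU count rather than the container's limit. Treat the output as a first pass and adjust it from observed 503 rates and throughput stability.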
11. Signs you are tuning the wrong thing
You may be tuning the wrong layer if:
- a single known-good PDF still fails
- startup is unstable before load increases
- config overrides are still changing frequently
- Docker mounts or shell syntax are still part of the problem
If so, go back to: