Performance Tuning

Use this page when GROBID is already working and your next problem is speed, throughput, or stability under load.

Do not start here. First make sure you have:

  • a clean startup
  • one successful request
  • a small stable batch run

Only then start tuning.

Tune in this order

The safest tuning order is:

  1. choose the right image
  2. tune concurrency
  3. tune client parallelism
  4. revisit memory
  5. only then revisit deeper config and model choices

Most performance problems come from getting the early decisions wrong, not from missing exotic flags.

1. Choose the right image first

CRF image (latest-crf)

Best for:

  • CPU-only systems
  • high throughput
  • lower memory usage
  • simpler and more predictable operations

This is usually the best tuning baseline.

Full image (latest-full)

Best for:

  • users who explicitly want the deep-learning-enhanced path
  • systems with enough resources, ideally GPU-backed where applicable

Trade-offs:

  • larger image and memory footprint
  • more operational complexity
  • slower and less predictable on CPU-only hardware

If you have not proven that you need the full image, do not tune around it first.

2. Tune server-side concurrency

The main server-side knob is:

grobid:
  concurrency: 10
  poolMaxWait: 1

concurrency controls how many processing workers can run in parallel.

General guidance:

  • for CRF on CPU, start around available thread count or slightly above
  • for full image on CPU only, start lower and be more conservative
  • if the service becomes unstable, reduce it before doing anything else

poolMaxWait controls how long a request waits for a free worker before the server rejects it; it is normally not the first thing to tune.
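
The starting-point guidance above can be sketched in a few lines; the helper name and exact offsets are illustrative, not anything GROBID provides:

```python
import os

def starting_concurrency(crf_on_cpu: bool = True) -> int:
    """Illustrative starting point for the server's `concurrency` setting.

    CRF on CPU: around the available thread count or slightly above.
    Full image on CPU only: start lower and be more conservative.
    """
    threads = os.cpu_count() or 4    # fall back if the count is unknown
    if crf_on_cpu:
        return threads + 2           # thread count or slightly above
    return max(1, threads // 2)      # roughly half for the full image
```

Treat the result as a first pass to adjust against observed stability, not a fixed rule.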

3. Tune client-side parallelism

Server tuning alone is not enough.

If the client sends too many simultaneous requests, you will still overload the service.

Practical guidance:

  • keep client concurrency near the server's actual capacity
  • increase gradually rather than jumping to large values
  • treat repeated 503 responses as a sign to reduce pressure, not to panic

If you are using an official client, prefer adjusting its concurrency settings there instead of writing your own uncontrolled request flood.
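
The bounded-parallelism idea looks like this as a minimal Python sketch; the `worker` argument is a stand-in for whatever actually POSTs one PDF to GROBID (for real use, prefer the official grobid_client_python):

```python
import concurrent.futures

def run_bounded(items, worker, max_workers: int = 10):
    """Run `worker` over `items` with at most `max_workers` in flight.

    In a real setup, `worker` would send one PDF to GROBID, and
    `max_workers` would sit near the server's `concurrency` setting.
    """
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order and never exceeds max_workers
        return list(pool.map(worker, items))
```

The point is that the pool size, not the size of your input list, bounds the pressure on the server.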

4. Understand 503 correctly

503 in GROBID usually means the service is saturated and protecting itself from collapse.

That means:

  • it is often a tuning signal, not a crash signal
  • reduce concurrency first
  • add backoff and retry second
  • only then consider deeper changes

For endpoint-specific retry guidance, see Batch Processing.
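
A minimal retry sketch that treats 503 as a back-off signal; the `send` callable is a placeholder for one GROBID request, and the delays and attempt counts are illustrative:

```python
import time

def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0):
    """Exponential backoff schedule in seconds: base, 2*base, 4*base, ... capped."""
    return [min(cap, base * (2 ** i)) for i in range(attempts)]

def call_with_retry(send, max_attempts: int = 5):
    """Call `send()` -> (status, body); back off and retry on 503."""
    for delay in backoff_delays(max_attempts):
        status, body = send()
        if status != 503:
            return status, body
        time.sleep(delay)  # the service is saturated: reduce pressure, then retry
    raise RuntimeError("still saturated after %d attempts" % max_attempts)
```

Pair this with lower client concurrency; retries alone do not fix sustained overload.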

5. Memory tuning

If the service is unstable under load, memory is one of the next things to revisit.

Main levers:

  • Docker/container memory allocation
  • server-side concurrency
  • client-side concurrency
  • pdfalto memory/timeouts in config

If you are memory constrained

Try this order:

  1. lower client concurrency
  2. lower server concurrency
  3. use or switch back to latest-crf
  4. increase available memory if possible
  5. only then change parser safety limits

This order avoids turning a resource problem into a hard-to-debug configuration problem.
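
For the container-level lever, a compose-style sketch with an explicit memory ceiling; the image tag follows this page's naming, but the repository name and the 8g figure are assumptions to adapt:

```yaml
services:
  grobid:
    image: grobid/grobid:latest-crf   # repository name assumed
    mem_limit: 8g                     # hard memory ceiling for the container
    ports:
      - "8070:8070"
```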

6. PDF parser safety limits

Relevant config:

pdf:
  pdfalto:
    memoryLimitMb: 6096
    timeoutSec: 60
    blocksMax: 100000
    tokensMax: 1000000

These are safety controls, not routine performance knobs.

Use them when:

  • very large PDFs time out repeatedly
  • memory pressure causes parser instability
  • you need tighter circuit breakers for a constrained environment

Do not lower them aggressively without evidence.
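
If evidence does point at the parser, for example very large PDFs repeatedly hitting the timeout, a loosened variant might look like this; the numbers are illustrative, not recommendations:

```yaml
pdf:
  pdfalto:
    memoryLimitMb: 8192   # more headroom for very large PDFs
    timeoutSec: 120       # tolerate slower parses before giving up
```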

7. Model preload

grobid:
  modelPreload: true

modelPreload: true is a good default for service stability and predictable warm behavior.

Use lazy loading only if you intentionally prefer different startup/memory trade-offs and accept slower first requests.

8. CRF vs full image tuning heuristics

CRF on CPU

This is the easiest path to tune well.

Good default strategy:

  • keep modelPreload: true
  • tune concurrency upward gradually
  • keep client concurrency only slightly above the service capacity

Full image with GPU

If you have a real GPU-backed setup, full-image tuning can be more forgiving than CPU-only full-image use.

Still:

  • do not assume GPU removes every bottleneck
  • watch request latency and throughput together
  • keep the rest of the setup simple while validating GPU benefits

Full image on CPU only

Be conservative.

Why:

  • deep-learning inference can push CPU much harder
  • memory pressure rises
  • throughput becomes less regular

A safer pattern is to lower both server concurrency and client concurrency compared with the CRF path.

9. Consolidation and throughput

Consolidation changes performance behavior materially.

If you turn it on:

  • request latency rises
  • throughput drops
  • external service behavior becomes part of your runtime

Practical rule:

  • keep consolidation off until plain extraction is stable
  • if you need scaling with enrichment, biblio-glutton is usually a better operational fit than CrossRef alone

See Consolidation for the trade-offs.
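
If you do move to biblio-glutton, the server-side wiring is a small config change; this fragment is a sketch, and the URL is an assumption for a locally running glutton instance:

```yaml
consolidation:
  service: glutton                 # instead of crossref
  glutton:
    url: "http://localhost:8080"   # assumed local biblio-glutton
```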

10. Practical baseline recommendations

Small CPU-only deployment

  • image: latest-crf
  • no consolidation initially
  • moderate concurrency
  • admin port enabled for diagnostics

As a practical starting point, keep server concurrency near your available thread count or only slightly above it, then keep client concurrency only slightly above that.

Higher-throughput CPU deployment

  • image: latest-crf
  • increase concurrency gradually
  • use official clients with bounded parallelism
  • tune memory only after concurrency is understood

On a machine with 16 available threads, a reasonable first pass is server concurrency around 16 to 20, with client concurrency around 20 to 24, then adjust based on observed 503 rates and throughput stability.

Accuracy-first deployment with stronger hardware

  • full image only if you know why you need it
  • validate resource headroom first
  • keep batch pressure conservative at the start

If you run the full image on CPU only, start much lower than the CRF path. A safer first pass is to cut server concurrency roughly in half relative to available threads, then keep client concurrency near that reduced level.

If you also enable citation consolidation at scale, expect throughput to drop materially. CrossRef-heavy citation enrichment can become the limiting factor before GROBID itself does.

11. Signs you are tuning the wrong thing

You may be tuning the wrong layer if:

  • a single known-good PDF still fails
  • startup is unstable before load increases
  • config overrides are still changing frequently
  • Docker mounts or shell syntax are still part of the problem

If so, go back to: