Features
Everything CompilePDF does
The complete capability list — from the four producer engines to the CJD pipeline to the lineage store. If something on this page would make your prepress workflow click, it's already in the box.
Rewrite producer
Object-tree surgery without the surprises
Fifteen mutations across structural, hygiene, and lifecycle categories — every op verified against a three-layer post-condition gate.
- Structural: OCG flips, page lifecycle (insert / delete / reorder / rotate), page-box patches (Trim / Bleed / Art / Crop / Media), page-label set, page-tree normalize.
- Hygiene: metadata set / strip, color-space swap, JavaScript strip, embedded-file strip.
- Lifecycle: PDF/X pin (X-1a / X-3 / X-4 / X-6), Producer/Creator stamp.
- Three-layer verifier: schema (every mutation observable), determinism (replay byte-identity), nothing-else-touched.
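The determinism layer (replay byte-identity) can be pictured with a small sketch. This is not CompilePDF's verifier; `replay_is_byte_identical` and `stamp_producer` are hypothetical names, and the "mutation" is a stand-in that only appends bytes. The idea it illustrates is simply: run the same op on the same input more than once and require identical SHA-256 digests.

```python
import hashlib
from collections.abc import Callable


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 of a byte string."""
    return hashlib.sha256(data).hexdigest()


def replay_is_byte_identical(
    mutate: Callable[[bytes], bytes], pdf_bytes: bytes, runs: int = 2
) -> bool:
    """Replay `mutate` on the same input and check every output hashes the same."""
    digests = {sha256_hex(mutate(pdf_bytes)) for _ in range(runs)}
    return len(digests) == 1


def stamp_producer(pdf_bytes: bytes) -> bytes:
    """Hypothetical stand-in for a mutation; a real op would edit the object tree."""
    return pdf_bytes + b"\n% Producer: example\n"


print(replay_is_byte_identical(stamp_producer, b"%PDF-1.7"))
```

Any op that leaks a timestamp, counter, or random ID into its output would fail this check on the second run.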
Marks producer
Twelve mark types, one stamp
Production, proofing, and universal marks — plus PDF/PNG external-template ingestion for tenant watermarks.
- Production: register cross-hairs, crop ticks, bleed indicators, color bars (process + spot).
- Proofing: fold marks, center marks, slug text, 1-up proofing slug.
- Universal: cut marks, ink-key bars, tile-stitch marks, operator-defined custom polygons.
- External-file ingestion via POST /v1/marks/apply-multipart: a PDF is embedded as a Form XObject, a PNG as an Image XObject.
- Four-layer verifier: schema, determinism, nothing-else, plus marks-layer SHA-256 reproducibility.
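For readers new to production marks, here is a rough geometric sketch of crop ticks around a trim box. It is illustrative only: `Segment`, `crop_ticks`, and the `offset` / `length` defaults are assumptions made for this example, not the marks producer's API or its actual default dimensions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A straight line from (x0, y0) to (x1, y1), in PDF points."""
    x0: float
    y0: float
    x1: float
    y1: float


def crop_ticks(
    trim: tuple[float, float, float, float],
    offset: float = 9.0,   # assumed gap between the trim corner and the tick
    length: float = 18.0,  # assumed tick length
) -> list[Segment]:
    """Return eight ticks (two per corner) pointing away from the trim box."""
    left, bottom, right, top = trim
    ticks: list[Segment] = []
    for x, dx in ((left, -1.0), (right, 1.0)):
        for y, dy in ((bottom, -1.0), (top, 1.0)):
            # Horizontal tick: sits on the trim edge line, extends outward in x.
            ticks.append(Segment(x + dx * offset, y, x + dx * (offset + length), y))
            # Vertical tick: sits on the trim edge line, extends outward in y.
            ticks.append(Segment(x, y + dy * offset, x, y + dy * (offset + length)))
    return ticks


for tick in crop_ticks((0.0, 0.0, 100.0, 200.0), offset=10.0, length=20.0):
    print(tick)
```

Because the output depends only on the trim box and two numbers, the same plan always yields the same segments, which is the property the marks-layer hash relies on.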
Impose producer
Sheet-level step-and-repeat, codex-solved
Layout is solved by codex_pdf.geom.tile_grid; Compile only places cells via pikepdf and does no layout math of its own.
- Configurable sheet, cell, gutter, marks zone, cell rotation, flip_per_row.
- Page mapping: sequential (input page N → cell N, multi-sheet pagination) or repeat (input 0 in every cell).
- Back-side modes: work-and-turn, work-and-tumble, or none. Compile derives back placements from the front via deterministic transforms.
- Cell-extract round-trip verifier (Layer 5): every cell's Form XObject content-stream SHA-256 matches its source page.
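The two page-mapping modes can be sketched in a few lines. This is a simplified illustration, not the impose producer's code: `map_pages_to_cells` is a hypothetical helper, and leaving the last sheet partially filled in sequential mode is an assumption of this sketch.

```python
from typing import Literal


def map_pages_to_cells(
    page_count: int,
    cells_per_sheet: int,
    mode: Literal["sequential", "repeat"],
) -> list[list[int]]:
    """Return one list per sheet; each entry is the input page index for that cell."""
    if page_count < 1 or cells_per_sheet < 1:
        raise ValueError("page_count and cells_per_sheet must be positive")
    if mode == "repeat":
        # Input page 0 goes into every cell of a single sheet.
        return [[0] * cells_per_sheet]
    if mode == "sequential":
        # Page N goes to cell N; overflow paginates onto further sheets.
        pages = list(range(page_count))
        return [
            pages[start:start + cells_per_sheet]
            for start in range(0, page_count, cells_per_sheet)
        ]
    raise ValueError(f"unknown mode: {mode!r}")


print(map_pages_to_cells(5, 4, "sequential"))
print(map_pages_to_cells(5, 4, "repeat"))
```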
Trap producer
Spread / choke with three engine slots
The pure-Python engine (the default) consumes polygon_offset and delta_e_2000; the Ghostscript and external slots are gated by optional extras.
- pure_python: bit-deterministic, uses codex_pdf.geom.polygon_offset + the spot-color resolver. Engine fingerprint baked into every record.
- ghostscript: bootstrap fallback gated by the [trap-gs] extra.
- external: Esko / Heidelberg integration gated by [trap-external]; vendor licensing required.
- Real ink-pair extraction: compile-pdf trap-extract walks PDF content streams, finds spot-ink rectangles, and emits suggested trap_zones.
- Non-rect polygons: TrapZone.polygon_pt is accepted; the engine routes through a documented upstream workaround until the codex-pdf × pyclipr ABI fix lands.
- trap-diff artifact records every operation (ink pair, page, polygon, engine fingerprint, achieved delta_e).
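As a mental model for spread and choke, consider the simplest case: an axis-aligned rectangle offset by the trap width. The sketch below is a deliberate simplification and does not call codex_pdf.geom.polygon_offset (its signature is not shown on this page); `offset_rect` is a hypothetical helper in which a positive delta offsets outward (spread) and a negative delta offsets inward (choke).

```python
Rect = tuple[float, float, float, float]  # (x0, y0, x1, y1) in PDF points


def offset_rect(rect: Rect, delta: float) -> Rect:
    """Offset every edge of an axis-aligned rectangle by `delta` points.

    delta > 0 grows the shape (spread); delta < 0 shrinks it (choke).
    """
    x0, y0, x1, y1 = rect
    grown = (x0 - delta, y0 - delta, x1 + delta, y1 + delta)
    if grown[0] >= grown[2] or grown[1] >= grown[3]:
        raise ValueError("choke is wider than the shape; rectangle collapses")
    return grown


ink_zone: Rect = (10.0, 10.0, 110.0, 60.0)
print(offset_rect(ink_zone, 0.25))   # spread by a quarter point
print(offset_rect(ink_zone, -0.25))  # choke by a quarter point
```

General polygons need a real offsetting routine with join handling, which is why the producer delegates that step rather than doing the geometry itself.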
CJD pipeline
Multi-producer orchestrator with lineage
Compile Job Definition envelopes bundle rewrite → marks → impose → trap into one submission; one lineage record per step.
- Canonical step ordering enforced; strict_order=true rejects out-of-spec submissions with 422.
- JSON: POST /v1/cjd/apply. XML: POST /v1/cjd/apply-xml (defusedxml-protected against XXE / billion-laughs).
- trap-diff auto-emitted for any job containing a trap step.
- lineage_id is operator-supplied or deterministically derived from input + steps; the same job re-runs to the same lineage.
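Two of the ideas above, canonical step ordering and a derived lineage_id, fit in a short sketch. Everything here is an assumption made for illustration: the helper names, the step dictionary shape, the rule that each producer appears at most once, and the exact hashing recipe are not taken from the CJD specification.

```python
import hashlib
import json

# Canonical order as described above: rewrite, then marks, impose, trap.
CANONICAL_ORDER = ("rewrite", "marks", "impose", "trap")


def is_canonical_order(producers: list[str]) -> bool:
    """True if the producers appear in canonical order (steps may be omitted).

    Raises ValueError for a producer name that is not in CANONICAL_ORDER.
    """
    ranks = [CANONICAL_ORDER.index(name) for name in producers]
    return all(a < b for a, b in zip(ranks, ranks[1:]))


def derive_lineage_id(input_sha256: str, steps: list[dict]) -> str:
    """Hash the input digest plus the steps, serialized with sorted keys."""
    payload = json.dumps(
        {"input": input_sha256, "steps": steps},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


steps = [
    {"producer": "rewrite", "plan": {"strip_javascript": True}},
    {"producer": "impose", "plan": {"mapping": "repeat"}},
]
print(is_canonical_order([s["producer"] for s in steps]))
print(derive_lineage_id("ab" * 32, steps))
```

Sorting keys before hashing means two submissions that differ only in JSON key order land on the same lineage, while reordering the steps themselves produces a different ID.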
Lineage store
Durable record per producer step
Memory, S3, and Redis backends — all behind the same LineageStore protocol.
- LineageStep records input/output SHA-256 + cache_key + producer extras + trap_diff (when present).
- S3 backend: one JSON object per step at {prefix}/{lineage_id}/{step_index:04d}.json.
- Redis backend: RPUSH onto lineage:{id} lists, scan_iter-based listing for large keyspaces.
- Query via GET /v1/lineage/{id} or compile-pdf lineage <id>, in chain or summary mode.
- Backend selection via the COMPILE_LINEAGE_BACKEND env var (memory | s3 | redis).
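To make "all behind the same LineageStore protocol" concrete, here is a minimal sketch of what such a protocol and an in-memory backend could look like. The method names (`append`, `chain`), the `LineageStep` fields chosen, and the `s3_key` helper are assumptions for this example; only the key layout `{prefix}/{lineage_id}/{step_index:04d}.json` comes from the list above.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class LineageStep:
    """Illustrative subset of the per-step record."""
    lineage_id: str
    step_index: int
    producer: str
    input_sha256: str
    output_sha256: str
    cache_key: str


class LineageStore(Protocol):
    """Hypothetical shape of the shared backend interface."""
    def append(self, step: LineageStep) -> None: ...
    def chain(self, lineage_id: str) -> list[LineageStep]: ...


class MemoryLineageStore:
    """In-memory backend: a dict of lists, ordered by insertion."""

    def __init__(self) -> None:
        self._chains: dict[str, list[LineageStep]] = {}

    def append(self, step: LineageStep) -> None:
        self._chains.setdefault(step.lineage_id, []).append(step)

    def chain(self, lineage_id: str) -> list[LineageStep]:
        return list(self._chains.get(lineage_id, []))


def s3_key(prefix: str, lineage_id: str, step_index: int) -> str:
    """Object key for one step, zero-padded so keys sort in step order."""
    return f"{prefix.rstrip('/')}/{lineage_id}/{step_index:04d}.json"


store: LineageStore = MemoryLineageStore()
store.append(LineageStep("job-1", 0, "rewrite", "aa", "bb", "k0"))
store.append(LineageStep("job-1", 1, "marks", "bb", "cc", "k1"))
print([s.producer for s in store.chain("job-1")])
print(s3_key("lineage", "job-1", 1))
```

The zero-padded index is what lets a plain lexicographic key listing return steps in execution order.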
Operational readiness
Production-ready by default
Auth on every producer route, Celery workers ready, queue_depth + celery_workers signals on /v1/healthz.
- COMPILE_AUTH_MODE gates producer + CJD + lineage routes (none | bearer | api-key | internal | basic).
- /v1/healthz exposes queue_depth (Celery or Redis-backed) and celery_workers (live ping count).
- Celery task wrappers for all four producers and the CJD orchestrator; start a worker with celery -A compile_pdf.tasks worker.
- X-Compile-Request-Id propagation through middleware; instance_id stamped on every response.
- /v1/contract exposes producer_schema_versions + codex_section_versions for client-side pinning.
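Client-side pinning against /v1/contract might look something like the sketch below. The response shape is an assumption: this example treats producer_schema_versions and codex_section_versions as flat name-to-version maps, and the version strings are invented sample data. `pin_mismatches` is a hypothetical helper, not part of any CompilePDF client.

```python
def pin_mismatches(
    contract: dict[str, dict[str, str]],
    pins: dict[str, dict[str, str]],
) -> list[str]:
    """Compare pinned versions against a contract payload; return readable diffs."""
    problems: list[str] = []
    for field, expected in pins.items():
        served = contract.get(field, {})
        for name, version in expected.items():
            actual = served.get(name)
            if actual != version:
                problems.append(f"{field}.{name}: pinned {version}, served {actual}")
    return problems


# Sample payload with made-up versions, standing in for a parsed /v1/contract body.
contract = {
    "producer_schema_versions": {"rewrite": "1", "marks": "2"},
    "codex_section_versions": {"geom": "3"},
}
pins = {
    "producer_schema_versions": {"rewrite": "1", "marks": "1"},
    "codex_section_versions": {"geom": "3"},
}
for problem in pin_mismatches(contract, pins):
    print(problem)
```

A client could run a check like this at startup and refuse to submit jobs when the list is non-empty.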
Cache + determinism
Same input + same plan → same SHA-256
Cache keys reproduce across machines; Codex section bumps auto-invalidate downstream cached output.
- Pdf.save(deterministic_id=True), no wall-clock time inside engines, fixed-decimal numeric formatting.
- Cache-key composer signs: codex_pdf wheel version, color/geom/document schema versions, producer, plan SHA-256, input SHA-256.
- Plan canonicalization (sort + drop comments + normalize numbers) ensures equivalent JSON inputs hash identically.
- Deterministic resource naming (/CellSrcN, /MarkF1, /MarkExtN), so re-runs match byte-for-byte.
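Plan canonicalization and cache-key composition can be sketched as follows. This is a hedged illustration, not the real composer: the comment-key convention (`_comment`, `//`), the six-decimal rounding, the field names, and the sample version strings are all assumptions chosen for the example.

```python
import hashlib
import json

COMMENT_KEYS = {"_comment", "//"}  # assumed convention for comments in a plan


def canonicalize(value):
    """Sort keys, drop comment keys, and normalize numbers (1.0 -> 1)."""
    if isinstance(value, dict):
        return {
            key: canonicalize(item)
            for key, item in sorted(value.items())
            if key not in COMMENT_KEYS
        }
    if isinstance(value, list):
        return [canonicalize(item) for item in value]
    if isinstance(value, float):
        return int(value) if value.is_integer() else round(value, 6)
    return value


def plan_sha256(plan: dict) -> str:
    """SHA-256 of the canonical, compact JSON form of a plan."""
    text = json.dumps(canonicalize(plan), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def cache_key(codex_version: str, schema_versions: dict[str, str],
              producer: str, plan_hash: str, input_hash: str) -> str:
    """Fold every input that can change the output into one digest."""
    material = json.dumps(
        {"codex": codex_version, "schemas": schema_versions, "producer": producer,
         "plan": plan_hash, "input": input_hash},
        sort_keys=True, separators=(",", ":"),
    )
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


a = plan_sha256({"b": 1.0, "a": [2, 3.50], "_comment": "draft"})
b = plan_sha256({"a": [2.0, 3.5], "b": 1})
print(a == b)
print(cache_key("0.0.0", {"geom": "1"}, "impose", a, "ab" * 32))
```

Because the schema versions are part of the hashed material, bumping any one of them changes the key and naturally misses the old cache entry.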
Developer love
A codebase you'd want to own
Modern stack, real docs, a clean integration path.
- Python 3.12+, fully typed under mypy --strict. 400+ tests; ruff + ruff format clean throughout.
- Three deploy modes: in-process library, FastAPI sidecar, Celery worker. Pick what fits your stack.
- consume-surface audit runs in CI to enforce the Codex boundary; any PR that attempts a re-implementation fails fast.
- Comprehensive docs (pulled live from the compile-pdf repo), per-phase changelog, security disclosure process, AGPL-3.0 license.
Ready to integrate it?
The docs walk through running the FastAPI sidecar, submitting a CJD job, and querying the lineage chain.