3-Part Series
Document AI

Part 1
Document Processing in Production: Why Every Platform Breaks at 60%
Every vendor demo hits 95% accuracy. Deploy on real documents and it drops to 60%. Here's what breaks, why, and the hybrid pipeline that fixes it.

Part 2
PDF Table Extraction: Why Structure Recognition Breaks
PDF table extraction breaks on merged cells, borderless layouts, and cross-page tables. Here's the detection-then-structure pipeline we built to fix it.

Part 3
LLM vs OCR Pipeline: Why the Answer Is Both
Multimodal LLMs understand document layout but hallucinate numbers. OCR extracts exact text but misses structure. Here's the hybrid architecture that gets both right.