Document AI Series

4-Part Series

Part 1 · March 17, 2026

Document Processing in Production: Why Every Platform Breaks at 60%

Every vendor demo hits 95% accuracy. Deploy on real documents and it drops to 60%. Here's what breaks, why, and the hybrid pipeline that fixes it.

Part 2 · March 31, 2026

PDF Table Extraction: Why Structure Recognition Breaks

PDF table extraction breaks on merged cells, borderless layouts, and cross-page tables. Here's the detection-then-structure pipeline we built to fix it.

Part 3 · April 14, 2026

LLM vs OCR Pipeline: Why the Answer Is Both

Multimodal LLMs understand document layout but hallucinate numbers. OCR extracts exact text but misses structure. The hybrid architecture that gets both right.

Part 4 · April 28, 2026

SEC EDGAR and XBRL: Financial Document AI at Scale

XBRL promised comparable financial data. Across three companies, only 12% is directly comparable. The NLP mapping layer and document AI pipeline that close the gap.