This is the surface for pulling data out of documents. It runs a deterministic donkey that extracts the tables from a PDF and ties every single cell back to the spot on the page it came from. Nothing is guessed, and you can prove where each number came from.
When an AI reads a document and gets bored, it fills the gaps. The donkey does not.
Ask a model to read a statement and it will confidently return figures that do not appear in the document. It fabricates to finish rather than admit it could not find them.
A total lands in the wrong row, a column shifts, a footnote merges into a value. The number is real but attached to the wrong thing, which is just as wrong.
Even a correct extraction is useless to an audit team if you cannot point at where it came from. No provenance means no sign-off.
WYSIWYD. Deterministic: same PDF in, the same numbers out, every time. Live today.
Why: a model will invent a number to finish the job; a script cannot.
How: it detects the table structure on the page and extracts each cell mechanically, then ties every value back to its place on the page. 100% reproducible across 90 extractions, 0 errors on 2,332 cells. Try it on your own PDF.
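To make "ties every value back to its place on the page" concrete, here is a minimal sketch of what a sourced cell and a pass/fail verdict could look like. The names `Cell` and `summarize` and the field layout are assumptions for illustration only, not the donkey's actual data model or API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cell:
    """One extracted value plus where it was found (hypothetical shape)."""
    text: str             # the value exactly as printed on the page
    page: Optional[int]   # 1-based page number, None if it could not be located
    box: Optional[int]    # id of the box on that page, None if it could not be located


def summarize(cells: List[Cell]) -> dict:
    """PASS only when every cell carries both a page and a box."""
    sourced = sum(1 for c in cells if c.page is not None and c.box is not None)
    verdict = "PASS" if cells and sourced == len(cells) else "FAIL"
    return {"verdict": verdict, "cells": len(cells), "sourced": sourced}


cells = [
    Cell("Subtotal 1,240.00", page=1, box=84),
    Cell("Tax 99.20", page=1, box=91),
    Cell("Total 1,339.20", page=1, box=97),
]
report = summarize(cells)
```

In this sketch a single cell without a source flips the whole verdict to FAIL, which mirrors the "nothing is guessed" rule rather than any documented behaviour.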
An invoice in, a sourced table out. Every cell knows where it came from.
$ check invoice.pdf
verdict: PASS   cells: 162   sourced: 162 / 162
every value traced to a box on the page:
  "Subtotal 1,240.00"   page 1, box 84   ✓
  "Tax 99.20"           page 1, box 91   ✓
  "Total 1,339.20"      page 1, box 97   ✓ (= subtotal + tax)
run it again on the same PDF: byte-identical output.
Same PDF in, the same numbers out, every time, with a source for each.
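If you want to check the two claims in the demo yourself, a sketch like the one below would do it: fingerprint the output of two runs and compare, then redo the arithmetic with exact decimals. The row layout and the helpers `fingerprint` and `amount` are assumptions, and `run_b` is only a stand-in for a second extraction of the same PDF.

```python
import hashlib
import json
from decimal import Decimal


def fingerprint(table: list) -> str:
    """SHA-256 over a canonical JSON form, so equal tables give equal digests."""
    payload = json.dumps(table, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def amount(text: str) -> Decimal:
    """Take the trailing number from a label such as 'Subtotal 1,240.00'."""
    return Decimal(text.rsplit(" ", 1)[-1].replace(",", ""))


run_a = [
    {"text": "Subtotal 1,240.00", "page": 1, "box": 84},
    {"text": "Tax 99.20", "page": 1, "box": 91},
    {"text": "Total 1,339.20", "page": 1, "box": 97},
]
# Stand-in for a second run on the same file; a real check would extract again.
run_b = [dict(row) for row in run_a]

same_output = fingerprint(run_a) == fingerprint(run_b)

subtotal, tax, total = (amount(row["text"]) for row in run_a)
total_checks_out = subtotal + tax == total
```

Decimal is used instead of float so that 1,240.00 + 99.20 is compared exactly, with no rounding noise.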
Call the donkey on a file, or run the surface inside a machine that remembers. The difference is state.
Send a PDF, get the sourced table back. Stateless and simple: same file in, the same numbers out. Try it free right now, no sign-up. Nothing remembered.
Connect your own AI to the doloop machine in document mode and the donkey runs inside the loop: your AI reads, the donkey ties every number to the page and rejects anything it cannot source, and only verified data ships. The machine learns your document templates.
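The "rejects anything it cannot source" step can be pictured as a gate between what the AI proposes and what ships. The sketch below is an assumption about the shape of that gate, not the doloop machine's implementation: `gate` is a hypothetical name, and exact string matching stands in for whatever matching the real system uses.

```python
from typing import Dict, List, Tuple


def gate(proposed: List[str], sourced_cells: List[Dict]) -> Tuple[List[Dict], List[str]]:
    """Split proposed values into verified (found on the page) and rejected."""
    # Index the mechanically extracted cells by their printed text.
    index = {c["text"]: (c["page"], c["box"]) for c in sourced_cells}
    verified, rejected = [], []
    for value in proposed:
        if value in index:
            page, box = index[value]
            verified.append({"text": value, "page": page, "box": box})
        else:
            # No box on the page backs this value, so it does not ship.
            rejected.append(value)
    return verified, rejected


sourced_cells = [
    {"text": "Subtotal 1,240.00", "page": 1, "box": 84},
    {"text": "Tax 99.20", "page": 1, "box": 91},
    {"text": "Total 1,339.20", "page": 1, "box": 97},
]
# "Discount 50.00" is an invented figure of the kind a model might add.
proposed = ["Total 1,339.20", "Discount 50.00"]

verified, rejected = gate(proposed, sourced_cells)
```

Here the total ships with its page and box attached, and the invented discount is rejected because nothing on the page backs it.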
Want this on your document pipeline? Talk to us, or see the other surfaces.