The Difference Between a Data PDF and a Scanned PDF Explained

Tim Robertson

Supplier invoices arrive as one of two types of PDF: a data PDF or a scanned PDF.

Data PDF

A data PDF is an invoice that is generated by an ERP or accounting system, or with a utility that creates PDF files for documents. If you can highlight the text in a PDF document, and then successfully paste the text into a spreadsheet or Word document, then the PDF is a data PDF.


In a data PDF, you can highlight the text and paste it into another document.


Scanned PDF

A scanned PDF is an invoice that has been printed and scanned. It is not possible to copy the text.

Because this invoice has been printed and scanned, it loses its data layer: the scanner only takes a picture of the text on the page, so you can no longer highlight that text. Instead, dragging across the page highlights a box, as in this example.


In a scanned PDF, you are not able to select text. Dragging your mouse across the page results in a box.
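
The highlight-and-paste check described above can also be approximated in code. The following is a minimal sketch, not a definitive implementation: the function names and the 20-character threshold are hypothetical choices made for illustration, and reading a real file assumes the third-party pypdf package is installed.

```python
from typing import Iterable


def classify_pdf_pages(page_texts: Iterable[str], min_chars: int = 20) -> str:
    """Return "data" if the pages carry a usable text layer, else "scanned".

    min_chars is an illustrative threshold (an assumption, not a standard):
    below it, the document is treated as having no real text layer.
    """
    total = sum(len(text.strip()) for text in page_texts)
    return "data" if total >= min_chars else "scanned"


def classify_pdf_file(path: str, min_chars: int = 20) -> str:
    """Classify a PDF on disk by the text that can be extracted from it."""
    # Assumption: pypdf is available; it is imported here so the pure
    # classification logic above works without it.
    from pypdf import PdfReader

    reader = PdfReader(path)
    # extract_text() returns the selectable text of a page; a page that is
    # only a picture typically yields an empty string.
    texts = [(page.extract_text() or "") for page in reader.pages]
    return classify_pdf_pages(texts, min_chars)


if __name__ == "__main__":
    print(classify_pdf_pages(["Invoice 1001 Total due 250.00"]))
    print(classify_pdf_pages(["", "   "]))
```

One caveat with this approach: a scanned PDF that has already been run through OCR may contain a hidden text layer, in which case a check like this would report it as a data PDF.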

For digital capture, you need a system that can handle both types of document. Intelligent OCR captures data from both paper and emailed invoices, regardless of whether the PDF invoice is a data PDF or a scanned PDF. Find out more about Intelligent OCR.