Paper Comes in All Kinds of Shapes and Sizes, Forms and Formats.

With each variation of the form, content, layout, and complexity of the document come different challenges. For those familiar with a loan package, think about all the different document types, page sizes, designs, colors, formats, sources, and file types.

We’ll now describe these different document types with reference to the mortgage and title industry, since most people have experience with the documents in this business transaction, and illustrate where the challenges lie and how they are being addressed.

There are three main paper or document formats: structured, semi-structured, and unstructured.

Structured or Fixed Form

These are generally the easiest documents to index and store. The documents are created as forms, and then someone fills in the form. The data is always in the same place, and the indexes are clearly defined, since the form shows the client where to enter each piece of information. Think of these fields like fields in an electronic form or database. The mapping is generally one-to-one: each box on the form corresponds to one index field. For example: First Name, Last Name, Street Address, Zip Code, Loan Number, and so on. Examples of forms in the mortgage and title market include the HUD-1, tax forms, and loan applications.

For software to know how and what data to extract, a sample document is scanned into the system, and the fields are mapped out as a template. Nothing moves around on these pages, so the software knows to look in the same place every time for each piece of information.
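To make the template idea concrete, here is a minimal sketch in Python. It assumes the page has already been through OCR and that each recognized word comes with a position on the page; the `Zone` and `Word` classes, the field names, and every coordinate are made up for illustration and do not come from any particular imaging product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """A fixed rectangle on the page, in hypothetical page units."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


@dataclass(frozen=True)
class Word:
    """One OCR'd word and the position of its center (assumed input)."""
    text: str
    x: float
    y: float


# The template: one zone per index field, mapped once from a sample form.
# Field names and coordinates are illustrative assumptions.
TEMPLATE = {
    "first_name": Zone(50, 100, 250, 130),
    "last_name": Zone(260, 100, 500, 130),
    "loan_number": Zone(50, 150, 250, 180),
}


def extract_fields(words, template):
    """Collect the words that fall inside each zone, in reading order."""
    fields = {}
    for name, zone in template.items():
        hits = [w for w in words if zone.contains(w.x, w.y)]
        hits.sort(key=lambda w: (w.y, w.x))  # top to bottom, then left to right
        fields[name] = " ".join(w.text for w in hits)
    return fields


# A filled-in page, as a word list an OCR step might produce.
page = [
    Word("Mary", 60, 115),
    Word("Ann", 110, 115),
    Word("Doe", 270, 115),
    Word("0012345", 60, 165),
    Word("Page", 500, 700),  # outside every zone, so it is ignored
]

print(extract_fields(page, TEMPLATE))
```

Because the zones never change, the same template can be applied to every filled-in copy of the form. The sketch also hints at why this approach stops working once content starts to shift on the page.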

You can see an example of how easily this works by using software like Adobe Acrobat Professional. Run an image of a form through this software, and it’ll automatically identify the areas it thinks are form fields. The imaging industry has had great success with these document types for more than a decade.

In our next chapter we’ll focus on Semi-Structured Forms.