Our third chapter in the “Best Practices for Managing Unstructured Data” blog series focuses on the definition of an Unstructured Document. We’ll continue to add chapters on the solutions and best practices for managing this information.

Unstructured Documents

The third document classification type, Unstructured Documents, presents the biggest challenge for Document Imaging. These documents have little structure or consistency; they are free-flowing reports, like the one you are reading today. Examples include Correspondence, Deeds, Title Release, Contracts, Plant Records, Claims, and hopefully not complaints.

Those familiar with the documents processed by the Mortgage and Title industry will not be surprised to learn that an estimated 80 percent, or nearly that, of all business documents fall into this category.

The challenge stems from a variety of factors. First, the index or metadata that clients wish to extract is free-form and unstructured; it could be a sentence, a paragraph, a whole page, or a few key words embedded within a description. For example, on a Release document the Borrower Name is usually embedded within a sentence on the first page, but the wording of that sentence changes depending on how the Title Insurance company chooses to describe it.
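To make the Borrower Name problem concrete, here is a minimal Python sketch of a template-style extractor. Everything in it is an assumption made for illustration: the sample sentences, the `NAME` pattern, and the `extract_borrower` helper are hypothetical and do not come from any real Release document or product. It shows that a pattern written for one phrasing finds the name, while a phrasing nobody anticipated slips through.

```python
import re

# Hypothetical sentences; real Release documents vary by Title Insurance company.
SAMPLES = [
    "This release is granted to John Q. Smith, borrower, for the property described below.",
    "The undersigned hereby releases the lien executed by Mary Ann Jones as borrower.",
    "Know all men by these presents that the mortgage made by Robert Lee is satisfied.",
]

# A simple (assumed) shape for a personal name: capitalized words, optional initials.
NAME = r"((?:[A-Z][a-z]*\.?\s)*[A-Z][a-z]+)"

# One pattern per phrasing we happen to know about.
PATTERNS = [
    re.compile(r"granted to " + NAME + r", borrower"),
    re.compile(r"executed by " + NAME + r" as borrower"),
]


def extract_borrower(sentence):
    """Return the borrower name if a known phrasing matches, else None."""
    for pattern in PATTERNS:
        match = pattern.search(sentence)
        if match:
            return match.group(1)
    return None


for sample in SAMPLES:
    print(extract_borrower(sample))
```

The first two sentences match a known phrasing; the third does not, so the helper returns `None` even though a person reading it would spot the name immediately. Each new phrasing requires another hand-written pattern, which is why fixed templates struggle with this class of document.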

On a Grant Deed the Borrower Name presents the same issue. Even worse, the Legal Description can extend over two or more pages. The formats are irregular; even a human has to be trained to read the document and determine what really constitutes the Legal Description.

Due to these complexities, it’s been next to impossible to configure a typical document imaging solution to handle these formats. Organizations have had to fall back on either having their own staff enter the data into a line-of-business system or shipping the images to a DPO for data entry. Axis saw an opportunity to apply new technology to this challenge and has recently introduced a new software solution to address these very complex formats.

In our next chapter we’ll focus on Handwriting on documents and forms.

More details about the challenges of working with Unstructured Documents