If you’re building software in proptech, chances are you’ve come across the closing disclosure. A closing disclosure contains the final details about the home buyer’s mortgage – things like loan terms, projected monthly payments, and closing costs.
The primary reason to pull data from a closing disclosure is to keep track of changes in the closing process to ensure that all parties are kept up-to-date. Outside of that, the data from closing disclosures can be used in aggregate to create a more accurate and complete picture of the mortgage market. For example, by knowing the details of recently closed mortgages, companies involved in the home-buying process can help potential buyers predict how much house they’ll be able to afford and help them evaluate whether a particular lender’s terms are favorable compared to the market.
The information found in a closing disclosure isn’t always easily accessible, and it usually isn’t available through an API.
Fortunately, with Sensible you can easily extract key information out of closing disclosure PDFs using SenseML, Sensible’s query language for extracting data from documents. We’ve written a library of open-source SenseML configurations, so you don’t need to write queries from scratch for common documents. From there, your closing disclosure data is accessible via API, Sensible’s UI, or 5,000 other software integrations thanks to Zapier.
What we'll cover
This blog post briefly walks you through configuring extractions for closing disclosures. By the end, you’ll know a couple of SenseML methods and you’ll be on your way to extracting any data you choose using our documentation or our prebuilt open-source closing disclosure configurations.
Write document extraction queries with SenseML
Let's walk through extracting specific pieces of data from a mortgage closing disclosure. Here's an example of a closing disclosure PDF with redacted data:
To follow along, you can sign up for a Sensible account, then download an example PDF and upload it to the Sensible app, or import the PDF and prebuilt open-source closing disclosure configurations directly to the Sensible app.
Our configuration for closing disclosure extractions is comprehensive for this PDF, but for the example in this post, let's keep it simple. We'll extract just the:
Date issued
Loan type
Estimated escrow
Table of borrowers' transactions
Extract date issued
See the following screenshot for an overview of how to extract the date issued:
The query in the left pane in the preceding image treats the string "date issued" as the first cell in a row, and searches to the right of it for the date. The PDF is displayed in the middle pane, and the extracted date (2021-09-14) is in the right pane.
To try this out yourself, paste the following query, or "field", into the left pane of the Sensible app:
{
  /* SenseML supports code comments using JSON5 */
  "preprocessors": [
    {
      /* correct oversplit lines
         see https://docs.sensible.so/docs/merge-lines */
      "type": "mergeLines",
      "directlyAdjacentThreshold": 0.16,
      "adjacentThreshold": 0.8,
      "yOverlapThreshold": 0.7
    }
  ],
  "fields": [
    {
      /* ID for target data */
      "id": "closing_information.date_issued",
      /* target data is a date, else return null */
      "type": "date",
      /* search for target data
         near text "date issued" in doc */
      "anchor": {
        "match": {
          "text": "date issued",
          "type": "startsWith"
        }
      },
      "method": {
        /* target to extract is in a row
           see https://docs.sensible.so/docs/row */
        "id": "row",
        /* target is to right of anchor
           ("date issued") in row */
        "position": "right",
        /* grab 1st row cell (right of anchor) */
        "tiebreaker": "first"
      }
    }
  ]
}
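To build intuition for what a row-style lookup does, here's a minimal Python sketch of the idea: find the line that starts with the anchor text, then take the first line to its right that shares the same visual row. This is an illustration only, not Sensible's implementation. The `Line` type, the `first_right_of_anchor` helper, and all coordinates and text values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Line:
    """A line of text with a hypothetical bounding box, in inches."""
    text: str
    left: float
    right: float
    top: float
    bottom: float


def overlaps_vertically(a: Line, b: Line) -> bool:
    """True if the two boxes share any vertical extent (same visual row)."""
    return a.top < b.bottom and b.top < a.bottom


def first_right_of_anchor(lines: list[Line], anchor_text: str) -> Optional[str]:
    """Return the text of the first line to the right of the anchor, in its row."""
    anchor = next(
        (line for line in lines
         if line.text.lower().startswith(anchor_text.lower())),
        None,
    )
    if anchor is None:
        return None
    candidates = [
        line for line in lines
        if line is not anchor
        and line.left >= anchor.right
        and overlaps_vertically(anchor, line)
    ]
    # "first" tiebreaker: the leftmost candidate wins
    candidates.sort(key=lambda line: line.left)
    return candidates[0].text if candidates else None


# Made-up lines loosely resembling the top of a closing disclosure
lines = [
    Line("Closing Information", 0.5, 2.0, 1.0, 1.2),
    Line("Date Issued", 0.5, 1.3, 1.3, 1.5),
    Line("9/14/2021", 1.6, 2.3, 1.3, 1.5),
    Line("Closing Date", 0.5, 1.4, 1.6, 1.8),
]

date_issued = first_right_of_anchor(lines, "date issued")
print(date_issued)
```

In this toy layout, the "Closing Date" anchor has nothing to its right, so the helper returns `None` for it, which is roughly the role the date type's "else return null" comment plays in the query.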
Extract loan type
See the following screenshot for an overview of how to extract the loan type:
The query in the left pane in the preceding image looks for a checkbox near the text "loan type", starting 0.2 inches from the left side of the text's bounding box, and returns the checkbox's selection status as true or false.
What if you want to examine all the loan type checkboxes and return only the selected choice? To try it out, paste the following query, or "field", into the left pane of the Sensible app:
{
  "fields": [
    {
      "id": "transaction_information.loan_type.conventional",
      "method": {
        /* target data is true/false checkbox.
           look for nearest checkbox starting 0.2"
           left of the anchor's right boundary (orange-outlined box) */
        "id": "nearestCheckbox",
        "position": "left",
        "offsetX": 0.2
      },
      "anchor": {
        /* target data is near text "Loan Type" */
        "match": {
          "text": "Loan Type",
          "type": "startsWith"
        }
      }
    },
    /* field extracts true/false checkbox for VA loans */
    {
      "id": "transaction_information.loan_type.va",
      "method": {
        "id": "nearestCheckbox",
        "position": "left"
      },
      "anchor": {
        "start": {
          "text": "Loan Type",
          "type": "startsWith"
        },
        "match": {
          "text": "VA",
          "type": "includes",
          "isCaseSensitive": true
        }
      }
    },
    /* field extracts true/false checkbox for FHA loans */
    {
      "id": "transaction_information.loan_type.fha",
      "method": {
        "id": "nearestCheckbox",
        "position": "right"
      },
      "anchor": {
        "match": {
          "text": "FHA",
          "type": "includes"
        }
      }
    },
  ],
  "computed_fields": [
    {
      /* to clean up output, return the single
         "true" checkbox value among 3
         checkboxes */
      "id": "selected_loan_type",
      "method": {
        "id": "pickValues",
        "match": "one",
        "source_ids": [
          "transaction_information.loan_type.conventional",
          "transaction_information.loan_type.fha",
          "transaction_information.loan_type.va"
        ]
      }
    },
    {
      "id": "hide_fields",
      "method": {
        /* to clean up output, suppress the
           source selection statuses */
        "id": "suppressOutput",
        "source_ids": [
          "transaction_information.loan_type.conventional",
          "transaction_information.loan_type.fha",
          "transaction_information.loan_type.va"
        ]
      }
    }
  ]
}
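The computed fields in this query reduce three true/false results to a single answer. Here's a rough Python sketch of that reduction. It illustrates the logic only and isn't Sensible's implementation or output format. The `pick_one` helper and the sample statuses are hypothetical, and the sketch assumes that "one" means exactly one checkbox is selected.

```python
from typing import Optional


def pick_one(checkboxes: dict[str, bool]) -> Optional[str]:
    """Return the ID of the only selected checkbox, or None if zero or several are selected."""
    selected = [field_id for field_id, is_checked in checkboxes.items() if is_checked]
    return selected[0] if len(selected) == 1 else None


# Hypothetical selection statuses for the three loan type fields
statuses = {
    "transaction_information.loan_type.conventional": True,
    "transaction_information.loan_type.fha": False,
    "transaction_information.loan_type.va": False,
}

selected_loan_type = pick_one(statuses)
print(selected_loan_type)
```

Returning nothing when zero or several boxes are checked is a deliberate choice in this sketch: an ambiguous form is safer to flag for review than to guess at.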
Extract estimated escrow
See the following screenshot for an overview of how to extract the estimated escrow:
The query in the left pane in the preceding image looks for the point where a horizontal line bisecting one text phrase crosses a vertical line bisecting another. Like the Region method, the Intersection method extracts text from a coordinate-defined area, which helps when the document is too variable to rely on methods such as Row.
To try this out yourself, paste the following query, or "field", into the left pane of the Sensible app:
{
  "fields": [
    {
      "id": "projected_payments.estimated_escrow",
      "type": "currency",
      "method": {
        /* intersection is an alternative to the
           Row method when table cells
           are unpredictably populated
           target data is at intersection
           of vertical and horizontal lines
           defined by 2 anchors
        */
        "id": "intersection",
        /* target data is on vertical line
           bisecting "Years 1-30"
        */
        "verticalAnchor": {
          "match": {
            // match "Years 1-##" or "Years 1 - ##"
            "pattern": "Years 1-\\d{1,2}|Years 1 - \\d{1,2}",
            "type": "regex"
          }
        },
        // offsets the vertical line to the right
        "offsetX": 0.1,
        // offsets the horizontal line downward
        "offsetY": 0.05
      },
      "anchor": {
        /* start looking for anchor match
           after "projected payments" */
        "start": {
          "text": "projected payments",
          "type": "startsWith"
        },
        "match": {
          /* target is on horizontal line
             bisecting "estimated escrow" */
          "text": "estimated escrow",
          "type": "startsWith"
        }
      }
    }
  ]
}
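If the geometry is hard to picture, here's a small Python sketch of it. It isn't Sensible's implementation. It assumes each anchor has a bounding box measured in inches, that y increases down the page, and that `offsetX` and `offsetY` shift the crossing point right and down respectively. The `Box` type, the `intersection_point` helper, and the coordinates are hypothetical; only the regular expression comes from the query.

```python
import re
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical bounding box in inches; y grows down the page."""
    left: float
    right: float
    top: float
    bottom: float

    @property
    def center_x(self) -> float:
        return (self.left + self.right) / 2

    @property
    def center_y(self) -> float:
        return (self.top + self.bottom) / 2


# Same pattern as the verticalAnchor in the query
YEARS_PATTERN = re.compile(r"Years 1-\d{1,2}|Years 1 - \d{1,2}")


def intersection_point(horizontal: Box, vertical: Box,
                       offset_x: float = 0.0,
                       offset_y: float = 0.0) -> tuple[float, float]:
    """Cross the vertical line through one anchor with the horizontal line through the other."""
    return (vertical.center_x + offset_x, horizontal.center_y + offset_y)


# Made-up positions for the two anchors
estimated_escrow = Box(left=0.5, right=1.7, top=4.0, bottom=4.2)
years_header = Box(left=3.0, right=4.0, top=2.5, bottom=2.7)

x, y = intersection_point(estimated_escrow, years_header,
                          offset_x=0.1, offset_y=0.05)
print(round(x, 2), round(y, 2))
```

The pattern accepts both "Years 1-30" and "Years 1 - 7" but not a later column such as "Years 8-30", so the vertical line always passes through the first payment column.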
Extract table of borrowers' transactions
See the following screenshot for an overview of how to extract a table summarizing the borrower's transactions:
To try this out yourself, paste the following query, or "field", into the left pane of the Sensible app:
{
  "preprocessors": [
    {
      "type": "mergeLines",
      "directlyAdjacentThreshold": 0.16,
      "adjacentThreshold": 0.8,
      "yOverlapThreshold": 0.7
    }
  ],
  "fields": [
    {
      "id": "_summaries_of_transactions_tables.due_from_borrower_at_closing",
      /* target data is a table */
      "type": "table",
      "method": {
        /* of several table methods,
           textTable is fastest */
        "id": "textTable",
        "columns": [
          {
            /* first table column starts 0.5" from left page edge
               and ends 3" from left edge */
            "id": "due_from_borrower_at_closing",
            "minX": 0.5,
            "maxX": 3,
            "type": {
              "id": "custom",
              /* each cell starts w/ 2 numbers
                 followed by text or #s */
              "pattern": "^\\d{2} [A-Za-z ()0-9]+$"
            },
            /* if cell doesn't start with 2 numbers,
               omit its row from output */
            "isRequired": true
          },
          /* 2nd column is 3.28-4.15" from left edge of page,
             cell contents are currency */
          {
            "id": "amount",
            "minX": 3.28,
            "maxX": 4.15,
            "type": "currency"
          }
        ],
        /* (recommended for performance)
           table ends at "adjustments" */
        "stop": {
          "text": "adjustments",
          "type": "equals"
        }
      },
      "anchor": {
        /* table starts after "due from borrower" line
           preceded by "summaries" line */
        "start": [
          {
            "text": "summaries of transactions",
            "type": "equals"
          }
        ],
        "match": {
          "text": "due from borrower at closing",
          "type": "includes"
        }
      }
    },
  ],
  "computed_fields": [
    {
      /* by default, table methods return
         column objects. transform these to
         row objects using the Zip method */
      "id": "summaries_of_transactions_tables.due_from_borrower_at_closing",
      "method": {
        "id": "zip",
        "source_ids": [
          "_summaries_of_transactions_tables.due_from_borrower_at_closing"
        ]
      }
    },
    /* to avoid redundant output, return
       the zipped table row objects and suppress the
       original column objects */
    {
      "id": "clean_output",
      "method": {
        "id": "suppressOutput",
        "source_ids": [
          "_summaries_of_transactions_tables.due_from_borrower_at_closing"
        ]
      }
    }
  ]
}
You'll get this output:
{
  "summaries_of_transactions_tables.due_from_borrower_at_closing": [
    {
      "due_from_borrower_at_closing": {
        "source": "01 Sale Price of Property",
        "value": "01 Sale Price of Property",
        "type": "custom",
        "customType": "string"
      },
      "amount": {
        "source": "$400,491.00",
        "value": 400491,
        "unit": "$",
        "type": "currency"
      }
    },
    {
      "due_from_borrower_at_closing": {
        "source": "02 Sale Price of Any Personal Property Included in Sale",
        "value": "02 Sale Price of Any Personal Property Included in Sale",
        "type": "custom",
        "customType": "string"
      },
      "amount": null
    },
    {
      "due_from_borrower_at_closing": {
        "source": "03 Closing Costs Paid at Closing (J)",
        "value": "03 Closing Costs Paid at Closing (J)",
        "type": "custom",
        "customType": "string"
      },
      "amount": {
        "source": "$9,039.47",
        "value": 9039.47,
        "unit": "$",
        "type": "currency"
      }
    }
  ]
}
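To see why the Zip step is useful, here's a short Python sketch that mimics it and then works with the rows. It isn't Sensible's implementation. The simplified `columns` structure and the `zip_columns` helper are assumptions for illustration; only the cell values mirror the output above.

```python
# Simplified, hypothetical column-oriented table: one list of cells per column ID
columns = {
    "due_from_borrower_at_closing": [
        {"value": "01 Sale Price of Property"},
        {"value": "02 Sale Price of Any Personal Property Included in Sale"},
        {"value": "03 Closing Costs Paid at Closing (J)"},
    ],
    "amount": [
        {"value": 400491},
        None,  # empty cell, like the null amount in the output
        {"value": 9039.47},
    ],
}


def zip_columns(cols: dict[str, list]) -> list[dict]:
    """Turn {column_id: [cells]} into a list of {column_id: cell} row objects."""
    ids = list(cols)
    return [dict(zip(ids, cells)) for cells in zip(*cols.values())]


rows = zip_columns(columns)

# Sum the amounts, skipping rows whose amount cell is null
total = sum(row["amount"]["value"] for row in rows if row["amount"] is not None)
print(len(rows), round(total, 2))
```

Row objects keep each label next to its amount, so downstream code can loop over line items without lining up two separate lists by index.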
Extract more data
We've covered how to extract a few pieces of data from a closing disclosure. Our prebuilt config extracts much more information. Check it out! In the following screenshot, every blue-outlined line is a piece of extracted data:
Start extracting
Congratulations, you've learned some key methods for extracting structured data from closing disclosure documents. There's more extraction power for you to uncover. Sign up for a free account (150 docs a month, no credit card required), check out our prebuilt closing disclosure config in our open-source library, and peruse our docs to start extracting data from your own documents.
Stop relying on manual data entry. With Sensible, you can claim back valuable time, earn your ops team’s thanks, and deliver a superior user experience. It’s a win-win.