If you’re building software for healthcare providers, chances are that you’ve come across the explanation of benefits (EOB) document.
EOBs are generated by insurance companies to describe what costs the patient’s insurance policy will cover for the medical care that the patient received. This document is created whenever a healthcare provider submits a claim to the patient’s insurance company.
If a healthcare provider isn’t set up to receive an electronic remittance advice (ERA) with an insurance company, that insurer will instead send paper EOBs. Manually recording and reconciling an EOB is a slow, time-consuming process which then delays downstream workflows – like how quickly the healthcare provider gets paid.
By automating pulling data out of EOBs, you can upload information into billing systems sooner and identify exceptions faster. Digitizing the information on EOBs also means that you can query and search records in the future.
With Sensible you can easily extract key information out of EOB PDFs using SenseML, Sensible’s query language for extracting data from documents. We’ve written a library of open-source SenseML configurations, so you don’t need to write queries from scratch for common documents. From there, your EOB data is accessible via API, Sensible’s UI, or 5,000 other software integrations thanks to Zapier.
What we'll cover
This blog post briefly walks you through configuring extractions for Medicaid EOBs. By the end, you’ll know a couple of SenseML methods and you’ll be on your way to extracting any data you choose using our documentation or our prebuilt open-source EOB configurations, which currently support Medicaid, Tricare, and Cigna EOBs.
Write document extraction queries with SenseML
Let's walk through extracting specific pieces of data from an explanation of benefits. Here's an example of an EOB PDF with redacted or dummy data:
To follow along, you can sign up for a Sensible account, then download an example PDF and upload it to the Sensible app, or import the PDF and prebuilt open-source configurations directly to the Sensible app.
See the following screenshot for an overview of how to extract the remittance date.
The query in the left pane in the preceding image treats the string "DATE" as the first cell in a row, and searches to the right of it for the date. The PDF is displayed in the middle pane, and the extracted date (2022-01-13) is in the right pane.
To try this out yourself, paste the following query, or "field" into the left pane of the Sensible app.
{
/* SenseML uses JSON5 to support code comments */
"preprocessors": [
{
/* correct oversplit lines
see https://docs.sensible.so/docs/merge-lines */
"type": "splitLines",
"minSpaces": 2
}
],
"fields": [
{
/* ID for target data */
"id": "date_of_remittance",
/* target data is a date, else return null */
"type": "date",
"anchor": {
"match": {
/* search for target data
near text "DATE" in doc*/
"text": "DATE",
"type": "includes",
"isCaseSensitive": true
}
},
"method": {
/* target to extract is in a row
see https://docs.sensible.so/docs/row */
"id": "row",
/* target is to right of anchor
("date issued") in row */
"position": "right",
/* grab 1st row cell (right of anchor) */
"tiebreaker": "first"
}
},
]
}
See the following screenshot for an overview of how to extract the first total allowed covered charge that occurs in the document:
The query in the left pane in the preceding image extracts all the text in a rectangular region with coordinates defined relative to the anchoring text. This method, as well as the Intersection method, are ways to extract text in a coordinate-defined area when the document has too much variability to rely on methods such as Row.
To try this out yourself, paste the following query into the left pane of the Sensible app:
{
/* SenseML uses JSON5 to support code comments */
"preprocessors": [
{
/* correct oversplit lines
see https://docs.sensible.so/docs/merge-lines */
"type": "splitLines",
"minSpaces": 2
}
],
"fields": [
{
"id": "totals.allowed_covered_charges",
/* target data is currency */
"type": "currency",
/* use first anchor match in doc */
"match": "first",
"anchor": {
"match": {
/* target is near **CLAIM TOTAL**
anchor text */
"type": "includes",
"text": "**CLAIM TOTAL**",
"isCaseSensitive": true
}
},
"method": {
/* target data is in rectangular region
with coordinates relative to anchor.
To display coordinates,
drag mouse in Sensible app */
"id": "region",
/* use Region as alternative to Row
to handle unpredictably empty cells */
"start": "left",
"width": 0.49,
"height": 0.17,
"offsetX": 3.2,
"offsetY": -0.07
}
}
]
}
You just saw how to extract a single total charge. In real life, you’ll want to extract each total charge for each claim. See the following screenshot for an overview of how to extract an array of claims with charges, instead of a single charge:
The query in the left pane in the preceding image slices the document into claims sections, each of which starts with a 7-digit claim number and ends with “**CLAIM TOTAL**”. Each claim section is its own mini document with its own set of fields. For example, you just learned about using the Region method to get the first total allowed covered charge. When you use the Region method in a section, you get back the charge for each claim.
To try it out, paste the following query into the left pane of the Sensible app:
{
/* SenseML uses JSON5 to support code comments */
"preprocessors": [
{
/* correct oversplit lines
see https://docs.sensible.so/docs/merge-lines */
"type": "splitLines",
"minSpaces": 2
}
],
"fields": [],
"sections": [
{
/* Get array of claims, each containing info
like total charges, dates, and services.
Each claim is a 'section', or mini-document,
isolated from the rest of the doc.
For more info see https://docs.sensible.so/docs/sections */
"id": "claims_details_sections",
/* Display section boundaries in the Sensible app
as brackets */
"display": true,
"range": {
"anchor": {
"start": {
/* start looking for claims after this text
("PAID CLAIMS") (near start of doc)" */
"text": "PAID CLAIMS",
"type": "startsWith",
"isCaseSensitive": true
},
/* each claim starts with 7-digit claim number */
"match": {
"type": "regex",
"pattern": "^\\d{7}$"
},
"end": {
/* stop looking for claims
above this text (near end of doc) */
"text": "paid claim totals",
"type": "includes"
}
},
/* each claim ends with this text (***CLAIM TOTAL**) */
"stop": {
"text": "**CLAIM TOTAL**",
"type": "includes",
"isCaseSensitive": true
},
"requireStop": true,
},
"fields": [
{
/* extract patient name for each claim */
"id": "patient_name",
"anchor": {
"match": {
/* patient name is near a 7-digit number (claim #) */
"type": "regex",
"pattern": "^\\d{7}$"
}
},
"method": {
/* name is in a row, 1st cell to right of claim # */
"id": "row",
"position": "right",
"tiebreaker": "first"
}
},
/* extract total allowed covered charges for each claim */
{
"id": "totals.allowed_covered_charges",
"type": "accountingCurrency",
"anchor": {
"match": {
"type": "includes",
"text": "claim total"
}
},
"method": {
"id": "region",
"start": "left",
"width": 0.49,
"height": 0.13,
"offsetX": 3.2,
"offsetY": -0.06
}
}
]
}
]
}
Each claim can contain several columns containing services with dates and number of procedures. In this configuration, Sensible extracts these columns as parallel arrays. See the following screenshot for an overview:
Since each row represents a service, rows are more convenient than columns in this case. For an overview of how to zip the “column” arrays into “rows” of corresponding data for each service, see the following screenshot:
To try this out yourself, paste the following query into the left pane of the Sensible app:
{
"preprocessors": [
{
/* correct oversplit lines
see https://docs.sensible.so/docs/merge-lines */
"type": "splitLines",
"minSpaces": 2
}
],
"fields": [],
"sections": [
{
"id": "claims_details",
"range": {
"anchor": {
"start": {
"text": "paid claims",
"type": "startsWith"
},
"match": {
"type": "regex",
"pattern": "^\\d{7}$"
},
"end": {
"text": "paid claim totals",
"type": "includes"
}
},
"stop": {
"text": "claim total",
"type": "includes"
},
"requireStop": true
},
"fields": [
{
"id": "patient_name",
"anchor": {
"match": {
"type": "regex",
"pattern": "^\\d{7}$"
}
},
"method": {
"id": "row",
"position": "right",
"tiebreaker": "first"
}
},
/* extract start date of service */
{
"id": "dates_of_service.begin",
/* match all to get an array of
begin dates in this claim */
"match": "all",
"type": {
"id": "date",
/* target is a custom 8-digit
date format (05252022) */
"format": "%M%D%Y"
},
"anchor": {
"match": {
"type": "regex",
/* anchor on 2 consecutive
8-digit numbers (dates) */
"pattern": "\\d{8} \\d{8}"
}
},
"method": {
"id": "regex",
/* extract the 1st of 2 consecutive dates */
"pattern": "(\\d{8}) \\d{8}"
}
},
{
/* extract array of end dates of service */
"id": "dates_of_service.end",
"match": "all",
"type": {
"id": "date",
"format": "%M%D%Y"
},
"anchor": {
"match": {
"type": "regex",
"pattern": "\\d{8} \\d{8}"
}
},
"method": {
"id": "regex",
/* extract the 2nd of 2 consecutive dates */
"pattern": "\\d{8} (\\d{8})"
}
},
/* extract array of procedure codes */
{
"id": "procedure",
"match": "all",
"anchor": {
"match": {
/* procedure is near 2 8-digit numbers (dates) */
"type": "regex",
"pattern": "\\d{8} \\d{8}"
}
},
"method": {
/* procedure is in a row, 2nd cell to
right of dates */
"id": "row",
"position": "right",
"tiebreaker": "second"
}
},
{
/* extract array of number of services */
"id": "number_of_services",
"match": "all",
"anchor": {
"match": {
"type": "regex",
/* services # is near 2
8-digit numbers (dates) */
"pattern": "\\d{8} \\d{8}"
}
},
"method": {
/* services # is in a row, 1st cell
to right of dates */
"id": "row",
"position": "right",
"tiebreaker": "first"
}
}
],
"computed_fields": [
{
/* zip following fields from parallel
"column" arrays
into "row" objects for each service */
"id": "services_zipped",
"method": {
"id": "zip",
"source_ids": [
"dates_of_service.begin",
"dates_of_service.end",
"procedure",
"number_of_services"
]
}
},
{
"id": "clean_output",
"method": {
/* for cleaner output,
don't output the
source fields used in zip */
"id": "suppressOutput",
"source_ids": [
"dates_of_service.begin",
"dates_of_service.end",
"procedure",
"number_of_services"
]
}
}
]
}
]
}
We've covered how to extract a few pieces of data from an explanation of benefits (EOB). Our prebuilt config extracts much more information. Check it out! In the following screenshot, every blue-outlined line is a piece of extracted data:
Start extracting
Congratulations, you've learned some key methods for extracting structured data from EOB documents. There's more extraction power for you to uncover. Sign up for a free account (150 docs a month, no credit card required), check out our prebuilt EOB config in our open-source library, and peruse our docs to start extracting data from your own documents.
Stop relying on manual data entry. With Sensible, claim back valuable time, your ops team will thank you, and you can deliver a superior user experience. It’s a win-win.