Many of us have useful personal finance data hidden away in PDFs, like utility bills or old bank statements from closed accounts. Getting the information into a usable form can be painfully manual. In this post, let’s explore how you can extract nicely structured data from your personal PDFs with a free Sensible account.
For this example, let’s imagine you want to reduce your energy use by analyzing the data in the PDF statements your gas utility company emails to you. With a bit of technical savvy, you can:
- Spend a few minutes in the Sensible app writing SenseML queries to extract key data from the statements, like the price of gas per therm and number of therms used per month.
- Trigger the Sensible API to extract the data specified by your queries whenever the gas company emails you a PDF statement.
- Send the Sensible API responses to CSV, Google Sheets, Airtable, or another destination. For example, each time you receive a statement, you can use Sensible to automatically add a row to a custom table, like this:
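If CSV is your destination, the "add a row" step can be small. Here's a minimal Python sketch, assuming you've already reduced an extraction to a plain dictionary. The function name and column names are hypothetical; the columns simply mirror the four fields chosen later in this post.

```python
import csv
from pathlib import Path

# Hypothetical column names (an assumption, not names defined by Sensible).
COLUMNS = ["statement_date", "total_therms", "cost_per_therm", "total_bill"]


def append_statement_row(csv_path, row):
    """Append one statement's values to a CSV file.

    Writes the header row first if the file is missing or empty.
    """
    path = Path(csv_path)
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        if is_new:
            writer.writeheader()
        # Keys missing from `row` are written as empty cells;
        # keys not listed in COLUMNS raise a ValueError.
        writer.writerow(row)
```

Appending rather than rewriting the file means each new statement costs one write, and the header check keeps the file readable by spreadsheet tools from the first row onward.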
This post assumes you’re comfortable with scripting or with automation tools like Zapier for tasks like setting up email triggers and sending API results to your preferred destination. So, we’ll focus on the Sensible part of this dataflow:
- Write SenseML document extraction queries
- Validate the extraction
- Call the Sensible API
Write document extraction queries
Let’s say we want to extract data from monthly gas statements that look like this:
To extract the data, you can use Sensible’s query language, SenseML. First, you need to decide on the data to extract from the PDFs. To keep it simple, let’s choose:
- statement date
- total therms used
- cost per therm
- total bill
Use the Sensible app to author and check the output of your extraction queries (“configs”). For example, your finished queries might look like this:
If you want to follow along, you can use an example PDF and the following example config to try it out yourself in the Sensible app:
Validate the data
To make sure you extract reasonable values, let’s write some validations. For example, let’s say the cost of gas per therm has historically been under $1. We can trigger a warning if the extracted value is over $1 using JsonLogic:
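To get a feel for how a rule like this behaves, here's a rough Python sketch that evaluates a tiny subset of JsonLogic (literals, `var`, and four comparison operators). It is not Sensible's validation engine, and it's stricter than real JsonLogic: a missing key raises an error instead of resolving to null. The field ID `cost_per_therm` and the `.value` path are assumptions for illustration, and the sketch assumes the check passes when the rule evaluates to true.

```python
# Assumed rule: passes when the extracted cost per therm is under $1.
WARN_RULE = {"<": [{"var": "cost_per_therm.value"}, 1]}


def evaluate(rule, data):
    """Evaluate a small JsonLogic subset: literals, "var", and <, <=, >, >=."""
    if not isinstance(rule, dict):
        return rule  # literals evaluate to themselves
    (op, args), = rule.items()  # a JsonLogic rule is a single-key object
    if op == "var":
        # Resolve a dot-separated path such as "cost_per_therm.value".
        path = args[0] if isinstance(args, list) else args
        value = data
        for key in str(path).split("."):
            value = value[key]
        return value
    left, right = (evaluate(arg, data) for arg in args)
    if op == "<":
        return left < right
    if op == "<=":
        return left <= right
    if op == ">":
        return left > right
    if op == ">=":
        return left >= right
    raise ValueError(f"unsupported operator: {op}")
```

With made-up values, `evaluate(WARN_RULE, {"cost_per_therm": {"value": 0.85}})` returns `True`, while a value of `1.2` returns `False`, which is the case where you'd want the warning.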
Let’s specify the warning in the Sensible app:
Call the API
Now that you’ve got your queries and validations set up, you can start extracting from your PDF statements using the Sensible API. For example, if you assume:
- the config example from the previous section is in a document type called "utility_bills"
- you have a PDF named "utility_statement_gas_dec_2019"
Then you could use curl to call the API:
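If you'd rather script the call than use curl, here's a minimal Python sketch that builds the same kind of request with the standard library. The endpoint pattern (`/v0/extract/{document_type}`), bearer-token header, and raw PDF body are assumptions about the Sensible API, so check the API reference before relying on them. The function name, the `.pdf` file extension, and the `SENSIBLE_API_KEY` environment variable are hypothetical.

```python
import urllib.request

# Assumed base URL; verify against the Sensible API reference.
API_BASE = "https://api.sensible.so/v0"


def build_extract_request(api_key, document_type, pdf_bytes):
    """Build (but don't send) a POST request that uploads a PDF for extraction."""
    url = f"{API_BASE}/extract/{document_type}"
    return urllib.request.Request(
        url,
        data=pdf_bytes,  # the raw PDF bytes are the request body
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/pdf",
        },
    )


# Example usage (commented out because it needs a real key and network access):
# import json, os
# with open("utility_statement_gas_dec_2019.pdf", "rb") as f:
#     request = build_extract_request(
#         os.environ["SENSIBLE_API_KEY"], "utility_bills", f.read()
#     )
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Separating request construction from sending keeps the sketch easy to inspect and test without touching the network.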
And get a response like:
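Before sending a response to a spreadsheet, it helps to flatten it into plain values. The sketch below assumes the response JSON has a `parsed_document` object that maps each field ID to an object with a `value` key, and that a field that wasn't found comes back as null. Treat that shape as an assumption to check against a real response; the field IDs in the usage note are made up.

```python
def flatten_parsed_document(response):
    """Reduce an extraction response to a flat {field_id: value} dictionary.

    Fields that are null or that lack a "value" key become None.
    """
    parsed = response.get("parsed_document") or {}
    row = {}
    for field_id, field in parsed.items():
        row[field_id] = field.get("value") if isinstance(field, dict) else None
    return row
```

For example, a response containing `{"parsed_document": {"total_therms": {"type": "number", "value": 40}, "total_bill": None}}` flattens to `{"total_therms": 40, "total_bill": None}`, which is ready to write as one row.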
Try it for free
To get data out of your personal PDFs, sign up for Sensible (free extractions for up to 150 documents a month).