When you extract document data at scale using Sensible, automating human-in-the-loop review can become essential to your quality-control process. This post covers how to integrate human review into your document processing life cycle. As the following figure shows, it walks you through automatically flagging document extractions for review, notifying reviewers of flagged extractions, and setting up webhooks to ingest corrected extractions into your system once reviewers approve them.
You’ll learn how to take the following steps:
- Configure review triggers: Configure extraction quality validation for a document type, for example, tax documents or pay stubs. Any extraction that doesn’t meet your quality validations triggers a human review.
- Specify a webhook for each document extraction: When extracting data from a document using Sensible’s API or SDK, specify a webhook destination URL that receives updates to the extraction’s review status.
- Notify a reviewer: When the webhook indicates that a completed extraction needs review and correction, notify a reviewer and send them a link to the review interface.
- Ingest corrected extractions: When the webhook indicates that a reviewer approved an extraction, ingest the document data into your system.
(Prerequisite) Configure support for pay stub data extraction
For this tutorial, let’s extract document data from pay stubs.
To add support for extracting data from pay stubs to your account, follow the steps in Out-of-the-box extractions.
1. Configure review triggers
To ensure data quality, you can add pass/fail tests for your document extractions, and trigger human review for extractions that fail tests. In this example, you’ll write logic to test that each paystub extraction:
- Reports a pay period start date
- Reports a plausible number of hours worked
If either of the preceding tests, or validations, fails, Sensible triggers a "NEEDS_REVIEW" status for the extraction so that a reviewer can correct the errors or reject the extraction completely.
Implement validations
To implement the preceding validations for the pay_stubs document type, take the following steps:
1. In the pay_stubs document type you created in a previous step, click the Validations tab:
2. Click Create validation. In the dialog, fill in the fields as follows to implement a test that fails if the paystub extraction is missing a pay period start date, then click Create:
- Description: Pay period start date must be present (non-null)
- Severity: Error
- Condition:
The preceding condition is written in JsonLogic and tests that a value for the extracted data field with the key "pay_period_start_date" exists, i.e. is non-null. JsonLogic is a library for processing rules written in JSON. A JsonLogic rule is structured as follows: { "operator" : ["values" ... ] }. For example, { "cat" : ["I love", " pie"] } concatenates its arguments and results in "I love pie".
3. Create a second validation to check whether a paystub reports a plausible number of regular hours worked. The validation assumes that 80 hours is the norm for a two-week paystub, and that if a paystub contains fewer than 1 or more than 80 regular hours, it’s a mistake in the extraction. Follow the directions in the preceding step to create a second validation with the following fields:
- Description: regular hours worked must be 1-80 hrs
- Severity: Warning
- Condition:
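To make the two validations concrete, here's a minimal Python sketch that evaluates JsonLogic-style rules against a simplified extraction. Treat everything in it as an assumption for illustration: the evaluate function is a toy that supports only a handful of operators (it isn't the JsonLogic library or Sensible's validation engine), the two rules are one plausible way to express the checks rather than the exact conditions you enter in the Sensible app, and the sample data is a flattened stand-in for real extraction output.

```python
from typing import Any


def get_var(data: dict, path: str) -> Any:
    """Resolve a dot-separated path such as "hours.regular"; return None if missing."""
    current: Any = data
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def evaluate(rule: Any, data: dict) -> Any:
    """Evaluate a small, simplified subset of JsonLogic operators against data."""
    if not isinstance(rule, dict):
        return rule  # literals evaluate to themselves
    operator, values = next(iter(rule.items()))
    if not isinstance(values, list):
        values = [values]
    if operator == "var":
        return get_var(data, values[0])
    args = [evaluate(value, data) for value in values]
    if operator == "cat":
        return "".join(str(arg) for arg in args)
    if operator == "!=":
        return args[0] != args[1]
    if operator == "and":
        return all(args)  # simplified: returns a bool rather than the last operand
    if operator == "<=":
        # Supports the chained "between" form: a <= b <= c. Null operands fail.
        return all(
            a is not None and b is not None and a <= b
            for a, b in zip(args, args[1:])
        )
    raise ValueError(f"unsupported operator: {operator}")


# Hypothetical rules mirroring the two validations in this tutorial.
start_date_present = {"!=": [{"var": "pay_period_start_date"}, None]}
hours_plausible = {"<=": [1, {"var": "hours.regular"}, 80]}

# Simplified, assumed extraction: no start date and 84 regular hours.
sample = {"pay_period_start_date": None, "hours": {"regular": 84}}

print(evaluate(start_date_present, sample))
print(evaluate(hours_plausible, sample))
```

Both rules evaluate to a falsy result for the sample, which is the situation that would fail a validation and flag the extraction for review.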
Configure validation-based review triggers
Take the following steps to trigger human review for each extraction that fails either of the tests you created in the previous steps:
1. Click the Human review tab and click Enable Human Review. Select the validations you created in the previous steps:
Now Sensible assigns a "NEEDS_REVIEW" status to any pay stub extraction that fails any validations you selected. For example, if a pay stub extraction reports 95 regular hours worked, Sensible flags it for review.
Note that you have options for triggering human review other than selecting individual validations. For example, you can trigger review if an extraction's percentage of null data points exceeds an acceptable threshold (that is, it falls below a minimum coverage score), or if it exceeds an acceptable number of failed validations.
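If it helps to reason about threshold-based triggers, the following Python sketch shows the general idea. It's only a mental model under stated assumptions: the function names, the default thresholds, and the flat dictionary of fields are all hypothetical, and Sensible computes its own coverage score and trigger logic for you in the app.

```python
def null_fraction(fields: dict) -> float:
    """Fraction of fields whose value is None (0.0 for an empty extraction)."""
    if not fields:
        return 0.0
    nulls = sum(1 for value in fields.values() if value is None)
    return nulls / len(fields)


def should_review(
    fields: dict,
    failed_validations: int,
    max_null_fraction: float = 0.25,  # hypothetical threshold
    max_failed: int = 0,              # hypothetical threshold
) -> bool:
    """Flag for review when either threshold is exceeded."""
    return (
        null_fraction(fields) > max_null_fraction
        or failed_validations > max_failed
    )


# Assumed, simplified extraction with one null field out of four.
extraction = {
    "employee_name": "A. Example",
    "pay_period_start_date": None,
    "gross_pay": 2000,
    "net_pay": 1500,
}

print(should_review(extraction, failed_validations=0))
print(should_review(extraction, failed_validations=1))
```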
2. Specify a webhook for each document extraction
To handle human reviews programmatically, you must specify a webhook destination for each paystub extraction. You can’t specify webhooks using the Sensible app’s extraction UI, so you must use the Sensible API or SDK. The following code example shows how to specify a webhook in an extraction request for a sample paystub document using the Sensible API:
Or in the JavaScript SDK:
The preceding code extracts data from a sample paystub PDF using the pay_stubs document type you configured in a previous step, applies the validations and human review triggers you configured, and posts the results to a webhook.
The example document used in the preceding extraction requests fails the validations you set up in previous steps because it’s missing a pay period start date and has an incorrect number of regular hours:
3. Notify a reviewer
In the Sensible app, reviewers can manually check the Human review tab to view extractions flagged for review.
If you want to skip this manual process and automatically notify a reviewer that an extraction needs review, you must write code to handle the results posted to the webhook. Your code must check whether the results include the "reviewStatus" parameter (for single-document extractions) or the "reviewStatuses" parameter (for portfolio documents).
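As a starting point, the check might look something like the following Python sketch. It assumes the webhook body has already been parsed from JSON into a dictionary, that the status parameters sit at the top level of that dictionary, and that "reviewStatuses" is a list of status strings. Those shapes are assumptions, so compare them with a real webhook payload before relying on this.

```python
NEEDS_REVIEW = "NEEDS_REVIEW"


def needs_review(webhook_body: dict) -> bool:
    """Return True if the posted extraction is flagged for human review."""
    # Single-document extractions: one status string.
    if webhook_body.get("reviewStatus") == NEEDS_REVIEW:
        return True
    # Portfolio extractions: assumed to be a list of status strings.
    statuses = webhook_body.get("reviewStatuses") or []
    return NEEDS_REVIEW in statuses


print(needs_review({"reviewStatus": "NEEDS_REVIEW"}))
print(needs_review({"id": "no-status-present"}))
```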
Parse the webhook
The following code shows the webhook results for the example paystub document referenced in previous steps.
Note that you can trace why the "reviewStatus" parameter is set to "NEEDS_REVIEW" by comparing the "validations" and "validation_summary" parameters to the human review triggers you configured in previous steps. For example, the following entry in the "validations" array tells you that the number of regular hours reported in this extraction is implausible:
Send the review link
To automate sending reviewers links to failed extractions, your code needs to handle the "id" and "type" parameters in the webhook results. Compose these parameters to create a review link. In the previous example, these are "b84bd1c8-113e-4e1e-8462-379f0dde2abf" and "pay_stubs", respectively, so the review link is:
https://app.sensible.so/editor/review/?d=pay_stubs&b84bd1c8-113e-4e1e-8462-379f0dde2abf
You can then write code to send the link to the reviewer via email or another notification method. Note that the reviewer needs to log in to your customer account to access the link.
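Here's one way that step could look in Python. The sketch builds the review link in the same format as the example link above and wraps it in an email message using the standard library. The helper names, the email addresses, and the assumption that "id" and "type" sit at the top level of the parsed webhook body are all illustrative. Actually delivering the message (for example with smtplib or a notification service) is left out.

```python
from email.message import EmailMessage

REVIEW_BASE_URL = "https://app.sensible.so/editor/review/"


def build_review_link(doc_type: str, extraction_id: str) -> str:
    """Compose the review link, mirroring the format of the example link above."""
    return f"{REVIEW_BASE_URL}?d={doc_type}&{extraction_id}"


def build_notification(
    webhook_body: dict, reviewer_email: str, sender_email: str
) -> EmailMessage:
    """Create (but don't send) an email that points the reviewer to the extraction."""
    link = build_review_link(webhook_body["type"], webhook_body["id"])
    message = EmailMessage()
    message["To"] = reviewer_email
    message["From"] = sender_email
    message["Subject"] = f"Extraction needs review: {webhook_body['type']}"
    message.set_content(f"An extraction was flagged for review:\n{link}\n")
    return message


# Values taken from the example in this section; addresses are placeholders.
body = {"id": "b84bd1c8-113e-4e1e-8462-379f0dde2abf", "type": "pay_stubs"}
email = build_notification(body, "reviewer@example.com", "noreply@example.com")
print(email["Subject"])
print(build_review_link(body["type"], body["id"]))
```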
4. Ingest corrected extractions
Using the interface in the review link, the reviewer can edit individual failed fields and approve or reject the extraction. For example, if 84 hours were an OCR error for the field "hours.regular" and the correct value were 64, they could edit it to the correct value, 64:
On the other hand, if the original paystub document incorrectly lists 84 hours, as in this example, the reviewer can choose to reject the extraction since the document itself is invalid, and then your system can use business logic to handle the invalid document:
Once the reviewer clicks Approve Extraction or Reject Extraction, Sensible posts the updated extraction, including any edited fields and the new review status, to the webhook. Now you can ingest approved extractions into your app, or handle rejected extractions according to your business logic.
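A minimal Python sketch of that final branch follows. The status strings "APPROVED" and "REJECTED", the "parsed_document" key, and the two callback parameters are assumptions for illustration. Check a real post-review webhook payload for the exact values before wiring this into your system.

```python
from typing import Callable

# Assumed status values; only "NEEDS_REVIEW" is confirmed by this tutorial.
APPROVED = "APPROVED"
REJECTED = "REJECTED"


def handle_review_update(
    webhook_body: dict,
    ingest: Callable[[dict], None],
    handle_rejected: Callable[[dict], None],
) -> str:
    """Route a post-review webhook to ingestion or to rejection handling."""
    status = webhook_body.get("reviewStatus")
    if status == APPROVED:
        # Assumed key holding the (possibly corrected) extracted fields.
        ingest(webhook_body.get("parsed_document", {}))
        return "ingested"
    if status == REJECTED:
        handle_rejected(webhook_body)
        return "rejected"
    return "ignored"  # for example, still awaiting review


# Stand-ins for your own persistence and business logic.
ingested: list = []
rejected: list = []

outcome = handle_review_update(
    {"reviewStatus": "APPROVED", "parsed_document": {"hours.regular": 64}},
    ingest=ingested.append,
    handle_rejected=rejected.append,
)
print(outcome)
```

Passing the ingestion and rejection logic in as callbacks keeps the routing easy to test separately from your database or queue.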
Conclusion
By automating human review, you gain quality control at scale for large volumes of document extractions. Over time, you'll gain insights into common extraction issues, helping you refine your automated processes. Human review is available now in beta for all existing users at no additional cost. After the beta period, it will become an optional add-on, priced at $150 per month, for users on the Scale plan and above.
Ready to enhance your extraction accuracy? Enable Human review today and check out our updated documentation. We're excited to see how this blend of automation and human oversight improves your document processing workflows and, as always, we welcome your feedback as you start using it.