Introduction
With Sensible, you can extract data from documents in structured JSON format using the SenseML query language. Once you’ve extracted data with our developer platform, you can transform the extraction to add or remove data, perform boolean validations on the data, or write logic to conform to your desired schema using Sensible’s support for JsonLogic.
This post is an opinionated guide to the basics of writing JsonLogic to transform your extracted document data. It’s for developer-adjacent technical folks who’re already familiar with SenseML.
What is JsonLogic?
In brief, JsonLogic is a way to process rules written in JSON. Specially structured JSON goes in, some new JSON value comes out.
A JsonLogic rule is structured like this: { "operator" : ["values" ... ] }. The operator is the name of the operation you'd like to use, and the values are the arguments the operator will be operating on. For example, { "cat" : ["I love ", "pie"] } results in "I love pie". Note the trailing space in the first argument: cat joins its arguments without adding a separator.
Sensible uses a library called json-logic-engine to process these rules. See the json-logic-engine documentation for its built-in operators, and the json-logic-js documentation for the original library, of which json-logic-engine is a superset.
Note: json-logic-engine is more actively maintained than json-logic-js, but its documentation may be somewhat out of date. You may also find that json-logic-js offers better explanations for some operators, since the docs for json-logic-engine assume familiarity with the original library. For example, the docs say that the "if" operator does not support the same type of chaining the original library did, but this is no longer the case. Some research and poking around in the project's GitHub may be useful if you're trying to do something that doesn't seem supported, or if some behavior isn't what you expected it to be.
In this post, we’ll cover two contexts for writing JsonLogic:
- Writing JsonLogic whose output can be whatever schema you want it to be. For this type of output, you’ll write JsonLogic in Sensible’s postprocessor.
- Writing JsonLogic whose output conforms to Sensible’s parsed_document output schema. This generally means outputting typed fields like the following:
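As a rough illustration, a typed field in the parsed_document output could look something like this. The field name policy_number and its value are hypothetical, and the exact properties present depend on the field's type:

```json
{
  "policy_number": {
    "type": "string",
    "value": "ABC-12345"
  }
}
```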
Sensible’s Custom Computation method automatically produces this output schema, so you’ll have to be mindful of the constraints of the output schema when you write JsonLogic with this method.
General approach: break down the problem
In general, writing a complex JsonLogic rule is easiest if you start from the inside and work your way out. This follows the structure of how these rules are evaluated, and helps you to be sure you’re processing each small piece correctly before you start trying to do complicated loops. It’s often true that the "business logic" where you actually do calculations or otherwise transform values tends to happen in these small pieces; the rest is structure.
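As a small sketch of inside-out evaluation, consider a rule that nests one operation inside another. The numbers and label here are made up for illustration. The inner "+" rule is evaluated first and produces 9200, and the outer "cat" rule then joins its arguments into the string "Total balance: 9200":

```json
{
  "cat": [
    "Total balance: ",
    { "+": [9000, 200] }
  ]
}
```

Writing and checking the inner { "+": [9000, 200] } piece on its own before wrapping it in "cat" is the same habit that pays off with larger rules.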
In practice, this means you can often start rule-writing on a whiteboard or notebook - somewhere you can write by hand, easily erase and rearrange, etc. Isolate the piece you want to work on, write out what the data looks like at that point, and start sketching out what's needed.
Once you’re satisfied with these small pieces of logic, start working on how to wrap them in things like maps, filters, etc. to get the overall shape you need. There are two essential questions you need to know the answers to before you start writing a rule:
- What is the shape of the data to start?
- What is the final shape you want to see?
Basic Example
Let’s get into how to write complex JsonLogic rules using a basic example.
Say you have extracted document output that looks like this:
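A minimal sketch of such output, assuming each account carries a typed balance field. The account_name field and the specific values are illustrative assumptions, chosen so the balances match the numbers used in the rest of this example:

```json
{
  "accounts": [
    {
      "account_name": { "type": "string", "value": "Checking" },
      "balance": { "type": "number", "value": 9000 }
    },
    {
      "account_name": { "type": "string", "value": "Savings" },
      "balance": { "type": "number", "value": 200 }
    }
  ]
}
```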
Let’s say that your goal is to sum up the balance of all accounts into a single field.
Before you start getting into how you'll accomplish that, sketch out what that output should look like:
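One plausible target shape, assuming the summed balance should be a typed number field (the "number" type is an assumption for this sketch):

```json
{
  "cumulative_balance": {
    "type": "number",
    "value": 9200
  }
}
```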
This shape will suggest to you what you might need to do to reach it:
- You'll need to make a new field called cumulative_balance.
- Within that field, you'll need to add up the balance values from each existing account and set the value field to the resulting sum.
- You'll include the type field in the output.
The specifics of how this is accomplished may vary based on where you're using JsonLogic - for example, if you were making a new field in the parsed document using the Custom Computation method, you would not need to create a new object to contain the output, and the type field would be included automatically. If you were looking to get this output using a postprocessor, you'd need to create the object and explicitly include the type field.
Overall though, determining the final shape tells you the two core things you'll need to do to get your output:
- iterate over existing accounts
- add up values.
Let's say we're including this as a field called cumulative_balance using the Custom Computation method. In that case, the rule we pass to jsonLogic would look like the following:
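A sketch of a rule consistent with the explanation in this section, assuming accounts is an array whose items each contain a balance object with a numeric value. Only the rule itself is shown here, not the surrounding field definition:

```json
{
  "+": {
    "map": [
      { "var": "accounts" },
      { "var": "balance.value" }
    ]
  }
}
```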
This will return the number 9200. Here's how that works:
- The "map" operation:
  - { "var": "accounts" }: Look at the items in "accounts".
  - { "var": "balance.value" }: For each item in the array specified ("accounts"), return the value of the "value" field within the "balance" object.
  - The end result of "map" will be an array of numbers: [9000, 200].
- The "+" operation:
  - Look at the array that came back from "map": [9000, 200].
  - Add together all the numbers in that array.
This will result in the single value 9200. Note that this syntax is a simpler alternative to using the built-in Reduce operator, and in general we recommend avoiding the Reduce operator in favor of simpler syntactic alternatives because it can be difficult to troubleshoot the Reduce operation.
For straightforward scenarios like this one it may not feel necessary to clearly identify your target shape beforehand, but getting into the habit will help substantially as the rules get more complicated.
Concepts
There are some concepts that are relevant when using JsonLogic, and json-logic-engine in particular, that may not be obvious from reading the documentation.
Context
The most important concept is context. Every rule that you write has a context. That context consists of the data that the rule looks at to know what values it’s working with.
In SenseML configurations, the top-level context is either the parsed document (in the case of postprocessor or validations) or the values in the config’s “fields” array (in the case of the Custom Computation method). As you drill down into the data, the context changes.
Let’s say we’re writing a postprocessor rule in SenseML, so your output schema can be arbitrary. Our parsed document looks like this:
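A minimal sketch of such a parsed document, assuming a top-level author_name field and a books array whose items each have a typed title field. The "string" types are an assumption; the field names and values match the ones used in the rest of this section:

```json
{
  "author_name": { "type": "string", "value": "Somebody 1" },
  "books": [
    { "title": { "type": "string", "value": "A Really Good Book" } },
    { "title": { "type": "string", "value": "A Boring Book" } }
  ]
}
```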
We’ve decided that the output we need in the postprocessor should be an array of book objects with the title and author name, using only the values and omitting the types. The shape we want looks like this:
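For the two books in this example, that target shape could be sketched as follows, assuming the output keys are named title and author:

```json
[
  { "title": "A Really Good Book", "author": "Somebody 1" },
  { "title": "A Boring Book", "author": "Somebody 1" }
]
```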
The var operator is the main way you interact with context. At the top level of the rule, the whole parsed document is our context. { "var": "" } will return the whole parsed document. That empty string, when passed to var, means “Return the current context.”
As you can probably guess, { "var": "books" } will return the value of the “books” field, which is an array of book objects. Maybe less obviously, { "var": "books.0.title.value" } will return “A Really Good Book”. This illustrates that from the perspective of the current context, you can drill down into properties (including array items, identified by their index) using dot-notation.
Let’s build this out in small pieces. From the top level, we can get the author’s name with:
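Using the dot-notation described above, and assuming the author's name lives in a top-level author_name field, the rule is a single var lookup:

```json
{ "var": "author_name.value" }
```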
This will give us "Somebody 1".
Next, we need to get all the book titles. From the top level, we can map over books:
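A sketch of that map, assuming each item in books has a title field with a value property. The first argument selects the array to iterate over, and the second is evaluated once per item:

```json
{
  "map": [
    { "var": "books" },
    { "var": "title.value" }
  ]
}
```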
This will give us an array of all the book titles: ["A Really Good Book", "A Boring Book"].
The interesting thing that happens here is that the context changes. In the first argument, { "var": "books" }, we’re still at the top level. This tells the map operation what it should iterate over - the “books” array. In the second argument, where the mapping action happens, the context becomes the current item in the array you're looping through. In this case, that means that during each loop, one of the book objects is the context. That's why we can use { "var": "title.value" } to access the title of each book - we're now “inside” each book object as we iterate.
The same context shift applies with other operators that behave similarly (see Higher-Order Operators).
To put it together, we can use the map operator along with the eachKey operator (which will create an object) and a concept we’ll explore in a moment called traversal:
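One way the combined rule could look, assuming the output keys title and author. Here eachKey builds an object by evaluating the rule given for each key, and the ../../ prefix reaches out of the map to the top-level author_name field:

```json
{
  "map": [
    { "var": "books" },
    {
      "eachKey": {
        "title": { "var": "title.value" },
        "author": { "var": "../../author_name.value" }
      }
    }
  ]
}
```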
This gives us what we want! But what’s going on with that ../../ when we get the author name?
Traversal
You’ll notice that we needed to access the name of the author while we were mapping over the books array. As described above, while we’re in the process of that map, the context we can access with { "var": "" } is the current item in the books array. This means asking for { "var": "author_name.value" } while inside the map will not find anything - a book object does not have an author_name, but we’d like to give it one in our output.
This is where traversal comes in. Traversal allows us to move up the context hierarchy using "../" notation, similar to how you might navigate to parent directories on the command line. In this case, "../../author_name.value" moves us up two levels in the context - out of the current book object and out of the books array - allowing us to access the author_name at the top level of our data.
Here’s that broken down more explicitly:
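The following is an annotated lookup table rather than a rule: each key is a path you might pass to var from inside the map over books, and each value describes what that path is expected to resolve to, following the two-levels-up model described in this section. Treat it as a sketch to check with the log operator rather than as a guarantee of engine behavior:

```json
{
  "": "the current book object",
  "title.value": "the current book's title",
  "../": "the books array being mapped over",
  "../../": "the top-level parsed document",
  "../../author_name.value": "Somebody 1"
}
```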
In practice, you’ll usually be traversing two levels at a time - this takes you up to the level past your current loop (one level takes you to the data being looped over, which is rarely what you’re after). For example, to access a top level field when you’re within, say, a filter operation and a map operation, you would use { "var": "../../../../field_you_want" }.
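As a sketch of the four-level case, suppose (hypothetically) that each book also had a reviews array whose items carry a rating field, and that the top level had a minimum_rating field. Neither field exists in the example document above. A filter nested inside a map would then climb two levels per loop to reach the top:

```json
{
  "map": [
    { "var": "books" },
    {
      "filter": [
        { "var": "reviews" },
        {
          ">=": [
            { "var": "rating.value" },
            { "var": "../../../../minimum_rating.value" }
          ]
        }
      ]
    }
  ]
}
```

Inside the filter, the first two levels leave the current review and the reviews array, and the next two leave the current book and the books array. If the result isn't what you expect, logging the value at each level is a quick way to confirm where you are.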
Troubleshooting
The following is a running list of things I’ve found useful when writing and updating configurations:
Logging tips
- Sensible’s custom log operator is incredibly useful when writing SenseML, because it gives you an inside look at what any given value is at different layers of context. Its output appears in the “errors” array of the extracted output. We strongly advise taking some time with simple rules to practice using it. If values aren’t coming back the way you expect, the first thing to do is wrap them in the log operator. For example, "author": { "log": [ "author_name", { "var": "../../author_name.value" }] }.
- If you’re having trouble finding out what your “data” or “context” is at any given point, it can be very useful to start returning var rules around the areas where you’re getting tripped up. In combination with log, this is excellent for getting oriented. If you’re mapping over an array, for example, and the results aren’t coming back quite how you expect, change the second argument to just be { "var": "" } - this will return the current item for each item in the array, showing you what you’re working with. If it’s not a value you’ll see in your output for whatever reason, wrap that in a log.
- Similar to the previous item, you can always make test fields for yourself to see what values are coming back for simpler versions of your rule. This is especially useful when figuring out how to drill down through various arrays and objects you might be moving through, and can let you test out how a rule might work on a more basic level before combining it with other rules.
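To illustrate the second tip, here is a sketch that maps over a books array (the array name is borrowed from the Context example) and replaces the mapping logic with { "var": "" } wrapped in log, so each item is both returned and reported. The label current_book is arbitrary:

```json
{
  "map": [
    { "var": "books" },
    { "log": ["current_book", { "var": "" }] }
  ]
}
```

Once the logged values confirm what each item looks like, swap the second argument back to the logic you actually need.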
Custom computation tips
- The Custom Computation method is picky about what values can be assigned to the fields it creates. They have to match the parsed_document Sensible schema. Try the following if the Custom Computation method isn’t working:
  - Try the same thing in the postprocessor - does it work there? This can tell you if the rule you wrote simply doesn’t work at all, or if it’s only having issues in the Custom Computation method. This is a good first step so you don’t end up spending too much time on something that wasn’t going to work in the first place.
  - Ask the following questions about the value you’re trying to return; if none of the answers are “yes,” you can’t assign that value to a field using the Custom Computation method:
    - Is the return value a field type? ({ value: [some value], type: [some type] ...(other fields depending on type) })
    - Is the return value a basic type? (string, number, boolean, null)
    - If the return value is an array, is it an array of either of the above?
- Note that the data available in custom computations is not exactly the same as the data available to the postprocessor. The Custom Computation method can only see values that are in the “fields” array in the config, rather than every field included in the final parsed document, and it sees the extracted values before they’ve been fully transformed to JSON. You may find, for example, that a value you were using in the postprocessor that came from a computed field can’t be used directly in custom computation. Logging will be a big help here.
General troubleshooting tips
- Make use of the ability to collapse fields in the Sensible app's editor. If you’re working on the structure of a rule that includes several levels of nesting, or one with conditionals - anything where the arguments passed to an operator might end up very far apart, visually - collapse the sections of the rule until you only see the structure you currently need to think about.
- In general, try to explain what you want to do in normal language first. You’ll probably end up using some key words that will suggest which operators to use, and it will help you solidify your goal before getting too deep in the weeds with navigating the data.
- Keep the list of available operators handy and scan it as you work so it’s easy to know what’s available to you.
- In general we recommend avoiding the built-in Reduce operator in favor of simpler syntactic alternatives because it can be difficult to troubleshoot the Reduce operation.
Samples
For examples of using JsonLogic to transform extracted document data, see this repo. To test them out, upload the examples to your Sensible account.
Resources
- Official JSON Logic Engine Docs
- JSON Logic Engine Default Methods - Code
- Original JsonLogic Docs
- Sensible JsonLogic Extensions Docs
- Sensible Custom Computation Docs
- Sensible Postprocessor Docs
- Sensible Validation Docs
Conclusion
Sensible’s Custom Computation method and postprocessor give you all the power of JsonLogic for transforming your extracted document data schemas.
Explore our prebuilt open-source library for extracting from common business documents, check out our docs, and sign up for an account to start extracting and transforming data from your own documents.