15 July 2020

RPA vs IQ Bot for Processing Unstructured Data from Invoices

The Developers at Extra Technology (me included) spend a considerable amount of time creating and developing Bots which automate processes for our clients in the financial sector.

Not surprisingly, as a result, invoices are frequently handled by Bots. After all, the invoice process is integral to most financial transactions.

Those Pesky Invoices

Invoices can be incredibly tricky to work with, primarily because of a lack of uniformity between invoices from different companies. Extracting information from them can be a Developer’s nightmare, even if all invoices arrive in a single file format (and we cannot safely assume that they will all be PDFs). Organisations frequently use completely different layouts, rendering each invoice unique in its format and presentation.

I discovered how difficult this can make automating invoice processes during one of my first forays into RPA. Tasked with sorting invoices into folders depending on the organisation they were from, I found that even ascertaining the invoice’s sender was a struggle. Typically, the invoices being processed identified their sender using an image (which was useless for the Bot’s purposes, as it needed to extract a name) or, if using text, placed it in a different position on every layout.

A Solution? Maybe Not...

By law, a company’s registration number and registered address are required to be present on any invoice. Using this one (supposed) constant, I was able to create a workable solution to the issue.

First, the PDF had to be converted to a text file, which I did using Automation Anywhere’s ‘convert PDF to text’ feature. Once this had been done, the file could be searched for its company registration number. As the registration number for English and Welsh companies is supposed to be included after the text ‘Registered in England and Wales Number’, it was a simple matter to copy this number to a variable. Using Automation Anywhere’s web recorder, the Bot then entered this number into the search bar on the ‘Companies House’ website. The company name returned by the search was used to name a folder, into which the invoice and any other invoices with that registration number could be saved.
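The search step can be sketched outside Automation Anywhere in a few lines of Python. This is only an illustration of the idea, not the Bot itself: the function name `find_registration_number` is hypothetical, and the sketch assumes the PDF has already been converted to plain text and that the number is seven or eight digits long.

```python
import re

# The wording that is supposed to precede the number on the invoice.
MARKER = "Registered in England and Wales Number"

# Allow any whitespace (including line breaks left by the PDF conversion)
# between the words of the marker, then skip punctuation such as ':'
# before capturing the number. Seven or eight digits is an assumption.
_MARKER_PATTERN = re.compile(
    r"\s+".join(re.escape(word) for word in MARKER.split())
    + r"\W*(\d{7,8})\b",
    re.IGNORECASE,
)


def find_registration_number(text):
    """Return the number that follows the marker text, or None if absent."""
    match = _MARKER_PATTERN.search(text)
    return match.group(1) if match else None
```

The number is returned as a string rather than an integer so that any leading zeros survive when it is later typed into a search box.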

Unfortunately, a simple task transformed into an extremely complicated one, and it was soon apparent that this method wasn’t going to work well. Whilst it was able to sort a small portion of the invoices given for testing, the majority still could not be processed, for a variety of reasons. Some PDFs were encrypted and couldn’t be converted to a text file. Others were missing a registration number altogether. But the greatest problem was that most company numbers were not preceded by the expected text, which meant that most of the files wound up uncategorised. I realised that I had to change the method of identifying company numbers: the Bot would instead take the first seven- or eight-digit number appearing after the word ‘company’, ‘reference’ or ‘registered’ on a line. This allowed the Bot to sort most of the invoices. However, a few always managed to slip through the net, and the method had to become yet more convoluted.
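For illustration, the looser rule might look something like the Python sketch below. Again, this is not the Bot’s actual logic: the name `guess_company_number` is hypothetical, and treating a ‘number’ as a run of exactly seven or eight digits with no digit on either side is my assumption.

```python
import re

# Any of the three trigger words, in any letter case.
KEYWORDS = re.compile(r"company|reference|registered", re.IGNORECASE)

# A run of exactly seven or eight digits, not part of a longer digit run.
NUMBER = re.compile(r"(?<!\d)\d{7,8}(?!\d)")


def guess_company_number(text):
    """Return the first 7- or 8-digit number found after a keyword on the
    same line, scanning lines from top to bottom, or None if there is none."""
    for line in text.splitlines():
        keyword = KEYWORDS.search(line)
        if keyword is None:
            continue
        # Only look at the part of the line that follows the keyword.
        number = NUMBER.search(line, keyword.end())
        if number:
            return number.group(0)
    return None
```

Even in this small form the weakness is visible: any seven- or eight-digit value that happens to follow one of the keywords, such as an invoice reference, will be mistaken for a company number.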

This process wound up being an imperfect one. Even if I’d put code in place to catch every format I had encountered, there was still no guarantee that the next invoice would adhere to any of them. Our financial officer would still have needed to sort through the many invoices that had been added to the unclassified folder. In addition, the next task would have been to extract data from these invoices, which would likely have run into similar (if somewhat less complicated) issues to those met when identifying the company that sent them.

The Solution? Definitely.

I dedicated some time to thinking about how I could solve this frustrating puzzle.

Fortunately, the answer was presented on a silver platter by Automation Anywhere™, in the form of their IQ Bot.

IQ Bot features a variety of pre-made domains, including an ‘invoices’ domain. Selecting this allows the user to draw a list of form fields in a manner like the ‘extract form fields’ command for PDFs. It also allows them to capture ‘table’ fields in addition to extracting key information from the document. When creating fields in this tool, the creator can also capture various identifying headers from nearby.

On documents with a different layout, the IQ Bot will search for these headers first. It may, of course, not find them. At this point, the most interesting part of this Bot becomes apparent: the machine learning that it uses. The Bot will prompt a user to show it whether it has missed any fields or collected the wrong information from them. It remembers any changes made by the user, and so learns the layout of new documents dynamically.

This is initially more of a time investment for the user than a typical Bot, but eventually, an IQ Bot should be able to extract information from any document that we give it without requiring any type of human input. This includes entirely new layouts that it has not previously been corrected on.

In my experience, IQ Bot is the logical tool for processing invoices and is capable of dealing with the large number of variations that can occur in these documents.

IQ Bot can be used as a ‘normal command’ when developing in Automation Anywhere™ and I strongly recommend anyone automating invoice processes do so.

 
George Pettipher

About the Author
George is a Master Certified Automation Anywhere™ Developer, delivering critical Bots for global organisations in multiple sectors. A key member of our technical team, he is equally at home undertaking face-to-face process analysis with customers or coding in our R&D lab. He contributes to the best practice policies set out by our Centre of Excellence and works closely with our CTO in defining future strategies.

Copyright © Extra Technology Ltd. All Rights Reserved. All trademarks, trade names, service marks, and logos referenced herein belong to their respective companies.
