What I Learned: AI Agents and Exploring GPTs - Invoice Summarizer
Building an invoice-summarizing bot with ChatGPT's new "GPT" feature.
A few months ago I published a post here called “Can ChatGPT Actually Do Audit Work?”. In it, I explained how I was able to use ChatGPT's new “Code Interpreter” feature to extract information from an invoice and summarize it in a table that an auditor could easily copy into their working paper. I also shared a shortened version of the post on LinkedIn, where it was pretty successful.
Since then, OpenAI has released quite a few new features, including “GPT Vision”, which allows the tool to interact with images much more effectively, and, as of last week, “GPTs”, a new capability that allows users to create customized AI agents tailored with specific instructions and unique data sets.
This is a massive step forward for the technology. I’ve already seen discussions online about “AI swarms”: groups of specialized AI bots, each fine-tuned for a distinct task, that pass their output on to the next agent as a complex job is completed. This could revolutionize the workflow of traditional knowledge work.
So, in light of my previous post and OpenAI’s new features, I decided to try to create an AI agent myself. The goal was to create a GPT that could read and summarize multiple invoices into a structured table, regardless of format or currency, and perform any calculations needed to get the total pre-tax amount of each invoice in Canadian dollars.
I’ll admit, it took me much longer than I expected, but I was able to pull it off. Keep reading for a description of how it works and my major takeaways, or you can just watch the video of it in action below:
How It Works
User Input: The agent begins by asking which details the user needs from the invoices, offering standard options for simplicity.
GPT Vision at Play: Using GPT Vision, the agent scans through the invoices provided by the user, pulling the needed information into a table.
Currency Translation Checkpoint: At this stage, users can request foreign currency conversions and verify the extracted data, ensuring accuracy before proceeding.
Exchange Rate Lookup: Using the Bank of Canada's "Valet" API, the tool fetches the historical exchange rates it needs on demand for precise calculations.
Final Output: A neatly organized table is presented, ready for integration into the user’s workflow.
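To make the last two steps above a little more concrete, here is a minimal Python sketch of converting extracted invoices to Canadian dollars and rendering a table that pastes cleanly into a spreadsheet working paper. It assumes the extraction step has already produced structured rows; `InvoiceRow`, `to_cad`, `render_table` and the `rate_lookup` callable are hypothetical names of my own (in practice the lookup could be backed by a call to the Valet API). It is an illustration, not the code behind the GPT.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal, ROUND_HALF_UP
from typing import Callable

# Hypothetical signature for a rate source: (ISO currency code, invoice date) -> CAD per unit.
RateLookup = Callable[[str, date], Decimal]


@dataclass
class InvoiceRow:
    # Hypothetical field set, standing in for whatever details the user asked the agent for.
    vendor: str
    invoice_number: str
    invoice_date: date
    currency: str    # ISO 4217 code, e.g. "USD"
    amount: Decimal  # amount in the invoice's own currency


def to_cad(row: InvoiceRow, rate_lookup: RateLookup) -> Decimal:
    """Convert one row to CAD at the rate for its invoice date, rounded to the cent."""
    rate = Decimal("1") if row.currency == "CAD" else rate_lookup(row.currency, row.invoice_date)
    # Half-up rounding to cents is an assumed convention, not a stated requirement.
    return (row.amount * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def render_table(rows: list[InvoiceRow], rate_lookup: RateLookup) -> str:
    """Render tab-separated text; spreadsheets split pasted tabs into columns."""
    header = ["Vendor", "Invoice #", "Date", "Currency", "Amount", "Amount (CAD)"]
    lines = ["\t".join(header)]
    for row in rows:
        cells = [
            row.vendor,
            row.invoice_number,
            row.invoice_date.isoformat(),
            row.currency,
            f"{row.amount:.2f}",
            f"{to_cad(row, rate_lookup):.2f}",
        ]
        lines.append("\t".join(cells))
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder rate for illustration only; not a real published Bank of Canada rate.
    demo_rates = {"USD": Decimal("1.35")}
    rows = [InvoiceRow("Example Vendor", "INV-001", date(2023, 11, 10), "USD", Decimal("100.00"))]
    print(render_table(rows, lambda currency, on: demo_rates[currency]))
```

Each invoice is converted at the rate for its own date, and amounts use `Decimal` rather than floats so that cents round predictably.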
Key Takeaways
Room for Improvement: Despite its effectiveness in some scenarios, the tool still struggles with a few key issues that would likely keep a tool like this from being used in practice:
- Consistency of output: Despite spending hours refining my prompts, on a task this specific there are still frequent cases where it ignores my instructions or gets things wrong.
- Highly complex invoices: The agent is sometimes unable to identify the required information.
- Pre-tax amounts: An attentive auditor will probably point out that the tool should really be pulling the pre-tax amount of the transaction rather than the total. This is true, but at the moment the tool still struggles to identify this value. I think it may be possible to resolve this with more time and a more refined set of instructions, but the issue remains nonetheless.
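On the pre-tax point, one mitigation that doesn't depend on the model getting everything right is a deterministic cross-check after extraction. If the model returns the total, the individual tax lines and (when it finds one) the subtotal, the pre-tax amount can be derived as total minus taxes and compared with the subtotal. The Python sketch below assumes those values have already been extracted; `resolve_pre_tax` and `PreTaxResult` are hypothetical names, and anything that doesn't reconcile is flagged for a human rather than silently corrected.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional, Sequence


@dataclass
class PreTaxResult:
    pre_tax: Decimal
    needs_review: bool
    reason: str


def resolve_pre_tax(total: Decimal, taxes: Sequence[Decimal],
                    subtotal: Optional[Decimal] = None,
                    tolerance: Decimal = Decimal("0.01")) -> PreTaxResult:
    """Derive or verify the pre-tax amount from (hypothetical) extracted invoice figures."""
    # Pre-tax amount implied by the extraction: total minus every tax line (GST, PST, HST, VAT...).
    derived = total - sum(taxes, Decimal("0"))
    if subtotal is None:
        # No subtotal was extracted: fall back to the derived figure, but ask a human to confirm.
        return PreTaxResult(derived, True, "no subtotal extracted; derived as total minus taxes")
    if abs(derived - subtotal) > tolerance:
        # Figures don't reconcile, e.g. the model reported the total as the subtotal.
        return PreTaxResult(subtotal, True,
                            f"subtotal {subtotal} plus taxes does not equal total {total}")
    return PreTaxResult(subtotal, False, "subtotal plus taxes reconciles to total")
```

For example, a total of 113.00 with 13.00 of HST and a subtotal of 100.00 reconciles cleanly; if the model instead reports 113.00 as the subtotal, the row comes back with `needs_review` set.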
Overall, human verification is still necessary, but the potential is undeniable.
GPT Vision vs. OCR: A weekend of trial and error taught me the distinct difference between GPT Vision and traditional Optical Character Recognition (OCR). Basically, OCR technology scans digital images to identify and extract text. It works like a text detector, focusing purely on recognizing characters and words. It has no understanding of what the text means, so pulling a specific field out of its output usually depends on hand-written rules, and it struggles with variability in formats, fonts and layouts, something invoices often have plenty of.
GPT Vision, however, not only recognizes text within images but also interprets it in context. It’s like OCR with a layer of AI understanding, so it can handle nuances such as differences in terminology or layout.
I spent hours trying to get OCR to work, but after switching to GPT Vision, the tool worked almost immediately. Lesson learned!
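A toy example helps show why the OCR route is so brittle. Once OCR has turned an invoice image into plain text, pulling out a specific figure often comes down to rules like the regular expression below, which only recognizes one way of labelling the total. The sample strings are made up, and `extract_total` is a simplified, hypothetical illustration rather than a production extractor.

```python
import re
from typing import Optional

# One rule for one layout: a line that starts with "Total", an optional colon and "$",
# then an amount with two decimal places.
TOTAL_PATTERN = re.compile(
    r"^\s*Total\s*:?\s*\$?\s*([\d,]+\.\d{2})\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def extract_total(ocr_text: str) -> Optional[str]:
    """Return the total (thousands separators removed) if the single known label matches."""
    match = TOTAL_PATTERN.search(ocr_text)
    return match.group(1).replace(",", "") if match else None


if __name__ == "__main__":
    samples = [
        "Subtotal: $1,000.00\nGST 5%: $50.00\nTotal: $1,050.00",  # matches the rule
        "Amount Due: $1,050.00",                                  # different terminology
        "TOTAL CAD 1,050.00",                                     # different layout
    ]
    for text in samples:
        print(repr(extract_total(text)))
```

The same figure labelled "Amount Due", or laid out as "TOTAL CAD 1,050.00", slips straight past the rule, and every new layout means another pattern. That is exactly the kind of variability GPT Vision absorbs by reading the text in context.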
API Adventures: Getting the tool to perform foreign exchange calculations accurately and flexibly was another major sticking point for me. At first I assumed ChatGPT would simply be able to search the web for the needed rate, but continuous errors forced me to change my approach.
I resorted to downloading CSV files from the Bank of Canada website containing historical exchange rates for every day of the last 5 years and uploading them to the tool's “knowledge base”. This worked at first; however, after realizing that I needed to remove the “Code Interpreter” functionality from the agent, the tool could no longer read the CSVs. I also wasn’t truly happy with the solution anyway, because it only worked for the few currencies I had downloaded and would require constant updating.
Finally, after extensive experimentation, I learned that the best option was to integrate the tool with the Bank of Canada website through the BoC’s API, called “Valet”. I had never done this before, and it requires writing quite a bit of code. Nonetheless, I got it done by sharing the BoC Valet documentation page with ChatGPT in another conversation window, asking it to write the code for me, and then iterating back and forth through each issue and error until the code worked. Pretty incredible, honestly.
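For readers curious what a Valet integration involves, here is a minimal standalone Python sketch; it is not the code ChatGPT wrote for my GPT. It assumes Valet's `observations` endpoint, the `FX<currency>CAD` series naming (e.g. `FXUSDCAD`), and a JSON response whose `observations` entries carry a date under `d` and the rate under `<series>` → `v`. Because the Bank doesn't publish rates on weekends or holidays, it requests a short window ending on the invoice date and takes the latest available observation. Whether the prior business day's rate is acceptable is a policy choice I'm assuming, and `lookback_days` is a hypothetical parameter of my own.

```python
import json
import urllib.parse
import urllib.request
from datetime import date, timedelta
from decimal import Decimal

VALET_BASE = "https://www.bankofcanada.ca/valet/observations"


def series_name(currency: str) -> str:
    # Assumed naming convention for BoC daily rates, e.g. "USD" -> "FXUSDCAD".
    return f"FX{currency.upper()}CAD"


def build_valet_url(currency: str, on: date, lookback_days: int = 7) -> str:
    """URL for <currency>-to-CAD observations in a window ending on the invoice date."""
    query = urllib.parse.urlencode({
        "start_date": (on - timedelta(days=lookback_days)).isoformat(),
        "end_date": on.isoformat(),
    })
    return f"{VALET_BASE}/{series_name(currency)}/json?{query}"


def latest_rate(payload: dict, currency: str) -> tuple[str, Decimal]:
    """Pick the most recent observation in the window (covers weekends and holidays)."""
    series = series_name(currency)
    observations = [o for o in payload.get("observations", []) if series in o]
    if not observations:
        raise LookupError(f"No {series} observations in the requested window")
    # ISO dates sort correctly as strings.
    last = max(observations, key=lambda o: o["d"])
    # Rates arrive as strings; Decimal keeps them exact for currency arithmetic.
    return last["d"], Decimal(last[series]["v"])


def fetch_rate(currency: str, on: date) -> tuple[str, Decimal]:
    """Network call: returns (observation date, CAD per unit of currency)."""
    with urllib.request.urlopen(build_valet_url(currency, on), timeout=10) as resp:
        payload = json.load(resp)
    return latest_rate(payload, currency)
```

Only `fetch_rate` touches the network; keeping `build_valet_url` and `latest_rate` separate makes the URL and parsing logic easy to test offline. The Bank publishes rates for a fixed list of currencies only, so a request for any other currency will fail rather than return a rate.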
Conclusion
Yet again, this foray into the new GPT feature has convinced me of its potential as a pivotal breakthrough in AI technology. While there are challenges in consistency and controllability, the level of customization and power it offers users is unprecedented.
Feel free to check out my tool yourself by using the link below: