ML for Document Understanding


What is document understanding, and why is it crucial for many businesses?

Document understanding allows different types of documents, including images and PDF files, to be processed in a streamlined manner. For example, it can extract key information from invoices (total amount, address, email, ...).

Many businesses need to process huge numbers of documents every day. Without proper automation tools that perform document understanding, this processing can be costly in terms of both time and money.

In today's edition of AIFEE, we're going to look at some of the advanced techniques in deep learning that can be very useful for document understanding tasks.

How to Extract Key Information from Documents Using Deep Learning

Document understanding is a crucial part of many businesses. Why?

Because it makes documents such as PDFs and images containing text easy for computers to understand, which in turn can save a great deal of time and, consequently, money.

For example, a document understanding system can extract important information such as "customer name", "customer address" and "total amount" from an invoice.
​
Document understanding systems have traditionally relied heavily on OCR (Optical Character Recognition).

This means that to understand a document, you first need to pass it through an OCR system such as Tesseract to extract the text and its position in the document.

This text is then used as input to your system for understanding the document.
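
The OCR-first pipeline described above can be sketched roughly as follows. This is only an illustration: it assumes the `pytesseract` wrapper around Tesseract (with Pillow for image loading), and `run_ocr`, `words_with_boxes`, `find_total` and the invoice path are hypothetical names made up for this example. The "understanding" step here is a toy rule, standing in for whatever model consumes the OCR output.

```python
import re


def run_ocr(image_path):
    """Run Tesseract on an image and return its word-level output as a dict."""
    # Imported lazily so the rest of the sketch works without Tesseract installed.
    import pytesseract
    from PIL import Image

    image = Image.open(image_path)
    return pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)


def words_with_boxes(ocr):
    """Keep the non-empty words together with their (x0, y0, x1, y1) boxes."""
    words = []
    rows = zip(ocr["text"], ocr["left"], ocr["top"], ocr["width"], ocr["height"])
    for text, left, top, width, height in rows:
        if text.strip():
            words.append({"text": text, "box": (left, top, left + width, top + height)})
    return words


def find_total(words):
    """Toy downstream step: return the amount that directly follows 'Total'."""
    amount = re.compile(r"\$?\d+(\.\d{2})?")
    for current, following in zip(words, words[1:]):
        is_label = current["text"].lower().rstrip(":") == "total"
        if is_label and amount.fullmatch(following["text"]):
            return following["text"]
    return None


# Usage, with a hypothetical file path:
#   ocr_output = run_ocr("invoice.png")
#   print(find_total(words_with_boxes(ocr_output)))
```

Note that whatever the OCR step gets wrong (a misread digit, a split word) is passed straight on to the step that follows it.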
​
A new approach has been developed for document understanding that is completely OCR-free!

The approach is called Donut!

Donut tries to address 3 drawbacks of OCR-based document understanding systems:
​
1 - High computational costs for using OCR.
2 - Inflexibility of OCR models on languages or types of documents.
3 - OCR error propagation to the subsequent process.
​
Donut has achieved state-of-the-art results on several document understanding datasets, surpassing previous approaches in terms of both speed and accuracy.

More on Donut in the original paper and in the original GitHub repo.
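
For a rough idea of what OCR-free extraction looks like in practice, here is a minimal sketch. It assumes the Hugging Face `transformers` implementation of Donut and the `naver-clova-ix/donut-base-finetuned-cord-v2` checkpoint (fine-tuned on receipts) with its `<s_cord-v2>` task prompt; none of these are named in this edition, and `clean_sequence`, `extract_fields` and the image path are hypothetical names. The image goes straight into the model, with no OCR step in between.

```python
import re


def clean_sequence(sequence, eos_token, pad_token):
    """Strip end/padding tokens and the leading task prompt from decoded output."""
    sequence = sequence.replace(eos_token, "").replace(pad_token, "")
    # count=1 removes only the first tag, which is the task prompt.
    return re.sub(r"<.*?>", "", sequence, count=1).strip()


def extract_fields(image_path,
                   checkpoint="naver-clova-ix/donut-base-finetuned-cord-v2",
                   task_prompt="<s_cord-v2>"):
    """Return the fields Donut reads from the image as a nested dict."""
    # Imported lazily because these are heavy, optional dependencies.
    from PIL import Image
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    processor = DonutProcessor.from_pretrained(checkpoint)
    model = VisionEncoderDecoderModel.from_pretrained(checkpoint)
    tokenizer = processor.tokenizer

    # The image is the only document input: no text or boxes from an OCR engine.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values

    # The task prompt tells the decoder which kind of output to generate.
    prompt_ids = tokenizer(task_prompt, add_special_tokens=False,
                           return_tensors="pt").input_ids

    output_ids = model.generate(
        pixel_values,
        decoder_input_ids=prompt_ids,
        max_length=model.decoder.config.max_position_embeddings,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

    sequence = processor.batch_decode(output_ids)[0]
    sequence = clean_sequence(sequence, tokenizer.eos_token, tokenizer.pad_token)
    # token2json turns the tagged sequence into a Python dict.
    return processor.token2json(sequence)


# Usage, with a hypothetical file path:
#   print(extract_fields("receipt.png"))
```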

​

Recognize Handwritten Text in Documents

Did you know that you can recognize handwriting with high accuracy using deep learning?

This process is called OCR, or ICR (Intelligent Character Recognition) when it targets handwriting.

Although I've seen several models attempt to solve this problem, and have personally built some of them, one approach is so advanced that its accuracy is mind-boggling.

I personally tested this approach on my own (very terrible) handwriting and it gave very accurate results!

The approach is called TrOCR.

It's a Transformer-based encoder-decoder model.

With a few lines of code, you can instantiate the model and make predictions on your own images. Below is some sample code.
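
A minimal sketch of what this can look like, assuming the Hugging Face `transformers` library and the `microsoft/trocr-base-handwritten` checkpoint (neither is named in this edition, so treat both as assumptions). `recognize_handwriting` and the image path are hypothetical names, and the sketch expects an image containing a single line of handwritten text.

```python
def recognize_handwriting(image_path, checkpoint="microsoft/trocr-base-handwritten"):
    """Return the text TrOCR reads from an image of one handwritten line."""
    # Imported lazily because these are heavy, optional dependencies.
    from PIL import Image
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    # The processor handles image preprocessing and token decoding.
    processor = TrOCRProcessor.from_pretrained(checkpoint)
    model = VisionEncoderDecoderModel.from_pretrained(checkpoint)

    # Convert to RGB, since scans are often grayscale.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    # The encoder reads the image; the decoder generates the text token by token.
    generated_ids = model.generate(pixel_values)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]


# Usage, with a hypothetical file path:
#   print(recognize_handwriting("my_handwriting.png"))
```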
​
You can test the code on Hugging Face.

​

Tweet of the week

Is document understanding with AI endangering white-collar professions?


​


​

Machine Learning for Medical Imaging

👉 Learn how to build AI systems for medical imaging domain by leveraging tools and techniques that I share with you! | 💡 The newsletter is read by people from: Nvidia, Baker Hughes, Harvard, NYU, Columbia University, University of Toronto and more!
