What is document understanding and why is it crucial for many businesses?

Document understanding allows different types of documents, including images and PDF files, to be processed in a streamlined manner. One example is extracting key information from invoices (total amount, address, email, ...).

Many businesses need to process huge numbers of documents every day. Without proper automation tools that perform document understanding, this processing can be costly in both time and money.

In today's edition of AIFEE, we're going to look at some of the advanced techniques in deep learning that can be very useful for document understanding tasks.

How to Extract Key Information from Documents using Deep Learning?

Document understanding is a crucial part of many businesses. Why?

Because it aims to make documents such as PDFs and images containing text easy for computers to understand, which in turn can save a great deal of time and, consequently, money.

For example, a document understanding system can be used to extract important information such as: "customer name", "customer address" and "total amount" from an invoice.

Building document understanding systems has traditionally relied heavily on OCR (Optical Character Recognition).

This means that to understand a document, you would need to first pass it through an OCR system such as Tesseract to extract the text and its position from the document.

This text is later used as input to your system for understanding the document.
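To make the OCR-based pipeline concrete, here is a minimal sketch of the second stage: turning OCR output into fields. The word list is hypothetical sample data standing in for what an OCR engine such as Tesseract would return (text plus position), and the function names, field names, and regular expressions are illustrative assumptions, not part of any particular system.

```python
import re

# Hypothetical OCR output: one (text, left, top) tuple per recognized word.
# In a real pipeline these would come from an OCR engine such as Tesseract.
ocr_words = [
    ("Invoice", 40, 20), ("#1042", 120, 20),
    ("Customer:", 40, 60), ("Jane", 130, 61), ("Doe", 175, 60),
    ("Total", 40, 100), ("amount:", 90, 100), ("$1,250.00", 160, 101),
]


def group_into_lines(words, y_tolerance=5):
    """Group words with similar vertical position into lines of text."""
    lines = []
    for text, left, top in sorted(words, key=lambda w: (w[2], w[1])):
        if lines and abs(lines[-1]["top"] - top) <= y_tolerance:
            lines[-1]["words"].append((left, text))
        else:
            lines.append({"top": top, "words": [(left, text)]})
    # Within each line, order the words from left to right.
    return [" ".join(t for _, t in sorted(line["words"])) for line in lines]


def extract_fields(lines):
    """Pull a few invoice fields out of the text lines with simple rules."""
    fields = {}
    for line in lines:
        match = re.match(r"Customer:\s*(.+)", line)
        if match:
            fields["customer_name"] = match.group(1)
        match = re.search(r"Total amount:\s*\$?([\d,]+\.\d{2})", line)
        if match:
            fields["total_amount"] = float(match.group(1).replace(",", ""))
    return fields


lines = group_into_lines(ocr_words)
fields = extract_fields(lines)
print(fields)
```

Note that every field depends on the OCR text being correct: a single misread character in "Total amount:" would make the rule miss the value.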

A newer approach to document understanding has been developed that is completely OCR-free!

The approach is called Donut!

Donut tries to address three drawbacks of OCR-based document understanding systems:

1 - High computational costs for using OCR.
2 - Inflexibility of OCR models on languages or types of documents.
3 - OCR error propagation to the subsequent process.

Donut has achieved state-of-the-art results on several document understanding datasets, outperforming previous approaches in both speed and accuracy.
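As a rough illustration of what OCR-free extraction can look like in practice, the sketch below runs Donut through the Hugging Face transformers library. It is not taken from the original paper or repo. The checkpoint name (a publicly available Donut model fine-tuned on receipts), its task prompt, and the function names are assumptions chosen for this example, and the code requires transformers, torch, and Pillow to be installed.

```python
import re


def strip_task_prompt(sequence):
    """Remove the first <...> tag, which is the task start token."""
    return re.sub(r"<.*?>", "", sequence, count=1).strip()


def extract_with_donut(
    image_path,
    checkpoint="naver-clova-ix/donut-base-finetuned-cord-v2",  # assumed checkpoint
    task_prompt="<s_cord-v2>",  # task prompt that matches the assumed checkpoint
):
    """Return the fields Donut reads from a document image as a dict."""
    # Heavy dependencies are imported here so that defining the function is cheap.
    from PIL import Image
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    processor = DonutProcessor.from_pretrained(checkpoint)
    model = VisionEncoderDecoderModel.from_pretrained(checkpoint)

    # The image goes straight into the model: there is no separate OCR step.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values
    decoder_input_ids = processor.tokenizer(
        task_prompt, add_special_tokens=False, return_tensors="pt"
    ).input_ids

    outputs = model.generate(
        pixel_values,
        decoder_input_ids=decoder_input_ids,
        max_length=model.decoder.config.max_position_embeddings,
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
    )

    # The model emits a tagged token sequence, which is converted to JSON.
    sequence = processor.batch_decode(outputs)[0]
    sequence = sequence.replace(processor.tokenizer.eos_token, "")
    sequence = sequence.replace(processor.tokenizer.pad_token, "")
    return processor.token2json(strip_task_prompt(sequence))


# Example usage (the file name is hypothetical):
# print(extract_with_donut("receipt.png"))
```

The fields you get back depend on the dataset the checkpoint was fine-tuned on, so extracting invoice fields such as "customer name" would likely need a model fine-tuned for invoices.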

More on Donut in the original paper and on the original GitHub repo.

Recognize Handwritten Text in Documents

Did you know that you can recognize handwriting with high accuracy using deep learning?

This process is called OCR, or ICR (Intelligent Character Recognition) when it targets handwriting.

I've seen several models attempt to solve this problem, and I have personally built some of them, but one approach is so advanced that its accuracy is mind-boggling.

I personally tested this approach on my own (very terrible) handwriting and it gave very accurate results!

The approach is called TrOCR.

It's a Transformer-based encoder-decoder model.

With a few lines of code, you can instantiate the model and make predictions on your own images.
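The snippet that originally accompanied this section is not available here, so the following is a minimal sketch of how TrOCR is commonly used through the Hugging Face transformers library. The checkpoint name, function name, and file name are assumptions for illustration, and the code requires transformers, torch, and Pillow to be installed.

```python
def recognize_handwriting(image_path, checkpoint="microsoft/trocr-base-handwritten"):
    """Return the text TrOCR reads from an image of a single handwritten line."""
    # Heavy dependencies are imported here so that defining the function is cheap.
    from PIL import Image
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    # The checkpoint is an assumed choice; other TrOCR checkpoints load the same way.
    processor = TrOCRProcessor.from_pretrained(checkpoint)
    model = VisionEncoderDecoderModel.from_pretrained(checkpoint)

    # The encoder reads the image; the decoder generates the text token by token.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    generated_ids = model.generate(pixel_values)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]


# Example usage (the file name is hypothetical):
# print(recognize_handwriting("my_handwriting.png"))
```

TrOCR checkpoints like this one expect an image cropped to a single line of text, so a full page would first need to be split into lines.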

You can test the code on Hugging Face.

Tweet of the week

Gokul Rajaram

@gokulr

We are seeing the first wave of white collar workers being laid off as a result of AI. Specifically, as law firms adopt AI tools for document processing, they are starting to lay off paralegals. Moving very fast.
No white collar profession is safe.

12:17 AM • Mar 18, 2023


Is document understanding with AI endangering white collar professions?

Machine Learning for Medical Imaging

ML for Documents Understanding

