INFORMATION EXTRACTION FROM STRUCTURED DOCUMENT
Date
2023-04-30
Publisher
I.O.E. Pulchowk Campus
Abstract
This project proposes using LayoutLMv2, a deep learning model, for information extraction
from form-like documents. The IRS 990 tax form served as the dataset for testing and
optimization. Extracting information from form-like documents is challenging because
identifying fields and their corresponding values requires both complex layout analysis
and text recognition. LayoutLMv2 has demonstrated its effectiveness in these tasks, making
it a promising solution for this problem. The project produced a web application and an
annotation tool. The web application lets users upload documents and extract relevant
information accurately and efficiently, streamlining document processing for businesses
and organizations. The annotation tool lets users label data and train custom models.
Keywords
Transformers, OCR engine, Information Extraction