INFORMATION EXTRACTION FROM STRUCTURED DOCUMENT

KANU, AAYUSH SHAH; POKHREL, ADITHYA; BASHYAL, BISHAL; SHARMA, JANAK

Files

Aayush shah kanu et al. be report computer apr 2023.pdf (3.8 MB)

Date

2023-04-30

Publisher

I.O.E. Pulchowk Campus

Abstract

This project proposes using LayoutLMv2, a deep learning model, to extract information from form-like documents. IRS Form 990 tax returns served as the dataset for testing and optimization. Extracting information from such documents is challenging because identifying each field and its corresponding value requires both layout analysis and text recognition. LayoutLMv2, which jointly models the text, 2-D layout, and visual appearance of a page, has proven effective on these tasks, making it a promising solution for form-like documents. The project produced a web application and an annotation tool. The web application gives users a user-friendly interface for uploading documents and extracting the relevant information accurately and efficiently, streamlining document processing for businesses and organizations. The annotation tool lets users label data and train custom models.
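The abstract does not describe the pipeline's internals. As a rough illustration only, the following sketch shows two generic steps that commonly surround a LayoutLMv2 token-classification model: scaling OCR word boxes to the 0-1000 coordinate grid that LayoutLM-family models use for 2-D position embeddings, and merging the model's per-word BIO tags into field-value pairs. It is not the authors' implementation. The function names, the BIO tag scheme, and the field labels ORG_NAME and EIN are hypothetical, and the model call itself is omitted.

```python
from typing import Dict, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in page pixels


def normalize_box(box: Box, width: int, height: int) -> List[int]:
    """Scale a pixel box to the 0-1000 grid used by LayoutLM-family models."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]


def group_fields(words: List[str], labels: List[str]) -> Dict[str, str]:
    """Merge per-word BIO tags (hypothetical scheme, e.g. 'B-EIN', 'I-EIN', 'O')
    into a field -> value mapping. If a field occurs twice, the last value wins."""
    fields: Dict[str, str] = {}
    field: Optional[str] = None  # field currently being collected, if any
    span: List[str] = []         # words collected for that field
    for word, label in zip(words, labels):
        if label.startswith("I-") and field == label[2:]:
            span.append(word)  # continue the open field
            continue
        if field is not None:
            fields[field] = " ".join(span)  # close the open field
        if label.startswith("B-"):
            field, span = label[2:], [word]  # open a new field
        else:
            field, span = None, []  # 'O' tag, or a stray 'I-' tag that is dropped
    if field is not None:
        fields[field] = " ".join(span)  # close a field that runs to the end
    return fields
```

For example, words tagged `O O O B-ORG_NAME I-ORG_NAME O O O B-EIN` over "Name of organization Example Trust Employer identification number 12-3456789" would group into `{"ORG_NAME": "Example Trust", "EIN": "12-3456789"}`. Grouping at the word level keeps the post-processing independent of how the tokenizer splits words into sub-word pieces.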
