Please use this identifier to cite or link to this item: https://elibrary.tucl.edu.np/handle/123456789/18835
Title: INFORMATION EXTRACTION FROM STRUCTURED DOCUMENT
Authors: KANU, AAYUSH SHAH
POKHREL, ADITHYA
BASHYAL, BISHAL
SHARMA, JANAK
Keywords: Transformers; OCR engine; Information Extraction
Issue Date: 30-Apr-2023
Publisher: I.O.E. Pulchowk Campus
Institute Name: Institute of Engineering
Level: Bachelor
Abstract: This project proposes the use of LayoutLMv2, a deep learning model, for information extraction from form-like documents. The IRS 990 tax form was used as the dataset for testing and optimization. Extracting information from form-like documents is challenging because identifying fields and their corresponding values requires both complex layout analysis and text recognition. LayoutLMv2 has demonstrated its effectiveness in these tasks, making it a promising solution for this kind of extraction. The project resulted in a web application and an annotation tool. The web application gives users an interface for uploading documents and extracting relevant information accurately and efficiently, streamlining document processing for businesses and organizations. The annotation tool lets users label data and train custom models.
URI: https://elibrary.tucl.edu.np/handle/123456789/18835
Appears in Collections: Computer Engineering

Files in This Item:
File: Aayush shah kanu et al. be report computer apr 2023.pdf
Size: 3.89 MB
Format: Adobe PDF

