INFORMATION EXTRACTION FROM STRUCTURED DOCUMENT

dc.contributor.author: KANU, AAYUSH SHAH
dc.contributor.author: POKHREL, ADITHYA
dc.contributor.author: BASHYAL, BISHAL
dc.contributor.author: SHARMA, JANAK
dc.date.accessioned: 2023-07-31T05:32:37Z
dc.date.available: 2023-07-31T05:32:37Z
dc.date.issued: 2023-04-30
dc.description: This project proposes the use of the LayoutLMv2 model, a deep learning model, for information extraction from form-like documents. Specifically, the IRS 990 tax form was used as the dataset for testing and optimization. (en_US)
dc.description.abstract: This project proposes the use of LayoutLMv2, a deep learning model, for information extraction from form-like documents. The IRS 990 tax form was used as the dataset for testing and optimization. Extracting information from form-like documents is challenging because identifying fields and their corresponding values requires both complex layout analysis and text recognition. LayoutLMv2 has demonstrated its effectiveness in these tasks, making it a promising solution for information extraction from form-like documents. The project resulted in a web application and an annotation tool. The web application gives users an interface to upload documents and extract relevant information accurately and efficiently, streamlining document processing for businesses and organizations. The annotation tool enables users to label data and train custom models. (en_US)
dc.identifier.uri: https://hdl.handle.net/20.500.14540/18835
dc.language.iso: en (en_US)
dc.publisher: I.O.E. Pulchowk Campus (en_US)
dc.subject: Transformers (en_US)
dc.subject: OCR engine (en_US)
dc.subject: Information Extraction (en_US)
dc.title: INFORMATION EXTRACTION FROM STRUCTURED DOCUMENT (en_US)
dc.type: Report (en_US)
local.academic.level: Bachelor (en_US)
local.affiliatedinstitute.title: Pulchowk Campus (en_US)
local.institute.title: Institute of Engineering (en_US)
Files
Original bundle
Name:
Aayush shah kanu et al. be report computer apr 2023.pdf
Size:
3.8 MB
Format:
Adobe Portable Document Format
License bundle
Name:
license.txt
Size:
1.71 KB
Format:
Item-specific license agreed upon to submission