xuan2k / awesome-key-information-extraction

This project is forked from entropy2333/awesome-key-information-extraction.

A curated list of papers about key information extraction.

Awesome Key Information Extraction

Papers with Code links are preferred.

Contributions are welcome!

Table of Contents

Awesome Key Information Extraction
- Table of Contents
- Datasets
- Survey
- Toolkits
- Models
- Reference

Datasets

Name	Title	Links
DUE	DUE: End-to-End Document Understanding Benchmark	[link]
RVL-CDIP	Evaluation of Deep Convolutional Nets for Document Image Classification and Retrieval	[link][download]
SROIE	ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction	[link][download]
FUNSD	FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents	[link][download]
XFUND	XFUND: A Multilingual Form Understanding Benchmark	[link]
CORD	CORD: A Consolidated Receipt Dataset for Post-OCR Parsing	[link]
EPHOIE	Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution	[link]
EATEN	EATEN: Entity-aware Attention for Single Shot Visual Text Extraction	[link]
Train Ticket	PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks	[link][download]
POIE	Visual Information Extraction in the Wild: Practical Dataset and End-to-end Solution	[link][download]

Survey

Year	Title	Links
2023	On the Hidden Mystery of OCR in Large Multimodal Models	[link]
2021	Document AI: Benchmarks, Models and Applications	[link]

Toolkits

Year	Title	Links
2022	DavarOCR: A Toolbox for OCR and Multi-Modal Document Understanding	[paper][code]
2021	MMOCR: A Comprehensive Toolbox for Text Detection, Recognition and Understanding	[paper][code]
2020	PP-OCR: A Practical Ultra Lightweight OCR System	[paper][code]

Models

⭐LLM-Based

Pub.	Year	Title	Links
ICML	2023	BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models	[link]
Arxiv	2023	InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning	[link]
Arxiv	2023	MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models	[link]
Arxiv	2023	Visual Instruction Tuning	[link]
Arxiv	2023	Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond	[link]
Arxiv	2023	mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality	[link]
Arxiv	2023	mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding	[link]
Arxiv	2023	mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration	[link]
Arxiv	2023	Otter: A Multi-Modal Model with In-Context Instruction Tuning	[link]
Arxiv	2023	UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model	[link]
Blog	2023	Fuyu-8B: A Multimodal Architecture for AI Agents	[blog][model]

Graph-Based

Pub.	Year	Title	Links
ICDAR	2023	LayoutGCN: A Lightweight Architecture for Visually Rich Document Understanding	[paper]
ACL-Findings	2021	Spatial Dependency Parsing for Semi-Structured Document Information Extraction	[link]
Arxiv	2021	Spatial Dual-Modality Graph Reasoning for Key Information Extraction	[link]
ICPR	2020	PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks	[link]

Transformer-Based

Pub.	Year	Title	Links
ACL	2022	LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding	[link]
ACL	2022	FormNet: Structural Encoding beyond Sequential Modeling in Form Document Information Extraction	[link]
CVPR	2022	XYLayoutLM: Towards Layout-Aware Multimodal Networks For Visually-Rich Document Understanding	[link]
Arxiv	2022	LoPE: Learnable Sinusoidal Positional Encoding for Improving Document Transformer Model	[link]
Arxiv	2022	LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking	[link]
Arxiv	2022	ERNIE-Layout: Layout-Knowledge Enhanced Multi-modal Pre-training for Document Understanding	[link]
AAAI	2022	BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents	[link]
ICDAR	2021	ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Information Extraction from Documents	[link][code]
Arxiv	2021	TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models	[link]
ACM-MM	2021	StrucTexT: Structured Text Understanding with Multi-Modal Transformers	[link]
ACL	2021	LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding	[link]
KDD	2020	LayoutLM: Pre-training of Text and Layout for Document Image Understanding	[link]

Grid-Based

Pub.	Year	Title	Links
ICDAR	2021	ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Information Extraction from Documents	[link]
ICDAR	2021	VisualWordGrid: Information Extraction From Scanned Documents Using A Multimodal Approach	[link]
NeurIPS	2019	BERTgrid: Contextualized Embedding for 2D Document Representation and Understanding	[link]
EMNLP	2018	Chargrid: Towards Understanding 2D Documents	[link]

End-to-end

Pub.	Year	Title	Links
ICDAR	2023	Visual Information Extraction in the Wild: Practical Dataset and End-to-end Solution	[link]
ICML	2023	Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding	[link]
ECCV	2022	OCR-free Document Understanding Transformer	[link]
Arxiv	2022	TRIE++: Towards End-to-End Information Extraction from Visually Rich Documents	[link]
ICCV	2021	DocFormer: End-to-End Transformer for Document Understanding	[link]
ACM-MM	2020	TRIE: End-to-End Text Reading and Information Extraction for Document Understanding	[link]
ICDAR	2019	EATEN: Entity-aware Attention for Single Shot Visual Text Extraction	[link]

Others

Pub.	Year	Title	Links
ICDAR	2023	Information Extraction from Documents: Question Answering vs Token Classification in real-world setups	[link]

Reference
