Awesome Key Information Extraction

A curated list of papers about key information extraction.

Papers with Code links are preferred.

Contributions are welcome!

Table of Contents

Datasets

| Name | Title | Links |
| --- | --- | --- |
| DUE | DUE: End-to-End Document Understanding Benchmark | [link] |
| RVL-CDIP | Evaluation of Deep Convolutional Nets for Document Image Classification and Retrieval | [link][download] |
| SROIE | ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction | [link][download] |
| FUNSD | FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents | [link][download] |
| XFUND | XFUND: A Multilingual Form Understanding Benchmark | [link] |
| CORD | CORD: A Consolidated Receipt Dataset for Post-OCR Parsing | [link] |
| EPHOIE | Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution | [link] |
| EATEN | EATEN: Entity-aware Attention for Single Shot Visual Text Extraction | [link] |
| Train Ticket | PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks | [link][download] |
| POIE | Visual Information Extraction in the Wild: Practical Dataset and End-to-end Solution | [link][download] |

Survey

| Year | Title | Links |
| --- | --- | --- |
| 2023 | On the Hidden Mystery of OCR in Large Multimodal Models | [link] |
| 2021 | Document AI: Benchmarks, Models and Applications | [link] |

Toolkits

| Year | Title | Links |
| --- | --- | --- |
| 2022 | DavarOCR: A Toolbox for OCR and Multi-Modal Document Understanding | [paper][code] |
| 2021 | MMOCR: A Comprehensive Toolbox for Text Detection, Recognition and Understanding | [paper][code] |
| 2020 | PP-OCR: A Practical Ultra Lightweight OCR System | [paper][code] |

Models

⭐LLM-Based

| Pub. | Year | Title | Links |
| --- | --- | --- | --- |
| ICML | 2023 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | [link] |
| Arxiv | 2023 | InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | [link] |
| Arxiv | 2023 | MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models | [link] |
| Arxiv | 2023 | Visual Instruction Tuning | [link] |
| Arxiv | 2023 | Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | [link] |
| Arxiv | 2023 | mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality | [link] |
| Arxiv | 2023 | mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding | [link] |
| Arxiv | 2023 | mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration | [link] |
| Arxiv | 2023 | Otter: A Multi-Modal Model with In-Context Instruction Tuning | [link] |
| Arxiv | 2023 | UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model | [link] |
| Blog | 2023 | Fuyu-8B: A Multimodal Architecture for AI Agents | [blog][model] |

Graph-Based

| Pub. | Year | Title | Links |
| --- | --- | --- | --- |
| ICDAR | 2023 | LayoutGCN: A Lightweight Architecture for Visually Rich Document Understanding | [paper] |
| ACL-Findings | 2021 | Spatial Dependency Parsing for Semi-Structured Document Information Extraction | [link] |
| Arxiv | 2021 | Spatial Dual-Modality Graph Reasoning for Key Information Extraction | [link] |
| ICPR | 2020 | PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks | [link] |

Transformer-Based

| Pub. | Year | Title | Links |
| --- | --- | --- | --- |
| ACL | 2022 | LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding | [link] |
| ACL | 2022 | FormNet: Structural Encoding beyond Sequential Modeling in Form Document Information Extraction | [link] |
| CVPR | 2022 | XYLayoutLM: Towards Layout-Aware Multimodal Networks For Visually-Rich Document Understanding | [link] |
| Arxiv | 2022 | LoPE: Learnable Sinusoidal Positional Encoding for Improving Document Transformer Model | [link] |
| Arxiv | 2022 | LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking | [link] |
| Arxiv | 2022 | ERNIE-Layout: Layout-Knowledge Enhanced Multi-modal Pre-training for Document Understanding | [link] |
| AAAI | 2022 | BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents | [link] |
| ICDAR | 2021 | ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Information Extraction from Documents | [link][code] |
| Arxiv | 2021 | TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models | [link] |
| ACM-MM | 2021 | StrucTexT: Structured Text Understanding with Multi-Modal Transformers | [link] |
| ACL | 2021 | LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding | [link] |
| KDD | 2020 | LayoutLM: Pre-training of Text and Layout for Document Image Understanding | [link] |

Grid-Based

| Pub. | Year | Title | Links |
| --- | --- | --- | --- |
| ICDAR | 2021 | ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Information Extraction from Documents | [link] |
| ICDAR | 2021 | VisualWordGrid: Information Extraction From Scanned Documents Using A Multimodal Approach | [link] |
| NIPS | 2019 | BERTgrid: Contextualized Embedding for 2D Document Representation and Understanding | [link] |
| EMNLP | 2018 | Chargrid: Towards Understanding 2D Documents | [link] |

End-to-end

| Pub. | Year | Title | Links |
| --- | --- | --- | --- |
| ICDAR | 2023 | Visual Information Extraction in the Wild: Practical Dataset and End-to-end Solution | [link] |
| ICML | 2023 | Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding | [link] |
| ECCV | 2022 | OCR-free Document Understanding Transformer | [link] |
| Arxiv | 2022 | TRIE++: Towards End-to-End Information Extraction from Visually Rich Documents | [link] |
| ICCV | 2021 | DocFormer: End-to-End Transformer for Document Understanding | [link] |
| ACM-MM | 2020 | TRIE: End-to-End Text Reading and Information Extraction for Document Understanding | [link] |
| ICDAR | 2019 | EATEN: Entity-aware Attention for Single Shot Visual Text Extraction | [link] |

Others

| Pub. | Year | Title | Links |
| --- | --- | --- | --- |
| ICDAR | 2023 | Information Extraction from Documents: Question Answering vs Token Classification in real-world setups | [link] |
