VrDU-Doctor 🏥💊

We research why and how multimodal VrDU models arrive at the answers they give.

Introduction

In Visually-rich Document Understanding (VrDU), a Deep Learning (DL) model synthesizes or selects information from documents (images with text) to answer a question or classify a chunk of text. VrDU tasks are multimodal, i.e., models use information from the text, the image, or even the document layout to solve them.

Humans have worked with documents in many formats since the beginning of history (inscriptions, cards, books, etc.). Non-digital documents are still part of everyday life: we receive a medical record when visiting the doctor, and we apply to university with a transcript of records. At the same time, our lives are increasingly digital and much of our relevant data is stored digitally, so we constantly have to transfer data from the analog to the digital domain.

Examples 🔎

Typical applications include:

  • 🎫 Purchase receipts ➡️ Splitwise/Tricount 💵📲
  • 📑 Transcript of records ➡️ University database 💯 📊
  • 📜 Your rental contract ➡️ Tax Agency 👮💰

Hypothesis 💭

Recent contributions have shown the potential of models that can automate data gathering and bridge the gap between the analog and digital domains. Nevertheless, they have paid little attention to the following questions:

  • How much does each modality contribute to the result?
  • Could we drop any of the modalities?
  • Is every piece of data equally important to solve a well-defined task?
  • What is the ideal dataset size to fine-tune a VrDU model?
  • Could we achieve similar results by improving the quality and reducing the amount of pre-training data?
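
One common way to probe the first two questions is a modality ablation: the same sample is fed to the model with one modality neutralized, and the metric is compared against the full input. The snippet below is only a minimal sketch of that idea under stated assumptions. `Sample`, `ablate`, the `[PAD]` placeholder, and the example values are hypothetical names chosen for illustration; they are not taken from this project's code.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)


@dataclass(frozen=True)
class Sample:
    """Hypothetical container for one document: text, layout, and image."""
    words: List[str]
    boxes: List[Box]
    image: Optional[bytes]  # raw image bytes; None means "no visual input"


def ablate(sample: Sample, modality: str, pad_word: str = "[PAD]") -> Sample:
    """Return a copy of `sample` with one modality neutralized."""
    if modality == "text":
        # Keep the number of tokens, but erase what they say.
        return replace(sample, words=[pad_word] * len(sample.words))
    if modality == "layout":
        # Collapse every box to the origin so position carries no signal.
        return replace(sample, boxes=[(0, 0, 0, 0)] * len(sample.boxes))
    if modality == "image":
        return replace(sample, image=None)
    raise ValueError(f"unknown modality: {modality!r}")


# Illustrative values only (assumption, not real dataset content).
sample = Sample(
    words=["Grade:", "9.5"],
    boxes=[(100, 200, 180, 220), (190, 200, 230, 220)],
    image=b"\x89PNG",
)
no_layout = ablate(sample, "layout")
```

Evaluating a model on `sample` and on each ablated copy gives a rough estimate of how much that modality matters for the task; the original sample is left unchanged because the dataclass is frozen.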

Models

We are currently working with one of the state-of-the-art (SOTA) model families: LayoutLM. We expect to broaden our scope soon to:

  • LayoutLMv2
  • LayoutXLM
  • LayoutLMv3
  • Donut
  • UDOP
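
LayoutLM-style models receive the layout as one bounding box per token, with coordinates rescaled to a 0-1000 grid so that pages of different sizes are comparable. A minimal sketch of that rescaling follows; the function name `normalize_box` and the example page size are assumptions made for illustration.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


def normalize_box(box: Box, page_width: int, page_height: int) -> Box:
    """Rescale a pixel-space box to the 0-1000 grid (integer arithmetic)."""
    x0, y0, x1, y1 = box
    return (
        1000 * x0 // page_width,
        1000 * y0 // page_height,
        1000 * x1 // page_width,
        1000 * y1 // page_height,
    )


# Hypothetical word box on a 1500 x 3000 pixel page.
box = normalize_box((150, 300, 450, 330), page_width=1500, page_height=3000)
print(box)  # (100, 100, 300, 110)
```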

Dataset

We use the OOL Dataset (A Cool Tool for School Fool), a synthetic multimodal dataset (Image + Text + Layout) crafted explicitly for the Visually-rich Document Understanding task. It contains 33k fully labeled samples of student records in English and Spanish. You can find more about the OOL Dataset here:

Results

Under development 🛠️

Team

We are researchers at Comillas Pontifical University:

  • Ignacio de Rodrigo @nachoDRT: PhD Student. Benchmark Design, Software Development, and Data Analysis.
  • Alberto Sánchez @ascuadrado: Research Assistant. Benchmark Design and Data Analysis.

Citation

If you find our research interesting, please cite our work. 📃✒️
