andreagemelli / doc2graph

Doc2Graph transforms documents into graphs and exploits a GNN to solve several tasks.

Home Page: https://link.springer.com/chapter/10.1007/978-3-031-25069-9_22

License: MIT License

Python 26.85% Jupyter Notebook 73.15%
deep-learning document-understanding geometric-deep-learning gnn key-information-extraction layout-analysis nlp pytorch table-detection

doc2graph's Introduction

💫 About Me:

Hi 👋,
I am Andrea, a passionate person who loves to study and discover.
I started a Ph.D. on 🌱 Graph Neural Networks and Document Understanding in 2020, based at Università degli Studi di Firenze, Florence. I also spent one year as a visiting researcher at the Computer Vision Center (CVC) in Barcelona, as part of a joint program between our labs. At the start of 2023 I was named an IAPR International Scholar, winning an IAPR Research Scholarship.

My GitHub Projects

On my GitHub you can find several repos / projects that I have been working on over the last five years.
I used to work on Machine Learning 🧠, Games 🕹️ and Software 💾

During Ph.D.

  • 🧠 Doc2Graph: transforms your documents into graphs and exploits a GNN to solve several tasks. (🔗 repo | 📄 paper)
  • 🧠 GNN-TableExtraction: a graph-based technique to extract tables from scientific papers. (🔗 repo | 📄 paper)
  • 🧠 DA-GraphTab: data augmentation for graph structures (🔗 repo | 📄 paper)
  • 🧠 cte-dataset: a dataset for Contextualized Table Extraction (🔗 repo | 📄 paper)
  • 🕹️ guessmylanguage: a team-based game to annotate handwriting (🔗 repo)

During MCS

  • 🧠 Action-recognition-by-2D-skeleton-analysis: a computer vision project to recognize actions in a scene by analyzing the movement of people's skeletons. (🔗 repo | 📄 report)
  • 🧠 Flying-Objects-Detection-and-Recognition: a computer vision project aiming at detecting UAVs in airport spaces. (🔗 repo | 📄 report)
  • 🧠 Floorplan-Text-Detection-and-Recognition: automatic detection and recognition of text in floorplan images, e.g. room names (🔗 repo | 📄 report)
  • 💾 lapis: a tool to organize meetings between professors and students (🔗 repo | 📄 report)
  • 💾 OnlineShopSimulator: a TDD project, simulating an online shop (🔗 repo)
  • 🕹️ Escape-Room-VR: a 3D escape room game developed using an Oculus Rift (🔗 repo)
  • 🕹️ gameoflife: the famous game developed using Kivy (🔗 repo)

During free time

  • 🕹️ FightGPT: an RPG game developed using ChatGPT (🔗 repo)
  • 💾 andreagemelli.github.io: my portfolio website (🔗 repo)

📬 Keep in touch!

You can write to me at ✉️ [email protected]
Get to know more about me on my Web Page

doc2graph's People

Contributors

andreagemelli, enricivi

doc2graph's Issues

Training on DocVQA?

Super interesting project you have here!

I'm interested in training for DocVQA. I've used models like LiLT and LayoutLMv1/2/3 in the past, but graph approaches like this are pretty new to me.

Traditionally, DocVQA models are set up to predict the start/end positions of the answer within the input text. Any tips on setting this up with Doc2Graph?

Passing linking for test images

Hi,

At prediction time, are we passing in the linking of the graph that we are supposed to predict?
In graph builder.py we create the linking for graph nodes during training, and the same is done for testing as well.

How to set tg batch_size

When I train FUNSD e2e, an error appears on the relu function line:

RuntimeError: CUDA out of memory. Tried to allocate 4.21 GiB (GPU 0; 11.74 GiB total capacity; 4.55 GiB already allocated; 3.44 GiB free; 4.56 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF.

batch_size: 16
epochs: 10000
lr: 1e-3
weight_decay: 1e-4
val_size: 0.1
optimizer: Adam # or AdamW, SGD - NOT YET IMPLEMENTED
scheduler: ReduceLROnPlateau # CosineAnnealingLR, None - NOT YET IMPLEMENTED
stopper_metric: acc # loss or acc
seed: 42
The batch_size setting has no effect:
train_graphs = [data.graphs[i] for i in train_index]
tg = dgl.batch(train_graphs)
tg = tg.int().to(device)

If I don't use k-fold, train_graphs has 149 graphs, so tg's batch size is 149, and then CUDA runs out of memory. I have tested both fully and knn edge types, and both fail. How do I set tg's batch size to avoid the RuntimeError? The batch_size in train.yaml above seems to be ignored.

My GPU is a 3060 with 12 GB. Thank you!
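
One possible way around this, sketched under assumptions: instead of merging all 149 training graphs into a single `tg`, the list could be split into chunks of `batch_size` graphs, with each chunk batched separately. The helper name `iter_minibatches` is hypothetical and not part of the repository; plain integers stand in for graphs so the snippet runs without DGL.

```python
import random
from typing import Iterator, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def iter_minibatches(items: Sequence[T], batch_size: int,
                     shuffle: bool = False,
                     seed: Optional[int] = None) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    order = list(range(len(items)))
    if shuffle:
        # A seeded local RNG keeps the shuffling reproducible.
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [items[i] for i in order[start:start + batch_size]]


# Stand-in for the 149 training graphs mentioned above (assumption: any
# sequence works, so integers replace real DGL graphs here).
train_graphs = list(range(149))
batches = list(iter_minibatches(train_graphs, batch_size=16))
print(len(batches), len(batches[-1]))  # prints: 10 5
```

In a training loop, each chunk would then go through `dgl.batch(chunk)` and be moved to the device, with one optimizer step per chunk, so peak GPU memory would depend on `batch_size` rather than on the size of the whole training set.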

Environment installation problem

Hi! Thank you for the repo!
I have a problem installing the dependencies. Could you help me solve it?

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
dglgo 0.0.2 requires pydantic>=1.9.0, but you have pydantic 1.8.2 which is incompatible.

And when I run the scripts:

Traceback (most recent call last):
  File "/home/addudkin/doc2graph/doc2graph/src/main.py", line 4, in <module>
    from src.inference import inference
  File "/home/addudkin/doc2graph/doc2graph/src/inference.py", line 7, in <module>
    from src.data.feature_builder import FeatureBuilder
  File "/home/addudkin/doc2graph/doc2graph/src/data/feature_builder.py", line 3, in <module>
    import spacy
  File "/home/addudkin/miniconda3/envs/doc2graph/lib/python3.9/site-packages/spacy/__init__.py", line 14, in <module>
    from . import pipeline  # noqa: F401
  File "/home/addudkin/miniconda3/envs/doc2graph/lib/python3.9/site-packages/spacy/pipeline/__init__.py", line 1, in <module>
    from .attributeruler import AttributeRuler
  File "/home/addudkin/miniconda3/envs/doc2graph/lib/python3.9/site-packages/spacy/pipeline/attributeruler.py", line 6, in <module>
    from .pipe import Pipe
  File "spacy/pipeline/pipe.pyx", line 8, in init spacy.pipeline.pipe
  File "/home/addudkin/miniconda3/envs/doc2graph/lib/python3.9/site-packages/spacy/training/__init__.py", line 11, in <module>
    from .callbacks import create_copy_from_base_model  # noqa: F401
  File "/home/addudkin/miniconda3/envs/doc2graph/lib/python3.9/site-packages/spacy/training/callbacks.py", line 3, in <module>
    from ..language import Language
  File "/home/addudkin/miniconda3/envs/doc2graph/lib/python3.9/site-packages/spacy/language.py", line 25, in <module>
    from .training.initialize import init_vocab, init_tok2vec
  File "/home/addudkin/miniconda3/envs/doc2graph/lib/python3.9/site-packages/spacy/training/initialize.py", line 14, in <module>
    from .pretrain import get_tok2vec_ref
  File "/home/addudkin/miniconda3/envs/doc2graph/lib/python3.9/site-packages/spacy/training/pretrain.py", line 16, in <module>
    from ..schemas import ConfigSchemaPretrain
  File "/home/addudkin/miniconda3/envs/doc2graph/lib/python3.9/site-packages/spacy/schemas.py", line 216, in <module>
    class TokenPattern(BaseModel):
  File "pydantic/main.py", line 299, in pydantic.main.ModelMetaclass.__new__
  File "pydantic/fields.py", line 411, in pydantic.fields.ModelField.infer
  File "pydantic/fields.py", line 342, in pydantic.fields.ModelField.__init__
  File "pydantic/fields.py", line 451, in pydantic.fields.ModelField.prepare
  File "pydantic/fields.py", line 545, in pydantic.fields.ModelField._type_analysis
  File "pydantic/fields.py", line 550, in pydantic.fields.ModelField._type_analysis
  File "/home/addudkin/miniconda3/envs/doc2graph/lib/python3.9/typing.py", line 852, in __subclasscheck__
    return issubclass(cls, self.__origin__)
TypeError: issubclass() arg 1 must be a class

My packages:
my_requirements.txt

get RuntimeError: Error(s) in loading state_dict for E2E: when doing test on model from training

Here's my training command; I use the FUNSD dataset:
%run 'main.py' --add-geom --add-embs --add-hist --add-visual --add-eweights --src-data 'FUNSD' --gpu 0 --edge-type 'fully' --node-granularity 'gt' --model 'e2e' --weights *.pt

Then I run the best model using this:
%run 'main.py' -addG -addT -addE -addV --gpu 0 --test --weights e2e-20230213-0530.pt

Then I got this error:
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1497, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for E2E:
Unexpected key(s) in state_dict: "projector.modalities.3.0.weight", "projector.modalities.3.0.bias", "projector.modalities.3.1.weight", "projector.modalities.3.1.bias".
size mismatch for projector.modalities.1.0.weight: copying a param with shape torch.Size([300, 4]) from checkpoint, the shape in current model is torch.Size([300, 300]).
size mismatch for projector.modalities.2.0.weight: copying a param with shape torch.Size([300, 300]) from checkpoint, the shape in current model is torch.Size([300, 1448]).
size mismatch for message_passing.linear.weight: copying a param with shape torch.Size([1200, 2400]) from checkpoint, the shape in current model is torch.Size([900, 1800]).
size mismatch for message_passing.linear.bias: copying a param with shape torch.Size([1200]) from checkpoint, the shape in current model is torch.Size([900]).
size mismatch for message_passing.lynorm.weight: copying a param with shape torch.Size([1200]) from checkpoint, the shape in current model is torch.Size([900]).
size mismatch for message_passing.lynorm.bias: copying a param with shape torch.Size([1200]) from checkpoint, the shape in current model is torch.Size([900]).
size mismatch for edge_pred.W1.weight: copying a param with shape torch.Size([300, 2414]) from checkpoint, the shape in current model is torch.Size([300, 1814]).
size mismatch for node_pred.0.weight: copying a param with shape torch.Size([4, 1200]) from checkpoint, the shape in current model is torch.Size([4, 900]).
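
A hedged observation: the checkpoint contains four projector modalities while the model built at test time has three, and the test command appears to enable fewer feature flags than the training command (no counterpart of `--add-hist` is visible). A small diagnostic like the sketch below can make such differences easier to read than the raw error. The function name `diff_shapes` is hypothetical; with real PyTorch objects the input dicts could be built as `{k: tuple(v.shape) for k, v in state_dict.items()}`.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def diff_shapes(checkpoint: Dict[str, Shape],
                model: Dict[str, Shape]) -> Dict[str, List[str]]:
    """Compare parameter names and shapes of a checkpoint and a model."""
    report: Dict[str, List[str]] = {"unexpected": [], "missing": [], "mismatch": []}
    for name, shape in checkpoint.items():
        if name not in model:
            report["unexpected"].append(name)
        elif model[name] != shape:
            report["mismatch"].append(
                f"{name}: checkpoint {shape} vs model {model[name]}")
    report["missing"] = [name for name in model if name not in checkpoint]
    return report


# Two mismatching entries copied from the error message above.
checkpoint_shapes = {
    "projector.modalities.1.0.weight": (300, 4),
    "node_pred.0.weight": (4, 1200),
    # Shape assumed for illustration; the log only lists this key as unexpected.
    "projector.modalities.3.0.bias": (300,),
}
model_shapes = {
    "projector.modalities.1.0.weight": (300, 300),
    "node_pred.0.weight": (4, 900),
}

report = diff_shapes(checkpoint_shapes, model_shapes)
for kind, entries in report.items():
    print(kind, entries)
```

If the report shows extra or shifted modalities, rebuilding the model with the same feature flags that were used for training is the first thing to try.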

Pydantic problems with dependency mismatches

I get some conflicts where I can't run the main.py script.


ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
spacy 3.3.0 requires pydantic!=1.8,!=1.8.1,<1.9.0,>=1.7.4, but you have pydantic 2.5.3 which is incompatible.
thinc 8.0.17 requires pydantic!=1.8,!=1.8.1,<1.9.0,>=1.7.4, but you have pydantic 2.5.3 which is incompatible.
doc2graph 0.2.0b0.post7+git.99ac9e69 requires pydantic==1.8.2, but you have pydantic 2.5.3 which is incompatible.

Then, if I install pydantic 1.8.2, I get:


ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
dglgo 0.0.2 requires pydantic>=1.9.0, but you have pydantic 1.8.2 which is incompatible.

and back and forth.

[edit]
Here is what I get when I run main.py for set-up.

Traceback (most recent call last):
  File "/Users/z1ggy/projects/forma/doc2graph/src/main.py", line 4, in <module>
    from src.inference import inference
  File "/Users/z1ggy/projects/forma/doc2graph/src/inference.py", line 7, in <module>
    from src.data.feature_builder import FeatureBuilder
  File "/Users/z1ggy/projects/forma/doc2graph/src/data/feature_builder.py", line 3, in <module>
    import spacy
  File "/Users/z1ggy/anaconda3/envs/doc2graph/lib/python3.9/site-packages/spacy/__init__.py", line 14, in <module>
    from . import pipeline  # noqa: F401
  File "/Users/z1ggy/anaconda3/envs/doc2graph/lib/python3.9/site-packages/spacy/pipeline/__init__.py", line 1, in <module>
    from .attributeruler import AttributeRuler
  File "/Users/z1ggy/anaconda3/envs/doc2graph/lib/python3.9/site-packages/spacy/pipeline/attributeruler.py", line 6, in <module>
    from .pipe import Pipe
  File "spacy/pipeline/pipe.pyx", line 8, in init spacy.pipeline.pipe
  File "/Users/z1ggy/anaconda3/envs/doc2graph/lib/python3.9/site-packages/spacy/training/__init__.py", line 11, in <module>
    from .callbacks import create_copy_from_base_model  # noqa: F401
  File "/Users/z1ggy/anaconda3/envs/doc2graph/lib/python3.9/site-packages/spacy/training/callbacks.py", line 3, in <module>
    from ..language import Language
  File "/Users/z1ggy/anaconda3/envs/doc2graph/lib/python3.9/site-packages/spacy/language.py", line 25, in <module>
    from .training.initialize import init_vocab, init_tok2vec
  File "/Users/z1ggy/anaconda3/envs/doc2graph/lib/python3.9/site-packages/spacy/training/initialize.py", line 14, in <module>
    from .pretrain import get_tok2vec_ref
  File "/Users/z1ggy/anaconda3/envs/doc2graph/lib/python3.9/site-packages/spacy/training/pretrain.py", line 16, in <module>
    from ..schemas import ConfigSchemaPretrain
  File "/Users/z1ggy/anaconda3/envs/doc2graph/lib/python3.9/site-packages/spacy/schemas.py", line 216, in <module>
    class TokenPattern(BaseModel):
  File "/Users/z1ggy/anaconda3/envs/doc2graph/lib/python3.9/site-packages/pydantic/main.py", line 299, in __new__
    fields[ann_name] = ModelField.infer(
  File "/Users/z1ggy/anaconda3/envs/doc2graph/lib/python3.9/site-packages/pydantic/fields.py", line 411, in infer
    return cls(
  File "/Users/z1ggy/anaconda3/envs/doc2graph/lib/python3.9/site-packages/pydantic/fields.py", line 342, in __init__
    self.prepare()
  File "/Users/z1ggy/anaconda3/envs/doc2graph/lib/python3.9/site-packages/pydantic/fields.py", line 451, in prepare
    self._type_analysis()
  File "/Users/z1ggy/anaconda3/envs/doc2graph/lib/python3.9/site-packages/pydantic/fields.py", line 545, in _type_analysis
    self._type_analysis()
  File "/Users/z1ggy/anaconda3/envs/doc2graph/lib/python3.9/site-packages/pydantic/fields.py", line 550, in _type_analysis
    if issubclass(origin, Tuple):  # type: ignore
  File "/Users/z1ggy/anaconda3/envs/doc2graph/lib/python3.9/typing.py", line 852, in __subclasscheck__
    return issubclass(cls, self.__origin__)
TypeError: issubclass() arg 1 must be a class

Any pro tips here?
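
To see why the installs go back and forth, the pins quoted in the pip messages above can be checked against candidate pydantic versions. This is only a sketch: the helper names are hypothetical, the parser handles plain numeric versions only, and the constraints are copied from the error output rather than from the packages' metadata.

```python
from typing import Dict, Tuple

Version = Tuple[int, ...]


def parse(version: str) -> Version:
    """Turn '1.8.2' into (1, 8, 2), padding short versions such as '1.8'."""
    parts = [int(part) for part in version.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def spacy_thinc_ok(v: Version) -> bool:
    # spacy 3.3.0 and thinc 8.0.17: pydantic!=1.8,!=1.8.1,<1.9.0,>=1.7.4
    excluded = (parse("1.8"), parse("1.8.1"))
    return parse("1.7.4") <= v < parse("1.9.0") and v not in excluded


def doc2graph_ok(v: Version) -> bool:
    # doc2graph: pydantic==1.8.2
    return v == parse("1.8.2")


def dglgo_ok(v: Version) -> bool:
    # dglgo 0.0.2: pydantic>=1.9.0
    return v >= parse("1.9.0")


def check(version: str) -> Dict[str, bool]:
    """Report which of the quoted constraints a pydantic version satisfies."""
    v = parse(version)
    return {
        "spacy/thinc": spacy_thinc_ok(v),
        "doc2graph": doc2graph_ok(v),
        "dglgo": dglgo_ok(v),
    }


for candidate in ("1.8.2", "1.9.0", "2.5.3"):
    print(candidate, check(candidate))
```

Because `<1.9.0` and `>=1.9.0` cannot both hold, no single pydantic version satisfies all the packages at once. Whether `dglgo` is actually needed at runtime is not stated in this thread.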

Anyone successfully had this run on M1 Mac CPU?

As the title says.

Curious what I need to change as I keep running into run-time errors:

RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
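
The error message itself points at `map_location`. A minimal sketch of picking the device string once and reusing it everywhere is shown below; `select_device` is a hypothetical helper, and in real code `cuda_available` would come from `torch.cuda.is_available()`.

```python
def select_device(cuda_available: bool, gpu_index: int = 0) -> str:
    """Return a device string, falling back to CPU when CUDA is unavailable."""
    if cuda_available and gpu_index >= 0:
        return f"cuda:{gpu_index}"
    return "cpu"


# On a CPU-only machine (as in the error above) this resolves to "cpu".
device = select_device(cuda_available=False)
print(device)  # prints: cpu
```

The resulting string could be passed as `map_location=device` to `torch.load` and to `.to(device)` calls, in place of any hard-coded `'cuda:0'`.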

RuntimeError: weight tensor should be defined either for all 4 classes or no classes but got weight tensor of shape: [3]

When I set train_batch_size=1, I got this error.
It happens in

 n_loss = compute_crossentropy_loss(n_scores.to(device), tg.ndata['label'].to(device)) 
==>  
def compute_crossentropy_loss(scores: torch.Tensor, labels: torch.Tensor):
    w = class_weight.compute_class_weight(class_weight='balanced', classes=np.unique(labels.cpu().numpy()),
                                          y=labels.cpu().numpy())
    return torch.nn.CrossEntropyLoss(weight=torch.tensor(w, dtype=torch.float32).to('cuda:0'))(scores, labels)

RuntimeError: weight tensor should be defined either for all 4 classes or no classes but got weight tensor of shape: [3]

I think it may be because there are only three labels in one image that caused this error. Could you tell me the reason and how to fix it? Thank you so much.
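
That reading is consistent with the code shown: `np.unique(labels)` only returns the classes present in the batch, so the weight vector has three entries while the loss expects four. One possible workaround, sketched here in plain Python, is to always produce one weight per class. The function name `balanced_weights` and the choice of 0.0 for absent classes are assumptions of this sketch, not the repository's code.

```python
from collections import Counter
from typing import List, Sequence


def balanced_weights(labels: Sequence[int], num_classes: int) -> List[float]:
    """Balanced class weights with one entry per class, even if absent.

    Present classes get n_samples / (n_present_classes * count), the formula
    scikit-learn's 'balanced' mode applies to the classes it is given.
    Absent classes get 0.0 (an assumption of this sketch).
    """
    counts = Counter(labels)
    if any(c < 0 or c >= num_classes for c in counts):
        raise ValueError("label outside [0, num_classes)")
    n_samples = len(labels)
    n_present = len(counts)
    return [
        n_samples / (n_present * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


# A batch in which class 2 never occurs, as in the single-image case above.
weights = balanced_weights([0, 0, 1, 3], num_classes=4)
print(len(weights), weights)
```

The list could then be converted with `torch.tensor(weights, dtype=torch.float32)` and passed as `weight=` to `torch.nn.CrossEntropyLoss`. The value given to an absent class does not change the loss for that batch, since no target selects it.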

e2e-funsd-best.pt Error(s) in loading state_dict

`
import torch
from src.models.graphs import SetModel
from src.paths import CHECKPOINTS

sm = SetModel(name='e2e', device=device)
model = sm.get_model(4, 2, chunks, False) # 4 and 2 refers to nodes and edge classes, check paper for details!
model.load_state_dict(torch.load(CHECKPOINTS / 'e2e-funsd-best.pt', map_location=torch.device('cpu'))) # load pretrained model
model.eval() # set the model for inference only
`

MODEL

-> Using E2E
-> Total params: 7674914
-> Device: False


RuntimeError Traceback (most recent call last)
Cell In[19], line 7
5 sm = SetModel(name='e2e', device=device)
6 model = sm.get_model(4, 2, chunks, False) # 4 and 2 refers to nodes and edge classes, check paper for details!
----> 7 model.load_state_dict(torch.load(CHECKPOINTS / 'e2e-funsd-best.pt', map_location=torch.device('cpu'))) # load pretrained model
8 model.eval() # set the model for inference only

File /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/torch/nn/modules/module.py:1671, in Module.load_state_dict(self, state_dict, strict)
1666 error_msgs.insert(
1667 0, 'Missing key(s) in state_dict: {}. '.format(
1668 ', '.join('"{}"'.format(k) for k in missing_keys)))
1670 if len(error_msgs) > 0:
-> 1671 raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
1672 self.__class__.__name__, "\n\t".join(error_msgs)))
1673 return _IncompatibleKeys(missing_keys, unexpected_keys)

RuntimeError: Error(s) in loading state_dict for E2E:
Missing key(s) in state_dict: "projector.modalities.3.0.weight", "projector.modalities.3.0.bias", "projector.modalities.3.1.weight", "projector.modalities.3.1.bias", "projector.modalities.4.0.weight", "projector.modalities.4.0.bias", "projector.modalities.4.1.weight", "projector.modalities.4.1.bias", "projector.modalities.5.0.weight", "projector.modalities.5.0.bias", "projector.modalities.5.1.weight", "projector.modalities.5.1.bias".
size mismatch for projector.modalities.0.0.weight: copying a param with shape torch.Size([300, 4]) from checkpoint, the shape in current model is torch.Size([300, 0]).
size mismatch for projector.modalities.1.0.weight: copying a param with shape torch.Size([300, 300]) from checkpoint, the shape in current model is torch.Size([300, 0]).
size mismatch for projector.modalities.2.0.weight: copying a param with shape torch.Size([300, 1448]) from checkpoint, the shape in current model is torch.Size([300, 0]).
size mismatch for message_passing.linear.weight: copying a param with shape torch.Size([900, 1800]) from checkpoint, the shape in current model is torch.Size([1800, 3600]).
size mismatch for message_passing.linear.bias: copying a param with shape torch.Size([900]) from checkpoint, the shape in current model is torch.Size([1800]).
size mismatch for message_passing.lynorm.weight: copying a param with shape torch.Size([900]) from checkpoint, the shape in current model is torch.Size([1800]).
size mismatch for message_passing.lynorm.bias: copying a param with shape torch.Size([900]) from checkpoint, the shape in current model is torch.Size([1800]).
size mismatch for edge_pred.W1.weight: copying a param with shape torch.Size([300, 1814]) from checkpoint, the shape in current model is torch.Size([300, 3614]).
size mismatch for node_pred.0.weight: copying a param with shape torch.Size([4, 900]) from checkpoint, the shape in current model is torch.Size([4, 1800]).

Don't know where the output/prediction is saved

python src/main.py -addG -addT -addE -addV --gpu 0 --test --weights e2e-funsd-best.pt

Using this command I got the BEST and AVERAGE results for the model, but I don't know where the output/prediction is saved.
