Giter Site home page Giter Site logo

Comments (19)

kwcckw avatar kwcckw commented on May 30, 2024 1

I'm thinking to use augraphy to further improve the work in old document restoration, since those documents are quite rare and it would be helpful if we are able to produce some synthetic samples to further aid in their model training process.

Example of old document restoration:
https://arxiv.org/pdf/2001.08742.pdf

Do you guys think this is a good idea?

I tried to create some old documents by using different fonts and papers. Here are the current results:

img5
img6
img7
img8
img9
img1
img2
img3
img4

Is this real enough? Or do let me know if you think there's any area of improvement.

One of the constraint right now is the page is in perfect condition, it would be nice if we can create some teared paper effect.

from augraphy.

proofconstruction avatar proofconstruction commented on May 30, 2024 1

I think this should be a "pipeline level" demo where we apply several augmentations to produce a more relatable real life application. So eventually we can have several predefined pipelines (sort of utility class), each with different purpose, for example photocopy machine effect, old document effect, and etc (can't think of more now).

I like this idea, and I think we can use the pipelines for the Reproducing Archetypes work as the predefined ones. Once we get those working well, we can add them to augraphy.default.prebuilt or something, so users can select one of those or the default_augraphy_pipeline containing all augmentations.

from augraphy.

kwcckw avatar kwcckw commented on May 30, 2024

I'm thinking to use augraphy to further improve the work in old document restoration, since those documents are quite rare and it would be helpful if we are able to produce some synthetic samples to further aid in their model training process.

Example of old document restoration:
https://arxiv.org/pdf/2001.08742.pdf

Do you guys think this is a good idea?

from augraphy.

kwcckw avatar kwcckw commented on May 30, 2024

I added a demo here using google colab:
https://colab.research.google.com/drive/1g0hDeneIGcy0sYtIzZSnNmZfesFdu5Sn?usp=sharing

from augraphy.

jboarman avatar jboarman commented on May 30, 2024

Thanks @kwcckw

Can you update that notebook to use gdown?

Once you've compiled all the files into a Google Drive folder, then you will need to file IDs for each file: https://clay-atlas.com/us/blog/2021/07/01/python-en-gdown-package-download-file-google-drive/

If we are sharing more than a few files, then it might be best to ZIP the files together first so that it's a single file to be downloaded and unpacked within the notebook.

from augraphy.

kwcckw avatar kwcckw commented on May 30, 2024

Thanks @kwcckw

Can you update that notebook to use gdown?

Once you've compiled all the files into a Google Drive folder, then you will need to file IDs for each file: https://clay-atlas.com/us/blog/2021/07/01/python-en-gdown-package-download-file-google-drive/

If we are sharing more than a few files, then it might be best to ZIP the files together first so that it's a single file to be downloaded and unpacked within the notebook.

Thanks, i will look into this and apply the changes accordingly.

from augraphy.

kwcckw avatar kwcckw commented on May 30, 2024

I added a new example with gdown, now user will be able to download files and run the same thing from their end. Is there anyone can try this out and let me know if this is run-able?

Link: https://colab.research.google.com/drive/1IDkfmKKajMb2dlchW-rWN6kmrnGk2TWX?usp=sharing

i just tried with another account and with a new copy of notebook, so far there's isn't any issue yet and please let me know if you guys facing any problem to run the notebook.

image

from augraphy.

jboarman avatar jboarman commented on May 30, 2024

Can you remove all dependency on connecting to a personal Google Drive (outside of using gdown I mean)?

For example, we should be able to just use the local folder on the Colab server instead mounting the “My Drive” reference.

from augraphy.

kwcckw avatar kwcckw commented on May 30, 2024

Can you remove all dependency on connecting to a personal Google Drive (outside of using gdown I mean)?

For example, we should be able to just use the local folder on the Colab server instead mounting the “My Drive” reference.

Alright, at this point i'm not aware of such function yet, let me check again.

from augraphy.

kwcckw avatar kwcckw commented on May 30, 2024

Here's the updated notebook which is able to run without connecting to any gdrive, but you still need to login to your google account to run it:
https://colab.research.google.com/drive/1MXNhPQ_37aiPmAOkfKNCDhKnlaPc1zDt?usp=sharing

Also if the code is updated in the pip installation , then we can skip the section to download the zip file too.

from augraphy.

kwcckw avatar kwcckw commented on May 30, 2024

I added new notebook on the example of pipeline for photocopy paper, and I've updated the previous example on old document to use the code directly from the repo. So right now there's 2 examples from my end:

  1. Photocopy paper:
    https://colab.research.google.com/drive/1hLHyyvNi-cRRDOcqtPnZPCw4j25oZMDG?usp=sharing

  2. Old document:
    https://colab.research.google.com/drive/1MXNhPQ_37aiPmAOkfKNCDhKnlaPc1zDt?usp=sharing

from augraphy.

tbrettallen avatar tbrettallen commented on May 30, 2024
  1. Photocopy paper:
    https://colab.research.google.com/drive/1hLHyyvNi-cRRDOcqtPnZPCw4j25oZMDG?usp=sharing

I'm having an issue running this notebook. I've attached the issue I've run into in the image below.
PencilScribblesSubtleNoise

from augraphy.

kwcckw avatar kwcckw commented on May 30, 2024
  1. Photocopy paper:
    https://colab.research.google.com/drive/1hLHyyvNi-cRRDOcqtPnZPCw4j25oZMDG?usp=sharing

I'm having an issue running this notebook. I've attached the issue I've run into in the image below.
PencilScribblesSubtleNoise

Sorry, it should be caused by a typo in the init file. Now i updated the code and tested it. It should be having no issue to run now.

from augraphy.

proofconstruction avatar proofconstruction commented on May 30, 2024

I get a type error running this.
2021-08-27-160614_3840x2160_scrot

from augraphy.

kwcckw avatar kwcckw commented on May 30, 2024

I get a type error running this.
2021-08-27-160614_3840x2160_scrot

Sorry, this notebook is still pending update, since it is using the previous badphotocopy code. I will post here again when it is updated.

from augraphy.

kwcckw avatar kwcckw commented on May 30, 2024

Sorry, this notebook is still pending update, since it is using the previous badphotocopy code. I will post here again when it is updated.

@proofconstruction I just updated the notebook, please try again and let me know if you facing any issue to run the notebook.

from augraphy.

proofconstruction avatar proofconstruction commented on May 30, 2024

These look really good!

Is the "Photocopy machine" code now part of BadPhotoCopy?

The "Synthetic old documents" code can probably become a new augmentation, or maybe a utility class. We should add a new Issue to discuss this.

from augraphy.

kwcckw avatar kwcckw commented on May 30, 2024

These look really good!

Is the "Photocopy machine" code now part of BadPhotoCopy?

The "Synthetic old documents" code can probably become a new augmentation, or maybe a utility class. We should add a new Issue to discuss this.

I think this should be a "pipeline level" demo where we apply several augmentations to produce a more relatable real life application. So eventually we can have several predefined pipelines (sort of utility class), each with different purpose, for example photocopy machine effect, old document effect, and etc (can't think of more now).

from augraphy.

proofconstruction avatar proofconstruction commented on May 30, 2024

Closed with PR #154

from augraphy.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.