
Incremental Reading of PDFs (org-fc issue, 4 comments, closed)

l3kn commented on June 1, 2024
Incremental Reading of PDFs

from org-fc.

Comments (4)

l3kn commented on June 1, 2024

So far I've only tried IR with websites imported using https://github.com/alphapapa/org-web-tools.
I'll check out org-noter and report back.

l3kn commented on June 1, 2024

I think a good first step would be to write:

  1. a function that extracts only the annotations from a PDF
  2. a function that extracts a list of existing annotations (page, top) from an org file,
    based on NOTER_PAGE properties.
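As a rough illustration of the second function, here is a minimal Python sketch (org-fc and org-noter are Emacs Lisp, so this only shows the logic). It assumes NOTER_PAGE holds either a bare page number or an org-noter precise location printed as "(page . top)", and it treats the output of the first function as a ready-made list of {"page", "top"} dicts. All function names are hypothetical.

```python
import re
from typing import Optional

# One NOTER_PAGE line inside a :PROPERTIES: drawer (value captured, spaces trimmed).
NOTER_PAGE_RE = re.compile(r"^[ \t]*:NOTER_PAGE:[ \t]*(\S.*?)[ \t]*$", re.MULTILINE)
# Assumed precise-location form written by org-noter: "(page . top)".
CONS_RE = re.compile(r"^\((\d+)\s*\.\s*([0-9.]+)\)$")

Location = tuple[int, Optional[float]]


def parse_noter_page(value: str) -> Optional[Location]:
    """Turn a NOTER_PAGE value into (page, top); top is None for page-only values."""
    value = value.strip()
    if value.isdigit():
        return int(value), None
    m = CONS_RE.match(value)
    if m:
        return int(m.group(1)), float(m.group(2))
    return None  # other forms (e.g. pdftools links) are out of scope here


def existing_locations(org_text: str) -> set[Location]:
    """Collect every (page, top) already recorded in an org-noter notes file."""
    found = set()
    for m in NOTER_PAGE_RE.finditer(org_text):
        loc = parse_noter_page(m.group(1))
        if loc is not None:
            found.add(loc)
    return found


def new_annotations(annotations: list[dict], org_text: str, tol: float = 1e-6) -> list[dict]:
    """Keep annotations whose (page, top) has no heading yet.

    `annotations` stands in for the output of the first function
    (annotations extracted from the PDF), e.g. [{"page": 3, "top": 0.72}].
    Page-only NOTER_PAGE values never match, since they carry no position.
    """
    seen = existing_locations(org_text)

    def recorded(a: dict) -> bool:
        return any(p == a["page"] and t is not None and abs(t - a["top"]) <= tol
                   for p, t in seen)

    return [a for a in annotations if not recorded(a)]
```

Comparing positions with a small tolerance rather than exact equality guards against the top value being rounded differently when it is written back to the file.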

One problem is that there is no way to determine whether a heading belongs to a note or an annotation once its "Highlight on page n" title has been changed.

A workaround would be adding a NOTER_TYPE: note/highlight property to each headline.

In your issue on org-noter, you mention precise locations. Do these include enough information to tell notes apart from annotations?
If not, a good implementation of this feature would require some changes on the org-noter side.

nanjigen commented on June 1, 2024

Thanks for getting back to me.

I'm using org-pdftools, which generates links in the format pdftools:~/Documents/PDF/roberts2013.pdf::3++0.7294236666666667;;annot-3-5. Each link is stored as the value of the NOTER_PAGE key in the property drawer under the heading for that extraction. Maybe we can hook into the annot part of that link? For normal notes, the same property holds an ID instead of a link.
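To hook into the annot part, the link first has to be split up. This Python sketch infers the layout pdftools:<path>::<page>++<top>;;<annot-id> from the single example above only; org-pdftools may support other forms, which this parser either treats as optional parts or rejects. The names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Layout inferred from the one example link in this thread:
#   pdftools:<path>::<page>++<top>;;<annot-id>
# "++<top>" and ";;<annot-id>" are treated as optional.
LINK_RE = re.compile(
    r"^pdftools:(?P<path>.+?)::(?P<page>\d+)"
    r"(?:\+\+(?P<top>[0-9.]+))?"
    r"(?:;;(?P<annot>[^\s\]]+))?$"
)


@dataclass(frozen=True)
class PdfToolsLink:
    path: str
    page: int
    top: Optional[float]
    annot_id: Optional[str]


def parse_pdftools_link(link: str) -> Optional[PdfToolsLink]:
    """Split a pdftools link; returns None for anything else (e.g. an ID)."""
    link = link.strip()
    if link.startswith("[["):
        # Org bracket link: keep only the target of [[target][description]].
        link = link[2:].split("]", 1)[0]
    m = LINK_RE.match(link)
    if not m:
        return None
    return PdfToolsLink(
        path=m.group("path"),
        page=int(m.group("page")),
        top=float(m.group("top")) if m.group("top") else None,
        annot_id=m.group("annot"),
    )
```

Returning None for non-pdftools values keeps the ID-based NOTER_PAGE entries of normal notes out of the annotation bookkeeping.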

To reiterate for others who may be reading this: org-noter provides a Helm-like selection menu via org-noter-create-skeleton, which offers to extract the PDF's outline (if its metadata contains chapter/section headings), its annotations, or its links. If you select annotations, a second minibuffer prompt lets you pick which annotation types to include, such as highlights or strikethroughs (you can also select all of them). Finally, you can choose either to create headings in the noter file that simply link to the annotation positions, or to extract the content of each annotation into its heading. The latter is what I use.

Unfortunately, this extracts all of the highlights in the file, with no regard for what was already created in the noter file by previous extractions. These links are generated by @fuxialexander's pdf-notes-booster branch of org-noter. I believe @wierdnox uses these more precise locations (precise as in the actual position of an annotation within a page, so that clicking a pdftools link takes you to the page and 'selects' the highlight it references).
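If the annotation IDs from those links were used as keys, a smarter skeleton step could skip annotations that were already extracted. This is a hypothetical post-processing sketch in Python, assuming IDs look like the annot-3-5 example and that a fresh skeleton run can be represented as (annot_id, heading) pairs; nothing here is an existing org-noter feature.

```python
import re

# Assumption: annotation IDs follow ";;" in pdftools links, as in
# "pdftools:...::3++0.72;;annot-3-5" from this thread.
ANNOT_ID_RE = re.compile(r";;(annot-[^\s\]]+)")


def extracted_annot_ids(org_text: str) -> set[str]:
    """Annotation IDs already linked from an org-noter notes file."""
    return set(ANNOT_ID_RE.findall(org_text))


def merge_skeleton(org_text: str,
                   candidates: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Drop candidate (annot_id, heading) pairs that were extracted before.

    `candidates` stands in for what a fresh skeleton run would produce.
    """
    seen = extracted_annot_ids(org_text)
    fresh = []
    for annot_id, heading in candidates:
        if annot_id not in seen:
            fresh.append((annot_id, heading))
            seen.add(annot_id)  # also skip duplicates within this batch
    return fresh
```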

So the machinery is there; it's just not aware of our use case. What they seem to be doing over at SuperMemo for their recent PDF IR add-on, however, is to extract a highlight manually, directly into SuperMemo, while reading, using a chord like C-x or something (they do this with images too; it's all really seamless). Since pdftools already requires you to press a chord to highlight the selected text, maybe we can create a function, bound to a shortcut, that highlights the selection (in a different colour?) and extracts it to the org-noter file, with the formatting org-fc needs for drilling, in one fell swoop? That way we know what has already been extracted, and it's less complicated than building a smarter org-noter-create-skeleton that is aware of previous extractions based on links or IDs (though that might be a cool long-term addition).
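A rough sketch of the extract-on-highlight idea, in Python for illustration: format one highlight as a noter heading and append it only if its annotation ID isn't in the file yet. The heading layout, the NOTER_TYPE property and the function names are assumptions; org-fc's own card markup would still have to be added by org-fc, and a real version would be an Emacs Lisp command bound to a key.

```python
import textwrap


def extract_heading(pdf_path: str, page: int, top: float,
                    annot_id: str, text: str, level: int = 1) -> str:
    """Format one highlight as an org-noter heading (hypothetical layout)."""
    # Link layout mirrors the org-pdftools example in this thread.
    link = f"pdftools:{pdf_path}::{page}++{top};;{annot_id}"
    title = textwrap.shorten(text, width=60, placeholder="...")
    return "\n".join([
        f"{'*' * level} {title}",
        ":PROPERTIES:",
        f":NOTER_PAGE: [[{link}]]",
        ":NOTER_TYPE: highlight",  # hypothetical property proposed in this thread
        ":END:",
        text,
        "",
    ])


def append_extract(org_text: str, pdf_path: str, page: int, top: float,
                   annot_id: str, text: str) -> str:
    """Append the heading unless this annotation ID was already extracted."""
    if f";;{annot_id}]" in org_text:
        return org_text  # already extracted; nothing to do
    if org_text and not org_text.endswith("\n"):
        org_text += "\n"
    return org_text + extract_heading(pdf_path, page, top, annot_id, text)
```

Checking for the ID before writing is what makes repeated extraction safe: pressing the key twice on the same highlight leaves the file unchanged.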

l3kn commented on June 1, 2024

Closing, as this is beyond the scope of org-fc. I think an extension package is a better solution.
