
Comments (7)

0xabu commented on August 17, 2024

Thanks for the report. Could you upload or link to a sample PDF that demonstrates these issues?

Just guessing, but is it possible you accidentally made two overlapping highlights?

from pdfannots.

Chris-mik commented on August 17, 2024

Thank you for the immediate response! Your guess was right! It seems that some PDF readers (I am using readers on an iPad) also add a comment containing the highlighted text whenever the user highlights text. I am not sure whether I am stating this correctly, because I have limited coding/software development skills! Also, I am not sure which of the readers mentioned previously causes this issue, but I re-checked with Adobe on a plain PDF and the "double highlights" problem seems to be resolved.

I do have some other questions regarding the format of the output (i.e. which symbols refer to comments/highlights, which values are appropriate for each of the arguments, whether the "no-group" option depends on other arguments, etc.). Let me know whether I should open a new thread for them.

Thanks again for your prompt reply!


0xabu commented on August 17, 2024

I'm happy to try to improve the docs or --help output, if you have specific feedback.

--no-group affects only the markdown output format: concretely, it selects between the flat MarkdownPrinter and the GroupedMarkdownPrinter implementations.
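To illustrate the selection described above, here is a minimal sketch of how a --no-group flag could toggle between two printer classes. The class bodies and the make_printer helper are hypothetical stand-ins, not the actual pdfannots code; only the flag name and the two class names come from this thread.

```python
import argparse


class MarkdownPrinter:
    """Hypothetical stand-in for the flat printer."""


class GroupedMarkdownPrinter(MarkdownPrinter):
    """Hypothetical stand-in for the grouped printer."""


def make_printer(argv):
    """Pick a printer class based on the (assumed) --no-group flag."""
    parser = argparse.ArgumentParser()
    # store_false means args.group is True unless --no-group is passed.
    parser.add_argument("--no-group", dest="group", action="store_false")
    args = parser.parse_args(argv)
    printer_class = GroupedMarkdownPrinter if args.group else MarkdownPrinter
    return printer_class()


if __name__ == "__main__":
    print(type(make_printer([])).__name__)
    print(type(make_printer(["--no-group"])).__name__)
```

With no flag the grouped printer is chosen; with --no-group the flat one is.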


Chris-mik commented on August 17, 2024

Thank you, I really appreciate it! Please find below some questions:

  • what is the appropriate syntax for the argument sections? It may be helpful (especially for novice users) to provide an example of the default syntax and values for each argument that accepts several values.
  • what does each symbol (i.e. >,",--) included in the md output mean about the type of annotation extracted? For example, the help message could include a list that maps each symbol to an annotation type.
  • I have added the "no-group" property in the function and I get back some errors (see below) and a blank md output:
    Traceback (most recent call last):
      File "/Users/.../pdfannots-master-2/pdfannots.py", line 10, in <module>
        sys.exit(main())
      File "/.../pdfannots-master-2/pdfannots/cli.py", line 129, in main
        printer = (GroupedMarkdownPrinter if args.group else MarkdownPrinter)(**mdargs)
    TypeError: __init__() got an unexpected keyword argument 'sections'

The command used in this case is the following: python3 pdfannots.py "file.pdf" -o notes.md --print-filename --no-group -p
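The traceback above suggests that the same keyword arguments were passed to both printer classes, while only one of them accepts sections. A minimal sketch of that failure mode and one possible way around it follows. The constructor signatures and the build_printer helper are assumptions made for illustration; the actual fix in pdfannots may look different.

```python
class MarkdownPrinter:
    """Hypothetical flat printer: does not accept a sections argument."""

    def __init__(self, condense=True):
        self.condense = condense


class GroupedMarkdownPrinter(MarkdownPrinter):
    """Hypothetical grouped printer: the only one that accepts sections."""

    def __init__(self, condense=True, sections=None):
        super().__init__(condense=condense)
        self.sections = sections or ["highlights", "comments", "nits"]


def build_printer(group, condense=True, sections=None):
    """Pass sections only to the class that understands it."""
    mdargs = {"condense": condense}
    if group:
        mdargs["sections"] = sections
        return GroupedMarkdownPrinter(**mdargs)
    return MarkdownPrinter(**mdargs)


# Reproduce the kind of error shown in the traceback.
try:
    MarkdownPrinter(condense=True, sections=["comments"])
except TypeError as exc:
    error_message = str(exc)

if __name__ == "__main__":
    print(error_message)
    print(type(build_printer(group=False)).__name__)
```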

Finally, some thoughts on extensions:

  • When a highlight has a comment then put comment text before the highlight. Currently the highlighted text comes first and the comment text follows.
  • Have each highlight in bullet/number list.
  • Get an even more condensed output where there are no blank lines separating each annotation.

I guess some of these extensions could be implemented using other apps to format the md output, but maybe it would be helpful to have an all-in-one program!

Thanks again for this program! It already has an impact on the way I am taking notes from papers and books!


0xabu commented on August 17, 2024

what is the appropriate syntax for the argument sections

The default sections are documented in the README: highlights, comments, nits, in that order. Passing --sections allows you to omit or reorder sections of the output, e.g.: ignore highlights (--sections comments nits), or place them last (--sections comments nits highlights).
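As a rough illustration of the --sections behaviour described above, the sketch below parses such an option with argparse. The parser setup (nargs, choices, default) and the parse_sections helper are assumptions for this example, not copied from cli.py; only the flag name and the three section names come from this thread.

```python
import argparse

# Default order as stated above: highlights, comments, nits.
ALL_SECTIONS = ["highlights", "comments", "nits"]


def parse_sections(argv):
    """Return the sections to print, in the order given on the command line."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--sections",
        nargs="+",             # one or more section names
        choices=ALL_SECTIONS,  # anything else is rejected with a usage error
        default=ALL_SECTIONS,  # used when the flag is absent
    )
    return parser.parse_args(argv).sections


if __name__ == "__main__":
    print(parse_sections([]))
    print(parse_sections(["--sections", "comments", "nits"]))
    print(parse_sections(["--sections", "comments", "nits", "highlights"]))
```

The order of the names after --sections is the order in which the sections would be emitted, and any name left out is skipped.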

what does each symbol (i.e. >,",--) included in the md output mean about the type of annotation extracted

> is a standard Markdown format blockquote.
" is just an inline quote (not actual markdown, just typical English usage)
-- is likewise just approximating an em-dash to separate a short quote from a comment

The latter two are avoided if you pass --no-condense.
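To make the three symbols concrete, here is a small sketch of a formatter that produces both styles. It is illustrative only: format_annotation, the max_inline_words threshold, and the exact layout are assumptions, and the real pdfannots output differs in its details (page references, wrapping, and so on).

```python
def format_annotation(quote, comment="", condense=True, max_inline_words=10):
    """Format one highlight (and optional comment) as a markdown bullet.

    Condensed style: short quotes go inline in double quotes, and " -- "
    separates the quote from the comment. Otherwise the quote becomes a
    markdown blockquote ("> ") and the comment follows as its own paragraph.
    """
    is_short = len(quote.split()) <= max_inline_words
    if condense and is_short:
        line = '* "' + quote + '"'
        if comment:
            line += " -- " + comment
        return line
    lines = ["* > " + quote]
    if comment:
        # Blank line ends the blockquote; the indent keeps the comment in the bullet.
        lines += ["", "  " + comment]
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_annotation("key result", "check this"))
    print(format_annotation("key result", "check this", condense=False))
```

In the non-condensed branch neither the inline double quotes nor the -- separator appears, which matches the behaviour described for --no-condense.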

I have added the "no-group" property in the function and I get back some errors

Thanks for reporting. I just fixed this.

When a highlight has a comment then put comment text before the highlight.

... I guess this could be implemented as an option, but I'm not sure it's logical.

Have each highlight in bullet/number list.

Isn't that the current format? Or, you want numbers rather than bullets?

Get an even more condensed output where there are no blank lines separating each annotation.

This is just my attempt at making the markdown format readable -- the blanks are necessary to separate quotes, so putting them between bullets is also IMO helpful for normal editable output. If you want to make it pretty for reading, I'd suggest you convert to HTML and style as desired.
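For anyone who still wants the tighter layout, post-processing the generated markdown is one option. The sketch below drops blank lines that sit directly between two bullets and keeps every other blank line, so quotes stay separated. The tighten_bullets helper and the assumed bullet markers are hypothetical; this is not part of pdfannots.

```python
def tighten_bullets(markdown_text, bullet_prefixes=("* ", "- ")):
    """Remove blank lines that sit between two bullet items."""
    lines = markdown_text.splitlines()
    kept = []
    for index, line in enumerate(lines):
        if line.strip() == "":
            previous = kept[-1] if kept else ""
            # Next non-blank line after this one, or "" at end of text.
            following = next((l for l in lines[index + 1:] if l.strip()), "")
            if (previous.lstrip().startswith(bullet_prefixes)
                    and following.lstrip().startswith(bullet_prefixes)):
                continue  # blank line between two bullets: drop it
        kept.append(line)
    return "\n".join(kept)


if __name__ == "__main__":
    sample = "* first\n\n* second\n\n> a quote\n\n* third"
    print(tighten_bullets(sample))
```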

I'll try to improve the docs (or you are welcome to submit a PR!), but for now I'm going to close this issue.


Chris-mik commented on August 17, 2024

Thank you for the detailed response!
I would very much like to help with the documentation, so please let me know how I can contribute.
I have downloaded the folder including your last update and run an extraction on a PDF, and I get back the following errors:

  File "/Users/.../pdfannots.py", line 10, in <module>
    sys.exit(main())
  File "/Users/.../pdfannots/cli.py", line 141, in main
    doc = process_file(
  File "/Users/.../pdfannots/__init__.py", line 448, in process_file
    annot = _mkannotation(pa.resolve(), page)
  File "/Users/.../pdfannots/__init__.py", line 46, in _mkannotation
    subtype = pa.get('Subtype')
AttributeError: 'NoneType' object has no attribute 'get'

The command used is: python3 pdfannots.py "file_2017.pdf" -o notes.md --print-filename -p
The md output is blank.
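Reading the traceback, the value returned by pa.resolve() appears to be None, so the later pa.get('Subtype') call fails. A minimal sketch of that situation and a defensive guard is shown below. The resolve and collect_subtypes helpers and the dictionary layout are hypothetical and only mimic the shape of the problem; whether skipping such entries is the right fix for pdfannots is an open question.

```python
def resolve(ref):
    """Hypothetical resolver: returns the referenced dict, or None if dangling."""
    return ref.get("target")


def collect_subtypes(annotation_refs):
    """Collect the Subtype of each annotation, skipping unresolvable ones."""
    subtypes = []
    for ref in annotation_refs:
        obj = resolve(ref)
        if obj is None:
            # Without this guard, obj.get("Subtype") raises
            # AttributeError: 'NoneType' object has no attribute 'get'.
            continue
        subtypes.append(obj.get("Subtype"))
    return subtypes


if __name__ == "__main__":
    refs = [{"target": {"Subtype": "Highlight"}}, {"target": None}]
    print(collect_subtypes(refs))
```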


0xabu commented on August 17, 2024

For how to "help with documentation", feel free to:

  • submit PRs against the README
  • submit PRs to improve the --help output (look in cli.py - it should be fairly obvious how the strings there translate to help output)
