Document metadata/XMP access? (pdfpig, closed)

uglytoad commented on August 14, 2024
Document metadata/XMP access?

from pdfpig.

Comments (7)

EliotJones commented on August 14, 2024

I've added a method to PdfDocument to get the wrapped metadata stream. This gives you access to both the raw PDF token and the decoded stream data (with any encryption and filters already reversed).

The method is PdfDocument.TryGetXmpMetadata(out XmpMetadata metadata) and there is an example of usage here: https://github.com/UglyToad/PdfPig/blob/master/src/UglyToad.PdfPig.Tests/Integration/LaTexTests.cs#L130

I'd be interested in feedback or ways to improve this, because I'm not familiar with typical use cases for this data. If it could be made easier to consume, I'd be happy to make those changes.

I've attached a NuGet package that includes this change.

PdfPig.0.0.7.57.zip

Numpsy commented on August 14, 2024

Thanks, I'll give it a test later.

I was looking at it because one of the projects I work on can use XMP to store custom metadata in various types of files (in the case of PDF, you can use the XMP data instead of, or on top of, custom entries in the document information).
As far as APIs go, the lib it uses is built on the Adobe (C++) XMP SDK, and I was just testing with the C# port of that. That lib can take either a byte[] or an XDocument as input, so if reading the XDocument directly isn't sufficient, those are still useful formats to consume.

Numpsy commented on August 14, 2024

On a related note though, section 14.3.2 of the (PDF32000) spec says:

The metadata framework provides a date stamp for metadata expressed in the framework. If this date stamp is equal to or later than the document modification date recorded in the document information dictionary...

So perhaps it would be useful if the modification date were directly exposed through DocumentInformation in order to check?
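
For anyone wanting to try that check outside the library, here is a rough sketch in Python (standard library only, not PdfPig code). It assumes you already have the XMP date stamp as an ISO 8601 string and the information dictionary's modification date as a raw PDF date string. The function names are made up for illustration, and treating a date with no time zone as UTC is my own assumption.

```python
import re
from datetime import datetime, timedelta, timezone

# PDF date strings look like D:YYYYMMDDHHmmSSOHH'mm'; everything after the year is optional.
_PDF_DATE = re.compile(
    r"^D:(?P<year>\d{4})(?P<month>\d{2})?(?P<day>\d{2})?"
    r"(?P<hour>\d{2})?(?P<minute>\d{2})?(?P<second>\d{2})?"
    r"(?:(?P<sign>[+\-Z])(?:(?P<oh>\d{2})'?(?:(?P<om>\d{2})'?)?)?)?$"
)


def parse_pdf_date(value: str) -> datetime:
    """Parse a PDF date string into a timezone-aware datetime."""
    match = _PDF_DATE.match(value.strip())
    if match is None:
        raise ValueError(f"not a PDF date string: {value!r}")
    g = match.groupdict()
    # Assumption: a missing offset is treated as UTC.
    offset = timedelta(hours=int(g["oh"] or 0), minutes=int(g["om"] or 0))
    if g["sign"] == "-":
        offset = -offset
    return datetime(
        int(g["year"]), int(g["month"] or 1), int(g["day"] or 1),
        int(g["hour"] or 0), int(g["minute"] or 0), int(g["second"] or 0),
        tzinfo=timezone(offset),
    )


def parse_xmp_date(value: str) -> datetime:
    """Parse a full ISO 8601 date-time as used in XMP (truncated forms are not handled)."""
    text = value.strip()
    if text.endswith("Z"):
        # Older Python versions do not accept a trailing "Z" in fromisoformat.
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # same UTC assumption as above
    return parsed


def xmp_date_is_current(xmp_date: str, info_mod_date: str) -> bool:
    """True when the XMP date stamp is equal to or later than the info dictionary date."""
    return parse_xmp_date(xmp_date) >= parse_pdf_date(info_mod_date)
```

Both values are converted to timezone-aware datetimes before comparing, so a stamp written with a +01'00' offset and one written in UTC compare by the instant they represent and not by their text.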

EliotJones commented on August 14, 2024

Good point. I read that in my version as being specific to backwards compatibility for pre-1.4 versions of PDF, but I probably misunderstood it. Either way, it's useful information to expose.

EliotJones commented on August 14, 2024

Apologies for the long delay with this, but I think the API should now have everything you need. Or is there an additional date field on the XMP dictionary that needs to be exposed to perform the check in 14.3.2?

Numpsy commented on August 14, 2024

Thanks for the changes.
I can get the date fields out of the XML/XMP data directly OK, which is fine for what I'm doing with it.
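
To illustrate what reading the date fields straight from the XML can look like, here is a minimal Python sketch using only the standard library (it is not what I actually run, and it is not PdfPig code). It assumes the decoded XMP packet is available as bytes and only looks at the three date properties in the standard xmp namespace. The function name read_xmp_dates is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XMP_NS = "http://ns.adobe.com/xap/1.0/"
DATE_FIELDS = ("CreateDate", "ModifyDate", "MetadataDate")


def read_xmp_dates(xmp_bytes: bytes) -> dict:
    """Return the xmp date properties found in an XMP packet as raw strings."""
    root = ET.fromstring(xmp_bytes)
    dates = {}
    for description in root.iter(f"{{{RDF_NS}}}Description"):
        for name in DATE_FIELDS:
            qualified = f"{{{XMP_NS}}}{name}"
            # A property may be written as an attribute on rdf:Description...
            value = description.get(qualified)
            if value is None:
                # ...or as a child element with the value as its text.
                child = description.find(qualified)
                if child is not None and child.text:
                    value = child.text.strip()
            if value is not None:
                # Keep the first occurrence if several descriptions repeat a field.
                dates.setdefault(name, value)
    return dates
```

The values come back as the strings stored in the packet, so any parsing or time zone handling is left to the caller.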

EliotJones commented on August 14, 2024

Thanks, I'll close this for now then. Let me know if you run into any more missing APIs in future for anything metadata-related.
