
safedocs's Introduction

Safedocs


Artifacts from the DARPA-funded SafeDocs research program.

Compacted Syntax

The compacted PDF syntax test file and its associated matrix are for testing PDF lexical analyzers, to ensure that they comply with some of the finer points of the PDF standard and that different implementations interoperate. The focus is on non-whitespace token delimiters between different adjacent PDF objects.
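As a rough illustration of what "non-whitespace token delimiters" means in practice, here is a minimal Python tokenizer sketch. It assumes the PDF white-space set (NUL, HT, LF, FF, CR, SP) and the delimiter set ( ) < > [ ] { } / %, and it only handles names, numbers and keywords, literal and hex strings, brackets and comments. The `tokenize` function is hypothetical and is not taken from the SafeDocs test files.

```python
WHITESPACE = b"\x00\t\n\x0c\r "
DELIMITERS = b"()<>[]{}/%"


def _regular_end(data: bytes, j: int) -> int:
    """Advance past a run of regular (non-white-space, non-delimiter) bytes."""
    while j < len(data) and data[j] not in WHITESPACE and data[j] not in DELIMITERS:
        j += 1
    return j


def tokenize(data: bytes) -> list[bytes]:
    tokens: list[bytes] = []
    i, n = 0, len(data)
    while i < n:
        c = data[i]
        if c in WHITESPACE:
            i += 1
        elif c == ord("%"):  # comment: skip to end of line
            while i < n and data[i] not in b"\r\n":
                i += 1
        elif data.startswith(b"<<", i) or data.startswith(b">>", i):
            tokens.append(data[i:i + 2])  # dictionary brackets
            i += 2
        elif c in b"[]{}":
            tokens.append(data[i:i + 1])  # array / procedure brackets
            i += 1
        elif c == ord("("):  # literal string with balanced parentheses
            depth, j = 0, i
            while j < n:
                if data[j] == ord("\\"):  # skip the escaped byte
                    j += 2
                    continue
                if data[j] == ord("("):
                    depth += 1
                elif data[j] == ord(")"):
                    depth -= 1
                    if depth == 0:
                        break
                j += 1
            if j >= n:
                raise ValueError(f"unterminated literal string at offset {i}")
            tokens.append(data[i:j + 1])
            i = j + 1
        elif c == ord("<"):  # hex string
            j = data.index(b">", i)  # raises ValueError if unterminated
            tokens.append(data[i:j + 1])
            i = j + 1
        elif c == ord("/"):  # name: "/" followed by regular characters
            j = _regular_end(data, i + 1)
            tokens.append(data[i:j])
            i = j
        elif c in b")>":
            raise ValueError(f"unexpected delimiter at offset {i}")
        else:  # number or keyword such as true, null, R, obj, endobj
            j = _regular_end(data, i)
            tokens.append(data[i:j])
            i = j
    return tokens
```

With compacted input such as `<</Type/Page/Kids[1 0 R]>>`, the only thing separating `/Type` from `/Page`, or `R` from `]`, is a delimiter character, which is exactly the class of case a lexer must get right.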

Miscellaneous Targeted Test PDFs

This folder contains miscellaneous PDF files which have been hand-coded for targeted testing of PDF processors. The *-raw.pdf files were hand-written in a text editor and then repaired using mutool clean to correct stream lengths, xref offsets, etc., producing the equivalent non-raw PDFs. Test with the PDF files that do NOT have -raw in their filenames, as these are 100% valid PDFs. The README file in this folder explains each targeted test case.
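The repair step described above can be scripted. This is a minimal sketch assuming MuPDF's `mutool` is on the PATH and assuming a hypothetical naming rule where `Foo-raw.pdf` is repaired into `Foo.pdf` in the same folder; the helper names are illustrative and not part of the repository.

```python
import subprocess
from pathlib import Path

RAW_SUFFIX = "-raw.pdf"


def repaired_path(raw: Path) -> Path:
    """Hypothetical naming rule: Foo-raw.pdf -> Foo.pdf in the same folder."""
    if not raw.name.endswith(RAW_SUFFIX):
        raise ValueError(f"not a *{RAW_SUFFIX} file: {raw}")
    return raw.with_name(raw.name[: -len(RAW_SUFFIX)] + ".pdf")


def clean_command(raw: Path) -> list[str]:
    # `mutool clean <input> <output>` rewrites the hand-written file as a
    # well-formed PDF (stream lengths, xref offsets, etc.).
    return ["mutool", "clean", str(raw), str(repaired_path(raw))]


def repair_folder(folder: Path) -> None:
    # Requires mutool on PATH; stops at the first file mutool rejects.
    for raw in sorted(folder.glob("*" + RAW_SUFFIX)):
        subprocess.run(clean_command(raw), check=True)
```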

Unicode passwords

This folder contains PDF test files and documentation related to investigations into the correct handling of Unicode passwords in PDF (i.e. Unicode 3.2 with the SASLprep/stringprep RFC algorithms) and Unicode in content (supporting much later versions of Unicode). PDF Unicode password support is required to maintain backwards compatibility with legacy PDF files created under previous PDF specifications. Confusion and bugs in Unicode password handling create the potential for interoperability issues and malformed PDF files.
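To make the Unicode 3.2 / SASLprep point concrete, here is a minimal Python sketch of the SASLprep profile (RFC 4013) built on the standard library's `stringprep` tables, followed by the UTF-8 encoding and 127-byte truncation applied to passwords for AES-256 (revision 6) encryption in ISO 32000-2. The function names are hypothetical, and this sketch treats the password as a SASLprep "query" string, so unassigned code points are allowed.

```python
import stringprep
import unicodedata

# RFC 4013 prohibited-output tables, via Python's stringprep module.
_PROHIBITED = (
    stringprep.in_table_c12,      # non-ASCII space characters
    stringprep.in_table_c21_c22,  # ASCII and non-ASCII control characters
    stringprep.in_table_c3,       # private use
    stringprep.in_table_c4,       # non-character code points
    stringprep.in_table_c5,       # surrogate codes
    stringprep.in_table_c6,       # inappropriate for plain text
    stringprep.in_table_c7,       # inappropriate for canonical representation
    stringprep.in_table_c8,       # change display properties or deprecated
    stringprep.in_table_c9,       # tagging characters
)


def saslprep(password: str) -> str:
    # 1. Map: B.1 characters map to nothing; non-ASCII spaces (C.1.2) map to U+0020.
    mapped = "".join(
        " " if stringprep.in_table_c12(ch) else ch
        for ch in password
        if not stringprep.in_table_b1(ch)
    )
    # 2. Normalize with NFKC using Unicode 3.2 data, not the current Unicode version.
    normalized = unicodedata.ucd_3_2_0.normalize("NFKC", mapped)
    # 3. Prohibit.
    for ch in normalized:
        if any(table(ch) for table in _PROHIBITED):
            raise ValueError(f"prohibited character U+{ord(ch):04X}")
    # 4. Bidirectional check (RFC 3454, section 6).
    if any(stringprep.in_table_d1(ch) for ch in normalized):
        if any(stringprep.in_table_d2(ch) for ch in normalized):
            raise ValueError("mixes RandALCat and LCat characters")
        if not (stringprep.in_table_d1(normalized[0])
                and stringprep.in_table_d1(normalized[-1])):
            raise ValueError("RandALCat string must start and end with RandALCat")
    return normalized


def pdf_r6_password_bytes(password: str) -> bytes:
    # SASLprep, then UTF-8, then truncate to at most 127 bytes.
    return saslprep(password).encode("utf-8")[:127]
```

For example, `saslprep("I\u00ADX")` drops the soft hyphen and yields `"IX"`, and `saslprep("\u2168")` (ROMAN NUMERAL NINE) also normalizes to `"IX"`.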


This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR001119C0079. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Defense Advanced Research Projects Agency (DARPA). Approved for public release.

safedocs's People

Contributors

matthiasvalvekens, petervwyatt

safedocs's Issues

obj followed by endobj?

Peter - this is really great, by the way. The only assertion I have an issue with in this PDF is this token sequence:

2 0 obj endobj

I don't think this meets the requirements of the spec. Here's the text (identical in both PDF 1.4 and ISO 32000-2:2020):

The definition of an indirect object in a PDF file shall consist of its object number and generation number (separated by white-space), followed by the value of the object bracketed between the keywords obj and endobj.

The "value of the object" is missing, so I'm not sure that's a valid sequence of tokens.
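The reading in this comment can be expressed as a strict check over an already-tokenized indirect object. This is a minimal sketch assuming whitespace-separated tokens; `check_indirect_object` is a hypothetical helper that encodes only this reading of the quoted clause, not the behaviour of any particular PDF processor.

```python
import re

_INTEGER = re.compile(r"[0-9]+")


def check_indirect_object(tokens: list[str]) -> None:
    """Raise ValueError unless tokens have the shape `N G obj <value> endobj`.

    Strict reading: at least one token (the object's value) must appear
    between obj and endobj. The value itself is not validated here.
    """
    if len(tokens) < 4 or tokens[2] != "obj" or tokens[-1] != "endobj":
        raise ValueError("expected: N G obj <value> endobj")
    if not (_INTEGER.fullmatch(tokens[0]) and _INTEGER.fullmatch(tokens[1])):
        raise ValueError("object and generation numbers must be integers")
    if len(tokens) == 4:
        raise ValueError("no object value between obj and endobj")
```

Under this check, `2 0 obj null endobj` is accepted while `2 0 obj endobj` is rejected.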

Inline image Length is incorrect

ISO 32000-2:2020 clause 8.9.7 says:

The value of the Length (or L) key, which shall be present on all inline images, is the length of the data between the ID and EI operators excluding the white-space delimiting those operators. The value of the Length key should not exceed 4096 bytes.
[…]
NOTE 2 The L key permits PDF processors to efficiently skip inline images if they do not need to display them. To skip an image a processor can advance beyond the single white-space character following the ID operator, then if the final or only filter is ASCIIHexDecode or ASCII85Decode skip any further white-space. The number of characters expressed by the L key is then skipped, and the EI operator is expected following optional white-space.
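The skipping procedure in NOTE 2 translates almost step by step into code. This is a minimal sketch; `skip_inline_image` and its parameters are hypothetical, and it assumes the caller has already located the ID keyword and read the L value and filter from the inline image dictionary.

```python
# PDF white-space characters: NUL, HT, LF, FF, CR, SP.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "


def skip_inline_image(content: bytes, after_id: int, length: int, ascii_filter: bool) -> int:
    """Return the offset just past the EI operator.

    after_id:     offset of the byte immediately after the ID keyword.
    length:       value of the inline image's L (Length) key.
    ascii_filter: True if the final or only filter is ASCIIHexDecode or ASCII85Decode.
    """
    pos = after_id + 1  # skip the single white-space character following ID
    if ascii_filter:
        # ASCII-encoded data may be preceded by further white-space
        while pos < len(content) and content[pos] in PDF_WHITESPACE:
            pos += 1
    pos += length  # skip the L bytes of image data
    # EI is expected after optional white-space
    while pos < len(content) and content[pos] in PDF_WHITESPACE:
        pos += 1
    if content[pos:pos + 2] != b"EI":
        raise ValueError(f"expected EI at offset {pos}")
    return pos + 2
```

If L undercounts the data (for example by ignoring line-ends inside it), the skip stops short of EI and the check fails.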

The value of the Length key for the inline image data has been calculated ignoring the line-ends within the image data, which is not correct. According to a hex editor, Length should be 1276 instead of 1240. The counted data starts after the white-space character following ID and ends with the > delimiter, immediately before the white-space character that precedes EI.
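The measurement described here can be reproduced mechanically. This is a minimal sketch assuming a content stream with a single inline image whose data does not itself contain a white-space-delimited `EI`; `inline_image_data` is a hypothetical helper, and the sample stream is made up for illustration.

```python
import re

# A single PDF white-space character: NUL, HT, LF, FF, CR or SP.
_WS = rb"[\x00\t\n\x0c\r ]"
_ID = re.compile(_WS + rb"ID" + _WS)
_EI = re.compile(_WS + rb"EI(?=" + _WS + rb"|$)")


def inline_image_data(content: bytes) -> bytes:
    """Return the data of the first inline image in a content stream.

    Excludes the single white-space character after ID and the
    white-space character before EI, but keeps line-ends inside the data.
    """
    id_match = _ID.search(content)
    if id_match is None:
        raise ValueError("no ID operator found")
    start = id_match.end()  # first byte after the white-space following ID
    ei_match = _EI.search(content, start)
    if ei_match is None:
        raise ValueError("no EI operator found")
    return content[start:ei_match.start()]
```

For the sample stream `... /F /AHx ID\n0A0B\n0C0D>\nEI Q`, the data is `0A0B\n0C0D>`, so L should be 10; dropping the embedded line-end gives 9, which is the same kind of undercount as 1240 versus 1276.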

InlineAbbreviations.pdf

Hi!

I downloaded InlineAbbreviations.pdf and it does not open in either Mac Preview or Adobe Reader, although GitHub renders it correctly.
