pdf-issues's People

Contributors

duffjohnson, lrosenthol, petervwyatt

pdf-issues's Issues

Table 317 3D background - confusion over CS and C keys

Table 317 — Entries in a 3D background dictionary

3D support in PDF is currently heavily centred around RGB-based colour spaces - see ColorSpace key in "Table 311 — Entries in a 3D stream dictionary" and many places in clause 13.6 and related sub-clauses. Trying to future-proof things for other colour families for some future PDF version is fraught with potential issues and creates confusion now. Of course, the future is also highly unlikely to support Patterns, named colours or other advanced PDF colour spaces!

Lack of clarity for CS and C keys with confusing statements about a possible future:

The CS key is a "name or array" yet the description explicitly constrains values to only DeviceRGB (name) by an explicit shall statement: "The only valid value shall be the name DeviceRGB."

It then goes on to state "PDF consumers shall be prepared to encounter other values that may be supported in future versions of PDF.". Clearly, the intention is that the array format might be one of the other PDF colour spaces, but this is no different to handling future PDF features anywhere else in PDF as described in "Annex I (normative) PDF versions and compatibility". So this second explicit "shall" makes no sense with the other explicit "shall" statement.

  • Change CS Type to just "name" (delete "or array").
  • Delete the confusing "PDF consumers ..." sentence.

Correspondingly the C key is typed as "(various)" with a description that doesn't really help, except for the DeviceRGB case (which is the only valid current color space as mandated by the CS "shall" statement!). It needs to be reworded to a similar statement to the C key from "Table 166 — Entries common to all annotation dictionaries":

"An array of 3 numbers in the range 0.0 to 1.0, representing the background colour in the colour space defined by the CS key".

If/when 3D ever supports other colour spaces, this would just be one place of many that need careful review and updating.

Table 132 Default key description has incorrectly worded requirement

"Table 132 — Entries in a Type 5 halftone dictionary" Default key description says "The value shall not be 5." but the key is of type "dictionary or stream", not a number or integer.

It should really say what is said for the "any colorant name" key description in Table 132: "The halftone may be of any Type other than 5."

Missing info on expected behaviour for unrecognised annots

Describe the bug
Section 12.5.1, Paragraph 4:

An interactive PDF processor shall provide certain expected behaviour for all annotation types that it does not recognise, as documented in 12.5.2, "Annotation dictionaries".

However, there is nothing in 12.5.2 (or anywhere else that I can find) that actually tells a processor (interactive or otherwise) what to do in the case of an unknown annotation Subtype.

Recommend that some text be written to define the behavior - or at least to agree that it is undefined.

Does annot dictionary BM key apply to 3D artwork backgrounds?

Clause "13.6.4.3 3D background dictionaries" does not make clear if or when the BM key that is in all annotation dictionaries (see Table 166 — Entries common to all annotation dictionaries) is used.

3D annots have two situations - when an appearance stream is used (such as when print rendering or in a non-3D-capable processor) and when the 3D artwork is invoked (such as in an interactive viewer that is 3D aware). 13.6.4.3 also states "In effect, the 3D artwork and its background form a transparency group whose flattened results have an opacity of 1 (see 11, "Transparency")." but this does not mention the BM key...

Table 166 defines BM as "The blend mode that shall be used when painting the annotation onto the page .." so does it also apply when using the 3D artwork???

Table 5 confusion if Length is optional when F key is present

In ISO 32000-2:2020, "Table 5 — Entries common to all stream dictionaries" states that Length is a required key.

The F key is then described as "(Optional; PDF 1.2) The file containing the stream data. If this entry is present, the bytes between stream and endstream shall be ignored. However, the Length entry should still specify the number of those bytes (usually, there are no bytes and Length is 0). The filters that are applied to the file data shall be specified by FFilter and the filter parameters shall be specified by FDecodeParms."

This text says Length "should still specify...", but if Length is really mandatory then it ought to say "shall specify...".
Otherwise, Length is optional when F is present.

I personally think /Length is always required so the correct fix is "should" to "shall" in the description of F.

Make requirement for AP in annotation dictionaries more obvious

The Value cell for AP in table 166 explicitly says "(Optional; PDF 1.2)", but then goes on to say that "Every annotation [...], except for the two cases listed below, shall have at least one appearance dictionary."

I recommend changing the initial parenthetical text to "(Usually required; PDF 1.2)".

Are Crypt filter Recipient strings always byte strings?

"Table 27 — Additional crypt filter dictionary entries for public-key security handlers" Recipients key is described as being of type "string or array".

The array case is then clearly defined to be a byte string: "If the crypt filter is referenced from StmF or StrF in the encryption dictionary, this entry shall be an array of byte strings, where each string shall be a binary-encoded CMS object that shall ...".
An improvement would be to change "where each string ..." to "where each byte string ..."

However for the string case, it just says "... this entry shall be a string that shall be a binary-encoded CMS object that shall contain a list of all recipients ..."

Question: is this also required to be a byte string or is just string OK?

File encryption key derivation for public key encryption: EncryptMetadata flag

Clause 7.6.5.3 describes the key derivation procedure for PubKey handlers as follows:

These operations digest the following data, in order:
a) The 20 bytes of seed.
b) The bytes of each item in the Recipients array of CMS objects in the order in which they appear in the array.
c) 4 bytes with the value 0xFF if the key being generated is intended for use in document-level encryption and the document metadata is being left as plaintext.
d) The first n/8 bytes of the resulting digest shall be used as the file encryption key, where n is the bit length of the file encryption key.

The step I'm confused about is (c).

  • Does the language "for use in document-level encryption" mean that this step would not apply if the procedure is run by a crypt filter that is only used to encrypt embedded files? More generally: is this step supposed to be skipped if invoked from a crypt filter that isn't used as a value for StrF or StmF in the document-wide encryption settings?
  • When using encryption handlers with SubFilter set to adbe.pkcs7.s4 or adbe.pkcs7.s3, crypt filters are not supported. In that case, how do we know whether metadata is supposed to be encrypted? The EncryptMetadata entry in the document-wide encryption dictionary is only defined for the standard security handler, after all. Is this an oversight? If not, should we assume that EncryptMetadata is true for the purposes of key derivation unless a crypt filter dictionary says otherwise?
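The quoted steps (a) to (d) can be sketched in Python as follows. This is only an illustration of the procedure as quoted above, not a reference implementation: the function name `derive_file_key` and its parameters are hypothetical, the hash algorithm is left to the caller because the quoted text does not name it, and whether step (c) applies is exactly the open question, so it is exposed as a plain flag.

```python
import hashlib
from typing import Sequence


def derive_file_key(seed: bytes, recipients: Sequence[bytes], key_bits: int,
                    metadata_plaintext: bool, hash_name: str) -> bytes:
    """Sketch of steps (a)-(d) as quoted from clause 7.6.5.3 (hypothetical helper)."""
    if len(seed) != 20:
        raise ValueError("step (a) expects a 20-byte seed")
    digest = hashlib.new(hash_name)   # assumption: the caller picks the hash
    digest.update(seed)               # (a) the 20 bytes of seed
    for cms_object in recipients:     # (b) each Recipients item, in array order
        digest.update(cms_object)
    if metadata_plaintext:            # (c) the step the issue asks about
        digest.update(b"\xff" * 4)
    result = digest.digest()
    key_length = key_bits // 8
    if key_length > len(result):
        raise ValueError("digest is shorter than the requested key")
    return result[:key_length]        # (d) first n/8 bytes of the digest
```

The sketch makes the ambiguity concrete: two processors that disagree on the value of `metadata_plaintext` derive different keys from the same seed and recipient list.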

Add a `PV` entry in Table 322.

This entry has been undocumented since the PDF Reference 1.7 era, but it has always been supported by commercial PDF products such as Acrobat. I'd like to propose formalizing this entry in the standard for completeness and consistency.

Proposed entry:
PV

Proposed description text:

(Optional) A flag indicating the visibility of the cutting plane. If true, then the cutting plane shall be visible. If false, then the cutting plane shall not be visible.

Default value: true

Is it permissible to use SubFilter adbe.pkcs7.s4 with a security handler of version 5?

There is language in clause 7.6.5.2 indicating that adbe.pkcs7.s5 shall be used when crypt filters are used in a public key security handler. Clause 7.6.4.1 also states that when a security handler of version 4 or 5 is specified, the standard reader shall support crypt filters.

A literal reading of this requirement doesn't appear to forbid using adbe.pkcs7.s4 as the SubFilter entry for a V5 public key security handler (e.g. in case you'd want to encrypt a document using AES-256 without bothering with crypt filters). Nonetheless, apparently Acrobat refuses to decrypt such files.

Was the intention here to make crypt filters effectively mandatory for version 4 and 5? In V4 they'd be effectively required anyway (to distinguish between RC4 and AES-128), but that doesn't apply to V5. If V4 and V5 require crypt filters to be used, adding a sentence to clause 7.6.5.2 to that effect could be helpful.

I've attached a couple of example files testing different combinations of security handler versions and subfilter values. As indicated in the specification, the S5 files use crypt filters, while the S4 files do not. Acrobat opens all of them except the V5-S4 one.

aes-tests-V2-S5.pdf
aes-tests-V5-S5.pdf
aes-tests-V2-S4.pdf
aes-tests-V5-S4.pdf

GitHub won't allow me to attach it, but the key material can be found here: https://github.com/MatthiasValvekens/pyHanko/tree/master/pyhanko_tests/data/crypto (the relevant files are named selfsigned.*; you can either grab them in PKCS#12 format or as a straight PEM dump of the certificate + key)

Permissible public-key encryption schemes

Clause 7.6.5.3 mandates the following:

A key shall be used to encrypt (and decrypt) the enveloped data. This key (the plaintext key in "Figure 4 — Public-key encryption algorithm") shall be encrypted for each recipient, using that recipient’s public key, and shall be stored in the CMS object (as the encrypted key for each recipient). To decrypt the document, that key shall be decrypted using the recipient’s private key, which yields a decrypted (plaintext) key.

The next paragraph includes provisions on the (symmetric) ciphers that can be used to encrypt the envelope contents (i.e. 20-byte seed + permissions):

The algorithms that shall be used to encrypt the enveloped data in the CMS object are:

  • RC4 with key lengths up to 256-bits (deprecated);
  • DES, Triple DES, RC2 with key lengths up to 128 bits (deprecated);
  • 128-bit AES in Cipher Block Chaining (CBC) mode (deprecated);
  • 192-bit AES in CBC mode (deprecated);
  • 256-bit AES in CBC mode.

However, there is nothing in the clause restricting which public-key encryption schemes and key lengths are permissible to encrypt the plaintext key.

Even if we take "public-key encryption" to mean "RSA", there's still the issue of padding schemes. Obviously, classic RSA with PKCS#1 v1.5 padding probably works with virtually every implementation, but what about RSA-OAEP? The latter is a more modern parametrised scheme (RSA-OAEP is to encryption what RSA-PSS is to signing, essentially), and is not as widely supported.

I wouldn't necessarily oppose leaving this up to the implementation, but it feels a bit strange to me to constrain the enveloped data encryption to a well-defined list of ciphers, while at the same time not restricting the ways in which the envelope key can be encrypted for each recipient.

Mention of AES-256 in ECB mode

Clauses 7.6.4.3.3 and 7.6.4.4.9 state that the Perms entry shall be computed using AES-256 in ECB mode. More precisely:

[...] Encrypt the 16-byte block using AES-256 in ECB mode with an initialization vector of zero, using the file encryption key as the key.

I believe this is a typo, since initialisation vectors don't make sense for block ciphers operating in ECB mode, and the specification consistently uses CBC mode elsewhere. This includes cases where the initialisation vector is mandated to be zero.

Note: for encrypting a single block, ECB mode is equivalent to CBC mode with an IV of zero, so this issue is unlikely to lead to errors in implementation. Nonetheless, it's a little confusing.
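The equivalence claimed in the note can be written out for a single 16-byte block. The symbols here are introduced only for this illustration: $E_K$ is AES-256 encryption under the file encryption key $K$, $P_1$ is the plaintext block, and $C_1$ is the ciphertext block.

```latex
\begin{align*}
\text{CBC:}\quad & C_1 = E_K(P_1 \oplus IV) \\
\text{CBC with } IV = 0^{128}\text{:}\quad & C_1 = E_K(P_1 \oplus 0^{128}) = E_K(P_1) \\
\text{ECB:}\quad & C_1 = E_K(P_1)
\end{align*}
```

The two modes only diverge from the second block onwards, where CBC chains in $C_1$ and ECB does not.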

Proposed solution: either strike the words "with an initialization vector of zero", or replace "ECB" with "CBC" in both instances.

In the former case, explaining that both produce the same result in a note might also be useful, since some cryptographic libraries don't expose ECB on account of its obvious issues with repeating patterns (see here for an example).

Clarify ColorSpace for 3D Data Streams

Table 311, entry ColorSpace

The RGB colour space in which the 3D artwork’s colour values are encoded. Valid values are the name DeviceRGB, an array specifying a valid CalRGB color space (see 8.6.5.3 "CalRGB colour spaces"), or an array specifying a valid RGB-based ICCBased color space (see 8.6.5.5 "ICCBased colour spaces"). If this key is not present, the colour space for the 3D artwork colour values are considered undefined and a PDF processor may choose any appropriate RGB-based colour space, such as sRGB.

It is not clear from other parts of the text whether a DefaultRGB present on the page where the Annotation referencing the stream is used, should be used "in place" of the DeviceRGB (as would be the case for other uses of DeviceRGB). I believe that it should.

Cannot represent link semantics for content that spans pages

Clause 14.8.4.7.2 General inline level structure types, 2nd bullet below Table 368/Note 1 says:

"One object reference (see 14.7.5.3, "PDF objects as content items") to one link annotation associated with the content"

This restriction (new to PDF 2.0) is too strong; it doesn't allow for multiple link annotations tagged with a single element as would be necessary to fully represent semantics for content that spans pages.

Whether page Contents array elements are each a stream in own right or only when combined

There is a contradiction between two parts of the specification. The Contents entry in Table 31 says that an array value for Contents in a page object forms only a single content stream, built from multiple streams:

"(Optional) A content stream (see 7.8.2, "Content streams") that shall describe the contents of this page. If this entry is absent, the page shall be empty. The value shall be either a single stream or an array of streams. If the value is an array, the effect shall be as if all of the streams in the array were concatenated with at least one white-space character added between the streams’ data, in order, to form a single stream. PDF writers can create image objects and other resources as they occur, even though they interrupt the content stream. The division between streams may occur only at the boundaries between lexical tokens (see 7.2, "Lexical conventions") but shall be unrelated to the page’s logical content or organisation. Applications that consume or produce PDF files need not preserve the existing structure of the Contents array. PDF writers shall not create a Contents array containing no elements."

Clause 7.8.3 Resource dictionaries, on the other hand, says that each item in the array is a content stream in its own right:

"For a content stream that is the value of a page’s Contents entry (or is an element of an array that is the value of that entry), ..."

Removing the text in parentheses in clause 7.8.3 would remove this contradiction.
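The Table 31 reading can be illustrated with a minimal Python sketch. The helper name `combined_content_stream` is hypothetical and the streams are assumed to be already decoded; the sketch only shows the "concatenate with white-space in between" rule quoted above.

```python
from typing import Sequence


def combined_content_stream(decoded_streams: Sequence[bytes]) -> bytes:
    """Join the decoded Contents array elements with one white-space byte."""
    return b"\n".join(decoded_streams)


# The split falls between lexical tokens but not at a logical boundary:
# the q in the first element is only balanced by the Q in the second.
parts = [b"q 1 0 0 1 72 720 cm", b"0 0 100 100 re f Q"]
page_stream = combined_content_stream(parts)
```

Neither element of `parts` is a balanced content stream on its own, which is why treating each array element as "a content stream" in 7.8.3 sits awkwardly with Table 31.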

Typos in Table 404 for Trap network annotations

Trap network annotations are deprecated and the error goes back to the PDF 1.3 Reference, but the following should probably be corrected nevertheless. Table 404 "Additional entries specific to a trap network appearance stream" contains the following obvious typos:

"Valid values are DeviceGray, DeviceRGB, DeviceCMYK, DeviceCMY, DeviceRGBK, and DeviceN."

This should probably read:

"Valid values are DeviceGray, DeviceRGB, DeviceCMYK, and DeviceN."

Tagged PDF - should PDF 2.0 Artifacts be ignored for child ordering rules?

In PDF2, Artifacts can be added to the document Structure Tree at any point in the tree. There are rules about tag ordering in section 14.8.4 which don't take this into account, and I'm not sure if this is an oversight (as this didn't apply in PDF1.7) or by design.

Specifically: a Caption has to be "the first or last structure element inside its parent structure element". If a Caption is preceded by an Artifact StructureElement in the tree (as it might well be if that Artifact was used to wrap the draw operations for the table background or border, for example) it's going to fail this rule.

Obviously in PDF1 this wasn't an issue as Artifacts were never part of the tree. Given that Artifacts represent "not real content" and can pop up anywhere (largely depending on the technical requirements of the tool creating the document), it seems to me their position relative to anything else shouldn't really matter; they should be transparent to any sort of restrictions on ordering of children. I am pretty certain the only time this restriction occurs is this rule for the Caption element.

I'll be sure to raise this in the PDF/UA-2 WG too, but as the restriction comes from the wording in ISO32K it really needs to be resolved there too. If it's intentional, it's not an insurmountable problem I'm sure, but wanted to flag it up in case it slipped through.

Non-namespaced elements are allowed but this is unclear

Clause 14.8.6.1 Namespaces for standard structure types and attributes

There is a gap in ISO 32000-2:2020 that allows undefined element types. Future editions need to make it clearer that non-namespaced elements are still allowed, but that they need to be role-mapped to a known type.

Clarify requirement for AP in annotation dictionaries

The Value cell for AP in table 166 explicitly says "(Optional; PDF 1.2)", but then goes on to say that "Every annotation [...], except for the two cases listed below, shall have at least one appearance dictionary."

I recommend changing the initial parenthetical text to something that makes people realise that AP is not simply optional and that they need to read the full text; perhaps "(Usually required; PDF 1.2)".

Several integer key values do not state explicit valid ranges

Several integer keys in dictionaries do not state any explicit valid ranges, such as "positive integer ..." or "non-negative integer ...".

One way to fix this quickly may be to simply state once up the front of ISO 32K somewhere (where?) that key values that represent counts, sizes, widths, heights, file byte offsets, object numbers, page numbers and <anything else that is common?> are non-negative unless stated otherwise.

Or we could review each and add the explicit wording in place.

Here is an incomplete list (from a search of ISO 32000-2 for "integer" up to about Table 100 - more to be added later):

  • Table 11 Columns: unstated non-negative
  • Table 11 Rows: unstated non-negative
  • Table 11 DamagedRowsBeforeError: unstated non-negative
  • Table 15: Size: unstated positive
  • Table 15: Prev: unstated non-negative or positive??
  • Table 16: N: unstated non-negative / positive??
  • Table 16: First: unstated non-negative / positive??
  • Table 17: Size: unstated > 1 (since has to be +1 on highest object# and PDFs need at least 1 object (technically more!))
  • Table 17: Prev: unstated non-negative or positive??
  • Table 19: XRefStm: unstated non-negative (just "byte offset")
  • Table 30: Count: unstated non-negative or positive?? (just "count of leaf nodes" - does it include Template pages?)
  • Table 31: StructParents: ?? is there a valid range ??
  • Table 45: Size: unstated non-negative
  • Table 87 (Image XObject): Width: unstated non-negative
  • Table 87 (Image XObject): Height: unstated non-negative
  • Table 87 (Image XObject): StructParent ?? is there a valid range ??
  • Table 93 (Form XObject): StructParent ?? is there a valid range ??
  • Table 93 (Form XObject): StructParents ?? is there a valid range ??
  • Table 95 (Reference dict): Page: unstated non-negative

Incorrect use of Bold

Paragraph 3 of 14.13.5 emboldens the phrase "Property List", which is not a formal key and therefore should not be bold.

The simplest change is to make it italic and all lower case, which matches other uses of that phrase.

Clarify white space requirements for inline images using ASCII~ filters

8.9.7 Inline images is inconsistent between the text in the 1st para after Table 90: “Unless the image uses ASCIIHexDecode or ASCII85Decode as one of its filters” and Note 2: “if the final or only filter is ASCIIHexDecode or ASCII85Decode”.

Shouldn’t the first of those also talk about “if the final or only filter” rather than ASCII~ being “one of its filters”? I realise you’d need to have done something very odd if you’re using ASCII~ and something else, and the ASCII~ is not last, but …

9.8.3.3 FD has wrong reference and is unclear which keys are "metric information only"

Clause 9.8.3.3 FD, 2nd paragraph currently states:

The key for each entry in an FD dictionary shall be the name of a class of glyphs — that is, a particular subset of the CIDFont’s character collection. The entry’s value shall be a font descriptor whose contents shall override the font-wide attributes for that class only. This font descriptor shall contain entries for metric information only; it shall not include FontFile, FontFile2, FontFile3, or any of the entries listed in “Table 120 — Entries common to all font descriptors”.

All the metrics that should go in the FD sub-dictionaries are thus supposedly listed in Table 120. Looking back through old specs, ISO 32000-1:2008 references Table 122, which is also "Entries common to all font descriptors”, but section 5.7.2 of the PDF 1.6 spec (page 433) references Table 5.21, which is "Additional font descriptor entries for CIDFonts”, which makes total sense. Looks like there was a typo in the table number when transitioning the PDF spec to ISO.

Data type of operands for Type 3 glyph operators d0 and d1 is missing

Table 111 "Type 3 font operators" describes the d0 and d1 operators which specify metrics of glyphs in a Type 3 font. wx is described as horizontal displacement in the glyph coordinate system, but its data type is not specified. Is it integer, real or number (i.e. both)?

The data type for wx, wy, llx, lly, urx, ury should be stated explicitly.

Glyph name confusion in Type 3 font example

The example below Table 111 "Type 3 font operators" and Figure 62 "Output from the example" generates two glyphs which are appropriately named /square and /triangle. However, the EXAMPLE text mentions character codes a and b, and the comment within the code similarly talks about Type 3 font definition encoding two glyphs, 'a' and 'b'.

While the glyphs are placed at positions 97/98 in the Encoding which correspond to a and b in WinAnsiEncoding, there's nothing in the Type 3 font which would imply any relationship of those glyphs to the "glyphs" or "character codes" a and b. It just so happens that they occupy the same slot as a and b in some other encoding which is unrelated to the example.

Suggestions:

  • Change "...a filled square and a filled triangle, selected by the character codes a and b." to
    ...a filled square and a filled triangle at positions 97 and 98 of the font's Encoding.
  • Change the comment Type 3 font definition encoding two glyphs, 'a' and 'b' to
    Type 3 font definition encoding the two glyphs square and triangle.
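The point about codes versus names can be shown with a short Python sketch. The mapping below mirrors the positions described above (97 and 98 mapped to square and triangle); the names `ENCODING_DIFFERENCES` and `glyph_names` are hypothetical and exist only for this illustration.

```python
from typing import List, Optional

# Codes 97 and 98 select glyph procedures by name; nothing here mentions
# the letters a or b.
ENCODING_DIFFERENCES = {97: "square", 98: "triangle"}


def glyph_names(string_bytes: bytes) -> List[Optional[str]]:
    """Map each byte of a shown string to the glyph name at that code."""
    return [ENCODING_DIFFERENCES.get(code) for code in string_bytes]


# The string (ab) is just the byte sequence 97, 98.
selected = glyph_names(b"ab")
```

The bytes of `(ab)` happen to be 97 and 98, so the string selects square and triangle, but that coincidence comes from how the string is written, not from anything in the Type 3 font.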

Table 166: Clarify requirements for annotation appearances

Some clarification is required regarding whether or not an annotation requires an appearance stream /AP. The main source is Table 166 "Entries common to all annotation dictionaries" which states that the following don't need any appearance:

  • zero-size annotations
  • annotations of type Popup, Projection or Link (this somewhat obscures the fact that 12.5.6.14 even disallows /AP for Popups)

But there are additional sources:

  • 12.5.6.18 "Screen annotations": the clause If AP is not present,... implies that /AP is not required for Screen annotations.
  • 12.5.6.24 "Projection annotations":
    A projection annotation with a Rect entry that has zero height or zero width shall not have an AP dictionary.
    This overlaps with the general exemption per Table 166, but uses zero height OR zero width as opposed to AND in Table 166.
  • Table 177 "Additional entries specific to a free text annotation":
    The annotation dictionary’s AP entry, if present, shall take precedence...
    This either implies that /AP is not required for FreeText annotations, or is a wrong left-over. I believe it's the latter.
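The AND versus OR difference between Table 166 and 12.5.6.24 can be made concrete with a small Python sketch. It assumes the indices in the Table 166 wording are 1-based, so that a Rect reads as [llx lly urx ury]; both function names are hypothetical.

```python
from typing import Sequence


def is_zero_size_per_table_166(rect: Sequence[float]) -> bool:
    """Table 166 wording: index 1 equals index 3 AND index 2 equals index 4."""
    llx, lly, urx, ury = rect
    return llx == urx and lly == ury


def is_zero_height_or_width(rect: Sequence[float]) -> bool:
    """12.5.6.24 wording: zero height OR zero width."""
    llx, lly, urx, ury = rect
    return llx == urx or lly == ury


# Zero width, non-zero height: the two rules disagree.
thin_rect = [100, 100, 100, 200]
```

For `thin_rect`, the 12.5.6.24 rule forbids an AP dictionary on a projection annotation, while the Table 166 exemption for zero-size annotations does not apply at all.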

Suggestions:

  • Change the /AP description in Table 166 from Optional to Required in some cases; see below.

    (UNCHANGED) Every annotation (including those whose Subtype value is Widget, as used for form fields), except for the two cases listed below, shall have at least one appearance dictionary.

    • (UNCHANGED) Annotations where the value of the Rect key consists of an array where the value at index 1 is equal to the value at index 3 and the value at index 2 is equal to the value at index 4.
    • (MODIFIED) Annotations whose Subtype value is Projection, Screen, or Link.

    (NEW) The AP entry is not allowed in the following cases:

    • Annotations whose Subtype value is Popup.
    • Annotations whose Subtype value is Projection and where the value of the Rect key consists of an array where the value at index 1 is equal to the value at index 3 and the value at index 2 is equal to the value at index 4.
  • Table 177: The annotation dictionary’s AP entry, if present, shall take precedence...
    Modify as follows:
    The annotation dictionary’s AP entry shall take precedence...

  • Delete the following phrase in 12.5.6.24 as it duplicates information from Table 166:
    A projection annotation with a Rect entry that has zero height or zero width shall not have an AP dictionary.

Type3 FontDescriptors FontName confusion

ISO 32000-2:2020 "Table 120 — Entries common to all font descriptors" FontName is described as "(Required) The PostScript name of the font. This name shall be the same as the value of BaseFont in the font or CIDFont dictionary that refers to this font descriptor."

Type3 fonts don't have a BaseFont entry as defined by Table 110, so the FontDescriptor FontName should not be required for Type3.

Clarification for TR structure type

Section 14.8.4.8.3 - Table 371
Current text reads: "A row of table header cells (TH) or table data cells (TD) in a table."

This could imply that a TR cannot have both TH and TD in it.

Recommended edit:
"A row of table header cells (TH) and/or table data cells (TD) in a table." (Adding "and".)

ICCBased Lab colorspace

This text is from ISO32K2:2020, bottom of page 191:

PDF writers shall only use the profile types shown in "Table 67 — ICC profile types" for specifying calibrated colour spaces for colouring graphics objects. Each of the indicated fields shall have one of the values listed for that field in the second column of the table. Profiles shall satisfy both the criteria shown in the table. The terminology is taken from the ICC specifications.
...
Note 1. XYZ and 16-bit Lab profiles are not listed.

and here's table 67:

| Header Field | Required Value |
|---|---|
| deviceClass | icSigInputClass ('scnr'), icSigDisplayClass ('mntr'), icSigOutputClass ('prtr'), icSigColorSpaceClass ('spac') |
| colorSpace | icSigGrayData ('GRAY'), icSigRgbData ('RGB '), icSigCmykData ('CMYK'), icSigLabData ('Lab ') |

I've a few minor issues with this.

First, "The terminology is taken from the ICC specifications." - maybe, but there's no "icSigNNN" anywhere in ICC v4.3 or 2.4. Not a big deal, but replacing "icSigDisplayClass ('mntr')" with just "mntr" might be an improvement.

Second, the note explicitly states "16 Bit Lab is not listed" - not listed doesn't really tell us anything. The text was "not supported" in PDF1.7, which I think was a better phrasing.

Third, despite being not listed (or supported), Lab is listed in table 67. It's not clear what's supposed to be allowed - 8 bit Lab only? In fact there's no such thing as "16 bit Lab" in ICC: there's just Lab. It may have an 8-bit table, a 16 bit lookup table or a parametric curve. I've never seen an 8-bit one, but neither 16-bit nor parametric curve Lab profiles are supported in Acrobat.

I think the intent is just to exclude Lab as a colorSpace type, in which case I'd suggest dropping the "Lab" line from Table 67, and changing the Note to "Note 1. in particular, XYZ and Lab profiles are not supported." - although it's a bit redundant given the lack of those entries in table 67.
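To see what Table 67 as quoted actually admits, here is a minimal Python sketch that checks the two header fields against the listed signatures. It relies on the ICC profile header layout (profile/device class at bytes 12 to 15, data colour space at bytes 16 to 19); the function names and the `fake_header` helper are hypothetical and only build test data, not valid profiles.

```python
ALLOWED_DEVICE_CLASSES = {b"scnr", b"mntr", b"prtr", b"spac"}
ALLOWED_COLOUR_SPACES = {b"GRAY", b"RGB ", b"CMYK", b"Lab "}


def satisfies_table_67(profile: bytes) -> bool:
    """Check both header fields against the signatures quoted from Table 67."""
    if len(profile) < 128:
        raise ValueError("an ICC profile header is 128 bytes long")
    device_class = profile[12:16]
    colour_space = profile[16:20]
    return (device_class in ALLOWED_DEVICE_CLASSES
            and colour_space in ALLOWED_COLOUR_SPACES)


def fake_header(device_class: bytes, colour_space: bytes) -> bytes:
    """Build a zero-filled 128-byte header carrying only the two signatures."""
    header = bytearray(128)
    header[12:16] = device_class
    header[16:20] = colour_space
    return bytes(header)
```

A header with colour space `'Lab '` passes this check, which is the inconsistency raised above: the table admits Lab while the note suggests it is excluded.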

Lack of clarity in BS entry for link annotations

In Table 176 the BS entry specifies "the line width and dash pattern that shall be used in drawing the annotation’s border". It's pretty obvious what that means when only Rect is present. But what does it mean if QuadPoints is in use? Should the outline of every rectangle be painted?

Correct default value of `IV` entry in Table 322 from `false` to `true`.

In Table 322 the default value of the IV entry is false. However, in existing commercial products like Acrobat, this entry has always been interpreted as true when missing. (And true is indeed the more natural default value.)

I'm not sure whether it's better to change the specification in this case or to claim that every existing implementation has been wrong. Personally, I feel the former approach is more practical and helpful.

Clarify whether default colour spaces apply to inline images

Note 3 in "8.9.7 Inline images" says that various colour space names don’t refer to resources in the ColorSpace subdictionary; all well and good. But it starts by saying that they identify the corresponding colour spaces directly, which implies that DefaultGray, DefaultRGB and DefaultCMYK do not apply here.

If that's true then it shouldn't be in a note, but it would be odd to make inline images so different to everything else. I recommend clarifying that default colour spaces do apply here.

I suggest something like "The names DeviceGray, DeviceRGB, and DeviceCMYK (as well as their abbreviations G, RGB, and CMYK) never refer to resources in the ColorSpace subdictionary; they always identify the corresponding colour spaces either directly or via a default color space (see 8.6.5.6 Default colour spaces)."
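The behaviour recommended here can be sketched in Python. This shows the reading proposed in this issue, not something the current text of Note 3 states; the function name `resolve_inline_colour_space` is hypothetical, and the sketch handles only the three device names and their abbreviations.

```python
from typing import Any, Dict

ABBREVIATIONS = {"G": "DeviceGray", "RGB": "DeviceRGB", "CMYK": "DeviceCMYK"}
DEFAULT_KEYS = {
    "DeviceGray": "DefaultGray",
    "DeviceRGB": "DefaultRGB",
    "DeviceCMYK": "DefaultCMYK",
}


def resolve_inline_colour_space(name: str,
                                colorspace_resources: Dict[str, Any]) -> Any:
    """Expand the abbreviation, then apply a default colour space if present."""
    device_name = ABBREVIATIONS.get(name, name)
    if device_name not in DEFAULT_KEYS:
        raise ValueError("this sketch only covers the device colour spaces")
    # The device name itself is never looked up as a resource; only the
    # corresponding Default* entry is consulted.
    default_key = DEFAULT_KEYS[device_name]
    return colorspace_resources.get(default_key, device_name)
```

With a `DefaultRGB` entry in the ColorSpace subdictionary, an inline image declared as `RGB` resolves to that entry; without one it resolves directly to DeviceRGB.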

Table 116 — Predefined CJK CMap names contains only deprecated Korean CMaps

Table 116 — Predefined CJK CMap names

Table 116 in the "Korean" section lists only predefined CMaps that belong to the deprecated "Adobe-Korea1-2" character collection. On the other hand the predefined CMaps belonging to the new "Adobe-KR-9" are not listed in the "Korean" section of Table 116.

The predefined CMaps in the "Korean" section belonging to the deprecated "Adobe-Korea1-2" character collection should be removed from Table 116. The new predefined CMaps "UniAKR-UTF8-H", "UniAKR-UTF16-H" and "UniAKR-UTF32-H" should be added to the "Korean" section of Table 116.

See also The Adobe-KR-9 Character Collection in the README.md file in the Adobe CMap Resources GitHub repository.

Wording around CIEJab is still unclear

I've just had another developer who misunderstood the text in 7.4.9 around using CIEJab for JPXDecode.

It says "Data used in PDF image XObjects shall be limited to the JPX baseline set of features, except for enumerated colour space 19 (CIEJab)." My understanding, supported by the wording in Adobe v1.7, is that CIEJab is not in the baseline set, but is allowed.

That's immediately followed by "In addition, enumerated colour space 12 (CMYK), which is part of JPX but not JPX baseline, shall be supported in a PDF file." I think the difference in the way that CIEJab and CMYK are described contributes to the confusion.

I suggest something like: "Data used in PDF image XObjects shall be limited to the JPX baseline set of features. In addition, enumerated colour spaces 12 (CMYK) and ### (CIEJab), which are part of JPX but not JPX baseline, shall be supported in a PDF file."

Clarity over which of adbe.x509.rsa_sha1, adbe.pkcs7.sha1 and adbe.pkcs7.detached are deprecated in PDF 2.0

Clause 12.7.5.5, Table 237 (under "AddRevInfo") [...] adbe.pkcs7.detached and adbe.pkcs7.sha1 are deprecated in PDF 2.0.
Clause 12.8.3.1, Table 260 [...] values adbe.x509.rsa_sha1 and adbe.pkcs7.sha1 have been deprecated with PDF 2.0.

PeterW: If you search for “adbe.x509.rsa_sha1”, it comes up in:

  • clause 0.3 where it is explicitly stated as deprecated;
  • Table 237 has NOTE 3 which says “NOTE 3 adbe.pkcs7.detached and adbe.pkcs7.sha1 are deprecated in PDF 2.0.” – but not adbe.x509.rsa_sha1;
  • Table 255 SubFilter key which has a last para saying it is deprecated;
  • Table 255 Cert key – which is described as only valid if the SubFilter key is adbe.x509.rsa_sha1, and thus arguably you go on to read the SubFilter key in Table 255 where it explicitly says deprecated;
  • Table 260 which has table-note (c) which states it is deprecated;
    - 1st para of 12.8.3.2 – which admittedly neglects to mention that it is deprecated;
  • 12.8.3.3.1, 2nd last bullet which refers back to Table 260 (where it is explicitly stated as deprecated).

If you search for “adbe.pkcs7.sha1”, it comes up in:

  • clause 0.3 where it is explicitly stated as deprecated;
  • Table 237 has NOTE 3 which says “NOTE 3 adbe.pkcs7.detached and adbe.pkcs7.sha1 are deprecated in PDF 2.0.”;
  • Table 255 SubFilter key which has a last para saying it is deprecated;
  • Table 255 Cert key – but this time there is no statement either way…
  • Table 260 which has table-note (c) which states it is deprecated – the (c) should be added to the right-most column to reference it correctly…
  • 12.8.3.3.1 2nd bullet which explicitly states it is deprecated.

If you search for “adbe.pkcs7.detached”, you get the following:

  • It is NOT called out in clause 0.3;
  • It IS marked as deprecated in NOTE 3 of Table 237;
  • It is NOT marked as deprecated in Table 255 SubFilter key;
  • It is NOT marked as deprecated in Table 255 Cert key;
  • It is NOT marked as deprecated in Table 260

Annex Q neglects new BM key for annotations

Clause Q.2 (last para) states the following in regards to annotations:

Since Annotations require an appearance stream which is drawn by a PDF processor on top of the page content, it is possible that their presence may cause a page without any transparency to acquire some transparency. Therefore, all annotations object's in the page dictionary's Annots array shall have their appearance streams processed as a form XObject, according to Q.3, "Form XObjects".

This neglects the new BM key introduced by PDF 2.0 in Table 166 which is described as "The blend mode that shall be used when painting the annotation onto the page ..."

An additional sentence should be added describing the BM key when its value is not Normal.

Default color spaces for annotation C and IC entries

Section 8.6.5.6, "Default colour spaces", defines the cases in which default colour spaces are applicable as follows:

A colour space is selected for painting each graphics object. This is either the current colour space parameter in the graphics state or a colour space given as an entry in an image XObject, inline image, or shading dictionary.

On the other hand, the C entry in the annotation dictionary (Table 166) specifies a number of cases in which Device colors are used to define the appearance of the annotation. Currently there is no mechanism available to remap these device colors to device-independent ones.

This is not an issue if the annotation has an appearance stream, which is required in most cases in PDF 2.0. But, for example, Link annotations do not require an AP entry and yet use the value of the C entry to draw the border. Another case is when the annotation is modified and the appearance stream has to be recreated.

One way to resolve this would be to state that default color spaces defined in the page resource dictionary are applicable also to color spaces used in C and IC entries of the annotation dictionaries on that page.

This is especially important for the PDF/A and PDF/X standards, which forbid the use of Device colors when no matching output profile is defined.
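
A minimal Python sketch of the proposed remapping follows. It relies on the rule in Table 166 that the number of elements in the C array selects the colour space (0: no colour, 1: DeviceGray, 3: DeviceRGB, 4: DeviceCMYK). The function name `annotation_colour_space` and the dict standing in for the page's ColorSpace resource subdictionary are assumptions for illustration only.

```python
# Hypothetical sketch of the proposal: apply the page's default colour
# spaces to the device colour implied by an annotation's C (or IC) array.
COMPONENTS_TO_DEVICE = {1: "DeviceGray", 3: "DeviceRGB", 4: "DeviceCMYK"}
DEVICE_TO_DEFAULT = {
    "DeviceGray": "DefaultGray",
    "DeviceRGB": "DefaultRGB",
    "DeviceCMYK": "DefaultCMYK",
}


def annotation_colour_space(c_array, page_colorspace_resources):
    """Return the colour space to use for a C/IC array, or None if empty."""
    if len(c_array) == 0:
        return None  # zero elements: no colour (transparent)
    device = COMPONENTS_TO_DEVICE[len(c_array)]  # KeyError for invalid lengths
    # Proposed behaviour: remap through DefaultGray/DefaultRGB/DefaultCMYK
    # when the page resources define one; otherwise keep the device space.
    return page_colorspace_resources.get(DEVICE_TO_DEFAULT[device], device)
```

With this behaviour a Link annotation without an AP entry would draw its border in the page's DefaultRGB space, if one is defined, rather than in raw DeviceRGB.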

Handling of inline images with both abbreviated and full keys

Clause 8.9.7 Inline images, Table 91 establishes that many keys in an inline image header pseudo-dictionary can be abbreviations of longer named keys in an Image XObject. But what happens if an inline image has both the full key name and its abbreviation (and they have different values)?

Is this a "duplicate key" and an error? In which case the clause 7.3.7 wording "Multiple entries in the same dictionary shall not have the same key" is inadequate as they are not actually the same key names (just logically/semantically).

We do NOT want to have "first key in dict" / "last key in dict" logic as that would also clearly contradict everywhere else in PDF and clause 7.3.7 "That ordering shall be ignored."

Or does one form of the key take precedence? e.g. long-form over abbreviation? Or vice-versa?

Or do we call this case out explicitly as an error in 8.9.7 and state that the inline image "shall" be skipped in any rendered output?

My preference is for some type of "resolve and continue processing" handling so that the inline image might hopefully get painted.
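
One possible "resolve and continue" policy can be sketched in Python as below. This is only an illustration of the long-form-over-abbreviation option raised above, not a proposal for normative text; the function name `normalise_inline_image_keys` and its `prefer_full` flag are made up for this example. The abbreviation list follows Table 91 (with Length/L as added in PDF 2.0).

```python
# Abbreviated inline image keys and their full names (Table 91).
ABBREVIATIONS = {
    "BPC": "BitsPerComponent",
    "CS": "ColorSpace",
    "D": "Decode",
    "DP": "DecodeParms",
    "F": "Filter",
    "H": "Height",
    "IM": "ImageMask",
    "I": "Interpolate",
    "L": "Length",
    "W": "Width",
}


def normalise_inline_image_keys(entries, prefer_full=True):
    """Map abbreviated keys to full names and report conflicting pairs.

    The preferred form (full or abbreviated) is processed first, so the
    result does not depend on the order of entries in the dictionary.
    """
    preferred = [(k, v) for k, v in entries.items()
                 if (k not in ABBREVIATIONS) == prefer_full]
    others = [(k, v) for k, v in entries.items()
              if (k not in ABBREVIATIONS) != prefer_full]
    normalised = {}
    conflicts = set()
    for key, value in preferred + others:
        full = ABBREVIATIONS.get(key, key)
        if full not in normalised:
            normalised[full] = value
        elif normalised[full] != value:
            conflicts.add(full)  # both forms present with different values
    return normalised, sorted(conflicts)
```

A processor could log the reported conflicts as a warning and still paint the image using the normalised entries. Setting `prefer_full=False` gives the opposite precedence.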

Unclear wording in 7.10.3 - Type 2 (exponential interpolation) functions

The first sentence of the first para under Table 40 says:
"Values of Domain shall constrain x in such a way that if N is not an integer, all values of x will be nonnegative,
and if N is negative, no value of x will be zero."

We have a sample file that has opened a debate about whether those are two separate statements, or if the second clause is a follow on from the first.

In other words, which of these is it?

a) "Values of Domain shall constrain x in such a way that:

  • if N is not an integer, all values of x will be nonnegative; and
  • if N is negative, no value of x will be zero."

b) "Values of Domain shall constrain x in such a way that if N is not an integer, all values of x will be nonnegative,
and if N is a negative integer, no value of x will be zero."

My reading would be a), but a couple of editing applications seem to be happy to retain a negative non-integer value of N in combination with a Domain of [ 0 1 ] when updating files.
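
To make the difference between the two readings concrete, here is a minimal Python sketch. The function name `domain_ok` and its `reading` parameter are made up for illustration, and it assumes Domain is a single closed interval [x_min, x_max].

```python
def domain_ok(n, domain, reading):
    """Check a Type 2 function's Domain against reading 'a' or 'b'.

    n       -- the interpolation exponent N
    domain  -- (x_min, x_max), assumed to be one closed interval
    reading -- 'a' (two independent conditions) or 'b' (the zero
               restriction applies only when N is a negative integer)
    """
    x_min, x_max = domain
    is_integer = float(n).is_integer()
    contains_zero = x_min <= 0 <= x_max

    # Common to both readings: non-integer N requires all x to be nonnegative.
    if not is_integer and x_min < 0:
        return False

    if reading == "a":
        zero_forbidden = n < 0
    else:
        zero_forbidden = n < 0 and is_integer
    return not (zero_forbidden and contains_zero)
```

For the case in question, `domain_ok(-0.5, (0, 1), "a")` is False while `domain_ok(-0.5, (0, 1), "b")` is True. Since x^N is not defined at x = 0 for any negative N, integer or not, reading b) would accept a Domain on which the function cannot be evaluated.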

Thanks

Martin

Editorial issue with Standard 14 fonts

Clause 9.6.2.2 Standard Type 1 fonts (standard 14 fonts) (PDF 1.0-1.7) ends with the following text:

PDF processors supporting PDF 1.0 to PDF 1.7 files shall have these fonts, or their font metrics and suitable substitution fonts, available.
These fonts, or their font metrics and suitable substitution fonts, shall be available to the PDF processor.

Editorial issue: the second sentence duplicates the first sentence. Suggestion: delete second sentence.

Table 47 Collection Subitem - unclear if Default statements indicate text string value or no default

Table 47 — Entries in a collection subitem dictionary

"Default: None" is stated for both D and P keys. Both D and P allow text-strings as a valid type. "None" is also capitalized and italic implying it is a value.

Does this mean the default is a text-string with the value None (as in "(None)") - or does it mean that there is no default specified?

If the latter then suggest deleting both Default statements as this is the usual way of indicating there is no default value. I believe this is what is intended.

What happens when there's no AP in an annotation dictionary?

In Table 166 AP is not required in a couple of cases, but immediately after the table the text says "A PDF reader shall render the appearance dictionary without regard to any other keys and values in the annotation dictionary and shall ignore the values of the C, IC, Border, BS, BE, BM, CA, ca, H, DA, Q, DS, LE, LL, LLE, and Sy keys."

So what should the renderer do if there is no AP?

For a zero-size annotation I recommend that it doesn't render anything, and maybe the requirement that the border is drawn completely inside the annotation rectangle (12.5.4 para 1) is enough to state that anyway.

I think for Popup and Projection annotations the correct behaviour is stated adequately elsewhere, so the lack of any allowance for a missing AP here simply muddies the water. It's less clear that there is a clear statement for Link annotations anywhere. Did I miss something?

A good start would be to amend the para after Table 166 to start "For all annotations containing an AP entry, a PDF reader ...", and then to amend the following note to say "Requiring an appearance dictionary for most annotations ..."

Vague term "meaningful" used often

A large number of places (49 to be precise – see attached XLSX file) in ISO 32000-2 describe keys as being “meaningful” under certain conditions. This term is vague: it is unclear what “meaningful” really means, how or when you would codify or validate it, and whether it has exactly the same meaning in every case.

Meaningful.xlsx

Join Soft-mask image dictionary key limitations between Table 143 and Table 87

Clause 11.6.5.2, "Table 143 — Restrictions on the entries in a soft-mask image dictionary" defines a number of image XObject keys as being "Ignored", yet this requirement to ignore them in soft-mask images is neither mentioned in nor cross-referenced from "Table 87 — Additional entries specific to an image dictionary".

Specifically for Table 87 keys: Intent, Alternates, Name, ID, StructParent.

Table 143 also says SMask key shall be absent, but Table 87 doesn't mention this.

Simple solutions that are not overly wordy and don't duplicate technical requirements include:

  • adding a reference to 11.6.5.2 Soft-mask images to the NOTE above Table 87

  • adding "Additional limitations also apply to this key when used in soft-mask image dictionaries - see clause 11.6.5.2 Soft-mask images." to each of the above keys in Table 87.

Table 226, T key stated as Required but should be Optional?

ISO 32000-1:2008, Table 220, identifies the T key as Optional, but ISO 32000-2:2020, Table 226, shows the same T key as Required. However, 12.7.4.2 says:

A field dictionary that does not have a partial field name (T entry) of its own shall not be considered a field but simply a Widget annotation.

Which clearly implies that a field dictionary need not have a T.

I don't know how Optional became Required, but I consider it a typo and we need to put it back.

Mapping of character collections to PDF versions is missing

Table 117 — Character collections for predefined CMaps, by PDF version

The information in "Table 117 — Character collections for predefined CMaps, by PDF version" was eliminated in ISO 32000-2. In ISO 32000-1:2008 this table stated the PDF version in which each character collection was first introduced.

In ISO 32000-2 the contents of the table were replaced with the following text:

Table intentionally empty to retain table numbering in this document (2020).  Information is now located in the appropriate normative reference for each character collection.

According to section "2 Normative references" the "appropriate normative reference for each character collection" is located in the Adobe CMap Resources GitHub repository.

Nowhere in that repository can I find any resource that contains the information that was formerly available in Table 117.

The contents of Table 117 should either be restored and extended for PDF 2.0, or the README.md file in the CMap Resources GitHub repository should be amended with this information.

Text in FontDescriptor description no longer correct

In 9.6.2.1, Table 109, the entry for FontDescriptor includes the text:

For the standard 14 fonts, the entries FirstChar, LastChar, Widths, and FontDescriptor shall either all be present or all be absent. Ordinarily, these dictionary keys may be absent; specifying them enables a standard font to be overridden; see 9.6.2.2, "Standard Type 1 fonts (standard 14 fonts) (PDF 1.0-1.7)".

For PDF 2.0, all of those fields are marked required, so there is NEVER a case where they can "all be absent".

I would recommend we simply remove the paragraph.

Unclear how multiple attribute objects with same owner apply to structure element

Do all non-repeated attributes from multiple attribute objects with a repeated owner apply to a structure element?

14.7.6.1, Paragraph 1 defines:

  • attribute object: "a dictionary or stream that includes an O entry"
  • attributes: "Other entries, except the NS entry, shall represent the attributes: the keys shall be attribute names, and values shall be the corresponding attribute values."

Paragraph 3 ("When an array...") states that when the owner is repeated and a given attribute (as opposed to attribute object) is also repeated, the later entry takes precedence. I interpret "entry" to mean a single attribute, not an array entry. This implies that the following two arrays of attribute objects on a TD structure element should be equivalent:
[ << /O /Table /RowSpan 2 >> << /O /Table /ColSpan 2 >> ]
[ << /O /Table /RowSpan 2 /ColSpan 2 >> ]

In contrast, I have seen two different PDF processors treat the first array as equivalent to specifying only one of the two attribute objects, although admittedly this happened on a PDF 1.5 document. I'm not sure what is the correct interpretation here.

Proposed Solutions (mutually exclusive). Replace the final sentence in paragraph 3 with:

  1. "All the attributes in all such attribute objects apply to the structure element. If a given attribute is specified more than once, the attribute in the later (in array order) attribute object shall take precedence."
  2. "The last such attribute object (in array order) shall take precedence."
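
The practical difference between the two proposals can be sketched in Python. This is an illustration only: the function names `merge_all` and `last_wins` are made up, attribute objects are modelled as plain dicts, and revision numbers that may appear in the A array are ignored.

```python
def merge_all(attribute_objects, owner):
    """Proposal 1: all attributes apply; later objects override earlier ones."""
    merged = {}
    for obj in attribute_objects:
        if obj.get("O") == owner:
            # O and NS are not attributes themselves, so they are skipped.
            merged.update({k: v for k, v in obj.items() if k not in ("O", "NS")})
    return merged


def last_wins(attribute_objects, owner):
    """Proposal 2: only the last attribute object with this owner applies."""
    matching = [obj for obj in attribute_objects if obj.get("O") == owner]
    if not matching:
        return {}
    return {k: v for k, v in matching[-1].items() if k not in ("O", "NS")}


# The two arrays from the example above.
split = [{"O": "Table", "RowSpan": 2}, {"O": "Table", "ColSpan": 2}]
combined = [{"O": "Table", "RowSpan": 2, "ColSpan": 2}]
```

Under proposal 1 the `split` and `combined` arrays yield the same attributes for the TD element. Under proposal 2 the `split` array yields only ColSpan, which matches the behaviour observed in the two PDF processors mentioned above if they kept the last object.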
