
Comments (8)

raducoravu avatar raducoravu commented on June 12, 2024

@simonbate if you are publishing with the DITA-OT PDF5 plugin developed by Antenna House, maybe they could help you further:
https://www.antennahouse.com/dita-pdf5-plugin

from dita-ot.

simonbate avatar simonbate commented on June 12, 2024

Unfortunately, we're using the OT pdf2 plugin as our base.
Our temporary solution is to ensure the sources do not include newlines, but that shouldn't be necessary.


raducoravu avatar raducoravu commented on June 12, 2024

@simonbate right, you can probably define a Schematron rule to catch such problems.
I filed a similar issue, but for CHM and indexterm elements, a couple of days ago: #4336


chrispy-snps avatar chrispy-snps commented on June 12, 2024

Testcase here: 4337.zip

<index-see> and <index-see-also> handle surrounding whitespace correctly; only <index-sort-as> has the issue.
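
To make the expected result concrete, here is a minimal sketch of the normalization one would expect for the text of <index-sort-as>. It is not DITA-OT code: the class name SortKeyWhitespaceSketch and the method normalize are hypothetical, and the sample input is made up for illustration.

```java
public class SortKeyWhitespaceSketch {

    // Hypothetical helper (not part of DITA-OT): collapse every run of
    // whitespace, including newlines, to one space, then trim both ends.
    public static String normalize(String raw) {
        return raw.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // Made-up element text with a newline and indentation around the key.
        String raw = "\n    apple\n  ";
        System.out.println("[" + normalize(raw) + "]"); // prints "[apple]"
    }
}
```

With this behavior, a sort key written on its own line in the source would sort the same as one written inline.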


chrispy-snps avatar chrispy-snps commented on June 12, 2024

I think the fix would be inside src/main/java/org/dita/dost/reader/IndexTermReader.java by applying the trimSpaceAtStart() function somewhere, but I can't figure out where it should be. @raducoravu - if you see a potential fix for this, I would be happy to test it.


raducoravu avatar raducoravu commented on June 12, 2024

@chrispy-snps I'm afraid I do not have time to look into this. In "org.dita.dost.reader.IndexTermReader.characters(char[], int, int)" there seems to be an initial normalization using normalizeAndCollapseWhitespace, which replaces all consecutive spaces with one.
And then there is this code for the index sort-as:

  else if (insideSortingAs && temp.length() > 0) {
    final IndexTerm indexTerm = termStack.peek();
    temp = trimSpaceAtStart(temp, indexTerm.getTermKey());
    indexTerm.setTermKey(setOrAppend(indexTerm.getTermKey(), temp, false));
  } 

It's unclear to me what trimSpaceAtStart does. It does not seem to always remove the first space; it removes it only when the second parameter to the method also starts with a space. But "indexTerm.getTermKey()" is probably null at that moment. So maybe replace:

temp = trimSpaceAtStart(temp, indexTerm.getTermKey());

with:

temp = trimSpaceAtStart(temp, indexTerm.getTermKey() != null ? indexTerm.getTermKey() : temp);

so that the first time, when the term key is not yet computed, the first space is always removed?
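
A small standalone sketch of that idea follows. The trimSpaceAtStart here is a hypothetical stand-in written only from the behavior described in this comment (drop the leading space when the second argument is non-null and starts with a space); it is an assumption, not the actual DITA-OT method, and the class name and the current/proposed wrappers are made up for illustration.

```java
public class SortKeyTrimSketch {

    // Hypothetical stand-in, modeled only on the description above:
    // remove the leading space of temp when termKey is non-null and
    // itself starts with a space. The real method may differ.
    public static String trimSpaceAtStart(String temp, String termKey) {
        if (termKey != null && !termKey.isEmpty() && termKey.charAt(0) == ' '
                && !temp.isEmpty() && temp.charAt(0) == ' ') {
            return temp.substring(1);
        }
        return temp;
    }

    // Current call shape: the (possibly null) term key is passed as-is.
    public static String current(String temp, String termKey) {
        return trimSpaceAtStart(temp, termKey);
    }

    // Proposed call shape: fall back to temp when no term key is set yet.
    public static String proposed(String temp, String termKey) {
        return trimSpaceAtStart(temp, termKey != null ? termKey : temp);
    }

    public static void main(String[] args) {
        String temp = " sort key"; // text after whitespace was collapsed to one space
        System.out.println("[" + current(temp, null) + "]");  // prints "[ sort key]"
        System.out.println("[" + proposed(temp, null) + "]"); // prints "[sort key]"
    }
}
```

Under that assumption, the fallback only changes the result for the first chunk of text, when the term key is still null.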


chrispy-snps avatar chrispy-snps commented on June 12, 2024

So I think the IndexTermReader.java code is used only by the htmlhelp transformation. For pdf2 transformations, index term processing appears to be provided by the org.dita.index plugin. It looks like there is a unit test for <index-sort-as> in there that could be modified to test and resolve the whitespace issue, but I haven't been able to build and test that plugin yet.


chrispy-snps avatar chrispy-snps commented on June 12, 2024

In org.dita.index, I filed the following issue:

#2: cannot build or test plugin

Once that is figured out, I hope to implement a fix there.
