
Comments (6)

mohamnag commented on May 30, 2024

What kind of help is required here? I'm a native Farsi speaker and can help change code and verify the results, but I'm not sure which piece of code needs to change. I took a quick look at the latest version and couldn't spot where an element with Unicode text gets drawn.

from flyingsaucer.

mohamnag commented on May 30, 2024

FYI, I tracked it down to the method com.lowagie.text.pdf.BaseFont#convertToBytes(java.lang.String). It looks like the encoding is always set to Cp1252, which I would not expect to render any non-Latin characters. Maybe properly setting the charset there (I don't know how) would fix the issue, possibly together with using a font that actually contains the required characters.
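To illustrate why Cp1252 cannot carry Farsi text, here is a minimal sketch using only the JDK. It is not OpenPDF's conversion code; the class and method names (Cp1252Demo, fitsInCp1252, roundTrip) are made up for this example, and "windows-1252" is the JDK's name for the Cp1252 code page.

```java
import java.nio.charset.Charset;

class Cp1252Demo {
    // Assumption for this sketch: the JDK charset "windows-1252" stands in for Cp1252/winansi.
    static final Charset CP1252 = Charset.forName("windows-1252");

    /** Returns true if every character of the text can be encoded in Cp1252. */
    static boolean fitsInCp1252(String text) {
        return CP1252.newEncoder().canEncode(text);
    }

    /** Encodes the text to Cp1252 bytes and decodes it again, showing what survives. */
    static String roundTrip(String text) {
        // String.getBytes(Charset) substitutes the charset's replacement byte ('?')
        // for every character that has no mapping.
        return new String(text.getBytes(CP1252), CP1252);
    }

    public static void main(String[] args) {
        // The Farsi word for "test": teh, seen, teh (written with Unicode escapes).
        String farsi = "\u062A\u0633\u062A";

        System.out.println(fitsInCp1252("test")); // true
        System.out.println(fitsInCp1252(farsi));  // false
        System.out.println(roundTrip(farsi));     // ???
    }
}
```

Any single-byte Latin code page behaves the same way, which is consistent with the suggestion above that both the encoding and the font have to change.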


asolntsev commented on May 30, 2024

@mohamnag Hi. Wow, thank you for debugging this problem with fonts.
Yes, now I see: FS always uses the winansi encoding (which I guess means Cp1252). I don't know why, but it has been used from the very beginning (01.02.2006) :)

I think we can change this encoding. Can you provide a simple example of such HTML and a font, so we could add it to the FS tests?


mohamnag commented on May 30, 2024

Well, I went ahead and used a custom font for which I can set the encoding. Unfortunately, the result was still problematic.

Let's take this sample HTML:

<html lang="fa">
<head>
    <meta charset="UTF-8"/>
    <title>Title</title>
    <style>
        .rtl-font {
            font-family: Vazirmatn;
            direction: rtl;
        }
    </style>
</head>
<body>
<div style="background-color: blue">
    تست فارسی
</div>
<div class="rtl-font" style="background-color: green">
    تست فارسی
</div>
<div dir="rtl" style="background-color: red; font-family: Vazirmatn">
    تست فارسی
</div>
</body>
</html>

I have the font (available for free from https://github.com/rastikerdar/vazirmatn/releases/tag/v33.003) unzipped into the resources directory, and this is my Java code:

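        import java.io.File;
        import java.io.FileOutputStream;
        import java.io.OutputStream;
        import org.jsoup.Jsoup;
        import org.jsoup.nodes.Document;
        import org.xhtmlrenderer.layout.SharedContext;
        import org.xhtmlrenderer.pdf.ITextRenderer;
        import com.lowagie.text.pdf.BaseFont;

        // the block below runs inside a method of Main
        try (OutputStream outputStream = new FileOutputStream("build/pdf/method4.pdf")) {
            // parse the HTML and re-serialize it as well-formed XML for Flying Saucer
            Document document = Jsoup.parse(new File(inputHtml.getFile()), "UTF-8");
            document.outputSettings().syntax(Document.OutputSettings.Syntax.xml);
            var htmlString = document.html();

            // initialize Flying Saucer
            ITextRenderer renderer = new ITextRenderer();
            SharedContext sharedContext = renderer.getSharedContext();
            sharedContext.setPrint(true);
            sharedContext.setInteractive(false);

            // register the custom font with Identity-H (Unicode) encoding and embed it
            renderer
                    .getFontResolver()
                    .addFont(
                            Main.class.getClassLoader().getResource("Vazirmatn/ttf/Vazirmatn-Regular.ttf").toString(),
                            BaseFont.IDENTITY_H,
                            true
                    );

            renderer.setDocumentFromString(htmlString);

            renderer.layout();
            renderer.createPDF(outputStream);
            // relative resources: see https://www.baeldung.com/java-html-to-pdf#dependencies-4
        }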

This is the output that FS gives me:
[image]

And this is what a browser gives me (ignoring that the font is not applied):
[image]

There are two problems here:

  1. The connection between letters: Farsi/Arabic letters join and change shape depending on their position and neighbouring letters. This shaping is not handled.
  2. The RTL orientation is not applied: the first letter, ت, should be rightmost but is leftmost.
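Both problems can be made concrete with the plain JDK. The sketch below is only an illustration, not what Flying Saucer or OpenPDF do internally; the class and method names (RtlShapingDemo, isPureRtl, runCount, baseLetter) are invented for this example. It uses java.text.Bidi to show that the sample text is a right-to-left run, and java.text.Normalizer to show that the single letter ت (U+062A) has four positional glyph forms in the Arabic Presentation Forms-B block, which a renderer has to choose between when shaping.

```java
import java.text.Bidi;
import java.text.Normalizer;

class RtlShapingDemo {
    /** Returns true if the whole text resolves to right-to-left. */
    static boolean isPureRtl(String text) {
        // The base direction is taken from the first strong character,
        // falling back to left-to-right if there is none.
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        return bidi.isRightToLeft();
    }

    /** Returns the number of directional runs the bidi algorithm finds in the text. */
    static int runCount(String text) {
        return new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT).getRunCount();
    }

    /** Maps a positional presentation form back to its base letter via NFKC. */
    static String baseLetter(String presentationForm) {
        return Normalizer.normalize(presentationForm, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // The sample text from the HTML above, written with Unicode escapes.
        String farsi = "\u062A\u0633\u062A \u0641\u0627\u0631\u0633\u06CC";

        System.out.println(isPureRtl(farsi));          // true
        System.out.println(runCount("abc " + farsi));  // 2

        // Isolated, final, initial and medial forms of the letter teh (U+062A).
        String[] forms = {"\uFE95", "\uFE96", "\uFE97", "\uFE98"};
        for (String form : forms) {
            // Each form normalizes back to the same base letter.
            System.out.println(baseLetter(form).equals("\u062A")); // true
        }
    }
}
```

A renderer that draws the base code points one by one, left to right, skips both steps: picking the positional form for each letter and reversing the visual order of the run.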

In general, I would first solve this for a custom font (which certainly has all the characters) and only then look into fixing the charset for the default font.


mohamnag commented on May 30, 2024

By the way, you have probably seen this example of RTL rendering with OpenPDF, but I just want to mention it: https://github.com/LibrePDF/OpenPDF/blob/master/pdf-toolbox/src/test/java/com/lowagie/examples/fonts/styles/RightToLeft.java

I don't know whether this differs from what FS does under the hood when working with OpenPDF, but I couldn't find any of those methods being called.


mohamnag commented on May 30, 2024

I also found this post: https://groups.google.com/g/flying-saucer-users/c/n0CfuYfpQ6I/m/3iJIaZ4IAAAJ
and a whole thread there that is related to this ticket.
