Comments (14)

speckyspooky commented on June 30, 2024

@doortokaos
Confirmed. As described in my comment above, my testing uses the style element.

I agree with @hvbtup that using the "Locale" is not a good choice in every case,
because it would change the default behavior, and that change is not wanted in every case.

My idea would be to test the following:
  1. I won't change the default behavior; the default will stay "en-US".

  2. If my testing succeeds, we could have 2 user-properties:
    2.A) a user-property to configure the language code explicitly
    2.B) a user-property to activate the "Locale" usage
    2.C) if both are set, 2.A) wins because it is the explicit definition

  3. Validation will be implemented: if the language code is invalid, fall back to "en-US" again.
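
The precedence proposed above could be sketched roughly as follows. This is only an illustration, not code from BIRT: the class name `DocxLanguageChoice`, the method names and the three parameters are hypothetical, and the validity check via `Locale.forLanguageTag` is an assumption (it is lenient and only rejects tags without a usable language subtag).

```java
import java.util.Locale;

// Hypothetical helper; none of these names exist in BIRT.
final class DocxLanguageChoice {

    static final String DEFAULT_LANGUAGE = "en-US";

    /** Lenient check: the tag must yield a non-empty language subtag. */
    static boolean isValidTag(String tag) {
        if (tag == null || tag.trim().isEmpty()) {
            return false;
        }
        // forLanguageTag never throws; an ill-formed first subtag gives an empty language.
        return !Locale.forLanguageTag(tag.trim()).getLanguage().isEmpty();
    }

    /**
     * @param explicitCode value of the assumed "language code" user-property (2.A), may be null
     * @param useLocale    value of the assumed "use Locale" user-property (2.B)
     * @param reportLocale locale of the report, may be null
     */
    static String chooseLanguage(String explicitCode, boolean useLocale, Locale reportLocale) {
        if (explicitCode != null && !explicitCode.trim().isEmpty()) {
            // 2.C: the explicit code wins; an invalid code falls back to the default.
            return isValidTag(explicitCode) ? explicitCode.trim() : DEFAULT_LANGUAGE;
        }
        if (useLocale && reportLocale != null) {
            String tag = reportLocale.toLanguageTag();
            return isValidTag(tag) ? tag : DEFAULT_LANGUAGE;
        }
        // Neither property is set: keep the existing default.
        return DEFAULT_LANGUAGE;
    }
}
```

With this sketch, `chooseLanguage("de-DE", true, Locale.FRANCE)` picks the explicit code, while `chooseLanguage(null, false, Locale.FRANCE)` keeps the default.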

from birt.

hvbtup commented on June 30, 2024

Thomas, I really like your example reports!

hvbtup commented on June 30, 2024

PRs are welcome!

Specifying the language of (parts of) the document is probably described in Microsoft's specification for the DOCX format. A DOCX file is basically just a ZIP with several XML files, so you should be able to reverse-engineer this by saving the same document with two different language settings and comparing the results.
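
For the comparison, a single XML part can be pulled out of the archive with the JDK's ZIP classes. A minimal sketch, assuming the part is UTF-8 encoded; `DocxPartReader` and `readPart` are made-up names for illustration:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper for inspecting a DOCX file; not part of BIRT.
final class DocxPartReader {

    /** Returns the text of the named ZIP entry, or null if the archive has no such entry. */
    static String readPart(InputStream docx, String partName) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals(partName)) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buffer = new byte[8192];
                    int n;
                    // read() returns -1 at the end of the current entry
                    while ((n = zip.read(buffer)) != -1) {
                        out.write(buffer, 0, n);
                    }
                    // Assumption: the XML part is UTF-8 encoded.
                    return new String(out.toByteArray(), StandardCharsets.UTF_8);
                }
            }
        }
        return null;
    }
}
```

Calling `readPart(...)` on two saved variants of the same document and diffing the returned strings should show where the language setting ends up.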

But the topic is not as simple as you might think:

First of all, I think that the preview locale is definitely not the correct source for determining the locale.
Instead, the text language must be an attribute of all texts inside the generated report itself.

Sometimes, a report contains texts in more than one language.
For example, our neighbors in Switzerland often use 3 languages in the same report.

So, one metadatum for the language of the document will not suffice.
Instead, one would need a language attribute for all the texts inside a document, or the option to specify a "main language" for the document and an optional "per-item" language for text passages written in different languages.

This is for the generated documents (e.g. HTML or PDF).

We would also need to specify how those languages are determined from the rptdesign file.
AFAIK we have a "locale" property for individual report items (including the report itself). The language is part of the locale specification, so I think this part is clear:

  1. Take the language from the report layout item locale, if specified
  2. Take the language from the report locale, if specified
  3. Take the language from the system property user.language, if that is explicitly specified.

Otherwise, do not assume a language - I think we should not guess.
In particular, many companies in the EU run their servers on Windows servers with US English locale, but their reports should come out in German, French or whatever, so guessing the language from the OS locale settings is error-prone.
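
The lookup order above, including the "do not guess" rule, might look like this as a sketch. `ReportLanguageLookup` and its parameters are hypothetical names, not BIRT API. The third argument is passed in by the caller on purpose: the JVM always populates `user.language` from the OS settings, so deciding whether it was "explicitly specified" is left outside this method.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper; the names are assumptions for illustration only.
final class ReportLanguageLookup {

    /**
     * @param itemLocale           locale of the report layout item, or null if not specified
     * @param reportLocale         locale of the report, or null if not specified
     * @param explicitUserLanguage value of user.language if the caller knows it was set explicitly, else null
     */
    static Optional<String> resolve(Locale itemLocale, Locale reportLocale, String explicitUserLanguage) {
        Optional<String> fromItem = languageOf(itemLocale);
        if (fromItem.isPresent()) {
            return fromItem;          // 1. report layout item locale
        }
        Optional<String> fromReport = languageOf(reportLocale);
        if (fromReport.isPresent()) {
            return fromReport;        // 2. report locale
        }
        if (explicitUserLanguage != null && !explicitUserLanguage.isEmpty()) {
            return Optional.of(explicitUserLanguage);  // 3. explicit system property
        }
        return Optional.empty();      // otherwise: no language, no guessing
    }

    private static Optional<String> languageOf(Locale locale) {
        if (locale == null || locale.getLanguage().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(locale.getLanguage());
    }
}
```

Returning an empty `Optional` instead of a default makes the "no language known" case explicit, so an emitter can decide to write no language metadata at all.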

I don't know if and how the locale is also part of CSS (or in BIRT speak: the style sheet).

Adding metadata about the language of texts is also a precondition for creating accessible documents, BTW.

doortokaos commented on June 30, 2024

@hvbtup thanks for the extensive reply.
It all makes sense and is more complicated than it seems.
I already suspected that it couldn't be that easy, because otherwise it would have been done already, but I couldn't find any information on why it hadn't been.

I wouldn't use the locale of the OS running BIRT either, but I think using the locale in which the report is generated is a bit better than using nothing or defaulting to en_US, since the user or report creator has already set a locale for the complete report.

Since the perfect solution with a language for each text in the report seems rather complex, using a "better" locale for the complete document would be a step in the right direction in my opinion.

In my experience and user environment most documents contain only one language, so this could be an improvement, while not being perfect.

What do you think about this?

PS I'm glad that BIRT is alive again and that the issues are seen and read. @ all keep up the good work 👍

speckyspooky commented on June 30, 2024

Yes, the tickets will be read :o)

Checking whether we can set the property for the whole document sounds good to me.
The difficulty is more of a technical one, because the implementation of the DOCX emitter is a mixture of a central library and custom-written source (from the original developers).
Therefore we cannot use the library API directly, so we need to research whether there is a central property at document level.
(The latest MS Word versions support the language on document level and on paragraphs & tables.)

The other option would be that the language value would be set through a user-property as a docx-emitter specific user-property.

hvbtup commented on June 30, 2024

> The language is always set to "English (US)" no matter in which language I create the report

@doortokaos Can you find out if the en-US locale is specified somewhere explicitly in the (extracted) DOCX file structure or if this is just a default which Word assumes if there is no explicit entry?

> ... but I think using the locale in which the report is generated is a bit better ...

I'm strictly -1 on some kind of magic to determine the language if it is not explicitly defined in the rptdesign file.

> ... using a "better" locale for the complete document would be a step in the right direction in my opinion.

Yes.

> The other option would be that the language value would be set through a user-property as a docx-emitter specific user-property.

I don't think we need to extend the data model or use a UserProperty. The locale property of items should suffice.

A good starting point for where to look into the code should be the DocxWriter.java file. It's perfectly possible to write xml fragments directly into the output. I did this in our fork to support Word "Felder" (probably called "fields" in English Word?).

speckyspooky commented on June 30, 2024
  • Confirmed, the usage of a user-property with explicit language-code is the better way (see my comment).
  • The language can be added on document and paragraph/table level, so we will focus on document level for the first steps.
  • The language will be entered with the tag <w:lang> and different attributes.
  • 2 classes identified: Document (emitter.docx) and DocWriter (emitter.wpml)

According to the DOCX definition, the value is part of the "style" assigned to the corresponding document/paragraph/table.

I will test the user-property option with a first draft on my side to verify the option a little bit more.

doortokaos commented on June 30, 2024

> @doortokaos Can you find out if the en-US locale is specified somewhere explicitly in the (extracted) DOCX file structure or if this is just a default which Word assumes if there is no explicit entry?

@hvbtup here you go:
In the file word/styles.xml I find an entry
<w:lang w:val="en-US" w:eastAsia="zh-CN" w:bidi="ar-SA" />
somewhere beneath the <w:docDefaults> node:
[screenshot]
When I replace "en-US" with "de-DE", save the styles.xml in the DOCX file with an archiver and open the changed DOCX with Word, "German (Germany)" is set as the language for the document.

So it seems that BIRT sets en-US for the whole document as the default.

The unaltered file created by BIRT for reference
korrekturhilfe.docx
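
The manual edit described above can also be done programmatically on the content of word/styles.xml. A rough sketch using the JDK's DOM parser; `StylesLanguageEditor` and its methods are hypothetical names, the `w:` prefix on the rewritten attribute is an assumption taken from the entry quoted above, and writing the modified XML back into the archive is not shown:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for experimenting with styles.xml; not part of BIRT.
final class StylesLanguageEditor {

    // Namespace of WordprocessingML elements such as w:lang and w:docDefaults.
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static Document parse(String stylesXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for the *NS lookups below
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(stylesXml)));
    }

    /** Finds the first w:lang element beneath w:docDefaults, or null if there is none. */
    static Element findDefaultLang(Document doc) {
        NodeList defaults = doc.getElementsByTagNameNS(W_NS, "docDefaults");
        if (defaults.getLength() == 0) {
            return null;
        }
        NodeList langs = ((Element) defaults.item(0)).getElementsByTagNameNS(W_NS, "lang");
        return langs.getLength() == 0 ? null : (Element) langs.item(0);
    }

    /** Returns the w:val of the default language entry, or null if the entry is missing. */
    static String getDefaultLanguage(Document doc) {
        Element lang = findDefaultLang(doc);
        return lang == null ? null : lang.getAttributeNS(W_NS, "val");
    }

    /** Overwrites w:val on the default language entry; returns false if the entry is missing. */
    static boolean setDefaultLanguage(Document doc, String languageTag) {
        Element lang = findDefaultLang(doc);
        if (lang == null) {
            return false;
        }
        lang.setAttributeNS(W_NS, "w:val", languageTag); // assumes the usual "w" prefix
        return true;
    }
}
```

The other attributes on the element (`w:eastAsia`, `w:bidi`) are left untouched by this sketch.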

doortokaos commented on June 30, 2024
> • Confirmed, the usage of a user-property with explicit language-code is the better way (see my comment).

@speckyspooky I don't quite get why you want to use a user-property.

Correct me if I'm wrong, but as far as I know, there is a locale in the report context, that is used to determine which translation is loaded, when you have assigned a localization text key to a label and registered resource files with the translated texts for the text key.
It is also used to define the formats on elements when the locale is set to "Auto".
[screenshot]

Why can't we use this locale to set a default for the whole document?

hvbtup commented on June 30, 2024

@speckyspooky

You misunderstood me. I'm +1 for using the locale property as defined inside the report.
[screenshot]

[screenshot]

I'm -1 on guessing the locale from the environment.

And I think that the existing default "en-US" is a minor bug (not everyone lives in the USA), so for me it's reasonable to change the behavior like this (changing the default behavior):

If the report property locale is explicitly set as shown above, then extract the language from there and write it into the DOCX file instead of "en-US". Otherwise, don't write the w:lang value into the DOCX.

speckyspooky commented on June 30, 2024

OK, understood, then it was my mistake.
Yes, we can use the "Locale" directly, without user-properties.

doortokaos commented on June 30, 2024

Thanks for your patience with me. I'm new to the whole GitHub-thing and trying my best.

@speckyspooky After clarifying your idea, I like it

speckyspooky commented on June 30, 2024

I added PR #1627.

The enhancement includes only the usage of the "report locale", without user-properties.
If the locale is empty or invalid, the fallback is the language "en", so the behavior stays like it is currently.
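
The fallback rule can be illustrated with a small sketch. This is not the code from the PR: `ReportLocaleToWordLanguage` and `toWordLanguage` are invented names, and the conversion of the underscore form ("fr_FR") into a hyphenated language tag via `Locale.Builder` is an assumption about how such a mapping could be done.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;

// Hypothetical helper; it only illustrates the described fallback, not the merged implementation.
final class ReportLocaleToWordLanguage {

    static final String FALLBACK = "en";

    /** Maps a report locale string such as "fr_FR" or "it" to a language tag, or "en" if empty or invalid. */
    static String toWordLanguage(String reportLocale) {
        if (reportLocale == null || reportLocale.trim().isEmpty()) {
            return FALLBACK; // empty locale
        }
        // Assumption: the report locale uses '_' where a language tag uses '-'.
        String candidate = reportLocale.trim().replace('_', '-');
        try {
            Locale locale = new Locale.Builder().setLanguageTag(candidate).build();
            return locale.getLanguage().isEmpty() ? FALLBACK : locale.toLanguageTag();
        } catch (IllformedLocaleException e) {
            return FALLBACK; // invalid locale
        }
    }
}
```

Under these assumptions the two test values below map to "fr-FR" and "it".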

In my test cases I used "MS Word based on Office 365".

Example 01: "fr_FR" value

spell-check-fr_FR

Example 02: "it" value

spell-check-it

Demo report

docx_language.zip

speckyspooky commented on June 30, 2024

The enhancement has been merged into master with PR #1627.
