[RFC] Locale formats
For several days now I have been reading up on locales, with regard to both our current handling and the Symfony way. I'm writing down what I found so that we're all up to date and can clear up the confusion once and for all. Loosely related to contao/core-bundle#190 and contao/core-bundle#171.
Background
In computing, a locale is a set of parameters that defines the user's language, country and any special variant preferences that the user wants to see in their user interface.
https://en.wikipedia.org/wiki/Locale
This is probably familiar to all of us.
Locale formats
These two formats are relevant for our case.
- IETF Language Tag, specified in BCP 47 (currently RFC 5646; previously RFC 4646, RFC 3066 and RFC 1766).
  Used in:
  - HTTP Accept-Language header
  - XML & HTML documents (e.g. <html lang="">)
- Locale ID, according to the International Components for Unicode (ICU).
  Used in:
  - PHP Intl extension
  - Symfony Intl component, a replacement layer for the PHP intl extension
  - Recommended format for the Symfony Translation component
  - Returned by Symfony Request::getLanguages() (after parsing the HTTP Accept-Language header)
Transifex uses a Locale ID to represent regional languages (e.g. zh_TW) but a Language Tag to represent language scripts (e.g. zh-Hant) :-(
Differences between Language Tag and Locale ID
As far as I understand, there is no major difference for our use case.
- A Language Tag uses a dash (-) as delimiter between language, script and region.
- A Locale ID uses an underscore (_) as delimiter between language, script and country.
- In a Locale ID, the third subtag is always a country (according to ISO 3166), whereas in a Language Tag it can also be a UN M.49 region code.
A Locale ID also allows specifying further details such as currency, calendar or collation. However, these are (currently) not relevant for our case.
Structure of Language Tag and Locale ID
Apart from the differences noted above, both Language Tag and Locale ID are very similar in their format:
- The language subtag, specified using a two- or three-letter lowercase code (using ISO 639-1 or ISO 639-2).
- An optional script subtag (specified in ISO 15924)
  (Examples: Latn = Latin, Cyrl = Cyrillic, Hans = Chinese Simplified, Hant = Chinese Traditional)
- The country or region subtag, commonly using a two-letter ISO 3166 country code.
Best practice is to add subtags only if they add relevant information. As an example, it's not recommended to write en-Latn because English is almost always written in Latin characters.
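To make the shared structure concrete, here is a minimal sketch that splits either format into its three subtags. The function name parseLocaleSubtags is hypothetical, and the pattern only covers the language, script and region subtags discussed above (no variants, extensions or ICU keywords).

```php
<?php

/**
 * Splits a Language Tag or Locale ID into language, script and region.
 * Hypothetical helper for illustration only; it accepts "-" or "_" as
 * delimiter and returns null for anything it does not recognise.
 */
function parseLocaleSubtags(string $locale): ?array
{
    // language: 2-3 lowercase letters
    // script:   optional, 4 letters in title case (ISO 15924)
    // region:   optional, 2 uppercase letters (ISO 3166) or 3 digits (UN M.49)
    $pattern = '/^([a-z]{2,3})(?:[-_]([A-Z][a-z]{3}))?(?:[-_]([A-Z]{2}|[0-9]{3}))?$/';

    if (!preg_match($pattern, $locale, $m)) {
        return null;
    }

    // Unmatched optional groups are either missing or empty strings.
    return [
        'language' => $m[1],
        'script'   => ($m[2] ?? '') !== '' ? $m[2] : null,
        'region'   => ($m[3] ?? '') !== '' ? $m[3] : null,
    ];
}
```

With this sketch, zh-Hant-TW and zh_Hant_TW both yield language zh, script Hant and region TW, while de_CH yields language de, no script and region CH.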
Situation in Contao
Just so that everyone is on the same page, I'm quickly writing down the current Contao approach:
- A page language (tl_page.language) is a Language Tag. It can either be a two-character ISO 639-1 code (de, en) or a five-character language and country combination (de-DE, de-CH, en-US).
- In contrast, language files use a Locale ID. The folder name can either be a two-character ISO 639-1 code (de, en) or a five-character language and country combination (de_DE, de_CH, en_US).
- The languages list (system/config/languages.php) also uses Locale IDs, which means the same applies to the member and user language (tl_member.language / tl_user.language).
Both formats are written case-sensitively. The $GLOBALS['TL_LANGUAGE'] variable is inherited from the page language and is therefore a Language Tag.
As our language folders use Locale IDs, we convert the representation wherever we try to match a page language to a language folder (str_replace('-', '_', $lang)). Because we're relying on Transifex, the package format is somewhat predefined.
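The conversion mentioned above can be wrapped in two small helpers. This is only a sketch: the function names are hypothetical and not part of Contao, and the reverse direction is included purely for symmetry.

```php
<?php

/**
 * Converts a page language (Language Tag, e.g. "de-CH") into the
 * matching language folder name (Locale ID, e.g. "de_CH").
 * Hypothetical helper name, same str_replace() call as quoted above.
 */
function languageTagToLocaleId(string $tag): string
{
    return str_replace('-', '_', $tag);
}

/**
 * Reverse direction: Locale ID (e.g. "de_CH") to Language Tag ("de-CH").
 */
function localeIdToLanguageTag(string $localeId): string
{
    return str_replace('_', '-', $localeId);
}

echo languageTagToLocaleId('de-CH'), "\n"; // de_CH
echo localeIdToLanguageTag('en_US'), "\n"; // en-US
```

Two-character codes such as de or en contain no delimiter and pass through unchanged in both directions.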
Situation in Symfony
Translation
The Translation component accepts numerous formats for the translation files (see [1], [2]). It simply tries to find a file with the given locale.
However, loading fallbacks (zh for zh_CN) only works when using a Locale ID with underscore (see [1]). It does not, however, support three-level fallbacks (it loads zh_Hant but not zh when the locale is zh_Hant_TW).
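The fallback behaviour described above can be modelled in a few lines. This is a simplified sketch of the observed behaviour, not the Translator's actual code; the function name computeLookupChain and the default fallback en are assumptions made for illustration.

```php
<?php

/**
 * Returns the locales that would be tried, in order, for a requested locale.
 * Sketch only: the last underscore-separated subtag is stripped exactly
 * once, then the configured fallback (assumed to be "en") is appended.
 */
function computeLookupChain(string $locale, string $defaultFallback = 'en'): array
{
    $chain = [$locale];

    // Only an underscore is recognised as delimiter, and only one level deep.
    $pos = strrpos($locale, '_');
    if ($pos !== false) {
        $chain[] = substr($locale, 0, $pos);
    }

    if (!in_array($defaultFallback, $chain, true)) {
        $chain[] = $defaultFallback;
    }

    return $chain;
}

echo implode(', ', computeLookupChain('zh_CN')), "\n";      // zh_CN, zh, en
echo implode(', ', computeLookupChain('zh_Hant_TW')), "\n"; // zh_Hant_TW, zh_Hant, en
echo implode(', ', computeLookupChain('zh-TW')), "\n";      // zh-TW, en
```

The last call shows why a Language Tag gets no parent fallback at all in this model: without an underscore there is nothing to strip.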
Request
The request has the methods setLocale and getLocale. They are NOT related to the _locale attribute (see HttpKernel). On a call to setLocale, the value is forwarded to the PHP Intl subsystem if available.
The good news is: the PHP Intl extension happily accepts both dash (-) and underscore (_) as delimiter and correctly detects language, script and region.
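A quick way to check this yourself, assuming the PHP intl extension (ext-intl) is installed, is to run both spellings through Locale::parseLocale() and compare the results. The snippet does nothing if the extension is missing.

```php
<?php

// Requires ext-intl; without it the Locale class does not exist.
if (class_exists(\Locale::class)) {
    foreach (['zh-Hant-TW', 'zh_Hant_TW'] as $input) {
        // parseLocale() returns an array keyed by subtag name.
        $parts = \Locale::parseLocale($input);

        printf(
            "%s => language=%s script=%s region=%s\n",
            $input,
            $parts['language'] ?? '-',
            $parts['script'] ?? '-',
            $parts['region'] ?? '-'
        );
    }
}
```

Both inputs should print the same subtags, which is what makes the delimiter irrelevant on the Intl side.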
Intl
The Symfony Intl component provides fallback information if PHP Intl is not available. Like PHP Intl, it uses Locale IDs (see [1]). I have not tested it, but I doubt it will work with Language Tags.
HttpKernel / Magic
The HttpKernel component contains some dependency injection magic with regard to locale handling. There are two listeners in place:
- The LocaleListener triggers on kernel.request (priority 16) and calls Request::setLocale if a _locale attribute is found in the current request (see [1]).
- The TranslatorListener triggers on kernel.request (priority 10) and sets the locale for the default translator from Request::getLocale (see [1]).
Problems
Frontend
In our current implementation (Contao 4.0.0-beta1), the request _locale attribute is parsed from the route path, which means it will be a Language Tag according to the setting in tl_page.language. If zh-TW is set as the page language, the Translator will not find the zh_TW language pack (and it will not load zh either).
If routes do not contain locales (contao.prefix_locale is false), the fallback language will be used for the Symfony Translator. Contao will use the best-matching language from the Accept-Language header to find an appropriate page, and then use that page's language.
Backend
Backend paths do not contain language information. For Symfony Translation, the fallback language (en) will always be used. Contao will use the best-matching language from the Accept-Language header.
After a user login, the user's language (tl_user.language) will be used to load Contao languages.
Conclusion
For Symfony, it does not matter which locale format you use, because it's just a framework. As long as your routes and translation files match, the Translator will happily load what your URL contains.
The current Contao implementation is not really compatible with Symfony Translation though. It would only be possible to load translations by using Language Tag files (messages.de-CH.xliff), which is neither the recommended way, nor the default in Contao, nor supported by Transifex.
Solutions
TBD
Tools