projectfluent / python-fluent

Python implementation of Project Fluent

Home Page: https://projectfluent.org/python-fluent/

License: Other

Python 88.04% FreeMarker 0.53% Batchfile 0.29% Fluent 11.15%

i18n l10n internationalization localization

python-fluent's Introduction

Project Fluent

This is a collection of Python packages to use the Fluent localization system.

python-fluent consists of these packages:

`fluent.syntax`

The syntax package includes the parser, serializer, and traversal utilities like Visitor and Transformer. You’re looking for this package if you work on tooling for Fluent in Python.

`fluent.runtime`

The runtime package includes the library required to use Fluent to localize your Python application. It comes with a Localization class to use, based on an implementation of FluentBundle. It uses the tooling parser above to read Fluent files.

`fluent.pygments`

A plugin for pygments to add syntax highlighting to Sphinx.

Discuss

We’d love to hear your thoughts on Project Fluent! Whether you’re a localizer looking for a better way to express yourself in your language, a developer trying to make your app localizable and multilingual, or a hacker looking for a project to contribute to, please do get in touch on Mozilla Discourse or in the Matrix channel.

Mozilla Discourse: https://discourse.mozilla.org/c/fluent
Matrix channel: #fluent:mozilla.org

Get Involved

python-fluent is open-source, licensed under the Apache License, Version 2.0. We encourage everyone to take a look at our code and we’ll listen to your feedback.

python-fluent's Issues

ast.BaseNode.equals shouldn't ignore order for Attributes and Variants

When talking through this API in fluent.js, we found that ignoring order in Attributes and Variants is actually not the right thing.

Backport this fix to fluent.syntax in python land.
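
The difference can be sketched with a hypothetical `Variant` tuple standing in for the real AST classes (the names below are illustrative and are not the `fluent.syntax.ast` API). An order-insensitive comparison reports two variant lists as equal even though they would serialize to different files:

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    """Hypothetical stand-in for an AST variant node."""
    key: str
    value: str


def equals_ignoring_order(a: List[Variant], b: List[Variant]) -> bool:
    # Order-insensitive comparison: both lists are sorted before comparing.
    return sorted(a) == sorted(b)


def equals_ordered(a: List[Variant], b: List[Variant]) -> bool:
    # Order-sensitive comparison: the position of each variant matters.
    return a == b


original = [Variant("one", "One item"), Variant("other", "Many items")]
reordered = [Variant("other", "Many items"), Variant("one", "One item")]

print(equals_ignoring_order(original, reordered))  # True
print(equals_ordered(original, reordered))         # False
```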

Add a base Localization class to fluent.runtime

fluent.runtime should have a base Localization class to provide a developer-facing API with language fallback etc.

CC @pmac.

This depends on #118 .

Create a Pygments plugin for Fluent syntax highlighting

I'm trying to make the Firefox Fluent docs better on https://firefox-source-docs.mozilla.org/, and one thing that's missing is somewhat non-broken syntax highlighting.

In particular because pygments doesn't highlight at all if there's an error, so our properties attempts bail out badly most of the time.

CC @Flod, @zbraniecki

FluentLocalization swallows errors

format_value does not provide a way to retrieve the errors produced by the bundle:

python-fluent/fluent.runtime/fluent/runtime/fallback.py

Line 37 in c58681f

val, errors = bundle.format_pattern(msg.value, args)

This makes it really hard to debug them for the user :)

AttributeError: 'FluentBundle' object has no attribute 'add_messages'

I'm shopping around for gettext replacements. Is the python fluent client (and fluent more broadly) still being maintained?

I mainly ask because even the happy path described in the docs does not work properly right now:

pip install fluent.runtime
python

from fluent.runtime import FluentBundle
bundle = FluentBundle(['en-US'])
bundle.add_messages("""""")
Traceback (most recent call last):
  File "/park/server/env/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3331, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-2a5756cb5134>", line 4, in <module>
    bundle.add_messages("""""")
AttributeError: 'FluentBundle' object has no attribute 'add_messages'

Looks like it was yanked out in the 0.3.0 "pre-release" and the docs weren't updated.

Anyway, Fluent seems cool, but I'd rather not go through the heartache of using it only to learn in 6 months that Mozilla has abandoned it. Thanks for any insight you have!

How to retrieve a list of available locales?

Is there a way to retrieve a list of available locales using Fluent.runtime (or maybe Babel)?
Or would I have to retrieve a list of folders directly from the operating system?
How do I do that? :-)
Thank you.
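
One possible approach is to read the directory names from disk. This is only a sketch: it assumes one sub-directory per locale (for example `l10n/en-US/main.ftl`), and `available_locales` is a hypothetical helper, not something fluent.runtime provides.

```python
import os
import tempfile
from pathlib import Path
from typing import List


def available_locales(root: str) -> List[str]:
    """Return the sorted names of the sub-directories of ``root``.

    Assumes each locale lives in its own directory under ``root``.
    """
    return sorted(p.name for p in Path(root).iterdir() if p.is_dir())


# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as demo_root:
    for name in ("fr", "en-US"):
        os.mkdir(os.path.join(demo_root, name))
    # Plain files next to the locale directories are ignored.
    Path(demo_root, "README.txt").write_text("not a locale\n")
    print(available_locales(demo_root))  # ['en-US', 'fr']
```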

Release plan for fluent.runtime

For want of a better place to put this, I'm making an issue for this.

What is the plan for releasing?

I suggest:

Release fluent.runtime 0.1 to PyPI. It's got a working implementation, which is enough for now I think.
Move README docs to Sphinx/readthedocs, at least for fluent.runtime. This shouldn't be too much work, I'm happy to do it for fluent.runtime.
Release fluent.syntax to PyPI. I suggest keeping the same version number, or a minor bump, just changing the package name.
Get fluent.runtime to work with latest version of fluent.syntax (i.e. Fluent 0.8 spec). I already have a branch for this.
Do another release of fluent.runtime at this point.
Look at the second, compiler implementation of FluentBundle, and other branches I'm working on (the escapers mechanism, for example) - this is a bit down the road, so we should decide at this point when we want to do releases for these additional features.

I'm assuming that @stasm or @Pike will be responsible for all releases of fluent.syntax, and I'm happy to be responsible for releasing fluent.runtime, or for you guys to do that.

Drop support for Python 2.7 and 3.5

... and update our testing and compat matrix beyond 3.5, too.

This should be done syntax first, and then the other two.

(fluent.runtime) Add a FluentString type

Arguments to FluentBundle.format may currently be instances of FluentType subclasses, or Unicode strings (str in Python 3). This mirrors the optimization from fluent.js, where primitive strings are treated as if they were FluentTypes thanks to supporting the same {toString, valueOf} interface. The same optimization was applied to the contents of TextElements and StringLiterals.

Would it make sense to drop this optimization? Perhaps the impact wouldn't be as significant in Python as it was in JS. And the benefit would be that the mental model of the resolver would become simpler, with one exception fewer.

@spookylukey What do you think?

Uplift nested selector tests

We need to uplift projectfluent/fluent#279. See projectfluent/fluent.js#417 for an example.

Remove support for Fluent 0.4 syntax from python-fluent

Fluent 0.5 was released in January 2018, let's remove the 0.4 syntax code-paths in the python Fluent parser.

Notably, that's old-style comments, tags?, sections and enforced =.

@stasm, happy to help here, unless you're like I waited to write this patch for a year, don't take it away from me.

CC @spookylukey @zbraniecki @hkasemir

Remove MESSAGE_REFERENCE and EXTERNAL_ARGUMENT helpers

The helpers.py file currently defines two pseudo-AST nodes which can be used in migration specs: MESSAGE_REFERENCE and EXTERNAL_ARGUMENT. I'd like to discuss the general approach regarding such helpers.

On one hand, they allow writing more concise AST in migrations. On the other, they introduce abstractions which need to be learned besides the AST itself in order to write and maintain migrations. This in turn results in situations where it's tempting to randomly try different combinations of the helpers to see if they fix an issue with the migration :)

AST being the source of truth, I think there's value in sticking to it exclusively, even at the cost of increased verbosity.

@flodolo, @Pike, @zbraniecki - What do you think?

I already see @zbraniecki not use the helpers in bug 1424682 in some cases, like here:

FTL.Message(
    id = FTL.Identifier('category-general'),
    attributes = [
        FTL.Attribute(
            FTL.Identifier('tooltiptext'),
            FTL.Pattern(
                elements = [
                    FTL.Placeable(
                        # This could read:
                        # expression=MESSAGE_REFERENCE('pane-general-title')
                        expression = FTL.MessageReference(
                            id = FTL.Identifier(
                                'pane-general-title'
                            )
                        )
                    )
                ]
            )
        )
    ]
)

This is also related to what the API of REPLACE and CONCAT should be. Should we only accept Patterns and PatternElements (i.e. TextElements and Placeables)? This would make the API closer to the AST. Or should we accept Expressions as well? Compare:

# REPLACE converts the MessageReference expression returned by the helper to a Placeable.
FTL.Message(
    id = FTL.Identifier('help-button-label'),
    value = REPLACE(
        'browser/chrome/browser/preferences/preferences.dtd',
        'helpButton2.label',
        {
            '&brandShortName;': MESSAGE_REFERENCE('-brand-short-name')
        }
    )
)

# REPLACE requires an explicit Placeable wrapping the MessageReference expression.
FTL.Message(
    id = FTL.Identifier('help-button-label'),
    value = REPLACE(
        'browser/chrome/browser/preferences/preferences.dtd',
        'helpButton2.label',
        {
            '&brandShortName;': FTL.Placeable(
                MESSAGE_REFERENCE('-brand-short-name')
            )
        }
    )
)

# REPLACE requires an explicit Placeable and an explicit MessageReference.
FTL.Message(
    id = FTL.Identifier('help-button-label'),
    value = REPLACE(
        'browser/chrome/browser/preferences/preferences.dtd',
        'helpButton2.label',
        {
            '&brandShortName;': FTL.Placeable(
                FTL.MessageReference(
                    FTL.Identifier('-brand-short-name')
                )
            )
        }
    )
)

Is it possible to dynamically choose language?

Hello. I was going to use Fluent to add translations to my Telegram bot.
When a message arrives, my code reads the user's language code so it can fetch a string in the desired language.
However, I didn't find an option "get string in {language}" in Python-fluent.

Am I missing something? If there's no such feature, is it planned at all? I'm looking for a user-friendly gettext alternative which supports choosing the language on the fly for any string.
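
A common pattern for this kind of per-request language choice is to keep one localizer object per language code and pick it when the message arrives. The sketch below is library-agnostic: `PerLocaleCache`, `toy_factory` and `MESSAGES` are hypothetical names, and the toy factory returns a plain dict where a real application would build its localizer.

```python
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")


class PerLocaleCache(Generic[T]):
    """Build one localizer object per language code and reuse it."""

    def __init__(self, factory: Callable[[str], T]) -> None:
        self._factory = factory
        self._cache: Dict[str, T] = {}

    def get(self, locale: str) -> T:
        # Create the localizer lazily, the first time a language is seen.
        if locale not in self._cache:
            self._cache[locale] = self._factory(locale)
        return self._cache[locale]


# Toy data standing in for real translation files.
MESSAGES = {"en": {"hello": "Hello!"}, "ru": {"hello": "Привет!"}}


def toy_factory(locale: str) -> Dict[str, str]:
    # Unknown languages fall back to English in this sketch.
    return MESSAGES.get(locale, MESSAGES["en"])


cache = PerLocaleCache(toy_factory)
print(cache.get("ru")["hello"])  # Привет!
print(cache.get("de")["hello"])  # Hello!
```

With fluent.runtime, the factory could plausibly return a `FluentLocalization` whose locale list starts with the requested language, so that `format_value` falls back to the remaining locales.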

Update fluent.runtime to fluent.syntax 0.10

Filing an issue for @spookylukey's work in django-ftl/python-fluent@master...bundle_fluent_syntax_08.

Luke, is this something that's ready to look at? Related, this probably wants a squash, but does it also want a split into multiple commits?

Cannot read string beyond 1000 characters

When accessing a string with the format_value() method, the string is returned normally as long as it is relatively short. But if a string longer than 73 lines is entered, the method returns {???} instead of the string. PFA a test case which will help reproduce the bug.

Empty comments get dropped by the serializer

A simple fragment like

## This is a group

g1 = one
g2 = two

##

not = in that group

doesn't serialize right. The reason is the side note in https://docs.python.org/3/library/stdtypes.html#str.splitlines, which mentions that splitlines returns an empty list for an empty string instead of [""]. Pfff.
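
The `splitlines` behaviour, and one possible workaround, can be shown in a few lines. `serialize_comment` below is a hypothetical simplification and not the actual serializer code:

```python
from typing import List


def comment_lines(content: str) -> List[str]:
    # str.splitlines() returns [] for "", so fall back to a single empty line.
    return content.splitlines() or [""]


def serialize_comment(content: str, prefix: str = "##") -> str:
    # Empty lines get a bare prefix; other lines get "prefix text".
    lines = comment_lines(content)
    return "\n".join(f"{prefix} {line}" if line else prefix for line in lines) + "\n"


print("".splitlines())                             # []
print("".split("\n"))                              # ['']
print(repr(serialize_comment("")))                 # '##\n'
print(repr(serialize_comment("This is a group")))  # '## This is a group\n'
```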

tests don't work in py3.6

I spent a good 2 hours today trying to understand why my patch breaks tests.

The two things that made it really hard were:

1. In py2.7, when json doesn't match expected, the diff contains all u'foo' != 'foo', which is terribly misleading.
2. py3.6 did not report any errors.

The combination of those two led me onto the path of trying to discover if I changed somehow encoding of the tests, because only py2.7 seemed to be affected.

If we can't fix (1), can we at least fix (2) please?

STR:

Change any of the fixture_structure json files, for example span start or end.
run tox -e py36

Current result:
No error reported

Expected result:
Error reported

Implement BaseNode.visit API

BaseNode.traverse can be costly because it clones the entire tree under the node it ran on. We should add a new method just for non-destructive visiting. This will be useful for the equals methods, as well as for the compare-locales checks.
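
A minimal sketch of what non-destructive visiting could look like is given below. The `Node`, `Identifier` and `Message` classes are hypothetical stand-ins rather than the real `fluent.syntax` classes; the point is only that the tree is walked in place and never copied.

```python
class Node:
    """Hypothetical minimal AST node; not the real BaseNode."""

    def __init__(self, **fields):
        self.__dict__.update(fields)


class Identifier(Node):
    pass


class Message(Node):
    pass


class Visitor:
    """Read-only traversal that dispatches on the node's class name."""

    def visit(self, node):
        if isinstance(node, list):
            for item in node:
                self.visit(item)
            return
        if not isinstance(node, Node):
            return  # plain values such as strings are leaves
        handler = getattr(self, "visit_" + type(node).__name__, self.generic_visit)
        handler(node)

    def generic_visit(self, node):
        # Descend into every field without cloning anything.
        for value in vars(node).values():
            self.visit(value)


class IdCollector(Visitor):
    def __init__(self):
        self.names = []

    def visit_Identifier(self, node):
        self.names.append(node.name)


tree = [
    Message(id=Identifier(name="hello"), value="Hello"),
    Message(id=Identifier(name="bye"), value="Goodbye"),
]
collector = IdCollector()
collector.visit(tree)
print(collector.names)  # ['hello', 'bye']
```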

Variants parsed as junk if closing brace not indented

For example, the following message, taken from the JS test suite - https://github.com/projectfluent/fluent.js/blob/master/fluent/test/values_ref_test.js#L15

key2 = {
      [a] A2
     *[b] B2
}

This is parsed as 'Junk' by the Python parser. If you add a space before the closing brace, it is parsed properly. I'm assuming the JS version is correct.

Don't use Tox in Travis

According to https://docs.travis-ci.com/user/languages/python/#travis-ci-uses-isolated-virtualenvs, using both Travis and Tox isn't well supported. Travis already runs all tests from the matrix in separate virtualenvs. Can we drop Tox?

Create some benchmarks for fluent.runtime

@spookylukey has some benchmarks on his compiler branch, I'd love to land some variant of that ahead of time. That way we have a baseline idea on how the current code performs.

I'm wondering if we should, perhaps at least in addition to what's on branch, have some less micro-benchmark-y mini-benchmark, with, say 20 strings, 1 Term reference, one plural and variable ref? Or something like that? Still pretty unrealistic at that, but maybe a tad better.

And, maybe adjust the readme to use py-spy instead of pyflame? py-spy runs on more OSes, and bundles the flamegraph stuff.

fluent.syntax.ast.Visitor could use `ignore_fields` option

In BaseNode.equals, we have an ignore_fields option. That'd also be nice for Visitor, as it would allow a visitor to skip annotations and spans at the point of enumerating the properties, which is one or two cycles before actually going in to the Span or Annotation nodes, where we can handle them right now.

Use flake8

How do you feel about adding flake8 to the project? I never code in Python without it. It looks like the vast majority of the code base already conforms - mainly it complains about some imports not being at the top of the module. This seems to be because of some sys.path hackery in the tests that I don't understand the purpose of - if I remove those lines, the tests still run fine.

So there are two questions I guess:

would you accept a PR that makes flake8 compliance a part of the test suite and TravisCI?
can I remove the sys.path hackery and fix the other remaining issues that flake8 complains about?

Why would I want my Python app to show: 'Hello, \u2068Jane\u2069!'

Hi!

I don't quite understand the documentation here:
https://www.projectfluent.org/python-fluent/fluent.runtime/stable/internals.html

This example:

greet = bundle.get_message('greet-by-name')
translated, errs = bundle.format_pattern(greet.value, {'name': 'Jane'})
translated
'Hello, \u2068Jane\u2069!'

Is explained like this:
You will notice the extra characters \u2068 and \u2069 in the output. These are Unicode bidi isolation characters that help to ensure that the interpolated strings are handled correctly in the situation where the text direction of the substitution might not match the text direction of the localized text.

But why would I want my user to look at this: 'Hello, \u2068Jane\u2069!'
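
The escapes are only visible in the `repr()` of the string, which is what the interactive prompt echoes. When the string is printed or rendered, the two characters are invisible formatting controls. A small standard-library check illustrates this; the stripping step at the end is an assumption about what an application might do if it is sure it does not need bidi isolation:

```python
import unicodedata

formatted = "Hello, \u2068Jane\u2069!"

for ch in ("\u2068", "\u2069"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
# U+2068 FIRST STRONG ISOLATE
# U+2069 POP DIRECTIONAL ISOLATE

print(repr(formatted))  # the escapes appear only in the repr
print(formatted)        # the isolates are invisible when rendered

# Assumption: only strip these when bidi isolation is known to be unnecessary.
plain = formatted.replace("\u2068", "").replace("\u2069", "")
print(plain)  # Hello, Jane!
```

As far as I know, `FluentBundle` also accepts a `use_isolating=False` argument that avoids inserting the characters in the first place.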

fluent.util might be dead code

fluent.util was only used by migrate, iirc, so that can go down, too.

Forgot that in my review of the actual migrate removal.

Implement MessageContext and format method

In other words, make this Python implementation able to generate translations, as mentioned in #58

I'm adding this issue as a placeholder for anyone else working on this, because I have started to tackle the problem in the 'implement_format' branch on this repo: https://github.com/django-ftl/python-fluent/tree/implement_format

I am following the Javascript implementation to a large degree, but making it more pythonic, and I'm especially leaning on the Javascript tests.

So far I have only the most basic cases covered (simple text messages, and external number/text arguments interpolated into them), but with a decent start in terms of a framework for the remainder. I don't know when I'll be able to look at this again.

parser could be pythonic and performant

The current parser in fluent.syntax.parser closely matches the js parser, and python doesn't cope with that well.

We could make that more pythonic.

I also think that we can implement the backwards-compat part of the parser as a subclass, so tools and deployments that just want current fluent don't need to accept files that are no-more-cool in fluent today.

I'll take a stab at this, let's see how it goes.

Add type hinting

Currently, mypy checking of my python code fails with:

 error: Skipping analyzing 'fluent.runtime': found module but no type hints or library stubs

I'll just ignore the error for now but it would be nice to have type hint support. It looks like this is blocked by #162.

Attributes and tags and multiline patterns should allow blank lines before them

Currently, none of the following messages parses correctly:

foo =


    Mutliline Foo Value


bar1


    .attr = Attr


bar2


    .attr1 = Attr1


    .attr2 = Attr2



bar3 =


    Multiline Bar Value


    .attr = Attr


bar4


    .attr =


        Multiline Attr Value


qux1 = Qux


    #tag


qux2 = Qux


    #tag1


    #tag2

Publish python-fluent packages after doc updates

Now that we have docs and READMEs with long descriptions worth that name, let's update the packages and upload to pypi.

My gut-instinct says that we're close to 1.0 versions. @stasm might like the idea.

I'm a bit hesitant to do that for fluent.syntax and fluent.runtime, given how much we changed, but I think they're also in RC state.

For fluent.pygments, I wouldn't mind going to 1.0 directly.

Readme.md : 'Discuss' section still points to Mailing list.

The 'Discuss' section points to Mailing list for contact.
However, the last message on this mailing list from 14.06.2018 says:

We're retiring the tools-l10n mailing list for Fluent-related
communications in favor of GitHub and Mozilla Discourse.

The link in the 'Discuss' section in Readme.md should point to the Mozilla Discourse page instead.

Unable to serialize JSON

The old (l20n) parser returned JSON, which Pontoon used internally, and the serializer took it.

The new parser returns an object and the serializer takes it as input. Since we still use JSON for storing AST in Pontoon, we need a method to convert an AST object to JSON (already exists) and JSON to an AST object (doesn't exist yet).
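
One way such a round trip could work is to dispatch on a `type` key in the JSON. The sketch below uses hypothetical `Node`, `Identifier` and `Message` classes and assumes every serialized node carries its class name under `"type"`; the real AST classes and their JSON shape may differ.

```python
import json


class Node:
    """Hypothetical node base class; the real classes differ."""

    def __init__(self, **fields):
        self.__dict__.update(fields)

    def to_json(self):
        data = {"type": type(self).__name__}
        for name, value in vars(self).items():
            data[name] = _to_json(value)
        return data


class Identifier(Node):
    pass


class Message(Node):
    pass


# Registry used to map the "type" key back to a class.
NODE_TYPES = {cls.__name__: cls for cls in (Identifier, Message)}


def _to_json(value):
    if isinstance(value, Node):
        return value.to_json()
    if isinstance(value, list):
        return [_to_json(item) for item in value]
    return value


def from_json(value):
    """Rebuild nodes from the structure produced by to_json()."""
    if isinstance(value, dict):
        fields = {k: from_json(v) for k, v in value.items() if k != "type"}
        return NODE_TYPES[value["type"]](**fields)
    if isinstance(value, list):
        return [from_json(item) for item in value]
    return value


message = Message(id=Identifier(name="hello"), value="Hello", attributes=[])
stored = json.dumps(message.to_json())
restored = from_json(json.loads(stored))
print(restored.id.name)                         # hello
print(restored.to_json() == message.to_json())  # True
```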

Docs and/or status of project

At first I was going to ask for docs on how to actually use this package. On further inspection it doesn't look like it has yet reached minimum functionality. It would be great if the README could be updated to indicate the project status.

Thanks!

Spec compliance: report error for lines with unusual characters at start inside of multiline text.

See projectfluent/fluent.js#620

TL;DR: python-fluent parses multiline strings containing lines that start with * (and maybe other unusual characters), while the spec says it shouldn't.

Move Visitor and Transformer out of ast.py

I'd like to consider making ast.py strictly about AST nodes. The processing classes could be moved to another module: tools.py, visitor.py, process.py or something else.

Generic enter/exit visitor to replace BaseNode.traverse and friends

The simple read-only visitor we landed in #96 is great for performance, but doesn't cut it for actual transformations. I have a local branch that I hope is generic enough to replace traverse.

Not happy with the name of it, I'm using ContextVisitor, because it uses enter and exit, but I can't really use python context managers, because you can't pass arguments to __exit__.

I'm implementing .traverse() as part of the patch to fluent.syntax, but to make things more tangible, here's what transforms_from could look like. Compare with https://hg.mozilla.org/l10n/fluent-migration/file/797c19359d4b/fluent/migrate/helpers.py#l43 through line 123.

class IntoTranforms(FTL.ContextVisitor):
    IMPLICIT_TRANSFORMS = ("CONCAT",)
    FORBIDDEN_TRANSFORMS = ("PLURALS", "REPLACE", "REPLACE_IN_TEXT")

    def __init__(self, substitutions):
        self.substitutions = substitutions

    def generic_exit(self, node, props):
        return node.__class__(**props)

    def enter_Junk(self, node):
        anno = node.annotations[0]
        raise InvalidTransformError(
            "Transform contains parse error: {}, at {}".format(
                anno.message, anno.span.start))

    def enter_CallExpression(self, node):
        name = node.callee.id.name
        if name in self.IMPLICIT_TRANSFORMS:
            raise NotSupportedError(
                "{} may not be used with transforms_from(). It runs "
                "implicitly on all Patterns anyways.".format(name))
        if name in self.FORBIDDEN_TRANSFORMS:
            raise NotSupportedError(
                "{} may not be used with transforms_from(). It requires "
                "additional logic in Python code.".format(name))
        return True

    def exit_CallExpression(self, node, props):
        if node.callee.id.name != 'COPY':
            return self.generic_exit(node, props)
        args = (self.into_argument(arg) for arg in node.positional)
        kwargs = {
            arg.name.name: self.into_argument(arg.value)
            for arg in node.named}
        return COPY(*args, **kwargs)

    def exit_Placeable(self, node, props):
        if isinstance(props['expression'], Transform):
            return props['expression']
        return self.generic_exit(node, props)

    def exit_Pattern(self, node, props):
        # Replace the Pattern with CONCAT which is more accepting of its
        # elements. CONCAT takes PatternElements, Expressions and other
        # Patterns (e.g. returned from evaluating transforms).
        return CONCAT(*props['elements'])

    def into_argument(self, node):
        """Convert AST node into an argument to migration transforms."""
        if isinstance(node, FTL.StringLiteral):
            # Special cases for booleans which don't exist in Fluent.
            if node.value == "True":
                return True
            if node.value == "False":
                return False
            return node.value
        if isinstance(node, FTL.MessageReference):
            try:
                return self.substitutions[node.id.name]
            except KeyError:
                raise InvalidTransformError(
                    "Unknown substitution in COPY: {}".format(
                        node.id.name))
        else:
            raise InvalidTransformError(
                "Invalid argument passed to COPY: {}".format(
                    type(node).__name__))

Parser should allow white-space before call arguments

This is parser fixes to match the added tests for projectfluent/fluent#281.

Use namespace packages

In preparation for #67 (or as a follow-up), I'd like to change the directory structure of this repository and start distributing Fluent code as namespaced packages:

Namespace packages allow you to split the sub-packages and modules within a single package across multiple, separate distribution packages

Here's the directory structure I suggested in #67 (comment):

fluent-syntax
    setup.py
    fluent
        __init__.py
        syntax
            __init__.py
            ...
fluent-bundle
    setup.py
    fluent
        __init__.py
        bundle
            __init__.py
            ...

Make fluent.syntax a proper pkgutil-style namespace package

This pairs with bug 1452900 for fluent.migrate, we need to do the same thing on both sides.
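
For reference, a pkgutil-style namespace package repeats the same one-line `__init__.py` in every distribution that contributes to the namespace. The self-contained demonstration below builds two throwaway distributions under a hypothetical `demo_ns` namespace (chosen so it cannot clash with an installed `fluent` package) and shows both sub-packages being importable:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# The line each distribution ships in its namespace-level __init__.py.
NAMESPACE_INIT = "__path__ = __import__('pkgutil').extend_path(__path__, __name__)\n"


def make_distribution(root: Path, namespace: str, subpackage: str) -> None:
    """Create root/namespace/subpackage with the pkgutil-style __init__.py."""
    pkg = root / namespace / subpackage
    pkg.mkdir(parents=True)
    (root / namespace / "__init__.py").write_text(NAMESPACE_INIT)
    (pkg / "__init__.py").write_text(f"NAME = {subpackage!r}\n")


workdir = Path(tempfile.mkdtemp())
for dist, sub in (("dist-syntax", "syntax"), ("dist-runtime", "runtime")):
    make_distribution(workdir / dist, "demo_ns", sub)
    sys.path.insert(0, str(workdir / dist))

importlib.invalidate_caches()
syntax = importlib.import_module("demo_ns.syntax")
runtime = importlib.import_module("demo_ns.runtime")
print(syntax.NAME, runtime.NAME)  # syntax runtime
```

Both directories must be on `sys.path` before the namespace is first imported, because `extend_path` scans `sys.path` at that moment.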

Use github actions for automation

Let's consider switching to GitHub Actions for our automation of python-fluent.

Our travis setup is rather involved, and also over-tests.

Looking at the documentation of gh actions, we'll have a ton of nice things there.

Documentation is out of date

Edit: It appeared that the fluent.runtime I installed with pip install fluent.runtime was actually a lot older than what I expected.

When adding messages to Fluent bundles, the documentation at https://fluent-runtime.readthedocs.io/en/latest/index.html states you can directly add the messages as a template string like so:

>>> bundle.add_messages("""
... welcome = Welcome to this great app!
... greet-by-name = Hello, { $name }!
... """)

In reality, bundle.add_messages does not exist: you have to wrap the template string in a FluentResource and pass it to bundle.add_resource instead. The add_resource function checks whether its resource parameter has a body key, which the FluentResource class provides.

from fluent.runtime import FluentBundle, FluentResource

bundle = FluentBundle(['en-us'])
bundle.add_resource(FluentResource("""
welcome = Welcome!
"""))

Also, bundle.format() does not exist.

FluentDateTime inheriting from datetime breaks self.replace on pypy >= 5.10

pypy changed how .replace creates a new instance, by instantiating type(self).

Which ends up being a FluentDateTime, which doesn't take the positional arguments that datetime does.

@spookylukey, is this something you could tackle? I managed to reproduce this in the existing test suite on pypy3.5 6.0, but I'm not sure I understand how you're using kwargs in the __new__ and _init.

This blocks updating our python testing setup in #70 .

how can I extract translation percentage?

hello, is there a way to have the translation progress of a fluent file?

for po files, I'm using this: http://docs.translatehouse.org/projects/translate-toolkit/en/latest/commands/pocount.html

can I use python-fluent to do something equivalent?

Could support be added for async functions?

Add support for `NUMBER(..., currencyDisplay="name")`

I was reading through the docs and noticed:

We do not yet support NUMBER(..., currencyDisplay="name") - see this python-babel pull request which needs to be merged and released.

The linked pull request was merged and there have been releases since then, so it should be possible to add this support now.

README claims Rust rather than Python

The README claims that this is a Rust implementation.

Incorrect handling of tabs in message value

As identified in mozilla/pontoon#2470 (comment), this happens:

from fluent.syntax import ast, FluentParser, FluentSerializer
parser = FluentParser()
serializer = FluentSerializer()

string = """places-open-in-container-tab =
    .label = Բացել նոր ներդիրում
    .accesskey =	
"""

string
'places-open-in-container-tab =\n    .label = Բացել նոր ներդիրում\n    .accesskey = \t\n'

serializer.serialize(parser.parse(string))
'places-open-in-container-tab =\n    .label = Բացել նոր ներդիրում\n    .accesskey = \n'

Tabs should not be lost; they count as valid inline_text characters according to the spec.

Adjacent Junks are incorrectly merged

Per upstream issue 296, each Junk should end whenever a new line begins with a single character that could be the start of a new Entry (i.e. ^[-#a-zA-Z]). This is working correctly only in the case of potential Comment entries (see comments.ftl where the leading # starts a new Junk) but is incorrect for all other cases such as possible Messages and Terms (see unclosed.ftl which should result in 4 Junks but instead merges 6 junk_lines into 1 Junk).
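
The intended grouping can be sketched on its own, outside the parser. The helper below is a hypothetical simplification: it assumes every input line is already known to be junk and only decides where one Junk ends and the next begins, using the `^[-#a-zA-Z]` rule quoted above.

```python
import re
from typing import List

# A line starting with one of these characters could begin a new Entry.
ENTRY_START = re.compile(r"^[-#a-zA-Z]")


def split_junk(lines: List[str]) -> List[str]:
    """Group unparsable lines into Junk chunks.

    A new chunk starts at every line whose first character could begin an
    Entry; all other lines are appended to the current chunk.
    """
    chunks: List[List[str]] = []
    for line in lines:
        if ENTRY_START.match(line) or not chunks:
            chunks.append([line])
        else:
            chunks[-1].append(line)
    return ["".join(chunk) for chunk in chunks]


junk_lines = [
    "err1 = {\n",
    "    broken\n",
    "-err2 = {\n",
    "# not a comment {\n",
    "   still junk\n",
]
for chunk in split_junk(junk_lines):
    print(repr(chunk))
# 'err1 = {\n    broken\n'
# '-err2 = {\n'
# '# not a comment {\n   still junk\n'
```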