fititnt / hxltm-action

[non-production-ready] Multilingual Terminology in Humanitarian Language Exchange. TBX, TMX, XLIFF, UTX, XML, CSV, Excel XLSX, Google Sheets, (...)

Home Page: https://hxltm.etica.ai/

License: The Unlicense

Dockerfile 14.36% Shell 85.64%
non-production-ready github-actions

hxltm-action's Introduction

Actions with HXLTM: terminology, translation & localization

[non-production-ready] GitHub Action for HXLTM (Multilingual Terminology in Humanitarian Language Exchange). TBX, TMX, XLIFF, UTX, XML, CSV, Excel XLSX, Google Sheets, and more.

Preface

What is HXLTM? Reference tooling? The HXLTM Action?

What is HXLTM?

The HXLTM documented conventions (ontologia) explain how to store terminology and translation memories in HXL. This makes storage very compact while still allowing collaborative human editing of complex cases, even without advanced frontends.

Reference tooling

Public domain reference tooling enables direct conversion from HXLTM to templated files (in short: more-than-string-replace placeholders filled with content from HXLTM) and to both user-customizable formats and industry standards related to linguistic content.

The HXLTM Action

This GitHub Action abstracts part of what is possible with the underlying HXLTM CLI tools. It also allows use of the fantastic command line tools shipped with libhxl-python, selected with the bin parameter.

Source code for underlying applications:


Example usage

Are you new to GitHub Actions? PROTIP!

PROTIP: if you are new to GitHub Actions, think of each action published with 💖 by others (TL;DR: the - uses: part, as in - uses: actions/checkout@v2) as a building block that runs (TL;DR: the runs-on: ubuntu-latest part) on powerful virtual machines with 8GB to 14GB of RAM, 100% free and unlimited (*) for public open source projects.

(*): even with good intent, avoid making unauthenticated requests to external services, like Google Sheets, too often without a strong reason. Take special care with scheduled jobs for datasets when someone else is already sharing a cached version hosted on GitHub Pages or some other site.

Quickstart

on: [push]

jobs:
  HXLTM-export:
    name: Converts HXLTM to multilingual data formats
    runs-on: ubuntu-latest
    steps:

      - name: Checkout the git repository to the actions temporary host runner
        uses: actions/checkout@v2

      - name: "HXLTM to TBX (TermBase eXchange)"
        uses: fititnt/hxltm-action@v0.4.0
        with:
            bin: 'hxltmcli'
            # https://hdp.etica.ai/hxltm/archivum/#TBX-Basim
            args: "--objectivum-TBX-Basim"
            infile: 'fontem.tm.hxl.csv'
            outfile: 'objectivum.tbx'

      - name: "HXLTM to TMX (Translation Memory eXchange)"
        uses: fititnt/hxltm-action@v0.4.0
        with:
            bin: 'hxltmcli'
            args: "--objectivum-TMX"
            infile: 'fontem.tm.hxl.csv'
            outfile: 'objectivum.tmx'

      - name: "HXLTM to UTX (Universal Terminology eXchange)"
        uses: fititnt/hxltm-action@v0.4.0
        with:
            bin: 'hxltmcli'
            args: "--objectivum-UTX"
            infile: 'fontem.tm.hxl.csv'
            outfile: 'objectivum.utx'

Full example usages

Examples of repositories using this action

hxltm-action-example

The hxltm-action-example is used to test the latest version of hxltm-action. It's recommended to pin a version (or a strict hash), like @v0.4.0, instead of @main, so - uses: fititnt/hxltm-action@main would become - uses: fititnt/hxltm-action@v0.4.0.

Documentation

This documentation explains the action.yml and entrypoint.sh strategy to abstract the command line usage described at https://hdp.etica.ai/hxltm/archivum/.

Baseline inputs

Baseline inputs, together with Environment variables, are enough to abstract how to use the underlying command line tools. The syntactic sugar inputs offer some level of abstraction.


      # TODO: explain this snippet a bit better
      - # name: "Some description here"
        uses: fititnt/hxltm-action@v0.4.0
        with:
          bin: "hxltmcli" # hxltmcli, hxltmdexml
          args: ""  # 
          infile: path/to/fontem.tm.hxl.csv
          outfile: path/to/objectivum

bin

Required. The executable to run.

Parameter examples:

  • hxltmcli (or .github/hxltm/hxltmcli.py) (*)
  • hxltmdexml (or .github/hxltm/hxltmdexml.py) (*)

(*): If necessary, a local customized fork of the reference HXLTM tools can be stored near where the data is processed. The suggested place is .github/hxltm/(file).py. This can be useful both for testing purposes and for immediate hotfixes during an urgent response, where you as implementer cannot wait.
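
As a minimal sketch of the footnote above, a step could point bin at a customized copy committed to the repository. The path follows the suggested .github/hxltm/ location; the version tag is the one this README recommends pinning, and the arguments and file names are reused from the Quickstart purely for illustration.

```yaml
steps:
  - name: Checkout, so the local fork is available on the runner
    uses: actions/checkout@v2

  - name: "HXLTM to TMX with a local fork of hxltmcli"
    uses: fititnt/hxltm-action@v0.4.0
    with:
      # Assumption: this file was copied from the reference tooling
      # and customized in this repository.
      bin: ".github/hxltm/hxltmcli.py"
      args: "--objectivum-TMX"
      infile: "fontem.tm.hxl.csv"
      outfile: "objectivum.tmx"
```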

args

Arguments passed to the program defined by the bin parameter.

Parameter examples:

  • --help
  • -v
  • --sheet 7 (select a sheet from an Excel workbook; 1 is the first sheet)

infile

The input file for the program defined by the bin parameter (see Note on non use of pipelines). Default "fontem.ext".

Parameter examples:

  • fontem.hxl.csv
  • fontem.tbx

outfile

The output file for the program defined by the bin parameter (see Note on non use of pipelines). Default "objectivum.ext".

Parameter examples:

  • objectivum.tbx
  • objectivum.hxl.csv
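
The infile and outfile examples above also suggest the reverse direction, from an XML format back to HXLTM. The step below is only a sketch: it assumes hxltmdexml can import a TBX file without extra arguments, which this README does not confirm, and the file names are taken from the parameter examples.

```yaml
- name: "TBX back to HXLTM (sketch)"
  uses: fititnt/hxltm-action@v0.4.0
  with:
    bin: "hxltmdexml"
    # Assumption: no extra arguments are needed for this conversion;
    # run the tool with --help to check the real options.
    args: ""
    infile: "fontem.tbx"
    outfile: "objectivum.hxl.csv"
```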

Environment variables

Reusable defaults

Given the way GitHub Actions steps work, environment variables can be passed either at the level of the entire job or to specific steps. One implication of action.yml and entrypoint.sh is that environment variables at job level can be used to set defaults for potentially repetitive values, like working_languages.

TODO: test this potential implication and document it.

Syntactic sugar inputs

This section shows some syntactic sugar (or intentional syntactic saccharin) for what could be done in other ways, often with the args parameter. Some of these inputs use English where the hxltm CLI tools use Latin.

help

Syntactic sugar to invoke the bin program with --help and exit without raising an error. Default false.

Just copy and paste the following.

      - name: "hxltmcli --help"
        uses: fititnt/hxltm-action@v0.4.0
        with:
          bin: "hxltmcli"
          args: "--help"

      - name: "hxltmdexml --help"
        uses: fititnt/hxltm-action@v0.4.0
        with:
          bin: "hxltmdexml"
          args: "--help"

Extra: HXLStandard cli tools

Since libhxl-python is a requirement of hxltm, you can reuse this action to pre-process already HXLated datasets (if a dataset is not HXLated yet, use hxltag and map the columns manually).

      # Bonus: HXLStandard cli tools ___________________________________________
      # @see https://github.com/HXLStandard/libhxl-python/wiki/Command-line-tools
      - name: "hxlspec --help"
        uses: fititnt/hxltm-action@v0.4.0
        with:
          bin: "hxlspec"
          args: "--help"

      - name: "hxltag --help"
        uses: fititnt/hxltm-action@v0.4.0
        with:
          bin: "hxltag"
          args: "--help"

      - name: "hxldedup --help"
        uses: fititnt/hxltm-action@v0.4.0
        with:
          bin: "hxldedup"
          args: "--help"

      ### Full list (as of 2021-11-07)
      # compgen -c | grep hxl
      # hxlreplace
      # hxlexplode
      # hxlselect
      # hxladd
      # hxlspec
      # hxlcount
      # hxltag
      # hxlcut
      # hxlsort
      # hxlexpand
      # hxlmerge
      # hxldedup
      # hxlfill
      # hxlrename
      # hxlclean
      # hxlappend
      # hxlimplode
      # hxlvalidate
      # hxlhash

working_languages

List of one or more working languages (see Note on language options). Use new lines or , as separator.

Parameter examples:

  • TODO: add example parameters for IATE and UN working languages here

non_working_languages

Opposite of working_languages (see Note on language options).

auxiliary_languages

List of one or more auxiliary languages; order is important (see Note on language options). Use new lines or , as separator.

Parameter examples:

  • TODO: add example parameters for IATE and UN working languages here

source_language

Source language (see Note on language options). Single item.

target_language

Target language (see Note on language options). Single item.
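
The language inputs above could be combined in one step roughly as follows. This is a sketch only: the language codes are assumptions (written in an {ISO 639-3}-{ISO 15924} style) and are not confirmed by this README, and the output file name is hypothetical.

```yaml
- name: "HXLTM to XLIFF for one language pair (sketch)"
  uses: fititnt/hxltm-action@v0.4.0
  with:
    bin: "hxltmcli"
    infile: "fontem.tm.hxl.csv"
    # Hypothetical output file name
    outfile: "objectivum.xlf"
    export_data_exchange_standard: "XLIFF"
    # Assumption: example language codes; check the HXLTM
    # documentation for the values your dataset actually uses.
    source_language: "por-Latn"
    target_language: "spa-Latn"
    # Lists accept new lines (or a comma) as separator
    working_languages: |
      por-Latn
      spa-Latn
```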

export_ad_hoc_template

  • (Syntactic sugar for HXLTM: --objectivum-formulam)

Export custom template (HXLTM Ad Hoc Fōrmulam). Path to a single file on local disk.

Parameter examples:

  • data/README.🗣️.md
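
A minimal sketch of this input, assuming a template stored at the path from the parameter example; the output path is hypothetical. Since the input is described as sugar for --objectivum-formulam, passing that option through args should be an equivalent way to write the same step.

```yaml
- name: "Generate a file from an ad hoc template (sketch)"
  uses: fititnt/hxltm-action@v0.4.0
  with:
    bin: "hxltmcli"
    infile: "fontem.tm.hxl.csv"
    # Hypothetical output path for the generated file
    outfile: "data/README.md"
    # Template path taken from the parameter example above
    export_ad_hoc_template: "data/README.🗣️.md"
```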

export_data_exchange_standard

  • (Syntactic sugar for HXLTM: --objectivum-<VALUE>)

Export to a data standard documented in the HXLTM ontologia.

Parameter examples:

  • TMX
  • XLIFF
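
To illustrate the mapping described above, the two steps below are intended to be equivalent: the first uses the syntactic sugar input, the second passes the corresponding --objectivum-<VALUE> option through args. File names are reused from the Quickstart; the equivalence is how this README describes the input, not something verified here.

```yaml
- name: "TMX via the syntactic sugar input"
  uses: fititnt/hxltm-action@v0.4.0
  with:
    bin: "hxltmcli"
    export_data_exchange_standard: "TMX"
    infile: "fontem.tm.hxl.csv"
    outfile: "objectivum.tmx"

- name: "TMX via args (same result expected)"
  uses: fititnt/hxltm-action@v0.4.0
  with:
    bin: "hxltmcli"
    args: "--objectivum-TMX"
    infile: "fontem.tm.hxl.csv"
    outfile: "objectivum.tmx"
```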

dump_abstract_syntax_tree

Specify a file to dump the HXLTM Abstractum Syntaxim Arborem (see Note on HXLTM-ASA).

Parameter examples:

  • .asa.hxltm.yml
  • .asa.hxltm.json

Outputs

resultatum

TODO: explain better the outputs.

Annotations

Note on non use of pipelines

Piping from stdin and to stdout, which the underlying CLI tools support as an efficient way of working, is not available in this action. If you're working with gigabyte-sized datasets that would strain the free disk space on GitHub Actions, consider using actions-python and installing all dependencies manually.
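
A rough sketch of that manual route is shown below. It makes several assumptions that this README does not confirm: that the Python setup action meant here is actions/setup-python, that the reference tools can be installed from PyPI as hdp-toolchain, and that hxltmcli reads stdin and writes stdout when no file names are given.

```yaml
steps:
  - uses: actions/checkout@v2

  - uses: actions/setup-python@v2
    with:
      python-version: "3.9"

  # Assumption: the HXLTM reference tools are published under this name
  - name: Install the CLI tools directly on the runner
    run: pip install hdp-toolchain

  # Assumption: stdin/stdout are used when infile/outfile are omitted
  - name: "HXLTM to TMX using a pipeline"
    run: hxltmcli --objectivum-TMX < fontem.tm.hxl.csv > objectivum.tmx
```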

Note on language options

The main reason the hxltm-action documentation on these options is more conceptual is that the HXLTM reference implementation tooling both allows users to specify them and exposes their values to those who document custom standards in their own ontologia, even when the original data exchange standards don't use them.

TODO: give even more context

Note on HXLTM-ASA

TODO: explain what is special about the way the reference implementation of HXLTM uses HXLTM-ASA.

To do

  • Even if the @v0.*.* releases are already usable (though users are advised to specify an exact version), eventually release a @v1.0.0 so users can follow the GitHub Actions convention of using @v1 / @v2 / @v3 (...) as the version.
  • Potential new Action Translate Toolkit

License

Public Domain

To the extent possible under law, Emerson Rocha and non anonymous collaborators have waived all copyright and related or neighboring rights to this work to Public Domain.

Optionally, the BSD Zero Clause License is also one explicit alternative to the Unlicense as an older license approved by the OSI:

SPDX-License-Identifier: Unlicense OR 0BSD

hxltm-action's Issues

Test case for HXLTM-Action: datasets from Translation Initiative for COVID-19 "TICO-19"


While not crucial for implementing V1 of hxltm-action, this issue will be used to reference the strategy used to convert this real dataset, as a test case for conversion. The end result could both be useful and help to understand what additional tooling could be relevant.

TODO: add more context.

Optimize the base Docker image used for hxltm-action

The current Dockerfile uses FROM python:3.9-bullseye (Debian, a bigger base image), but libhxl-python can run on Alpine (see https://github.com/HXLStandard/hxl-proxy/blob/main/Dockerfile).

While it is possible to manually copy hxltmcli.py (and hxltmdexml.py), the point here is to refactor the hdp-toolchain to remove the extra dependencies listed in https://github.com/EticaAI/HXL-Data-Science-file-formats/blob/main/requirements.txt.

# (...)
#
# When working with urnresolver. Not used by others
cryptography
keyring

I'm almost sure it was cryptography that made the Alpine build fail, so both dependencies on https://github.com/EticaAI/HXL-Data-Science-file-formats must become optional AND the documentation should be updated. Then we could optimize the Docker base image here.

This point doesn't actually block functionality, but it would allow things to speed up a bit more.

List examples of external repositories using hxltm-action

While documentation explaining how to use this action already exists as of v0.2.0 (though it is too basic), and some self-testing exists, it is still better to point to different repositories, to avoid confusing users who try to copy the .github/workflows here (which uses uses: ./ instead of uses: user/hxltm-action@version).

The idea here could be one or both of these types of example

hxltm-extras-action: ad hoc GitHub Actions for non-HXL / non-HXLTM cli tools

It makes sense to at least tell everyone (even if just for our internal use) which tools to use to pre-process or post-process data, for two reasons. First, to convert to new formats that the HXLTM reference tooling may never support directly (doing so would make it too heavy on dependencies). Second, because tabular data needs some preprocessing before one can even start to think about HXLating it (so hxltmcli doesn't work, because it needs the input to be at least HXLated; hxltmdexml was a special case because its input is XML mapped in the ontologia, so such special tooling would never exist).

Then, there is one problem. We're using GitHub Actions for HXL / HXLTM, but most CLI tools that manipulate data don't have any published version on GitHub Marketplace, including very popular CLI tools for dealing with CSV or tabular data.

The plan

While this issue is not about this repository, it is at least a reference place to mention which strategies we use to create such GitHub Actions, so that we can at least get things going.

"One big with everything" or several smaller actions?

I'm not sure which approach to follow, but "one big with everything" is an anti-pattern. Still, it may help to know for which tools we decide to create a separate action.

Also, one advantage of the approach of simply abstracting the command line tools is that it is easier to create several of these actions, and they keep working in the long term. So the point here is, for example, to allow #5 to be implemented without too many hacks.

Implement and document HXLTM ad hoc templated file generation based on source multilingual dataset (use case: generate monolingual templates, like translated documentation)

Links


HXLTM has an experimental feature, implemented but not fully documented, now called HXLTM Ad Hoc Fōrmulam (HXLTM templated export). The output from hxltmcli --help:

# (...)
  --objectivum-formulam OBJECTIVUM_FORMULAM
                        Template file to use as reference to generate an output. Less powerful than custom file but can be used for simple cases.
# (...)

The idea of this issue (maybe also with an example for #2) is to use hxltm-action to showcase this feature. One perfect example may be to store the translations of README.md files in a separate place, then automatically generate the READMEs.

This, combined with fetching translations from remote sources (like Google Sheets), could allow creating translations for projects (even ones as simple as README files).

Implement and document HXLTM configurable ontologia (use case: create non ad hoc templated files, but data standards not shipped with default hxltm ontologia, like custom XMLs, JSONs, etc)

  • Implement and document HXLTM ad hoc templated file generation based on source multilingual dataset (use case: generate monolingual templates, like translated documentation) #3
    • Note: one difference from #3 is that here you're likely to create something like a custom XML (such as an XLIFF version or some TBX dialect) or some new JSON data standard that is not added for everyone to use.
    • #3 is likely to be more focused on "generate files like READMEs or JSON with specific concept translations", while this issue is about a full dump (to export, and maybe import if you can explain it in the ontologia) that is generic for any dataset, without hardcoded concepts.

The reference public domain CLI tools of HXLTM have an option to specify a different ontologia file (which basically gives full control over not only how data is exported, but also how it is imported back from documented data standards). But this is not documented here on hxltm-action.

Add to this the special option --archivum-configurationem-appendicem, which we have not yet had the chance to test and implement; it would already somewhat allow merging/replacing only specific additions. The advantage is that the end user (or a GitHub Action's documentation, trying to do some data transformation) could both keep the reference ontologia AND specify customizations.

Some objectives of this issue

The hxltm-action should explain both how to fully override the ontologia and how to partially override it.
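
As a sketch of the full override, the --archivum-configurationem option shown in the hxltmcli --help output quoted in this issue could be passed through args. The location of the custom file under .github/hxltm/ is an assumption borrowed from the folder this issue proposes; the partial override option is marked as not implemented in the quoted help output, so it is not shown.

```yaml
- name: "HXLTM to TMX with a custom ontologia (sketch)"
  uses: fititnt/hxltm-action@v0.4.0
  with:
    bin: "hxltmcli"
    # Assumption: a customized cor.hxltm.yml is committed at this path
    args: "--archivum-configurationem .github/hxltm/cor.hxltm.yml --objectivum-TMX"
    infile: "fontem.tm.hxl.csv"
    outfile: "objectivum.tmx"
```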

Since this may be such a common scenario, use some custom paths if they exist (the HXLTM CLI tooling already tries to use the user's files instead of what ships with the program), but instead of searching the user home directory for the YAML/JSON, search the .github/hxltm folder.

Either way, both cases will require customizing the upstream programs. So this issue is here to keep track of the reasoning behind this.

Implicit objective: also explain how to replace the programs

This point actually already has some references for the next release (this is explained on the input parameter bin):

### Inputs
#### `bin`
**Required** The executable to run.

**Parameter examples**:
- `hxltmcli` _(or `.github/hxltm/hxltmcli.py`)_ (*)
- `hxltmdexml` _(or `.github/hxltm/hxltmdexml.py`)_ (*)

> <sub>(*): If necessary, a local customized fork of the reference HXLTM tools
  can be stored near where the data is processed. The suggested places are
  .github/hxltm/(file).py. This can both be useful for testing proposes or
  immediate hotfixes under urgency response where you as implementer cannot
  wait.</sub>

The "implicit objective" is to give some guidance on when to override the ontologia and when to override the program. Some great use cases for overriding the program are:

  • very advanced customization (where the ontologia alone is not enough)
  • a lazy way to achieve a scheduled runner and keep it running for years, without caring about updates
  • security validation (e.g. someone using HXLTM as a data format, but who needs to have the code evaluated by experts and cannot trust some community effort)

Current --help output

hxltmcli --help
# hxltmcli v0.8.8
# (...)
  --archivum-configurationem
                        Path to custom configuration file (The cor.hxltm.yml)
  --archivum-configurationem-appendicem
                        (Not implemented yet)Path to custom configuration file (The cor.hxltm.yml)

# (...)
hxltmdexml --help
# hxltmdexml v0.7.1
# (...)
  --archivum-configurationem
                        Path to custom configuration file (The cor.hxltm.yml)
# (...)
