TL;DR: When building in Docker, at least from the ubuntu image, a different locale has to be set, otherwise the compilation fails. I tested with en_US.UTF-8, which makes the compilation succeed. Whether this means that everything is as it should be, however, I do not know.
(Could the solution simply be to document this behaviour on the "Getting Started" page, or is there an underlying cause of this error that can be fixed, so that compiling the model does not require setting these environment variables?)
The required lines in the Dockerfile:
FROM ubuntu:20.04
RUN apt-get update
# required to install the locales
RUN apt-get install -y locales
RUN locale-gen en_US.UTF-8
RUN update-locale LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8
# some part of the build process seems to need LANG and LC_ALL to be set like this:
ENV DEBIAN_FRONTEND="noninteractive" TZ="Europe/Oslo" LANG="en_US.UTF-8" LC_ALL="en_US.UTF-8"
# .... the rest of the Dockerfile
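To check which locale actually applies inside a container, a small shell sketch like the one below may help. It assumes (this is not confirmed by the issue) that the relevant setting is the character-type category, LC_CTYPE, which POSIX resolves from LC_ALL first, then LC_CTYPE, then LANG. So when LC_ALL is set, as in the lines above, its value takes precedence over LANG. The function names and a file name such as check-locale.sh are hypothetical. The script only inspects variable values; locale -a lists the locales actually generated in the image, and locale charmap prints the codeset in effect.

```sh
#!/bin/sh
# Hypothetical helper: resolve which locale value governs character handling
# (LC_CTYPE), using the POSIX precedence LC_ALL > LC_CTYPE > LANG.
# Empty variables count as unset; with nothing set, glibc systems such as
# Ubuntu fall back to the C (POSIX) locale.
effective_ctype() {
  if [ -n "${LC_ALL:-}" ]; then
    printf '%s\n' "$LC_ALL"
  elif [ -n "${LC_CTYPE:-}" ]; then
    printf '%s\n' "$LC_CTYPE"
  elif [ -n "${LANG:-}" ]; then
    printf '%s\n' "$LANG"
  else
    printf '%s\n' "C"
  fi
}

# Succeeds if a locale name carries a UTF-8 codeset suffix, e.g. "en_US.UTF-8",
# "C.UTF-8" or "en_US.utf8" (an optional @modifier is allowed).
# It only inspects the name; it does not check that the locale was generated.
is_utf8_name() {
  case "$1" in
    *.[Uu][Tt][Ff]-8 | *.[Uu][Tt][Ff]8 | *.[Uu][Tt][Ff]-8@* | *.[Uu][Tt][Ff]8@*)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

ctype=$(effective_ctype)
if is_utf8_name "$ctype"; then
  echo "LC_CTYPE resolves to '$ctype' (UTF-8)"
else
  echo "LC_CTYPE resolves to '$ctype' (not UTF-8)"
fi
```

In an image built from the failing Dockerfile below, none of these variables are set, so the sketch would report the C locale.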
If, however, you have a Dockerfile without the locale setup, consisting only of instructions very similar to those in https://giellalt.github.io/infra/GettingStarted.html, like this one...
$ cat Dockerfile.nob.failing
FROM ubuntu:20.04
RUN apt-get update
ENV DEBIAN_FRONTEND="noninteractive" TZ="Europe/Oslo"
RUN apt-get -y install bc curl git autoconf automake cmake libtool wget antiword wv python3-pip python3-bs4 python3-lxml python3-html5lib python3-feedparser python3-yaml python3-tidylib
RUN curl https://apertium.projectjj.com/apt/install-nightly.sh | bash
RUN apt-get -yf install apertium-all-dev cg3 hfst
# giella-core
WORKDIR /giellalt
RUN git clone --depth 1 https://github.com/giellalt/giella-core
WORKDIR /giellalt/giella-core
RUN ./autogen.sh
RUN ./configure
RUN make -j
# shared-mul
WORKDIR /giellalt
RUN git clone --depth 1 https://github.com/giellalt/shared-mul
RUN git clone --depth 1 https://github.com/giellalt/lang-nob
WORKDIR /giellalt/lang-nob
RUN ./autogen.sh
RUN ./configure --enable-fst-hyphenator --enable-spellers --enable-tokenisers --enable-phonetic --enable-tts
RUN make -j
...and then try to build the image with this command...
$ docker build . -f Dockerfile.nob.failing -t lang-nob
...the build fails, printing the following errors before it terminates:
HRGX2FST spellrestrict-nfd2nfc.hfst
/usr/bin/hfst-regexp2fst: spellrestrict-nfd2nfc.regex: XRE parsing failed in expression #1 separated by semicolons
make[2]: *** [Makefile:885: spellrestrict-nfd2nfc.hfst] Error 1
make[2]: *** Deleting file 'spellrestrict-nfd2nfc.hfst'
make[2]: *** Waiting for unfinished jobs....
/usr/bin/hfst-regexp2fst: spellrelax-nfc2nfd.regex: XRE parsing failed in expression #1 separated by semicolons
make[2]: *** [Makefile:885: spellrelax-nfc2nfd.hfst] Error 1
make[2]: *** Deleting file 'spellrelax-nfc2nfd.hfst'
rm spellrelax-with-flagtags.hfst spellrelax-mobile-keyboard.hfst spellrelax-with-tags.hfst inituppercase.hfst spellrelax-tags.hfst allcaps.hfst downcase-derived_proper-strings.hfst spellrelax.hfst
make[2]: Leaving directory '/giellalt/lang-nob/src/orthography'
make[1]: *** [Makefile:1212: all-recursive] Error 1
make[1]: Leaving directory '/giellalt/lang-nob/src'
make: *** [Makefile:540: all-recursive] Error 1
The command '/bin/sh -c make -j' returned a non-zero code: 2
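One possible underlying cause, and this is only a guess, is that the two failing files (spellrestrict-nfd2nfc.regex and spellrelax-nfc2nfd.regex) contain multi-byte UTF-8 characters, for example the combining accents involved in NFC/NFD normalization, which hfst-regexp2fst may read differently when no UTF-8 locale is set. A minimal shell sketch for checking whether those files contain any non-ASCII bytes at all (the function name is hypothetical) could look like this:

```sh
#!/bin/sh
# Diagnostic sketch (hypothesis, not a confirmed cause): count bytes outside the
# 7-bit ASCII range in each file given on the command line. Multi-byte UTF-8
# characters show up as non-ASCII bytes; e.g. a combining acute accent
# (U+0301) is two bytes.
count_non_ascii_bytes() {
  # tr deletes every byte in the ASCII range (octal 000-177), so whatever
  # remains is non-ASCII; LC_ALL=C makes tr treat the input as plain bytes.
  # The final tr strips the padding some wc implementations add.
  LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d ' '
}

for f in "$@"; do
  if [ -r "$f" ]; then
    echo "$f: $(count_non_ascii_bytes "$f") non-ASCII bytes"
  else
    echo "$f: not found or not readable" >&2
  fi
done
```

For example, it could be run as sh count-non-ascii.sh spellrestrict-nfd2nfc.regex spellrelax-nfc2nfd.regex in /giellalt/lang-nob/src/orthography, the directory make was in when it failed, assuming the .regex files are still present there (count-non-ascii.sh is a hypothetical name). A non-zero count would make the locale hypothesis plausible, but would not prove it.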