This link takes you to the conversation about this issue on the liblouis-liblouisxml FreeLists mailing list:
https://www.freelists.org/post/liblouis-liblouisxml/Emphasis-phrase-question-and-forking-till-fixed,7
See also https://www.freelists.org/post/liblouis-liblouisxml/Emphasis-phrase
I'm going to need more tests to see if there is more to it, but it seems there is already one thing we can do: we can change how words are counted to determine the length of phrases. Currently Liblouis only counts whole words (thereby treating unemphasisable characters at the beginning and end as spaces). But since the goal of marking phrases is to reduce the number of indicators, we could perfectly well justify counting half words too, as this won't increase the number of indicators. (Note that this is true only on the condition that endemphphrase after is used, as is the case in UEB. When endemphphrase before, or begemphword, is used and the emphasized part of the last word does not span the whole word, one additional indicator is needed to cancel emphasis.)
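To make the difference between the two counting strategies concrete, here is a minimal Python sketch. It is not Liblouis code: it assumes the '+'/' ' typeform convention used in the YAML tests, treats ASCII punctuation as a hypothetical set of unemphasisable characters, and the names `classify_words` and `phrase_length` are made up for illustration.

```python
import string

# Assumption: ASCII punctuation stands in for the table's "unemphasisable"
# characters; a real table would define this set itself.
UNEMPHASISABLE = set(string.punctuation)


def classify_words(text, typeform):
    """Classify each space-separated word as 'whole', 'partial' or 'none'.

    typeform follows the YAML-test convention: '+' marks an emphasized
    character, ' ' an unemphasized one; a short typeform is padded with ' '.
    Unemphasisable characters at the start and end of a word are skipped,
    i.e. treated like spaces.
    """
    marks = typeform.ljust(len(text))
    kinds = []
    pos = 0
    for word in text.split(" "):
        start, end = pos, pos + len(word)
        pos = end + 1  # step over the separating space
        while start < end and text[start] in UNEMPHASISABLE:
            start += 1
        while end > start and text[end - 1] in UNEMPHASISABLE:
            end -= 1
        if start == end:
            continue  # empty or punctuation-only word
        flags = [m == "+" for m in marks[start:end]]
        if all(flags):
            kinds.append("whole")
        elif any(flags):
            kinds.append("partial")
        else:
            kinds.append("none")
    return kinds


def phrase_length(text, typeform, count_partial):
    """Number of words counted toward the phrase length.

    count_partial=False: whole words only (the current behavior).
    count_partial=True:  partially emphasized words count as well.
    """
    kinds = classify_words(text, typeform)
    n = kinds.count("whole")
    if count_partial:
        n += kinds.count("partial")
    return n


print(phrase_length("abc abc abcdefg", "+" * 11, count_partial=False))
print(phrase_length("abc abc abcdefg", "+" * 11, count_partial=True))
```

With emphasis covering "abc abc abc" in "abc abc abcdefg", counting whole words only gives 2, while also counting the half word gives 3. Under a hypothetical three-word phrase threshold, only the second strategy would switch to phrase marking.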
Regarding the all caps issue: there is some code in Liblouis that was added exactly to achieve the opposite of what you say should happen, namely that if the last word of the phrase ends with non-letters (punctuation), the endemphphrase after indicator is inserted after them. The code was included specifically when the noempclass feature was added, in order to preserve the old behavior.
When I disable this code, your "'ABC ABC DEF' defg" example is translated the way you say it should, but several UEB tests start failing because they claim the opposite should happen. @krperry You may want to check them, they are in en-ueb-08-capitalization.yaml.
Apart from the UEB tests, a number of other tests are affected. Some, like the Norwegian ones, are actually improvements, but others, such as the Swedish ones, are regressions. In other words, different braille codes do it differently.
I think the three tests pretty much show everything
They may explain the requirement, but for really good coverage we need more tests IMO:
- tests with other kinds of punctuation
- tests with punctuation that is not enclosing (only before, only after, or different punctuation before and after)
- tests with enclosing punctuation combined with other punctuation
- tests with punctuation at the beginning and/or end included in the emphasis
- etc. etc.
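Writing these cases by hand means counting typeform columns carefully. A small Python helper can generate the input and typeform lines for each category as a YAML skeleton. This is only a sketch: the phrase, the trailing word, the punctuation picked for each category and the helper names are arbitrary, and the expected braille is deliberately left as TODO because it has to come from a transcriber or the rule book, not from a script.

```python
import json

PHRASE = "abc abc defg"
TAIL = " defg"  # unemphasized word after the phrase, as in the existing tests


def variant(before, after, bold_punct=False):
    """Build (input, typeform) for PHRASE wrapped in the given punctuation.

    The typeform has one mark per input character: '+' for bold, ' ' otherwise,
    with the spaces inside the phrase marked as bold too.
    """
    text = before + PHRASE + after + TAIL
    if bold_punct:
        marks = "+" * (len(before) + len(PHRASE) + len(after))
    else:
        marks = " " * len(before) + "+" * len(PHRASE) + " " * len(after)
    return text, marks + " " * len(TAIL)


# One illustrative case per category from the list above.
CASES = [
    ("(", ")", False),     # enclosing, punctuation not bold
    ("-", "", False),      # punctuation only before
    ("", "!", False),      # punctuation only after
    ('"', ":", False),     # different punctuation before and after
    ('("', '!")', False),  # enclosing combined with other punctuation
    ("(", ")", True),      # punctuation included in the emphasis
]


def yaml_skeleton():
    """Return a 'tests:' section with TODO placeholders for the braille."""
    lines = []
    for before, after, bold_punct in CASES:
        text, typeform = variant(before, after, bold_punct)
        lines.append(f"  - - {json.dumps(text, ensure_ascii=False)}")
        lines.append("    - TODO  # expected braille, to be filled in by a transcriber")
        lines.append("    - typeform:")
        lines.append(f"        bold: '{typeform}'")
    return "tests:\n" + "\n".join(lines)


print(yaml_skeleton())
```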
I am going by what transcribers tell me at APH
The existing tests come from the UEB rule book AFAIK, so they are probably correct. Nevertheless, I think it's worth checking them. These are the tests:
- CAUTION: WET PAINT!
- IT'S A HOAX! (APRIL FOOL!)
- V-NECK SWEATERS FOR SALE!
It would be good if you could add some clarifying comments to the tests, and also to your new tests, to explain the different expectations in different cases. The more clarifying comments are included in YAML files, the better.
Regarding the placement of the caps terminator, there are two cases:
- When there is punctuation which does not "balance", the caps terminator goes at the end. This is the case with the examples from RUEB 8, and those examples are correct, e.g.:
  CAUTION: WET PAINT!
  ⠠⠠⠠⠉⠁⠥⠰⠝⠒⠀⠺⠑⠞⠀⠏⠁⠔⠞⠖⠠⠄
  Here, the caps terminator comes after the exclamation mark.
- But when there are balancing punctuation marks (like brackets or quotes), the capitals terminator should go before the matching closing punctuation: the principle of "nesting". So @krperry's examples should come out like this, e.g.:
  "ABC ABC DEFG" defg
  ⠦⠠⠠⠠⠁⠃⠉⠀⠁⠃⠉⠀⠙⠑⠋⠛⠠⠄⠴⠀⠙⠑⠋⠛
Note that I have deliberately switched to double quotes to make the punctuation and the caps terminator more obvious.
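The nesting principle can be sketched with a stack: remember which paired marks were opened between the indicators, then walk the punctuation trailing the last emphasized letter and stop in front of the first closing mark whose partner was opened outside. This Python sketch illustrates the rule under stated assumptions and is not how Liblouis implements it: the pair table (including the heuristic of treating straight double quotes as self-matching), the hand-supplied `start`/`end` arguments, and the « » placeholders standing in for braille indicators are all hypothetical. Passing `nest=False` gives the simpler rule of placing the terminator after all trailing punctuation.

```python
# Hypothetical pair table. Straight double quotes are treated as self-matching,
# which is only a heuristic: a translator cannot always tell open from close.
PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">", "\u201c": "\u201d", '"': '"'}
OPENER_OF = {close: open_ for open_, close in PAIRS.items()}


def terminator_index(text, start, end, nest=True):
    """Index in `text` before which the terminator should be inserted.

    start: index of the first emphasized (or capitalized) letter, where the
           opening indicator goes.
    end:   index just past the last emphasized letter.
    """
    # Record the pairs opened (and not yet closed) between the indicators.
    stack = []
    for ch in text[start:end]:
        if ch in OPENER_OF and stack and stack[-1] == OPENER_OF[ch]:
            stack.pop()
        elif ch in PAIRS:
            stack.append(ch)
    # Walk the punctuation that follows the last emphasized letter.
    pos = end
    while pos < len(text) and not text[pos].isspace():
        ch = text[pos]
        if nest and ch in OPENER_OF:
            if stack and stack[-1] == OPENER_OF[ch]:
                stack.pop()  # opened inside: close it before the terminator
            else:
                break        # opened outside: the terminator goes first
        pos += 1
    return pos


def mark(text, start, end, nest=True, begin="\u00ab", term="\u00bb"):
    """Insert placeholder indicators (« and ») purely for display."""
    t = terminator_index(text, start, end, nest)
    return text[:start] + begin + text[start:t] + term + text[t:]


print(mark("CAUTION: WET PAINT!", 0, 18))
print(mark('"ABC ABC DEFG" defg', 1, 13))
print(mark("IT'S A HOAX! (APRIL FOOL!)", 0, 24))
```

With these inputs, "CAUTION: WET PAINT!" keeps the terminator after the exclamation mark, while in '"ABC ABC DEFG" defg' and "(abc abc defg) defg" it moves before the closing quote or parenthesis. In "IT'S A HOAX! (APRIL FOOL!)" the parentheses open inside the passage, so the closing one stays inside.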
What I have said for capitals is equally true for typeforms (like bold):
- Caution: wet paint!
  ⠘⠶⠠⠉⠁⠥⠰⠝⠒⠀⠺⠑⠞⠀⠏⠁⠔⠞⠖⠘⠄
- "abc abc defg" defg
  ⠦⠘⠶⠁⠃⠉⠀⠁⠃⠉⠀⠙⠑⠋⠛⠘⠄⠴⠀⠙⠑⠋⠛
See Rules of Unified English Braille section 9.7 re typeforms and punctuation.
It would be difficult to get all the examples passing - some require "an understanding of the text".
I agree with @bertfrees that more examples are needed. I will attempt to write some and add them to this ticket.
Thank you @jrbowden, that makes things more clear for me. And I agree with you that it might be difficult to get all the tests to pass.
The "IT'S A HOAX! (APRIL FOOL!)" example comes from RUEB section 8.6.2, which seems to be the relevant section for this issue:
The capitals terminator may precede or follow punctuation and other terminators but it is best that indicators and paired characters such as parentheses, square brackets and quotes be nested. That is, close punctuation and indicators in reverse order of opening.
Strange that this hasn't been referenced by anyone before.
I notice that it says "it is best that", not "it is required that". So that means the behavior of Liblouis should ideally be improved, but is acceptable as it is.
So I guess the "serious bug" that Ken referred to on the mailing list is the emphasis part. Regarding that part: my commit 30df4d9 fixes Ken's first and second tests. I'm not sure that means it resolves the issue. It's only two tests.
@krperry Great, keep me posted.
The only commit you need is 30df4d9. It's on the phrase-emphasis branch, but don't use the last commit of that branch.
git fetch https://github.com/liblouis/liblouis.git phrase-emphasis && git checkout 30df4d99f6
This seems to work for the double quotes and the single quotes with your fix. It has mostly fixed the other punctuation like (), {}, [], and <>; the only problem is that the closing punctuation of an enclosure ends up inside the emphasis even though it is not marked. It is hard to post it here, but if you put a paren at the start of a phrase and a paren at the end of the phrase, and bold the phrase but not the parens, the closing paren should come after the phrase end indicator. I am pretty sure that was in the tests I sent in; I will download them and check. This is the same for all enclosing punctuation. This is much better than it was; now if only we can get that last bit fixed.
Do I need to put in more tests to get the enclosing punctuation problems fixed? I know one of the tests I put in the original ticket showed the problem, but maybe more are needed? The part I am talking about is when you have text enclosed in parentheses, brackets, braces, angle brackets, or pretty much any punctuation that a person might use for enclosures other than quotes, and you bold the inside but not the punctuation. Liblouis still gets that wrong. If we can get that fixed, then this ticket can be closed. As it is, we put the current liblouis in the current stable BrailleBlaster 2.1, and so far it is good apart from this problem.
Hi Ken. I'm a bit confused. As far as I can tell the issue you describe is fixed. The test that I am running is the following:
table: |
  include tables/unicode.dis
  include tables/spaces.uti
  include tables/en-chardefs.cti
  include tables/en-ueb-g1.ctb
tests:
  - - "(abc abc defg) defg"
    - ⠷⠘⠶⠁⠃⠉⠀⠁⠃⠉⠀⠙⠑⠋⠛⠘⠄⠾⠀⠙⠑⠋⠛
    - typeform:
        bold: ' ++++++++++++ '
  - - "(abc abc abc defg) defg"
    - ⠷⠘⠶⠁⠃⠉⠀⠁⠃⠉⠀⠁⠃⠉⠀⠙⠑⠋⠛⠘⠄⠾⠀⠙⠑⠋⠛
    - typeform:
        bold: ' ++++++++++++++++ '
It corresponds with your "example 2" in your initial comment:
- A bold phrase with non-bolded parentheses:
(abc abc defg) defg
Can you spot a mistake in the test?
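One way to check a test like this mechanically is to list exactly which input characters the typeform marks. The Python sketch below assumes that any non-space character in the typeform string marks emphasis and that a typeform shorter than the input leaves the remaining characters unmarked; `marked_chars` is a hypothetical helper, not part of lou_checkyaml.

```python
def marked_chars(text, typeform):
    """Return the characters of `text` whose typeform position is not a space."""
    # Assumption: a short typeform leaves the rest of the input unmarked.
    padded = typeform.ljust(len(text))
    return "".join(ch for ch, m in zip(text, padded) if m != " ")


# Typeform from the first test above: one space, twelve '+', one space.
bold = " " + "+" * 12 + " "
print(repr(marked_chars("(abc abc defg) defg", bold)))
```

If the parentheses were bolded by mistake, they would show up in the result; here only the words between them do.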
I'm sorry, your last comment makes no sense to me...
In the test the parens are not bolded, and the opening and closing bold tags are inside the parens.
The test I'm using is simply taken from the file that you sent us earlier. And as far as I can tell it matches the requirement.
@jrbowden what is your take on this?
OK (relief).
I'm not entirely sure but I think it was passing before. In any case, it does with the current release.
Hi @krperry and @bertfrees ,
As promised, I attached a set of more extensive tests. I hope this helps.
When I run them here, all but one test passes. The one that fails is possibly debatable, but I think its expected result is correct.
The question is where these tests should go.
Somewhere in tests/braille-specs/en-ueb-rueb.yaml?
@krperry Do you already have more clarity?
Yes, if it all works then let's close this :-)
Working, tested in BrailleBlaster.