A possible enhancement to the new opcodes introduced in issue […]


Combine emphasis opcodes about liblouis HOT 17 CLOSED

liblouis commented on May 18, 2024

Combine emphasis opcodes

from liblouis.

Comments (17)

egli commented on May 18, 2024

Sounds like a brilliant idea, at least from the perspective of table and documentation maintenance. How would it work code-wise? How do you define different behaviour for firstletteremph ital and firstletteremph under? Or is there none?


bertfrees commented on May 18, 2024

Cool. Well yeah, there should be different behavior of course otherwise it's not very useful. I haven't looked into the code yet so I don't know how easy this change would be. Maybe @MikeGray-APH can give us a clue?


egli commented on May 18, 2024

Yes OK, different behaviour. But then how do you define the behaviour of the following:

emphclass mySpecialCaps
lenemphphrase mySpecialCaps 3


bertfrees commented on May 18, 2024

Oh I see what you mean, I think. The behavior would still be defined with the opcodes. In that sense nothing changes with the way things are now. The only difference is the way we define it in the tables.

Does that answer your question?


MikeGray-APH commented on May 18, 2024

In my code, capitals are treated the same as the other emphases, except that it will process word resets.


bertfrees commented on May 18, 2024

@MikeGray-APH Yes I know. That's why it's probably best to have a predefined class "caps" with slightly different behavior.


dkager commented on May 18, 2024

@bertfrees wrote:

The order of class definitions determines how typeform bits are mapped to the classes.

I think this should be enclosed in the class definitions themselves, e.g. emphclass italic 1 for bit 1. This is to avoid the kind of ordering problem we're currently dealing with when using the class opcode.

@MikeGray-APH wrote:

In my code, capitals are treated the same as the other emphases, except that it will process word resets.

Ideally there should be an opcode for this, maybe emphmodechars, working similarly to numericmodechars. This is probably the only opcode missing to make the Dutch tables work.


bertfrees commented on May 18, 2024

The order of class definitions determines how typeform bits are mapped to the classes.

I think this should be enclosed in the class definitions themselves, e.g. emphclass italic 1 for bit 1. This is to avoid the kind of ordering problem we're currently dealing with when using the class opcode.

OK I see where you're coming from, but the two cases are fundamentally different.

The problem with using the "class" opcode and $w, $x, etc. in multipass rules is that such tables can't just be included in any table because the number of "class" rules that are defined before the include must be known.

Tables with "emphclass" definitions, the way I propose it, can be included in other tables without a problem.

A possible issue with my proposal, you might say, is that you cannot guarantee that the included tables, and therefore the behavior of your table, will not change. But you could make the same argument for any rule, including your version of the "emphclass" rule. The only possible solution to this problem is to say: the behavior of a table is 100% the responsibility of the table author, and this includes any tables that he wishes to include. What you need is to test your table carefully.

One thing your proposal has that mine doesn't is that you could override the order of classes, but I don't immediately see any need for that.
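To make the trade-off between the two proposals concrete, here is a minimal Python model (not liblouis code) in which typeform bits are handed out either in definition order or explicitly, in the spirit of `emphclass italic 1`. The class `EmphClassRegistry` and the helper `compile_tables` are hypothetical. Rules in the model refer to classes by name, so they resolve the same way whether or not another table was included first; only the bit number, which is what applications see, shifts.

```python
class EmphClassRegistry:
    """Hypothetical model of the emphasis classes collected while compiling a table list."""

    def __init__(self):
        self._bits = {}  # class name -> typeform bit index

    def define(self, name, bit=None):
        """Register a class; bit=None means 'next bit in definition order'."""
        if name in self._bits:
            raise ValueError(f"class {name!r} already defined")
        if bit is None:
            # Order-based: the bit depends on every class defined earlier,
            # including classes that come from included tables.
            bit = len(self._bits)
        if bit in self._bits.values():
            raise ValueError(f"bit {bit} already taken")
        self._bits[name] = bit
        return bit

    def bit_of(self, name):
        # Rules look classes up by name, so this never depends on include order.
        return self._bits[name]


def compile_tables(*tables):
    """Compile tables in order, as if each later table included the earlier ones."""
    registry = EmphClassRegistry()
    for table in tables:
        for name in table:
            registry.define(name)
    return registry


base = ["italic", "underline", "bold"]
mine = ["mySpecialCaps"]

alone = compile_tables(mine)
combined = compile_tables(base, mine)
print(alone.bit_of("mySpecialCaps"), combined.bit_of("mySpecialCaps"))  # 0 3

# Explicit numbering, as in "emphclass italic 1":
explicit = EmphClassRegistry()
explicit.define("italic", 1)
```

In this model a rule written against `mySpecialCaps` still finds its class after the include; what changes is the bit an application would have to pass in.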


dkager commented on May 18, 2024

OK, but either solution will completely change typeform handling for external applications. These applications will at least want to know which classes a table has defined and how to use them. They can use them either numerically, as with the current typeform implementation, or by class name. The latter causes a lot of overhead for longer strings, and using numbers requires a lookup function. Whichever approach you choose, this is going to break every application using this feature of liblouis. Therefore I'm thinking it might be useful to predefine {italic, bold, under} so they are guaranteed to keep their current bits.
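A rough sketch of how an application could combine the two styles of use: resolve each class name once through a lookup function, then fill the per-character typeform array numerically. This is a Python model, not the liblouis API; `lookup_typeform`, `build_typeforms` and the bit numbers in the example table are all assumptions.

```python
def lookup_typeform(class_bits, name):
    """Hypothetical lookup: typeform mask for a class name, or 0 if the table lacks it."""
    bit = class_bits.get(name)
    return 0 if bit is None else 1 << bit


def build_typeforms(text, spans, class_bits):
    """Build one typeform value per character from (start, end, class_name) spans."""
    typeforms = [0] * len(text)
    for start, end, name in spans:
        mask = lookup_typeform(class_bits, name)  # one name lookup per span, not per character
        for i in range(start, end):
            typeforms[i] |= mask  # overlapping emphasis combines bitwise
    return typeforms


# Assumed bits for this example table; a real table would define its own.
table_classes = {"italic": 0, "underline": 1, "bold": 2}
print(build_typeforms("bold italic", [(0, 4, "bold"), (5, 11, "italic"), (0, 11, "strikeout")], table_classes))
# [4, 4, 4, 4, 0, 1, 1, 1, 1, 1, 1]
```

Because the name is resolved per span, the per-character cost stays a single integer, and a class the table does not define (here `strikeout`) simply contributes nothing.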

Another question: how does the computer_braille typeform fit in with this? I believe there are opcodes to signal where computer braille begins and ends. Should this be considered emphasis? If so the term "emphasis" is a bit of a misnomer. Or to generalize the question: are all emphasis options typeforms and vice versa?


bertfrees commented on May 18, 2024

The usage will stay the same, i.e. numeric. The difference is that now the emphasis classes are defined and documented per table. A look up function is not strictly necessary but could be useful, yes.

You have a good point regarding the possibility of breaking how applications currently use liblouis. Yes, I do want typeform handling by external application to change in the long run. And even though things won't break immediately (because we'll make sure the behavior doesn't change initially by running the existing tables through our own conversion tool, and we'll give applications some time to adapt to the new approach), still of course there is the risk that applications are lazy and will break eventually.

We could anticipate that by reserving some bits for bold, italic and underlined (or by having a lookup function) and by requiring tables to support at least those 3 classes. I have an idea for handling this in a way that doesn't force table authors to think in the "old" pattern.

But first let me explain the new approach and why we need it (in case not everybody is convinced yet).

Let's start by saying that "italic", "bold", "underline" etc. are print artifacts, i.e. properties of a font. During transcription these are mapped to braille artifacts (indicators). How that mapping is done depends on language and possibly context (e.g. depending on what types of emphasis appear in a text). Sometimes the braille artifacts have the same name as a print artifact, sometimes not.

Up till now liblouis has handled the problem by providing, through a liblouis table, a mapping between the 3 most common emphasis types and a set of indicators. This simple model is limiting in several ways:

limited number of different braille indicators (max. 3)
limited number of emphasis types (3)
fixed mapping

For applications that don't do any special handling per language ("braille code agnostic"), this is an acceptable generic solution, provided that the liblouis tables implement the mapping as well as possible. For emphasis beyond bold, italic and underlined, the best an application can do is either to map it to the type it is most similar to, or to ignore it.
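The "map it to the most similar type, or ignore it" strategy of a braille-code-agnostic application could look roughly like this. It is an illustrative Python sketch: the `SIMILAR` table, the function name and the bit positions for the three classic types are assumptions, not anything liblouis defines.

```python
# Assumed bit positions for the three classic types (illustrative only).
CLASSIC_BITS = {"italic": 0, "underline": 1, "bold": 2}

# Hypothetical application-side table: richer emphasis -> most similar classic type.
SIMILAR = {"oblique": "italic", "em": "italic", "strong": "bold", "ins": "underline"}


def classic_mask(emphases):
    """Map each emphasis to a classic type if possible; ignore the ones with no close match."""
    mask = 0
    for emphasis in emphases:
        target = emphasis if emphasis in CLASSIC_BITS else SIMILAR.get(emphasis)
        if target is None:
            continue  # e.g. "small-caps": no similar type, so it is dropped
        mask |= 1 << CLASSIC_BITS[target]
    return mask


print(classic_mask(["strong", "em", "small-caps"]))  # 5  (bold | italic)
```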

It is clear that this is not an optimal solution for all braille codes and all input. But trying to handle everything is not in the scope of liblouis either:

In order to handle all possible emphasis types, liblouis would need a notion of CSS, for example.
Solving the problem of context dependent mapping (e.g. UEB) is not possible since the context that liblouis gets is typically only a single paragraph.

This means that applications that use liblouis have the responsibility of doing language specific handling anyway, and therefore it's acceptable that the liblouis interface differs between tables.

What I like so much about this idea is that it doesn't force the table author into a certain pattern. He can freely choose the interface and how much of the mapping he implements in the table. The interface can be a list of distinct indicators (i.e. braille artifacts, e.g. "ind1", "ind2", etc.), or it can be a list of print artifacts, some of which may map to the same indicators. Or it can be a mixture.

To better support multiple emphasis types mapping to the same indicators, without having to duplicate a lot, I had this idea of emphasis "aliases". It could look something like this:

emphclass bold ind1
emphclass ital ind2
emphclass under ind2
firstletteremph ind1 46
...

The exact syntax is not so important. What matters is that tables can easily provide a mapping for bold, italic and underlined, ensuring backwards compatibility, while not being stuck with the old approach.
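One way to picture the alias idea is a two-step resolution: a print artifact name is first mapped to an indicator class, and the rules are looked up on the indicator. The Python below models only the lines shown in the example; the dictionaries and `resolve` are hypothetical, and since the example's "..." leaves the ind2 rules unspecified, they are absent here too.

```python
# The alias example above in hypothetical in-memory form.
ALIASES = {"bold": "ind1", "ital": "ind2", "under": "ind2"}  # print artifact -> indicator class
RULES = {("firstletteremph", "ind1"): "46"}                  # (opcode, indicator class) -> dots


def resolve(opcode, emph_class):
    """Follow an alias to its indicator class, then look up the rule defined for that indicator."""
    indicator = ALIASES.get(emph_class, emph_class)  # names without an alias are indicators themselves
    return RULES.get((opcode, indicator))


print(resolve("firstletteremph", "bold"))  # 46
print(resolve("firstletteremph", "ind1"))  # 46
print(ALIASES["ital"] == ALIASES["under"])  # True: one indicator, two print artifacts
```

Applications can keep passing bold, ital or under, while the rules themselves are written once per braille indicator.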


dkager commented on May 18, 2024

A look up function is not strictly necessary but could be useful, yes.

The alternative, unless I misunderstand the concept, is that application developers look at the classes a table defines and then hard-code them. This will break if a table is later updated with different numbers assigned to these classes. While table authors should avoid such backwards-incompatible changes, I think we should anticipate this by providing a lookup function. This is also required for applications that allow the user to load arbitrary "custom tables".

Even if the table behavior isn't changed, there's already one incompatibility: the change from char to unsigned short, which requires applications to be updated. This is of course a minor problem, but it does mean you can't just drop a >2.6.3 DLL into an existing application.

Question: what are ind1, ind2, etc? E.g. why not write firstletteremph ital 46? Is the idea to make ind1 an alias of ital?

Another idea I had for preventing duplication was something like this:
emphdots ind1 46
firstletteremph ital ind1
lastletteremph ital ind1

I.e. the ability to define virtual dot patterns. The same could probably be achieved by assigning a virtual dot, say a, and then replacing it with the desired dots using multi-pass rules. But this has some disadvantages:

This limits you to the number of virtual dots, which I believe is 8?
Multi-pass rules aren't as transparent.
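To illustrate what the proposed dot pattern alias would do, here is a small Python model of compile-time substitution. Note that `emphdots` is only a proposal in this thread; the `EMPHDOTS` dictionary, the `expand` function and the three-field rule shape are assumptions made for the sketch.

```python
# Hypothetical named dot patterns, as in "emphdots ind1 46".
EMPHDOTS = {"ind1": "46"}


def expand(rule):
    """Substitute a named dot pattern in the last field of a three-field rule; literal dots pass through."""
    opcode, emph_class, dots = rule.split()
    return f"{opcode} {emph_class} {EMPHDOTS.get(dots, dots)}"


for line in ["firstletteremph ital ind1", "lastletteremph ital ind1", "firstletteremph bold 456"]:
    print(expand(line))
# firstletteremph ital 46
# lastletteremph ital 46
# firstletteremph bold 456
```

Changing the dots for `ind1` in one place updates every rule that names it, which is the duplication the proposal aims to remove, without spending a virtual dot or a multipass rule.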


bertfrees commented on May 18, 2024

Yes, that was the idea. Applications would look at the "table API". For every change to a table a note is made in the changelog, so applications that do language specific handling (i.e. use more than ital, bold, under) can adapt themselves with each update. Of course table authors should try to make as few backward-incompatible changes as possible, just like with any other software component. A look up function can make this more robust indeed, although it's only really helpful when the order of classes changes (and why would you need to do that?). For applications that allow the user to load custom tables it may be best to rely on ital, bold and under only. If they want more, the custom tables should probably follow a well-defined standard anyway, which could possibly include a fixed order of classes. But again, a look up function could be convenient here. So it's an idea worth considering.

Because of the change from char to unsigned short we'll change the version to 3.0.

ind1 etc. were just examples of how indicators could be named in a braille code. E.g. UEB has "first transcriber-defined typeform", "second transcriber-defined typeform", etc. In this particular example ital is an alias of ind1. Why not write firstletteremph ital 46? This is the whole essence of what I've been trying to explain. A table author can still write that if he wishes, but he can also use words that more closely match the braille code and use aliases to do the mapping from print artifacts to braille artifacts.


bertfrees commented on May 18, 2024

Your idea about "dot pattern aliases" could remove some extra duplication, yes. I need to think about it. I guess I would make it something more general than emphdots. But we have to be careful about inventing things nobody will use. This is indeed something that could be solved already, quite elegantly actually, with multipass opcodes.

There are 6 virtual dots by the way (9, a, b, c, d and e), so the number of virtual dot patterns is (2^6 - 1) * 2^8 = 16128.
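For reference, the count works out as follows, assuming (as the comment implies) that a virtual dot pattern is any combination of the 8 regular dots together with a non-empty combination of the 6 virtual dots:

$$
N = \underbrace{(2^{6} - 1)}_{\text{at least one virtual dot}} \cdot \underbrace{2^{8}}_{\text{any regular dots}} = 63 \cdot 256 = 16128 = 2^{14} - 2^{8}
$$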

If you want to work out this idea some more, please make a new issue for it (as it's not directly related to the opcode unification).


dkager commented on May 18, 2024

For applications that allow the user to load custom tables it may be best to rely on ital, bold and under only.

I agree, but how are applications supposed to know which bit corresponds to which typeform? A custom table could define ital=1 and bold=2, but it could just as easily define ital=32 and bold=1. The only way around this is to hard-code these three classes with their current values. But that partly undoes what dynamic classes are trying to fix.


bertfrees commented on May 18, 2024

Yep, either reserve some bits, or have a look up function. Or what I said earlier about custom tables following a well-defined standard. The contract could simply say that it is illegal to define ital as 32 and bold as 1.

The first option, reserving some bits, doesn't necessarily conflict with dynamic classes IMO. We need dynamic classes, not dynamic bits. Besides we'll need to reserve the bit for computer_braille anyway. What matters is that there is a whole range of bits available (maybe starting at bit 5) that can be filled in freely.
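A minimal sketch of the "reserve some bits" option, written as a Python model rather than liblouis code. It assumes a 16-bit typeform (the char to unsigned short change mentioned earlier in the thread) and a free range starting at bit 5 as suggested here; the reserved positions for italic, underline, bold and computer_braille and the function name `assign_bits` are purely illustrative.

```python
TYPEFORM_BITS = 16   # assumed width after the change from char to unsigned short
FIRST_FREE_BIT = 5   # "maybe starting at bit 5"
RESERVED = {"italic": 0, "underline": 1, "bold": 2, "computer_braille": 3}  # assumed positions


def assign_bits(table_classes):
    """Reserved names keep their fixed bit; every other class gets the next free bit."""
    bits = {}
    next_bit = FIRST_FREE_BIT
    for name in table_classes:
        if name in bits:
            raise ValueError(f"class {name!r} defined twice")
        if name in RESERVED:
            bits[name] = RESERVED[name]  # fixed, so applications can rely on it
            continue
        if next_bit >= TYPEFORM_BITS:
            raise ValueError("out of typeform bits")
        bits[name] = next_bit  # dynamic class: position depends only on table order
        next_bit += 1
    return bits


print(assign_bits(["ind1", "bold", "ind2", "italic"]))
# {'ind1': 5, 'bold': 2, 'ind2': 6, 'italic': 0}
```

Under these assumptions a table gets 11 freely assignable classes (bits 5 to 15), while bold, italic and underline stay where existing applications expect them.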


egli commented on May 18, 2024

I think we can safely close this issue, as this has been implemented.


bertfrees commented on May 18, 2024

The lookup function has been added in 511d91e.

