I have been working on the rules for emphasis today and I finally understand why it is such an issue. I have been racking my brain (a little) over why the multiline flag (?m:subexp) in the regexp doesn't work. I haven't found definite proof, but because TextMate parses things line by line, Atom apparently does as well. As I see it, this puts us in an either/or situation.
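To illustrate why the flag would have no effect under line-by-line parsing, here is a minimal Python sketch. It is not Atom's or TextMate's actual tokenizer: find_emphasis_whole_text and find_emphasis_per_line are hypothetical helpers, and Python's re.DOTALL stands in for Oniguruma's (?m:subexp), which likewise lets the dot match a newline.

```python
import re

# re.DOTALL lets "." match newlines; it stands in for Oniguruma's (?m:...).
EMPHASIS = re.compile(r"\*(.+?)\*", re.DOTALL)


def find_emphasis_whole_text(text):
    """Match against the full document, so the flag can take effect."""
    return [m.group(1) for m in EMPHASIS.finditer(text)]


def find_emphasis_per_line(text):
    """Match one line at a time, as a line-based tokenizer would."""
    found = []
    for line in text.split("\n"):
        # The regex never sees a newline here, so the flag is irrelevant.
        found.extend(m.group(1) for m in EMPHASIS.finditer(line))
    return found


doc = "some *emphasis that\ncontinues* here"
whole = find_emphasis_whole_text(doc)
per_line = find_emphasis_per_line(doc)
```

With the whole text available the span is found; fed line by line, neither line contains a complete pair of delimiters, so nothing matches no matter which flags the pattern carries.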
Option 1: multiline
Matching across multiple lines is more or less possible when using nested patterns, and thus the following type of pattern matching:
begin: $pattern
end: $pattern
By far the biggest issue with this option is that the end pattern is optional. That means that once the begin part is matched, the scope stays open until an end match is found, and if none is found it stays open for the rest of the document. That's the cause of that single * or _ somewhere that sets all your text in italics. I've personally had completely realistic, uncomplicated situations where this made Atom unusable for that particular document.
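The runaway scope can be reproduced with a toy model of begin/end matching. This is a sketch under simplifying assumptions, not the real grammar engine: scope_open_after_each_line is a hypothetical function, and both delimiters are reduced to a bare asterisk.

```python
import re

BEGIN = re.compile(r"\*")  # opens the emphasis scope
END = re.compile(r"\*")    # closes it, if it ever occurs


def scope_open_after_each_line(lines):
    """For each line, report whether the emphasis scope is still open at its end."""
    inside = False
    states = []
    for line in lines:
        pos = 0
        while True:
            # Look for END while inside the scope, for BEGIN otherwise.
            pattern = END if inside else BEGIN
            m = pattern.search(line, pos)
            if m is None:
                break
            inside = not inside
            pos = m.end()
        # The open/closed state carries over to the next line.
        states.append(inside)
    return states


stray = scope_open_after_each_line(
    ["2 * 3 is six", "an unrelated paragraph", "another one"]
)
closed = scope_open_after_each_line(["a *closed* span", "plain text"])
```

In the first document the single asterisk opens a scope that is never closed, so every following line stays inside it. In the second document the pair closes on the same line and nothing leaks.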
Option 2: no multiline
Because I hoped I could rely on (?m:subexp), I have already started to work in that direction. But because that flag doesn't work, this turns my rules into single-line rules. A big bonus is that this automatically fixes a lot of edge cases with weirdly nested types of emphasis, but it also means that multiline inlines can't be displayed in Atom.
To start off honestly: I am already biased against multiline inlines.
My point of view is that emphasis most likely applies to just a few words. I see nested emphasis as a realistic possibility, but one that should be avoided by 'just' writing a better text. When you want to emphasize something that spans multiple paragraphs (i.e., a single paragraph is technically a single line), then I think those paragraphs shouldn't be split up in the first place.
In an attempt to be unbiased, I will try to sum up some practical stuff.
- A syntax highlighter's purpose is to give semantic meaning to a text by augmenting it with scopes.
- The definitive styling of a document is defined by a stylesheet, and perhaps prematurely interpreted by a preview-package.
- The syntax highlighter itself does not even add styling; that's what a syntax theme does. But that also means that we can't rely on that syntax theme.
- By definition, I think a syntax highlighter should be reluctant rather than eager to do something. It shouldn't try to decide things for a user.
I think it can be concluded from these points that option 1 could be considered invasive to a user, and that therefore option 2 would be slightly better as a default setting, because...
Option 3: let the user decide
Since we are already creating a dynamic grammar, why not allow the user to select whether they prefer multiline emphasis or not?
On a side note, I think this issue spans more than merely emphasis. Any inline that spans some indefinite type/amount of content (emphasis, inline-code, strike-through, CriticMarkup, etc.) has to deal with this. In general, I would say that the default is single-line. For CriticMarkup though, I think that since its tags are very unlikely to appear in a document by accident, it should always be multiline.
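A rough sketch of how such a user-selectable grammar could be generated is shown below. Everything in it is an assumption for illustration: build_rule, DELIMITERS and ALWAYS_MULTILINE are hypothetical names, the rule names are placeholders rather than real scope names, and the rules are plain dicts that only borrow the match/begin/end keys of TextMate-style grammars.

```python
import re

# Inline types that always span lines, regardless of the user's choice.
ALWAYS_MULTILINE = {"critic-addition"}

# Hypothetical delimiters per inline type: (opening regex, closing regex).
DELIMITERS = {
    "emphasis": (r"\*", r"\*"),
    "inline-code": (r"`", r"`"),
    "critic-addition": (r"\{\+\+", r"\+\+\}"),
}


def build_rule(kind, user_wants_multiline=False):
    """Return a single-line or a begin/end rule for one inline type."""
    opening, closing = DELIMITERS[kind]
    if kind in ALWAYS_MULTILINE or user_wants_multiline:
        # Multiline: the scope stays open until the end pattern is seen.
        return {"name": kind, "begin": opening, "end": closing}
    # Single-line default: one regex that cannot cross a newline.
    return {"name": kind, "match": opening + r"[^\n]+?" + closing}


default_emphasis = build_rule("emphasis")
multiline_emphasis = build_rule("emphasis", user_wants_multiline=True)
critic = build_rule("critic-addition")
sample = re.search(default_emphasis["match"], "a *b* c").group(0)
```

Keeping the single-line form as the default means a stray delimiter can at worst fail to match, while the begin/end form is only generated when the user opts in or the delimiters are distinctive enough to be safe.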