Prerequisites
- [X ] Put an X between the brackets on this line if you have done all of the following:
Description
This is similar to #45 but with far more detail. I originally posted this on VSCode's issue page, but was told to post it here instead, since I'm guessing they use the language packs from here. I've essentially copied and pasted that issue.
As a frequent user of XML for planning layouts and storing/retrieving data, I can say that some parts of XML aren't that well covered by VSCode. Specifically, the highlighting for XML DTD (Document Type Definition), and the internal structure you see when inspecting TM Scopes, isn't as fleshed out. In this issue, I'd like to submit my own ideas for improving this. Below is a screenshot of some XML DTD code which I will reference frequently to make my points:
I am using the theme "One Dark Pro", but the default themes show the same issue.
The example I am using is taken from
This issue will cover the following areas which I've created:
- Keywords
- Special Characters
- Highlighting and Structure - This is interwoven throughout, but I thought I'd mention it for clarity
DISCLAIMER: For those who say "use XSD", I do use that, but sometimes I fall back to DTD for historical reasons or because (at least for me) it's easier to define document-only elements/attributes without having to do too much XSD namespacing. Now onto the bulk!
Keywords
In XML DTD, Keywords can be split into multiple sections (at least this is what I can split them into):
- Declarations: These include `<!ELEMENT>` and `<!ATTLIST>`, and the name says it all
- Modifiers: Such as `EMPTY`, `ANY`, `#REQUIRED`, `#IMPLIED`
- Data Keywords: Including ones such as `#PCDATA`, `CDATA`
- Values: Usually something that the user would type in. According to w3schools, these can be examples such as `attribute-name` and `attribute-value`
As per my screenshot above, it is evident that for Declarations there is some highlighting, as `<!DOCTYPE>` and `<!ENTITY>` are highlighted; these fit in the scopes `keyword.other.doctype.xml` and `keyword.other.entity.xml` respectively. However, other keywords, especially `<!ELEMENT>` and `<!ATTLIST>`, don't seem to have anything. The scope inspector only shows them as part of `meta.internalsubset.xml`. Here, the obvious resolution would be to give these their own scopes, such as `keyword.other.element.xml` and `keyword.other.attlist.xml`, and give them that purply highlighting (or blue in the default Dark+).
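To make the suggestion a bit more concrete, here is a rough Python sketch of the kind of pattern a grammar rule could use. It is not the actual grammar: `DECLARATION_SCOPES` and `scope_declarations` are hypothetical names for this example, the first two scope names are the ones the scope inspector already reports, and the last two are only my proposals.

```python
import re

# Existing scopes (DOCTYPE, ENTITY) plus the two proposed in this issue.
DECLARATION_SCOPES = {
    "DOCTYPE": "keyword.other.doctype.xml",
    "ENTITY": "keyword.other.entity.xml",
    "ELEMENT": "keyword.other.element.xml",  # proposed, does not exist yet
    "ATTLIST": "keyword.other.attlist.xml",  # proposed, does not exist yet
}

# Matches the keyword that directly follows "<!" in a declaration.
DECLARATION_RE = re.compile(r"<!(DOCTYPE|ENTITY|ELEMENT|ATTLIST)\b")


def scope_declarations(dtd_text):
    """Return (keyword, scope) pairs for each declaration found, in order."""
    return [
        (match.group(1), DECLARATION_SCOPES[match.group(1)])
        for match in DECLARATION_RE.finditer(dtd_text)
    ]
```

Calling `scope_declarations` on a couple of declarations from the screenshot should pair `ELEMENT` and `ATTLIST` with the proposed scope names.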
Secondly, for Modifier Keywords, I believe they should have a colour similar to that of JavaScript type objects such as `Boolean` (green by default), and have a scope such as `keyword.other.modifier.xml`.
Again, these keywords carry meaning and should therefore be highlighted to convey it. Now this is where it gets more ambiguous, as Data Keywords share similar syntax: some start with a `#` and some don't. In my examples, I've given `#PCDATA` and `CDATA`. To me, these should have a scope of charData or something of the sort, since these keywords represent what can be 'parsed' inside an XML element, so possibly something along the lines of `keyword.other.char-data.xml`, and they should be highlighted in the colour used for JavaScript number values.
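As a minimal sketch of how Modifiers and Data Keywords could be told apart, the Python below classifies the keywords listed earlier with one regular expression. The helper name `scope_keywords` is hypothetical, both scope names are only my proposals, and the pattern ignores context (it would also match these words inside a quoted default value), so treat it as an illustration rather than a finished rule.

```python
import re

# Proposed scopes from this issue; neither exists in the grammar today.
MODIFIER_SCOPE = "keyword.other.modifier.xml"
CHAR_DATA_SCOPE = "keyword.other.char-data.xml"

# (?<![\w#]) stops a match from starting inside a longer name,
# e.g. the ANY inside COMPANY; (?![\w-]) does the same at the end.
KEYWORD_RE = re.compile(
    r"(?<![\w#])(?:(?P<modifier>EMPTY|ANY|#REQUIRED|#IMPLIED)"
    r"|(?P<chardata>#PCDATA|CDATA))(?![\w-])"
)


def scope_keywords(declaration):
    """Return (keyword, proposed scope) pairs in order of appearance."""
    return [
        (
            match.group(0),
            MODIFIER_SCOPE if match.lastgroup == "modifier" else CHAR_DATA_SCOPE,
        )
        for match in KEYWORD_RE.finditer(declaration)
    ]
```

For `<!ATTLIST SPECIFICATIONS WEIGHT CDATA #IMPLIED>`, this should tag `CDATA` as a Data Keyword and `#IMPLIED` as a Modifier.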
Now, this part gets really complicated. Welcome to Values! This section gets crazy because there are many different scopes that could be implemented, some of which can only appear on certain Declarations, and it's just a huge XMessL (see what I did there? I'm sorry). If I take the `<!ELEMENT>` declaration and give it a quick breakdown:
- Accepts `element-name`
- Accepts `category`/Modifiers (this is essentially Modifiers, but I renamed it to cover more values)
- Accepts `element-content`, including Data Keywords and Other Elements, along with Special Characters (more on that later)
Some of this is easy, some not. `element-name` is easy: just highlight it like any HTML/XML element that a theme would. The default is blue, but One Dark Pro recognises it as a variable so it's red; it doesn't matter to me that much though. Modifiers are pretty much explained above. `element-content` is slightly more challenging, since it uses `()`, and within the brackets you can put Data Keywords, or other elements with special characters, separated by commas. In my screenshot, good examples are Line 10: `<!ELEMENT PRODUCT (SPECIFICATIONS+,OPTIONS?,PRICE+,NOTES?)>` and Line 23: `<!ELEMENT OPTIONS (#PCDATA)>`. My solution for highlighting those elements is simply to use the element colour. I don't think the brackets need to be highlighted though, since there is enough of a mishmash of XML colour here already, right?
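To illustrate the `element-content` breakdown, here is a small Python sketch that splits a flat content model into its parts. `split_content_model` and the `kind` labels are hypothetical names for this example, and it assumes a single, non-nested pair of parentheses like the two lines quoted from the screenshot.

```python
import re

# A content particle: #PCDATA or an element name, optionally followed by
# one occurrence indicator (+, ? or *).
PARTICLE_RE = re.compile(r"(?P<name>#PCDATA|[\w:.-]+)(?P<indicator>[+?*]?)")


def split_content_model(declaration):
    """Return (name, indicator, kind) for each particle in a flat content model."""
    start = declaration.index("(")
    end = declaration.rindex(")")
    particles = []
    for match in PARTICLE_RE.finditer(declaration[start + 1:end]):
        # Data Keywords and element names would get different colours.
        kind = "data-keyword" if match.group("name") == "#PCDATA" else "element"
        particles.append((match.group("name"), match.group("indicator"), kind))
    return particles
```

A grammar would need to handle nested groups and the `EMPTY`/`ANY` categories as well; this sketch only covers the simple cases shown above.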
The next step is to break down the `<!ATTLIST>` declaration, as there is a whole lot more that can be put in here:
- They take `element-name` and modifiers as stated above, but also:
  - `attribute-name`
  - `attribute-type`
  - `attribute-value`
- Can repeat the attribute stuff again so you can have multiple
Good examples from the screenshot are at Lines 19 - 21:
<!ATTLIST SPECIFICATIONS
WEIGHT CDATA #IMPLIED
POWER CDATA #IMPLIED>
and Lines 24 - 26:
<!ATTLIST OPTIONS FINISH (Metal|Polished|Matte) "Matte"
ADAPTER (Included|Optional|NotApplicable) "Included"
CASE (HardShell|Soft|NotApplicable) "HardShell">
Just to explain briefly, since this issue is already goddamn long enough: the first example best represents the basic syntax. It has element `SPECIFICATIONS`, attribute `WEIGHT`, type `CDATA` (characters, basically) and a modifier of `#IMPLIED`. The second one looks monstrous at first sight, but really the only differences are that it contains a Default Value (in the double quotes) and Options for the attribute-value, separated by the special character `|`. To resolve this structure, provide scopes such as `keyword.other.attlist.attribute-name.xml`, replacing `attribute-name` with the relevant part, such as type or value. For the ones in brackets in example 2, the words would probably have to be highlighted in the colour used for attribute values in HTML/XML, the same as the default value, since they are, and can be, values after all. Possibly add a `.attribute-value-range` scope?
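Here is a rough Python sketch of how the two `<!ATTLIST>` examples break down into name, type, options and default. `parse_attlist` and the dictionary keys are hypothetical names for this example. It is deliberately simplified: it only handles double-quoted defaults (optionally preceded by `#FIXED`) and does not handle `NOTATION` types.

```python
import re

# One attribute definition: a name, then a type (keyword or enumeration),
# then #REQUIRED, #IMPLIED, or a double-quoted default value.
ATTRIBUTE_RE = re.compile(
    r'(?P<name>[\w:.-]+)\s+'
    r'(?P<type>\([^)]*\)|[A-Z]+)\s+'
    r'(?P<default>#REQUIRED|#IMPLIED|(?:#FIXED\s+)?"[^"]*")'
)


def parse_attlist(declaration):
    """Split an <!ATTLIST ...> declaration into element name and attributes."""
    header = re.match(r"\s*<!ATTLIST\s+(?P<element>[\w:.-]+)", declaration)
    if header is None:
        raise ValueError("not an ATTLIST declaration")
    attributes = []
    for match in ATTRIBUTE_RE.finditer(declaration, header.end()):
        attr_type = match.group("type")
        # An enumeration such as (Metal|Polished|Matte) is split into options.
        options = []
        if attr_type.startswith("("):
            options = [option.strip() for option in attr_type[1:-1].split("|")]
        attributes.append({
            "name": match.group("name"),
            "type": attr_type,
            "options": options,
            "default": match.group("default"),
        })
    return header.group("element"), attributes
```

Each dictionary key corresponds to one of the parts that would get its own scope, with `options` being the words I'd suggest `.attribute-value-range` for.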
That's basically it for this section. Next up is Special Characters! Luckily, this is a much shorter section with less to develop, so don't worry, you're almost done (if you haven't stopped reading already).
Special Characters
So far, we've already touched on some special characters, namely the `+ ? | *` operators. Their exact meanings aren't relevant right now, but they usually appear inside parentheses and denote how often an element may appear (basically frequency) or, for `|`, a choice between alternatives. These should have a scope such as `operators.xml` and be highlighted in the colour of normal operators in, say, JavaScript (yes, I've been referencing JS the most, but it's the language I'm closest to).
One thing that I anticipate could come up is controlling whether this applies in actual XML documents or only in DTD. In actual XML, any text between elements is just text, so these characters shouldn't be highlighted there. Plus, highlighting is scope- and hierarchy-based, so limiting it to DTD declarations wouldn't be incorrect anyway. There are examples of this everywhere.
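To show what limiting the operators to DTD could look like, the Python sketch below only reports `+ ? | *` when they sit inside the parentheses of an `<!ELEMENT>` or `<!ATTLIST>` declaration (or directly after a closing parenthesis). `scope_operators` is a hypothetical helper, `operators.xml` is just my suggested scope name, and the declaration pattern assumes no `>` appears inside a quoted value.

```python
import re

OPERATOR_SCOPE = "operators.xml"  # scope name suggested in this issue
OPERATORS = "+?|*"

# A whole <!ELEMENT ...> or <!ATTLIST ...> declaration (simplified).
DECLARATION_RE = re.compile(r"<!(?:ELEMENT|ATTLIST)\b[^>]*>")


def scope_operators(text):
    """Return (offset, operator, scope) for operators inside DTD declarations."""
    found = []
    for declaration in DECLARATION_RE.finditer(text):
        depth = 0
        previous = ""
        for offset, char in enumerate(declaration.group(0), start=declaration.start()):
            if char == "(":
                depth += 1
            elif char == ")":
                depth -= 1
            elif char in OPERATORS and (depth > 0 or previous == ")"):
                # Inside parentheses, or an indicator right after a group.
                found.append((offset, char, OPERATOR_SCOPE))
            previous = char
    return found
```

With this approach, a `+` or `?` in ordinary text between XML elements is never reported, because the search only ever looks inside declarations.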
Finally (yes, I know!), I know I've probably made some errors, grouped things unnecessarily and jumped a few barriers. Below I'll provide a few links for reading more about XML DTD so you can understand where I'm coming from, and if any questions pop up, feel free to ask: