Identifier names,about modlanguage/grammar

Comments (29)

alexdalitz commented on July 20, 2024

I think this make a lot of sense!

However, we'd need to make sure we could still support things like *class - which means we'd need to allow a * at the beginning of the identifier (and the interpreter will need to check that only allowed directives - e.g. *class, *c, *import, etc. are allowed)

from grammar.

twalmsley commented on July 20, 2024

It should be possible to define the grammar so that *class, *c, and *import (and any other specific keywords) are allowed and others rejected, which will reduce the burden on the interpreter.

If we prevent identifiers with special characters now, we can add them in later with specific, controlled, meaning for MODL. E.g. maybe at some point in the future you need to have keywords such as $abc or abc* for specific purposes. Obviously we don't know what these could be, so preventing wacky identifiers now makes room for expansion later.

from grammar.

alexdalitz commented on July 20, 2024

Yes, I'm all in favour of this - thanks!

from grammar.

twalmsley commented on July 20, 2024

Making a start on this now.

from grammar.

twalmsley commented on July 20, 2024

@elliottinvent do you agree with this change or have any additional requirements? The MODL spec says of keys: All unreserved characters are valid in a key., so if you want this to remain as-is then I'll close the issue.

from grammar.

alexdalitz commented on July 20, 2024

I think we should make this change, and change the spec appropriately.

from grammar.

elliottinvent commented on July 20, 2024

Yes, we should make this change thanks Tony and dealing with it in the grammar is a better approach.

from grammar.

twalmsley commented on July 20, 2024

Looks like the grammar approach is going to need a lot of changes to get it to work - its fine for most cases, but the conditionals would need to be reworked fairly extensively, and there's no guarantee it could be made to work, which is disappointing.

There is also an impact on the interpreters, which would be made worse if the conditionals were changed as well. On balance, and after this morning's discussion with @alexdalitz , the path of least resistance is to handle it in the interpreters and to change the MODL spec to say that keys should be one of the keywords already defined (*class etc.) or match _?[0-9a-zA-Z][0-9a-zA-Z_]*, i.e. optional '_' to start, then at least one alphanumeric, followed by any number of alphanumerics or underscores. This allows identifiers of the form:

_a_b___
_abc___def

etc. but not the wacky ones in the original comment above, and not multiple underscores at the beginning. Note: It also allows identifiers that are entirely digits, so the interpreters will need to catch those as well and reject them.

Once @alexdalitz has finished the latest set of java interpreter changes I'll add some extra tests for the valid and invalid identifiers so that all interpreters will need to satisfy, unless you prefer to do it Alex?

from grammar.

alexdalitz commented on July 20, 2024

Yes, I think the simplest thing to do is to add base_tests and error_tests to check that the directives are handled correctly (and other * keys raise errors).
I will attempt to remember to do this in the current batch of work.

from grammar.

alexdalitz commented on July 20, 2024

So we need the first character to be a letter, not a number, and we need to allow it to start with a *

from grammar.

alexdalitz commented on July 20, 2024

Maybe something like : (_|\*)?[a-zA-Z][0-9a-zA-Z_]*

from grammar.

twalmsley commented on July 20, 2024

I was expecting to leave out the * so that it wasn't allowed in an identifier unless it was one of the existing predefined keywords *class,*c, etc.
Also, the current spec says they may contain numbers, may start with a number but must not be made up of solely numbers, unless of course you decide to change that too.

from grammar.

alexdalitz commented on July 20, 2024

yes, let's change the spec so they must start with a letter. Hard to make the grammar say that they can't be a number otherwise?
I think we must leave * in there, or else *class won't be seen as a valid key?

from grammar.

twalmsley commented on July 20, 2024

I was assuming the approach would be to leave the grammar as it is (because its a complex change) and update the interpreter to check the keys along the lines of:

if(key.isOneOfTheExpectedKeywords()) {
    // ok
} else if(key.matches(theRegex)) {
    // ok as well
} else {
    // Not a valid key
}

If we just use a regex we'd still have to check the valid keywords that use * unless the regex also specified the keywords, which would make it quite long.
Actually, whatever works is find by me 👍 :-)

from grammar.

alexdalitz commented on July 20, 2024

Ah I understand your point of view now - sorry for being a little slow...

I had been hoping that we could specify a regex which handled most of the checking - e.g. starts with _ or *, isn't purely a number, has only letters and numbers, etc., in the grammar, and handle the *class type checks in the interpreter.

I understand now that your suggestion is to handle it solely in the interpreter, and make no identifier regexes in the grammar.

Of course, either could work - but the grammar will be re-used across languages, so my preference would also be to ask it to do as much of the work as possible.

I'm therefore interested in why you say it is a complex change to do what we can in the grammar?

Thanks!

from grammar.

twalmsley commented on July 20, 2024

This was my hope as well! Currently keys are matched as STRING or QUOTED, and its straightforward to make keys match an identifier made up of KEYWORD or KEY or QUOTED instead, with KEYWORD being a list of *class etc, and KEY being the regex.

The problem comes when changing modl_condition to match an identifier? instead of STRING? because then the parser fails to match modl_condition_groups properly, so that's the point where I decided the changes would be too great and stopped, which means relying entirely on the interpreter. I can take it further if you don't mind the extra impact on the interpreters due to changes in conditionals?

from grammar.

alexdalitz commented on July 20, 2024

I'm not sure we'd need to change modl_condition? Would we not use leave it expecting STRING?

I was thinking of this purely as a pair key change. Instead of STRING we would have :

  // A pair can be a traditional name-value pair split by an equals sign (standard pair),
  // e.g. name=John
  //
  // For efficiency, it's also possible to assign a map to a pair without an equals sign,
  // since the left bracket separates the key from the value  – this is called a map pair.
  // e.g. person(name=John) – this is equivalent to person=(name=John)
  //a
  // It's also possible to do the same with an array pair
  // e.g. numbers[1;2;3] – equivalent to numbers=[1;2;3]

  : ( IDENTIFIER | QUOTED_IDENTIFIER) NEWLINE* EQUALS NEWLINE* modl_value_item                // key = value        (standard pair)
  | IDENTIFIER modl_map                                                            // key( key = value ) (map pair)
  | IDENTIFIER modl_array                                                          // key[ item; item ]  (array pair)
  ;

If we can't define the pair name as an invalid string, then I think we're good? Then we can just leave other instances where the pair name might be used (such as modl_conditional) just using STRINGs

Or am I missing something again?

from grammar.

twalmsley commented on July 20, 2024

I did try this approach, but it breaks down in conditionals, in this example:

_co=ca
_l=fr
{
{ co = ca & l = fr } | co = fr?
    support_number=14161234567
  /?
    support_number=441270123456
}

it fails on support_number, even though support_number works fine outside of conditionals and is a perfectly valid pair. So then I started looking at changing conditionals as well and the rabbit hole was getting deeper.

from grammar.

alexdalitz commented on July 20, 2024

OK, I'm sure you know what you're doing! ;-)

It's perfectly easy to enforce these checks in interpreters with test cases, and it's perfectly easy for interpreters to handle them.

I guess @elliottinvent needs to update the spec, and I'll add some error_test and base_test cases in

from grammar.

twalmsley commented on July 20, 2024

I do wonder whether the fix was just round the next bend in the rabbit hole, but having spent two hours rather than the one hour I said I'd give it I though it best to pull the plug (mixing my metaphors here!).

from grammar.

elliottinvent commented on July 20, 2024

Has this been brought about by a recent grammar change because I thought (might be mistaken) that the original grammar handled all allowed keys (according to the spec)?

Original grammar is still running on the playground:
https://api.modl.uk

from grammar.

twalmsley commented on July 20, 2024

I raised it for discussion because the current grammar is a little 'loose' around identifiers and it allows the strange identifier names listed in the original comment above. The suggestion was to tighten it up in a way similar to other languages so that identifiers are more restricted for now, that way it can be relaxed in future for specific cases.

For example, users might get into the habit of using +abc-style identifiers for their own purposes and we wouldn't know, then if you introduce your own + keywords it could break existing MODL objects, so it might be better to restrict now then relax later, that way you won't break any existing objects.

(You are right that the current interpreter in the playground prevents * keywords other than those specified.)

from grammar.

elliottinvent commented on July 20, 2024

Understood and agree that the interpreter approach is best at the moment. If, in the future, we’re able to somehow address this in the grammar without needing to change tokens then great. Until then, interpreters will need to handle it.

from grammar.

alexdalitz commented on July 20, 2024

This is now handled in the interpreter, and test have been added to error_tests.json

from grammar.

elliottinvent commented on July 20, 2024

Are we sure we want to ban keys starting with numbers? I'm on board with banning all numeric keys and keys that start with non alpanumeric chars (except those reserved, e.g. _ and *) but wonder if banning keys starting with a number is necessary?

from grammar.

twalmsley commented on July 20, 2024

Personally I don't have a problem with keys that start with a number. The specification for valid keys says that all unreserved characters are allowed but doesn't say what those unreserved characters are. I'd prefer to see the valid characters listed and some example valid and invalid keys. Also, what is the maximum length of a key? Does that length include the '_' prefix?

from grammar.

elliottinvent commented on July 20, 2024

The spec shows what the reserved characters are: https://www.modl.uk/specification#reserved-characters but I agree this should be clearer. I'm currently updating spec so that it says:

Pair keys must be a string: they may contain numbers, may start with a number but must not be made up of solely numbers. Keys must start with a letter or a number except in the case of instructions [link to that part of the spec].

Unless @alexdalitz has any objections, I think we should update the interpreters to allow keys starting with a number. Looking at the thread above, I think the reason for the decision to do this was related to the grammar but as we're doing this at the interpreter level I think we've got more flexibility.

from grammar.

twalmsley commented on July 20, 2024

Added to my to-do list.

from grammar.

twalmsley commented on July 20, 2024

I think this one is ready to close - I've added tests to allow identifiers that start with numbers, including ones with references, and error tests to make sure they're not entirely numeric and don't dereference to entirely numeric values. E.g.

_a=456; 123%a=x

is not allowed since the key resolves to 123456, but

_a=abc; 123%a=x

is fine since the key resolves to 123abc.

from grammar.

Identifier names about grammar HOT 29 CLOSED

Comments (29)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent