Comments (29)
I think this make a lot of sense!
However, we'd need to make sure we could still support things like *class
- which means we'd need to allow a *
at the beginning of the identifier (and the interpreter will need to check that only allowed directives - e.g. *class
, *c
, *import
, etc. are allowed)
from grammar.
It should be possible to define the grammar so that *class
, *c
, and *import
(and any other specific keywords) are allowed and others rejected, which will reduce the burden on the interpreter.
If we prevent identifiers with special characters now, we can add them in later with specific, controlled, meaning for MODL. E.g. maybe at some point in the future you need to have keywords such as $abc
or abc*
for specific purposes. Obviously we don't know what these could be, so preventing wacky identifiers now makes room for expansion later.
from grammar.
Yes, I'm all in favour of this - thanks!
from grammar.
Making a start on this now.
from grammar.
@elliottinvent do you agree with this change or have any additional requirements? The MODL spec says of keys: All unreserved characters are valid in a key.
, so if you want this to remain as-is then I'll close the issue.
from grammar.
I think we should make this change, and change the spec appropriately.
from grammar.
Yes, we should make this change thanks Tony and dealing with it in the grammar is a better approach.
from grammar.
Looks like the grammar approach is going to need a lot of changes to get it to work - its fine for most cases, but the conditionals would need to be reworked fairly extensively, and there's no guarantee it could be made to work, which is disappointing.
There is also an impact on the interpreters, which would be made worse if the conditionals were changed as well. On balance, and after this morning's discussion with @alexdalitz , the path of least resistance is to handle it in the interpreters and to change the MODL spec to say that keys should be one of the keywords already defined (*class etc.) or match _?[0-9a-zA-Z][0-9a-zA-Z_]*
, i.e. optional '_' to start, then at least one alphanumeric, followed by any number of alphanumerics or underscores. This allows identifiers of the form:
_a_b___
_abc___def
etc. but not the wacky ones in the original comment above, and not multiple underscores at the beginning. Note: It also allows identifiers that are entirely digits, so the interpreters will need to catch those as well and reject them.
Once @alexdalitz has finished the latest set of java interpreter changes I'll add some extra tests for the valid and invalid identifiers so that all interpreters will need to satisfy, unless you prefer to do it Alex?
from grammar.
Yes, I think the simplest thing to do is to add base_tests and error_tests to check that the directives are handled correctly (and other *
keys raise errors).
I will attempt to remember to do this in the current batch of work.
from grammar.
So we need the first character to be a letter, not a number, and we need to allow it to start with a *
from grammar.
Maybe something like : (_|\*)?[a-zA-Z][0-9a-zA-Z_]*
from grammar.
I was expecting to leave out the *
so that it wasn't allowed in an identifier unless it was one of the existing predefined keywords *class
,*c
, etc.
Also, the current spec says they may contain numbers, may start with a number but must not be made up of solely numbers
, unless of course you decide to change that too.
from grammar.
yes, let's change the spec so they must start with a letter. Hard to make the grammar say that they can't be a number otherwise?
I think we must leave *
in there, or else *class
won't be seen as a valid key?
from grammar.
I was assuming the approach would be to leave the grammar as it is (because its a complex change) and update the interpreter to check the keys along the lines of:
if(key.isOneOfTheExpectedKeywords()) {
// ok
} else if(key.matches(theRegex)) {
// ok as well
} else {
// Not a valid key
}
If we just use a regex we'd still have to check the valid keywords that use *
unless the regex also specified the keywords, which would make it quite long.
Actually, whatever works is find by me π :-)
from grammar.
Ah I understand your point of view now - sorry for being a little slow...
I had been hoping that we could specify a regex which handled most of the checking - e.g. starts with _
or *
, isn't purely a number, has only letters and numbers, etc., in the grammar, and handle the *class
type checks in the interpreter.
I understand now that your suggestion is to handle it solely in the interpreter, and make no identifier regexes in the grammar.
Of course, either could work - but the grammar will be re-used across languages, so my preference would also be to ask it to do as much of the work as possible.
I'm therefore interested in why you say it is a complex change to do what we can in the grammar?
Thanks!
from grammar.
This was my hope as well! Currently keys are matched as STRING
or QUOTED
, and its straightforward to make keys match an identifier
made up of KEYWORD
or KEY
or QUOTED
instead, with KEYWORD
being a list of *class
etc, and KEY
being the regex.
The problem comes when changing modl_condition
to match an identifier?
instead of STRING?
because then the parser fails to match modl_condition_group
s properly, so that's the point where I decided the changes would be too great and stopped, which means relying entirely on the interpreter. I can take it further if you don't mind the extra impact on the interpreters due to changes in conditionals?
from grammar.
I'm not sure we'd need to change modl_condition
? Would we not use leave it expecting STRING
?
I was thinking of this purely as a pair
key change. Instead of STRING
we would have :
// A pair can be a traditional name-value pair split by an equals sign (standard pair),
// e.g. name=John
//
// For efficiency, it's also possible to assign a map to a pair without an equals sign,
// since the left bracket separates the key from the value β this is called a map pair.
// e.g. person(name=John) β this is equivalent to person=(name=John)
//a
// It's also possible to do the same with an array pair
// e.g. numbers[1;2;3] β equivalent to numbers=[1;2;3]
: ( IDENTIFIER | QUOTED_IDENTIFIER) NEWLINE* EQUALS NEWLINE* modl_value_item // key = value (standard pair)
| IDENTIFIER modl_map // key( key = value ) (map pair)
| IDENTIFIER modl_array // key[ item; item ] (array pair)
;
If we can't define the pair name as an invalid string, then I think we're good? Then we can just leave other instances where the pair name might be used (such as modl_conditional
) just using STRING
s
Or am I missing something again?
from grammar.
I did try this approach, but it breaks down in conditionals, in this example:
_co=ca
_l=fr
{
{ co = ca & l = fr } | co = fr?
support_number=14161234567
/?
support_number=441270123456
}
it fails on support_number
, even though support_number
works fine outside of conditionals and is a perfectly valid pair. So then I started looking at changing conditionals as well and the rabbit hole was getting deeper.
from grammar.
OK, I'm sure you know what you're doing! ;-)
It's perfectly easy to enforce these checks in interpreters with test cases, and it's perfectly easy for interpreters to handle them.
I guess @elliottinvent needs to update the spec, and I'll add some error_test and base_test cases in
from grammar.
I do wonder whether the fix was just round the next bend in the rabbit hole, but having spent two hours rather than the one hour I said I'd give it I though it best to pull the plug (mixing my metaphors here!).
from grammar.
Has this been brought about by a recent grammar change because I thought (might be mistaken) that the original grammar handled all allowed keys (according to the spec)?
Original grammar is still running on the playground:
https://api.modl.uk
from grammar.
I raised it for discussion because the current grammar is a little 'loose' around identifiers and it allows the strange identifier names listed in the original comment above. The suggestion was to tighten it up in a way similar to other languages so that identifiers are more restricted for now, that way it can be relaxed in future for specific cases.
For example, users might get into the habit of using +abc
-style identifiers for their own purposes and we wouldn't know, then if you introduce your own +
keywords it could break existing MODL objects, so it might be better to restrict now then relax later, that way you won't break any existing objects.
(You are right that the current interpreter in the playground prevents *
keywords other than those specified.)
from grammar.
Understood and agree that the interpreter approach is best at the moment. If, in the future, weβre able to somehow address this in the grammar without needing to change tokens then great. Until then, interpreters will need to handle it.
from grammar.
This is now handled in the interpreter, and test have been added to error_tests.json
from grammar.
Are we sure we want to ban keys starting with numbers? I'm on board with banning all numeric keys and keys that start with non alpanumeric chars (except those reserved, e.g. _
and *
) but wonder if banning keys starting with a number is necessary?
from grammar.
Personally I don't have a problem with keys that start with a number. The specification for valid keys says that all unreserved characters are allowed but doesn't say what those unreserved characters are. I'd prefer to see the valid characters listed and some example valid and invalid keys. Also, what is the maximum length of a key? Does that length include the '_' prefix?
from grammar.
The spec shows what the reserved characters are: https://www.modl.uk/specification#reserved-characters but I agree this should be clearer. I'm currently updating spec so that it says:
Pair keys must be a string: they may contain numbers, may start with a number but must not be made up of solely numbers. Keys must start with a letter or a number except in the case of instructions [link to that part of the spec].
Unless @alexdalitz has any objections, I think we should update the interpreters to allow keys starting with a number. Looking at the thread above, I think the reason for the decision to do this was related to the grammar but as we're doing this at the interpreter level I think we've got more flexibility.
from grammar.
Added to my to-do list.
from grammar.
I think this one is ready to close - I've added tests to allow identifiers that start with numbers, including ones with references, and error tests to make sure they're not entirely numeric and don't dereference to entirely numeric values. E.g.
_a=456; 123%a=x
is not allowed since the key resolves to 123456
, but
_a=abc; 123%a=x
is fine since the key resolves to 123abc
.
from grammar.
Related Issues (20)
- Possibly ambiguous expressions HOT 4
- Double negatives HOT 3
- Observation HOT 2
- Operator Precedence HOT 5
- Stray tokens files HOT 2
- Only strings in conditional tests HOT 1
- Upcase method not invoked HOT 5
- Non-UTF-8 Input Characters in tests HOT 2
- Deep object referencing in conditionals HOT 3
- Array problem HOT 6
- Standard Error Messages for Failures HOT 5
- New Built-in Method Suggestions
- Short path for loading modules HOT 1
- Graves and References Handling HOT 7
- Grammar problem - escaped grave HOT 7
- Grammar Problem - double quotes HOT 4
- File update needed for new grammar HOT 1
- Interpreter escape for % HOT 2
- String Unicode Fragment Issues
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
π Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. πππ
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google β€οΈ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from grammar.