scintillaorg / lexilla

A library of language lexers for use with Scintilla

Home Page: https://www.scintilla.org/Lexilla.html

License: Other


lexilla's Introduction

README for Lexilla library.

The Lexilla library contains a set of lexers and folders that provide support for
programming, mark-up, and data languages for the Scintilla source code editing
component.

Lexilla is made available as both a shared library and static library.
The shared library is called liblexilla.so / liblexilla.dylib / lexilla.dll on Linux / macOS /
Windows.
The static library is called liblexilla.a when built with GCC or Clang and liblexilla.lib
when built with MSVC.

Lexilla is developed on Windows, Linux, and macOS and requires a C++17 compiler.
It may work on other Unix platforms like BSD but that is not a development focus.
MSVC 2019.4, GCC 9.0, Clang 9.0, and Apple Clang 11.0 are known to work.

MSVC is only available on Windows.

GCC and Clang work on Windows and Linux.

On macOS, only Apple Clang is available.

Lexilla requires some headers from Scintilla to build and expects a directory named
"scintilla" containing a copy of Scintilla 5+ to be a peer of the Lexilla top level
directory conventionally called "lexilla".

To build with GCC, use the makefile in lexilla/src:
	make

To build with Clang, use the same makefile in lexilla/src:
	make CLANG=1
On macOS, CLANG is set automatically so this can just be
	make

To build with MSVC, use lexilla.mak in lexilla/src:
	nmake -f lexilla.mak

To build a debugging version of the library, add DEBUG=1 to the command:
	make DEBUG=1

The built libraries are copied into lexilla/bin.

Lexilla relies on a list of lexers from the lexilla/lexers directory. If any changes are
made to the set of lexers then source and build files can be regenerated with the
lexilla/scripts/LexillaGen.py script which requires Python 3 and is tested with 3.7+.
Unix:
	python3 LexillaGen.py
Windows:
	pyw LexillaGen.py

lexilla's People

Contributors

3ftl, arkadiuszmichalski, b4n, cwalther, danselmi, davidy22, getzze, ivan-u7n, jerome-laforge, jpe, kugel-, lostbard, m-reay, marcows, mpheath, nyamatongwe, orbitalquark, oswald3141, rainrat, rdipardo, riqq, siegelord, siegelordex, sjohannes, stweil, techee, tsekityam, vrana, xv, zufuliu


lexilla's Issues

Unify `_WIN32` testing conditionals

Although they do work, #if _WIN32 and #if !_WIN32 would be better updated to #ifdef _WIN32 and #ifndef _WIN32, which are the more common forms. There are also occurrences of defined(_WIN32); keeping a single consistent style would be better.
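
For illustration, the three spellings below guard the same code; the #ifdef and defined() forms only test whether the macro is defined, which matches the style already used elsewhere. The guarded bodies are placeholders, not actual Lexilla code.

#if _WIN32              // relies on _WIN32 expanding to a non-zero integer
// Windows-specific code
#endif

#ifdef _WIN32           // preferred: tests whether the macro is defined at all
// Windows-specific code
#endif

#if defined(_WIN32)     // equivalent to #ifdef and composes with && / || in larger conditions
// Windows-specific code
#endif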

C++ preprocessor evaluation does not support negative literals

The following code correctly shows x=1; as inactive but incorrectly shows x=2; as active.
This is because the expression evaluator does not understand negative literals or unary minus.

#define m 0-1
#if m > 0
x=1;
#endif

#define n -1
#if n > 0
x=2;
#endif

UnaryNegate
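
A minimal sketch, assuming a hand-written recursive-descent evaluator (this is not Lexilla's actual preprocessor evaluator), of how unary minus can be handled before reading a number so that an expression such as -1 > 0 evaluates to false:

#include <cctype>
#include <string>

// Evaluate a primary: an optional chain of unary minus signs followed by a
// decimal literal. The minus is applied to the operand's value.
long EvalPrimary(const std::string &s, size_t &i) {
	if (i < s.size() && s[i] == '-') {
		++i;
		return -EvalPrimary(s, i);
	}
	long value = 0;
	while (i < s.size() && isdigit(static_cast<unsigned char>(s[i]))) {
		value = value * 10 + (s[i] - '0');
		++i;
	}
	return value;
}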

Support style value larger than 127

See https://sourceforge.net/p/scintilla/feature-requests/1431/

lexilla-int-style.zip

Patch contains the following changes (an illustrative sketch of change 1 follows the list):

  1. updated LexAccessor::StyleAt() and LexAccessor::BufferStyleAt() to return int instead of char.
  2. changed char chMask to unsigned char chMask in StyleContext's constructor.
  3. changed the style type from char to int and added casts in TestLexers.cxx.
  4. changed the style type from char to int for LexCaml.cxx, LexFortran.cxx, LexMagik.cxx, LexPowerPro.cxx, LexProgress.cxx and LexVHDL.cxx.
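
The sketch below only illustrates the idea behind change 1, assuming styles are stored as one byte per position; the real patch is in the attached zip:

#include <vector>

struct StyleBufferSketch {
	std::vector<char> styles;   // one style byte per document position

	// Returning int via an unsigned char conversion yields 0..255, whereas
	// returning plain char makes style values above 127 come out negative.
	int StyleAt(size_t position) const {
		return static_cast<unsigned char>(styles[position]);
	}
};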

F#: Highlight `printf` specifiers in type-checked interpolated strings (.NET 5 feature)

Follows #21

Lexilla (correctly) ignores printf specifiers inside malformed interpolated strings:

pre-patch-fmt-specs

But that's the incidental result of a blanket restriction:

} else if (sc.ch == '%' && !(fsStr.startChar == '`' || fsStr.startChar == '$') &&
(setFormatSpecs.Contains(sc.chNext) || setFormatFlags.Contains(sc.chNext))) {

And it overlooks valid exceptions, like escaped percent signs:

> $"%% decrease: -{0.2 * 100.}%%" ;;
val it : string = "% decrease: -20%"

Moreover, since F# 5.0, every kind of printf specifier is allowed inside interpolated strings, provided:

  1. they occur right before the braces around the interpolated value:
> $"%.2f{5./2.}" ;;
val it : string = "2.50"

> $"%.2f {5./2.}" ;;

  $"%.2f {5./2.}" ;;
  ^^^^^^^^^^^^^^^

stdin(2,1): error FS3376: Invalid interpolated string. Interpolated strings may not use '%' format specifiers unless each is given an expression, e.g. '%d{1+1}'.
  2. the expression evaluates to the same type as the given format:
> $"%d{5./2.}" ;;

  $"%d{5./2.}" ;;
  --------^^

stdin(3,9): error FS0001: The type 'float' is not compatible with any of the types byte,int16,int32,int64,sbyte,uint16,uint32,uint64,nativeint,unativeint, arising from the use of a printf-style format string

These patches update the lexing rules for printf specifiers to include interpolated strings when they contain:

i) escaped percent signs, at any position
ii) valid type annotations, as illustrated in context 1

LexFSharp-PrintfSpecsInNet5Strs.zip

Tcl: Quote inside command substitution inside quoted string treated as quoted string terminator

Example:

puts "one [two "three" four] five" six

highlights three as not being part of the quoted string "one [two "three" four] five". In other words, it highlights "one [two " as a quoted string, three as a bareword, and " four] five" as another string.
Command substitution in Tcl temporarily interrupts the quote, so that you can use additional quotes inside the [ ] without needing to escape them.

This is equivalent to the following code in Shell script (which Scintilla handles correctly):

echo "one $(two "three" four) five" six

Here, Scintilla colors all of "one $(two "three" four) five" as a quoted string, ignoring the quotes within the $( ), so it is recognizing and honoring the $( ). Something similar to this should be done for Tcl.

(Tested on a fresh Scintilla/Lexilla/SciTE build)

Scintillua issue with Lexilla::MakeLexer

With Scintillua installed in SciTE (Sc1, single file), I can't disable a Lua lexer by resetting the property lexer.$(file.patterns.name)=name to the original Lexilla lexer. It seems to me that a recent commit in Lexilla (e4817a1) breaks the Scintillua behaviour and a default Lua lexer is always created and set after Lexilla::MakeLexer is called (see orbitalquark/scintillua#54).

As far as I understand, the attempt made here to match the lexer name without the namespace is due to the fact that the GetNameSpace function is optional. I would propose modifying the test if (lexLib.fnCL) { by adding a check that the namespace is empty, if (lexLib.fnCL && lexLib.nameSpace.empty()) {, so the attempt is made only if the implementation does not provide the GetNameSpace function.
Is this sensible?
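
A self-contained sketch of the proposed guard; the struct is a stand-in for illustration only, not Lexilla's real library-record type:

#include <string>

struct LexLibraryStub {
	bool fnCL = true;        // stands in for "has a CreateLexer entry point"
	std::string nameSpace;   // empty when GetNameSpace was not provided
};

// Proposed behaviour: only attempt the namespace-less name match when the
// implementation did not report a namespace.
bool ShouldTryNameWithoutNamespace(const LexLibraryStub &lexLib) {
	return lexLib.fnCL && lexLib.nameSpace.empty();
}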

Incorrect PHP word end detection

The first line of the SCE_HPHP_WORD case should be

if (!IsPhpWordChar(ch)) {

instead of the current

if (!IsAWordChar(ch)) {

as otherwise things like __FILE__.':'.__LINE__ are parsed incorrectly: the first word isn't recognized as a keyword because the dot is treated as part of it. Moreover, something like __FILE__.__LINE__ is considered a single word with no operators within.

Bash: "different styles between \r and \n" after comments and heredoc delims

LexBash may switch styles before complete EOL termination in files with CRLF line endings.

This affects SCE_SH_COMMENTLINE:

split_eols_at_comment_ends

as well as SCE_SH_HERE_Q:

split_eols_at_heredoc_delims

All cases can be traced to how the value of StyleContext::atLineEnd [1] is determined, with CRLFs apparently giving a result like this: <false>CR<true>LF.

Since DOS EOLs would only corrupt a genuine shell script, the problem (if any) is strictly limited to prospective lexer tests for Bash (*), currently absent from the source tree.

Simply not testing LexBash would be the quickest solution; in which case the enclosed patch set can be ignored.

LexBash-Prefer-MatchLineEnd-to-atLineEnd.zip


(*) The proposed x.sh currently generates these (non-fatal) warnings:

/home/dev/lexilla/test/examples/bash/x.sh:1: different styles between \r and \n at 20: 2, 0
/home/dev/lexilla/test/examples/bash/x.sh:3: different styles between \r and \n at 109: 2, 0
/home/dev/lexilla/test/examples/bash/x.sh:4: different styles between \r and \n at 191: 2, 0
/home/dev/lexilla/test/examples/bash/x.sh:5: different styles between \r and \n at 276: 2, 0
/home/dev/lexilla/test/examples/bash/x.sh:6: different styles between \r and \n at 360: 2, 0
/home/dev/lexilla/test/examples/bash/x.sh:7: different styles between \r and \n at 441: 2, 0
/home/dev/lexilla/test/examples/bash/x.sh:19: different styles between \r and \n at 705: 0, 13
/home/dev/lexilla/test/examples/bash/x.sh:21: different styles between \r and \n at 769: 13, 0

/home/dev/lexilla/test/examples/bash/x.sh:1: has different styles with \n versus \r\n line ends

As this would add the first ever Bash lexing test, an effort was made to demonstrate a full suite of lexical styles, along with those affected by the topic issue; hence the nondescript file name.

Footnotes

  1. https://github.com/ScintillaOrg/lexilla/blob/a35a59845e793d9d37d249cf097e71fecc5f4bcd/lexlib/StyleContext.h#L35-L40

Add some util functions into lexlib

  1. AnyOf(ch, ...), source code at https://github.com/zufuliu/notepad2/blob/master/scintilla/lexlib/CharacterSet.h#L105
    It has no runtime setup cost (it compiles to a bit test or lookup table); see the sketch below.
  2. Move the implementation of GetRange() and GetRangeLowered() from StyleContext into LexAccessor, source code at https://github.com/zufuliu/notepad2/blob/master/scintilla/lexlib/LexAccessor.cxx#L30
    Many lexers (HTML, Ruby, lexers using lineBuffer, etc.) can change to the new methods.
  3. MatchIgnoreCase().

LexAccessor.zip
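
A possible shape for item 1, assuming C++17 fold expressions (the referenced Notepad2 implementation may differ in detail):

template <typename T, typename... Args>
constexpr bool AnyOf(T value, Args... candidates) noexcept {
	// Expands to (value == c1) || (value == c2) || ...; the compiler can turn
	// this into a bit test or a small table, so there is no runtime setup cost.
	return ((value == candidates) || ...);
}

// Example use in a lexer:
//	if (AnyOf(sc.ch, '"', '\'', '`')) { ... }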

Inno: Multiline comments and close with correct comment sequence

The Inno lexer uses default styling on multiline comments.

The C-like comment of

// ...

are single line comments, like in C.

The Pascal comments behave like the C type of /* ... */ comment

// Single line comments:

// ...

{ ... }

(* ... *)

// Multiline comments:

{ ...
... }

(* ...
... *)

which can span multiple lines.

The current Inno lexer finds the EOL on the first comment line and so it may consider { ... or (* ...
as default style if no closing } or *) is found. The second comment line is going to be default style unless it starts with a known sequence that sets a non-default style.

In explaining this, I just thought of a bug, as the Inno lexer allows these comments

(* ... }

{ ... *)

as the code does not remember the opening sequence to know which closing sequence is required. Added bool isCommentCurly, bool isCommentRound and renamed bool isCStyleComment to bool isCommentSlash. Now the closing sequence required can be known.

Needed to use the line state functions to handle the multiline comments. Removed the enum section{...} as the related error messages were long and confusing so changed to more basic int const bitCode, bitMessages, bitCommentCurly, bitCommentRound names.

Minor tidy done with parameter spacing in the source which I hope is OK.

The behaviour is better in that comments and strings are styled during typing. Eliminated some of the go-back-a-char handling with the styler.

Updated the test file x.iss with the x.iss.folded and x.iss.styled files. Snapshot included as x.png which can be discarded after viewing.

This is just the area with the comments:

c.png

Attached files have CRLF line ends:

inno.zip

Log:

  • Fixed Inno comment ends with incorrect closing sequence.
  • Added Inno styling of multiline comments.

Tcl: # symbol after { is treating the # to the end of the line as a comment

Hello, I feel it's important to disclose that I don't have good competency with Tcl; I just saw a bug report notepad-plus-plus/notepad-plus-plus#10666 that seemed incorrect.

Having {x} should not treat x} as comment

Sample code that can be run in tclsh to test validity of doing such a thing.

set x "#hello"
if {$x eq {#hello}} {
  puts "this is valid, and the same"
} else {
  puts "these were not the same"
}
# this is unrelated to the above if statement, and should not collapse with it
# in notepad++
puts "Done executing" 

Sc1
image

I suspect that line 279 of lexers/LexTCL.cxx needs to also check against SCE_TCL_OPERATOR

lexilla/lexers/LexTCL.cxx

Lines 277 to 281 in 4269001

if (sc.ch == '#') {
	if (visibleChars) {
		if (sc.state != SCE_TCL_IN_QUOTE && expected)
			sc.SetState(SCE_TCL_COMMENT);
	} else {

But take that with a grain of salt. I have not tested it.
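
One possible reading of that suggestion, equally untested; the extra condition is the reporter's idea, not a confirmed fix:

if (sc.ch == '#') {
	if (visibleChars) {
		if (sc.state != SCE_TCL_IN_QUOTE &&
				sc.state != SCE_TCL_OPERATOR &&   // proposed additional check
				expected)
			sc.SetState(SCE_TCL_COMMENT);
	}
	// ... existing else branch unchanged ...
}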

Add tests for props lexer

The props lexer, responsible for key=value style configuration files like the .properties and .session files associated with SciTE, should be tested.

The attached files check each of the 6 styles produced by props.
props.zip

Associated with issue #62.

Python: line continuation situations sometimes fold incorrectly

In Python, brackets ()[]{} and a backslash \ at the end of a line allow a statement to be continued onto the next non-empty line, meaning that something like this:

if param2 <= 2:
    print((-20 %


3) / 5)
    else:
        print(1)

is legal Python where the first print() gives 0.2. Lexilla's Python folder only looks at indentation though, so this folds up to the second line and the 3) / 5) is treated like the beginning of a new block.

The above also occurs in the GDScript lexer.

Found in and continued from #41
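
A minimal sketch, not the current LexPython folder, of how continuation state could be carried between lines so a continued statement keeps the fold level of the line that opened it (strings and comments are ignored here for brevity):

#include <string>

struct ContinuationState {
	int bracketDepth = 0;     // net count of ( [ { minus ) ] }
	bool backslashEOL = false;

	void ScanLine(const std::string &line) {
		backslashEOL = !line.empty() && line.back() == '\\';
		for (const char ch : line) {
			if (ch == '(' || ch == '[' || ch == '{')
				bracketDepth++;
			else if (ch == ')' || ch == ']' || ch == '}')
				bracketDepth--;
		}
	}

	// While this is true, the following physical line continues the statement
	// and should not start a new fold point based on its indentation.
	bool LineIsContinued() const {
		return bracketDepth > 0 || backslashEOL;
	}
};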

Rust: "different styles between \r and \n" at end of line comments

While trying out a patch for #33, TestLexers found this additional problem:

Lexing rust\Issue33.rs
C:\Users\Rob\source\git\scintilla-contib\lexilla\test\examples\rust\Issue33.rs:1: different styles between \r and \n at 21: 4, 0

C:\Users\Rob\source\git\scintilla-contib\lexilla\test\examples\rust\Issue33.rs:1: has different styles with \n versus \r\n line ends

SciTE (5.1.4) visually confirmed premature style switching after line comments in a file saved with Windows EOLs:

LexRust-EOL-splitting

This patch attempts to correct the EOL splitting: LexRust-Fix-EOL-splitting-at-comment-ends.zip

It would be ideal to have an omnibus lexer test (e.g. "AllStyles.rs"), but I leave that for someone who knows the language better.

JS Parsing

JS coding has evolved and an update is needed (ECMAScript).

An example:
class Game { static myStaticVar = null; #myPrivateVar = null; /*....*/ }

[Make] Comments are not always recognized

For the Make lexer, comments are not recognized in all contexts. Test code:

# I'm comment (OK)

test = 5 # I'm comment (BAD)

$(info $(test)) # I'm comment (BAD)

clean: # I'm comment (BAD)
	echo # I'm not comment (OK)

It looks like the lexer is very simple (there is a more complex version in the Notepad2 project).

Some results:

SciTE:
image

Visual Studio Code:
image

Notepad2:
image

Refs:
https://www.gnu.org/software/make/manual/make.html#Makefile-Contents
https://www.gnu.org/software/make/manual/make.html#index-_0023-_0028comments_0029_002c-in-recipes

Allow setting properties inside test example files

Requiring a new subdirectory for each property change test is unwieldy. It would be better to allow test example files to set properties that differ from SciTE.properties. These could be placed in comments at the start of the test example file similar to:

# Test with f-strings disabled: [|lexer.python.strings.f=0|]

The [|...|] syntax was chosen as these sequences are unusual so should avoid unexpected matches.

An implementation is available from 62InlineProperties.patch.

Adding a subdirectory may still be worthwhile for properties that have complex effects or interactions.
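
A hedged sketch of how a test driver might pull such a setting out of a comment line; the real implementation is in 62InlineProperties.patch and may differ:

#include <optional>
#include <string>
#include <utility>

std::optional<std::pair<std::string, std::string>>
ExtractInlineProperty(const std::string &line) {
	const size_t open = line.find("[|");
	if (open == std::string::npos)
		return std::nullopt;
	const size_t close = line.find("|]", open + 2);
	if (close == std::string::npos)
		return std::nullopt;
	const std::string setting = line.substr(open + 2, close - open - 2);
	const size_t eq = setting.find('=');
	if (eq == std::string::npos)
		return std::nullopt;
	return std::make_pair(setting.substr(0, eq), setting.substr(eq + 1));
}

// "# Test with f-strings disabled: [|lexer.python.strings.f=0|]"
// yields the pair {"lexer.python.strings.f", "0"}.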

Markdown header > color entire line

Hi,

In the case of a header, only the '#' character gets a specific color; the rest of the header line does not.

Below is an example:

# Header 1  

Test

I've posted this question already in 2020, but at that time there was nobody maintaining the Markdown lexer, until now I guess :)
Is it possible to have a look at this?

Thank you

Do not enable CETCOMPAT for ARM64 builds

In commit 7728600, CETCOMPAT is enabled for all build configurations. However, if it is enabled for ARM64, the following link error is emitted:
1>LINK : fatal error LNK1246: '/CETCOMPAT' not compatible with 'ARM64' target machine; link without '/CETCOMPAT'

The solution is to enable CETCOMPAT only for x64 and Win32 build configurations.

Add DIR_O to makefile

It would be nice to have DIR_O in the makefile, the same as in lexilla.mak. Without it, I can't generate object files with different tools without calling clean first, which means that when I go back to the previous tool I have to do the whole build again.

@nyamatongwe Any objection to also changing include deps.mak to a native solution using the -MMD option for $(CXX)? Not everyone has Python installed to update a permanent deps.mak file. Alternatively, this behaviour could be forced with a variable passed on the command line.

support new PHP features (flexible heredoc/nowdoc syntaxes, an underscore inside numbers)

As written at https://sourceforge.net/p/scintilla/feature-requests/1378/ "This is a reasonable addition but it will require someone to provide an implementation." The someone in question turned out to be me.

The patch for the issue is below. A more elaborate explanation, a downloadable file, and a test PHP file are available in the comment on the similar Notepad++ issue.

--- LexHTML~orig.cxx	2021-07-17 18:23:40.714103918 +0600
+++ LexHTML.cxx	2021-07-17 18:23:47.766164407 +0600
@@ -531,7 +531,7 @@
 Sci_Position FindPhpStringDelimiter(std::string &phpStringDelimiter, Sci_Position i, const Sci_Position lengthDoc, Accessor &styler, bool &isSimpleString) {
 	Sci_Position j;
 	const Sci_Position beginning = i - 1;
-	bool isValidSimpleString = false;
+	bool isQuoted = false;
 
 	while (i < lengthDoc && (styler[i] == ' ' || styler[i] == '\t'))
 		i++;
@@ -539,10 +539,11 @@
 	const char chNext = styler.SafeGetCharAt(i + 1);
 	phpStringDelimiter.clear();
 	if (!IsPhpWordStart(ch)) {
-		if (ch == '\'' && IsPhpWordStart(chNext)) {
+		if ((ch == '\'' || ch == '\"') && IsPhpWordStart(chNext)) {
+			isSimpleString = ch == '\'';
+			isQuoted = true;
 			i++;
 			ch = chNext;
-			isSimpleString = true;
 		} else {
 			return beginning;
 		}
@@ -550,9 +551,9 @@
 	phpStringDelimiter.push_back(ch);
 	i++;
 	for (j = i; j < lengthDoc && !isLineEnd(styler[j]); j++) {
-		if (!IsPhpWordChar(styler[j])) {
-			if (isSimpleString && (styler[j] == '\'') && isLineEnd(styler.SafeGetCharAt(j + 1))) {
-				isValidSimpleString = true;
+		if (!IsPhpWordChar(styler[j]) && isQuoted) {
+			if (((isSimpleString && styler[j] == '\'') || (!isSimpleString && styler[j] == '\"')) && isLineEnd(styler.SafeGetCharAt(j + 1))) {
+				isQuoted = false;
 				j++;
 				break;
 			} else {
@@ -562,7 +563,7 @@
 		}
 		phpStringDelimiter.push_back(styler[j]);
 	}
-	if (isSimpleString && !isValidSimpleString) {
+	if (isQuoted) {
 		phpStringDelimiter.clear();
 		return beginning;
 	}
@@ -2310,7 +2311,7 @@
 		case SCE_HPHP_NUMBER:
 			// recognize bases 8,10 or 16 integers OR floating-point numbers
 			if (!IsADigit(ch)
-				&& strchr(".xXabcdefABCDEF", ch) == NULL
+				&& strchr(".xXabcdefABCDEF_", ch) == NULL
 				&& ((ch != '-' && ch != '+') || (chPrev != 'e' && chPrev != 'E'))) {
 				styler.ColourTo(i - 1, SCE_HPHP_NUMBER);
 				if (IsOperator(ch))
@@ -2352,13 +2353,10 @@
 				if (phpStringDelimiter == "\"") {
 					styler.ColourTo(i, StateToPrint);
 					state = SCE_HPHP_DEFAULT;
-				} else if (isLineEnd(chPrev)) {
+				} else if (lineStartVisibleChars == 1) {
 					const int psdLength = static_cast<int>(phpStringDelimiter.length());
-					const char chAfterPsd = styler.SafeGetCharAt(i + psdLength);
-					const char chAfterPsd2 = styler.SafeGetCharAt(i + psdLength + 1);
-					if (isLineEnd(chAfterPsd) ||
-						(chAfterPsd == ';' && isLineEnd(chAfterPsd2))) {
-							i += (((i + psdLength) < lengthDoc) ? psdLength : lengthDoc) - 1;
+					if (!IsPhpWordChar(styler.SafeGetCharAt(i + psdLength))) {
+						i += (((i + psdLength) < lengthDoc) ? psdLength : lengthDoc) - 1;
 						styler.ColourTo(i, StateToPrint);
 						state = SCE_HPHP_DEFAULT;
 						if (foldHeredoc) levelCurrent--;
@@ -2375,12 +2373,9 @@
 					styler.ColourTo(i, StateToPrint);
 					state = SCE_HPHP_DEFAULT;
 				}
-			} else if (isLineEnd(chPrev) && styler.Match(i, phpStringDelimiter.c_str())) {
+			} else if (lineStartVisibleChars == 1 && styler.Match(i, phpStringDelimiter.c_str())) {
 				const int psdLength = static_cast<int>(phpStringDelimiter.length());
-				const char chAfterPsd = styler.SafeGetCharAt(i + psdLength);
-				const char chAfterPsd2 = styler.SafeGetCharAt(i + psdLength + 1);
-				if (isLineEnd(chAfterPsd) ||
-				(chAfterPsd == ';' && isLineEnd(chAfterPsd2))) {
+				if (!IsPhpWordChar(styler.SafeGetCharAt(i + psdLength))) {
 					i += (((i + psdLength) < lengthDoc) ? psdLength : lengthDoc) - 1;
 					styler.ColourTo(i, StateToPrint);
 					state = SCE_HPHP_DEFAULT;

JS Parsing 2 & Block

<script>
	function toJSON(obj){
		if( obj && obj.storePositions ) obj.storePositions()
		var s = JSON.stringify( obj, ['id','from','to','x','y','label','group','OnEnter','OnStay','OnExit','OnTransition'], 2 )
		s = str_replace(
			s,
			[ /,?\s*"([^"]+)":\s*""/gm, /,\s*"/gm, /{\s*/gm, /\s*"([^"]+)":\s*(.+?)\s*([\,}])/gm, /\s*}/gm, /"(\d+)"/gm ],
			[ '', ', "', '{ ', ' $1:$2$3', ' }', '$1' ]
			)
		return s
		}
</script>

... no comment !

F#: Update format specifiers for .NET versions 5 and 6

Since F# 5.0, interpolated strings accept .NET-style specifiers à la C#, under the same conditions:

  • they follow the expression, inside the braces [1]

  • general form: [,<alignment>]:<format>[<precision>]

    e.g.

    > $"{5./2.,-12:F4}" ;;
    val it : string = "2.5000      "

More recently, F# 6.0 added the %B binary notation printf specifier for integers

Here's a patch and updated tests for each of the above: LexFSharp-MoreFmtSpecs.zip


[1] While mixing specifier types in the same string is wrong, the lexer will highlight all recognized forms regardless; this keeps the logic simple and may help source authors identify mistakes, as the compiler's diagnostics tend to obfuscate the underlying problem:

> $"%F{5./2.,-12:F4}" ;;     

  $"%F{5./2.,-12:F4}" ;;
  ^^^^^^^^^^^^^^^^^^^

stdin(1,2): error FS3376: Invalid interpolated string. Interpolated strings may not use '%' format specifiers unless each is given an expression, e.g. '%d{1+1}'.

Add some utils into LexAccessor and StyleContext

lexlib-0114.zip

code extract from https://github.com/zufuliu/notepad2/blob/main/scintilla/lexlib/LexAccessor.h and https://github.com/zufuliu/notepad2/blob/main/scintilla/lexlib/StyleContext.h
LexAccessor adds GetCharacterAndWidth() and StyleAtEx() (to get the cached style without calling Flush()).
StyleContext adds SeekTo(), Rewind(), BackTo() and Advance(). GetDocNextChar() and GetLineNextChar() (which currently only skip ASCII whitespace) may also be useful for writing lexers.

PHP Numeric literals

On #19 @ivan-u7n commented:

I agree that the "HTML" lexer is rather complicated and needs overhaul. However, do you have in mind compile-time or run-time combining?

Nevertheless, I've prepared the update to PHP's numeric literals: php-numbers.patch.txt. It's greedier than PHP's own lexer — it doesn't stop on the first invalid character, but goes on. It was made this way on purpose: to visually show, by applying the default style, the invalid “numeric words” which will result in parser errors.

The test for this patch (“+” denotes a valid syntax that should be styled as a number, “-” — an invalid one that should have the default style):

123456; // +
123_456; // +
1234z6; // -
123456_; // -
123__456; // -

0x89Ab; // +
0x89_aB; // +
0x89zB; // -
0x89AB_; // -
0x_89AB; // -
0_x89AB; // -
0x89__AB; // -

1234.; // +
1234.e-0; // +
1234e+0; // +
1234e0; // +
1234.e-; // -
1234e+; // -
1234.-e; // -
1234+e; // -
1234e; // -

.1234; // +
.12e0; // +
.12.0e0; // -
.12e0.0; // -
.12e0e0; // -

1.234e-10; // +
1.2_34e-1_0; // +
1.234e-_10; // -
1.234e_-10; // -
1.234_e-10; // -
1._234e-10; // -
1_.234e-10; // -


01234567; // +
0_1234567; // +
012345678; // -

0...0; // +

Originally posted by @ivan-u7n in #19 (comment)

Wrong syntax highlighting for Matlab strings

As reported to Notepad++ (issue #10065), where they redirected me to lexilla.

Description of the Issue

In Matlab (at least in version '9.10.0.1649659 (R2021a) Update 1', but I don't think it has ever been different), a double-quoted string does not have escape sequences. All characters represent themselves, except for double quotes - you need to double them.
E.g.

"\" -> \
"\\" -> \\
"a""a" -> a"a

C-like escape sequences are only converted when the string is passed to sprintf or similar.
Instead, Notepad++'s syntax coloring interprets those sequences as actual escape sequences.

Steps to Reproduce the Issue

Paste this text in a notepad++ tab, with Matlab as the language:

a="""";
b=1;
c='\';
d=2;
e="\";
f=3;
%" this should be a comment (colored as such), instead it closes the string
g="
h=123;
%" this is a syntax error in Matlab (about 'g'),
% followed by a valid assignment (of 'h')
% Instead, 'h' is colored as part of the string

Expected Behavior

Only the strings should be colored as strings.
Also, Matlab strings cannot span multiple lines, so the coloring should stop at the end of the line, even if no closing quote is found.

Actual Behavior

The 'e' and 'g' strings are not detected properly, so their coloring spills to the next semicolon, and also to the next line.

Debug Information

Notepad++ v7.9.5 (64-bit)
Build time : Mar 21 2021 - 02:13:17
Path : C:\Program Files\Notepad++\notepad++.exe
Admin mode : OFF
Local Conf mode : OFF
OS Name : Windows 10 Pro (64-bit)
OS Version : 2004
OS Build : 19041.1052
Current ANSI codepage : 1252
Plugins : HexEditor.dll mimeTools.dll NppConverter.dll NppExport.dll

Screenshots

Notepad++ coloring:
notepad

Matlab coloring:
matlab

Use Heterogeneous Lookup

You may want to use heterogeneous lookup here:
https://github.com/ScintillaOrg/lexilla/blob/master/lexlib/PropSetSimple.cxx#L52
More details:
https://abseil.io/tips/144

Change
typedef std::map<std::string, std::string> mapss;
to
typedef std::map<std::string, std::string, std::less<>> mapss;

and then change:
mapss::const_iterator keyPos = props->find(std::string(key));
to
mapss::const_iterator keyPos = props->find(key);
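
A self-contained illustration of the effect (not the actual PropSetSimple code): with std::less<> as the comparator, find() accepts a string_view or const char* directly, so no temporary std::string is constructed for each lookup.

#include <map>
#include <string>
#include <string_view>

typedef std::map<std::string, std::string, std::less<>> mapss;

const char *GetProperty(const mapss &props, std::string_view key) {
	const auto keyPos = props.find(key);   // heterogeneous lookup, no temporary
	return (keyPos != props.end()) ? keyPos->second.c_str() : "";
}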

On another note, I'm surprised why you don't make your life easier using the auto keyword among many other language improvements.

`ifdef windir` doesn't work with MSYS2 make

Testing makefile:

# Work with MinGW make, but doesn't work with MSYS2 make.
ifdef windir
    Value1 = 1
else
    Value1 = 0
endif

# Work with both.
ifneq ("$(windir)", "")
    Value2 = 1
else
    Value2 = 0
endif

all:
	echo $(Value1)
	echo $(Value2)

Patch:
Improve-windir-variable-testing-for-makefile.zip

BTW:
Have you considered adding GitHub Actions CI workflow files and then reviewing/merging PRs? GitHub Actions can automatically build and test on push and/or PR. That would make the whole process easier.

Rust: 128-bit integer literals are not parsed correctly

The code which parses integer suffixes recognizes the suffixes: u8, i8, u16, i16, u32, i32, u64, i64, usize and isize, but not u128 and i128, so integer literals like 10u128 or 0xffffi128 are marked as errors.

lexilla/lexers/LexRust.cxx

Lines 272 to 293 in 782725a

/* Scan initial digits. The literal is malformed if there are none. */
error |= !ScanDigits(styler, pos, base);
/* See if there's an integer suffix. We mimic the Rust's lexer
 * and munch it even if there was an error above. */
c = styler.SafeGetCharAt(pos, '\0');
if (c == 'u' || c == 'i') {
	pos++;
	c = styler.SafeGetCharAt(pos, '\0');
	n = styler.SafeGetCharAt(pos + 1, '\0');
	if (c == '8') {
		pos++;
	} else if (c == '1' && n == '6') {
		pos += 2;
	} else if (c == '3' && n == '2') {
		pos += 2;
	} else if (c == '6' && n == '4') {
		pos += 2;
	} else if (styler.Match(pos, "size")) {
		pos += 4;
	} else {
		error = true;
	}
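
A standalone, hedged sketch of suffix recognition that also accepts 128-bit widths; in LexRust.cxx the equivalent change would add a branch for c == '1' && n == '2' followed by '8' that advances pos by 3.

#include <string>

bool IsRustIntSuffix(const std::string &suffix) {
	if (suffix.size() < 2 || (suffix[0] != 'u' && suffix[0] != 'i'))
		return false;
	const std::string width = suffix.substr(1);
	return width == "8" || width == "16" || width == "32" ||
		width == "64" || width == "128" || width == "size";
}

// IsRustIntSuffix("u128") and IsRustIntSuffix("i128") now return true, so
// literals such as 10u128 or 0xffffi128 would no longer be marked as errors.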

F#: Improve line-based folding

The LineContains() auxiliary function scans too far backward for same-styled lines. This can disrupt folding under these concurrent conditions:

  • a single line in a lexical style subject to folding, e.g. SCE_FSHARP_COMMENTLINE
  • an empty next line, followed by a group of same-styled lines

For example, a line comment group does not fold if the previous non-empty line is also a line comment [1]:

 0 400 400   // not folded
 1 400 400
 0 400 400   // first line in comment fold
 0 400 400   // second . . .
 0 400 400   // third . . .
 1 400 400

An extra blank line is currently needed to correct this:

 0 400 400   // not folded
 1 400 400
 1 400 400
 2 400 401 + // first line in comment fold
 0 401 401 | // second . . .
 0 401 400 | // third . . .
 1 400 400

Similarly, an isolated open statement will be counted as the head of a nearby import list, giving the fold level an extra increment:

 0 400 400   open System
 1 400 400
 2 400 401 + module FoldingTest =
 2 401 402 +     open FSharp.Quotations
 0 402 401 |     open FSharp.Reflection
 1 401 400 |
 0 400 400       () |> ignore
 1 400 400

Here is a mostly purgative refactoring of LineContains() that fixes the scanning range to immediately adjacent lines:

LexFSharp-refactor-LineContains.diff.zip

Test files will follow once this ticket has a number.

Footnotes

  1. The failing condition is !LineContains(..., "//", lineStartPrev, ..., SCE_FSHARP_COMMENTLINE), i.e., lineStartPrev falls within the isolated line comment:
    https://github.com/ScintillaOrg/lexilla/blob/a35a59845e793d9d37d249cf097e71fecc5f4bcd/lexers/LexFSharp.cxx#L736-L737

HTML Parsing : inside script element - block

Folding blocks do not work correctly inside the SCRIPT HTML element.

Steps to Reproduce the Issue

1- type <SCRIPT ></SCRIPT >
2- type <SCRIPT ></SCRIPT >
3- type JS Code in the second element

Expected Behavior
Blocks work properly (expanding and collapsing the JS code)

Actual Behavior
Blocks don't work properly

CSS lexer’s support for conditional group rules

Of the CSS conditional group rules (at-rules that can include nested statements), the CSS lexer supports only the @media rule. The rules @supports, @document, @-moz-document and possibly others should also be supported.

Raku lexer warning from Xcode Analyze

With Xcode 13.2.1, performing Product | Analyze produces this diagnostic:

lexilla/lexers/LexRaku.cxx:865:11: The left operand of '==' is a garbage value
lexilla/lexers/LexRaku.cxx:1044:2: 'typeDetect' declared without an initial value
lexilla/lexers/LexRaku.cxx:1050:6: Assuming 'initStyle' is equal to SCE_RAKU_DEFAULT
lexilla/lexers/LexRaku.cxx:1065:7: Assuming 'line' is <= 0
lexilla/lexers/LexRaku.cxx:1086:9: Calling 'StyleContext::More'
lexilla/lexlib/StyleContext.h:66:2: Entered call from 'LexerRaku::Lex'
lexilla/lexlib/StyleContext.h:67:10: Assuming field 'currentPos' is < field 'endPos'
lexilla/lexlib/StyleContext.h:67:3: Returning the value 1, which participates in a condition later
lexilla/lexers/LexRaku.cxx:1086:9: Returning from 'StyleContext::More'
lexilla/lexers/LexRaku.cxx:1086:9: Entering loop body
lexilla/lexers/LexRaku.cxx:1177:12: Entering loop body
lexilla/lexers/LexRaku.cxx:1190:52: Passing value via 3rd parameter 'type'
lexilla/lexers/LexRaku.cxx:1190:11: Calling 'LexerRaku::ProcessRegexTwinCapture'
lexilla/lexers/LexRaku.cxx:862:1: Entered call from 'LexerRaku::Lex'
lexilla/lexers/LexRaku.cxx:865:11: The left operand of '==' is a garbage value
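
A generic illustration of the pattern the analyzer is flagging; LexRaku's real typeDetect may have a different type and control flow, so this is only the shape of a possible fix, not the actual code:

int DetectTypeSketch(bool sawMarker) {
	int typeDetect = 0;   // previously: declared without an initial value
	if (sawMarker)
		typeDetect = 1;   // only some paths assign it
	// With the initializer, this comparison never reads an indeterminate value.
	return (typeDetect == 1) ? 1 : 0;
}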

the collapse problem in php syntax

It's hard to describe without a demonstration, so I made a file that demonstrates and explains everything. Copy this text into Notepad++ or SciTE and everything will become clear:

<?PHP

$b=12;

if (2>$b)
{ // start of block 1

echo'Note';
?>

block 1 enclosed in curly braces can be collapsed
<?
print('everything works fine');

PHP_test_1212.txt

} // end of block 1

else

{
echo'if ?> block <? has multiple lines';

?>	
 <div>You cannot collapse the block else{...} in the same way.</div>
<?	

print('I think the problem is already clear.');
}

function PASS($pass) // Press the alt+2 keyboard shortcuts
{

?>

yes a single line interrupt of PHP code does not interrupt the code collapse logic
<?

}

function PASS2($pass)// everything looks beautiful, easy navigation
{

?>

even if several times
<?

}

function PASS3($pass)// but if you want to make 1 line break
{

?>

even if several times
or more
}

function PASS4($pass)// and you will no longer understand what is happening here
{

	?>
		<div>even if several times</div><?	// comment
	$b=100;	
	?>
		<div>or more</div><?

	?>
		<div>
			or very 
			very 
			very 
			much
		</div>
	<?

}
// For the Pascal language, there is a LexPascal.cxx file
// There is no such file for the PHP language.
// If done by analogy with the file LexPascal.cxx,
// then something like this would be needed:
...
else if (sc.Match('?', '>')) { sc.SetState(SCE_PAS_perversionPHP); sc.Forward(); }
...
case SCE_PAS_perversionPHP:
if (sc.Match('<', '?')) { sc.Forward(); sc.ForwardSetState(SCE_PAS_DEFAULT); }
break;
...

// maybe I'm wrong
// I've never worked with C


Rust: Document length exceeded when file ends with unterminated block comment

LexRust can pass an invalid position to LexAccessor::ColourTo when a file ends with /*, causing this debug assertion to fail:

         assert((startPosStyling + validLen) < Length());

unpatched-trailing-block-comm

I traced the fault to an excessive increment of pos inside ResumeBlockComment. A simple length comparison before incrementing seems to clear it up [1].

I don't suspect a regression as #34 only touched ResumeLineComment, and a trailing line comment doesn't raise any exception:

unpatched-trailing-line-comm


[1] The best way to reproduce would be unzipping the test files first, then compiling TestLexers with MSVC in debug mode: CXX_FLAGS=-Zi -TP -MP -W4 -Od -MTd -DDEBUG -EHsc -std:c++latest

Expect something like:

Lexing rust\Issue35.rs
Assertion failed: (startPosStyling + validLen) < Length(), file C:\Users\Rob\source\git\scintilla-contrib\lexilla-dev\lexilla\lexlib\LexAccessor.h, line 178
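
A hedged illustration of the length comparison described above; the actual fix is in the attached patch and ResumeBlockComment's real code may differ:

#include <cstddef>

void AdvancePastCommentEnd(std::size_t &pos, std::size_t docLength) {
	// Guard the increment so pos never moves past the end of the document
	// when the file finishes inside an unterminated "/*" block comment.
	if (pos < docLength)
		pos++;
}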

F#: Allow triple-quoted strings to interpolate string literals (.NET 5 feature)

Since the release version 5.0, F# handles interpolated strings with as much facility as C# has done for years.

Lexilla's F# lexer doesn't yet recognize all the potential use cases of this new language feature.

In particular, a triple-quoted string may be prefixed with the $ interpolation sigil; this is the only way to interpolate a value that is itself a string literal ℹ️ :

> $"""Date: {System.DateTime.Now.ToString("yyyy-MM-dd")}""" ;;
val it : string = "Date: 2021-07-28"

The internal string should not interrupt the surrounding style region; but that's currently not the case:

pre-patch-interpolated-string-literals png

This patch set implements and tests interpolated string literals in the above context:


ℹ️

> $"Date: {System.DateTime.Now.ToString("yyyy-MM-dd")}" ;;

  $"Date: {System.DateTime.Now.ToString("yyyy-MM-dd")}" ;;
  --------------------------------------^

stdin(2,39): error FS3373: Invalid interpolated string. Single quote or verbatim string literals may not be used in interpolated expressions in single quote or verbatim strings. Consider using an explicit 'let' binding for the interpolation expression or use a triple quote string as the outer string literal.

Add New Syntax

Dear developers, please add LOLCODE, Brainfuck, YoptaScript, Pawn, Beef, Crystal and Eiffel syntax. It would be amazing to write Brainfuck (or something else) code in Notepad++.

Line continuation in JavaScript

Original report: notepad-plus-plus/notepad-plus-plus#9220.

Line continuation in JavaScript is allowed only inside strings, not anywhere else. The current behavior seems to come directly from C/C++. The example below is styled wrongly:

if( test ) // toto \
{
	alert();
}
else
{
}

Result:

image

The first if should be collapsible and the next line with the opening bracket shouldn't be green (considered a comment). The same code inside <script> in HTML files works fine.
