startautomating / irregular
Regular Expressions made Strangely Simple
Home Page: https://irregular.start-automating.com/
License: MIT License
Match a Python class
Describe the bug
A clear and concise description of what the bug is.
To Reproduce
On Mac, Powershell v7.1
Steps to reproduce the behavior:
cd ./Irregular
$log = git log -n 10
$log | ?<Git_Log>
Expected behavior
PoSH object of the git log
Actual behavior
error as above for each log entry ( 20 )
Additional context
module is installed
get-module irregular
ModuleType Version PreRelease Name      ExportedCommands
Script     0.5.6              Irregular {Expor…
The Irregular module itself seems to work: typing ?<Git_Log> on its own returns the 33 lines of the actual regex, as expected.
$PSVersionTable
Name Value
PSVersion 7.1.3
PSEdition Core
GitCommitId 7.1.3
OS Darwin 20.4.0 Darwin Kernel Version 20.4.0: …
Platform Unix
PSCompatibleVersions {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion 2.3
SerializationVersion 1.1.0.1
WSManStackVersion 3.0
Hello,
When I pass files as input, I would like each match to also show information about the file it came from.
For example if I execute:
Get-ChildItem -Path '.\' -Filter *.ps1|ForEach-Object{ $_|?<PowerShell_Requires> }
I get something like:
StartIndex EndIndex Value
---------- -------- -----
153 192 #requires -Version 2.0 -Modules ShowUI
0 23 #requires -Version 2.0
0 23 #requires -Version 1.0
0 23 #Requires -Version 3.0
24 59 #Requires -Modules ActiveDirectory
Checking the type, I see that it is System.Text.RegularExpressions.Match.
To show which file the match corresponds to I used:
Get-ChildItem -Path '.\' -Filter *.ps1|ForEach-Object{ $_.FullName; $_|?<PowerShell_Requires> }
but honestly that's not what I would want to get.
Is there a "neater" way to get the match with the file info included?
For example, some type like System.Text.RegularExpressions.FileMatch that, in addition to StartIndex, EndIndex and Value, contains the File System Object.
Best regards,
Claudio Salvio
It would be really nice if Write-Regex -Atomic -Or automatically indented and put each alternative on its own line
Describe the bug
?<FFMpeg_Progress> captures Bitrate twice, because the Regex does not use the correct name for "Speed"
When matching IPv4 groups of numbers, invalid IPs are matched as well as valid IPs
$ipRegex = Write-RegEx -Pattern ?
$ipRegex.Matches('192.168.86.153,256.199.381.31,10.0.0.1')
StartIndex EndIndex Value
0 14 192.168.86.153
15 29 256.199.381.31 <-- should not match
30 38 10.0.0.1
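One way to reject out-of-range octets is to spell out the 0-255 range in the pattern itself. The following is a minimal Python sketch of that idea using the standard `re` module. It is not the pattern Irregular ships; the names `OCTET`, `IPV4` and `find_ipv4` are illustrative, and it assumes leading zeros such as "01" should be rejected.

```python
import re

# One octet in the range 0-255 (assumption: no leading zeros such as "01").
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"

# Four octets separated by dots. The lookarounds stop a match from starting
# or ending in the middle of a longer run of digits and dots.
IPV4 = re.compile(rf"(?<![0-9.]){OCTET}(?:\.{OCTET}){{3}}(?![0-9.])")


def find_ipv4(text):
    """Return every in-range dotted-quad address found in text."""
    return IPV4.findall(text)


print(find_ipv4("192.168.86.153,256.199.381.31,10.0.0.1"))
# ['192.168.86.153', '10.0.0.1']
```

The same alternation should carry over to a .NET pattern unchanged, since it only uses character classes, groups and lookarounds.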
Describe the Desired Expression
Have a regular expression that matches time intervals, as described in ISO-8601
Provide some Sample Source Text
PT5M
P5M
P3Y6M4DT12H30M5S
Highlight What you'd like to match
All of the above
What parts of the samples should match? Which pieces of data are important to extract from the match?
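As a starting point for discussion, here is a rough Python sketch of a duration pattern that accepts the three samples above. The group names (Years, Months, and so on) are assumptions, not names taken from Irregular, and the sketch covers only the designator form `P[n]Y[n]M[n]W[n]DT[n]H[n]M[n]S` with integer components (plus optional fractional seconds). Python writes named groups as `(?P<Name>...)`, where .NET would use `(?<Name>...)`.

```python
import re

ISO8601_DURATION = re.compile(
    r"""
    P(?=[0-9T])                      # designator; something must follow it
    (?:(?P<Years>[0-9]+)Y)?
    (?:(?P<Months>[0-9]+)M)?
    (?:(?P<Weeks>[0-9]+)W)?
    (?:(?P<Days>[0-9]+)D)?
    (?:T(?=[0-9])                    # a time part needs at least one component
        (?:(?P<Hours>[0-9]+)H)?
        (?:(?P<Minutes>[0-9]+)M)?
        (?:(?P<Seconds>[0-9]+(?:\.[0-9]+)?)S)?
    )?
    """,
    re.VERBOSE,
)


def parse_duration(text):
    """Return the components that are present, or None if text is not a duration."""
    m = ISO8601_DURATION.fullmatch(text)
    if m is None:
        return None
    return {name: value for name, value in m.groupdict().items() if value is not None}


for sample in ("PT5M", "P5M", "P3Y6M4DT12H30M5S"):
    print(sample, parse_duration(sample))
```

Note how the `T` separator is what distinguishes five minutes (PT5M) from five months (P5M), so the two `M` components need separate group names.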
PREFACE: I love this. Thx.
Write-Regex -CharacterClass Digit -Repeat # This writes the Regex (\d+)
However, this only "writes" 3 blank lines.
To Reproduce
Write-Regex -CharacterClass Digit -Repeat # This writes the Regex (\d+)
Expected behavior
Expect the pattern '\d+' to appear as the comment indicates.
Actual behavior
3 blank lines.
What is actually being "written" is the Regex object itself; apparently there is a default formatter for regexes that produces this output.
Additional context
The following will show all of the object properties (as will | Format-Table *)
Write-Regex -CharacterClass Digit -Repeat | Select-Object *
The pattern can also be seen by accessing the "Pattern" property or using the ToString() method of the regex object.
This is not a bug in Irregular's cmdlets (IMO) but rather a serious issue in the documentation and help.
Write-* cmdlets (practically) all produce (some) screen output in the default case -- due to the formatters.
Those who don't understand GetType(), Get-Member, or using Format explicitly or Select-Object will have a hard time understanding what Write-Regex is actually doing.
I can't decide if this is truly non-standard behavior for a Write-* cmdlet or simply surprising because the result is so different (and the text is hidden).
However, it did cost me a few minutes, first thinking that maybe my 7.2 RC PowerShell was buggy, then trying it in 5.1, and finally investigating deeper.
I suspect many people will give up and walk away from it, and maybe even from this excellent module.
Basically, the problem is that when one reads a script or a blog post that uses the module, it uses a bunch of commands like ?<Whatever> that don't exist (or at least, aren't discoverable when the module isn't imported).
I think the best thing would be to put the whole list of aliases in the module manifest.
Describe the Desired Expression
It would be great to match cron intervals
Provide some Sample Source Text
0 0-23 * * *
11 11 * * 1,2,3,4,5
Highlight What you'd like to match
Minute, Hour, DayOfMonth, Month, DayOfWeek
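A minimal Python sketch of such an expression follows. It assumes the standard five-field crontab order (minute, hour, day of month, month, day of week) and numeric fields only; names such as JAN or MON and macros such as @daily are not handled. The group names and the `parse_cron` helper are illustrative.

```python
import re

# One cron field: digits, "*", lists ("1,2"), ranges ("0-23") and steps ("*/5").
FIELD = r"[0-9*,/\-]+"

CRON = re.compile(
    rf"^\s*(?P<Minute>{FIELD})\s+(?P<Hour>{FIELD})\s+(?P<DayOfMonth>{FIELD})"
    rf"\s+(?P<Month>{FIELD})\s+(?P<DayOfWeek>{FIELD})\s*$"
)


def parse_cron(line):
    """Split a five-field cron interval into named parts, or return None."""
    m = CRON.match(line)
    return m.groupdict() if m else None


print(parse_cron("0 0-23 * * *"))
print(parse_cron("11 11 * * 1,2,3,4,5"))
```

The pattern only splits the fields; it does not check that values are in range (for example, that the hour is 0-23).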
Describe the Desired Expression
Markdown YAML Headers
Provide some Sample Source Text
Highlight What you'd like to match
The entire header.
What parts of the samples should match? Which pieces of data are important to extract from the match?
The header and its content.
? currently emits the regex default, instead of the script used to generate it.
If parameters were passed or piped in, this should still generate a final regex. However, with no parameters, it should output its generator, not the Regex.
-Extract is hard to read on the command line
The default output is hard to read on the command line, unless you filter properties.
Output where named capture groups become object properties, excluding 0 and Match
Example using -match
$regex = @'
(?x)
# parse output from: "netstat -a -n -o"
^\s+
(?<Protocol>\S+)
\s+
(?<LocalAddress>\S+)
\s+
(?<ForeignAddress>\S+)
\s+
(?<State>\S{0,})?
\s+
(?<Pid>\S+)$
'@
netstat -a -n -o | %{
if($_ -match $regex) {
$matches.remove(0)
[pscustomobject]$Matches
}
} | select -Last 2 -ov 'last'
| ft
$last | fl
netstat -a -n -o
| Use-RegEx -Extract -Pattern $regex -Match { $_ } -ea SilentlyContinue
| select -Last 2 -ov last
|Ft
$last | fl
Removing or hiding the [Match] and 0 properties by default when using -Extract
netstat -a -n -o
| Use-RegEx -Extract -Pattern $regex -Match { $_ } -ea SilentlyContinue
| select -ExcludeProperty 'match', '0'
| select -Last 2 -ov last
| ft
$last | fl
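For comparison, Python's `re` module already behaves the way this request describes: `Match.groupdict()` returns only the named groups, without the whole match (group 0). The sketch below ports the pattern above to Python syntax. The sample rows are made up to resemble Windows `netstat -a -n -o` output, and `extract` is a hypothetical helper, not part of Irregular.

```python
import re

NETSTAT_ROW = re.compile(
    r"""
    ^\s+
    (?P<Protocol>\S+)        \s+
    (?P<LocalAddress>\S+)    \s+
    (?P<ForeignAddress>\S+)  \s+
    (?P<State>\S*)           \s+   # empty for UDP rows, which have no state
    (?P<Pid>\S+) $
    """,
    re.VERBOSE,
)


def extract(line):
    """Return the named captures as a dict, or None when the line is not a row."""
    m = NETSTAT_ROW.match(line)
    # groupdict() holds only named groups, so there is no "0" entry to remove.
    return m.groupdict() if m else None


print(extract("  TCP    0.0.0.0:135    0.0.0.0:0    LISTENING    1234"))
print(extract("  UDP    0.0.0.0:123    *:*                       5678"))
```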
Describe the Desired Expression
It would be nice to have Regular Expressions to extract out OpenSCAD.
These should cover:
Provide some Sample Source Text
See:
This way, one could search for a property, instead of matching all properties.
Describe the bug
-Extract should coerce to [Timespan] before [DateTime]
To Reproduce
Use-RegEx -Match "00:00:01.01" -Pattern "(?[\d:.]+)" -Extract
Expected behavior
Timespan is returned as a [Timespan]
Actual behavior
Timespan is returned as a [DateTime]
Is your feature request related to a problem? Please describe.
Regex character class subtraction is annoying.
New-Regex should include something to abstract it.
Describe the solution you'd like
New-Regex should add -NotCharacterClass and -NotLiteralCharacter.
Without additional parameters, these should act like -CharacterClass and -LiteralCharacter, except that the selection set should be prefixed with ^ (to indicate that it is not those characters)
When provided with -CharacterClass or -LiteralCharacter, these should create a character class subtraction.
# Match anything but punctuation
New-RegEx -NotCharacterClass Punctuation
# Match any punctuation except open/close/quote/endquote, and comma.
New-RegEx -CharacterClass Punctuation -NotCharacterClass PunctuationOpen, PunctuationClose, PunctuationInitialQuote, PunctuationFinalQuote -NotLiteralCharacter ','
?<PowerShell_HelpField> is slightly incorrect.
It matches any whitespace after the field.
It should match any whitespace except a newline or carriage return.
Then it should match the carriage returns and newlines, and then match the Content of the field.
Additionally, ?<PowerShell_HelpField> does not locate the end correctly.
It looks for . followed by any word characters.
It should instead look for dot followed by any valid help field names.
Describe the Desired Expression
?<FFMpeg_Progress> isn't the only useful piece of information to extract from FFMpeg:
There's also:
In Write-RegEx.ps1, the fourth .Example provides code for capturing an e-mail address, but the sample code provided fails if the email address contains subdomains (e.g. [email protected] or [email protected]).
I'm also not sure what would happen if an email address or domain contains multiple hyphens ([email protected] or [email protected]).
FWIW, emailRegex.com has a pretty exhaustive .NET RegEx string for email addresses, but it might be too much for an example.
To Reproduce
$emailRegex.Match("[email protected]")
Expected behavior
Returns the entire email address
Actual behavior
Returns [email protected]
Additional context
This came up during your talk to the NY PowerShell MeetUp.
Cool talk, btw!
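For illustration, a pattern that tolerates subdomains and hyphenated labels might look like the Python sketch below. It is deliberately loose and far from a full RFC 5322 matcher, it is not the pattern from the Write-RegEx example, and the addresses used are made-up ones on the reserved example.com and example.org domains.

```python
import re

# Local part, "@", then one or more dot-terminated labels and a final
# alphabetic label. Labels may contain interior hyphens (even several in a row)
# but may not start or end with one.
EMAIL = re.compile(
    r"[A-Za-z0-9._%+-]+@"
    r"(?:[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?\.)+"
    r"[A-Za-z]{2,}"
)


def find_email(text):
    """Return the first email-like substring in text, or None."""
    m = EMAIL.search(text)
    return m.group(0) if m else None


print(find_email("contact: user@mail.example.com"))
print(find_email("first-last@my--host.example.org"))
```

The key difference from a single `name@domain.tld` pattern is the repeated label group, which lets the domain contain any number of dots.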
This can be accomplished with PipeScript and HelpOut
There should be a Regex to find the location a space would exist if text were interpreted as CamelCase
An ASCII version of this Regex could be written as:
New-Regex -Not -Modifier IgnoreCase |
New-RegEx -After '[a-z]' |
New-RegEx -Before '[A-Z]'
# Which would produce:
'(?-i)(?<=[a-z])(?=[A-Z])'
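The same zero-width idea can be tried out quickly in Python, whose `re` module is case-sensitive by default, so the `(?-i)` modifier is not needed there. A minimal sketch (`space_camel_case` is an illustrative helper name):

```python
import re

# Zero-width position between a lowercase ASCII letter and an uppercase one.
CAMEL_BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])")


def space_camel_case(text):
    """Insert a space at every lowercase-to-uppercase boundary."""
    return CAMEL_BOUNDARY.sub(" ", text)


print(space_camel_case("thisIsCamelCase"))  # this Is Camel Case
```

Because the match consumes no characters, substituting a space inserts it without removing any letters.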
Describe the Desired Expression
Handle Markdown Lists
Provide some Sample Source Text
Highlight What you'd like to match
Each List Item
What parts of the samples should match? Which pieces of data are important to extract from the match?
Describe the Desired Expression
Should Match a C/C++ #ifdef preprocessor statement, up until the #endif.
Provide some Sample Source Text
#ifdef Windows
// compile this code
#endif
#ifndef Windows
// don't compile, it's not Windows
#endif
But not:
//#ifdef Windows
// this really doesn't matter
//#endif
What parts of the samples should match? Which pieces of data are important to extract from the match?
Each if statement should match.
The type of if statement and the remainder of the line will be important.
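A rough Python sketch of the requested behavior, run against the samples above, is shown here. It assumes blocks are not nested (a nested #ifdef would end at the first #endif) and that a directive must be the first non-blank thing on its line, which is what excludes the commented-out block. The group names Type, Condition and Body are illustrative.

```python
import re

IFDEF_BLOCK = re.compile(
    r"^[ \t]*#(?P<Type>ifn?def)[ \t]+(?P<Condition>[^\r\n]*)\r?\n"  # opening line
    r"(?P<Body>.*?)"                                                # lazily, up to...
    r"^[ \t]*#endif",                                               # ...the first #endif
    re.MULTILINE | re.DOTALL,
)

SAMPLE = """#ifdef Windows
// compile this code
#endif
#ifndef Windows
// don't compile, it's not Windows
#endif
//#ifdef Windows
// this really doesn't matter
//#endif
"""


def find_ifdefs(source):
    """Return (type, condition, body) for each uncommented #ifdef/#ifndef block."""
    return [(m["Type"], m["Condition"].strip(), m["Body"])
            for m in IFDEF_BLOCK.finditer(source)]


for kind, condition, body in find_ifdefs(SAMPLE):
    print(kind, condition, repr(body))
```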
Describe the bug
Write-RegEx -If -Then could be useful for creating balancing groups, if it produced valid RegEx.
To Reproduce
Write-RegEx -If Foo -Then ?! # produces an invalid RegEx: (?(Foo)(?:?!))
Expected behavior
(?(Foo)(?!))
Is your feature request related to a problem? Please describe.
When I delete a module containing a regex, it is still present in the list returned by the Get-Regex cmdlet.
Describe the solution you'd like
In case the imported file contains only one regex, removing the module should remove the regex from the list returned by Get-RegEx.
Additional context
Repro :
Write-RegEx -Name TestExtractDataBetweenMarkers -After ${StartAnchor} -CharacterClass Any -Greedy -Lazy| write-regex -Before $EndAnchor |Set-Regex -Path c:\temp\regex
Import-Regex C:\temp\regex\TestExtractDataBetweenMarkers.regex.txt
Get-RegEx -Name TestExtractDataBetweenMarkers
# Name Description
# ---- -----------
# TestExtractDataBetweenMarkers
Get-Module
# ModuleType Version Name ExportedCommands
# ---------- ------- ---- ----------------
# Script 0.0 ?<TestExtractDataBetweenMarkers> ?<TestExtractDataBetweenMarkers>
# Script 0.6 Irregular {Export-RegEx, Get-RegEx, Import-RegEx, Set-Regex, Show-Re...
# ...
Remove-Module ?<TestExtractDataBetweenMarkers>
Get-RegEx -Name TestExtractDataBetweenMarkers
# Name Description
# ---- -----------
# TestExtractDataBetweenMarkers
?<TestExtractDataBetweenMarkers>
# ?<TestExtractDataBetweenMarkers> : The term '?<TestExtractDataBetweenMarkers>' is not recognized as the name of a cmdlet,
# function, script file, or operable program. Check the spelling of the name, or if a path was included,
# verify that the path is correct and try again.
# At line:1 char:1
# + ?<TestExtractDataBetweenMarkers>
# + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# + CategoryInfo : ObjectNotFound: (?<TestExtractDataBetweenMarkers>:String) [], CommandNotFoundException
# + FullyQualifiedErrorId : CommandNotFoundException
Write-RegEx -Atomic -Or a, b
Expected:
(?>a|b)
Actual:
(?>(a|b))
Hello,
Is there any recommended way to handle your own regular expressions?
Any suggested best practices for building a library of them?
@StartAutomating Thanks for making this innovative and useful module available to the community!
Best regards,
Claudio Salvio
Match a Struct in C/C++
Describe the Desired Expression
Match an enum in C/C++
Describe the bug
The last example of PowerShell code in the README.md does not return values.
"number: 1
string: 'hello'" | ? -Split |
Foreach-Object {
$key = $_ | ? -Until -Trim -IncludeMatch
$value = $key | ? -Until -Trim
@{$key.Trim(':')=$value}
}
To Reproduce
Steps to reproduce the behavior:
Expected behavior
I expect keys and values to be displayed.
Actual behavior
Name Value
number
string
Additional context
This behavior was seen in both Windows PowerShell 5.1 and PowerShell 7.2.5
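To make the expected result concrete, here is a small Python sketch that produces the key/value pairs the README example is meant to return. It only illustrates the intended output; it does not reproduce or fix the `? -Until` behavior, and `parse_pairs` is a hypothetical helper.

```python
import re

# One "key: value" pair per line; surrounding spaces and tabs are trimmed.
KEY_VALUE = re.compile(
    r"^[ \t]*(?P<Key>[^:\r\n]+?)[ \t]*:[ \t]*(?P<Value>[^\r\n]*?)[ \t]*$",
    re.MULTILINE,
)


def parse_pairs(text):
    """Return a dict mapping each key to the text after its colon."""
    return {m["Key"]: m["Value"] for m in KEY_VALUE.finditer(text)}


print(parse_pairs("number: 1\nstring: 'hello'"))
# {'number': '1', 'string': "'hello'"}
```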
Describe the Desired Expression
An expression to match a MAC address.
Provide some Sample Source Text
3C-9C-0F-8C-34-21
Highlight What you'd like to match
What parts of the samples should match? Which pieces of data are important to extract from the match?
It would be nice to extract out the OUI and NIC portions of the MAC.
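A minimal Python sketch of such an expression is given below. It assumes six two-digit hex octets joined by a single consistent separator (either "-" or ":"), with the first three octets forming the OUI and the last three the NIC portion. The group names and `parse_mac` are illustrative.

```python
import re

HEX = r"[0-9A-Fa-f]{2}"

MAC = re.compile(
    rf"(?<![0-9A-Fa-f])"
    rf"(?P<OUI>{HEX}(?P<Sep>[-:]){HEX}(?P=Sep){HEX})"   # first three octets
    rf"(?P=Sep)"                                        # same separator throughout
    rf"(?P<NIC>{HEX}(?P=Sep){HEX}(?P=Sep){HEX})"        # last three octets
    rf"(?![0-9A-Fa-f])"
)


def parse_mac(text):
    """Return the OUI and NIC portions of the first MAC address in text, or None."""
    m = MAC.search(text)
    return {"OUI": m["OUI"], "NIC": m["NIC"]} if m else None


print(parse_mac("3C-9C-0F-8C-34-21"))
# {'OUI': '3C-9C-0F', 'NIC': '8C-34-21'}
```

The backreference to the separator group is what rejects addresses that mix "-" and ":".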
Write-Regex -Atomic -Or ("a", "b") -Min 0 -Max 3
Expected:
(?>a|b){0,3}
Actual:
(?>(a|b){0,3})
It would be great to be able to use Irregular as a GitHub Action.
Describe the bug
Set-Regex throws a WriteErrorException when the -Modifier parameter is used with Write-RegEx
To Reproduce
$StartAnchor='\[1'
$EndAnchor ='P]]'
Write-RegEx -Name ExtractDataBetweenMarkers -Modifier 'SingleLine' -After ${StartAnchor} -CharacterClass Any -Greedy -Lazy |
Write-Regex -Before $EndAnchor |Set-Regex
Expected behavior
No error (?) :
Write-RegEx -Name ExtractDataBetweenMarkers -Modifier 'SingleLine' -After ${StartAnchor} -CharacterClass Any -Greedy -Lazy |
Write-Regex -Before $EndAnchor |Set-Regex
Actual behavior
Set-Regex : Must provide a -Name, or start the pattern with a named capture.
At line:2 char:34
+ Write-Regex -Before $EndAnchor |Set-Regex
+ ~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [Write-Error], WriteErrorException
+ FullyQualifiedErrorId : Irregular.Missing.Name,Set-Regex
My regex seems to have a capture name :
(?s)(?<ExtractDataBetweenMarkers>(?<=\[1).*?)(?=P]])
When I remove the Modifier parameter I have no error :
Write-RegEx -Name ExtractDataBetweenMarkers -After ${StartAnchor} -CharacterClass Any -Greedy -Lazy | Write-Regex -Before $EndAnchor |Set-Regex
Additional context
PS v5.1 on Windows Server 2012.
The Irregular module was not installed by Install-Module (no internet access on the server) :
IPMO C:\Users\MyAccount\Downloads\irregular\Irregular.psd1
Is your feature request related to a problem? Please describe.
Use-RegEx -Extract is great for creating a property bag. It could be even more helpful if it auto-converted primitive types, such as [Timespan], [Datetime], [float], [int], [bool] . This functionality is possible with -Coerce, but requires knowledge of each capture.
Describe the solution you'd like
Either:
The documentation :
-Path <String>
The path to the file. If this is not provided, it will save regular expressions to the user's Irregular
module path.
1 - When I save a regex I can't tell where the cmdlet is saving the file.
2 - When the module is not installed by Install-module, in this case the behavior concerning the choice of the path is not documented.
Describe the solution you'd like
For 1 : add Verbose parameter.
For 2 : document the behavior in this case.
Additional context
The Irregular module was not installed by Install-Module (no internet access on the server):
IPMO C:\Users\MyAccount\Downloads\irregular\Irregular.psd1
$env:psmodulepath
C:\Users\MyAccount\Documents\WindowsPowerShell\Modules; C:\Program Files\WindowsPowerShell\Modules; C:\Windows\system32\WindowsPowerShell\v1.0\Modules ...
In my case the file is saved in the first path present in $env:psmodulepath :
C:\Users\MyAccount\Documents\WindowsPowerShell\Modules\Irregular\Regex
Found with trace-Command :
Trace-Command PathResolution -expression {
$StartAnchor='\[1'
$EndAnchor ='P]]'
Write-RegEx -Name ExtractDataBetweenMarkers -After ${StartAnchor} -CharacterClass Any -Greedy -Lazy|
write-regex -Before $EndAnchor |Set-Regex } -pshost
#DEBUG: PathResolution Information: 0 : RESOLVED PATH:
#C:\Users\MyAccount\Documents\WindowsPowerShell\Modules\Irregular\RegEx\ExtractDataBetweenMarkers.regex.txt
The cmdlet creates the path '\Irregular\RegEx'
Describe the Desired Expression
A regular expression that can extract out GCode instructions.
Provide some Sample Source Text
; generated by Slic3r 1.3.0 on 2021-10-05 at 19:28:57
; external perimeters extrusion width = 0.44mm (3.38mm^3/s)
; perimeters extrusion width = 0.48mm (7.54mm^3/s)
; infill extrusion width = 0.48mm (10.05mm^3/s)
; solid infill extrusion width = 0.48mm (2.51mm^3/s)
; top infill extrusion width = 0.48mm (1.88mm^3/s)
M127
M118 X38.97 Y23.08 Z3.00 T0
M140 S50 T0
M104 S230 T0
M104 S0 T1
M107
G90
G28
M132 X Y Z A B
G1 Z50.000 F420
G161 X Y F3300
M7 T0
M6 T0
M651
M907 X100 Y100 Z40 A100 B20
M108 T0
M106
; Filament gcode
G21 ; set units to millimeters
G90 ; use absolute coordinates
M73 P0
G1 Z0.400 F7800.000
G1 E-2.00000 F2400.00000
G1 X16.663 Y-10.414 F7800.000
G1 E0.00000 F2400.00000
G1 F1800
G1 X18.593 Y-9.147 E0.27422
G1 X19.941 Y-7.377 E0.53847
G1 X29.941 Y11.623 E3.08846
G1 X30.714 Y14.750 E3.47102
G1 X30.714 Y19.750 E4.06485
G1 X30.317 Y22.025 E4.33907
G1 X29.173 Y24.030 E4.61330
G1 X27.417 Y25.530 E4.88752
G1 X25.257 Y26.345 E5.16175
G1 X24.000 Y26.464 E5.31171
G1 X-6.000 Y26.464 E8.87467
Highlight What you'd like to match
Each line should be matched. If the line starts with a ;, it should be considered a comment. Otherwise, the first word is the instruction and all subsequent words are arguments. Anything after ; on a given line should be considered a comment.
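The rules above can be sketched in Python as follows. This is only one possible reading of the request: the group names are assumptions, arguments are returned as plain words without splitting the letter from the number, and a comment keeps any trailing whitespace.

```python
import re

GCODE_LINE = re.compile(
    r"^[ \t]*"
    r"(?P<Instruction>[^\s;]+)?"            # first word, if the line has one
    r"(?P<Arguments>(?:[ \t]+[^\s;]+)*)"    # all subsequent words
    r"[ \t]*"
    r"(?:;[ \t]*(?P<Comment>.*))?$"         # anything after ';' is a comment
)


def parse_gcode_line(line):
    """Split one GCode line into instruction, argument list and comment."""
    m = GCODE_LINE.match(line)
    if m is None:
        return None
    return {
        "Instruction": m["Instruction"],        # None for comment-only lines
        "Arguments": m["Arguments"].split(),
        "Comment": m["Comment"],                # None when there is no ';'
    }


for line in ("G1 X18.593 Y-9.147 E0.27422",
             "G21 ; set units to millimeters",
             "; Filament gcode"):
    print(parse_gcode_line(line))
```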
If the Irregular action finds bits of Irregular inside of the github workspace, it should use them instead.
Describe the bug
This should have matched ?<FFMpeg_Progress>:
To Reproduce
"frame=10674 fps=1333 q=28.0 size= 20736kB time=00:07:04.80 bitrate= 399.9kbits/s speed=53.1x" | ?<FFmpeg_Progress>
Expected behavior
It matches
Actual behavior
It does not match, because the Regex expects at least one whitespace after frame=
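A Python sketch of a more tolerant pattern is shown below: it allows zero or more whitespace characters after each `=` and gives every field its own group name, including a separate one for speed. This is an illustration of the fix, not the regex Irregular ships, and the group names are assumptions.

```python
import re

FFMPEG_PROGRESS = re.compile(
    r"frame=\s*(?P<Frame>[0-9]+)\s+"
    r"fps=\s*(?P<Fps>[0-9.]+)\s+"
    r"q=\s*(?P<Quality>-?[0-9.]+)\s+"
    r"size=\s*(?P<Size>\S+)\s+"
    r"time=\s*(?P<Time>[0-9:.]+)\s+"
    r"bitrate=\s*(?P<Bitrate>\S+)\s+"
    r"speed=\s*(?P<Speed>[0-9.]+)x"
)


def parse_progress(line):
    """Return the progress fields as a dict, or None if the line does not match."""
    m = FFMPEG_PROGRESS.search(line)
    return m.groupdict() if m else None


line = ("frame=10674 fps=1333 q=28.0 size=   20736kB "
        "time=00:07:04.80 bitrate= 399.9kbits/s speed=53.1x")
print(parse_progress(line))
```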
Describe the Problem
When matching a series of files or functions, Use-Regex does not make it easy to pair the match (or extracted match) with the input.
This makes an object pipeline using both pieces of information considerably less elegant.
Describe the solution you'd like
Use-Regex should have a parameter -IncludeInputObject (aliased to -OutputInputObject). This should include the input object in the output when using -Extract.
Describe alternatives you've considered
Use-Regex: -IncludeMatch could signal to include the input object.
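The shape of the requested output can be sketched in Python: each extracted record carries the named captures plus a reference to the input it came from. The `InputObject` key, the `extract_with_input` helper, the `REQUIRES` pattern and the sample file names are all hypothetical and only illustrate the idea behind the proposed -IncludeInputObject parameter.

```python
import re

# Simplified stand-in for a "#requires" matcher (an assumption for this sketch).
REQUIRES = re.compile(r"^#requires[ \t]+(?P<Arguments>[^\r\n]*)",
                      re.IGNORECASE | re.MULTILINE)


def extract_with_input(inputs, pattern):
    """Yield one dict per match: the named captures plus the originating input."""
    for name, text in inputs:
        for m in pattern.finditer(text):
            yield {"InputObject": name, "StartIndex": m.start(), **m.groupdict()}


# Made-up inputs: (name, content) pairs standing in for files.
files = [
    ("a.ps1", "#requires -Version 2.0\nWrite-Host hi"),
    ("b.ps1", "param()\n#Requires -Modules ActiveDirectory"),
]

for record in extract_with_input(files, REQUIRES):
    print(record)
```

Keeping the input alongside the captures means later pipeline stages can group or filter by source without re-running the match.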
Describe the bug
Export-RegEx -As EmbeddedEngine does nothing
To Reproduce
Export-RegEx -As EmbeddedEngine -Name * -Path .\Test.ps1
Does not create test.ps1, or error out.
Additional context
It appears that this has been internally renamed to 'Embedded', but not updated in the [ValidateSet()] of Get-RegEx or Export-RegEx.