Regex API (psl, closed)

azjezz commented on July 23, 2024
Regex API

from psl.

Comments (8)

ddziaduch commented on July 23, 2024

Hi @azjezz

I've started working on this. I have a first draft; can you verify it?

Index: src/Psl/Internal/Loader.php
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- src/Psl/Internal/Loader.php	(revision 2336e5624e3c6a8f89ea899a35b15a66385e6bac)
+++ src/Psl/Internal/Loader.php	(date 1602170341000)
@@ -190,6 +190,7 @@
         'Psl\SecureRandom\string',
         'Psl\PseudoRandom\float',
         'Psl\PseudoRandom\int',
+        'Psl\RegExp\filter_array',
         'Psl\Str\Byte\capitalize',
         'Psl\Str\Byte\capitalize_words',
         'Psl\Str\Byte\chr',
Index: src/Psl/RegExp/filter_array.php
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- src/Psl/RegExp/filter_array.php	(date 1602170593000)
+++ src/Psl/RegExp/filter_array.php	(date 1602170593000)
@@ -0,0 +1,18 @@
+<?php
+
+declare(strict_types=1);
+
+namespace Psl\RegExp;
+
+/**
+ * Perform a regular expression search and replace
+ */
+function filter_array(
+    array $pattern,
+    array $replacement,
+    array $subject,
+    int $limit = -1,
+    ?int &$count = null
+): array {
+    return preg_filter($pattern, $replacement, $subject, $limit, $count);
+}
Index: tests/Psl/FilterArrayTest.php
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- tests/Psl/FilterArrayTest.php	(date 1602170385000)
+++ tests/Psl/FilterArrayTest.php	(date 1602170385000)
@@ -0,0 +1,31 @@
+<?php
+
+declare(strict_types=1);
+
+namespace Psl\Tests;
+
+use PHPUnit\Framework\TestCase;
+
+use function Psl\RegExp\filter_array;
+
+class FilterArrayTest extends TestCase
+{
+    public function testFilterArray()
+    {
+        $subject = ['1', 'a', '2', 'b', '3', 'A', 'B', '4'];
+        $pattern = ['/\d/', '/[a-z]/', '/[1a]/'];
+        $replace = ['A:$0', 'B:$0', 'C:$0'];
+
+        self::assertSame(
+            [
+                0 => 'A:C:1',
+                1 => 'B:C:a',
+                2 => 'A:2',
+                3 => 'B:b',
+                4 => 'A:3',
+                7 => 'A:4',
+            ],
+            filter_array($pattern, $replace, $subject)
+        );
+    }
+}

azjezz commented on July 23, 2024

I think a psalm-plugin is a good idea. We can ship it as a separate package and add it to the suggestions ( we don't want to install a pile of packages that might not be used if someone is not even using Psl\Regex ).

The plugin can take care of other things besides regex, such as enforcing Type\string()->coerce($var) instead of (string) $var, and forbidding the use of PHP builtin functions that PSL replaces. 🤔

As you said, people would have to add the type hint themselves, either for the pattern or for the result.

so Pattern doesn't really add any value :/

azjezz commented on July 23, 2024

Hey @ddziaduch, thank you for picking this up! I suggest you open a pull request so I can do a line-by-line review. One thing to note about the Regex API is that we want it to be type-safe; while it's "impossible" to do so 100%, we should try. Also, as per PSL rules, references ( int &$count ) must not be used.

You can take a look at the HSL implementation ( https://docs.hhvm.com/search?term=HH%5CLib%5CRegex ). We can't reimplement the exact same API, though: Hack has a special generic type for regex patterns ( Pattern<T> ), whereas we will be using plain strings.

As a starter, I would suggest you begin with replace, split, and matches, as these are the most commonly used regex functions and won't give us any trouble with types.
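
For illustration only, here is a rough sketch of what reference-free versions of those three functions could look like. The namespace `Example\Regex`, the argument order, and the use of `RuntimeException` are assumptions made for this sketch, not the API that PSL ended up with.

```php
<?php

declare(strict_types=1);

namespace Example\Regex; // hypothetical namespace, not the real PSL one

use RuntimeException;

/**
 * Replace every match of $pattern in $subject (no by-reference $count).
 */
function replace(string $subject, string $pattern, string $replacement): string
{
    // preg_replace() returns null on failure when given string arguments.
    $result = @preg_replace($pattern, $replacement, $subject);
    if (null === $result) {
        throw new RuntimeException('preg_replace failed for pattern ' . $pattern);
    }

    return $result;
}

/**
 * Split $subject on every match of $pattern.
 *
 * @return array<int, string>
 */
function split(string $subject, string $pattern): array
{
    // preg_split() returns false on failure.
    $result = @preg_split($pattern, $subject);
    if (false === $result) {
        throw new RuntimeException('preg_split failed for pattern ' . $pattern);
    }

    return $result;
}

/**
 * Tell whether $pattern matches anywhere in $subject.
 */
function matches(string $subject, string $pattern): bool
{
    // preg_match() returns 1 on a match, 0 on no match, false on failure.
    $result = @preg_match($pattern, $subject);
    if (false === $result) {
        throw new RuntimeException('preg_match failed for pattern ' . $pattern);
    }

    return 1 === $result;
}
```

Each function returns a single, fixed type and reports failure through an exception, so callers never need an out-parameter.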

ddziaduch commented on July 23, 2024

Thanks for the feedback @azjezz :)

veewee commented on July 23, 2024

Besides enhancing the types, I also find it important that the regex functions deal with PCRE errors (invalid regexes).
There is some good stuff in here: https://github.com/spatie/regex - even though I don't always like the API of the package.

More info about dealing with PCRE errors:
https://github.com/spatie/regex/blob/master/src/RegexResult.php#L9-L16
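
One way to surface PCRE errors after a `preg_*` call is to check the return value together with `preg_last_error()`. The sketch below assumes PHP 8 (for `preg_last_error_msg()`); the helper name `match_or_fail` is made up for this example and is not taken from spatie/regex or PSL.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: run preg_match() and turn any PCRE failure into an exception.
 *
 * @return array<array-key, string> the captured groups, empty when nothing matched
 */
function match_or_fail(string $pattern, string $subject): array
{
    $matches = [];
    // Suppress the warning PHP emits for an invalid pattern; we throw instead.
    $result = @preg_match($pattern, $subject, $matches);

    // false signals a failure; preg_last_error() reports runtime errors
    // such as an exhausted backtrack limit.
    if (false === $result || PREG_NO_ERROR !== preg_last_error()) {
        throw new RuntimeException(
            sprintf('PCRE error for "%s": %s', $pattern, preg_last_error_msg())
        );
    }

    return $matches;
}
```

With this shape, "no match" (an empty array) and "broken regex" (an exception) can no longer be confused by the caller.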

@azjezz : I like the Pattern<T>. Can't we provide a class for that, with a simple wrapper function for converting strings into patterns?

azjezz commented on July 23, 2024

mm no, Pattern<T> in Hack is not a class, it's a subtype of string that can be constructed by using re"". For example, $pattern = re"#^(HTTP/)?(?P<version>[1-9]\d*(?:\.\d)?)$#"; is of type Pattern<shape('version' => string, ... )>. At runtime, it is just a string.
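
In PHP terms, the same pattern is an ordinary string, and the `version` key only shows up in the match array at runtime. A small sketch (the subject `HTTP/1.1` is made up for the example):

```php
<?php

declare(strict_types=1);

// The same pattern as above, written as a plain PHP string (no re"" literal).
$pattern = '#^(HTTP/)?(?P<version>[1-9]\d*(?:\.\d)?)$#';

$matches = [];
preg_match($pattern, 'HTTP/1.1', $matches);

// A named group is exposed both under its name and under its numeric index.
var_dump($matches['version'] === '1.1'); // bool(true)
var_dump($matches[2] === '1.1');         // bool(true)
```

Nothing in the type of `$pattern` tells a static analyser that `version` exists, which is the information Hack's `Pattern<T>` carries.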

veewee commented on July 23, 2024

Would it make sense to make a class for it?
For static analysis, it is an issue to validate what is inside a pattern in order to tell what e.g. the matches array will look like: https://psalm.dev/r/ab3913bfdf

If you add an additional pattern class, you can make psalm type-safe: https://psalm.dev/r/b91bfe654b

Downside : you still need to type the pattern in a very manual way in your applications:

/** @var Pattern<array{ 0: string, 1: string, word: string, 2: string}> $pattern */
$pattern = new Pattern('/([H])(?P<word>ello)/');

Meaning you could drop the pattern class and might as well type the resulting array-shape instead.
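
Typing the result instead of the pattern might look like the sketch below. The `@var` shape is written by hand and nothing checks it against the regex, which is the gap a plugin would have to close.

```php
<?php

declare(strict_types=1);

$matches = [];
if (1 !== preg_match('/([H])(?P<word>ello)/', 'Hello', $matches)) {
    throw new RuntimeException('no match, or invalid pattern');
}

// Hand-written shape: it has to be kept in sync with the pattern manually.
/** @var array{0: string, 1: string, word: string, 2: string} $matches */
echo $matches[1], ' ', $matches['word'], PHP_EOL; // prints "H ello"
```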

So maybe it is better to provide a psalm plugin that parses regexes in order to automatically detect the shape of the matching items instead of adding a Pattern class in here ... dunno :)

BTW : match is reserved in PHP 8 🤦

psalm-github-bot commented on July 23, 2024

I found these snippets:

https://psalm.dev/r/ab3913bfdf
<?php


$matches = [];
$a = preg_match('/([H])(?P<word>ello)/', "Hello", $matches);

var_dump($matches[0], $matches['word']);
Psalm output (using commit 7195275):

INFO: PossiblyUndefinedIntArrayOffset - 7:10 - Possibly undefined array offset 'int(0)' is risky given expected type 'array-key'. Consider using isset beforehand.

INFO: PossiblyUndefinedStringArrayOffset - 7:23 - Possibly undefined array offset 'string(word)' is risky given expected type 'array-key'. Consider using isset beforehand.

ERROR: ForbiddenCode - 7:1 - Unsafe var_dump

INFO: UnusedVariable - 5:1 - Variable $a is never referenced
https://psalm.dev/r/b91bfe654b
<?php

/**
 * @template MatchingShape
 */
class Pattern
{
    private string $pattern;

    public function __construct(string $pattern)
    {
        $this->pattern = $pattern;
    }

    public function toString(): string
    {
        return $this->pattern;
    }
}


/**
 * @template Shape of array
 *
 * @param Pattern<Shape> $pattern
 * @param string $subject
 *
 * @return Shape
 */
function regex_match(Pattern $pattern, string $subject): array
{
    $matches = [];
    if (false === preg_match($pattern->toString(), $subject, $matches)) {
        throw new \RuntimeException('invalid pattern ... include pcre error info here');
    }

    /** @var Shape $matches */
    return $matches;
}



/** @var Pattern<array{ 0: string, 1: string, word: string, 2: string}> $pattern */
$pattern = new Pattern('/([H])(?P<word>ello)/');
$result = regex_match($pattern, 'Hello');

var_dump(
    $result[1],
    $result['word']
);
Psalm output (using commit 7195275):

ERROR: ForbiddenCode - 47:1 - Unsafe var_dump
