add fn:match-groups() function (qtspecs, open issue)

liamquin commented on August 17, 2024
add fn:match-groups() function

from qtspecs.

Comments (7)

michaelhkay commented on August 17, 2024

When you compare a string against a regex there are in general multiple matches, each one of which has multiple captured groups. These days we could represent the full complexity of the result using maps and arrays, but back in the day we chose to represent it using XML (see fn:analyze-string). It's not clear to me that the result using maps and arrays would be more usable than the result represented as XML, and in fact delivering the result in the form of the original string augmented with markup has some real benefits.
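To make the shape of the "maps and arrays" alternative concrete, here is a rough sketch in Python (chosen only because it is easy to run; nothing here is proposed XPath syntax). The function name `all_matches` and the keys `"match"` and `"groups"` are invented for illustration, and Python's regex dialect stands in for the XPath one.

```python
import re

def all_matches(text: str, pattern: str) -> list[dict]:
    """Return one record per match; each record lists that match's captured groups."""
    return [
        {"match": m.group(0), "groups": list(m.groups())}
        for m in re.finditer(pattern, text)
    ]

result = all_matches("2010-10-10 and 2024-08-17", r"(\d{4})-(\d{2})-(\d{2})")
# result[0] == {"match": "2010-10-10", "groups": ["2010", "10", "10"]}
# result[1] == {"match": "2024-08-17", "groups": ["2024", "08", "17"]}
```

Note that this shape drops the non-matching text (" and " here), which fn:analyze-string keeps in fn:non-match elements.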

liamquin commented on August 17, 2024

Maybe fn:analyze-string() is enough. It’s true that the XML markup could be extended in the ways i mentioned, if needed, and also that being able to use *:match makes using analyze-string a little easier. However, in practice i’ll continue to use a wrapper for getting at just the matched groups.

ChristianGruen commented on August 17, 2024

It would be nice indeed to have a more lightweight alternative to fn:analyze-string (which people struggle with). The function provides much more functionality than what most users need for everyday tasks. We could keep it simple, do what (I believe) most other languages do and return a flat sequence, for example ("ab", "b") for match-groups('xabx', '(a(b))')?

If #37 is not dropped, we could then do:

let $date := "2010-10-10"
let $pattern := "^(\d{4})-(\d{2})-(\d{2})$"
let ($year, $month, $day) := match-groups($date, $pattern)

Currently, it would e.g. be:

let $groups := data(analyze-string($date, $pattern)/fn:match/fn:group)
let $year := $groups[1]
let $month := $groups[2]
let $day := $groups[3]
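For comparison, this is roughly what the "flat sequence" behaviour looks like in one other language, Python. It is a sketch, not a definition of match-groups: the Python function name `match_groups` is hypothetical, and returning an empty tuple when nothing matches and "" for a group that did not participate are assumptions that the thread has not settled.

```python
import re

def match_groups(text: str, pattern: str) -> tuple[str, ...]:
    """Captured groups of the first match, ordered by opening parenthesis."""
    m = re.search(pattern, text)
    if m is None:
        return ()  # assumption: no match yields an empty result
    # Groups that did not participate come back as None; map them to "".
    return tuple(g if g is not None else "" for g in m.groups())

print(match_groups("xabx", "(a(b))"))  # ('ab', 'b')

# Destructuring, analogous to: let ($year, $month, $day) := ...
year, month, day = match_groups("2010-10-10", r"^(\d{4})-(\d{2})-(\d{2})$")
```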

liamquin commented on August 17, 2024

@ChristianGruen very much so. Although maybe we ended up with a "with prefix" expression (i lost track, i know i raised it as an issue!) in which case, in environments without fn predeclared,

let ($year, $month, $day) := (with prefix "fn" := "http://www.w3.org/2005/xpath-functions" return analyze-string($input, $regex)/fn:match/fn:group/string())

but this is not as nice.

benibela commented on August 17, 2024

I have an extract function in Xidel that returns only the matched text; the third parameter lets one choose which capture groups are returned.
(I always thought it would be faster to return only the data one needs.)

extract(
"It was in the January of 1836 that she set out.",
"(January|February|March|April...).*(\d\d\d\d)"
)

returns "January of 1836"

extract(
"It was in the January of 1836 that she set out.",
"(January|February|March|April...).*(\d\d\d\d)"
, 1 to 2)

returns ("January", "1836")

extract(
"It was in the January of 1836 that she set out.",
"(January|February|March|April...).*(\d\d\d\d)"
, (0,1,2,2,2,2))

returns ("January of 1836", "January", "1836", "1836", "1836", "1836")

extract(
"It was in the January of 1836 that she set out.",
"(\w+)"
, 1)

returns "It"

And my function can also return all matches together in a sequence:

extract(
"It was in the January of 1836 that she set out.",
"(\w+)"
, 1, "*")

returns ("It", "was", "in", "the", "January", "of", "1836", "that", "she", "set", "out")

extract(
"It was in the January of 1836 that she set out.",
"(\w)(\w+)"
, (1,2), "*")

returns ("I", "t", "w", "as", "i", "n", "t", "he", "J", "anuary", "o", "f", "1", "836", "t", "hat", "s", "he", "s", "et", "o", "ut")

liamquin commented on August 17, 2024

@benibela great to see you here. I think your function would be fine, although maybe 0 or -1 would be better than "*", so that the argument could be specified as xs:integer.

benibela commented on August 17, 2024

> think your function would be fine, although maybe 0 or -1 would be better than "*", so that the argument could be specified as xs:integer

There is more to it.

All regex functions have a flags parameter, e.g. "i" for case-insensitive matching.

That is where I put the * option. With "i*", for example, it returns all matches, case-insensitively.
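The behaviour described for extract in this thread (group selection through the third argument, "*" in the flags string for all matches) can be modelled roughly as follows. This is a Python sketch under stated assumptions, not Xidel's implementation: it uses Python's regex dialect, recognises only the "i" and "*" flags, maps non-participating groups to "", and always returns a list where an XPath sequence of one item would be a single string.

```python
import re
from typing import Iterable, Union

def extract(text: str, pattern: str,
            groups: Union[int, Iterable[int]] = 0, flags: str = "") -> list[str]:
    """Rough model of the extract behaviour described in this thread."""
    wanted = [groups] if isinstance(groups, int) else list(groups)
    re_flags = re.IGNORECASE if "i" in flags else 0
    if "*" in flags:
        # "*" asks for every match, not just the first one.
        matches = list(re.finditer(pattern, text, re_flags))
    else:
        first = re.search(pattern, text, re_flags)
        matches = [first] if first else []
    # Group 0 is the whole match; a group that did not participate becomes "".
    return [m.group(g) or "" for m in matches for g in wanted]

sentence = "It was in the January of 1836 that she set out."
months = r"(January|February|March|April...).*(\d\d\d\d)"

print(extract(sentence, months, range(1, 3)))   # ['January', '1836']
print(extract(sentence, "it|SHE", 0, "i*"))     # ['It', 'she']
```

Keeping the all-matches switch inside the flags string leaves the group argument free to be typed as a sequence of integers.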
