Comments (6)

jpcampb2 avatar jpcampb2 commented on June 30, 2024

@abegong: Looking at this with some mixed feelings.

I just created a PR to do it as it stands now, but my concern is that those regexes might get too closely identified with GE and/or be hard to version. Certainly many of them could be better than they are now, and we'll want to be able to improve them without breaking people's expectations later on.

Adding a GE version to an expectation config may address that.

from great_expectations.

abegong avatar abegong commented on June 30, 2024

Instead of including named regexes as first-class citizens, we're going to make it simple to include named regexes (and other variables) in an expectation_config "meta" object.

Proposed API:

Suppose meta looks like this:

{
  "__version__" : "0.2.3",
  "regexes" : {
    "namelike" : "^[A-Z][a-z]+$",
    "leading_or_trailing_whitespace" : "^\\s|\\s$",
    "double_vowels" : [
      "aa", "ee", "ii", "oo", "uu", "yy"
    ]
  }
}

...then these are all valid function calls:

expect_column_values_to_match_regex("my_column", {"$meta_var" : "regexes.namelike"})
expect_column_values_to_not_match_regex("my_column", {"$meta_var" : "regexes.leading_or_trailing_whitespace"})
expect_column_values_to_match_regex_list("my_column", {"$meta_var" : "regexes.double_vowels"}, "any")

# Match against 'aa'
expect_column_values_to_match_regex("my_column", {"$meta_var" : "regexes.double_vowels.0"})

# Match against 'aa' or 'yy'
expect_column_values_to_match_regex_list("my_column", [{"$meta_var" : "regexes.double_vowels.0"}, {"$meta_var" : "regexes.double_vowels.5"}], "any")

At runtime, each $meta_var expression is evaluated by traversing the meta object's tree. The syntax is roughly equivalent to MongoDB's dot notation for querying nested documents, where a numeric segment such as .0 indexes into an array. (MongoDB also supports matching on multiple criteria, which we don't need here.)
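A minimal sketch of that traversal, for illustration only. The function name resolve_meta_var is hypothetical and not part of great_expectations; the sketch assumes dictionary keys never contain a dot and that numeric segments are only used to index lists.

```python
def resolve_meta_var(meta, path):
    """Walk `meta` along a dotted path such as "regexes.double_vowels.0"."""
    node = meta
    for part in path.split("."):
        if isinstance(node, list):
            # Numeric segments index into lists, as in MongoDB dot notation.
            node = node[int(part)]
        else:
            node = node[part]
    return node


meta = {
    "regexes": {
        "namelike": "^[A-Z][a-z]+$",
        "double_vowels": ["aa", "ee", "ii", "oo", "uu", "yy"],
    }
}

print(resolve_meta_var(meta, "regexes.namelike"))         # ^[A-Z][a-z]+$
print(resolve_meta_var(meta, "regexes.double_vowels.5"))  # yy
```

A missing key or out-of-range index raises KeyError or IndexError here; a real implementation would probably want a clearer error that names the offending path.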

In the config, these will create entries like:

{
  "expectation_type" : "expect_column_values_to_match_regex",
  "kwargs" : {
    "column" : "my_column",
    "regex" : {"$meta_var" : "regexes.namelike"}
  }
}

{
  "expectation_type" : "expect_column_values_to_not_match_regex",
  "kwargs" : {
    "column" : "my_column",
    "regex" : {"$meta_var" : "regexes.leading_or_trailing_whitespace"}
  }
}

{
  "expectation_type" : "expect_column_values_to_match_regex_list",
  "kwargs" : {
    "column" : "my_column",
    "regex_list" : {"$meta_var" : "regexes.double_vowels"},
    "match_on" : "any"
  }
}

{
  "expectation_type" : "expect_column_values_to_match_regex",
  "kwargs" : {
    "column" : "my_column",
    "regex" : {"$meta_var" : "regexes.double_vowels.0"}
  }
}

{
  "expectation_type" : "expect_column_values_to_match_regex_list",
  "kwargs" : {
    "column" : "my_column",
    "regex_list" : [
      {"$meta_var" : "regexes.double_vowels.0"},
      {"$meta_var" : "regexes.double_vowels.5"}
    ],
    "match_on" : "any"
  }
}

Note: all of these examples are given in terms of regexes, but this pattern can be applied to virtually any expectation argument.

Therefore, I think the most straightforward way to implement this proposal is by adding logic to @expectation which will iterate over all args and kwargs to find instances of {"$meta_var": ...} and perform the appropriate substitution.
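Roughly, that substitution step could look like the sketch below. Everything here is an assumption made for illustration: resolves_meta_vars stands in for the logic that would live inside @expectation, ToyDataset is a toy class rather than a real great_expectations dataset, and the meta object is assumed to be available as self.meta.

```python
import functools

META_KEY = "$meta_var"


def lookup(meta, path):
    """Follow a dotted path through nested dicts and lists."""
    node = meta
    for part in path.split("."):
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node


def substitute(value, meta):
    """Recursively replace {"$meta_var": path} dicts with the value they point to."""
    if isinstance(value, dict):
        if set(value) == {META_KEY}:
            return lookup(meta, value[META_KEY])
        return {k: substitute(v, meta) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return type(value)(substitute(v, meta) for v in value)
    return value


def resolves_meta_vars(func):
    """Hypothetical stand-in for the substitution step inside @expectation."""
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        args = substitute(args, self.meta)
        kwargs = substitute(kwargs, self.meta)
        return func(self, *args, **kwargs)
    return wrapper


class ToyDataset:
    """Toy class used only to show the decorator in action."""

    def __init__(self, meta):
        self.meta = meta

    @resolves_meta_vars
    def expect_column_values_to_match_regex_list(self, column, regex_list, match_on="any"):
        # Return the resolved arguments so the substitution is visible.
        return column, regex_list, match_on


ds = ToyDataset({"regexes": {"double_vowels": ["aa", "ee", "ii", "oo", "uu", "yy"]}})
result = ds.expect_column_values_to_match_regex_list(
    "my_column",
    [{"$meta_var": "regexes.double_vowels.0"}, {"$meta_var": "regexes.double_vowels.5"}],
    "any",
)
print(result)  # ('my_column', ['aa', 'yy'], 'any')
```

The sketch only resolves values before the wrapped function runs. Keeping the unresolved {"$meta_var": ...} reference in the saved config, as the entries above show, would need the decorator to record the original arguments separately.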

jpcampb2 avatar jpcampb2 commented on June 30, 2024

Thanks for capturing this discussion! One small tweak: rather than __version__, what about ge_version? That avoids confusion about name mangling (since it's a dict key, not a class variable, anyway) and also confusion about whether it's a version of the config (it's not) or a version of the GE library (it is).

abegong avatar abegong commented on June 30, 2024

How about we remove all ambiguity and go with great_expectations.__version__?

That's explicit enough that a developer encountering a config for the first time (with no background in great_expectations at all) could still Google and figure it out.

jpcampb2 avatar jpcampb2 commented on June 30, 2024

Sounds good to me.

jpcampb2 avatar jpcampb2 commented on June 30, 2024

Confirming a discussion we had here:

  • We want to build a mechanism that makes it clear when access to the meta object is explicit and when it is implicit.
