Comments (3)
I agree with both points (avoiding `super()`, and that there should maybe be an intermediate class).
As @ThomasHepworth has just pointed out, there are potential additions to `InputColumn`, such as to_date, regex extract and lower, that would make it even more complex (and hence harder for end users to use).
from splink.
I think this makes sense - not sure I was too convinced at any point that we needed `col_name` + `input_column` in general, especially as it only really fits for one-column levels.
As you say, I think the fact is that users need to know (at least a little) about how `InputColumn` works anyway to form the SQL string, so why not just get them to use this?
There is also maybe something conceptually nice about not having to keep any data in the base class - each subclass can just define what it needs, and not have to worry about passing things to `super().__init__()`.
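To make the point about a data-free base class concrete, here is a minimal sketch. All class and method names below (`LevelBaseSketch`, `ExactMatchSketch`, `ColumnsReversedSketch`, `create_sql`) are hypothetical and chosen for illustration; this is not the actual splink API, just one way the idea could look.

```python
from abc import ABC, abstractmethod


class LevelBaseSketch(ABC):
    """Hypothetical base class: it stores no data of its own."""

    @abstractmethod
    def create_sql(self) -> str:
        """Return the SQL condition for this level."""


class ExactMatchSketch(LevelBaseSketch):
    def __init__(self, col_name: str):
        # No super().__init__() call is needed: the base class holds nothing.
        self.col_name = col_name

    def create_sql(self) -> str:
        return f'"{self.col_name}_l" = "{self.col_name}_r"'


class ColumnsReversedSketch(LevelBaseSketch):
    def __init__(self, col_name_1: str, col_name_2: str):
        # A two-column level simply keeps two names; nothing is forced
        # into a single-column slot on the base class.
        self.col_name_1 = col_name_1
        self.col_name_2 = col_name_2

    def create_sql(self) -> str:
        a, b = self.col_name_1, self.col_name_2
        return f'"{a}_l" = "{b}_r" AND "{a}_r" = "{b}_l"'


exact_sql = ExactMatchSketch("first_name").create_sql()
reversed_sql = ColumnsReversedSketch("first_name", "surname").create_sql()
```

In this sketch the one-column and two-column levels each define only the attributes they need, which is the property being argued for above.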
However, maybe there is an argument that we should have people use an intermediate class instead? `InputColumn` has a lot of stuff that goes with it (which is needed by the `Linker`), but which people writing comparison levels almost certainly shouldn't be directly messing around with - I worry that people might get a bit lost in all the methods when they basically only need `name_l` + `name_r`.
It would also mean we are more free to mess around with `InputColumn` without worrying so much about user-impact - in the same way that we are aiming to do with `Comparison` and `ComparisonLevel`.
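As a rough illustration of the intermediate-class idea, the sketch below wraps a richer internal column class and exposes only `name_l` and `name_r`. Both `InternalColumnSketch` and `ColumnExpressionSketch` are hypothetical names invented here, and the quoting shown is an assumption for illustration rather than how `InputColumn` actually behaves.

```python
class InternalColumnSketch:
    """Stand-in for a richer internal column class (hypothetical)."""

    def __init__(self, name: str):
        self.name = name

    def quoted_with_suffix(self, suffix: str) -> str:
        # Assumed quoting style, for illustration only.
        return f'"{self.name}{suffix}"'

    # Imagine many more methods here that only the linker needs.


class ColumnExpressionSketch:
    """Hypothetical thin user-facing wrapper exposing only name_l / name_r."""

    def __init__(self, name: str):
        # The internal object is kept private so it can change freely.
        self._col = InternalColumnSketch(name)

    @property
    def name_l(self) -> str:
        return self._col.quoted_with_suffix("_l")

    @property
    def name_r(self) -> str:
        return self._col.quoted_with_suffix("_r")


col = ColumnExpressionSketch("surname")
sql = f"{col.name_l} = {col.name_r}"
```

Someone writing a comparison level would then only see the two properties they need, while the internal class stays free to change.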