Comments (3)
The condition that determines whether parameters are estimated for a comparison is whether any of the data columns used in its comparison levels also appear in the training blocking rule.
In your case, the `sname` comparison makes reference to the columns `sex` and `mar`, which also appear in your training blocking rules, and so this comparison cannot be estimated. To train the parameters for the `sname` comparison you will need to use a blocking rule that does not use any of the columns `sname`, `sex`, or `mar`, as these are the columns that the `sname` comparison depends on.
The match weight chart (and the m u parameters chart) will show the default m-values for any comparison that has no trained values associated with it, so those will probably be what you are seeing there.
The parameter estimates chart should not show default values, and should only display values estimated from training sessions (expectation maximisation or estimate u from random sampling). If you do have m-values appearing there for `sname`, would you be able to upload an image of it?
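To make the rule above concrete, here is a minimal sketch in plain Python. It is an illustration of the condition as described in this comment, not Splink's actual implementation: the helper `trainable_comparisons` is hypothetical, and the `fname` and `dob` comparisons are made-up examples (only `sname`, `sex`, and `mar` come from the thread).

```python
def trainable_comparisons(comparison_columns, blocking_columns):
    """Return the names of comparisons that share no column with the blocking rule.

    Hypothetical helper for illustration only; this is not part of Splink.
    """
    blocked = set(blocking_columns)
    return [
        name
        for name, columns in comparison_columns.items()
        if blocked.isdisjoint(columns)
    ]


# Columns each comparison depends on. `sname` is taken from the thread;
# `fname` and `dob` are assumed extra comparisons.
comparison_columns = {
    "sname": {"sname", "sex", "mar"},
    "fname": {"fname"},
    "dob": {"dob"},
}

# Blocking on sex and mar: `sname` depends on both, so it is skipped.
print(trainable_comparisons(comparison_columns, {"sex", "mar"}))  # ['fname', 'dob']

# Blocking on dob: `sname` can now be trained, but `dob` cannot.
print(trainable_comparisons(comparison_columns, {"dob"}))  # ['sname', 'fname']
```

Under this reading, covering every comparison takes at least two training sessions whose blocking rules use different columns, so that each comparison is left out of at least one rule.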
Thanks both for the replies, this solves it. @ADBond apologies, there were indeed no values shown for `sname` in `parameter_estimate_comparisons_chart()`.
I think possibly the distinction here is whether you're displaying from `linker.match_weights_chart()` (which iirc does display default values) or the charts returned by the training session:
training_session = linker.estimate_parameters_using_expectation_maximisation(block_on(["first_name"]))
training_session.match_weights_interactive_history_chart()
(which shouldn't).
I admit, it's a bit confusing that `linker.match_weights_chart()` shows default values; we should probably improve that somehow!