What happens? Hello, I am using splink to link two datasets, using

M values aren't trained for a column about splink HOT 3 CLOSED

lamaeldo commented on September 26, 2024

M values aren't trained for a column

Comments (3)

ADBond commented on September 26, 2024

The condition used to determine whether or not parameters are estimated for a comparison is whether any of the data columns used in its comparison levels also appear in the blocking rule used for that training session.

In your case, the sname comparison makes reference to the columns sex and mar, which also appear in your training blocking rules, and so this comparison cannot be estimated. To train the parameters for the sname comparison you will need to use a blocking rule that does not use any of the columns sname, sex, or mar, as these are the columns that the sname comparison depends on.
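To make the rule above concrete, here is a minimal plain-Python sketch of the check. It is not Splink's actual implementation. The function name `can_estimate` and the `fname` comparison are hypothetical and only there for illustration; the `sname` comparison and the `sex` / `mar` blocking columns are the ones from this thread.

```python
def can_estimate(comparison_columns, blocking_columns):
    """Return True if no column used by the comparison appears in the blocking rule."""
    return set(comparison_columns).isdisjoint(blocking_columns)


# Columns each comparison depends on.
# "sname" is taken from this thread; "fname" is a hypothetical second comparison.
comparisons = {
    "sname": ["sname", "sex", "mar"],
    "fname": ["fname"],
}

# Columns used by the training blocking rule described in this thread.
blocking_columns = ["sex", "mar"]

estimable = {
    name: can_estimate(cols, blocking_columns)
    for name, cols in comparisons.items()
}
print(estimable)  # {'sname': False, 'fname': True}
```

Under this sketch, swapping the blocking rule for one that avoids `sname`, `sex` and `mar` would make `can_estimate` return True for the `sname` comparison.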

The match weights chart (and the m and u parameters chart) will show the default m-values for any comparison that has no trained values associated with it, so those are probably what you are seeing there.

The parameter estimates chart should not show default values, and should only be displaying values that are estimated from training sessions (expectation maximisation or estimate u from random sampling) - if you do have m-values appearing there for sname, would you be able to upload an image of it?
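The difference between the two kinds of chart can be pictured with a small plain-Python sketch. This is only an illustration of the behaviour described above, not how Splink builds its charts. The function names are hypothetical, and all numbers are placeholders rather than Splink's real defaults or real estimates.

```python
# Placeholder m-values per comparison level; NOT Splink's real defaults.
default_m = {"sname": [0.9, 0.1], "fname": [0.9, 0.1]}

# Placeholder trained values; "sname" is absent because it was never trained.
trained_m = {"fname": [0.8, 0.2]}


def match_weights_view(defaults, trained):
    """Show every comparison, falling back to defaults where nothing was trained."""
    return {name: trained.get(name, default) for name, default in defaults.items()}


def parameter_estimates_view(trained):
    """Show only comparisons that have values from a training session."""
    return dict(trained)


print(match_weights_view(default_m, trained_m))
print(parameter_estimates_view(trained_m))
```

In the first view `sname` still appears (with default values), while in the second it is missing altogether, which matches the symptom described in this issue.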

lamaeldo commented on September 26, 2024

Thanks both for the replies, this solves it. @ADBond apologies, there were indeed no values shown for sname in parameter_estimate_comparisons_chart()

RobinL commented on September 26, 2024

I think possibly the distinction here is whether you're displaying from linker.match_weights_chart() (which iirc does display default values) or the charts returned by the training session:

training_session = linker.estimate_parameters_using_expectation_maximisation(block_on(["first_name"]))
training_session.match_weights_interactive_history_chart()

(which shouldn't)

I admit, it's a bit confusing that linker.match_weights_chart() shows default values, we should probably improve that somehow!
