Scorers
A scorer decides how the similarity score for each pair of records is calculated.
- exception datamatch.scorers.RefuseToScoreException
Raise this exception to delegate scoring to a parent object.
- class datamatch.scorers.BaseScorer
Base class for all scorer classes.
Subclasses should implement the score() method.
- abstract score(a, b)
Returns similarity score (0 <= sim <= 1) for a pair of records.
- Parameters
a (pandas.Series) – The left record.
b (pandas.Series) – The right record.
- Returns
Similarity score.
- Return type
float
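The subclassing contract described above can be sketched as follows. This is a minimal illustration, not the library's code: `SketchBaseScorer` is a hypothetical stand-in for `BaseScorer`, `ExactNameScorer` is an invented example subclass, and plain dicts stand in for `pandas.Series` records so the snippet runs without third-party packages.

```python
from abc import ABC, abstractmethod


class SketchBaseScorer(ABC):
    """Hypothetical stand-in for datamatch.scorers.BaseScorer."""

    @abstractmethod
    def score(self, a, b):
        """Return a similarity score in [0, 1] for records a and b."""


class ExactNameScorer(SketchBaseScorer):
    """Invented example: 1.0 when the 'name' fields match, else 0.0."""

    def score(self, a, b):
        return 1.0 if a["name"] == b["name"] else 0.0


# Plain dicts stand in for pandas.Series records (an assumption of this sketch).
left = {"name": "ann", "city": "oslo"}
right = {"name": "ann", "city": "bergen"}

scorer = ExactNameScorer()
same_name_score = scorer.score(left, right)
other_score = scorer.score(left, {"name": "bob", "city": "oslo"})
```

With the real library, the subclass would presumably derive from `datamatch.scorers.BaseScorer` and receive `pandas.Series` rows instead of dicts.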
- class datamatch.scorers.SimSumScorer(fields)
Returns the sum of similarity values of all fields.
- Parameters
fields (dict of similarity classes) – A mapping from field name to the similarity class to use for that field.
- class datamatch.scorers.AbsoluteScorer(column_name, score, ignore_key_error=False)
Returns a fixed, caller-supplied score if both records have the same value for a column.
If the values are not equal or one of them is null, this scorer raises RefuseToScoreException. Therefore, this class should never be used on its own but always wrapped in either MaxScorer or MinScorer.
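To make the delegation idea concrete, here is a minimal sketch of how a refusing scorer might cooperate with a max-style wrapper. Every class below is a hypothetical stand-in written for illustration. In particular, the assumption that the wrapper skips children that refuse, and itself refuses when no child produces a score, is a guess and not something the text above confirms. Dicts stand in for `pandas.Series` records.

```python
class RefuseToScore(Exception):
    """Stand-in for datamatch.scorers.RefuseToScoreException."""


class SketchAbsoluteScorer:
    """Returns a fixed score when both records share a non-null column value."""

    def __init__(self, column_name, score):
        self.column_name = column_name
        self.fixed_score = score

    def score(self, a, b):
        left, right = a.get(self.column_name), b.get(self.column_name)
        if left is None or right is None or left != right:
            raise RefuseToScore()  # hand the decision to the wrapping scorer
        return self.fixed_score


class ConstantScorer:
    """Invented fallback scorer that always returns the same value."""

    def __init__(self, value):
        self.value = value

    def score(self, a, b):
        return self.value


class SketchMaxScorer:
    """Takes the maximum over the children that agree to score."""

    def __init__(self, scorers):
        self.scorers = scorers

    def score(self, a, b):
        scores = []
        for child in self.scorers:
            try:
                scores.append(child.score(a, b))
            except RefuseToScore:
                continue  # assumed behaviour: ignore children that refuse
        if not scores:
            raise RefuseToScore()
        return max(scores)


combined = SketchMaxScorer([SketchAbsoluteScorer("id", 1.0), ConstantScorer(0.4)])
same_id = combined.score({"id": 7}, {"id": 7})
different_id = combined.score({"id": 7}, {"id": 8})
null_id = combined.score({"id": None}, {"id": None})
```

Under these assumptions, a pair with matching identifiers is forced to the fixed score, while every other pair falls back to whatever the remaining child scorers return.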
- class datamatch.scorers.MaxScorer(scorers)
Returns the maximum of the scores of all child scorers.
- Parameters
scorers (list of BaseScorer subclasses) – The child scorers.
- class datamatch.scorers.MinScorer(scorers)
Returns the minimum of the scores of all child scorers.
- Parameters
scorers (list of BaseScorer subclasses) – The child scorers.
- class datamatch.scorers.AlterScorer(scorer, values, alter)
Modifies the score for pairs whose records share the same given value.
- Parameters
scorer (BaseScorer subclass) – The wrapped scorer.
values (pandas.Series) – For each pair, if the indices of both rows are present in this series and their values are equal, alter is called to modify the final score.
alter (Callable[[int], int]) – Callback that modifies the final score.
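The wrapping behaviour described above might look roughly like the sketch below. It is an illustration under stated assumptions, not the library's implementation: `Record` is a hypothetical stand-in for a `pandas.Series` row whose `name` attribute plays the role of the row's index label, a plain dict stands in for the `values` series, and `FixedScorer` is an invented wrapped scorer.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Stand-in for a pandas.Series row; `name` mimics the row's index label."""
    name: int
    data: dict = field(default_factory=dict)


class FixedScorer:
    """Invented wrapped scorer that always returns 0.8."""

    def score(self, a, b):
        return 0.8


class SketchAlterScorer:
    """Applies `alter` to the wrapped score when both rows share a value."""

    def __init__(self, scorer, values, alter):
        self.scorer = scorer
        self.values = values  # mapping: row label -> grouping value
        self.alter = alter

    def score(self, a, b):
        base = self.scorer.score(a, b)
        both_present = a.name in self.values and b.name in self.values
        if both_present and self.values[a.name] == self.values[b.name]:
            return self.alter(base)
        return base


# Rows 1 and 2 share a household; row 4 is absent from the mapping.
households = {1: "household_a", 2: "household_a", 3: "household_b"}
scorer = SketchAlterScorer(FixedScorer(), households, lambda s: s / 2)

shared = scorer.score(Record(1), Record(2))
not_shared = scorer.score(Record(1), Record(3))
missing = scorer.score(Record(1), Record(4))
```

Here the callback halves the score for pairs in the same household, which is one plausible use: penalising matches between records already known to belong together for another reason.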
- class datamatch.scorers.FuncScorer(cb)
Scores pairs by calling the given callback.
- Parameters
cb (Callable[[pandas.Series, pandas.Series], float]) – Callback that calculates the score.
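A callback-based scorer of this kind can be sketched as below. `SketchFuncScorer` is a hypothetical stand-in for `FuncScorer`, `shared_token_ratio` is an invented callback, and dicts again replace `pandas.Series` records so the snippet has no third-party dependencies.

```python
class SketchFuncScorer:
    """Hypothetical stand-in: delegates scoring to a user-supplied callback."""

    def __init__(self, cb):
        self.cb = cb

    def score(self, a, b):
        return self.cb(a, b)


def shared_token_ratio(a, b):
    """Jaccard overlap of the lower-cased, whitespace-separated 'name' tokens."""
    left = set(a["name"].lower().split())
    right = set(b["name"].lower().split())
    if not left or not right:
        return 0.0
    return len(left & right) / len(left | right)


scorer = SketchFuncScorer(shared_token_ratio)
ratio = scorer.score({"name": "Ann Lee"}, {"name": "ann marie lee"})
empty = scorer.score({"name": ""}, {"name": "ann"})
```

Because the callback returns a ratio of set sizes, its result stays within the 0 to 1 range expected of a similarity score.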