Scorers
A scorer decides how the similarity score for each pair of records is calculated.
- exception datamatch.scorers.RefuseToScoreException
Raise this exception to delegate scoring to a parent object.
- class datamatch.scorers.BaseScorer
Base class for all scorer classes.
Subclasses should implement the score() method.
- abstract score(a, b)
Returns the similarity score (0 <= sim <= 1) for a pair of records.
- Parameters
a (pandas.Series) – The left record.
b (pandas.Series) – The right record.
- Returns
Similarity score.
- Return type
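As a rough illustration of the subclassing contract described above, the sketch below defines a stand-in abstract base and one concrete scorer. `BaseScorerSketch` and `SameSurnameScorer` are hypothetical names, not part of datamatch, and plain dicts stand in for `pandas.Series` records so the example runs without any dependencies.

```python
from abc import ABC, abstractmethod


class BaseScorerSketch(ABC):
    """Stand-in for datamatch.scorers.BaseScorer (not the library class)."""

    @abstractmethod
    def score(self, a, b):
        """Return a similarity score between 0 and 1 for records a and b."""


class SameSurnameScorer(BaseScorerSketch):
    """Hypothetical scorer: 1.0 when surnames match ignoring case, else 0.0."""

    def score(self, a, b):
        return 1.0 if a["surname"].lower() == b["surname"].lower() else 0.0


# Dicts are used here in place of pandas.Series; both support record["field"].
left = {"surname": "Nguyen"}
right = {"surname": "NGUYEN"}
same_surname = SameSurnameScorer().score(left, right)
print(same_surname)  # 1.0
```

With the real library, the subclass would presumably derive from `BaseScorer` itself and receive `pandas.Series` records.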
- class datamatch.scorers.SimSumScorer(fields)
Returns the sum of similarity values of all fields.
- Parameters
fields (dict of similarity classes) – The mapping between field name and similarity class to use.
- class datamatch.scorers.AbsoluteScorer(column_name, score, ignore_key_error=False)
Returns an arbitrary score if both records have the same value for a column.
If the values are not equal or one of them is null, this scorer raises RefuseToScoreException. Therefore, this class should never be used on its own; always wrap it in either MaxScorer or MinScorer.
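The delegation pattern can be sketched with stand-in classes. Nothing below is the datamatch implementation: `RefuseToScore`, `AbsoluteSketch`, `ConstantSketch` and `MaxSketch` are hypothetical names, records are plain dicts rather than `pandas.Series`, and the parent is assumed to skip any child that refuses to score.

```python
class RefuseToScore(Exception):
    """Stand-in for datamatch.scorers.RefuseToScoreException."""


class AbsoluteSketch:
    """Returns a fixed score when both records share a non-null column value."""

    def __init__(self, column_name, score):
        self.column_name = column_name
        self.value = score

    def score(self, a, b):
        va, vb = a.get(self.column_name), b.get(self.column_name)
        if va is None or vb is None or va != vb:
            raise RefuseToScore()  # delegate the decision to the parent
        return self.value


class ConstantSketch:
    """Hypothetical fallback child that always returns the same score."""

    def __init__(self, value):
        self.value = value

    def score(self, a, b):
        return self.value


class MaxSketch:
    """Assumed parent behaviour: ignore refusing children, take the max."""

    def __init__(self, scorers):
        self.scorers = scorers

    def score(self, a, b):
        scores = []
        for child in self.scorers:
            try:
                scores.append(child.score(a, b))
            except RefuseToScore:
                continue
        if not scores:
            raise RefuseToScore()
        return max(scores)


scorer = MaxSketch([AbsoluteSketch("ssn", 1.0), ConstantSketch(0.4)])
print(scorer.score({"ssn": "123"}, {"ssn": "123"}))  # 1.0
print(scorer.score({"ssn": "123"}, {"ssn": "999"}))  # 0.4
```

When the column values match, the fixed score wins; otherwise the absolute child refuses and the fallback child's score is used.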
- class datamatch.scorers.MaxScorer(scorers)
Returns the max value from the scores of all child scorers.
- Parameters
scorers (list of BaseScorer subclasses) – The child scorers.
- class datamatch.scorers.MinScorer(scorers)
Returns the min value from the scores of all child scorers.
- Parameters
scorers (list of BaseScorer subclasses) – The child scorers.
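To make the difference between the two aggregating scorers concrete, here is a minimal sketch that applies both reductions to the same children. `FixedScorer`, `max_score` and `min_score` are hypothetical helpers written for this illustration, not datamatch functions, and the handling of refused scores is left out for brevity.

```python
class FixedScorer:
    """Hypothetical child scorer that always returns the same value."""

    def __init__(self, value):
        self.value = value

    def score(self, a, b):
        return self.value


def max_score(scorers, a, b):
    """Highest score among the children, as MaxScorer is described to return."""
    return max(child.score(a, b) for child in scorers)


def min_score(scorers, a, b):
    """Lowest score among the children, as MinScorer is described to return."""
    return min(child.score(a, b) for child in scorers)


children = [FixedScorer(0.3), FixedScorer(0.8)]
record_a, record_b = {}, {}  # contents are irrelevant for fixed scorers
highest = max_score(children, record_a, record_b)
lowest = min_score(children, record_a, record_b)
print(highest, lowest)  # 0.8 0.3
```

Taking the max lets any single confident child decide the result, while taking the min requires every child to agree before a pair scores highly.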