Scorers
A scorer decides how the similarity score for each pair is calculated.
- exception datamatch.scorers.RefuseToScoreException
Raise this to delegate scoring to a parent object.
- class datamatch.scorers.BaseScorer
Base class for all scorer classes.
Subclasses should implement the score() method.
- abstract score(a, b)
Returns similarity score (0 <= sim <= 1) for a pair of records.
- Parameters
a (pandas.Series) – The left record.
b (pandas.Series) – The right record.
- Returns
Similarity score.
- Return type
float
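The subclassing contract described above can be sketched as follows. This is a minimal illustration and not the library's code: the BaseScorer here is a local stand-in, SharedFieldsScorer is a hypothetical subclass, and plain dicts take the place of pandas.Series so the sketch runs with the standard library only.

```python
from abc import ABC, abstractmethod


class BaseScorer(ABC):
    """Local stand-in for datamatch.scorers.BaseScorer (an assumption)."""

    @abstractmethod
    def score(self, a, b) -> float:
        """Return a similarity in [0, 1] for records a and b."""


class SharedFieldsScorer(BaseScorer):
    """Hypothetical scorer: fraction of the listed fields with equal values."""

    def __init__(self, fields):
        self._fields = list(fields)

    def score(self, a, b) -> float:
        matches = sum(1 for f in self._fields if a[f] == b[f])
        return matches / len(self._fields)


# Dicts stand in for pandas.Series; both support lookup by field name.
left = {"first": "ann", "last": "lee", "city": "nola"}
right = {"first": "ann", "last": "lee", "city": "baton rouge"}
scorer = SharedFieldsScorer(["first", "last", "city"])
result = scorer.score(left, right)  # two of three fields agree
```

Dividing by the number of fields keeps the result inside the 0 to 1 range that score() is documented to return.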
- class datamatch.scorers.SimSumScorer(fields)
Returns the sum of similarity values of all fields.
- Parameters
fields (dict of similarity classes) – The mapping between field name and similarity class to use.
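A literal reading of this description can be sketched as below. Everything here is an assumption for illustration: SimSumSketch is a stand-in rather than the library class, the two similarity classes are hypothetical, the sim(a, b) method name is assumed, and dicts replace pandas.Series. The text above does not say whether the library normalizes the sum, so the sketch does not either.

```python
class ExactSimilarity:
    """Hypothetical similarity class: 1.0 when values are equal, else 0.0."""

    def sim(self, a, b) -> float:
        return 1.0 if a == b else 0.0


class PrefixSimilarity:
    """Hypothetical similarity class: common prefix length over longest length."""

    def sim(self, a: str, b: str) -> float:
        longest = max(len(a), len(b))
        if longest == 0:
            return 1.0
        common = 0
        for x, y in zip(a, b):
            if x != y:
                break
            common += 1
        return common / longest


class SimSumSketch:
    """Stand-in scorer: sums one similarity value per configured field."""

    def __init__(self, fields):
        self._fields = fields  # field name -> similarity object

    def score(self, a, b) -> float:
        return sum(s.sim(a[name], b[name]) for name, s in self._fields.items())


left = {"first": "john", "last": "smith"}
right = {"first": "john", "last": "smyth"}
fields = {"first": ExactSimilarity(), "last": PrefixSimilarity()}
total = SimSumSketch(fields).score(left, right)  # 1.0 + 2/5
```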
- class datamatch.scorers.AbsoluteScorer(column_name, score, ignore_key_error=False)
Returns the given score if both records have the same value for a column.
If the values are not equal or one of them is null, this scorer raises RefuseToScoreException. Therefore, this class should never be used on its own but always wrapped in either MaxScorer or MinScorer.
- class datamatch.scorers.MaxScorer(scorers)
Returns the max value from the scores of all child scorers.
- Parameters
scorers (list of BaseScorer subclasses) – The child scorers.
- class datamatch.scorers.MinScorer(scorers)
Returns the min value from the scores of all child scorers.
- Parameters
scorers (list of BaseScorer subclasses) – The child scorers.
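The interplay between an absolute scorer, RefuseToScoreException, and a max wrapper might look roughly like this. The classes are local stand-ins, not the library's implementation. Skipping children that refuse, and re-raising when every child refuses, are assumptions about how the wrapper behaves. The ignore_key_error option is not modeled, and dicts replace pandas.Series.

```python
class RefuseToScore(Exception):
    """Stand-in for datamatch.scorers.RefuseToScoreException."""


class AbsoluteSketch:
    """Returns a fixed score when both records share a non-null column value."""

    def __init__(self, column_name, score):
        self.column_name = column_name
        self.fixed_score = score

    def score(self, a, b):
        left, right = a.get(self.column_name), b.get(self.column_name)
        if left is None or right is None or left != right:
            raise RefuseToScore()  # let the parent scorer decide
        return self.fixed_score


class NameSketch:
    """Hypothetical fallback scorer: 1.0 for identical names, else 0.3."""

    def score(self, a, b):
        return 1.0 if a["name"] == b["name"] else 0.3


class MaxSketch:
    """Takes the max over children, skipping those that refuse (assumed)."""

    def __init__(self, scorers):
        self.scorers = scorers

    def score(self, a, b):
        scores = []
        for child in self.scorers:
            try:
                scores.append(child.score(a, b))
            except RefuseToScore:
                continue  # this child abstained
        if not scores:
            raise RefuseToScore()
        return max(scores)


a = {"name": "ann lee", "ssn": "123"}
b = {"name": "anne lee", "ssn": "123"}
c = {"name": "anne lee", "ssn": "999"}
scorer = MaxSketch([AbsoluteSketch("ssn", 1.0), NameSketch()])
same_ssn = scorer.score(a, b)  # absolute scorer applies
diff_ssn = scorer.score(a, c)  # absolute scorer refuses, fallback is used
```

A min wrapper would follow the same pattern with min in place of max, which suits an absolute scorer that caps a pair's score instead of boosting it.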
- class datamatch.scorers.AlterScorer(scorer, values, alter)
Modifies the score for pairs with the same given values.
- Parameters
scorer (BaseScorer subclass) – The wrapped scorer.
values (pandas.Series) – For each pair, if both rows have an index in this series and their values are the same, then alter is called to modify the final score.
alter (Callable[[int], int]) – Callback to modify the final score.
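The lookup-then-alter behavior described above could be sketched as follows. This is an illustration under stated assumptions, not the library's code: Row is a hypothetical record type whose name attribute plays the role of a pandas row label, a dict stands in for the values series, and ConstantSketch is a made-up wrapped scorer.

```python
from dataclasses import dataclass, field


@dataclass
class Row:
    """Hypothetical record; `name` mimics the row's index label."""
    name: int
    fields: dict = field(default_factory=dict)


class ConstantSketch:
    """Made-up wrapped scorer that always returns the same score."""

    def __init__(self, value):
        self.value = value

    def score(self, a, b):
        return self.value


class AlterSketch:
    """Applies `alter` when both rows are in `values` with equal values."""

    def __init__(self, scorer, values, alter):
        self.scorer = scorer
        self.values = values  # row label -> value (stand-in for a Series)
        self.alter = alter

    def score(self, a, b):
        sim = self.scorer.score(a, b)
        both_present = a.name in self.values and b.name in self.values
        if both_present and self.values[a.name] == self.values[b.name]:
            return self.alter(sim)
        return sim


# Rows 1 and 2 share a group, row 3 is in another, row 4 is not listed.
groups = {1: "family_7", 2: "family_7", 3: "family_9"}
scorer = AlterSketch(ConstantSketch(0.8), groups, lambda s: s / 2)
same_group = scorer.score(Row(1), Row(2))   # altered
other_group = scorer.score(Row(1), Row(3))  # unchanged
unlisted = scorer.score(Row(1), Row(4))     # unchanged
```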
- class datamatch.scorers.FuncScorer(cb)
Scores pairs by calling the given callback.
- Parameters
cb (Callable[[pandas.Series, pandas.Series], float]) – Callback to calculate the score.
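A callback of the shape described above might look like this. FuncSketch is a local stand-in for the scorer, year_gap_score is a hypothetical callback, and dicts replace pandas.Series so the sketch has no dependencies.

```python
class FuncSketch:
    """Stand-in scorer that delegates to a user-supplied callback."""

    def __init__(self, cb):
        self.cb = cb

    def score(self, a, b) -> float:
        return self.cb(a, b)


def year_gap_score(a, b) -> float:
    """Hypothetical callback: lose 0.1 per year of birth-year difference."""
    gap = abs(a["birth_year"] - b["birth_year"])
    return max(0.0, 1.0 - gap / 10)


scorer = FuncSketch(year_gap_score)
close = scorer.score({"birth_year": 1980}, {"birth_year": 1983})
far = scorer.score({"birth_year": 1950}, {"birth_year": 1990})
```

Clamping with max keeps the callback's result inside the 0 to 1 range expected of a similarity score.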