Scorers

A scorer decides how the similarity score for each pair of records is calculated.

exception datamatch.scorers.RefuseToScoreException

Raise this exception to delegate scoring to a parent scorer.

class datamatch.scorers.BaseScorer

Base class for all scorer classes.

Subclasses should implement the score() method.

abstract score(a, b)

Returns the similarity score (0 <= sim <= 1) for a pair of records.

Parameters
  • a (pandas.Series) – The first record.

  • b (pandas.Series) – The second record.

Returns

Similarity score.

Return type

float
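The subclassing contract can be sketched as follows. To stay self-contained, this uses local stand-ins for BaseScorer and RefuseToScoreException instead of importing datamatch, and the ExactMatchScorer class, the is_null helper and the dict records are hypothetical illustrations. The subclass returns 1.0 or 0.0 and refuses to score when either value is missing, leaving the decision to a parent scorer.

```python
from abc import ABC, abstractmethod
import math


class RefuseToScoreException(Exception):
    """Local stand-in for datamatch.scorers.RefuseToScoreException."""


class BaseScorer(ABC):
    """Local stand-in for datamatch.scorers.BaseScorer."""

    @abstractmethod
    def score(self, a, b) -> float:
        """Return a similarity score in [0, 1] for records a and b."""


def is_null(value) -> bool:
    # Hypothetical helper: treat None and float NaN as missing values.
    return value is None or (isinstance(value, float) and math.isnan(value))


class ExactMatchScorer(BaseScorer):
    """Hypothetical scorer: 1.0 if the column values are equal, else 0.0."""

    def __init__(self, column_name: str) -> None:
        self.column_name = column_name

    def score(self, a, b) -> float:
        va, vb = a[self.column_name], b[self.column_name]
        if is_null(va) or is_null(vb):
            # Nothing to compare: let a parent scorer decide.
            raise RefuseToScoreException()
        return 1.0 if va == vb else 0.0


scorer = ExactMatchScorer("last_name")
print(scorer.score({"last_name": "Smith"}, {"last_name": "Smith"}))  # 1.0
print(scorer.score({"last_name": "Smith"}, {"last_name": "Jones"}))  # 0.0
```

Records are plain dicts here; a pandas.Series row supports the same a[column] lookup.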

class datamatch.scorers.SimSumScorer(fields)

Returns the sum of similarity values of all fields.

Parameters

fields (dict of similarity classes) – A mapping from field name to the similarity class used to compare that field.

class datamatch.scorers.AbsoluteScorer(column_name, score, ignore_key_error=False)

Returns a fixed, user-specified score if both records have the same value in a column.

If the values differ, or either one is null, this scorer raises RefuseToScoreException. Therefore, it should never be used on its own; always wrap it in either MaxScorer or MinScorer.

Parameters
  • column_name (str) – The column to compare.

  • score (float) – The score to return when the values match.

  • ignore_key_error (bool) – If True, raise RefuseToScoreException (delegating to a parent scorer) instead of KeyError when the column is missing from either record.
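The behaviour described above can be sketched as a small class. This is an illustrative re-implementation with local stand-ins, not the library's source; AbsoluteScorerSketch, is_null and the "uid" column are hypothetical names. With the library itself, the corresponding object would be constructed as AbsoluteScorer("uid", 1.0, ignore_key_error=True).

```python
import math


class RefuseToScoreException(Exception):
    """Local stand-in for datamatch.scorers.RefuseToScoreException."""


def is_null(value) -> bool:
    # Hypothetical helper: treat None and float NaN as missing values.
    return value is None or (isinstance(value, float) and math.isnan(value))


class AbsoluteScorerSketch:
    """Illustrative version of the AbsoluteScorer behaviour described above."""

    def __init__(self, column_name: str, score: float, ignore_key_error: bool = False) -> None:
        self.column_name = column_name
        self._score = score  # stored privately so it does not shadow score()
        self.ignore_key_error = ignore_key_error

    def score(self, a, b) -> float:
        try:
            va, vb = a[self.column_name], b[self.column_name]
        except KeyError:
            if self.ignore_key_error:
                # Column missing from a record: delegate to the parent scorer.
                raise RefuseToScoreException() from None
            raise
        if is_null(va) or is_null(vb) or va != vb:
            # Unequal or missing values: refuse instead of returning a score.
            raise RefuseToScoreException()
        return self._score


scorer = AbsoluteScorerSketch("uid", 1.0, ignore_key_error=True)
print(scorer.score({"uid": "A1"}, {"uid": "A1"}))  # 1.0
```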

class datamatch.scorers.MaxScorer(scorers)

Returns the max value from the scores of all child scorers.

Parameters

scorers (list of BaseScorer subclasses) – The child scorers.

class datamatch.scorers.MinScorer(scorers)

Returns the min value from the scores of all child scorers.

Parameters

scorers (list of BaseScorer subclasses) – The child scorers.
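MaxScorer and MinScorer differ only in how the child scores are combined, so a single sketch can cover both. The refusal handling here is an assumption inferred from the AbsoluteScorer note that it must be wrapped in one of these scorers: children that raise RefuseToScoreException are skipped, and if every child refuses, the wrapper itself refuses so that its own parent can decide. All class names below (MaxScorerSketch, MinScorerSketch, Const, SameId) are hypothetical stand-ins.

```python
from typing import Callable, Iterable, List


class RefuseToScoreException(Exception):
    """Local stand-in for datamatch.scorers.RefuseToScoreException."""


class _AggregateScorer:
    """Shared logic: collect child scores, skipping children that refuse."""

    def __init__(self, scorers: List, reduce: Callable[[Iterable[float]], float]) -> None:
        self.scorers = scorers
        self._reduce = reduce

    def score(self, a, b) -> float:
        scores = []
        for child in self.scorers:
            try:
                scores.append(child.score(a, b))
            except RefuseToScoreException:
                continue  # assumption: a refusing child is simply ignored
        if not scores:
            # Assumption: if every child refuses, delegate further up.
            raise RefuseToScoreException()
        return self._reduce(scores)


class MaxScorerSketch(_AggregateScorer):
    def __init__(self, scorers: List) -> None:
        super().__init__(scorers, max)


class MinScorerSketch(_AggregateScorer):
    def __init__(self, scorers: List) -> None:
        super().__init__(scorers, min)


class Const:
    """Hypothetical child scorer that always returns a fixed value."""

    def __init__(self, value: float) -> None:
        self.value = value

    def score(self, a, b) -> float:
        return self.value


class SameId:
    """Hypothetical child: 1.0 when the 'uid' values match, otherwise refuses."""

    def score(self, a, b) -> float:
        if a["uid"] != b["uid"]:
            raise RefuseToScoreException()
        return 1.0


same = ({"uid": 1}, {"uid": 1})
diff = ({"uid": 1}, {"uid": 2})
max_scorer = MaxScorerSketch([SameId(), Const(0.4)])
min_scorer = MinScorerSketch([SameId(), Const(0.4)])
print(max_scorer.score(*same), max_scorer.score(*diff))  # 1.0 0.4
print(min_scorer.score(*same), min_scorer.score(*diff))  # 0.4 0.4
```

Under these assumptions, wrapping an exact-ID check in MaxScorer lets a matching ID override a weaker fallback score, while a non-matching ID simply falls back to the other children.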

class datamatch.scorers.AlterScorer(scorer, values, alter)

Modifies the score of pairs whose two rows have the same value in values.

Parameters
  • scorer (BaseScorer subclass) – The wrapped scorer.

  • values (pandas.Series) – For each pair, if both rows' indices are present in this series and map to the same value, alter is called to modify the final score.

  • alter (Callable[[int], int]) – Callback that modifies the final score.
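A minimal sketch of this behaviour, assuming the wrapped scorer's result is passed to alter only when both row indices appear in values with equal values. In the library, values is a pandas.Series keyed by row index, and a pandas row carries its index label in .name; here a dict and a small Row class play those roles. AlterScorerSketch, Row, Const, the batch labels and the halving callback are all hypothetical, and this sketch passes the score to alter as a float.

```python
from typing import Any, Callable, Dict, Hashable


class Row(dict):
    """Stand-in for a pandas.Series row: column -> value, plus `name` (the row index)."""

    def __init__(self, name: Hashable, data: Dict[str, Any]) -> None:
        super().__init__(data)
        self.name = name


class AlterScorerSketch:
    """Illustrative version of the AlterScorer behaviour described above."""

    def __init__(self, scorer, values: Dict[Hashable, Any], alter: Callable[[float], float]) -> None:
        self.scorer = scorer
        self.values = values  # row index -> group value
        self.alter = alter

    def score(self, a, b) -> float:
        sim = self.scorer.score(a, b)
        # Only alter when both rows are in `values` and share the same value.
        if (
            a.name in self.values
            and b.name in self.values
            and self.values[a.name] == self.values[b.name]
        ):
            sim = self.alter(sim)
        return sim


class Const:
    """Hypothetical inner scorer that always returns a fixed value."""

    def __init__(self, value: float) -> None:
        self.value = value

    def score(self, a, b) -> float:
        return self.value


batch = {0: "batch-1", 1: "batch-1", 2: "batch-2"}
scorer = AlterScorerSketch(Const(0.8), batch, lambda s: s * 0.5)
r0, r1, r2 = Row(0, {}), Row(1, {}), Row(2, {})
print(scorer.score(r0, r1))  # 0.4
print(scorer.score(r0, r2))  # 0.8
```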

class datamatch.scorers.FuncScorer(cb)

Scores pairs by calling the given callback.

Parameters

cb (Callable[[pandas.Series, pandas.Series], float]) – Callback that calculates the score for a pair of records.
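A sketch of the callback pattern, using a local FuncScorerSketch stand-in and plain dict records so it runs without datamatch. The birth_year_score callback and its 0.25-per-year decay are hypothetical; with the library, the same function would be passed as FuncScorer(birth_year_score).

```python
from typing import Any, Callable, Mapping


class FuncScorerSketch:
    """Illustrative stand-in: delegate scoring to a user-supplied callback."""

    def __init__(self, cb: Callable[[Mapping[str, Any], Mapping[str, Any]], float]) -> None:
        self.cb = cb

    def score(self, a, b) -> float:
        return self.cb(a, b)


def birth_year_score(a, b) -> float:
    """Hypothetical callback: 1.0 for the same year, minus 0.25 per year apart, floored at 0."""
    diff = abs(a["birth_year"] - b["birth_year"])
    return max(0.0, 1.0 - 0.25 * diff)


scorer = FuncScorerSketch(birth_year_score)
print(scorer.score({"birth_year": 1980}, {"birth_year": 1980}))  # 1.0
print(scorer.score({"birth_year": 1980}, {"birth_year": 1982}))  # 0.5
print(scorer.score({"birth_year": 1980}, {"birth_year": 1990}))  # 0.0
```

Clamping with max(0.0, ...) keeps the callback's result inside the 0 to 1 range that BaseScorer.score() documents.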