Variators

A variator creates variations of a single record. ThresholdMatcher scores each variation of a record and keeps only the highest score in the final result. This is useful when values have been entered in the wrong columns (e.g. when a person’s first name and last name are swapped).

class datamatch.variators.Variator

Base class of all variator classes.

Subclasses should override the variations() method.

This class also serves as a no-op variator. It simply returns the record as is.

variations(sr)

Returns variations of the input record.

Parameters

sr (pandas.Series) – The input record.

Return type

Iterator of pandas.Series
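
A custom variator only needs to provide variations(sr). The following is a minimal sketch, not code from the library: the Variator class here is a local stand-in that mirrors the documented no-op behaviour so the snippet runs on its own (in real use you would subclass datamatch.variators.Variator instead), and FirstWordOnly is a hypothetical example class. Yielding the original record before any altered copy is an assumption made for this sketch.

```python
from typing import Iterator

import pandas as pd


class Variator:
    """Local stand-in for the documented base class: a no-op variator."""

    def variations(self, sr: pd.Series) -> Iterator[pd.Series]:
        # Return the record as is.
        yield sr


class FirstWordOnly(Variator):
    """Hypothetical variator: also try one column cut down to its first word."""

    def __init__(self, column: str) -> None:
        self._column = column

    def variations(self, sr: pd.Series) -> Iterator[pd.Series]:
        # Assumption: always offer the unmodified record first.
        yield sr
        value = sr[self._column]
        if isinstance(value, str) and len(value.split()) > 1:
            shortened = sr.copy()  # leave the input record untouched
            shortened[self._column] = value.split()[0]
            yield shortened


record = pd.Series({"first_name": "Mary Ann", "last_name": "Lee"})
variants = list(FirstWordOnly("first_name").variations(record))
```

Because variations() is a generator, it satisfies the documented return type (an iterator of pandas.Series) without building a list up front.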

class datamatch.variators.Swap(column_a, column_b)

Produces variations by swapping values between two columns.

Parameters
  • column_a (str) – The left column.

  • column_b (str) – The right column.
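
To illustrate the idea, here is a rough sketch of what swapping two columns and keeping the best score can look like. It is not the library's implementation: swap_variations and exact_match_score are hypothetical helpers written for this example, and the scorer is a deliberately naive stand-in for whatever similarity function the matcher uses. The sketch assumes the original record is offered alongside the swapped one.

```python
from typing import Iterator

import pandas as pd


def swap_variations(sr: pd.Series, column_a: str, column_b: str) -> Iterator[pd.Series]:
    """Yield the record as is, then a copy with the two column values exchanged."""
    yield sr
    # Swapping identical values would only produce a duplicate.
    if sr[column_a] != sr[column_b]:
        swapped = sr.copy()
        swapped[column_a], swapped[column_b] = sr[column_b], sr[column_a]
        yield swapped


def exact_match_score(a: pd.Series, b: pd.Series) -> float:
    """Hypothetical scorer: fraction of columns whose values are identical."""
    return float((a == b).mean())


# A record whose names were entered in the wrong columns.
entered = pd.Series({"first_name": "Smith", "last_name": "John"})
reference = pd.Series({"first_name": "John", "last_name": "Smith"})

# Score every variation and keep only the highest score.
best = max(
    exact_match_score(variation, reference)
    for variation in swap_variations(entered, "first_name", "last_name")
)
```

Here the unmodified record shares no column values with the reference, while the swapped variation matches it on both columns, so taking the maximum recovers the match.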