Variators

A variator generates variations of a single record. ThresholdMatcher scores each variation of a record separately and keeps only the highest similarity score in the final result. This is useful when values end up in the wrong columns (e.g. when a person’s first name and last name are swapped).
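The "keep only the highest score" behaviour can be illustrated with a minimal standard-library sketch. It is not datamatch's implementation: records are plain dicts, the variations are written out by hand, `similarity` is a hypothetical stand-in scorer built on `difflib`, and only one side of the pair is varied.

```python
from difflib import SequenceMatcher


def similarity(rec_a, rec_b):
    """Hypothetical scorer: mean per-field string similarity in [0, 1]."""
    ratios = [SequenceMatcher(None, rec_a[key], rec_b[key]).ratio() for key in rec_a]
    return sum(ratios) / len(ratios)


# Record B is entered correctly; record A has its name fields swapped.
record_b = {"first_name": "John", "last_name": "Smith"}

# Variations of record A, as a variator might produce them.
variations_of_a = [
    {"first_name": "Smith", "last_name": "John"},  # the record as entered
    {"first_name": "John", "last_name": "Smith"},  # first and last name swapped
]

# Score every variation, then keep only the best score for the pair.
scores = [similarity(variation, record_b) for variation in variations_of_a]
best = max(scores)
```

Here the unmodified record scores poorly against `record_b`, while the swapped variation matches exactly, so the pair is reported with the higher score.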

class datamatch.variators.Variator

Base class of all variator classes

Subclasses should override the variations() method.

This class also serves as a no-op variator. It simply returns the record as is.

variations(sr)

Returns variations of the input record

Parameters

sr (pandas.Series) – The input record

Return type

Iterator of pandas.Series
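As an illustration of overriding variations(), the sketch below defines a hypothetical `HyphenParts` variator. To stay runnable without datamatch installed, it declares a local stand-in `Variator` that mirrors the documented interface, and it uses a plain dict as the record. The item access and `.copy()` calls it relies on are also available on pandas.Series. Yielding the original record first is an assumption, not documented behaviour.

```python
class Variator:
    """Local stand-in for the documented base class (not the datamatch class)."""

    def variations(self, sr):
        # No-op behaviour: the only "variation" is the record itself.
        yield sr


class HyphenParts(Variator):
    """Hypothetical variator: also tries each part of a hyphenated value."""

    def __init__(self, column):
        self._column = column

    def variations(self, sr):
        yield sr  # assumption: always include the record as is
        value = sr[self._column]
        if isinstance(value, str) and "-" in value:
            for part in value.split("-"):
                variation = sr.copy()  # never mutate the input record
                variation[self._column] = part
                yield variation


record = {"first_name": "Ana", "last_name": "Lopez-Garcia"}
out = list(HyphenParts("last_name").variations(record))
```

In this sketch `out` holds three records: the original, one with `last_name` set to "Lopez", and one with `last_name` set to "Garcia".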

class datamatch.variators.Swap(column_a, column_b)

Produces variations by swapping the values of two columns

Parameters
  • column_a (str) – The left column

  • column_b (str) – The right column
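A rough sketch of what a swapping variator might look like follows. `SwapSketch` is a hypothetical class written for illustration, not the datamatch source. It takes the same two constructor arguments as documented above, and it assumes the unmodified record is yielded alongside the swapped copy. A dict stands in for the record; the same operations work on a pandas.Series.

```python
class SwapSketch:
    """Illustrative stand-in for a column-swapping variator."""

    def __init__(self, column_a, column_b):
        self._column_a = column_a
        self._column_b = column_b

    def variations(self, sr):
        yield sr  # assumption: the record as is comes first
        swapped = sr.copy()  # leave the input record untouched
        swapped[self._column_a] = sr[self._column_b]
        swapped[self._column_b] = sr[self._column_a]
        yield swapped


person = {"first_name": "Smith", "last_name": "John", "age": "41"}
variants = list(SwapSketch("first_name", "last_name").variations(person))
```

Only the two named columns change in the swapped copy; every other field, such as `age` here, is carried over unchanged.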