Filters

A filter discards candidate pairs from the matching process, whereas an index determines which pairs may be compared in the first place. Both reduce the number of comparisons and so improve matching performance.
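The distinction can be illustrated with a minimal sketch in plain Python. It does not use the datamatch API; the records and the `surname` and `agency` fields are invented for illustration. The index step selects which pairs become candidates, and the filter step then drops the candidates that fail a predicate.

```python
from itertools import combinations

# Hypothetical records keyed by id; in datamatch these would be DataFrame rows.
records = {
    1: {"surname": "smith", "agency": "A"},
    2: {"surname": "smith", "agency": "B"},
    3: {"surname": "smith", "agency": "A"},
    4: {"surname": "jones", "agency": "B"},
}

# Index step: only pairs that share a surname become candidates.
candidates = [
    (i, j)
    for i, j in combinations(sorted(records), 2)
    if records[i]["surname"] == records[j]["surname"]
]

# Filter step: discard candidates whose agency is identical.
kept = [
    (i, j)
    for i, j in candidates
    if records[i]["agency"] != records[j]["agency"]
]

print(candidates)  # [(1, 2), (1, 3), (2, 3)]
print(kept)        # [(1, 2), (2, 3)]
```

Of the six possible pairs, the index leaves three to compare and the filter removes one more before any similarity scoring would take place.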

class datamatch.filters.BaseFilter

Base class of all filter classes.

Subclasses should implement the valid() method.

abstract valid(a, b)

Returns true if a pair of records is valid (can be matched).

Parameters
  • a – the first record of the pair.

  • b – the second record of the pair.

Returns

Whether these two records can be matched.

Return type

bool
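As a rough sketch of what a custom filter might look like, the example below defines a stand-in abstract base class with the same valid(a, b) contract and one subclass. The stand-in `BaseFilter`, the `MaxGapFilter` class, and the `birth_year` field are all hypothetical and are not part of datamatch. Records are modelled as plain dicts here, on the assumption that the real records can be indexed by column name in the same way.

```python
from abc import ABC, abstractmethod


class BaseFilter(ABC):
    """Stand-in for datamatch.filters.BaseFilter, used only in this sketch."""

    @abstractmethod
    def valid(self, a, b) -> bool:
        """Return True if records a and b can be matched."""


class MaxGapFilter(BaseFilter):
    """Hypothetical filter: keep pairs whose values differ by at most max_gap."""

    def __init__(self, col, max_gap):
        self._col = col
        self._max_gap = max_gap

    def valid(self, a, b) -> bool:
        # The pair is valid when the two numeric values are close enough.
        return abs(a[self._col] - b[self._col]) <= self._max_gap


f = MaxGapFilter("birth_year", 1)
close_pair = f.valid({"birth_year": 1980}, {"birth_year": 1981})
far_pair = f.valid({"birth_year": 1980}, {"birth_year": 1990})
print(close_pair)  # True
print(far_pair)    # False
```

With the real library, the subclass would presumably inherit from datamatch.filters.BaseFilter instead of the stand-in defined here.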

class datamatch.filters.DissimilarFilter(col)

Eliminates pairs with the same value for a specific field.

Parameters

col (str) – the column to check.
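The behaviour described above can be approximated as follows. This is an illustrative stand-in, not the library's implementation; the class name `DissimilarSketch` and the `agency` column are assumptions, and records are modelled as plain dicts.

```python
class DissimilarSketch:
    """Illustrative stand-in for the behaviour described for DissimilarFilter."""

    def __init__(self, col):
        self._col = col

    def valid(self, a, b):
        # A pair is valid only when the two records differ in the chosen column.
        return a[self._col] != b[self._col]


f = DissimilarSketch("agency")
different = f.valid({"agency": "A"}, {"agency": "B"})
same = f.valid({"agency": "A"}, {"agency": "A"})
print(different)  # True
print(same)       # False
```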

class datamatch.filters.NonOverlappingFilter(start, end)

Eliminates pairs with overlapping ranges.

This is typically applied to time ranges to enforce time exclusivity: two records whose time ranges overlap cannot be matched.

The start and end columns must have the same type, and that type must support comparison. For example, df[end] < df[start] should produce a boolean series.

Parameters
  • start (str) – the range start column.

  • end (str) – the range end column.
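One plausible way to express the overlap check is sketched below. It is not taken from the library's source; the class name `NonOverlappingSketch` and the `hire_date` and `left_date` columns are hypothetical, and records are modelled as plain dicts. In this sketch, two ranges that merely share an endpoint count as overlapping; the library's handling of that boundary case may differ.

```python
from datetime import date


class NonOverlappingSketch:
    """Illustrative stand-in for the behaviour described for NonOverlappingFilter."""

    def __init__(self, start, end):
        self._start = start
        self._end = end

    def valid(self, a, b):
        # Two ranges are disjoint when one ends strictly before the other starts.
        return a[self._end] < b[self._start] or b[self._end] < a[self._start]


f = NonOverlappingSketch("hire_date", "left_date")
first = {"hire_date": date(2010, 1, 1), "left_date": date(2012, 6, 30)}
second = {"hire_date": date(2013, 1, 1), "left_date": date(2015, 1, 1)}
third = {"hire_date": date(2012, 1, 1), "left_date": date(2014, 1, 1)}

disjoint = f.valid(first, second)
overlapping = f.valid(first, third)
print(disjoint)     # True
print(overlapping)  # False
```

Because the check relies only on the `<` operator, any column type that supports ordering (dates, timestamps, integers) would work, which is consistent with the comparability requirement stated above.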