Filters
A filter discards pairs from the matching process. An index does the opposite: it dictates which pairs can be compared. Both are used to improve matching performance.
- class datamatch.filters.BaseFilter
Base class of all filter classes.
Subclasses should implement the valid() method.
- abstract valid(a, b)
Returns true if a pair of records is valid (can be matched).
- Parameters
a (pandas.Series) – the left record.
b (pandas.Series) – the right record.
- Returns
Whether these two records can be matched.
- Return type
bool
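The interface above can be illustrated with a minimal sketch of a custom filter. To keep it self-contained, `BaseFilter` below is a local stand-in that mirrors the documented interface, not the class shipped with datamatch; in real use you would presumably subclass `datamatch.filters.BaseFilter` instead. `SameStateFilter` and the `state` column are hypothetical names, and plain dicts stand in for `pandas.Series` records.

```python
from abc import ABC, abstractmethod


class BaseFilter(ABC):
    """Local stand-in mirroring the documented interface (assumption)."""

    @abstractmethod
    def valid(self, a, b) -> bool:
        """Return True if the pair (a, b) may be matched."""


class SameStateFilter(BaseFilter):
    """Hypothetical filter: keep only pairs whose `state` values agree."""

    def valid(self, a, b) -> bool:
        return a["state"] == b["state"]


# Plain dicts stand in for pandas.Series rows; both support a["state"].
left = {"name": "Ann Lee", "state": "LA"}
right_same = {"name": "Anne Lee", "state": "LA"}
right_other = {"name": "Ann Lee", "state": "TX"}

flt = SameStateFilter()
kept = flt.valid(left, right_same)      # pair stays in the matching process
dropped = flt.valid(left, right_other)  # pair is discarded
```

Because `valid()` only sees one pair at a time, a filter of this shape cannot depend on other pairs; anything that needs a global view belongs elsewhere in the pipeline.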
- class datamatch.filters.DissimilarFilter(col, ignore_key_error=False)
Eliminates pairs whose two records have the same value in the column col.
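The rule this filter applies can be sketched as a standalone function. This is an illustration of the documented behaviour, not the library's implementation. The function name `dissimilar_valid` and the `agency` column are hypothetical, and the handling of `ignore_key_error` (treating a pair with a missing column as valid) is an assumption, since the description above does not say what the flag does.

```python
def dissimilar_valid(a, b, col, ignore_key_error=False):
    """Return True when a and b differ on `col`, i.e. the pair is kept."""
    try:
        return a[col] != b[col]
    except KeyError:
        if ignore_key_error:
            # Assumption: a record lacking the column does not block the pair.
            return True
        raise


# Plain dicts stand in for pandas.Series records.
rec_a = {"name": "J. Smith", "agency": "NOPD"}
rec_b = {"name": "John Smith", "agency": "NOPD"}
rec_c = {"name": "John Smith", "agency": "JPSO"}

same_agency = dissimilar_valid(rec_a, rec_b, "agency")   # pair eliminated
other_agency = dissimilar_valid(rec_a, rec_c, "agency")  # pair kept
```

A filter like this is useful when two records sharing a value cannot refer to the same entity, for example two rows taken from the same source.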
- class datamatch.filters.NonOverlappingFilter(start, end)
Eliminates pairs with overlapping ranges.
This is typically applied to time ranges, ensuring that the two records in a pair do not overlap in time.
The start and end columns must be of the same type and must be comparable,
e.g. df[end] < df[start] should produce a boolean series.
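The overlap check can be sketched as follows. This is a hedged illustration rather than the library's code: the function name `non_overlapping_valid` and the `hire`/`left` columns are hypothetical, and treating ranges that share an endpoint as overlapping (strict `<`) is an assumption.

```python
from datetime import date


def non_overlapping_valid(a, b, start, end):
    """Return True when the [start, end] ranges of a and b do not overlap."""
    # Two ranges are disjoint exactly when one ends before the other begins.
    return a[end] < b[start] or b[end] < a[start]


# Plain dicts stand in for pandas.Series records.
first = {"hire": date(2010, 1, 1), "left": date(2012, 6, 30)}
later = {"hire": date(2013, 1, 1), "left": date(2015, 1, 1)}
during = {"hire": date(2012, 1, 1), "left": date(2014, 1, 1)}

disjoint = non_overlapping_valid(first, later, "hire", "left")      # pair kept
overlapping = non_overlapping_valid(first, during, "hire", "left")  # pair eliminated
```

The comparison only works if both columns hold values of one comparable type, which is why mixing, say, strings and dates in these columns would fail.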