A filter discards pairs from the matching process. An index does the opposite: it dictates which pairs can be compared in the first place. Both are used to improve matching performance by limiting the set of pairs the matcher has to consider.
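The difference between the two can be sketched in plain Python. This is not datamatch's API: `key_index`, `apply_filters`, and the sample records are hypothetical. The index decides which candidate pairs are generated at all (here, only records sharing a last name). The filter then removes pairs from that candidate set (here, pairs whose `id` values are equal).

```python
from collections import defaultdict
from itertools import combinations

def key_index(records, key):
    """Hypothetical index: yield position pairs (i, j) of records sharing records[i][key]."""
    buckets = defaultdict(list)
    for pos, rec in enumerate(records):
        buckets[rec[key]].append(pos)
    for positions in buckets.values():
        # combinations() emits each unordered pair once, with i < j
        yield from combinations(positions, 2)

def apply_filters(records, pairs, predicates):
    """Hypothetical filter step: keep a pair only if every predicate returns True."""
    return [
        (i, j) for i, j in pairs
        if all(pred(records[i], records[j]) for pred in predicates)
    ]

records = [
    {"last": "smith", "first": "john", "id": 1},
    {"last": "smith", "first": "jon", "id": 2},
    {"last": "smith", "first": "jane", "id": 2},
    {"last": "doe", "first": "john", "id": 3},
]

candidates = list(key_index(records, "last"))
kept = apply_filters(records, candidates, [lambda a, b: a["id"] != b["id"]])
print(candidates)  # [(0, 1), (0, 2), (1, 2)]
print(kept)        # [(0, 1), (0, 2)]
```

Of the six possible pairs among four records, the index generates only three, and the filter discards one more.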
- class datamatch.filters.BaseFilter
Base class of all filter classes.
Sub-class should implement the
- class datamatch.filters.DissimilarFilter(col, ignore_key_error=False)
Eliminates pairs whose two records have the same value in the column col.
- class datamatch.filters.NonOverlappingFilter(start, end)
Eliminates pairs whose ranges overlap, where each record's range runs from its start column to its end column.
This is typically applied to time ranges, to ensure that records matched together cover mutually exclusive periods.
The start and end columns must be of the same type and mutually comparable;
for example, df[end] < df[start] should produce a boolean Series.
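The semantics of the two concrete filters can be illustrated with a minimal plain-Python sketch. It is not datamatch's implementation: the class names, the `keep(a, b)` method, and the dict-based records are hypothetical, and `ignore_key_error` is not modeled. The only operation the overlap check needs is `<`, which is why start and end must hold values of one comparable type; comparing a `date` with a `str`, for instance, raises `TypeError`.

```python
from datetime import date

# Hypothetical stand-ins for DissimilarFilter and NonOverlappingFilter.
# Each exposes keep(a, b) -> bool; True means the pair survives the filter.

class DissimilarPairFilter:
    def __init__(self, col):
        self.col = col

    def keep(self, a, b):
        # Discard the pair when both records hold the same value in col.
        return a[self.col] != b[self.col]

class NonOverlappingPairFilter:
    def __init__(self, start, end):
        self.start = start
        self.end = end

    def keep(self, a, b):
        # Assumption: closed ranges, so shared endpoints count as overlap.
        # Ranges are disjoint iff one ends strictly before the other starts.
        return a[self.end] < b[self.start] or b[self.end] < a[self.start]

r1 = {"agency": "A", "hired": date(2010, 1, 1), "left": date(2014, 6, 30)}
r2 = {"agency": "B", "hired": date(2015, 1, 1), "left": date(2019, 12, 31)}
r3 = {"agency": "A", "hired": date(2013, 3, 1), "left": date(2016, 2, 1)}

dissimilar = DissimilarPairFilter("agency")
non_overlapping = NonOverlappingPairFilter("hired", "left")

print(dissimilar.keep(r1, r2))       # True: agencies differ
print(dissimilar.keep(r1, r3))       # False: same agency
print(non_overlapping.keep(r1, r2))  # True: r1 ends before r2 starts
print(non_overlapping.keep(r1, r3))  # False: the periods overlap
```

Whether ranges that merely touch count as overlapping is a design choice. This sketch treats them as overlapping, which is consistent with a strict end < start comparison like the df[end] < df[start] example.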