Abstract
• Three-way decision is proposed to solve the record linkage problem.
• An information granule describing the distribution of uncertain data points is used in three-way decision.
• The coverage of uncertainty is defined by entropy and memberships.
• The specificity of granule construction is controlled by a new parameter.
Record linkage is a typical two-class recognition problem in data mining. To improve classification performance on this problem, this paper proposes applying three-way classification to identify uncertain points (regions) that are set aside for further clerical investigation in decision-making. The three-way decision process is realized by a two-phase approach. In the first phase, an information granule is constructed to describe the uncertain region of the data space. In the second phase, the constructed granule is used to discriminate between certain points (those with a high likelihood of belonging to one of the classes) and uncertain points (viz. those requiring clerical attention). Uncertain points are examined manually, while certain points are classified by a generic binary classifier. Synthetic and publicly available data sets are used to demonstrate the performance of the proposed approach. Finally, the approach is shown to be effective in applications involving real-world record linkage data.
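The two-phase process can be illustrated with a minimal sketch. It is not the paper's granule construction. It rests on several assumptions. The granule is approximated as an axis-aligned hyperbox around training points whose two-class membership entropy reaches a threshold. `membership` is a hypothetical scoring function standing in for the binary classifier. `entropy_threshold` is a hypothetical cutoff. `shrink` is a hypothetical stand-in for the specificity parameter. Points inside the box go to clerical review, and all other points receive the classifier's label.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Point = Sequence[float]
Box = List[Tuple[float, float]]


def binary_entropy(p: float) -> float:
    """Entropy (in bits) of a two-class membership value p in [0, 1]."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def build_granule(points: Sequence[Point],
                  membership: Callable[[Point], float],
                  entropy_threshold: float,
                  shrink: float = 0.0) -> Optional[Box]:
    """Phase 1 (hypothetical): a hyperbox covering high-entropy training points.

    `shrink` in [0, 1) pulls each side toward its midpoint, so larger values
    yield a more specific (smaller) granule. This is an assumed stand-in for
    the specificity parameter, not the paper's definition.
    """
    uncertain = [x for x in points
                 if binary_entropy(membership(x)) >= entropy_threshold]
    if not uncertain:
        return None
    box: Box = []
    for d in range(len(uncertain[0])):
        lo = min(x[d] for x in uncertain)
        hi = max(x[d] for x in uncertain)
        mid = (lo + hi) / 2
        half = (hi - lo) / 2 * (1 - shrink)
        box.append((mid - half, mid + half))
    return box


def three_way_decide(x: Point, box: Optional[Box],
                     membership: Callable[[Point], float]) -> str:
    """Phase 2: defer points inside the granule; classify the rest."""
    if box is not None and all(lo <= xi <= hi for xi, (lo, hi) in zip(x, box)):
        return "clerical review"
    return "match" if membership(x) >= 0.5 else "non-match"


# Hypothetical membership: a logistic score on one similarity feature.
def score(x: Point) -> float:
    return 1.0 / (1.0 + math.exp(-10.0 * (x[0] - 0.5)))


train = [(0.1,), (0.3,), (0.45,), (0.5,), (0.55,), (0.7,), (0.9,)]
granule = build_granule(train, score, entropy_threshold=0.9)
tight = build_granule(train, score, entropy_threshold=0.9, shrink=0.5)
```

With these made-up values, only the training points at 0.45, 0.5 and 0.55 reach 0.9 bits of entropy, so the granule spans roughly [0.45, 0.55]. A point at 0.48 is deferred to clerical review, while 0.2 and 0.8 are labeled directly. A larger `shrink` tightens the box and hands more borderline points back to the classifier.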