Abstract
Real-world datasets are often high-dimensional, so they carry more information. However, more features do not necessarily improve the performance of learning techniques. Moreover, some features may be correlated or introduce noise, which can degrade clustering performance. This has motivated the development of feature selection methods, which seek the subset of features most relevant for describing the data. In this work, we address unsupervised feature selection. Our main goal is to define a method that, once the features have been ranked according to some criterion, determines how many of them to select. We do this with the False Nearest Neighbor technique, which originates in chaos theory. Results show that this technique gives a good estimate of the number of features to select. Compared with other techniques, in most of the analyzed cases it preserves the quality of the resulting partitions while selecting fewer features.
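As a rough illustration only, the idea of using false nearest neighbors to pick a cut-off in a ranked feature list can be sketched as follows. The abstract does not specify the ranking criterion, the exact false-neighbor test, or the stopping rule, so everything here is an assumption and not the authors' method. The sketch assumes the columns of `X_ranked` are already sorted by relevance. It adapts the distance-ratio test from the classic FNN method for choosing embedding dimensions: a nearest neighbor found using the first `d` features is "false" if adding feature `d+1` separates the pair by more than `r_tol` times their distance in `d` dimensions. It then returns the first `d` at which the false-neighbor fraction drops to `threshold` or below. The names `false_neighbor_fraction`, `estimate_num_features`, `r_tol=10.0` and `threshold=0.01` are hypothetical.

```python
import numpy as np


def false_neighbor_fraction(X_ranked, d, r_tol=10.0):
    """Fraction of nearest neighbours found with the first d features that
    become 'false' when the next ranked feature (column index d) is added.
    Hypothetical helper; brute-force O(n^2 * d) memory, fine for small n."""
    Xd = X_ranked[:, :d]
    # Pairwise Euclidean distances in the d-dimensional subspace.
    diff = Xd[:, None, :] - Xd[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    nn = dist.argmin(axis=1)
    r_d = dist[np.arange(len(Xd)), nn]
    # Extra separation contributed by the newly added feature.
    extra = np.abs(X_ranked[:, d] - X_ranked[nn, d])
    # Assumed distance-ratio test: written as a product to avoid dividing
    # by zero when duplicate points give r_d == 0.
    is_false = extra > r_tol * r_d
    return is_false.mean()


def estimate_num_features(X_ranked, r_tol=10.0, threshold=0.01):
    """Return the smallest d whose false-neighbour fraction is at most
    `threshold` (assumed stopping rule); return all p features otherwise."""
    _, p = X_ranked.shape
    for d in range(1, p):
        if false_neighbor_fraction(X_ranked, d, r_tol) <= threshold:
            return d
    return p
```

For example, if the first two ranked columns are independent and the third is their sum, the second column still separates most one-dimensional neighbors, while the redundant third column cannot. The sketch therefore stops at `d = 2`. Real rankings and thresholds would need tuning; this only shows the mechanics of the cut-off.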