Abstract
House prices in the real estate market fluctuate under the influence of other variables that are correlated with them. Some prices cannot be controlled, and it is often unknown when they will rise or fall. These fluctuations are consistent with hedonic pricing theory, which holds that the price of a house depends on its composite features. Our study explores this econometric concept using machine learning algorithms in an effort to forecast house prices accurately. To build the forecasting models, we use the Ames housing dataset, which includes 82 explanatory features and 2930 entries on housing sales in Ames, Iowa, USA. We use particle swarm optimization feature selection to address the dataset's dimensionality issues for both classification-based and regression-based models. With dimension reduction, house price prediction accuracy increased from 81.4% to 84.4% for classification-based models, and the price prediction error decreased from 13.3% to 6.9% for regression-based models. These results demonstrate a substantial difference in price prediction performance between using and not using feature selection. The main benefit of feature selection is that it facilitates the selection of the more relevant and correlated features, which helps prevent the model from overfitting to too many irrelevant features.
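To make the idea of particle swarm optimization feature selection concrete, the following is a minimal sketch of a binary particle swarm that searches over feature masks. It is not the configuration used in the study: the toy fitness function (absolute correlation with the target minus a fixed per-feature penalty), the synthetic six-feature data, and all hyperparameters (`n_particles`, `n_iters`, `w`, `c1`, `c2`, `v_max`, `penalty`) are assumptions chosen only for illustration. A practical setup would more likely score each mask by the validation performance of a model trained on the selected features.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def fitness(mask, relevance, penalty=0.1):
    """Toy score (an assumption): reward relevant features, charge a cost per feature."""
    return sum(r - penalty for r, m in zip(relevance, mask) if m)


def binary_pso(relevance, n_particles=20, n_iters=50,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Binary PSO: each particle is a 0/1 mask over the features."""
    rng = random.Random(seed)
    n = len(relevance)
    pos = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n_particles)]
    vel = [[0.0] * n for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, relevance) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n):
                r1, r2 = rng.random(), rng.random()
                # Pull the velocity toward the personal and global best masks.
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))
                # A sigmoid turns the velocity into the probability of selecting the feature.
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            f = fitness(pos[i], relevance)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Synthetic data (an assumption): only features 0 and 1 drive the target.
data_rng = random.Random(42)
rows = [[data_rng.gauss(0, 1) for _ in range(6)] for _ in range(200)]
target = [3 * r[0] + 2 * r[1] + data_rng.gauss(0, 0.5) for r in rows]
relevance = [abs(pearson([r[j] for r in rows], target)) for j in range(6)]

mask, score = binary_pso(relevance)
print("selected feature mask:", mask, "score:", round(score, 3))
```

The per-feature penalty is what pushes the swarm toward smaller subsets. Without it, selecting every feature would never lower the toy score, and the search would have no incentive to reduce dimensionality.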