Abstract
To support effective data exploration, there has been a growing interest in developing solutions that can automatically recommend data visualizations that reveal important data-driven insights. In such solutions, a large number of possible data visualization views are generated and ranked according to some metric of importance, then the top-k most important views are recommended. However, one drawback of that approach is that it often recommends similar views, leaving the data analyst with a limited amount of gained insights. To address that limitation, in this work we posit that employing diversification techniques in the process of view recommendation allows eliminating that redundancy and provides a concise coverage of the possible insights to be discovered. To that end, we propose a hybrid objective utility function, which captures both the importance, as well as the diversity of the insights revealed by the recommended views. While in principle, traditional diversification methods provide plausible solutions under our proposed utility function, they suffer from a significantly high query processing cost. In particular, directly applying such methods leads to a "process-first-diversify-next" approach, in which all possible data visualization are generated first via executing a large number of aggregate queries. To address that challenge, we propose the DiVE scheme, which efficiently selects the top-k recommended view based on our hybrid utility function. DiVE leverages the properties of both the importance and diversity metrics to prune a large number of query executions without compromising the quality of recommendations. Our experimental evaluation on real datasets shows the performance gains provided by DiVE.