Abstract
Third-party data analysis raises data privacy concerns, which calls into question whether such outsourcing is viable. Cryptography provides a degree of data confidentiality. Some cryptographic schemes, such as Homomorphic Encryption (HE), permit a limited amount of data manipulation. However, encryption precludes any form of sophisticated analysis. To make such analysis possible, the encrypted data must be coupled with additional information that facilitates third-party analysis. This paper proposes a mechanism for secure k-means clustering that uses HE together with the concept of an Updatable Distance Matrix (UDM). The mechanism is fully described and analysed. The reported evaluation shows that the proposed mechanism produces the same clustering results as "standard" k-means, but does so in a secure manner. The proposed mechanism therefore allows clustering algorithms to be applied to encrypted data while preserving both correctness and data privacy.