Abstract
Knowledge Discovery in Databases (KDD) is the process of identifying valid, novel, useful, and understandable patterns in large data sets. Data Mining (DM) is the core of the KDD process: it involves algorithms that explore the data, develop models, and discover significant patterns. One way to enable effective data mining while preserving privacy is to anonymize a data set that contains private information about its subjects before it is released for mining. Two common manipulation techniques for achieving k-anonymity of a data set are generalization and suppression. Generalization replaces a value with a less specific but semantically consistent value, while suppression withholds the value altogether. Generalization is more commonly applied in this domain, since suppression can dramatically reduce the quality of the data mining results if it is not used properly. In this project, we propose a new method for achieving k-anonymity named K-anonymity of Classification Trees Using Suppression (kACTUS). kACTUS performs efficient multidimensional suppression: it identifies attributes that have less influence on the classification of the data records and suppresses them where needed to comply with k-anonymity. Encouraging results suggest that the predictive performance of kACTUS is better than that of existing k-anonymity algorithms. Attackers often have background knowledge, however, and we show that k-anonymity does not guarantee privacy against such attackers. We therefore propose a novel and powerful privacy definition called L-diversity. L-diversity provides privacy even when the data publisher does not know what kind of knowledge the adversary possesses. Its main idea is the requirement that the values of the sensitive attributes be well represented in each group.
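The two privacy notions named in the abstract can be illustrated with a minimal Python sketch. It is not the kACTUS algorithm: it suppresses a whole attribute rather than performing the tree-guided multidimensional suppression described above. It also uses the simplest reading of "well represented", namely at least L distinct sensitive values per group. The toy records, the attribute names (`zip`, `age`, `disease`), the `*` placeholder, and all function names are assumptions made for illustration only.

```python
from collections import defaultdict

SUPPRESSED = "*"  # assumed placeholder for a value that is not released


def suppress(records, attribute):
    """Return copies of the records with one attribute suppressed everywhere."""
    return [{**r, attribute: SUPPRESSED} for r in records]


def group_by_quasi_identifiers(records, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[a] for a in quasi_identifiers)].append(r)
    return groups


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier group contains at least k records."""
    groups = group_by_quasi_identifiers(records, quasi_identifiers)
    return all(len(g) >= k for g in groups.values())


def is_l_diverse(records, quasi_identifiers, sensitive, l):
    """True if every group holds at least l distinct sensitive values."""
    groups = group_by_quasi_identifiers(records, quasi_identifiers)
    return all(len({r[sensitive] for r in g}) >= l for g in groups.values())


# Hypothetical toy data: zip and age are quasi-identifiers, disease is sensitive.
records = [
    {"zip": "13053", "age": 28, "disease": "flu"},
    {"zip": "13053", "age": 29, "disease": "flu"},
    {"zip": "13068", "age": 28, "disease": "cancer"},
    {"zip": "13068", "age": 29, "disease": "flu"},
]
quasi_identifiers = ["zip", "age"]

released = suppress(records, "age")

print(is_k_anonymous(records, quasi_identifiers, 2))
print(is_k_anonymous(released, quasi_identifiers, 2))
print(is_l_diverse(released, quasi_identifiers, "disease", 2))
```

In this toy data, suppressing `age` makes the release 2-anonymous, yet both records in the `13053` group share the same disease. An attacker who knows a subject's zip code therefore learns the sensitive value, which is the kind of gap L-diversity is meant to close.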