An Improved Variable-Sized Microaggregation Algorithm for Privacy Preservation (IV-MDAV)
Keywords:
Privacy Preservation, MDAV, Clustering
Abstract
Microaggregation is a technique used to protect privacy in databases and location-based services. We propose a new hybrid technique for multivariate microaggregation that combines a heuristic yielding fixed-size groups with a genetic algorithm yielding variable-sized groups. Fixed-size heuristics are fast and can handle large data sets, but the information loss they incur is sometimes far from optimal. The genetic algorithm, by contrast, obtains optimal or near-optimal results but can only cope with very small data sets. Our technique leverages the advantages of both types of heuristics and avoids their shortcomings: it first partitions the data set into a number of groups using a fixed-size heuristic, then optimizes the partitions by means of the genetic algorithm. The resulting technique improves on the results of the fixed-size heuristic for large data sets.
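As a rough illustration of the first stage described above, the following Python sketch builds fixed-size groups and measures the resulting information loss. It is a simplified stand-in, not the exact MDAV or IV-MDAV procedure: the function names (`centroid`, `fixed_size_groups`, `sse`), the toy records, and the choice of the within-group sum of squared errors (SSE) as the loss measure are assumptions made for this example, and the genetic-algorithm refinement is not shown.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def fixed_size_groups(records, k):
    """Simplified fixed-size heuristic (illustrative, not the exact MDAV).

    Repeatedly picks the record farthest from the centroid of the records
    still unassigned and groups it with its k-1 nearest neighbours.
    Assumes len(records) >= k.
    """
    remaining = list(range(len(records)))
    groups = []
    while len(remaining) >= 2 * k:
        c = centroid([records[i] for i in remaining])
        seed = max(remaining, key=lambda i: math.dist(records[i], c))
        # The seed is at distance 0 from itself, so it is always included.
        nearest = sorted(
            remaining, key=lambda i: math.dist(records[i], records[seed])
        )[:k]
        groups.append(nearest)
        remaining = [i for i in remaining if i not in nearest]
    if remaining:
        # The leftover records (between k and 2k-1 of them) form the last group.
        groups.append(remaining)
    return groups

def sse(records, groups):
    """Within-group sum of squared distances to each group's centroid."""
    total = 0.0
    for g in groups:
        c = centroid([records[i] for i in g])
        total += sum(math.dist(records[i], c) ** 2 for i in g)
    return total

# Hypothetical toy data: two well-separated clusters of three records each.
records = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
           (8.0, 8.0), (8.1, 7.9), (7.9, 8.2)]
k = 3
groups = fixed_size_groups(records, k)
loss = sse(records, groups)
```

Every group returned contains at least `k` records, which is the property microaggregation relies on. A second-stage optimizer such as the genetic algorithm mentioned in the abstract would search for regroupings that lower `sse` while keeping that minimum group size.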
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.