Protecting Privacy Using k-Anonymity

Bibliographic Details
Journal Title:	Journal of the American Medical Informatics Association
Authors:	El Emam, Khaled; Dankar, Fida Kamal
In:	Journal of the American Medical Informatics Association, vol. 15, no. 5 (2008), pp. 627-637
Format:	E-Article
Language:	English
Published:	Oxford University Press (OUP)
Subjects:	Health Informatics

DOI:	10.1197/jamia.m2716
Availability:	Online, free
Subject Classification:	Medicine, Computer Science
ISSN:	1527-974X, 1067-5027
URL:	http://dx.doi.org/10.1197/jamia.m2716

Abstract

Objective: There is increasing pressure to share health information and even to make it publicly available. However, such disclosures of personal health information raise serious privacy concerns. To alleviate these concerns, the data can be anonymized before disclosure. One popular anonymization approach is k-anonymity. There have been no evaluations of the actual re-identification probability of k-anonymized data sets.

Design: Through a simulation on three large data sets, we evaluated the re-identification risk of k-anonymization and of three different improvements to it.

Measurement: Re-identification probability is measured under two different re-identification scenarios. Information loss is measured by the commonly used discernibility metric.

Results: For one of the re-identification scenarios, k-anonymity consistently over-anonymizes data sets, and this over-anonymization is most pronounced with small sampling fractions. Over-anonymization results in excessive distortion of the data (i.e., high information loss), making the data less useful for subsequent analysis. We found that a hypothesis testing approach provided the best control over re-identification risk and reduced the extent of information loss compared to baseline k-anonymity.

Conclusion: Guidelines are provided on when to use the hypothesis testing approach instead of baseline k-anonymity.
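The abstract relies on three quantities without defining them: k-anonymity, re-identification probability, and the discernibility metric. The following is a minimal Python sketch of how they are commonly computed, not the authors' implementation. It assumes each record has already been reduced to its generalized quasi-identifier values. It also assumes the re-identification probability is the worst case for an intruder who knows the target is in the data set, namely 1 / (smallest equivalence class size). The discernibility metric follows the usual textbook form: a class of size s >= k costs s**2, and each record in a smaller (suppressed) class costs |D|, the total number of records. The function names and the example records are hypothetical, and the paper's exact variant of the metric may differ.

```python
from collections import Counter
from typing import Hashable, Sequence

# A record is represented only by its (already generalized) quasi-identifier values.
Record = tuple[Hashable, ...]


def class_sizes(records: Sequence[Record]) -> Counter:
    """Size of each equivalence class, i.e. records sharing identical quasi-identifiers."""
    return Counter(records)


def is_k_anonymous(records: Sequence[Record], k: int) -> bool:
    """True if every equivalence class contains at least k records."""
    return all(size >= k for size in class_sizes(records).values())


def max_reidentification_probability(records: Sequence[Record]) -> float:
    """Worst case when the intruder knows the target is in the data set: 1 / smallest class size."""
    return 1 / min(class_sizes(records).values())


def discernibility_metric(records: Sequence[Record], k: int) -> int:
    """Classes of size >= k cost size**2; records in smaller classes are treated
    as suppressed and each costs |D|, the total number of records."""
    n = len(records)
    return sum(s * s if s >= k else n * s for s in class_sizes(records).values())


# Hypothetical generalized records: (age band, sex, region).
records = [
    ("30-39", "F", "region A"),
    ("30-39", "F", "region A"),
    ("30-39", "F", "region A"),
    ("40-49", "M", "region B"),
    ("40-49", "M", "region B"),
    ("40-49", "M", "region B"),
    ("50-59", "F", "region A"),
]
print(is_k_anonymous(records, k=3))                   # False: the 50-59 class has 1 record
print(max_reidentification_probability(records[:6]))  # 1/3 once that record is dropped
print(discernibility_metric(records, k=3))            # 3**2 + 3**2 + 7 * 1 = 25
```

In this metric, larger equivalence classes and suppression both raise the cost. A lower value therefore corresponds to less information loss, which is the sense in which over-anonymization shows up as "high information loss" in the abstract.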
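The Results paragraph says over-anonymization is most pronounced with small sampling fractions. The abstract does not name its two re-identification scenarios, so the following small simulation sketch assumes one plausible setting. In it, the released data set is a random sample of a larger population, and the intruder does not know whether the target was sampled. Under that assumption, the relevant risk for a released record is 1 / (its population class size) rather than 1 / (its sample class size). The population, the quasi-identifiers, the 5% Bernoulli sampling, k = 5, and the helpers `make_population`, `draw_sample`, `mean_risks`, and `share_below_k` are all hypothetical. This is not the paper's simulation design or its hypothesis testing approach.

```python
import random
from collections import Counter


def make_population(n, seed=1):
    """Hypothetical quasi-identifiers: 10-year age band, sex, and one of 10 regions."""
    rng = random.Random(seed)
    return [(rng.randrange(20, 80, 10), rng.choice("FM"), rng.randrange(10))
            for _ in range(n)]


def draw_sample(population, sampling_fraction, seed=2):
    """Bernoulli sample: each person is released independently with this probability."""
    rng = random.Random(seed)
    return [r for r in population if rng.random() < sampling_fraction]


def mean_risks(population, sample):
    """Average re-identification probability over released records, computed two ways."""
    big_f = Counter(population)  # equivalence class sizes in the population
    small_f = Counter(sample)    # equivalence class sizes in the released sample
    known_in_sample = sum(1 / small_f[r] for r in sample) / len(sample)
    only_in_population = sum(1 / big_f[r] for r in sample) / len(sample)
    return known_in_sample, only_in_population


def share_below_k(sizes, records, k):
    """Fraction of records whose class (per `sizes`) is smaller than k."""
    return sum(sizes[r] < k for r in records) / len(records)


population = make_population(10_000)
sample = draw_sample(population, sampling_fraction=0.05)
in_sample, in_population = mean_risks(population, sample)
k = 5
print(f"released records: {len(sample)}")
print(f"mean risk, target known to be in sample: {in_sample:.3f}")
print(f"mean risk, target only known to be in population: {in_population:.3f}")
print(f"share violating k={k} by sample counts: {share_below_k(Counter(sample), sample, k):.2f}")
print(f"share violating k={k} by population counts: {share_below_k(Counter(population), sample, k):.2f}")
```

Every sample class is a subset of its population class, so the sample-based risk can only overstate the population-based one. With a small sampling fraction the gap is typically large. In this setting, enforcing k on sample counts alone flags many records for generalization or suppression even though their population classes are large, which is one way to see how over-anonymization can arise.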