author_facet El Emam, Khaled
Dankar, Fida Kamal
author El Emam, Khaled
Dankar, Fida Kamal
spellingShingle El Emam, Khaled
Dankar, Fida Kamal
Journal of the American Medical Informatics Association
Protecting Privacy Using k-Anonymity
Health Informatics
author_sort el emam, khaled
spelling El Emam, Khaled Dankar, Fida Kamal 1527-974X 1067-5027 Oxford University Press (OUP) Health Informatics http://dx.doi.org/10.1197/jamia.m2716 Abstract Objective: There is increasing pressure to share health information and even make it publicly available. However, such disclosures of personal health information raise serious privacy concerns. To alleviate such concerns, it is possible to anonymize the data before disclosure. One popular anonymization approach is k-anonymity. There have been no evaluations of the actual re-identification probability of k-anonymized data sets. Design: Through a simulation, we evaluated the re-identification risk of k-anonymization and three different improvements on three large data sets. Measurement: Re-identification probability is measured under two different re-identification scenarios. Information loss is measured by the commonly used discernability metric. Results: For one of the re-identification scenarios, k-Anonymity consistently over-anonymizes data sets, with this over-anonymization being most pronounced with small sampling fractions. Over-anonymization results in excessive distortions to the data (i.e., high information loss), making the data less useful for subsequent analysis. We found that a hypothesis testing approach provided the best control over re-identification risk and reduces the extent of information loss compared to baseline k-anonymity. Conclusion: Guidelines are provided on when to use the hypothesis testing approach instead of baseline k-anonymity. Protecting Privacy Using k-Anonymity Journal of the American Medical Informatics Association
doi_str_mv 10.1197/jamia.m2716
facet_avail Online
Free
finc_class_facet Medicine
Computer Science
format ElectronicArticle
fullrecord blob:ai-49-aHR0cDovL2R4LmRvaS5vcmcvMTAuMTE5Ny9qYW1pYS5tMjcxNg
id ai-49-aHR0cDovL2R4LmRvaS5vcmcvMTAuMTE5Ny9qYW1pYS5tMjcxNg
institution DE-Zwi2
DE-D161
DE-Gla1
DE-Zi4
DE-15
DE-Pl11
DE-Rs1
DE-105
DE-14
DE-Ch1
DE-L229
DE-D275
DE-Bn3
DE-Brt1
imprint Oxford University Press (OUP), 2008
imprint_str_mv Oxford University Press (OUP), 2008
issn 1527-974X
1067-5027
issn_str_mv 1527-974X
1067-5027
language English
mega_collection Oxford University Press (OUP) (CrossRef)
match_str elemam2008protectingprivacyusingkanonymity
publishDateSort 2008
publisher Oxford University Press (OUP)
recordtype ai
record_format ai
series Journal of the American Medical Informatics Association
source_id 49
title Protecting Privacy Using k-Anonymity
title_unstemmed Protecting Privacy Using k-Anonymity
title_full Protecting Privacy Using k-Anonymity
title_fullStr Protecting Privacy Using k-Anonymity
title_full_unstemmed Protecting Privacy Using k-Anonymity
title_short Protecting Privacy Using k-Anonymity
title_sort protecting privacy using k-anonymity
topic Health Informatics
url http://dx.doi.org/10.1197/jamia.m2716
publishDate 2008
physical 627-637
description Abstract
Objective: There is increasing pressure to share health information and even make it publicly available. However, such disclosures of personal health information raise serious privacy concerns. To alleviate such concerns, it is possible to anonymize the data before disclosure. One popular anonymization approach is k-anonymity. There have been no evaluations of the actual re-identification probability of k-anonymized data sets.
Design: Through a simulation, we evaluated the re-identification risk of k-anonymization and three different improvements on three large data sets.
Measurement: Re-identification probability is measured under two different re-identification scenarios. Information loss is measured by the commonly used discernability metric.
Results: For one of the re-identification scenarios, k-Anonymity consistently over-anonymizes data sets, with this over-anonymization being most pronounced with small sampling fractions. Over-anonymization results in excessive distortions to the data (i.e., high information loss), making the data less useful for subsequent analysis. We found that a hypothesis testing approach provided the best control over re-identification risk and reduces the extent of information loss compared to baseline k-anonymity.
Conclusion: Guidelines are provided on when to use the hypothesis testing approach instead of baseline k-anonymity.
container_issue 5
container_start_page 627
container_title Journal of the American Medical Informatics Association
container_volume 15
format_de105 Article, E-Article
format_de14 Article, E-Article
format_de15 Article, E-Article
format_de520 Article, E-Article
format_de540 Article, E-Article
format_dech1 Article, E-Article
format_ded117 Article, E-Article
format_degla1 E-Article
format_del152 Book
format_del189 Article, E-Article
format_dezi4 Article
format_dezwi2 Article, E-Article
format_finc Article, E-Article
format_nrw Article, E-Article
_version_ 1792345308028469251
geogr_code not assigned
last_indexed 2024-03-01T17:21:10.237Z
geogr_code_person not assigned
openURL url_ver=Z39.88-2004&ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fvufind.svn.sourceforge.net%3Agenerator&rft.title=Protecting+Privacy+Using+k-Anonymity&rft.date=2008-09-01&genre=article&issn=1067-5027&volume=15&issue=5&spage=627&epage=637&pages=627-637&jtitle=Journal+of+the+American+Medical+Informatics+Association&atitle=Protecting+Privacy+Using+k-Anonymity&aulast=Dankar&aufirst=Fida+Kamal&rft_id=info%3Adoi%2F10.1197%2Fjamia.m2716&rft.language%5B0%5D=eng