dc.description.abstract | database anonymization has become increasingly
critical for organizations that publish their databases to the public. current
security measures for anonymization each have drawbacks. k-anonymity
is prone to many varieties of attack; ℓ-diversity does not work well
with categorical or numerical attributes; t-closeness erases too much information
in the database. moreover, some measures of information loss are designed for
anonymization measures such as k-anonymity, in which sensitive attributes play
no part in measuring a database's security. ignoring the redistribution of
sensitive attributes leads to an underestimate of information loss for measures
such as ℓ-diversity or t-closeness, which intentionally try to remove the
association between non-sensitive attributes and sensitive attributes to better
protect individuals from being identified.
this thesis provides a more generalized version of ℓ-diversity that will
better protect categorical attributes and numerical attributes and analyzes the
effectiveness and complexity of our new security scheme. another focus of this
thesis is to design a better approach to measuring information loss and to lay
down a new standard for evaluating information loss under security measures such
as ℓ-diversity and t-closeness, one that quantifies the actual information loss
caused by deliberately hiding relations between non-sensitive attributes and
sensitive attributes. this new information loss measure should provide a better
estimate of the data mining potential remaining in a generalized database.
this thesis also proves that, unlike k-anonymity, which can be solved in
polynomial time when k=2, ℓ-diversity in fact remains np-hard in the special
case where ℓ=2, even when there are only 2 possible sensitive attributes in
the alphabet. | |