Ericka Menchen-Trevino

Privacy & Open Science: Universal Numerical Fingerprint

There is a tension between open, transparent science and privacy concerns. I have worked, and will continue to work, with real-world web history data, which can contain a good deal of private information even when participants contribute it with fully informed consent. Because of the level of detail involved, it would be difficult to anonymize the data without stripping away its utility.

What’s an open science advocate to do? Enter the universal numerical fingerprint (UNF):

The universal numerical fingerprint begins with “UNF”. Four features make the UNF especially useful:

The UNF algorithm’s cryptographic technology ensures that the alphanumeric identifier will change when any portion of the data set changes. Not only does this assure future researchers that they can use the same data set referenced in a years-old journal article, it enables the data set’s owner to track each iteration of the owner’s research. When an original data set is updated or incorporated into a new, related data set, the algorithm generates a unique UNF each time.

The UNF is determined by the content of the data, not the format in which it is stored. For example, you create a data set in SPSS, Stata or R, and five years later, you need to look at your data set again, but the data was converted to the next big thing (NBT). You can use NBT, recompute the UNF, and verify for certain that the data set you’re downloading is the same one you created originally. That is, the UNF will not change.

Knowing only the UNF, journal editors can be confident that they are referencing a specific data set that never can be changed, even if they do not have permission to see the data. In a sense, the UNF is the ultimate summary statistic.

The UNF’s noninvertible, cryptographic properties guarantee that acquiring the UNF of a data set conveys no information about the content of the data. Authors can take advantage of this property to distribute the full citation of a data set–including the UNF–even if the data is proprietary or highly confidential, all without the risk of disclosure.
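To make the idea of a content-based, format-independent fingerprint concrete, here is a minimal Python sketch. It is not the official UNF algorithm. The function names `normalize` and `fingerprint`, the rounding to 7 significant digits, the `SKETCH:` prefix, and the truncated SHA-256 digest are all assumptions chosen for illustration. The point is only that values are put into a canonical form before hashing, so the same data gives the same identifier however it was stored, while any change to the data gives a different one.

```python
import base64
import hashlib


def normalize(value, digits=7):
    """Render a value in a canonical text form (illustrative rule, not the UNF spec)."""
    if isinstance(value, bool):
        return "1" if value else "0"
    if isinstance(value, (int, float)):
        # Fixed number of significant digits in exponent notation,
        # so 1 and 1.0 normalize to the same string.
        return f"{float(value):.{digits - 1}e}"
    return str(value)


def fingerprint(column, digits=7):
    """Hash the normalized values of one column into a short identifier."""
    h = hashlib.sha256()
    for value in column:
        # A newline plus a null byte keeps adjacent values from running together.
        h.update(normalize(value, digits).encode("utf-8") + b"\n\x00")
    # Keep the first 128 bits of the digest and encode them as text.
    return "SKETCH:" + base64.b64encode(h.digest()[:16]).decode("ascii")


# The same numbers stored as floats or as a mix of ints and floats.
stored_as_floats = [1.0, 2.5, 3.0]
stored_as_mixed = [1, 2.5, 3]

same = fingerprint(stored_as_floats) == fingerprint(stored_as_mixed)
changed = fingerprint(stored_as_floats) != fingerprint([1.0, 2.5, 3.1])
print(same, changed)
```

Because only the hash is shared, someone holding the identifier can check a copy of the data against it without the identifier itself revealing the values.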
