Privacy & Open Science: Universal Numerical Fingerprint

There is a tension between open, transparent science and privacy concerns. I have worked, and will continue to work, with real-world web history data, which, even when participants contribute it with fully informed consent, can contain quite a bit of private information. Because of its level of detail, it would be difficult to anonymize without stripping away its utility.

What’s an open science advocate to do? Enter the universal numerical fingerprint (UNF):

The universal numerical fingerprint begins with “UNF”. Four features make the UNF especially useful:

The UNF algorithm’s cryptographic technology ensures that the alphanumeric identifier will change when any portion of the data set changes. Not only does this assure future researchers that they can use the same data set referenced in a years-old journal article, it enables the data set’s owner to track each iteration of the owner’s research. When an original data set is updated or incorporated into a new, related data set, the algorithm generates a unique UNF each time.

The UNF is determined by the content of the data, not the format in which it is stored. For example, you create a data set in SPSS, Stata or R, and five years later, you need to look at your data set again, but the data was converted to the next big thing (NBT). You can use NBT, recompute the UNF, and verify for certain that the data set you’re downloading is the same one you created originally. That is, the UNF will not change.

Knowing only the UNF, journal editors can be confident that they are referencing a specific data set that never can be changed, even if they do not have permission to see the data. In a sense, the UNF is the ultimate summary statistic.

The UNF’s noninvertible, cryptographic properties guarantee that acquiring the UNF of a data set conveys no information about the content of the data. Authors can take advantage of this property to distribute the full citation of a data set, including the UNF, even if the data is proprietary or highly confidential, all without the risk of disclosure.

Source: http://best-practices.dataverse.org/data-citation/#data-citation-standard
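
To make the idea of a content-based fingerprint concrete, here is a toy sketch in Python. It is not the official UNF algorithm: the normalization rule, the `TOY:` prefix, and the function names `canonical` and `toy_fingerprint` are my own assumptions for illustration, and it handles only a single column of values. It simply shows how hashing a normalized rendering of the data, rather than the stored bytes, makes the fingerprint independent of the storage format while still changing whenever a value changes.

```python
import base64
import hashlib


def canonical(value):
    """Render one value in a storage-independent form (hypothetical rule)."""
    if isinstance(value, (int, float)):
        # Seven significant digits in exponential notation, so that
        # 1 and 1.0 are rendered identically.
        return format(float(value), ".6e")
    return str(value)


def toy_fingerprint(column):
    """Hash the normalized values of one column; illustrative only."""
    digest = hashlib.sha256()
    for value in column:
        # A terminator after each value keeps ["ab", "c"] distinct from ["a", "bc"].
        digest.update(canonical(value).encode("utf-8") + b"\n\x00")
    # Keep the first 128 bits and encode them as printable text.
    return "TOY:" + base64.b64encode(digest.digest()[:16]).decode("ascii")


same_data_as_ints = toy_fingerprint([1, 2, 3])
same_data_as_floats = toy_fingerprint([1.0, 2.0, 3.0])
changed_data = toy_fingerprint([1, 2, 4])

print(same_data_as_ints == same_data_as_floats)  # same content, different storage
print(same_data_as_ints == changed_data)         # one value changed
```

Because only the digest is shared, someone holding the fingerprint can confirm that a copy of the data matches without ever seeing the data itself.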

Data fraud is not particular to graduate students

In the wake of last week’s data fabrication scandal involving Michael LaCour, a political science graduate student at UCLA (evidence, article, hashtag), I’ve heard several professor friends worry that their own students could have faked data, since they didn’t have procedures in place to catch fraud. Advisor-student relationships are often family-like, so much so that your advisor’s advisor is often half-jokingly called your grand-advisor. Advisors, like parents, range widely in the trust they place in their ‘children.’ However, data fraud is not a particular penchant of graduate students.

Take the case of Diederik Stapel, a social psychologist in the Netherlands who faked studies for many years, including the data for studies on which his students based their dissertations. The more powerful supervisor is much more likely to harm the graduate student than the other way around. While I am absolutely in favor of common-sense, transparent procedures to protect data integrity, like those Thomas Leeper describes, I hope this incident doesn’t inspire paranoia on the part of graduate advisors, or anyone else. I suspect it is quite rare for people to be willing to risk their career and reputation forever by fabricating data.

This makes such cases quite interesting, as my web browsing history visualization from last Friday shows.

[Image: laCourDay]

Web page refresh

With a blog post coming out on The Policy and Internet Blog tomorrow, it felt like time to refresh my personal website and start from a clean WordPress installation. I’ll be doing a lot of “reinstallation” over the next few months, as I’ll be moving back to the US after nearly three wonderful years here in the Netherlands at Erasmus University Rotterdam, to take a position in the School of Communication at American University in Washington, D.C. I also just created a new website for Web Historian, a Chrome browser extension I’m working on, so I saw how easy WordPress is to work with these days. This website is currently pretty simple and I hope to keep it that way :)
