CRADLE Seminar - "Data Triage and Data Analytics for Personal Digital Collections" by Kam Woods
Talk: Data Triage and Data Analytics for Personal Digital Collections
Speaker: Kam Woods, postdoc here at SILS
Abstract: Understanding what should be considered private data in a personal digital collection is an increasingly complex task. While most users have a reasonably good idea of the contents of their hard drives, USB sticks, mobile digital devices, and other storage media, relatively few understand the wealth of private data that can be extracted from these media with modern forensic tools. Such data may include sensitive personal data (Social Security Numbers, medical records, credit card data, contacts, e-mail and social media); fragments of files believed to have been erased; and backup, versioning, and hibernation data that the user may be completely unaware of.
However, no simple tools exist that let users build easily understandable reports of where this data is, how it is encoded, how it can be protected or removed, and what dependencies may exist should the user wish to retrieve it in the future.
In this talk, Kam will describe and provide real-world demonstrations of open-source digital forensics tools that can be used to extract, index, and report on these types of data. He will discuss ongoing work as part of the BitCurator project to provide simple, "one pass" versions of these tools that can be readily deployed both by individual users and by librarians and digital archivists to analyze digital collections.
[This is a preliminary version of a talk to be given at Personal Digital Archiving 2012, San Francisco, on February 23rd]
Bio: Kam Woods is a Postdoctoral Research Associate in the School of Information and Library Science at the University of North Carolina at Chapel Hill. He is technical lead on the BitCurator project, a Mellon-funded initiative to bring digital forensics software and expertise to collecting institutions.
