Jared Lyle

Last updated on 1 December 2017

Jared Lyle is Archivist and Director for ICPSR Curation Services and the DDI Alliance in Michigan, USA.

The Inter-university Consortium for Political and Social Research (ICPSR), a social and behavioral science data archive based within the Institute for Social Research at the University of Michigan, is happy to celebrate the first-ever International Digital Preservation Day. ICPSR curates, preserves, and shares 10,000+ data collections. We also provide educational opportunities, including our Summer Program in Quantitative Methods and Social Research, and conduct data stewardship research projects.

ICPSR has been preserving data collections for over 50 years.  Over the decades, we’ve archived data from punched cards and floppy disks and other media, as well as data in a wide range of formats, including now-decommissioned OSIRIS dictionaries written in EBCDIC.  We still do the occasional legacy conversion, such as rescuing data from a pioneering 1950s study of retirement, although the majority of data we acquire today arrive in more modern formats, such as SAS, SPSS, Stata, and R.  Regardless of the age, type, or shape of the data, preservation opportunities and challenges abound.

One such preservation opportunity is describing and documenting research data that are managed and analyzed in leading statistical software packages (SPSS®, SAS®, Stata®, R) used by most social scientists.  Statistics packages offer limited ways of describing data, and they provide little help in documenting data transformations and provenance. 

At best, the operations performed by the statistical package are described in a script, which more often than not is unavailable to future data users. Even if metadata exists, updating it to reflect changes made by a statistics package is a manual process. Data cannot be truly reusable and interoperable without accurate metadata.

ICPSR, in collaboration with several partners, is leading the C²Metadata project[1] to create tools that read scripts for the four main statistics packages (SPSS, SAS, Stata, and R) and insert data transformation metadata into Data Documentation Initiative (DDI) metadata files. Most social science data archives use DDI metadata files to describe and preserve data collections. We are creating a Standard Data Transformation Language (SDTL), which represents data transformations in a JSON format that is independent of the original statistics package. Our software will also render SDTL metadata into human-readable forms for inclusion in codebooks and other forms of documentation. Automated capture of metadata annotated with data transformation and provenance will help enable reuse -- both today and into the future.
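To give a feel for what a package-independent representation might look like, here is a minimal Python sketch. It is an illustration only and does not use the actual SDTL schema: the function name `to_neutral`, the JSON field names, and the two regular expressions are assumptions made up for this example, and they cover just one simple "create a new variable" command in Stata and SPSS.

```python
import json
import re

# Hypothetical, heavily simplified patterns for one kind of command
# (creating a new variable). Real scripts are far richer than this.
PATTERNS = {
    "stata": re.compile(r"^\s*gen(?:erate)?\s+(\w+)\s*=\s*(.+?)\s*$", re.IGNORECASE),
    "spss": re.compile(r"^\s*COMPUTE\s+(\w+)\s*=\s*(.+?)\s*\.\s*$", re.IGNORECASE),
}


def to_neutral(command: str, package: str) -> dict:
    """Turn one package-specific command into a package-neutral record.

    The field names below are illustrative assumptions, not SDTL.
    """
    if package not in PATTERNS:
        raise ValueError(f"unsupported package: {package}")
    match = PATTERNS[package].match(command)
    if match is None:
        raise ValueError(f"unrecognized {package} command: {command!r}")
    variable, expression = match.groups()
    return {
        "command": "compute",        # what was done, in neutral terms
        "variable": variable,        # the variable that was created
        "expression": expression,    # how its values were derived
        "source": {                  # provenance: where the record came from
            "package": package,
            "text": command.strip(),
        },
    }


if __name__ == "__main__":
    stata = to_neutral("generate income_k = income / 1000", "stata")
    spss = to_neutral("COMPUTE income_k = income / 1000.", "spss")
    print(json.dumps(stata, indent=2))
    print(json.dumps(spss, indent=2))
```

In this sketch the two commands produce records that differ only in their `source` block, which is the property the paragraph above describes: the transformation is recorded once, in a form that does not depend on which package ran it, while the original command text is kept for provenance.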

Interested in learning more about ICPSR and its digital preservation activities?  Visit our web site, or follow us on social media (Twitter, Facebook, or YouTube).  Happy International Digital Preservation Day!

[1] The C2Metadata project is supported by the Data Infrastructure Building Blocks (DIBBs) program of the United States National Science Foundation through grant NSF ACI-1640575.  ICPSR is proud to have as partners in this project the American National Election Studies, Colectica, MTNA, NORC, and the Norwegian Centre for Research Data.
