Glynn Edwards

Last updated on 28 November 2018

Glynn Edwards, Josh Schneider and Peter Chan are part of the team at Stanford University Libraries.

Email offers singular insight into and evidence of a person's self-expression, as well as records of collaboration, networks, and transactions. Email communications of prominent individuals, including politicians, writers, scholars, and the like, reveal not only their professional and personal actions, decisions, and creative output, but also relationships within society and communities. Thus, the appeal of email collections extends beyond historians to all manner of researchers, journalists, and the general public seeking to obtain insight into individuals and their lives.

ePADD is free and open-source software developed by Stanford University Libraries and partners that supports the appraisal, processing, preservation, discovery, and delivery of email archives of potential historical or cultural value. Over the past five years, ePADD has pioneered the application of machine learning and natural language processing to confront challenges that collection donors, archivists, and researchers routinely face in donating, administering, preserving, or accessing email collections. These challenges include screening email for confidential, restricted, or legally protected information, preparing email for preservation, and making the resulting files (which incorporate preservation actions taken by the repository) discoverable and accessible to researchers.

ePADD incorporates several automated functions that help simplify screening and optimize access to the email archive's intellectual content. This work, which takes place during the initial import of email into ePADD, supports all subsequent activities by donors, cultural memory institutions, and researchers. Here are a few examples: ePADD resolves the names and email addresses associated with a single correspondent. Resolved correspondent names can be browsed and graphed alphabetically or by volume of messages exchanged with the email account holder. ePADD also employs a custom fine-grained named entity recognizer that extracts categories of entities bootstrapped from DBpedia. These include persons, organizations, locations, government entities, political parties, companies, universities, diseases, and awards. Extracted entities can be browsed alphabetically or by volume of messages. Named entity recognition effectively converts unstructured data (the email message body) into structured data that can be queried and browsed.
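
To make the idea of correspondent resolution more concrete, here is a minimal sketch in Python. It is not ePADD's actual algorithm, which this post does not describe. It simply assumes that two header entries belong to the same correspondent when they share an email address or a normalized display name. The function names (`normalize_name`, `resolve_correspondents`) and the sample addresses are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict
from email.utils import getaddresses


def normalize_name(name):
    """Lower-case a display name and turn 'Last, First' into 'first last'."""
    name = name.strip().strip('"')
    if "," in name:
        last, first = name.split(",", 1)
        name = f"{first} {last}"
    return " ".join(name.lower().split())


def resolve_correspondents(headers):
    """Group names and addresses that appear to belong to one correspondent.

    Assumption (illustrative only): entries are merged when they share an
    email address or a normalized display name. Each header value is treated
    as coming from one message.
    """
    parent = {}  # union-find forest over ("addr", ...) and ("name", ...) keys

    def find(key):
        parent.setdefault(key, key)
        while parent[key] != key:
            parent[key] = parent[parent[key]]  # path compression
            key = parent[key]
        return key

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    counts = Counter()
    for name, addr in getaddresses(headers):
        addr_key = ("addr", addr.lower())
        find(addr_key)  # register the address even if it has no name
        counts[addr_key] += 1
        if name:
            union(addr_key, ("name", normalize_name(name)))

    groups = defaultdict(
        lambda: {"names": set(), "addresses": set(), "messages": 0}
    )
    for key in list(parent):
        group = groups[find(key)]
        kind, value = key
        if kind == "addr":
            group["addresses"].add(value)
            group["messages"] += counts[key]
        else:
            group["names"].add(value)

    # Most active correspondents first, mirroring a "browse by volume" view.
    return sorted(groups.values(), key=lambda g: g["messages"], reverse=True)


if __name__ == "__main__":
    sample_headers = [  # made-up header values
        "Glynn Edwards <gedwards@example.edu>",
        "G. Edwards <gedwards@example.edu>",
        "Glynn Edwards <glynn@example.org>",
        "Josh Schneider <josh@example.edu>",
    ]
    for correspondent in resolve_correspondents(sample_headers):
        print(correspondent)
```

In this sketch the first three header values collapse into one correspondent, because the first two share an address and the first and third share a display name. A production tool would need more careful matching, since merging on name alone can conflate different people who happen to share a name.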
