Sara Day Thomson

Last updated on 27 January 2017

This year, DPC's Research and Practice team has been working on two studies commissioned by the UK Data Service as part of their Big Data Network Support. Both Preserving Social Media and Preserving Transactional Data will address the issues facing long-term access to this big, fast-moving data and will be published as Technology Watch reports. As part of Preserving Social Media, this series of posts examines some of the points of tension in the efforts of research and collecting institutions to preserve this valuable record of life in the 21st century. 


I'm Sara Day Thomson, Project Officer for the DPC, specialising in the pursuit of new ideas in digital preservation. 

If you want to get involved, follow me on Twitter @sdaythomson and the DPC account @DPC_chatter to get the scoop on upcoming DPC events and activities!

Social Media for Good Episode 2: Updates in the World of Social Media Preservation and Research

Back in September, I regaled you with an introduction to the issues covered by DPC’s Technology Watch report Preserving Social Media. Since then, some good things have happened.

Twitter and Politwoops have settled their differences, and Politwoops will soon be back online, tracking the dastardly and otherwise deleted tweets of politicians' official accounts. Perhaps now I must moan slightly less vociferously about platforms shifting away from transparency and away from using social media data for social good. However, there is still much work to be done in encouraging platforms to cooperate with non-commercial research.

Justin Littman and his team at George Washington University have been spiffing up their Social Feed Manager tool, which helps capture and preserve social media in tandem with traditional web archives. More information about the new specs for their open-source tool will be available soon, so watch the @DPC__chat Twitter account for updates!


The Documenting the Now project team at Washington University in St. Louis, University of California at Riverside, and the Maryland Institute for Technology have also started work on this new project to help researchers capture and curate their own Twitter archives. In addition to building an open source Web application that supports appraisal, the project team, including Bergis Jules and Ed Summers, aim to cultivate the converstaion between scholars, archivists, journalisists, and human rights activists. Aligning with the Social Feed Manager as well as with Rhizome's WebRecorder, DocNow answers the need to comply with the needs of researchers and archivists as well as the ethical standards that consider the role of content owners.

Interest and collaboration in furthering social media research have gained momentum in the information science arena. On 12 February, I had the pleasure of joining Cambridge University Library to discuss the possibilities of, and ongoing troubles with, using social media data in research. Katrin Weller, whom some may know from the DPC Briefing Day on Preserving Social Media, presented new developments in support strategies for social media research.

Katrin also shared some encouraging news from GESIS regarding the deposit of valuable social media research data, which implements strategies developed in their pilot study on Twitter data from the 2013 German Bundestag elections. The event in Cambridge also featured social media researchers from a range of disciplines, from psychiatry to social science. Becky Inkster shared developments in the use of social media in mental health and neuroscience research.

Anne Alexander discussed the sensitivity of using social media data about political uprisings, namely the #arabspring hashtag on Twitter and ongoing Middle Eastern political activism on Facebook.

A general theme has surfaced, at the Cambridge event and at other related events in recent months, that emphasises the importance of context and provenance for the use of social media data in research. Most researchers agree that capturing social media data isn’t enough. We need to be able to discover how well social media data does or does not represent the communities under observation. Who are social media users? How does the platform itself influence users’ interactions? New work also reveals the heightened importance of archived social media datasets that make it possible for researchers to re-use data. For this data to be useful, it must be curated and preserved with sufficient metadata to explain the conditions of its original capture and any subsequent actions taken to refine it. For instance, a researcher may remove a particular hashtag or account as a study progresses, changing the resulting dataset. Archivists face a new mandate to develop tools and practices that support these conditions for re-use and reproducibility.


At the International Digital Curation Conference in Amsterdam at the end of February, the research data management community reinforced these edicts. In her keynote on the final day of the main conference, Susan Halford cast a social scientist’s critical eye over growing hype around ‘big data’ and the potential of social media data. She posed the question of whether data science will be this era’s ‘enlightenment’ that promises to finally reveal ‘the truth’.

She argued that the promise of data science lies in the ability to contextualise social media data and to curate it to ensure reproducibility. Without these conditions, social media data (and other forms of big data) will never produce the discoveries and innovations that both the academic and commercial sectors envision. Halford’s keynote, as well as the opening day’s keynote by Barend Mons, who called for 500,000 newly trained data scientists in the next 10 years, stirred a lively discussion. To see more of the conversation, have a look at the #IDCC16 Twitter conversation in this Google spreadsheet, expertly captured by Alastair Dunning at TU Delft using this awesome tool!

But social media data isn’t the whole conversation. Data science and the continuing surge of big data analytics draw on all sources of data, whether generated through interactions on the web or through other routine capture. From energy meters to online retail, data captured and curated to be compatible with other sources can support new discoveries about society and human behaviour. Preserving Transactional Data, the next instalment in the DPC Technology Watch series in collaboration with the UKDS Big Data Network Support, presents the issues of preserving other forms of big data. Watch this space for more, and don't forget to sign up for the DPC and UKDS Briefing Day on Preserving Transactional Data on 17 March in London!
