Hervé L'Hours

Last updated on 11 June 2018

So… many of us are offering services around repositories and archives and the data they contain. But the different uses of the term ‘service’ can muddy the waters. The UK Data Archive name implies a traditional OAIS-style preservation repository, but we’ve always provided additional ‘services’, and through the UK Data Service we belong to a partnership whose name puts a ‘service’ at the heart of the brand. For us these include the archival services of deposit, curation and data publishing, but also extensive support and training.

‘Service’ is as much of a go-to word as ‘solutions’, but is it just as nebulous? Are services more of an operational concept? Or are ‘services’ clearly definable constructs we can use when we’re designing and delivering our work to data depositors and users?

My colleague Darren Bell is our Repository Architect at the UK Data Service as well as leading on our new Big Data Platform (http://dsaap.info/).

Darren is an entities and state machines kind of a guy who addresses each study as one of the possible boundaries around a set of data points, which need to be defined and handled through a series of distinct technical operations.  These are designed and executed as low-level auditable transactions, in that they can be fully replayed or rolled back if needed.

I’m the Repository and Preservation Manager. I’m an objects, agents, actors and processes kind of a guy: I consider how different classes of research digital object imply a set of curation workflows that must be consistently applied towards an agreed outcome, whether through direct manipulation by skilled humans or mediated through some technical product or software package.

We’re both keenly aware of our distinct perspectives, and that in reality the bits of the job that aren’t literally made of bits are actually made of people.

We’re working on our research study-driven repository workflows so that we can identify the likely changes and challenges we will face in moving towards ‘new and novel’ research opportunities with linked ‘big’ data. This also forms part of our contribution to work on legal, ethical and quality issues around these ‘new’ data in the SERISS project. These range from the quality and access issues around social media data, to enriching surveys through administrative population linkage, to new challenges in handling deposits of continuously streaming data at scale: how should we think about versions of a dataset when new additions are arriving every 50 milliseconds?

One big theme is that new data sources and technologies are moving (statistical) data from being a product to being a service, but has anyone clearly articulated what a "service" is in this context? Do we create an infrastructure of people, processes and technology in direct response to data supply and demand metrics? Or are we simply implying that we go beyond mediating data access to offer support in leveraging the research opportunities these data provide?

We need to communicate our ideas to stakeholders with a range of perspectives. As ever, funders want to see data made openly available to increase re-use and impact while researchers, particularly those who collect data around vulnerable groups, want to protect their data subjects. But we must also take into account the wider stakeholders: from a public with rapidly evolving views on the use of their data, to ethics committees, legislators and the providers of complex technologies and platforms that we increasingly rely upon.

 In searching for a simple, clear hook that would highlight the commonalities rather than the differences between data stewards we came up with:

“Which digital objects are you pushing through which parts of the data lifecycle to deliver what services?”

We know there are numerous perspectives on both the objects and lifecycles. We can take that into account, though, and agree on how to define and use these concepts to communicate effectively about business process and technology design. But when it came to services, we really struggled to find a common understanding that we could operationalize (I know!) into describing, designing and delivering, er… services.

I like the S word. I like the idea of delivering a service, meeting demand, managing supply, identifying resources, assessing quality and all that. Describing something as a service implicitly includes the idea of working to deliver satisfaction. But is it just too widely used with too many interpretations to be useful in an operational sense?  Do services always have to invoke SLAs, suggesting that the “level” of service which we commit to deliver is quantifiable?

So, do we go back to the classic OAIS functions (Ingest, Access etc.) and expand them to specify our work from a business process viewpoint, or does anyone know of any RDM/Repository efforts to clearly define services with a view to designing and delivering them? I’d be happy to see comments below or hear from anyone directly. I will of course collate the outcome and report back via the DPC.
