Unpublished Research Data

Practically Extinct

Data sets produced in the course of research but never shared or made available outside of the initial research team.

Digital Species: Research Outputs

Trend in 2022:

Reduced risk | Material Improvement

Consensus Decision

Added to List: 2019

Trend in 2023:

Reduced risk | Material Improvement

Previously: Practically Extinct

Imminence of Action

Action is recommended within twelve months, detailed assessment is a priority.

Significance of Loss

The loss of tools, data or services within this group would impact on people and sectors around the world.

Effort to Preserve | Inevitability

Loss seems likely: by the time tools or techniques have been developed the material will likely have been lost.

Examples

Unpublished research data can take many forms, such as unstructured or structured datasets, databases, or other organized collections of computerized information or data, including periodical articles, books, graphics and multimedia.

‘Practically Extinct’ in the Presence of Aggravating Conditions

Originating researcher no longer active or changed research focus; staff on temporary contracts; dependence on single student or staff member; weak or fluid institutional commitment to subject matter; weak institutional commitment to data sharing; complicated or contested intellectual property; encryption; limited or dysfunctional data management planning.

‘Endangered’ in the Presence of Good Practice

Replication and documentation; data management plan; preservation pathway agreed.

2023 Review

This entry was added in 2019 as a subset of the ‘Unpublished Research Outputs’ reported in 2018, which was split into entries to draw attention to the different preservation requirements and concerns that arise. This entry relates specifically to research data which has not been shared or published by any means and is thus in contravention of the ‘FAIR’ principles which require data to be Findable, Accessible, Interoperable and Reusable. Without proper planning, research data can have a high barrier to re-use, especially where documentation is lacking. The Jury takes the view that documentation and re-use go hand in hand, and researchers should be under no illusions that data not documented or shared faces material and immediate risks of extinction. Over the years, there have been numerous attempts to address the risk of data loss, and it was the 2019 Jury’s hope that this is now a small group. The 2021 Jury agreed with the description and Practically Extinct classification but added a trend towards reduced risk in light of more robust collaborative initiatives to jointly address the risk of data loss in and across research communities.

The 2022 Taskforce agreed on a trend towards reduced risk based on material improvements over the last year that have not only offered examples of good research data management and preservation practices but also suggest a significant shift towards a culture of change and collaboration across different research communities and stakeholders. These include (but are not limited to) improvements and initiatives by the European Open Science Cloud (EOSC), Science Europe, Research Data Alliance (RDA), Digital Curation Centre (DCC) and related projects on the preservation of research data and outputs.

The 2023 Council changed the classification from Practically Extinct to Critically Endangered. This change reflects a positive trend of increased research data management activity and engagement by libraries, which should help to ensure that more research datasets are properly deposited in data repositories. Across many if not most libraries at research-producing higher education institutions (HEIs), there is a general trend to do more in terms of research data management, which is becoming a much larger part of what libraries do, with activities in this area growing and scaling up. However, the scale of unpublished datasets is hard to assess, as they are by definition unknown. For this reason, it was recommended that the classification change to Critically Endangered rather than to a lower-risk category.

Additional Comments

If we do not know it exists, does it exist? It may also be that in certain circumstances this includes data that is unfavourable and has intentionally not been published. If perceived as high-value, someone in the research team will likely take steps to ensure it is protected. We can be proactive and offer advice, but ultimately it is down to them. We cannot keep everything!

This is a wide field, so the scale and impact are hard to describe, but the risk is higher than for papers due to potential file format complexity.

Success depends on how effective an institution's research data management communications are. Advocacy and research are needed to show the scale of the problem, as well as education regarding open science and preservation.

Simply having a data management plan (DMP) prepared is not sufficient; it needs to be properly implemented and kept up to date and relevant for both the researcher and the repository which will take a copy of the data. The DMP should be used to appraise what data is worth long-term preservation (e.g. using the NERC Data Value Check List), and what data is of lower quality, non-reusable, or even a reputational risk should it be shared further.

