Endangered

Data sets produced in the course of research and shared informally between researchers, such as by posting to a website or portal, but without any preservation capability or commitment. Typically the data remain in the hands of the researchers, who are responsible for maintaining them.

Group: Research Outputs

Trend in 2021:

Consensus Decision

Added to List: 2019

Trend towards reduced risk

Previous classification: Endangered


Trend in 2022:


Material improvement


Imminence of Action

Action is recommended within three years, detailed assessment within one year.

Significance of Loss

The loss of tools, data or services within this group would impact on people and sectors around the world.

Effort to Preserve

It would require a major effort to prevent losses in this group, such as the development of new preservation tools or techniques.

Examples

Departmental web servers; project wikis; GitHub repositories

‘Critically Endangered’ in the Presence of Aggravating Conditions

Originating researcher no longer active or has changed research focus; staff on temporary contracts; dependence on a single student or staff member; weak or fluid institutional commitment to the subject matter; weak institutional commitment to data sharing; complicated or contested intellectual property; encryption; limited or dysfunctional data management planning; web capture challenges that make the data unlikely to be picked up by automated crawlers

Vulnerable in the Presence of Good Practice

Data in preparation for transfer to a specialist repository; robust data management planning; data documented and managed professionally

2021 Jury Review

This entry, added in 2019, was previously introduced in 2017 under ‘Research Data’, though without explicit reference to semi-published research data. In 2019, the Jury split the ‘Research Data’ entry into a range of contexts for research outputs, including this addition. The entry is intended to draw attention to ‘self-help’ data sharing, which should be encouraged as a means to facilitate open science but should not be confused with long-term preservation. The 2021 Jury agreed with the Endangered classification, noting the problem of large volumes of data being produced but not kept in a meaningful way. Research data is complex and has specific documentation requirements that may only be known to subject matter experts. However, data creators (e.g. researchers) are not necessarily well placed to sustain data in the long term.

There have also been a few significant changes to the entry in the 2021 BitList:

  1. Removal of ‘informally’ from the previous entry description (‘shared informally between researchers’) due to possible misperception or misunderstanding: ‘informal’ may imply that researchers perceive the data as low value and would not want it captured. This may indeed be the case, so it is important to take this into account and to provide advice and input to researchers who do see value in their data.

  2. Two previous entries (‘Geomagnetic Data and Software’ and ‘Maritime Archaeological Archives’) have been removed as separate entries and incorporated into this broader entry on semi-published research data. This highlights the range of content and forms that semi-published research data can take, and the need for specialised knowledge and specialist repositories to prepare and/or manage the data throughout its lifecycle.

  3. A 2021 trend towards reduced risk, based on improvements and initiatives in the preservation of semi-published research data since the entry was added in 2019.

2022 Trend

The 2022 Taskforce agreed on a trend towards reduced risk based on material improvements over the past year, which have not only offered examples of good research data management and preservation practice but also suggest a significant shift towards a culture of change and collaboration across different research communities and stakeholders. These include (but are not limited to) improvements and initiatives by the European Open Science Cloud (EOSC), Science Europe, the Research Data Alliance (RDA), the Digital Curation Centre (DCC) and related projects on the preservation of research data and outputs.
