Semi-Published Research Data

Endangered

Data sets produced in the course of research and shared between researchers, such as by posting to a website or portal, but without preservation capability or commitment. Typically, the data remains in the hands of the researchers, who have the job of maintaining it.

Digital Species: Research Outputs

Trend in 2022:

Material improvement (reduced risk)

Consensus Decision

Added to List: 2019

Trend in 2023:

No change

Previously: Endangered

Imminence of Action

Action is recommended within three years, detailed assessment within one year.

Significance of Loss

The loss of tools, data or services within this group would impact on people and sectors around the world.

Effort to Preserve

It would require a major effort to prevent or reduce losses in this group, possibly requiring the development of new preservation tools or techniques.


Examples

Departmental web servers; project wikis; GitHub repositories.

‘Critically Endangered’ in the Presence of Aggravating Conditions

Originating researcher no longer active or changed research focus; staff on temporary contracts; dependence on a single student or staff member; weak or fluid institutional commitment to subject matter; weak institutional commitment to data sharing; complicated or contested intellectual property; encryption; limited or dysfunctional data management planning; web capture challenges that make the data unlikely to be picked up by automatic crawlers.

Vulnerable in the Presence of Good Practice

Data in preparation for transfer to specialist repository; robust data management planning; documented and managed professionally using data stewards.

This 2019 entry was previously introduced in 2017 under ‘Research Data,’ though without explicit reference to semi-published research data. In 2019, the Jury split the ‘Research Data’ entry into a range of contexts for research outputs, including this addition. The entry draws attention to ‘self-help’ data sharing, which is to be encouraged as a means to facilitate open science but should not be confused with long-term preservation. The 2021 Jury agreed with the Endangered classification, noting problems with the volume of data being produced but not being kept in a meaningful way. Research data is complex and has specific requirements for documentation that may only be known to subject matter experts. However, data creators (e.g., researchers) are not necessarily well placed to sustain data in the long term.

There were also a few significant changes to the entry in the 2021 Bit List.

  1. Removal of ‘informally’ from the previous entry description (‘shared informally between researchers’) due to possible misperception or misunderstanding; ‘informal’ may imply researchers would perceive the data as low value and not want it captured. This may be the case, so it is important to consider and provide advice to researchers who think there is value in their data.

  2. Two previous entries (Geomagnetic Data and Software and Maritime Archaeological Archives) have been removed as separate entries and incorporated into this broader entry on semi-published research data, both to highlight the range of content and forms semi-published research data can take and to highlight the need for specialized knowledge and specialist repositories for preparing and managing the data throughout the lifecycle.

  3. A 2021 trend towards reduced risk, based on improvements and initiatives towards the preservation of semi-published research data since the entry’s addition in 2019.

The 2022 Taskforce agreed on a trend towards reduced risk based on material improvements over the last year that have not only offered examples of good research data management and preservation practices but also suggest a significant shift towards a culture of change and collaboration across different research communities and stakeholders. These include (but are not limited to) improvements and initiatives by the European Open Science Cloud (EOSC), Science Europe, Research Data Alliance (RDA), Digital Curation Centre (DCC) and related projects on the preservation of research data and outputs.

The 2023 Council agreed with the Endangered classification.

Additional Comments

There is a positive trend of increased research data management activity and engagement by libraries and data centres, which should help to ensure that more research datasets are properly deposited in data repositories, rather than left in a 'semi-published' state.

Minting Digital Object Identifiers (DOIs) and offering them to researchers for datasets deposited at specialist repositories will encourage data citation and increase the research impact of individual researchers, which has traditionally relied more on publishing papers than datasets.


See also:

  • Boccali, T., Sølsnes, A., Thorley, M., Winkler-Nees, S. and Timmermann, M. (2021) ‘Practical Guide to Sustainable Research Data: Maturity Matrices for Research Funding Organisations, Research Performing Organisations, and Research Data Infrastructures’, Science Europe. Available at:

  • European Open Science Cloud (EOSC) (n.d.) ‘Development and outputs of the European Open Science Cloud (EOSC) Long-Term Data Preservation Task Force’. Available at: [accessed 24 October 2023]
