Semi-Published Research Data

Endangered

Data sets produced in the course of research and shared between researchers, such as by posting to a website or portal but without preservation capability or commitment. Typically the data remains in the hands of the researchers who have the job of maintaining it.

Digital Species: Research Outputs

Trend in 2022:

Reduced risk: material improvement

Consensus Decision

Added to List: 2019

Trend in 2023:

No change

Previously: Endangered

Imminence of Action

Action is recommended within three years, detailed assessment within one year.

Significance of Loss

The loss of tools, data or services within this group would impact on people and sectors around the world.

Effort to Preserve

It would require a major effort to prevent or reduce losses in this group, possibly requiring the development of new preservation tools or techniques.

Examples

Departmental web servers; project wikis; GitHub repositories.

‘Critically Endangered’ in the Presence of Aggravating Conditions

Originating researcher no longer active or changed research focus; staff on temporary contracts; dependence on single student or staff member; weak or fluid institutional commitment to subject matter; weak institutional commitment to data sharing; complicated or contested intellectual property; encryption; limited or dysfunctional data management planning; web capture challenges that means unlikely to be picked up by automatic crawlers.

Vulnerable in the Presence of Good Practice

Data in preparation for transfer to specialist repository; robust data management planning; documented and managed professionally using data stewards.

This entry was first introduced in 2017 under ‘Research Data,’ though without explicit reference to semi-published research data. In 2019, the Jury split the ‘Research Data’ entry into a range of contexts for research outputs, including this addition. The entry draws attention to ‘self-help’ data sharing, which is to be encouraged as a means to facilitate open science but should not be confused with long-term preservation. The 2021 Jury agreed with the Endangered classification, noting problems with the volume of data being produced but not being kept in a meaningful way. Research data is complex and has specific requirements for documentation that may only be known to subject matter experts. However, data creators (e.g., researchers) are not necessarily well placed to sustain data in the long term.

There were also a few significant changes to the entry in the 2021 Bit List.

  1. Removal of ‘informally’ from the previous entry description (‘shared informally between researchers’) due to possible misperception or misunderstanding; ‘informal’ may imply researchers would perceive the data as low value and not want it captured. This may be the case, so it is important to consider and provide advice to researchers who think there is value in their data.

  2. Two previous entries (Geomagnetic Data and Software and Maritime Archaeological Archives) have been removed as separate entries and incorporated into this broader entry on semi-published research data, to highlight the range of content and forms semi-published research data can take and the need for specialized knowledge and specialist repositories for preparing and managing the data throughout the lifecycle.

  3. A 2021 trend towards reduced risk, based on improvements and initiatives towards the preservation of semi-published research data since the entry’s addition in 2019.

The 2022 Taskforce agreed on a trend towards reduced risk based on material improvements over the last year that have not only offered examples of good research data management and preservation practices but also suggest a significant shift towards a culture of change and collaboration across different research communities and stakeholders. These include (but are not limited to) improvements and initiatives by the European Open Science Cloud (EOSC), Science Europe, Research Data Alliance (RDA), Digital Curation Centre (DCC) and related projects on the preservation of research data and outputs.

The 2023 Council agreed with the Endangered classification.

Additional Comments

There is a positive trend of increased research data management activity and engagement by libraries and data centres, which should help to ensure that more research datasets are properly deposited in data repositories, rather than left in a 'semi-published' state.

Minting Digital Object Identifiers (DOIs) and offering them to researchers for datasets deposited at specialist repositories will encourage data citation and increase the research impact of individual researchers, which has traditionally relied more on publishing papers than datasets.

 

See also:

  • Boccali, T., Sølsnes, A., Thorley, M., Winkler-Nees, S. and Timmermann, M. (2021) ‘Practical Guide to Sustainable Research Data: Maturity Matrices for Research Funding Organisations, Research Performing Organisations, and Research Data Infrastructures’, Science Europe. Available at: http://doi.org/10.5281/zenodo.4769703

  • European Open Science Cloud (EOSC) (n.d.) ‘Development and outputs of the European Open Science Cloud (EOSC) Long-Term Data Preservation Task Force’. Available at: https://www.eosc.eu/advisory-groups/long-term-data-preservation [accessed 24 October 2023]

Email

Endangered

Documents, correspondence and other records created in the course of contractual dealings between individuals and agencies, especially where the subjects are of long duration and may be subject to legal scrutiny at undefined points in the distant future.

Digital Species: Formats

Trend in 2022:

Reduced risk: material improvement

Consensus Decision

Added to List: 2017

Trend in 2023:

No change

Previously: Endangered

Imminence of Action

Action is recommended within five years, detailed assessment within three years.

Significance of Loss

The loss of tools or services within this group would have a global impact.

Effort to Preserve | Inevitability

It would require a small effort to preserve materials in this group, requiring the application of proven tools and techniques.

Examples

Gmail, Hotmail, Yahoo Mail, Outlook, and email in all its forms including individual messages, threads of conversation, mailboxes, email servers and file attachments.

‘Critically Endangered’ in the Presence of Aggravating Conditions

Conflicting and unmanaged IPR; use of personal accounts for professional work and vice versa; proliferation and duplication of attachments; email not recognized as a record; absent, unworkable or inconsistent records management; dependence on free cloud-based services; lack of migration path; lack of preservation planning; perverse incentives to delete; encryption.

‘Vulnerable’ in the Presence of Good Practice

Application of appraisal and selection tools; timely transfer to preservation facility or archive; commitment to transparency; preservation policy; working preservation plan; clear migration path; widespread recognition of email as a record.

2023 Review

This entry was added in 2017, but the Jury did not have the capacity to assess it in detail. It was reviewed and assessed in 2019, with significant developments highlighted, including the recommendations of the Email Preservation Taskforce and the development of the ePADD software. Email presents many preservation challenges, from scale through core technologies, attachments, privacy and intellectual property rights. Because this entry intersects with many others, the aggravating conditions associated with email should be considered in conjunction with those entries where relevant.

The 2021 Jury discussed the continued developments in email preservation tools and techniques as well as the growing number of archives preserving email content. At the same time, issues with providing access to preserved email content have arisen. Ongoing records management policies towards corporate or business email need to be better embedded to stop the loss of important email content, and more awareness is needed around the potential of personal email.

While record-keeping legislation and mandates direct retention periods, emails documenting decisions taken by government officials at local, regional and national levels are not always well maintained, if they are maintained at all; a loss could impact people’s lives along with their ability to assert rights.

For these reasons, there was a 2021 trend towards reduced risk, but the Endangered classification remained.

The 2022 Taskforce agreed on a trend towards reduced risk based on material improvement over the last year with applied examples of good practice, including (but not limited to) approaches to creating a PDF format for the preservation of email, and improvements to existing software, tools and workflows supporting complex email preservation.

The 2023 Council agreed with the Endangered classification and noted a decrease in the imminence of action and effort to preserve.

Additional Comments

The 2023 Council also recommended noting areas of overlap with the ‘Cloud-based Services and Communications Platforms’ entry as it pertains to the saving and preservation of email in cloud-based services (such as Microsoft SharePoint).

Email is hugely important because it has been so pervasive as a communication mechanism for society. Some methods are in use, and some responsibility has been adopted, for collecting email at the business and public body level (although this will differ globally), but this covers only a fraction of the communities that use email, and few will be set up for the long-term care of this data.

Case Studies or Examples:

  • Resources and outputs from the EA-PDF project to identify the essential characteristics and optimal functional requirements of email messages and necessary related information in a PDF technology-based archive. PDF Association (2021), ‘EA-PDF’. Available at: https://www.pdfa.org/resource/ea-pdf/ [accessed 24 October 2023].

  • Resources and outputs from the Integrating Preservation Functionality into ePADD (ePADD+) project to integrate long-term email preservation functionality into the program and provide a tool supporting the email archiving lifecycle more robustly. ePADD (n.d.) ‘History’. Available at: https://www.epaddproject.org/about/history [accessed 24 October 2023].

  • Resources and outputs from the RATOM project to develop software to assist archives and other collecting organizations with email analysis, selection, and appraisal tasks. RATOM (n.d.) ‘About’. Available at: https://ratom.web.unc.edu/about/ [accessed 24 October 2023].

See also:

  • Prom, C. (2019) ‘Preserving Email (2nd Edition)’, DPC Technology Watch Report 19-01. Available at: http://doi.org/10.7207/twr19-01

  • Artefactual Systems and DPC (2021) ‘Preserving Email’ DPC Technology Watch Guidance Notes. Available at: http://doi.org/10.7207/twgn21-08

  • Murray, K. and Prom, C. (2018) ‘The Future of Email Archives: A Report from the Task Force on Technical Approaches for Email Archives’, CLIR. Available at: https://clir.wordpress.clir.org/wp-content/uploads/sites/6/2018/08/CLIR-pub175.pdf [accessed 24 October 2023].

  • “Novice to Know-How: Email Preservation” was conceived of and funded by The National Archives (UK) and delivered by the DPC. This course is aimed at learners who already have a solid foundational knowledge of digital preservation (e.g. they have completed the original N2KH learning pathway) and wish to gain practical skills in relation to the preservation of email. DPC and The National Archives (UK) (2023), ‘Novice to Know-How: Email Preservation’. Available at: https://www.dpconline.org/digipres/prof-development/n2kh-online-training [accessed 24 October 2023].

  • Ville, M. (2023) ‘2013 – 2023: A Review of Ten Years of Email Archiving in France’, iPRES 2023 Conference, Urbana-Champaign, Illinois, USA, 19–22 September.
