DPC

Current Hard Disk Technologies

Vulnerable

Materials saved to storage devices with a variety of underlying magnetic or solid-state (flash) technologies that are hardwired into a computer still under warranty or supported: typically hard disks that are less than five years old.

Digital Species: Integrated Storage

Trend in 2022:

No change

Consensus Decision

Added to List: 2019

Trend in 2023:

No change

Previously: Vulnerable

Imminence of Action

Action is recommended within five years, detailed assessment within three years.

Significance of Loss

The loss of tools, data or services within this group would impact on many people and sectors.

Effort to Preserve | Inevitability 

Loss of material in this group could be entirely avoidable if provided the means to deploy proven tools and techniques.

Examples

Direct Attached Storage (DAS) such as magnetic or solid-state drives integrated into individual laptops or workstations and into smaller scale storage facilities.

‘Endangered’ in the Presence of Aggravating Conditions

Encryption; poor handling; poor storage; lack of consistent replication; failure of external dependencies (e.g., suppliers, security); political or commercial interference; failure of internal dependencies (e.g., power supply, disk controller); overly aggressive compression; poor information security; lack of integrity-checking; lack of strategic investment; lack of warranty; unenforceable warranty.

‘Lower Risk’ in the Presence of Good Practice

Backup to different technology; backup to diverse locations; documentation of assets; integrity checking; preservation planning; refreshment planning; export functionality; resilient to hacking; selection and appraisal criteria; version control; resilient funding; technology watch; enforceable warranty; disaster planning.

2023 Review

This entry was added in 2019 to ensure that the range of media storage is properly assessed and presented. It was reviewed in 2021 with a noted trend towards greater risk, in light of the continued shift towards reliance on cloud storage, computers increasingly replacing hard disks with solid-state storage, and commercial motivations for less support. It was reviewed again in 2022 with no noted trend towards greater or reduced risk.

The 2023 Council agreed with the current Vulnerable classification, with overall risks remaining on the same basis as before (no change to the trend), while also noting a slight decrease in the effort needed to preserve and the imminence of action required when compared to the 2021 Jury review.

Additional Comments

As people increasingly select other storage methods, such as cloud, they are less likely to maintain existing content on portable hard disks, which means the portable hard disks are more likely to be overlooked or ignored (e.g., left in drawers) rather than checked and refreshed. There are also indications of increasing prevalence of soldered-in flash storage which cannot easily be accessed in the case of device failure.
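
Checking a disk that has been left in a drawer usually comes down to confirming that its files still match a previously recorded set of checksums. The following is a minimal Python sketch of that idea, not a method prescribed by this entry: the function names (`sha256_of`, `build_manifest`, `verify_manifest`) and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files under root with an earlier manifest (hypothetical report layout)."""
    current = build_manifest(root)
    return {
        # Recorded earlier but no longer present on the disk.
        "missing": sorted(set(manifest) - set(current)),
        # Present in both, but the content no longer matches.
        "changed": sorted(
            name for name in manifest
            if name in current and current[name] != manifest[name]
        ),
        # On the disk now but never recorded.
        "unexpected": sorted(set(current) - set(manifest)),
    }
```

A manifest built when the disk is first filled can be stored separately (for example alongside a second copy) and passed to `verify_manifest` at each periodic check; any non-empty list in the result is a prompt to restore from another copy or refresh the media.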

Case Studies or Examples:

  • Some new technologies like shingling, HAMR/MAMR and multiple actuators have given HDD technology (and, more importantly for preservation, interfaces such as SATA and SAS) a new lease on life. Nevertheless, the writing is on the wall as flash and related technologies move to NVMe and CXL interfaces. See Mellor, C. (2023) ‘Pure: No more hard drives will be sold after 2028’, Blocks & Files. Available at: https://blocksandfiles.com/2023/05/09/pure-no-more-hard-drives-2028/ [accessed 24 October 2023]

  • For example, SSDs can be remarkably sensitive to storage conditions when unpowered. See Cox, A. (2013) ‘JEDEC SSD Specifications Explained’, JC-64.8. Available at: https://www.jedec.org/sites/default/files/Alvin_Cox%20%5bCompatibility%20Mode%5d_0.pdf [accessed 24 October 2023]

Recently Commissioned or Completed Media Art

Vulnerable

Media art currently displayed in a gallery or in the process of being displayed.

Digital Species: Media Art

Trend in 2022:

No change

Consensus Decision

Added to List: 2019

Trend in 2023:

No change

Previously: Vulnerable

Imminence of Action

Action is recommended within twelve months, detailed assessment is a priority.

Significance of Loss

The loss of tools, data or services within this group would impact on many people and sectors.

Effort to Preserve | Inevitability

It would require a small effort to preserve materials in this group, requiring the application of proven tools and techniques.

Examples

Media art recently acquired by galleries that utilize specific hardware and software in order to be accessed or exhibited.

‘Endangered’ in the Presence of Aggravating Conditions

Lack of documentation to enable maintenance; lack of clarity with respect to intellectual property; complex interdependencies on specific hardware, software or operating systems; lack of capacity in the gallery or workshop; lack of strategic investment; complex external dependencies; lack of documentation about artist intent.

‘Lower Risk’ in the Presence of Good Practice

Strong documentation; clarity of preservation path and ensuing responsibilities; proven preservation plan; capacity of workshop to support artwork at de-installation; capacity of gallery to conserve after de-installation; capacity of gallery to re-install work.

2023 Review

This entry was added in 2019 as a separate entry, but it was previously introduced in 2017 under ‘Media Art’ with particular reference to historical media art. It was added for greater specificity for its recommendations, to represent works commissioned in the last five years where there is a reasonable expectation that documentation has been produced or could still be obtained.

While the 2020 Jury found no change in trend, the 2021 Jury discussed how prospects for long-term preservation depend entirely on whether the artwork is collected post-commission and by an organization with the resources to care for it. They agreed that the classification remains Vulnerable but with a trend towards greater risk because action is time-sensitive: it requires working with the artist to obtain documentation about their work and what is needed before it is too late. Furthermore, there remains a vulnerability for smaller museums or others that do not take the preservation of media art as seriously.

The 2023 Council agreed with the Vulnerable classification with overall risks remaining on the same basis as before (no change to the trend), although it noted a change in the imminence of action from three years to twelve months.

Additional Comments

By the time digital art, time-based media, etc., has entered into the permanent care of a stewarding institution, many of its technologies are already end-of-life or unsupported, or the hardware components have deteriorated. Often the expertise to maintain these many interacting components sits outside the host organization, with a technical supplier to the gallery, and this is in itself vulnerable to business change. Although there are a few exceptions, there is a need for greater capacity within the museum and gallery sector to address the challenges. There have been new initiatives for guidance and examples of institutions taking wider sectoral responsibility for standards, which have helped with the effort to preserve, such as the Matters in Media Art information resource and guidance.

Media artworks are often made with a network of knowledge that can be precarious. Documentation around production processes can be minimal, and hence acting quickly with known processes can gather information before the knowledge and people networks start to disperse. This can mean that production environments and associated workflows are preserved alongside the media.

Some artworks specifically leverage the limitations and characteristics of the systems that they incorporate, often in unusual ways. This can be hard to migrate or emulate accurately.

Case Studies or Examples:

  • Resources and outputs from the Preserving and Sharing Born Digital and Hybrid Objects From and Across The National Collection project. See V&A Research Projects (n.d.) ‘Preserving and Sharing Born Digital and Hybrid Objects’. Available at: https://www.vam.ac.uk/research/projects/preserving-and-sharing-born-digital-and-hybrid-objects [accessed 24 October 2023].

  • This includes decision model work around acquisition of complex collections such as born digital and hybrid art. See Ensom, T, and McConnachie, S. (2022) ‘Preserving and sharing born-digital and hybrid objects from and across the National Collection’, Decision Model Report: March 2022. Available at: http://doi.org/10.5281/zenodo.7097489

  • Matters in Media Art (n.d.) ‘Guidelines for the care of media artworks’. Available at: http://mattersinmediaart.org/ [accessed 24 October 2023]

See also:

  • NEW MEDIA MUSEUMS: Creating Framework for Preserving and Collecting Media Arts in V4, initiated by the Olomouc Museum of Art as a joint international platform for sharing experience with building and maintaining collections of new media artworks across different types of institutions. The aim of the project is to find workable methods for heritage institutions to build and maintain collections of media arts, which are necessary for safeguarding this area for the benefit of society. See Central European Art Database (2021) ‘NEW MEDIA MUSEUMS: Creating Framework for Preserving and Collecting Media Arts in V4’. Available at: http://cead.space/Detail/projects/3797 [accessed 24 October 2023]

  • The Collaborative Infrastructure for sustainable access to digital art LIMA project, to prevent the loss of digital artworks and to commonly develop the knowledge to preserve these works in a sustainable way. The project ‘Infrastructure sustainable accessibility digital art’ invests in research, training, knowledge sharing and conservation to prevent the loss of both digital artworks and the knowledge to preserve them. See LIMA (n.d.) ‘Collaborative infrastructure for sustainable access to digital art’. Available at: https://www.li-ma.nl/lima/article/collaborative-infrastructure-sustainable-access-digital-art [accessed 24 October 2023]

PDF

Vulnerable

Documents presented in Portable Document Format (PDF) (ISO 32000-1 and ISO 32000-2) and other data wrapped inside them, covering all variants and versions, including PDF/A.

Digital Species: Formats

Trend in 2022:

No change

Consensus Decision

Added to List: 2017

Trend in 2023:

No change

Previously: Vulnerable/Endangered

Imminence of Action

Action is recommended as required, with periodic review every five years.

Significance of Loss

The loss of tools, data or services within this group would impact on people and sectors around the world.

Effort to Preserve | Inevitability 

It would require a small effort to preserve materials in this group, requiring the application of proven tools and techniques.

Examples

Documents stored offline, or online in repositories or EDRMS, including reports, agenda, minutes, correspondence, contracts, essays, articles, or research papers. Versions and variants include PDF 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7 and 2.0, as well as PDF/A, PDF/X and PDF/E.

‘Endangered’ in the Presence of Aggravating Conditions

Loss of context; loss of authenticity or integrity; external dependencies; lack of understanding; significant diversity of data; poorly developed digitization specifications; lack of integrity checking; poorly developed migration or normalization specifications; lack of virus control; poor storage or replication; lack of validation at the point of creation; encryption.

‘Lower Risk’ in the Presence of Good Practice

Well-managed data infrastructure; preservation planning; authenticity managed; use of persistent identifiers; reduction of dependencies; application of records management standards; recognition of preservation requirements beyond formats; strategic investment in digital preservation; preservation roadmap; participation in digital preservation community; format validation; version control.

2023 Review

The PDF entry was added in 2017 and was split into two entries, ‘PDF/A’ and ‘PDF other than PDF/A’, in 2019 to emphasize the different threats faced by different types of PDF.

The 2021 Jury agreed with this decision and noted that trends for the PDF other than PDF/A entry and the PDF/A entry were both towards a reduced risk.

The 2023 Council recommended merging the two previously split entries (‘PDF/A’ and ‘PDF other than PDF/A’). After reviewing the two entries separately, they found more similarities than differences between the two and indeed across all types of PDF (not just PDF/A). Given the range of commercial and open-source tools available to assist preservation, the risk of loss is less persistent than previously suggested. Therefore a Vulnerable classification is appropriate for all PDF formats as a whole.

Additional Comments

There is a lot of material produced and kept in PDF. Some of it is authoritative, in other words, the only available copy, while some of it is not. However, if it is the only copy and it is lost, it can have an impact on a lot of people.

The challenge in evaluating the significance and impact of the loss of PDFs is that they’re quite often a surrogate of something else, whether a digitized record or a Word document, etc. Whether or not that record is retained may be a factor. We should also be considering PDF Portfolios, which are an extension of PDF 1.7. Portfolios contain embedded files and can include text documents, spreadsheets, PowerPoints, emails, Computer Aided Design (CAD) drawings.

Vulnerability also depends on whether the PDF file actually conforms to the PDF/A standard. The risk arises from a combination of 1) files not conforming to the standard and 2) collection managers assuming that a file is resilient simply because it purports to be a PDF/A. This risk lies less with the format and more with understanding and experience in data management. Moreover, materials embedded in or attached to PDF/A-2 and PDF/A-3 may be at risk.
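
The gap between a file that purports to be PDF/A and one that actually conforms can be illustrated with a short Python sketch. It only reads what a file declares about itself: the `%PDF-x.y` header and the `pdfaid:part` value in its XMP metadata. The function names are hypothetical, and the sketch assumes the metadata is stored uncompressed as UTF-8. A declared value is a claim, not proof of conformance, which requires a dedicated validator.

```python
import re
from typing import Optional

# The file header, e.g. b"%PDF-1.7", appears at the start of a PDF file.
HEADER_RE = re.compile(rb"%PDF-(\d\.\d)")
# PDF/A identification in XMP, written as an element or as an attribute.
PDFA_PART_RE = re.compile(rb"pdfaid:part\s*(?:=\s*[\"']|>)\s*(\d)")


def declared_pdf_version(data: bytes) -> Optional[str]:
    """Return the version stated in the header, or None if no header is found.

    Assumption: the header alone is inspected; a later version recorded
    elsewhere in the file is not considered.
    """
    match = HEADER_RE.search(data[:1024])
    return match.group(1).decode("ascii") if match else None


def claimed_pdfa_part(data: bytes) -> Optional[str]:
    """Return the PDF/A part the file claims (e.g. "1", "2", "3"), or None.

    This reports the claim only; it does not check conformance.
    """
    match = PDFA_PART_RE.search(data)
    return match.group(1).decode("ascii") if match else None
```

In a collection survey, files where `claimed_pdfa_part` returns a value could be queued for full validation rather than assumed to be resilient, while files returning `None` are treated as ordinary PDF.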

Published Research Papers

Endangered

Completed research papers published in serials, monographs or theses which fall under specific collecting policies of research libraries or archives and are managed through dedicated repository infrastructures.

Digital Species: Research Outputs

Trend in 2022:

No change

Consensus Decision

Added to List: 2017

Trend in 2023:

No change

Previously: Vulnerable

Imminence of Action

Action is recommended within three years, detailed assessment within one year.

Significance of Loss

The loss of tools, data or services within this group would impact on people and sectors around the world.

Effort to Preserve | Inevitability

It would require a small effort to preserve materials in this group, requiring the application of proven tools and techniques.

Examples

Published research papers in scholarly E-Books and Electronic Journals; Electronic manuscripts; Electronic theses (E-theses).

‘Endangered’ in the Presence of Aggravating Conditions

Lack of documentation; lack of clarity with respect to intellectual property; embedded complex objects; unstable funding for repository; lack of strategic investment; complex external dependencies; lack of persistent identifiers; bespoke formats; lack of legal deposit mandate.

‘Lower Risk’ in the Presence of Good Practice

Strong documentation including intellectual property rights; clarity of preservation path and ensuing responsibilities; credible preservation plan; proven capacity of repository; legal deposit preservation copying; post-cancellation access service; persistent identifiers used consistently; non-proprietary formats used and validated; minimal or well managed external dependencies.

2023 Review

This entry was added in 2017 under 'Published research outputs,' though without reference to the capacity of the repository infrastructure. The 2019 Jury amended it to presume the existence of repository infrastructure and noted that the aggravating conditions (which introduce risks) and good practice enhancements (which reduce it) are most relevant to repository operations.

While the 2020 Jury found no change in trend, the 2021 Jury agreed it should remain Vulnerable but discussed improvements and initiatives towards the preservation of research data and outputs, pointing to a trend towards reduced risk.

The 2023 Council agreed with the Vulnerable classification, noting a slight decrease in imminence of action with no significant trends towards greater or reduced risk.

Additionally, the Council recommended that a nomination received for a new ‘E-theses’ entry be treated as a valuable example within this entry rather than as a new, standalone entry. However, as noted in the additional comments below, a recommended rescoping of the entry, planned for the next Bit List, will revisit this nomination as part of a restructuring.

Additional Comments

The 2023 Council additionally recognize that further scoping and input are needed for this entry and recommend that the next major review revisit and restructure the entry, in particular looking at restructuring based on differences between:

  • Types of published material. There are different levels of risk relating to the published version of record of the research paper (typically hosted on a publisher or aggregator platform), research papers hosted on institutional open access repositories (typically the author accepted manuscript rather than the version of record), and E-theses (typically hosted on an institutional repository or similar platform, sometimes with a copy harvested by an aggregation service, such as EThOS). However, there is a chance of becoming too granular with entries if separating them by types.

  • The version of the record hosted on the publisher platform and the version hosted in open access repository. In other words, it might be a better question to ask where it is being published rather than what is being published. Preservation risks and considerations for these are quite different and would benefit from being assessed separately.

A 2023 nomination for E-theses highlights distinct risks tied to these digital published materials. E-theses tend to be sole documents which, when published by universities, may get harvested into other aggregators or resources, but in many cases the only copy (with no physical/analogue copy) sits on an institution's repository. In addition, many are deposited in PDF format (of many varieties, and many don't even attempt to use PDF/A etc.), risking long-term accessibility and re-use. However, the breadth of risks goes beyond just the PDF variety, as e-theses often include databases, audiovisual materials, websites, and more.

The loss of tools, data or services within this group would impact on people and sectors around the world, particularly those involved with reproducibility and those wishing to use the datasets for further research.

Although there have been improvements in current practice, policies and workflows, there is still a significant corpus of information that was deposited before these improvements came into force. It is unlikely that there will be the time, will or resources to bring this information up to current standards.

See also:

  • Konstantelos, L., (2021) ‘Breaking down barriers in e-only thesis submission: how digital preservation contributes to the conversation at the University of Glasgow’, Digital Preservation Coalition Blog. Available at: https://www.dpconline.org/blog/wdpd/wdpd2021-konstantelos [accessed 24 October 2023]

  • Klungthanaboon, W., (2021) ‘From “research output” to “research data” - a willingness to move forward?’, Digital Preservation Coalition Blog. Available at: https://www.dpconline.org/blog/wdpd/research-output-to-research-data [accessed 24 October 2023]

  • Beagrie, N (2013) ‘Preservation, Trust and Continuing Access for E-Journals’, DPC Technology Watch Report 13-04. Available at: http://doi.org/10.7207/twr13-04

  • Morrissey, S, and Kirchhoff, A (2014) ‘Preserving E-Books’, DPC Technology Watch Report 14-01. Available at: http://doi.org/10.7207/twr14-01

  • Resources and recent outputs from Public Knowledge Project (PKP) Preservation Network, which developed to digitally preserve Open Journal Systems (OJS) journals. See Public Knowledge Project (n.d.) ‘PKP Preservation Network’. Available at: https://pkp.sfu.ca/pkp-pn/ [accessed 24 October 2023]

 

Local Network Storage

Vulnerable

Materials routinely copied or backed up to locally managed data storage facilities and able to be restored under institutional service arrangements.

Digital Species: Integrated Storage

Trend in 2022:

No change

Consensus Decision

Added to List: 2019

Trend in 2023:

No change

Previously: Vulnerable

Imminence of Action

Action is recommended as required, with periodic review every five years.

Significance of Loss

The loss of tools, data or services within this group would impact on many people and sectors.

Effort to Preserve | Inevitability 

Loss of material in this group could be entirely avoidable if provided the means to deploy proven tools and techniques.

Examples

Institutional or departmental network storage and institutional data centers based on technologies such as Network Attached Storage (NAS), Redundant Array of Independent Disks (RAID), Storage Area Networks (SAN), Just a Bunch of Disks (JBOD), SPAN and related.

‘Endangered’ in the Presence of Aggravating Conditions

Encryption; lack of routine maintenance; lack of storage replication; over-dependence on a single supplier, technology or technician; insufficient documentation; single point of failure; political or commercial interference; failure of dependencies (e.g., power supply, controller software); overly aggressive compression; poor information security; lack of integrity-checking; lack of strategic investment; lack of warranty; unenforceable warranty.

‘Lower Risk’ in the Presence of Good Practice

Backup to different technology; backup to diverse locations; documentation of assets; integrity checking; preservation planning; refreshment planning; export functionality; resilient to hacking; selection and appraisal criteria; version control; resilient funding; technology watch; enforceable warranty; disaster planning and documentation.

2023 Review

This entry was added in 2019 to ensure that the range of media storage is properly assessed and presented.

The 2023 Council agreed with the current Vulnerable classification with overall risks remaining on the same basis as before (no change to the trend), while also noting a slight decrease in the effort needed to preserve and the imminence of action required when compared to the 2021 Jury review.

Additional Comments

There has been a renewed interest in tape, as offline storage is the only sure protection against advanced ransomware.

Pension, Mortgage and Insurance Records

Vulnerable

Records of transactions for long-lived financial products and services contracted between individuals and corporations. These records typically contain or depend on significant amounts of personal information and outlast the infrastructure on which they were created.

Digital Species: Sensitive Data

Trend in 2022:

No change

Consensus Decision

Added to List: 2017

Trend in 2023:

No change

Previously: Vulnerable

Imminence of Action

Action is recommended within three years, detailed assessment within one year.

Significance of Loss

The loss of tools, data or services within this group would impact on many people and sectors.

Effort to Preserve | Inevitability

It would require a small effort to preserve materials in this group, requiring the application of proven tools and techniques.

Examples

Applications, correspondence and ancillary records relating to pensions, mortgages and insurances and other contracts of long duration. This includes corporate databases, email, web archives and EDRMS, and may require some coordination of paper, microfiche, born-digital and digitized records. These records often include the scope and duration of the contract as well as any agreed changes during the lifetime of the product. It may also include evidence of mis-selling or other sharp practice, which only becomes apparent after the fact. This entry pertains to corporate records rather than personal records. 

‘Endangered’ in the Presence of Aggravating Conditions

Lack of corporate preservation planning; lack of preservation within the procurement of corporate systems; companies conflating backup with preservation; loss of integrity and authenticity; loss of context and connections to provide meaning; lack of preservation capability within agencies; lack of preservation voice at executive level; poor planning and roadmap for corporate infrastructure; proliferation of legacy systems; slapdash procurement or migration of new systems; mergers and acquisitions leading to confusion of corporate systems; lack of compliance, audit or accountability at operational levels; encryption.

‘Lower Risk’ in the Presence of Good Practice

Backup and documentation; use of open formats and open source software; considered data management planning; licensing that enables preservation; preservation capability in designated repository; resilient to hacking; selection and appraisal in place; authenticity and integrity of records managed; resilient funding and recognition at executive level; technology watch; regular preservation audits; accreditation and participation in the professional preservation community.

2023 Review

This entry was added in 2017 but was outside the competence of the judges to assess at that time. It was assessed in 2019 with additional expertise invited to the panel to support this assessment and reviewed again in 2020. The 2021 Jury agreed with that 2019 assessment and subsequent 2020 review, which classified these digital materials as Vulnerable with no trend towards greater or reduced risk. The 2023 Council agreed with the Vulnerable classification with the overall risks remaining on the same basis as before (no change to the trend).

Additional Comments

The importance of retaining documentation in any kind of legal agreement offers this kind of material more protection than most but legal organizations may conflate backup with preservation and not always have consistent records management systems.

Completed Investigations Based on Open Source Intelligence Sources

Endangered

Open source social media and web content that has been used to support the conclusions of crowd-sourced investigation and fact-checking in political or military conflict.

Digital Species: Legal Data

Trend in 2022:

Trend towards greater risk

Consensus Decision

Added to List: 2019

Trend in 2023:

No change

Previously: Endangered

Imminence of Action

Action is recommended within twelve months, detailed assessment is a priority.

Significance of Loss

The loss of tools, data or services within this group would impact on people and sectors around the world.

Effort to Preserve | Inevitability

It would require a major effort to prevent or reduce losses in this group, possibly requiring the development of new preservation tools or techniques.

Examples

Social media sources relating to recent conflicts.

‘Critically Endangered’ in the Presence of Aggravating Conditions

Encryption; loss of authenticity; lack of preservation agency; limited or no digital preservation capability.

‘Vulnerable’ in the Presence of Good Practice

Offline backup captured by a journalist or investigating authority; materials presented and documented in court; court able to deliver preservation; authenticity protected.

2023 Review

This entry was added in 2019 and subsequently split into three elements by the Jury, relating to current, recent and historic sources. This entry relates to materials used in evidence in completed investigations, as well as those presented to courts or other investigatory agencies.

Social media companies have a policy to take down or suppress content that they consider propaganda for terrorist groups. This has had the unintended consequence of deleting or suppressing content used in open source investigation or fact-checking for journalistic or judicial purposes, which may impede refutation or prosecution. However, a new generation of cloud-based services now allows investigators to copy and stabilize content to private accounts in the process of investigating it, so the ethical requirements of social media companies and the integrity of the investigation are both served. The 2020 Jury noted that such content remains at risk. The presentation of data to a court or prosecuting authority, or the publication through news media, implies the introduction of a long-term preservation function. The 2021 Jury agreed with this assessment and Endangered classification but changed the 2021 trend to one towards greater risk in light of recent developments in crowd-sourced investigations and fact-checking. The 2022 Taskforce agreed on a trend towards even greater risk based on the increased significance of crowd-sourced investigations and fact-checking in light of ongoing global conflicts that include (but are not limited to) those in Ukraine.

The 2023 Council agreed with the Endangered classification with the overall risks remaining on the same basis as before (no change to the trend), noting that some of the description and language used in the entry may be confusing. For example, one might assume that, if investigations are complete, the parts used as evidence or published in articles are captured elsewhere and presumably preserved there. While this may be the case, presuming long-term preservation may lead to future loss in instances where it is not. Here, risks can overlap with those found in the 'Evidence in Court' and 'Proceedings in Court' entries, which themselves indicate that standard records management processes within designated agencies should be able to take care of the preservation of materials like this but, given that it is likely to involve complex types of data, such agencies may not be equipped to deliver preservation effectively.

Additional Comments

The 2023 Council additionally recommend further scoping of the entry with input from those working with these materials directly, to help explain the unique challenges as well as examples where content has been lost due to deletion by social media companies and/or legal retention periods where certain content may not fall under the scope of records for long-term or permanent retention.

Case Studies or Examples:

  • The 2020 Medical Facilities Under Fire report by the Syrian Archive, which provides information on how The Syrian Archive and its partners (Syrians for Truth and Justice, Justice for Life) analyzed and verified patterns of attacks by cross-referencing a combination of open-source visual content, flight observation data, and witness statements. Through collecting, verifying and reporting investigative findings from these incidents, the authors hope to preserve critical information that may be used for advocacy purposes or as evidence in future proceedings seeking legal accountability. See Syrian Archive (2017), ‘Medical Facilities Under Fire’. Available at: https://syrianarchive.org/en/investigations/Medical-Facilities-Under-Fire [accessed 24 October 2023]

See also:

  • Higgins, E. (2019) ‘Bellingcat and beyond. The future for Bellingcat and online open source investigation’, iPres Conference 2019, Amsterdam. Available at: https://www.youtube.com/watch?v=kZAb7CVGmXM [accessed 24 October 2023]

  • European Human Rights Advocacy Centre (EHRAC) blog post providing a Q&A on ‘Using Open Source Investigations in Human Rights Litigation,’ noting their approach to gather and present evidence of Russian military presence in and around Ilovaisk in August 2014. See European Human Rights Advocacy Centre (EHRAC) (2022), ‘Q&A: USING OPEN SOURCE INVESTIGATIONS IN HUMAN RIGHTS LITIGATION’. Available at: https://ehrac.org.uk/en_gb/blog/qa-using-open-source-investigations-in-human-rights-litigation/ [accessed 24 October 2023]

Email

   Endangered large

Documents, correspondence and other records created in the course of contractual dealings between individuals and agencies, especially where the subjects are of long duration and may be subject to legal scrutiny at undefined points in the distant future.

Digital Species: Formats

Trend in 2022:

Reduced risk: Material improvement

Consensus Decision

Added to List: 2017

Trend in 2023:

No Change

Previously: Endangered

Imminence of Action

Action is recommended within five years, detailed assessment within three years.

Significance of Loss

The loss of tools or services within this group would have a global impact.

Effort to Preserve | Inevitability

It would require a small effort to preserve materials in this group, requiring the application of proven tools and techniques.

Examples

Gmail, Hotmail, Yahoo Mail, Outlook, and email in all its forms including individual messages, threads of conversation, mailboxes, email servers and file attachments.

‘Critically Endangered’ in the Presence of Aggravating Conditions

Conflicting and unmanaged IPR; use of personal accounts for professional work and vice versa; proliferation and duplication of attachments; email not recognized as a record; absent, unworkable or inconsistent records management; dependence on free cloud-based services; lack of migration path; lack of preservation planning; perverse incentives to delete; encryption.

‘Vulnerable’ in the Presence of Good Practice

Application of appraisal and selection tools; timely transfer to preservation facility or archive; commitment to transparency; preservation policy; working preservation plan; clear migration path; widespread recognition of email as a record.

2023 Review

This entry was added in 2017, but the Jury did not have the capacity to assess it in detail. It was reviewed and assessed in 2019, highlighting significant developments such as the recommendations of the Email Preservation Taskforce and the development of the ePADD software. Email presents many preservation challenges, from scale through core technologies, attachments, privacy and intellectual property rights. Because this entry intersects with many others, the aggravating conditions associated with email should be considered in conjunction with those entries where relevant.

The 2021 Jury discussed the continued developments in email preservation tools and techniques as well as the growing number of archives preserving email content. At the same time, issues with providing access to preserved email content have arisen. Ongoing records management policies towards corporate or business email need to be better embedded to stop the loss of important email content, and more awareness is needed around the potential of personal email.

While record-keeping legislation and mandates direct retention periods, emails documenting decisions taken by government officials at local, regional and national levels are not always well maintained, if they are maintained at all; their loss could affect people’s lives and their ability to assert their rights.

For these reasons, there was a 2021 trend towards reduced risk, but the Endangered classification remained.

The 2022 Taskforce agreed on a trend towards reduced risk based on material improvement over the last year with applied examples of good practice, including (but not limited to) approaches to creating a PDF format for the preservation of email, and improvements to existing software, tools and workflows supporting complex email preservation.

The 2023 Council agreed with the Endangered classification and noted a decrease in the imminence of action and effort to preserve.

Additional Comments

The 2023 Council also recommended noting areas of overlap with the ‘Cloud-based Services and Communications Platforms’ entry as it pertains to the saving and preservation of email in cloud-based services (such as Microsoft SharePoint).

Email is hugely important because it has been so pervasive as a communication mechanism for society. Some methods are in use, and some responsibility has been adopted, for collecting email at the business and public body level (though this will differ globally), but this covers only a fraction of the communities that use it, and few will be set up for the long-term care of this data.

Case Studies or Examples:

  • Resources and outputs from the EA-PDF project to identify the essential characteristics and optimal functional requirements of email messages and necessary related information in a PDF technology-based archive. PDF Association (2021), ‘EA-PDF’. Available at: https://www.pdfa.org/resource/ea-pdf/ [accessed 24 October 2023].

  • Resources and outputs from the Integrating Preservation Functionality into ePADD (ePADD+) project to integrate long-term email preservation functionality into the program and provide a tool supporting the email archiving lifecycle more robustly. ePADD (n.d.) ‘History’. Available at: https://www.epaddproject.org/about/history [accessed 24 October 2023].

  • Resources and outputs from the RATOM project to develop software to assist archives and other collecting organizations with email analysis, selection, and appraisal tasks. RATOM (n.d.) ‘About’. Available at: https://ratom.web.unc.edu/about/ [accessed 24 October 2023].

See also:

  • Prom, C (2019) ‘Preserving Email (2nd Edition)’, DPC Technology Watch Report 19-01. Available at: http://doi.org/10.7207/twr19-01

  • Artefactual Systems and DPC (2021) ‘Preserving Email’ DPC Technology Watch Guidance Notes. Available at: http://doi.org/10.7207/twgn21-08

  • Murray K and Prom, C (2018) ‘The Future of Email Archives: A Report from the Task Force on Technical Approaches for Email Archives’, CLIR. Available at: https://clir.wordpress.clir.org/wp-content/uploads/sites/6/2018/08/CLIR-pub175.pdf [accessed 24 October 2023].

  • “Novice to Know-How: Email Preservation” was conceived of and funded by The National Archives (UK) and delivered by the DPC. This course is aimed at learners who already have a solid foundational knowledge of digital preservation (e.g. they have completed the original N2KH learning pathway) and wish to gain practical skills in relation to the preservation of email. DPC and The National Archives (UK) (2023), ‘Novice to Know-How: Email Preservation’. Available at: https://www.dpconline.org/digipres/prof-development/n2kh-online-training [accessed 24 October 2023].

  • Ville, M. (2023) ‘2013 – 2023: A Review of Ten Years of Email Archiving in France’, iPRES 2023 Conference, Urbana-Champaign, Illinois, USA, 19–22 September.

Semi-Published Research Data

   Endangered large

Data sets produced in the course of research and shared between researchers, such as by posting to a website or portal but without preservation capability or commitment. Typically the data remains in the hands of the researchers who have the job of maintaining it.

Digital Species: Research Outputs

Trend in 2022:

Reduced risk: Material improvement

Consensus Decision

Added to List: 2019

Trend in 2023:

No Change

Previously: Endangered

Imminence of Action

Action is recommended within three years, detailed assessment within one year.

Significance of Loss

The loss of tools, data or services within this group would impact on people and sectors around the world.

Effort to Preserve

It would require a major effort to prevent or reduce losses in this group, possibly requiring the development of new preservation tools or techniques.

Examples

Departmental web servers; project wikis; GitHub repositories.

‘Critically Endangered’ in the Presence of Aggravating Conditions

Originating researcher no longer active or changed research focus; staff on temporary contracts; dependence on single student or staff member; weak or fluid institutional commitment to subject matter; weak institutional commitment to data sharing; complicated or contested intellectual property; encryption; limited or dysfunctional data management planning; web capture challenges that means unlikely to be picked up by automatic crawlers.

‘Vulnerable’ in the Presence of Good Practice

Data in preparation for transfer to specialist repository; robust data management planning; documented and managed professionally using data stewards.

2023 Review

This 2019 entry was previously introduced in 2017 under ‘Research Data,’ though without explicit reference to semi-published research data. In 2019, the Jury split the ‘Research Data’ entry into a range of contexts for research outputs, including this addition. The entry draws attention to ‘self-help’ data sharing, which is to be encouraged as a means to facilitate open science but should not be confused with long-term preservation. The 2021 Jury agreed with the Endangered classification, noting problems with the volume of data being produced but not being kept in a meaningful way. Research data is complex and has specific requirements for documentation that may only be known to subject matter experts. However, data creators (e.g., researchers) are not necessarily well placed to sustain data in the long term.

There were also a few significant changes to the entry in the 2021 Bit List.

  1. Removal of ‘informally’ from the previous entry description (‘shared informally between researchers’) due to possible misperception or misunderstanding; ‘informal’ may imply researchers would perceive the data as low value and not want it captured. This may be the case, so it is important to consider and provide advice to researchers who think there is value in their data.

  2. Two previous entries (Geomagnetic Data and Software and Maritime Archaeological Archives) have been removed as separate entries and incorporated into this broader entry on semi-published research data, to highlight the range of content and forms semi-published research data can take and the need for specialized knowledge and specialist repositories for preparing and managing the data throughout the lifecycle.

  3. The 2021 trend towards reduced risk was based on improvements and initiatives towards the preservation of semi-published research data since the entry’s addition in 2019.

The 2022 Taskforce agreed on a trend towards reduced risk based on material improvements over the last year that have not only offered examples of good research data management and preservation practices but also suggest a significant shift towards a culture of change and collaboration across different research communities and stakeholders. These include (but are not limited to) improvements and initiatives by the European Open Science Cloud (EOSC), Science Europe, Research Data Alliance (RDA), Digital Curation Centre (DCC) and related projects on the preservation of research data and outputs.

The 2023 Council agreed with the Endangered classification.

Additional Comments

There is a positive trend of increased research data management activity and engagement by libraries and data centres, which should help to ensure that more research datasets are properly deposited in data repositories, rather than left in a 'semi-published' state.

Minting Digital Object Identifiers and offering them to researchers for datasets deposited at specialist repositories will encourage data citation and increase the research impact of individual researchers, which has traditionally relied more on publishing papers than datasets.

 

See also:

  • Boccali, T., Sølsnes, A., Thorley, M., Winkler-Nees, S. and Timmermann, M. (2021) ‘Practical Guide to Sustainable Research Data: Maturity Matrices for Research Funding Organisations, Research Performing Organisations, and Research Data Infrastructures’, Science Europe. Available at: http://doi.org/10.5281/zenodo.4769703

  • European Open Science Cloud (EOSC) (n.d.) ‘Development and outputs of the European Open Science Cloud (EOSC) Long-Term Data Preservation Task Force’. Available at: https://www.eosc.eu/advisory-groups/long-term-data-preservation [accessed 24 October 2023]

3D Digital Engineering Drawings

   Endangered large

3D digital engineering models produced as part of building or engineering design processes. The production of such drawings has progressed from a digital analogue of paper to complex digital environments such as BIM (Building Information Modelling), which combine original drawings, libraries of compound objects, and links to external data sets, such as data about the performance of materials and maintenance of parts.

Digital Species: Engineering Data

Trend in 2022:

No Change

Consensus Decision

Added to List: 2017

Trend in 2023:

No Change

Previously: Endangered

Imminence of Action

Action is recommended within three years, detailed assessment within one year.

Significance of Loss

The loss of tools, data or services within this group would impact on many people and sectors.

Effort to Preserve | Inevitability 

It would require a major effort to prevent or reduce losses in this group, possibly requiring the development of new preservation tools or techniques.

Examples

Building Information Modelling (BIM), Computer Aided Design (CAD), Product Data Management in engineering and architecture.

‘Critically Endangered’ in the Presence of Aggravating Conditions

Lack of preservation mandate or collecting institution; lack of preservation capability in data owner; irregularities in supply chains; complex or long data supply chains; dependencies on proprietary software or formats; lack of persistent identifiers; poorly managed IPR; temporary joint-venture companies; poor records management; poor regulatory compliance; encryption.

‘Vulnerable’ in the Presence of Good Practice

Well managed data infrastructure; preservation from the point of creation; carefully managed IPR; persistent identifiers; well managed records management processes; recognition of preservation requirements at highest levels; strategic investment in digital preservation; host clearly identified; participation in the digital preservation community.

2023 Review

This entry was first submitted in 2017, when the Jury lacked the capacity to consider it in detail. In 2019 it was assessed with additional expertise co-opted, with the decision that it should remain a very broad category, including major one-off construction and engineering projects, a long tail of more minor building programmes, and large volume but homogeneous production processes in engineering. The 2021 Jury agreed with its Endangered status. The key consideration is that the lifecycle of the products and the data that describes them vastly exceeds the short life cycles of the infrastructures on which they are designed. This challenge is compounded by supply chains that may involve many different stages of production, as well as the delivery of large projects through transitory joint-venture companies that have no residual mechanism or capacity to preserve the data thereafter. Although there have been advancements in the development of new preservation tools and techniques for these materials, there are recent examples of the loss of 3D architectural drawings; these have had a huge impact, especially at the local level, as well as significant impacts on infrastructure, travel, and how people interact with built environments throughout the world. The 2021 trend moved towards greater risk to reflect this.

The 2023 Council agreed with the Endangered classification and seconded the trend reported last time; risks continue on the same basis as before, with no significant trend towards greater or reduced risk. Most of the complexities of the format and the associated issues remain the same, such as reliance on proprietary software and complex or unknown copyright in the datasets. Moving forwards, the Council highlighted the need for a greater focus on, and understanding of, the long-term preservation of these outputs within the sector.

Additional Comments

Data in this category enables the safety and security of critical infrastructure, but responsibility for maintaining the data is unclear, as are retention periods. Although examples of good practice exist, the extent to which there are working solutions at large seems doubtful, and it is surprising that there are not more diverse success stories to report.

Case Studies or Examples:

  • The Grenfell Tower Inquiry offers a case for considering how the loss of 3D Digital Engineering Drawings can have a huge impact, especially at the local level. For example, if Grenfell had been designed using 3D technologies, do we have confidence that those materials would have been adequately preserved? What would have been the local impact? What would have been the impact on the inquiry? See Grenfell Tower Inquiry (n.d.) ‘Grenfell Tower Inquiry’. Available at: https://www.grenfelltowerinquiry.org.uk/ [accessed 24 October 2023]

  • In 2006, it was reported that the Airbus A380 was 2 years behind schedule due to different offices using different versions of the CATIA CAD/CAM software. See Shelly, T. (2006) ‘What can go wrong when you give IT the large’, Manufacturing Management. Available at: https://www.manufacturingmanagement.co.uk/content/features/what-can-go-wrong-when-you-give-it-the-large/ [accessed 24 October 2023]

See also:

  • The DPC Design and Construction Records technology watch report, which aims to support archival professionals as well as active designers and facilities managers, considering acquisition, preservation, and access approaches that account for both the technical and cultural components of the broad range of born-digital design and construction records created throughout the course of designing, building, and maintaining a built space. As well as bringing together a helpful summary of relevant work in this area and discussing a range of case studies it also covers the concept of visual digital literacy which is the first step towards understanding and managing this content. See Leventhal, A, and Thompson, J. (2021) ‘Preserving Born-Digital Design and Construction Records’, DPC Technology Watch Report 21-01. Available at: http://doi.org/10.7207/twr21-01

  • The Library of Congress had a symposium on 3D design and assets in 2017. See Leventhal, A. (2018) ‘Designing the Future Landscape: Digital Architecture, Design & Engineering Assets’, Library of Congress. Available at: https://www.loc.gov/preservation/digital/meetings/DesigningTheFutureLandscapeReport.pdf [accessed 24 October 2023]
