Preserving records from an EDRMS: a case study

Hugh Campbell, Public Record Office of Northern Ireland (PRONI)

 

The Northern Ireland Civil Service (NICS) selected TRIM as the software platform for its corporate Electronic Document and Records Management (EDRM) system following a procurement exercise in the early 2000s. TRIM has subsequently evolved through a number of manifestations and is now (Micro Focus) Content Manager. The NICS currently uses CM 9.4.

 

A number of Public Record Office of Northern Ireland (PRONI) staff were involved in the initial procurement project and PRONI was one of three lead implementers of the system. This proved to be very beneficial as we had a member of staff who was interested in records management and was an obvious selection for the role of system administrator for the PRONI implementation. This afforded us a great opportunity to learn about the product particularly as we had someone with higher privileges than regular users.

 

We were very aware that, although Retention and Disposal hadn’t been implemented at that point, PRONI would receive records from the corporate EDRM system at some point in the future. The two obvious areas for research and investigation therefore were:

 

  • Metadata; and
  • Export

 

We spent some time researching metadata standards before a very simple realisation dawned on us – the only metadata we could get was what was in the system. It didn’t matter what was being recommended, if it wasn’t in the source system then we weren’t going to get it. This led to a more focussed look at the actual metadata within the EDRM system. We did this by:

 

  • Going through all the screens and recording the metadata; and
  • Using the out of the box export and examining the output.

 

The next stage involved lots of meetings and discussion as we examined each piece of metadata and tried to make an objective decision as to the value of keeping it (in a digital repository for ever). In making our decision, we took into account what we understood the metadata to mean and considered how useful (or confusing) it may be for future generations. For example, one item within the EDRM system which generated considerable discussion was ‘Creator’. On the surface, ‘Creator’ sounds like an important piece of metadata to retain. Investigation, however, revealed that ‘Creator’ did not necessarily guarantee a meaningful association with a digital object. It simply recorded who saved the record into the EDRM. In the case of senior civil servants, who may be creating a substantial percentage of the content in which future generations may be interested, records were often being saved by secretaries or personal assistants (who had no other association with the record). In this case, we decided that it could be a very confusing piece of metadata and so we decided not to take it. The various date fields stored in the EDRM system also generated considerable discussion. It should be noted, however, that not every piece of metadata warranted the same level of consideration, particularly those items that were obviously required.

 

One of the benefits of the EDRM system was expected to be the reduction in duplication which would arise from the use of ‘links’. Undoubtedly this has been the case, particularly when a ‘link’ rather than an attachment is emailed to multiple recipients. These ‘links’, however, were also the subject of considerable discussion in an attempt to reach a decision on what we would do with them. The final decision here was to generate a text ‘stub’ based on the content of the link.

 

After lengthy research and discussion, we eventually settled on the metadata fields we would take from the EDRM system – this is shown below.

 

The EDRM system was supported by a Managed Service Provider when we were developing the means to export records and metadata. We worked with the Managed Service Provider to specify and develop an export that:

 

  • Copied each container selected for transfer out into a Windows folder on the file system; and
  • Created a metadata csv file within each folder, with one row of metadata for every object within the folder.

 

We also used this metadata layout as a standard template for the metadata associated with all digital records transferring to PRONI. As part of our processing, we will supplement this with more metadata, for example the metadata generated by DROID, and we will populate the ‘PRONI use’ fields.
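A receiving archive might check the "one row of metadata for every object" rule along the following lines. This is a minimal Python sketch, not the export developed with the Managed Service Provider: the file name `metadata.csv` and the function name `check_container` are assumptions made for illustration, and only the `Filename` column is taken from the template.

```python
import csv
from pathlib import Path


def check_container(folder: Path, csv_name: str = "metadata.csv") -> dict:
    """Compare the objects in one container folder with the rows of its metadata CSV.

    csv_name is a hypothetical file name; the Filename column comes from
    the metadata template.
    """
    with open(folder / csv_name, newline="", encoding="utf-8") as handle:
        listed = {row["Filename"] for row in csv.DictReader(handle)}

    # Every file in the folder except the metadata CSV itself counts as an object.
    present = {p.name for p in folder.iterdir() if p.is_file() and p.name != csv_name}

    return {
        "missing_objects": sorted(listed - present),      # a row but no file
        "undescribed_objects": sorted(present - listed),  # a file but no row
    }
```

Two empty lists indicate that the folder and its metadata file agree; anything else can be queried with the transferring department before processing starts.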

 

Like most great plans, however, it has not all been plain sailing. We have sought to tweak the metadata slightly over the last few years and we know that there will be occasions when we will have to develop some scripts to manipulate metadata before it is presented to our digital preservation system for processing. To date, two Public Inquiries have transferred over 51,000 records from the EDRM system to PRONI - proof that the process works.

To find out more about PRONI, please visit our website and follow us on Facebook and Twitter.

 

PRONI - EDRM system metadata template

FIELD NAME                      DESCRIPTION
SysInfo                         Name of originating System
SysVersion                      Originating System version
LocalSysName                    Local System Name
DataExportDate                  Date exported from EDRM System
ClassificationTitle             The titles, separated by space | (pipe) space, of the classification levels excluding container holding records
ContainerTitle                  The title of the container or folder containing records
ContainedRecords                The number of original digital objects in a container
ContainerRecordType             The container record type description
ContainerId                     The ID of the EDRM container level classification
ContainerLongId                 The full ID of the EDRM container level classification
ContainerLevel                  The level of the container within the classification
ContainerNotes                  From the Notes tab of the container
OriginalFolderPath              Path of interim location of data files on the export server prior to transfer to PRONI
RelativeFolderPath              This is the relative data path following structure defined by PRONI (Accession Number\"data"\transfer identifier\ContainerID\)
DateClosed                      Date that the container was closed
DPID                            FOR PRONI USE (Digital Preservation Unique Identifier)
RecordType                      Name of the Record Type
Description                     The original textual description of the record
Filename                        The filename and extension of the digital object
RecordNumber                    The unique identifier within an EDRM System
RecordLongID                    The unique identifier within an EDRM System
Notes                           From the object's 'Notes' tab record metadata
Language                        Language of the intellectual content of the resource
DateCreated                     Date of creation of the digital object
DateModified                    The date on which the digital object was last modified
Author                          Person who composed the digital object
FileSize                        Exact size of the object in bytes
RelatedRecord                   Details of related objects
RelationshipDetails             Description of relationship eg attachment to email or document embedded within another document
AccessDecision                  Determines whether or not the Access decision permits the digital object to be viewed by the public
RecordAccessExemptions          If record is Closed for FOI/DPA/or other reasons
ClosureReason                   Free text field describing reasons for decisions to close
NextAction                      Next Action for record
NextActionDate                  The date on which the next action on the record will occur
OriginalFilename                If the filename is more than 200 characters, the filename should be recorded here prior to being truncated - see Filename
BusinessArea                    Business area to which the record relates
InformationAssetOwner           Information Asset Owner as determined by the business
Reviewer                        Name of person who reviewed file
DateReviewed                    Date file was reviewed
DepartmentalInformationManager  Name of Departmental Information Manager approving decision
DateApproved                    Date approved by Departmental Information Manager
RightsStatus                    This will be either Crown Copyright (Government Records) or other details agreed at submission with depositor
RightsCustodian                 The person identified as having management powers over the digital object with regards to access
RightsNotes                     Free text field containing additional information on the copyright/licensing of the digital object
AccessCopyRequired              Is an access copy required for access systems
Comments                        Free text field containing any comments relating to entries on the file format registry
PCPRef                          FOR PRONI USE
MD5Checksum                     MD5 checksum if EDRMS stores checksum
UserDefined2
UserDefined3
UserDefined4
UserDefined5
EOSM                            End of standard metadata
AdditionalMetadata
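Two of the template rules lend themselves to a short illustration: the 200-character limit behind OriginalFilename, and the " | " separator used in ClassificationTitle. The Python sketch below is illustrative only. The function names are hypothetical, and keeping the file extension when truncating is an assumption, since the template says only that long filenames are truncated.

```python
from pathlib import PurePath

MAX_FILENAME_LENGTH = 200  # limit quoted in the OriginalFilename description


def filename_fields(name: str, limit: int = MAX_FILENAME_LENGTH) -> dict:
    """Return Filename and OriginalFilename values for one digital object.

    Assumption: the extension is preserved and only the stem is shortened.
    """
    if len(name) <= limit:
        return {"Filename": name, "OriginalFilename": ""}

    suffix = PurePath(name).suffix
    if len(suffix) >= limit:
        suffix = ""  # treat an implausibly long "extension" as part of the name
    stem = name[: len(name) - len(suffix)] if suffix else name
    truncated = stem[: limit - len(suffix)] + suffix
    return {"Filename": truncated, "OriginalFilename": name}


def classification_levels(title: str) -> list[str]:
    """Split a ClassificationTitle value on the ' | ' separator."""
    return title.split(" | ") if title else []
```

Recording the untruncated name in OriginalFilename means the shortened Filename can always be traced back to what the record creator actually saw.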

 

 


Determining risk: a case study

Nicola Steele, Grosvenor Estates

 

Background

The Grosvenor Estate is an international, diverse, privately owned company. It encompasses all the activities of the Grosvenor family; each of its parts has a distinct focus, but all share the same values and a common purpose of delivering lasting commercial and social benefit.

The Family Office portion of the estate manages the Grosvenor family’s rural estates in the United Kingdom and Spain, their philanthropic activities through the Westminster Foundation, Realty Insurances, and other specialist functions largely focused on heritage and conservation.

The collection, dating as far back as the 12th century, documents the history of the Grosvenor Family as well as that of the Eaton Estate and other rural estates and businesses owned by trusts of the Grosvenor Estate on behalf of the Grosvenor Family. The archives collection is held primarily for the benefit of the Grosvenor Family, internal departments and the Trustees of the Grosvenor Estate.

The EDRMS Preservation Task Force

Back in early 2020, I volunteered to join a new taskforce initiative from the DPC on Electronic Document and Records Management System (EDRMS) preservation. Although I was not part of the implementation of the current EDRMS (SharePoint) in use in our organisation, I was keen to learn, and be part of the learning process, about how safe data and records are in an EDRMS environment. As the Assistant Archivist in our organisation, working largely on digital preservation, I was especially interested in the possibility of records remaining in an EDRMS long term. Huge amounts of data are held within EDRM systems, and some will be identified as having long-term value and therefore be flagged for preservation. This raised many questions about the safe transfer of records from an EDRMS to a digital archive, how and to what extent processes can be automated, and what metadata can and should be captured. More specifically for this case study, a subgroup addressed the issue of how safe it is to leave data within an EDRMS long term and what features and functionality (or policies) should be in place to provide assurance that the records are safe in the EDRMS for a period of time. It was determined that, rather than reinventing the wheel, use could be made of existing risk assessment models to aid us in gauging how safe EDRMS environments are.

Testing the tools

I offered to trial the National Archives UK's DiAGRAM tool against the digital collections held in SharePoint, our organisation’s EDRMS. To do this, certain questions from the National Digital Stewardship Alliance (NDSA) Levels of Preservation and the Digital Preservation Coalition Rapid Assessment Model (DPC RAM) tools needed to be answered and used to populate some answers in the DiAGRAM tool. I very quickly decided to complete both of these tools in their entirety, as opposed to just a few questions from each, because I felt they could offer a more rounded understanding and assessment of the EDRMS environment. That decision was helped by the fact that neither tool is terribly time consuming to complete and both are easy to use.

Once I had completed the NDSA and DPC RAM tools, I started on the DiAGRAM tool. A set of questions needed to be answered before the tool could be used in earnest, so I created a Word document to record the questions and the answers required. I attempted to be as accurate in my answers as possible, but some were a little outside my knowledge, so I had to accept that the results may be slightly inaccurate, though by no means far from correct!

Firstly, I created a model as a baseline for the assessment. This in itself proved very useful. The input into the tool was straightforward and the end result, specifically the visualisation of the results, is excellent. I have often commented that this sort of visualisation is what senior managers in our organisation will find the most informative and useful, even as a way of showing our current capabilities, without even thinking of what we could do to improve.

The next step was to create a scenario in the model, where answers to some questions are changed, to try to improve the results and to illustrate how actions taken can have a big impact. This is a great way to seek permission from senior management for activities to take place, or to justify why you have decided to take certain actions on your digital collections and to show the results they have. Since I wanted this to be as close to a real-world activity as possible, I chose to alter my answers in the Information Management section, to show the position we would like to be in and what I hope we would be able to achieve with some work and collaboration. These altered answers achieved an astonishing increase in our results (from 3% to 30% for renderability). This would allow me to create an action plan and roadmap to present to the appropriate management, showing current shortfalls, what we would ultimately like to achieve (and why) and how we could achieve it. In terms of risk assessment, it seemed clear that, as a first step, actions taken around preservation metadata would improve our confidence in the intellectual control we have over, and the safety of, our digital assets. Future risks and steps could be articulated, but for this exercise I chose to concentrate on one area in which I believe we can make effective progress.

It is perhaps worth noting that I found the NDSA and DPC RAM tools more applicable to digital preservation environments, whereas the DiAGRAM tool can be easily applied to any environment holding digital assets. In this instance, the environment was our EDRMS, SharePoint. Therefore, when using the former tools mentioned, it is worth remembering that the EDRMS is not necessarily functioning at this point with preservation activities in mind. It is functioning to fulfil a current business requirement and so some of the questions should be approached with this in mind. 

Conclusion

In conclusion, I found all three models useful for gaining an understanding of the capabilities and functions of our EDRMS environment, but the DiAGRAM model stood out the most for me. When I presented my findings to our subgroup, I was asked if I would choose to keep records long term in an EDRMS, or transfer them over to a digital preservation environment, having completed this exercise and had time to digest the results of all of the models used. My answer depends on current circumstances. If we did not have a digital preservation system in place (which we do), then I would push fairly quickly for some changes around metadata, specifically preservation metadata, to be made within the EDRMS environment. But, since we do have a digital preservation system in place, my answer for our situation was that I would have records moved to this environment once they had served their business and/or legal functions. Some organisations will not have the luxury of having use of any type of digital preservation environment, so we cannot dismiss the idea that their EDRMS covers all of their digital assets and potentially their preservation actions.

Having completed this exercise, I believe that using a tool such as DiAGRAM, or a combination of tools as I did, is a very useful (and arguably essential) project for any organisation dealing with digital material deemed worthy of long-term preservation, whether as part of a business case to enhance the current EDRMS setup, to procure or develop a digital preservation system, or to form part of a risk or disaster register. The potential uses are numerous and could be hugely beneficial to any organisation.


National Archives of Australia EDRMS Sentencing and Transfer Project: a case study

James Doig, National Archives of Australia

 

Corporate EDRMS

The National Archives of Australia (NAA) has had a corporate EDRMS since 2000.  The product initially purchased was TOWER Software’s TRIM Captura, which NAA upgraded to TRIM Context in 2006.  NAA has regularly upgraded the EDRMS and we have now deployed Micro Focus Content Manager 9.4 in all State and Territory offices.  The EDRMS technology used by NAA has remained the same throughout; only its ownership has changed: TOWER Software was acquired by Hewlett-Packard in 2008, which sold its software division to Micro Focus of the UK in 2016.

The EDRMS is integrated with Outlook and the common desktop record-creating applications, so that emails and documents can be checked into TRIM from the applications themselves, or by drag-and-drop.  Other government agencies have customised more complex integrations, for example SharePoint-EDRMS.

 

Project background and outcomes

In 2012, NAA commenced a project to sentence all records in TRIM Context created between 1 January 1998 and 31 December 2008, export the “Retain as National Archives (RNA)” component from the EDRMS, and ingest the RNA component into the digital preservation system.  Although the project was completed almost ten years ago, and noting that processes and system functionality have improved in the meantime, there are still some useful lessons learned, particularly regarding preservation issues.

The project team comprised four people, including three sentencing officers, and indeed the focus was on sentencing, a uniquely Australian term for the process of applying disposal decisions to records using the legal authorisation – a Records Authority (RA) for functions specific to an agency, or a General Records Authority (GRA) for general administrative functions.  The project took about 8 months to complete, and sentencing, which was effectively a manual process, took about 5 months.  About 34,000 TRIM files or containers were sentenced comprising about a million records.  Records were sentenced at TRIM file/container level, unless there was a good risk-based reason to go into the file and look at actual records.  The proportion of records sentenced “Retain as National Archives (RNA)” was about 10%, quite a high proportion compared with physical records (generally 3-5% for permanent retention), and about 3,000 files, comprising close to 100,000 records, were ingested into the digital preservation system.

More detailed project statistics are as follows:

  • 31,693 TRIM files/containers were sentenced (comprising over a million records)

  • 2% were approved for destruction in 2012

  • 80% were identified for destruction in future years

  • 2% were placed on hold

  • 10% were transferred as Retain as National Archives (RNA)

  • 6% were identified for destruction using the Normal Administrative Practice (NAP) Policy (empty, redundant or practice files or files with no documents attached)

 

Sentencing

EDRMSs are defined by their compliance with international recordkeeping standards such as ISO 15489 and ISO 16175.  Therefore, EDRMS products must have appraisal and disposal functionality built into them.  In this case, following sentencing, the appropriate disposal class (a unique number that links to a disposal action in a RA or a GRA) and the disposal action (RNA, Destroy, NAP) were entered into the EDRMS and a User Stamp applied, which automatically applied the name of the sentencer and a timestamp. 

The time-consuming, manual approach to sentencing was identified as a significant pain point and subsequent work has focused on the feasibility of using AI and machine learning technology to automate disposal decisions; that is, to develop an accurate and scalable way to decide the value of government digital information and data in order to determine whether it should be retained or destroyed.

 

Destruction concurrence

The process of obtaining business owner approval to destroy records is known as destruction concurrence (or just concurrence).   Concurrence was automated as a digital workflow in the EDRMS, which created efficiencies, though at times it was difficult to identify the business owner due to organisational change over time.  More importantly, staff were still using records whose destruction due date had passed, so in many cases records were retained in the system and not destroyed.

 

Review of records to confirm RNA status and quality check metadata

Key record metadata (record number, title, disposal class, security level, date created, date closed) were exported into a spreadsheet and reviewed to confirm RNA status and to quality check and correct errors in record titles (including expanding acronyms).  In addition, a unique item number was applied to each record, a requirement of NAA’s archival management system.  When the review was complete, the revised metadata file was imported back into the EDRMS using the TRIM import/export application called TRIMDataPort. 

 

Export RNA records and metadata

Using TRIMDataPort, records identified as RNA were exported out of the EDRMS into a directory location.  The record export process does not retain, for example via a directory/folder structure, the physical aggregations of records represented in the EDRMS (e.g. Containers: in the NAA example “Files” and “File Boxes”).  Rather, these aggregations are represented in recordkeeping metadata via the record number, and so could be reconstructed in the archival management system through item relationships such as Item/Sub Item or Aggregate Item/Constituent Item. 

Also, at the digital file level, files were given TRIM database identifiers (e.g. rec_1387634.DOCX), rather than the record title given by the record creator.  Since this exercise, the functionality now exists to choose the record title, record URI, database ID, or a combination of the three.

TRIMDataPort exported recordkeeping metadata to delimited form (e.g. CSV).  While TRIMDataPort can export metadata in XML, NAA’s archival management system requires metadata to be imported in a delimited format.  In addition, the archival management system can import and manage only a subset of the full suite of recordkeeping metadata, and decisions needed to be made about what, if any, additional metadata needed to be retained (for example, is it necessary to retain Movement History and TRIM audit metadata?) and how to manage the additional metadata (for example, this metadata could be managed as a Control Series in the archival control system, or managed in the digital preservation system).

Dates are always critical for archival control.  The NAA’s archival control system requires Date Created, Date Last Updated (particularly important for us as it determines when a record becomes publicly available), and Date Registered (what the Australian Series System calls Accumulation Date).

 

Ingest into the digital preservation system

A Submission Information Package was created using an in-house developed SIP creator, which included the generation of checksums for each digital object.  The SIP was successfully ingested into the NAA’s bespoke digital preservation system.  Note that NAA has recently procured Preservica as the replacement digital preservation system, and different tools and processes are in development.
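The checksum step can be illustrated with a short Python sketch. This is not NAA's in-house SIP creator: the function names are hypothetical, and SHA-256 is an assumed default because the text does not say which algorithm was used.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash one file in chunks so that large objects are not read into memory at once."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path, algorithm: str = "sha256") -> dict:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): file_checksum(p, algorithm)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root: Path, manifest: dict, algorithm: str = "sha256") -> list:
    """List manifest entries whose file is now missing or has different content."""
    current = build_manifest(root, algorithm)
    return sorted(name for name, value in manifest.items() if current.get(name) != value)
```

Building the manifest straight after export and calling `changed_files` again after ingest gives a simple fixity check across the whole transfer.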

 

Lessons Learned

There were many useful lessons learned from this project, and these have fed into improved processes and better use of Content Manager functionality.  Those listed here relate directly to digital preservation and ongoing access issues:

  1. Emails with stub attachments linking to a record in the EDRMS are problematic when exported from TRIM.  These stub attachments are usually made when sending a record reference from the EDRMS, but they can also be made via Outlook.  These links fail when records are exported from TRIM, and automatically generated metadata about the linked record (usually the record number and record title) might not have been retained in the body of the email, therefore the record is incomplete.  Even if the record number was retained, this doesn’t mean that this was the version of the record actually emailed as it could well have been edited after the email was sent.

  2. A possible solution to this problem would be to capture version number, not record number in the body of the email.  However, NAA transferred finalised records, not TRIM versions.  Unless versions were captured as separate records in the EDRMS, versions would not be captured.

  3. A key lesson learned was the need to do a detailed analysis of formats prior to ingest into the digital preservation system.  An EDRMS is not fussy about what formats you can check into it, and we’ve found there are lots of complex formats in TRIM that we could have identified up front as needing, for example, better documentation, such as dozens of legacy Access databases and AutoCAD files.  Some complex formats, for example aggregate email formats such as PST and MBOX could usefully be de-aggregated and described prior to ingest.  The preferred approach to format analysis would be to use a format identification tool such as DROID following record export from the EDRMS.  There may also be EDRMS functionality to run a report, for example by file format extension, though this isn’t a failsafe method of identifying format.

  4. A good example of the problems that result from not undertaking a thorough analysis of formats is the issues we encountered with a couple of email formats.  A feature of earlier versions of TRIM is that MS Outlook emails, when checked into TRIM, were saved as TRIM Outlook Saved Message Format with the extension VMBX.  A similar format is MS Windows Outlook Express email, which has the extension MBX.  These formats are plain text files; any attachments are base64 encoded in the body of the file.  While VMBX and MBX files can be rendered perfectly in the TRIM viewer, when exported from TRIM the base64 encoding will need to be decoded for access.  We have about 40,000 of these files in the digital preservation system and we’ve made a PRONOM submission for them so that they can be identified by PRONOM-based format identification tools.  Later versions of TRIM, or what is now called Micro Focus Content Manager, have a built-in Mail Conversion Format tool that can migrate these formats to EML.

  5. As described above in the section on destruction concurrence, TRIM can also automate authorisation and approval workflows, for example authorising expenditure.  These digital workflows are retained as metadata, which will need to be retained if the authorisations/approvals are part of the RNA record.

  6. The MS Windows limit of 260 characters, which applies to the full path rather than to the file name alone, caused problems, but once the problem was identified it was possible to script a solution.

  7. Finally, archival control of records from EDRMSs is not a trivial exercise.  Good archival management depends on a number of factors that can be hard to control.  First is the quality of recordkeeping metadata.  Archives reuse recordkeeping metadata for archival control, so the quality of metadata is critical, and this can vary dramatically within and between government agencies, particularly record titles.  Second is the difference between EDRMS metadata capability and the metadata capability of the archival control system.  Decisions need to be made about what recordkeeping metadata is retained and where it is stored.  Third is the sophistication of the archival control system to properly manage record relationships and representations.  In other words, what do you do if the data model of your archival control system can’t deal with the complex web of record relationships that we see in EDRMSs?  This issue is not just about effectively documenting relationships within and between records, but also relationships within and between other entities, and relationships/integrations with other software applications.  The solutions – replacing the archival control system, or introducing a new data model - are significant, long-term projects.  This issue resulted in a large project to develop a new data model, the Archival Control Model, for government records, and a revised metadata schema.
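Lessons 3 and 6 both suggest simple pre-ingest checks on an exported directory. The Python sketch below is illustrative only and is not the script NAA used. It tallies file extensions, which, as lesson 3 notes, is not a failsafe way of identifying formats and is no substitute for a tool such as DROID. It also lists paths at or above the 260-character limit. The function names are hypothetical.

```python
from collections import Counter
from pathlib import Path

WINDOWS_PATH_LIMIT = 260  # figure quoted in lesson 6


def extension_report(root: Path) -> Counter:
    """Count exported files by lower-cased extension (a rough guide only)."""
    return Counter(
        p.suffix.lower() or "(none)" for p in root.rglob("*") if p.is_file()
    )


def over_limit(root: Path, limit: int = WINDOWS_PATH_LIMIT) -> list:
    """Return files and folders whose full path length is at or above the limit."""
    return sorted(p for p in root.rglob("*") if len(str(p)) >= limit)
```

An extension count that shows, say, several thousand `.vmbx` files or a handful of `.mdb` files is a prompt to investigate those formats before ingest rather than after.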

 

Conclusion

The key learning of the project was the need to fully analyse and understand the EDRMS prior to transfer and ingest.  EDRMSs provide a range of options for configuration. Options may include differing system interfaces (web, simplified and full featured versions), methods of integration with other software applications, presentation of search results, and export options/functionality.  Some system settings may affect the operation of other, seemingly unrelated, aspects of the system.  Reasons for choosing certain options, views and settings should be documented and understood.  Similarly, the EDRMS does not operate in isolation.  Policies and guidelines governing use of the EDRMS should be understood as part of the system analysis process and also captured in the transfer process, for example there may be rules governing titling, capturing record versions, email attachment record references and so on.


Transfer of records from an EDRMS into a Digital Preservation system: a case study

Elvis Valdes Ramirez, UN International Residual Mechanism for Criminal Tribunals (IRMCT)

 

Background

The International Residual Mechanism for Criminal Tribunals (“the Mechanism”) is the successor of the International Criminal Tribunal for the former Yugoslavia (“ICTY”) and the International Criminal Tribunal for Rwanda (“ICTR”), which have, over the last two decades, accumulated large quantities of digital records. The Mechanism is mandated, under Article 27 of its statute, with managing, including preserving and providing access to, the archives of the ICTR, the ICTY, and the Mechanism itself. The digital component of the archives is estimated at approximately 3 petabytes and is composed of all types of born digital and digitized material in a variety of formats coming from network shared drives, business systems, Electronic Documents and Records Management Systems (“EDRMS”), email systems, websites and a selection of bespoke systems which were developed in-house. The Mechanism implemented an EDRMS, currently HP Records Manager, which is in use for the management and access of records, and a Digital Preservation System (DPS) (Preservica) for the preservation of digital material. The Mechanism’s implementations of the EDRMS and the DPS adhere to policies and guidelines established by the United Nations Archives and Records Management Section (ARMS) and follow international good practice and standards.

 

The challenge

The main challenge was to find a solution that would facilitate appropriate packaging and structuring of metadata records and their related objects (files) after they are exported out of the EDRMS and before they are ingested into the DPS. This was all to be done in a manner that conforms to United Nations policies and international standards for good practice.

The following list highlights some of the agreed prerequisites, constraints and assumptions for the solution:

  • Records exported out of the EDRMS consist of metadata and objects (files).

  • There are no similar technical implementations by other organizations to transfer records from an EDRMS to a DPS which could be used.

  • A proper assessment of the export functionalities and capabilities provided by the Mechanism’s EDRMS system is done.

  • Metadata of each record exported out of the EDRMS must be packaged and formatted in XML, using a metadata standard approved by the Mechanism. A proprietary metadata schema must be built to accommodate information that cannot be mapped using existing metadata standards.

  • Ingested records must have a pre-defined minimum set of metadata for digital preservation.

  • No resources (both technical and human) are available to develop a programming interface using the APIs and SDKs that are available in both systems.

  • The DPS technical capabilities for creation of Submission Information Packages (SIP) and ingest of records are properly assessed and well understood.

  • A solid mechanism of integrity checks and access control must be implemented during the process.

  • Solution must be implemented with existing resources and endorsed and approved by management.

 

Case study

To address the challenge, the Digital Archivists of the organization first articulated the business requirements. Initial assessments considered either wrapping the selected metadata within METS files or creating BagIt packages, and then ingesting those as SIPs into the DPS. After testing, this approach was discarded in favour of a tool supplied with the DPS for creating SIPs that include descriptive metadata. Other metadata (structural, administrative, preservation and technical) are added during creation of Archival Information Packages (AIP). A decision was subsequently taken to develop an application to automate the packaging and structuring of metadata and associated objects for ingest. The application must take as input records exported from the EDRMS and save descriptive metadata files and related objects (files) in the predefined structure required by the SIP Creator tool which came with the DPS.

Main steps in the application’s workflow

1. The application is launched.

2. A user enters the following parameters:

  • Location of exported file from the EDRMS

  • A delimiter character, if a delimiter-separated values file is uploaded

  • The style sheet to use (the style sheets are based on a metadata schema e.g. MODS)

  • Additional information such as: prefix for output file names, output file’s extension, etc.

  • Output folder where the files and their metadata are going to be saved

3. A user starts the process.

4. The application packages records (metadata) exported out of the EDRMS in the selected metadata standard format and related objects (files), and creates a predefined structure in the selected location, where the output is saved to be used as input by the SIP creator tool of the DPS.
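The core of step 4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Mechanism's application: the column names, the element names and the `record` wrapper element are hypothetical, and a plain dictionary stands in for the XSLT style sheet described above.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical mapping from EDRMS export columns to element names in a
# bespoke schema; a real deployment would express this in an XSLT style sheet.
COLUMN_TO_ELEMENT = {
    "Record Number": "identifier",
    "Title": "title",
    "Date Created": "dateCreated",
}


def check_mapping(header, mapping):
    """Return the mapped columns that are missing from the export header."""
    return [column for column in mapping if column not in header]


def rows_to_xml(export_text, delimiter=",", mapping=COLUMN_TO_ELEMENT):
    """Yield one XML metadata string per record in a delimited export."""
    reader = csv.DictReader(io.StringIO(export_text), delimiter=delimiter)
    missing = check_mapping(reader.fieldnames or [], mapping)
    if missing:
        raise ValueError(f"export is missing mapped columns: {missing}")
    for row in reader:
        record = ET.Element("record")
        for column, element_name in mapping.items():
            # ElementTree escapes characters such as & and < when serialising.
            ET.SubElement(record, element_name).text = row[column]
        yield ET.tostring(record, encoding="unicode")
```

Calling `list(rows_to_xml(text, delimiter="\t"))` on a tab-separated export returns one XML string per record. The sketch does not validate the output against a schema, which would need an XML library beyond the Python standard library.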

The following were the key requirements/specifications of the application:

  • Metadata exported out of the EDRMS and used as input by the application must be in a delimiter-separated values file (comma, tab, etc.) or in XML format.

  • Columns in the files exported from the EDRMS must contain metadata information, and optionally some configuration information used by the application.

  • Metadata created for each record must be based on an XML schema (international metadata standard or bespoke schema).

  • Separate style sheets (XSLT) must be created for each metadata standard used in the application, mapping columns in the delimiter-separated values file exported from the EDRMS to the respective schema elements.

  • Style sheets used to create metadata files must be (routinely/regularly) validated against corresponding schemas by the application.

  • Mapping of metadata columns in the delimiter-separated values file exported from the EDRMS to schema elements must be validated by the application.

  • Checksum calculation must be conducted on objects (files) when they are moved or copied to the output location during the process, using an existing algorithm (MD5, SHA-1, etc.).

  • The application must save the objects (files) and their related descriptive metadata files in a predefined hierarchical structure.
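The last two requirements (checksums on copy, and a predefined hierarchical output structure) might look like the following in Python. This is a sketch only: the folder layout, the `.metadata` file extension and the function names are hypothetical and do not describe the structure required by any particular SIP creation tool.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash a file in chunks so large objects are not read into memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def package_record(source, metadata_xml, output_root, record_id, algorithm="sha256"):
    """Copy one object and its metadata into <output_root>/<record_id>/.

    The layout (a content/ folder beside a .metadata file) is hypothetical.
    """
    source = Path(source)
    record_dir = Path(output_root) / record_id
    content_dir = record_dir / "content"
    content_dir.mkdir(parents=True, exist_ok=True)

    # Hash before and after the copy so a corrupted copy is detected.
    before = file_digest(source, algorithm)
    target = Path(shutil.copy2(source, content_dir / source.name))
    after = file_digest(target, algorithm)
    if before != after:
        raise IOError(f"checksum mismatch after copying {source}")

    (record_dir / f"{record_id}.metadata").write_text(metadata_xml, encoding="utf-8")
    return {"file": str(target), "algorithm": algorithm, "checksum": after}
```

Returning the algorithm alongside the checksum means both can be written into the metadata, so the value can be re-verified later without guessing how it was produced.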


Which metadata to preserve

The table below lists some of the metadata fields that you may wish to capture from an EDRMS or other record keeping system when migrating records of long term value to a preservation system.

Recognising that different organisations may require different fields depending on their context and the anticipated future users and use cases of the records, each metadata field is listed with a definition, some notes and a list of reasons why it might be important to capture in particular contexts.

It should be noted that not all record keeping systems will capture and store all of the metadata fields described below. Many of the fields may be commonly found in EDRMS, but perhaps not other less controlled systems in which records are stored. 

Decisions on which metadata to capture will need to factor in the following considerations: 

  • Does the record keeping system store this information?

  • Can this information be extracted from the record keeping system?

  • Can this information be stored within the digital archive?

Record level metadata

You may wish to capture the following metadata at record level: 

Each entry below gives, in order: the metadata field, its definition, notes, and why you might need this.

File name

The file name of the record as stored in the record keeping system

Note that the system may allow duplicate file names or may allow file names to include special characters that may cause problems once the files are exported into a file system (e.g. \ / : ? * " < > |). If this is the case, files may be renamed on export and it is important to ensure that the metadata includes details of the original file name of the object as stored in the system.

You should consider capturing this information in the following circumstances:

  • If this information is required to relate the exported digital records to their metadata.

  • If the file name contains valuable information which will help current and future users understand the file.

  • If current and future users will use the file name to refer to records or for search and retrieval.

  • If file names may be changed on export (for example to remove special characters or resolve duplicates).
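As an illustration of the renaming issue described for the file name field, the sketch below cleans names for export while keeping each original name so that it can be recorded in the metadata. The replacement character and the numbering scheme for duplicates are assumptions for the example, not a recommendation from any particular system.

```python
import re

# The special characters listed above that are unsafe in common file systems.
UNSAFE = re.compile(r'[\\/:?*"<>|]')


def safe_export_names(original_names, replacement="_"):
    """Map each original file name to a unique, file-system-safe name.

    Returns (original, exported) pairs so the original name can be kept
    in the metadata.
    """
    used = set()
    pairs = []
    for original in original_names:
        cleaned = UNSAFE.sub(replacement, original)
        stem, dot, extension = cleaned.rpartition(".")
        if not dot:  # no extension present
            stem, extension = cleaned, ""
        candidate, counter = cleaned, 1
        # Compare case-insensitively: some file systems treat A.txt and a.txt alike.
        while candidate.lower() in used:
            suffix = f"{dot}{extension}" if dot else ""
            candidate = f"{stem}_{counter}{suffix}"
            counter += 1
        used.add(candidate.lower())
        pairs.append((original, candidate))
    return pairs
```

Each pair can then be written to the record's metadata, for example as an "original file name" value stored alongside the exported name.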

File format

The file format of the record as defined in the system

An EDRMS or other system may record the file format of each record. This may not be as thorough or accurate as the file format identification that you would wish to carry out within a digital archive (for example it may state the file is a PDF but not which version). It seems likely that file format identification would be carried out outside of the system, either as a pre-ingest step or as a part of the ingest process as records move into the digital archive.

You should consider capturing this information in the following circumstances:

  • If it is considered necessary to understand how the digital objects were identified within the original system.

  • If you have confidence in the file format metadata from the system.

  • If this information will provide a useful overview prior to more thorough file identification outside of the system.

Previous file format or file extension

The previous file format or extension of a record

In certain circumstances, a record keeping system may change the format of a file on capture or upload. An example that has been noted is the conversion of emails to a format specific to the EDRMS in which they are stored. If a conversion such as this has occurred, there may be evidence of this within the metadata.

You should consider capturing this information in the following circumstances:

  • If this behaviour (file format migration) has been noted within the record keeping system.

  • If being able to demonstrate provenance and history of the record is a priority.

MIME type

The MIME type of the record as defined in the system

The system may store information about the MIME type of each record, but this information is also typically captured as part of pre-ingest or ingest routines within a digital archive.

You should consider capturing this information in the following circumstances:

  • If it is considered necessary to understand how the digital objects were defined within the system.

  • If you have confidence in the MIME type metadata from the system.

  • If this information will provide a useful overview prior to more thorough file identification outside of the system.

File size

The size of the record (in KB/MB as appropriate)

The record keeping system may store information about the file size of each record, but this information is also typically captured as part of pre-ingest or ingest routines within a digital archive.

You should consider capturing this information in the following circumstances:

  • If this information will be useful to you for planning or validation purposes as you ingest the content into the digital archive.

Number of files

The number of files that make up a single record within the system. For example, this may apply to the contents of a ZIP file, emails with attachments or the number of messages within a PST file.

This metric will only apply to certain records. Note that metadata about number of files in total within a transfer or export is discussed under ‘Transfer level metadata’.

You should consider capturing this information in the following circumstances:

  • If the system contains complex records consisting of multiple files.

  • If this information will be useful to you for planning or validation purposes as you ingest the content into the digital archive.
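Where the system does not record the number of files in a record, the figure can sometimes be derived after export. The sketch below covers only the ZIP case mentioned above, using Python's standard `zipfile` module; emails with attachments and PST files would need other tools. The function name is hypothetical.

```python
import zipfile


def count_files_in_zip(path):
    """Count the file entries in a ZIP archive, ignoring directory entries."""
    with zipfile.ZipFile(path) as archive:
        return sum(1 for info in archive.infolist() if not info.is_dir())
```

The count can be compared with the value held in the system, where one exists, as a simple validation step.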

Digital object specific dimensions 

This metadata would be specific to particular types of digital object and could include:

  • Duration of audio-visual records

  • Word and page count of text

  • Dimensions of images

The system may contain metadata relating to the dimensions of digital objects and this will be specific to the types of records contained within it.

You should consider capturing this information in the following circumstances:

  • If the metadata contained within the system is of a higher standard of completion or accuracy than that which will be generated in the digital archive.

  • If this information will be useful to you for validation purposes.

Language

Language of the digital object.

 

You should consider capturing this information in the following circumstances:

  • If the records are in multiple languages.

  • If it will help future users of the records use and interpret them.

  • If it will help curators of the records describe and manage them.

Character encoding

Character encoding of the digital object. For example ASCII, Unicode, UTF-8.

 

You should consider capturing this information in the following circumstances:

  • If it will help future users of the records use and interpret them.

  • If it will help curators of the records describe and manage them.

Unique identifier (system generated)

The unique reference of a record within the originating system (typically assigned automatically)

Note that there may be more than one version of this identifier that can be captured. The identifier may reflect the function, context or structure of the record and how it was used.

You should consider capturing this information in the following circumstances:

  • If this information will be useful to you for validation purposes.

  • If users or curators of the digital object need to cross reference it with the original record within the system.

  • If current and future users will use this identifier to refer to records or for search and retrieval.

  • If the assigned identifier gives a useful indication of the function and/or context of the record and how it was used.

Agency assigned identifier

Catalogue or local identifier of the record within the system (typically assigned by a human operator)

May reflect the function/context of the object and how it was used.

You should consider capturing this information in the following circumstances:

  • If this information will be useful to you for validation purposes.

  • If users or curators of the digital object need to cross reference it with the original record within the system.

  • If current and future users will use this identifier to refer to records or for search and retrieval.

  • If the assigned identifier gives a useful indication of the function and/or context of the record and how it was used.

Previous identifier

A previous identifier allocated to a record

A previous identifier metadata field may be of value where records have previously been migrated from another system. The previous identifier field may be particularly important if relationships between documents are defined using these identifiers.

You should consider capturing this information in the following circumstances:

  • If this identifier is used to define relationships between records.

  • If current and future users will use this identifier for search and retrieval.

  • If the identifier gives a useful indication of the function and/or context of the record and how it was used.

  • If being able to demonstrate provenance and history of the record is a priority.

Title

Title or short description of the record

Sometimes records may not have meaningful titles assigned, or a set of records will share a very generic title. In some cases a short description field may be present instead of a title.

You should consider capturing this information in the following circumstances:

  • If assigned titles are useful for search and retrieval of the records within the archive.

  • If assigned titles will help users understand the context or content of the records.

Description

More detailed description of the digital object

 

You should consider capturing this information in the following circumstances:

  • If assigned descriptions are useful for search and retrieval of the records within the archive.

  • If assigned descriptions will help users understand the context or content of the records.

Export date

Date record was exported from the system

This date does not exist within the system but may be included as part of an export or transfer process. Can help demonstrate provenance. May also help with disaster recovery.

You should consider capturing this information in the following circumstances:

  • If being able to demonstrate provenance and history of the record is a priority.

  • If this information will be valuable to you for disaster recovery purposes.

  • If further versions of the record may be transferred at a later date.

Creation date

Date record was originally created

Note that this date may still be attached to the files as system info once the record is extracted, but system dates are vulnerable to change so extracting this date as metadata is a sensible precaution. Note that this date may reflect the date a record was originally uploaded to the system rather than the original creation date.

You should consider capturing this information in the following circumstances:

  • If being able to demonstrate provenance and history of the record is a priority.

  • If being able to demonstrate authenticity of the record is a priority.

Last modified date

Date record was last modified

Note that this date may still be attached to the files as system info once the record is extracted, but system dates are vulnerable to change so extracting this date as metadata is a sensible precaution. The system may be configured to capture a full audit trail, including dates of all edits to a record. Consider what level of detail is required for the digital archive.

You should consider capturing this information in the following circumstances:

  • If being able to demonstrate provenance and history of the record is a priority.

  • If being able to demonstrate authenticity of the record is a priority.

Date folder was closed

The date a folder was closed may act as a trigger date for export to the digital archive.

Depending on local practices, this action may be manually applied or automatically generated.

You should consider capturing this information in the following circumstances:

  • If this date is considered to have significance to the history of the record.

Review date

If a record is closed to the public, this is the date it needs to be reviewed to see if it can be opened (unless a date open is already recorded - see below).

Note that this may be more broadly categorised as date of next action (where other proposed actions relating to a record are recorded).

You should consider capturing this information in the following circumstances:

  • If this information isn’t overridden by new processes and procedures applied in the archive.

Date open to public

The date a record can be (or was) opened for public access.

 

You should consider capturing this information in the following circumstances:

  • If this information isn’t overridden by new processes and procedures applied in the archive.

Date that the file became a record

The date that a file is marked as a record.

This may be a feature of some EDRMS and will depend on local practices. Depending how the field is used in practice, it may not be particularly meaningful. For example sometimes a file may be marked as a record years after the record was created and/or last edited.

You should consider capturing this information in the following circumstances:

  • If being able to demonstrate provenance and history of the record is a priority.

  • If being able to demonstrate authenticity of the record is a priority.

Disposal date

Date that record can be disposed of.

This field will not be applicable to all organisations and implementations, but in some cases records transferred to archive will need to be disposed of at a later date. 

You should consider capturing this information in the following circumstances:

  • If this information isn’t overridden by new processes and procedures applied in the archive.

Creator

Individual or group primarily responsible for creating the record

There may be more than one creator - depending on context you may want to record more granular roles. Note that there may be issues relating to how this is configured within the system (for example just as an identifier, which would need additional information to interpret). It is important to ensure you get the details you need. Note also that there may be inaccuracies within the metadata. Systems and local practices will vary, but ensure you understand how it was generated. Is it added manually, extracted from the embedded metadata of a document or does the system generate it based on who uploaded the record (which may be different to who created the document)?

You should consider capturing this information in the following circumstances:

  • If being able to demonstrate provenance and history of the record is a priority.

  • If being able to demonstrate authenticity of the record is a priority.

  • If this information will help with search and retrieval of records within the archive.

  • If this information will help users understand and evaluate the record.

Creating organization

Details of organization responsible for creating record

As above, there may be issues relating to how this is configured in the system (for example just as an identifier, which would need additional information to interpret). It is important to ensure you get the details you need. Note also that there may be inaccuracies within the metadata. Systems and local practices will vary, but ensure you understand how it was generated. Is it added manually or does the system generate it based on who uploaded the record (which may be different to who created the document)?

You should consider capturing this information in the following circumstances:

  • If being able to demonstrate provenance and history of the record is a priority.

  • If being able to demonstrate authenticity of the record is a priority.

  • If this information will help with search and retrieval of records within the archive.

  • If this information will help users understand and evaluate the record.

Edited by

Information about who has edited the record since creation

Record keeping systems may capture a full audit trail for a record, including details of any edits made. You may want to capture this information alongside edit dates (described above under ‘Last modified date’).

You should consider capturing this information in the following circumstances:

  • If being able to demonstrate provenance and history of the record is a priority.

  • If being able to demonstrate authenticity of the record is a priority.

  • If this information will help with search and retrieval of records within the archive.

  • If this information will help users understand and evaluate the record.

Classification code

Classification code

Also relevant is the record identifier (discussed earlier).

You should consider capturing this information in the following circumstances:

  • If this information will help with search and retrieval of records within the archive.

  • If this information will help users understand and evaluate the record.

Classification

Human readable description of above code 

The classification code may consist of a series of acronyms which are hard for a user to interpret. The system may also store a more human readable description of this code.

You should consider capturing this information in the following circumstances:

  • If this information will help with search and retrieval of records within the archive.

  • If this information will help users understand and evaluate the record.

Permissions

Who has rights to read/copy/edit a record within the system

May be applicable in some circumstances - depends on the context. Can cover a variety of things.

You should consider capturing this information in the following circumstances:

  • If this information isn’t overridden by new processes and procedures applied in the archive.

  • If this information is useful to future users of the record.

IPR and holder

Including copyrights

 

You should consider capturing this information in the following circumstances:

  • If this information isn’t overridden by new processes and procedures applied in the archive.

Checksum

Checksum for the record

Alongside the checksum itself it may also be helpful to extract details of the date the checksum was generated and the algorithm used. Note that if a batch of records has been imported into an EDRMS or other system they may have come with a checksum. It would also be useful to capture this information about the previous checksum if it is present.

You should consider capturing this information in the following circumstances:

  • If being able to demonstrate authenticity of the record is a priority.

  • If this information is needed to validate that transfer has been successful.
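If the checksum and its algorithm are both captured, an exported file can be checked against them after transfer. The following is a minimal sketch using Python's standard `hashlib` module; the function name and the way the algorithm label is normalised are assumptions for illustration, since systems label algorithms in different ways.

```python
import hashlib


def verify_checksum(path, recorded_checksum, algorithm):
    """Return True if the file at `path` matches the checksum from the metadata."""
    # Exported algorithm labels vary (e.g. "SHA-1", "sha1"); normalise them.
    name = algorithm.lower().replace("-", "")
    if name not in hashlib.algorithms_available:
        raise ValueError(f"unsupported checksum algorithm: {algorithm}")
    digest = hashlib.new(name)
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest() == recorded_checksum.strip().lower()
```

A mismatch does not say where the change happened, only that the file now differs from the one the checksum was generated for, which is why the generation date is worth capturing too.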

Versioning

The version of the record

Multiple versions of any one record may exist within the system.

You should consider capturing this information in the following circumstances:

  • If you intend to capture more than one version of a record (either now or in the future).

  • If this information is of value to you and future users of the record.

Location within folder structure or hierarchy

Records within an EDRMS or other record keeping system may be placed in a particular structure/hierarchy or ‘tagged’.

Where a record sits within a structure can give valuable context to a record. It may not make sense once it is moved out of this structure. The location of the record within the structure should be captured in some way; this may or may not be through the metadata export.

You should consider capturing this information in the following circumstances:

  • If this information isn’t captured within the export in another way (for example in the exported folder structure).

  • If the location of the record gives useful context that helps users locate, understand and interpret the record.

Relationships with other records

Relationships with other records (not apparent through the folder structure or hierarchy)

Relationships with other records within the system may be present in other ways outside of the relationships described through a record hierarchy or folder structure within the system. For example, an email record may contain an attachment or multiple files may form a single logical record (for example a GIS layer or a website).

You should consider capturing this information in the following circumstances:

  • If this information isn’t captured within the export in another way (for example by any associated records being exported in the same folder or zip file as the record).

Other descriptive metadata

Other descriptive metadata that exists within the system

Local practices will dictate what additional descriptive metadata is contained within any record keeping system and this will typically be used to help current users with locating and interpreting the records.

You should consider capturing this information in the following circumstances:

  • If this information is helpful to current and future users of the digital archive for locating or interpreting records.

 

Transfer level metadata

You may wish to capture the following metadata for the batch of records as a whole (rather than at record level):

Each entry below gives, in order: the metadata field, its definition, notes, and why you might need this.

Total number and total size of files/records

The number of and size of files and/or records extracted from the system

Totals for records and files may be different (for example one record may consist of multiple files) so two different figures may need to be captured here.

You should consider capturing this information in the following circumstances:

  • If this information will be useful to you for planning or validation purposes as you ingest the content into the digital archive.
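Totals captured at export can be checked against what actually arrives. The sketch below counts files and sums their sizes under an export folder and reports any discrepancy; it assumes the export is a plain folder tree on disk, and the function names are hypothetical. It counts files, not records, so a separate record count would still need to come from the metadata.

```python
from pathlib import Path


def transfer_totals(export_root):
    """Count the files under `export_root` and sum their sizes in bytes."""
    files = [p for p in Path(export_root).rglob("*") if p.is_file()]
    return {
        "file_count": len(files),
        "total_bytes": sum(p.stat().st_size for p in files),
    }


def check_transfer(export_root, expected_count, expected_bytes):
    """Return a list of discrepancies between expected and actual totals."""
    actual = transfer_totals(export_root)
    problems = []
    if actual["file_count"] != expected_count:
        problems.append(f"expected {expected_count} files, found {actual['file_count']}")
    if actual["total_bytes"] != expected_bytes:
        problems.append(f"expected {expected_bytes} bytes, found {actual['total_bytes']}")
    return problems
```

An empty list from `check_transfer` means the totals agree; matching totals are a quick sanity check and do not replace per-file checksum validation.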

System details

Details of the system that the records are being transferred from (for example name and version)

This information may need to be captured manually and incorporated into the metadata for each record. Additional documentation may also be required (see below).

You should consider capturing this information in the following circumstances:

  • If being able to demonstrate provenance and history of the record is a priority.

 

Additional documentation

Some organizations will also wish to capture a full set of system documentation relating to the record keeping system and how it was configured and used. This may include a data dictionary, records management policies and procedures, a user manual and documentation relating to the configuration or set up of the system. This documentation will provide an additional level of detail about the system and provide context for the records that are being preserved.


How this resource was created

In early 2020 the DPC established the EDRMS Preservation Task Force. The task force was set up in response to a request to investigate this topic emerging from a digital preservation project with the Nuclear Decommissioning Authority (NDA).

The DPC invited Members to express an interest in joining the task force for a set period of 6 months with the aim of bringing together multiple stakeholders on the issue of EDRMS preservation to identify and elicit good practice. It was intended that not just the NDA but the whole of the DPC Membership would be able to benefit from this knowledge exchange.

The task force intended to:

  • Articulate the challenge/s of preserving records from an EDRMS

  • Share experiences of tackling these issues and learning from each other

  • Highlight other useful case studies or examples of good practice 

  • Gather together existing sources of guidance

  • Highlight gaps in current guidance

  • Make recommendations for concrete DPC outputs or events to help address the challenge (for example: briefing day, technology watch report, guidance notes, case studies, webinars, blog posts)

As this initial 6 month period came to an end, task force members agreed to continue to meet in order to carry out some agreed actions - the creation of some online guidance on EDRMS preservation (this toolkit!) and a briefing day on the topic. At this point a call for new task force members went out to DPC Members to gather further volunteers to engage with this programme of work.

The text for this resource was created by a series of subgroups of the task force and through an online booksprint event which was held in January 2021.

Some of the booksprint team

 

Our briefing day event ‘Unbroken records: A briefing day on Digital Preservation and EDRMS’ was held on 20th May 2021 and involved a great line up of presentations, from both members of the task force and other invited speakers. Many of these talks are linked from relevant points within this online resource.

A big thank you to members of the EDRMS Preservation Task Force for sharing their challenges, knowledge and experience on this topic and their hard work and good humour throughout.

  • Kyle Browness - Library and Archives Canada

  • Hugh Campbell - PRONI

  • Kevin De Vorsey - NARA

  • James Doig - National Archives of Australia

  • Tim Gollins - National Records of Scotland

  • James Lappin - University of Loughborough

  • Rachel MacGregor - Warwick University

  • Jenny Mitcham - Digital Preservation Coalition

  • Bob Radford - Nuclear Decommissioning Authority

  • Kristen Schuster - King's College London

  • Caylin Smith - University of Cambridge

  • Sara Somerville - University of Glasgow

  • Nicola Steele - Grosvenor Estate

  • Zsuzsanna Tozser Milam - European Central Bank

  • Elvis Valdes Ramirez - United Nations International Residual Mechanism for Criminal Tribunals

  • Lorna Williams - Bank of England

  • Emma Yan - University of Glasgow

  • Paul Young - The National Archives UK


The EDRMS Preservation Task Force was established by the DPC as a result of a digital preservation project with the Nuclear Decommissioning Authority and our thanks go to them for supporting this work.

 


Further resources & case studies

Listed below are a number of resources that relate to the topic of EDRMS preservation. This list was collated during the course of the work of the EDRMS Preservation Task Force and it was noted by Task Force members that there wasn’t a huge quantity of existing guidance available on this subject.  Please contact us if you know of other useful resources that should be referenced here.

General guidance

Relevant guidance from task force members

Case studies demonstrating how organizations have tackled the challenge of records preservation. 

We would like to provide further case studies demonstrating how different organizations have tackled this challenge. Please contact us if you have a relevant case study you would like to share.


The future of records preservation

This online resource captures current good practice in the area of digital records preservation but it is clear that as we work in this area the technology and institutional practices continue to move on at a pace. This section brings together some of the trends and initiatives we may see in the future.

1. ‘Hands free’ transfer

Though the EDRMS Preservation Task Force has seen a few examples of direct integration between an EDRMS and a digital preservation system, this is still an area for future development. The extent to which transfer between an EDRMS and the digital archive can be fully automated is something to explore in the future and it would be helpful for the digital preservation community to see more case studies of how this might work in practice. Could we see a future in which a document is created, correctly described and classified by the user, and ultimately transferred into the archive without any manual intervention from the archivist or records manager? Is this something the community would like to see, or is trust in the technology never going to reach these levels?

2. EDRMS as digital archive? (or vice versa)

Instead of aiming for effective integration between a record keeping system and a digital archive, could we have one system that does it all? Some EDRMS providers have been making moves into the digital preservation space, adding in features for ‘archiving’ of records in order to meet the need of customers who do not have digital preservation systems in place. Similarly, providers of digital preservation systems may be providing bespoke features for customers that are more traditionally seen in the records management space. It is conceivable that a supplier will manage to create a combined records management and digital preservation system in the future. One could imagine dragging and dropping records from a records management area to an archive area and seamlessly bypassing some of the challenges and complexities of integration and transfer that are detailed in this online resource. Again, trust in the technology is required in order for the community to embrace these developments. It is advised that practitioners working in this space continue to articulate their particular requirements and ask questions of solution providers about how certain functionality is implemented before making decisions on whether a system meets their needs. 

3. Beyond the EDRMS

Though the EDRMS Preservation Task Force came together specifically to talk about preservation of records from EDRMS, it was clear once we started to scope our work that records are not exclusively stored in formal EDRM systems. Within many organizations, other systems are being used to manage records and/or documents. It was noted that many organizations are moving away from a structured and controlled EDRMS to systems that are less formally managed. SharePoint is one such system that falls outside of the tight EDRMS definition but is widely used to store records. Taking this further, Office 365, Microsoft Teams and Google Drive are now becoming widely adopted and in some organizations are replacing the EDRMS. Though it is certainly a challenge to preserve records that sit within a structured and controlled EDRMS, it is perhaps more of a challenge to work with records that reside in a less rigorously controlled system.

Though some of the advice and guidance contained within this resource will be applicable to record keeping systems that are not considered to be EDRMS, it is recognised that the community will face emerging challenges as the preservation of records within these (more loosely controlled) systems becomes a priority.

Several presentations that have touched on this issue are listed below:

  • Patricia Sleeman (UNHCR) touches on the challenges of preserving Office 365, describing them as “tanks without drivers, rolling away with all our information” - Winners Webinars, 10th December 2020 (DPC members log in to view the recording)

Processing the lessons learned

Whichever approach has been taken to preservation, it is important that lessons learned are captured and shared. It may be that any problems and pitfalls you have encountered in this process can be headed off in advance next time you need to carry out a similar task. Also consider whether the path to digital preservation can be smoothed for digital content residing in other systems. You should now have a line of communication with colleagues in various departments who should be receptive to talking to you about how things have gone and may be able to work with you to enact changes to improve the process.

Here are a few points to consider:

  • How will you report on your preservation work and share lessons learned with colleagues?

  • Can any updates to preservation and/or records management policy and procedures be suggested to improve things for next time? Can you use them to influence or improve on the record creation or management process?

  • Can any lessons learned related to the record keeping system itself be absorbed - what would you look for in a future record keeping system to make this process easier next time? How can you influence this process? Note that many of the questions included within this resource may also be considerations when looking for a new record keeping system. The DPC Procurement Toolkit may also be a helpful resource to consult.

  • Are there parts of your work that can be shared with the wider digital preservation community? Sharing of case studies and lessons learned is always welcome. Do contact the DPC if you would like to provide a blog post or a case study to link from this resource.

Taking action

Hopefully your business case or bid for resources to enable you to preserve the records has been successful and you are now in a position to start putting your plans into action. This is the biggest and most challenging step in the process, but all the groundwork you have laid up until this point will stand you in good stead.

As this is a substantial piece of work it is recommended that detailed planning is carried out and project management techniques are used to manage and keep track of timelines, goals and resources. Ensure that you tap into support and guidance that is available in your organization.

“Good and thorough planning are essential and it is never too early to be planning the transfer from one system to another - every system is a legacy system waiting to happen.” 

Rachel MacGregor, University of Warwick

 

Kyle Browness from Library and Archives Canada gave a talk at the DPC’s Connecting the Bits unconference event in June 2020 entitled ‘The challenges and lessons of processing records from an EDRMS’. He describes Library and Archives Canada’s approach to the project and gives an example of phases and deliverables.

The steps you will take will vary depending on the preservation approach that has been selected. For example, if the selected preservation approach is to export or transfer records to a digital archive, the following steps will need to be carried out. Specifics will vary depending on the organizational context and the technologies in use.

   
  • Identify the records to be transferred - note it may not be everything. In an ideal world, the records to be transferred will already be flagged up within the system. If this is not the case, establish how you will carry out selection and appraisal. Identify a mechanism to determine the number of records to be transferred.

  • Identify the metadata to be transferred. The metadata guide included within this resource may help you to consider which fields should be retained [link to metadata work]. Work may also be required to map these metadata fields to the schema in use within your digital archive. 

  • Establish whether the transfer of records from record keeping system to digital preservation system can be automated. Do APIs exist to facilitate this? Can a bespoke solution be developed?

  • Investigate export (and import) options for records and metadata. In what format and structure will records and metadata be exported from the record keeping system? In what format can your digital preservation system accept a Submission Information Package? What metadata formats and/or schema can it work with?

  • Establish where records and metadata will be securely stored after export and prior to ingest into the digital preservation system.

  • Are any pre-ingest steps required to prepare the export from the record keeping system for ingest - for example enhancing or restructuring the metadata? Establish what these steps are and which tools will be used.

  • Liaise with system users and other stakeholders about timing of transfer. The transfer may have an impact upon the performance of a live system and availability of records and may have to be performed outside of normal hours.

  • Once the export and transfer have been carried out, check and validate the transfer. Is the right number of records in the destination system? Is the metadata complete? Can checksums be used to validate that the transfer has been successful?

  • Initiate the agreed access mechanism for preserved records (if appropriate).

  • Decide what actions need to be enacted on original records in the record keeping system. Will they be removed from the system? Will a signpost to the new location in the digital archive be available? How will users be able to locate and access them?
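
The validation step in the list above (checking record counts and checksums after transfer) could be sketched along the following lines. This is a minimal illustration only, written in Python, and it rests on several assumptions that are not drawn from any particular record keeping or preservation system: that a manifest of SHA-256 checksums was created at the point of export, that it can be loaded as a simple mapping of relative file path to checksum, and that the transferred files sit in a single directory tree. The function names `sha256_of` and `validate_transfer` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_transfer(manifest: dict[str, str], transfer_dir: Path) -> dict[str, list[str]]:
    """Compare transferred files against a manifest made at export time.

    `manifest` maps each relative file path to the SHA-256 digest recorded
    before transfer (an assumed structure, not tied to any product).
    """
    # Every file actually present after transfer, as relative POSIX paths.
    found = {
        p.relative_to(transfer_dir).as_posix()
        for p in transfer_dir.rglob("*")
        if p.is_file()
    }
    expected = set(manifest)

    # Files present on both sides whose content no longer matches.
    mismatched = sorted(
        name
        for name in expected & found
        if sha256_of(transfer_dir / name) != manifest[name].lower()
    )
    return {
        "missing": sorted(expected - found),     # listed but not received
        "unexpected": sorted(found - expected),  # received but not listed
        "mismatched": mismatched,                # received but altered
    }
```

If all three lists in the returned report are empty, the number of files received matches the manifest and every checksum agrees. Any entries would need to be investigated before the original records are removed from the record keeping system. Note that this checks the files only; completeness of the accompanying metadata would need a separate check.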

   

As the steps listed above are very generic, it may be helpful to find out in more detail how others have tackled this challenge. Some case studies are provided below, but note that the most useful source may be someone who is using the same record keeping and/or preservation system as you and who can give more details of how they approached the exercise.

  • Elvis Valdes Ramirez from the International Residual Mechanism for Criminal Tribunals describes in this case study how they established an export and transfer workflow for records held within their EDRMS and an application that was developed to automate the packaging and structuring of metadata and associated objects for ingest.

  • James Doig from the National Archives of Australia provides an overview of work that has been carried out to transfer records from their EDRMS to the digital preservation system. This case study discusses some of the stages in the process of establishing a transfer methodology and some of the specific challenges that were encountered.

  • Kyle Browness from Library and Archives Canada gave a talk at the DPC’s Connecting the Bits unconference event in June 2020 entitled ‘The challenges and lessons of processing records from an EDRMS’. He provides details of some of the decisions that were made in order to establish a transfer workflow, including those that relate to the metadata that will be exported alongside the records.

  • In a talk at the EDRMS Preservation Briefing Day, Zsuzsanna Tözsér Milam from the European Central Bank described their semi-automated workflows for the transfer of records from an EDRMS at ECB and the rationale for the approach they have taken. See ‘Preserving records at the ECB’, a presentation at Unbroken records: A briefing day on Digital Preservation and EDRMS, 20th May 2021.

As can be seen from the selection of case studies presented, there is no one way of tackling this challenge and no one export tool or standard in use. It should be noted that work is currently underway by the DILCIS Board to create a Content Information Type Specification for ERMS (Electronic Records Management System) and an export tool to go alongside this. A talk by Karin Bredenberg (Kommunalförbundet Sydarkivera) at the 2021 Briefing Day on EDRMS preservation can be viewed for further background on this initiative.

The winding road to a CITS ERMS - Karin Bredenberg (Kommunalförbundet Sydarkivera)

Remember that whichever approach you have taken, you will also need to establish what to do with any documentation and information you have gathered to inform your work. Consider which information should be preserved alongside the records and metadata to provide further context for future users.
