James Doig, National Archives of Australia

 

Corporate EDRMS

The National Archives of Australia (NAA) has had a corporate EDRMS since 2000.  The product initially purchased was TOWER Software’s TRIM Captura, which NAA upgraded to TRIM Context in 2006.  NAA has regularly upgraded the EDRMS and we have now deployed Micro Focus Content Manager 9.4 in all State and Territory offices.  The underlying EDRMS technology has remained the same throughout, although its ownership has changed: TOWER Software was acquired by Hewlett-Packard in 2008, which sold its software division to Micro Focus of the UK in 2016.

The EDRMS is integrated with Outlook and the common desktop record-creating applications, so that emails and documents can be checked into TRIM from the applications themselves, or by drag-and-drop.  Other government agencies have built more complex custom integrations, for example between SharePoint and the EDRMS.

 

Project background and outcomes

In 2012, NAA commenced a project to sentence all records in TRIM Context created between 1 January 1998 and 31 December 2008, export the “Retain as National Archives (RNA)” component from the EDRMS, and ingest the RNA component into the digital preservation system.  Although the project was completed almost ten years ago, and noting that processes and system functionality have improved in the meantime, there are still some useful lessons learned, particularly regarding preservation issues.

The project team comprised four people, including three sentencing officers, and indeed the focus was on sentencing, a uniquely Australian term for the process of applying disposal decisions to records using the legal authorisation – a Records Authority (RA) for functions specific to an agency, or a General Records Authority (GRA) for general administrative functions.  The project took about 8 months to complete, and sentencing, which was effectively a manual process, took about 5 months.  About 34,000 TRIM files or containers, comprising about a million records, were sentenced.  Records were sentenced at TRIM file/container level, unless there was a good risk-based reason to go into the file and look at actual records.  The proportion of records sentenced “Retain as National Archives (RNA)” was about 10%, quite a high proportion compared with physical records (generally 3-5% for permanent retention), and about 3,000 files, comprising close to 100,000 records, were ingested into the digital preservation system.

More detailed project statistics are as follows:

  • 31,693 TRIM files/containers were sentenced (comprising over a million records)

  • 2% were approved for destruction in 2012

  • 80% were identified for destruction in future years

  • 2% were placed on hold

  • 10% were transferred as Retain as National Archives (RNA)

  • 6% were identified for destruction using the Normal Administrative Practice (NAP) Policy (empty, redundant or practice files or files with no documents attached)
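
As a rough cross-check, the percentages above can be converted back into approximate file counts.  This is only an illustrative sketch: the published percentages are rounded, so the derived counts are estimates rather than project figures, and the labels used as dictionary keys are shorthand introduced here.

```python
# Total files/containers sentenced, as stated in the statistics above.
TOTAL_FILES = 31_693

# Rounded percentage shares from the list above (labels are shorthand).
SHARES = {
    "destroy in 2012": 2,
    "destroy in future years": 80,
    "on hold": 2,
    "RNA": 10,
    "NAP": 6,
}


def approximate_counts(total, shares):
    """Convert rounded percentage shares into estimated file counts."""
    return {label: round(total * pct / 100) for label, pct in shares.items()}


counts = approximate_counts(TOTAL_FILES, SHARES)
for label, count in counts.items():
    print(f"{label}: about {count:,} files")
```

The RNA estimate is consistent with the "about 3,000 files" ingested into the digital preservation system.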

 

Sentencing

EDRMSs are defined by their compliance with international recordkeeping standards such as ISO 15489 and ISO 16175.  Therefore, EDRMS products must have appraisal and disposal functionality built into them.  In this case, following sentencing, the appropriate disposal class (a unique number that links to a disposal action in a RA or a GRA) and the disposal action (RNA, Destroy, NAP) were entered into the EDRMS and a User Stamp applied, which automatically applied the name of the sentencer and a timestamp. 

The time-consuming, manual approach to sentencing was identified as a significant pain point and subsequent work has focused on the feasibility of using AI and machine learning technology to automate disposal decisions; that is, to develop an accurate and scalable way to decide the value of government digital information and data in order to determine whether it should be retained or destroyed.

 

Destruction concurrence

The process of obtaining business owner approval to destroy records is known as destruction concurrence (or just concurrence).  Concurrence was automated as a digital workflow in the EDRMS, which created efficiencies, though at times it was difficult to identify the business owner due to organisational change over time.  More importantly, staff were still using records whose destruction due date had passed, so in many cases records were retained in the system and not destroyed.

 

Review of records to confirm RNA status and quality check metadata

Key record metadata (record number, title, disposal class, security level, date created, date closed) were exported into a spreadsheet and reviewed to confirm RNA status and to quality check and correct errors in record titles (including expanding acronyms).  In addition, a unique item number was applied to each record, a requirement of NAA’s archival management system.  When the review was complete, the revised metadata file was imported back into the EDRMS using the TRIM import/export application called TRIMDataPort. 

 

Export RNA records and metadata

Using TRIMDataPort, records identified as RNA were exported out of the EDRMS into a directory location.  The export process does not preserve the aggregations of records represented in the EDRMS (Containers; in NAA’s case, “Files” and “File Boxes”), for example as a directory/folder structure.  Rather, these aggregations are represented in recordkeeping metadata via the record number, and so could be reconstructed in the archival management system through item relationships such as Item/Sub Item or Aggregate Item/Constituent Item. 

Also, at the digital file level, files were given TRIM database identifiers (e.g. rec_1387634.DOCX), rather than the record title given by the record creator.  Since this exercise, the functionality now exists to choose the record title, record URI, database ID, or a combination of the three.
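
Because the exported file names carry only an identifier, the record title has to be recovered from the exported metadata.  The following is a minimal sketch of that lookup, not NAA’s actual process.  It assumes that the numeric part of a name such as rec_1387634.DOCX matches an identifier column in the delimited metadata export; the column names "Database ID" and "Title" are hypothetical.

```python
import csv
import io
import re

# Hypothetical column names; a real metadata export may label them differently.
ID_COLUMN = "Database ID"
TITLE_COLUMN = "Title"


def load_titles(metadata_file):
    """Return {identifier: record title} from a delimited metadata export."""
    reader = csv.DictReader(metadata_file)
    return {row[ID_COLUMN].strip(): row[TITLE_COLUMN].strip() for row in reader}


def title_for(file_name, titles):
    """Look up the title for a name like 'rec_1387634.DOCX'; None if not found."""
    match = re.fullmatch(r"rec_(\d+)\.[^.]+", file_name)
    if match is None:
        return None
    return titles.get(match.group(1))


# Illustrative data only; the title is invented for the example.
sample = io.StringIO("Database ID,Title\n1387634,Example record title\n")
titles = load_titles(sample)
print(title_for("rec_1387634.DOCX", titles))
```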

TRIMDataPort exported recordkeeping metadata in a delimited format (e.g. CSV).  While TRIMDataPort can export metadata in XML, NAA’s archival management system requires metadata to be imported in a delimited format.  In addition, the archival management system can import and manage only a subset of the full suite of recordkeeping metadata.  Decisions therefore needed to be made about what, if any, additional metadata should be retained (for example, is it necessary to retain Movement History and TRIM audit metadata?) and how to manage it (for example, as a Control Series in the archival control system, or in the digital preservation system).
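
One way to picture the split between metadata the archival management system can import and metadata that must be managed elsewhere is sketched below.  This is an assumption-laden illustration: the field names in CORE_FIELDS are hypothetical, and the real set of importable fields depends on the archival management system.

```python
import csv
import io

# Hypothetical list of fields the archival management system can import.
CORE_FIELDS = ["Record Number", "Title", "Date Created", "Date Registered"]


def split_metadata(source, core_out, extra_out, key="Record Number"):
    """Write core fields to one delimited file and all remaining fields to another.

    The key column is repeated in the second file so the additional
    metadata can still be matched back to its record.
    """
    reader = csv.DictReader(source)
    extra_fields = [key] + [f for f in reader.fieldnames if f not in CORE_FIELDS]
    core = csv.DictWriter(core_out, fieldnames=CORE_FIELDS,
                          extrasaction="ignore", lineterminator="\n")
    extra = csv.DictWriter(extra_out, fieldnames=extra_fields,
                           extrasaction="ignore", lineterminator="\n")
    core.writeheader()
    extra.writeheader()
    for row in reader:
        core.writerow(row)
        extra.writerow(row)


# Illustrative data only.
source = io.StringIO(
    "Record Number,Title,Date Created,Date Registered,Movement History\n"
    "2008/1,Example,2008-01-02,2008-01-03,Moved once\n"
)
core_out, extra_out = io.StringIO(), io.StringIO()
split_metadata(source, core_out, extra_out)
print(core_out.getvalue())
print(extra_out.getvalue())
```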

Dates are always critical for archival control.  The NAA’s archival control system requires Date Created, Date Last Updated (particularly important for us as it determines when a record becomes publicly available), and Date Registered (what the Australian Series System calls Accumulation Date).

 

Ingest into the digital preservation system

A Submission Information Package was created using an in-house developed SIP creator, which included the generation of checksums for each digital object.  The SIP was successfully ingested into the NAA’s bespoke digital preservation system.  Note that NAA has recently procured Preservica as the replacement digital preservation system, and different tools and processes are in development.
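
The checksum step can be sketched as follows.  This is not the in-house SIP creator, whose design is not described here; it is a minimal illustration that assumes SHA-256 as the algorithm and a simple path-to-checksum manifest as the output.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large digital objects are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(export_dir):
    """Map each file's path (relative to export_dir) to its SHA-256 checksum."""
    root = Path(export_dir)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


# Small demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as demo_dir:
    (Path(demo_dir) / "rec_1387634.DOCX").write_bytes(b"example content")
    for name, checksum in build_manifest(demo_dir).items():
        print(name, checksum)
```

Recomputing the manifest after ingest and comparing it with the original is a simple way to confirm that no digital object changed in transit.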

 

Lessons Learned

There were many useful lessons learned from this project, and these have fed into improved processes and better use of Content Manager functionality.  Those listed here relate directly to digital preservation and ongoing access issues:

  1. Emails with stub attachments linking to a record in the EDRMS are problematic when exported from TRIM.  These stub attachments are usually made when sending a record reference from the EDRMS, but they can also be made via Outlook.  These links fail when records are exported from TRIM, and automatically generated metadata about the linked record (usually the record number and record title) might not have been retained in the body of the email, in which case the record is incomplete.  Even if the record number was retained, this doesn’t mean that this was the version of the record actually emailed, as it could well have been edited after the email was sent.

  2. A possible solution to this problem would be to capture the version number, rather than the record number, in the body of the email.  However, NAA transferred finalised records, not TRIM versions.  Unless versions were captured as separate records in the EDRMS, versions would not be captured.

  3. A key lesson learned was the need to do a detailed analysis of formats prior to ingest into the digital preservation system.  An EDRMS is not fussy about what formats you can check into it, and we’ve found there are lots of complex formats in TRIM, such as dozens of legacy Access databases and AutoCAD files, that we could have identified up front as needing, for example, better documentation.  Some complex formats, for example aggregate email formats such as PST and MBOX, could usefully be de-aggregated and described prior to ingest.  The preferred approach to format analysis would be to use a format identification tool such as DROID following record export from the EDRMS.  There may also be EDRMS functionality to run a report, for example by file format extension, though this isn’t a failsafe method of identifying format.

  4. The issues we encountered with a couple of email formats are a good example of the problems that result from not undertaking a thorough analysis of formats.  A feature of earlier versions of TRIM is that MS Outlook emails, when checked into TRIM, were saved as TRIM Outlook Saved Message Format with the extension VMBX.  A similar format is MS Windows Outlook Express email, which has the extension MBX.  These formats are plain text files; any attachments are base64 encoded in the body of the file.  While VMBX and MBX files can be rendered perfectly in the TRIM viewer, once exported from TRIM the base64 encoding needs to be decoded for access.  We have about 40,000 of these files in the digital preservation system and we’ve made a PRONOM submission for them so that they can be identified by PRONOM-based format identification tools.  Later versions of TRIM, or what is now called Micro Focus Content Manager, have a built-in Mail Conversion Format tool that can migrate these formats to EML.

  5. As described above in the section on destruction concurrence, TRIM can also automate authorisation and approval workflows, for example authorising expenditure.  These digital workflows are retained as metadata, which will need to be retained if the authorisations/approvals are part of the RNA record.

  6. The MS Windows character limit on file names (260 characters) caused problems, but once the problem was identified it was possible to script a solution.

  7. Finally, archival control of records from EDRMSs is not a trivial exercise.  Good archival management depends on a number of factors that can be hard to control.  First is the quality of recordkeeping metadata.  Archives reuse recordkeeping metadata for archival control, so the quality of metadata is critical, and this can vary dramatically within and between government agencies, particularly in record titles.  Second is the difference between EDRMS metadata capability and the metadata capability of the archival control system.  Decisions need to be made about what recordkeeping metadata is retained and where it is stored.  Third is the sophistication of the archival control system to properly manage record relationships and representations.  In other words, what do you do if the data model of your archival control system can’t deal with the complex web of record relationships that we see in EDRMSs?  This issue is not just about effectively documenting relationships within and between records, but also relationships within and between other entities, and relationships/integrations with other software applications.  The solutions – replacing the archival control system, or introducing a new data model – are significant, long-term projects.  At NAA, this issue resulted in a large project to develop a new data model for government records, the Archival Control Model, and a revised metadata schema.
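
Two of the lessons above (the extension-based format report and the Windows path length limit) lend themselves to a simple pre-ingest survey of the export directory.  The sketch below is illustrative only and is not the script used in the project.  It tallies file extensions, which, as noted, is not a failsafe way of identifying formats and is no substitute for a tool such as DROID, and it lists files whose full path reaches the 260-character limit.  The function name and return shape are choices made for this example.

```python
import tempfile
from collections import Counter
from pathlib import Path

# The limit cited in the text. Windows counts the terminating null in this
# figure, so a path of 260 characters is already treated as too long here.
WINDOWS_MAX_PATH = 260


def survey_export(export_dir, max_path=WINDOWS_MAX_PATH):
    """Return (extension counts, over-long paths) for every file under export_dir."""
    root = Path(export_dir)
    extensions = Counter()
    too_long = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # Extensions are lower-cased so DOCX and docx are counted together.
        extensions[path.suffix.lower() or "(none)"] += 1
        if len(str(path)) >= max_path:
            too_long.append(str(path))
    return extensions, too_long


# Small demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("rec_1.DOCX", "rec_2.docx", "rec_3.vmbx"):
        (Path(demo_dir) / name).write_bytes(b"x")
    counts, long_paths = survey_export(demo_dir)
    for extension, count in counts.most_common():
        print(extension, count)
    print("over-long paths:", len(long_paths))
```

An extension tally like this is mainly useful for spotting unexpected or complex formats (for example VMBX, PST or legacy database files) early, so that they can be investigated before ingest.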

 

Conclusion

The key lesson of the project was the need to fully analyse and understand the EDRMS prior to transfer and ingest.  EDRMSs provide a range of options for configuration.  Options may include differing system interfaces (web, simplified and full featured versions), methods of integration with other software applications, presentation of search results, and export options/functionality.  Some system settings may affect the operation of other, seemingly unrelated, aspects of the system.  Reasons for choosing certain options, views and settings should be documented and understood.  Similarly, the EDRMS does not operate in isolation.  Policies and guidelines governing use of the EDRMS should be understood as part of the system analysis process and also captured in the transfer process; for example, there may be rules governing titling, capturing record versions, email attachment record references and so on.

