Electronic files often depend on software that can become outdated or even obsolete. When a file type held in an archive becomes obsolete, the files are converted to another format (e.g. from spreadsheet to PDF). However, this conversion has risks and consequences. With a spreadsheet, the outcome of a formula may remain visible while the underlying calculation is no longer accessible. Researching the significant properties of spreadsheets makes it possible to determine which properties need to be safeguarded during conversion. Of course, this only applies to cases where preservation in the original format is not possible. The priority given to certain functionalities can best be examined from the perspective of stakeholders, such as creators, users, and managers.

Previously, the InSPECT project developed a framework which states that two analyses are required to define the significant properties of spreadsheets. The first, the object analysis, was completed earlier by the Archives Interest Group of the Open Preservation Foundation (OPF AIG) and investigated which properties can be found in spreadsheets. The main focus of this study was the second, the stakeholder analysis: what do stakeholders themselves find important to retain?

[Image: Lotte Wijsman, OPF meeting]

In order to answer this question, the study was carried out in three parts, which together form a toolbox and methodology that archives can apply in future research. The first part consisted mainly of exploratory questions, such as why the stakeholders use spreadsheets and how they rate their own level of knowledge. In addition, the stakeholders were asked to name five properties that seemed important to them when preserving spreadsheets. These five properties could then be contrasted with the choices made in the second part of the stakeholder analysis. For that second part, the 334 properties yielded by the object analysis of the OPF AIG were divided into 21 groups and presented in the form of a catalogue, from which participants were asked to choose the five groups they deemed most significant. On the basis of these two parts, a follow-up interview was conducted with one third of the sample. These interviews focused more on the background and preservation intent of the stakeholder.
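The catalogue step lends itself to a simple tally. The following is a minimal sketch, not the procedure used in the study: the function name `tally_selections`, the participant labels, and several of the group names are hypothetical placeholders, and the selections are invented purely to show the mechanics.

```python
from collections import Counter


def tally_selections(selections, picks_per_participant=5):
    """Count how often each catalogue group was chosen.

    `selections` maps a participant label to the groups that participant
    picked. Both the labels and the group names are illustrative.
    """
    tally = Counter()
    for participant, groups in selections.items():
        chosen = set(groups)  # ignore accidental duplicate picks
        if len(chosen) != picks_per_participant:
            raise ValueError(
                f"{participant} chose {len(chosen)} groups, "
                f"expected {picks_per_participant}"
            )
        tally.update(chosen)
    return tally


# Hypothetical selections, not data from the study.
example = {
    "participant_1": ["formulas", "external data", "pivot tables",
                      "charts", "cell formatting"],
    "participant_2": ["formulas", "external data", "sorting and filtering",
                      "charts", "hyperlinks"],
}

for group, votes in tally_selections(example).most_common():
    print(f"{group}: {votes}")
```

Validating that every participant chose exactly five groups keeps the counts comparable across participants before any ranking is read from them.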

Based on the gathered information, a qualitative and a statistical analysis were carried out. Both analyses clearly showed that the dynamic functionalities of spreadsheets are important. Stakeholders indicated in the interviews that formulas, the use of external data, and the sorting and filtering options were the reason they opted to use spreadsheets instead of a simple text document such as a Word file. This finding is further supported by the catalogue results: formulas, external data and pivot tables were among the five most popular groups. However, a critical note needs to be added. When a stakeholder does not use these functionalities, they will not assign significance to them. ‘One size does not fit all’ is the motto here. The statistical analysis also showed that stakeholders who reported a higher degree of knowledge of spreadsheet functionality were more likely to view pivot tables as significant. The stakeholder's level of knowledge therefore also has a major influence on what is considered significant.

[Image: catalogue of property groups, Lotte Wijsman]

One must conclude that, at this point in time, no consensus on a single format can be found. The format to which a spreadsheet is converted should therefore depend on the functionalities used. A handy tool for seeing which functionalities a stakeholder uses is the Spreadsheet Complexity Analyser. This tool extracts data from the spreadsheet, such as how many formulas, shapes, hyperlinks, and cells have been used. In follow-up research the tool could be extended to assess spreadsheets on their nominal value, so that spreadsheets could be classified by type. By linking suitable formats to each type, it is possible to see how a spreadsheet can be preserved with maximum retention of its functionalities.
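To give an idea of what this kind of extraction involves, here is a minimal sketch that counts cells, formulas and hyperlinks in an .xlsx file using only the Python standard library. It is not the Spreadsheet Complexity Analyser and makes no claim about how that tool works. The function name `count_features` is hypothetical, and the sketch assumes a Transitional OOXML workbook whose worksheet parts sit under `xl/worksheets/`. Shapes and older formats such as .xls are out of scope.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of worksheet XML in Transitional OOXML (.xlsx) packages.
# Assumption: Strict OOXML files use a different namespace and would
# yield zero counts here.
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def count_features(xlsx_file):
    """Count stored cells, formulas and hyperlinks across all worksheets.

    `xlsx_file` may be a path or a binary file-like object.
    """
    counts = {"cells": 0, "formulas": 0, "hyperlinks": 0}
    with zipfile.ZipFile(xlsx_file) as package:
        for name in package.namelist():
            # Skip everything that is not a worksheet part
            # (relationship files end in .rels and are excluded too).
            if not (name.startswith("xl/worksheets/") and name.endswith(".xml")):
                continue
            root = ET.fromstring(package.read(name))
            counts["cells"] += sum(1 for _ in root.iter(f"{{{MAIN_NS}}}c"))
            counts["formulas"] += sum(1 for _ in root.iter(f"{{{MAIN_NS}}}f"))
            counts["hyperlinks"] += sum(
                1 for _ in root.iter(f"{{{MAIN_NS}}}hyperlink")
            )
    return counts
```

Calling `count_features("example.xlsx")` on a workbook (the file name is a placeholder) returns a dictionary of counts that could feed a classification of spreadsheets by the functionalities they actually contain.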

This nominal value could be an addition to the preservation intent. One should start with the question: why should it be preserved? After this, one can look at the significant properties and whether these correspond to the stated intention. Formulas, pivot tables, and sorting and filtering options are often used when stakeholders conduct analyses. In order to maintain the original intention of the spreadsheet, these functionalities will have to be preserved. The nominal value then serves as a check of the preservation intent. Are the cited significant properties indeed present in their spreadsheets?
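This check can be expressed as a comparison between the properties a stakeholder cites and the features actually counted in their files. The sketch below is illustrative only: the function name `check_cited_properties`, the property keys and the counts are all hypothetical, and it assumes a tool has already produced per-feature counts for the spreadsheet.

```python
def check_cited_properties(cited, counts):
    """Split cited properties into those found and those absent in a file.

    `cited` is the set of properties the stakeholder named as significant.
    `counts` maps a property name to how often it occurs in the spreadsheet.
    A property missing from `counts` is treated as absent.
    """
    present = sorted(p for p in cited if counts.get(p, 0) > 0)
    absent = sorted(p for p in cited if counts.get(p, 0) <= 0)
    return present, absent


# Hypothetical input, not data from the study.
cited = {"formulas", "pivot_tables"}
counts = {"formulas": 12, "pivot_tables": 0, "hyperlinks": 3}

present, absent = check_cited_properties(cited, counts)
print("cited and present:", present)
print("cited but absent:", absent)
```

A cited property that turns out to be absent is not necessarily an error. It may simply prompt a follow-up question about the stated preservation intent.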

The tools and methodology in the study serve to identify patterns in stakeholder behaviour. What do they use, why do they use it, and how do they use it? Together with a preservation intent and results from the Spreadsheet Complexity Analyser, the significant properties of spreadsheets can thus be determined. This study provides a framework that can easily be adopted by archives in order to conduct their own stakeholder analyses. In doing so, they would be able to identify which formats are most suitable for the long-term preservation of their specific spreadsheets. As archives receive a growing number of spreadsheets, identifying these formats is becoming increasingly important.

The study was presented to and well received by the National Archives of the Netherlands, the OPF AIG, and the University of Amsterdam. The Danish National Archives and the National Archives of Estonia, members of the AIG, have expressed interest in adopting the methodology outlined in this study. The main results will also be incorporated in a working paper by the AIG on significant properties of spreadsheets.

Read more:

Report: https://doi.org/10.5281/zenodo.3971833

Catalogue: http://doi.org/10.5281/zenodo.3902080
