Beat Mattmann is Data Librarian FDM & DLZA and Iris Lindenmann is Scientific Assistant for Research Data Management at the University of Basel in Switzerland.


Background

About ten years ago, a music archive took over the private archive of a composer who had begun composing with digital techniques as early as the 1980s. The volume is impressive: the composer transferred his work to the archive on no fewer than 700 data carriers, including 660 floppy disks, 26 SyQuest cartridges and a few Iomega Jaz, Iomega Zip, CD-ROM and hard disk carriers. In his work, the composer used Apple systems and proprietary special-purpose software (music notation and sequencing software).

Until then, the music archive had had a rather traditional, analogue orientation. A few years ago, however, the archivists created images of the data carriers and stored them separately to back up the digital data while the data carriers were still readable. The occasion for a later, deeper examination of the estate in 2018 was the research stay of a scholar from abroad. Using a few digitally created works, she wanted to examine the composer's creative process and working method in this digital environment. Access to the data proved difficult: neither drives for the data carriers nor the software needed to read the files were available on the archive's modern workstations. Old computers and drives from the personal collections of archive staff and from the private archive itself were therefore restored. Fortunately, the required proprietary software and the corresponding licenses were also available in the private collection of an archive employee. The researcher was thus able to examine part of the data on the old devices during her stay. However, not all data carriers were readable, and she could not always recognize the file types or assign them to the matching software.

The experiences from this research stay prompted the archive to look for a secure, long-term way to archive digital holdings and make them usable for future requests. The archival collection at hand served as a welcome pilot project, and not only for the music archive itself. Since the archive had neither the in-house know-how nor sufficient capacity, it entered into a collaboration with the University Library of Basel, which at that time was itself looking for research-related use cases suitable for gaining experience in digital long-term archiving.

Hardware: data carriers, ports, infrastructure

The music archive already knew about the problem of the partially unreadable data carriers, and we were prepared accordingly. However, the full extent of the problem only became apparent when we tried to detach the data or create an image. While the existing Iomega Zip, Iomega Jaz and SyQuest 44 MB data carriers and the single external hard disk could be processed with more or less manageable effort (although not all of them are readable), the approximately 660 3.5" floppy disks displayed their individual characteristics to the full. In addition to the most recent HD / DS (High Density / Double Sided) types, there were also many Double Density disks and a few Single Sided disks. Quite a few of the carriers were specifically designed or formatted for use with synthesizer equipment and could not be read with conventional floppy drives. Others showed disk errors. Even though we have not yet worked through the whole collection, we can already say that a larger package of work is waiting for a professional data forensics specialist.

Since we built up our infrastructure for digital long-term archiving from scratch and hardly any older equipment was left in the library, we had to procure many things from scratch. Within a year, several workstations (Windows, Ubuntu with BitCurator and Mac OS) accumulated. Specifically for the case described here, two older Macintosh PowerBooks - a G3 Wallstreet Series II from 1998 and a 1400c from 1996 - were acquired via eBay, in addition to a modern MacBook Air from 2014 from our personal collection for Mac problems of all kinds. In selecting the models we were guided by an excellent blog article by Porter Olsen on so-called "Rosetta computers" (also worth reading: the explanations on siber-sonic.com on the same topic). These two devices can read a wide (but not comprehensive) range of old floppy disks, largely regardless of type or formatting. Both machines are fully functional and are equipped with removable drives for floppy and Iomega Zip disks, which allows easy data transfer to a modern workstation after image creation. In addition, several external CD/DVD/Blu-ray drives were added, as well as an external drive for floppy disks and one for Iomega Zip media up to 750 MB. To handle data carriers such as SyQuest EZDrive and 44 MB cartridges, we could thankfully fall back on permanent loans from the music archive (some even from the archival collection described here). How we manage the connection and the workflows from the old (partly internal) drives with SCSI interfaces to the modern workstations, and thus finally to our digital archive, will keep us busy in the coming months.

Figure: Our equipment (at least a part of it)

Software: Operating systems, peculiarities, files

The software follows the hardware. The floppy disks in the collection were in use in the late 1980s and the 1990s, a period in which the Mac OS world was in transition. The switch from so-called Old World ROMs to New World ROMs marked a system break that caused a number of software incompatibilities. The three Apple devices currently available cover the operating systems 7.5.3, 8.1 (both Old World) and 10.14 Mojave (New World).

Figure (source: https://en.wikipedia.org/wiki/Macintosh_operating_systems)

One level lower, on the data carriers themselves, there are classical archival documents such as correspondence and text documents of all kinds (sketches, calendar entries, notes ...), but also countless special files for notation and music sequencing programs such as Digital Performer (or its predecessor Performer), Professional Composer or Finale. Here, too, the system discontinuity described above plays an important role. Performer was completely redesigned for OS X (and released as Digital Performer), and the further development of Mosaic, the successor of Professional Composer, was discontinued. Another stroke of luck helped us here: the archival collection itself as well as the music archive contained not only installation disks of the software used, but also some licenses.

One technical level further, once the pre-ingest workflows were working to a certain degree, further subtleties had to be dealt with. One example: in the late 1980s and 1990s it was possible to use file names with special characters. You can find question marks or slashes, but also spaces and even line breaks at the end of a file name, which today's systems cannot really handle. A further difficulty is that the classic tools for file identification and validation, such as JHOVE or DROID, do not necessarily help with these files, as they simply cannot identify them.
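One way to cope with such file names during transfer is to encode the problematic characters reversibly, so that the original name can always be restored. The following is only a minimal sketch and not the workflow we actually use: the function names `safe_name` and `original_name` are hypothetical, and the set of characters treated as unsafe is an illustrative assumption.

```python
import re

# Assumption: an illustrative set of characters that often cause trouble on
# current file systems, plus all control characters (including line breaks).
# "%" is included so that the encoding stays reversible.
_UNSAFE = re.compile(r'[%/\\?*:"<>|\x00-\x1f\x7f]')


def _percent(char):
    """Return the percent-encoded form of a single character, e.g. '?' -> '%3F'."""
    return "%{:02X}".format(ord(char))


def safe_name(original):
    """Encode unsafe characters and trailing spaces or dots in a file name."""
    encoded = _UNSAFE.sub(lambda match: _percent(match.group(0)), original)
    # Trailing spaces and dots are silently dropped by some systems,
    # so they are encoded as well.
    stripped = encoded.rstrip(" .")
    tail = encoded[len(stripped):]
    return stripped + "".join(_percent(char) for char in tail)


def original_name(safe):
    """Reverse safe_name() and return the file name as found on the carrier."""
    return re.sub(r"%([0-9A-F]{2})", lambda m: chr(int(m.group(1), 16)), safe)
```

In such a scheme, the original name would additionally be recorded in the metadata, since it is part of the authentic state of the file.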

Content-related aspects: Redundancy, versioning, appraisal

On the content level, the composer himself backed up his work on various data carriers, in some cases also during the creation of a work. In addition, one program he worked with saved a backup of his work every ten minutes during the creative process. Although this systematic version history from the creation of the works is interesting for research, it causes a significant proliferation of data and considerable confusion, especially since the program also created automatic backups when it was open but not actively used. In these cases, a checksum comparison could identify identical file versions and support the evaluation. This could reduce the amount of data to be archived and bring more clarity, but it also requires a content analysis, for which we have rather little capacity.
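Such a checksum comparison could look roughly like the following sketch. It assumes that the files have already been extracted from the images into ordinary directories (for example one directory per data carrier) and it compares only the bytes of each extracted file; resource forks of classic Mac files would need separate treatment. The function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_identical(root):
    """Map each checksum that occurs more than once to the files sharing it."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    # Keep only checksums shared by at least two files.
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

The result only shows which files are bit-identical. Which copy is kept, and whether near-identical versions are relevant for research, remains an appraisal decision.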

An evaluation of the content and a selection of the data are also necessary in other cases. When the backups were created, system files and software were also written to the disks. This is useful for interpreting files whose type we could not recognize, or when special software that is no longer easily available today is needed for emulation. However, it would be sufficient to archive these system files once. This would require a comparison across all data carriers and a decision about which system files are worth archiving and which are not. In addition, it must be questioned whether all backups created by the composer should be archived, as some of the data on these backups are identical.

However, who should carry out this evaluation, at what point, and according to which criteria? What happens to files and data carriers that we can no longer read, not even with data forensics? Ideally, an evaluation by experts from the music archive takes place after the data have been detached from the data carrier. In any case, the evaluation should always be carried out by several persons.

Based on similar guidelines that we have already developed for the evaluation of audiovisual documents, we primarily see the following criteria:

  • Information content and expressiveness
  • Conservation status and readability
  • Existence of duplicates
  • Interest of users and researchers today or in the future

Files that are unfit for preservation or have become unusable are ideally deleted. In the additionally archived image, the files would still be preserved for possible future needs. In addition, every evaluation process and every deletion should be documented in order to maintain authenticity and integrity.
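How such documentation could be kept in machine-readable form is sketched below. This is an assumption for illustration only, not an existing tool or our actual procedure: the record fields, the function names `make_record` and `append_record`, and the JSON Lines log format are all hypothetical. The sketch encodes two points from above: decisions refer to the listed criteria, and at least two persons are involved.

```python
import json
from datetime import datetime, timezone

# The four criteria listed above, used as a controlled vocabulary.
CRITERIA = (
    "information content and expressiveness",
    "conservation status and readability",
    "existence of duplicates",
    "interest of users and researchers",
)


def make_record(file_name, sha256, decision, criteria, reviewers, note=""):
    """Build one appraisal record; raise ValueError if it is incomplete."""
    if decision not in ("keep", "delete"):
        raise ValueError("decision must be 'keep' or 'delete'")
    if len(set(reviewers)) < 2:
        raise ValueError("at least two different reviewers are required")
    unknown = [c for c in criteria if c not in CRITERIA]
    if unknown:
        raise ValueError(f"unknown criteria: {unknown}")
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "file_name": file_name,
        "sha256": sha256,
        "decision": decision,
        "criteria": list(criteria),
        "reviewers": sorted(set(reviewers)),
        "note": note,
    }


def append_record(log_path, record):
    """Append the record as one JSON line to the appraisal log."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record, ensure_ascii=False) + "\n")
```

An append-only log of this kind keeps earlier decisions untouched, which makes it easier to trace later why a file is missing from the archive but still present in the image.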

No archiving without later use

Due to the problems described above, and because many file formats cannot be recognized without a great deal of analysis effort, it is difficult to migrate these files with our standard solution in the Digital Archive: What are the target formats? With which tools do we migrate? Are our ideas feasible at all on a Windows or Linux-based ingest system?

Tests with the researcher made it clear that a migration without noticeable loss of information and functionality is often not possible, at least for files that go beyond simple text documents. An anecdote here: when the aforementioned researcher tried to transfer composition and sequencer files to newer systems, there was no direct connection between the systems (hardware/interfaces and software). In a few cases, the transfer was achieved by systematic copying and migrating, which, however, involved considerable effort and was not possible without loss of information. The software's diverse functionalities could not be mapped 1:1 in newer versions or alternative programs. The researcher therefore attempted to film the dynamic or interactive control elements of the software from the screen while a composition was playing. Unfortunately, the available screens were all too small to open and display all controls simultaneously.

Therefore, from the beginning we had a reasonable but pragmatic form of re-use in mind. For this specific case, we repeatedly came back to the EaaSI (Emulation-as-a-Service Infrastructure) project. A browser-based emulation that makes use possible without great effort, authentically in the former system with the original software, seems ideal to us. We have already been able to try out a test application provided by the company OpenSLX and Klaus Rechert; further clarifications are planned.

Conclusion

The first conclusion we can draw, even though the work is not yet complete, is that the archival holdings presented here are a very demanding use case with complex problems for setting up a digital long-term archiving system. The gain in knowledge over the short project period so far has been enormous, and the learning curve steep. Of particular interest were the aspects of evaluation and the significant properties, which we have to rethink for these holdings. In addition, the courage to be pragmatic is essential. It was very helpful that, in the music archive, we had a partner at our side who knew the holdings and the former technology well and could also contribute equipment.

This use case is certainly not a standard case for our digital archive. The time spent on analysis and pre-ingest in the case described here was therefore disproportionately high. While the bulk of our material can otherwise be ingested and reused relatively easily, here it was necessary to explore alternative approaches.

What this use case illustrates very well - and what we also found in other use cases we have dealt with - is that it pays to think beyond the mere storage of the data and to consider reusability right from the start. The decisive factor for reusability can also be the preservation of functionalities that are linked not to the data but to the software used. The solutions for this approach sometimes differ greatly from those for the pure preservation of the data.

