The OECD (Organisation for Economic Co-operation and Development) is an intergovernmental organisation that creates knowledge. Its reports and analysis focus on important social and environmental issues, and are underpinned by data collected from participating national statistics offices and other sources. With more than 5 billion data points and constant updates, the challenge was how to preserve the dynamic data in a format that facilitates use and ensures long-term availability. This was achieved by taking periodic snapshots of the data and creating CSV (Comma-Separated Values) Archive Files, for dissemination alongside other digital content on the research platform, OECD iLibrary (www.oecd-ilibrary.org).

The publishing unit is charged with maximising dissemination in a sustainable manner that covers all costs. Consequently, under a 'freemium' business model similar to that used by the mobile game industry, the basic READ service allows all material to be freely consulted onscreen, whilst access to premium formats, such as PDF and ePub, is reserved for users at subscribing institutions. 

All content on the platform conforms to the accepted standards of the academic journal publishing industry, with full metadata describing each item (author, title, date of publication, abstract, etc.). This metadata not only simplifies discovery on OECD iLibrary itself, but is also shared with numerous other services and channels that large institutional libraries install to assist their patrons with research. OECD was one of the first publishers, starting in 2000, to organise its monographs so that they would fit into a system that had previously been reserved for journals, and now every book item, down to individual tables and graphs, has full bibliographic details and an abstract. In addition, each one has a DOI (Digital Object Identifier), providing a permanent URL for citation purposes.

At the heart of all OECD publications is the data it gathers from participating countries' National Statistics Offices and other sources. Following checking and verification, the data is housed in a huge virtual warehouse, containing more than 5 billion cells, that is freely accessible to all OECD iLibrary visitors. For subscribers, the data is divided into 1,200 datasets to facilitate navigation and extraction. Once the data has been downloaded and used in a report, however, there is no guarantee that exactly the same set of figures can be found at a later stage. This is due to the dynamic nature of the data in the warehouse, with updates, revisions and additions constantly being made.

The creation of digital archive editions of the datasets resolves this issue by freezing the data and saving it as CSV (Comma-Separated Values) files. These are treated in the same way as other OECD content: a layer of bibliographic metadata is added and a DOI allocated, and the files are made available on OECD iLibrary alongside the other content, ready for discovery and retrieval.
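
The idea of freezing a changing dataset can be illustrated with a minimal Python sketch. It is not the OECD's actual tooling: the function name `snapshot_to_csv`, the file-naming pattern, the column names and the sample values are all hypothetical, and the SHA-256 checksum is an added assumption, included only to show one way a later reader could confirm that a file is unchanged.

```python
import csv
import hashlib
import io
from datetime import date


def snapshot_to_csv(rows, fieldnames, snapshot_date):
    """Freeze `rows` as CSV text and return (filename, csv_text, sha256_hex).

    Hypothetical helper: the naming scheme and the checksum are assumptions,
    not a description of how the archive files are really produced.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    csv_text = buffer.getvalue()
    # The checksum lets anyone verify later that the archived file is intact.
    checksum = hashlib.sha256(csv_text.encode("utf-8")).hexdigest()
    filename = f"dataset-{snapshot_date.isoformat()}.csv"
    return filename, csv_text, checksum


# Made-up "live" data standing in for one dataset in the warehouse.
live_data = [{"country": "AAA", "year": 2015, "value": 1.0}]
columns = ["country", "year", "value"]

filename, frozen_text, checksum = snapshot_to_csv(
    live_data, columns, date(2017, 1, 31)
)

# A later revision changes the live data but not the snapshot already taken.
live_data[0]["value"] = 2.0
```

Because the snapshot is plain text captured at a point in time, a revision to the live figures leaves the archived copy untouched, which is what lets a report cite exactly the figures it used.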

The CSV Archive Files mean that a complete dataset can now be downloaded with a single click, by multiple users simultaneously, as soon as it is published, with confidence in the quality of a file that has been validated by OECD staff. There are no more delays or problems related to producing, shipping and managing CD-ROMs, the medium previously used for archival purposes. The simple structure of CSV also makes it suitable for integration into other software. The dataset 'snapshots' are scheduled according to the regularity of the updates and the size of the dataset concerned.
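
As an illustration of how little is needed to bring a CSV file into other software, the following Python sketch reads CSV text with the standard library alone. The column names and values are invented for the example and do not reflect the layout of any actual OECD archive file; `load_archive` and `filter_rows` are hypothetical helper names.

```python
import csv
import io


def load_archive(csv_text):
    """Parse CSV text into a list of dictionaries keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def filter_rows(rows, column, wanted):
    """Return only the rows whose `column` equals `wanted` (string compare)."""
    return [row for row in rows if row[column] == wanted]


# Invented sample content; a real archive file would be read from disk instead.
sample_text = (
    "country,year,value\n"
    "AAA,2014,1.0\n"
    "AAA,2015,1.5\n"
    "BBB,2015,3.0\n"
)

rows = load_archive(sample_text)
rows_2015 = filter_rows(rows, "year", "2015")
```

Note that `csv.DictReader` returns every field as a string, so numeric columns need an explicit conversion (for example `float(row["value"])`) before any calculation.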

OECD iLibrary is undergoing a major overhaul, with a new version of the platform due for release towards the end of 2017. To avoid duplicating the development work involved, one or more of the 'deep archive' solutions provided by LOCKSS, CLOCKSS or Portico will be deployed once the new platform is live, to guard against catastrophic failure and ensure the long-term availability of the content, including the CSV files, beyond the OECD's existence.

