Diageo Archive AI Cataloguing Assistant Application
Diageo Archive – Alice McFarlane
The Diageo Archive AI Cataloguing Assistant application was developed to address a fundamental digital preservation challenge: digital assets cannot be effectively preserved if they are not adequately described, structured, and managed.
Across many archives, valuable digital and digitised materials remain difficult to locate, interpret, or reuse because cataloguing is time-intensive, inconsistent, and difficult to scale. Without intervention, this creates long-term risks including loss of context, duplication, reduced accessibility, and eventual loss of value.
The project set out to test whether AI could be applied in a controlled and accountable way to reduce these risks by strengthening cataloguing workflows rather than replacing them. The AI application combines OCR, image analysis, structured metadata extraction, controlled vocabularies, and rule-based validation within an Azure-hosted workflow. The system processes digital files, proposes metadata, standardises values against agreed schemas, and flags uncertainty for human review.
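The "propose, then flag uncertainty for human review" step can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class names, the field names, the confidence scores, and the 0.80 threshold are hypothetical and are not taken from the project's Azure-hosted implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical threshold; the project does not state its review criteria.
REVIEW_THRESHOLD = 0.80


@dataclass
class ProposedField:
    """One metadata value proposed by an (assumed) extraction step."""
    name: str
    value: Optional[str]
    confidence: float  # 0.0 to 1.0, as reported by the assumed extractor


@dataclass
class CatalogueRecord:
    asset_id: str
    fields: List[ProposedField] = field(default_factory=list)

    def fields_for_review(self) -> List[str]:
        """Names of fields that are missing or below the review threshold."""
        return [
            f.name
            for f in self.fields
            if f.value is None or f.confidence < REVIEW_THRESHOLD
        ]

    def needs_human_review(self) -> bool:
        """True when at least one field should be checked by an archivist."""
        return bool(self.fields_for_review())


# Illustrative record: values and scores are invented for the example.
record = CatalogueRecord(
    asset_id="asset-0001",
    fields=[
        ProposedField("title", "Bottle label, front", 0.95),
        ProposedField("brand", "Guinness", 0.62),
        ProposedField("date", None, 0.0),
    ],
)

print(record.fields_for_review())
print(record.needs_human_review())
```

In a sketch like this, the proposed values are never silently accepted: a record only bypasses review when every field is present and scored above the threshold, which keeps the final decision with the archivist.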
A core principle of the project is that preservation quality must be designed into workflows from the outset. Rather than functioning as a black-box automation tool, the assistant operates within clearly defined archival rules. Controlled vocabularies are applied to key fields such as brand and category; dates are normalised using defined policies; and records are flagged where information is incomplete, ambiguous, or unsuitable for automated inference. This ensures that outputs remain transparent, auditable, and aligned with archival standards of authenticity, provenance, and accountability.
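As a rough illustration of rule-based standardisation, the sketch below maps a raw brand string to a preferred term and normalises a date under an explicit policy, returning a flag instead of guessing when a rule does not apply. The vocabulary entries, the function names, and the date policy are all assumptions for the example; the archive's actual term lists and policies are not described in this entry.

```python
import re
from datetime import date
from typing import Optional, Tuple

# Illustrative vocabulary only; variant spellings map to one preferred term.
BRAND_VOCABULARY = {
    "guinness": "Guinness",
    "johnnie walker": "Johnnie Walker",
    "johnny walker": "Johnnie Walker",  # assumed common misspelling
}


def match_brand(raw: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (preferred_term, flag). Unknown values are flagged, not guessed."""
    key = " ".join(raw.lower().split())  # lower-case and collapse whitespace
    if key in BRAND_VOCABULARY:
        return BRAND_VOCABULARY[key], None
    return None, f"brand not in controlled vocabulary: {raw!r}"


def normalise_date(raw: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (normalised_date, flag) under an assumed policy:
    a valid YYYY-MM-DD date or a bare four-digit year is kept as-is;
    'c. 1950' or 'circa 1950' keeps the year but is flagged as approximate;
    anything else is left empty and flagged for review.
    """
    text = raw.strip()
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):
        try:
            date.fromisoformat(text)  # rejects impossible dates such as month 13
            return text, None
        except ValueError:
            return None, f"invalid calendar date: {raw!r}"
    if re.fullmatch(r"\d{4}", text):
        return text, None
    approx = re.fullmatch(r"(?:c\.?|circa)\s*(\d{4})", text, flags=re.IGNORECASE)
    if approx:
        return approx.group(1), "approximate date; confirm before publishing"
    return None, f"date could not be normalised: {raw!r}"


print(match_brand("  Johnny  WALKER "))
print(normalise_date("c. 1950"))
print(normalise_date("1950s"))
```

Returning the flag alongside the value, rather than raising an error or filling in a best guess, is one way to keep each decision visible in later QA output.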
The workflow introduces early-stage preservation awareness into cataloguing by identifying incomplete, inconsistent, or uncertain metadata and surfacing these issues through QA outputs. By improving metadata completeness and consistency at scale, the system directly strengthens the long-term accessibility, interpretability, and management of digital assets. In practice, assets that would otherwise remain effectively inaccessible become structured, searchable, and usable within archive systems, which in turn supports more informed management decisions.
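A QA output of this kind could be as simple as a per-asset completeness report. The sketch below is hypothetical: the required-field list, the column names, and the sample records are assumptions, not the project's actual schema or report format.

```python
import csv
import io
from typing import Dict, List, Optional

# Assumed set of required fields; the real schema is not given in this entry.
REQUIRED_FIELDS = ["title", "brand", "category", "date"]


def qa_rows(records: List[Dict[str, Optional[str]]]) -> List[Dict[str, str]]:
    """Build one QA row per record, listing required fields that are empty."""
    rows = []
    for rec in records:
        missing = [name for name in REQUIRED_FIELDS if not rec.get(name)]
        rows.append(
            {
                "asset_id": str(rec["asset_id"]),
                "missing_fields": ";".join(missing),
                "status": "review" if missing else "ok",
            }
        )
    return rows


def to_csv(rows: List[Dict[str, str]]) -> str:
    """Serialise QA rows as CSV text (written to a string for the example)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["asset_id", "missing_fields", "status"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented sample records for illustration.
records = [
    {"asset_id": "a1", "title": "Label", "brand": "Guinness",
     "category": "Packaging", "date": "1959"},
    {"asset_id": "a2", "title": "Poster", "brand": None,
     "category": "Advertising", "date": ""},
]

report = to_csv(qa_rows(records))
print(report)
```

A flat report like this can be sorted or filtered by status, so incomplete records reach a reviewer rather than passing unnoticed into the catalogue.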
The prototype has been tested against real archive materials, including digitised packaging, labels, advertising, photographs, and brand records. These materials have ongoing cultural, evidential, and commercial value, and require consistent description to remain usable over time. Initial testing demonstrated that AI-assisted cataloguing can significantly reduce repetitive manual effort while improving consistency across records. Outputs include structured metadata files, QA reports, and logs, supporting both operational workflows and preservation requirements.
Importantly, the project was delivered using an MVP approach. Rather than attempting full-scale implementation, it focused on validating feasibility, usability, and preservation impact at a practical level. This approach ensured that the system could be tested against real workflows, with archivists retaining full oversight of decisions and quality control. Early results indicate that the assistant can reduce cataloguing effort for digital assets while improving metadata quality, with further evaluation planned to quantify performance across larger datasets and more diverse asset types.
Innovation in this initiative lies not simply in the use of AI, but in how it is governed and applied. The assistant demonstrates a model in which automation supports, rather than replaces, archival expertise. It embeds preservation principles directly into cataloguing workflows and makes metadata quality and auditability central to system design. This approach is particularly relevant in commercial archives, where digital assets must remain accessible and trustworthy over long timeframes.
For Diageo, the AI Cataloguing Assistant application provides a scalable pathway to improve archive processing capacity and reduce long-term preservation risk. For the wider digital preservation community, it offers a transferable model for integrating AI into cataloguing in a controlled, transparent, and preservation-aware way. By focusing on workflow transformation rather than isolated tooling, the project contributes to more sustainable and resilient approaches to managing digital heritage.
Votes must be cast online by 1200 (BST/UTC+1) on Monday 6th July.