Reuters Picture Library

Requirement

The Reuters Picture Library embarked on a digitisation strategy to make around 30% of its archive available online for customer requests. Archiving requires each image to be well documented: categorised within hierarchies and tagged with keywords so that specific assets can be searched for and retrieved rapidly. At the same time, news feeds deliver around 1,000 images relating to breaking news stories to the news desk each day, and those with potential for future sales are added to the archive.

With such a vast quantity of assets to catalogue, a major obstacle was the ability of Reuters’ editors to create hierarchies of terms and keywords rapidly. An internal workflow system already existed, but picture editors found it too inflexible for creating picture-specific terms, since it was designed mainly around news feeds and text. Duplication of terms, inconsistency, and varying interpretation of terms and semantics across editorial content creation were also major concerns for Reuters. Furthermore, because Reuters adhered to stringent IT security policies, the solution had to run as a web application with secure access so that external users could be included. Once content had been created and approved, it had to reside on a new picture library system so that sales executives could retrieve assets from the archive to fulfil sales enquiries.

Delivery

Following consultation with Reuters, we defined a clear requirements analysis covering the following:

The creation of a customised polyhierarchical thesaurus system, with a workflow to support the review of newly created terms and keywords
Creation of the thesaurus content by Phocuus Ltd’s editorial team
Ongoing collaboration with Reuters editorial team to refine the thesaurus

The existing keyword and categories module of Phocuus was modified to support a polyhierarchical thesaurus, in which a term can sit under more than one broader term. Integration with the task manager module allowed editors to be assigned specific nodes and trees to develop. The workflow lets the project manager define the approval order, so that completed work passes through the appropriate channels for sign-off before progressing to the next stage.
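The module's internals are not described here. Purely as an illustration, the sketch below assumes a simple in-memory model: each term records a set of broader terms (more than one makes the thesaurus polyhierarchical), and an approval chain set by the project manager must be signed off strictly in order. The names `Thesaurus`, `ApprovalWorkflow` and the role strings are hypothetical, not Phocuus's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A thesaurus node. Several broader terms make the thesaurus polyhierarchical."""
    label: str
    broader: set[str] = field(default_factory=set)


class Thesaurus:
    """Hypothetical in-memory polyhierarchy; not the Phocuus data model."""

    def __init__(self) -> None:
        self.terms: dict[str, Term] = {}

    def ancestors(self, label: str) -> set[str]:
        """All terms reachable by following broader links upwards."""
        seen: set[str] = set()
        stack = list(self.terms[label].broader)
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(self.terms[node].broader)
        return seen

    def add(self, label: str, *broader: str) -> None:
        """Add a term under zero or more broader terms, refusing cycles."""
        term = self.terms.setdefault(label, Term(label))
        for parent in broader:
            self.terms.setdefault(parent, Term(parent))
            if parent == label or label in self.ancestors(parent):
                raise ValueError(f"{parent!r} -> {label!r} would create a cycle")
            term.broader.add(parent)

    def paths(self, label: str) -> list[list[str]]:
        """Every top-to-term path; more than one means the term appears in several trees."""
        parents = self.terms[label].broader
        if not parents:
            return [[label]]
        return [p + [label] for parent in sorted(parents) for p in self.paths(parent)]


class ApprovalWorkflow:
    """Sign-off chain whose order is fixed by the project manager (hypothetical)."""

    def __init__(self, approvers: list[str]) -> None:
        self.approvers = approvers
        self.step = 0

    @property
    def complete(self) -> bool:
        return self.step == len(self.approvers)

    def sign_off(self, approver: str) -> None:
        """Accept a sign-off only from the next approver in the defined order."""
        if self.complete:
            raise ValueError("already fully signed off")
        expected = self.approvers[self.step]
        if approver != expected:
            raise PermissionError(f"waiting for {expected}, not {approver}")
        self.step += 1
```

For example, adding `Olympic bids` under both `Politics` and `Sport` yields two paths, `Politics > Olympic bids` and `Sport > Olympic bids`, which is what distinguishes a polyhierarchy from a plain tree.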

Once signed off, a node is released as read-only so that other users can view, share and copy that category, producing a related polyhierarchy. Because each node remains associated with its originator, the system allows updates, amendments and additions to the hierarchies to be shared with other users. A comments facility lets an editor who is using another user's terms make suggestions or flag terms they consider inappropriate; this sends an email to the project manager, who reviews and assesses the comment. Comprehensive search facilities let editors find existing terms to use in their own trees, reducing duplication.
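A hedged sketch of these sharing rules, assuming that sharing works by reference (a released node is included in another editor's tree rather than duplicated), so the originator's later amendments appear wherever the node is used. `SharedThesaurus`, `notify_pm` and the tree names are hypothetical; `notify_pm` is a plain callback standing in for the email sent to the project manager.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    label: str
    originator: str
    narrower: list[str] = field(default_factory=list)
    released: bool = False  # set once the node has been signed off


class SharedThesaurus:
    """Hypothetical model of released, shareable nodes; not the Phocuus implementation."""

    def __init__(self, notify_pm: Callable[[str], None]) -> None:
        self.nodes: dict[str, Node] = {}
        self.trees: dict[str, list[str]] = {}  # tree name -> labels of nodes it includes
        self.notify_pm = notify_pm  # stands in for the email to the project manager

    def create(self, label: str, originator: str, narrower: list[str]) -> None:
        self.nodes[label] = Node(label, originator, list(narrower))

    def release(self, label: str) -> None:
        self.nodes[label].released = True

    def amend(self, label: str, user: str, narrower: list[str]) -> None:
        """Only the originator may amend; every other user sees the node read-only."""
        node = self.nodes[label]
        if user != node.originator:
            raise PermissionError(f"{label!r} is read-only for {user}")
        node.narrower = list(narrower)

    def share(self, label: str, tree: str) -> None:
        if not self.nodes[label].released:
            raise ValueError(f"{label!r} has not been signed off")
        # Store a reference, so the originator's later amendments show up here too.
        self.trees.setdefault(tree, []).append(label)

    def view(self, tree: str) -> dict[str, list[str]]:
        return {label: list(self.nodes[label].narrower) for label in self.trees.get(tree, [])}

    def flag(self, label: str, user: str, comment: str) -> None:
        origin = self.nodes[label].originator
        self.notify_pm(f"{user} flagged {label!r} (originator {origin}): {comment}")

    def search(self, text: str) -> list[str]:
        """Case-insensitive substring search, to find existing terms before creating new ones."""
        needle = text.casefold()
        return sorted(label for label in self.nodes if needle in label.casefold())
```

Sharing by reference rather than by copy is the design choice that makes an originator's amendment visible in every tree that includes the node; a copy-based approach would instead need an explicit update step.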

Once an approved node was returned to the editor, they could move on to the next phase of content creation: generating keywords, gerunds, plurals and Americanised terms. Since refinement is a continual process, workflow tasks can be reassigned to different stages and to different editors. The audit trail enables the senior editor or project manager to trace the history of terms and hierarchies and review the comments provided.
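How these variants were produced is not stated; the editorial team may well have written them by hand. Purely as an illustration, the sketch below generates them with naive English suffix rules and a tiny, hypothetical British-to-American spelling table (`US_SPELLINGS`). Irregular forms such as consonant doubling (`run` → `running`) are not handled.

```python
import re

# Hypothetical sample table; a production list would be far larger.
US_SPELLINGS = {"colour": "color", "defence": "defense", "organise": "organize", "centre": "center"}


def plural(noun: str) -> str:
    """Naive regular plural: box -> boxes, city -> cities, photo -> photos."""
    if re.search(r"(s|x|z|ch|sh)$", noun):
        return noun + "es"
    if len(noun) > 1 and noun.endswith("y") and noun[-2] not in "aeiou":
        return noun[:-1] + "ies"
    return noun + "s"


def gerund(verb: str) -> str:
    """Naive -ing form: tie -> tying, make -> making, see -> seeing."""
    if verb.endswith("ie"):
        return verb[:-2] + "ying"
    if len(verb) > 2 and verb.endswith("e") and not verb.endswith(("ee", "ye", "oe")):
        return verb[:-1] + "ing"
    return verb + "ing"


def variants(term: str, part_of_speech: str) -> list[str]:
    """British and American base forms, each with its plural (nouns) or gerund (verbs)."""
    bases = {term, US_SPELLINGS.get(term, term)}
    forms = set(bases)
    for base in bases:
        if part_of_speech == "noun":
            forms.add(plural(base))
        elif part_of_speech == "verb":
            forms.add(gerund(base))
    return sorted(forms)
```

With this table, `variants("colour", "noun")` returns `['color', 'colors', 'colour', 'colours']`. Applying the spelling table before inflecting means each American base form gets its own plural or gerund.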

Because of Reuters’ strict IT policy, there were concerns about directly integrating a third-party application. Instead, the data structure facility, which can generate XML, CSV and text file output, was used to export the required data from the system. Since the data structures created within Phocuus mirror those of the new Reuters picture library system, migrating the content to the new system was relatively smooth.
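The actual export layout is not documented here. The sketch below assumes a thesaurus held as a mapping from each term to its broader terms and writes it out as CSV (one row per term/broader-term pair, so a polyhierarchical term produces several rows) and as XML, using only the Python standard library. The column and element names (`term`, `broader_term`, `thesaurus`, `broader`) are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET


def to_csv(broader: dict[str, list[str]]) -> str:
    """One row per (term, broader term); top-level terms get an empty broader column."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["term", "broader_term"])
    for term in sorted(broader):
        for parent in sorted(broader[term]) or [""]:
            writer.writerow([term, parent])
    return buf.getvalue()


def to_xml(broader: dict[str, list[str]]) -> str:
    """<thesaurus><term name="..."><broader>...</broader></term>...</thesaurus>"""
    root = ET.Element("thesaurus")
    for term in sorted(broader):
        el = ET.SubElement(root, "term", name=term)
        for parent in sorted(broader[term]):
            ET.SubElement(el, "broader").text = parent
    return ET.tostring(root, encoding="unicode")


data = {"Elections": ["Politics"], "Olympic bids": ["Sport", "Politics"], "Politics": []}
print(to_csv(data))
print(to_xml(data))
```

A flat, relational-style CSV suits bulk loading into another system's tables, while the XML keeps each term's broader terms grouped together; either can be produced from the same in-memory structure.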

Phocuus remains in use at Reuters, and Phocuus Ltd continues to provide ongoing support for the upkeep of the application and the continued refinement of terms for inclusion in the new picture library system. We are also in discussions with Reuters about phase 2 developments, which will culminate in data mining of the captions created for each image, so that keywords and hierarchies can be created automatically, assigned to the images and re-imported into the system using IPTC metadata.
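Phase 2 is described only as a plan. One simple way caption mining could suggest keywords, shown here as an assumption rather than the planned design, is to match existing thesaurus terms against the caption as whole words and then add every broader term above each match, so the image also picks up its place in the hierarchy. The function name `suggest_keywords` and the sample data are hypothetical, and writing the result into IPTC fields is not shown.

```python
import re


def suggest_keywords(caption: str, broader: dict[str, list[str]]) -> list[str]:
    """Thesaurus terms found in the caption (whole-word, case-insensitive) plus their ancestors."""
    text = caption.casefold()
    found = {
        term
        for term in broader
        if re.search(r"\b" + re.escape(term.casefold()) + r"\b", text)
    }
    # Walk up the polyhierarchy so each match also contributes its broader terms.
    keywords = set(found)
    stack = list(found)
    while stack:
        for parent in broader.get(stack.pop(), []):
            if parent not in keywords:
                keywords.add(parent)
                stack.append(parent)
    return sorted(keywords)


thesaurus = {
    "Politics": [], "Sport": [], "London": [],
    "Elections": ["Politics"], "Olympic bids": ["Politics", "Sport"],
}
print(suggest_keywords("Campaigners back London's Olympic bids ahead of elections", thesaurus))
```

Exact matching against a curated thesaurus keeps suggestions within approved vocabulary; anything more ambitious, such as proposing new terms or hierarchies, would need editorial review through the existing workflow.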