Thursday, November 28, 2013 10:46 AM
The College of Registered Nurses of BC (CRNBC) celebrated its 100th anniversary in 2012. To commemorate this milestone, a project was undertaken to digitize and make available online many decades of CRNBC publications, such as newsletters and annual reports. This collection documents the history of the college and the many nurses who contributed to its first 100 years and, perhaps most importantly, makes it easy to track important decisions over the decades.
Printed copies of the publications were digitized by a service bureau, with Andornot then developing the online search and presentation system.
The new site is available at https://archives.crnbc.ca
As shown in the flowchart below, the workflow from print to online involved several stages and processes.
- The service bureau scanned the documents to specifications developed by Andornot, producing thousands of high-resolution TIFF images – one image for each page of each publication – as well as associated XML in ALTO format containing the full text extracted from the scanned images through an OCR process.
- Andornot developed scripts to extract metadata from these many separate files, such as the name and date of the publication, and to generate images in different sizes as needed for the interface. We used PowerShell, ImageMagick and djvulibre for this.
- Andornot developed a search engine using the Andornot Discovery Interface (AnDI) to provide the best possible keyword searching.
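To illustrate the text-extraction step, here is a minimal Python sketch of pulling words and their page coordinates out of an ALTO file. It is not Andornot's actual script (the project used PowerShell, ImageMagick and djvulibre); the sample XML, the function names and the output field names are assumptions made for illustration. It relies only on the standard ALTO `String` element and its `CONTENT`, `HPOS`, `VPOS`, `WIDTH` and `HEIGHT` attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-line ALTO page, standing in for a real OCR output file.
SAMPLE_ALTO = """
<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock><TextLine>
    <String CONTENT="Annual" HPOS="100" VPOS="200" WIDTH="300" HEIGHT="50"/>
    <SP/>
    <String CONTENT="Report" HPOS="420" VPOS="200" WIDTH="280" HEIGHT="50"/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>
"""


def local_name(tag):
    """Strip the XML namespace so the code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def extract_words(alto_xml):
    """Return one dict per OCR word, with its text and bounding box."""
    root = ET.fromstring(alto_xml)
    words = []
    for element in root.iter():
        if local_name(element.tag) == "String":
            words.append({
                "text": element.get("CONTENT", ""),
                "x": float(element.get("HPOS", 0)),
                "y": float(element.get("VPOS", 0)),
                "w": float(element.get("WIDTH", 0)),
                "h": float(element.get("HEIGHT", 0)),
            })
    return words


def page_text(words):
    """Join the words into plain text suitable for a search index."""
    return " ".join(word["text"] for word in words)


words = extract_words(SAMPLE_ALTO)
print(page_text(words))
```

Keeping the coordinates alongside the text is what later makes it possible to highlight a matched word on the page image rather than only in a text snippet.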
The interface and features were tailored to the specific needs of this project:
- Brief search results show details of the publication and a snippet of text with the user’s search words highlighted, as well as a thumbnail image of the page containing the text, and facets to limit by date and publication.
- Clicking through to the full record shows the page in greater detail, still with the search words highlighted. The surrounding pages of the publication are also available, allowing quick navigation through the entire publication. This was achieved through the use of the New York Times Document Viewer and custom programming to highlight text in an overlay layer.
- A PDF of the full document is also available for download. Andornot created these by stitching the separate image files for each page back together into a single file.
- Permalinks allow users to easily bookmark and share specific pages and documents.
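The highlighting overlay mentioned above can be pictured with a short Python sketch. It is an assumption about how such an overlay might be computed, not a description of the custom programming used on the site: it supposes that each OCR word carries a bounding box in scan pixels, and it scales the boxes of matching words down to the size of the displayed page image. The function name, field names and sample values are all hypothetical.

```python
def highlight_boxes(words, query, scan_width, display_width):
    """Return overlay rectangles (in display pixels) for words matching the query.

    words: dicts with "text", "x", "y", "w", "h" in scan-pixel coordinates.
    scan_width / display_width: widths of the original scan and the image
    shown on screen, used to derive a single scale factor.
    """
    scale = display_width / scan_width
    terms = {term.lower() for term in query.split()}
    boxes = []
    for word in words:
        # Ignore surrounding punctuation and case when comparing.
        token = word["text"].strip(".,;:!?\"'()").lower()
        if token in terms:
            boxes.append({
                "left": round(word["x"] * scale),
                "top": round(word["y"] * scale),
                "width": round(word["w"] * scale),
                "height": round(word["h"] * scale),
            })
    return boxes


# Hypothetical OCR words from a page scanned 2000 pixels wide.
sample_words = [
    {"text": "Policy,", "x": 100, "y": 200, "w": 300, "h": 40},
    {"text": "changes", "x": 420, "y": 200, "w": 280, "h": 40},
]

print(highlight_boxes(sample_words, "policy", scan_width=2000, display_width=500))
```

Each returned rectangle could then be drawn as a semi-transparent element positioned over the page image in the viewer.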
Often in a digitization project, the result might be a single PDF per publication. With this project, by having each page available as a separate image, we were more easily able to direct the user to the page and text they are most interested in, though they can still access a PDF of the entire document – the best of both worlds.
All of this complexity comes together to provide an elegant and intuitive interface for users.
A CRNBC staffer using the archive says, “This archive is awesome! We were able to search several decades of a policy issue in a short time, so we could draft an historical timeline showing policy changes right up to 2013! Searching this database saved us so much time.”
Contact Andornot for help with your own digitization project.