Andornot Professional Development Grant for 2018 Awarded to Gayle Graham

by Jonathan Jacobsen Wednesday, March 28, 2018 10:35 PM

We are very pleased to announce a recipient for the Andornot Professional Development Grant for 2018: Gayle Graham of the Nova Scotia Health Authority.

Gayle is relatively new to health sciences librarianship, and has never attended a conference specifically about health libraries. She will use the grant to attend the Canadian Health Libraries Association (CHLA) Conference in St. John’s this June. Gayle notes that “as a new organization, we would really benefit from an update on what's happening in the wider community of health sciences libraries.”

Andornot strongly believes in the value of attending conferences to foster professional development. We attend events across Canada and the United States all year long to learn about new trends and technologies, meet with clients, and share our expertise with like-minded folks.

We inaugurated this grant last year, awarding it to Mark Goodwin of the BC Cancer Agency. We were delighted to be able to offer this grant again this year, and only wish we could send everyone who applied to the conference of their choice. 

We hope that everyone who applied, and all of you, will also be able to attend a conference this year. Check out the list of conferences we’ll be attending and drop by to say hi if you can.

Tags: events | funding

Don’t overlook the obvious. How to help researchers find your collections.

by Kathy Bryce Monday, March 26, 2018 4:28 PM

Reams of websites and consultants offer search engine optimization (SEO) advice and services to help people find your content and information. However, we’ve noticed that many of our clients are missing an obvious, no-cost source of referral links that would help researchers find their sites. Have you Googled your organization, or the major subjects or people included in your collections? Odds are that Wikipedia will be the first source listed in Google search results for people or place names. It therefore makes sense to ensure that your content and collections are findable through Wikipedia. Don’t neglect this opportunity to promote your material to researchers who may be unaware of your existence, and to contribute back to the Wikipedia community.

As outlined below, Wikipedia is strictly non-commercial so we cannot add content for you.

“Wikipedia is a multilingual, web-based, free-content encyclopedia project supported by the Wikimedia Foundation and based on a model of openly editable content. Wikipedia is written collaboratively by largely anonymous volunteers who write without pay. Anyone with Internet access can write and make changes to Wikipedia articles, except in limited cases where editing is restricted to prevent disruption or vandalism. Users can contribute anonymously, under a pseudonym, or, if they choose to, with their real identity.”

We recommend you read the Guide to Contributing first before you get started.

  • Determine if there are any links from Wikipedia to your website. Go to https://en.wikipedia.org/w/index.php?title=Special:LinkSearch&target= and enter the URL of your site.
  • Check to see if your parent organization has a page. Maybe a link to your site or more information on the scope of your collection on their page would be adequate, and they can be asked to add this link for you.
  • Consider adding a link on existing Wikipedia entries for significant people, organizations or places that are well represented in your collection, and are therefore a useful source of information for researchers. If your collections management system offers permalinks, you can add the URL to a fonds level descriptive record or finding aid under either the External links or References section. This requires only minimal knowledge of the formatting in the wiki markup language.
  • Add a new page if nothing exists on a person or topic already. You will need to check first that it meets the Wikipedia tests for notability, i.e. how the editors decide whether a given topic warrants its own article, and follow the content protocols and editing guidelines.
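The link check in the first step above can also be scripted if you have several sites to look up. This is only a minimal sketch: it builds the same Special:LinkSearch address given above, and the site name archives.example.org is a placeholder, not a real collection.

```python
from urllib.parse import urlencode

# Base address of the Wikipedia LinkSearch page mentioned in the list above.
LINKSEARCH_BASE = "https://en.wikipedia.org/w/index.php"


def linksearch_url(site: str) -> str:
    """Return the Special:LinkSearch URL that lists Wikipedia pages linking to `site`."""
    # safe=":" keeps "Special:LinkSearch" readable instead of percent-encoding the colon.
    query = urlencode({"title": "Special:LinkSearch", "target": site}, safe=":")
    return f"{LINKSEARCH_BASE}?{query}"


# Placeholder site name, for illustration only.
print(linksearch_url("archives.example.org"))
```

Open the printed address in a browser to see which Wikipedia pages, if any, already link to your site.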

To add more detailed content, check out the Wikipedia tutorial or watch their YouTube videos. There is also a useful video from the Archives Association of Ontario, created specifically as an overview of the ways in which archivists can use Wikipedia to link to their online resources. The page List of Archives in Canada shows how many of these archives do not yet have a specific entry. The page for the Canadian Lesbian and Gay Archives is a good example to consult for content ideas.

Please contact us if you would like help with more general tips to help users find your content.

Tags: Archives

Adjusting Solr relevancy ranking for good metadata in the Andornot Discovery Interface

by Peter Tyrrell Thursday, January 18, 2018 4:00 PM

I learned an interesting lesson about Solr relevancy tuning due to a request from a client to improve their search results. A search for chest tube was ranking a record titled "Heimlich Valve" over a record titled "Understanding Chest Tube Management," and a search for diabetes put "Novolin-Pen Quick Guide" above "My Diabetes Toolkit Booklet," for example.

Solr was using the usual default AnDI (Andornot Discovery Interface) boosts, so what was going wrong?

AnDI default boosts (pf is phrase matching):
qf=title^10 name^7 place^7 topic^7 text
pf=title^10 name^7 place^7 topic^7 text

The high-scoring records without terms in their titles had topic = "chest tube" or topic = "diabetes", yes, but so did the second-place records with the terms in their titles! Looking at the boosts, you would think that the total relevancy score would be a sum of (title score) plus (topic score) plus the others.

Well, you'd be wrong.

In Solr DisMax queries, the total relevancy score is not the sum of contributing field scores. Instead, the highest individual contributing field score takes precedence. It’s a winner-takes-all situation. Oh.

In the samples above, the boost on the incidence of “chest tube” or “diabetes” in the topic field was enough to overcome the title field's contribution, in the context of Solr’s TF-IDF scoring algorithm. That is, it’s not just a matter of “the term is there” versus “the term is not there”. Instead, a field's score rises with the number of times the query terms appear in it and falls with how common those terms are across the whole collection of documents. Field and document length matter, too, as does whether the term appears nearer the front of the text.
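To make the interplay of term frequency, rarity and field length concrete, here is a simplified textbook TF-IDF sketch. It is an illustration only, not Solr's exact scoring code; the function name and all of the numbers are made up for the example.

```python
import math


def tf_idf(term_freq: int, doc_freq: int, num_docs: int, field_length: int) -> float:
    """Simplified TF-IDF with length normalization (illustrative, not Solr's exact formula)."""
    if term_freq == 0:
        return 0.0
    tf = math.sqrt(term_freq)                        # more occurrences in the field -> higher
    idf = 1.0 + math.log(num_docs / (doc_freq + 1))  # rarer across the collection -> higher
    length_norm = 1.0 / math.sqrt(field_length)      # shorter field -> higher
    return tf * idf * length_norm


# Hypothetical numbers: the same term in a two-word field and in a six-word field.
short_field = tf_idf(term_freq=1, doc_freq=20, num_docs=1000, field_length=2)
long_field = tf_idf(term_freq=1, doc_freq=20, num_docs=1000, field_length=6)
print(short_field > long_field)  # the shorter field scores higher for the same hit
```

Under these assumptions, a hit in a very short field can outscore the same hit in a longer one before any boosts are applied.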

So I could just ratchet up the boost on the title field and be done with it, right? Well, maybe.

As someone else* has said: DisMax is great for finding a needle in a haystack. It’s just not that good at searching for hay in a haystack.

The client’s collection has a small number of records, and the records themselves are quite short, consisting of a handful of highly focused metadata. The title and topic fields are pithy and the titles are particularly good at summarizing the “aboutness” of the record, so I focused on those aspects when re-arranging relevancy boosts.

New Solr field type: *_notf, a text field for title and topic that does not retain term frequencies or term positions. This means a term hit will not be correlated to term frequency in the field. It is not necessary to take term frequency into account in a title because the title’s “aboutness” isn’t related to the number of times a term appears in it. The logic of term frequency makes sense in the long text of an article, say, but not in the brief phrase that is a title. Or topic.

New Solr fields: title_notf, topic_notf

Updated boosts (pf is phrase matching):
qf=title_notf^10 topic_notf^7 text
pf=title^10 topic^7

Note that phrase matching still uses the original version of the title and topic fields, because they index term positions. Thus they can score higher when the terms chest and tube appear together as the phrase “chest tube”.
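For anyone who wants to try the updated boosts by hand, the sketch below assembles them into a Solr select URL. The qf and pf values are the ones listed above; the host, port and core name in the example call are placeholders, and defType=dismax is an assumption based on the DisMax parser discussed in this post.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_select_url(select_endpoint: str, user_query: str) -> str:
    """Build a DisMax select URL using the updated boosts from this post."""
    params = {
        "defType": "dismax",                      # assumed query parser
        "q": user_query,
        "qf": "title_notf^10 topic_notf^7 text",  # term matching on the new fields
        "pf": "title^10 topic^7",                 # phrase matching on the original fields
    }
    return f"{select_endpoint}?{urlencode(params)}"


# Placeholder endpoint, for illustration only.
url = build_select_url("http://localhost:8983/solr/example/select", "chest tube")
print(url)

# Decode the query string again to confirm the parameters survive URL encoding.
print(parse_qs(urlsplit(url).query)["qf"])
```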

Also, I added a tie=1.0 parameter to the DisMax scoring, so that the total relevancy score of any given record will be the sum of contributing field scores, like I expected in the first place.

total score = max(field scores) + tie * sum(other field scores)
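The formula is easy to check with a few lines of code. This is a minimal sketch for a single query: the per-field scores are invented numbers, chosen only to show how the tie value can flip the ranking of two records.

```python
def dismax_score(field_scores, tie=0.0):
    """total score = max(field scores) + tie * sum(other field scores)"""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + tie * others


# Hypothetical boosted field scores for a "chest tube" search.
heimlich_valve = [5.2]          # topic match only
chest_tube_mgmt = [4.5, 4.8]    # title match and topic match

for tie in (0.0, 1.0):
    a = dismax_score(heimlich_valve, tie)
    b = dismax_score(chest_tube_mgmt, tie)
    winner = "Heimlich Valve" if a > b else "Understanding Chest Tube Management"
    print(f"tie={tie}: {winner} ranks first")
```

With tie=0.0 only the single best field counts, so the record with one strong topic match wins. With tie=1.0 the title and topic scores add up, and the record that matches in both fields moves to the top.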

So, lesson learned. Probably. And the lesson has particular importance to me because the vast majority of our clients are libraries, archives or museums who spend time honing their metadata rather than relying on keyword search across masses of undifferentiated text. Must. Respect. Cataloguer.

Further Reading

Getting Dissed by Dismax – Why your incorrect assumptions about dismax/edismax are hurting search relevancy

Title Search: when relevancy is only skin deep

* Doug Turnbull, author of both articles above.

Library and Archives Canada announces launch of 2018 funding cycle for Documentary Heritage Communities Program

by Jonathan Jacobsen Sunday, December 03, 2017 4:45 PM

Library and Archives Canada has announced the launch of the 2018 funding cycle for the Documentary Heritage Communities Program (DHCP). This is the fourth year of a planned five-year program, with $1.5 million available this year, as in previous rounds.

The DHCP provides financial assistance to the Canadian documentary heritage community for activities that:

  • increase access to, and awareness of, Canada’s local documentary heritage institutions and their holdings; and
  • increase the capacity of local documentary heritage institutions to better sustain and preserve Canada’s documentary heritage.

The deadline for submitting completed application packages is February 7, 2018. 

This program is a great opportunity for archives, museums, historical societies and other cultural institutions to digitize their collections, develop search engines and virtual exhibits, and other activities that preserve and promote their valuable resources.

The program is aimed at non-governmental organizations specifically, including:

  • Archives; 
  • Privately funded libraries; 
  • Historical societies;              
  • Genealogical organizations/societies;  
  • Professional Associations; and 
  • Museums with an archival component.

Businesses, governments and government institutions (including municipal governments and Crown Corporations), museums without archives, and universities and colleges are not eligible.

Types of projects which would be considered for funding include:

  • Conversion and digitization for access purposes; 
  • Conservation and preservation treatment; 
  • The development (research, design and production) of virtual and physical exhibitions, including travelling exhibits; 
  • Conversion and digitization for preservation purposes; 
  • Increased digital preservation capacity (excluding digital infrastructure related to day-to-day activities); 
  • Training and workshops that improve competencies and build capacity;
  • Development of standards, performance and other measurement activities;
  • Collection, cataloguing and access-based management; and
  • Commemorative projects.

Further program details, requirements and application procedures are available at http://www.bac-lac.gc.ca/eng/services/documentary-heritage-communities-program/Pages/dhcp-portal.aspx

How can Andornot help?

Many Andornot clients have obtained DHCP grants in previous rounds, and Andornot has worked on many other projects which would qualify for this grant. Some examples are detailed in these blog posts:

We have extensive experience with digitizing documents, books and audio and video materials, and developing systems to manage those collections and make them searchable or presented in virtual exhibits.

Contact us to discuss collections you have and ideas for proposals. We'll do our best to help you obtain funding from the DHCP!

Omeka S Out of Beta and Ready for Use

by Jonathan Jacobsen Saturday, November 25, 2017 12:05 PM

About this time last year we blogged about a new version of Omeka, Omeka S, entering beta release. Now we're happy to see that version 1.0 of Omeka S has just been released.

Omeka is a free, open-source content management system (CMS) for online digital collections. With Omeka, you can quickly build a searchable repository of archival, artifact or other records and assemble them into virtual exhibits to showcase your holdings.

Andornot uses Omeka with select clients and as the basis of our Digital History Hub platform.

Most content management systems are designed to manage a single website with a hierarchy of pages, in which are placed text and other media. In contrast, Omeka is based around items (e.g. historic documents, photographs, audio or video recordings, etc.) which can be arranged into item sets and pages of items. One item can be used in multiple ways, as part of different exhibits, for example.

An easy-to-use web interface provides site administrators with access to all the important back-end features: configuring the site appearance and navigation, uploading items (individually or in batches, such as from a database export), changing themes, and creating content pages.

Omeka S offers users a brand-new interface and features such as:

  • Manage multiple separate sites from a single installation of Omeka.
  • Build and publish pages, exhibits, or digital stories by adding and mixing different content blocks.
  • Create relationships between your resources - items, item sets, and media.
  • Use importers to bring in content from a spreadsheet or an Omeka Classic site.
  • Geolocate your content and display maps on sites using Mapping.
  • Connect your installation with Fedora and DSpace repositories, with the ability to update content periodically.
  • Use mobile-ready themes to customize the look of each site.

Omeka is a great choice for museums, archives, historical societies and others with cultural collections who want to make their collections searchable online. It's as easy to use for volunteers with little experience as it is for professional curators, archivists and historians.

More Information:
