Don’t overlook the obvious. How to help researchers find your collections.

by Kathy Bryce Monday, March 26, 2018 4:28 PM

Reams of websites and consultants offer search engine optimization (SEO) advice and services to help people find your content and information. However, we’ve noticed that many of our clients are missing an obvious, no-cost source of referral links that would help researchers find their sites. Have you Googled your organization or the major subjects or people that are included in your collections? Odds are that Wikipedia will be the first source listed in Google search results for people or place names. It therefore makes sense to make sure that your content and collections are findable through Wikipedia. Don’t neglect this opportunity to promote your material to researchers who may be unaware of your existence, and to contribute back to the Wikipedia community.

As outlined below, Wikipedia is strictly non-commercial so we cannot add content for you.

“Wikipedia is a multilingual, web-based, free-content encyclopedia project supported by the Wikimedia Foundation and based on a model of openly editable content. Wikipedia is written collaboratively by largely anonymous volunteers who write without pay. Anyone with Internet access can write and make changes to Wikipedia articles, except in limited cases where editing is restricted to prevent disruption or vandalism. Users can contribute anonymously, under a pseudonym, or, if they choose to, with their real identity.”

We recommend you read the Guide to Contributing before you get started.

  • Determine if there are any links from Wikipedia to your website. Go to https://en.wikipedia.org/w/index.php?title=Special:LinkSearch&target= and enter the URL of your site.
  • Check to see if your parent organization has a page. Maybe a link to your site or more information on the scope of your collection on their page would be adequate, and they can be asked to add this link for you.
  • Consider adding a link on existing Wikipedia entries for significant people, organizations or places that are well represented in your collection, and are therefore a useful source of information for researchers. If your collections management system offers permalinks, you can add the URL to a fonds level descriptive record or finding aid under either the External links or References section. This requires only minimal knowledge of the formatting in the wiki markup language.
  • Add a new page if nothing exists on a person or topic already. You will need to check first that it meets the Wikipedia tests for notability, i.e. how the editors decide whether a given topic warrants its own article, and follow the content protocols and editing guidelines.
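If you need to check more than one address (for instance, a main website plus a separate catalogue host), the LinkSearch check in the first step can be scripted. Here is a minimal Python sketch that only builds the URLs to open in a browser. The helper name and the example.org hosts are placeholders of ours, not part of Wikipedia.

```python
from urllib.parse import urlencode

# Base address of the English Wikipedia page script, taken from the URL above.
WIKIPEDIA_INDEX = "https://en.wikipedia.org/w/index.php"


def linksearch_url(site):
    """Build a Special:LinkSearch URL for one site (hypothetical helper)."""
    # urlencode percent-encodes the colon in "Special:LinkSearch" as %3A.
    query = urlencode({"title": "Special:LinkSearch", "target": site})
    return f"{WIKIPEDIA_INDEX}?{query}"


# Placeholder hosts; substitute your own website and catalogue addresses.
for site in ["example.org", "catalogue.example.org"]:
    print(linksearch_url(site))
```

Opening each printed URL shows the Wikipedia pages, if any, that already link to that address.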

To add more detailed content, check out the Wikipedia tutorial or watch their YouTube videos. There is also a useful video from the Archives Association of Ontario created specifically as an overview of the ways in which archivists can use Wikipedia to link to their online resources. The page List of Archives in Canada shows how many of these archives do not yet have a specific entry. The page for the Canadian Lesbian and Gay Archives is a good example to consult for content ideas.

Please contact us if you would like help with more general tips to help users find your content.

Tags: Archives

Adjusting Solr relevancy ranking for good metadata in the Andornot Discovery Interface

by Peter Tyrrell Thursday, January 18, 2018 4:00 PM

I learned an interesting lesson about Solr relevancy tuning due to a request from a client to improve their search results. A search for chest tube was ranking a record titled "Heimlich Valve" over a record titled "Understanding Chest Tube Management," and a search for diabetes put "Novolin-Pen Quick Guide" above "My Diabetes Toolkit Booklet," for example.

Solr was using the usual default AnDI (Andornot Discovery Interface) boosts, so what was going wrong?

AnDI default boosts (pf is phrase matching):
qf=title^10 name^7 place^7 topic^7 text
pf=title^10 name^7 place^7 topic^7 text

The high-scoring records without terms in their titles had topic = "chest tube" or topic = "diabetes", yes, but so did the second-place records with the terms in their titles! Looking at the boosts, you would think that the total relevancy score would be a sum of (title score) plus (topic score) plus the others.

Well, you'd be wrong.

In Solr DisMax queries, the total relevancy score is not the sum of contributing field scores. Instead, the highest individual contributing field score takes precedence. It’s a winner-takes-all situation. Oh.

In the samples above, the boost on the incidence of “chest tube” or “diabetes” in the topic field was enough to overcome the title field's contribution, in the context of Solr’s TF-IDF scoring algorithm. That is, it’s not just a matter of “the term is there” versus “the term is not there”. Instead, the score rises with the number of times the query terms appear in the field and falls with the number of documents across the whole collection that contain those terms. Field and document length matter. Also whether the term appears nearer the front of the text.
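To make that concrete, here is a rough Python sketch of a TF-IDF-style score for one term in one field. It is loosely modeled on the classic Lucene formula and is not Solr's exact computation, which includes further factors and varies by version. The function name and all the numbers are made up for illustration.

```python
import math


def tfidf_sketch(term_freq, doc_freq, num_docs, field_length, boost=1.0):
    """Simplified TF-IDF-style score for one term in one field (illustrative only)."""
    if term_freq == 0:
        return 0.0
    tf = math.sqrt(term_freq)                        # more occurrences help, with diminishing returns
    idf = 1.0 + math.log(num_docs / (doc_freq + 1))  # terms found in fewer documents count for more
    length_norm = 1.0 / math.sqrt(field_length)      # shorter fields get more credit per match
    return boost * tf * idf * length_norm


# Hypothetical numbers: the same term in a 2-word topic field and a 4-word title.
topic_score = tfidf_sketch(term_freq=1, doc_freq=5, num_docs=100, field_length=2, boost=7)
title_score = tfidf_sketch(term_freq=1, doc_freq=20, num_docs=100, field_length=4, boost=10)
print(topic_score > title_score)
```

With these invented inputs the topic field outscores the title despite its lower boost, because the topic field is shorter and the term is rarer in it.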

So I could just ratchet up the boost on the title field and be done with it, right? Well, maybe.

As someone else* has said: DisMax is great for finding a needle in a haystack. It’s just not that good at searching for hay in a haystack.

The client’s collection has a small number of records, and the records themselves are quite short, consisting of a handful of highly focused metadata. The title and topic fields are pithy and the titles are particularly good at summarizing the “aboutness” of the record, so I focused on those aspects when re-arranging relevancy boosts.

New Solr field type: *_notf, a text field for title and topic that does not retain term frequencies or term positions. This means a term hit will not be correlated to term frequency in the field. It is not necessary to take term frequency into account in a title because the title’s “aboutness” isn’t related to the number of times a term appears in it. The logic of term frequency makes sense in the long text of an article, say, but not in the brief phrase that is a title. Or topic.

New Solr fields: title_notf, topic_notf

Updated boosts (pf is phrase matching):
qf=title_notf^10 topic_notf^7 text
pf=title^10 topic^7

Note that phrase matching still uses the original version of the title and topic fields, because they index term positions. Thus they can score higher when the terms chest and tube appear together as the phrase “chest tube”.
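As a sketch of how those boosts might travel to Solr as request parameters, here is a small Python example that builds the query string. The defType value is an assumption on my part (the extended edismax parser accepts the same qf and pf parameters), and the helper name is hypothetical.

```python
from urllib.parse import urlencode


def build_search_params(user_query):
    """Assemble DisMax-style request parameters (hypothetical helper)."""
    return {
        "defType": "dismax",                        # assumption: which query parser is in use
        "q": user_query,
        "qf": "title_notf^10 topic_notf^7 text",    # term matching on the no-frequency fields
        "pf": "title^10 topic^7",                   # phrase matching on the original fields
    }


query_string = urlencode(build_search_params("chest tube"))
print(query_string)
```

In practice these values usually live in the request handler defaults rather than in every request, but the parameters are the same either way.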

Also, I added a tie=1.0 parameter to the DisMax scoring, so that the total relevancy score of any given record will be the sum of contributing field scores, like I expected in the first place.

total score = max(field scores) + tie * sum(other field scores)
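The formula is easy to try out in a few lines of Python. This is only a sketch of the arithmetic above, not Solr's code, and the per-field scores are invented to mimic the chest tube example.

```python
def dismax_score(field_scores, tie=0.0):
    """total score = max(field scores) + tie * sum(other field scores)"""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + tie * others


# Invented [title, topic] scores for the two records.
heimlich_valve = [0.0, 6.0]    # term missing from the title, strong topic match
chest_tube_mgmt = [5.0, 5.5]   # term present in both title and topic

for tie in (0.0, 1.0):
    print(tie, dismax_score(heimlich_valve, tie), dismax_score(chest_tube_mgmt, tie))
```

With tie=0.0 only the best field counts, so the first record wins on its topic score alone. With tie=1.0 the field scores are summed and the record matching in both title and topic comes out on top.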

So, lesson learned. Probably. And the lesson has particular importance to me because the vast majority of our clients are libraries, archives or museums who spend time honing their metadata rather than relying on keyword search across masses of undifferentiated text. Must. Respect. Cataloguer.

Further Reading

Getting Dissed by Dismax – Why your incorrect assumptions about dismax/edismax are hurting search relevancy

Title Search: when relevancy is only skin deep

* Doug Turnbull, author of both articles above.

Library and Archives Canada announces launch of 2018 funding cycle for Documentary Heritage Communities Program

by Jonathan Jacobsen Sunday, December 03, 2017 4:45 PM

Library and Archives Canada has announced the launch of the 2018 funding cycle for the Documentary Heritage Communities Program (DHCP). This is the fourth year of a planned five-year program, with $1.5 million available this year, as in previous rounds.

The DHCP provides financial assistance to the Canadian documentary heritage community for activities that:

  • increase access to, and awareness of, Canada’s local documentary heritage institutions and their holdings; and
  • increase the capacity of local documentary heritage institutions to better sustain and preserve Canada’s documentary heritage.

The deadline for submitting completed application packages is February 7, 2018. 

This program is a great opportunity for archives, museums, historical societies and other cultural institutions to digitize their collections, develop search engines and virtual exhibits, and carry out other activities that preserve and promote their valuable resources.

The program is aimed at non-governmental organizations specifically, including:

  • Archives;
  • Privately funded libraries;
  • Historical societies;
  • Genealogical organizations/societies;
  • Professional associations; and
  • Museums with an archival component.

Businesses, governments and government institutions (including municipal governments and Crown Corporations), museums without archives, and universities and colleges are not eligible.

Types of projects which would be considered for funding include:

  • Conversion and digitization for access purposes; 
  • Conservation and preservation treatment; 
  • The development (research, design and production) of virtual and physical exhibitions, including travelling exhibits; 
  • Conversion and digitization for preservation purposes; 
  • Increased digital preservation capacity (excluding digital infrastructure related to day-to-day activities); 
  • Training and workshops that improve competencies and build capacity;
  • Development of standards, performance and other measurement activities;
  • Collection, cataloguing and access based management; and
  • Commemorative projects.

Further program details, requirements and application procedures are available at http://www.bac-lac.gc.ca/eng/services/documentary-heritage-communities-program/Pages/dhcp-portal.aspx

How can Andornot help?

Many Andornot clients have obtained DHCP grants in previous rounds, and Andornot has worked on many other projects which would qualify for this grant. Some examples are detailed in these blog posts:

We have extensive experience with digitizing documents, books and audio and video materials, and developing systems to manage those collections and make them searchable or presented in virtual exhibits.

Contact us to discuss collections you have and ideas for proposals. We'll do our best to help you obtain funding from the DHCP!

Omeka S Out of Beta and Ready for Use

by Jonathan Jacobsen Saturday, November 25, 2017 12:05 PM

About this time last year we blogged about a new version of Omeka, Omeka S, entering beta release. Now we're happy to see that version 1.0 of Omeka S has just been released.

Omeka is a free, open-source content management system (CMS) for online digital collections. With Omeka, you can quickly build a searchable repository of archival, artifact or other records and assemble them into virtual exhibits to showcase your holdings.

Andornot uses Omeka with select clients and as the basis of our Digital History Hub platform.

Most content management systems are designed to manage a single website with a hierarchy of pages, in which are placed text and other media. In contrast, Omeka is based around items (e.g. historic documents, photographs, audio or video recordings, etc.) which can be arranged into item sets and pages of items. One item can be used in multiple ways, as part of different exhibits, for example.

An easy-to-use web interface provides site administrators with access to all the important back-end features: configuring the site appearance and navigation, uploading items (individually or in batches, such as from a database export), changing themes, and creating content pages.

Omeka S offers users a brand-new interface and features such as:

  • Manage multiple separate sites from a single installation of Omeka.
  • Build and publish pages, exhibits, or digital stories by adding and mixing different content blocks.
  • Create relationships between your resources - items, item sets, and media.
  • Use importers to bring in content from a spreadsheet or an Omeka Classic site.
  • Geolocate your content and display maps on sites using Mapping.
  • Connect your installation with Fedora and DSpace repositories, with the ability to update content periodically.
  • Use mobile-ready themes to customize the look of each site.

Omeka is a great choice for museums, archives, historical societies and others with cultural collections who want to make their collections searchable online. It's as easy for volunteers with little experience to use as it is for professional curators, archivists and historians.

DB/TextWorks Still A Popular Choice for Teaching in Schools

by Jonathan Jacobsen Saturday, October 28, 2017 7:44 AM

Inmagic DB/TextWorks continues to be a popular software application taught in schools. For example, the Library Technician programs at Langara College and the University of the Fraser Valley in B.C., as well as at the Southern Alberta Institute of Technology, each include it in some of their courses.

Yesterday I had the pleasure of speaking to students in the Library Technologies and Information Management class at Langara College. These budding library techs will learn to create a database for a class project using DB/TextWorks, hopefully with a bit of inspiration from the ideas I was able to share with them.

The image above shows screens from the Andornot Starter Kit, a ready-to-use DB/TextWorks database suitable for a small library.

Not all software has the longevity of DB/TextWorks, but I think this popular app endures because it remains unique in the market. For clients of ours with a modest budget who need to manage diverse kinds of information and don't have programming skills, it remains an excellent choice, one we highly recommend to many clients.

We see it used in law firms to create and manage databases of experts, memos, precedents, boilerplate documents, corporate archives, and of course a traditional library catalogue. In hospitals, it's used to manage patient education materials, and libraries with a strong circulation component. Elsewhere, we see it used to manage museum artifact collections, archival documents, databases of digitized historic documents and audio-visual recordings. In municipalities, it manages bylaws, real estate development applications, council documents… the list is endless. 

There are many highly specific database applications available, tailored to the needs of particular organizations (e.g. Inmagic Genie for specialized libraries, Lucidea's Argus for museums, etc.), but few tools are as easy to use as DB/TextWorks and can be applied to managing any kind of information. Anyone can learn to create a database and snazzy search and edit screens and have a functional, aesthetically pleasing database in a very short time, with little technical aptitude needed. Managing this information is easy with the many built-in, pre-programmed features, such as validation lists, batch modifications, the URL checker, and so on.

Two other long-standing database programs are of course MS Access, included with almost every copy of the MS Office suite, and Apple's FileMaker. The former is practically free and so ubiquitous that many people use it out of necessity, while the latter is quite visually appealing and has many useful features. However, in our experience, both require a higher level of technical skill to make truly useful. DB/TextWorks simply has more of the programming already done.

Reasons like these make it an excellent choice in many cases where budgets and user skills are modest, and thus well worth learning to use in a library technician or similar program. Paired with a search interface like our Andornot Discovery Interface, VuFind, Omeka, or Inmagic Presto, it becomes a perfect back-end to a highly functional front-end, a great combination for managing and searching information.

Contact us to learn more about any of the above, or if you're a school or student and would like a trial version of DB/TextWorks to use.
