Andornot's January 2016 Newsletter Now Available

by Jonathan Jacobsen Monday, January 18, 2016 11:59 AM

Tags: newsletters

The Storebox – an Online Repository of Christian Social Media Usage

by Jonathan Jacobsen Wednesday, January 06, 2016 8:48 AM

The Storebox is a digital repository of interesting and illuminating examples of best practices in new and social media use by Christian communities. The Storebox highlights what Christian communities and leaders (lay and ordained) are doing with digital technologies to share the gospel (as they understand it), to connect communities, and to envision/incarnate "church" in the digital age.


The Storebox is a project of the New Media Project at the Christian Theological Seminary in Indianapolis. It contains case studies, collections, and exhibits curated by students at Fordham University in New York City, under the direction of Professor Kathryn Reklis.

Using the open-source Omeka content management and virtual exhibit system, Prof. Reklis and her students have built a diverse collection of examples of Christian usage of Facebook, Twitter, Instagram, podcasts, blogs, plain old websites and more.

"I'm drawn to Omeka for its cost-effective means of presenting and organizing content and allowing users to interact with the content in meaningful ways. Also, most of the content will be generated by undergraduate college students, and Omeka seems like an excellent choice in this regard as well." – Prof. Reklis

The site is available at http://omeka.cts.edu 

Andornot developed a custom Omeka theme tailored to the specific needs of the project and its users.

Contact us to discuss Omeka and other systems for curating and managing digital content.

Using Named Entity Recognition to Generate Searchable Metadata

by Jonathan Jacobsen Tuesday, January 05, 2016 10:53 AM

Ask any librarian and they'll tell you that good metadata makes for a positive and productive search experience for users. Trying to find resources about a historic person or place, produced in a particular time period, and especially about a specific topic, is always more easily achieved when resources have been analyzed and described by a trained professional, with metadata applied from a controlled vocabulary, a process long known as "cataloguing".

Sure, search engines do an ever better job of returning relevant search results based only on the full text of a resource, with little or no metadata, thanks to some pretty sophisticated algorithms. Google is a giant because Google works! And even the Apache Solr search engine in our Andornot Discovery Interface and VuFind is impressive in its ability to parse and return meaningful results from large amounts of non-catalogued, metadata-free text.

But good metadata, applied by a librarian, archivist, curator or other skilled person, is still an even better source of data for a search engine. However, producing it takes time and staff resources. So many have asked, "What if a computer could help me figure out what this resource is about, who is mentioned in it, and where and when it takes place? What if the computer could extract the full text as well as metadata from a resource?"

We're very interested in some work being done on this. While automated subject analysis is still challenging, the Stanford Natural Language Processing Group at Stanford University has produced a Named Entity Recognition engine that shows great promise. In a nutshell, this engine does a fine job of reading a passage of text of any length and finding within it the names of people, organizations and locations.
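An entity tagger of this kind typically labels each word, so one practical step is collapsing runs of consecutively labelled words into whole names. The sketch below assumes the text has already been tagged in a word/LABEL form (Stanford's tagger offers a similar slash-tagged output), with O marking words outside any entity; the function name and sample sentence are illustrative, not part of the engine itself.

```python
from collections import defaultdict


def group_entities(tagged_text: str) -> dict:
    """Collapse runs of tokens that share a label into entity strings.

    Assumes whitespace-separated 'word/LABEL' tokens, where the label 'O'
    marks words outside any entity. With plain (non-BIO) labels, two
    different entities of the same type that sit side by side are merged
    into one, a limitation of this simple approach.
    """
    entities = defaultdict(list)
    run_label, run_words = "O", []
    for token in tagged_text.split():
        # rpartition splits on the LAST slash, so words containing '/' survive
        word, _, label = token.rpartition("/")
        if label != run_label:
            # Label changed: close the previous run if it was an entity
            if run_label != "O" and run_words:
                entities[run_label].append(" ".join(run_words))
            run_label, run_words = label, []
        run_words.append(word)
    # Close the final run at the end of the text
    if run_label != "O" and run_words:
        entities[run_label].append(" ".join(run_words))
    return dict(entities)


tagged = ("Kathryn/PERSON Reklis/PERSON teaches/O at/O Fordham/ORGANIZATION "
          "University/ORGANIZATION in/O New/LOCATION York/LOCATION City/LOCATION ./O")
print(group_entities(tagged))
# {'PERSON': ['Kathryn Reklis'], 'ORGANIZATION': ['Fordham University'], 'LOCATION': ['New York City']}
```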

Here's an example of a passage of text processed by the engine, with entities identified.

The screenshot shows that the engine did a pretty good job of identifying the names of people, organizations and places. This metadata can be used for increased searching options in a search engine, or fed back into a database for review and editing (as the engine may not always be perfect, there's still a role for professional review).

We're researching the possible uses of this with some of our projects, such as those built with the Andornot Discovery Interface (AnDI). When importing the full text of documents, that text would be run through a Named Entity Recognition engine to generate name and place metadata. For unstructured data, this may prove to be a great means of populating the Names facet, for example.
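As a rough illustration of that import step, the sketch below assumes entities have already been extracted and grouped by label (PERSON, ORGANIZATION, LOCATION) and merges them into a record ready for indexing. The field names (names_facet, places_facet, metadata_source) are hypothetical, not AnDI's actual schema; the unreviewed flag reflects the point above that machine-generated metadata still benefits from professional review.

```python
import json


def dedupe(values: list) -> list:
    """Drop repeats case-insensitively, keeping the first spelling seen."""
    seen = set()
    unique = []
    for value in values:
        key = value.casefold()
        if key not in seen:
            seen.add(key)
            unique.append(value)
    return unique


def build_index_record(doc_id: str, full_text: str, entities: dict) -> dict:
    """Merge extracted entities into a search record.

    Field names here are hypothetical; a real search schema defines its own.
    """
    return {
        "id": doc_id,
        "full_text": full_text,
        # People and organizations both feed a Names facet
        "names_facet": dedupe(entities.get("PERSON", [])
                              + entities.get("ORGANIZATION", [])),
        "places_facet": dedupe(entities.get("LOCATION", [])),
        # Flag machine-generated values so staff can review them later
        "metadata_source": "ner-unreviewed",
    }


record = build_index_record(
    "doc-1",
    "Full text of the imported document.",
    {"PERSON": ["Kathryn Reklis", "KATHRYN REKLIS"],
     "ORGANIZATION": ["Fordham University"],
     "LOCATION": ["New York City"]},
)
print(json.dumps(record, indent=2))
```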

Stay tuned to this blog for further results, or contact us to discuss your collections and how they could be made more accessible with AnDI and Named Entity Recognition.

New Search Options for the Railroad Museum of Pennsylvania Library and Archives

by Jonathan Jacobsen Friday, November 20, 2015 10:30 AM

The Railroad Museum of Pennsylvania maintains a collection of tens of thousands of resources related to railroading in the Commonwealth of Pennsylvania. The collection is diverse (historical, political, cultural, social, economic, and technological) and emphasizes the development of railroading from the 1830s through the present day. It includes every manner of printed material, from annual reports to timetables, as well as an extensive set of photographs and negatives. A reference library contains books, periodicals, railroad association and union publications, government documents, and trade catalogues.

Public search access has been available for many years through an interface developed by Andornot using our Andornot Starter Kit. However, as with all websites and applications, renewal and refurbishment are necessary every few years to keep up with technology standards and user expectations. In particular, the search logs showed that many user searches returned no records, so we knew that some new features were needed to help users connect to resources.

In 2015, the museum began a project with Andornot to develop a new, modern search engine using the Andornot Discovery Interface (AnDI). This is now available at http://rrmuseumpa.andornot.com 

"We had two primary objectives – to replace an earlier online catalog search system that was sagging under the growing weight of tens of thousands of new records and images, and to make the system more useful to users who have become accustomed to the more intelligent finding systems currently available in so many places on the web. Andornot delivered admirably on both needs." -- James Alexander, Jr., the museum's webmaster and lead on this project.

Large Collection Needs Advanced Search Features

The new search offers users access to over 270,000 records from both the library and archives databases, which were formerly separate. Of these, 80,000 have digitized photographs available online. With such a large data set, advanced search features are needed to help researchers uncover resources of interest to them.

AnDI's Apache Solr search engine excels at indexing large data sets. The more records it has available, the better it can analyze the words in them; frequency analysis of those words is one of the many techniques it uses to rank the most relevant results first.
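To give a feel for how word frequencies drive ranking, here is a simplified TF-IDF scorer: a term counts more when it appears often in a record and rarely across the whole collection, which is also why a larger collection gives better estimates of how rare a term is. This is an illustration of the general idea under simplifying assumptions (whitespace tokenization, raw term counts), not Solr's actual scoring formula, which adds normalization and other factors.

```python
import math
from collections import Counter


def tf_idf_scores(query: str, documents: list) -> list:
    """Score each document against the query with a basic TF-IDF sum."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        term_counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term_counts[term] == 0:
                continue
            # Smoothed IDF: rarer terms across the collection weigh more
            idf = 1 + math.log(n_docs / (doc_freq[term] + 1))
            score += term_counts[term] * idf
        scores.append(score)
    return scores


docs = ["railroad timetable 1920", "railroad annual report", "railroad photograph negative"]
print(tf_idf_scores("timetable railroad", docs))
```

Because "railroad" appears in every sample record, it contributes little to telling them apart; "timetable" is rare and lifts the first record to the top.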

Key to the search process are the facets that allow researchers to narrow their initial search by many criteria, such as the names of railroads, individuals, corporations and other organizations, subjects, geographic places, and dates.
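Faceting boils down to counting how many matching records carry each value of a field, then filtering on the value a user picks. The sketch below shows that counting and narrowing over in-memory records; the field names and sample records are hypothetical, and a search engine like Solr computes these counts inside the index rather than in application code.

```python
from collections import Counter


def facet_counts(records: list, field: str) -> Counter:
    """Count how many records carry each value of a facet field."""
    counts = Counter()
    for record in records:
        # set() so a value repeated within one record is counted once
        for value in set(record.get(field, [])):
            counts[value] += 1
    return counts


def narrow(records: list, field: str, value: str) -> list:
    """Keep only records tagged with the chosen facet value."""
    return [record for record in records if value in record.get(field, [])]


# Hypothetical sample records, not the museum's actual data
records = [
    {"title": "Timetable, 1925", "railroad": ["Pennsylvania Railroad"], "decade": ["1920s"]},
    {"title": "Annual report, 1928", "railroad": ["Reading Company"], "decade": ["1920s"]},
    {"title": "Station photograph", "railroad": ["Pennsylvania Railroad"], "decade": ["1950s"]},
]

print(facet_counts(records, "railroad").most_common())
# [('Pennsylvania Railroad', 2), ('Reading Company', 1)]
narrowed = narrow(records, "decade", "1920s")
print(facet_counts(narrowed, "railroad"))
```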

As with all AnDI sites, users can view brief and full records, view photographs in a gallery layout, save records to a list, share search results on social media, and of course, access the site as easily from a tablet or phone as a desktop web browser.

The few videos included in search results are published through the museum's YouTube channel, to expose the museum to the widest possible audience. A YouTube player is embedded in search results for playback within the new site.

AnDI Handles Spelling Variations

As is to be expected with such a large collection, entered over many years by a variety of people, spelling variations and typographic errors have crept in. AnDI helps users locate resources despite this, using two key features:

1. The Apache Solr search engine in AnDI is very, very good at parsing terms from records and suggesting correct terms based on what's in the records and what users search for. These appear in search results as spelling corrections and "Did you mean?" suggestions, which a user may click to try a different search.

2. A synonym list created by museum staff relates correct terms to some of the many variations that appear. 
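The "Did you mean?" idea in the first point can be sketched as: find indexed terms within a small number of character edits of what the user typed, and prefer the closest, most frequent one. The example below uses Levenshtein edit distance and a hypothetical table of term counts; it illustrates the principle only and is not how Solr's spellchecker is implemented internally.

```python
from collections import Counter
from typing import Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute
            ))
        previous = current
    return previous[-1]


def did_you_mean(term: str, index_terms: Counter, max_distance: int = 2) -> Optional[str]:
    """Suggest the most frequent indexed term within max_distance edits."""
    term = term.lower()
    if term in index_terms:
        return None  # already a known term; no suggestion needed
    close = [
        (edit_distance(term, known), -count, known)
        for known, count in index_terms.items()
        # Skip terms whose length alone rules them out
        if abs(len(known) - len(term)) <= max_distance
    ]
    close = [candidate for candidate in close if candidate[0] <= max_distance]
    # Fewest edits wins; ties go to the term found in more records
    return min(close)[2] if close else None


# Hypothetical term counts drawn from an index
terms = Counter({"timetable": 40, "locomotive": 120, "caboose": 15})
print(did_you_mean("locomotve", terms))  # locomotive
```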

For example, the New York, Susquehanna & Western Railway appears in around 7,000 records, but with the name Susquehanna spelled at least 11 different ways. Given that searchers may not enter the correct spelling either, the search problem is not trivial! The combination of the synonym list and Solr's other suggestions and corrections helps ensure that no matter how the data was originally entered, or how a user searches for it, AnDI can return relevant and complete results.
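A staff-maintained synonym list can be as simple as lines of equivalent spellings. The sketch below parses a small list loosely modeled on Solr's synonyms.txt conventions ("a, b => c" maps variants to one form; "a, b, c" treats terms as equivalent, here mapped to the first) and applies it to a query. The misspellings shown are hypothetical, not the museum's actual list, and only single-word, single-target rules are handled; Solr itself applies synonyms as an analysis filter at index or query time.

```python
def parse_synonyms(lines: list) -> dict:
    """Map each variant spelling to a preferred form (simplified rules)."""
    mapping = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        if "=>" in line:
            left, right = line.split("=>", 1)
            preferred = right.strip().lower()  # assumes a single target term
            variants = left.split(",")
        else:
            terms = [term.strip().lower() for term in line.split(",")]
            preferred, variants = terms[0], terms
        for variant in variants:
            mapping[variant.strip().lower()] = preferred
    return mapping


def normalize_query(query: str, mapping: dict) -> str:
    """Replace each known variant in the query with its preferred form."""
    return " ".join(mapping.get(word, word) for word in query.lower().split())


synonym_lines = [
    "# Hypothetical misspellings, for illustration only",
    "susquehana, susquehannah, susqehanna => susquehanna",
    "railroad, rr",
]
mapping = parse_synonyms(synonym_lines)
print(normalize_query("New York Susquehana RR", mapping))  # new york susquehanna railroad
```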

Both a video introduction and written search help are available to orient users to the site.

Inmagic DB/TextWorks for Back-End Data Management

Behind the scenes, the museum continues to use Inmagic DB/TextWorks to manage these records. This database management system is invaluable to staff for managing metadata, selecting standard terms from validation lists, and giving volunteers access for everyday data entry.

The museum's search engine continues to be hosted by Andornot as part of our managed hosting service.

"While Andornot had available a well-built modern search system in AnDI, they spent a lot of time with us learning about our particular users' needs, helping us think through the most useful processes, and refining the search experience. They know the business of both managing records internally and helping users find what they need. 

In the process two things happened – we learned more about the strengths and weaknesses in our data entry processes, and the usefulness and public recognition of our holdings were enhanced through improved web access.  The search help video was a real plus, and they worked with us in making our search page both functional and attractive." – James Alexander, Jr.

We're very pleased to continue our work with the Railroad Museum of Pennsylvania. Contact us to discuss upgrades and search options for your museum collections.

Andornot Authentication Manager: Perfect for Managing Access to Online Content

by Jonathan Jacobsen Wednesday, November 18, 2015 7:25 AM

While there is an ongoing move towards open access content, open data and open source applications, some of our clients do manage subscription-based resources. In some cases a subset of their information resources is available to the public, while more detailed information requires a paid membership. In other cases, all access requires some sort of login.

For these clients, and for anyone wanting to manage access to web-based information resources, we offer the Andornot Authentication Manager.

This web application allows you to limit access to your website content and search applications based on:

  • usernames and passwords;
  • IP addresses (single or ranges); and
  • referring URLs (i.e. incoming links from an intranet or subscriber-only site).
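To make the three access methods concrete, here is a minimal sketch in which a request is allowed if any one of them succeeds. It is not the Authentication Manager's code (which is a .NET application); the request and configuration shapes are hypothetical, passwords are stored as salted PBKDF2 hashes, and the addresses come from the ranges reserved for documentation.

```python
import hashlib
import hmac
import ipaddress
from typing import Optional
from urllib.parse import urlparse

ITERATIONS = 100_000  # PBKDF2 work factor; an illustrative value


def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def password_ok(username: str, password: str, users: dict) -> bool:
    if username not in users:
        return False
    salt, stored = users[username]
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(hash_password(password, salt), stored)


def ip_ok(client_ip: str, allowed: list) -> bool:
    address = ipaddress.ip_address(client_ip)
    # ip_network accepts a single address or a CIDR range
    return any(address in ipaddress.ip_network(entry, strict=False) for entry in allowed)


def referrer_ok(referrer: Optional[str], allowed_hosts: set) -> bool:
    if not referrer:
        return False
    return urlparse(referrer).hostname in allowed_hosts


def is_authorized(request: dict, config: dict) -> bool:
    """Grant access if any one of the three methods succeeds."""
    if request.get("username") and password_ok(
            request["username"], request.get("password", ""), config["users"]):
        return True
    if request.get("ip") and ip_ok(request["ip"], config["ip_ranges"]):
        return True
    return referrer_ok(request.get("referrer"), config["referrer_hosts"])


salt = b"example-salt"  # a real system would use a random per-user salt
config = {
    "users": {"subscriber1": (salt, hash_password("correct horse", salt))},
    "ip_ranges": ["203.0.113.0/24", "198.51.100.7"],
    "referrer_hosts": {"intranet.example.org"},
}
print(is_authorized({"ip": "203.0.113.42"}, config))  # True
```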

The Authentication Manager controls what a user can see or access based on their role. For example, the general public may have search-only access to brief records, whereas a logged in user can view a full record, access full text or original digitized content, submit requests or orders, etc. This flexibility is perfect for subscription-based sites and services.
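Role-based access like this usually comes down to a table of roles and the actions each may perform, plus a rule for which fields a role may see. The roles, action names and brief-record fields below are hypothetical examples mirroring the public/subscriber split described above, not the product's actual configuration.

```python
PERMISSIONS = {
    "public": {"search", "view_brief"},
    "subscriber": {"search", "view_brief", "view_full",
                   "view_full_text", "submit_request"},
}
BRIEF_FIELDS = ("title", "date")


def can(role: str, action: str) -> bool:
    # Unknown roles get no permissions at all
    return action in PERMISSIONS.get(role, set())


def visible_record(record: dict, role: str) -> dict:
    """Return only the parts of a record this role is allowed to see."""
    if can(role, "view_full"):
        return dict(record)
    if can(role, "view_brief"):
        return {key: value for key, value in record.items() if key in BRIEF_FIELDS}
    return {}


record = {
    "title": "Timetable, 1925",
    "date": "1925",
    "notes": "Detailed description for subscribers.",
    "full_text_url": "https://example.org/item/1.pdf",
}
print(visible_record(record, "public"))  # {'title': 'Timetable, 1925', 'date': '1925'}
```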

The Authentication Manager is designed to work specifically with the Andornot Discovery Interface and Andornot Starter Kit for Inmagic WebPublisher PRO, but can be adapted to other web applications. 

Other features include:

  • account and group profile management;
  • detailed reporting of site access, by account and time period; and
  • subscription and account expiration management.
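Expiration management and access reporting can be sketched with plain dates: an account is active through its expiry date, renewal reminders target accounts expiring within a window, and reports count visits per account per month. The account names, dates and log structure here are hypothetical illustrations, not the product's data model.

```python
from collections import Counter
from datetime import date, timedelta


def is_active(expires_on: date, today: date) -> bool:
    """An account stays usable through the last day of its subscription."""
    return today <= expires_on


def expiring_soon(accounts: dict, today: date, days: int = 30) -> list:
    """Active accounts whose subscriptions end within the next `days` days."""
    cutoff = today + timedelta(days=days)
    return sorted(name for name, expires in accounts.items()
                  if today <= expires <= cutoff)


def access_report(log: list, start: date, end: date) -> Counter:
    """Count visits per (account, 'YYYY-MM') between start and end inclusive."""
    counts = Counter()
    for account, visited_on in log:
        if start <= visited_on <= end:
            counts[(account, visited_on.strftime("%Y-%m"))] += 1
    return counts


today = date(2015, 11, 18)
accounts = {"museum-a": date(2015, 12, 1),
            "library-b": date(2016, 6, 30),
            "lapsed-c": date(2015, 10, 31)}
log = [("museum-a", date(2015, 10, 5)), ("museum-a", date(2015, 10, 20)),
       ("museum-a", date(2015, 11, 2)), ("library-b", date(2015, 11, 3))]

print(expiring_soon(accounts, today))  # ['museum-a']
print(access_report(log, date(2015, 10, 1), date(2015, 11, 30)))
```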

The Authentication Manager is a .NET web application, designed for Windows servers and the web applications that run on them. The interface is accessible from both desktop and mobile browsers.

Contact us to learn more about the Andornot Authentication Manager.
