King Institute at Stanford University Upgrades DB/TextWorks Archival Solution

by Jonathan Jacobsen Wednesday, January 20, 2016 8:44 AM

The Martin Luther King, Jr. Institute at Stanford University uses DB/TextWorks to manage an archival database of tens of thousands of speeches, sermons, letters, and other documents by and about Martin Luther King, Jr.

Known as OKRA (Online King Records Access), the database includes descriptive information as well as holdings details for these resources, which are held at repositories all over the United States. Digitized audio and video recordings are also available.

A web-based data entry system had been developed many years ago, using Inmagic WebPublisher PRO. However, advances in web technologies had resulted in problems with this interface. A decision was made to replace it with a purely DB/TextWorks-based solution.

Andornot upgraded the query screens, reports and data entry forms in the main database to ones based on our Andornot Starter Kit, for improved usability. Script buttons were included to assist with searching and editing records, and scripts and other validation were added to the data entry forms to aid in looking up information in other databases and to restrict some user groups' access to the database. Secondary databases were converted to thesauri so that they could be used as validation lists while still allowing multiple selections in a field of a record in the main database.

Online search access using Inmagic WebPublisher PRO remains available at http://okra.stanford.edu, though this interface may also be upgraded in the near future.

Please contact Andornot for assistance with your Inmagic-based databases, and similar projects.

Andornot's January 2016 Newsletter Now Available

by Jonathan Jacobsen Monday, January 18, 2016 11:59 AM

Tags: newsletters

The Storebox – an Online Repository of Christian Social Media Usage

by Jonathan Jacobsen Wednesday, January 06, 2016 8:48 AM

The Storebox is a digital repository of interesting, illuminating, best practices of new and social media use by Christian communities. The Storebox highlights what Christian communities and leaders (lay and ordained) are doing with digital technologies to share the gospel (as they understand it), to connect communities, and to envision/incarnate "church" in the digital age.


The Storebox is a project of the New Media Project at the Christian Theological Seminary in Indianapolis. It contains case studies, collections, and exhibits curated by students at Fordham University, in New York City, under the direction of Professor Kathryn Reklis.

Using the open-source Omeka content management and virtual exhibit system, Prof. Reklis and her students have built a diverse collection of examples of Christian usage of Facebook, Twitter, Instagram, podcasts, blogs, plain old websites and more.

"I'm drawn to Omeka for its cost-effective means of presenting and organizing content and allowing users to interact with the content in meaningful ways. Also, most of the content will be generated by undergraduate college students, and Omeka seems like an excellent choice in this regard as well." – Prof. Reklis

The site is available at http://omeka.cts.edu 

Andornot developed a custom Omeka theme for this project and tailored it for the specific needs of the project and users.

Contact us to discuss Omeka and other systems for curating and managing digital content.

Using Named Entity Recognition to Generate Searchable Metadata

by Jonathan Jacobsen Tuesday, January 05, 2016 10:53 AM

Ask any librarian and they'll tell you that good metadata makes for a positive and productive search experience for users. Trying to find resources about a historic person or place, produced in a particular time period, and especially about a specific topic, is always more easily achieved when resources have been analyzed and described by a trained professional, with metadata applied from a controlled vocabulary, a process long known as "cataloguing".

Sure, search engines do an ever better job of returning relevant search results based only on the full text of a resource, with little or no metadata, thanks to some pretty sophisticated algorithms. Google is a giant because Google works! And even the Apache Solr search engine in our Andornot Discovery Interface and VuFind is impressive in its ability to parse and return meaningful results from large amounts of non-catalogued, metadata-free text.

But good metadata, applied by a librarian, archivist, curator or other skilled person, is still an even better source of data for a search engine. However, producing it does take time and staff resources. So, many have asked, "what if a computer could help me figure out what this resource is about, who is mentioned in it, and where and when it takes place? What if the computer could extract the full text as well as metadata from a resource?"

We're very interested in some work being done on this. While automated subject analysis is still challenging, work at Stanford University by a Natural Language Processing group has produced a Named Entity Recognition engine that shows great promise. In a nutshell, this engine does a fine job of reading a passage of text of any length and finding within it the names of people, organizations and locations.

Here's an example of a passage of text processed by the engine, with entities identified.

The screenshot shows that the engine did a pretty good job of identifying the names of people, organizations and places. This metadata can be used for increased searching options in a search engine, or fed back into a database for review and editing (as the engine may not always be perfect, there's still a role for professional review).

We're researching the possible uses of this with some of our projects, such as those built from the Andornot Discovery Interface (AnDI). When importing the full text of documents, that text will be run through a Named Entity Recognition engine to generate name and place metadata. For unstructured data, this may prove to be a great means of populating the Names facet, for example.
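
As a rough illustration of that import step, here is a minimal Python sketch of how the output of a Named Entity Recognition engine might be turned into facet values. It is not the AnDI implementation. It assumes the engine returns one (token, label) pair per word, with labels such as PERSON, ORGANIZATION and LOCATION and "O" for everything else; the sample data, the function names and the facet field names ("names", "organizations", "places") are all hypothetical.

```python
from itertools import groupby

# Hypothetical engine output: one (token, label) pair per word.
# "O" marks tokens that are not part of any entity.
tagged = [
    ("Martin", "PERSON"), ("Luther", "PERSON"), ("King", "PERSON"),
    ("spoke", "O"), ("at", "O"),
    ("Stanford", "ORGANIZATION"), ("University", "ORGANIZATION"),
    ("in", "O"), ("California", "LOCATION"), (".", "O"),
]

# Assumed mapping from entity labels to facet fields in the search index.
FIELD_FOR_LABEL = {
    "PERSON": "names",
    "ORGANIZATION": "organizations",
    "LOCATION": "places",
}


def group_entities(tagged_tokens):
    """Merge runs of adjacent tokens that share a label into (text, label) entities."""
    entities = []
    for label, run in groupby(tagged_tokens, key=lambda pair: pair[1]):
        if label != "O":
            entities.append((" ".join(token for token, _ in run), label))
    return entities


def build_facets(entities):
    """Collect unique entity strings under the facet field assumed for each label."""
    facets = {}
    for text, label in entities:
        field = FIELD_FOR_LABEL.get(label)
        if field is None:
            continue  # label we have no facet for
        values = facets.setdefault(field, [])
        if text not in values:
            values.append(text)
    return facets


facets = build_facets(group_entities(tagged))
print(facets)
```

Note that grouping by adjacent labels would merge two different names that happen to sit side by side with no word between them, which is one more reason the generated values would still benefit from professional review before being written back to a database.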

Stay tuned to this blog for further results, or contact us to discuss your collections and how they could be made more accessible with AnDI and Named Entity Recognition.
