Our Favourite Omeka Plugins

by Jonathan Jacobsen Tuesday, June 27, 2017 8:54 AM

At Andornot, we're big fans of the Omeka web publishing and content management platform as a low-cost, easy way to get historic, cultural or other content online. Why, we've even launched a whole website dedicated to it: Digital History Hub!

One of Omeka's many strengths is its selection of plugins that add all sorts of extra features. By our count, there are over 90 of them. Most are listed here and here, but we've found a few others around the web too. Some are older and less actively supported, serve only a very specific purpose, or are of use to only a few Omeka users.

We've reviewed and tried almost all of them, though, and present here our most highly recommended ones. In our view, these plugins belong on almost every Omeka site, as each is so useful and so likely to appeal to a wide array of Omeka users. About half are helpful for Omeka site administrators, while the other half offer new features on the public side.

Learn more about each plugin by finding its name at http://omeka.org/add-ons/plugins/ and clicking its More Info link.

Plugin Name | Description and Andornot Comments
Admin Images | Allows administrators to upload images not attached to items, for use in carousels and simple pages. Very handy.
Bulk Metadata Editor | Adds search-and-replace functionality, allowing administrators to update metadata fields across many records quickly and easily.
CSV Import | Imports items, tags, and files from CSV files. Great when you have data in another database, such as Inmagic DB/TextWorks, and don't want to re-key it into Omeka.
Derivative Images | Recreates (or creates) derivative images such as thumbnails. Handy when the initial size proves too large or too small for the selected theme, and saves re-uploading each image.
Exhibit Builder | Builds rich exhibits in Omeka. See jpl-presents.org for an Omeka site that presents its content exclusively through exhibits.
HTML5 Media | Enables HTML5 for media files using MediaElement.js, allowing streaming playback. Great for sites with audio and video recordings.
Google Analytics | A small plugin that includes Google Analytics JavaScript code on pages. Everyone should want to know how much traffic their site gets!
Search By Metadata | Allows administrators to configure metadata fields to link to items with the same field value (e.g. click a Subject link to view all records with that same Subject).
Simple Contact Form | Adds a simple contact form that users can use to reach the administrator. Be sure to configure the reCAPTCHA anti-spam feature too. Requires mail-sending ability on the server, but a nice alternative to just listing an email address.
Simple Pages | Allows administrators to create additional web pages for their public site. In our view, every site should have at least an About page with more information about the site, who created it, etc.
Sitemap 2 | This Omeka 2.0+ plugin provides a persistent URL for a dynamically generated XML sitemap, for SEO purposes. With it enabled, create a Google Webmaster account (and a similar one with Bing) to feed your site into these search engines.
Social Bookmarking | Uses AddThis to insert a customizable list of social bookmarking sites on each item page. Great for helping users share your items on Twitter, Pinterest, Facebook, Google+, etc.

All of the plugins above are installed and ready to use in every site built through our Digital History Hub.

The plugins in the next list are ones we find quite useful on a case-by-case basis. We make them available in every Digital History Hub Omeka site, for the site owner to install, configure and use if they suit their needs, their data and their audience.

Plugin Name | Description and Andornot Comments
Commenting | Allows commenting on items, collections, exhibits, and more. In our view, most useful for gathering feedback from other site administrators. Consider Disqus instead for public comments (note: there is an older Disqus plugin, but it may need updating).
Contribution | Allows collecting items from visitors. Great for engaging the community and gathering additional contributions to a site. Requires the Guest User plugin.
Contributor Contact | Supplies administrators with tools to contact contributors in bulk. Complements the Contribution plugin above.
CSS Editor | Adds public CSS styles through the admin interface. Useful when you don't have access to the theme's CSS files and want to make some minor adjustments.
Geolocation | Adds location information and maps to Omeka. Who doesn't love browsing a map as a way of discovering resources?
Getty Suggest | Enables an autosuggest feature for Omeka elements using the Getty Collection controlled vocabularies. Could be quite useful for art and architectural items, as well as place names.
Guest User | Adds a guest user role. Guest users can't access the backend administrative interface, but plugins such as Contribution can treat them as authenticated users.
Hide Elements | Hides admin-specified metadata elements. Great when you don't need even all 15 Dublin Core elements and have, perhaps, volunteers performing data entry, as it makes the form even simpler for them.
PDF Embed | Embeds PDF documents into item and file pages. Very useful if you have PDFs in your Omeka collection.
Simple Vocab | A simple way to create controlled vocabularies, such as keywords or subjects, for consistent data entry. Works best with small-ish vocabularies.
Simple Vocab Plus | A fuller-featured option for controlled vocabularies, with autosuggest.

Visit our Digital History Hub site for more information on Omeka and low-cost hosting plans, or contact us for help getting an Omeka site up, or for adding these or other plugins to an existing one.

And watch this blog for more in-depth posts about select plugins. Next up is a step-by-step guide to exporting data from an Inmagic DB/TextWorks database, then batch importing it into Omeka.

Tags: Omeka

Tips for Scaling Full Text Indexing of PDFs with Apache Solr and Tika

by Peter Tyrrell Friday, June 23, 2017 1:21 PM

We often find ourselves indexing the content of PDFs with Solr, the open-source search engine beneath our Andornot Discovery Interface. Sometimes these PDFs are linked to database records that are also being indexed. Sometimes the PDFs are a standalone collection. Sometimes both. Either way, our clients often want this full-text content in their search engine. See the Arnprior & McNab/Braeside Archives site, which has both standalone PDFs and PDFs linked from database records.

Solr, or rather its Tika plugin, does a good job of extracting the text layer in the PDF and most of my efforts are directed at making sure Tika knows where the PDF documents are. This can be mildly difficult when PDFs are associated with database records that point to the documents via relative file paths like where\is\this\document.pdf. Or, when the documents are pointed to with full paths like x:\path\to\document.pdf, but those paths have no meaning on the server where Solr resides. There are a variety of tricks which transform those file paths to something Solr can use, and I needn't get into them here. The problem I really want to talk about is the problem of scale.
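As one illustration of such a trick (a sketch only, not the method used on any particular site), the Python below rewrites stored paths with the standard library's `PureWindowsPath`. The base folder `D:\documents` for relative paths and the share `\\fileserver\archives` standing in for the client's `x:` drive are hypothetical values you would replace with your own.

```python
from pathlib import PureWindowsPath

# Hypothetical locations: where relative paths live on the Solr server, and
# which server share stands in for each client drive letter.
BASE_DIR = PureWindowsPath(r"D:\documents")
DRIVE_MAP = {"x:": PureWindowsPath(r"\\fileserver\archives")}


def to_server_path(stored_path: str) -> str:
    """Translate a file path stored in a database record into one that is
    meaningful on the server where Solr runs."""
    path = PureWindowsPath(stored_path.strip())
    drive = path.drive.lower()
    if drive in DRIVE_MAP:
        # Absolute path on a mapped client drive: keep everything after the
        # drive root and re-root it under the mapped share.
        return str(DRIVE_MAP[drive].joinpath(*path.parts[1:]))
    if not path.drive and not path.root:
        # Relative path such as where\is\this\document.pdf: resolve against BASE_DIR.
        return str(BASE_DIR / path)
    # Anything else is assumed to be valid on the server already.
    return str(path)
```

With these mappings, `x:\path\to\document.pdf` becomes `\\fileserver\archives\path\to\document.pdf`, and `where\is\this\document.pdf` becomes `D:\documents\where\is\this\document.pdf`.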

When I say 'the problem of scale' I refer to the amount of time it takes to index a single PDF, and how that amount, small as it might be, can add up over many PDFs to an unwieldy total. The larger the PDFs are on average, the more time each unit of indexing consumes, and if you have to fetch the PDF over a network (remember I was talking about file paths?), the time needed per unit increases again. If your source documents number in the mere hundreds or thousands, scale isn't much of a problem, but tens or hundreds of thousands or more? That is a problem, and it's particularly tricky when the PDFs are associated with a database that is undergoing constant revision.

In a typical scenario, a client makes changes to a database which of course can include edits or deletions involving a linked PDF file. (Linked only in the sense that the database record stores the file path.) Our Andornot Discovery Interface is a step removed from the database, and can harvest changes on a regular basis, but the database software is not going to directly update Solr. (This is a deliberate strategy we take with the Discovery Interface.) Therefore, although we can quite easily apply database (and PDF) edits and additions incrementally to avoid the scale problem, deletions are a fly in the ointment.

Deletions from the database mean that, at least once in a while (usually nightly), we have to refresh the entire Solr index: a purged record leaves nothing behind for an incremental harvest to detect, so the practical way to drop it from Solr is to rebuild the index from what remains. (I'm being deliberately vague about the nature of 'database' here, but assume it does not use logical deletion and instead purges a deleted record immediately.) A nightly refresh that takes more than a few hours to complete means the problem of scale is back with us. Gah. So here's the approach I took to resolve that problem, and for our purposes, the solution is quite satisfactory.

What I reckoned was: the only thing I actually want from the PDFs at index time is their text content. (Assuming they have text content, but that's a future blog post.) If I can't significantly speed up the process of extraction, I can at least extract at a time of my choosing. I set up a script that creates a PDF-to-text-file mirror.

The script queries the database for PDF file paths, checks file paths for validity, and extracts the text layer of each PDF to a text file of the same name. The text file mirror also reflects the folder hierarchy of the source PDFs. Whenever the script is run after the first time, it checks to see if a matching text file already exists for a PDF. If yes, the PDF is only processed if its modify date is newer than its text file doppelgänger. It may take days for the initial run to finish, but once it has, only additional or modified PDFs have to be processed on subsequent runs.
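The mirroring logic described above might look roughly like this in Python. It is a sketch, not a port of the PowerShell script: the database query is assumed to have already produced a list of PDF paths, and `extract_text` is a hypothetical callable you supply (for example, a wrapper that runs Apache Tika's command-line app with its `--text` option).

```python
from pathlib import Path
from typing import Callable, Iterable


def mirror_pdfs_to_text(
    pdf_paths: Iterable[str],
    source_root: Path,
    mirror_root: Path,
    extract_text: Callable[[Path], str],  # hypothetical extractor, e.g. a Tika wrapper
) -> list[Path]:
    """Write the text layer of each PDF under source_root to a .txt file at the
    same relative location under mirror_root. Returns the text files written."""
    written = []
    for raw in pdf_paths:
        pdf = Path(raw)
        # Validity check: skip paths that are missing or are not PDFs.
        if not pdf.is_file() or pdf.suffix.lower() != ".pdf":
            continue
        try:
            relative = pdf.resolve().relative_to(source_root.resolve())
        except ValueError:
            continue  # outside the folder tree being mirrored
        # Same name and folder hierarchy, but with a .txt extension.
        txt = (mirror_root / relative).with_suffix(".txt")
        # Subsequent runs: only reprocess a PDF modified after its text file.
        if txt.exists() and txt.stat().st_mtime >= pdf.stat().st_mtime:
            continue
        txt.parent.mkdir(parents=True, exist_ok=True)
        txt.write_text(extract_text(pdf), encoding="utf-8")
        written.append(txt)
    return written
```

Comparing modification times keeps each run after the first proportional to the number of new or changed PDFs rather than the size of the whole collection.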

Solr is then configured to ingest the text files instead of the PDFs, and it does that very quickly relative to the time it would take to ingest the PDFs.
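How Solr picks up the text files will depend on your setup, and the sketch below is not necessarily how we configure it. It assumes a hypothetical core whose schema has `id` and `content` fields, is driven by the same database query as the mirror (so records deleted from the database simply drop out of the rebuilt index), and sends the documents as a JSON array to Solr's `/update` handler. Clearing the old index before a full refresh is left out.

```python
import json
import urllib.request
from pathlib import Path
from typing import Iterable


def build_solr_docs(relative_pdf_paths: Iterable[str], mirror_root: Path) -> list[dict]:
    """Build one Solr document per PDF still listed in the database, reading
    its mirrored text file. Field names 'id' and 'content' are hypothetical."""
    docs = []
    for rel in relative_pdf_paths:
        txt = (mirror_root / rel).with_suffix(".txt")
        if not txt.is_file():
            continue  # not mirrored yet, or extraction produced no file
        docs.append({
            "id": Path(rel).as_posix(),  # hypothetical ID: the PDF's relative path
            "content": txt.read_text(encoding="utf-8"),
        })
    return docs


def post_to_solr(docs: list[dict], update_url: str) -> None:
    """POST documents to a Solr update handler and commit, e.g.
    update_url='http://localhost:8983/solr/mycore/update' (hypothetical core)."""
    request = urllib.request.Request(
        update_url + "?commit=true",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises HTTPError on a 4xx/5xx response; otherwise consume the body.
    with urllib.request.urlopen(request) as response:
        response.read()
```

Reading small, already-extracted text files is what lets a full rebuild finish quickly, since no PDF has to be parsed at index time.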

The script is for Windows, is written in PowerShell, and is available as a GitHub gist.

Tags: PowerShell | Solr | Tika

Andornot's June 2017 Newsletter Available: News, Tips and Tricks for Libraries, Archives and Museums

by Jonathan Jacobsen Thursday, June 22, 2017 8:54 AM

Andornot's June 2017 Newsletter has been emailed to subscribers and is available to read here, with news, tips and tricks for libraries, archives and museums.

 

In This Issue

Andornot News

Andornot's Latest Projects

Tips, Tricks and Ideas

Other News

Tags: newsletters

Richmond Archives Adds Name Origins Resource to Online Search

by Jonathan Jacobsen Tuesday, June 06, 2017 9:51 AM

I live in Richmond, part of the Metro Vancouver Regional District, and have an interest in local history, so I was especially pleased when Andornot was asked by the City of Richmond Archives to help with a project on the origins of Richmond place names.

The City of Richmond Archives is a long-time user of Inmagic DB/TextWorks for managing its collections, and was instrumental in developing the set of linked databases that became our Andornot Archives Starter Kit. Over the past couple of years, we’ve helped the Archives upgrade its Inmagic WebPublisher-based online search system, which is available at http://archives.richmond.ca/archives/descriptions/

The new Name Origins search, available at http://archives.richmond.ca/archives/places/ features almost 500 records (and growing) that document and describe the history of Richmond streets, roads, bridges, neighbourhoods, and other landmarks. It’s easy to search by keyword or by type of place, and whenever possible, a Google map of the named place is shown. This database is updated by the Friends of the Richmond Archives, volunteers with a passion for local history. Launching this new database online was made possible through the Richmond Canada 150 Community Celebration Grant Allocations. 

As I worked on the web search interface to the database, I couldn’t help but search for places in my neighbourhood and around Richmond, and become captivated by their history. Now community members can access this information 24/7 and learn the history behind the names of streets, areas, and landmarks in their community.

Contact Andornot for options for your Inmagic databases and for search engines and other software to make your collections accessible online.
