How To Import Excel Data into VuFind

by Jonathan Jacobsen Tuesday, January 08, 2019 4:52 PM

Recently we had a new client come to us looking for help with several subscription-based VuFind sites they manage, and ultimately to have us host them as part of our managed hosting service. This client had a unique challenge for us: 3 million records, available as tab-separated text files of up to 70,000 records each.

Most of the data sets we work with are relatively small: libraries with a few thousand records, archives with a few tens of thousands, and every so often, databases of a few hundred thousand, like those in the Arctic Health bibliography.

While VuFind and the Apache Solr search engine that powers it (and also powers our Andornot Discovery Interface) have no trouble with that volume of records, transforming the data from hundreds of tab-separated text files into something Solr can use, in an efficient manner, was a pleasant challenge.

VuFind has excellent tools for importing traditional library MARC records, using the SolrMarc tool to post data to Solr. For other types of data, such as records exported from DB/TextWorks databases, we’ve long used the PHP-based tools in VuFind that use XSLTs to transform XML into Solr's schema and post it to Solr. While this has worked well, XSLTs are especially difficult to debug, so we considered alternatives.

For this new project, we knew we needed to write some code to convert the 3 million records in tab-separated text files into XML, and we knew from our extensive experience with Solr that it's best to post small batches of records at a time, in separate files, rather than one large post of 3 million! So we wrote a Python script to split the source data into separate files of about 1,000 records each, and to remove invalid characters that had crept into the data over time (this data set goes back decades and has likely been stored in many different character encodings on many different systems, so it's no surprise there were some gremlins).
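As an illustration only (this is not the script we actually ran), the conversion step might be sketched in Python as below. It assumes each source file has a header row whose column names can be used directly as Solr field names, that quote characters in the data are literal, and that "invalid characters" means control characters that XML 1.0 forbids. The function names and the batch_00001.xml file naming are made up for the example.

```python
import csv
import re
from pathlib import Path
from xml.sax.saxutils import escape, quoteattr

# Control characters that XML 1.0 does not allow (tab, LF and CR are kept).
INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")


def record_to_doc(record):
    """Turn one row (a dict of column name -> value) into a Solr <doc>."""
    fields = []
    for name, value in record.items():
        if name is None or not isinstance(value, str):
            continue  # ragged row: extra or missing columns are ignored
        value = value.strip()
        if value:
            fields.append("    <field name=%s>%s</field>"
                          % (quoteattr(name), escape(value)))
    if not fields:
        return ""
    return "  <doc>\n" + "\n".join(fields) + "\n  </doc>"


def batches(records, size):
    """Yield lists of at most `size` records."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def batch_to_xml(batch):
    """Wrap a batch of records in a Solr <add> update message."""
    docs = [doc for doc in map(record_to_doc, batch) if doc]
    return ('<?xml version="1.0" encoding="UTF-8"?>\n<add>\n'
            + "\n".join(docs) + "\n</add>\n")


def convert(lines, out_dir, size=1000):
    """Split tab-separated lines into XML files of `size` records each."""
    # Strip invalid characters before parsing, so they never reach the XML.
    cleaned = (INVALID_XML_CHARS.sub("", line) for line in lines)
    reader = csv.DictReader(cleaned, delimiter="\t", quoting=csv.QUOTE_NONE)
    paths = []
    for number, batch in enumerate(batches(reader, size), start=1):
        path = Path(out_dir) / f"batch_{number:05d}.xml"
        path.write_text(batch_to_xml(batch), encoding="utf-8")
        paths.append(path)
    return paths
```

To run it against a real file, the lines could come from something like open(source, encoding="utf-8", errors="replace", newline=""). The right encoding depends on the data, and for a data set with a mixed history it may need to be worked out file by file.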

Once the script was happily creating Solr-ready XML files, it just seemed more straightforward to push the XML directly to Solr, rather than use VuFind's PHP tools and an XSLT to index the data. For this, we wrote a Bash shell script that uses the post tool that ships with Solr to iterate through the thousands of data files and push each to Solr, logging the results.
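Our own script for this step was written in Bash. To keep the examples in one language, here is a rough Python equivalent of the same loop. It assumes a Solr release that still ships the bin/post tool and calls it as "bin/post -c <core> <file>". The function names, the log format and the default paths are invented for the sketch, and the core name is whatever your installation uses (in a stock VuFind setup the bibliographic core is usually called biblio).

```python
import subprocess
from pathlib import Path


def post_command(post_tool, core, xml_file):
    """Build the command line for Solr's post tool for one file."""
    return [str(post_tool), "-c", core, str(xml_file)]


def post_all(xml_dir, core, post_tool="bin/post", log_path="post.log",
             run=subprocess.run):
    """Post every .xml file in xml_dir to Solr, one file at a time.

    `run` is injectable so the loop can be exercised without a live Solr.
    Returns a dict mapping file name to the post tool's exit code.
    """
    results = {}
    with open(log_path, "a", encoding="utf-8") as log:
        for xml_file in sorted(Path(xml_dir).glob("*.xml")):
            completed = run(post_command(post_tool, core, xml_file),
                            capture_output=True, text=True)
            status = "OK" if completed.returncode == 0 else "FAILED"
            log.write(f"{status}\t{xml_file.name}\n")
            if completed.returncode != 0:
                # Keep the tool's error output next to the failed file name.
                log.write(completed.stderr)
            results[xml_file.name] = completed.returncode
    return results
```

Posting one small file at a time means a bad record only fails its own batch, and the log shows exactly which file needs another look.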

The combination of a Python script to convert the tab-separated text files into Solr-ready XML and a Bash script to push it to Solr worked extremely well for this project. Python is lightning fast at processing text, and pushing data directly to Solr is definitely faster than invoking XSLT transformations.

This approach would work well for any data. Python is a very forgiving language to develop with, making it easy and quick to write scripts to process any data source. In fact, since this project, we've used Python to manipulate a FileMaker Pro database export for indexing in our Andornot Discovery Interface (also powered by Apache Solr) and to harvest data from the Internet Archive and Online Archive of California, for another Andornot Discovery Interface project (watch this blog for news of both when they launch).

We look forward to more challenges like this one! Contact us for help with your own VuFind, Solr and similar projects.

Andornot's November 2018 Newsletter Available: News, Tips and Tricks for Libraries, Archives and Museums

by Jonathan Jacobsen Monday, November 19, 2018 7:35 AM

Tags: newsletters

How To Create a DB/TextWorks Menu Screen

by Jonathan Jacobsen Thursday, November 15, 2018 10:36 AM

For some reason, DB/TextWorks menu screens are a little-used feature. We often meet clients with many databases, but without a convenient way of seeing and accessing them all at a glance. Adding a menu screen to DB/TextWorks is quick and easy to do, and makes using your databases so much easier.

The screenshot above shows the menu screen from our Andornot Library Kit, with links to each of the many databases it includes. The one below shows one from one of our clients' systems.

What is a Menu Screen?

Like a Query Screen or Report Form in a DB/TextWorks database, a Menu Screen is a screen layout you create using the WYSIWYG designer in DB/TextWorks. You would usually add to it links to each of your databases, for searching or data entry. You might also add your organization's name or logo, contact or support info for anyone who might be using the system, a brief description of each database, etc.

Having links to all your databases on a single screen saves time and helps new users find their way around your collection of databases without having to hunt for them in folders on disk. It also allows you to specify, in each link to a database, which query screen and reports to load for that database. 

One way to use menu screens is to create different ones for different kinds of users. For example, in an archives or museum that relies on volunteers to help with data entry, you could have a menu screen for volunteers that only lists the Accessions database, and pre-loads a simpler query screen and data entry form designed specifically for volunteers. A more extensive menu could provide the archivist or curator with links to all databases, pre-loading the more sophisticated query and edit screens for their use.

Unlike a Query Screen or Report Form, the menu screen isn't stored in any one database, but rather as a separate file on disk (with a .tbm or .cbm extension). You would usually store it in the same folder as all your database files.

How do I create a Menu Screen?

  1. Open DB/TextWorks but don't open a database.
  2. Select Menu Screens > Design from the main menu.
  3. Choose "Create a New Menu Screen File."
  4. Browse to the folder where your databases are stored to save the menu screen in the same location, and give it a name.
  5. In the WYSIWYG Menu Screen Designer, you may now add links to textbases, your organization's name or logo, and other information. Use the examples above for ideas, or come up with your own design.
  6. To add links to textbases, choose Edit > Add > Textbase box.
  7. In the Textbase Properties Dialogue, select the textbase to link to, then on the Initial Elements tab, pre-select the query screen and forms to use by default. Note that these override the default screens and forms set in the textbase, and that in either case, users may still change to other screens and forms once they are in the database.
  8. On the Initial Action tab, be sure to select which window to open. For example, if your link is one such as "Search the Database", select a Query Window. If your link is "Add a New Record", select Edit New Record as the window to open.
  9. Save your new menu screen when your design is complete.
  10. If you ever create more than one menu screen, you can even add links from one to another on each of them.

How do I use a Menu Screen?

  1. On each PC that has DB/TextWorks, open DB/TextWorks but don't open a database.
  2. Select Menu Screens > Select from the main menu.
  3. Choose "Use the Menu Screen in a File", then browse to and select the Menu Screen file (ending with .tbm or .cbm) that you created earlier, usually stored in the same folder as your databases.
  4. Close and re-start DB/TextWorks and your menu screen will now automatically load, ready for use.

See this blog post from earlier this week about two other helpful but little-used features of DB/TextWorks: Sets and Record Skeletons.

Andornot's Professional Development Grant Available for 2019

by Jonathan Jacobsen Tuesday, November 06, 2018 8:20 AM

Andornot strongly believes in the value of attending conferences to foster professional development.

Two years ago, we introduced the Andornot Professional Development Grant, a new, annual grant to help you attend a conference or event as part of your ongoing professional development activities. Of the many excellent applications we received each year, Mark Goodwin of the BC Cancer Agency was selected in 2016, and Gayle Graham of the Nova Scotia Health Authority received the grant in 2017. Both used the funds to help them attend the Canadian Health Libraries Association Conference.

We are very pleased to be able to offer the grant again this year, to help you attend an event in 2019.

One grant of up to $1,000 is available, with an application deadline of January 31st, 2019. The funds can be used for registration or travel-related expenses. We hope that this grant will provide an opportunity for someone without access to funds from their organization to network and enrich their knowledge.

To apply, just complete this application form. The recipient will be selected in early February, allowing time to register for a conference at any point in the year.

Tags: funding

Library and Archives Canada announces launch of 2019 funding cycle for Documentary Heritage Communities Program

by Jonathan Jacobsen Monday, October 22, 2018 11:44 AM

Library and Archives Canada has announced the launch of the 2019 funding cycle for the Documentary Heritage Communities Program (DHCP). This is the fifth round of what was originally envisioned as a five-year program, so it could potentially be the final year.

The DHCP provides financial assistance to the Canadian documentary heritage community for activities that:

  • increase access to, and awareness of, Canada’s local documentary heritage institutions and their holdings; and
  • increase the capacity of local documentary heritage institutions to better sustain and preserve Canada's documentary heritage.

The deadline for submitting completed application packages is January 8, 2019.

This program is a great opportunity for archives, museums, historical societies and other cultural institutions to digitize their collections, develop search engines and virtual exhibits, and undertake other activities that preserve and promote their valuable resources.

There are a number of significant changes this year:

  • The upper limit of funding for a small project has increased to $24,999. Many of the projects Andornot helps with would fall into this range.
  • Organizations which receive up to half their funding from government sources are now eligible.

Types of projects which would be considered for funding include:

  • Conversion and digitization for access purposes;
  • Conservation and preservation treatment;
  • The development (research, design and production) of virtual and physical exhibitions, including travelling exhibits;
  • Conversion and digitization for preservation purposes;
  • Increased digital preservation capacity (excluding digital infrastructure related to day-to-day activities);
  • Training and workshops that improve competencies and build capacity;
  • Development of standards, performance and other measurement activities;
  • Collection, cataloguing and access-based management; and
  • Commemorative projects.

Lists of the grants and recipients in the previous four rounds of funding are available here and may help you as you think about your own application.

Further program details, requirements and application procedures are available at http://www.bac-lac.gc.ca/eng/services/documentary-heritage-communities-program/Pages/dhcp-portal.aspx

How can Andornot help?

Many Andornot clients have obtained DHCP grants in previous rounds, and Andornot has worked on many other projects which would qualify for this grant. Some examples are detailed in these blog posts:

We have extensive experience with digitizing documents, books and audio and video materials, and developing systems to manage those collections and make them searchable or presented in virtual exhibits.

Contact us to discuss collections you have and ideas for proposals. We'll do our best to help you obtain funding from the DHCP program!

Also check out a few other grants that are open this fall in this blog post: "Grants with Fall 2018 Application Deadlines"

Tags: funding