How to Import Data from Inmagic DB/TextWorks into Omeka

by Jonathan Jacobsen Monday, July 03, 2017 7:40 AM

Last week we published a blog post on our favourite Omeka plugins. This week we focus on one in particular, the CSV Import plugin. This plugin is included in every site hosted through Digital History Hub, our low-cost Omeka hosting platform.

One of Omeka's many strengths is its built-in data entry screens, based on Dublin Core (DC) fields. While there's a small learning curve to understanding DC, once mastered, it provides just the right set of metadata to describe anything you might want to put in an Omeka site, whether an artifact, photograph, document, or map.

But what if you already have a database of this sort of information and want to publish most or all of it in an Omeka site? Perhaps you're using the ever-popular Inmagic DB/TextWorks database management system, but don't yet have your records searchable online, or want to use Omeka's Exhibit Builder plugin to mount an online virtual exhibit featuring a portion of your collection. Re-entering all that metadata into Omeka one record at a time would be onerous. This is where the CSV Import plugin comes in!

As the name implies, this plugin allows you to quickly import many records in a batch from a text file. You simply choose a suitable text file, map fields from your source into Omeka's Dublin Core schema, set a few other values, and very quickly your records are available in Omeka for review, further editing, or searching. The main feature missing from this plugin is the ability to import PDFs, documents, photos and other media files saved locally on your computer or network. To bulk import these files, they need to be accessible on a web server, with a URL to each file stored in your database. This may not be as challenging to set up as you might think; there are usually ways to work around issues like this, so don't hesitate to contact us for help.

Here's a step-by-step guide to using this plugin with DB/TextWorks and Omeka. The procedure for exporting data from other databases will vary, of course, but the principles remain the same. As always, do contact us for help!

Mapping Fields

Start by reviewing Omeka's Dublin Core fields on the Item entry screen and thinking about where data from your database should go.

You may want to prepare a simple two-column list mapping fields from your data source into the Dublin Core fields, like this:

DB/TextWorks Field Name    Omeka Dublin Core Field Name
Title                      Title
Material Type              Format
Author                     Creator
Corporate Author           Creator
Publication Date           Date
ISBN                       Identifier

etc.

You don't need to populate every Omeka DC field, of course; just the ones that make sense for your data. And you can merge multiple fields from your database into one Dublin Core field in Omeka, as in the sketch below. To learn more about each DC field, read the brief note on the Omeka data entry screen, or visit http://dublincore.org/documents/dces/ for more detailed information.
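One way to handle such a merge is to combine the source columns before importing, rather than mapping two columns to the same element during the import. Here's a minimal Python sketch, assuming a comma-delimited export named export.csv with Author and Corporate Author columns (the file and column names are examples, not requirements):

import csv

# Combine Author and Corporate Author into a single Creator column,
# joining multiple values with "|", the entry separator used later
# in the Omeka import. File and column names are examples only.
with open("export.csv", newline="", encoding="utf-8") as src, \
     open("export-merged.csv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.DictReader(src)
    fieldnames = [f for f in reader.fieldnames
                  if f not in ("Author", "Corporate Author")] + ["Creator"]
    writer = csv.DictWriter(dst, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        names = [row.pop("Author", ""), row.pop("Corporate Author", "")]
        row["Creator"] = "|".join(n for n in names if n)
        writer.writerow(row)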

Note that there is also a plugin called Dublin Core Extended Fields, which adds even more fields. If you have a particularly complex database and feel the need to preserve and fully represent most or all of its fields, this might be for you. In our view, though, keeping things simple is better; that is precisely why DC was developed: to provide a brief, common set of fields that can describe almost anything.

Choosing Data to Export

When you get to the step of importing records into Omeka, you have the option of assigning one Item Type, and only one, to all incoming records. The Item Type determines which additional metadata elements are available when editing the record. For example, the "Still Image" Item Type adds fields for Original Format and Physical Dimensions. If your source data contains information that belongs in these extended fields and you wish to import it, or add it afterwards by editing imported records in Omeka, you may wish to export records in groups by Item Type: all still images, then all moving images, and so on. You can then import these in batches and specify the correct Item Type for each, and the additional fields specific to that Item Type will be available for import from your source data. A sketch of one way to split an export into such batches follows.
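If your export includes a field that identifies the type of material, a short script can split one file into per-Item-Type batches. Here's a minimal Python sketch; the file name, the Material Type column, and the mapping of source values to Omeka Item Types are all assumptions to adapt to your own data:

import csv
from collections import defaultdict

# Map source Material Type values to Omeka Item Types (assumed values).
ITEM_TYPES = {"photograph": "Still Image", "video": "Moving Image"}

# Group rows from the export by their target Item Type.
rows_by_type = defaultdict(list)
with open("export.csv", newline="", encoding="utf-8") as src:
    reader = csv.DictReader(src)
    fieldnames = reader.fieldnames
    for row in reader:
        material = (row.get("Material Type") or "").strip().lower()
        rows_by_type[ITEM_TYPES.get(material, "Text")].append(row)

# Write one batch file per Item Type, ready to import separately.
for item_type, rows in rows_by_type.items():
    batch_name = "export-" + item_type.lower().replace(" ", "-") + ".csv"
    with open(batch_name, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)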

Exporting From DB/TextWorks

If your data contains special characters, such as accented letters or letters from outside the Latin alphabet, the file must be encoded as UTF-8 for Omeka to import it correctly. DB/TextWorks offers several text encoding options, so before exporting data, choose Tools > Options > Text Encoding and, under "Output file encoding", choose the UTF-8 option (this applies to DB/TextWorks v15.0 or later).
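If you're not sure an export really is UTF-8, a quick check before importing can save a failed run. Here's a minimal Python sketch (the file name is an example):

# Confirm the export decodes cleanly as UTF-8 before importing it.
line_count = 0
try:
    with open("export.csv", encoding="utf-8", errors="strict") as f:
        for line_count, _ in enumerate(f, start=1):
            pass  # reading every line forces every byte to be decoded
    print(f"OK: {line_count} lines decoded as UTF-8")
except UnicodeDecodeError as err:
    print(f"Not valid UTF-8: {err}")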

To export a selection of records, search for them first, then select File > Export. 

Save the file somewhere handy, with a .txt or .csv extension. 

In the Export Options dialogue, make the following choices:

Export File Format: Delimited ASCII

Delimiter options:

  • Record Separator: {CR}{LF}
  • Entry Separator: |
  • Quote Character: "
  • Field Separator: , (only commas are supported for import)

Select the "Store Field Names in First Row" option.

If any of your fields are of the Rich Text type, be sure to export those as HTML. That HTML can be preserved during the import to Omeka by selecting the HTML option for the field in Step 2 of the import (see below).

Records to Export: choose to export either the records you searched for with "Export Current Record Set" or the entire database with "Export Entire Textbase"

Fields to Export: select only those fields that you included in your field mapping

Optionally, you can save these options as a profile for re-use later.

Complete the export and note how many records were exported (so you can verify that the same number are imported into Omeka).
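Once the file is saved, you can verify the count independently as well. Here's a minimal Python sketch, assuming the export is named export.csv (the csv module counts logical records correctly even when a quoted field contains a line break):

import csv

# Count data rows in the export so the total can be compared against what
# DB/TextWorks reported and, later, what Omeka imports.
with open("export.csv", newline="", encoding="utf-8") as f:
    reader = csv.reader(f)
    next(reader, None)  # skip the field-name header row
    count = sum(1 for _ in reader)
print(f"{count} records in the export file")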

Importing Data into Omeka

With the export to a comma-separated text file complete, log in to your Omeka site and select the CSV Import option in the menu. If that option isn't available, you'll need to install and activate the plugin first.

In Step 1 of the CSV Import, select your exported data file, then set the following options on this page:

If your database field names happen to be identical to those in Omeka and have “DublinCore” in their names (e.g. DublinCore:Title), you can select the Automap Column Names to Elements option. Everyone else (most of you!) should deselect this option.

If importing different types of records in batches, select the Item Type appropriate to each batch.

Choose the following delimiters to match your export from DB/TextWorks:

  • Column Delimiter: , (matches the Field Separator in the DB/TextWorks export)
  • Tag Delimiter: | (matches the Entry Separator)
  • File Delimiter: | (matches the Entry Separator)
  • Element Delimiter: | (matches the Entry Separator)

Optionally, choose to assign all items to a Collection or make all items Public. 

If you're importing a large number of records, you probably don't want to Feature all of them; it's more common to select a small set of Items to feature on the home page of Omeka.

Continue to the next step.

In Step 2, you will select the Omeka DC fields into which your data source fields will be imported, using your field mapping as a guide. 

Click the Use HTML checkbox if the data includes HTML markup (e.g. if it's a Rich Text field in DB/TextWorks and, during export, you included that field and chose to export it as HTML).

For source fields which contain tags, select the Tags option instead of selecting a field to import the data to.

For source fields which contain URLs to files, select the Files option instead of selecting a field to import the data to. This will cause the import to fetch those files and add them to Omeka. Fetching many large files can take quite a while, so if this is your very first import, try a small data set, with or even without the Files option, to work out the kinks in your procedure. A sketch for checking those URLs ahead of time follows.
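Because a single bad URL can spoil a long import, it can be worth confirming that every file URL responds before Omeka tries to fetch it. Here's a minimal Python sketch, assuming the URLs live in a column named File URL with multiple values separated by "|" (both assumptions):

import csv
import urllib.request

# Send a HEAD request to each file URL in the export and report the result.
with open("export.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        for url in filter(None, row.get("File URL", "").split("|")):
            request = urllib.request.Request(url, method="HEAD")
            try:
                with urllib.request.urlopen(request, timeout=10) as response:
                    result = response.status
            except Exception as err:  # 404s, timeouts, bad hostnames, etc.
                result = err
            print(url, "->", result)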

Reviewing Imported Data

If you imported a small number of records, you can review each one. If you imported a large number, you may wish to spot check a random sample, to make sure all the data ended up where you expected it, that records are public or not, featured or not, in a collection or not, etc.

If there are problems, the Undo Import feature is your new best friend. You'll find it back in the CSV Import plugin; use it to remove the records just imported.

Need Help?

Need help with any of this? Contact Andornot and we'll be glad to work with you on this.


How to Add a Unique Record ID and Date to an Existing Database

by Jonathan Jacobsen Wednesday, May 28, 2014 6:09 PM

The principles of good database design suggest that every record in a database should have something that uniquely identifies it and distinguishes it from every other record. In a small library catalogue, it may be that every title is different from every other, but in any larger database, there's a good chance there will be more than one very, very similar record. 

When it comes time to do more with this information, such as importing it into Inmagic Presto or the VuFind discovery interface, or cleaning up older records, it's imperative that every record can be uniquely identified.
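To see how real the duplication risk is, you can scan an export of your data for repeated titles. Here's a minimal Python sketch, assuming a comma-delimited export named export.csv with a Title column (both names are examples):

import csv
from collections import Counter

# Count how often each title appears in the export and report duplicates.
with open("export.csv", newline="", encoding="utf-8") as f:
    titles = Counter(row["Title"].strip().lower() for row in csv.DictReader(f))

for title, count in titles.most_common():
    if count < 2:
        break  # most_common() sorts by count, so the rest are unique
    print(f"{count}x {title}")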

It's also good practice to have fields that track the date a record was created and the date it was last modified, for similar reasons.

It's very easy to create these fields in your database structure, whether in Inmagic DB/TextWorks or another system, when you are first building the database. But what can you do if you already have a database with thousands of records, and no unique record identifier or dates created or modified? With DB/TextWorks, it's very easy to add this information even after the database has been built and populated.

There are four main steps to this:

  • Back up all your existing records, just in case of a problem.
  • Verify that these fields don't already exist.
  • Add the fields to the database structure.
  • Populate the fields with data.

Backup the Database

1. Open DB/TextWorks but don't open a database. Select Manage Textbases > Dump Textbase. 

2. Browse to the location of your database and select it. 

3. Next, pick a folder to store the export in and give it a meaningful name.

4. When the export is complete, you'll have a backup you can reload in case anything goes wrong.

Verify Fields

5. Open the database in DB/TextWorks and, if it has password protection, be sure to sign in with the Master password, so you have full access.

6. Select Maintain > Edit Textbase Structure.

7. In the Edit Textbase Structure window, scroll through the list of fields to check that you don't already have a unique ID and/or dates created or modified fields. You may be able to identify them by Name (e.g. RecordID, RecordCreated, RecordModified) or by Type (e.g. Automatic Number, Automatic ID, Automatic Date).

Add Fields

8. If none are found, select Edit Fields. You will now enter information to create the new fields listed below. Be sure to select Add rather than Change after entering information for each new field.

Field Name: RecordID
Field Type: Automatic Number
Other Settings: By default, the automatic numbering will start at 1 and increment by 1; there's usually no reason to change this.

Field Name: RecordModified
Field Type: Automatic Date
Other Settings: Update Entry: When Record is Created or Modified. Date Components and Format: you may choose whether to include the time or only the date, and in which format. We usually select Date Only and Short Date Format, unless your data will be imported into Presto, in which case Date and Time is best.

Field Name: RecordCreated
Field Type: Automatic Date
Other Settings: Update Entry: When Record is Created. Date Components and Format: the same choices apply as for RecordModified above.

9. With all of the information entered, click Finish and OK to save your changes to the database structure.

Add Data to Existing Records

10. Select Search > Find All Records to retrieve all records from the database. This may take some time, but you can click the Stop button in the bottom status bar of DB/TextWorks to stop the formatting on the screen.

11. When the records have finished loading, select Records > Batch Modify. The trick now is to make a perfectly innocuous change to every record in your database. As DB/TextWorks goes through every one, since the database now has Automatic Number and Automatic Date fields, they will automatically be populated. 

If you happen to have an unused field in your database, you can populate it with some value and then remove that value again in a second batch modification. Alternatively, add a new field and use a batch modification to populate it with some value; you can then leave it as-is, perform another batch modification to delete the data from the field, or delete the field altogether.

The batch modification feature in DB/TextWorks is quite sophisticated, and a full explanation would expand this blog post to unwieldy dimensions. However, the built-in help in DB/TextWorks explains all the options, including features added in recent releases.

12. When the batch modification is complete, every record will now have a unique identifier and the date the record was last modified. To verify, search for some records and view the ID and dates.

13. Now that you have a unique ID, you can configure DB/TextWorks to record it in the textbase log file every time a change is made to a record. This has proven invaluable to several clients this month when accidental changes were made to records; we were able to use the log to undo those changes. To configure your textbase log, select Maintain > Edit Textbase Structure again, then Log File. Select your RecordID field as the First Identifying Field, and something like Title as the Second.

If you need any help with the above, just let us know.

Grey literature for the Third Sector

by Denise Bonin Wednesday, October 13, 2010 10:52 AM

Grey literature, fugitive publications, the hidden web: it all sounds very mysterious, doesn't it? Where are these resources? How can they be found? Well, for the folks in Alberta – and, because it is on the Internet, for the rest of the world – this previously concealed material in the non-profit and social services "third" sector now has a home at threeSOURCE: http://www.threesource.ca. See the press release here.

Database

The database that forms the basis of the site contains a vast quantity of grey literature from groups such as the Alberta Federation of Labour, Alberta Status of Women Action Committee, Family Service Association of Edmonton, Calgary Status of Women Action Committee, and Families First Edmonton. It also contains the ESPC catalogue collection, which was converted during the course of the project from another system, L4U, into Inmagic DB/TextWorks using the MARC Transformer.

Jennifer Hoyer from the Edmonton Social Planning Council spearheaded this new website with funding assistance from Alberta Culture and Community Spirit and the Edmonton Community Foundation.

“There is currently no central location, either physically or virtually, for accessing publications created within or about this field of work.  People working within the third sector – in social services and nonprofits – are notoriously short on time when it comes to finding information and staying current within their field.  ThreeSOURCE hopes to make this process easier by presenting a one-stop-shop,” writes Jennifer.

Website

Andornot assisted with almost every aspect of this site, from the recommendation of Artisteer as the basis for the website graphic design, which Jennifer took to enthusiastically, to the deployment of the site on the ESPC server. Our team integrated the website design into the ASP.NET-based Umbraco Content Management System. We set Jennifer up with the desktop interface of the Andornot Starter Kit so she could catalogue grey literature while we developed the web interface. This included the database component, which consisted of the quick and advanced search screens, brief and full displays, Google book covers, an RSS feed for the latest database additions, and the Email, Save, and Print components. The web catalogue uses Inmagic WebPublisher PRO as the underlying search engine.

Content Management System

Once the website was up and running on our development server with Umbraco, Jennifer could log in through a web browser and start adding content to the site. Andornot put the final touches on the site, such as a link to the newsletter sign-up and an RSS feed from the database, and then moved the whole site over to the ESPC server.

“One of the key features of this audience is that they generally access and share information in a very social way: they discuss the latest developments in their field over coffee with colleagues, and they share new publications with their email contacts,” writes Jennifer. “We wanted to replicate this social aspect in some manner, and the RSS feed of New Acquisitions is a starting point for engaging our audience beyond the library catalogue interface.”

Topic Searches

Using Umbraco, Jennifer is able to quickly add new canned or topic searches to the home page, which can be based on recent requests for information or hot topics. To illustrate, Jennifer writes:

“We were recently approached for information regarding affordable housing solutions for seniors, to support a proposal for a related project.  A quick search of subjects such as “Seniors” and “Housing – affordable housing solutions” brought up the Wellesley Institute’s recent report on Precarious Housing in Canada (2010) and the Canadian Centre on Disability Studies Analysis of housing for seniors living with disabilities using a livable and inclusive community lens (2009).  The former provides federal government funding allocations towards housing for low-income seniors.  The latter identifies affordability as a top major housing issue faced by seniors and seniors with disabilities, and pointed towards other publications confirming the urgent nature of this issue.”

She promptly added a link to all the items in the database on the topic of housing for seniors after receiving that request for information. We are sure the page will soon fill up with links to other relevant topics as they are identified, making it much easier to find relevant information in the third sector.

Congratulations to the Edmonton Social Planning Council on the launch of this valuable resource. Contact us for more information on project specifics.

Law Society of Saskatchewan Launches Genie

by Jonathan Jacobsen Friday, January 22, 2010 10:05 AM

The Law Society of Saskatchewan is the latest Andornot client to launch Genie (part of the Inmagic DB/Text Library Suite).

The Law Society Library is a long-time Inmagic user, with several textbases used to manage books, journals, legal resources, and other materials. These databases are maintained in the Society’s two main offices in Regina and Saskatoon, with online access for society members through the society's website.

One of the challenges facing the library was managing all of these databases from separate locations. "We were looking for any way to streamline our core tasks of acquisitions and circulation by eliminating double and sometimes triple data entry and the never-ending battle of syncing databases," says Toby Willis-Camp, Director of Libraries at the Law Society. The library selected Genie as a solution, as it provides a single, central, web-based system for all the core library functions. The library also opted to have Andornot host both the Genie system, as well as the entire society website, with its members-only resources and other searchable databases.

Andornot and the staff in the Law Society Library planned and completed a conversion of several Inmagic databases into the Genie system, with special attention paid to de-duplicating records from similar source databases.

While the library staff use the Genie Catalogue, Orders, Serials and other "back-end" modules for their day-to-day work, a new search interface for society members and the public was created, based on the Andornot Starter Kit. This OPAC features:

  • quick and advanced search screens for all library materials;
  • a specific search screen for Continuing Professional Development (CPD) materials;
  • the full text of CPD materials, with a link from the record to a PDF of the material, for immediate online access;
  • tick boxes that let society members mark items as they find them, then submit a request to the library for those materials, email the list to themselves or a colleague, or save or print it; and
  • icons in the search results to help users more easily identify the types of materials in their search results. Different icons appear for books, journals, websites, DVDs and CPD materials.

"Every time I use the OPAC I break out into a huge smile, because I am so pleased with how it's working and presents the information.  I am absolutely ecstatic about the 'public face' we now present," says Willis-Camp.

Members of the Law Society register in a different online system, but thanks to a behind-the-scenes transfer of information into Genie on a regular basis, the Genie Borrowers database is always up to date with society members.

"It was a year filled with immense change at the Library.  Working with Andornot made most of those changes go more smoothly.  We are looking forward to using Genie at its full potential, and continuing to improve and expand our web presence with Andornot's expert assistance." says Willis-Camp.

Please contact Andornot for more information on Genie, the Andornot Starter Kit, and our hosting and consulting services.

Using XML and XSL to transform and import records

by Jonathan Jacobsen Wednesday, July 29, 2009 6:53 PM

Why enter records into your database when you can have someone else do it for you? Or at least, why not borrow records from other sources and import them into your database? It’s quite easy to do, saving time and improving accuracy. One approach to this is:

  1. Use a service such as Bookwhere, Biblios.net or PubMed to search for records in numerous online databases, for materials such as books, journals, articles, videos, maps - anything that might have been catalogued by someone somewhere may be found (see our blog post on Biblios.net).
  2. Save records in MARC XML format (though any XML format can be used).
  3. In Genie (part of the Inmagic Library Suite), use the included Bookwhere XSLT to convert selected MARC tags to Genie fields and import records. (XSLT, short for Extensible Stylesheet Language Transformations, is a language used to transform XML data into other formats.)
  4. In Inmagic DB/Text, customize an XSLT to map MARC XML or any other XML data source to your data structure and import records.
  5. After importing records, you would of course further customize them to suit your database.

If you use Genie, it includes an XSLT (Bookwhere.xsl in the Genie ImporterFiles folder) that maps MARC XML fields into Genie fields. You can customize this XSLT further for your cataloging needs. For example, some MARC tag to Genie field mappings we have added include:

 

MARC Tags                                          Genie Field
090 or 050                                         CatCallNumber
520                                                CatAbstract
856 subfield u                                     CatURL
856 subfield y                                     CatURLNotes
246, 247, 730, 740, 770, 772, 776, 780, 785, 787   CatAlternateTitle
Leader position 06 or 07                           CatRecordType

Here's an example of the above Leader mapping added to the Genie Bookwhere.xsl transformation. Note that because the template matches marc:leader, the substring tests are written against the current node; MARC leader positions are numbered from zero, so position 06 is the seventh character:

<xsl:template name="RecordType2" match="marc:leader">
  <xsl:choose>
    <!-- Leader position 07 (8th character) = 's': serial -->
    <xsl:when test="substring(., 8, 1) = 's'">
      <CatRecordType>Periodical</CatRecordType>
    </xsl:when>
    <!-- Leader position 06 (7th character) = 'a': language material -->
    <xsl:when test="substring(., 7, 1) = 'a'">
      <CatRecordType>Book</CatRecordType>
    </xsl:when>
    <!-- Leader position 06 = 'g': projected medium -->
    <xsl:when test="substring(., 7, 1) = 'g'">
      <CatRecordType>Videorecording</CatRecordType>
    </xsl:when>
  </xsl:choose>
</xsl:template>

Virtually any XML file can be imported into a textbase using an XSLT stylesheet. The beauty of XSLT is that data cleanup can be done as part of the process: ALL CAPS can be converted to Title Case, fields can be separated or joined, dates can be transformed to other formats, and much more.
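The same sorts of cleanup can also be scripted outside XSLT, before the import. As a purely illustrative sketch (the file name is an assumption, and str.title() is a naive conversion that, for example, turns "CANADA'S" into "Canada'S"), here's a Python snippet that title-cases any ALL CAPS 245 $a values in a MARC XML file:

import xml.etree.ElementTree as ET

# Title-case ALL CAPS 245 $a (title) values in a MARC XML file before import.
MARC = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("marc", MARC)

tree = ET.parse("bookwhere_export.xml")  # assumed input file name
path = f".//{{{MARC}}}datafield[@tag='245']/{{{MARC}}}subfield[@code='a']"
for subfield in tree.getroot().findall(path):
    if subfield.text and subfield.text.isupper():
        subfield.text = subfield.text.title()

tree.write("bookwhere_export_clean.xml", encoding="utf-8", xml_declaration=True)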
