Converting DB/TextWorks Data to MARC: Not as Easy as it Sounds!

by Jonathan Jacobsen Wednesday, April 25, 2018 4:55 PM

We recently had a client ask us if we could help them convert their DB/TextWorks library catalogue database to MARC format, for submission to another search system.

Our initial reaction was "Sure, easy-peasy." After all, we're librarians, we know MARC well, we're experts in DB/TextWorks, and we've done this before. How hard could it be?

Ha! Famous last words, and all that.

To be fair, there are a few wrinkles:

  1. The conversion is not a one-time affair, but rather something the client would like automated and running on a regular basis.
  2. Not all records in the database are to be converted.
  3. MARC is not the simplest of formats. Whether MARC or MARCXML, there is a fairly rigid set of rules that must be followed to create 100% valid, standards-compliant MARC records.
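To give a sense of how rigid those rules are, here is a minimal Python sketch that assembles one record in MARC 21 communications (ISO 2709) format. It is not part of the solution described below. It assumes the fields have already been prepared as (tag, data) pairs, with indicators and subfield delimiters in place. It computes only the parts that are easy to get wrong: the byte-counted directory, the base address of data, and the record length in the leader. The function name and sample values are hypothetical, and the remaining leader positions are copied unchanged from the template passed in.

```python
from __future__ import annotations

# MARC 21 / ISO 2709 structural characters
FIELD_TERMINATOR = "\x1e"
RECORD_TERMINATOR = "\x1d"
SUBFIELD_DELIMITER = "\x1f"


def build_record(leader_template: str, fields: list[tuple[str, str]]) -> bytes:
    """Assemble one MARC 21 record in communications (ISO 2709) format.

    leader_template: 24 characters; positions 00-04 (record length) and
    12-16 (base address of data) are computed and overwritten here.
    fields: (tag, data) pairs; for data fields, `data` must already start
    with the two indicators and contain the subfield delimiters.
    """
    if len(leader_template) != 24:
        raise ValueError("the leader must be exactly 24 characters")
    directory: list[str] = []
    body = b""
    for tag, data in fields:
        encoded = (data + FIELD_TERMINATOR).encode("utf-8")
        if len(encoded) > 9999:
            raise ValueError(f"field {tag} is too long for a 4-digit length")
        # Directory entry: tag (3) + field length (4) + starting position (5),
        # with lengths counted in bytes, not characters
        directory.append(f"{tag}{len(encoded):04d}{len(body):05d}")
        body += encoded
    directory_bytes = ("".join(directory) + FIELD_TERMINATOR).encode("ascii")
    base_address = 24 + len(directory_bytes)
    record_length = base_address + len(body) + 1  # +1 for the record terminator
    if record_length > 99999:
        raise ValueError("record is too long for a 5-digit record length")
    leader = (f"{record_length:05d}" + leader_template[5:12]
              + f"{base_address:05d}" + leader_template[17:])
    return leader.encode("ascii") + directory_bytes + body + RECORD_TERMINATOR.encode("ascii")


# Usage with a made-up two-field record
record = build_record("00000nam a2200000 a 4500", [
    ("001", "ABC123"),
    ("245", "10" + SUBFIELD_DELIMITER + "aConverting data to MARC"),
])
```

Counting characters instead of UTF-8 bytes is a classic way to end up with a directory that points at the wrong data, and any accented character in a title will expose it.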

Nonetheless, since MARC has been around rather a long time, there is a plethora of tools available for creating MARC records from other data sources. There's Inmagic's own MARC Transformer, which works directly with DB/TextWorks databases, as well as MARCEdit, Balboa Software's Data Magician, various free or open source tools from the U.S. Library of Congress and other agencies, and others.

We also have our Andornot Data Extraction Utility for automatically exporting data from a DB/TextWorks database and manipulating it into a variety of formats.

Ever optimistic, we figured some combination of these tools could be strung together in sequence without too much effort to create a solution for this client. We wanted to avoid developing a DB/TextWorks-to-MARC conversion program from scratch, as it would be quite time-consuming, mostly because of the requirements of the MARC format itself.

Several tools, upon closer investigation, proved to be too ancient to run reliably on a modern Windows server, or couldn't work with the current version of DB/TextWorks. Others proved almost impossible to use in an automated fashion. They could be useful in a one-time manual conversion to MARC format, but not in the hands-off, automated workflow we needed.

The exploration of this issue was an interesting exercise in seeing how old data formats and old programs age and become harder to work with.

The recipe that baked the cake in the end was:

  1. Use the Andornot Data Extraction Utility and the Inmagic ODBC driver to extract data from the DB/TextWorks database to a pseudo-MARC plain text format. 
  2. Use a custom-developed PowerShell script to manipulate the records in this file to handle some of the quirks of the ODBC output and to more closely adhere to the MARC format.
  3. Use the MARCEdit command-line interface to convert the pseudo-MARC to MARC Communications Format files.
  4. Upload the MARC files over FTP to the destination server.
  5. Manage all the moving parts through a PowerShell script and log the steps and results to a file for easy troubleshooting in case of problems.
  6. Run the script nightly as a scheduled task on a Windows server.
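The client's orchestration script was written in PowerShell. To keep the examples on this page in one language, here is a Python sketch of the same pattern; it is not a copy of that script. Each step is a named function, the steps run in order, progress and failures go to a log file, and the run stops at the first failure so that a later step (such as the upload) never works on incomplete output. The helper names are hypothetical, and so are any commands you would plug into `command_step`.

```python
from __future__ import annotations

import ftplib
import logging
import subprocess
from typing import Callable


def make_logger(log_path: str) -> logging.Logger:
    """Write timestamped messages to a file for later troubleshooting."""
    logger = logging.getLogger("marc_export")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


def command_step(args: list[str]) -> Callable[[], None]:
    """Wrap an external program as a step; a non-zero exit code raises CalledProcessError."""
    def step() -> None:
        subprocess.run(args, check=True)
    return step


def ftp_upload_step(host: str, user: str, password: str,
                    local_path: str, remote_name: str) -> Callable[[], None]:
    """Upload one file in binary mode over plain FTP."""
    def step() -> None:
        with ftplib.FTP(host) as ftp:
            ftp.login(user, password)
            with open(local_path, "rb") as fh:
                ftp.storbinary(f"STOR {remote_name}", fh)
    return step


def run_pipeline(steps: list[tuple[str, Callable[[], None]]], logger: logging.Logger) -> bool:
    """Run steps in order, logging each one; stop at the first failure."""
    for name, step in steps:
        logger.info("Starting: %s", name)
        try:
            step()
        except Exception:
            # logger.exception also records the traceback in the log
            logger.exception("Failed: %s; remaining steps skipped", name)
            return False
        logger.info("Finished: %s", name)
    logger.info("All steps completed")
    return True
```

A nightly job would build the step list from the actual extraction, cleanup, conversion and upload commands (their exact command lines are not given here), pass in a logger from `make_logger`, and be started by Windows Task Scheduler. The script's exit code can mirror the boolean returned by `run_pipeline`, so a failed run is also visible in the scheduler.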

When written like that, in hindsight, it sounds so simple. And in the end, it was, and works well. But the journey to arrive at this solution was one of the more challenging small projects we've undertaken, considering how simple the task sounds at first.

We hope this will help you if you have a similar project, but don't hesitate to ask us for help, now that we've worked through this.

Grey literature for the Third Sector

by Denise Bonin Wednesday, October 13, 2010 10:52 AM

Grey literature, fugitive publications, the hidden web: it all sounds very mysterious, doesn’t it? Where are these resources? How can they be found? For the folks in Alberta (and, because it is on the Internet, for the rest of the world), this previously hidden material from the non-profit and social services “third” sector now has a home at threeSOURCE: http://www.threesource.ca. See the press release here.

Database

The database that forms the basis of the site contains a vast quantity of grey literature from groups such as the Alberta Federation of Labour, Alberta Status of Women Action Committee, Family Service Association of Edmonton, Calgary Status of Women Action Committee, and Families First Edmonton. It also contains the ESPC catalogue collection, which during the course of the project was converted from another system, L4U, using the MARC Transformer, into Inmagic DB/TextWorks.

Jennifer Hoyer from the Edmonton Social Planning Council spearheaded this new website with funding assistance from Alberta Culture and Community Spirit and the Edmonton Community Foundation.

“There is currently no central location, either physically or virtually, for accessing publications created within or about this field of work.  People working within the third sector – in social services and nonprofits – are notoriously short on time when it comes to finding information and staying current within their field.  ThreeSOURCE hopes to make this process easier by presenting a one-stop-shop,” writes Jennifer.

Website

Andornot assisted with almost every aspect of this site, from recommending Artisteer as the basis for the website's graphic design (which Jennifer took to enthusiastically) to deploying the site on the ESPC server. Our team integrated the website design into Umbraco, an ASP.NET-based content management system. We set Jennifer up with the desktop interface of the Andornot Starter Kit so she could catalogue grey literature while we developed the web interface. That interface included the database component: quick and advanced search screens, brief displays, a full display, Google book covers, an RSS feed for the latest database additions, and the Email, Save, and Print components. The web catalogue uses Inmagic WebPublisher PRO as the underlying search engine.

Content Management System

Once the website was up and running with Umbraco on our development server, Jennifer could log in through a web browser and start adding content to the site. Andornot then put the finishing touches on the site, such as a link to ESPC's newsletter sign-up and an RSS feed from the database, and moved the whole site over to the ESPC server.

“One of the key features of this audience is that they generally access and share information in a very social way: they discuss the latest developments in their field over coffee with colleagues, and they share new publications with their email contacts,” writes Jennifer. “We wanted to replicate this social aspect in some manner, and the RSS feed of New Acquisitions is a starting point for engaging our audience beyond the library catalogue interface.”

Topic Searches

Using Umbraco, Jennifer is able to quickly add new canned or topic searches to the home page, which can be based on recent requests for information or hot topics. To illustrate, Jennifer writes:

“We were recently approached for information regarding affordable housing solutions for seniors, to support a proposal for a related project.  A quick search of subjects such as “Seniors” and “Housing – affordable housing solutions” brought up the Wellesley Institute’s recent report on Precarious Housing in Canada (2010) and the Canadian Centre on Disability Studies Analysis of housing for seniors living with disabilities using a livable and inclusive community lens (2009).  The former provides federal government funding allocations towards housing for low-income seniors.  The latter identifies affordability as a top major housing issue faced by seniors and seniors with disabilities, and pointed towards other publications confirming the urgent nature of this issue.”

She promptly added a link to all the items in the database on the topic Housing for Seniors after receiving that request for information. We are sure that the page will soon fill up with links to other relevant topics as they are determined, making the finding of relevant information in the third sector so much easier.

Congratulations to the Edmonton Social Planning Council on the launch of this valuable resource. Contact us for more information on project specifics.

Using XML and XSL to transform and import records

by Jonathan Jacobsen Wednesday, July 29, 2009 6:53 PM

Why enter records into your database when you can have someone else do it for you? Or at least, why not borrow records from other sources and import them into your database? It’s quite easy to do, saving time and improving accuracy. One approach to this is:

  1. Use a service such as Bookwhere, Biblios.net or PubMed to search numerous online databases for records describing books, journals, articles, videos, maps: anything that has been catalogued by someone somewhere may turn up (see our blog post on Biblios.net).
  2. Save records in MARC XML format (though any XML format can be used).
  3. In Genie (part of the Inmagic Library Suite), use the included Bookwhere XSLT to convert selected MARC tags to Genie fields and import records. (XSLT is short for Extensible Stylesheet Language Transformation, and is a language used to transform XML data into other formats).
  4. In Inmagic DB/Text, customize an XSLT to map MARC XML or any other XML data source to your data structure and import records.
  5. After importing records, you would of course further customize them to suit your database.

Genie includes an XSLT (Bookwhere.xsl in the Genie ImporterFiles folder) that maps MARC XML fields to Genie fields. You can customize this XSLT further for your cataloging needs. For example, some MARC tag to Genie field mappings we have added include:

 

MARC tag(s)                                         Genie field
090 or 050                                          CatCallNumber
520                                                 CatAbstract
856 subfield u                                      CatURL
856 subfield y                                      CatURLNotes
246, 247, 730, 740, 770, 772, 776, 780, 785, 787    CatAlternateTitle
Leader position 6 or 7                              CatRecordType
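For readers who find XSLT hard to follow, the mappings listed above can be expressed as a short Python sketch using the standard library's ElementTree. This is not Genie's Bookwhere.xsl, and several details are assumptions. It expects MARCXML records in the standard http://www.loc.gov/MARC21/slim namespace. It returns a plain dictionary of Genie field names to values rather than Genie's import format. It takes the title from $a for 246/247/730/740 and from $t for the 76X-78X linking entries. Where both call number fields exist, it prefers 090 over 050. The function names and the sample record are made up for illustration.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}
TITLE_IN_A = {"246", "247", "730", "740"}                # title in subfield $a
TITLE_IN_T = {"770", "772", "776", "780", "785", "787"}  # linking entries: title in $t


def record_type(leader: str) -> str | None:
    """Leader-based record type; Python strings are indexed from 0."""
    if len(leader) < 8:
        return None
    if leader[7] == "s":   # Leader/07 bibliographic level: serial
        return "Periodical"
    if leader[6] == "a":   # Leader/06 type of record: language material
        return "Book"
    if leader[6] == "g":   # Leader/06 type of record: projected medium
        return "Videorecording"
    return None


def map_record(record: ET.Element) -> dict[str, list[str]]:
    """Map one MARCXML <record> to Genie field names and values."""
    out: dict[str, list[str]] = {}
    call_numbers: dict[str, str] = {}

    def add(field: str, value: str) -> None:
        value = value.strip()
        if value:
            out.setdefault(field, []).append(value)

    rtype = record_type(record.findtext("marc:leader", default="", namespaces=NS))
    if rtype:
        add("CatRecordType", rtype)

    for df in record.findall("marc:datafield", NS):
        tag = df.get("tag")
        subs = [(sf.get("code"), sf.text or "") for sf in df.findall("marc:subfield", NS)]
        if tag in ("050", "090") and tag not in call_numbers:
            call_numbers[tag] = " ".join(v.strip() for c, v in subs if c in ("a", "b"))
        elif tag == "520":
            add("CatAbstract", " ".join(v for c, v in subs if c == "a"))
        elif tag == "856":
            for c, v in subs:
                if c == "u":
                    add("CatURL", v)
                elif c == "y":
                    add("CatURLNotes", v)
        elif tag in TITLE_IN_A or tag in TITLE_IN_T:
            code = "a" if tag in TITLE_IN_A else "t"
            add("CatAlternateTitle", " ".join(v for c, v in subs if c == code))

    # Assumption: prefer a locally assigned 090 over the 050 when both exist
    chosen = call_numbers.get("090") or call_numbers.get("050")
    if chosen:
        add("CatCallNumber", chosen)
    return out


# Made-up sample record
sample = ET.fromstring(
    '<record xmlns="http://www.loc.gov/MARC21/slim">'
    '<leader>00000nam a2200000 a 4500</leader>'
    '<datafield tag="050" ind1=" " ind2="4">'
    '<subfield code="a">Z699.35.M28</subfield><subfield code="b">J33 2018</subfield>'
    '</datafield>'
    '<datafield tag="856" ind1="4" ind2="0">'
    '<subfield code="u">http://example.org/report.pdf</subfield>'
    '<subfield code="y">Full text</subfield>'
    '</datafield>'
    '<datafield tag="246" ind1="3" ind2=" "><subfield code="a">Precarious housing</subfield></datafield>'
    '</record>'
)
fields = map_record(sample)
```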

Here's an example of the above Leader mapping added to the Genie Bookwhere.xsl transformation:

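<xsl:template name="RecordType2" match="marc:leader">
  <!-- Use the leader text whether this template is applied to marc:leader
       or called by name from the parent marc:record -->
  <xsl:variable name="ldr" select="string((self::marc:leader | marc:leader)[1])"/>
  <!-- XPath substring() counts from 1, so character 8 is Leader/07
       (bibliographic level) and character 7 is Leader/06 (type of record) -->
  <xsl:choose>
    <xsl:when test="substring($ldr, 8, 1)='s'">
      'CatRecordType' Periodical
    </xsl:when>
    <xsl:when test="substring($ldr, 7, 1)='a'">
      'CatRecordType' Book
    </xsl:when>
    <xsl:when test="substring($ldr, 7, 1)='g'">
      'CatRecordType' Videorecording
    </xsl:when>
  </xsl:choose>
</xsl:template>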

Virtually any XML file can be imported into a textbase using an XSL. The beauty of XSLT is that data cleanup can be done as part of the process. For example, ALL CAPS can be converted to Title case, fields can be separated or joined, dates can be transformed to other formats, and much more.
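As an illustration of that kind of cleanup, here is a Python sketch of two such rules (in an XSLT 1.0 stylesheet you would typically use translate() and substring() instead). The first title-cases a value only when it is written entirely in capitals. The second turns an eight-digit YYYYMMDD date into YYYY-MM-DD. The function names, the list of small words, and the input date format are all assumptions.

```python
import re
from datetime import datetime

# Short words kept in lower case unless they start the value (an assumption; adjust to taste)
SMALL_WORDS = {"a", "an", "and", "at", "for", "in", "of", "on", "or", "the", "to"}


def fix_all_caps(value: str) -> str:
    """Title-case a value only if every cased character is upper case.

    Splitting on whitespace also collapses runs of spaces into one.
    """
    if not value.isupper():
        return value
    words = value.lower().split()
    return " ".join(
        word if i > 0 and word in SMALL_WORDS else word[:1].upper() + word[1:]
        for i, word in enumerate(words)
    )


def iso_date(value: str) -> str:
    """Convert an eight-digit YYYYMMDD date to YYYY-MM-DD; leave anything else as-is."""
    text = value.strip()
    if not re.fullmatch(r"\d{8}", text):
        return value
    try:
        return datetime.strptime(text, "%Y%m%d").strftime("%Y-%m-%d")
    except ValueError:  # eight digits but not a real date, e.g. 20091399
        return value
```

The sketch avoids `str.title()` because it also capitalizes the letter after an apostrophe, so DON'T would become Don'T.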

Importing biblios.net records into Inmagic Genie

by Kathy Bryce Sunday, February 01, 2009 10:03 PM

‡biblios.net from LibLime was launched at the ALA Midwinter meeting in Denver last week. It is a free browser-based cataloging service with a data store containing over 30 million records. Records are licensed under the Open Data Commons, making the service the world's largest repository of freely-licensed library records.  For additional information you can also listen to a podcast with the LibLime CEO, Josh Ferraro.

The site features a very clean, easy to use interface with options to select target libraries and refine search results by authors, publishers, subjects and dates. ‡biblios.net also offers the ability to export records in MARC XML, and in my tests so far, this data imports nicely into Genie in the Inmagic Library Suite. To see the level of detail in the default upload, check out our Genie demo site and search for "Torts" in the Anyword box.  

The built-in ability in Genie to upload BookWhere-generated MARC XML can also be used to upload ‡biblios.net records. BookWhere users will miss features such as the rating of the quality of the MARC records. However, ‡biblios.net gives libraries that don't use BookWhere another option for obtaining high-quality catalog records. As with BookWhere records, the XSL file included with Genie can be modified to include additional MARC fields if needed. For example, no call numbers are added by default, but the XSL can be edited to add whichever MARC tag is appropriate for the classification scheme in use.

BookWhere, MARC Records and Inmagic Genie

by Administrator Wednesday, January 17, 2007 10:32 PM

A client of ours recently started using the new BookWhere XML MARC record import feature of Inmagic Genie. They noticed that although the new feature imported MARC records nicely into Genie, it did not import a call number from any of the records. There are several MARC call number fields that could be used, depending on the classification system used by the Genie user. See http://www.loc.gov/marc/bibliographic/ecbdclas.html for specifics on which MARC field you should use. We have used MARC field 050 - Library of Congress Call Number - in this example (leave out leading zero).
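The choice of field by classification scheme can be sketched in Python with ElementTree. The scheme names and the lookup table below are hypothetical. The tags are standard MARC 21 fields: 050 Library of Congress call number, 060 National Library of Medicine call number, 082 Dewey Decimal classification number, and 090 for a locally assigned LC-type call number. The sketch reads MARCXML and joins subfields $a and $b; the sample record is made up.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical lookup from classification scheme to the MARC 21 field holding its number
CALL_NUMBER_TAGS = {
    "lc": "050",        # Library of Congress Call Number
    "local_lc": "090",  # locally assigned LC-type call number
    "nlm": "060",       # National Library of Medicine Call Number
    "dewey": "082",     # Dewey Decimal Classification Number
}


def call_number(record: ET.Element, scheme: str) -> str | None:
    """Return subfields $a and $b of the first matching field, joined by a space."""
    tag = CALL_NUMBER_TAGS[scheme]
    field = record.find(f"marc:datafield[@tag='{tag}']", NS)
    if field is None:
        return None
    parts = [
        sf.text.strip()
        for sf in field.findall("marc:subfield", NS)
        if sf.get("code") in ("a", "b") and sf.text and sf.text.strip()
    ]
    return " ".join(parts) or None


# Made-up sample record with both an LC and a Dewey number
sample = ET.fromstring(
    '<record xmlns="http://www.loc.gov/MARC21/slim">'
    '<datafield tag="050" ind1=" " ind2="4">'
    '<subfield code="a">HD7287.92.C2</subfield><subfield code="b">P74 2010</subfield>'
    '</datafield>'
    '<datafield tag="082" ind1="0" ind2="4">'
    '<subfield code="a">363.5</subfield><subfield code="2">22</subfield>'
    '</datafield>'
    '</record>'
)
```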
