Video: Slay Monster PDFs with pdfbox

by Peter Tyrrell Friday, October 21, 2011 3:38 PM

A screencast of the lightning talk presentation I gave at the Access 2011 Conference on October 21, 2011 in Vancouver, BC, entitled "Quick and Dirty's Guide to Slaying Monster PDFs," in which I show how to use pdfbox to slice up large PDFs for indexing to make search more meaningful.
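The slicing idea can be sketched without pdfbox itself: decide on a chunk size, then work out which page ranges each slice should cover before handing them to whatever does the actual splitting. The snippet below is only that planning step, in Python. The function name `plan_chunks` and the default of 10 pages per slice are my assumptions for illustration, not something taken from the talk.

```python
def plan_chunks(total_pages, pages_per_chunk=10):
    """Return 1-based, inclusive (start, end) page ranges covering a document.

    Hypothetical helper: it only plans the slices; the splitting itself
    would be done by a PDF tool such as pdfbox.
    """
    if total_pages < 0:
        raise ValueError("total_pages must not be negative")
    if pages_per_chunk < 1:
        raise ValueError("pages_per_chunk must be at least 1")
    return [
        (start, min(start + pages_per_chunk - 1, total_pages))
        for start in range(1, total_pages + 1, pages_per_chunk)
    ]


if __name__ == "__main__":
    # A 25-page document cut into slices of at most 10 pages.
    for start, end in plan_chunks(25, 10):
        print(f"pages {start}-{end}")
```

Each range can then be indexed as its own document, so a search hit points at a handful of pages rather than at the whole monster.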

Tags: Solr | video

I deleted it! Now what?

by Peter Tyrrell Tuesday, June 07, 2011 12:25 PM

I have a home server running Amahi on Fedora Linux. I tell everyone it’s for backups and media, but it’s really so I can tinker with it and have something to complain about. While prepping it this weekend for an upgrade, I got careless and decided to merge a partition that wasn’t being used with the boot partition.

Except it was being used. Imagine my surprise when it wouldn’t boot. And after some poking around, I was able to phrase the question that was to determine my course of action for the next 12 hours: “Er… where’s the operating system?”

To cut a long story short, I was able to recover the deleted partition with TestDisk, which comes with the handy handy HANDY Parted Magic CD. Both are free and indispensable.

TestDisk is open source data recovery software for recovering lost partitions and making non-booting disks bootable again.

PhotoRec is open source file recovery software that finds lost files from hard disks and digital cameras, even if the file system has been damaged or reformatted.

Parted Magic is an open source suite of programs for disk, partition and file system management. It includes TestDisk and PhotoRec (and many others). It runs from a CD, no install required.

  • Format internal and external hard drives.
  • Move, copy, create, delete, expand & shrink hard drive partitions.
  • Clone your hard drive, to create a full backup.
  • Test hard drives for impending failure.
  • Test memory for bad sectors.
  • Benchmark your computer for a performance rating.
  • Securely erase your entire hard drive, wiping it clean from all data.
  • Gives access to non-booting systems allowing you to rescue important data.

Tags: Linux | tips | tools | utilities

Solr and the Trend to Open Source Search

by Peter Tyrrell Thursday, June 02, 2011 11:06 AM

On Saturday I caught a cold. On Sunday, I caught a flight to San Francisco to attend Lucene Revolution, the biggest open source search conference on the planet, to catch up on the latest developments with Apache Lucene/Solr.

From the Solr project website:

“Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.”

I was joined there by developers and representatives from AT&T, Corbis, eBay, eHarmony, EMC, HathiTrust, Healthwise, Intuit, Travelocity, Trulia, Twitter, Woot, and Yelp, to name a few. (Yes, I’m name dropping – just trying to appear cool here.)

I’ve been working with Solr for the better part of a year, and I thought it very impressive, but the conference blew my socks off in terms of what Solr can do. To think that the best-of-breed search performance in the world is open source! (Solr beats Google in overall search performance. No, I’m not exaggerating.)

Solr is a game-changer, there’s no doubt. No longer is open source just a freebie alternative; it is the go-to standard that is beating the pants off proprietary search engines. I heard quite a few stories of prominent household-name enterprises switching to Solr and reducing costs while simultaneously vastly increasing their capabilities and performance.

Happily, Solr not only scales up to the largest data collections ever created by our species, but also down to the relatively modest needs of the rest of us. It democratizes search. A kid making a website in his parents’ basement can utilize the same cutting-edge search features as a multinational corporation, and that’s not just convenient, it’s necessary, because search is vital now to every level of our interaction with information.

I can’t wait to apply what I learned at the conference back at El Rancho Andornot. Also, I need to keep ahead of that kid in the basement.

Tags: Solr

Amahi Home Server

by Peter Tyrrell Friday, February 11, 2011 11:56 AM


I recently converted. I didn’t intend to switch allegiance but Fate, full of twists and turns, is by nature never predictable. It all started with Windows Home Server Version 2, codenamed “Vail”. Or, as most are calling it now, “Vail Fail”.

Windows Home Server “Vail Fail”

Windows Home Server v1 had gained quite a devoted following, encompassing as it did backup, file, and media server features in an affordable and extensible package. Many felt that the key technology for home users was its Drive Extender, a software storage technology that allowed one to plug in a hard drive of any type and size and add it effortlessly to a combined storage pool, a homogeneous and seamless whole made up of heterogeneous parts. This sum-of-its-parts feature was not a technology that could withstand the rigors of mission-critical enterprise data-storage demands, but was perfectly matched to the home market, which has neither the cash nor the expertise to maintain RAID arrays of identically matched disks.

There was a comfy, DIY hobbyist feeling to WHS v1 which attracted tinkering technophile weekend warriors, but delivered enough utility to satisfy wives and girlfriends: “See honey? Now we can stream Glee to the bedroom and back up your iTunes bellydance playlists, on the same device!” (Tip: significant others, always on the alert against clutter, tend to fall hard for the “fewer devices are better” argument.)

When Microsoft announced it was working on WHS v2, the community happily fluttered and twittered (literally), plucking up with gusto every bit of news that came out on Vail’s progress, following that breadcrumb trail deep into the wood after the tantalizing promise of delicious server cake, when one day something happened, and we all came to in a dark forest, lost and hungry. With no cake. A distinct absence of cake, in fact, both now and in the future, for that day, November 23, 2010, Microsoft proclaimed that Windows Home Server Version 2 “Vail” would abandon Drive Extender.

The news went over like a lead balloon. My own first reaction was incredulity: “What’s the point of WHS without Drive Extender?” This sentiment was echoed a thousandfold across the land. If you were in space you might have seen the myriad question marks popping up over the heads of puzzled WHS enthusiasts everywhere like mushrooms after a rain. And indeed, the answer to that question, following stages of denial, rage, bargaining, and sorrow, is the answer which brings bittersweet acceptance at last: there is no point. Days later, HP announced it was dumping its WHS line of products. Nothing to do with Vail Fail. They said. Yeaaahhhh.

Disillusioned WHS refugees began clogging the highways and byways in an internet diaspora, knowing not whither they might go, only sure they didn’t want to stay. I myself was among them, despondent, plodding aimlessly with all my digital possessions piled high on a hand-cart, when there appeared suddenly a beacon in the dark, a giant candle blazing forth atop a cake of prodigious size and flavour. Impossibly, forks were free. That cake was called “Amahi”.

Amahi Home Server


Amahi is an open-source home server built on Fedora Linux. Storage pooling technology is handled by Greyhole. Media streaming, file sharing, VPN, PC backups, a variety of one-click apps - it does everything I did with WHS, and more.

I installed Fedora and then Amahi on an older but robust PC. It took me most of a weekend to migrate data from WHS, and a couple of weeks following to tweak shares, get backups going, set up the apps I wanted and test them, get familiar with Linux, and most importantly, start trusting Amahi not to blow up. I turned off WHS after three weeks. After a month, when Crashplan had caught up with offsite backup (yes, Amahi has a one-click app for Crashplan), I dismantled the WHS server, one of the most satisfying hours I’ve ever spent with a screwdriver.

I won’t pretend I didn’t get stuck a few times while getting Amahi to do my bidding, or that the command line terminal isn’t a stark and lonely plateau to traverse alone. Fedora has a nice GUI, but once a problem crops up, you are sent scurrying to the command line where anything of any consequence must be done. That’s Linux for you: the level of control it gives you over your system is awe-inspiring and a little bit frightening. The Amahi forums helped. I learned a lot about Linux because I had to.

It is still possible to run Windows-dependent software if you need to. Amahi is just a layer on Fedora, so you can take full advantage of the operating system. Wine is a compatibility layer that runs Windows programs within Linux, or you can install a virtualization product like (the open-source) VirtualBox, which can host a Windows OS.

All in all, I highly recommend Amahi as a home server if you like to tinker and like a learning experience. There are quite a few of us huddled masses yearning to be free, late of Windows Home Server, swelling the ranks over there.

Anatomy of a Genie Add-In

by Peter Tyrrell Wednesday, October 20, 2010 1:18 PM

I was recently asked to add a feature to Inmagic Genie that would detect overdues and calculate fines on the Loans Checkin page, allow a staffperson to override fine values, and save the fine totals to the Loans database in order to generate overdue reports by borrower.

Here it is in action over the Loans Checkin page:


Fig. 1 (above) - Overdues dialog appears when barcode input loses focus.



Fig. 2 (above) - Growl-type message shows feedback.



  1. Enter barcodes.
  2. Tab out.
  3. Overdues, if any, appear. Set and submit fines.
  4. Click Check In button.

One extra step isn't too bad, right?
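For the curious, the fine logic behind step 3 can be sketched in a few lines. This is not the add-in's code (that lives in the .NET components listed further down); it is a minimal Python illustration that assumes a flat daily rate. The names `days_overdue`, `fine_for` and `total_by_borrower`, and the sample rate of 0.25 per day, are all hypothetical.

```python
from datetime import date
from decimal import Decimal


def days_overdue(due, returned):
    """Whole days past the due date; zero if returned on time or early."""
    return max((returned - due).days, 0)


def fine_for(due, returned, daily_rate, override=None):
    """Fine for one loan. A staff override, if given, wins outright."""
    if override is not None:
        return override
    return daily_rate * days_overdue(due, returned)


def total_by_borrower(loans):
    """Sum fines per borrower from (borrower, fine) pairs."""
    totals = {}
    for borrower, fine in loans:
        totals[borrower] = totals.get(borrower, Decimal("0")) + fine
    return totals


if __name__ == "__main__":
    rate = Decimal("0.25")  # assumed flat rate per day, for illustration only
    late = fine_for(date(2010, 10, 1), date(2010, 10, 6), rate)
    waived = fine_for(date(2010, 10, 1), date(2010, 10, 6), rate,
                      override=Decimal("0.00"))
    print(total_by_borrower([("A. Borrower", late), ("A. Borrower", waived)]))
```

Using `Decimal` rather than floats keeps currency arithmetic exact, which matters once totals are saved and reported per borrower.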


  • Minimal impact on Genie: just one extra line in loans_checkin.aspx.
  • Easy to set up.
  • Supports IE7, IE8, IE9 beta, Firefox 3+, Chrome 5+, Opera 10+, Safari 5+
  • Supports 212 international currency formats.
  • AJAX-to-web-service-enabled
  • Blessed with good looks


  • It's not already part of Genie?


The key components are:

  • An ASP.NET user control: AndornotCheckinControl.ascx.
  • A jQuery plugin: jquery.genieCheckin-1.0.js.
  • A JSON-enabled .NET web service: checkin.asmx.
  • A .NET wrapper to the Webpublisher XML API: Andornot.Web.WebPublisherXml.dll
  • A supporting Genie AddIn assembly: Andornot.GenieAddIns.Web.dll


Fig. 3 (above) - Diagram showing key components and workflow of the add-in.


Contact Us

Call or email or parachute in for a quote on adding this functionality to your Inmagic Genie installation.

1-866-2626-2525 toll free
Where to land your parachute or glider
