Make numbers behave when sorting alphanumerically in Solr

by Peter Tyrrell Monday, November 03, 2014 10:09 AM

Problem

Solr sorts values that mix numbers with alphabetic characters lexically, character by character, rather than by numeric value. That means that 10 comes before 2, like this:

  • Title No. 1
  • Title No. 10
  • Title No. 100
  • Title No. 2

Solution

To force numbers to sort numerically, we need to left-pad every number with zeroes to a fixed width: 2 becomes 0002, 10 becomes 0010, 100 becomes 0100, et cetera. Then even a lexical sort on the padded values will arrange the titles like this:

  • Title No. 1
  • Title No. 2
  • Title No. 10
  • Title No. 100
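
The effect is easy to reproduce outside Solr. The following Python sketch is only an illustration of the idea, not Solr configuration: the helper name pad_numbers and the width of 4 are assumptions chosen to match the example above.

```python
import re

def pad_numbers(value, width=4):
    """Left-pad every run of ASCII digits in value with zeroes to `width` digits."""
    return re.sub(r"[0-9]+", lambda m: m.group(0).zfill(width), value)

titles = ["Title No. 1", "Title No. 10", "Title No. 100", "Title No. 2"]

# Plain lexical sort: "10" and "100" land before "2".
lexical = sorted(titles)

# Sorting on the padded form yields the numeric order,
# while the original titles are what get displayed.
numeric = sorted(titles, key=pad_numbers)

print(lexical)
print(numeric)
```

The original values stay untouched; only the key used for comparison is padded, which is the same role a dedicated sort field plays in Solr.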

The Field Type

This alphanumeric sort field type pads every number it finds to 6 digits with leading zeroes. (If you expect numbers longer than 6 digits in your field values, you will need to increase the number of zeroes when padding.)

The field type also removes English and French leading articles, lowercases, and purges any character that isn’t alphanumeric. It is English-centric, and assumes that diacritics have been folded into ASCII characters.

Sample output

Title No. 1 => titleno000001
Title No. 2 => titleno000002
Title No. 10 => titleno000010
Title No. 100 => titleno000100
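
To make the sequence of transformations concrete, here is a rough Python emulation of the behaviour described above. It is a sketch under stated assumptions, not the Solr field type itself: the function name sort_key, the order of the steps, and the exact list of English and French articles are all guesses made for illustration.

```python
import re

# Assumed article list; the description above only says
# "English and French leading articles".
LEADING_ARTICLE = re.compile(r"^(?:(?:the|an|a|les|le|la)\s+|l')")

def sort_key(value, width=6):
    """Approximate the sort value described above for a single field value."""
    text = value.strip().lower()                 # trim and lowercase
    text = LEADING_ARTICLE.sub("", text, count=1)  # drop one leading article
    # Left-pad each run of digits to `width`; longer numbers are left as-is.
    text = re.sub(r"[0-9]+", lambda m: m.group(0).zfill(width), text)
    # Purge everything that is not an ASCII letter or digit.
    return re.sub(r"[^a-z0-9]", "", text)

for title in ["Title No. 1", "Title No. 2", "Title No. 10", "Title No. 100"]:
    print(title, "=>", sort_key(title))
```

Under these assumptions, a value such as "The Title No. 10" would produce the same key as "Title No. 10", so a leading article does not affect where a title sorts.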

Tags: Solr
