Make numbers behave when sorting alphanumerically in Solr
Problem
Solr sorts values that mix numbers with alphabetic characters lexically, character by character. That means that 10 comes before 2, like this:
- Title No. 1
- Title No. 10
- Title No. 100
- Title No. 2
Solution
To force numbers to sort numerically, we need to left-pad every number with zeroes to the same fixed width: 2 becomes 0002, 10 becomes 0010, 100 becomes 0100, et cetera. Then even a lexical sort on the padded values will arrange the titles like this:
- Title No. 1
- Title No. 2
- Title No. 10
- Title No. 100
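The effect of padding can be checked outside Solr. The following is a minimal Python sketch, not part of the Solr configuration; the helper name pad_numbers and the width of 4 are assumptions chosen to match the example above.

```python
import re

def pad_numbers(value: str, width: int = 4) -> str:
    """Left-pad every run of ASCII digits in value with zeroes to `width` digits.

    Hypothetical helper for illustration; numbers longer than `width`
    digits are left unchanged and would still sort incorrectly.
    """
    return re.sub(r"[0-9]+", lambda m: m.group().zfill(width), value)

titles = ["Title No. 1", "Title No. 10", "Title No. 100", "Title No. 2"]

# Plain lexical sort: "10" and "100" land before "2".
lexical = sorted(titles)

# Lexical sort on the padded form: numbers fall into numeric order.
numeric = sorted(titles, key=pad_numbers)

print(lexical)
print(numeric)
```

The original titles are kept for display; only the sort key is padded, which mirrors the idea of using a separate sort field.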
The Field Type
This alphanumeric sort field type left-pads any number it finds with zeroes to 6 digits. (If you expect numbers longer than 6 digits in your field values, you will need to increase the number of zeroes used for padding.)
The field type also removes leading English and French articles, lowercases the value, and strips any character that isn’t alphanumeric. It is English-centric, and assumes that diacritics have already been folded into ASCII characters.
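The behaviour described above can be approximated in a few lines of Python. This is only a sketch for reasoning about the resulting sort keys, not the Solr field type itself: the function name sort_key, the article list, and the order of the steps are all assumptions.

```python
import re

# Assumed list of leading articles (English and French); adjust as needed.
LEADING_ARTICLE = re.compile(r"^(?:(?:a|an|the|le|la|les|un|une|des)\s+|l')")

def sort_key(value: str, width: int = 6) -> str:
    """Build a sort key roughly the way the field type is described.

    Hypothetical emulation: assumes diacritics are already folded to ASCII.
    """
    key = value.strip().lower()               # lowercase
    key = LEADING_ARTICLE.sub("", key)        # drop one leading article
    key = re.sub(r"[0-9]+",                   # pad each number to `width` digits
                 lambda m: m.group().zfill(width), key)
    key = re.sub(r"[^a-z0-9]", "", key)       # purge non-alphanumeric characters
    return key

for title in ["Title No. 1", "Title No. 2", "Title No. 10", "Title No. 100"]:
    print(title, "=>", sort_key(title))
```

Raising the width argument corresponds to increasing the number of padding zeroes when field values contain numbers longer than 6 digits.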
Sample output
Title No. 1 => titleno000001
Title No. 2 => titleno000002
Title No. 10 => titleno000010
Title No. 100 => titleno000100