Java 11 date parsing? Locale, locale, locale.

by Peter Tyrrell Monday, January 07, 2019 11:39 AM

Java is undergoing some considerable licensing changes, prompting us to plan an all-out move from Oracle Java 8 to OpenJDK Java 11 this Spring for every Solr instance we host. I have been running covertly about the hills setting traps for Java 11.0.1 to see what I might snare before unleashing it on our live servers. I caught something this week.

Dates! Of course it's about parsing dates! I noticed that the Solr Data Import Handler (DIH) transforms didn't handle making created dates during ingest. (In DIH, we use a script transformer and manipulate some Java classes with javascript. This includes the parsing of dates from text.) Up until now, our DIH has used an older method of parsing dates with a Java class called SimpleDateFormat. If you look for info on parsing dates in Java, you will find years and years of advice related to that class and its foibles, and then you will notice that in recent times experts advise using the java.time classes introduced in Java 8. Since SimpleDateFormat didn't work during DIH, I assumed that SimpleDateFormat was deprecated in Java 11 (it isn't actually), and moved to convert the relevant DIH code to use java.time.

Many hours passed here, during which the output of two lines of code* made no goddamn sense at all. The javadocs that describe the behaviour of java.time classes are completely inadequate, with their stupid little "hello, world" examples, when dates are tricky, slippery, malicious dagger-worms of pure hatred. Long story short, a date like '2004-09-15 12:00:00 AM' produced by Inmagic ODBC from a DB/Textworks database could not be parsed. The parser choked on the string at "AM," even though my match pattern was correct: 'uuuu-MM-dd hh:mm:ss a'. Desperate to find the tiniest crack to exploit, I changed every variable I could think of one at a time. That was how I found that, when I switched to Java 8, the same exact code worked. Switch back to Java 11. Not working. Back to Java 8. Working. WTF?

I thought, maybe the Nashorn scripting engine that allows javascript to be interpreted inside the Java JVM is to blame, because this scenario does involve Java inside javascript inside Java, which is weird. So I set up a Java project with Visual Studio Code and Maven and wrote some unit tests in pure Java. (That was pretty fun. It was about the same effort as ordering a pizza in Italian when you don’t speak Italian: everything about the ordering process was tantalizingly familiar but different enough to delay my pizza for quite some time.) The problem remained: parsing worked as-written in Java 8, but not Java 11.

I started writing a Stack Overflow question. In so doing, I realized I hadn't tried an overload method of java.time.format.DateTimeFormatter.ofPattern() which takes a locale. I had already dotted many i's and crossed a thousand t's, but I wanted to really impress anyone reading the question that I had done my homework, because I hate looking ignorant, so I wrote another unit test that passed in Locale.ENGLISH and, ohmigawd, that solved the problem entirely. If you have been following along, that means that "AM/PM" could not be understood by the parser, even with the right pattern matcher, without the context of a locale, and obviously the default locale used by the simpler version of DateTimeFormatter.ofPattern() was inadequate to the task. I tested and Locale.ENGLISH and Locale.US both worked with "AM/PM" but Locale.CANADA did not. Likely the latter is my default locale, because I do reside in Canada. Really? Really, Java? We have AM and PM here in the Great White North, I assure you.

I don’t know if this a bug in Java 11. I’m merely happy to have understood the problem at this point. Just another day in the developer life, eh? Something that should be a snap becomes a grueling carnival ride that deposits you at the exit, white-faced and shaking, with an underwhelming sense of minor accomplishment. How do you explain to people that you spent 8 hours teaching a computer to treat an ordinary date as a date? Write a blog post, I guess. Winking smile

* Two lines of code. 8 hours of frustration. Here it is, ready?

import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class App {

    public LocalDateTime Parse (String dateText, String pattern) {

        DateTimeFormatter parser = DateTimeFormatter.ofPattern(pattern, Locale.ENGLISH);
        LocalDateTime date = LocalDateTime.parse(dateText, parser);
        return date;

    }
}

Fixing short date strings in Textworks

by Peter Tyrrell Thursday, June 07, 2007 2:58 PM

The Problem

1000s of date strings in short date format like m/d/yy. Fine as long as system date settings assume month/day/year. Then system date settings change to day/month/year to conform with international standards. 1000s of date strings are misinterpreted.

E.g. 06/01/2007 Before = June 1, 2007 After = January 6, 2007

The Fix

  1. Export date field and unique ID field to delimited text file.
  2. Use regular expression to switch day and month:
    • find expression: (\d+)/(\d+)/(\d{4})
    • replace expression: \2/\1/\3
  3. Import modified file into Excel or Access, treating the date string field as DateTime so it's interpreted as a proper date, not a string.
  4. Change the format of the date field to a Long Date
    • Access query expression: Format([MyDate], "Long Date")
  5. Import the file with long date back into Textworks, matching on unique ID field; replace field values.
  6. Date strings are now in unambiguous Long Date format, e.g. MMM dd, yyyy.

UPDATE: This comic at xkcd.com is totally awesome. And relevant and stuff. comic

Month List