Wednesday, February 21, 2024

Epoch dates in Excel and Java

Dates and times are very complicated as we know and almost every sentence should be followed by a caveat, probably to do with time zones and/or leap seconds. But, let's just press on anyway!

Excel stores dates as a number of days since a reference ("Epoch") date - the fractional part represents the time, for example 34567.5 means midday on day 34567.

Java has a whole load of stuff in the java.time package (and others) but it is easy enough to get dates as a Unix time-stamp, which is a number of seconds since the epoch.

The Excel epoch is 1-Jan-1900 and the Unix epoch is 1-Jan-1970.

I had some code outputting from Java which I wanted to make into a text file that could be imported by Excel. Formatting dates as text so that Excel could re-convert seemed troublesome because it's not clear exactly what formats it supports and it (probably!) depends on the locale. Why not output the Excel date serial number, then all that's needed is to format it as a date in the sheet?

The Excel serial number should be:

(Unix time stamp) / (number of seconds in a day) + offset
  
where offset is to account for the difference between the two epochs, i.e. number of days between 1-Jan-1900 and 1-Jan-1970.

Java can work this out, here is some jshell

   1 : import java.time.*;
   2 : import java.time.temporal.*;
   3 : var unix_epoch  = LocalDateTime.of(1970,1,1,0,0,0,0);
   4 : var excel_epoch = LocalDateTime.of(1900,1,1,0,0,0,0);
   5 : ChronoUnit.DAYS.between(excel_epoch,unix_epoch)
   

which gives the answer 25567

Excel can work this out with

=date(1970,1,1)

which formatted as a number gives 25569.

Two days out. The Excel one gives the right answer if used as the offset in the formula but other systems also suggest the proper answer is 25567.

The discrepancy comes because

  • Excel and Java disagree as to whether 1900 was a leap year
  • Excel epoch date is "Day 1" rather than starting from zero (in other words, =date(1900,1,1) gives 1)

So - I just hard-coded 25569 as a magic constant to use as the offset.

Postscript - seems it's impossible to type in an Excel date before 1900, it doesn't autoformat it. If you try and do arithmetic to subtract 2 days from 1-jan-1900 you just get ##########. If you use the DATE function it wraps, eg. =DATE(1800,1,1) gives a date in the year 3700. This is weird because I can imagine uses for dates in the 19th century, like genealogy, but not for the 38th century (extreme forecasting??)