BackToBasics: Dating Responsibly

No, this is not going to be a synopsis of the TV series, "8 Simple Rules... for Dating My Teenage Daughter." This is a continuation of my #BackToBasics series, which I pledged after a challenge from Tim Ford. This month, after having yet another conversation about safe date formats, I thought I would put together a TL;DR version of a few of my favorite "Bad Habits" posts about date/time handling.

Please don't use yyyy-mm-dd

My most recent discussion about date formats in SQL Server revolved around yyyy-mm-dd - which I contend is not a safe date format to use as a string passed into any kind of date/time parameter, variable, or column. The reason is that it can fail when casting to datetime or smalldatetime, because SQL Server swaps the month and day under certain language settings. For example:

SET LANGUAGE Deutsch; -- German

DECLARE @d datetime = '2016-04-17';

Msg 242, Level 16, State 3, Line 2 
Bei der Konvertierung eines varchar-Datentyps in einen datetime-Datentyp liegt der Wert außerhalb des gültigen Bereichs.
Translation:
The conversion of a varchar data type to a datetime data type resulted in an out-of-range value.

Why? It was interpreted as yyyy-dd-mm. This is a little less obvious than other completely unsafe date formats like mm/dd/yyyy and dd/mm/yyyy.

Of course, getting an error is a good thing, but it's only because there is no 17th month. If a user running under German language settings tried to store 2015-04-06, SQL Server would have happily stored June 4th, not April 6th, and who knows when anybody would have noticed the "error."

I urge you to always use yyyymmdd (without the dashes) for a date without time - it will never fail, regardless of regional, language, or dateformat settings, and across any of the date/time data types. (And absolutely do not store it as a string data type in SQL Server - always store it as a proper date or time data type.)

And it's not just German

I whipped up this demo code to test the interpretation of month and day across all of the languages in SQL Server.

DECLARE @sql NVARCHAR(MAX) = N'SET NOCOUNT ON; DECLARE @d datetime;';

SELECT @sql += N'BEGIN TRY;
SET LANGUAGE ' + QUOTENAME(name) + ';
SELECT @d = CONVERT(datetime, ''2015-04-13'');' -- also try 2015-13-04
+ '
INSERT #x SELECT [alias], name, 1 FROM sys.syslanguages
WHERE name = N''' + name + ''';
END TRY
BEGIN CATCH;
INSERT #x SELECT [alias], name, 0 FROM sys.syslanguages
WHERE name = N''' + name + ''';
END CATCH;
' FROM sys.syslanguages;

EXEC sys.sp_executesql @sql;

SELECT [alias],name,success FROM #x ORDER BY success,[alias];

DROP TABLE #x;

I was actually quite surprised to discover the number of languages where yyyy-mm-dd gets misinterpreted - I always used French as the example, but really believed it was an edge case until I ran the above:

failed
(expected yyyy-dd-mm)
succeeded
(expected yyyy-mm-dd)
Arabic (Arabic) Greek (ελληνικά) Croatian (hrvatski)
Bokmål (norsk (bokmål)) Italian (Italiano) English (us_english)
Brazilian (Português (Brasil)) Norwegian (Norsk) Hungarian (magyar)
British English (British) Polish (polski) Japanese (日本語)
Bulgarian (български) Portuguese (Português) Korean (한국어)
Czech (čeština) Romanian (română) Latvian (latviešu)
Danish (Dansk) Russian (русский) Lithuanian (lietuvių)
Dutch (Nederlands) Slovak (slovenčina) Simplified Chinese (简体中文)
Estonian (eesti) Slovenian (slovenski) Swedish (Svenska)
Finnish (Suomi) Spanish (Español) Traditional Chinese (繁體中文)
French (Français) Thai (ไทย)
German (Deutsch) Turkish (Türkçe)

Now, if you swap 2015-04-13 for 2015-13-04, you will see all languages flip - the languages that interpret the second number as day will now succeed. And if you use a safe format, 20150413 (without the dashes), nothing will show up as failed at all. Of course, the point of this demo is not for you to memorize which languages will interpret an unsafe date literal the wrong way, but rather to convince you to always use a safe literal because you never know how a user will configure their client language.

And while yyyy-mm-dd is "safe" for data types like date, I recommend against using it as an exception - I'd rather be consistent. And for similar reasons, avoid passing strings like MM/dd/yyyy or dd/MM/yyyy to represent dates, because they can be misinterpreted just as easily. If you need time, then use yyyy-MM-ddThh:nn:ss.xxx[xxxx] - and yes, the T is crucial. I talk about this issue and more in "Bad habits to kick : mis-handling date / range queries."

Please let the client format dates

I see a lot of effort being spent on using T-SQL to manipulate a date/time value into a specific format intended for a report viewer or other end user. Things like CONVERT(char(10), [column], 101) or, quite often much worse, FORMAT([column], 'MM/dd/yyyy').

Let's play guess the date! Was this updated Feb 1 or Jan 2?
I have no idea.

The main problem is that you can't always control your audience. You may only have American users this moment, but that could change. And then, like above, 4/6/2015 gets interpreted by users as June 4th instead of April 6th.

Also, if you dump the output into a grid of some kind, and the user expects to sort the date column by, well, date, it actually will lump all the January dates together (regardless of year), then all the February dates, and so on. This is because it's sorting by string (unless the app has been trained to actually keep track of the original date value somehow). Oh, and if you use the more logical dd/MM/yyyy, it just breaks in a different way - all of the 1sts of the month will sort together, then all of the 2nds, then the 3rds, etc.

While SQL Server blurs the lines a bit about whether a date is just a string, I urge you to not output them that way - convert to a string for display purposes as late as possible, and when it makes sense, obey the end user's preference. We have a presentation tier for a reason: to apply formatting to raw data coming out of the database. Leave it in its native format until you can't, instead of hammering it into a string as soon as you can.

Please don't use shorthand

I get a kick out of seeing code like this:

SELECT DATEPART(D, GETDATE());

Now, what sort of productivity or performance gains do you think you might have gained by typing D instead of spelling out DAY? Here's why you're going to care: some shorthand is not what you would expect, and could be very problematic if you're troubleshooting at 3 AM. This is based on a quiz I've given to hundreds of attendees of my Bad Habits sessions, and maybe three people have passed in all that time:

SELECT w = DATEPART(W, '20151223'), y = DATEPART(Y, '20151223');
 
/* results:          w        y
                     ------   ------
                     4        357                 */

Admit it, it's not what you expected either, is it? W stands for WEEKDAY, not WEEK. And Y stands for DAYOFYEAR, not YEAR. Please, just spell it out. You avoid these problems and make your code self-documenting, at the mere cost of an additional 32 milliseconds or so of typing. And I promise the extra characters won't slow down your queries.

Another quick one, this:

DECLARE @d datetime = GETDATE(); SELECT @d - 1;

Fails when you use the new types:

DECLARE @d date = GETDATE() - 1;

Msg 206, Level 16, State 2, Line 1 
Operand type clash: date is incompatible with int

When adding or subtracting days from a date or datetime, just always use DATEADD(), even though it's a bit more to type.

Both of these are explained in more detail in "Bad Habits to Kick : Using shorthand with date/time operations."

Please don't use BETWEEN

When I see date range queries that use BETWEEN, I shudder. While it's very easy to find the beginning of a range, how exactly do you find the end? Unless you're dealing exclusively with the date data type (or datetime with reliable constraints preventing time), there isn't a trustworthy way to find the end, due to the varying precision and rounding behaviors of the different date/time types, as well as the problem known as February. And no, EOMONTH() is not the solution, as it just gets the last day of the given month, at midnight, which is only useful if you don't store time.

It is always possible to use an open-ended range, even when using the date data type (again, don't use BETWEEN with date, as it just makes your technique inconsistent). This is because it's always easier to find the beginning of the next range than the end of the current one. Take a month, for example. I can always get the beginning of the current month like this:

DECLARE @start datetime, @end datetime;
 
SET @start = CONVERT(date, DATEADD(DAY, -DATEPART(DAY, GETDATE()), GETDATE()));
 
/* result:      2016-04-01        */

Finding the "end" of the month is harder, because I have to know what data type I'm dealing with. I could use this common technique:

SET @end = DATEADD(MILLISECOND, -3, DATEADD(MONTH, 1, @start));
 
/* result:      2016-04-30 23:59:59.997        */

That's great, as long as the data type of all of the variables, parameters, columns, and both implicit and explicit conversions along the way stay as datetime. If the parameter changes to smalldatetime, your "end" now rounds up to May 1st at midnight. And if the column changes to datetime2, there's a lottery-style chance you could lose data (since a row with 2016-04-30 23:59:59.9984132 could theoretically be stored).

This is much more reliable:

SET @end = DATEADD(MONTH, 1, @start);
 
-- now the query changes from:
 
... WHERE [column] BETWEEN @start AND @end;
 
-- to the slightly more verbose:
 
... WHERE [column] >= @start
      AND [column] <  @end;

There is no chance you can include too much data or leave some out when you say "greater than or equal to the beginning of this range, and less than the beginning of the next." More information in "What do BETWEEN and the devil have in common?"

Conclusion

There you have it, four quick tips on dating responsibly, in as few words as I know how. Hope you've found it useful!