13 November 2006

COLLATE

Well, not strictly SQL 2005 related but, as some of our developers have recently noticed, our database servers are set to be case insensitive. SQL Server, by default, is case insensitive. For example, SELECT * FROM myTable WHERE column = 'six' gives the same results as SELECT * FROM myTable WHERE column = 'SiX'.

So, how do we do a case sensitive SELECT statement for something like a password comparison, when you don't want 'six' = 'SiX'?

SQL Server provides a COLLATE clause to do this. You may have seen this clause when generating creation scripts for tables. Here is a quick example:

CREATE TABLE [dbo].[cd_LandCategory] (
[LandCategory_SV] [int] NOT NULL ,
[Desc_TX] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
etc...

The second to last character group in the collation specifies the case sensitivity (CI = Case Insensitive).

It turns out you can also include the COLLATE clause in a query to specify the collation you want your query to use at runtime, completely independent of the DDL used to create the table and store the data (this is similar in function to the CAST function). Using the above table snippet, you may want to write a case sensitive query on the [Desc_TX] column:

SELECT [Desc_TX]
FROM [dbo].[cd_LandCategory]
WHERE [Desc_TX] = 'Old' COLLATE SQL_Latin1_General_CP1_CS_AS

I have changed the CI character group in the collation to CS, providing a case sensitive comparison. One thing to note is that, if I am comparing two columns, I will need to use COLLATE on both sides of the comparison or, I could get a collation error or other unexpected results:

SELECT [Desc_TX]
FROM [dbo].[cd_LandCategory]
WHERE [Desc_TX] COLLATE SQL_Latin1_General_CP1_CS_AS = [SomeOtherColumn_TX] COLLATE SQL_Latin1_General_CP1_CS_AS


The COLLATE clause seems to work the same on both SQL 2000 and SQL 2005.

A readily apparent use for this is comparing passwords in a case-sensitive manner (assuming you didn't create your password column with a case sensitive collation to start with; a common oversight).

Below is a quick code sample you can run on your favorite development server to see how collation, and overriding it with the COLLATE clause, works. It covers case sensitive and insensitive column collations with no COLLATE clause in the query. It also demonstrates overriding the column's collation using the COLLATE clause.

/******** COLLATION EXAMPLE 1.
Case-insensitive column and no COLLATE in where clause ************/

CREATE TABLE #tmp (pwd varchar(100) COLLATE SQL_Latin1_General_CP1_CI_AS not null)
INSERT
#tmp values ('@ssWord')
INSERT
#tmp values ('@ssword')
INSERT
#tmp values ('bit3mE')
INSERT
#tmp values ('bit3me')

SELECT * FROM #tmp
WHERE pwd = 'bit3me'

DROP TABLE #tmp
GO

/******** COLLATION EXAMPLE 2.
Case-sensitive column and no COLLATE in where clause ************/

CREATE TABLE #tmp (pwd varchar(100) COLLATE SQL_Latin1_General_CP1_CS_AS not null)
INSERT
#tmp values ('@ssWord')
INSERT
#tmp values ('@ssword')
INSERT
#tmp values ('bit3mE')
INSERT
#tmp values ('bit3me')

SELECT * FROM #tmp
WHERE pwd = 'bit3me'

DROP TABLE #tmp
GO

/******** COLLATION EXAMPLE 3.
Case-insensitive column and case-sensitive COLLATE in where clause ************/

CREATE TABLE #tmp (pwd varchar(100) COLLATE SQL_Latin1_General_CP1_CI_AS not null)
INSERT
#tmp values ('@ssWord')
INSERT
#tmp values ('@ssword')
INSERT
#tmp values ('bit3mE')
INSERT
#tmp values ('bit3me')

SELECT * FROM #tmp
WHERE pwd = 'bit3me' COLLATE SQL_Latin1_General_CP1_CS_AS

DROP TABLE #tmp
GO

/******** COLLATION EXAMPLE 4.
Case-sensitive column and case-INsensitive COLLATE in where clause ************/

CREATE TABLE #tmp (pwd varchar(100) COLLATE SQL_Latin1_General_CP1_CS_AS not null)
INSERT
#tmp values ('@ssWord')
INSERT
#tmp values ('@ssword')
INSERT
#tmp values ('bit3mE')
INSERT
#tmp values ('bit3me')

SELECT * FROM #tmp
WHERE pwd = 'bit3me' COLLATE SQL_Latin1_General_CP1_CI_AS

DROP TABLE #tmp
GO