Total Pageviews

COLLATION in SQL SERVER


A collation specifies the bit patterns that represent each character and the rules by which characters are sorted and compared. The physical storage of character strings in Microsoft SQL Server is controlled by collations.

We can store objects that have different collations, in single database. Various collations can be specified down to the level of columns. Each column in a table can have different collations.

Each SQL Server collation specifies three properties:
  • The sort order to use for Unicode data types (nchar, nvarchar, and ntext). A sort order defines the sequence in which characters are sorted, and the way characters are evaluated in comparison operations.
  • The sort order to use for non-Unicode character data types (char, varchar, and text).
  • The code page used to store non-Unicode character data.
Before moving to depth, first we understand, what is code page?

A code page is a character set, not more than that or we can say, A code page, also known as a character set, is a set of 256 uppercase and lowercase letters, numbers, and symbols. The printable characters of the first 128 values are the same for all character set choices. The last 128 characters, sometimes referred to as extended characters, differ from set to set.

The code page for SQL Server determines both the types of characters that SQL Server recognizes in your database, and the results that SQL Server returns for queries that involve character comparisons.

As we all knows, in computer characters store on the basis of bit pattern (ON/OFF). There are 8 bits in a byte and 8 bits support 256 different pattern. There are 16 bits in 2 bytes, and 16 bits can be turned ON and OFF in 65,536 unique patterns. A program that uses 2 bytes to represent each character can represent up to 65,536 characters.

As single byte code page supports only 256 different bit patterns so Single-byte character sets (code page) are also inadequate to store all the characters used by many languages.

To resolve the character conversion and interpretation problem, Unicode Standards have been defined. Unicode uses two bytes to store each character. Because 65,536 characters are enough to cover all the commonly used characters from all the languages of the world. If all the computers and programs in a network use Unicode, there is no need for any character conversions, each user will see exactly the same characters as all other users, and no loss of characters will occur.

Collation should be selected as per the language of SQL Server instance. We should pick the collation that best support the requirements of language of SQL Server instance.

When you support users who speak multiple languages, it is most important to use the Unicode data types for all character data. Unicode was designed to eliminate the code page conversion difficulties of the non-Unicode data types. Collation still makes a difference when you implement all columns using Unicode data types because it defines the sort order for comparisons and sorts of Unicode characters.
Multiple collations can use the same code page for non-Unicode data.  Although all of these collations use the same set of bits to represent non-Unicode character data, the sorting and comparison rules they apply are slightly different to handle the dictionary definitions of the correct sequence of characters in the language or alphabet associated with the collation.

SQL Server offers the SQL_Latin1_General_CP1_CI_AS collation as the default collation for server installations.

Note that the code page of SQL Server is different from the code page of the operating system.






No comments:

Post a Comment