Total Pageviews

COLLATION in SQL SERVER


A collation specifies the bit patterns that represent each character and the rules by which characters are sorted and compared. The physical storage of character strings in Microsoft SQL Server is controlled by collations.

We can store objects that have different collations, in single database. Various collations can be specified down to the level of columns. Each column in a table can have different collations.

Each SQL Server collation specifies three properties:
  • The sort order to use for Unicode data types (nchar, nvarchar, and ntext). A sort order defines the sequence in which characters are sorted, and the way characters are evaluated in comparison operations.
  • The sort order to use for non-Unicode character data types (char, varchar, and text).
  • The code page used to store non-Unicode character data.
Before moving to depth, first we understand, what is code page?

A code page is a character set, not more than that or we can say, A code page, also known as a character set, is a set of 256 uppercase and lowercase letters, numbers, and symbols. The printable characters of the first 128 values are the same for all character set choices. The last 128 characters, sometimes referred to as extended characters, differ from set to set.

The code page for SQL Server determines both the types of characters that SQL Server recognizes in your database, and the results that SQL Server returns for queries that involve character comparisons.

As we all knows, in computer characters store on the basis of bit pattern (ON/OFF). There are 8 bits in a byte and 8 bits support 256 different pattern. There are 16 bits in 2 bytes, and 16 bits can be turned ON and OFF in 65,536 unique patterns. A program that uses 2 bytes to represent each character can represent up to 65,536 characters.

As single byte code page supports only 256 different bit patterns so Single-byte character sets (code page) are also inadequate to store all the characters used by many languages.

To resolve the character conversion and interpretation problem, Unicode Standards have been defined. Unicode uses two bytes to store each character. Because 65,536 characters are enough to cover all the commonly used characters from all the languages of the world. If all the computers and programs in a network use Unicode, there is no need for any character conversions, each user will see exactly the same characters as all other users, and no loss of characters will occur.

Collation should be selected as per the language of SQL Server instance. We should pick the collation that best support the requirements of language of SQL Server instance.

When you support users who speak multiple languages, it is most important to use the Unicode data types for all character data. Unicode was designed to eliminate the code page conversion difficulties of the non-Unicode data types. Collation still makes a difference when you implement all columns using Unicode data types because it defines the sort order for comparisons and sorts of Unicode characters.
Multiple collations can use the same code page for non-Unicode data.  Although all of these collations use the same set of bits to represent non-Unicode character data, the sorting and comparison rules they apply are slightly different to handle the dictionary definitions of the correct sequence of characters in the language or alphabet associated with the collation.

SQL Server offers the SQL_Latin1_General_CP1_CI_AS collation as the default collation for server installations.

Note that the code page of SQL Server is different from the code page of the operating system.






Data Models in SQL Server


Data Models

A data model is a collection of concepts that can be used to describe the structure of a database. Or, we can say, Data model is a way of storing and retrieving the data.  There are different data models.  Data models differ in the way they allow users to view and manipulate relationships between entities. Each has its own way of storing the data. The following are the different data models:

The hierarchical model
The data is sorted hierarchically, using a downward tree. This model uses pointers to navigate between stored data. It was the first DBMS model.
In other words, In this model, data is stored in the form of a tree.  The data is represented by parent-child relationship. Each tree contains a single root record and one or more subordinate records.
This model supports only one-to-many relationship between entities.




The network model
Like the hierarchical model, this model uses pointers toward stored data. However, it does not necessarily use a downward tree structure.
Data is stored along with pointers, which specify the relationship between entities. This model is complex. It is difficult to understand both the way data is stored and the way data is manipulated.  It is capable of supporting many-to-many relationship between entities, which hierarchical model doesn’t. 



The relational model (RDBMS)
This stores data in the form of two-dimensional table. Table is a collection of rows and columns. The data is manipulated based on the relational theory of mathematics.



The deductive model
Data is represented as a table, but is manipulated using predicate calculus.

The object model (ODBMS, object-oriented database management system)
The data is stored in the form of objects, which are structures called classes that display the data within. The fields are instances of these classes.











Unicode / Non-Unicode in SQL Server


Traditional non-Unicode data types in Microsoft® SQL Server™ allow the use of characters that are defined by a particular character set. A character set is chosen during SQL Server Setup and cannot be changed. Using Unicode data types, a column can store any character defined by the Unicode Standard, which includes all of the characters defined in the various character sets. Unicode data types take twice as much storage space as non-Unicode data types.

Unicode data is stored using the nchar, nvarchar, and ntext data types in SQL Server. Use these data types for columns that store characters from more than one character set.

The SQL Server Unicode data types are based on the National Character data types in the SQL-92 standard. SQL-92 uses the prefix character "n" to identify these data types and values.

Use of nchar, nvarchar, and ntext is the same as char, varchar, and text, respectively, except that:
  • Unicode supports a wider range of characters.
  • More space is needed to store Unicode characters.
  • The maximum size of nchar and nvarchar columns is 4,000 characters, not 8,000 characters like char and varchar.
  • Unicode constants are specified with a leading N: N'A Unicode string'.
  • All Unicode data uses the same Unicode code page. Collations do not control the code page used for Unicode columns, only attributes such as comparison rules and case sensitivity.
Unicode is best suited for systems that need to support at least one foreign language: "The Unicode specification defines a single encoding scheme for most characters widely used in businesses around the world. All computers consistently translate the bit patterns in Unicode data into characters using the single Unicode specification. This ensures that the same bit pattern is always converted to the same character on all computers. Data can be freely transferred from one database or computer to another without concern that the receiving system will translate the bit patterns into characters incorrectly.



Types of Table Hints in SQL Server


In my previous post, we have gone through about the basic of SQL Server Hints. Now here we are moving with type of Table Hints.
There are various types of Table Hints. Some of the important and useful for day to day basis are as below:-

NOEXPAND:-
Specifies that any indexed views are not expanded to access underlying tables when the query optimizer processes the query. The query optimizer treats the view like a table with clustered index. NOEXPAND applies only to indexed views.
Normal views have to be expanded because they don't actually stored any data, they're just saved select statements.

INDEX (index_value [,... n ] ) | INDEX = ( index_value):-
Index hints (a form of a table hint) are used to specify which index or indexes you want used when a query runs. When you specify one or more indexes, the Query Optimizer will use them when creating the execution plan for your query. Sometimes the SQL Server query optimizer does not pick the best index to use in a join operation.  This can be caused by a multitude of factors (out of date statistics, wrong indexing strategy for a table, etc) ... for what ever reason it happens, there is a way to force SQL Server to pick the correct index at runtime.  This is called an INDEX HINT.

NOLOCK:-
This hint allows queries to run even while updates are being performed on the tables you want to view.  It allows you to do what is called a “Dirty Read” on the data.  This term is used because you could potentially view data that has not been committed yet and could get rolled back.  We use this hint to avoid blocking on our site.

TABLOCK:-
Specifies that the acquired lock is applied at the table level. The type of lock that is acquired depends on the statement being executed. For example, a SELECT statement may acquire a shared lock. By specifying TABLOCK, the shared lock is applied to the entire table instead of at the row or page level. If HOLDLOCK is also specified, the table lock is held until the end of the transaction.

TABLOCKX:-
The TABLOCKX hint specifies that an exclusive lock will be taken on the table until the end of the Transact-SQL statement or the end of the transaction. The TABLOCKX provides the worst concurrency.




PARAMETERIZED CODE IN EXECUTE SQL TASK


We have gotten the introduction about Execute SQL Task in my previous post. Now here we look into, how to handle parameterized code in Execute SQL Task.

The Execute SQL Task supports, below parameter types :-
  • Input :- for input param
  • Output:- for output param
  • ReturnValue:- for return codes.
Parameters can be mapped to user defined variables or system variables.

PARAMETER NAME AND MARKER: -

Parameter name and marker used to set the mapping between parameter and variables. It depends on the used Connection and Connection Manager.

In the case of EXCEL and OLEDB connection type, 

  • We use “?” as parameter marker. For example :-

    SELECT *
    FROM tbl1
    WHERE name = ?
    • To provide values to parameters marker, variables are mapped to parameter names in parameter mapping tab of Execute SQL Task. Then, the Execute SQL task uses the ordinal value of the parameter name (OR we can say Ordinal Value of parameter marker) in the parameter list to load values from variables to parameters.



    Connection Type
    Parameter Marker
    Parameter Name
    Example
    Excel & OLE DB
    ?
    0,1,2,3...
    SELECT *
    FROM tbl
    WHERE name = ?

    EXCEL, OLE DB connection managers require that the SQL command use question marks (?) as parameter markers and 0-based numeric values as parameter names.

    USING PARAMETER IN “WHERE” CLAUSE:-

    We use WHERE clause with SELECT, INSERT, UPDATE and DELETE commands to specify filters that define the conditions each row in the source tables must meet to qualify for an SQL command. Parameters provide the filter values in the WHERE clauses.

    We can use parameter marker to dynamically provide the parameter values, for example :-

    SELECT *
    FROM tbl
    WHERE name = ?
    AND city = ?

    In the above example, parameter name should be 0 and 1 respectively in the case of OLE DB and EXCEL connection.

    USING PARAMETERS WITH STORED PROCEDURE:-

    We can also use parameter mapping to execute stored procedure. For example,

    As an input parameter in the case of Excel and OLE DB connection:-

    EXEC proc_Name ?

    As an output parameter in the case of Excel and OLE DB connection:-

    EXEC proc_Name ? OUTPUT

    GETTING VALUES OF RETURNCODE: -

    A stored procedure can return an integer value, called a return code, to indicate the execution status of a procedure. To implement return codes in the Execute SQL task, we have to use parameters of the ReturnValue type.

    Example of input parameter in the case of EXCEL AND OLE DB connection: -

    EXEC ? = proc_Name 1






    Types of Query Hints in SQL Server


    In my previous post, we have gone through about the basic of SQL Server Hints. Now here we are moving with type of Query Hints.

    There are various types of Query Hints. Some of the important and useful for day to day basis are as below:-


    • FAST number_rows:- Specifies that the query is optimized for fast retrieval of the first number_rows. This is a nonnegative integer. After the first number_rows are returned, the query continues execution and produces its full result set.

    • FORCE ORDER:-Normally, the optimizer will determine the order in which the joins occur but using the FORCE ORDER   hint you can make the optimizer use the order of joins as listed in the query itself. This would be done if you've got a fairly high degree of certainty that your join order is better than that supplied by the optimizer. The optimizer can make incorrect choices when the statistics are not up to date, when the data distribution is less than optimal or if the query has a high degree of complexity.
    • KEEP PLAN:- Forces the query optimizer to relax the estimated recompile threshold for a query. The estimated recompile threshold is the point at which a query is automatically recompiled when the estimated number of indexed column changes have been made to a table by running UPDATE, DELETE, MERGE, or INSERT statements. Specifying KEEP PLAN makes sure that a query will not be recompiled as frequently when there are multiple updates to a table.
    • KEEPFIXED PLAN:- It eliminates any recompile due to changes in statistics.

    • MAXRECURSION number:- The MAXRECURSION   hint places an upper limit on the number of recursions within a query.Valid values are between 0 and 32,767. Setting the value to zero allows for infinite recursion. The default number of recursions is 100. When the number is reached, an error is returned and the recursive loop is exited. This will cause any open transactions to be rolled back. Using the option doesn't change the execution plan but, because of the error, an actual execution plan might not be returned.

    • RECOMPILE:- Instructs the SQL Server Database Engine to discard the plan generated for the query after it executes, forcing the query optimizer to recompile a query plan the next time the same query is executed. Without specifying RECOMPILE, the Database Engine caches query plans and reuses them. 







    Type of Join Hints in SQL Server

    In my previous post, we have gone through about the basic of SQL Server Hints. Now here we are moving with type of Join Hints.

    1. LOOP:- Loop Join also known as "NESTED JOIN". This type of join made up by an outer loop and an inner loop. When the query runs for each row of the outer loop, the inner loop is executed completely. This join is effective only when the outer loop query is small and the inner loop query has all the proper optimizations applied. This join method is very useful with small transactions.

    2. HASH: - Hash Join works with large data set. It does not require ordered result set to join. Hash join requires equijoin predicate to join tables. Equijoin predicate is comparing values between one table to other table using “equals to” (“=”) operator. Hash join gives best performance when two more join tables are joined and at-least one of them have no index or is not sorted. It is also expected that smaller of the either of table can be read in memory completely (though not necessary).

      The query optimizer makes a Hash join in two phases: build and probe. So, Hash join has two inputs: the build input and the probe input.

    3. MERGE: - The merge join requires that both inputs be sorted on the merge columns.This join keeps both of the tables sorted in parallel and compares each table row by row simultaneously with each other. It compares one row of the first table with one row of the second table. If they are equal, that row qualifies; otherwise, whichever row has the lower value is discarded and another row is obtained from that input. This process repeats until all rows have been processed.

      Merge join itself is very fast, but it can be an expensive choice if sort operations are required. However, if the data volume is large and the desired data can be obtained pre-sorted from existing B-tree indexes, merge join is often the fastest available join algorithm.

    4. REMOTE: - Specifies that the join operation is performed on the site of the right table. This is useful when the left table is a local table and the right table is a remote table. REMOTE should be used only when the left table has fewer rows than the right table.

      If the right table is local, the join is performed locally. If both tables are remote but from different data sources, REMOTE causes the join to be performed on the site of the right table. If both tables are remote tables from the same data source, REMOTE is not required.

      REMOTE cannot be used when one of the values being compared in the join predicate is cast to a different collation using the COLLATE clause.

      REMOTE can be used only for INNER JOIN operations.



    Hints in SQL Server


    Hints are the option, specified for enforcement by the SQL Server query processor on SELECT, INSERT, UPDATE or DELETE statements.

    Note: - SQL Server Query optimizer is a very smart tool and it makes a best selection of execution plan. Suggesting hints to the Query Optimizer should be attempted when absolutely necessary and by experienced developers who know exactly what they are doing

    Type of Hints: -

    • Join Hints: - This hint is used when more than one table is used in a query. Two or more tables can be joined using different kinds of joins. This hint forces the type of join algorithm that is used between two tables. Join hints are specified in the FROM clause of a query. Joins can be used in SELECT, UPDATE and DELETE statements.
    • Query Hints: - Query hints specify that the indicated hints should be used throughout the query. Any hint used in the query is applied to the complete query, as opposed to part of it. There is no way to specify that only a certain part of a query should be used with the hint. After any query, the OPTION clause is specified to apply the logic to this query.
      If one or more query hints cause the query optimizer not to generate a valid plan, error 8622 is raised.
      Query Hints applies to INSERT, UPDATE, DELETE, SELECT, MERGE statements.
    • Table Hints: - This hint is used when certain kind of locking mechanism of tables has to be controlled. Table hints are specified in the FROM clause of the DML statement and affect only the table or view referenced in that clause. It applies to INSERT, UPDATE, DELETE, SELECT and MERGE.