Friday, February 10, 2012

clustered index

I have question about clustered index.
If you look at share point portal server database you will see, that primary
key clustered index is composed of one unique identifier column and 2
columns of nvarchar(255) data type.
Isn't that too long clustered index?
I thought that clustered indexes should be as narrow as possible? For
example, identity column is perfect for that.
So, I would create identity clustered index and put other columns in
nonclustered index if needed.
Can someone explain me why is so extensive clustered index there?
Second question:
I always thought that every table should have primary key, but I found a lot
that there are tables without it.
Why?
I found a table, which had one unique index, 3 not unique indexes and no one
of them was clustered.
So table without primary key and without any clustered index. What could be
the benefit of that?
What is the difference if you have primary key (which by default creates
unique clustered index on columns of PK) or just unique clustered index
without primary key?
I think that this is pretty the same.
Thank you for some explanation about that,
SimonI wouldn't use the SharePoint database as an example of how to best do
things in your db. First off I don't agree with some of their techniques and
you always have to understand what they are using the indexes for and how
they use the data. This is different for each application. Usually wide
clustered indexes are discouraged but that does not mean they can not be
used effectively in some cases. And in general every table should have a
clustered index but there will always be exceptions. The point is to not
look at someother application as an example of what you should do without
understanding why they did it. Try to understand what a clustered index is
best for and apply that to your schema the way that makes the most sense.
This might help:
http://www.sql-server-performance.c...red_indexes.asp
As for PK vs. unique you should always have a PK constraint defined. The
main difference physically between the two is that a PK can not have any
nulls where as a Unique can have one.
Andrew J. Kelly SQL MVP
"simon" <simon.zupan@.iware.si> wrote in message
news:fJ_Sf.996$oj5.378456@.news.siol.net...
>I have question about clustered index.
> If you look at share point portal server database you will see, that
> primary key clustered index is composed of one unique identifier column
> and 2 columns of nvarchar(255) data type.
> Isn't that too long clustered index?
> I thought that clustered indexes should be as narrow as possible? For
> example, identity column is perfect for that.
> So, I would create identity clustered index and put other columns in
> nonclustered index if needed.
> Can someone explain me why is so extensive clustered index there?
> Second question:
> I always thought that every table should have primary key, but I found a
> lot that there are tables without it.
> Why?
> I found a table, which had one unique index, 3 not unique indexes and no
> one of them was clustered.
> So table without primary key and without any clustered index. What could
> be the benefit of that?
> What is the difference if you have primary key (which by default creates
> unique clustered index on columns of PK) or just unique clustered index
> without primary key?
> I think that this is pretty the same.
> Thank you for some explanation about that,
> Simon
>|||If there's anyone that knows differently, please feel free to correct
me.
DDL:
CREATE TABLE Transactions
(
TransactionClass NVARCHAR(255)
, TransactionDate DATETIME NOT NULL
, ...<more columns>
)
CREATE CLUSTERED INDEX IXC_TransactionClass_TransactionDate ON
Transactions(TransactionClass, TransactionDate)
In certain cases, it may be beneficial to have a wide clustered index.
Lets say you have a table with billions of rows. Each row represents
data specific to a transaction. (notice that I'm carefully avoiding
using the term "record" here as whether the data included in the row
constitutes a complete record is irrelevant for the example).
Continuing, lets say there are hundreds of thousands of distinct values
in the TransactionClass column and you frequently look up data based on
this column. And rarely, if ever, research based on data in any other
column, it may be beneficial to cluster based on the TransactionClass
column. (other readers, please correct me here if I'm wrong) The width
of your clustered index only becomes a factor when you have other
nonclustered indexes defined, as the clustered index is included as the
row pointer in each row of your nonclustered index. In short, if you
have the following schema:
CREATE TABLE t
(
a DATATYPE_A
, b DATATYPE_B
, c DATATYPE_C
, d DATATYPE_D
)
If you cluster on a and DATATYPE_A is long, but you have no
nonclustered indexes defined, there is little implication. However,
when you add a nonclustered index, say on b and c, your index can be
thought of as a table with the following schema:
CREATE TABLE index_t_b_c
(
b DATATYPE_B
, c DATATYPE_C
, a DATATYPE_A
)
as [a] must be included with your nonclustered index. The implications
here are as follows:

>From my understanding, when SQL Server fetches data from disk, it
fetches one 64KB extent at a time. When your rows are narrower, the
fetch will return more rows than if they're wider. If your clustered
key is 90 bytes and your index row is 100 bytes (lets forget about
other data overhead for the time being), then 90% of your index row is
the clustered key. If your clustered key is 4 bytes (INT), then only
4/14, or about 30% of your index is your clustered key, meaning that
with each physical fetch from disk, seven times as many index entries
may be fetched, dramatically increasing performance. When nonclustered
indexes don't exist on your table, this doesn't matter.
-Alan

No comments:

Post a Comment