Friday, February 10, 2012

Clustered Index

We are currently addressing many performance issues in an existing
application. One of the items we are investigating is adding clustered
indexes to tables that do not have them. Currently, about 90 tables out of
200 don't have clustered indexes. The long-term solution is to analyze each
table and determine what the best clustered index is. As a shorter-term
solution I've done some analysis to determine what some of the best
candidates would be by writing a query to analyze primary keys, identity
columns, and unique indexes. The problem is that many of these tables have
none of those (primary keys, identity columns, or unique indexes).
With the understanding that this database has much to be redesigned (meaning
I'm not currently looking for comments on bad design), this is my question:
What are the benefits / divantages of adding an IDENTITY column and a
CLUSTERED INDEX on that identity column to every table that does not have a
clustered index, primary key, identity column, and unique index. I'm mainly
looking for insight into any divantages or problems this will cause. The
reason for the identity column is that it would be a "safe" column to add
from an application perspective and would be a decent candidate for the
clustered index.
I'm very hesitant about doing this blindly and am sing to find out if its
feasible in the short-term to gain performance by eliminating heaps in this
manner without introducing other problems.
Thanks,
Mike Jansen> What are the benefits / divantages of adding an IDENTITY column and a
> CLUSTERED INDEX on that identity column to every table that does not have
> a clustered index, primary key, identity column, and unique index.
Why do your tables not have keys?
If the column is not there for any good reason, I see no advantage in adding
it. But I really am curious why you have enough tables without the above
elements where this is even a concern or some desperate grasp at using a
jackhammer to hang a picture...|||> What are the benefits / divantages of adding an IDENTITY column and a
> CLUSTERED INDEX on that identity column to every table that does not have
a
> clustered index, primary key, identity column, and unique index. I'm main
ly
> looking for insight into any divantages or problems this will cause. T
he
> reason for the identity column is that it would be a "safe" column to add
> from an application perspective and would be a decent candidate for the
> clustered index.
The only benefit will be that using an identity column will avoid page split
during insert operations, but from the point of view of query performance,
only you can know. You should analyze every posible query you execute agains
t
the table and from there, spot possible columns. Preferable those used in
range queries, "group by" and "sort by" operations.
Tips on Optimizing SQL Server Clustered Indexes
http://www.sql-server-performance.c...red_indexes.asp
AMB
"Mike Jansen" wrote:

> We are currently addressing many performance issues in an existing
> application. One of the items we are investigating is adding clustered
> indexes to tables that do not have them. Currently, about 90 tables out o
f
> 200 don't have clustered indexes. The long-term solution is to analyze ea
ch
> table and determine what the best clustered index is. As a shorter-term
> solution I've done some analysis to determine what some of the best
> candidates would be by writing a query to analyze primary keys, identity
> columns, and unique indexes. The problem is that many of these tables hav
e
> none of those (primary keys, identity columns, or unique indexes).
> With the understanding that this database has much to be redesigned (meani
ng
> I'm not currently looking for comments on bad design), this is my question
:
> What are the benefits / divantages of adding an IDENTITY column and a
> CLUSTERED INDEX on that identity column to every table that does not have
a
> clustered index, primary key, identity column, and unique index. I'm main
ly
> looking for insight into any divantages or problems this will cause. T
he
> reason for the identity column is that it would be a "safe" column to add
> from an application perspective and would be a decent candidate for the
> clustered index.
> I'm very hesitant about doing this blindly and am sing to find out if i
ts
> feasible in the short-term to gain performance by eliminating heaps in thi
s
> manner without introducing other problems.
> Thanks,
> Mike Jansen
>
>|||> Why do your tables not have keys?
You would have to ask people who left before I started.

> If the column is not there for any good reason, I see no advantage in
> adding it. But I really am curious why you have enough tables without the
> above elements where this is even a concern or some desperate grasp at
> using a jackhammer to hang a picture...
Adding the identity column has no other purpose other than to create a
decent candidate for a clustered index (because its an always-increasing
value and won't be changed) where time is lacking to determine a better one.
The idea I'm trying to get valid feedback on is: Generally speaking, does
adding an identity column and a clustered index on that identity column give
me a performance gain over a heap without introducing any significant
problems? And if it does introduce problems, what might they be? The
assumption being just having a clustered index on a table will perform
better on SELECTs than SELECTs on a heap. Adding the identity column and
clustering on it is to avoid performance problems during INSERTs or UPDATEs
from a poorly chosen clustering index without having to do the full analysis
of all 50 tables. I realize that better analysis will yield a better
clustering index. So I'm not looking for comments on that. I'm looking for
input on whether this will truly be better performing than the heaps (which
I believe is "yes") and are there any drawbacks or divantages (apart
from the extra disk space for the identity column).
This is just a small piece of short-to-mid-term performance improvements.
The mid-to-long-term improvements include more drastic analysis and
redesign. I can't satisfy your curiosity about the origin of these
problems, because I don't know them.
Thanks for any constructive help,
Mike|||> The idea I'm trying to get valid feedback on is: Generally speaking, does
> adding an identity column and a clustered index on that identity column
> give me a performance gain
A gain WHERE? If it's just a heap, then I don't know whether your INSERT
performance will improve, since it shouldn't have much effect on the
physical location of new rows. And I don't know what your queries look
like, so it's tough to comment on that as well. If you are doing a lot of
range queries on dates, for example, it would make much more sense for the
clustered index to be on the column(s) with date-related data.
I would say that the question is very difficult to answer "in general." It
seems like a very haphazard approach to me, and you will probably do better
to spend a day or two analyzing the impact of adding sensible keys
(regardless of whether they are natural or surrogates like identity), where
clustered indexes should be, and why -- e.g. what is each table being used
for, how high is the traffic, and what kind of queries are having
performance issues. If you give it a workload, the index tuning wizard
should give a better suggestion than just a blanket "throw identities on all
tables."
A|||Thanks Alejandro. The identity column was generically chosen to avoid
performance hits during INSERTs and UPDATEs. We aren't looking for the
optimal query performance boost in this but simply any amount of boost.
Getting the optimal boost will take more time to analyze than we have for
the short-term. The optimal boost will come later when we take the time to
do a more detailed analysis.
My main concerns are 1) that we are actually getting a performance boost on
SELECTs just by adding a clustered index (compared to having heap) and 2)
that we aren't introducing anything negative because of the clustered index
being clustered on an identity column created solely for the purpose of the
clustered index.
Thanks,
Mike|||You didn't mention the method you are currently using to determine an
indexing strategy. You can try to infer this by examing the SQL selects, but
using the Show Execution Plan option of Query Analyzer can be more
revealing. Be sure you understand what a clustered index is and how it
affects the physical sorting (or re-sorting) of pages in the table.
"Mike Jansen" <mjansen_nntp@.mail.com> wrote in message
news:%23SSNMtfkFHA.2484@.TK2MSFTNGP15.phx.gbl...
> We are currently addressing many performance issues in an existing
> application. One of the items we are investigating is adding clustered
> indexes to tables that do not have them. Currently, about 90 tables out
> of 200 don't have clustered indexes. The long-term solution is to analyze
> each table and determine what the best clustered index is. As a
> shorter-term solution I've done some analysis to determine what some of
> the best candidates would be by writing a query to analyze primary keys,
> identity columns, and unique indexes. The problem is that many of these
> tables have none of those (primary keys, identity columns, or unique
> indexes).
> With the understanding that this database has much to be redesigned
> (meaning I'm not currently looking for comments on bad design), this is my
> question:
> What are the benefits / divantages of adding an IDENTITY column and a
> CLUSTERED INDEX on that identity column to every table that does not have
> a clustered index, primary key, identity column, and unique index. I'm
> mainly looking for insight into any divantages or problems this will
> cause. The reason for the identity column is that it would be a "safe"
> column to add from an application perspective and would be a decent
> candidate for the clustered index.
> I'm very hesitant about doing this blindly and am sing to find out if
> its feasible in the short-term to gain performance by eliminating heaps in
> this manner without introducing other problems.
> Thanks,
> Mike Jansen
>|||Hey Mike,
Before you do anything in the short term, you need to sit down and
write out a long-term plan. I realize that you (like most of us) are
pressed for time, and are trying to solve the problem as quickly as you
can to move on to bigger issues. However, if you're honest with
yourself, you'd probably admit that many short-term solutions NEVER get
revisited.
I'm not saying that a short-term solution is not a good idea (sometimes
you gotta do what you gotta do); I am saying that you should make sure
that it is a short-term solution. Six months from now, if your shim is
still holding up the table, you may never get around to fixing the
problem. Have a plan to address the problem, even if you can't get to
it today; set a date for fixing the problem. (BTW, I'm lecturing
myself just as much as I'm lecturing you).
In my experience, any table of a reasonable size (and that definition
varies based on performance) will benefit from a clustered index. If
you're doing a lot of INSERTS and UPDATES, the clustered index belongs
on a monotically increasing column (including IDENTITY values, but
timestamps or DateOfInsert columns might make more sense). If your
data is relatively static (lookup values, etc), then a clustered index
will make more sense on columns where you are retrieving a large range
of data. Regardless of where you place it, most tables benefit from
it.
http://msdn.microsoft.com/library/d...>
_05_5h6b.asp
HTH,
Stu|||Mike,
If there are many deletes, then over time a Heap can take up too much
space, which can hurt Select performance. This is something a Clustered
Index can prevent.
Other than that, I can't see how the performance would increase, other
than purely accidental. Without keys, and without analysing the queries,
you are only guessing. And yes, you can hurt performance if you do not
choose the clustered index correctly.
By the way: although a unique index is preferred, the clustered index
does not have to be unique. So I would definitely NOT add an Identity
column. If you have no clue, and there are already indexes on the table,
then you could choose the narrowest nonclustered index and promote it to
be the clustered index. If there are no existing indexes on the table,
you could create the clustered index on the smallest column with the
most distinct values.
But just like everyone is telling you: the real solution is to create a
well thought out, properly normalized data model, with proper keys and
relations. When that is in place, you might not even have to add any
indexes, other than indexed on the foreign key constraints.
HTH,
Gert-Jan
Mike Jansen wrote:
> We are currently addressing many performance issues in an existing
> application. One of the items we are investigating is adding clustered
> indexes to tables that do not have them. Currently, about 90 tables out o
f
> 200 don't have clustered indexes. The long-term solution is to analyze ea
ch
> table and determine what the best clustered index is. As a shorter-term
> solution I've done some analysis to determine what some of the best
> candidates would be by writing a query to analyze primary keys, identity
> columns, and unique indexes. The problem is that many of these tables hav
e
> none of those (primary keys, identity columns, or unique indexes).
> With the understanding that this database has much to be redesigned (meani
ng
> I'm not currently looking for comments on bad design), this is my question
:
> What are the benefits / divantages of adding an IDENTITY column and a
> CLUSTERED INDEX on that identity column to every table that does not have
a
> clustered index, primary key, identity column, and unique index. I'm main
ly
> looking for insight into any divantages or problems this will cause. T
he
> reason for the identity column is that it would be a "safe" column to add
> from an application perspective and would be a decent candidate for the
> clustered index.
> I'm very hesitant about doing this blindly and am sing to find out if i
ts
> feasible in the short-term to gain performance by eliminating heaps in thi
s
> manner without introducing other problems.
> Thanks,
> Mike Jansen|||Apologies if a different version gets posted; for some reason my
earlier post went off into the ether.
I understand your desire to quickly fix the problem so you can move on
to bigger and better things, but a first step to fixing this is to
write out a plan with definitive dates for fixing all of the
performance issues. Don't let this short-term fix become a permanent
part of your solution (again, I'm lecturing myself just as much as I'm
lecturing you).
I have never encountered a situation where a table of reasonable size
could not benefit from a clustered index; putting the index in the
wrong spot could be bad, but in most cases, a clustered index will
improve performance. If you're doing a lot of INSERTS and UPDATES,
then placing a clustered index on a monotonically increasing value (say
a DateEntered field or DateLoaded or a Timestamp) is a good idea to
avoid page splits. If you place a clustered index on a randomly loaded
field (say a UNIQUEIDENTIFIER or a varchar), you run the risk of
fragmentation. If your data doesn't change that much, the risk is
probably acceptable.
Note that non-clustered indexes include pointers to a clustered index,
so changing the clustered index can have downstream effects on your
nonclustered indexing solution. If you have a lot of non-clustered
indexes (which is likely, given the unstructured "design" you
inherited; been there, done that), be prepared to wait a while as they
get rebuilt when you add a clustered index.
HTH,
Stu

No comments:

Post a Comment