Clustered DB Server: Clustered Index on Non-Unique Column

I've done a little research but I'd like to get some opinion from the group.
I'll also preface this by saying that we can't change how it was done in the
first place, only investigate how to fix it if possible.
Let's say you've got a table that has 4,333,776 rows. Currently that table
has a PK that is functioning as a clustered index(it's the identity). You'v
e
then got two more indexes that were placed on this table that are not
clustered.
The problem is that the clustered index is never used, and it appears that
if we moved the clustered to one of the other keys we may gain some
performance. The problem is that the column that is preferrable isn't unique
so SQL Server will add the "uniqueifer" to it. I've checked on the
selectivity ratio of the column and it's currently at .81. Not sure if
that's .81% or 81%. If it's the latter I wonder if it's close enough to the
sometimes suggested 95% unique.
Any thoughts? Any more details I can give to make this more helpful?Is the clustered index unique? If not, why not? Also, what is the PK?
Have you got the complete build script for the table, including indexes and
constraints?
Tom
----
Thomas A. Moreau, BSc, PhD, MCSE, MCDBA, MCITP, MCTS
SQL Server MVP
Toronto, ON Canada
https://mvp.support.microsoft.com/profile/Tom.Moreau
"jason7655" <jason7655@.discussions.microsoft.com> wrote in message
news:036D72E7-F282-4316-AA6B-53F555B9DF19@.microsoft.com...
I've done a little research but I'd like to get some opinion from the group.
I'll also preface this by saying that we can't change how it was done in the
first place, only investigate how to fix it if possible.
Let's say you've got a table that has 4,333,776 rows. Currently that table
has a PK that is functioning as a clustered index(it's the identity).
You've
then got two more indexes that were placed on this table that are not
clustered.
The problem is that the clustered index is never used, and it appears that
if we moved the clustered to one of the other keys we may gain some
performance. The problem is that the column that is preferrable isn't unique
so SQL Server will add the "uniqueifer" to it. I've checked on the
selectivity ratio of the column and it's currently at .81. Not sure if
that's .81% or 81%. If it's the latter I wonder if it's close enough to the
sometimes suggested 95% unique.
Any thoughts? Any more details I can give to make this more helpful?|||It does sound like the clustered index is being wasted on field that doesn't
usualy get searched.
Clustered indexes are great for range queries. Say you wanted to get a list
of records from 07/010/2007 to 07/31/2007, having a clueterd index on the
date field would be great.
Nonclustered indexes are better for queries that return one or a few records
.
--
MG
"jason7655" wrote:

> I've done a little research but I'd like to get some opinion from the grou
p.
> I'll also preface this by saying that we can't change how it was done in t
he
> first place, only investigate how to fix it if possible.
> Let's say you've got a table that has 4,333,776 rows. Currently that tabl
e
> has a PK that is functioning as a clustered index(it's the identity). You
've
> then got two more indexes that were placed on this table that are not
> clustered.
> The problem is that the clustered index is never used, and it appears that
> if we moved the clustered to one of the other keys we may gain some
> performance. The problem is that the column that is preferrable isn't uniq
ue
> so SQL Server will add the "uniqueifer" to it. I've checked on the
> selectivity ratio of the column and it's currently at .81. Not sure if
> that's .81% or 81%. If it's the latter I wonder if it's close enough to th
e
> sometimes suggested 95% unique.
> Any thoughts? Any more details I can give to make this more helpful?|||jason7655 wrote:
> I've done a little research but I'd like to get some opinion from the grou
p.
> I'll also preface this by saying that we can't change how it was done in t
he
> first place, only investigate how to fix it if possible.
> Let's say you've got a table that has 4,333,776 rows. Currently that tabl
e
> has a PK that is functioning as a clustered index(it's the identity). You
've
> then got two more indexes that were placed on this table that are not
> clustered.
> The problem is that the clustered index is never used, and it appears that
> if we moved the clustered to one of the other keys we may gain some
> performance. The problem is that the column that is preferrable isn't uniq
ue
> so SQL Server will add the "uniqueifer" to it. I've checked on the
> selectivity ratio of the column and it's currently at .81. Not sure if
> that's .81% or 81%. If it's the latter I wonder if it's close enough to th
e
> sometimes suggested 95% unique.
> Any thoughts? Any more details I can give to make this more helpful?
Why is it a problem that the clustered index is never used?
Anyway, if the clustered index is not unique, then a uniqueifier is only
added to "duplicate" rows. For those rows, the key requires more
storage. Depending on your table structure it will require anywhere
between 6 and 8 bytes more per row. This extra space might not be a
problem for you, but that is something for you to decide.
Another thing to remember is that the clustered index keys are also
stored in all nonclustered indexes. So a narrow clustered index key
(such as int) is preferred over a wide index key. If you choose a very
wide clustered index key, then you other nonclustered index will grow a
lot (and slow down just as much).
HTH,
Gert-Jan|||PK = Primary Key
Let me try to explain a little better.
This table has 3 indexes.
Index 1
Primary key, clustered. 1,2,3,4,5,etc.
Index 2
non-clustered. Use 2nd most.
Index 3
non-clustered, but the main column used by most queries. Used the most.
From my limited reading of sites (including
http://www.sql-server-performance.c...ndexes_p1.aspx,
http://www.sql-server-performance.c...ndexes_p2.aspx, and
various other documents) it would seem that we could gain performance by
changing the Index 3 to clustered.
It's not unique and that raises another question. I would hope that the SQL
Server "uniquefier" that gets added would not change the makeup of the
current data, only create a new column.
Does that make things more clear? Is this a good idea?
"Gert-Jan Strik" wrote:

> jason7655 wrote:
> Why is it a problem that the clustered index is never used?
> Anyway, if the clustered index is not unique, then a uniqueifier is only
> added to "duplicate" rows. For those rows, the key requires more
> storage. Depending on your table structure it will require anywhere
> between 6 and 8 bytes more per row. This extra space might not be a
> problem for you, but that is something for you to decide.
> Another thing to remember is that the clustered index keys are also
> stored in all nonclustered indexes. So a narrow clustered index key
> (such as int) is preferred over a wide index key. If you choose a very
> wide clustered index key, then you other nonclustered index will grow a
> lot (and slow down just as much).
> HTH,
> Gert-Jan
>|||I've also found the following snippet that has gotten me a little confused
and seems to go counter to other things I've read:
"Think of a clustered index as the foundation of the table. Keep it simple
and clean. The point is that if you add non-clustered indexes to the table,
they repeat the clustered index and add the indexed column(s). So ideally
your clustered index should be on a single column.
Ideally, a clustered index should be on a column where new rows have a
steadily increasing value. Otherwise the clustered index will get fragmented
,
and so will the non-clustered indexes based on it.
In short, for a table that sees lots of inserts, use only the identity
column for the clustered index."
(http://sql-server-performance.com/C...3961.aspx#53961)
I could use some clarification on this one.
"jason7655" wrote:
[vbcol=seagreen]
> PK = Primary Key
> Let me try to explain a little better.
> This table has 3 indexes.
> Index 1
> Primary key, clustered. 1,2,3,4,5,etc.
> Index 2
> non-clustered. Use 2nd most.
> Index 3
> non-clustered, but the main column used by most queries. Used the most.
> From my limited reading of sites (including
> http://www.sql-server-performance.c...ndexes_p1.aspx,
> http://www.sql-server-performance.c...ndexes_p2.aspx, and
> various other documents) it would seem that we could gain performance by
> changing the Index 3 to clustered.
> It's not unique and that raises another question. I would hope that the SQ
L
> Server "uniquefier" that gets added would not change the makeup of the
> current data, only create a new column.
> Does that make things more clear? Is this a good idea?
> "Gert-Jan Strik" wrote:
>|||Jason,
Finding rows by seeking the clustered index is less expensive than
seeking a nonclustered index followed by a bookmark lookup. However, for
exact match selects, the difference usually isn't that big, for example
WHERE my_id = 15.
Where the clustered index really comes in handy is when you select a
range that potentially returns a lot of rows (or causes a lot of rows to
be processed). For example WHERE order_date >= '20070701' AND order_date
< '20070801'. In that case, the column order_date would really benefit
from a clustered index.
Another situation is a table that is joined very often to the table that
it is referrencing, and the Primary Key contains the Foreign Key of the
referenced table. For example a "Order Details" table with a Primary Key
on (order_id, line_no) that references an Orders table on column
(order_id). In such a case, creating the clustered index on the Primary
Key/Foreign Key can be a good choice. Basically it is the same rule
here. When joining, there are potentially many rows that need to be
processed.
So changing your index 3 to be the clustered index could be a good
choice, but there is not enough information for me to decide. If you
join this table a lot on the columns of index 1, then join performance
could drop. You should analyse your workload, and decide which queries
are most important (with respect to performance). The, if possible, you
should test the different scenario's and see what works best for you.
BTW: if the clustered index is defined as not unique, the uniqueifier is
added under the covers. Its existance of columns values will never be
exposed when you select from the table.
As for your other post, see my opinion inline:

> I've also found the following snippet that has gotten me a little confused
> and seems to go counter to other things I've read:
> "Think of a clustered index as the foundation of the table. Keep it simple
> and clean. The point is that if you add non-clustered indexes to the table
,
> they repeat the clustered index and add the indexed column(s).
Correct.

> So ideally your clustered index should be on a single column.
The point is not how many columns it concerns. The point is how many
bytes the key uses. From that point of view, two int columns are better
than one varchar(50) column with an average fill of 25 characters. Even
6 int columns are better than a varchar(50) column with average string
size of 25 characters.

> Ideally, a clustered index should be on a column where new rows have a
> steadily increasing value. Otherwise the clustered index will get fragmented,[/vbc
ol]
True. So from that point of view, a clustered index on a GUID is not a
good idea, unless you defragment regularly.
[vbcol=seagreen]
> and so will the non-clustered indexes based on it.
I think this is false. Nonclustered indexes will fragment if new values
of the (nonclustered) indexed key are not "steadily increasing".
However, since nonclustered indexes are much smaller than the clustered
index (which includes the table data), the cost of fragmentation of a
nonclustered index is typically a lot smaller. Of course, regular
defragmenting is the solution for this too.
There is one exception. If your nonclustered index has a very low
selectivy (just a few distinct values), then the nonclustered index'
fragmentation will be very similar to the clustered index'
fragmentation.

> In short, for a table that sees lots of inserts, use only the identity
> column for the clustered index."
> (http://sql-server-performance.com/C...3961.aspx#53961)[/vbco
l]
I definitely disagree with this statement as a general rule, especially
because of the compulsory tone.
For starters, if you don't use Identity as a (surrogate) key, then IMO
you should not add one.
Next, in many situations, the primary key's index is used a lot in
joins, but in other situations (such as filtering on "Lookup Tables")
the optimizer's access path is different and the primary key's index is
used very little.
And finally: it is very unusual to have range selects on an Identity
column. Such range selects are much more common for datetime columns.
So putting a clustered index on an Identity Primary Key could be a good
choice, but it really depends on your situation, and IMO you should not
have policy to always add an Identity and/or to always create the
clustered index on the Identity.
HTH,
Gert-Jan
jason7655 wrote:[vbcol=seagreen]
> PK = Primary Key
> Let me try to explain a little better.
> This table has 3 indexes.
> Index 1
> Primary key, clustered. 1,2,3,4,5,etc.
> Index 2
> non-clustered. Use 2nd most.
> Index 3
> non-clustered, but the main column used by most queries. Used the most.
> From my limited reading of sites (including
> http://www.sql-server-performance.c...ndexes_p1.aspx,
> http://www.sql-server-performance.c...ndexes_p2.aspx, and
> various other documents) it would seem that we could gain performance by
> changing the Index 3 to clustered.
> It's not unique and that raises another question. I would hope that the SQ
L
> Server "uniquefier" that gets added would not change the makeup of the
> current data, only create a new column.
> Does that make things more clear? Is this a good idea?
>
[snip]

Sunday, February 12, 2012

Clustered Index on Non-Unique Column

No comments:

Post a Comment

Clustered DB Server

Blog Archive

About Me