Tuesday, February 14, 2012

Clustered Indexes

I just gone through a datawarehouse CD seminar for SQL2K
but a little confused on something. The author said not
to use clustered indexes with star/snowflake schemas. Is
there a reason for this? The author never explained why
we should not use clustered indexes in a datawarehouse.
Does anyone have an answer?
In my not-so-humble opinion, that's absolutely insane. I've very rarely
found a reason not to have a clustered index, and the time I can remember
off the top of my head had something to do with strange real-time insert and
update patterns causing page splits. As a data warehouse will generally not
be loaded in real time (but rather through an ETL process), there is no such
concern; query performance should be the primary focus. Clustered indexes
are invaluable for tuning certain types of queries. To not use them would
be almost to doom your project to failure...
Out of curiosity, what CD/seminar is it? Can you post some more about the
reasons the author gave? Perhaps there was some context or certain
situations mentioned?
Adam Machanic
SQL Server MVP
http://www.sqljunkies.com/weblog/amachanic
<anonymous@.discussions.microsoft.com> wrote in message
news:0bfe01c4bbc0$00f66840$a401280a@.phx.gbl...
> I just gone through a datawarehouse CD seminar for SQL2K
> but a little confused on something. The author said not
> to use clustered indexes with star/snowflake schemas. Is
> there a reason for this? The author never explained why
> we should not use clustered indexes in a datawarehouse.
> Does anyone have an answer?
|||Thanks for the reply. The CD was from Learnkey. It
mentions that when you use clustered indexes it will make
the non-clustered index huge.
|||Depends on how much you have in the cluster and what your definition of
'huge' is
Adam Machanic
SQL Server MVP
http://www.sqljunkies.com/weblog/amachanic
<anonymous@.discussions.microsoft.com> wrote in message
news:174d01c4bc4c$4f113280$a601280a@.phx.gbl...
> Thanks for the reply. The CD was from Learnkey. It
> mentions that when you use clustered indexes it will make
> the non-clustered index huge.
|||Let me help... Although I am an author for Learnkey, I did not do the
current SQL 2k Data Warehouse stuff, but I think I can tell you what the
author was thinking about here..
Think about the leaf level of non-clustered indexes. Each entry contains the
key and the row locator(pointer the actual data row.) When the non-clustered
index is built on a heap ( no clustered index exists for the table) , the
row locator is the physical address of the data row ( I think 6 bytes - it
is the file#-Page#-Row#onthePage).
If you build a clustered index on this same table on the lastname column (
which is perhaps varchar(24) with an average length of 8 characters). The
row locator for EACH of the non-clustered indexes is replaced by the
clustered index key... We just grew the leaf entry for the row locator from
6 to 8 bytes...
So the size of the clustered index key matters.
I do not know if I have the same opinion as the author you mention ( I'll
have to think about it.) But now you know why he said it..
Wayne Snyder, MCDBA, SQL Server MVP
Mariner, Charlotte, NC
www.mariner-usa.com
(Please respond only to the newsgroups.)
I support the Professional Association of SQL Server (PASS) and it's
community of SQL Server professionals.
www.sqlpass.org
<anonymous@.discussions.microsoft.com> wrote in message
news:174d01c4bc4c$4f113280$a601280a@.phx.gbl...
> Thanks for the reply. The CD was from Learnkey. It
> mentions that when you use clustered indexes it will make
> the non-clustered index huge.
|||never, ever, ever, ever, ever
base a clustered index on anything other than an integer or date.
it slows down all of the other indexes on the table. I prefer using INTs for
clustered indexes-- especallyt for hash indexing.. but there are some times
that it is really helpful to use a date for a clustered index.
"Wayne Snyder" <wayne.nospam.snyder@.mariner-usa.com> wrote in message
news:uXp9Z59yEHA.3708@.TK2MSFTNGP14.phx.gbl...
> Let me help... Although I am an author for Learnkey, I did not do the
> current SQL 2k Data Warehouse stuff, but I think I can tell you what the
> author was thinking about here..
> Think about the leaf level of non-clustered indexes. Each entry contains
the
> key and the row locator(pointer the actual data row.) When the
non-clustered
> index is built on a heap ( no clustered index exists for the table) , the
> row locator is the physical address of the data row ( I think 6 bytes - it
> is the file#-Page#-Row#onthePage).
> If you build a clustered index on this same table on the lastname column (
> which is perhaps varchar(24) with an average length of 8 characters). The
> row locator for EACH of the non-clustered indexes is replaced by the
> clustered index key... We just grew the leaf entry for the row locator
from
> 6 to 8 bytes...
> So the size of the clustered index key matters.
> I do not know if I have the same opinion as the author you mention ( I'll
> have to think about it.) But now you know why he said it..
> --
> Wayne Snyder, MCDBA, SQL Server MVP
> Mariner, Charlotte, NC
> www.mariner-usa.com
> (Please respond only to the newsgroups.)
> I support the Professional Association of SQL Server (PASS) and it's
> community of SQL Server professionals.
> www.sqlpass.org
> <anonymous@.discussions.microsoft.com> wrote in message
> news:174d01c4bc4c$4f113280$a601280a@.phx.gbl...
>
|||Aaron,
I agree. Clustered indexes on int, bigint, datetime things that make
natural range lookus work best. But exactly how does using other types slow
other nonclustered indexes?
Lets say I need to be able to get all the people who's last name begins with
a letter. So the clustered index for the table is set to LastNameInitial
char(1). How would this slow down other indexes or am I just choosing a
specific exception to a general rule?
Danny
"aaron kempf" <aaron_kempf@.hotmail.com> wrote in message
news:OU$xvnc0EHA.3708@.TK2MSFTNGP14.phx.gbl...
> never, ever, ever, ever, ever
> base a clustered index on anything other than an integer or date.
> it slows down all of the other indexes on the table. I prefer using INTs
> for
> clustered indexes-- especallyt for hash indexing.. but there are some
> times
> that it is really helpful to use a date for a clustered index.
>
>
> "Wayne Snyder" <wayne.nospam.snyder@.mariner-usa.com> wrote in message
> news:uXp9Z59yEHA.3708@.TK2MSFTNGP14.phx.gbl...
> the
> non-clustered
> from
>
|||"aaron kempf" <aaron_kempf@.hotmail.com> wrote in message
news:OU$xvnc0EHA.3708@.TK2MSFTNGP14.phx.gbl...
> never, ever, ever, ever, ever
> base a clustered index on anything other than an integer or date.
> it slows down all of the other indexes on the table. I prefer using INTs
for
> clustered indexes-- especallyt for hash indexing.. but there are some
times
> that it is really helpful to use a date for a clustered index.
I don't agree, at all.
Use whatever is natural to use. Don't constrain yourself with these
rules.
Adam Machanic
SQL Server MVP
http://www.sqljunkies.com/weblog/amachanic
|||"Danny" <istdrs@.flash.net> wrote in message
news:Vq%od.23751$Rf1.15788@.newssvr19.news.prodigy. com...
> Lets say I need to be able to get all the people who's last name begins
with
> a letter. So the clustered index for the table is set to LastNameInitial
> char(1). How would this slow down other indexes or am I just choosing a
> specific exception to a general rule?
There is no rule, general or otherwise.
Remember that columns from clustered indexes will be duplicated as the
leaves of non-clustered indexes; so try to keep them fairly narrow. I
assume that's what Aaron was referring to. However, there are times when a
wide clustered index works quite well, and definitely lots of situations in
which using non-integers/dates is appropriate.
Adam Machanic
SQL Server MVP
http://www.sqljunkies.com/weblog/amachanic
|||so anyone got the best site explaining in detail cluster and non-cluster
index which impact a HUGE system (production). TQ
"anonymous@.discussions.microsoft.com" wrote:

> I just gone through a datawarehouse CD seminar for SQL2K
> but a little confused on something. The author said not
> to use clustered indexes with star/snowflake schemas. Is
> there a reason for this? The author never explained why
> we should not use clustered indexes in a datawarehouse.
> Does anyone have an answer?
>

No comments:

Post a Comment