Clustered DB Server: Clustered index vs. nonclustered index for GUID primary key

Hello,
Would someone out there understand and can explain to me why clustered index
for GUID primary key will result a faster select operation when we look up
one row by a primary key like below comparing with non-clustered index for
the primary key? I experimented this myself, so there is no doubt that
clustered index is better but I don't know why.
select *
from MyTable -- there is no other index in this table.
where FileId = 'DAF02BF0-A809-4D59-8887-A2D3ED5C81B1'
I read through several documentations and the B-tree seems similar enough
not to make a difference. In fact, my common sense (not very reliable now)
tells me that the result should be the other way around. Help!!!
Thank you very much in advance,Regardless of the data type, a clustered index seek is more efficient than a
non-clustered seek because a bookmark lookup is not needed. Note that a
clustered index is a B-tree with the actual data pages as leaf nodes. A
non-clustered index is also a B-tree but the leaf nodes are 'pointers' (RID
or clustering key) to the data rows so additional i/o is needed for the
bookmark lookup.
Hope this helps.
Dan Guzman
SQL Server MVP
"Zeng" <zzy@.nonospam.com> wrote in message
news:eMeR3SwiEHA.3988@.tk2msftngp13.phx.gbl...
> Hello,
> Would someone out there understand and can explain to me why clustered
index
> for GUID primary key will result a faster select operation when we look up
> one row by a primary key like below comparing with non-clustered index for
> the primary key? I experimented this myself, so there is no doubt that
> clustered index is better but I don't know why.
> select *
> from MyTable -- there is no other index in this table.
> where FileId = 'DAF02BF0-A809-4D59-8887-A2D3ED5C81B1'
> I read through several documentations and the B-tree seems similar enough
> not to make a difference. In fact, my common sense (not very reliable now)
> tells me that the result should be the other way around. Help!!!
> Thank you very much in advance,
>|||Dan correctly answered your question but I wanted to make sure you need to
be aware of more than the effects of the Bookmark or not when comparing a
clustered index to Non-clustered on a GUID. Since Guid's are random by
nature you can get a tremendous amount of page splitting that can certainly
slow down inserts and increase the size of the table dramatically. The Guid
as a CI will also add an additional 16 bytes on to each row of every
non-clustered index as well. I am not trying to talk you out of using a
GUID as your CI but wanted to be sure you understood the other ramifications
of it. There has been a lot of discussion on this topic in these newsgroups
that you may want to google on before you decide one way or the other.
Andrew J. Kelly SQL MVP
"Zeng" <zzy@.nonospam.com> wrote in message
news:eMeR3SwiEHA.3988@.tk2msftngp13.phx.gbl...
> Hello,
> Would someone out there understand and can explain to me why clustered
index
> for GUID primary key will result a faster select operation when we look up
> one row by a primary key like below comparing with non-clustered index for
> the primary key? I experimented this myself, so there is no doubt that
> clustered index is better but I don't know why.
> select *
> from MyTable -- there is no other index in this table.
> where FileId = 'DAF02BF0-A809-4D59-8887-A2D3ED5C81B1'
> I read through several documentations and the B-tree seems similar enough
> not to make a difference. In fact, my common sense (not very reliable now)
> tells me that the result should be the other way around. Help!!!
> Thank you very much in advance,
>|||thanks but that is still very vague for me, you said the leaf nodes are
pointers, hm you mean leaf nodes contain bunch of pointers. Don't all index
nodes contain bunch of pointers to other index nodes or data pages? Why no
bookmark lookup needed when the acutal data pages are leaf nodes? Please
help...
"Dan Guzman" <guzmanda@.nospam-online.sbcglobal.net> wrote in message
news:%23KLT4bwiEHA.2052@.TK2MSFTNGP15.phx.gbl...
> Regardless of the data type, a clustered index seek is more efficient than
a
> non-clustered seek because a bookmark lookup is not needed. Note that a
> clustered index is a B-tree with the actual data pages as leaf nodes. A
> non-clustered index is also a B-tree but the leaf nodes are 'pointers'
(RID
> or clustering key) to the data rows so additional i/o is needed for the
> bookmark lookup.
> --
> Hope this helps.
> Dan Guzman
> SQL Server MVP
> "Zeng" <zzy@.nonospam.com> wrote in message
> news:eMeR3SwiEHA.3988@.tk2msftngp13.phx.gbl...
> index
up[vbcol=seagreen]
for[vbcol=seagreen]
enough[vbcol=seagreen]
now)[vbcol=seagreen]
>|||With the exception of clustered index leaf nodes, all index nodes contain 'a
bunch of' pointers'. Clustered index leaf nodes contain the actual data
rows instead of pointers so no additional data access are needed.
You might peruse Books Online topics Clustered Indexes
<architec.chm::/8_ar_da2_1tbn.htm> and Non-Clustered Indexes
<architec.chm::/8_ar_da2_75mb.htm> for a thorough discussion on the
differences.
Hope this helps.
Dan Guzman
SQL Server MVP
"Zeng" <zzy@.nonospam.com> wrote in message
news:%23EYAw0wiEHA.704@.TK2MSFTNGP09.phx.gbl...
> thanks but that is still very vague for me, you said the leaf nodes are
> pointers, hm you mean leaf nodes contain bunch of pointers. Don't all
index
> nodes contain bunch of pointers to other index nodes or data pages? Why
no
> bookmark lookup needed when the acutal data pages are leaf nodes? Please
> help...
> "Dan Guzman" <guzmanda@.nospam-online.sbcglobal.net> wrote in message
> news:%23KLT4bwiEHA.2052@.TK2MSFTNGP15.phx.gbl...
than[vbcol=seagreen]
> a
> (RID
look[vbcol=seagreen]
> up
> for
that[vbcol=seagreen]
> enough
> now)
>|||Just to add to what others have said...
If a clustered index has been created for a table, the clustered index *is*
the table - all the columns of the table (and all the rows) are present in
the clustered index. So when you find 'DAF02BF0...' in the clustered index
(which is fast, since it's an equality search on the key, just what indexes
are for), you have also arrived at the entire row of the table and no
further data retrieval is needed to get the columns SELECT * must return.
A non-clustered index is more like a complete index for a book, or a
concordance. A concordance contains every word in the book, and for each
word, the page number(s) where that word appears in the book. If there are
pictures, punctuation, or other things in the book, you'll have to go to the
page to find them - they won't be in the index. If you look up and find
'DAF02BF0...' in a nonclustered index, you won't also find the other values
from that row you need to return SELECT *. You will only find out where to
go to find them, and will need to do more work - the index tells you where
to go by storing either the physical page location of the row or the
clustered index key of the row (the former if the table is not a clustered
index, the latter if the table is a clustered index).
I think a lot of this becomes easier to understand once you realize that the
clustered index is the table. A nonclustered index is only some of the
columns of the table, with references to direct you to the values for the
missing columns.
Steve Kass
Drew University
"Zeng" <zzy@.nonospam.com> wrote in message
news:eMeR3SwiEHA.3988@.tk2msftngp13.phx.gbl...
> Hello,
> Would someone out there understand and can explain to me why clustered
index
> for GUID primary key will result a faster select operation when we look up
> one row by a primary key like below comparing with non-clustered index for
> the primary key? I experimented this myself, so there is no doubt that
> clustered index is better but I don't know why.
> select *
> from MyTable -- there is no other index in this table.
> where FileId = 'DAF02BF0-A809-4D59-8887-A2D3ED5C81B1'
> I read through several documentations and the B-tree seems similar enough
> not to make a difference. In fact, my common sense (not very reliable now)
> tells me that the result should be the other way around. Help!!!
> Thank you very much in advance,
>|||Andrew J. Kelly wrote:
> Dan correctly answered your question but I wanted to make sure you
> need to be aware of more than the effects of the Bookmark or not when
> comparing a clustered index to Non-clustered on a GUID. Since Guid's
> are random by nature you can get a tremendous amount of page
> splitting that can certainly slow down inserts and increase the size
> of the table dramatically. The Guid as a CI will also add an
> additional 16 bytes on to each row of every non-clustered index as
> well. I am not trying to talk you out of using a GUID as your CI but
> wanted to be sure you understood the other ramifications of it.
> There has been a lot of discussion on this topic in these newsgroups
> that you may want to google on before you decide one way or the
> other.
>
I agree with Andrew. If you can, use an IDENTITY column as your PK
instead of a GUID, especially if you are going to use a clustered index.
Not only will you eliminate page splitting, but your clustered key will
be much smaller and this will translate to smaller non-clustered indexes
as well.
David G.|||On 2004-08-26, David G. <david_nospam@.nospam.com> wrote:
> Andrew J. Kelly wrote:
> I agree with Andrew. If you can, use an IDENTITY column as your PK
> instead of a GUID, especially if you are going to use a clustered index.
> Not only will you eliminate page splitting, but your clustered key will
> be much smaller and this will translate to smaller non-clustered indexes
> as well.
What about rowguid columns used for replication? Replication wizzard adds
rowguid columns, and creates clustered index on them if I don't allready
have clustered index on that particular table. Is it then better to change
that rowguid index to non-clustered one?
Mike
--
"I can do it quick. I can do it cheap. I can do it well. Pick any two."
Mario Splivalo
msplival@.jagor.srce.hr|||Mario Splivalo wrote:
> On 2004-08-26, David G. <david_nospam@.nospam.com> wrote:
> What about rowguid columns used for replication? Replication wizzard
> adds rowguid columns, and creates clustered index on them if I don't
> allready have clustered index on that particular table. Is it then
> better to change that rowguid index to non-clustered one?
> Mike
If the table experiences a lot of inserts (highly transactional), then
you will likely see page splits and the hard drive array may be writing
all over the place to get the row on the proper page. You can limit page
splits using a fill factor on the index and reindexing periodically to
keep the free page space available. But the table will be larger as a
result and the drive heads still may be moving around more than
necessary.
I'm not sure why SQL Server adds the index as clustered. Probably just a
default like it uses when adding a PK constraint.
Remember that clustered index keys are part of all non-clustered
indexes. So adding a 16-byte GUID CI adds an additional 16-bytes to each
key in a non-clustered index. If you have a non-clustered index on an
INT IDENTITY, then you'll be increasing storage from 4-bytes to 20-bytes
for each key, which is substantial (makes storage much more costly, not
to mention backups and restore operations, and writing).
I would suggest, as a matter of practice, you add a clustered index to
all tables. That way, SQL Server doesn't do it for you on a column that
probably shouldn't have one.
David G.|||I wasn't aware that the wizard made these clustered. The Guid used for
replication does not need to be the PK or clustered and often is not.
Andrew J. Kelly SQL MVP
"Mario Splivalo" <majk@.fly.srk.fer.hr> wrote in message
news:slrncir0tn.8sl.majk@.fly.srk.fer.hr...
> On 2004-08-26, David G. <david_nospam@.nospam.com> wrote:
> What about rowguid columns used for replication? Replication wizzard adds
> rowguid columns, and creates clustered index on them if I don't allready
> have clustered index on that particular table. Is it then better to change
> that rowguid index to non-clustered one?
> Mike
> --
> "I can do it quick. I can do it cheap. I can do it well. Pick any two."
> Mario Splivalo
> msplival@.jagor.srce.hr

Tuesday, February 14, 2012

Clustered index vs. nonclustered index for GUID primary key

No comments:

Post a Comment

Clustered DB Server

Blog Archive

About Me