Tuesday, February 14, 2012

Clustered index vs. nonclustered index for GUID primary key

Hello,
Would someone out there understand and can explain to me why clustered index
for GUID primary key will result a faster select operation when we look up
one row by a primary key like below comparing with non-clustered index for
the primary key? I experimented this myself, so there is no doubt that
clustered index is better but I don't know why.
select *
from MyTable -- there is no other index in this table.
where FileId = 'DAF02BF0-A809-4D59-8887-A2D3ED5C81B1'
I read through several documentations and the B-tree seems similar enough
not to make a difference. In fact, my common sense (not very reliable now)
tells me that the result should be the other way around. Help!!!
Thank you very much in advance,Regardless of the data type, a clustered index seek is more efficient than a
non-clustered seek because a bookmark lookup is not needed. Note that a
clustered index is a B-tree with the actual data pages as leaf nodes. A
non-clustered index is also a B-tree but the leaf nodes are 'pointers' (RID
or clustering key) to the data rows so additional i/o is needed for the
bookmark lookup.
--
Hope this helps.
Dan Guzman
SQL Server MVP
"Zeng" <zzy@.nonospam.com> wrote in message
news:eMeR3SwiEHA.3988@.tk2msftngp13.phx.gbl...
> Hello,
> Would someone out there understand and can explain to me why clustered
index
> for GUID primary key will result a faster select operation when we look up
> one row by a primary key like below comparing with non-clustered index for
> the primary key? I experimented this myself, so there is no doubt that
> clustered index is better but I don't know why.
> select *
> from MyTable -- there is no other index in this table.
> where FileId = 'DAF02BF0-A809-4D59-8887-A2D3ED5C81B1'
> I read through several documentations and the B-tree seems similar enough
> not to make a difference. In fact, my common sense (not very reliable now)
> tells me that the result should be the other way around. Help!!!
> Thank you very much in advance,
>|||Dan correctly answered your question but I wanted to make sure you need to
be aware of more than the effects of the Bookmark or not when comparing a
clustered index to Non-clustered on a GUID. Since Guid's are random by
nature you can get a tremendous amount of page splitting that can certainly
slow down inserts and increase the size of the table dramatically. The Guid
as a CI will also add an additional 16 bytes on to each row of every
non-clustered index as well. I am not trying to talk you out of using a
GUID as your CI but wanted to be sure you understood the other ramifications
of it. There has been a lot of discussion on this topic in these newsgroups
that you may want to google on before you decide one way or the other.
--
Andrew J. Kelly SQL MVP
"Zeng" <zzy@.nonospam.com> wrote in message
news:eMeR3SwiEHA.3988@.tk2msftngp13.phx.gbl...
> Hello,
> Would someone out there understand and can explain to me why clustered
index
> for GUID primary key will result a faster select operation when we look up
> one row by a primary key like below comparing with non-clustered index for
> the primary key? I experimented this myself, so there is no doubt that
> clustered index is better but I don't know why.
> select *
> from MyTable -- there is no other index in this table.
> where FileId = 'DAF02BF0-A809-4D59-8887-A2D3ED5C81B1'
> I read through several documentations and the B-tree seems similar enough
> not to make a difference. In fact, my common sense (not very reliable now)
> tells me that the result should be the other way around. Help!!!
> Thank you very much in advance,
>|||thanks but that is still very vague for me, you said the leaf nodes are
pointers, hm you mean leaf nodes contain bunch of pointers. Don't all index
nodes contain bunch of pointers to other index nodes or data pages? Why no
bookmark lookup needed when the acutal data pages are leaf nodes? Please
help...
"Dan Guzman" <guzmanda@.nospam-online.sbcglobal.net> wrote in message
news:%23KLT4bwiEHA.2052@.TK2MSFTNGP15.phx.gbl...
> Regardless of the data type, a clustered index seek is more efficient than
a
> non-clustered seek because a bookmark lookup is not needed. Note that a
> clustered index is a B-tree with the actual data pages as leaf nodes. A
> non-clustered index is also a B-tree but the leaf nodes are 'pointers'
(RID
> or clustering key) to the data rows so additional i/o is needed for the
> bookmark lookup.
> --
> Hope this helps.
> Dan Guzman
> SQL Server MVP
> "Zeng" <zzy@.nonospam.com> wrote in message
> news:eMeR3SwiEHA.3988@.tk2msftngp13.phx.gbl...
> > Hello,
> >
> > Would someone out there understand and can explain to me why clustered
> index
> > for GUID primary key will result a faster select operation when we look
up
> > one row by a primary key like below comparing with non-clustered index
for
> > the primary key? I experimented this myself, so there is no doubt that
> > clustered index is better but I don't know why.
> >
> > select *
> > from MyTable -- there is no other index in this table.
> > where FileId = 'DAF02BF0-A809-4D59-8887-A2D3ED5C81B1'
> >
> > I read through several documentations and the B-tree seems similar
enough
> > not to make a difference. In fact, my common sense (not very reliable
now)
> > tells me that the result should be the other way around. Help!!!
> >
> > Thank you very much in advance,
> >
> >
>|||With the exception of clustered index leaf nodes, all index nodes contain 'a
bunch of' pointers'. Clustered index leaf nodes contain the actual data
rows instead of pointers so no additional data access are needed.
You might peruse Books Online topics Clustered Indexes
<architec.chm::/8_ar_da2_1tbn.htm> and Non-Clustered Indexes
<architec.chm::/8_ar_da2_75mb.htm> for a thorough discussion on the
differences.
--
Hope this helps.
Dan Guzman
SQL Server MVP
"Zeng" <zzy@.nonospam.com> wrote in message
news:%23EYAw0wiEHA.704@.TK2MSFTNGP09.phx.gbl...
> thanks but that is still very vague for me, you said the leaf nodes are
> pointers, hm you mean leaf nodes contain bunch of pointers. Don't all
index
> nodes contain bunch of pointers to other index nodes or data pages? Why
no
> bookmark lookup needed when the acutal data pages are leaf nodes? Please
> help...
> "Dan Guzman" <guzmanda@.nospam-online.sbcglobal.net> wrote in message
> news:%23KLT4bwiEHA.2052@.TK2MSFTNGP15.phx.gbl...
> > Regardless of the data type, a clustered index seek is more efficient
than
> a
> > non-clustered seek because a bookmark lookup is not needed. Note that a
> > clustered index is a B-tree with the actual data pages as leaf nodes. A
> > non-clustered index is also a B-tree but the leaf nodes are 'pointers'
> (RID
> > or clustering key) to the data rows so additional i/o is needed for the
> > bookmark lookup.
> >
> > --
> > Hope this helps.
> >
> > Dan Guzman
> > SQL Server MVP
> >
> > "Zeng" <zzy@.nonospam.com> wrote in message
> > news:eMeR3SwiEHA.3988@.tk2msftngp13.phx.gbl...
> > > Hello,
> > >
> > > Would someone out there understand and can explain to me why clustered
> > index
> > > for GUID primary key will result a faster select operation when we
look
> up
> > > one row by a primary key like below comparing with non-clustered index
> for
> > > the primary key? I experimented this myself, so there is no doubt
that
> > > clustered index is better but I don't know why.
> > >
> > > select *
> > > from MyTable -- there is no other index in this table.
> > > where FileId = 'DAF02BF0-A809-4D59-8887-A2D3ED5C81B1'
> > >
> > > I read through several documentations and the B-tree seems similar
> enough
> > > not to make a difference. In fact, my common sense (not very reliable
> now)
> > > tells me that the result should be the other way around. Help!!!
> > >
> > > Thank you very much in advance,
> > >
> > >
> >
> >
>|||Just to add to what others have said...
If a clustered index has been created for a table, the clustered index *is*
the table - all the columns of the table (and all the rows) are present in
the clustered index. So when you find 'DAF02BF0...' in the clustered index
(which is fast, since it's an equality search on the key, just what indexes
are for), you have also arrived at the entire row of the table and no
further data retrieval is needed to get the columns SELECT * must return.
A non-clustered index is more like a complete index for a book, or a
concordance. A concordance contains every word in the book, and for each
word, the page number(s) where that word appears in the book. If there are
pictures, punctuation, or other things in the book, you'll have to go to the
page to find them - they won't be in the index. If you look up and find
'DAF02BF0...' in a nonclustered index, you won't also find the other values
from that row you need to return SELECT *. You will only find out where to
go to find them, and will need to do more work - the index tells you where
to go by storing either the physical page location of the row or the
clustered index key of the row (the former if the table is not a clustered
index, the latter if the table is a clustered index).
I think a lot of this becomes easier to understand once you realize that the
clustered index is the table. A nonclustered index is only some of the
columns of the table, with references to direct you to the values for the
missing columns.
Steve Kass
Drew University
"Zeng" <zzy@.nonospam.com> wrote in message
news:eMeR3SwiEHA.3988@.tk2msftngp13.phx.gbl...
> Hello,
> Would someone out there understand and can explain to me why clustered
index
> for GUID primary key will result a faster select operation when we look up
> one row by a primary key like below comparing with non-clustered index for
> the primary key? I experimented this myself, so there is no doubt that
> clustered index is better but I don't know why.
> select *
> from MyTable -- there is no other index in this table.
> where FileId = 'DAF02BF0-A809-4D59-8887-A2D3ED5C81B1'
> I read through several documentations and the B-tree seems similar enough
> not to make a difference. In fact, my common sense (not very reliable now)
> tells me that the result should be the other way around. Help!!!
> Thank you very much in advance,
>|||Andrew J. Kelly wrote:
> Dan correctly answered your question but I wanted to make sure you
> need to be aware of more than the effects of the Bookmark or not when
> comparing a clustered index to Non-clustered on a GUID. Since Guid's
> are random by nature you can get a tremendous amount of page
> splitting that can certainly slow down inserts and increase the size
> of the table dramatically. The Guid as a CI will also add an
> additional 16 bytes on to each row of every non-clustered index as
> well. I am not trying to talk you out of using a GUID as your CI but
> wanted to be sure you understood the other ramifications of it.
> There has been a lot of discussion on this topic in these newsgroups
> that you may want to google on before you decide one way or the
> other.
>
I agree with Andrew. If you can, use an IDENTITY column as your PK
instead of a GUID, especially if you are going to use a clustered index.
Not only will you eliminate page splitting, but your clustered key will
be much smaller and this will translate to smaller non-clustered indexes
as well.
--
David G.|||On 2004-08-26, David G. <david_nospam@.nospam.com> wrote:
> Andrew J. Kelly wrote:
>> Dan correctly answered your question but I wanted to make sure you
>> need to be aware of more than the effects of the Bookmark or not when
>> comparing a clustered index to Non-clustered on a GUID. Since Guid's
>> are random by nature you can get a tremendous amount of page
>> splitting that can certainly slow down inserts and increase the size
>> of the table dramatically. The Guid as a CI will also add an
>> additional 16 bytes on to each row of every non-clustered index as
>> well. I am not trying to talk you out of using a GUID as your CI but
>> wanted to be sure you understood the other ramifications of it.
>> There has been a lot of discussion on this topic in these newsgroups
>> that you may want to google on before you decide one way or the
>> other.
> I agree with Andrew. If you can, use an IDENTITY column as your PK
> instead of a GUID, especially if you are going to use a clustered index.
> Not only will you eliminate page splitting, but your clustered key will
> be much smaller and this will translate to smaller non-clustered indexes
> as well.
What about rowguid columns used for replication? Replication wizzard adds
rowguid columns, and creates clustered index on them if I don't allready
have clustered index on that particular table. Is it then better to change
that rowguid index to non-clustered one?
Mike
--
"I can do it quick. I can do it cheap. I can do it well. Pick any two."
Mario Splivalo
msplival@.jagor.srce.hr|||Mario Splivalo wrote:
> On 2004-08-26, David G. <david_nospam@.nospam.com> wrote:
>> Andrew J. Kelly wrote:
>> Dan correctly answered your question but I wanted to make sure you
>> need to be aware of more than the effects of the Bookmark or not
>> when comparing a clustered index to Non-clustered on a GUID. Since
>> Guid's are random by nature you can get a tremendous amount of page
>> splitting that can certainly slow down inserts and increase the size
>> of the table dramatically. The Guid as a CI will also add an
>> additional 16 bytes on to each row of every non-clustered index as
>> well. I am not trying to talk you out of using a GUID as your CI
>> but wanted to be sure you understood the other ramifications of it.
>> There has been a lot of discussion on this topic in these newsgroups
>> that you may want to google on before you decide one way or the
>> other.
>>
>> I agree with Andrew. If you can, use an IDENTITY column as your PK
>> instead of a GUID, especially if you are going to use a clustered
>> index. Not only will you eliminate page splitting, but your
>> clustered key will be much smaller and this will translate to
>> smaller non-clustered indexes as well.
> What about rowguid columns used for replication? Replication wizzard
> adds rowguid columns, and creates clustered index on them if I don't
> allready have clustered index on that particular table. Is it then
> better to change that rowguid index to non-clustered one?
> Mike
If the table experiences a lot of inserts (highly transactional), then
you will likely see page splits and the hard drive array may be writing
all over the place to get the row on the proper page. You can limit page
splits using a fill factor on the index and reindexing periodically to
keep the free page space available. But the table will be larger as a
result and the drive heads still may be moving around more than
necessary.
I'm not sure why SQL Server adds the index as clustered. Probably just a
default like it uses when adding a PK constraint.
Remember that clustered index keys are part of all non-clustered
indexes. So adding a 16-byte GUID CI adds an additional 16-bytes to each
key in a non-clustered index. If you have a non-clustered index on an
INT IDENTITY, then you'll be increasing storage from 4-bytes to 20-bytes
for each key, which is substantial (makes storage much more costly, not
to mention backups and restore operations, and writing).
I would suggest, as a matter of practice, you add a clustered index to
all tables. That way, SQL Server doesn't do it for you on a column that
probably shouldn't have one.
David G.|||I wasn't aware that the wizard made these clustered. The Guid used for
replication does not need to be the PK or clustered and often is not.
--
Andrew J. Kelly SQL MVP
"Mario Splivalo" <majk@.fly.srk.fer.hr> wrote in message
news:slrncir0tn.8sl.majk@.fly.srk.fer.hr...
> On 2004-08-26, David G. <david_nospam@.nospam.com> wrote:
> > Andrew J. Kelly wrote:
> >> Dan correctly answered your question but I wanted to make sure you
> >> need to be aware of more than the effects of the Bookmark or not when
> >> comparing a clustered index to Non-clustered on a GUID. Since Guid's
> >> are random by nature you can get a tremendous amount of page
> >> splitting that can certainly slow down inserts and increase the size
> >> of the table dramatically. The Guid as a CI will also add an
> >> additional 16 bytes on to each row of every non-clustered index as
> >> well. I am not trying to talk you out of using a GUID as your CI but
> >> wanted to be sure you understood the other ramifications of it.
> >> There has been a lot of discussion on this topic in these newsgroups
> >> that you may want to google on before you decide one way or the
> >> other.
> >>
> >
> > I agree with Andrew. If you can, use an IDENTITY column as your PK
> > instead of a GUID, especially if you are going to use a clustered index.
> > Not only will you eliminate page splitting, but your clustered key will
> > be much smaller and this will translate to smaller non-clustered indexes
> > as well.
> What about rowguid columns used for replication? Replication wizzard adds
> rowguid columns, and creates clustered index on them if I don't allready
> have clustered index on that particular table. Is it then better to change
> that rowguid index to non-clustered one?
> Mike
> --
> "I can do it quick. I can do it cheap. I can do it well. Pick any two."
> Mario Splivalo
> msplival@.jagor.srce.hr|||"David G." <david_nospam@.nospam.com> wrote in message
news:O5nSdgziEHA.396@.TK2MSFTNGP12.phx.gbl...
> Mario Splivalo wrote:
> > On 2004-08-26, David G. <david_nospam@.nospam.com> wrote:
> >> Andrew J. Kelly wrote:
> >> Dan correctly answered your question but I wanted to make sure you
> >> need to be aware of more than the effects of the Bookmark or not
> >> when comparing a clustered index to Non-clustered on a GUID. Since
> >> Guid's are random by nature you can get a tremendous amount of page
> >> splitting that can certainly slow down inserts and increase the size
> >> of the table dramatically. The Guid as a CI will also add an
> >> additional 16 bytes on to each row of every non-clustered index as
> >> well. I am not trying to talk you out of using a GUID as your CI
> >> but wanted to be sure you understood the other ramifications of it.
> >> There has been a lot of discussion on this topic in these newsgroups
> >> that you may want to google on before you decide one way or the
> >> other.
> >>
> >>
> >> I agree with Andrew. If you can, use an IDENTITY column as your PK
> >> instead of a GUID, especially if you are going to use a clustered
> >> index. Not only will you eliminate page splitting, but your
> >> clustered key will be much smaller and this will translate to
> >> smaller non-clustered indexes as well.
> >
> > What about rowguid columns used for replication? Replication wizzard
> > adds rowguid columns, and creates clustered index on them if I don't
> > allready have clustered index on that particular table. Is it then
> > better to change that rowguid index to non-clustered one?
> >
> > Mike
> If the table experiences a lot of inserts (highly transactional), then
> you will likely see page splits and the hard drive array may be writing
> all over the place to get the row on the proper page. You can limit page
> splits using a fill factor on the index and reindexing periodically to
> keep the free page space available. But the table will be larger as a
> result and the drive heads still may be moving around more than
> necessary.
> I'm not sure why SQL Server adds the index as clustered. Probably just a
> default like it uses when adding a PK constraint.
> Remember that clustered index keys are part of all non-clustered
> indexes. So adding a 16-byte GUID CI adds an additional 16-bytes to each
> key in a non-clustered index. If you have a non-clustered index on an
> INT IDENTITY, then you'll be increasing storage from 4-bytes to 20-bytes
> for each key, which is substantial (makes storage much more costly, not
> to mention backups and restore operations, and writing).
Not quite. An index on an integer column contains more than 4 bytes per row
regardless of whether there is a clustered index on the table. The index
must have some kind of reference to the location of the full table row to
begin with, in addition to the 4 bytes for the integer. Adding a GUID CI
adds fewer than 16 bytes to each non-clustered index. The addition of the
CI replaces a row locator (assuming no CI to begin with) with the new CI key
(or new CI key plus 4-byte uniquifier if the CI is not declared as unique).
So if there was no clustered index to begin with, I think the new 16-byte
reference replaces an existing 8-byte row locator, increasing the size of
each index row by 8 bytes, not 16.
SK
> I would suggest, as a matter of practice, you add a clustered index to
> all tables. That way, SQL Server doesn't do it for you on a column that
> probably shouldn't have one.
>
> --
> David G.
>|||Is this true? Basically, there are 2 differences between clustered and
nonclustered index:
1) Nonclustered index will typically require *one* extra disk reading
operations because the leaf nodes are not the data itself, AND
2) Nonclustered index will require a few more disk reading operations
because the data of each row is not stored together in one place. I have
looked at these references that Dan Clustered Indexes
<architec.chm::/8_ar_da2_1tbn.htm> and Non-Clustered Indexes
<architec.chm::/8_ar_da2_75mb.htm> but doesn't find anything about this
point. And I wonder how come SqlServer doesn't always store each row
together by default' And is this the bookmark operation that the document
refering to?
Is there anything else?
"Steve Kass" <skass@.drew.edu> wrote in message
news:uLKdk8xiEHA.2808@.TK2MSFTNGP10.phx.gbl...
> Just to add to what others have said...
> If a clustered index has been created for a table, the clustered index
*is*
> the table - all the columns of the table (and all the rows) are present in
> the clustered index. So when you find 'DAF02BF0...' in the clustered
index
> (which is fast, since it's an equality search on the key, just what
indexes
> are for), you have also arrived at the entire row of the table and no
> further data retrieval is needed to get the columns SELECT * must return.
> A non-clustered index is more like a complete index for a book, or a
> concordance. A concordance contains every word in the book, and for each
> word, the page number(s) where that word appears in the book. If there
are
> pictures, punctuation, or other things in the book, you'll have to go to
the
> page to find them - they won't be in the index. If you look up and find
> 'DAF02BF0...' in a nonclustered index, you won't also find the other
values
> from that row you need to return SELECT *. You will only find out where
to
> go to find them, and will need to do more work - the index tells you where
> to go by storing either the physical page location of the row or the
> clustered index key of the row (the former if the table is not a clustered
> index, the latter if the table is a clustered index).
> I think a lot of this becomes easier to understand once you realize that
the
> clustered index is the table. A nonclustered index is only some of the
> columns of the table, with references to direct you to the values for the
> missing columns.
> Steve Kass
> Drew University
> "Zeng" <zzy@.nonospam.com> wrote in message
> news:eMeR3SwiEHA.3988@.tk2msftngp13.phx.gbl...
> > Hello,
> >
> > Would someone out there understand and can explain to me why clustered
> index
> > for GUID primary key will result a faster select operation when we look
up
> > one row by a primary key like below comparing with non-clustered index
for
> > the primary key? I experimented this myself, so there is no doubt that
> > clustered index is better but I don't know why.
> >
> > select *
> > from MyTable -- there is no other index in this table.
> > where FileId = 'DAF02BF0-A809-4D59-8887-A2D3ED5C81B1'
> >
> > I read through several documentations and the B-tree seems similar
enough
> > not to make a difference. In fact, my common sense (not very reliable
now)
> > tells me that the result should be the other way around. Help!!!
> >
> > Thank you very much in advance,
> >
> >
>
>|||Zeng wrote:
> Is this true? Basically, there are 2 differences between clustered and
> nonclustered index:
> 1) Nonclustered index will typically require *one* extra disk reading
> operations because the leaf nodes are not the data itself, AND
> 2) Nonclustered index will require a few more disk reading operations
> because the data of each row is not stored together in one place. I
> have looked at these references that Dan Clustered Indexes
> <architec.chm::/8_ar_da2_1tbn.htm> and Non-Clustered Indexes
> <architec.chm::/8_ar_da2_75mb.htm> but doesn't find anything about
> this point. And I wonder how come SqlServer doesn't always store each
> row together by default' And is this the bookmark operation that the
> document refering to?
>
I'm not sure what you mean by "how come SqlServer doesn't always store
each
> row together by default?". In a clustered index, the rows on a given
page are in order, although pages themselves may be out of sequence.
The clustered index is itself the table. That's why you can only have
one on a table. It would be impossible to sort the data two different
ways.
On a non-clustered index, as others have mentioned, you only have
pointers to the actual data. To get there, SQL Server performs a
bookmark lookup, which is a very fast operation, just not as fast as if
the query could use a clustered index. Unless the query can make use of
a covering index (all columns and parameters are in the index itself), a
bookmark lookup will occur.
David G.|||From Steve Cass comment, the data of one row in non-clustered index (for
table w/o clustered index) is not stored in one place, it's fragmented. If
I understand him correctly, if my table has 40 columns, then the data of
those 40 columns even for just one row might be scattered in different
places; so I'm was wondering how come SqlServer doesn't store data of all
columns belonging to one row in one place just like when there is clustered
index in the table.
"David G." <david_nospam@.nospam.com> wrote in message
news:OZawwB5iEHA.2848@.TK2MSFTNGP10.phx.gbl...
> Zeng wrote:
> > Is this true? Basically, there are 2 differences between clustered and
> > nonclustered index:
> > 1) Nonclustered index will typically require *one* extra disk reading
> > operations because the leaf nodes are not the data itself, AND
> > 2) Nonclustered index will require a few more disk reading operations
> > because the data of each row is not stored together in one place. I
> > have looked at these references that Dan Clustered Indexes
> > <architec.chm::/8_ar_da2_1tbn.htm> and Non-Clustered Indexes
> > <architec.chm::/8_ar_da2_75mb.htm> but doesn't find anything about
> > this point. And I wonder how come SqlServer doesn't always store each
> > row together by default' And is this the bookmark operation that the
> > document refering to?
> >
> >
> I'm not sure what you mean by "how come SqlServer doesn't always store
> each
> > row together by default?". In a clustered index, the rows on a given
> page are in order, although pages themselves may be out of sequence.
> The clustered index is itself the table. That's why you can only have
> one on a table. It would be impossible to sort the data two different
> ways.
> On a non-clustered index, as others have mentioned, you only have
> pointers to the actual data. To get there, SQL Server performs a
> bookmark lookup, which is a very fast operation, just not as fast as if
> the query could use a clustered index. Unless the query can make use of
> a covering index (all columns and parameters are in the index itself), a
> bookmark lookup will occur.
>
> --
> David G.
>|||> If
> I understand him correctly, if my table has 40 columns, then the data of
> those 40 columns even for just one row might be scattered in different
> places;
No. one row is always stored together. In fact, one row always fit on one page. (With the exception of text
and image columns, of course.)
--
Tibor Karaszi, SQL Server MVP
http://www.karaszi.com/sqlserver/default.asp
http://www.solidqualitylearning.com/
"Zeng" <zzy@.nonospam.com> wrote in message news:uD%23bdn6iEHA.396@.TK2MSFTNGP12.phx.gbl...
> From Steve Cass comment, the data of one row in non-clustered index (for
> table w/o clustered index) is not stored in one place, it's fragmented. If
> I understand him correctly, if my table has 40 columns, then the data of
> those 40 columns even for just one row might be scattered in different
> places; so I'm was wondering how come SqlServer doesn't store data of all
> columns belonging to one row in one place just like when there is clustered
> index in the table.
>
> "David G." <david_nospam@.nospam.com> wrote in message
> news:OZawwB5iEHA.2848@.TK2MSFTNGP10.phx.gbl...
> > Zeng wrote:
> > > Is this true? Basically, there are 2 differences between clustered and
> > > nonclustered index:
> > > 1) Nonclustered index will typically require *one* extra disk reading
> > > operations because the leaf nodes are not the data itself, AND
> > > 2) Nonclustered index will require a few more disk reading operations
> > > because the data of each row is not stored together in one place. I
> > > have looked at these references that Dan Clustered Indexes
> > > <architec.chm::/8_ar_da2_1tbn.htm> and Non-Clustered Indexes
> > > <architec.chm::/8_ar_da2_75mb.htm> but doesn't find anything about
> > > this point. And I wonder how come SqlServer doesn't always store each
> > > row together by default' And is this the bookmark operation that the
> > > document refering to?
> > >
> > >
> >
> > I'm not sure what you mean by "how come SqlServer doesn't always store
> > each
> > > row together by default?". In a clustered index, the rows on a given
> > page are in order, although pages themselves may be out of sequence.
> >
> > The clustered index is itself the table. That's why you can only have
> > one on a table. It would be impossible to sort the data two different
> > ways.
> >
> > On a non-clustered index, as others have mentioned, you only have
> > pointers to the actual data. To get there, SQL Server performs a
> > bookmark lookup, which is a very fast operation, just not as fast as if
> > the query could use a clustered index. Unless the query can make use of
> > a covering index (all columns and parameters are in the index itself), a
> > bookmark lookup will occur.
> >
> >
> > --
> > David G.
> >
>|||Zeng,
The data in a non-clustered index is duplicate data. Suppose table T
has columns A, B, C, D, and E. The data is stored together - all 5
columns and all rows, either in the clustered index, if there is one, or
in what is called a heap. In either case, let's just call it the table.
The table, or the clustered index
A B C D E
A B C D E
A B C D E
...
A B C D E
If we now add a nonclustered index on column D, we will have a second
copy of column D stored away from the table. To make the connection
between these D values and the rest of the table, we need to store
additional information with each D value so we can find the A, B, C, and
E values in the same row as the D value. So the nonclustered index
looks like this (remember, this is stored in addition to the table - the
table also contains a copy of the D values)
The nonclustered index on D
D <location of this row in the table>
D <location of this row in the table>
D <location of this row in the table>
...
D <location of this row in the table>
So now the D values are stored in two places. Once with the whole
table, but probably not in an order that lets us search for a particular
D value, and a second time in the nonclustered index, in order by D
value, but without the rest of the column values - instead of that
information, a reference to the full table row is stored alongside each
D value. The way I think of it, the nonclustered index has "rows" just
like the table does, but they don't include as much information.
So to answer your question, SQL Server always stores the data of all
columns in one place. It may, however, store the data from some columns
in another place (a nonclustered index) *in addition*.to where all the
data is stored together.
SK
Zeng wrote:
>From Steve Cass comment, the data of one row in non-clustered index (for
>table w/o clustered index) is not stored in one place, it's fragmented. If
>I understand him correctly, if my table has 40 columns, then the data of
>those 40 columns even for just one row might be scattered in different
>places; so I'm was wondering how come SqlServer doesn't store data of all
>columns belonging to one row in one place just like when there is clustered
>index in the table.
>
>"David G." <david_nospam@.nospam.com> wrote in message
>news:OZawwB5iEHA.2848@.TK2MSFTNGP10.phx.gbl...
>
>>Zeng wrote:
>>
>>Is this true? Basically, there are 2 differences between clustered and
>>nonclustered index:
>>1) Nonclustered index will typically require *one* extra disk reading
>>operations because the leaf nodes are not the data itself, AND
>>2) Nonclustered index will require a few more disk reading operations
>>because the data of each row is not stored together in one place. I
>>have looked at these references that Dan Clustered Indexes
>><architec.chm::/8_ar_da2_1tbn.htm> and Non-Clustered Indexes
>><architec.chm::/8_ar_da2_75mb.htm> but doesn't find anything about
>>this point. And I wonder how come SqlServer doesn't always store each
>>row together by default' And is this the bookmark operation that the
>>document refering to?
>>
>>
>>I'm not sure what you mean by "how come SqlServer doesn't always store
>>each
>>
>>row together by default?". In a clustered index, the rows on a given
>>
>>page are in order, although pages themselves may be out of sequence.
>>The clustered index is itself the table. That's why you can only have
>>one on a table. It would be impossible to sort the data two different
>>ways.
>>On a non-clustered index, as others have mentioned, you only have
>>pointers to the actual data. To get there, SQL Server performs a
>>bookmark lookup, which is a very fast operation, just not as fast as if
>>the query could use a clustered index. Unless the query can make use of
>>a covering index (all columns and parameters are in the index itself), a
>>bookmark lookup will occur.
>>
>>--
>>David G.
>>
>
>

No comments:

Post a Comment