Clustered DB Server: clustered index choice

Hi.
We have a table called Master that has like ~400000 rows, and details
table with ~5M rows that is masterid column and gets about 5-6k
inserts during the business day, updates and deletes never happen.
The details table get a lot of reads (dozens per second), and 90% of
all queries to it are "where masterid = @.blah".
Currently, it is clustered by detailid (surrogate primary key) and has
a non clustered index on detailid column. The performance is
relatively good, but then our server is really top notch so it might
be worse on weaker servers of our clients.
The question is, given the query nature, would it be resonable to
cluster the table by masterid? or is the non-clustered index
sufficient to make reads by masterid fast?Sergey
What is the version of SQL SERVER?

> Currently, it is clustered by detailid (surrogate primary key) and has
> a non clustered index on detailid column.
Sorry, I don't understand, did you mean that uoi have clustered indec on
detailid column and non-clustered on masterid?

> all queries to it are "where masterid = @.blah".
How many rows do yuo get by using this condition?
"Sergei Shelukhin" <realgeek@.gmail.com> wrote in message
news:1177913754.037363.81800@.p77g2000hsh.googlegroups.com...
> Hi.
> We have a table called Master that has like ~400000 rows, and details
> table with ~5M rows that is masterid column and gets about 5-6k
> inserts during the business day, updates and deletes never happen.
> The details table get a lot of reads (dozens per second), and 90% of
> all queries to it are "where masterid = @.blah".
> Currently, it is clustered by detailid (surrogate primary key) and has
> a non clustered index on detailid column. The performance is
> relatively good, but then our server is really top notch so it might
> be worse on weaker servers of our clients.
> The question is, given the query nature, would it be resonable to
> cluster the table by masterid? or is the non-clustered index
> sufficient to make reads by masterid fast?
>|||Sergei
Thanks for putting a nice post together - you've made a good attempt at
covering what's important in youur post to enable us to give you a solid
answer. Hoever there are other pieces of information which are crucial for
anyone to give you a really comprehensive answer.
Firstly, you mentioned that 90% of all queries to it are "where masterid =
@.blah". This may be so, but queally important is which columns are accessed
by these queries? If * then the answer is very likely that yes, a CIX on
masterid is your best best. On the other hand, if 90% of your queries only
access a small subset of the columns, then it might not matter which column
the CIX is on as a NCIX might be much better.
Is it possible for you to post the DDL for these tables & examples of the
queries being run? It would also be helpful to know what types of queries
comprise the other 10% as although these might not be run as frequently,
it's possible they might still represent an abnormally large workload if
they run inefficiently.
Regards,
Regards,
Greg Linwood
SQL Server MVP
http://www.SQLBenchmarkPro.com
http://blogs.sqlserver.org.au/blogs/greg_linwood
"Sergei Shelukhin" <realgeek@.gmail.com> wrote in message
news:1177913754.037363.81800@.p77g2000hsh.googlegroups.com...
> Hi.
> We have a table called Master that has like ~400000 rows, and details
> table with ~5M rows that is masterid column and gets about 5-6k
> inserts during the business day, updates and deletes never happen.
> The details table get a lot of reads (dozens per second), and 90% of
> all queries to it are "where masterid = @.blah".
> Currently, it is clustered by detailid (surrogate primary key) and has
> a non clustered index on detailid column. The performance is
> relatively good, but then our server is really top notch so it might
> be worse on weaker servers of our clients.
> The question is, given the query nature, would it be resonable to
> cluster the table by masterid? or is the non-clustered index
> sufficient to make reads by masterid fast?
>|||Uri: SQL Server version is 2005; yeah masterid index is non-clustered,
my bad.
Select gets, on average, 10-30 rows, and the majority of data in the
table (almost select *, the only column omitted is masterid itself).
Sorry I don't have access to ddl here, it's at work.
Other 10% of queries are select top N * from details where userid =
@.userid order by detailid desc, N is currently 30. Userid is another
foreign key column that has non clustered index on it.
The thing about this query however is that it is unimportant and we
could cache it or abandon it altogether if it yields a significant
performance gain to the first one.|||Sergey
As Greag has already said
> if 90% of your queries only
>access a small subset of the columns, then it might not matter which column
>the CIX is on as a NCIX might be much better.
Well also take a look at INCLUDE clause a new feature that you may use to
cover other columns ( I don't know how many do you have?)
"Sergei Shelukhin" <realgeek@.gmail.com> wrote in message
news:1177919501.967547.319830@.p77g2000hsh.googlegroups.com...
> Uri: SQL Server version is 2005; yeah masterid index is non-clustered,
> my bad.
> Select gets, on average, 10-30 rows, and the majority of data in the
> table (almost select *, the only column omitted is masterid itself).
> Sorry I don't have access to ddl here, it's at work.
> Other 10% of queries are select top N * from details where userid =
> @.userid order by detailid desc, N is currently 30. Userid is another
> foreign key column that has non clustered index on it.
> The thing about this query however is that it is unimportant and we
> could cache it or abandon it altogether if it yields a significant
> performance gain to the first one.
>|||I think the problem with clustering the detail table on masterid is that you
will wind up page-splitting at the tail of the table as you add new rows in
because the masterid is (I assume) an identity column. So the detail table
will get all these 50/50 page splits as you add in new data. Since (per a
later post in this thread) you return 10-30 rows of data per detail read it
is likely not THAT much slower to use a heap table and a non-clustered index
or keep your clustered pk index on detailid. The bookmark lookup won't be
that costly and you keep very tight data pages and have reduced maintenance
as well.
TheSQLGuru
President
Indicium Resources, Inc.
"Sergei Shelukhin" <realgeek@.gmail.com> wrote in message
news:1177913754.037363.81800@.p77g2000hsh.googlegroups.com...
> Hi.
> We have a table called Master that has like ~400000 rows, and details
> table with ~5M rows that is masterid column and gets about 5-6k
> inserts during the business day, updates and deletes never happen.
> The details table get a lot of reads (dozens per second), and 90% of
> all queries to it are "where masterid = @.blah".
> Currently, it is clustered by detailid (surrogate primary key) and has
> a non clustered index on detailid column. The performance is
> relatively good, but then our server is really top notch so it might
> be worse on weaker servers of our clients.
> The question is, given the query nature, would it be resonable to
> cluster the table by masterid? or is the non-clustered index
> sufficient to make reads by masterid fast?
>|||>I think the problem with clustering the detail table on masterid is that you will wind up[v
bcol=seagreen]
>page-splitting at the tail of the table as you add new rows in because the
masterid is (I assume)
>an identity column.[/vbcol]
SQL Server won't page split 50/50 when you add new values for a monotonicall
y increasing index key.
It will just allocate new pages at the end of the linked list:
USE tempdb
CREATE TABLE a(c1 int identity, filler char(300) default 'hello')
INSERT INTO a SELECT TOP 20000 'hello' FROM syscolumns AS a, syscolumns AS b
SELECT * FROM sys. dm_db_index_physical_stats(DB_ID('tempdb
'), OBJECT_ID('a')
, NULL, NULL, NULL)
Tibor Karaszi, SQL Server MVP
http://www.karaszi.com/sqlserver/default.asp
http://sqlblog.com/blogs/tibor_karaszi
"TheSQLGuru" <kgboles@.earthlink.net> wrote in message news:utRvh65iHHA.1220@.TK2MSFTNGP03.phx
.gbl...
>I think the problem with clustering the detail table on masterid is that yo
u will wind up
>page-splitting at the tail of the table as you add new rows in because the
masterid is (I assume)
>an identity column. So the detail table will get all these 50/50 page spli
ts as you add in new
>data. Since (per a later post in this thread) you return 10-30 rows of dat
a per detail read it is
>likely not THAT much slower to use a heap table and a non-clustered index o
r keep your clustered pk
>index on detailid. The bookmark lookup won't be that costly and you keep v
ery tight data pages and
>have reduced maintenance as well.
> --
> TheSQLGuru
> President
> Indicium Resources, Inc.
> "Sergei Shelukhin" <realgeek@.gmail.com> wrote in message
> news:1177913754.037363.81800@.p77g2000hsh.googlegroups.com...
>|||Here are the usage stats. I wonder what user_lookups is? Is it good
or bad? I know what scans and seeks are but have no idea of how the
lookup is different from seek.
updates seeks scans lookups
PK 22648 1551 1566 57490
TaskID_index 22648 57509 0 0
Also, Tibor: I guess the splitting can occur when details are added to
the old masters, but my guess is that a generous fillfactor setting
can cure this...|||> Also, Tibor: I guess the splitting can occur when details are added to
> the old masters, but my guess is that a generous fillfactor setting
> can cure this...
I'm not use I see the big picture here. I just wanted to point out that page
splits do not occur of
you add rows for a b-tree with increasing values. My apologies if I confused
the subject...

> Here are the usage stats. I wonder what user_lookups is?
This is when SQL Server finds a row in an NC index and then uses the row loc
ator to find the
corresponding row in the clustered index. AKA bookmark lookup. Can be expens
ive if low selectivity,
so covering indexes will eliminate this...
--
Tibor Karaszi, SQL Server MVP
http://www.karaszi.com/sqlserver/default.asp
http://sqlblog.com/blogs/tibor_karaszi
"Sergei Shelukhin" <realgeek@.gmail.com> wrote in message
news:1178093405.536234.312520@.c35g2000hsg.googlegroups.com...
> Here are the usage stats. I wonder what user_lookups is? Is it good
> or bad? I know what scans and seeks are but have no idea of how the
> lookup is different from seek.
>
> updates seeks scans lookups
> PK 22648 1551 1566 57490
> TaskID_index 22648 57509 0 0
> Also, Tibor: I guess the splitting can occur when details are added to
> the old masters, but my guess is that a generous fillfactor setting
> can cure this...
>

Sunday, February 12, 2012

clustered index choice

No comments:

Post a Comment

Clustered DB Server

Blog Archive

About Me