Clustered DB Server: Clustered Indexes

I have a client who has several tables all of which have no Clustered Index
created on them. For the sake of discussion, let's look at one of the larger
tables 1 million plus rows. The only index is a priamry key constraint
creating a unique non clustered index with a fillfactor of 90. The key fields
are
memberid int 4,
serialnum char(17),
stocknum char(20)
This table every night runs through about 3000 update / inserts. None of the
updates include the key fields. Due to processing rules DTS is not used and
all operations are logged. One program running on the server (single
threaded) does all the updates.
I was under the impression that this index should be a clustered index.
I have read a couple of posts which would suggest I am wrong.
Does anyone have any input? Would it be correct to leave this table with no
clustered index. Note when they work on data it is by memberid as thing are
now there is no physical sort to the data files. If the clstered index was
just put on memberid SQL Server would append a row id so it's unique. How
much overhead is invloved in this?
I have never seen this approach before. Is leaving the table as a heap and
having only a non clustered index which would store a row locator as opposed
to the clustered key more efficient?
Thanks,
Jon A
Jon A wrote:
> I have a client who has several tables all of which have no Clustered
> Index created on them. For the sake of discussion, let's look at one
> of the larger tables 1 million plus rows. The only index is a priamry
<SNIP>
Depending entirely on how the table is used and what queries are running
against it. Using a natural key, like the one used in the table, as
clustered could cause page splitting if the table were insert heavy. It
could also help if many queries are run against the table that involve
sets of member ids.
I think your bigger problem is this:
"This table every night runs through about 3000 update / inserts. None
of the
updates include the key fields."
I would add that your key is very large (41 characters). If you make it
clustered, you'll add a lot of overhead to all non-clustered indexes,
should you ever add them.
I would leave the PK index as is right now and look into adding an
additional index to help the nightly updates and any other queries that
are run against the table that have no index to help them.
David Gugick
Imceda Software
www.imceda.com
|||"David Gugick" wrote:

> I would leave the PK index as is right now and look into adding an
> additional index to help the nightly updates and any other queries that
> are run against the table that have no index to help them.
>
I tend to be in agreement with leaving this primary key non clustered, but I
am wondering since all work with the data is done on member id would that be
a good choice for a clustered index and let SQL add the row id so it is
unique or just leave the table without a clustered index? Since the update
program pulls data by member is it more efficient to have it sorted that way.
And just use a small fill factor along with periodically updating stats and
performing index maintainance.
|||If MemberID is the column used most often to fetch and update the rows then
it makes a lot of sense to cluster on it. You can alleviate a bookmark
lookup with each call. 3000 updates a night is trivial and just make a
reasonable fill factor to account for some inserts and minor updates. Even
forgetting about all that a clustered index can actually increase
performance on inserts under the right conditions. And without a CI there
is no way to control fragmentation.
Andrew J. Kelly SQL MVP
"Jon A" <JonA@.discussions.microsoft.com> wrote in message
news:54770860-AA65-45A8-8C7D-F6169FD0AAFF@.microsoft.com...
> "David Gugick" wrote:
> I tend to be in agreement with leaving this primary key non clustered, but
> I
> am wondering since all work with the data is done on member id would that
> be
> a good choice for a clustered index and let SQL add the row id so it is
> unique or just leave the table without a clustered index? Since the update
> program pulls data by member is it more efficient to have it sorted that
> way.
> And just use a small fill factor along with periodically updating stats
> and
> performing index maintainance.
|||"Andrew J. Kelly" wrote:

> If MemberID is the column used most often to fetch and update the rows then
> it makes a lot of sense to cluster on it. You can alleviate a bookmark
> lookup with each call. 3000 updates a night is trivial and just make a
> reasonable fill factor to account for some inserts and minor updates. Even
> forgetting about all that a clustered index can actually increase
> performance on inserts under the right conditions. And without a CI there
> is no way to control fragmentation.
>
As I am looking through the code most of the updates and searches are done by
memberid int 4,
serialnum char(17),
stocknum char(20)
Should this then be clustered?
|||Member Id is the primary key...
Are the inserts (added members) monotonically increasing in value
(Incrementing) ?
If so, then inserts will probably not cause Page Splits. However, if this is
not the case Inserts will cause splitting.
Either way, updates, deletes will likely cause splits to occur.
eventually the table will become fragmented and you will not be able to
defrag it because it is a Heap (No clustered Index)
I highly recommend that you place "A" clustered index on the table (and all
tables for that matter).
You do NOT have to cluster on the Pkey. Cluster on the column that you most
often sort on or perform range searches on.
I just left a company that had 100% heaps in prod. Over time, our SQL
Database absolutely TANKED.
it was VERY difficult to create clusters AFTER The system went live. It
required downtime and a HUGE Level of effort to accomplish the task (Not to
mention a HUGE amount of political strife as our other DBA swore I was
smoking the crack pipe in my recommendation)
Since we finally put clusters and Index Maintenance plans in place, the prod
system is smoking.
cheers
Greg Jackson
Portland, Oregon
"Jon A" <JonA@.discussions.microsoft.com> wrote in message
news:59FB13D5-1B57-4E73-9545-30CC1CAA8C9F@.microsoft.com...
> "Andrew J. Kelly" wrote:
> As I am looking through the code most of the updates and searches are done
> by
> memberid int 4,
> serialnum char(17),
> stocknum char(20)
> Should this then be clustered?
|||Do you mean all three columns at once or each individually? If it is all
three then the other two are mostly useless if you were to include them in
the clustered index definition. That is due to the fact that MemberID is the
PK and there will only be 1 row that matches that ID. So I would make the
PK the CI and be done with it. If you also search on the other two
individually it would depend on how selective they are. How many rows are
there with each value?
Andrew J. Kelly SQL MVP
"Jon A" <JonA@.discussions.microsoft.com> wrote in message
news:59FB13D5-1B57-4E73-9545-30CC1CAA8C9F@.microsoft.com...
> "Andrew J. Kelly" wrote:
> As I am looking through the code most of the updates and searches are done
> by
> memberid int 4,
> serialnum char(17),
> stocknum char(20)
> Should this then be clustered?
|||Greg,
[vbcol=seagreen]
Actually Deletes will not cause a page split. Inserts and updates may cause
page splits but a proper fill factor and routine reindexing can alleviate
most of that.
Andrew J. Kelly SQL MVP
"pdxJaxon" <GregoryAJackson@.Hotmail.com> wrote in message
news:O25DkpvFFHA.3504@.TK2MSFTNGP12.phx.gbl...
> Member Id is the primary key...
> Are the inserts (added members) monotonically increasing in value
> (Incrementing) ?
> If so, then inserts will probably not cause Page Splits. However, if this
> is not the case Inserts will cause splitting.
> Either way, updates, deletes will likely cause splits to occur.
> eventually the table will become fragmented and you will not be able to
> defrag it because it is a Heap (No clustered Index)
> I highly recommend that you place "A" clustered index on the table (and
> all tables for that matter).
> You do NOT have to cluster on the Pkey. Cluster on the column that you
> most often sort on or perform range searches on.
> I just left a company that had 100% heaps in prod. Over time, our SQL
> Database absolutely TANKED.
> it was VERY difficult to create clusters AFTER The system went live. It
> required downtime and a HUGE Level of effort to accomplish the task (Not
> to mention a HUGE amount of political strife as our other DBA swore I was
> smoking the crack pipe in my recommendation)
> Since we finally put clusters and Index Maintenance plans in place, the
> prod system is smoking.
>
> cheers
> Greg Jackson
> Portland, Oregon
> "Jon A" <JonA@.discussions.microsoft.com> wrote in message
> news:59FB13D5-1B57-4E73-9545-30CC1CAA8C9F@.microsoft.com...
>
|||memberid int 4,
serialnum char(17),
stocknum char(20)
This is the primary key and defines uniqueness in the table. A member can
have multiple items and the same item may have multiple stock numbers.(please
note I didn't design this I am just trying to solve there issues of
performance. They do not want to ovewrhaul the design.) The primary key is
now non clustered. All lookups for updates or insert are done by searching on
these fields and ordering the results on these fields. The only thing is they
process a member at a time. The performance problem they have is as the
tables have grown, after one night of uploads they need to stop SQL server
and restart. Or the next nights run goes from 4 hours to 12 hours. The memory
in SQL grows until there is little left for the Web Server. I believe the
memory problem is due to the fragmentation of the tables due to being heap
tables. The machine is 2 processor P4 with 2 gig ram and a SCSI 3 disk raid 5
with 32 meg ram. The batch program runs on the server and is single
connection,single threaded, and nothing else is running(no lock problems).
Would it help the fragmentation to Cluster on memberid because as the batch
runs it goes a member at a time?
|||Jon A wrote:
> memberid int 4,
> serialnum char(17),
> stocknum char(20)
> This is the primary key and defines uniqueness in the table. A member
> can have multiple items and the same item may have multiple stock
> numbers.(please note I didn't design this I am just trying to solve
> there issues of performance. They do not want to ovewrhaul the
> design.) The primary key is now non clustered. All lookups for
> updates or insert are done by searching on these fields and ordering
> the results on these fields. The only thing is they process a member
> at a time. The performance problem they have is as the tables have
> grown, after one night of uploads they need to stop SQL server and
> restart. Or the next nights run goes from 4 hours to 12 hours. The
> memory in SQL grows until there is little left for the Web Server. I
> believe the memory problem is due to the fragmentation of the tables
> due to being heap tables. The machine is 2 processor P4 with 2 gig
> ram and a SCSI 3 disk raid 5 with 32 meg ram. The batch program runs
> on the server and is single connection,single threaded, and nothing
> else is running(no lock problems). Would it help the fragmentation to
> Cluster on memberid because as the batch runs it goes a member at a
> time?
First I would get rid of any sorting operations against this table.
ORDER BY clauses are rarely required by production applications and add
unnecessary overhead.
If you are processing a member id at a time (how are you doing this?),
then it does make some sense to use a clustered index. If you use a
set-based operation on the ID, having a clustered index will help a lot.
If you are accessing IDs one at a time, then it may not help much over a
non-clustered index as other processes may cause the disk heads to move
somewhere else between ID queries. At the very least, you'll get rid of
the bookmark lookup operation which can be responsible for 50% of a
query's overhead.
But the clustered index design does pose some problems. On one hand,
clustering the entire PK should help your queries and enforce the PK
constraint. On the other hand, large clustered keys cause non-clustered
indexes to grow (since the clustered key is part of the non-clustered
index). But since you have no non-clustered indexes right now, I would
cluster on the PK.
The memory issue is a configuration problem at your end. SQL Server uses
memory as it reads data, and generally does not release memory back into
the system unless specifically asked to do so by the OS. You are running
a web server on the same box. When running multiple applications on the
same box, it's imperative to set a maximum memory limit for SQL Server,
or you'll likely run into the situation you already see. That is, SQL
Server uses so much memory that other services are left memory-starved.
One reason SQL Server can chew up memory fast is when it runs many
unoptimized queries. As the data is read, it is loaded into memory.
Memory is not released, but the pages in memory are likely being swapped
as others are read. So check for queries that require tuning (like the
nightly upoptimized updates).
But the best way to avoid SQL Server using too much memory is to set a
memory limit.
David Gugick
Imceda Software
www.imceda.com

Tuesday, February 14, 2012

Clustered Indexes

No comments:

Post a Comment

Clustered DB Server

Blog Archive

About Me