Sunday, February 12, 2012

clustered index on a VERY big table

I already posted this over on sqlteam so don't peek there if you haven't seen that post yet. :)

So now to the question:

Anyone care to guess how long it took me to build a clustered index on a table with 900 million rows? This is the largest amount of data in a single table I have had to work with thus far in my career! It's sorta fun to work with such large datasets. :)

Some details:

1. running sql 2005 on a dual proc 32bit server, 8gb ram, hyperthreaded, 3ghz clock. disk is a decent SAN, not sure of the specs though.

2. ddl for table:

CREATE TABLE [dbo].[fld](
[id] [bigint] NOT NULL,
[id2] [tinyint] NOT NULL,
[extid] [bigint] NOT NULL,
[dd] [bit] NOT NULL,
[mp] [tinyint] NOT NULL,
[ss] [tinyint] NOT NULL,
[cc] [datetime] NOT NULL,
[ff] [tinyint] NOT NULL,
[mm] [smallint] NOT NULL,
[ds] [smallint] NOT NULL
)

3. ddl for index (this is the only index on the table):

CREATE CLUSTERED INDEX [CIfld]
ON [dbo].[fld]
(
extid asc
)WITH (FILLFACTOR=100, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF)

4. extid column was not sorted to begin with. ordering was completely random.

Note that I have changed the column names, etc, to protect the innocent. I can't go into details about what it's for or I'd be violating NDA type stuff.ok, I'll just tell you. :)

it took a mere 1hr 14min.

pretty fast for sorting and rearranging 900m rows I thought!|||Not too shabby! Also, why are you using 100% fillfactor?|||There are no variable length fields, so you may as well, unless there are large numbers of inserts in between existing numbers. Is extid supposed to be an identity field?|||What happens if you load it with now index, unloaded with QUERYOUT sorted by the id, then reload it, and then build the index?|||why are you using 100% fillfactor?

because this table is not transactional. no inserts after the index is built, only selects

Is extid supposed to be an identity field?

No. It's not a unique column. However is highly selective. requirement was to return ranges of extid also ordered by extid, I found reads were fastest if I built a clustered index on this column just for this purpose.

What happens if you load it with now index, unloaded with QUERYOUT sorted by the id, then reload it, and then build the index?

That could be something to try if I ever need to do this again. I didn't look at other methods because 1.25 hrs was more than acceptable for building this index. :)

No comments:

Post a Comment