Friday, February 24, 2012

Clustering Service not starting right away on One Node

I just set up a cluster attached to a SAN. On one of the nodes, the cluster
service doesn't start right away. I have checked the services to make sure
it is set to Automatic, which it is. Both nodes are current with the latest
patches and security updates. I'm a little clueless as to why this is
happening. Here is the really weird part: after 1 minute, the cluster service
does start on the node that is giving me trouble. The other node is
perfectly fine.
|||Hi
If you do a failover using Cluster Administrator, do any resources (that SQL
Server depends on) take a long time to come online?
I have seen similar issues when the devices take a long time to come online
due to high SAN activity.
Does the SQL Server resource come online, but take a long time until it has
done its recovery steps?
This can occur when the other node was de-porting its devices and still had
I/O pending. As a result, not all pages are flushed to the SAN, so SQL
Server has to do more recovery at database start-up.
The best gauge of how quickly a resource comes online is to watch Cluster
Administrator during the failover.
Regards
Mike
|||We haven't installed SQL Server yet. I should've posted that first. But I
know from past installs that SQL Server does take some time to come online.
The SAN doesn't have much activity on it right now.
|||Hi
Have a look in your event logs and check the time differences between when
Node A shuts down and Node B notices it and starts up. There will be at least
15 event messages during this process. Post the information here so that I
can compare it to our big clusters.
Regards
Mike
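For anyone who wants to line up the event timestamps as suggested above, here is a minimal sketch in Python. The entries in `EVENTS` are made up for illustration (node names, times, and messages are all assumptions, not taken from a real cluster); you would type in or export the real entries from the System and Application logs on both nodes.

```python
from datetime import datetime

# Hypothetical event-log entries (timestamp, node, message), typed in by hand
# for illustration only; they are not taken from a real cluster.
EVENTS = [
    ("2012-02-24 09:00:05", "NODE1", "System shutdown initiated"),
    ("2012-02-24 09:00:20", "NODE2", "Lost communication with NODE1"),
    ("2012-02-24 09:02:10", "NODE1", "Event log service started"),
    ("2012-02-24 09:03:12", "NODE1", "Cluster service started"),
]

FMT = "%Y-%m-%d %H:%M:%S"


def gaps(events):
    """Return (seconds_since_previous_event, node, message) in time order."""
    ordered = sorted(events, key=lambda e: datetime.strptime(e[0], FMT))
    result = []
    previous = None
    for stamp, node, message in ordered:
        current = datetime.strptime(stamp, FMT)
        delta = 0 if previous is None else int((current - previous).total_seconds())
        result.append((delta, node, message))
        previous = current
    return result


if __name__ == "__main__":
    for delta, node, message in gaps(EVENTS):
        print(f"+{delta:>4}s  {node:<6} {message}")
```

A large gap right before the cluster service start entry points at whatever the service was waiting on; the entries just before that gap are the ones worth posting.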
|||That may be somewhat normal on a simultaneous startup. The first node grabs
the quorum device and owns the cluster but isn't talking on the network yet.
The second node tries to get the device but times out. Eventually the
service comes online and talks to the other node and agrees on who is in
charge. This is especially prevalent on SCSI-based clusters.
Check the System and Application event logs on both systems to see if there
are any unusual startup errors. Also, check what happens when the second
node is rebooted. If the cluster service does come online quickly, it is
just a device contention issue. I try to avoid powering up more than one
cluster node at a time.
Geoff N. Hiten
Microsoft SQL Server MVP
Senior Database Administrator
Careerbuilder.com
I support the Professional Association for SQL Server
www.sqlpass.org
|||Node 2, which I haven't seen the problem with, has ownership of the cluster.
The problem shows up when I reboot node 1: it takes 1 minute to start the
cluster service.
|||Node order is arbitrary in a cluster. We could use Node X and Node Y
instead of Node 1 and Node 2.
Try manually stopping and starting the cluster service on Node 1. If it
restarts quickly, then the problem likely is one of the services that the
cluster service depends on. Time service is a usual suspect for that, but
you will have to check the entire list. Again, the Application and System
event logs are your friends here.
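As a rough way to get "the entire list", the sketch below pulls the dependency names out of the text that `sc qc <service>` prints on Windows. The `SAMPLE` text only imitates that layout, and the dependency names in it are assumptions for illustration, not output captured from the nodes in this thread. The service name `ClusSvc` and the helper names `parse_dependencies` and `query_dependencies` are likewise just illustrative choices.

```python
import subprocess

# Illustrative sample in the layout printed by `sc qc <service>`.
# The dependency names are assumptions, not captured from a real node.
SAMPLE = """\
[SC] QueryServiceConfig SUCCESS

SERVICE_NAME: ClusSvc
        TYPE               : 10  WIN32_OWN_PROCESS
        START_TYPE         : 2   AUTO_START
        DISPLAY_NAME       : Cluster Service
        DEPENDENCIES       : ClusNet
                           : RpcSs
                           : W32Time
        SERVICE_START_NAME : LocalSystem
"""


def parse_dependencies(text):
    """Collect the values listed under the DEPENDENCIES key."""
    deps = []
    in_deps = False
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if not sep:
            # Lines without a colon (banner, blank lines) end the list.
            in_deps = False
            continue
        if key == "DEPENDENCIES":
            in_deps = True
        elif key:
            # Any other named key ends the dependency list.
            in_deps = False
        if in_deps and value:
            deps.append(value)
    return deps


def query_dependencies(service="ClusSvc"):
    """Run `sc qc` on a Windows node (assumes the service is named ClusSvc)."""
    out = subprocess.run(
        ["sc", "qc", service], capture_output=True, text=True, check=True
    ).stdout
    return parse_dependencies(out)


if __name__ == "__main__":
    print(parse_dependencies(SAMPLE))
```

On the node itself you would call `query_dependencies()` instead of parsing the sample, then check the event logs for the start time of each service it returns.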
Now is the time to deal with this issue, not after you load SQL and get this
baby into production.
Geoff N. Hiten
Microsoft SQL Server MVP
Senior Database Administrator
Careerbuilder.com
I support the Professional Association for SQL Server
www.sqlpass.org