Clustered DB Server: Cluster will not fail over.

We had an error of massive proportions over the weekend (Sunday 3pm PST). Long story short: the model database was detached and SQL Server was stopped while it was still detached. This happened on our primary production clustered database server, which is the bread and butter of the company. (OUCH!)
It was time for some fast action. We started to re-install SQL Server. In order to do so, the previous install had to be uninstalled. This seemed to go smoothly enough, but when re-applying SP3a, we encountered an error. After researching the error, it appears this occurs in a clustered environment because the SP3a files still reside on the node(s). Microsoft states that if a particular log file ends with 'Installation was Successful', the error can be disregarded. I double-checked the log file and, sure enough, it reported success, so we disregarded the error.
We moved along with the installation. We were able to restore all the user databases and all system databases except master. Unfortunately, even after starting SQL Server in single-user mode, the restore of the master database would not take, so it was not restored. Fortunately, I ran a quick script to recover all the user logins from before the disaster, which I reapplied to the new installation of SQL Server. Everything came back up and the QA team successfully tested the production application (Monday 4am PST). (Phew!)
After the successful testing of the production environment, we tested failover, which resulted in SQL Server not starting on the secondary node. All the other resources came right up on it, but not SQL Server. The only error was that it was not able to locate the file 'O\logs\mastlog.ldf'. This error did not make sense, since SQL Server uses the same file on the primary node. We were pressed for time since it was getting close to the start of business East Coast time, so we left the server as is.
Throughout the day other issues arose; one in particular was that certain systems were not able to connect to the server via TCP/IP. In order to connect, they needed to create an alias for the server and use Named Pipes. This is a growing concern because there are users who need to connect via ODBC from a particular, widely used Access application, which seems to only like the TCP/IP route. I am somewhat sure this is related to the cluster failure.
Anyway, this is the first time I've had a chance to take a breath and revisit the problem at hand. We have been dealing with another server that crashed on the same day, resulting in a brand-new build of a SQL Server cluster environment (completely unrelated to the issue at hand).
I'm sorry for the long-winded story. Would you have any idea why the cluster would fail on failover, along with the TCP/IP issue?
Thanks in advance.
|||95% of the systems that were not able to connect to the server via TCP/IP
were Windows 2003 Server systems.
|||Continued research:
I failed to connect through isql over an IP connection from the command line, which leads
me to believe that TCP/IP is not configured correctly.
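For anyone wanting to separate "the server is not listening on TCP" from "the login is failing", a plain socket probe is a quick check. This is only a minimal sketch: it assumes the default instance's usual port 1433 (a named instance or a custom port would differ), and the function name is made up for illustration.

```python
import socket


def tcp_port_open(host, port=1433, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Calling `tcp_port_open("your-virtual-server-name")` from an affected client (the host name here is a placeholder) and getting `False` while Named Pipes connections still work would suggest the problem is with the TCP/IP listener or something in between, rather than with logins.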
Thanks again and please bear with me, I haven't slept since Saturday night.
|||First, what you should have done with a blown model DB:
Start SQL Server in single user mode with trace flag -T3608. This stops SQL from recovering anything except the master database. Reattach Model. If necessary, use files copied from another installation at the exact same SP and Hotfix level. Stop SQL Server and restart normally. Sorry, but it really is that simple. Oh, and lock whoever detached "model" out of the system. HE is too dangerous to allow near your system.
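For reference, the sequence above might look roughly like the following sketch. It is written in Python only to keep the pieces together; the two functions just build the command line and the T-SQL statement and do not run anything. The file paths and the optional instance name are placeholders (assumptions, not values from this thread), and `-m` is the single-user switch used alongside the `-T3608` trace flag.

```python
def minimal_start_args(instance=None):
    """Build a sqlservr.exe command line for a single-user start with -T3608.

    -c      start outside the service control manager
    -m      single-user mode
    -T3608  recover only the master database at startup
    -s      named instance (omitted for the default instance)
    """
    args = ["sqlservr.exe", "-c", "-m", "-T3608"]
    if instance:
        args += ["-s", instance]
    return args


def attach_model_sql(data_file, log_file):
    """Build the sp_attach_db call that reattaches model from the given files."""
    def quote(value):
        # Double any single quotes so the value is a valid T-SQL string literal.
        return "N'" + value.replace("'", "''") + "'"

    return "EXEC sp_attach_db @dbname = N'model', @filename1 = {}, @filename2 = {}".format(
        quote(data_file), quote(log_file)
    )


if __name__ == "__main__":
    # Placeholder paths: substitute the real location of the model files.
    print(" ".join(minimal_start_args()))
    print(attach_model_sql(r"D:\placeholder\model.mdf", r"D:\placeholder\modellog.ldf"))
```

The printed command would be run from a command prompt on the server, and the printed statement from a query tool once the instance is up. After the attach succeeds, the instance is stopped and restarted normally.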
You didn't mention whether you blew the cluster away or just rebuilt SQL. If you blew the cluster away, make sure that each disk resource has the same drive letter on all nodes and that the disk resources fail over correctly from node to node. Stop the resource group, move it, and start each disk resource independently to test.
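Once the group has been moved to the node under test and the disk resources are online, one quick sanity check is whether the files SQL Server expects are actually visible from that node. A minimal sketch follows; the function name is made up, and the paths are hypothetical placeholders, not the real locations on this cluster.

```python
import os


def missing_files(paths):
    """Return the paths that are not visible as regular files on this node."""
    return [p for p in paths if not os.path.isfile(p)]


if __name__ == "__main__":
    # Hypothetical placeholders: list the master data/log files for your instance.
    expected = [r"X:\placeholder\master.mdf", r"X:\placeholder\mastlog.ldf"]
    for path in missing_files(expected):
        print("not found on this node:", path)
```

Running it on each node in turn, with the group moved there, would show whether a drive letter or path differs between nodes.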
The Named Pipes-only issue sounds like an incomplete SP3a install. Windows 2003 will prevent TCP/IP access if it detects a pre-SP3a SQL installation. Follow this article and re-apply SP3a.
http://support.microsoft.com/default...b;en-us;815431
Geoff N. Hiten
Senior Database Administrator
Microsoft SQL Server MVP
|||This information helps.
Try this article:
http://support.microsoft.com/default...b;en-us;555017
Geoff N. Hiten
Senior Database Administrator
Microsoft SQL Server MVP
|||Geoff,
I truly appreciate your response. I actually did try to reattach model
using this method; unfortunately, every time I attempted to attach the
model db, SQL Server would immediately shut down. At the time I was really
pressed for time, which led to the decision to re-install. I do believe
there is more to the story than I was told.
Also, I apologize if I led you to believe that the cluster is on a
Windows 2003 platform. It actually resides on Windows 2000 Advanced
Server. Should the link you provided still work for W2K? Like I
mentioned, when it rains it pours, and I've been putting out too many fires
for a Christmas week. I can't thank you enough for the response.
|||The process is unnecessary for Windows 2000. The enhanced security of
Windows 2003 requires the extra steps.
Try running comclust.exe on each node to set MSDTC to cluster mode. It
can't hurt and it may help. It definitely sounds like an "I didn't touch
anything" situation where you aren't getting the full story.
This one may be worth opening a PSS case. They can send you a diagnostic
package that will tell them exactly what is broken and they can walk you
through fixing it.
Geoff N. Hiten
Senior Database Administrator
Microsoft SQL Server MVP
|||ouch...
This situation happened to us a while ago. One of our techs detached both
the model AND msdb databases (whilst in single-user mode) and then SQL Server
was shut down. Well, it would not start up.
I managed to execute the following command on the server itself:
sqlservr -c -f -s <instancename> /T3608
This worked, bringing SQL Server up in minimal mode. I then used
Query Analyzer to connect and reattached both the msdb and model databases.
All was back to normal.
Best of luck!
john
|||I will give comclust.exe a try. I am looking into a solution for this,
and of course if needed we will open a PSS case. I do have the impression
that SP3 was not fully installed, even though the guidance on the Microsoft
site suggested otherwise. Is it worth trying the steps suggested in the link
you provided?
I honestly did not have time to investigate why the cluster would not
fail over. I did quickly check that the disks failed over
successfully, but will do so again tonight.
I thank you again for your continued help.
|||The named pipes alias is necessary for Windows 2003. It is not necessary
for Windows 2000. I would definitely try and reinstall SP3a and see if it
helps.
Geoff N. Hiten
Senior Database Administrator
Microsoft SQL Server MVP

Friday, February 10, 2012
