Wednesday, March 7, 2012

COALESCE as a Performance Enhancer?!?

I am using a clever product called SQL Expert Pro to optimize
problematic SQL statements. It works by generating syntactically
identical variations of the original SQL, finding all unique execution
plans, and batch testing them.

Anyway, it generally turns out that a very odd change dramatically
improves performance (for a PeopleSoft database on SQL Server 7). Run
time goes from 52 seconds to 8 seconds.

The change is to replace a join condition in a where clause with an
odd equivalent COALESCE construct, e.g. replace WHERE C.PAY_ID =
D.PAY_ID with WHERE C.PAY_ID = COALESCE(D.PAY_ID, D.PAY_ID).

Has anyone seen this sort of behavior before? Why would it be
advantageous to COALESCE on the same field twice?

The original and COALESCE'd versions are shown below. Does the fact
that this is a nine-table join over large tables have anything to do
with it?

Original Query (52 sec)

SELECT F.ACCT_ID,
I.ENTITY_NAME,
A.TNDR_SOURCE_CD,
C.PAY_EVENT_ID,
C.NON_CIS_NAME,
C.NON_CIS_REF_NBR,
C.NON_CIS_COMMENT,
C.PAY_AMT,
D.PAY_SEG_AMT,
E.ACCOUNTING_DT,
':1',
':2',
':3',
E.FREEZE_OPRID,
E.FREEZE_DTTM,
E.FT_TYPE_FLG
FROM PS_CI_TNDR_CTL A,
PS_CI_PAY_TNDR B,
PS_CI_PAY C,
PS_CI_PAY_SEG D,
PS_CI_SA F,
PS_CI_SA_TYPE G,
PS_CI_ACCT_PER H,
PS_CI_PER_NAME I,
PS_CW_FT E
WHERE A.TNDR_CTL_ID = B.TNDR_CTL_ID
AND A.TNDR_SOURCE_CD LIKE 'STK%'
AND B.PAY_EVENT_ID = C.PAY_EVENT_ID
AND C.PAY_ID = D.PAY_ID
AND D.PAY_SEG_ID = E.SIBLING_ID
AND E.ACCOUNTING_DT BETWEEN '2003-10-01' AND '2003-10-31'
AND E.FT_TYPE_FLG IN ('PS', 'PX')
AND NOT EXISTS (SELECT 'X'
FROM PS_CW_INTERFACE_ID J
WHERE J.PAYOR_ACCT_ID = F.ACCT_ID)
AND E.SA_ID = F.SA_ID
AND F.SA_TYPE_CD = G.SA_TYPE_CD
AND G.DEBT_CL_CD = 'NCIS'
AND F.ACCT_ID = H.ACCT_ID
AND H.PER_ID = I.PER_ID
ORDER BY 2

Optimized Query (8 sec)

SELECT F.ACCT_ID,
I.ENTITY_NAME,
A.TNDR_SOURCE_CD,
C.PAY_EVENT_ID,
C.NON_CIS_NAME,
C.NON_CIS_REF_NBR,
C.NON_CIS_COMMENT,
C.PAY_AMT,
D.PAY_SEG_AMT,
E.ACCOUNTING_DT,
':1',
':2',
':3',
E.FREEZE_OPRID,
E.FREEZE_DTTM,
E.FT_TYPE_FLG
FROM PS_CI_TNDR_CTL A,
PS_CI_PAY_TNDR B,
PS_CI_PAY C,
PS_CI_PAY_SEG D,
PS_CI_SA F,
PS_CI_SA_TYPE G,
PS_CI_ACCT_PER H,
PS_CI_PER_NAME I,
PS_CW_FT E
WHERE A.TNDR_CTL_ID = COALESCE(B.TNDR_CTL_ID,B.TNDR_CTL_ID)
AND A.TNDR_SOURCE_CD LIKE 'STK%'
AND B.PAY_EVENT_ID = COALESCE(C.PAY_EVENT_ID, C.PAY_EVENT_ID)
AND C.PAY_ID = COALESCE(D.PAY_ID, D.PAY_ID)
AND D.PAY_SEG_ID = COALESCE(E.SIBLING_ID, E.SIBLING_ID)
AND COALESCE(E.ACCOUNTING_DT, E.ACCOUNTING_DT) BETWEEN '2003-10-01'
AND '2003-10-31'
AND COALESCE(E.FT_TYPE_FLG, E.FT_TYPE_FLG) IN ('PS', 'PX')
AND NOT EXISTS (SELECT 'X'
FROM PS_CW_INTERFACE_ID J
WHERE COALESCE(J.PAYOR_ACCT_ID, J.PAYOR_ACCT_ID) =
F.ACCT_ID)
AND E.SA_ID = COALESCE(F.SA_ID,F.SA_ID)
AND F.SA_TYPE_CD = COALESCE(G.SA_TYPE_CD,G.SA_TYPE_CD)
AND G.DEBT_CL_CD = 'NCIS'
AND F.ACCT_ID = COALESCE(H.ACCT_ID ,H.ACCT_ID)
AND H.PER_ID = COALESCE(I.PER_ID ,I.PER_ID)
ORDER BY 2Check out the query plan for both queries, and you are most likely to
see the difference.

I suppose the COALESCE(JoinColumn,JoinColumn) will be interpreted as a
non-optimizable expression. Because of this, the access path analysis
and join strategy will change. My guess is that it will change in favor
of loop joins. Also, the compilation time is most likely to drop
dramatically.

Gert-Jan

Jeff Roughgarden wrote:
> I am using a clever product called SQL Expert Pro to optimize
> problematic SQL statements. It works by generating syntactically
> identical variations of the original SQL, finding all unique execution
> plans, and batch testing them.
> Anyway, it generally turns out that a very odd change dramatically
> improves performance (for a PeopleSoft database on SQL Server 7). Run
> time goes from 52 seconds to 8 seconds.
> The change is to replace a join condition in a where clause with an
> odd equivalent COALESCE construct, e.g. replace WHERE C.PAY_ID =
> D.PAY_ID with WHERE C.PAY_ID = COALESCE(D.PAY_ID, D.PAY_ID).
> Has anyone seen this sort of behavior before? Why would it be
> advantageous to COALESCE on the same field twice?
> The original and COALESCE'd versions are shown below. Does the fact
> that this is a nine-table join over large tables have anything to do
> with it?
> Original Query (52 sec)
> SELECT F.ACCT_ID,
> I.ENTITY_NAME,
> A.TNDR_SOURCE_CD,
> C.PAY_EVENT_ID,
> C.NON_CIS_NAME,
> C.NON_CIS_REF_NBR,
> C.NON_CIS_COMMENT,
> C.PAY_AMT,
> D.PAY_SEG_AMT,
> E.ACCOUNTING_DT,
> ':1',
> ':2',
> ':3',
> E.FREEZE_OPRID,
> E.FREEZE_DTTM,
> E.FT_TYPE_FLG
> FROM PS_CI_TNDR_CTL A,
> PS_CI_PAY_TNDR B,
> PS_CI_PAY C,
> PS_CI_PAY_SEG D,
> PS_CI_SA F,
> PS_CI_SA_TYPE G,
> PS_CI_ACCT_PER H,
> PS_CI_PER_NAME I,
> PS_CW_FT E
> WHERE A.TNDR_CTL_ID = B.TNDR_CTL_ID
> AND A.TNDR_SOURCE_CD LIKE 'STK%'
> AND B.PAY_EVENT_ID = C.PAY_EVENT_ID
> AND C.PAY_ID = D.PAY_ID
> AND D.PAY_SEG_ID = E.SIBLING_ID
> AND E.ACCOUNTING_DT BETWEEN '2003-10-01' AND '2003-10-31'
> AND E.FT_TYPE_FLG IN ('PS', 'PX')
> AND NOT EXISTS (SELECT 'X'
> FROM PS_CW_INTERFACE_ID J
> WHERE J.PAYOR_ACCT_ID = F.ACCT_ID)
> AND E.SA_ID = F.SA_ID
> AND F.SA_TYPE_CD = G.SA_TYPE_CD
> AND G.DEBT_CL_CD = 'NCIS'
> AND F.ACCT_ID = H.ACCT_ID
> AND H.PER_ID = I.PER_ID
> ORDER BY 2
> Optimized Query (8 sec)
> SELECT F.ACCT_ID,
> I.ENTITY_NAME,
> A.TNDR_SOURCE_CD,
> C.PAY_EVENT_ID,
> C.NON_CIS_NAME,
> C.NON_CIS_REF_NBR,
> C.NON_CIS_COMMENT,
> C.PAY_AMT,
> D.PAY_SEG_AMT,
> E.ACCOUNTING_DT,
> ':1',
> ':2',
> ':3',
> E.FREEZE_OPRID,
> E.FREEZE_DTTM,
> E.FT_TYPE_FLG
> FROM PS_CI_TNDR_CTL A,
> PS_CI_PAY_TNDR B,
> PS_CI_PAY C,
> PS_CI_PAY_SEG D,
> PS_CI_SA F,
> PS_CI_SA_TYPE G,
> PS_CI_ACCT_PER H,
> PS_CI_PER_NAME I,
> PS_CW_FT E
> WHERE A.TNDR_CTL_ID = COALESCE(B.TNDR_CTL_ID,B.TNDR_CTL_ID)
> AND A.TNDR_SOURCE_CD LIKE 'STK%'
> AND B.PAY_EVENT_ID = COALESCE(C.PAY_EVENT_ID, C.PAY_EVENT_ID)
> AND C.PAY_ID = COALESCE(D.PAY_ID, D.PAY_ID)
> AND D.PAY_SEG_ID = COALESCE(E.SIBLING_ID, E.SIBLING_ID)
> AND COALESCE(E.ACCOUNTING_DT, E.ACCOUNTING_DT) BETWEEN '2003-10-01'
> AND '2003-10-31'
> AND COALESCE(E.FT_TYPE_FLG, E.FT_TYPE_FLG) IN ('PS', 'PX')
> AND NOT EXISTS (SELECT 'X'
> FROM PS_CW_INTERFACE_ID J
> WHERE COALESCE(J.PAYOR_ACCT_ID, J.PAYOR_ACCT_ID) =
> F.ACCT_ID)
> AND E.SA_ID = COALESCE(F.SA_ID,F.SA_ID)
> AND F.SA_TYPE_CD = COALESCE(G.SA_TYPE_CD,G.SA_TYPE_CD)
> AND G.DEBT_CL_CD = 'NCIS'
> AND F.ACCT_ID = COALESCE(H.ACCT_ID ,H.ACCT_ID)
> AND H.PER_ID = COALESCE(I.PER_ID ,I.PER_ID)
> ORDER BY 2|||Hi Jeff,

That's pretty bizarre. I can only guess that the old style JOIN
creates a cartesian product with a lot of nulls everywhere, before the
WHERE filters the rows. And that coalesce somehow handles the nulls
better?

No comments:

Post a Comment