This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] MatchVectorAllZeroTest - handle OR vector reductions
ClosedPublic

Authored by RKSimon on Jun 10 2020, 3:49 AM.

Details

Summary

This patch extends to handle OR vector reduction patterns where the result is compared against zero.

Fixes PR45378

Diff Detail

Event Timeline

RKSimon created this revision.Jun 10 2020, 3:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 10 2020, 3:49 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
spatel added inline comments.Jun 10 2020, 7:07 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
45814

Code comments like this and the function name don't apply if we're matching a reduction, right?
I think we need to refactor and/or update comments to reduce confusion.

llvm/test/CodeGen/X86/pr45378.ll
10–11

Add 'nounwind' attribute to remove .cfi noise?
Would it make sense to minimize the test to return the i1 icmp rather than branch and make an external call?

RKSimon updated this revision to Diff 269866.Jun 10 2020, 8:50 AM

rebased tests

RKSimon updated this revision to Diff 270612.Jun 14 2020, 2:29 AM

Update comments to describe reductions as well

RKSimon planned changes to this revision.Jun 14 2020, 10:38 AM

I think I can put this in LowerVectorAllZeroTest instead

RKSimon updated this revision to Diff 270637.Jun 14 2020, 1:35 PM
RKSimon retitled this revision from [X86][SSE] combineVectorSizedSetCCEquality - handle OR vector reductions to [X86][SSE] LowerVectorAllZeroTest - handle OR vector reductions.
RKSimon edited the summary of this revision. (Show Details)

Refreshed the patch to be based off LowerVectorAllZeroTest instead of combineVectorSizedSetCCEquality, which already handles scalar reductions and with a suitable refactor can handle vector reductions as well.

RKSimon planned changes to this revision.Jun 14 2020, 1:53 PM

More changes to come - still finding regressions

RKSimon updated this revision to Diff 270761.Jun 15 2020, 8:31 AM
RKSimon retitled this revision from [X86][SSE] LowerVectorAllZeroTest - handle OR vector reductions to [X86][SSE] MatchVectorAllZeroTest - handle OR vector reductions.
RKSimon edited the summary of this revision. (Show Details)

Rebased now that the LowerVectorAllZero refactor is complete

spatel added inline comments.Jun 15 2020, 1:18 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
21395

Can we assert that CC is ISD::SETEQ or ISD::SETNE here?

RKSimon updated this revision to Diff 270853.Jun 15 2020, 1:54 PM

Add condition code assertions

spatel accepted this revision.Jun 15 2020, 2:55 PM

LGTM

This revision is now accepted and ready to land.Jun 15 2020, 2:55 PM
This revision was automatically updated to reflect the committed changes.
hans added a subscriber: hans.Jun 22 2020, 12:27 PM

This seems to be causing Chromium test failures. Diffing the asm looks suspicious:

 Disassembly of section .text._ZN5glcts18ComputeShaderTestsC2ERN4deqp7ContextE:
@@ -29795,7 +29795,7 @@
     11c4:      66 0f 63 c3             packsswb %xmm3,%xmm0
     11c8:      66 0f d7 c0             pmovmskb %xmm0,%eax
     11cc:      66 85 c0                test   %ax,%ax
-    11cf:      0f 84 65 01 00 00       je     133a <_ZN5glcts12_GLOBAL__N_128AdvancedPipelineComputeChain3RunEv+0x133a>
+    11cf:      0f 85 65 01 00 00       jne    133a <_ZN5glcts12_GLOBAL__N_128AdvancedPipelineComputeChain3RunEv+0x133a>
     11d5:      49 8b 45 20             mov    0x20(%r13),%rax
     11d9:      48 8b 00                mov    (%rax),%rax
     11dc:      48 8b 40 10             mov    0x10(%rax),%rax
@@ -29894,7 +29894,7 @@
     137d:      66 0f 63 c2             packsswb %xmm2,%xmm0
     1381:      66 0f d7 c0             pmovmskb %xmm0,%eax
     1385:      66 85 c0                test   %ax,%ax
-    1388:      0f 84 64 01 00 00       je     14f2 <_ZN5glcts12_GLOBAL__N_128AdvancedPipelineComputeChain3RunEv+0x14f2>
+    1388:      0f 85 64 01 00 00       jne    14f2 <_ZN5glcts12_GLOBAL__N_128AdvancedPipelineComputeChain3RunEv+0x14f2>
     138e:      49 8b 45 20             mov    0x20(%r13),%rax
     1392:      48 8b 00                mov    (%rax),%rax
     1395:      48 8b 40 10             mov    0x10(%rax),%rax

It seems the conditions on the jumps there are getting flipped, which really looks like a bug. I've posted a repro here: https://bugs.chromium.org/p/chromium/issues/detail?id=1097758#c9

I've reverted in 1357c065783e8d66c9db4be59aca389d1dc6c05f in the meantime.