Page MenuHomePhabricator

[AMDGPU] Fix a miscompile with S_ADD/S_SUB
ClosedPublic

Authored by piotr on Aug 31 2020, 8:30 AM.

Details

Summary

[AMDGPU] Fix a miscompile with S_ADD/S_SUB

The helper function isBoolSGPR is too aggressive when determining
when a v_cndmask can be skipped on a boolean value because the
function does not check the operands of and/or/xor.

This can be problematic for the Add/Sub combines that can leave
bits set even for inactive lanes leading to wrong results.

Fix this by inspecting the operands of and/or/xor recursively.

Diff Detail

Event Timeline

piotr created this revision.Aug 31 2020, 8:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 31 2020, 8:30 AM
piotr requested review of this revision.Aug 31 2020, 8:30 AM
arsenm added inline comments.Sep 1 2020, 6:11 AM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
10255

It's only 32-bit for wave32. This also doesn't consider scalar booleans

10260–10264

I think this is the wrong place to fix this up. At this point we don't really know anything about lanes or registers, and the combine itself should already be correct

foad added a subscriber: foad.Sep 1 2020, 6:14 AM
piotr updated this revision to Diff 291165.Sep 11 2020, 2:39 AM

I think you are right - I moved the fix to the better place.

piotr retitled this revision from [AMDGPU] Fix a miscompile in add combine to [AMDGPU] Fix a miscompile with S_ADD/S_SUB.Sep 11 2020, 2:41 AM
piotr edited the summary of this revision. (Show Details)
arsenm added inline comments.Sep 16 2020, 10:44 AM
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1056 ↗(On Diff #291165)

I don't think it will end up mattering with the operations legal for i1 (although maybe trunc is also a problem), but I think it would be somewhat safer to list valid boolean sources instead

llvm/test/CodeGen/AMDGPU/combine-add-zext-xor.ll
17

Should be able to reduce this more

31–32

Should name values (opt -intsnamer)

60–61

Attributes canonically come after declarations

piotr added inline comments.Sep 17 2020, 7:46 AM
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1056 ↗(On Diff #291165)

Will do, but what do you mean exactly by "valid boolean sources" here?

llvm/test/CodeGen/AMDGPU/combine-add-zext-xor.ll
17

This already comes from bugpoint, but I will try to reduce the test a bit more.

arsenm added inline comments.Sep 17 2020, 8:24 AM
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1056 ↗(On Diff #291165)

I mean like setcc + class intrinsics, other things that will select to VOPC outputs

llvm/test/CodeGen/AMDGPU/combine-add-zext-xor.ll
17

Bugpoint isn't that smart and you can usually massage the IR a bit to help it make more progress

piotr updated this revision to Diff 293433.Sep 22 2020, 6:14 AM
piotr edited the summary of this revision. (Show Details)

Simplified test and added 5 more.

piotr added inline comments.Sep 22 2020, 6:18 AM
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1056 ↗(On Diff #291165)

I looked at this and I am not convinced we need to handle setcc here (added TODO). And fp_class would be mapped to VALU, so no need to handle this here either.

llvm/test/CodeGen/AMDGPU/combine-add-zext-xor.ll
17

Simplified test (also added "-start-before=amdgpu-isel -stop-after=amdgpu-isel") and added 5 more cases.

arsenm added inline comments.Oct 6 2020, 7:29 AM
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1056 ↗(On Diff #291165)

I mean entirely invert the logic to specifically check for the cases that will map to VOPC. You would not be adding setcc, you would be checking setcc and co. instead of and/or/xor

piotr added a comment.Oct 6 2020, 7:47 AM

Ah now I see what you meant, sorry.

piotr updated this revision to Diff 297352.Oct 9 2020, 3:54 PM

Invert the condition.

foad added inline comments.Oct 12 2020, 2:14 AM
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1056 ↗(On Diff #297352)

In future it would be nice to improve this to detect a tree of SETCCs connected by ANDs and ORs. This kind of pattern seems to occur quite frequently:

        v_cmp_ne_u32_e64 s29, s29, 1
        v_cmp_ne_u32_e64 s0, s0, 0
        s_and_b32 s0, s0, s29
+       s_and_b32 s0, exec_lo, s0

where your patch has added the last AND instruction.

arsenm added inline comments.Oct 12 2020, 10:46 AM
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1056 ↗(On Diff #297352)

I thought we had an isVectorBoolean or something like that for identifying these cases recursively (I know I have a patch implementing the equivalent for globalisel lying around at least)

piotr updated this revision to Diff 298369.Oct 15 2020, 7:07 AM

Good spot - I looked at the shader database and indeed the pattern with two connected setcc is frequent enough to special case it so I updated the patch.

piotr updated this revision to Diff 298372.Oct 15 2020, 7:13 AM

Renamed variable to avoid a multiline assignment.

foad added inline comments.Oct 15 2020, 8:14 AM
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1056 ↗(On Diff #297352)

There's this in SIISelLowering.cpp:

// Returns true if argument is a boolean value which is not serialized into
// memory or argument and does not require v_cmdmask_b32 to be deserialized.
static bool isBoolSGPR(SDValue V) {
arsenm added inline comments.Oct 22 2020, 9:02 AM
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1059–1063 ↗(On Diff #298372)

Should reuse isBoolSGPR

piotr added inline comments.Oct 22 2020, 11:43 PM
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1059–1063 ↗(On Diff #298372)

isBoolSGPR() does not work for my case - I need to distinguish between two groups (AND/OR/XOR) and (SETCC/FP_CLASS), but that function returns true for both.

arsenm added inline comments.Oct 23 2020, 8:26 AM
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1059–1063 ↗(On Diff #298372)

I'd say isBoolSGPR is buggy then. In any case, this should be moved out to a separate function

foad added inline comments.Oct 23 2020, 8:30 AM
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1059–1063 ↗(On Diff #298372)

Currently isBoolSGPR returns true for any and/or/xor, without recursively checking that the operands are boolean. But perhaps it should be fixed? Then you could use it here too.

reverse ping

piotr added a comment.Jan 21 2021, 1:01 AM

Sorry for the lack of activity lately - I plan to get back to this patch in the near future.

piotr updated this revision to Diff 322919.Feb 11 2021, 1:11 AM

Following the suggesstions of updating isBoolSGPR.

piotr edited the summary of this revision. (Show Details)Feb 11 2021, 1:12 AM

I think you only posted a partial patch for the latest revision

llvm/test/CodeGen/AMDGPU/combine-add-zext-xor.ll
2

Skipping all the IR passes and going straight to the isel is really weird (and I don't think actually works in general). Is there a need to skip the IR passes here?

Can you also stop after SIFixSGPRCopies (or finalizeisel) instead?

piotr added a comment.Feb 11 2021, 6:44 AM

This is the intended patch, no need to change AMDGPUISelDAGToDAG.cpp in the most recent version. Here, due to the updated check in isBoolSGPR the problematic combines that would strip zext will not happen (see SITargetLowering::performAddCombine/performSubCombine). This is similar to the very first approach I put up for a review.

llvm/test/CodeGen/AMDGPU/combine-add-zext-xor.ll
2

Sure, will update the test.

foad accepted this revision.Feb 11 2021, 6:52 AM

Patch LGTM modulo Matt's comments about the test.

This revision is now accepted and ready to land.Feb 11 2021, 6:52 AM
piotr updated this revision to Diff 323908.Feb 16 2021, 1:10 AM

Tests rewritten.

arsenm accepted this revision.Feb 16 2021, 7:10 AM
This revision was landed with ongoing or failed builds.Feb 17 2021, 3:26 AM
This revision was automatically updated to reflect the committed changes.