Page MenuHomePhabricator

[AMDGPU] Fix a miscompile with S_ADD/S_SUB
Needs ReviewPublic

Authored by piotr on Aug 31 2020, 8:30 AM.

Details

Reviewers
arsenm
Summary

The Add/Sub combine removes a zext when present on a boolean operation,
which is problematic if the resultant addcarry stays in sgpr (is not
moved to vgpr during si-fix-sgpr-copies). In the absence of v_cndmask
that is normally generated for such a zext, the resultant condition code
can have bits set for inactive lanes which can produce wrong results.

Therefore, it is necessary to clear bits set for inactive lanes if
the result of boolean operation is fed into S_ADD_CO_PSEUDO/S_SUB_CO_PSEUDO.

Diff Detail

Event Timeline

piotr created this revision.Aug 31 2020, 8:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 31 2020, 8:30 AM
piotr requested review of this revision.Aug 31 2020, 8:30 AM
arsenm added inline comments.Sep 1 2020, 6:11 AM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
10123 ↗(On Diff #288955)

It's only 32-bit for wave32. This also doesn't consider scalar booleans

10128–10132 ↗(On Diff #288955)

I think this is the wrong place to fix this up. At this point we don't really know anything about lanes or registers, and the combine itself should already be correct

foad added a subscriber: foad.Sep 1 2020, 6:14 AM
piotr updated this revision to Diff 291165.Sep 11 2020, 2:39 AM

I think you are right - I moved the fix to the better place.

piotr retitled this revision from [AMDGPU] Fix a miscompile in add combine to [AMDGPU] Fix a miscompile with S_ADD/S_SUB.Sep 11 2020, 2:41 AM
piotr edited the summary of this revision. (Show Details)
arsenm added inline comments.Sep 16 2020, 10:44 AM
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1056

I don't think it will end up mattering with the operations legal for i1 (although maybe trunc is also a problem), but I think it would be somewhat safer to list valid boolean sources instead

llvm/test/CodeGen/AMDGPU/combine-add-zext-xor.ll
17

Should be able to reduce this more

31–32

Should name values (opt -intsnamer)

60–61

Attributes canonically come after declarations

piotr added inline comments.Sep 17 2020, 7:46 AM
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1056

Will do, but what do you mean exactly by "valid boolean sources" here?

llvm/test/CodeGen/AMDGPU/combine-add-zext-xor.ll
17

This already comes from bugpoint, but I will try to reduce the test a bit more.

arsenm added inline comments.Sep 17 2020, 8:24 AM
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1056

I mean like setcc + class intrinsics, other things that will select to VOPC outputs

llvm/test/CodeGen/AMDGPU/combine-add-zext-xor.ll
17

Bugpoint isn't that smart and you can usually massage the IR a bit to help it make more progress

piotr updated this revision to Diff 293433.Sep 22 2020, 6:14 AM
piotr edited the summary of this revision. (Show Details)

Simplified test and added 5 more.

piotr added inline comments.Sep 22 2020, 6:18 AM
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1056

I looked at this and I am not convinced we need to handle setcc here (added TODO). And fp_class would be mapped to VALU, so no need to handle this here either.

llvm/test/CodeGen/AMDGPU/combine-add-zext-xor.ll
17

Simplified test (also added "-start-before=amdgpu-isel -stop-after=amdgpu-isel") and added 5 more cases.

arsenm added inline comments.Oct 6 2020, 7:29 AM
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1056

I mean entirely invert the logic to specifically check for the cases that will map to VOPC. You would not be adding setcc, you would be checking setcc and co. instead of and/or/xor

piotr added a comment.Oct 6 2020, 7:47 AM

Ah now I see what you meant, sorry.

piotr updated this revision to Diff 297352.Oct 9 2020, 3:54 PM

Invert the condition.

foad added inline comments.Oct 12 2020, 2:14 AM
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1056

In future it would be nice to improve this to detect a tree of SETCCs connected by ANDs and ORs. This kind of pattern seems to occur quite frequently:

        v_cmp_ne_u32_e64 s29, s29, 1
        v_cmp_ne_u32_e64 s0, s0, 0
        s_and_b32 s0, s0, s29
+       s_and_b32 s0, exec_lo, s0

where your patch has added the last AND instruction.

arsenm added inline comments.Oct 12 2020, 10:46 AM
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1056

I thought we had an isVectorBoolean or something like that for identifying these cases recursively (I know I have a patch implementing the equivalent for globalisel lying around at least)

piotr updated this revision to Diff 298369.Oct 15 2020, 7:07 AM

Good spot - I looked at the shader database and indeed the pattern with two connected setcc is frequent enough to special case it so I updated the patch.

piotr updated this revision to Diff 298372.Oct 15 2020, 7:13 AM

Renamed variable to avoid a multiline assignment.

foad added inline comments.Oct 15 2020, 8:14 AM
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1056

There's this in SIISelLowering.cpp:

// Returns true if argument is a boolean value which is not serialized into
// memory or argument and does not require v_cmdmask_b32 to be deserialized.
static bool isBoolSGPR(SDValue V) {
arsenm added inline comments.Oct 22 2020, 9:02 AM
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1059–1063

Should reuse isBoolSGPR

piotr added inline comments.Oct 22 2020, 11:43 PM
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1059–1063

isBoolSGPR() does not work for my case - I need to distinguish between two groups (AND/OR/XOR) and (SETCC/FP_CLASS), but that function returns true for both.

arsenm added inline comments.Oct 23 2020, 8:26 AM
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1059–1063

I'd say isBoolSGPR is buggy then. In any case, this should be moved out to a separate function

foad added inline comments.Oct 23 2020, 8:30 AM
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1059–1063

Currently isBoolSGPR returns true for any and/or/xor, without recursively checking that the operands are boolean. But perhaps it should be fixed? Then you could use it here too.