InstCombine/AMDGPU: Fix constant folding of llvm.amdgcn.{icmp,fcmp}
Closed, Public

Authored by nhaehnle on Fri, Apr 21, 3:09 AM.

Details

Summary

The return value of these intrinsics should always have 0 bits for
inactive threads. This means that when all arguments are constant
and the comparison evaluates to true, the intrinsic should return
the current exec mask.
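
The semantics described above can be sketched with a small standalone model. This is not the InstCombine code from the patch. It assumes a 64-lane wavefront, and the names model_icmp, fold_constant_true, and fold_constant_false are hypothetical helpers introduced only for illustration.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical model, not LLVM or hardware code.
// Bit i of exec is set when lane i is active (64 lanes assumed).
// The result has bit i set only when lane i is active AND the
// per-lane comparison holds, so inactive lanes always yield 0 bits.
uint64_t model_icmp(uint64_t exec, const std::function<bool(unsigned)> &cmp) {
    uint64_t result = 0;
    for (unsigned lane = 0; lane < 64; ++lane) {
        bool active = ((exec >> lane) & 1u) != 0;
        if (active && cmp(lane))
            result |= uint64_t{1} << lane;
    }
    return result;
}

// A comparison that is constant-true in every lane folds to the
// exec mask itself, not to an all-ones constant.
uint64_t fold_constant_true(uint64_t exec) { return exec; }

// A comparison that is constant-false folds to zero regardless of exec.
uint64_t fold_constant_false(uint64_t /*exec*/) { return 0; }
```

In this model, folding a constant-true comparison to an all-ones value would set bits for inactive lanes, which is the kind of mismatch the summary describes.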

Fixes some GL_ARB_shader_ballot tests.

Diff Detail

Repository
rL LLVM
nhaehnle created this revision. Fri, Apr 21, 3:09 AM
arsenm added inline comments. Fri, Apr 21, 11:59 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
3406 ↗(On Diff #96116)

No else

3410–3413 ↗(On Diff #96116)

Should have a comment about why this is done

test/Transforms/InstCombine/amdgcn-intrinsics.ll
1538 ↗(On Diff #96116)

Check that attributes #4 contains convergent?

nhaehnle updated this revision to Diff 96413. Mon, Apr 24, 10:05 AM

Address review comments and apply some more formatting fixes.

arsenm accepted this revision. Mon, Apr 24, 10:07 AM

LGTM

lib/Transforms/InstCombine/InstCombineCalls.cpp
3435–3437 ↗(On Diff #96413)

Multiline, so braces

test/Transforms/InstCombine/amdgcn-intrinsics.ll
1538 ↗(On Diff #96116)

Is this really missing nounwind?

This revision is now accepted and ready to land. Mon, Apr 24, 10:07 AM
nhaehnle added inline comments. Mon, Apr 24, 10:20 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
3435–3437 ↗(On Diff #96413)

Fixing this before I commit.

test/Transforms/InstCombine/amdgcn-intrinsics.ll
1538 ↗(On Diff #96116)

Yes. It's an attribute of the call-site, not of the function. read_register itself is nounwind readonly, but doesn't have the convergent attribute.

This revision was automatically updated to reflect the committed changes.