InstCombine/AMDGPU: Fix constant folding of llvm.amdgcn.{icmp,fcmp}
ClosedPublic

Authored by nhaehnle on Apr 21 2017, 3:09 AM.

Details

Summary

The return value of these intrinsics should always have 0 bits for
inactive threads. This means that when all arguments are constant
and the comparison evaluates to true, the intrinsic should return
the current exec mask.
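
The folding rule can be illustrated with a small standalone model. This is a sketch only, not the InstCombine code from this revision: `LaneMask`, `evalPerLane`, `foldConstantCompare`, and `naiveFold` are hypothetical names, and a 64-lane wavefront is assumed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: bit i of a mask corresponds to lane i of a
// 64-lane wavefront (an assumption made for this sketch).
using LaneMask = std::uint64_t;

// Reference semantics: every active lane contributes its comparison
// result; inactive lanes always contribute a 0 bit.
LaneMask evalPerLane(bool cmpResult, LaneMask exec) {
  LaneMask out = 0;
  for (unsigned lane = 0; lane < 64; ++lane) {
    const bool active = ((exec >> lane) & 1u) != 0;
    if (active && cmpResult)
      out |= LaneMask{1} << lane;
  }
  return out;
}

// Folding when all arguments are constant: a true comparison yields
// the current exec mask, a false comparison yields 0.
LaneMask foldConstantCompare(bool cmpResult, LaneMask exec) {
  return cmpResult ? exec : LaneMask{0};
}

// Incorrect folding, shown for contrast: returning all-ones sets bits
// for inactive lanes as well.
LaneMask naiveFold(bool cmpResult) {
  return cmpResult ? ~LaneMask{0} : LaneMask{0};
}
```

With only lanes 4 to 7 active (`exec == 0xF0`), `foldConstantCompare(true, 0xF0)` agrees with the per-lane reference, while `naiveFold(true)` does not.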

Fixes some GL_ARB_shader_ballot tests.

Event Timeline

nhaehnle created this revision. Apr 21 2017, 3:09 AM
arsenm added inline comments. Apr 21 2017, 11:59 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
3438

No else

3442–3445

Should have a comment about why this is done

test/Transforms/InstCombine/amdgcn-intrinsics.ll
1538–1540

Check that attributes 4 contains convergent?

nhaehnle updated this revision to Diff 96413. Apr 24 2017, 10:05 AM

Address review comments and apply some more formatting fixes.

arsenm accepted this revision. Apr 24 2017, 10:07 AM

LGTM

lib/Transforms/InstCombine/InstCombineCalls.cpp
3435–3437

Multiline, so braces

test/Transforms/InstCombine/amdgcn-intrinsics.ll
1538–1540

This is really missing nounwind?

This revision is now accepted and ready to land. Apr 24 2017, 10:07 AM
nhaehnle added inline comments. Apr 24 2017, 10:20 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
3435–3437

Fixing this before I commit.

test/Transforms/InstCombine/amdgcn-intrinsics.ll
1538–1540

Yes. It's an attribute of the call-site, not of the function. read_register itself is nounwind readonly, but doesn't have the convergent attribute.

This revision was automatically updated to reflect the committed changes.