This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Use v_max_f* for fcanonicalize
ClosedPublic

Authored by rampitec on Aug 17 2017, 6:01 PM.

Details

Summary

If denorms are not flushed we can use max instead of multiplication
by 1. For double that is simply faster, while for float and half
it is shorter, because mul uses constant bus and VOP3.

Diff Detail

Repository
rL LLVM

Event Timeline

rampitec created this revision.Aug 17 2017, 6:01 PM
arsenm added inline comments.Aug 25 2017, 9:12 AM
lib/Target/AMDGPU/AMDGPUInstructions.td
45–47 ↗(On Diff #111594)

How / why this change?

lib/Target/AMDGPU/SIInstructions.td
1280–1285 ↗(On Diff #111594)

I think it would be clearer to have let Predicates = [NoFP16Denormals] rather than relying on AddedComplexity to prefer one pattern over the other

test/CodeGen/AMDGPU/fcanonicalize-denorms.ll
8 ↗(On Diff #111594)

Can you merge this with fcanonicalize.ll? That one avoids multiple run lines by using the attributes on the different functions

rampitec added inline comments.Aug 25 2017, 10:09 AM
lib/Target/AMDGPU/AMDGPUInstructions.td
45–47 ↗(On Diff #111594)

These predicates were previously unused, thus error went undetected.

test/CodeGen/AMDGPU/fcanonicalize-denorms.ll
8 ↗(On Diff #111594)

fcnonicalize.ll defaults to SI, to it cannot lower f16 tests. I can change it to tonga.

rampitec updated this revision to Diff 112714.Aug 25 2017, 10:36 AM

Added predicates NoFP??Denormals.

rampitec updated this revision to Diff 112716.Aug 25 2017, 10:45 AM
rampitec marked 2 inline comments as done.

Merged test into fcanonicalize.ll.

b-sumner edited edge metadata.Aug 29 2017, 1:19 PM

Looks fine to me; I suggested using max since it is faster in many cases.

arsenm accepted this revision.Aug 29 2017, 5:17 PM

LGTM

This revision is now accepted and ready to land.Aug 29 2017, 5:17 PM
This revision was automatically updated to reflect the committed changes.