This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Always use rcp + mul with fast math
ClosedPublic

Authored by rampitec on Jun 29 2017, 4:16 PM.

Details

Summary

Regardless of relaxation options such as -cl-fast-relaxed-math
we are producing rather long code for fdiv via amdgcn_fdiv_fast
intrinsic. This intrinsic is used to replace fdiv with 2.5ulp
metadata and does not handle denormals, thus believed to be fast.

An fdiv instruction can also have fast math flag either by itself
or together with fpmath metadata. Clang used with a relaxation flag
always produces both metadata and fast flag:

%div = fdiv fast float %v, %0, !fpmath !12
!12 = !{float 2.500000e+00}

Current implementation ignores fast flag and favors metadata. An
instruction with just fast flag would be lowered to a fastest rcp +
mul, but that never happen on practice because of described mutual
clang and BE behavior.

This change allows an "fdiv fast" to be always lowered as rcp + mul.

Diff Detail

Event Timeline

rampitec created this revision.Jun 29 2017, 4:16 PM
arsenm added inline comments.Jun 29 2017, 4:22 PM
lib/Target/AMDGPU/SIISelLowering.cpp
3778–3779

This is a possible source of errors now that less-strict flags can now trigger this

rampitec added inline comments.Jun 29 2017, 4:30 PM
lib/Target/AMDGPU/SIISelLowering.cpp
3778–3779

That is the intent to trigger this. That is how HSAIL compiler works and we had no complaints so far. Anyway, I am running confirmance now.

A philosophical question though what shall preval, unsafe fp or denorm support. Once again, HSAIL favors relaxation, so I did the same.

In fact this implementation is stricter than one we have in HSAIL. We were applying options to library as well, while here it is only applied to user code.

rampitec added inline comments.Jun 29 2017, 4:59 PM
lib/Target/AMDGPU/SIISelLowering.cpp
3778–3779

Actually fp denorms are not supported with 2.5 ulp fdiv even now, so there is no change in this respect.

rampitec added inline comments.Jun 29 2017, 5:56 PM
lib/Target/AMDGPU/SIISelLowering.cpp
3778–3779

JFYI. OCL conformance passed.

arsenm added inline comments.Jun 29 2017, 6:20 PM
lib/Target/AMDGPU/SIISelLowering.cpp
3778–3779

Adding unsafe algebra to the node from just arcp is incorrect. I wouldn't expect this to show up in conformance, this could potentially introduce unsafe algebraic transforms in use instructions.This should at most preserve the original set of math flags, not promote them.

rampitec updated this revision to Diff 104812.Jun 29 2017, 8:12 PM
rampitec marked an inline comment as done.

Preserved flags on a new node instead of forging them.

rampitec added inline comments.Jun 29 2017, 8:17 PM
lib/Target/AMDGPU/SIISelLowering.cpp
3778–3779

Strange that was not noticed when this code was written. Fixed.

arsenm accepted this revision.Jul 6 2017, 11:41 AM

LGTM. This could still preserve the intersection of the incoming flags but that's a separate patch

This revision is now accepted and ready to land.Jul 6 2017, 11:41 AM
This revision was automatically updated to reflect the committed changes.