This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Improve reciprocal handling
ClosedPublic

Authored by rampitec on Jun 5 2018, 5:25 PM.

Details

Summary

When denormals are supported, we currently produce a full division for
1.0f / x. It can still be replaced by a faster sequence:

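bool c = fabs(x) > 0x1.0p+96f;    // is |x| huge?
float s = c ? 0x1.0p-32f : 1.0f;  // if so, pre-scale by 2^-32 to keep the rcp result out of the denormal range
x *= s;
return s * v_rcp_f32(x);          // apply the same scale to the result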

when the requested accuracy allows an error of 2.5 ulp or more. Without
denormal support, the same sequence is used for numerators other than
1.0, while a 1.0 numerator is lowered to just v_rcp_f32.
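
A host-side C++ sketch of the sequence above can help show why the scaling exists. The names emulated_v_rcp_f32 and fast_rcp are hypothetical. The emulation assumes the hardware reciprocal cannot return a denormal (it flushes such results to a signed zero) and uses an exact division in place of the real, approximate instruction, so it does not model v_rcp_f32's actual error. It also assumes the host itself does not flush denormals.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for v_rcp_f32. Assumption for illustration only:
// a denormal result is flushed to a signed zero. Exact IEEE division is used
// instead of the hardware approximation.
float emulated_v_rcp_f32(float x) {
  float r = 1.0f / x;
  if (std::fpclassify(r) == FP_SUBNORMAL)
    return std::copysign(0.0f, r);
  return r;
}

// The summary's sequence for 1.0f / x: pre-scale huge inputs by 2^-32 so the
// reciprocal stays in the normal range, then apply the same scale to the
// result with an ordinary multiply, which (on a non-FTZ host) keeps denormals.
float fast_rcp(float x) {
  bool c = std::fabs(x) > 0x1.0p+96f;
  float s = c ? 0x1.0p-32f : 1.0f;
  x *= s;
  return s * emulated_v_rcp_f32(x);
}
```

For x = 0x1.0p+127f, the unscaled emulated reciprocal flushes to zero under the stated assumption, while fast_rcp returns the denormal 0x1.0p-127f: the reciprocal is taken of 0x1.0p+95f, and only the final multiply by 0x1.0p-32f enters the denormal range.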

The 1/x optimization is also extended to -1/x, which is the same except
for the sign bit of the result.
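
Because IEEE negation only flips the sign bit, -1/x can reuse the 1/x sequence and then negate the result. A minimal C++ sketch follows; flip_sign_bit and neg_rcp are hypothetical names, and a plain division stands in for whatever fast 1/x sequence is actually emitted.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Flip the IEEE-754 sign bit (bit 31) of a float; exact for every input.
float flip_sign_bit(float v) {
  std::uint32_t bits;
  std::memcpy(&bits, &v, sizeof bits);
  bits ^= 0x80000000u;
  std::memcpy(&v, &bits, sizeof v);
  return v;
}

// Hypothetical helper: -1.0f / x computed as the 1.0f / x result with its
// sign bit flipped.
float neg_rcp(float x) {
  float r = 1.0f / x;  // stand-in for the fast 1/x sequence
  return flip_sign_bit(r);
}
```

Writing -r instead of flip_sign_bit(r) is equivalent for IEEE-754 floats; the explicit bit flip just makes the point that negation adds no rounding, so -1/x has exactly the accuracy of the 1/x sequence.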

OpenCL conformance tests passed with denormals both enabled and disabled.

Event Timeline

rampitec created this revision. Jun 5 2018, 5:25 PM
arsenm added inline comments. Jun 6 2018, 6:38 AM
lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
376

I think if you use the stuff in PatternMatch.h, you can easily check for constant splats if you want this to work for vectors too.

386–389

Merge these into a single return of the logically combined condition.

423

Check constant first? Also, isn't just isa<Constant> sufficient? Not sure why this needs to check it at all, since shouldKeepFDivF32 already checks this.

test/CodeGen/AMDGPU/fdiv32-to-rcp-folding.ll
98–101

This will effectively check for only one of them, although I think there's a FileCheck patch out for review to fix this.

rampitec updated this revision to Diff 150163. Jun 6 2018, 10:28 AM
rampitec marked 2 inline comments as done.
rampitec added inline comments.
lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
376

It now works on an arbitrary constant vector, not only on splats; this helper is then applied to each element. There is even a test for the non-splat case.
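
The difference between a splat-only check and the per-element approach described in this reply can be sketched outside LLVM in plain C++. This is not the code in AMDGPUCodeGenPrepare.cpp: the names are hypothetical, the constant vector is modeled as a std::vector<float>, and the "+1.0 or -1.0" criterion is an assumption taken from the summary's 1/x and -1/x cases.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-element helper: is this numerator exactly +1.0 or -1.0?
bool is_plus_or_minus_one(float v) { return v == 1.0f || v == -1.0f; }

// Splat-only view: succeeds only if every lane holds the same qualifying value.
bool is_splat_of_plus_or_minus_one(const std::vector<float>& lanes) {
  if (lanes.empty()) return false;
  for (float v : lanes)
    if (v != lanes.front()) return false;
  return is_plus_or_minus_one(lanes.front());
}

// Per-element view: one decision per lane, so a non-splat vector such as
// <1.0, -1.0, 2.0> is still handled lane by lane.
std::vector<bool> reciprocal_lanes(const std::vector<float>& lanes) {
  std::vector<bool> out;
  out.reserve(lanes.size());
  for (float v : lanes) out.push_back(is_plus_or_minus_one(v));
  return out;
}
```

The splat check rejects <1.0, -1.0> outright, whereas the per-element helper lets each lane take the reciprocal path on its own merits.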

423

Checking only for a constant is not sufficient: fdiv can also be replaced when the numerator is not constant and denormals are disabled. But I have removed the check altogether, because it is performed later anyway.
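
The lowering rules from the summary and this reply can be condensed into a small decision sketch. It is a hypothetical model, not shouldKeepFDivF32 itself: the enum and function names are invented, a non-constant numerator is modeled as std::nullopt, and the summary's 2.5 ulp condition is read as the permitted error being at least 2.5 ulp.

```cpp
#include <cassert>
#include <optional>

// Hypothetical model of the lowering choices described in this review.
enum class FDivLowering {
  KeepFDiv,   // leave the full-precision division alone
  PlainRcp,   // just v_rcp_f32 (negated for a -1.0 numerator)
  ScaledRcp   // the range-scaled sequence from the summary
};

// num: the numerator if it is a constant, std::nullopt otherwise.
// ulp_budget: the error the division is allowed to have, in ulps.
FDivLowering choose_lowering(std::optional<float> num, bool has_denormals,
                             float ulp_budget) {
  if (ulp_budget < 2.5f)
    return FDivLowering::KeepFDiv;  // accuracy request too strict
  bool is_pm_one = num && (*num == 1.0f || *num == -1.0f);
  if (has_denormals)
    return is_pm_one ? FDivLowering::ScaledRcp : FDivLowering::KeepFDiv;
  // No denormals: +/-1.0 gets a bare reciprocal; anything else, including a
  // non-constant numerator, gets the scaled sequence.
  return is_pm_one ? FDivLowering::PlainRcp : FDivLowering::ScaledRcp;
}
```

The last branch is the case this reply points out: without denormals, a non-constant numerator still qualifies, so a constant-only check would reject valid candidates.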

rampitec added inline comments. Jun 6 2018, 12:16 PM
test/CodeGen/AMDGPU/fdiv32-to-rcp-folding.ll
98–101

Then once FileCheck is improved, these will test more than they do now. It is really impossible to write reliable non-DAG checks here in the presence of two schedulers.

arsenm accepted this revision. Jun 6 2018, 2:16 PM

LGTM

This revision is now accepted and ready to land. Jun 6 2018, 2:16 PM
This revision was automatically updated to reflect the committed changes.