This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Enhancement on FDIV lowering in AMDGPUCodeGenPrepare
ClosedPublic

Authored by cfang on Jan 28 2020, 3:03 PM.

Details

Summary

This is to enhance the work in https://reviews.llvm.org/D71293.

  1. We change the fpmath accuracy threshold for rcp from 2.5 to 1.0 ULP. This is cased on the accuracy that rcp can achieve in the hardware.
  2. We do fdiv re-association only with unsafe-math or reciprocal-math (fast-math implies reciprocal-math).

Diff Detail

Event Timeline

cfang created this revision.Jan 28 2020, 3:03 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 28 2020, 3:03 PM
arsenm added inline comments.Jan 29 2020, 2:41 PM
llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
714

fdiv.fast doesn't' care about the reassociation

cfang marked 2 inline comments as done.Jan 29 2020, 2:48 PM
cfang added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
714

You are right. This is just the optimization priority issue.

If we can reassociate fdiv, x/y -> x * rcp(y) is faster than fdiv.fast so we don't do fdiv.fast.

arsenm added inline comments.Jan 29 2020, 3:30 PM
llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
714

The comment and variable name are misleading, as no reassociate is going on here. This needs an explanation here

cfang marked 2 inline comments as done.Jan 30 2020, 3:15 PM
cfang added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
714

I am going to write an explanation here.
But I am confused about fdiv.fast intrinsic:
1.0/x -> fdiv.fast (1.0, x) when denormals are supported. Because I think does not support fdiv.fast.

arsenm added inline comments.Jan 31 2020, 8:13 AM
llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
714

I'm not sure what the question is. fdiv.fast is used depending on whether the denormal mode needs to be switched or not, and is separate from rcp. If we can use rcp, it's preferable to fdiv.fast

cfang updated this revision to Diff 241807.Jan 31 2020, 2:18 PM

update based on comment.

  1. 2.5ulp threshold for fdiv.fast, 1.0 ulp for rcp
  2. introduce a function lowerUsingFDivFast to use fdiv.fast, this function should be called after lowerUsingRcp because rcp is the prefedrence

This update should clear some confusion.

arsenm added inline comments.Jan 31 2020, 2:59 PM
llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
611

This has nothing to do with reassociation

614

This should not be referred to ass lowering

624

We aren't fdiv here. We're handling an fdiv, and not splitting it into a multiple and rcp

cfang marked 3 inline comments as done.Feb 3 2020, 9:50 AM
cfang added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
611

Division re-association: a/b -> a * rcp(b), and one special case is 1.0/b => 1.0*rcp(b) = rcp(b).
This is how 1.0/x -> rcp(x) associated with "re-association".

614

I am thinking of a different name. Do you have a meaningful name for the function in mind?

624

As explained in a previous comment, 1.0/x -> 1.0*rcp(x) = rcp(x) is a special case of re-association.
As a result, if the options specify re-association, we can do 1.0/x -> rcp(x).

arsenm added inline comments.Feb 5 2020, 9:53 AM
llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
611

This isn't reassocation. This is just special handling of 1.0/b. Nothing algebraic changes here. There's no multiply introduced here

614

combineRcp?

cfang marked 2 inline comments as done.Feb 5 2020, 12:07 PM
cfang added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
611

Ok, it seems we have a different understanding here.

I think this is still just a naming issue. Originally the name is something like UnsafeMath?
But I do think arcp also gives the permission to do 1.0/x -> rcp(x) even though no multitply is explicitly generated
(1.0 * rcp(x) = rcp(x)).

Maybe we should go back to use the original name as HasFastUnsafeOptions to clear your confusion?

614

Better to be somethingUseRcp and somethingUseFastFDiv. I am still not sure what something should be here.
optimizeFDivUsingRcp?

cfang updated this revision to Diff 242750.Feb 5 2020, 2:11 PM

Rename a few functions and variables:

  1. optimizeWithRcp and optimizeWithFDivFast
  2. bool AllowInaccurateRcp = unsafe-fp-math || arcp
    • Specified or implied by the options or function attributes, rcp is allowed to be used even though it is not accurate.
arsenm added inline comments.Feb 5 2020, 2:41 PM
llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
611

It's also the comment and flag being checked. There's no implicit or explicit multiply here, this is just a reciprocal. This pass is not responsible for doing the reassociating allowed by arcp. This should be a check for the approximate function math flag. allowRecriprocal is not relevant here

716

You don't need this anymore with getFPAccuracy

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
7422

Again, I thought we decided that arcp does not mean this can be less accurate. This should be checking for approximate function

cfang marked 3 inline comments as done.Feb 5 2020, 3:06 PM
cfang added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
611

So we only check unsafe-fast-math or afn here, not arcp? Thanks.

716

Do you think we will no longer need fpmath metadata after this point?
in fdiv.fast intrinsic or the fdiv itself which may not be changed (especially in the vector cases).

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
7422

Check only afn right? not arcp && afn? Thanks.

arsenm added inline comments.Feb 5 2020, 3:40 PM
llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
611

afn, and the closest match for the attribute is unsafe-fast-math, and not arcp.

716

getFPAccuracy already did the only check you needed for it, so this should be a dead variable

cfang updated this revision to Diff 242955.Feb 6 2020, 11:05 AM

Updated based on feedback:

  1. replace arcp with afn
    • it is the ApproxFunc fast math bit that matters to use inaccurate rcp.
  2. drop fpmath metadata since it will no longer be used after we visit fdiv here.
arsenm accepted this revision.Feb 6 2020, 4:51 PM

LGTM

This revision is now accepted and ready to land.Feb 6 2020, 4:51 PM
This revision was automatically updated to reflect the committed changes.