This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Fold llvm.amdgcn.cos and llvm.amdgcn.sin intrinsics
ClosedPublic

Authored by foad on May 28 2020, 2:30 AM.

Diff Detail

Event Timeline

foad created this revision.May 28 2020, 2:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 28 2020, 2:31 AM
foad marked an inline comment as done.May 28 2020, 2:34 AM
foad added inline comments.
llvm/lib/Analysis/ConstantFolding.cpp
1918

This folds amdgcn.sin(1.0) to a very small value that is not exactly 0.0. Should I add an explicit check for all of the quarter-integer values that should fold to exactly -1.0 or 0.0 or +1.0 ?

arsenm added inline comments.May 28 2020, 6:44 AM
llvm/lib/Analysis/ConstantFolding.cpp
1918

I've never been sure what exactly the policy should be for folding these sorts of intrinsics. I think nobody is happy with relying on the host libm call to begin with, plus the hardware instructions don't give the exact same results. We're already constant folding more precise results for rcp than the hardware instruction gives though (which is probably more useful in general).

I'm not sure this is how this will work going forward, but you could also consider strictfp on the call site (it may happen already, but you should add a test where certain folds are skipped)

Also you should use numbers::pi rather than relying on M_PI (and I strongly prefer using an explicit 2.0 rather than 2)

llvm/test/Analysis/ConstantFolding/AMDGPU/sin.ll
175

Also should test snan -> qnan (this also won't happen for !ieee_mode, so this case should depend on strictfp also)

178

Add some cases where strictfp calls are skipped

foad marked 2 inline comments as done.May 28 2020, 7:14 AM
foad added inline comments.
llvm/lib/Analysis/ConstantFolding.cpp
1918

you could also consider strictfp on the call site

I thought:

  • strictfp is to do with whether certain math functions are guaranteed to respect the ambient rounding mode etc
  • the definition of amdgcn.cos is "do what the hardware instruction does"
  • the hardware instruction doesn't behave differently depending on the rounding mode
  • so there's no need to worry about strictfp

But perhaps I'm wrong about some or all of this.

llvm/test/Analysis/ConstantFolding/AMDGPU/sin.ll
175

All of the FP constant folding code is skipped for infinities and nans so there's not much to check. (I didn't really mean to add any of these inf and nan checks here, just carelessly cut n pasted from another test.)

arsenm added inline comments.May 28 2020, 7:39 AM
llvm/lib/Analysis/ConstantFolding.cpp
1918

I'm pretty sure the nan quieting behavior will change based on the ieee_mode bit. AMDGPU has a wider set of FP environment properties. Plus, these support exceptions anyway

llvm/test/Analysis/ConstantFolding/AMDGPU/sin.ll
175

That would be a bug, since I would expect them to fold too

b-sumner added inline comments.May 28 2020, 10:13 AM
llvm/lib/Analysis/ConstantFolding.cpp
1918

I don't understand the first comment above where the argument is 1.0. These intrinsics take radians, and not the scaled values the ISA expects,, correct? Similarly, if any bounds checking is done, shouldn't it be on radians?

arsenm added inline comments.May 28 2020, 10:32 AM
llvm/lib/Analysis/ConstantFolding.cpp
1918

From the lowering it looks like they directly feed the subtarget instruction. The lowering for llvm.sin/llvm.cos inserts the scale or not depending on the target (I'm not sure why you would ever need the raw amdgcn intrinsic form)

arsenm added inline comments.May 28 2020, 10:35 AM
llvm/lib/Analysis/ConstantFolding.cpp
1918

The device libraries have one function that uses both in the raw form, and doesn't consider the two subtarget behaviors

foad marked an inline comment as done.May 29 2020, 1:58 AM
foad added inline comments.
llvm/lib/Analysis/ConstantFolding.cpp
1918

LLPC sometimes generates the amdgcn.sin/cos intrinsics. The problem with introducing the multiply when llvm.sin/cos is lowered is that it doesn't get folded away if the argument was already something multiplied or divided by a constant, which is pretty common.

foad updated this revision to Diff 267149.May 29 2020, 2:33 AM
foad marked an inline comment as done.
  • Force exact results for quarter-integer inputs
  • Use numbers::pi
  • Add strictfp tests
foad marked 2 inline comments as done.May 29 2020, 2:34 AM
foad added inline comments.
llvm/test/Analysis/ConstantFolding/AMDGPU/sin.ll
175

A missed optimization perhaps, but it's always been like that for all math intrinsics, so I don't plan to fix it as part of this patch.

arsenm added inline comments.May 29 2020, 7:30 AM
llvm/lib/Analysis/ConstantFolding.cpp
1918

It should in the DAG lowering. We're not propagating fast math flags now in the sin lowering, maybe that's your problem?

arsenm accepted this revision.Jun 2 2020, 5:16 PM
This revision is now accepted and ready to land.Jun 2 2020, 5:16 PM
This revision was automatically updated to reflect the committed changes.
foad marked an inline comment as done.
thakis added a subscriber: thakis.Jun 3 2020, 3:07 AM
thakis added inline comments.
llvm/test/Analysis/ConstantFolding/AMDGPU/sin.ll
139

Looks like this ends in BCD instead of BCC on Windows (with clang-cl as host compiler at least): http://45.33.8.238/win/16679/step_11.txt

foad marked 2 inline comments as done.Jun 3 2020, 3:41 AM
foad added inline comments.
llvm/lib/Analysis/ConstantFolding.cpp
1918

Good point, thanks, see D80813. Perhaps there is no good reason for LLPC to use the amdgcn intrinsics after all.

llvm/test/Analysis/ConstantFolding/AMDGPU/sin.ll
139

Thanks, should be fixed in rGc27214c23446e423ec2e7eb8650a65cc5f0a16aa.