This is an archive of the discontinued LLVM Phabricator instance.

Constant folding of llvm.amdgcn.trig.preop
Needs ReviewPublic

Authored by arsenm on Feb 18 2022, 11:12 AM.

Details

Summary

If the parameters(the input and segment select) coming in to amdgcn.trig.preop intrinsic are compile time constants, then this patch pre-computes the output of amdgcn.trig.preop on the CPU and replaces the uses with the computed constant.

All the existing AMDGPU lit cases pass along with the negative cases where the parameters to this intrinsic are variable.
Added a simple test case with the exact output that matches the output from the GPU.

Created a small HIP test application with the exact compute logic(and the constants used for 2/pi) running on the CPU and the intrinsic invoked for the GPU kernel.
Ran the test over the entire range of double floating-point. The outputs from the CPU and those from the intrinsic on gfx10 AMD GPU match.

Diff Detail

Event Timeline

Ravi created this revision.Feb 18 2022, 11:12 AM
Ravi requested review of this revision.Feb 18 2022, 11:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 18 2022, 11:12 AM
arsenm added inline comments.Feb 18 2022, 11:27 AM
llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
329

Relying on size of double, also this is technically undefined.

1010

uint32_t

1026–1027

Early exit and reduce indentation

1029

Indentation off

1044

Should stick with apfloat operations

1048

You can use scalbn for APFloat instead of relying on host ldexp

foad added a comment.Feb 18 2022, 11:36 AM

Ran the test over the entire range of double floating-point.

Just curious: how long does it take to test all 2^64 inputs?

Ran the test over the entire range of double floating-point.

Just curious: how long does it take to test all 2^64 inputs?

The only dependence is on the exponent so 2K inputs is sufficient.

Ravi added a comment.Feb 18 2022, 11:28 PM

Ran the test over the entire range of double floating-point.

Just curious: how long does it take to test all 2^64 inputs?

It took around 5 days on Ryzen 9 - 5950. Was running a single thread for CPU though. And each input was checked with all 32 segments.

Ravi added a comment.Feb 18 2022, 11:31 PM

Ran the test over the entire range of double floating-point.

Just curious: how long does it take to test all 2^64 inputs?

The only dependence is on the exponent so 2K inputs is sufficient.

Yes right. That would have greatly reduced the run times. Missed out, didn't give a thought in that direction.

Ravi added inline comments.Feb 18 2022, 11:38 PM
llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
1044

Yes, will do that.

1048

Yes..will try it out and check for any precision difference. Should be good as long as the internal implementation of APFloat is within 0.5 ULP.

arsenm added inline comments.Feb 21 2022, 6:51 AM
llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
1048

Actually we should be constant folding the ldexp intrinsic too. I thought I did that before, but in the code above I don't see it handling arbitrary constants

Ravi updated this revision to Diff 428730.May 11 2022, 11:40 AM

Fixed all the review comments.

Herald added a project: Restricted Project. · View Herald TranscriptMay 11 2022, 11:40 AM
Ravi marked 6 inline comments as done.May 11 2022, 11:42 AM
Ravi added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
1048

Can be done in another patch

arsenm added inline comments.May 11 2022, 4:18 PM
llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
1024

demorgan this

1035

extra parentheses

1042

extra parentheses

1054

extra parentheses

1055

extra parentheses

foad added inline comments.May 12 2022, 8:40 AM
llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
1021

Capitalizing the names like CSrc, CSeg, FSrc, NumBits etc is more common.

1029

Just use ->getValue. And in general I think you should convert CSrc and CSeg to plain uint32_t / uint64_t as soon as possible. You have lots of uses of APInt below where it really is not necessary.

Reverse ping

arsenm commandeered this revision.Nov 16 2022, 1:41 PM
arsenm updated this revision to Diff 475913.
arsenm edited reviewers, added: Ravi; removed: arsenm.

@Ravi where are these tests you mentioned?

arsenm added inline comments.Nov 16 2022, 1:45 PM
llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll
5638 ↗(On Diff #475913)

I feel like the tests are lacking in sample values but coming up with ones to test every point in the table seems exhausting

arsenm updated this revision to Diff 476233.Nov 17 2022, 2:00 PM

Add assert, which shows this table lookup is broken.

Also improve poison handling

foad added inline comments.Nov 18 2022, 6:47 AM
llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
1054

This (and some of the calculation below) can just use int. No need for APInt.