Page MenuHomePhabricator

Add custom lowering for llvm.log{,10}.{f16,f32} intrinsics
ClosedPublic

Authored by rivanvx on Feb 14 2017, 7:42 AM.

Details

Summary

Presently AMDGPU backend errors with "unsupported call to function" upon encountering a call to llvm.log{,10}.{f16,f32} intrinsics. This patch adds custom lowering to avoid that error on both R600 and SI.

Diff Detail

Repository
rL LLVM

Event Timeline

rivanvx created this revision.Feb 14 2017, 7:42 AM
rivanvx edited the summary of this revision. (Show Details)Feb 14 2017, 7:43 AM

Needs tests

lib/Target/AMDGPU/SIISelLowering.cpp
297 ↗(On Diff #88374)

Should also handle f16?

3478 ↗(On Diff #88374)

I've been meaning to remove the dependent on the math.h constants. Can you define this somewhere locally?

rivanvx updated this revision to Diff 88557.Feb 15 2017, 9:25 AM
rivanvx marked 2 inline comments as done.
rivanvx edited the summary of this revision. (Show Details)

Better version: handle log10 as well, define constants locally, handle f16, do not pretend to handle f64. Tests incoming.

rivanvx retitled this revision from Add custom lowering for llvm.log.f32 intrinsic to Add custom lowering for llvm.log{,10}.{f16,f32} intrinsics.Feb 15 2017, 9:26 AM
rivanvx edited the summary of this revision. (Show Details)

Needs tests

rivanvx updated this revision to Diff 89036.Feb 18 2017, 12:04 PM
rivanvx edited the summary of this revision. (Show Details)
jvesely added inline comments.
lib/Target/AMDGPU/AMDGPUISelLowering.cpp
250 ↗(On Diff #89036)

v2f32 and v4f32 can be moved to
for (MVT VT : FloatVectorTypes) {
block (line ~398)

1891 ↗(On Diff #89036)

Does FDIV have good enough precision to do this? OCL requires 2.5 ULP, and I'm not sure how good the EG/CM hw is.
libclc uses precomputed constants and multiplication, maybe the same can be applied here.

rivanvx updated this revision to Diff 91190.Mar 9 2017, 10:28 AM
rivanvx marked 2 inline comments as done.

TODO: TESTS NEED TO BE UPDATED

Addresed the comments.

arsenm added inline comments.Mar 9 2017, 10:56 AM
lib/Target/AMDGPU/AMDGPUISelLowering.cpp
16–18 ↗(On Diff #91190)

Should name something else that doesn't collide with the standard names

test/CodeGen/AMDGPU/llvm.log.f16.ll
2 ↗(On Diff #91190)

Can you also add a gfx900 lines for testing <2 x hal>

test/CodeGen/AMDGPU/llvm.log10.ll
3 ↗(On Diff #91190)

Remove -mcpu=SI. Also should sort r600 lines later

rivanvx updated this revision to Diff 95598.Apr 18 2017, 10:20 AM
rivanvx marked 3 inline comments as done.

Sorry for the delay, updated patch, now passes tests, addressed all the comments. I would appreciate if you could check if I got all the R_ and A_ prefixes correctly (it's just variable naming, but there is some logic behind it -- register vs address, I presume?) in f16 tests.

arsenm accepted this revision.Apr 18 2017, 1:58 PM

LGTM with minor test cleanup

test/CodeGen/AMDGPU/llvm.log10.ll
2 ↗(On Diff #95598)

s/SI/GCN

18 ↗(On Diff #95598)

Can you name these vars?

This revision is now accepted and ready to land.Apr 18 2017, 1:58 PM
rivanvx updated this revision to Diff 95716.Apr 19 2017, 4:54 AM
rivanvx marked 2 inline comments as done.

This should be it.

rivanvx updated this revision to Diff 95718.Apr 19 2017, 4:56 AM

That was only a partial diff, should be correct now.

jvesely added inline comments.Apr 24 2017, 8:26 AM
lib/Target/AMDGPU/AMDGPUISelLowering.cpp
1895 ↗(On Diff #95718)

You can pass the log2base constant here to avoid second switch and simplify the code. Just a nitpick

1914 ↗(On Diff #95718)

Using FMUL and inverted Log2Base should be both faster and more precise.

rivanvx updated this revision to Diff 113884.Sep 5 2017, 11:06 AM
rivanvx marked 2 inline comments as done.

@jvesely I would like to get an ACK before I tackle the tests.

looks ok to me. hope you were not waiting for me this whole time.

looks ok to me. hope you were not waiting for me this whole time.

No, absolutely not, it was a combination of limited time for Clover/Clang/LLVM and reluctance to finish things that are almost done but not "interesting" to complete. Thanks for the OK, full patch coming much sooner than this one. :-)

rivanvx updated this revision to Diff 118162.Oct 8 2017, 5:21 AM

This should be it.

jvesely accepted this revision.Oct 12 2017, 9:37 AM
arsenm accepted this revision.Nov 25 2017, 10:33 AM

LGTM

This revision was automatically updated to reflect the committed changes.