This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Provide fastmath sqrt and div functions in altivec.h
ClosedPublic

Authored by nemanjai on Apr 23 2021, 4:40 PM.

Details

Summary

This adds the long overdue implementations of these functions that have been part of the ABI document and are now part of the "Power Vector Intrinsic Programming Reference" (PVIPR).

The approach is to add new builtins and to emit code with the fast flag regardless of whether fastmath was specified on the command line.

Diff Detail

Event Timeline

nemanjai created this revision.Apr 23 2021, 4:40 PM
nemanjai requested review of this revision.Apr 23 2021, 4:40 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 23 2021, 4:40 PM
qiucf added a subscriber: qiucf.Apr 25 2021, 9:31 PM
qiucf added inline comments.
clang/lib/CodeGen/CGBuiltin.cpp
15121

Seems FMF will be automatically restored without the three lines.

vector float test_recipdivd(vector float a, vector float b) {
  vector float x = vec_recipdiv(a, b);
  vector float y = x + b;
  return y;
}
define dso_local <4 x float> @test_recipdivd(<4 x float> %a, <4 x float> %b) {
entry:
  %recipdiv.i = fdiv fast <4 x float> %a, %b
  %add = fadd <4 x float> %recipdiv.i, %b
  ret <4 x float> %add
}

See https://reviews.llvm.org/D96231#inline-901337.

nemanjai added inline comments.Apr 26 2021, 12:07 PM
clang/lib/CodeGen/CGBuiltin.cpp
15121

Thanks for finding that. I did notice that and was wondering how the FMF flags return to what they were in the X86 code. So I added the reset of the flags just to be on the safe side. Now that I see that, I'll get rid of those.

bmahjour requested changes to this revision.Apr 28 2021, 8:46 AM
bmahjour added inline comments.
clang/lib/CodeGen/CGBuiltin.cpp
15129

I wonder if we can do better than "fdiv fast"... does the current lowering of "fdiv fast" employ an estimation algorithm via iterative refinement on POWER?

15133

This doesn't implement a reciprocal square root, it just performs a square root! At the very least we need a divide instruction following the call to the intrinsic, but I'm not sure if that'll result in the most optimal codegen at the end. Perhaps we need a new builtin?

clang/test/CodeGen/builtins-ppc-vsx.c
2297

See my comment above about the missing reciprocal operation.

This revision now requires changes to proceed.Apr 28 2021, 8:46 AM
nemanjai added inline comments.Apr 28 2021, 11:10 AM
clang/lib/CodeGen/CGBuiltin.cpp
15129

Yes. This fast includes arcp which will trigger the estimation+refinement algorithm in the back end.

15133

Oh, I misread the documentation. This really seems like a bizarre thing to offer a user. I will change this to 1/sqrt().
In terms of providing optimal performance, with fast-math, the optimizer should get rid of the divide. If compiled at -O0, it isn't reasonable to expect optimal performance to begin with.

nemanjai updated this revision to Diff 341456.Apr 29 2021, 3:18 AM

Changed rsqrt to be an actual reciprocal rather than just a refined sqrt estimate.

I have verified that the code generated is equivalent to gcc's and the results produced are the same.

This revision is now accepted and ready to land.Apr 29 2021, 3:52 PM
This revision was landed with ongoing or failed builds.Apr 30 2021, 5:18 PM
This revision was automatically updated to reflect the committed changes.