This is an archive of the discontinued LLVM Phabricator instance.

[X86] Lowering X86 avx512 sqrt intrinsics to IR - LLVM
ClosedPublic

Authored by tkrupa on Dec 27 2017, 6:57 AM.

Details

Summary

Together with a matching clang patch, lowering the sqrt intrinsics.
Notice that for the scalar type there is another move instruction which should be removed by another patch.

This patch removes the sqrt intrinsics and give support AutoUpgrade.cpp for backward compatibility.

Diff Detail

Event Timeline

uriel.k created this revision.Dec 27 2017, 6:57 AM

Won't this mean that explicit calls to the SSE sqrt intrinsics may be converted to the rsqrt+NR estimates in some cases?

RKSimon added inline comments.Dec 28 2017, 8:36 AM
test/CodeGen/X86/sse-intrinsics-x86.ll
461–462

Why did you move this test?

test/CodeGen/X86/sse2-intrinsics-fast-isel.ll
2954

Shouldn't that be llvm.sqrt.v2f64?

test/CodeGen/X86/sse2-intrinsics-x86-upgrade.ll
20

Strip these checks and regenerate

uriel.k updated this revision to Diff 128383.Jan 1 2018, 1:12 AM
uriel.k marked 3 inline comments as done.

Won't this mean that explicit calls to the SSE sqrt intrinsics may be converted to the rsqrt+NR estimates in some cases?

Yes, this is expected as that's what we are aiming by lowering the intrinsics to IR code, we want the compiler to make a better decision, to get better performance.
Correct me if miss something special about this intrinsic.

test/CodeGen/X86/sse-intrinsics-x86.ll
461–462

You are right, my mistake. fixed.

test/CodeGen/X86/sse2-intrinsics-fast-isel.ll
2954

fixed.

Simon, is there anything else you think that is needed to be changed before accepting the revision?
Thanks

Simon, is there anything else you think that is needed to be changed before accepting the revision?
Thanks

I'm still a little worried about this - it can create a lot more bit differences in results than previous other intrinsics where we've replaced with generic implementations - I guess _mm_div_ps already does this to an extent (and other fadd/fsub/fmul cases via re-association etc.).

Maybe I'm just being a little over cautious, but at very least I'd like to see D41168 update the intrinsic documentation to explain that -ffast-math may result in rsqrt+nr codgen under some circumstances - it still says that (v)sqrtps will be generated.

Simon, is there anything else you think that is needed to be changed before accepting the revision?
Thanks

I'm still a little worried about this - it can create a lot more bit differences in results than previous other intrinsics where we've replaced with generic implementations - I guess _mm_div_ps already does this to an extent (and other fadd/fsub/fmul cases via re-association etc.).

Maybe I'm just being a little over cautious, but at very least I'd like to see D41168 update the intrinsic documentation to explain that -ffast-math may result in rsqrt+nr codgen under some circumstances - it still says that (v)sqrtps will be generated.

We want to allow more transforms with -ffast-math regardless of whether the source used intrinsics, so we shouldn’t treat sqrt differently than other math ops.

That said, updating the clang header docs is necessary too. We’re almost certainly going to see fallout from this change and get bug reports.
One nit: these patches are titled ‘avx512’ when they apply to all avx/sse. When committing, I recommend splitting these patches up by intrinsic (if not individually, then at least by sse/avx/avx512). This will reduce the risk that the whole thing gets reverted…that will also make it easier to pinpoint exactly which intrinsic is under investigation when the complaints come in.

Looking at clang's CGBuiltin.cpp we do have precedent for using Intrinsic::sqrt for builtins for AArch64, PowerPC, and SystemZ.

include/llvm/IR/IntrinsicsX86.td
4497

Why are we renaming intrinsics here? Is this done to purposely exclude the AVX512 intrinsics? Why are we doing that?

mike.dvoretsky added inline comments.
include/llvm/IR/IntrinsicsX86.td
4497

It seems to me that this is done to avoid unconditionally generating the intrinsic in CodeGenFunction::EmitBuiltinExpr in CGBuiltin.cpp on the clang side while keeping the intrinsic available in IR for cases where the rounding mode isn't 4 and it's not being lowered. I haven't been able to find other intrinsic-lowering patches that take measures to keep the intrinsic available rather than just deleting it from IR, so I can't say if this change is conventional. If it isn't, then we need to either look into changing the algorithm in EmitBuiltinExpr to check for lowering before checking if llvm supports the intrinsic, or propose a renaming convention for cases like this one. In the latter case I would propose to put "nonlowered" in the names after the target prefix to keep these distinguishable as renamed, rather than aiming for a similar name and confusing people.

mike.dvoretsky added inline comments.Feb 7 2018, 9:16 AM
include/llvm/IR/IntrinsicsX86.td
4497

Looks like a better method to preserve the intrinsics exists for this case. Instead of renaming them, one may simply remove the GCCBuiltin template from their def's here and leave them untouched in X86IntrinsicsInfo.h. That method should be made conventional for patches like this and D41168.

@uriel.k Sorry, this patch and D41168 fell off my radar for a while - are you still looking at this?

@uriel.k Sorry, this patch and D41168 fell off my radar for a while - are you still looking at this?

Sorry, I've been away for a while.
There will be someone else replacing me.
i'm not sure if he will continue this revision or start a new one, so for now I leave this open.

Thanks for asking

Uriel

tkrupa added a subscriber: tkrupa.Mar 30 2018, 2:23 AM

I was assigned to finish this task. Is it possible to set me as an author of this ticket or do I need to open a new one?

I was assigned to finish this task. Is it possible to set me as an author of this ticket or do I need to open a new one?

You can commandeer this patch to set you as the author - look under the 'Add Action...' tab on Phabricator

tkrupa commandeered this revision.Mar 30 2018, 4:28 AM
tkrupa added a reviewer: uriel.k.
tkrupa updated this revision to Diff 140404.Mar 30 2018, 4:41 AM

As mike.dvoretsky suggested, I reversed renaming of those 4 round intrinsics. Instead, I removed the binding to gcc builtins in IntrinsicsX86.td. This way, they don't get lowered in AutoUpgrade.cpp but new code emitted with clang still gets lowered. Besides that, I changed comments in AutoUpgrade.cpp from "Added 6.0" to "Added in 7.0".

craig.topper added inline comments.Apr 4 2018, 10:21 AM
test/CodeGen/X86/avx512-intrinsics-upgrade.ll
4

This patch doesn't appear to have removed the scalar intrinsics. I dont' see any AutoUpgrade code or removal from X86InstrinsicsInfo.h

tkrupa added inline comments.Apr 5 2018, 1:42 AM
test/CodeGen/X86/avx512-intrinsics-upgrade.ll
4

You're right, they only get lowered in clang. I gave the reasoning in the comment to the last upload.
Is it enough to just move these 4 tests back to test/CodeGen/X86/avx512-intrinsics.ll or is it crucial to also lower them in LLVM part?

craig.topper added inline comments.Apr 5 2018, 10:04 AM
test/CodeGen/X86/avx512-intrinsics-upgrade.ll
4

You can just move them back. But if clang isn't using them, make sure the GCCBuiltin is removed from IntrinsicsX86.td and leave a FIXME saying that they can be removed. There are lot of FIXMEs like that in that file already.

tkrupa updated this revision to Diff 141287.Apr 6 2018, 12:47 AM
tkrupa marked an inline comment as done.
craig.topper added inline comments.Apr 6 2018, 9:40 AM
include/llvm/IR/IntrinsicsX86.td
4504

Isn't clang still using this one when the rounding mode is non-default?

4507

Same with this one?

tkrupa added inline comments.Apr 8 2018, 11:52 PM
include/llvm/IR/IntrinsicsX86.td
4504

It does, ss and sd intrinsics also do. The GCCBuiltin binds needed to be removed to enable lowering in AutoUpgrade but yeah, these definitions should stay. Is erasing the FIXME enough or should there be some note to not remove them?

craig.topper added inline comments.Apr 9 2018, 9:42 AM
include/llvm/IR/IntrinsicsX86.td
4504

Removing the FIXME should be enough. If anyone tries to delete it, they'll get a build error in clang.

tkrupa updated this revision to Diff 142013.Apr 11 2018, 8:06 AM

Removed FIXME annotations.

This revision is now accepted and ready to land.Apr 11 2018, 10:24 AM
tkrupa requested review of this revision.Jun 1 2018, 2:46 AM
tkrupa updated this revision to Diff 149416.

Added lowering of scalar sqrt intrinsics without rounding (relies on D47621).

craig.topper added inline comments.Jun 3 2018, 3:28 PM
lib/IR/AutoUpgrade.cpp
82

The sse.sqrt.ss and sse2.sqrt.sd intrinsics are still in IntrinsicsX86.td

tkrupa updated this revision to Diff 149925.Jun 5 2018, 3:38 AM
tkrupa marked an inline comment as done.
This revision is now accepted and ready to land.Jun 8 2018, 9:25 AM
This revision was automatically updated to reflect the committed changes.