This is an archive of the discontinued LLVM Phabricator instance.

[x86] scalarize extract element 0 of FP math
ClosedPublic

Authored by spatel on Feb 15 2019, 6:44 AM.

Details

Summary

This is another step towards ensuring that we produce the optimal code for reductions, but there are other potential benefits as seen in the tests diffs:

  1. Memory loads may get scalarized resulting in more efficient code.
  2. Memory stores may get scalarized resulting in more efficient code.
  3. Complex ops like fdiv/sqrt get scalarized which may be faster instructions depending on uarch.
  4. Even simple ops like addss/subss/mulss/roundss may result in faster operation/less frequency throttling when scalarized depending on uarch.

The TODO comment suggests 1 or more follow-ups for opcodes that can currently result in regressions.
The tests for "minimum" and "maximum" IR in extractelement-fp.ll are commented out because those currently crash independently of this patch. I'm not sure what that problem is yet.

Diff Detail

Event Timeline

spatel created this revision.Feb 15 2019, 6:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 15 2019, 6:44 AM
RKSimon added inline comments.Feb 15 2019, 8:09 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
34235

llvm_unreachable

spatel updated this revision to Diff 187027.Feb 15 2019, 9:01 AM
spatel marked an inline comment as done.

Patch updated:
Add llvm_unreachable so we don't accidentally return without a value.

RKSimon added inline comments.Feb 20 2019, 12:42 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
34224

(style) Is clang-format happy with this?

llvm/test/CodeGen/X86/extractelement-fp.ll
138

Should unary ops handle non-zero index extractions?

spatel marked 4 inline comments as done.Feb 21 2019, 4:00 PM
spatel added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
34224

Definitely not. :)
I was trying to keep the list organized and save space while fetching things from the header file, but that didn't last. I'll fix it to use the standard 1-case-per-line.

llvm/test/CodeGen/X86/extractelement-fp.ll
138

There's some set of conditions under which we want to do that, but I don't think unary alone is it. For example, it probably makes sense to shuffle an fdiv to avoid the vector op. Also, anything that would get expanded via unrolling during legalization? I'll add a TODO note so we can follow-up.

spatel updated this revision to Diff 187882.Feb 21 2019, 4:02 PM
spatel marked 2 inline comments as done.

Patch updated:

  1. Added TODO comment about handling non-zero extract index.
  2. Reformatted switch.
RKSimon accepted this revision.Feb 28 2019, 10:37 AM

LGTM with a couple of minors

llvm/lib/Target/X86/X86ISelLowering.cpp
34240

You should be able to add RSQRTSS + RCPSS as well?

llvm/test/CodeGen/X86/extractelement-fp.ll
309

Slightly annoying - the vector version broadcasts a scalar, the scalar version loads a whole vector..... Can you raise a bug on this please?

This revision is now accepted and ready to land.Feb 28 2019, 10:37 AM
spatel marked 4 inline comments as done.Feb 28 2019, 11:36 AM
spatel added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
34240

Yes - I'll add that to the TODO. There are other x86-specific opcodes like FMAX/FMAXC too, so I need to do a full audit.

llvm/test/CodeGen/X86/extractelement-fp.ll
309
This revision was automatically updated to reflect the committed changes.
spatel marked 2 inline comments as done.