This is an archive of the discontinued LLVM Phabricator instance.

[x86] invert a vector select IR canonicalization with a binop identity constant
ClosedPublic

Authored by spatel on Jan 31 2022, 12:50 PM.

Details

Summary

This is an intentionally limited/different form of D90113. That patch bravely tries to generalize folds where we pull a binop into the arms of a select:
N0 + (Cond ? 0 : FVal) --> Cond ? N0 : (N0 + FVal)
...across types and targets. This is the inverse of the IR canonicalization discussed in D113442.
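The fold can be modeled with a minimal scalar sketch (hypothetical helper names; the real transform operates on SelectionDAG nodes, and for FP the identity constant for fadd is -0.0 rather than 0):

```python
def select(cond, tval, fval):
    """Lane-wise select, modeling one lane of ISD::VSELECT."""
    return tval if cond else fval

# Original (canonical IR) form: the binop executes unconditionally,
# with the identity constant (0 for add) in the "true" arm.
def before(cond, n0, fval):
    return n0 + select(cond, 0, fval)

# Folded form: the binop is pulled into the select arms, so the
# "true" path returns N0 unchanged.
def after(cond, n0, fval):
    return select(cond, n0, n0 + fval)

# Both forms agree for any inputs (integer case shown here).
for cond in (True, False):
    assert before(cond, 7, 5) == after(cond, 7, 5)
```

The payoff on AVX512 is that the folded form maps onto a masked operation instead of a separate blend.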

I'm not sure if this is even profitable within x86, so I'm only proposing to handle x86 vector fadd/fsub as a first step. The intent is to prevent AVX512 regressions as mentioned in D113442. Please look closely at the test diffs to confirm whether this is correct and better.

Diff Detail

Event Timeline

spatel created this revision.Jan 31 2022, 12:50 PM
spatel requested review of this revision.Jan 31 2022, 12:50 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 31 2022, 12:50 PM
xbolva00 added inline comments.Jan 31 2022, 12:57 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
48969

Target independent location with target hook?

spatel marked an inline comment as done.Jan 31 2022, 4:10 PM
spatel added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
48969

Yes, that is the most likely outcome. I think it would be fine to start with just generic opcodes, and then we could extend it with target-specific opcodes as needed. That's similar to what we have for TLI.isCommutativeBinop() (used below here).

pengfei added inline comments.Feb 1 2022, 2:23 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
48974–48977

Do we need to consider the nsz case?
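Some context on why signed zero matters for this question: +0.0 is not a strict identity for fadd, because adding it flips the sign of negative zero, so only -0.0 can serve as the identity constant unless nsz is set. A small demonstration (assuming IEEE-754 round-to-nearest, which Python floats follow):

```python
import math

# -0.0 is the additive identity for fadd: x + (-0.0) == x for every x,
# including x == -0.0, whose sign is preserved.
assert math.copysign(1.0, -0.0 + -0.0) == -1.0  # result is still -0.0

# +0.0 is NOT a strict identity: -0.0 + 0.0 rounds to +0.0,
# so the sign of negative zero is lost.
assert math.copysign(1.0, -0.0 + 0.0) == 1.0

# Under nsz (no-signed-zeros), the two zeros are interchangeable,
# so the fold could accept either constant as the identity.
```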

llvm/test/CodeGen/X86/vector-bo-select.ll
696–697

Is the left code better?

704–705

The left seems better.

If we're targeting the D113442 regressions for 14.x I'd probably suggest we just limit this to AVX512 targets as a quick fix and iterate on it for 15.x.

llvm/test/CodeGen/X86/vector-bo-select.ll
697

This is definitely a regression - VBLENDVPS is a lot slower than VPAND - but the codegen is awful anyway - combineToExtendBoolVectorInReg is supposed to deal with this, but it obviously fails :(

705

Not a notable regression, but non-AVX512VL targets are going to see this change - do we care?

spatel marked an inline comment as done.Feb 1 2022, 5:12 AM

If we're targeting the D113442 regressions for 14.x I'd probably suggest we just limit this to AVX512 targets as a quick fix and iterate on it for 15.x.

I agree. I didn't restrict it in this first draft just to show that there are likely going to be several subtle opportunities/regressions that need to be dealt with. The blendv codegen with AVX2 is one.

For AVX512, I'm not sure exactly which flavors we want to include/exclude - should I add more/different RUN lines to the test file? Do we want more tests with 512-bit types?

For AVX512, I'm not sure exactly which flavors we want to include/exclude - should I add more/different RUN lines to the test file? Do we want more tests with 512-bit types?

I'd probably say float/double 512-bit vectors for AVX512 - and 128/256-bit vectors with AVX512VL

spatel marked an inline comment as done.Feb 1 2022, 8:39 AM
spatel added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
48974–48977

In IR, it looks like we still create the -0.0 constant even with nsz, but yes, I suspect we will want to handle both forms of zero if we have NSZ.

We'll need more test coverage. I'll add another TODO for now.

spatel updated this revision to Diff 404948.Feb 1 2022, 8:43 AM
spatel marked an inline comment as done.

Patch updated:

  1. Added checks for AVX512 and AVX512VL.
  2. Added tests to show more diffs.
  3. Added TODO comments (there are many!).
spatel updated this revision to Diff 404950.Feb 1 2022, 8:46 AM

Patch updated:
The previous upload didn't have all of the TODO comments updated.

LuoYuanke accepted this revision.Feb 1 2022, 10:52 PM

LGTM. Thanks, Sanjay. I think we can combine more operators based on this patch.

This revision is now accepted and ready to land.Feb 1 2022, 10:52 PM
pengfei accepted this revision.Feb 2 2022, 4:54 AM

LGTM.

This revision was landed with ongoing or failed builds.Feb 2 2022, 5:18 AM
This revision was automatically updated to reflect the committed changes.

Please also backport this to the llvm 14 branch.