
[Codegen] (X & (C l>>/<< Y)) ==/!= 0 --> ((X <</l>> Y) & C) ==/!= 0 fold
ClosedPublic

Authored by lebedev.ri on Jun 4 2019, 12:28 PM.

Details

Summary

This was originally reported in D62818.
https://rise4fun.com/Alive/oPH

InstCombine does the opposite fold, in the hope that the C l>>/<< Y expression
will be hoisted out of a loop if Y is loop-invariant and X is not.
But as can be seen from the diffs here, if it didn't get hoisted,
the produced assembly is almost universally worse.

Much like with my recent "hoist add/sub by/from const" patches,
we should get an almost universal win if we hoist the constant:
there is almost always an "and/test by imm" instruction,
but a "shift by imm" is much less common, so we may avoid having to
materialize the immediate, and thus need one less register.
And since we now shift not by a constant but by something else,
the live range of that something else may shrink.

Special care needs to be taken not to disturb the x86 BT / hexagon tstbit
instruction patterns, and to not get into an endless combine loop.

From what i can tell,

  • PPC changes are all good
  • AMDGPU neutral
  • AArch64 good except vectors (pattern with lshr improves, but shl symmetrically degrades)
  • ARM all good, at least in general?
  • X86 looks ok, the immediate gets encoded into the test instruction (vectors are a mess regardless)

Diff Detail

Event Timeline

lebedev.ri created this revision.Jun 4 2019, 12:28 PM

Forgot to add, please bikeshed on the function names.
I know these are horrible, and i can't come up with better ones.

For the cases involving sign bits, are we actually expecting the IR to look like this when we reach SelectionDAG? With some form of D62818, I would expect we end up with "icmp slt"...

AArch64 good except vectors (pattern with lshr improves, but shl symmetrically degrades)

AArch64 doesn't have separate shift left and shift right instructions; instead, it's based on the sign of the shift amount. This can make the optimal pattern a little strange, yes.

For the cases involving sign bits, are we actually expecting the IR to look like this when we reach SelectionDAG? With some form of D62818, I would expect we end up with "icmp slt"...

Probably not, but that is what we end up with currently https://godbolt.org/z/IDhcTo
That does look like a missing instcombine fold (plus a fold that doesn't fire);
not sure if we want to also handle it here in the backend.


FWIW, the AMDGPU changes aren't entirely neutral, because they move the wait for previous memory loads up by one instruction, so it's a minor loss. However, that seems benign enough that I wouldn't object.

RKSimon added inline comments.Jun 5 2019, 3:47 AM
lib/Target/X86/X86ISelLowering.h
845 ↗(On Diff #202996)

put this into X86ISelLowering.cpp, same for Hexagon

test/CodeGen/X86/hoist-and-by-const-from-shl-in-eqcmp-zero.ll
740 ↗(On Diff #202996)

please can you run this through llvm-mca to compare perf?

lebedev.ri added inline comments.Jun 5 2019, 4:10 AM
test/CodeGen/X86/hoist-and-by-const-from-shl-in-eqcmp-zero.ll
740 ↗(On Diff #202996)

While this isn't what you asked, i can tell you right away that both variants are very far from optimal:
https://godbolt.org/z/6dKBNR

lebedev.ri marked an inline comment as done.Jun 5 2019, 4:44 AM
lebedev.ri added inline comments.
test/CodeGen/X86/hoist-and-by-const-from-shl-in-eqcmp-zero.ll
740 ↗(On Diff #202996)

Hm, i was looking at that wrong; of course that isn't identical,
because those SSE/SSE2 shifts shift all elements by the same amount.

mca: https://godbolt.org/z/PQrkhj
(i did not ensure strictly sequential execution though)

lebedev.ri updated this revision to Diff 203143.Jun 5 2019, 7:30 AM
lebedev.ri marked an inline comment as done and an inline comment as not done.

Add X86 AVX2 runlines, move hasBitTest definitions into .cpp

test/CodeGen/X86/hoist-and-by-const-from-shl-in-eqcmp-zero.ll
740 ↗(On Diff #202996)

It's interesting to note how, much like aarch64, these vector changes are symmetrical.
(X & (C << Y)) ==/!= 0 results in the best x86 pre-AVX2 codegen.
Should i just tune the TLI x86 hook?

Not sure, but this pattern regressed
https://godbolt.org/z/vu9C6F

Clang 8 is OK, trunk is not.

@nikic

nikic added a comment.Jun 5 2019, 8:53 AM

@xbolva00 Looks like an instcombine fail:

define dso_local i32 @do_shift(i32, i32) local_unnamed_addr #0 {
  %3 = tail call i32 @llvm.usub.sat.i32(i32 %1, i32 %0)
  %4 = add i32 %3, %0
  ret i32 %4
}

declare i32 @llvm.usub.sat.i32(i32, i32)

usub.sat(a, b) + b should be canonicalized to umax(a, b).

lebedev.ri updated this revision to Diff 203236.Jun 5 2019, 1:13 PM
lebedev.ri marked 4 inline comments as done.

Disable non-beneficial folds.

ping. i *think* i addressed all review notes?

ping again.
I'm not aware of any blockers/review feedback that i need to address here.

LGTM with one minor change.

include/llvm/CodeGen/TargetLowering.h
538 ↗(On Diff #203236)

Maybe worth clarifying here that the point of this hook is to prevent DAGCombine from breaking the pattern?

include/llvm/CodeGen/TargetLowering.h
579 ↗(On Diff #203236)

Is there some reason to send XC in as a parameter rather than asking if it is a constant within this function and giving that constant a local name?

lebedev.ri marked 2 inline comments as done.Jul 24 2019, 2:49 PM
lebedev.ri added inline comments.
include/llvm/CodeGen/TargetLowering.h
538 ↗(On Diff #203236)

Or creating it if it has potential of being recognized, will add.

579 ↗(On Diff #203236)

I'm pretty sure i have tried that.
As far as i recall, i can't do that because isConstOrConstSplat() is defined in DAGCombine.cpp,
which isn't guaranteed to be linked in every library where this hook could end up.

spatel added inline comments.Jul 24 2019, 3:13 PM
include/llvm/CodeGen/TargetLowering.h
579 ↗(On Diff #203236)

Ah yes, we have a mess of misplaced helper functions like that. No need to hold this patch up any more then, but it would be good to add a comment somewhere with that explanation.

lebedev.ri marked 3 inline comments as done.

Thank you for taking a look!
Rebased, addressed notes about comments.
Care to stamp? :)

spatel accepted this revision.Jul 24 2019, 3:42 PM

LGTM

This revision is now accepted and ready to land.Jul 24 2019, 3:42 PM

LGTM

Thank you for the review!

This revision was automatically updated to reflect the committed changes.