Page MenuHomePhabricator

[DAGCombiner] improve throughput of shift+logic+shift
ClosedPublic

Authored by spatel on Fri, Aug 30, 2:45 PM.

Details

Summary

The motivating case for this is a long way from here:
https://bugs.llvm.org/show_bug.cgi?id=43146
...but I think this is where we have to start.

We need to canonicalize/optimize sequences of shift and logic to ease pattern matching for things like bswap and improve perf in general. But without the artificial limit of '!LegalTypes' (early combining), there are a lot of test diffs, and not all are good.

In the minimal tests added for this proposal, x86 should have better throughput in all cases. AArch64 is neutral because it can fold shifts into bitwise logic ops.

There are 3 shift opcodes and 3 logic opcodes for a total of 9 possible patterns:
https://rise4fun.com/Alive/VlI
https://rise4fun.com/Alive/n1m
https://rise4fun.com/Alive/1Vn

Diff Detail

Repository
rL LLVM

Event Timeline

spatel created this revision.Fri, Aug 30, 2:45 PM
spatel marked an inline comment as done.Fri, Aug 30, 3:02 PM
spatel added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
7322–7323 ↗(On Diff #218173)

I forgot to delete the existing check of the TLI hook when I copied it. That should be an NFC preliminary commit, but I need to confirm.

efriedma added inline comments.
llvm/test/CodeGen/AArch64/bitfield-insert.ll
269 ↗(On Diff #218173)

Not really related to your patch, but I'm surprised we aren't matching the and+lsl to ubfiz. Something odd going on with the zext, I guess: we lower the zext to a mask, but then don't switch the "and" with the shift like instcombine would. (Simplified example: "int a(unsigned char x) { return x*8; }", vs. "int a(int x) { return ((unsigned char)x)*8; }".)

If that gets fixed, it looks like this saves one instruction, maybe? Not sure what effect that fix would have on the old code.

lebedev.ri added inline comments.Sat, Aug 31, 2:25 AM
llvm/test/CodeGen/AArch64/bitfield-insert.ll
269 ↗(On Diff #218173)

Random mention that i'm seeing similar issue in https://reviews.llvm.org/D62100#change-c0aDpsCUsZ0a
That patch is stuck due to that i think :/

spatel updated this revision to Diff 218222.Sat, Aug 31, 8:28 AM

Patch updated:
Rebased after cleanup - rL370587.

spatel updated this revision to Diff 218223.Sat, Aug 31, 8:58 AM

Patch updated:
Rebased after improving test coverage - cycle through various integer and vector types.
It shows we miss an 'LEA' opportunity on x86, but I don't think that's a big deal.

I think this looks ok, some comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
7251–7252 ↗(On Diff #218223)

I think you want to make matchFirstShift() return that X.

7266 ↗(On Diff #218223)

It would be less confusing to C1 = Shift->getOperand(1); somewhere up there, and use C1 here.

spatel updated this revision to Diff 218257.Sun, Sep 1, 5:34 AM
spatel marked 3 inline comments as done.

Patch updated:
Try to improve readability.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
7251–7252 ↗(On Diff #218223)

That looked more awkward to me than capturing the shifted value by reference along with the shift amount.
See if the updated code is easier to read.

7266 ↗(On Diff #218223)

I've never found a good way to juggle the naming of the various SDValue, ConstantSDNode, and underlying APInt variables. See if the update is easier to read.

lebedev.ri accepted this revision.Sun, Sep 1, 6:16 AM

Thanks, that is better.
LG with some thoughts.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
7207–7209 ↗(On Diff #218257)

+ the shifts must be identical

7217–7219 ↗(On Diff #218257)

Is there some function already that has whitelist of such bitwise ops?
I suspect there are more candidates.

7240–7243 ↗(On Diff #218257)

This looks suspicious to be honest.
Is there a case where that is so?
If not, can this be an assert until then?

7247 ↗(On Diff #218257)

I'd think bool APInt::ult(uint64_t RHS) would be fine to use too

This revision is now accepted and ready to land.Sun, Sep 1, 6:16 AM
spatel marked 2 inline comments as done.Sun, Sep 1, 7:30 AM
spatel added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
7217–7219 ↗(On Diff #218257)

Target-specific opcodes? And/or/xor are the only generic opcodes that I am aware of.

7240–7243 ↗(On Diff #218257)

It's real - SDAG is full of surprising situations like this. I hit an assert in a regression test without this.

That was before I added the !LegalTypes constraint, so it might be hidden now, but there are few bounds down here.

lebedev.ri marked an inline comment as done.Sun, Sep 1, 7:39 AM
lebedev.ri added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
7217–7219 ↗(On Diff #218257)

Right yes, i was talking about target-specific ones.

RKSimon added inline comments.Sun, Sep 1, 9:22 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
7240–7243 ↗(On Diff #218257)

getShiftAmountTy gets used in most cases when a combine creates a new shift, so it is likely to have occurred before legalization.

This revision was automatically updated to reflect the committed changes.
spatel marked an inline comment as done.
spatel marked an inline comment as done.Sun, Sep 1, 12:07 PM
spatel added inline comments.
llvm/test/CodeGen/AArch64/bitfield-insert.ll
269 ↗(On Diff #218173)