This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] try to form more bt/test + set out of shift+mask patterns
ClosedPublic

Authored by spatel on Aug 23 2019, 2:56 PM.

Details

Summary

The motivating bugs are:
https://bugs.llvm.org/show_bug.cgi?id=41340
https://bugs.llvm.org/show_bug.cgi?id=42697

As discussed there, we could view this as a failure of IR canonicalization, but then we would need to implement a backend fixup with target overrides to get this right in all cases. Instead, we can just view this as a target-specific opportunity. It's not even clear for x86 exactly when we should favor test+set; some CPUs have better theoretical throughput for the ALU ops than bt/test.

This patch is made more complicated than I expected because there's an early DAGCombine for 'and' that can change types of the intermediate ops via trunc+anyext.

Diff Detail

Repository
rL LLVM

Event Timeline

spatel created this revision.Aug 23 2019, 2:56 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 23 2019, 2:56 PM
craig.topper added inline comments.Aug 23 2019, 6:17 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
39006 ↗(On Diff #216957)

How do we know the shift amount isn't bigger than the bit width?

I would think this should go into DAGCombiner under hasBitTest() hook.

lebedev.ri added inline comments.Aug 24 2019, 4:54 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
38981 ↗(On Diff #216957)

I'd think we should, it should just change the predicate i think?

craig.topper added inline comments.Aug 24 2019, 8:58 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
38981 ↗(On Diff #216957)

Without the not we don’t save any instructions do we? Setcc also has lower throughput on recent Intel CPUs and has a partial register update..

spatel marked 2 inline comments as done.Aug 25 2019, 11:41 AM

I would think this should go into DAGCombiner under hasBitTest() hook.

Yes, I forgot we have that hook. Hexagon has enabled it as well as x86, so I added some more tests.

llvm/lib/Target/X86/X86ISelLowering.cpp
38981 ↗(On Diff #216957)

In all the tests I looked at, we avoid the partial reg update problem by xor'ing the reg. But yes, without the 'not' instruction, this is trading 2 instructions for 2 instructions that might have less throughput.

39006 ↗(On Diff #216957)

Yes, good catch. I think we're safe in the basic case without trunc/ext (all oversized shifts get folded to undef), but since we're potentially truncating the input value, we either need to guard against that or add some logic to decide what width we should do the and/setcc. I'll add a bailout for now.

spatel updated this revision to Diff 217058.Aug 25 2019, 11:45 AM

Patch updated:

  1. Move the code to DAGCombiner using the hasBitTest() TLI hook.
  2. Add a bailout if the shift amount exceeds a potentially narrower bitwidth.
  3. Copy tests to Hexagon target and update for diffs.

I think the Hexagon diffs are an improvement, but I don't know what is optimal there; adding Krzysztof as reviewer.

craig.topper added inline comments.Aug 25 2019, 6:24 PM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5225 ↗(On Diff #217058)

Is it possible for the shift to be on a type larger than 64 bits and have an absurdly out of bounds shift amount that we haven't collapsed to undef yet such that this asserts? Or for that matter a 64 bit shift amount that's larger than 0xffffffff and not been folded yet. Since we truncate a uint64_t to unsigned here.

spatel marked an inline comment as done.Aug 26 2019, 4:18 AM
spatel added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5225 ↗(On Diff #217058)

Given current limits, I don't think either of those scenarios are possible. IR types are capped well below 2^32:
http://llvm.org/docs/LangRef.html#integer-type

And we shouldn't create an overshift node in the 1st place:
SelectionDAG::simplifyShift() - https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/SelectionDAG/SelectionDAG.cpp#L7120

If that's somehow violated, then the underlying assert in APInt should fire within SDValue::getConstantOperandVal().

But there's practically no cost for leaving the shift amount as uint64_t rather than unsigned, so I can change that.

spatel updated this revision to Diff 217112.Aug 26 2019, 4:29 AM

Patch updated:

  1. Use 'uint64_t' for shift amount instead of implicitly truncating with 'unsigned'.
  2. Fix x86-specific code comment that was left over from earlier rev.
craig.topper added inline comments.Aug 26 2019, 11:07 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5225 ↗(On Diff #217058)

My first scenario was just something like i128 with a shift amount greater than 1 << 64 which would assert in getConstantOperandVal. I know we try to simplify shifts when they're created. I was just worried about a scenario where the shift amount isn't constant when we created it but something folded behind the shift to make it constant and out of bounds then we visited the 'and' node without visiting the shift to simplify it. Its probably not a very realistic scenario, but I know Simon has updated some things to not use getConstantOperandVal because weird scenarios have come up.

spatel updated this revision to Diff 217225.Aug 26 2019, 12:41 PM

Patch updated:
More protection against unexpectedly huge numbers: use getConstantOperandAPInt() rather than getConstantOperandVal().

This revision is now accepted and ready to land.Aug 26 2019, 3:32 PM
spatel updated this revision to Diff 217474.Aug 27 2019, 12:27 PM

Patch updated:
Rebased after rL369947
@kparzysz - do we need to add more/different pattern-matching to form 'tstbit'?

lebedev.ri accepted this revision.Aug 29 2019, 1:27 PM

Looks ok to me.

spatel retitled this revision from [x86] try to form more bt/test + set out of shift+mask patterns to [DAGCombiner] try to form more bt/test + set out of shift+mask patterns.Aug 29 2019, 2:05 PM

Looks ok to me.

But hexagon’s codegen regressed :(

Looks ok to me.

But hexagon’s codegen regressed :(

Yes, although it only regressed after the changes from rL369947 which looks like a response to the earlier rev of this patch.
Ping @kparzysz to see if we should wait for another Hexagon update or if we can proceed and mark those tests with 'TODO'.

xbolva00 added a comment.EditedAug 30 2019, 2:33 PM

Yeah,

int foo_b(int a) {

return (a&1024) == 0;

}

seems to form tstbit..

but more typical case:
_Bool afoo_b(int a) {

return (a&1024) == 0;

}

forms and and cmp.eq...

if hexagon enables hasBitTest(), it is not ideal that it cannot handle this basic case - it should be fixed.

https://godbolt.org/z/gG-6j4

xbolva00 accepted this revision.Aug 31 2019, 11:45 AM

if hexagon enables hasBitTest(), it is not ideal that it cannot handle this basic case - it should be fixed.

..but since this is hexagon general problem (just exposed more now), it should not block this patch futher.

spatel added a comment.Sep 2 2019, 7:34 AM

Filed a hexagon bug here, so this is not lost:
https://bugs.llvm.org/show_bug.cgi?id=43194

This revision was automatically updated to reflect the committed changes.