This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] shrink switch conditions better (PR24766)
ClosedPublic

Authored by spatel on Sep 18 2015, 8:44 AM.

Details

Summary

As noted in PR24766:
https://llvm.org/bugs/show_bug.cgi?id=24766#c2

...we're not doing a great job of optimizing/shrinking switch conditions in InstCombine. If we do better, we could remove bogus case statements.

Step 1 to solving this is to remove a hack that was added for the benefit of x86 codegen. It prevented shrinking the switch condition even to smaller legal types.

FWIW, I don't see any perf difference on test-suite on x86 with this change, and I'm not sure how the supposed extra zext x86 instructions would hurt anyway.

Diff Detail

Repository
rL LLVM

Event Timeline

spatel updated this revision to Diff 35088.Sep 18 2015, 8:44 AM
spatel retitled this revision from to [InstCombine] shrink switch conditions better (PR24766).
spatel updated this object.
spatel added reviewers: nlewycky, sanjoy, hfinkel.
spatel added a subscriber: llvm-commits.
hfinkel edited edge metadata.Sep 18 2015, 4:36 PM

FWIW, I don't see any perf difference on test-suite on x86 with this change, and I'm not sure how the supposed extra zext x86 instructions would hurt anyway.

Okay, but I'm still hesitant to generate unnecessary instructions. Do we still end up with the unnecessary zext instructions with this patch?

lib/Transforms/InstCombine/InstructionCombining.cpp
2160 ↗(On Diff #35088)

I think it would be better to refactor InstCombiner::ShouldChangeType to have a version that directly takes the bit widths. ShouldChangeType just calls getPrimitiveSizeInBits on the provided types anyway, and that prevents the need to speculatively call Type::getIntNTy.

spatel updated this revision to Diff 35286.Sep 21 2015, 11:40 AM
spatel edited edge metadata.

Patch updated:
Use new version (r248170) of "ShouldChangeType()".

Okay, but I'm still hesitant to generate unnecessary instructions. Do we still end up with the unnecessary zext instructions with this patch?

Yes, we're producing unnecessary zexts. Based on that comment in InstCombine, I thought this was just a quirk of x86 (possibly related to bug 22473), but it's a target-independent problem for SelectionDAGBuilder::visitSwitch(). It causes unnecessary mask instructions for PPC and ARM64 too:

Optimized lowered selection DAG: BB#8 'positive1:entry'
         t0: i32 = Register %vreg3
       t1: i32,ch = CopyFromReg t0, t0
     t2: i16 = truncate t1
     t3: i16 = Constant<1001>
     t4: ch = seteq
   t5: i1 = setcc t2, t3, t4

Type-legalized selection DAG: BB#8 'positive1:entry'
           t0: i32 = Register %vreg3
         t1: i32,ch = CopyFromReg t0, t0
         t15: i32 = Constant<65535>
       t16: i32 = and t1, t15
       t10: i32 = Constant<1001>
       t4: ch = seteq
     t11: i32 = setcc t16, t10, t4

I think the root problem is that I'm mixing up the legal types specified by the datalayout with the codegen definition of legality based on the register width. So I'm not sure if we should just hack around this limitation here in InstCombine's visitSwitchInst() or shrink as much as possible here without regard for legality and then fix it up in CodeGenPrepare?

Okay, but I'm still hesitant to generate unnecessary instructions. Do we still end up with the unnecessary zext instructions with this patch?

Yes, we're producing unnecessary zexts. Based on that comment in InstCombine, I thought this was just a quirk of x86 (possibly related to bug 22473), but it's a target-independent problem for SelectionDAGBuilder::visitSwitch(). It causes unnecessary mask instructions for PPC and ARM64 too:

Optimized lowered selection DAG: BB#8 'positive1:entry'
         t0: i32 = Register %vreg3
       t1: i32,ch = CopyFromReg t0, t0
     t2: i16 = truncate t1
     t3: i16 = Constant<1001>
     t4: ch = seteq
   t5: i1 = setcc t2, t3, t4

Type-legalized selection DAG: BB#8 'positive1:entry'
           t0: i32 = Register %vreg3
         t1: i32,ch = CopyFromReg t0, t0
         t15: i32 = Constant<65535>
       t16: i32 = and t1, t15
       t10: i32 = Constant<1001>
       t4: ch = seteq
     t11: i32 = setcc t16, t10, t4

I think the root problem is that I'm mixing up the legal types specified by the datalayout with the codegen definition of legality based on the register width. So I'm not sure if we should just hack around this limitation here in InstCombine's visitSwitchInst() or shrink as much as possible here without regard for legality and then fix it up in CodeGenPrepare?

On theoretical grounds, I favor the latter. We should shrink aggressively for our canonical form because smaller types convey the most information at the IR level; we tend to lose information when we promote. If we can do this, and then fix it up later based on actual target-type-legality information later, I think that's better.

sanjoy resigned from this revision.Apr 17 2016, 10:18 PM
sanjoy removed a reviewer: sanjoy.

Resigning from old patch to get it off my queue. Feel free to add me back if this is revived at some point.

spatel updated this revision to Diff 57224.May 13 2016, 11:32 AM
spatel added a reviewer: sanjoy.

Patch (finally) updated:
Sorry for letting this sit for so long.
We added the CGP safety harness in D13532 / rL251857 , so we're now free to shrink in InstCombine without regard for the target.

Note that we still don't catch the '9999' case that Nick mentioned in https://llvm.org/bugs/show_bug.cgi?id=24766#c2 ; I think there must be a hole in SimplifyCFG's EliminateDeadSwitchCases() or computeKnownBits() itself. I'll investigate that next.

Note that we still don't catch the '9999' case that Nick mentioned in https://llvm.org/bugs/show_bug.cgi?id=24766#c2 ; I think there must be a hole in SimplifyCFG's EliminateDeadSwitchCases() or computeKnownBits() itself. I'll investigate that next.

A possible fix for that problem: D20275

Ping.

Note that D20275 / rL270222 used ComputeNumSignBits() to eliminate cases. We could probably do something similar here in InstCombine as a follow-on.

Hmm. My understanding was that InstCombine was not supposed to create IR with illegal types if it could help it.

Hmm. My understanding was that InstCombine was not supposed to create IR with illegal types if it could help it.

Do we state that anywhere? Should we make an exception for cases where we know that we have a fixup in a later pass (as we do for this in CGP)?

I was proceeding based on Hal's earlier comment:
"We should shrink aggressively for our canonical form because smaller types convey the most information at the IR level."

Hmm. My understanding was that InstCombine was not supposed to create IR with illegal types if it could help it.

Do we state that anywhere? Should we make an exception for cases where we know that we have a fixup in a later pass (as we do for this in CGP)?

I was proceeding based on Hal's earlier comment:
"We should shrink aggressively for our canonical form because smaller types convey the most information at the IR level."

https://github.com/llvm-mirror/llvm/blob/a2bdb6de81cd1de9daaeac9fa8bd24b6e0a48263/lib/Transforms/InstCombine/InstructionCombining.cpp#L91

Hmm. My understanding was that InstCombine was not supposed to create IR with illegal types if it could help it.

Do we state that anywhere? Should we make an exception for cases where we know that we have a fixup in a later pass (as we do for this in CGP)?

I was proceeding based on Hal's earlier comment:
"We should shrink aggressively for our canonical form because smaller types convey the most information at the IR level."

https://github.com/llvm-mirror/llvm/blob/a2bdb6de81cd1de9daaeac9fa8bd24b6e0a48263/lib/Transforms/InstCombine/InstructionCombining.cpp#L91

Ha...I should've remembered that. I added the alternate flavor of that function for the earlier/weaker version of *this* patch!
So then we're on to my 2nd question: we have a safety mechanism in place for this transform; does that change the outcome?
I understand the general reluctance for weird types given the disaster potential, but I don't see the downside for this specific case.

Note that we already got SimplifyCFG to remove the dead case for the example in PR24766, but I wonder if this transform might come into play for something like PR27555:
https://llvm.org/bugs/show_bug.cgi?id=27555

majnemer accepted this revision.Jun 29 2016, 9:00 AM
majnemer added a reviewer: majnemer.

LGTM

This revision is now accepted and ready to land.Jun 29 2016, 9:00 AM
This revision was automatically updated to reflect the committed changes.