This is an archive of the discontinued LLVM Phabricator instance.

Revert 221429 and fix bug in TargetLowering::SimplifyDemandedBits
ClosedPublic

Authored by gilr on Jan 13 2015, 6:33 AM.

Details

Summary

Suggest to replace rL221429 with the fix described in D6321.

The bug in SimplifyDemandedBits (using DemandedMask instead of the single-use guarded NewMask) should be fixed regardless of the specific X86 blend optimization. Once it's applied, handling multi-uses in the X86 specific code is not needed for correctness.

While 221429 offers a way to perform the optimization even under multiple uses the current implementation would not get a chance to do so due to the multi-use guard (also note that test3 on vselect-avx.ll actually benefits from not performing the optimization as the sext-in-reg is removed altogether if left untouched). Perhaps this approach is worth generalizing to other multi-use cases?

Diff Detail

Repository
rL LLVM

Event Timeline

gilr updated this revision to Diff 18081.Jan 13 2015, 6:33 AM
gilr retitled this revision from to Revert 221429 and fix bug in TargetLowering::SimplifyDemandedBits.
gilr updated this object.
gilr edited the test plan for this revision. (Show Details)
gilr added a reviewer: qcolombet.
gilr added subscribers: Unknown Object (MLST), mkuper.
qcolombet edited edge metadata.Jan 14 2015, 1:47 PM

Hi Gil,

How does this impact the performances on the whole test suite + externals?

Indeed, when I did r221429, handling the multiple uses like you do in the proposal here, lead to bad regressions. That said, what I tried at that time was slightly different so it may not be the case here.
Anyway, we need to ascertain that.

Let me know if you do not have access to externals, I may be able to run some benchmarks on your behalf.

Thanks,
-Quentin

gilr added a comment.Jan 27 2015, 12:01 AM

Hi Quentin,

I ran lnt w/ spec2006 and didn't witness any related performance degradations. However if you could also run it (for more coverage if you have other externals, and to double-check me) that would be great.

One thing I did notice is that this specific sext-in-reg may be optimized away if left alone: DAGCombiner checks whether the value was already sign extended and drops it (this actually happens in vselect-avx.ll). This suggests that at least on some conditions this X86 optimization is redundant all together, but this requires thorough checking beyond this bug fix.

Thanks,
Gil

Hi Gil,

Thanks for checking.
I will try to give a try later this week.

One question though!

First, some context.
I can see what kind of bug your patch fixes but I am not convinced that we should remove the SHRUNK_BLEND node. This node prevents optimization on VSELECT (either generic or target specific) to take place later on. This is required because the boolean feeding the related node does not match anymore the expected layout of x86 generic boolean and if those optimizations happens, we may generate wrong code.

Please clarify how your patch addresses that.

Thanks,
-Quentin

gilr added a comment.Jan 29 2015, 6:09 AM

Hi Quentin,

Ahh, right, removing the SHRUNK_BLEND node would re-invalidate the booleaness property of the VBLEND condition.
I'll prepare a new patch which keeps the SHRUNK_BLEND node and only fixes the DemandedMask bug (to protect other callers to SimplifyDemandedBits). I'll also look into narrowing this optimization to skip cases where the sext-in-reg could provably be removed on account of all bits already being set.

Thanks,
Gil

Hi Gil,

When do you think you will update the patch?

I am trying to plan when I should spare some time to look into the benchmarks.

Thanks,
-Quentin

gilr added a comment.Feb 5 2015, 1:25 PM

Hi Quentin,

Hopefuly early next week (still looking into narrowing the optimization down).

Thanks,
Gil

gilr updated this revision to Diff 19651.Feb 10 2015, 2:03 AM
gilr edited edge metadata.

This patch modifies the SIGN_EXTEND_INREG case of SimplifyDemandedBits by:

  • Fixing the bug (by using the multi-use guarded) NewMask instead of DemandedMask.
  • Avoiding simplifying to shl when the sext-in-reg is known to already be sign-extended, as we can expect it to be removed altogether (by DAGCombiner::visitSIGN_EXTEND_INREG()).
qcolombet accepted this revision.Feb 16 2015, 11:06 AM
qcolombet edited edge metadata.

Hi Gil,

Thanks for your patience.

The patch does not impact the benchmarks in either direction.

LGTM, with a minor nitpick.

Thanks,
-Quentin

lib/CodeGen/SelectionDAG/TargetLowering.cpp
800 ↗(On Diff #19651)

Variable names start with a capital letter.

This revision is now accepted and ready to land.Feb 16 2015, 11:06 AM
This revision was automatically updated to reflect the committed changes.
gilr added a comment.Feb 18 2015, 7:02 AM

Hi Quentin,

Thanks a lot for helping with this!

One issue that still bothers me has to do with the code in X86ISelLoweing.cpp that handles the multiple uses for this optimization when Cond != TLO.Old:

  • Now that SimplifyDemandedBits should not simplify multi-used values, is that code still needed?
  • If it is needed, can you please explain why aren't the uses of TLO.Old being checked (for being VSELECT) and replaced? Isn't it the boolean property of TLO.Old that was damaged by the bit simplification?

Gil

  • Now that SimplifyDemandedBits should not simplify multi-used values, is that code still needed?

I believe this is not used anymore.

The select-avx.ll test case that changed that supposed to exerce that path. As you can see, this does not occur :).

  • If it is needed, can you please explain why aren't the uses of TLO.Old being checked (for being VSELECT) and replaced?

Wasn’t it what was happening?
Anyway, no used anymore.

Q.