Page MenuHomePhabricator

[InstCombine] Fold shift-select if all the operands are constant
Needs ReviewPublic

Authored by fvrmatteo on Mon, Jul 27, 8:20 AM.

Details

Summary

The following example shows that the lshr instruction is not properly folding, even though the involved operands (select and shift amount) are constant and could therefore be optimised into a single select instruction with shifted operands. Testing the optimisation with Alive2 seems to confirm the correctness.

I'm not familiar enough with the InstCombine code (yet), but it looked reasonable to call the SimplifySelectsFeedingBinaryOp method while visiting the shift instructions and enable it to handle the case where all the operands are constant, although I think this case could be handled differently because I noticed that changing the lshr instruction with other instructions (shl, xor) was properly folding the code (changing the value of the final constants as expected).

Diff Detail

Event Timeline

fvrmatteo created this revision.Mon, Jul 27, 8:20 AM
Herald added a project: Restricted Project. · View Herald TranscriptMon, Jul 27, 8:20 AM
fvrmatteo updated this revision to Diff 280936.Mon, Jul 27, 9:00 AM

Applied formatting missing from one file.

fvrmatteo added reviewers: nikic, lebedev.ri. fvrmatteo removed 1 blocking reviewer(s): majnemer.Mon, Jul 27, 11:49 AM

It looks like currently we only transform this for the single-use case, but not the multi-use case: https://godbolt.org/z/9oYeEq

fvrmatteo added a comment.EditedMon, Jul 27, 12:59 PM

I thought the same when I first saw those calls to hasOneUse, but if the code to handle the multiple users case that I added to InstructionCombining.cpp is kept and the calls to SimplifySelectsFeedingBinaryOp are removed from InstCombineShifts.cpp, the simplification won't work anymore.

In the example @nikic just shared, the single use-case it's handled by commonShiftTransforms -> FoldShiftByConstant (and not by SimplifySelectsFeedingBinaryOp, which is never called for the shifts, even though it would be correct to do so, I think), but that doesn't seem to handle in a meaningful way the case where the incoming value to the shift it's used more than once.

Adding more context to the missed optimisation.

FoldShiftByConstant is calling canEvaluateShifted and internally that's checking if the instruction has a single use:

// We can't mutate something that has multiple uses: doing so would
// require duplicating the instruction in general, which isn't profitable.
if (!I->hasOneUse()) return false;

This is the check that lets the optimisation to be properly applied when a single use is present and blocks it when more uses are present. I therefore reverted my changes and tried to disable that check, resulting in the optimisation being applied properly:

ICE: GetShiftedValue propagating shift through expression to eliminate shift:
  IN:   %v0 = select i1 %arg, i64 64, i64 0
  SH:   %v1 = lshr exact i64 %v0, 6
IC: ADD:   %v0 = select i1 %arg, i64 64, i64 0
IC: Replacing   %v1 = lshr exact i64 %v0, 6
    with   %v0 = select i1 %arg, i64 1, i64 0

But resulting in a final LLVM-IR which is wrong:

; ModuleID = 'lshr-select-const.ll'
source_filename = "lshr-select-const.ll"

define i64 @fold_lshr(i1 %arg) {
entry:
  %v0 = zext i1 %arg to i64
  %0 = or i64 %v0, 8589934590
  %v2.neg = add nuw nsw i64 %0, 1
  %v6 = and i64 %v2.neg, 5369111591
  ret i64 %v6
}

I think that's the hasOneUse call that we are looking for, but simply disabling it is a non-solution because we run into wrong code and we need to take into account the instruction duplication.

Although I think checking for constant operands before checking for hasOneUse may be necessary because that's not going to result into duplication.

It's my first investigation of a missed optimisation, I'm sorry for the increasing context.

But resulting in a final LLVM-IR which is wrong:

That doesn't sound good.
Can you reduce the bitwidth of the testcase down to i8 at most?

I reduced the bit-width to i8 and the same issue is happening when I only disable the if (!I->hasOneUse()) return false; line from canEvaluateShifted. Is there a chance that disabling the check is not sufficient but that would actually require more changes?

Attached you can find the narrowed down test case executed with the disabled line and its debug output.

lebedev.ri requested changes to this revision.Tue, Jul 28, 2:10 AM

I reduced the bit-width to i8 and the same issue is happening when I only disable the if (!I->hasOneUse()) return false; line from canEvaluateShifted. Is there a chance that disabling the check is not sufficient but that would actually require more changes?

Attached you can find the narrowed down test case executed with the disabled line and its debug output.

Please just post IR inline, it's super inconvient to extract it from attachmensts
As i suspected, the alive-tv result is red herring:

$ /repositories/alive2/build-Clang-release/alive-tv /tmp/test.ll 

----------------------------------------
define i8 @src(i1 %arg) {
%entry:
  %v0 = select i1 %arg, i8 64, i8 0
  %v1 = lshr exact i8 %v0, 6
  %v2 = xor i8 %v1, 1
  %v3 = shl nuw i8 %v0, 57
  %v4 = ashr exact i8 %v3, 63
  %v5 = sub nsw i8 0, %v2
  %v6 = and i8 %v5, 91
  %v7 = and i8 %v4, 61
  %v8 = add nsw nuw i8 %v7, %v6
  ret i8 %v8
}
=>
define i8 @tgt(i1 %arg) {
%entry:
  %v0 = zext i1 %arg to i8
  %0 = or i8 %v0, 126
  %v2.neg = add nuw i8 %0, 1
  %v6 = and i8 %v2.neg, 91
  ret i8 %v6
}
Transformation seems to be correct!

It is generally not correct to specify --disable-undef-input / --disable-poison-input.
If this optimization can be achieved by relaxing existing optimizations, let's do that first.

This revision now requires changes to proceed.Tue, Jul 28, 2:10 AM
fvrmatteo added a comment.EditedTue, Jul 28, 2:30 AM

Please just post IR inline, it's super inconvient to extract it from attachmensts

Took note of this, sorry!

As i suspected, the alive-tv result is red herring

Is it? I may be doing something wrong but if I take the optimised code for the i64 version:

define i64 @fold_lshr(i1 %arg) {
entry:
  %v0 = zext i1 %arg to i64
  %0 = or i64 %v0, 8589934590
  %v2.neg = add nuw nsw i64 %0, 1
  %v6 = and i64 %v2.neg, 5369111591
  ret i64 %v6
}

and substitute the values false and true to %arg I get wrong values (5369111591 and 0) instead of the expected values (5369111591 and 5369111361) that are obtained if false and true are substituted in the unoptimised code.

I'm confused now, I don't know if I'm seeing things or doing wrong assumptions. Would you be able to prove the i64 example with Alive2 (with no --disable-undef-input) or to replicate my tests with false and true, please?

Edit: link showing the differences between true and false inputs on the i64 version of the function: https://godbolt.org/z/7ez4cW.

@lebedev.ri I just realised that obviously narrowing the problem from i64 to i8 will break it. I'll gladly relax the existing optimisation if possible and submit a new patch, but to me it's clear that there's something wrong going on with the i64 optimisation. I'm now trying to compile Alive2 with the trunk version of LLVM, but I would wait for your confirmation on the i64 case (with Alive2 or input values if possible) before updating the current patch.

@lebedev.ri I just realised that obviously narrowing the problem from i64 to i8 will break it. I'll gladly relax the existing optimisation if possible and submit a new patch, but to me it's clear that there's something wrong going on with the i64 optimisation. I'm now trying to compile Alive2 with the trunk version of LLVM, but I would wait for your confirmation on the i64 case (with Alive2 or input values if possible) before updating the current patch.

I originally tried to proof i64 variant, and alive OOM's on it.
I'm not really sure what is going on there.

I'm not really familiar with canEvaluateShifted(),
but i will be really surprised if that one-use check
is there for correctness of the result.

llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
1052–1055

I would think SimplifySelectsFeedingBinaryOp() should be in commonShiftTransforms().

fvrmatteo added a comment.EditedTue, Jul 28, 6:44 AM

OK I think I figured it out.

The hasOneUse in canEvaluateShifted is absolutely necessary because after its evaluation by FoldShiftByConstant the original select instruction it's replaced with a new shifted select instruction, which means that the original users of %v0 = select i1 %arg, i64 64, i64 0 will now use its shifted version %v0 = select i1 %arg, i64 1, i64 0 generated by FoldShiftByConstant.

The relevant code is the following:

// See if we can propagate this shift into the input, this covers the trivial
// cast of lshr(shl(x,c1),c2) as well as other more complex cases.
if (I.getOpcode() != Instruction::AShr &&
    canEvaluateShifted(Op0, Op1C->getZExtValue(), isLeftShift, *this, &I)) {
  LLVM_DEBUG(
      dbgs() << "ICE: GetShiftedValue propagating shift through expression"
                " to eliminate shift:\n  IN: "
             << *Op0 << "\n  SH: " << I << "\n");

  return replaceInstUsesWith(
      I, getShiftedValue(Op0, Op1C->getZExtValue(), isLeftShift, *this, DL));  // NOTE: here we are replacing the instruction with more than 1 use
}

In fact getShiftedValue tampers with Op0 which in our case it's the original select that if it's not unique will result in wrong values being propagated:

case Instruction::Select:
    I->setOperand(
        1, getShiftedValue(I->getOperand(1), NumBits, isLeftShift, IC, DL));
    I->setOperand(
        2, getShiftedValue(I->getOperand(2), NumBits, isLeftShift, IC, DL));
    return I;

From the debug output we can see that disabling if (!I->hasOneUse()) return false; leads to the following (wrong) sequences:

Substitution of %v0 = select i1 %arg, i64 64, i64 0 with %v0 = select i1 %arg, i64 1, i64 0:

IC: Visiting:   %v1 = lshr exact i64 %v0, 6
ICE: GetShiftedValue propagating shift through expression to eliminate shift:
  IN:   %v0 = select i1 %arg, i64 64, i64 0
  SH:   %v1 = lshr exact i64 %v0, 6
IC: ADD:   %v0 = select i1 %arg, i64 64, i64 0
IC: Replacing   %v1 = lshr exact i64 %v0, 6
    with   %v0 = select i1 %arg, i64 1, i64 0
IC: Mod =   %v1 = lshr exact i64 %v0, 6
    New =   %v1 = lshr exact i64 %v0, 6
IC: ERASE   %v1 = lshr exact i64 %v0, 6
IC: ADD DEFERRED:   %v0 = select i1 %arg, i64 1, i64 0

Transformation of %v0 = select i1 %arg, i64 1, i64 0 in %v0 = zext i1 %arg to i64:

IC: Visiting:   %v0 = select i1 %arg, i64 1, i64 0
IC: Old =   %v0 = select i1 %arg, i64 1, i64 0
    New =   <badref> = zext i1 %arg to i64
IC: ADD:   %v0 = zext i1 %arg to i64
IC: ERASE   %0 = select i1 %arg, i64 1, i64 0
IC: Visiting:   %v0 = zext i1 %arg to i64

Wrong input to %v3 = shl nuw i64 %v0, 57 that will now use %v0 = zext i1 %arg to i64 instead of the original %v0 = select i1 %arg, i64 64, i64 0:

IC: Visiting:   %v2 = xor i64 %v0, 1
IC: Visiting:   %v3 = shl nuw i64 %v0, 57
IC: Mod =   %v3 = shl nuw i64 %v0, 57
    New =   %v3 = shl nuw nsw i64 %v0, 57
IC: ADD:   %v3 = shl nuw nsw i64 %v0, 57
IC: Visiting:   %v3 = shl nuw nsw i64 %v0, 57
IC: Old =   %v3 = shl nuw nsw i64 %v0, 57
    New =   <badref> = select i1 %arg, i64 144115188075855872, i64 0
IC: ADD:   %v3 = select i1 %arg, i64 144115188075855872, i64 0

@lebedev.ri I think that calling the patched version of SimplifySelectsFeedingBinaryOp (issuing a direct call from commonShiftTransforms as you suggested) may be the safest solution, as it is meant to precisely simplify this case. Additionally SimplifySelectsFeedingBinaryOp will replace the lshr instruction (which is the correct one to be simplified) instead of the select (used as input).

I also think that we may try to modify the canEvaluateShifted and FoldShiftByConstant to handle the multiple-uses case, but that seems way more complex as the code around those functions it's really one-use centric.

nikic added a comment.Wed, Jul 29, 9:12 AM

Okay, after looking at this again I think you're on the right track here.

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
883

I have two questions here:

First, can this change be tested independently of the change to shift operators? This looks pretty orthogonal and should be testable through existing uses of SimplifySelectsFeedingBinaryOp.

Second, I don't think that checking for the constant case specifically is the right thing to do. It should be mentioned that just dropping the hasOneUse() checks above would be correct and end up replacing a binop with a select -- and I guess a canonicality decision has been made here to not do that.

If we want to still allow the replacement if the select has constant operands (which seems to be the intention here), then we should drop the hasOneUse() checks above, and instead check whether either the select has one use, or True and False folded to constants. This may occur even if the original operands were not constants.

fvrmatteo updated this revision to Diff 281868.Thu, Jul 30, 4:00 AM

@nikic I tried to oblige to your comment and come up with a check in SimplifySelectsFeedingBinaryOp to verify if we are in the condition where either the instruction has one use or True and False fold into constants.

I'm not sure I understood the question about the addition of the test file. Technically this is an issue that I noticed when %v1 is ashr/lshr/sub, but for example the same case with add/xor/or/shl properly folds with no modifications whatsoever. Hence my original confusion and the test that we carried on with @lebedev.ri about the hasOneUse call part of the canEvaluateShifted function.

fvrmatteo marked an inline comment as done.EditedThu, Jul 30, 9:16 AM

Also, calling TempTrue->deleteValue(); and TempFalse->deleteValue(); if the values are not being used because the optimisation is not doable (conditions are not met) should be the right thing to do to avoid leaking, right?

Edit: Although using it seems to trigger a crash when trying to delete a llvm::Constant: constants should be destroyed with destroyConstant. I'll wait for more inputs before proceeding with this.