This is an archive of the discontinued LLVM Phabricator instance.

Add constant combines for `(urem/srem (mul X, Y), (mul X, Z))`
ClosedPublic

Authored by goldstein.w.n on Jan 31 2023, 1:57 PM.

Details

Summary

We can handle the following cases + some nsw/nuw flags:

(srem (mul X, Y), (mul X, Z))

[If `srem(Y, Z) == 0`]
    -> 0
        - https://alive2.llvm.org/ce/z/PW4XZ-
[If `srem(Y, Z) == Y`]
    -> `(mul nuw nsw X, Y)`
        - https://alive2.llvm.org/ce/z/DQe9Ek
    -> `(mul nsw X, Y)`
        - https://alive2.llvm.org/ce/z/Nr_MdH

[If `Y`/`Z` are constant]
    -> `(mul/shl nuw nsw X, (srem Y, Z))`
        - https://alive2.llvm.org/ce/z/ccTFj2
        - https://alive2.llvm.org/ce/z/i_UQ5A
    -> `(mul/shl nsw X, (srem Y, Z))`
        - https://alive2.llvm.org/ce/z/mQKc63
        - https://alive2.llvm.org/ce/z/uERkKH

(urem (mul X, Y), (mul X, Z))

[If `urem(Y, Z) == 0`]
    -> 0
        - https://alive2.llvm.org/ce/z/LL7UVR
[If `srem(Y, Z) == Y`]
    -> `(mul nuw nsw X, Y)`
        - https://alive2.llvm.org/ce/z/9Kgs_i
    -> `(mul nuw X, Y)`
        - https://alive2.llvm.org/ce/z/ow9i8u

[If `Y`/`Z` are constant]
    -> `(mul nuw nsw X, (srem Y, Z))`
        - https://alive2.llvm.org/ce/z/mNnQqJ
        - https://alive2.llvm.org/ce/z/Bj_DR-
        - https://alive2.llvm.org/ce/z/X6ZEtQ
    -> `(mul nuw X, (srem Y, Z))`
        - https://alive2.llvm.org/ce/z/SJYtUV

The rationale for doing this all in InstCombine rather than handling
the constant mul cases in InstSimplify is we often create a new
instruction because we are able to deduce more nsw/nuw flags than
the original instruction had.

Diff Detail

Event Timeline

goldstein.w.n created this revision.Jan 31 2023, 1:57 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 31 2023, 1:57 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
goldstein.w.n requested review of this revision.Jan 31 2023, 1:57 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 31 2023, 1:57 PM

@nikic, I think I may have added too many tests. The issue is there are alot of edge cases around the nsw/nuw flags that I felt needed to be tested. If you see any that you think are redundant, however, am happy to drop.

This patch is quite difficult to review in it's current state. There are over 20 if statements with various levels of nesting in one function, with the largest if statement spanning 150 lines. I think some of these cases could be handled better with helper functions, where you could pass the binary operators through and deduce if the flags and operands are valid for a combine from within those functions. It's very difficult to keep track of which variables are which when they are named A, Y, BO1, CY, CZ, AY, AZ in combination with the long and layered if statements with verbose comments.

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
1878–1883

This comment spans several lines and both variants describe basically the same thing. I think something like

// (rem (mul nuw/nsw X, Y), (mul X, Z))
//      if (rem Y, Z) == 0
//          -> 0

is probably enough.

1886–1892

You can simplify this comment to one case

1916–1918

I think the code is self explanatory and describes this better than the comment

1923–1934

This comment is very verbose, its longer than the if statement and difficult to understand. If there are exceptions for particular cases then you can probably describe them next to the cases. This may also make more sense to explain in a commit message or next to specific tests.

1940–1943

Comment can be simplified to one case.

This patch is quite difficult to review in it's current state. There are over 20 if statements with various levels of nesting in one function, with the largest if statement spanning 150 lines. I think some of these cases could be handled better with helper functions, where you could pass the binary operators through and deduce if the flags and operands are valid for a combine from within those functions. It's very difficult to keep track of which variables are which when they are named A, Y, BO1, CY, CZ, AY, AZ in combination with the long and layered if statements with verbose comments.

I'll cleanup the comments for V2 (and can also try and make the variable names more intuitive, thinking with A prefix -> APint, C prefix -> constant), but have trouble seeing how to add a useful helper. Each case is pretty specific both in the flags you need to check and the flags we can deduce for the output.

Where you thinking a helper for each case? Or something more generic?

Make comments less verbose
Make variable names more intuitive

goldstein.w.n marked 4 inline comments as done.Feb 2 2023, 2:33 PM

I'm thinking we could have a function which attempts to simplify irem/mul/shl patterns and returns successes if the simplification is possible. Similarly to how there are calls such as

if (Instruction *R = FoldOpIntoSelect(I, SI))
          return R;

inside commonIRemTransforms, we could have something along the lines of

if (Instruction *res = SimplifyIRemMulShl(I))
  return res;

at the end of commonIRemTransforms.

This would give us a lot more control over managing the cases and their bailout conditions separately and cleanly. I think it would also make it more obvious that we are just handling two cases primary cases as well, whereas its quite hard to parse and keep a mental note of the complicated if statements which are present currently.
I have had a go at rearranging the pre-combine logic:

Instruction *InstCombinerImpl::SimplifyIRemMulShl(BinaryOperator &I) {
  Value *Op0 = I.getOperand(0), *Op1 = I.getOperand(1);
  Value *A, *B, *C, *D;
  if (!(match(Op0, m_Mul(m_Value(A), m_Value(B))) ||
        match(Op0, m_Shl(m_Value(A), m_Value(B)))) ||
      !(match(Op1, m_Mul(m_Value(C), m_Value(D))) ||
        match(Op1, m_Shl(m_Value(C), m_Value(D)))))
    return nullptr;

  Value *X, *Y, *Z;
  X = nullptr;
  // Do this by hand as opposed to using m_Specific because either A/B (or
  // C/D) can be our X.
  if (A == C || A == D) {
    X = A;
    Y = B;
    Z = A == C ? D : C;
  } else if (B == C || B == D) {
    X = B;
    Y = A;
    Z = B == C ? D : C;
  }

  BinaryOperator *BO0 = dyn_cast<BinaryOperator>(Op0);
  BinaryOperator *BO1 = dyn_cast<BinaryOperator>(Op1);
  if (!BO0 || !BO1)
    return nullptr;

  Constant *CX = X ? dyn_cast<Constant>(X) : nullptr;
  if (!X || (CX && CX->isOneValue()))
    return nullptr;
    
  Type * Ty = I.getType();
  ConstantInt *ConstY = dyn_cast<ConstantInt>(Y);
  ConstantInt *ConstZ = dyn_cast<ConstantInt>(Z);
  if (Ty->isVectorTy()) {
    auto *VConstY = dyn_cast<Constant>(Y);
    auto *VConstZ = dyn_cast<Constant>(Z);
    if (VConstY && VConstZ) {
      VConstY = VConstY->getSplatValue();
      VConstZ = VConstZ->getSplatValue();
      if (VConstY && VConstZ) {
        ConstY = dyn_cast<ConstantInt>(VConstY);
        ConstZ = dyn_cast<ConstantInt>(VConstZ);
      }
    }
  }

  bool IsSigned = I.getOpcode() == Instruction::SRem;
  // Check constant folds first.
  if (ConstY && ConstZ) {
    APInt APIntY = ConstY->getValue();
    APInt APIntZ = ConstZ->getValue();

    // Just treat the shifts as mul, we may end up returning a mul by power
    // of 2 but that will be cleaned up later.
    if (BO0->getOpcode() == Instruction::Shl)
      APIntY = APInt(APIntY.getBitWidth(), 1) << APIntY;
    if (BO1->getOpcode() == Instruction::Shl)
      APIntZ = APInt(APIntZ.getBitWidth(), 1) << APIntZ;

    APInt RemYZ = IsSigned ? APIntY.srem(APIntZ) : APIntY.urem(APIntZ);

    // (rem (mul nuw/nsw X, Y), (mul X, Z))
    //      if (rem Y, Z) == 0
    //          -> 0
    if (RemYZ.isZero() &&
        (IsSigned ? BO0->hasNoSignedWrap() : BO0->hasNoUnsignedWrap()))
      return replaceInstUsesWith(I, ConstantInt::getNullValue(Ty));

    // (rem (mul X, Y), (mul nuw/nsw X, Z))
    //      if (rem Y, Z) == Y
    //          -> (mul nuw/nsw X, Y)
    if (RemYZ == APIntY &&
        (IsSigned ? BO1->hasNoSignedWrap() : BO1->hasNoUnsignedWrap())) {
      // We are returning Op0 essentially but we can also add no wrap flags.
      BinaryOperator *BO =
          BinaryOperator::CreateMul(X, ConstantInt::get(Ty, APIntY));
      // We can add nsw/nuw if remainder op is signed/unsigned, also we
      // can copy any overflow flags from Op0.
      if (IsSigned || BO0->hasNoSignedWrap())
        BO->setHasNoSignedWrap();
      if (!IsSigned || BO0->hasNoUnsignedWrap())
        BO->setHasNoUnsignedWrap();
      return BO;
    }

    // (rem (mul nuw/nsw X, Y), (mul {nsw} X, Z))
    //      if Y >= Z
    //          -> (mul {nuw} nsw X, (rem Y, Z))
    // NB: (rem Y, Z) is a constant.
    if (APIntY.uge(APIntZ) &&
        (IsSigned ? (BO0->hasNoSignedWrap() && BO1->hasNoSignedWrap())
                  : BO0->hasNoUnsignedWrap())) {
      BinaryOperator *BO =
          BinaryOperator::CreateMul(X, ConstantInt::get(Ty, RemYZ));
      BO->setHasNoSignedWrap();
      if (!IsSigned || BO0->hasNoUnsignedWrap())
        BO->setHasNoUnsignedWrap();
      return BO;
    }
  }

  // Check if desirable to do generic replacement.
  // NB: It may be beneficial to do this if we have X << Z even if there are
  // multiple uses of Op0/Op1 as it will eliminate the urem (urem of a power
  // of 2 is converted to add/and) and urem is pretty expensive (maybe more
  // sense in DAGCombiner).
  if ((ConstY && ConstZ) ||
      (Op0->hasOneUse() && Op1->hasOneUse() &&
       (IsSigned ? (BO0->getOpcode() != Instruction::Shl &&
                    BO1->getOpcode() != Instruction::Shl)
                 : (BO0->getOpcode() != Instruction::Shl ||
                    BO1->getOpcode() == Instruction::Shl)))) {
    // (rem (mul nuw/nsw X, Y), (mul nuw {nsw} X, Z)
    //        -> (mul nuw/nsw X, (rem Y, Z))
    if (IsSigned ? (BO0->hasNoSignedWrap() && BO1->hasNoSignedWrap() &&
                    BO1->hasNoUnsignedWrap())
                 : (BO0->hasNoUnsignedWrap() && BO1->hasNoUnsignedWrap())) {
      // Convert the shifts to multiplies, cleaned up elsewhere.
      if (BO0->getOpcode() == Instruction::Shl)
        Y = Builder.CreateShl(ConstantInt::get(Ty, 1), Y);
      if (BO1->getOpcode() == Instruction::Shl)
        Z = Builder.CreateShl(ConstantInt::get(Ty, 1), Z);
      BinaryOperator *BO = BinaryOperator::CreateMul(
          X, IsSigned ? Builder.CreateSRem(Y, Z) : Builder.CreateURem(Y, Z));

      if (IsSigned || BO0->hasNoSignedWrap() || BO1->hasNoSignedWrap())
        BO->setHasNoSignedWrap();
      if (!IsSigned || (BO0->hasNoUnsignedWrap() && BO1->hasNoUnsignedWrap()))
        BO->setHasNoUnsignedWrap();
      return BO;
    }
  }

  return nullptr;
}

This function could be split even further into SimplifyIRemMulShlConst and SimplifyIRemMulShlGeneric, or we could just leave the generic case in this function.

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
1830–1833

I think we already know BO0 = X * Y and BO1 = X * Zby now from other comments and matches.

1838–1841

In my opinion these variables names are less descriptive than having the function calls inline

1846–1863

We can remove the else statement to the start here, and assign ConstY/ConstZ instead like so:

ConstantInt *ConstY = dyn_cast<ConstantInt>(Y);
ConstantInt *ConstZ = dyn_cast<ConstantInt>(Z);
if (Ty->isVectorTy()) {
  auto *VConstY = dyn_cast<Constant>(Y);
  auto *VConstZ = dyn_cast<Constant>(Z);
  if (VConstY && VConstZ) {
    VConstY = VConstY->getSplatValue();
    VConstZ = VConstZ->getSplatValue();
    if (VConstY && VConstZ) {
      ConstY = dyn_cast<ConstantInt>(VConstY);
      ConstZ = dyn_cast<ConstantInt>(VConstZ);
    }
  }
}

I think it would also be beneficial to split the Const and Generic cases into separate patches to simplify the review process

goldstein.w.n marked an inline comment as done.

Move to helper, split generic case

I'm thinking we could have a function which attempts to simplify irem/mul/shl patterns and returns successes if the simplification is possible. Similarly to how there are calls such as

I see, thought you meant function for each case, done in V3.

if (Instruction *R = FoldOpIntoSelect(I, SI))
          return R;

inside commonIRemTransforms, we could have something along the lines of

if (Instruction *res = SimplifyIRemMulShl(I))
  return res;

at the end of commonIRemTransforms.

This would give us a lot more control over managing the cases and their bailout conditions separately and cleanly. I think it would also make it more obvious that we are just handling two cases primary cases as well, whereas its quite hard to parse and keep a mental note of the complicated if statements which are present currently.
I have had a go at rearranging the pre-combine logic:

Instruction *InstCombinerImpl::SimplifyIRemMulShl(BinaryOperator &I) {
  Value *Op0 = I.getOperand(0), *Op1 = I.getOperand(1);
  Value *A, *B, *C, *D;
  if (!(match(Op0, m_Mul(m_Value(A), m_Value(B))) ||
        match(Op0, m_Shl(m_Value(A), m_Value(B)))) ||
      !(match(Op1, m_Mul(m_Value(C), m_Value(D))) ||
        match(Op1, m_Shl(m_Value(C), m_Value(D)))))
    return nullptr;

  Value *X, *Y, *Z;
  X = nullptr;
  // Do this by hand as opposed to using m_Specific because either A/B (or
  // C/D) can be our X.
  if (A == C || A == D) {
    X = A;
    Y = B;
    Z = A == C ? D : C;
  } else if (B == C || B == D) {
    X = B;
    Y = A;
    Z = B == C ? D : C;
  }

  BinaryOperator *BO0 = dyn_cast<BinaryOperator>(Op0);
  BinaryOperator *BO1 = dyn_cast<BinaryOperator>(Op1);
  if (!BO0 || !BO1)
    return nullptr;

  Constant *CX = X ? dyn_cast<Constant>(X) : nullptr;
  if (!X || (CX && CX->isOneValue()))
    return nullptr;
    
  Type * Ty = I.getType();
  ConstantInt *ConstY = dyn_cast<ConstantInt>(Y);
  ConstantInt *ConstZ = dyn_cast<ConstantInt>(Z);
  if (Ty->isVectorTy()) {
    auto *VConstY = dyn_cast<Constant>(Y);
    auto *VConstZ = dyn_cast<Constant>(Z);
    if (VConstY && VConstZ) {
      VConstY = VConstY->getSplatValue();
      VConstZ = VConstZ->getSplatValue();
      if (VConstY && VConstZ) {
        ConstY = dyn_cast<ConstantInt>(VConstY);
        ConstZ = dyn_cast<ConstantInt>(VConstZ);
      }
    }
  }

  bool IsSigned = I.getOpcode() == Instruction::SRem;
  // Check constant folds first.
  if (ConstY && ConstZ) {
    APInt APIntY = ConstY->getValue();
    APInt APIntZ = ConstZ->getValue();

    // Just treat the shifts as mul, we may end up returning a mul by power
    // of 2 but that will be cleaned up later.
    if (BO0->getOpcode() == Instruction::Shl)
      APIntY = APInt(APIntY.getBitWidth(), 1) << APIntY;
    if (BO1->getOpcode() == Instruction::Shl)
      APIntZ = APInt(APIntZ.getBitWidth(), 1) << APIntZ;

    APInt RemYZ = IsSigned ? APIntY.srem(APIntZ) : APIntY.urem(APIntZ);

    // (rem (mul nuw/nsw X, Y), (mul X, Z))
    //      if (rem Y, Z) == 0
    //          -> 0
    if (RemYZ.isZero() &&
        (IsSigned ? BO0->hasNoSignedWrap() : BO0->hasNoUnsignedWrap()))
      return replaceInstUsesWith(I, ConstantInt::getNullValue(Ty));

    // (rem (mul X, Y), (mul nuw/nsw X, Z))
    //      if (rem Y, Z) == Y
    //          -> (mul nuw/nsw X, Y)
    if (RemYZ == APIntY &&
        (IsSigned ? BO1->hasNoSignedWrap() : BO1->hasNoUnsignedWrap())) {
      // We are returning Op0 essentially but we can also add no wrap flags.
      BinaryOperator *BO =
          BinaryOperator::CreateMul(X, ConstantInt::get(Ty, APIntY));
      // We can add nsw/nuw if remainder op is signed/unsigned, also we
      // can copy any overflow flags from Op0.
      if (IsSigned || BO0->hasNoSignedWrap())
        BO->setHasNoSignedWrap();
      if (!IsSigned || BO0->hasNoUnsignedWrap())
        BO->setHasNoUnsignedWrap();
      return BO;
    }

    // (rem (mul nuw/nsw X, Y), (mul {nsw} X, Z))
    //      if Y >= Z
    //          -> (mul {nuw} nsw X, (rem Y, Z))
    // NB: (rem Y, Z) is a constant.
    if (APIntY.uge(APIntZ) &&
        (IsSigned ? (BO0->hasNoSignedWrap() && BO1->hasNoSignedWrap())
                  : BO0->hasNoUnsignedWrap())) {
      BinaryOperator *BO =
          BinaryOperator::CreateMul(X, ConstantInt::get(Ty, RemYZ));
      BO->setHasNoSignedWrap();
      if (!IsSigned || BO0->hasNoUnsignedWrap())
        BO->setHasNoUnsignedWrap();
      return BO;
    }
  }

  // Check if desirable to do generic replacement.
  // NB: It may be beneficial to do this if we have X << Z even if there are
  // multiple uses of Op0/Op1 as it will eliminate the urem (urem of a power
  // of 2 is converted to add/and) and urem is pretty expensive (maybe more
  // sense in DAGCombiner).
  if ((ConstY && ConstZ) ||
      (Op0->hasOneUse() && Op1->hasOneUse() &&
       (IsSigned ? (BO0->getOpcode() != Instruction::Shl &&
                    BO1->getOpcode() != Instruction::Shl)
                 : (BO0->getOpcode() != Instruction::Shl ||
                    BO1->getOpcode() == Instruction::Shl)))) {
    // (rem (mul nuw/nsw X, Y), (mul nuw {nsw} X, Z)
    //        -> (mul nuw/nsw X, (rem Y, Z))
    if (IsSigned ? (BO0->hasNoSignedWrap() && BO1->hasNoSignedWrap() &&
                    BO1->hasNoUnsignedWrap())
                 : (BO0->hasNoUnsignedWrap() && BO1->hasNoUnsignedWrap())) {
      // Convert the shifts to multiplies, cleaned up elsewhere.
      if (BO0->getOpcode() == Instruction::Shl)
        Y = Builder.CreateShl(ConstantInt::get(Ty, 1), Y);
      if (BO1->getOpcode() == Instruction::Shl)
        Z = Builder.CreateShl(ConstantInt::get(Ty, 1), Z);
      BinaryOperator *BO = BinaryOperator::CreateMul(
          X, IsSigned ? Builder.CreateSRem(Y, Z) : Builder.CreateURem(Y, Z));

      if (IsSigned || BO0->hasNoSignedWrap() || BO1->hasNoSignedWrap())
        BO->setHasNoSignedWrap();
      if (!IsSigned || (BO0->hasNoUnsignedWrap() && BO1->hasNoUnsignedWrap()))
        BO->setHasNoUnsignedWrap();
      return BO;
    }
  }

  return nullptr;
}

This function could be split even further into SimplifyIRemMulShlConst and SimplifyIRemMulShlGeneric, or we could just leave the generic case in this function.

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
1830–1833

I think we already know BO0 = X * Y and BO1 = X * Zby now from other comments and matches.

Imo there is no real cost to the extra comment, can obv delete but seems like no harm for potential clarity.

1838–1841

In my opinion these variables names are less descriptive than having the function calls inline

Its mostly stylistic to save column width, with the full BO....->has...() makes the if conditions very large and imo harder to follow.

I think it would also be beneficial to split the Const and Generic cases into separate patches to simplify the review process

Done, see: D143417 for the generic, this patch now only handles the constant case.

The commit message needs cleaning up. There are references to cases based on having a single uses which are incorrectly left over from splitting up the patch. The alive tests would be easier to follow if each case contained strictly its necessary components instead of commenting out several cases.

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
1713–1714

Value *X = nullptr, *Y, *Z;

1731

Use auto here since the type name is on the RHS of the assignment. This goes for all of the dyn_cast assignments below

1743

I don't think this variable gives us much benefit over simply I.getType()

1759

I think IsSRem is more descriptive and clearer to read later on. Now that we've removed the nested if statements we can move this assignment directly before its first use.

1760–1764

For these names maybe call them something like BO0HasNSW, BO1HasNSW to make it read a bit more literally and lessen the reader's memory overhead of what the 0 in NSW0 represents. We can move these assignments directly before their first use now.

1766

Comment can be removed since we're only handling the constant cases in this patch, and the code this is describing is self explanatory.

1792

This comment is noise as its just a verbal description of the one above

1795–1796

You can just keep '// Copy any overflow flags from Op0. The flag propagation is described above this if block and by the tests.

1807

This comment can be removed since we already know rem(Y, Z) is a constant as we've previously bailed out of the function with

if (!ConstY || !ConstZ)
  return nullptr;
1812

This can be simplified to NUW0 since the top level if statement makes it impossible for NUW0 to be false if IsSigned == false at this point

llvm/test/Transforms/InstCombine/urem-mul.ll
1 ↗(On Diff #495738)

This file is called urem-mul.ll but should be called rem-mul.ll or srem-urem-mul.ll since it has both urem and srem

41 ↗(On Diff #495738)

It would be good to have a positive test case where both sides of the rem are shl binops

103 ↗(On Diff #495188)

Is it correct to generate a nsw flag for this case? Running this case through alive (https://alive2.llvm.org/ce/z/j9NY-S) I get a timeout unless I disable undef inputs, whereas just generating nuw accepts the transform (https://alive2.llvm.org/ce/z/332-Em)

MattDevereau added inline comments.Feb 8 2023, 10:34 AM
llvm/test/Transforms/InstCombine/urem-mul.ll
103 ↗(On Diff #495188)

Phabricator marked this comment on the old revision, but it is still relevant.

goldstein.w.n marked 11 inline comments as done.Feb 8 2023, 12:04 PM
goldstein.w.n added inline comments.
llvm/test/Transforms/InstCombine/urem-mul.ll
103 ↗(On Diff #495188)

Make it i7 and it verifies: https://alive2.llvm.org/ce/z/5itDPy

Rebase test changes, fix many nits

goldstein.w.n added inline comments.Feb 8 2023, 12:06 PM
llvm/test/Transforms/InstCombine/urem-mul.ll
41 ↗(On Diff #495738)

We have @urem_XY_XZ_with_Y_Z_is_mul_X_RemYZ_with_nsw_out2

MattDevereau added inline comments.Feb 8 2023, 12:34 PM
llvm/test/Transforms/InstCombine/urem-mul.ll
103 ↗(On Diff #495188)

Fair enough, do you have any idea why it does this?

goldstein.w.n added inline comments.Feb 8 2023, 1:37 PM
llvm/test/Transforms/InstCombine/urem-mul.ll
103 ↗(On Diff #495188)

I think after its done all the symbolic simplifications, it does an exhaustive search through valid values. Smaller types have smaller exhaustive range so completes faster.

@goldstein.w.n Please update the commit message/description to properly reflect the patch after it was split (e.g. references to one use removed, an example of the Y >= Z transform etc)

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
1705

A Variety of transforms. This comment can be moved above the function.

1726–1735
auto *BO0 = dyn_cast<BinaryOperator>(Op0);
auto *BO1 = dyn_cast<BinaryOperator>(Op1);
if (!X || !BO0 || !BO1)
  return nullptr;
1738

The * here is not needed

1742–1755

We can do

ConstantInt *ConstY, *ConstZ;
Constant *CY, *CZ;
if (!I.getType()->isVectorTy() || !match(Y, m_Constant(CY)) ||
    !match(CY->getSplatValue(), m_ConstantInt(ConstY)) ||
    !match(Z, m_Constant(CZ)) ||
    !match(CZ->getSplatValue(), m_ConstantInt(ConstZ)))
  if (!match(Y, m_ConstantInt(ConstY)) || !match(Z, m_ConstantInt(ConstZ)))
    return nullptr;

Which saves us a few lines, and this also lets us remove the later check for

if (!ConstY || !ConstZ)
  return nullptr;
1759

This comment isn't done, IsSRem is still defined far before it is first used

1760–1764

This comment isn't done, these variables are still defined far before they are used.

llvm/test/Transforms/InstCombine/rem-mul-shl.ll
42–51

Please add a scalable version of this test:

define <vscale x 16 x i8> @urem_XY_XZ_with_CY_rem_CZ_eq_0_scalable(<vscale x 16 x i8> %X) {
; CHECK-LABEL: @urem_XY_XZ_with_CY_rem_CZ_eq_0_scalable(
; CHECK-NEXT:    ret <vscale x 16 x i8> zeroinitializer
;
  %BO0 = mul nuw <vscale x 16 x i8> %X, shufflevector (<vscale x 16 x i8> insertelement (<vscale x 16 x i8> poison, i8 15, i64 0), <vscale x 16 x i8> poison, <vscale x 16 x i32> zeroinitializer)
  %BO1 = mul <vscale x 16 x i8> %X, shufflevector (<vscale x 16 x i8> insertelement (<vscale x 16 x i8> poison, i8 5, i64 0), <vscale x 16 x i8> poison, <vscale x 16 x i32> zeroinitializer)
  %r = urem <vscale x 16 x i8> %BO0, %BO1
  ret <vscale x 16 x i8> %r
}
232–252

Please add a scalable version of this test:

define <vscale x 16 x i8> @srem_XY_XZ_with_CY_rem_CZ_eq_0_scalable(<vscale x 16 x i8> %X) {
; CHECK-LABEL: @srem_XY_XZ_with_CY_rem_CZ_eq_0_scalable(
; CHECK-NEXT:    ret <vscale x 16 x i8> zeroinitializer
;
  %BO0 = mul nsw <vscale x 16 x i8> %X, shufflevector (<vscale x 16 x i8> insertelement (<vscale x 16 x i8> poison, i8 15, i64 0), <vscale x 16 x i8> poison, <vscale x 16 x i32> zeroinitializer)
  %BO1 = mul <vscale x 16 x i8> %X, shufflevector (<vscale x 16 x i8> insertelement (<vscale x 16 x i8> poison, i8 5, i64 0), <vscale x 16 x i8> poison, <vscale x 16 x i32> zeroinitializer)
  %r = srem <vscale x 16 x i8> %BO0, %BO1
  ret <vscale x 16 x i8> %r
}
goldstein.w.n edited the summary of this revision. (Show Details)Feb 9 2023, 8:53 AM
goldstein.w.n added inline comments.Feb 9 2023, 8:57 AM
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
1738

I prefer to keep * when its a pointer, think its clearer.

1742–1755

Prefer not because the generic patch doesn't want to bail out if we can't find constants.

1759

Prefer not to change because the generic patch needs it declared before the if(ConstY && ConstZ) {...}.

1760–1764

Prefer not to change because the generic patch needs it declared before the if(ConstY && ConstZ) {...}.

@goldstein.w.n Please update the commit message/description to properly reflect the patch after it was split (e.g. references to one use removed, an example of the Y >= Z transform etc)

Done, had updated commit, just didn't update summary.

goldstein.w.n marked 5 inline comments as done.Feb 9 2023, 9:14 AM
goldstein.w.n edited the summary of this revision. (Show Details)Feb 9 2023, 9:14 AM

Fix some nits + vscale tests

I think not implementing several of my suggestions because of a future patch is a mistake. I don't think they're nits and have some obvious benefits for readability and control flow, and I'm of the opinion that leaving code that anticipates future work that may or may not even land or be reverted is not ideal, so I'll leave this for someone else to approve.

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
1738

nit: If you don't use auto to hide away small details like this for short lived variables then I don't see much reason to use it at all. If you combine the below block into match statements the fact that these are pointers becomes an unimportant detail.

I think not implementing several of my suggestions because of a future patch is a mistake. I don't think they're nits and have some obvious benefits for readability and control flow, and I'm of the opinion that leaving code that anticipates future work that may or may not even land or be reverted is not ideal, so I'll leave this for someone else to approve.

Sorry, missed this earlier, just seeing now.

If you insist, will update accordingly. Think this version has all your concerns addressed.

Rebase + fix nits

Rebase + fix nits (wrong patch last time)

goldstein.w.n marked an inline comment as done.Feb 13 2023, 7:42 PM
goldstein.w.n marked 3 inline comments as done.
goldstein.w.n added inline comments.Feb 13 2023, 7:48 PM
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
1742–1755

We can do

ConstantInt *ConstY, *ConstZ;
Constant *CY, *CZ;
if (!I.getType()->isVectorTy() || !match(Y, m_Constant(CY)) ||
    !match(CY->getSplatValue(), m_ConstantInt(ConstY)) ||
    !match(Z, m_Constant(CZ)) ||
    !match(CZ->getSplatValue(), m_ConstantInt(ConstZ)))
  if (!match(Y, m_ConstantInt(ConstY)) || !match(Z, m_ConstantInt(ConstZ)))
    return nullptr;

Irreverent of keeping aligned with the generic patch, I find that logic much harder to follow. There are alot of conditions and non-intuitive short-circuit requirements.

Which saves us a few lines, and this also lets us remove the later check for

if (!ConstY || !ConstZ)
  return nullptr;
MattDevereau accepted this revision.Feb 15 2023, 3:12 AM
This revision is now accepted and ready to land.Feb 15 2023, 3:12 AM

@MattDevereau thanks for completing the review, sorry about the friction earlier.

Any chance you can also review the generic impl: D143417

sdesmalen requested changes to this revision.Feb 16 2023, 8:31 AM

Hi @goldstein.w.n, I'm still going through the patch but already found that some code paths are currently untested. I am requesting changes to avoid you from landing it.
In general my preference would be to limit the cases your code handles, rather than adding more tests for more possible combinations of shifts/muls. That makes the code-changes easier to review and helps identify the test-coverage for the different code-paths you've added.

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
1712–1713

Is that true? shl is not commutative, i.e. (a * b) == (b * a), but (a << b) != (b << a)

Also, this patch is already quite complicated, I would prefer to keep it simple at first and only match: urem/srem (mul/shl X, Y), (mul/shl X, Z)
where 'X' is matched with m_Specific. Then later patches can maybe relax those conditions further.

1714

When I comment this part of the condition out, all tests still pass.

1718

When I comment this part of the condition out, all tests still pass.

1718

When I comment this part of the condition out, all tests still pass

1733–1735

When I comment this code out, all tests still pass.

Can this case ever happen? (I would have expected InstCombine/InstSimplify to have folded that case away already)

This revision now requires changes to proceed.Feb 16 2023, 8:31 AM
sdesmalen added inline comments.Feb 16 2023, 9:21 AM
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
1724–1727

These can be cast<BinaryOperator> instead, since we know (from the match above) that they're either Mul or Shl, so they must be BinaryOperator's.

1737–1750

nit: You can write this a bit shorter like this:

ConstantInt *ConstY = nullptr, *ConstZ = nullptr;
if (I.getType()->isVectorTy()) {
  if (auto *CY = dyn_cast<Constant>(Y))
    ConstY = dyn_cast<ConstantInt>(CY->getSplatValue());
  if (auto *CZ = dyn_cast<Constant>(Z))
    ConstZ = dyn_cast<ConstantInt>(CZ->getSplatValue());
} else {
  ConstY = dyn_cast<ConstantInt>(Y);
  ConstZ = dyn_cast<ConstantInt>(Z);
}

Split of shift code + fix some nits

goldstein.w.n retitled this revision from [InstCombine] Add combines for `(urem/srem (mul/shl X, Y), (mul/shl X, Z))` to Add constant combines for `(urem/srem (mul X, Y), (mul X, Z))`.
goldstein.w.n edited the summary of this revision. (Show Details)
goldstein.w.n marked 2 inline comments as done.Feb 16 2023, 3:38 PM
goldstein.w.n added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
1712–1713

Is that true? shl is not commutative, i.e. (a * b) == (b * a), but (a << b) != (b << a)

You are right. Made this patch only cover mul so its a non-issue here, but the shl patch (see D144225) cover it.

Also, this patch is already quite complicated, I would prefer to keep it simple at first and only match: urem/srem (mul/shl X, Y), (mul/shl X, Z)
where 'X' is matched with m_Specific. Then later patches can maybe relax those conditions further.

I tried doing this, but found it significantly more complicated because the m_Specific logic for both lhs/rhs as well as special cases for shl paired with mul. To try and simplify I spit off the shl patch (so hopefully easier to review). If you insist, however, it is implementable to match + m_Specific so LMK.

1718

When I comment this part of the condition out, all tests still pass

Added more tests that require this logic.

1718

When I comment this part of the condition out, all tests still pass.

Removed from this patch.

1733–1735

When I comment this code out, all tests still pass.

Can this case ever happen? (I would have expected InstCombine/InstSimplify to have folded that case away already)

Removed the BO0/BO1 checks, but !X can happen.

1737–1750

nit: You can write this a bit shorter like this:

ConstantInt *ConstY = nullptr, *ConstZ = nullptr;
if (I.getType()->isVectorTy()) {
  if (auto *CY = dyn_cast<Constant>(Y))
    ConstY = dyn_cast<ConstantInt>(CY->getSplatValue());
  if (auto *CZ = dyn_cast<Constant>(Z))
    ConstZ = dyn_cast<ConstantInt>(CZ->getSplatValue());
} else {
  ConstY = dyn_cast<ConstantInt>(Y);
  ConstZ = dyn_cast<ConstantInt>(Z);
}

Made it a lambda function.

goldstein.w.n marked an inline comment as done.Feb 16 2023, 3:38 PM
goldstein.w.n marked an inline comment as not done.
goldstein.w.n added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
1714

When I comment this part of the condition out, all tests still pass.

Added additional tests.

Thanks for simplifying this patch a bit!

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
1712–1713

To use m_Specific you can do the following:

Value *X = nullptr, *Y = nullptr, *Z = nullptr;
if (!match(Op0, m_Mul(m_Value(X), m_Value(Y))) &&
    !match(Op0, m_Shl(m_Value(X), m_Value(Y))))
  return nullptr;
if (!match(Op1, m_Mul(m_Specific(X), m_Value(Z))) &&
    !match(Op1, m_Shl(m_Specific(X), m_Value(Z))))
  return nullptr;

I prefer this explicit matching over trying assign X, Y and Z from A, B, C and D, since that logic tries to cover various different combinations and is harder to follow. Note that the version I wrote above also allows for the case where Y and Z are a Shl.

1726

No variables really need to be captured for this function. You can also write it more compactly as:

if (Op->getType()->isVectorTy())
  if (auto *COp = dyn_cast<Constant>(Op))
    return dyn_cast<ConstantInt>(COp->getSplatValue());
return dyn_cast<ConstantInt>(Op);
goldstein.w.n added inline comments.Feb 20 2023, 11:30 AM
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
1712–1713

To use m_Specific you can do the following:

Value *X = nullptr, *Y = nullptr, *Z = nullptr;
if (!match(Op0, m_Mul(m_Value(X), m_Value(Y))) &&
    !match(Op0, m_Shl(m_Value(X), m_Value(Y))))
  return nullptr;
if (!match(Op1, m_Mul(m_Specific(X), m_Value(Z))) &&
    !match(Op1, m_Shl(m_Specific(X), m_Value(Z))))
  return nullptr;

This doesn't quite work. It misses things like:

%a = mul i8 %Y, %X
%b = mul i8 %X, %Z

To do (just mul) with match it needs to be:

if ((match(Op0, m_Mul(m_Value(X), m_Value(Z))) &&
     match(Op1, m_c_Mul(m_Specific(X), m_Value(Y)))) ||
    (match(Op0, m_Mul(m_Value(Z), m_Value(X))) &&
     match(Op1, m_c_Mul(m_Specific(X), m_Value(Y)))))

To get shl will be 3x more of these (shl/mul, mul/shl, shl/shl).

I prefer this explicit matching over trying assign X, Y and Z from A, B, C and D, since that logic tries to cover various different combinations and is harder to follow. Note that the version I wrote above also allows for the case where Y and Z are a Shl.

Okay. Will change to use match.

goldstein.w.n added inline comments.Feb 20 2023, 12:17 PM
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
1726

No variables really need to be captured for this function. You can also write it more compactly as:

if (Op->getType()->isVectorTy())
  if (auto *COp = dyn_cast<Constant>(Op))
    return dyn_cast<ConstantInt>(COp->getSplatValue());
return dyn_cast<ConstantInt>(Op);

That doesn't work. If vec is a non-splat constant it will fault from dyn_cast<ConstantInt>(nullptr).

But it can be:

if (Op->getType()->isVectorTy())
  if (auto *COp = dyn_cast<Constant>(Op)) {
    auto *CSplat = COp->getSplatValue();
    return CSplat ? dyn_cast<ConstantInt>(CSplat) : nullptr;
  }
return dyn_cast<ConstantInt>(Op);
goldstein.w.n marked an inline comment as done.Feb 20 2023, 12:17 PM

Use match instead of handwritten logic

goldstein.w.n added inline comments.Feb 20 2023, 12:20 PM
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
1712–1713

To use m_Specific you can do the following:

Value *X = nullptr, *Y = nullptr, *Z = nullptr;
if (!match(Op0, m_Mul(m_Value(X), m_Value(Y))) &&
    !match(Op0, m_Shl(m_Value(X), m_Value(Y))))
  return nullptr;
if (!match(Op1, m_Mul(m_Specific(X), m_Value(Z))) &&
    !match(Op1, m_Shl(m_Specific(X), m_Value(Z))))
  return nullptr;

I prefer this explicit matching over trying assign X, Y and Z from A, B, C and D, since that logic tries to cover various different combinations and is harder to follow. Note that the version I wrote above also allows for the case where Y and Z are a Shl.

No changes to results in the mul case, but this improves the codegen for the generic mul/shl case as the handwritten logic was just glossing over some handleable cases :)

See D144225 for all the match logic.

nikic added inline comments.Feb 20 2023, 1:29 PM
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
1721

Why doesn't this use match(Y, m_APInt(APIntY)) etc? As far as I can tell you don't use the ConstantInt itself, and m_APInt already handles splats.

1732

BinaryOperator -> OverflowingBinaryOperator to avoid constant expression crash.

goldstein.w.n added inline comments.Feb 20 2023, 1:56 PM
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
1721

Why doesn't this use match(Y, m_APInt(APIntY)) etc? As far as I can tell you don't use the ConstantInt itself, and m_APInt already handles splats.

That would work for this patch (can update if thats preference), but last in the series (D143417) generalizes this to non-constants so would have to return to this code either way.

goldstein.w.n marked an inline comment as done.Feb 20 2023, 2:06 PM

BinaryOperator -> OverflowingBinaryOperator

goldstein.w.n marked 9 inline comments as done.Feb 27 2023, 10:30 AM
sdesmalen added inline comments.Feb 28 2023, 3:46 AM
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
1721

Since we're reviewing this patch, and not D143417, I would prefer you to use m_APInt here. D143417 can change it back if that's what it needs to do.

1723

nit: Please move this closer to its use. Also, is it worth calling it IsSigned?

1738

There is an implied condition that if RemYZ = 0, then Y >= Z, consequently mul X, Z cannot overflow if mul X, Y cannot overflow.

I think the code would be easier to follow if you do something like this:

bool LHSNoWrap = IsSRem ? BO0HasNSW : BO0HasNUW;
bool RHSNoWrap = IsSRem ? BO1HasNSW : BO1HasNUW;
if (Y >= Z && LHSNoWrap) {
  // Handle case for RemYZ == 0 and Y >= Z
} else if (Y < Z && RHSNoWrap) {
  // Handle case for RemYZ == Y
}
1761

if Y >= Z and mul X, Y doesn't overflow, then mul X, Z also can't overflow, so the test for B01HasNSW can be removed?

1767–1769

nit: It is easier to write:

BO->setHasNoSignedWrap(BO0HasNSW);
BO->setHasNoUnsignedWrap(BO0HasNUW);
goldstein.w.n marked 2 inline comments as done.Mar 3 2023, 1:18 PM

Sorry for the delay, didn't see your replies till just now :(

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
1721

DOne, likewise for the shift patch.

1723

nit: Please move this closer to its use. Also, is it worth calling it IsSigned?

Moved closer, it was original IsSigned but matt asked for change to IsSRem. Personally I'm agnostic.

1738

There is an implied condition that if RemYZ = 0, then Y >= Z, consequently mul X, Z cannot overflow if mul X, Y cannot overflow.

the Y >= Z -> implied flags doesn't seem to verify in all cases i.e:
https://alive2.llvm.org/ce/z/bc8BXG
vs
https://alive2.llvm.org/ce/z/uERkKH

or
https://alive2.llvm.org/ce/z/BUuMfn
vs
https://alive2.llvm.org/ce/z/X6ZEtQ

I don't doubt that there may be more cases we can add, but would prefer to leave that as a todo.

I think the code would be easier to follow if you do something like this:

bool LHSNoWrap = IsSRem ? BO0HasNSW : BO0HasNUW;
bool RHSNoWrap = IsSRem ? BO1HasNSW : BO1HasNUW;
if (Y >= Z && LHSNoWrap) {
  // Handle case for RemYZ == 0 and Y >= Z
} else if (Y < Z && RHSNoWrap) {
  // Handle case for RemYZ == Y
}

Done.

1761

if Y >= Z and mul X, Y doesn't overflow, then mul X, Z also can't overflow, so the test for B01HasNSW can be removed?

Doesn't verify: https://alive2.llvm.org/ce/z/_yqMCk

goldstein.w.n marked an inline comment as done.

Fix many a nit

sdesmalen accepted this revision.Mar 10 2023, 5:36 AM
sdesmalen added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
1738

That makes sense, my thinking here was guided by the unsigned case.

1739–1742

nit:

BO->setHasNoSignedWrap(IsSRem || BO0HasNSW);
BO->setHasNoUnsignedWrap(!IsSRem || BO0HasNUW);
This revision is now accepted and ready to land.Mar 10 2023, 5:36 AM
goldstein.w.n marked 2 inline comments as done.Mar 13 2023, 12:10 AM
goldstein.w.n added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
1739–1742

Fixed, any chance you can also checkout D143417 and D144225?

I'm happy to look at one of the other patches. Can you land this patch in the meantime?

This revision was landed with ongoing or failed builds.Mar 16 2023, 11:02 AM
This revision was automatically updated to reflect the committed changes.
goldstein.w.n marked an inline comment as done.