This is an archive of the discontinued LLVM Phabricator instance.

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
1088	This match code is basically identical to `foldSquareSumInts`. The only difference other than `FMul` vs `Mul` is you do match `FMul(A, 2)` for floats and `m_Shl(A, 1)` for ints. Can you make the match code a helper that takes either fmul/2x matcher (or just lambda wrapping) so it can be used for SumFloat / SumInt?

rainerzufalldererste added inline comments. Aug 16 2023, 1:51 PM

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
1088	Does that imply that `m_c_FAdd` can simply be replaced with `m_c_Add` and will continue to match properly for floating-point values as well? I presume that would entail partially matching another pattern and then deferring the actual check for the mul2 match, as `BinaryOp_match<LHS, RHS, OpCode>` would have different `OpCode`s for `FMul` and `Shl`, which sounds like a huge mess to me. Or is there a cleaner way to do that? Something like this sadly doesn't compile (the two lambdas have distinct closure types, so the conditional expression has no common type):

const auto FpMul2Matcher = [](auto &value) { return m_FMul(value, m_SpecificFP(2.0)); };
const auto IntMul2Matcher = [](auto &value) { return m_Shl(value, m_SpecificInt(1)); };
const auto Mul2Matcher = FP ? FpMul2Matcher : IntMul2Matcher;
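One likely reason the snippet above is rejected is the conditional operator itself. Every lambda has its own closure type, and `?:` cannot find a common type for two unrelated closure types. Below is a minimal sketch of the usual workaround: branch first, then instantiate a generic helper once per lambda, so that both arms produce a value of the same type. It uses toy stand-in types, and `Leaf`, `ShlByOne`, `FMulByTwo`, `kind`, `buildMul2` and `pickMul2` are all hypothetical, not LLVM API.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical stand-ins for two matcher types that share no common type,
// in the spirit of a Shl-by-1 matcher vs. an FMul-by-2.0 matcher.
struct Leaf {};
template <typename Sub> struct ShlByOne { Sub S; };
template <typename Sub> struct FMulByTwo { Sub S; };

template <typename Sub> std::string kind(const ShlByOne<Sub> &) {
  return "shl 1";
}
template <typename Sub> std::string kind(const FMulByTwo<Sub> &) {
  return "fmul 2.0";
}

// The matcher-building lambda is a template parameter, so every call site
// instantiates this helper with its own closure type.
template <typename Mul2Fn> std::string buildMul2(const Mul2Fn &Mul2) {
  return kind(Mul2(Leaf{}));
}

std::string pickMul2(bool FP) {
  auto IntMul2 = [](const auto &V) {
    return ShlByOne<std::decay_t<decltype(V)>>{V};
  };
  auto FPMul2 = [](const auto &V) {
    return FMulByTwo<std::decay_t<decltype(V)>>{V};
  };
  // `auto M = FP ? FPMul2 : IntMul2;` is ill-formed because the two closure
  // types differ. Branching on FP and calling the helper in each arm works,
  // because both arms yield a std::string.
  return FP ? buildMul2(FPMul2) : buildMul2(IntMul2);
}
```

In the real matcher, the "same type" that each arm returns would be `bool` (the result of `match`), not a string.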

rainerzufalldererste added inline comments. Aug 16 2023, 2:25 PM

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp

1088

Even something like this wouldn't work:

template <typename TMul2, typename TCAdd, typename TMul>
static bool MatchesSquareSum(BinaryOperator &I, Value *&A, Value *&B,
                             const TMul2 &Mul2, const TCAdd &CAdd,
                             const TMul &Mul) {

  // (a * a) + (((a * 2) + b) * b)
  bool Matches =
      match(&I, CAdd(m_OneUse(Mul(m_Value(A), m_Deferred(A))),
                     m_OneUse(Mul(CAdd(Mul2(m_Deferred(A)), m_Value(B)),
                                  m_Deferred(B)))));

  // ((a * b) * 2)  or ((a * 2) * b)
  // +
  // (a * a + b * b) or (b * b + a * a)
  if (!Matches) {
    Matches =
        match(&I, CAdd(m_CombineOr(m_OneUse(Mul2(Mul(m_Value(A), m_Value(B)))),
                                   m_OneUse(Mul(Mul2(m_Value(A)), m_Value(B)))),
                       m_OneUse(CAdd(Mul(m_Deferred(A), m_Deferred(A)),
                                     Mul(m_Deferred(B), m_Deferred(B))))));
  }

  return Matches;
}

I agree that it's messy to have duplicate code, but with the way op-codes are used as template parameters I don't see a way without template specialization to do this nicely; and with template specialization it's even more of a beast.
Am I missing some obvious way built into llvm/InstCombine to do this nicely?

goldstein.w.n added inline comments. Aug 16 2023, 5:07 PM

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
1088	Why doesn't that code work?

rainerzufalldererste added a subscriber: nikic. Aug 17 2023, 4:20 AM

rainerzufalldererste added inline comments.

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
1088	Assuming `TMul2` etc. to be a lambda, the return type couldn't be consistent, as for both `m_FMul` and `m_Shl` it'd be `BinaryOp_match<LHS, RHS, OpCode>`, with the same `OpCode` for each invocation, but different `RHS` and `LHS`. One could make this work with macros, but I don't know the LLVM stance on macros, or with template specialization, where there'd be a specialized struct with three functions (`Mul2`, `Mul`, `CAdd`) that simply map to the correct functions for `FAdd`/`Add` etc. However, I honestly think that the current implementation is the cleanest way to do it. I'm also not a big fan of code duplication, but the discussed alternatives seem a lot messier to me.

rainerzufalldererste marked an inline comment as done. Aug 18 2023, 5:02 PM

rainerzufalldererste added inline comments.

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
1088	Have you been able to come up with some better ideas? Maybe it's not _that_ terrible to go down the template specialization route, as many of the integer optimizations may have similar counterparts in FP with `nsz` and `reassoc`. Not sure how many of them are already handled twice, but there's a chance one could simplify this process by providing template specialized `m_XAdd<IsFP>(LHS, RHS)` etc. However, I'm not sure if I'm the right person to pass judgement on something that large, as I'm still very new to both LLVM and InstCombine.
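Here is a minimal sketch of the `m_XAdd<IsFP>(LHS, RHS)` idea. It assumes C++17 and uses toy matcher types, and `IntAddMatch`, `FPAddMatch`, `intAdd`, `fpAdd` and `xAdd` are hypothetical, not LLVM API. With `if constexpr`, a single function template can choose between unrelated matcher types without a separate specialized struct per operation. This works because return statements in the discarded branch do not take part in return-type deduction.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for the result types of m_c_Add / m_c_FAdd.
template <typename L, typename R> struct IntAddMatch { L Lhs; R Rhs; };
template <typename L, typename R> struct FPAddMatch { L Lhs; R Rhs; };

template <typename L, typename R>
IntAddMatch<L, R> intAdd(const L &A, const R &B) { return {A, B}; }
template <typename L, typename R>
FPAddMatch<L, R> fpAdd(const L &A, const R &B) { return {A, B}; }

// Hypothetical xAdd<IsFP>: selects the matcher at compile time. Only the
// branch that is kept contributes to the deduced return type (C++17).
template <bool IsFP, typename L, typename R>
auto xAdd(const L &A, const R &B) {
  if constexpr (IsFP)
    return fpAdd(A, B);
  else
    return intAdd(A, B);
}
```

The same pattern would extend to `xMul<IsFP>` and `xMul2<IsFP>`. Whether that is worth adding as shared infrastructure is the open question raised above.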

goldstein.w.n added inline comments. Aug 18 2023, 10:51 PM

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
1088	> Assuming `TMul2` etc. to be a lambda, the return type couldn't be consistent, as for both `m_FMul` and `m_Shl` it'd be `BinaryOp_match<LHS, RHS, OpCode>`, with the same `OpCode` for each invocation, but different `RHS` and `LHS`. One could make this work with macros, but I don't know the LLVM stance on macros, or with template specialization, where there'd be a specialized struct with three functions (`Mul2`, `Mul`, `CAdd`) that simply map to the correct functions for `FAdd`/`Add` etc.

For the TMul2, don't you only need a single Value? Instead of passing a BinaryOperator, you could just pass a lambda, i.e.:

auto FPMul2 = [](Value *&A) { return match(m_FMul(m_Value(A), m_SpecificFP(2))); };
...
auto IntMul2 = [](Value *&A) { return match(m_Shl(m_Value(A), m_SpecificInt(1))); };

I don't see why the same isn't true for mul/add (although with two values then).

> However, I honestly think that the current implementation is the cleanest way to do it. I'm also not a big fan of code duplication, but the discussed alternatives seem a lot messier to me.

rainerzufalldererste marked an inline comment as done. Aug 19 2023, 7:24 AM

rainerzufalldererste added inline comments.

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp

1088

Regarding the LHS and RHS, you are correct, I misspoke. The OpCode and RHS are consistent, but LHS isn't. There are multiple cases where TMul2 is used:

Mul2(m_Deferred(A))
Mul2(Mul(m_Value(A), m_Value(B)))
Mul2(m_Value(A))

All of these parameters have different types, therefore the return type of this lambda would also be different in every case. So if the parameter were Value *&, this wouldn't be a problem at all, but that's simply not the case. Is there a way to cast these types to Value *& somehow (without capturing them separately and then matching things again against the sub-match-lambda)?

m_Deferred returns deferredval_ty<Value>.
Mul(m_Value(A), m_Value(B)) returns either BinaryOp_match<bind_ty<Value>, bind_ty<Value>, Instruction::FMul> or BinaryOp_match<bind_ty<Value>, bind_ty<Value>, Instruction::Mul>.
m_Value returns bind_ty<Value>.

These types aren't compatible, so the template can't deduce a consistent type even from auto-parameter lambdas. Same with Mul & CAdd.
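The following small sketch, with toy types and hypothetical names, shows how C++ treats a generic lambda that is called with arguments of different types. Differing argument types are not a problem in themselves, because the lambda's call operator is a member template and each call instantiates it separately. However, a lambda parameter declared as plain `auto &` cannot bind to a temporary such as the value returned by `m_Deferred(A)`. That may be a contributing factor in the "invalid" lambda version that follows, whose lambdas take `auto &`.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the unrelated types returned by helpers such as
// m_Value(A) (bind_ty<Value>) and m_Deferred(A) (deferredval_ty<Value>).
struct BindTy {};
struct DeferredTy {};

BindTy makeBind() { return {}; }
DeferredTy makeDeferred() { return {}; }

std::string name(const BindTy &) { return "bind"; }
std::string name(const DeferredTy &) { return "deferred"; }

std::string describeBoth() {
  // The call operator of a generic lambda is a member template, so a single
  // lambda object accepts arguments of different types; each call
  // instantiates its own specialization. `const auto &` binds to the
  // temporaries returned by makeBind()/makeDeferred(); a plain `auto &`
  // parameter would not.
  auto Describe = [](const auto &M) { return name(M); };
  return Describe(makeBind()) + "," + Describe(makeDeferred());
}
```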

Apart from that, I'm a bit confused about the match in your comment, as it doesn't quite apply here unless we first match parts of the pattern and then check them against this follow-up matcher lambda. Even if we did that, it would end up a large mess, since it applies not only to Mul2 but also to CAdd & Mul, turning these two large matches into a ton of tiny matches.

Otherwise, I'm not quite sure why I'm explaining compilation errors here, unless I'm missing something very obvious or am completely missing the point.

This, however, isn't valid C++ code:

template <typename TMul2, typename TCAdd, typename TMul>
static std::tuple<bool, Value *, Value *>
MatchesSquareSum(BinaryOperator &I, const TMul2 &Mul2, const TCAdd &CAdd,
                 const TMul &Mul) {
  Value *A, *B;

  // (a * a) + (((a * 2) + b) * b)
  if (match(&I, CAdd(m_OneUse(Mul(m_Value(A), m_Deferred(A))),
                     m_OneUse(Mul(CAdd(Mul2(m_Deferred(A)), m_Value(B)),
                                  m_Deferred(B))))))
    return std::make_tuple(true, A, B);

  // ((a * b) * 2)  or ((a * 2) * b)
  // +
  // (a * a + b * b) or (b * b + a * a)
  return std::make_tuple(
      match(&I, CAdd(m_CombineOr(m_OneUse(Mul2(Mul(m_Value(A), m_Value(B)))),
                                 m_OneUse(Mul(Mul2(m_Value(A)), m_Value(B)))),
                     m_OneUse(CAdd(Mul(m_Deferred(A), m_Deferred(A)),
                                   Mul(m_Deferred(B), m_Deferred(B)))))),
      A, B);
}

// Fold variations of a^2 + 2*a*b + b^2 -> (a + b)^2
// if `FP`: requires `nsz` and `reassoc`.
Instruction *InstCombinerImpl::foldSquareSum(BinaryOperator &I, const bool FP) {
  if (FP) {
    assert(I.hasAllowReassoc() && I.hasNoSignedZeros() &&
           "Assumption mismatch");
  }

  std::tuple<bool, Value *, Value *> Match;

  if (FP) {
    Match = MatchesSquareSum(
        I, [](auto &V) { return m_FMul(V, m_SpecificFP(2.0)); },
        [](auto &L, auto &R) { return m_c_FAdd(L, R); },
        [](auto &L, auto &R) { return m_FMul(L, R); });
  } else {
    Match = MatchesSquareSum(
        I, [](auto &V) { return m_Shl(V, m_SpecificInt(1)); },
        [](auto &L, auto &R) { return m_c_Add(L, R); },
        [](auto &L, auto &R) { return m_Mul(L, R); });
  }

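  // if one of them matches: -> (a + b)^2
  if (std::get<0>(Match)) {
    Value *A = std::get<1>(Match), *B = std::get<2>(Match);
    if (FP) {
      Value *AB = Builder.CreateFAddFMF(A, B, &I);
      return BinaryOperator::CreateFMulFMF(AB, AB, &I);
    }
    // Integer path: plain add/mul without fast-math flags.
    Value *AB = Builder.CreateAdd(A, B);
    return BinaryOperator::CreateMul(AB, AB);
  }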

  return nullptr;
}

This _is_ valid C++ code, but uses template specialization to get around the previous type-ambiguity issues:

template <bool IsFP> struct XMul;

template <> struct XMul<false> {
  template <typename LHS, typename RHS>
  inline auto operator()(const LHS &L, const RHS &R) const {
    return m_Mul(L, R);
  }
};

template <> struct XMul<true> {
  template <typename LHS, typename RHS>
  inline auto operator()(const LHS &L, const RHS &R) const {
    return m_FMul(L, R);
  }
};

template <bool IsFP> struct XCAdd;

template <> struct XCAdd<false> {
  template <typename LHS, typename RHS>
  inline auto operator()(const LHS &L, const RHS &R) const {
    return m_c_Add(L, R);
  }
};

template <> struct XCAdd<true> {
  template <typename LHS, typename RHS>
  inline auto operator()(const LHS &L, const RHS &R) const {
    return m_c_FAdd(L, R);
  }
};

template <bool IsFP> struct XMul2;

template <> struct XMul2<false> {
  template <typename LHS> inline auto operator()(const LHS &L) const {
    return m_Shl(L, m_SpecificInt(1));
  }
};

template <> struct XMul2<true> {
  template <typename LHS> inline auto operator()(const LHS &L) const {
    return m_FMul(L, m_SpecificFP(2.0));
  }
};

template <typename TMul2, typename TCAdd, typename TMul>
static std::tuple<bool, Value *, Value *>
MatchesSquareSum(BinaryOperator &I, const TMul2 &Mul2, const TCAdd &CAdd,
                 const TMul &Mul) {
  Value *A, *B;

  // (a * a) + (((a * 2) + b) * b)
  if (match(&I, CAdd(m_OneUse(Mul(m_Value(A), m_Deferred(A))),
                     m_OneUse(Mul(CAdd(Mul2(m_Deferred(A)), m_Value(B)),
                                  m_Deferred(B))))))
    return std::make_tuple(true, A, B);

  // ((a * b) * 2)  or ((a * 2) * b)
  // +
  // (a * a + b * b) or (b * b + a * a)
  return std::make_tuple(
      match(&I, CAdd(m_CombineOr(m_OneUse(Mul2(Mul(m_Value(A), m_Value(B)))),
                                 m_OneUse(Mul(Mul2(m_Value(A)), m_Value(B)))),
                     m_OneUse(CAdd(Mul(m_Deferred(A), m_Deferred(A)),
                                   Mul(m_Deferred(B), m_Deferred(B)))))),
      A, B);
}

// Fold variations of a^2 + 2*a*b + b^2 -> (a + b)^2
// if `FP`: requires `nsz` and `reassoc`.
Instruction *InstCombinerImpl::foldSquareSum(BinaryOperator &I, const bool FP) {
  if (FP) {
    assert(I.hasAllowReassoc() && I.hasNoSignedZeros() &&
           "Assumption mismatch");
  }

  const std::tuple<bool, Value *, Value *> Match =
      FP ? MatchesSquareSum(I, XMul2<true>(), XCAdd<true>(), XMul<true>())
         : MatchesSquareSum(I, XMul2<false>(), XCAdd<false>(), XMul<false>());

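  // if one of them matches: -> (a + b)^2
  if (std::get<0>(Match)) {
    Value *A = std::get<1>(Match), *B = std::get<2>(Match);
    if (FP) {
      Value *AB = Builder.CreateFAddFMF(A, B, &I);
      return BinaryOperator::CreateFMulFMF(AB, AB, &I);
    }
    // Integer path: plain add/mul without fast-math flags.
    Value *AB = Builder.CreateAdd(A, B);
    return BinaryOperator::CreateMul(AB, AB);
  }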

  return nullptr;
}

goldstein.w.n added inline comments. Aug 19 2023, 11:25 AM

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp

1088

How about something along the lines of:

template <unsigned OpcMul, unsigned OpcAdd, unsigned OpcMul2, typename Mul2Rhs>
static bool foldSquareSum(BinaryOperator &I, Mul2Rhs MRhs, Value *&AOut,
                                  Value *&BOut) {
  Value *A, *B;
  bool Matches = match(
      &I,
      m_c_BinOp(OpcAdd, m_OneUse(m_BinOp(OpcMul, m_Value(A), m_Deferred(A))),
                m_OneUse(m_BinOp(
                    OpcMul,
                    m_c_BinOp(OpcAdd, m_BinOp(OpcMul2, m_Deferred(A), MRhs),
                              m_Value(B)),
                    m_Deferred(B)))));
  if (!Matches) {
    Matches = match(
        &I,
        m_c_BinOp(
            OpcAdd,
            m_CombineOr(
                m_OneUse(m_BinOp(
                    OpcMul2, m_BinOp(OpcMul, m_Value(A), m_Value(B)), MRhs)),
                m_OneUse(m_BinOp(OpcMul, m_BinOp(OpcMul2, m_Value(A), MRhs),
                                 m_Value(B)))),
            m_OneUse(
                m_c_BinOp(OpcAdd, m_BinOp(OpcMul, m_Deferred(A), m_Deferred(A)),
                          m_BinOp(OpcMul, m_Deferred(B), m_Deferred(B))))));
  }
  AOut = A;
  BOut = B;
  return Matches;
}


// Fold variations of a^2 + 2*a*b + b^2 -> (a + b)^2
Instruction *InstCombinerImpl::foldSquareSumInts(BinaryOperator &I) {
  Value *A, *B;

  bool Matches =
      foldSquareSum<Instruction::Mul, Instruction::Add, Instruction::Shl>(
          I, m_SpecificInt(1), A, B);
  // if one of them matches: -> (a + b)^2
  if (Matches) {
    Value *AB = Builder.CreateAdd(A, B);
    return BinaryOperator::CreateMul(AB, AB);
  }

  return nullptr;
}


// Fold variations of a^2 + 2*a*b + b^2 -> (a + b)^2
// Requires `nsz` and `reassoc`.

Instruction *InstCombinerImpl::foldSquareSumFloat(BinaryOperator &I) {
  Value *A, *B;

  assert(I.hasAllowReassoc() && I.hasNoSignedZeros() && "Assumption mismatch");

  bool Matches =
      foldSquareSum<Instruction::FMul, Instruction::FAdd, Instruction::FMul>(
          I, m_SpecificFP(2.0), A, B);

  // if one of them matches: -> (a + b)^2
  if (Matches) {
    Value *AB = Builder.CreateFAddFMF(A, B, &I);
    return BinaryOperator::CreateFMulFMF(AB, AB, &I);
  }

  return nullptr;
}

Needs comments and whatnot, but I don't see why this would fall short.
All the InstCombine tests pass with this (I assume that includes all the tests relevant to the int/FP versions of this).
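The matcher above does not check any `nsw`/`nuw` flags before folding the integer case. Here is a small sanity check of why no such flags are needed. It is not part of the patch, and `squareSumFoldHoldsForI8` is a hypothetical name. Fixed-width wrap-around arithmetic is still a commutative ring, so `a*a + ((a << 1) + b) * b` and `(a + b) * (a + b)` agree modulo 2^n. The loop checks every pair of 8-bit values and truncates after each operation, as `i8` IR arithmetic would.

```cpp
#include <cassert>
#include <cstdint>

// Checks a*a + ((a << 1) + b) * b == (a + b) * (a + b) for all 8-bit a, b,
// wrapping after every operation as i8 arithmetic would.
bool squareSumFoldHoldsForI8() {
  for (unsigned X = 0; X < 256; ++X) {
    for (unsigned Y = 0; Y < 256; ++Y) {
      const uint8_t A = static_cast<uint8_t>(X), B = static_cast<uint8_t>(Y);
      const uint8_t AA = static_cast<uint8_t>(A * A);
      const uint8_t Shl = static_cast<uint8_t>(A << 1);
      const uint8_t Sum = static_cast<uint8_t>(Shl + B);
      const uint8_t Expanded =
          static_cast<uint8_t>(AA + static_cast<uint8_t>(Sum * B));
      const uint8_t AB = static_cast<uint8_t>(A + B);
      const uint8_t Contracted = static_cast<uint8_t>(AB * AB);
      if (Expanded != Contracted)
        return false;
    }
  }
  return true;
}
```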

How do you like this? I've made most of the template parameters default to the correct type to keep the invocation cleaner. I'm not sure whether template specialization (only for `m_SpecificInt` / `m_SpecificFP`) in the matcher function would be a good idea, as it'd make the invocation even cleaner but the matcher a bit more complicated. However, considering that this little template monster is the replacement for slight code duplication, this may be our implementation of choice. Let me know what you think!

rainerzufalldererste marked 2 inline comments as done. Aug 25 2023, 6:34 AM

Harbormaster completed remote builds in B254870: Diff 553450. Aug 25 2023, 7:24 AM

rainerzufalldererste updated this revision to Diff 553528. Aug 25 2023, 10:14 AM

goldstein.w.n added inline comments. Aug 25 2023, 10:53 AM

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
1013	In fact if you do the bool template approach, I don't think `MulOp`, `AddOp`, or `Mul2Op` even need to be template parameters. You can just set them as unsigned values at the top of `matchesSquareSum`. i.e: `unsigned MulOp = FP ? Instruction::FMul : Instruction::Mul;` Imo that ends up being cleaner.
1042	Comment what `false` means, i.e. `</*FP*/ false>`.
1054	ibid.

Well spotted, much cleaner now. Comments added as requested.
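The toy sketch below illustrates the suggestion to derive the opcodes from the `bool` template parameter inside the matcher. It is not LLVM code: the `Op`/`Node` IR and this `matchesSquareSum` are hypothetical, and the sketch covers only the first pattern. It compares operands by identity in place of `m_Deferred` and omits the one-use checks. Because the opcodes are ordinary constants rather than part of the matcher's type, the integer and FP instantiations share a single body.

```cpp
#include <cassert>

// Hypothetical toy IR: just enough structure to mirror the final matcher.
enum class Op { Leaf, Const, Add, Mul, Shl, FAdd, FMul };

struct Node {
  Op Opc;
  const Node *L = nullptr;
  const Node *R = nullptr;
  double C = 0; // value when Opc == Op::Const
};

// Toy version of (a * a) + (((a op2 K) + b) * b), treating the adds as
// commutative (like m_c_BinOp).
template <bool FP>
bool matchesSquareSum(const Node *I, double Mul2Rhs, const Node *&A,
                      const Node *&B) {
  // Opcodes chosen from the bool template parameter.
  constexpr Op MulOp = FP ? Op::FMul : Op::Mul;
  constexpr Op AddOp = FP ? Op::FAdd : Op::Add;
  constexpr Op Mul2Op = FP ? Op::FMul : Op::Shl;
  if (I->Opc != AddOp)
    return false;
  for (int Swap = 0; Swap < 2; ++Swap) {
    const Node *Sq = Swap ? I->R : I->L;
    const Node *Rest = Swap ? I->L : I->R;
    if (Sq->Opc != MulOp || Sq->L != Sq->R || Rest->Opc != MulOp)
      continue;
    const Node *Sum = Rest->L;
    if (Sum->Opc != AddOp)
      continue;
    for (int Swap2 = 0; Swap2 < 2; ++Swap2) {
      const Node *M2 = Swap2 ? Sum->R : Sum->L;
      const Node *Other = Swap2 ? Sum->L : Sum->R;
      if (M2->Opc == Mul2Op && M2->L == Sq->L && M2->R->Opc == Op::Const &&
          M2->R->C == Mul2Rhs && Other == Rest->R) {
        A = Sq->L;
        B = Rest->R;
        return true;
      }
    }
  }
  return false;
}
```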

rainerzufalldererste marked 3 inline comments as done. Aug 25 2023, 11:25 AM

LGTM.
I'm by no means an expert in FP semantics. @arsenm any chance you can quickly verify the FP checks are correct?

Wait a few days or until Matt signs off as well before pushing, please.
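On the FP question, here is a small illustration of why the FP fold needs fast-math flags at all. It is not from the patch, and the input grid and `countSquareSumMismatches` are arbitrary, hypothetical choices. In IEEE double arithmetic, the expanded and contracted forms are mathematically equal but round differently for many inputs, so the rewrite is only acceptable when `reassoc` permits reassociation. The sketch says nothing about the `nsz` requirement.

```cpp
#include <cassert>

// Counts grid points (a, b) = (i / 10, j / 10), 1 <= i, j <= 20, where the
// expanded form a*a + (a*2 + b)*b and the contracted form (a + b)*(a + b)
// round to different doubles.
int countSquareSumMismatches() {
  int Mismatches = 0;
  for (int I = 1; I <= 20; ++I) {
    for (int J = 1; J <= 20; ++J) {
      const double A = I / 10.0, B = J / 10.0;
      const double Expanded = A * A + (A * 2.0 + B) * B;
      const double Contracted = (A + B) * (A + B);
      if (Expanded != Contracted)
        ++Mismatches;
    }
  }
  return Mismatches;
}
```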

This revision is now accepted and ready to land. Aug 25 2023, 11:29 AM

goldstein.w.n added reviewers: nikic, arsenm. Aug 25 2023, 11:30 AM

Herald added subscribers: StephenFan, wdng. Aug 25 2023, 11:30 AM

All good, I don't have commit access anyway.

Harbormaster completed remote builds in B254944: Diff 553552. Aug 25 2023, 1:45 PM

Updated to current head. Please commit for me when the build completes successfully. Thanks!

Harbormaster completed remote builds in B256283: Diff 555407. Sep 1 2023, 10:08 AM

Closed by commit rG3af459050663: [InstCombine] Contracting x^2 + 2*x*y + y^2 to (x + y)^2 (float) (authored by rainerzufalldererste, committed by goldstein.w.n). Sep 1 2023, 1:03 PM

This revision was automatically updated to reflect the committed changes.

goldstein.w.n added a commit: rG3af459050663: [InstCombine] Contracting x^2 + 2*x*y + y^2 to (x + y)^2 (float).

Revision Contents

Path                                                     Size
llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp    82 lines
llvm/lib/Transforms/InstCombine/InstCombineInternal.h    3 lines
llvm/test/Transforms/InstCombine/fadd.ll                 99 lines

Diff 555485

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp

[989 unchanged lines not shown]
     if (match(Op0, m_ZExt(m_Add(m_Value(X), m_AllOnes())))) {
       if (llvm::isKnownNonZero(X, DL, 0, Q.AC, Q.CxtI, Q.DT))
         return new ZExtInst(X, Ty);
     }
   }

   return nullptr;
 }

-// Fold variations of a^2 + 2*a*b + b^2 -> (a + b)^2
-Instruction *InstCombinerImpl::foldSquareSumInts(BinaryOperator &I) {
-  Value *A, *B;
+// match variations of a^2 + 2*a*b + b^2
+//
+// to reuse the code between the FP and Int versions, the instruction OpCodes
+// and constant types have been turned into template parameters.
+//
+// Mul2Rhs: The constant to perform the multiplicative equivalent of X*2 with;
+//   should be `m_SpecificFP(2.0)` for FP and `m_SpecificInt(1)` for Int
+//   (we're matching `X<<1` instead of `X*2` for Int)
+template <bool FP, typename Mul2Rhs>
+static bool matchesSquareSum(BinaryOperator &I, Mul2Rhs M2Rhs, Value *&A,
+                             Value *&B) {
+  constexpr unsigned MulOp = FP ? Instruction::FMul : Instruction::Mul;
+  constexpr unsigned AddOp = FP ? Instruction::FAdd : Instruction::Add;
+  constexpr unsigned Mul2Op = FP ? Instruction::FMul : Instruction::Shl;

-  // (a * a) + (((a << 1) + b) * b)
-  bool Matches = match(
-      &I, m_c_Add(m_OneUse(m_Mul(m_Value(A), m_Deferred(A))),
-                  m_OneUse(m_Mul(m_c_Add(m_Shl(m_Deferred(A), m_SpecificInt(1)),
-                                         m_Value(B)),
-                                 m_Deferred(B)))));
+  // (a * a) + (((a * 2) + b) * b)
+  if (match(&I, m_c_BinOp(
+                    AddOp, m_OneUse(m_BinOp(MulOp, m_Value(A), m_Deferred(A))),
+                    m_OneUse(m_BinOp(
+                        MulOp,
+                        m_c_BinOp(AddOp, m_BinOp(Mul2Op, m_Deferred(A), M2Rhs),
+                                  m_Value(B)),
+                        m_Deferred(B))))))
+    return true;

-  // ((a * b) << 1) or ((a << 1) * b)
-  // +
-  // (a * a + b * b) or (b * b + a * a)
-  if (!Matches) {
-    Matches = match(
-        &I,
-        m_c_Add(m_CombineOr(m_OneUse(m_Shl(m_Mul(m_Value(A), m_Value(B)),
-                                           m_SpecificInt(1))),
-                            m_OneUse(m_Mul(m_Shl(m_Value(A), m_SpecificInt(1)),
-                                           m_Value(B)))),
-                m_OneUse(m_c_Add(m_Mul(m_Deferred(A), m_Deferred(A)),
-                                 m_Mul(m_Deferred(B), m_Deferred(B))))));
-  }
+  // ((a * b) * 2) or ((a * 2) * b)
+  // +
+  // (a * a + b * b) or (b * b + a * a)
+  return match(
+      &I,
+      m_c_BinOp(AddOp,
+                m_CombineOr(
+                    m_OneUse(m_BinOp(
+                        Mul2Op, m_BinOp(MulOp, m_Value(A), m_Value(B)), M2Rhs)),
+                    m_OneUse(m_BinOp(MulOp, m_BinOp(Mul2Op, m_Value(A), M2Rhs),
+                                     m_Value(B)))),
+                m_OneUse(m_c_BinOp(
+                    AddOp, m_BinOp(MulOp, m_Deferred(A), m_Deferred(A)),
+                    m_BinOp(MulOp, m_Deferred(B), m_Deferred(B))))));
+}

-  // if one of them matches: -> (a + b)^2
-  if (Matches) {
+// Fold integer variations of a^2 + 2*a*b + b^2 -> (a + b)^2
+Instruction *InstCombinerImpl::foldSquareSumInt(BinaryOperator &I) {
+  Value *A, *B;
+  if (matchesSquareSum</*FP*/ false>(I, m_SpecificInt(1), A, B)) {
     Value *AB = Builder.CreateAdd(A, B);
     return BinaryOperator::CreateMul(AB, AB);
   }
+  return nullptr;
+}
+
+// Fold floating point variations of a^2 + 2*a*b + b^2 -> (a + b)^2
+// Requires `nsz` and `reassoc`.
+Instruction *InstCombinerImpl::foldSquareSumFP(BinaryOperator &I) {
+  assert(I.hasAllowReassoc() && I.hasNoSignedZeros() && "Assumption mismatch");
+  Value *A, *B;
+  if (matchesSquareSum</*FP*/ true>(I, m_SpecificFP(2.0), A, B)) {
+    Value *AB = Builder.CreateFAddFMF(A, B, &I);
+    return BinaryOperator::CreateFMulFMF(AB, AB, &I);
+  }
   return nullptr;
 }

 // Matches multiplication expression Op * C where C is a constant. Returns the
 // constant value in C and the other operand in Op. Returns true if such a
 // match is found.
 static bool MatchMul(Value *E, Value *&Op, APInt &C) {
   const APInt *AI;
[14 unchanged lines not shown]
 // the remainder operation in IsSigned. Returns true if such a match is
 // found.
 static bool MatchRem(Value *E, Value *&Op, APInt &C, bool &IsSigned) {
   const APInt *AI;
   IsSigned = false;
   if (match(E, m_SRem(m_Value(Op), m_APInt(AI)))) {
     IsSigned = true;
     C = *AI;
     return true;
		goldstein.w.nUnsubmitted Done Reply Inline Actions This match code is basically identical to `foldSquareSumInts`. The only difference other than `FMul` vs `Mul` is you do match `FMul(A, 2)` for floats and `m_Shl(A, 1)` for ints. Can you make the match code a helper that takes either fmul/2x matcher (or just lambda wrapping) so it can be used for SumFloat / SumInt? goldstein.w.n: This match code is basically identical to `foldSquareSumInts`. The only difference other than…
		rainerzufallderersteAuthorUnsubmitted Done Reply Inline Actions Does that imply that `m_c_FAdd` can simply be replaced with `m_c_Add` and will continue to match properly for floating point values as well? I presume that would entail partially matching another pattern and then deferring the actual check for the mul2 match, as `BinaryOp_match<RHS, LHS, OpCode>` would have different `OpCode`s for `FMul` and `Shl`, which sounds like a huge mess to me; or is there a cleaner way to do that? Something like this sadly doesn't compile (as the lambda return type is ambiguous): const auto FpMul2Matcher = [](auto &value) { return m_FMul(value, m_SpecificFP(2.0)); }; const auto IntMul2Matcher = [](auto &value) { return m_Shl(value, m_SpecificInt(1)); }; const auto Mul2Matcher = FP ? FpMul2Matcher : IntMul2Matcher; rainerzufalldererste: Does that imply that `m_c_FAdd` can simply be replaced with `m_c_Add` and will continue to…
		rainerzufallderersteAuthorUnsubmitted Done Reply Inline Actions Even something like this shouldn't work. template <typename TMul2, typename TCAdd, typename TMul> static bool MatchesSquareSum(BinaryOperator &I, Value &A, Value &B, const TMul2 &Mul2, const TCAdd &CAdd, const TMul &Mul) { // (a * a) + (((a * 2) + b) * b) bool Matches = match(&I, CAdd(m_OneUse(Mul(m_Value(A), m_Deferred(A))), m_OneUse(Mul(CAdd(Mul2(m_Deferred(A)), m_Value(B)), m_Deferred(B))))); // ((a * b) * 2) or ((a * 2) * b) // + // (a * a + b * b) or (b * b + a * a) if (!Matches) { Matches = match(&I, CAdd(m_CombineOr(m_OneUse(Mul2(Mul(m_Value(A), m_Value(B)))), m_OneUse(Mul(Mul2(m_Value(A)), m_Value(B)))), m_OneUse(CAdd(Mul(m_Deferred(A), m_Deferred(A)), Mul(m_Deferred(B), m_Deferred(B)))))); } return Matches; } I agree that it's messy to have duplicate code, but with the way op-codes are used as template parameters I don't see a way without template specialization to do this nicely; and with template specialization it's even more of a beast. Am I missing some obvious way built into llvm/InstCombine to do this nicely? rainerzufalldererste: Even something like this shouldn't work. ``` template <typename TMul2, typename TCAdd…
		goldstein.w.nUnsubmitted Done Reply Inline Actions Why doesn't that code work? goldstein.w.n: Why doesn't that code work?
		rainerzufallderersteAuthorUnsubmitted Done Reply Inline Actions Assuming `TMul2` etc. to be a lambda, the return type couln't be consistent, as for both `m_FMul` and `m_Shl` it'd be `BinaryOp_match<RHS, LHS, OpCode>`, with the same `OpCode` for each invocation, but different `RHS` and `LHS`. One could make this work with macros, but I don't know the LLVM stance on macros, or with templace specialization, where there'd be a specialized struct with three functions (`Mul2`, `Mul`, `CAdd`) that simply map to the correct functions for `FAdd`/`Add` etc. However, I honestly think that the current implementation is the cleanest way to do it. I'm also not a big fan of code duplication, but the discussed alternatives seem a lot messier to me. rainerzufalldererste: Assuming `TMul2` etc. to be a lambda, the return type couln't be consistent, as for both…
		rainerzufallderersteAuthorUnsubmitted Done Reply Inline Actions Have you been able to come up with some better ideas? Maybe it's not _that_ terrible to go down the template specialization route, as many of the integer optimizations may have similar counterparts in FP with `nsz` and `reassoc`. Not sure how many of them are already handled twice, but there's a chance one could simplify this process by providing template specialized `m_XAdd<IsFP>(LHS, RHS)` etc. However, I'm not sure if I'm the right person to pass judgement on something that large, as I'm still very new to both LLVM and InstCombine. rainerzufalldererste: Have you been able to come up with some better ideas? Maybe it's not _that_ terrible to go down…
		goldstein.w.nUnsubmitted Done Reply Inline Actions Assuming `TMul2` etc. to be a lambda, the return type couln't be consistent, as for both `m_FMul` and `m_Shl` it'd be `BinaryOp_match<RHS, LHS, OpCode>`, with the same `OpCode` for each invocation, but different `RHS` and `LHS`. One could make this work with macros, but I don't know the LLVM stance on macros, or with templace specialization, where there'd be a specialized struct with three functions (`Mul2`, `Mul`, `CAdd`) that simply map to the correct functions for `FAdd`/`Add` etc. For the TMul2 don't you only need a single Value? Instead of passing a BinaryOperator, you could just pass a lambda i.e: auto FPMul2 = [](Value & A) { return match(m_FMul(m_Value(A), m_SpecificFP(2)); }; ... auto IntMul2 = [](Value &A) { return match(m_Shl(m_Value(A), m_SpecificInt(1)); }; Don't see why the same isn't true for mul/add (although two values then). However, I honestly think that the current implementation is the cleanest way to do it. I'm also not a big fan of code duplication, but the discussed alternatives seem a lot messier to me. goldstein.w.n: > Assuming `TMul2` etc. to be a lambda, the return type couln't be consistent, as for both…
		rainerzufallderersteAuthorUnsubmitted Done Reply Inline Actions Regarding the LHS and RHS, you are correct, I misspoke. The `OpCode` and `RHS` are consistent, but `LHS` isn't. There are multiple cases where `TMul2` is used: `Mul2(m_Deferred(A)` `Mul2(Mul(m_Value(A), m_Value(B))` `Mul2(m_Value(A))` All of these parameters have different types, therefore the return type of this lambda would also be different in every case. So if the parameter were `Value &`, this wouldn't be a problem at all, but that's simply not the case. Is there a way to cast these types to `Value &` somehow (without capturing them separately and then matching things again against the sub-match-lambda)? `mDeferred` returns `deferredval_ty<Value>`. `Mul(m_Value(), m_Value()` returns either `BinaryOp_match<bind_ty<Value>, bind_ty<Value>, Instruction::FMul>` or `BinaryOp_match<bind_ty<Value>, bind_ty<Value>, Instruction::Mul>`. `m_Value` returns `bind_ty<Value>`. These types aren't compatible, so the template can't deduce a consistent type even from `auto`-parameter lambdas. Same with `Mul` & `CAdd`. Apart from that, I'm a bit confused about the `match` in your comment, as that's not quite applicable, unless we're previously matching parts of the match and then checking them against this follow-up matcher lambda, which - even if we were to do that - would end up in a large mess, as that's not only the case with `Mul2`, but also `CAdd` & `Mul` then, turning these two large matches into a ton of tiny matches. Otherwise, I'm not quite sure why I'm explaining compilation errors here, unless I'm missing something very obvious or am completely missing the point. 
This, however, isn't valid C++ code:

```
template <typename TMul2, typename TCAdd, typename TMul>
static std::tuple<bool, Value *, Value *>
MatchesSquareSum(BinaryOperator &I, const TMul2 &Mul2, const TCAdd &CAdd,
                 const TMul &Mul) {
  Value *A, *B;

  // (a * a) + (((a * 2) + b) * b)
  if (match(&I, CAdd(m_OneUse(Mul(m_Value(A), m_Deferred(A))),
                     m_OneUse(Mul(CAdd(Mul2(m_Deferred(A)), m_Value(B)),
                                  m_Deferred(B))))))
    return std::make_tuple(true, A, B);

  // ((a * b) * 2)  or ((a * 2) * b)
  // +
  // (a * a + b * b) or (b * b + a * a)
  return std::make_tuple(
      match(&I, CAdd(m_CombineOr(m_OneUse(Mul2(Mul(m_Value(A), m_Value(B)))),
                                 m_OneUse(Mul(Mul2(m_Value(A)), m_Value(B)))),
                     m_OneUse(CAdd(Mul(m_Deferred(A), m_Deferred(A)),
                                   Mul(m_Deferred(B), m_Deferred(B)))))),
      A, B);
}

// Fold variations of a^2 + 2ab + b^2 -> (a + b)^2
// if `FP`: requires `nsz` and `reassoc`.
Instruction *InstCombinerImpl::foldSquareSum(BinaryOperator &I, const bool FP) {
  if (FP) {
    assert(I.hasAllowReassoc() && I.hasNoSignedZeros() &&
           "Assumption mismatch");
  }

  std::tuple<bool, Value *, Value *> Match;

  if (FP) {
    Match = MatchesSquareSum(
        I, [](auto &V) { return m_FMul(V, m_SpecificFP(2.0)); },
        [](auto &L, auto &R) { return m_c_FAdd(L, R); },
        [](auto &L, auto &R) { return m_FMul(L, R); });
  } else {
    Match = MatchesSquareSum(
        I, [](auto &V) { return m_Shl(V, m_SpecificInt(1)); },
        [](auto &L, auto &R) { return m_c_Add(L, R); },
        [](auto &L, auto &R) { return m_Mul(L, R); });
  }

  // if one of them matches: -> (a + b)^2
  if (std::get<0>(Match)) {
    Value *AB =
        Builder.CreateFAddFMF(std::get<1>(Match), std::get<2>(Match), &I);
    return BinaryOperator::CreateFMulFMF(AB, AB, &I);
  }

  return nullptr;
}
```

This _is_ valid C++ code, but uses template specialization to get around the previous type-ambiguity issues:

```
template <bool IsFP> struct XMul;

template <> struct XMul<false> {
  template <typename LHS, typename RHS>
  inline auto operator()(const LHS &L, const RHS &R) const {
    return m_Mul(L, R);
  }
};

template <> struct XMul<true> {
  template <typename LHS, typename RHS>
  inline auto operator()(const LHS &L, const RHS &R) const {
    return m_FMul(L, R);
  }
};

template <bool IsFP> struct XCAdd;

template <> struct XCAdd<false> {
  template <typename LHS, typename RHS>
  inline auto operator()(const LHS &L, const RHS &R) const {
    return m_c_Add(L, R);
  }
};

template <> struct XCAdd<true> {
  template <typename LHS, typename RHS>
  inline auto operator()(const LHS &L, const RHS &R) const {
    return m_c_FAdd(L, R);
  }
};

template <bool IsFP> struct XMul2;

template <> struct XMul2<false> {
  template <typename LHS> inline auto operator()(const LHS &L) const {
    return m_Shl(L, m_SpecificInt(1));
  }
};

template <> struct XMul2<true> {
  template <typename LHS> inline auto operator()(const LHS &L) const {
    return m_FMul(L, m_SpecificFP(2.0));
  }
};

template <typename TMul2, typename TCAdd, typename TMul>
static std::tuple<bool, Value *, Value *>
MatchesSquareSum(BinaryOperator &I, const TMul2 &Mul2, const TCAdd &CAdd,
                 const TMul &Mul) {
  Value *A, *B;

  // (a * a) + (((a * 2) + b) * b)
  if (match(&I, CAdd(m_OneUse(Mul(m_Value(A), m_Deferred(A))),
                     m_OneUse(Mul(CAdd(Mul2(m_Deferred(A)), m_Value(B)),
                                  m_Deferred(B))))))
    return std::make_tuple(true, A, B);

  // ((a * b) * 2)  or ((a * 2) * b)
  // +
  // (a * a + b * b) or (b * b + a * a)
  return std::make_tuple(
      match(&I, CAdd(m_CombineOr(m_OneUse(Mul2(Mul(m_Value(A), m_Value(B)))),
                                 m_OneUse(Mul(Mul2(m_Value(A)), m_Value(B)))),
                     m_OneUse(CAdd(Mul(m_Deferred(A), m_Deferred(A)),
                                   Mul(m_Deferred(B), m_Deferred(B)))))),
      A, B);
}

// Fold variations of a^2 + 2ab + b^2 -> (a + b)^2
// if `FP`: requires `nsz` and `reassoc`.
Instruction *InstCombinerImpl::foldSquareSum(BinaryOperator &I, const bool FP) {
  if (FP) {
    assert(I.hasAllowReassoc() && I.hasNoSignedZeros() &&
           "Assumption mismatch");
  }

  const std::tuple<bool, Value *, Value *> Match =
      FP ? MatchesSquareSum(I, XMul2<true>(), XCAdd<true>(), XMul<true>())
         : MatchesSquareSum(I, XMul2<false>(), XCAdd<false>(), XMul<false>());

  // if one of them matches: -> (a + b)^2
  if (std::get<0>(Match)) {
    Value *AB =
        Builder.CreateFAddFMF(std::get<1>(Match), std::get<2>(Match), &I);
    return BinaryOperator::CreateFMulFMF(AB, AB, &I);
  }

  return nullptr;
}
```

rainerzufalldererste: Regarding the LHS and RHS, you are correct, I misspoke. The `OpCode` and `RHS` are consistent…
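Why the lambda ternary from the earlier attempt fails while the specialization approach compiles: every lambda has its own distinct closure type, so `FP ? FpMul2Matcher : IntMul2Matcher` has no common type to resolve to. The specializations avoid that by choosing the builder type at compile time, and each instantiation of the generic matcher returns the same type, so only the results are joined with `?:`. The following is a minimal LLVM-free sketch of that pattern. `ShlBy1`, `FMulBy2`, `describeDoubled` and `describeForType` are hypothetical stand-ins and are not part of LLVM's PatternMatch API.

```cpp
#include <cassert>
#include <string>

// Toy stand-ins for two unrelated matcher types (hypothetical, not LLVM's
// PatternMatch classes). Like the results of m_Shl(...) and m_FMul(...),
// they have different types and no common base.
struct ShlBy1 {
  std::string Operand;
  std::string describe() const { return "shl " + Operand + ", 1"; }
};
struct FMulBy2 {
  std::string Operand;
  std::string describe() const { return "fmul " + Operand + ", 2.0"; }
};

// Primary template left undefined; each specialization returns its own type.
template <bool IsFP> struct XMul2;
template <> struct XMul2<false> {
  ShlBy1 operator()(const std::string &V) const { return ShlBy1{V}; }
};
template <> struct XMul2<true> {
  FMulBy2 operator()(const std::string &V) const { return FMulBy2{V}; }
};

// Generic body, instantiated once per builder type. Every instantiation
// returns the same type (std::string here; std::tuple<bool, Value *, Value *>
// in the review), so the two call sites can be joined with ?:.
template <typename TMul2>
std::string describeDoubled(const TMul2 &Mul2, const std::string &V) {
  return Mul2(V).describe();
}

std::string describeForType(bool FP, const std::string &V) {
  // `auto M = FP ? XMul2<true>() : XMul2<false>();` would not compile:
  // the operand types differ and neither converts to the other.
  return FP ? describeDoubled(XMul2<true>(), V)
            : describeDoubled(XMul2<false>(), V);
}
```

With C++17, a single function template with an `auto` return type and `if constexpr (IsFP)` can also return the two different types in place of explicit specializations. Which style is preferable is a judgment call.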
goldstein.w.n: How about something along the lines of:

```
template <unsigned OpcMul, unsigned OpcAdd, unsigned OpcMul2, typename Mul2Rhs>
static bool foldSquareSum(BinaryOperator &I, Mul2Rhs MRhs, Value *&AOut,
                          Value *&BOut) {
  Value *A, *B;
  bool Matches = match(
      &I, m_c_BinOp(
              OpcAdd, m_OneUse(m_BinOp(OpcMul, m_Value(A), m_Deferred(A))),
              m_OneUse(m_BinOp(
                  OpcMul,
                  m_c_BinOp(OpcAdd, m_BinOp(OpcMul2, m_Deferred(A), MRhs),
                            m_Value(B)),
                  m_Deferred(B)))));
  if (!Matches) {
    Matches = match(
        &I,
        m_c_BinOp(
            OpcAdd,
            m_CombineOr(
                m_OneUse(m_BinOp(
                    OpcMul2, m_BinOp(OpcMul, m_Value(A), m_Value(B)), MRhs)),
                m_OneUse(m_BinOp(OpcMul, m_BinOp(OpcMul2, m_Value(A), MRhs),
                                 m_Value(B)))),
            m_OneUse(
                m_c_BinOp(OpcAdd, m_BinOp(OpcMul, m_Deferred(A), m_Deferred(A)),
                          m_BinOp(OpcMul, m_Deferred(B), m_Deferred(B))))));
  }
  AOut = A;
  BOut = B;
  return Matches;
}

// Fold variations of a^2 + 2ab + b^2 -> (a + b)^2
Instruction *InstCombinerImpl::foldSquareSumInts(BinaryOperator &I) {
  Value *A, *B;
  bool Matches =
      foldSquareSum<Instruction::Mul, Instruction::Add, Instruction::Shl>(
          I, m_SpecificInt(1), A, B);

  // if one of them matches: -> (a + b)^2
  if (Matches) {
    Value *AB = Builder.CreateAdd(A, B);
    return BinaryOperator::CreateMul(AB, AB);
  }

  return nullptr;
}

// Fold variations of a^2 + 2ab + b^2 -> (a + b)^2
// Requires `nsz` and `reassoc`.
Instruction *InstCombinerImpl::foldSquareSumFloat(BinaryOperator &I) {
  Value *A, *B;
  assert(I.hasAllowReassoc() && I.hasNoSignedZeros() && "Assumption mismatch");
  bool Matches =
      foldSquareSum<Instruction::FMul, Instruction::FAdd, Instruction::FMul>(
          I, m_SpecificFP(2.0), A, B);

  // if one of them matches: -> (a + b)^2
  if (Matches) {
    Value *AB = Builder.CreateFAddFMF(A, B, &I);
    return BinaryOperator::CreateFMulFMF(AB, AB, &I);
  }

  return nullptr;
}
```

Needs comments/whatnot, but I don't see why this would fall short. All the InstCombine tests pass with this (I assume that includes all the tests relevant to the int/fp versions of this).
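The core idea of this suggestion is to pass the opcodes as template parameters so that one matcher body covers both the integer form (`Mul`/`Add`/`Shl` by 1) and the floating-point form (`FMul`/`FAdd`/`FMul` by 2.0). Below is a toy, LLVM-free sketch of that idea under stated assumptions. The `Node` tree, the `Op` enum and the function names are all hypothetical stand-ins for IR values and `PatternMatch`. Operand identity is pointer equality, which is roughly what `m_Deferred` checks. Only the first shape, (a * a) + (((a * 2) + b) * b), is implemented, and only the two adds are treated as commutative, as with `m_c_BinOp` above.

```cpp
#include <cassert>

// Hypothetical toy IR: leaves are variables or constants; interior nodes
// are binary operations. Pointer identity plays the role of Value *.
enum class Op { Var, Const, Add, Mul, Shl, FAdd, FMul };

struct Node {
  Op Kind;
  double Imm = 0.0;        // only meaningful for Op::Const
  const Node *L = nullptr; // left operand of a binary op
  const Node *R = nullptr; // right operand of a binary op
};

static bool isOp(const Node *N, Op K) { return N && N->Kind == K; }

static bool isConst(const Node *N, double C) {
  return isOp(N, Op::Const) && N->Imm == C;
}

// Mul2(A) = (A OpcMul2 C), e.g. (A << 1) or (A * 2.0); A is already bound.
template <Op OpcMul2>
static bool isDoubled(const Node *N, const Node *A, double C) {
  return isOp(N, OpcMul2) && N->L == A && isConst(N->R, C);
}

// Matches (a OpcMul a) OpcAdd (((a OpcMul2 C) OpcAdd b) OpcMul b), trying
// both operand orders of each OpcAdd.
template <Op OpcMul, Op OpcAdd, Op OpcMul2>
static bool matchSquareSum(const Node *I, double C, const Node *&AOut,
                           const Node *&BOut) {
  if (!isOp(I, OpcAdd))
    return false;
  const Node *Orders[2][2] = {{I->L, I->R}, {I->R, I->L}};
  for (const auto &P : Orders) {
    const Node *Sq = P[0], *Rest = P[1];
    // Sq must be (a * a); this binds a, like m_Value(A) + m_Deferred(A).
    if (!isOp(Sq, OpcMul) || Sq->L != Sq->R)
      continue;
    const Node *A = Sq->L;
    // Rest must be (Sum * b), where Sum is (Mul2(a) + b) in either order.
    if (!isOp(Rest, OpcMul) || !isOp(Rest->L, OpcAdd))
      continue;
    const Node *Sum = Rest->L, *B = Rest->R;
    if ((isDoubled<OpcMul2>(Sum->L, A, C) && Sum->R == B) ||
        (isDoubled<OpcMul2>(Sum->R, A, C) && Sum->L == B)) {
      AOut = A;
      BOut = B;
      return true;
    }
  }
  return false;
}

// The two entry points differ only in their opcode and constant arguments.
bool matchIntSquareSum(const Node *I, const Node *&A, const Node *&B) {
  return matchSquareSum<Op::Mul, Op::Add, Op::Shl>(I, 1.0, A, B);
}
bool matchFPSquareSum(const Node *I, const Node *&A, const Node *&B) {
  return matchSquareSum<Op::FMul, Op::FAdd, Op::FMul>(I, 2.0, A, B);
}
```

As in the suggestion, the opcode split keeps one copy of the structural pattern. The cost is that the "times two" shape must be expressible as a single binary opcode plus a constant operand.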
```diff
   }
   if (match(E, m_URem(m_Value(Op), m_APInt(AI)))) {
     C = *AI;
     return true;
   }
   if (match(E, m_And(m_Value(Op), m_APInt(AI))) && (*AI + 1).isPowerOf2()) {
     C = *AI + 1;
     return true;
@@ 594 unchanged lines omitted @@ Instruction *InstCombinerImpl::visitAdd(BinaryOperator &I) {
   // ctpop(A) + ctpop(B) => ctpop(A | B) if A and B have no bits set in common.
   if (match(LHS, m_OneUse(m_Intrinsic<Intrinsic::ctpop>(m_Value(A)))) &&
       match(RHS, m_OneUse(m_Intrinsic<Intrinsic::ctpop>(m_Value(B)))) &&
       haveNoCommonBitsSet(A, B, DL, &AC, &I, &DT))
     return replaceInstUsesWith(
         I, Builder.CreateIntrinsic(Intrinsic::ctpop, {I.getType()},
                                    {Builder.CreateOr(A, B)}));
 
-  if (Instruction *Res = foldSquareSumInts(I))
+  if (Instruction *Res = foldSquareSumInt(I))
     return Res;
 
   if (Instruction *Res = foldBinOpOfDisplacedShifts(I))
     return Res;
 
   if (Instruction *Res = foldBinOpOfSelectAndCastOfSelectCondition(I))
     return Res;
 
@@ 165 unchanged lines omitted @@ Instruction *InstCombinerImpl::visitFAdd(BinaryOperator &I) {
   // Handle specials cases for FAdd with selects feeding the operation
   if (Value *V = SimplifySelectsFeedingBinaryOp(I, LHS, RHS))
     return replaceInstUsesWith(I, V);
 
   if (I.hasAllowReassoc() && I.hasNoSignedZeros()) {
     if (Instruction *F = factorizeFAddFSub(I, Builder))
       return F;
 
+    if (Instruction *F = foldSquareSumFP(I))
+      return F;
+
     // Try to fold fadd into start value of reduction intrinsic.
     if (match(&I, m_c_FAdd(m_OneUse(m_Intrinsic<Intrinsic::vector_reduce_fadd>(
                                m_AnyZeroFP(), m_Value(X))),
                            m_Value(Y)))) {
       // fadd (rdx 0.0, X), Y --> rdx Y, X
       return replaceInstUsesWith(
           I, Builder.CreateIntrinsic(Intrinsic::vector_reduce_fadd,
                                      {X->getType()}, {Y, X}, &I));
@@ 987 unchanged lines omitted @@
```
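In this diff, `foldSquareSumInt` is tried unconditionally in `visitAdd`, while `foldSquareSumFP` sits inside the `I.hasAllowReassoc() && I.hasNoSignedZeros()` block in `visitFAdd`. The standalone C++ sketch below shows the arithmetic behind part of that asymmetry. Unsigned integer arithmetic wraps modulo 2^n, which forms a commutative ring, so a*a + (2a + b)*b equals (a + b)^2 for every input, overflow included. In floating point, every operation rounds, so the two forms can differ in general. The function names are hypothetical, and the sketch only illustrates the arithmetic, not InstCombine itself.

```cpp
#include <cassert>
#include <cstdint>

// Expanded form a*a + (2a + b)*b, with 2a written as a shift as in the
// integer pattern. uint64_t arithmetic wraps modulo 2^64, like LLVM
// add/mul/shl without nuw/nsw.
uint64_t sqSumExpandedInt(uint64_t A, uint64_t B) {
  return A * A + ((A << 1) + B) * B;
}

// Folded form (a + b)^2.
uint64_t sqSumFoldedInt(uint64_t A, uint64_t B) { return (A + B) * (A + B); }

// Checks the integer identity on a few values around the wrap-around point.
bool intIdentityHoldsNearWrap() {
  const uint64_t Vals[] = {0, 1, 3, 0x7fffffffffffffffull,
                           0xffffffffffffffffull};
  for (uint64_t A : Vals)
    for (uint64_t B : Vals)
      if (sqSumExpandedInt(A, B) != sqSumFoldedInt(A, B))
        return false;
  return true;
}

// The same two shapes in float. Each operation rounds, so for general inputs
// these need not agree. They agree exactly when every intermediate value is
// representable, e.g. for small integers.
float sqSumExpandedFP(float A, float B) { return A * A + (A * 2.0f + B) * B; }
float sqSumFoldedFP(float A, float B) { return (A + B) * (A + B); }
```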

llvm/lib/Transforms/InstCombine/InstCombineInternal.h

```diff
@@ 557 unchanged lines omitted @@ public:
   Instruction *FoldOpIntoSelect(Instruction &Op, SelectInst *SI,
                                 bool FoldWithMultiUse = false);
 
   /// This is a convenience wrapper function for the above two functions.
   Instruction *foldBinOpIntoSelectOrPhi(BinaryOperator &I);
 
   Instruction *foldAddWithConstant(BinaryOperator &Add);
 
-  Instruction *foldSquareSumInts(BinaryOperator &I);
+  Instruction *foldSquareSumInt(BinaryOperator &I);
+  Instruction *foldSquareSumFP(BinaryOperator &I);
 
   /// Try to rotate an operation below a PHI node, using PHI nodes for
   /// its operands.
   Instruction *foldPHIArgOpIntoPHI(PHINode &PN);
   Instruction *foldPHIArgBinOpIntoPHI(PHINode &PN);
   Instruction *foldPHIArgInsertValueInstructionIntoPHI(PHINode &PN);
   Instruction *foldPHIArgExtractValueInstructionIntoPHI(PHINode &PN);
   Instruction *foldPHIArgGEPIntoPHI(PHINode &PN);
@@ 187 unchanged lines omitted @@
```

llvm/test/Transforms/InstCombine/fadd.ll

```diff
@@ 614 unchanged lines omitted @@ ;
   %s = fsub float %n, %y
   %a = fadd float %x, %z
   %r = fadd nsz float %s, %a
   ret float %r
 }
 
 define float @fadd_reduce_sqr_sum_varA(float %a, float %b) {
 ; CHECK-LABEL: @fadd_reduce_sqr_sum_varA(
-; CHECK-NEXT: [[A_SQ:%.*]] = fmul float [[A:%.*]], [[A]]
-; CHECK-NEXT: [[TWO_A:%.*]] = fmul float [[A]], 2.000000e+00
-; CHECK-NEXT: [[TWO_A_PLUS_B:%.*]] = fadd float [[TWO_A]], [[B:%.*]]
-; CHECK-NEXT: [[MUL:%.*]] = fmul float [[TWO_A_PLUS_B]], [[B]]
-; CHECK-NEXT: [[ADD:%.*]] = fadd reassoc nsz float [[MUL]], [[A_SQ]]
+; CHECK-NEXT: [[TMP1:%.*]] = fadd reassoc nsz float [[A:%.*]], [[B:%.*]]
+; CHECK-NEXT: [[ADD:%.*]] = fmul reassoc nsz float [[TMP1]], [[TMP1]]
 ; CHECK-NEXT: ret float [[ADD]]
 ;
   %a_sq = fmul float %a, %a
   %two_a = fmul float %a, 2.0
   %two_a_plus_b = fadd float %two_a, %b
   %mul = fmul float %two_a_plus_b, %b
   %add = fadd reassoc nsz float %mul, %a_sq
   ret float %add
 }
 
 define float @fadd_reduce_sqr_sum_varA_order2(float %a, float %b) {
 ; CHECK-LABEL: @fadd_reduce_sqr_sum_varA_order2(
-; CHECK-NEXT: [[A_SQ:%.*]] = fmul float [[A:%.*]], [[A]]
-; CHECK-NEXT: [[TWO_A:%.*]] = fmul float [[A]], 2.000000e+00
-; CHECK-NEXT: [[TWO_A_PLUS_B:%.*]] = fadd float [[TWO_A]], [[B:%.*]]
-; CHECK-NEXT: [[MUL:%.*]] = fmul float [[TWO_A_PLUS_B]], [[B]]
-; CHECK-NEXT: [[ADD:%.*]] = fadd reassoc nsz float [[A_SQ]], [[MUL]]
+; CHECK-NEXT: [[TMP1:%.*]] = fadd reassoc nsz float [[A:%.*]], [[B:%.*]]
+; CHECK-NEXT: [[ADD:%.*]] = fmul reassoc nsz float [[TMP1]], [[TMP1]]
 ; CHECK-NEXT: ret float [[ADD]]
 ;
   %a_sq = fmul float %a, %a
   %two_a = fmul float %a, 2.0
   %two_a_plus_b = fadd float %two_a, %b
   %mul = fmul float %two_a_plus_b, %b
   %add = fadd reassoc nsz float %a_sq, %mul
   ret float %add
 }
 
 define float @fadd_reduce_sqr_sum_varA_order3(float %a, float %b) {
 ; CHECK-LABEL: @fadd_reduce_sqr_sum_varA_order3(
-; CHECK-NEXT: [[A_SQ:%.*]] = fmul float [[A:%.*]], [[A]]
-; CHECK-NEXT: [[TWO_A:%.*]] = fmul float [[A]], 2.000000e+00
-; CHECK-NEXT: [[TWO_A_PLUS_B:%.*]] = fadd float [[TWO_A]], [[B:%.*]]
-; CHECK-NEXT: [[MUL:%.*]] = fmul float [[TWO_A_PLUS_B]], [[B]]
-; CHECK-NEXT: [[ADD:%.*]] = fadd reassoc nsz float [[MUL]], [[A_SQ]]
+; CHECK-NEXT: [[TMP1:%.*]] = fadd reassoc nsz float [[A:%.*]], [[B:%.*]]
+; CHECK-NEXT: [[ADD:%.*]] = fmul reassoc nsz float [[TMP1]], [[TMP1]]
 ; CHECK-NEXT: ret float [[ADD]]
 ;
   %a_sq = fmul float %a, %a
   %two_a = fmul float %a, 2.0
   %two_a_plus_b = fadd float %two_a, %b
   %mul = fmul float %b, %two_a_plus_b
   %add = fadd reassoc nsz float %mul, %a_sq
   ret float %add
 }
 
 define float @fadd_reduce_sqr_sum_varA_order4(float %a, float %b) {
 ; CHECK-LABEL: @fadd_reduce_sqr_sum_varA_order4(
-; CHECK-NEXT: [[A_SQ:%.*]] = fmul float [[A:%.*]], [[A]]
-; CHECK-NEXT: [[TWO_A:%.*]] = fmul float [[A]], 2.000000e+00
-; CHECK-NEXT: [[TWO_A_PLUS_B:%.*]] = fadd float [[TWO_A]], [[B:%.*]]
-; CHECK-NEXT: [[MUL:%.*]] = fmul float [[TWO_A_PLUS_B]], [[B]]
-; CHECK-NEXT: [[ADD:%.*]] = fadd reassoc nsz float [[MUL]], [[A_SQ]]
+; CHECK-NEXT: [[TMP1:%.*]] = fadd reassoc nsz float [[A:%.*]], [[B:%.*]]
+; CHECK-NEXT: [[ADD:%.*]] = fmul reassoc nsz float [[TMP1]], [[TMP1]]
 ; CHECK-NEXT: ret float [[ADD]]
 ;
   %a_sq = fmul float %a, %a
   %two_a = fmul float %a, 2.0
   %two_a_plus_b = fadd float %b, %two_a
   %mul = fmul float %two_a_plus_b, %b
   %add = fadd reassoc nsz float %mul, %a_sq
   ret float %add
 }
 
 define float @fadd_reduce_sqr_sum_varA_order5(float %a, float %b) {
 ; CHECK-LABEL: @fadd_reduce_sqr_sum_varA_order5(
-; CHECK-NEXT: [[A_SQ:%.*]] = fmul float [[A:%.*]], [[A]]
-; CHECK-NEXT: [[TWO_A:%.*]] = fmul float [[A]], 2.000000e+00
-; CHECK-NEXT: [[TWO_A_PLUS_B:%.*]] = fadd float [[TWO_A]], [[B:%.*]]
-; CHECK-NEXT: [[MUL:%.*]] = fmul float [[TWO_A_PLUS_B]], [[B]]
-; CHECK-NEXT: [[ADD:%.*]] = fadd reassoc nsz float [[MUL]], [[A_SQ]]
+; CHECK-NEXT: [[TMP1:%.*]] = fadd reassoc nsz float [[A:%.*]], [[B:%.*]]
+; CHECK-NEXT: [[ADD:%.*]] = fmul reassoc nsz float [[TMP1]], [[TMP1]]
 ; CHECK-NEXT: ret float [[ADD]]
 ;
   %a_sq = fmul float %a, %a
   %two_a = fmul float 2.0, %a
   %two_a_plus_b = fadd float %two_a, %b
   %mul = fmul float %two_a_plus_b, %b
   %add = fadd reassoc nsz float %mul, %a_sq
   ret float %add
 }
 
 define float @fadd_reduce_sqr_sum_varB(float %a, float %b) {
 ; CHECK-LABEL: @fadd_reduce_sqr_sum_varB(
-; CHECK-NEXT: [[A_B:%.*]] = fmul float [[A:%.*]], [[B:%.*]]
-; CHECK-NEXT: [[A_B_2:%.*]] = fmul float [[A_B]], 2.000000e+00
-; CHECK-NEXT: [[A_SQ:%.*]] = fmul float [[A]], [[A]]
-; CHECK-NEXT: [[B_SQ:%.*]] = fmul float [[B]], [[B]]
-; CHECK-NEXT: [[A_SQ_B_SQ:%.*]] = fadd float [[A_SQ]], [[B_SQ]]
-; CHECK-NEXT: [[ADD:%.*]] = fadd reassoc nsz float [[A_B_2]], [[A_SQ_B_SQ]]
+; CHECK-NEXT: [[TMP1:%.*]] = fadd reassoc nsz float [[A:%.*]], [[B:%.*]]
+; CHECK-NEXT: [[ADD:%.*]] = fmul reassoc nsz float [[TMP1]], [[TMP1]]
 ; CHECK-NEXT: ret float [[ADD]]
 ;
   %a_b = fmul float %a, %b
   %a_b_2 = fmul float %a_b, 2.0
   %a_sq = fmul float %a, %a
   %b_sq = fmul float %b, %b
   %a_sq_b_sq = fadd float %a_sq, %b_sq
   %add = fadd reassoc nsz float %a_b_2, %a_sq_b_sq
   ret float %add
 }
 
 define float @fadd_reduce_sqr_sum_varB_order1(float %a, float %b) {
 ; CHECK-LABEL: @fadd_reduce_sqr_sum_varB_order1(
-; CHECK-NEXT: [[A_B:%.*]] = fmul float [[A:%.*]], [[B:%.*]]
-; CHECK-NEXT: [[A_B_2:%.*]] = fmul float [[A_B]], 2.000000e+00
-; CHECK-NEXT: [[A_SQ:%.*]] = fmul float [[A]], [[A]]
-; CHECK-NEXT: [[B_SQ:%.*]] = fmul float [[B]], [[B]]
-; CHECK-NEXT: [[A_SQ_B_SQ:%.*]] = fadd float [[A_SQ]], [[B_SQ]]
-; CHECK-NEXT: [[ADD:%.*]] = fadd reassoc nsz float [[A_SQ_B_SQ]], [[A_B_2]]
+; CHECK-NEXT: [[TMP1:%.*]] = fadd reassoc nsz float [[A:%.*]], [[B:%.*]]
+; CHECK-NEXT: [[ADD:%.*]] = fmul reassoc nsz float [[TMP1]], [[TMP1]]
 ; CHECK-NEXT: ret float [[ADD]]
 ;
   %a_b = fmul float %a, %b
   %a_b_2 = fmul float %a_b, 2.0
   %a_sq = fmul float %a, %a
   %b_sq = fmul float %b, %b
   %a_sq_b_sq = fadd float %a_sq, %b_sq
   %add = fadd reassoc nsz float %a_sq_b_sq, %a_b_2
   ret float %add
 }
 
 define float @fadd_reduce_sqr_sum_varB_order2(float %a, float %b) {
 ; CHECK-LABEL: @fadd_reduce_sqr_sum_varB_order2(
-; CHECK-NEXT: [[A_B:%.*]] = fmul float [[A:%.*]], [[B:%.*]]
-; CHECK-NEXT: [[A_B_2:%.*]] = fmul float [[A_B]], 2.000000e+00
-; CHECK-NEXT: [[A_SQ:%.*]] = fmul float [[A]], [[A]]
-; CHECK-NEXT: [[B_SQ:%.*]] = fmul float [[B]], [[B]]
-; CHECK-NEXT: [[A_SQ_B_SQ:%.*]] = fadd float [[B_SQ]], [[A_SQ]]
-; CHECK-NEXT: [[ADD:%.*]] = fadd reassoc nsz float [[A_B_2]], [[A_SQ_B_SQ]]
+; CHECK-NEXT: [[TMP1:%.*]] = fadd reassoc nsz float [[A:%.*]], [[B:%.*]]
+; CHECK-NEXT: [[ADD:%.*]] = fmul reassoc nsz float [[TMP1]], [[TMP1]]
 ; CHECK-NEXT: ret float [[ADD]]
 ;
   %a_b = fmul float %a, %b
   %a_b_2 = fmul float %a_b, 2.0
   %a_sq = fmul float %a, %a
   %b_sq = fmul float %b, %b
   %a_sq_b_sq = fadd float %b_sq, %a_sq
   %add = fadd reassoc nsz float %a_b_2, %a_sq_b_sq
   ret float %add
 }
 
 define float @fadd_reduce_sqr_sum_varB_order3(float %a, float %b) {
 ; CHECK-LABEL: @fadd_reduce_sqr_sum_varB_order3(
-; CHECK-NEXT: [[A_B:%.*]] = fmul float [[B:%.*]], [[A:%.*]]
-; CHECK-NEXT: [[A_B_2:%.*]] = fmul float [[A_B]], 2.000000e+00
-; CHECK-NEXT: [[A_SQ:%.*]] = fmul float [[A]], [[A]]
-; CHECK-NEXT: [[B_SQ:%.*]] = fmul float [[B]], [[B]]
-; CHECK-NEXT: [[A_SQ_B_SQ:%.*]] = fadd float [[A_SQ]], [[B_SQ]]
-; CHECK-NEXT: [[ADD:%.*]] = fadd reassoc nsz float [[A_B_2]], [[A_SQ_B_SQ]]
+; CHECK-NEXT: [[TMP1:%.*]] = fadd reassoc nsz float [[B:%.*]], [[A:%.*]]
+; CHECK-NEXT: [[ADD:%.*]] = fmul reassoc nsz float [[TMP1]], [[TMP1]]
 ; CHECK-NEXT: ret float [[ADD]]
 ;
   %a_b = fmul float %b, %a
   %a_b_2 = fmul float 2.0, %a_b
   %a_sq = fmul float %a, %a
   %b_sq = fmul float %b, %b
   %a_sq_b_sq = fadd float %a_sq, %b_sq
   %add = fadd reassoc nsz float %a_b_2, %a_sq_b_sq
   ret float %add
 }
 
 define float @fadd_reduce_sqr_sum_varB2(float %a, float %b) {
 ; CHECK-LABEL: @fadd_reduce_sqr_sum_varB2(
-; CHECK-NEXT: [[A_2:%.*]] = fmul float [[A:%.*]], 2.000000e+00
-; CHECK-NEXT: [[A_B_2:%.*]] = fmul float [[A_2]], [[B:%.*]]
-; CHECK-NEXT: [[A_SQ:%.*]] = fmul float [[A]], [[A]]
-; CHECK-NEXT: [[B_SQ:%.*]] = fmul float [[B]], [[B]]
-; CHECK-NEXT: [[A_SQ_B_SQ:%.*]] = fadd float [[A_SQ]], [[B_SQ]]
-; CHECK-NEXT: [[ADD:%.*]] = fadd reassoc nsz float [[A_B_2]], [[A_SQ_B_SQ]]
+; CHECK-NEXT: [[TMP1:%.*]] = fadd reassoc nsz float [[A:%.*]], [[B:%.*]]
+; CHECK-NEXT: [[ADD:%.*]] = fmul reassoc nsz float [[TMP1]], [[TMP1]]
 ; CHECK-NEXT: ret float [[ADD]]
 ;
   %a_2 = fmul float %a, 2.0
   %a_b_2 = fmul float %a_2, %b
   %a_sq = fmul float %a, %a
   %b_sq = fmul float %b, %b
   %a_sq_b_sq = fadd float %a_sq, %b_sq
   %add = fadd reassoc nsz float %a_b_2, %a_sq_b_sq
   ret float %add
 }
 
 define float @fadd_reduce_sqr_sum_varB2_order1(float %a, float %b) {
 ; CHECK-LABEL: @fadd_reduce_sqr_sum_varB2_order1(
-; CHECK-NEXT: [[A_2:%.*]] = fmul float [[A:%.*]], 2.000000e+00
-; CHECK-NEXT: [[A_B_2:%.*]] = fmul float [[A_2]], [[B:%.*]]
-; CHECK-NEXT: [[A_SQ:%.*]] = fmul float [[A]], [[A]]
-; CHECK-NEXT: [[B_SQ:%.*]] = fmul float [[B]], [[B]]
-; CHECK-NEXT: [[A_SQ_B_SQ:%.*]] = fadd float [[A_SQ]], [[B_SQ]]
-; CHECK-NEXT: [[ADD:%.*]] = fadd reassoc nsz float [[A_SQ_B_SQ]], [[A_B_2]]
+; CHECK-NEXT: [[TMP1:%.*]] = fadd reassoc nsz float [[A:%.*]], [[B:%.*]]
+; CHECK-NEXT: [[ADD:%.*]] = fmul reassoc nsz float [[TMP1]], [[TMP1]]
 ; CHECK-NEXT: ret float [[ADD]]
 ;
   %a_2 = fmul float %a, 2.0
   %a_b_2 = fmul float %a_2, %b
   %a_sq = fmul float %a, %a
   %b_sq = fmul float %b, %b
   %a_sq_b_sq = fadd float %a_sq, %b_sq
   %add = fadd reassoc nsz float %a_sq_b_sq, %a_b_2
   ret float %add
 }
 
 define float @fadd_reduce_sqr_sum_varB2_order2(float %a, float %b) {
 ; CHECK-LABEL: @fadd_reduce_sqr_sum_varB2_order2(
-; CHECK-NEXT: [[A_2:%.*]] = fmul float [[A:%.*]], 2.000000e+00
-; CHECK-NEXT: [[A_B_2:%.*]] = fmul float [[A_2]], [[B:%.*]]
-; CHECK-NEXT: [[A_SQ:%.*]] = fmul float [[A]], [[A]]
-; CHECK-NEXT: [[B_SQ:%.*]] = fmul float [[B]], [[B]]
-; CHECK-NEXT: [[A_SQ_B_SQ:%.*]] = fadd float [[A_SQ]], [[B_SQ]]
-; CHECK-NEXT: [[ADD:%.*]] = fadd reassoc nsz float [[A_B_2]], [[A_SQ_B_SQ]]
+; CHECK-NEXT: [[TMP1:%.*]] = fadd reassoc nsz float [[A:%.*]], [[B:%.*]]
+; CHECK-NEXT: [[ADD:%.*]] = fmul reassoc nsz float [[TMP1]], [[TMP1]]
 ; CHECK-NEXT: ret float [[ADD]]
 ;
   %a_2 = fmul float %a, 2.0
   %a_b_2 = fmul float %b, %a_2
   %a_sq = fmul float %a, %a
   %b_sq = fmul float %b, %b
   %a_sq_b_sq = fadd float %a_sq, %b_sq
   %add = fadd reassoc nsz float %a_b_2, %a_sq_b_sq
   ret float %add
 }
 
 define float @fadd_reduce_sqr_sum_varB2_order3(float %a, float %b) {
 ; CHECK-LABEL: @fadd_reduce_sqr_sum_varB2_order3(
-; CHECK-NEXT: [[A_2:%.*]] = fmul float [[A:%.*]], 2.000000e+00
-; CHECK-NEXT: [[A_B_2:%.*]] = fmul float [[A_2]], [[B:%.*]]
-; CHECK-NEXT: [[A_SQ:%.*]] = fmul float [[A]], [[A]]
-; CHECK-NEXT: [[B_SQ:%.*]] = fmul float [[B]], [[B]]
-; CHECK-NEXT: [[A_SQ_B_SQ:%.*]] = fadd float [[A_SQ]], [[B_SQ]]
-; CHECK-NEXT: [[ADD:%.*]] = fadd reassoc nsz float [[A_B_2]], [[A_SQ_B_SQ]]
+; CHECK-NEXT: [[TMP1:%.*]] = fadd reassoc nsz float [[A:%.*]], [[B:%.*]]
+; CHECK-NEXT: [[ADD:%.*]] = fmul reassoc nsz float [[TMP1]], [[TMP1]]
 ; CHECK-NEXT: ret float [[ADD]]
 ;
   %a_2 = fmul float 2.0, %a
   %a_b_2 = fmul float %a_2, %b
   %a_sq = fmul float %a, %a
   %b_sq = fmul float %b, %b
   %a_sq_b_sq = fadd float %a_sq, %b_sq
   %add = fadd reassoc nsz float %a_b_2, %a_sq_b_sq
@@ 340 unchanged lines omitted @@
```
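The three test families (`varA`, `varB`, `varB2`) correspond to the operand shapes listed in the matcher comments. Each one is a rearrangement of the same polynomial, so all of them are expected to fold to the same `fadd` + `fmul` pair. The equalities below are exact over the reals but not under IEEE rounding. The `reassoc` flag on the final `fadd` is what permits the rewrite.

```latex
\begin{aligned}
\text{varA:}\quad  & a\cdot a + (2a + b)\cdot b
                     = a^2 + 2ab + b^2 \\
\text{varB:}\quad  & (a\cdot b)\cdot 2 + (a\cdot a + b\cdot b)
                     = a^2 + 2ab + b^2 \\
\text{varB2:}\quad & (a\cdot 2)\cdot b + (a\cdot a + b\cdot b)
                     = a^2 + 2ab + b^2 \\
                   & a^2 + 2ab + b^2 = (a + b)^2
\end{aligned}
```

The `_orderN` variants permute operands of the commutative `fadd`/`fmul` nodes and the position of the constant `2.0`. This checks that the commutative matchers (`m_c_FAdd` / `m_c_BinOp`) and the instruction canonicalization cover every ordering.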


[InstCombine] Contracting x^2 + 2*x*y + y^2 to (x + y)^2 (float)
Closed, Public

Revision Contents

Diff 555485

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp

llvm/lib/Transforms/InstCombine/InstCombineInternal.h

llvm/test/Transforms/InstCombine/fadd.ll
