This is an archive of the discontinued LLVM Phabricator instance.

Differential D117302

[InstCombine] Simplify addends reordering logic
ClosedPublic

Authored by kovdan01 on Jan 14 2022, 4:44 AM.

Details

Reviewers

spatel
lebedev.ri
craig.topper

Commits

rGd8e0e125a2ff: [InstCombine] Simplify addends reordering logic

Summary

Previously some constants were not pushed to the top of the resulting
expression tree as intended by the algorithm. We can remove the logic from
simplifyFAdd and rely on SimplifyAssociativeOrCommutative to do that.
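The folding loop that remains in simplifyFAdd once the constant-specific handling is gone can be sketched as below. This is a minimal standalone model, not the LLVM code: `Addend` and `foldAddends` are hypothetical names, coefficients are plain doubles, and an empty string stands in for the null symbolic value of a constant addend. It only shows that a constant addend is now folded and pushed like any other addend, in first-occurrence order.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for FAddend: Coeff * Sym. An empty Sym models a
// pure constant addend (a null symbolic value in the real code).
struct Addend {
  double Coeff;
  std::string Sym;
};

// Fold addends that share a symbol, visiting symbols in the order in which
// they first appear. Constants get no special treatment.
std::vector<Addend> foldAddends(const std::vector<Addend> &In) {
  std::vector<Addend> Out;
  std::vector<bool> Done(In.size(), false);
  for (std::size_t I = 0; I < In.size(); ++I) {
    if (Done[I])
      continue; // This addend was already folded into an earlier one.
    Addend R = In[I];
    bool Folded = false;
    for (std::size_t J = I + 1; J < In.size(); ++J) {
      if (!Done[J] && In[J].Sym == R.Sym) {
        R.Coeff += In[J].Coeff;
        Done[J] = true;
        Folded = true;
      }
    }
    // As in the patch, only a folded addend is dropped when it becomes zero.
    if (!Folded || R.Coeff != 0.0)
      Out.push_back(R);
  }
  return Out;
}
```

For the input `<2, y>, <5, const>, <1, x>, <-2, y>`, the two `y` addends cancel and the constant stays where it first appeared instead of being moved to the end. Any later reordering is left to a separate step, which in the patch is SimplifyAssociativeOrCommutative.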

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

kovdan01 created this revision. Jan 14 2022, 4:44 AM

Herald added a subscriber: hiraditya. Jan 14 2022, 4:44 AM

kovdan01 requested review of this revision. Jan 14 2022, 4:44 AM

Herald added a project: Restricted Project. Jan 14 2022, 4:44 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B143362: Diff 399954. Jan 14 2022, 5:23 AM

kovdan01 updated this revision to Diff 399962. Jan 14 2022, 5:37 AM

Harbormaster completed remote builds in B143365: Diff 399962. Jan 14 2022, 6:11 AM

kovdan01 added a reviewer: craig.topper. Jan 15 2022, 2:44 AM

I can't tell if this has a greater effect than shown by the test diff. That diff actually demonstrates a basic missing canonicalization in IR - no fast-math is needed to convert fsub to fadd in that example or any of these more general cases:

define float @fmul_c1(float %x, float %y) {
  %m = fmul float %x, 7.000000e+00
  %r = fsub float %y, %m
  ret float %r
}

define float @fdiv_c0(float %x, float %y) {
  %m = fdiv float 7.000000e+00, %x
  %r = fsub float %y, %m
  ret float %r
}

define float @fdiv_c1(float %x, float %y) {
  %m = fdiv float %x, 7.000000e+00
  %r = fsub float %y, %m
  ret float %r
}

We do have this transform in the backend though (pushing the implicit negation op into the constant operand). For example, if you codegen this with llc for x86, you'll see 3 "addss" instructions and no "subss". Would adding those canonicalizations to IR solve your motivating case(s)?
Or is there a larger example of a change resulting from this patch?
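A quick way to convince oneself that no fast-math is needed for these cases is to compare both forms bit for bit. The sketch below is an illustration in plain C++, not LLVM code. The function names are made up for this example, and it assumes IEEE-754 binary32 `float`, the default round-to-nearest mode, non-NaN inputs, and a build without fast-math options.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Bit pattern of a float, so that +0.0 and -0.0 count as different results.
std::uint32_t bitsOf(float F) {
  std::uint32_t U;
  static_assert(sizeof U == sizeof F, "float is expected to be 32 bits");
  std::memcpy(&U, &F, sizeof U);
  return U;
}

// The fsub forms, mirroring @fmul_c1, @fdiv_c0 and @fdiv_c1 above.
float fmulC1Sub(float X, float Y) { return Y - X * 7.0f; }
float fdivC0Sub(float X, float Y) { return Y - 7.0f / X; }
float fdivC1Sub(float X, float Y) { return Y - X / 7.0f; }

// The proposed fadd forms: the negation is pushed into the constant.
float fmulC1Add(float X, float Y) { return Y + X * -7.0f; }
float fdivC0Add(float X, float Y) { return Y + -7.0f / X; }
float fdivC1Add(float X, float Y) { return Y + X / -7.0f; }

// True if each fsub form and its fadd counterpart give identical bits.
bool allFormsAgree(float X, float Y) {
  return bitsOf(fmulC1Sub(X, Y)) == bitsOf(fmulC1Add(X, Y)) &&
         bitsOf(fdivC0Sub(X, Y)) == bitsOf(fdivC0Add(X, Y)) &&
         bitsOf(fdivC1Sub(X, Y)) == bitsOf(fdivC1Add(X, Y));
}
```

The forms agree because negating a constant only flips the sign of the rounded product or quotient, and subtracting a value is the same operation as adding its negation.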

In D117302#3246034, @spatel wrote:
Would adding those canonicalizations to IR solve your motivating case(s)?
Or is there a larger example of a change resulting from this patch?

Thanks for your comment! I don’t have a larger example that results from this patch, and adding the canonicalization that you mentioned would solve my motivating case. I can implement that and submit it as a separate patch if that works for you. Could you please tell me the right place to add the canonicalization to?

Getting back to the current patch: I believe it is still worth merging, at least as a refactoring (having code that is intended to do the same thing in two places looks weird IMHO).

In D117302#3248112, @kovdan01 wrote:
Could you please tell me the right place to add the canonicalization to?

I think it would start from InstCombinerImpl::visitFSub() and be very similar to the specialization that already exists as foldFNegIntoConstant. I already drafted a patch for it while looking at this, so I can clean that up and post it (the harder part is adding a pile of tests to check FMF propagation!).

Getting back to the current patch: I believe it is still worth merging, at least as a refactoring (having code that is intended to do the same thing in two places looks weird IMHO).

I agree. I'm still not sure exactly what happens within FAddCombine, but this patch just removes code, so LGTM.

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
540	typo: supper -> super

This revision is now accepted and ready to land. Jan 17 2022, 6:31 AM

spatel added inline comments. Jan 17 2022, 6:38 AM

llvm/test/Transforms/Reassociate/fast-basictest.ll
2	I missed that this test is under Reassociate. It's generally wrong for a regression test to run 3 different passes, but it can be a separate cleanup patch.

kovdan01 updated this revision to Diff 400582. Jan 17 2022, 9:28 AM

In D117302#3248293, @spatel wrote:

I already drafted a patch for it while looking at this, so I can clean that up and post it (the harder part is adding a pile of tests to check FMF propagation!).

OK, please let me know if you need any help with that. If I understood you correctly, you plan to submit the patch yourself. If so, could you please mention me in the revision to keep me informed? Thanks!

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
540	Fixed
llvm/test/Transforms/Reassociate/fast-basictest.ll
2	Got it, will submit a separate patch to fix this

Harbormaster completed remote builds in B143825: Diff 400582. Jan 17 2022, 10:11 AM

This revision was landed with ongoing or failed builds. Jan 18 2022, 5:01 AM

Closed by commit rGd8e0e125a2ff: [InstCombine] Simplify addends reordering logic (authored by kovdan01, committed by asavonic).

This revision was automatically updated to reflect the committed changes.

asavonic added a commit: rGd8e0e125a2ff: [InstCombine] Simplify addends reordering logic.

In D117302#3248938, @kovdan01 wrote:

OK, please let me know if you need any help with that. If I understood you correctly, you plan to submit the patch yourself. If so, could you please mention me in the revision to keep me informed? Thanks!

Looking over the regression tests, I noticed that we would miss a factoring optimization if we canonicalize fsub->fadd more often. This would potentially be inverting a fold done by the reassociation pass:

/// Recursively analyze an expression to build a list of instructions that have
/// negative floating-point constant operands. The caller can then transform
/// the list to create positive constants for better reassociation and CSE.

...so I'm not sure if it is worth doing that transform generally.

kovdan01 mentioned this in D118769: Split fast-basictest.ll according to passes responsible for optimizations. Feb 2 2022, 3:13 AM

asavonic mentioned this in rG8471c537d55d: Split fast-basictest.ll according to passes responsible for optimizations. Feb 4 2022, 1:22 AM

Revision Contents

Path                                                     Size
llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp    28 lines
llvm/test/Transforms/Reassociate/fast-basictest.ll       4 lines

Diff 400816

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp

[513 lines not shown]
 Value *FAddCombine::simplifyFAdd(AddendVect& Addends, unsigned InstrQuota) {
   unsigned AddendNum = Addends.size();
   assert(AddendNum <= 4 && "Too many addends");

   // For saving intermediate results;
   unsigned NextTmpIdx = 0;
   FAddend TmpResult[3];

-  // Points to the constant addend of the resulting simplified expression.
-  // If the resulting expr has constant-addend, this constant-addend is
-  // desirable to reside at the top of the resulting expression tree. Placing
-  // constant close to supper-expr(s) will potentially reveal some optimization
-  // opportunities in super-expr(s).
-  const FAddend *ConstAdd = nullptr;
-
   // Simplified addends are placed <SimpVect>.
   AddendVect SimpVect;

   // The outer loop works on one symbolic-value at a time. Suppose the input
   // addends are : <a1, x>, <b1, y>, <a2, x>, <c1, z>, <b2, y>, ...
   // The symbolic-values will be processed in this order: x, y, z.
   for (unsigned SymIdx = 0; SymIdx < AddendNum; SymIdx++) {

     const FAddend *ThisAddend = Addends[SymIdx];
     if (!ThisAddend) {
       // This addend was processed before.
       continue;
     }

     Value *Val = ThisAddend->getSymVal();

+    // If the resulting expr has constant-addend, this constant-addend is
+    // desirable to reside at the top of the resulting expression tree. Placing
+    // constant close to super-expr(s) will potentially reveal some
+    // optimization opportunities in super-expr(s). Here we do not implement
+    // this logic intentionally and rely on SimplifyAssociativeOrCommutative
+    // call later.
+
     unsigned StartIdx = SimpVect.size();
     SimpVect.push_back(ThisAddend);

     // The inner loop collects addends sharing same symbolic-value, and these
     // addends will be later on folded into a single addend. Following above
     // example, if the symbolic value "y" is being processed, the inner loop
     // will collect two addends "<b1,y>" and "<b2,Y>". These two addends will
     // be later on folded into "<b1+b2, y>".
[12 unchanged lines not shown]
     if (StartIdx + 1 != SimpVect.size()) {
       FAddend &R = TmpResult[NextTmpIdx ++];
       R = *SimpVect[StartIdx];
       for (unsigned Idx = StartIdx + 1; Idx < SimpVect.size(); Idx++)
         R += *SimpVect[Idx];

       // Pop all addends being folded and push the resulting folded addend.
       SimpVect.resize(StartIdx);
-      if (Val) {
-        if (!R.isZero()) {
-          SimpVect.push_back(&R);
-        }
-      } else {
-        // Don't push constant addend at this time. It will be the last element
-        // of <SimpVect>.
-        ConstAdd = &R;
-      }
+      if (!R.isZero()) {
+        SimpVect.push_back(&R);
+      }
     }
   }

   assert((NextTmpIdx <= array_lengthof(TmpResult) + 1) &&
          "out-of-bound access");

-  if (ConstAdd)
-    SimpVect.push_back(ConstAdd);
-
   Value *Result;
   if (!SimpVect.empty())
     Result = createNaryFAdd(SimpVect, InstrQuota);
   else {
     // The addition is folded to 0.0.
     Result = ConstantFP::get(Instr->getType(), 0.0);
   }

[1,874 lines not shown]

llvm/test/Transforms/Reassociate/fast-basictest.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt < %s -reassociate -gvn -instcombine -S | FileCheck %s

 ; With reassociation, constant folding can eliminate the 12 and -12 constants.
 define float @test1(float %arg) {
 ; CHECK-LABEL: @test1(
 ; CHECK-NEXT: [[ARG_NEG:%.*]] = fneg fast float [[ARG:%.*]]
 ; CHECK-NEXT: ret float [[ARG_NEG]]
 ;
   %t1 = fsub fast float -1.200000e+01, %arg
[337 lines not shown]
   %Y = fadd fast float %A ,%B
   %Z = fadd fast float %Y, %C
   ret float %Z
 }

 ; Check again with 'reassoc' and 'nsz' ('nsz' not technically required).
 define float @test12_reassoc_nsz(float %X) {
 ; CHECK-LABEL: @test12_reassoc_nsz(
-; CHECK-NEXT: [[TMP1:%.*]] = fmul reassoc nsz float [[X:%.*]], 3.000000e+00
-; CHECK-NEXT: [[TMP2:%.*]] = fsub reassoc nsz float 6.000000e+00, [[TMP1]]
+; CHECK-NEXT: [[TMP1:%.*]] = fmul reassoc nsz float [[X:%.*]], -3.000000e+00
+; CHECK-NEXT: [[TMP2:%.*]] = fadd reassoc nsz float [[TMP1]], 6.000000e+00
 ; CHECK-NEXT: ret float [[TMP2]]
 ;
   %A = fsub reassoc nsz float 1.000000e+00, %X
   %B = fsub reassoc nsz float 2.000000e+00, %X
   %C = fsub reassoc nsz float 3.000000e+00, %X
   %Y = fadd reassoc nsz float %A ,%B
   %Z = fadd reassoc nsz float %Y, %C
   ret float %Z
[330 lines not shown]
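The only test change is in @test12_reassoc_nsz, where the old and new CHECK lines spell the same value in two ways. As a worked check in real arithmetic, which is the reasoning that the `reassoc` flag permits (it is not an exact floating-point identity in general):

```latex
\begin{aligned}
(1 - X) + (2 - X) + (3 - X)
  &= 6 - 3X      && \text{old output: } \mathtt{fsub}\ 6,\ (X \cdot 3) \\
  &= (-3)X + 6   && \text{new output: } \mathtt{fadd}\ (X \cdot (-3)),\ 6
\end{aligned}
```

In the new form the negation lives in the multiplier constant and the constant 6 is the right-hand operand of the final fadd.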