This is an archive of the discontinued LLVM Phabricator instance.

[SimplifyCFG] do not speculate fdiv by default; it's expensive (PR24818)
ClosedPublic

Authored by spatel on Sep 15 2015, 10:49 AM.

Download Raw Diff

Details

Reviewers

reames
jmolloy
sanjoy
bkramer
arsenm
hfinkel

Commits

rG13e8bbc23735: set div/rem default values to 'expensive' in TargetTransformInfo's cost model
rL248439: set div/rem default values to 'expensive' in TargetTransformInfo's cost model

Summary

I'm not sure if this is the right fix, but PR24818 and PR24343 show that we're now speculating costly ops on x86 when we shouldn't. I've confirmed that we're hoisting fdiv on all in-tree targets without this patch, and that seems wrong, so I think we should fix this in the base class rather than everyone's overrides.

The test just below the one I'm proposing assumes that fsqrt is cheap enough to speculate, but that's not generally true either. Do we want SimplifyCFG to make this speculative transform and then have backends undo it for expensive ops?

Diff Detail

Event Timeline

Updated patch: full-context diff.

FWIW, this works for my test case in bug 24343. Before D12882:

double foo(int n, double d)
{
  if (n > 0)
    d = 1.0 / d;
  return d;
}

compiles to:

foo:                                    # @foo
# BB#0:                                 # %entry
        pushl       %ebp
        movl        %esp, %ebp
        fldl        12(%ebp)
        cmpl        $0, 8(%ebp)
        fld1
        fdiv        %st(1)
        jg  .LBB0_2
# BB#1:                                 # %entry
        fstp        %st(0)
        fldz
        fxch        %st(1)
.LBB0_2:                                # %entry
        fstp        %st(1)
        popl        %ebp
        retl

After D12882, it compiles to:

foo:                                    # @foo
# BB#0:                                 # %entry
        pushl       %ebp
        movl        %esp, %ebp
        fldl        12(%ebp)
        cmpl        $0, 8(%ebp)
        jle .LBB0_2
# BB#1:                                 # %if.then
        fld1
        fdivp       %st(1)
.LBB0_2:                                # %if.end
        popl        %ebp
        retl

So the fdiv is no longer executed before the comparison.

Thanks for verifying that, Dimitry.

Adding some more reviewers who may be able to provide guidance.

Is it only an "optimization fix"? The "correctness" fix for PR24343 seems to requires that an fdiv can't be speculated with FPE enabled and if the denominator is not zero, right?
It seems that LLVM Optimizations are still not very friendly with respect to that, there were some patches at some point last year but it seems they have not been integrated: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20141110/243778.html

In D12882#247388, @joker.eph wrote:

Is it only an "optimization fix"? The "correctness" fix for PR24343 seems to requires that an fdiv can't be speculated with FPE enabled and if the denominator is not zero, right?
It seems that LLVM Optimizations are still not very friendly with respect to that, there were some patches at some point last year but it seems they have not been integrated: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20141110/243778.html

Hi Mehdi -

Yes, my patch is only intended to solve the perf problem (PR24818)...and raise the question about the existing sqrt hoisting test in test/Transforms/SimplifyCFG/speculate-math.ll.

This patch just happens to prevent the FPE in the case of PR24343, but that's a happy accident. Clang/LLVM still need major work to support FPE, and that's tracked (and unsolved for years) by PR8100 and related bugs.

Can you give some context on specifically which speculative
optimizations you're hoping to avoid here? Are you only worried about
if-then-else-to-select from SimplifyCFG? What about cases like LICM?
Do you think those should hoist or not?

My general feeling is that we *should* speculate these instructions if
is legal and we have reason to predict it as being profitable (e.g. in
LICM). We do not currently have a global scheduler (or even a global
schedule fixer-uper beyond some hacks in CodeGenPrep), but I've been
thinking about this for other reasons. In cases where the only use is
down a particular path, adding a "fixup" which undoes hoisting seems
plausible and reasonable.

For this particular case, it really seems like this should be handled
via a select-to-control flow conversion for expensive operations done
late in the pipeline. IMO, selects is a better intermediate form for
optimization, and we should undo the transformation late when heading
into the backend. In fact, we already have the framework for this in
CodeGenPrep::OptimizeSelectInst. Have you considered tweaking that instead?

Philip

In D12882#247604, @llvm-commits wrote:

Can you give some context on specifically which speculative
optimizations you're hoping to avoid here? Are you only worried about
if-then-else-to-select from SimplifyCFG? What about cases like LICM?
Do you think those should hoist or not?

I was only considering the if-then case in SimplifyCFG. I'm not familiar with LICM, but I don't see it using TTI. Maybe that's just an oversight though?

My general feeling is that we *should* speculate these instructions if
is legal and we have reason to predict it as being profitable (e.g. in
LICM). We do not currently have a global scheduler (or even a global
schedule fixer-uper beyond some hacks in CodeGenPrep), but I've been
thinking about this for other reasons. In cases where the only use is
down a particular path, adding a "fixup" which undoes hoisting seems
plausible and reasonable.

For this particular case, it really seems like this should be handled
via a select-to-control flow conversion for expensive operations done
late in the pipeline. IMO, selects is a better intermediate form for
optimization, and we should undo the transformation late when heading
into the backend. In fact, we already have the framework for this in
CodeGenPrep::OptimizeSelectInst. Have you considered tweaking that instead?

I was hoping there was some later hook in place where we could do this, but I hadn't looked further yet, so I'm glad to see that CGP has it.

But if that is the right spot to do (undo) this transform, that implies that using TTI here in SimplifyCFG was the wrong choice, doesn't it? Ie, the TTI cost model cites this exact case as a reason for its existence:

TCC_Expensive = 4 ///< The cost of a 'div' instruction on x86.

If we're not going to honor the TTI cost model here in SimplifyCFG, we should go back to whatever logic was used before r228826. (Note: I'm not advocating one way or the other...yet; I'm just looking for the right answer.)

This is another way of asking the general question I've run into several times lately ( PR24818, PR24766, PR22723, PR24743 ):
What defines the "canonical" or "optimal" IR? Less instructions? Less basic blocks? Less register pressure? Is it art or science?

LGTM. Please go ahead and fix the FIXME and make UDiv and SDiv (and URem/SRem?) expensive by default. The targets can always override these if appropriate.

What defines the "canonical" or "optimal" IR?

These are different things. Canonical form is designed to be simple, yes, but to expose further optimizations and limit the implementation complexity of different optimizations and analysis. Optimal IR need to be defined with respect to a particular backend and architecture, and might not be in canonical form.

Less instructions? Less basic blocks?

Generally, yes.

Less register pressure?

During canonicalization, we ignore this by design. During lowering (including late IR passes such as vectorization), it can be considered explicitly.

Is it art or science?

Both.

This revision is now accepted and ready to land.Sep 22 2015, 4:13 PM

In D12882#251209, @hfinkel wrote:

Is it art or science?

Both.

Aha, now it's all starting to make more sense. That's much more fun. :)
Many thanks to you and Philip for explaining this.

My plan to push this forward is:

Fix up the TTI cost model (this patch and possibly others).
Add code to CGP that uses the cost model to undo this sort of speculation for targets that don't want it.
Change this part of SimplifyCFG to not use the TTI cost model for purposes of speculative transforms.

I've just started looking at CGP for the first time...it has this ominous comment:

// This pass munges the code in the input function to better prepare it for
// SelectionDAG-based code generation. This works around limitations in it's
// basic-block-at-a-time approach. It should eventually be removed.

Is there a proposed replacement for CGP?

Closed by commit rL248439: set div/rem default values to 'expensive' in TargetTransformInfo's cost model (authored by spatel). · Explain WhySep 23 2015, 3:36 PM

This revision was automatically updated to reflect the committed changes.

spatel mentioned this in rL248439: set div/rem default values to 'expensive' in TargetTransformInfo's cost model.Sep 23 2015, 3:39 PM

spatel mentioned this in D13297: [CGP] transform select instructions into branches and sink expensive operands.Sep 30 2015, 9:35 AM

spatel mentioned this in rL250527: This is a follow-up to the discussion in D12882..Oct 16 2015, 9:56 AM

Diffusion mentioned this in rL250570: Revert "This is a follow-up to the discussion in D12882.".Oct 16 2015, 4:02 PM

spatel mentioned this in rL250743: [CGP] transform select instructions into branches and sink expensive operands.Oct 19 2015, 3:01 PM

spatel mentioned this in D15213: [SimplifyCFG] allow speculation of exactly one expensive instruction (PR24818).Dec 3 2015, 3:13 PM

spatel mentioned this in rL255660: [SimplifyCFG] allow speculation of exactly one expensive instruction (PR24818).Dec 15 2015, 9:41 AM

Revision Contents

Path

Size

include/

llvm/

Analysis/

TargetTransformInfoImpl.h

4 lines

test/

Transforms/

SimplifyCFG/

speculate-math.ll

18 lines

Diff 34809

include/llvm/Analysis/TargetTransformInfoImpl.h

Show First 20 Lines • Show All 55 Lines • ▼ Show 20 Lines	case Instruction::BitCast:
assert(OpTy && "Cast instructions must provide the operand type");		assert(OpTy && "Cast instructions must provide the operand type");
if (Ty == OpTy \|\| (Ty->isPointerTy() && OpTy->isPointerTy()))		if (Ty == OpTy \|\| (Ty->isPointerTy() && OpTy->isPointerTy()))
// Identity and pointer-to-pointer casts are free.		// Identity and pointer-to-pointer casts are free.
return TTI::TCC_Free;		return TTI::TCC_Free;

// Otherwise, the default basic cost is used.		// Otherwise, the default basic cost is used.
return TTI::TCC_Basic;		return TTI::TCC_Basic;

		// FIXME: UDiv and SDiv should also default to expensive.
		case Instruction::FDiv:
		return TTI::TCC_Expensive;

case Instruction::IntToPtr: {		case Instruction::IntToPtr: {
// An inttoptr cast is free so long as the input is a legal integer type		// An inttoptr cast is free so long as the input is a legal integer type
// which doesn't contain values outside the range of a pointer.		// which doesn't contain values outside the range of a pointer.
unsigned OpSize = OpTy->getScalarSizeInBits();		unsigned OpSize = OpTy->getScalarSizeInBits();
if (DL.isLegalInteger(OpSize) &&		if (DL.isLegalInteger(OpSize) &&
OpSize <= DL.getPointerTypeSizeInBits(Ty))		OpSize <= DL.getPointerTypeSizeInBits(Ty))
return TTI::TCC_Free;		return TTI::TCC_Free;

▲ Show 20 Lines • Show All 416 Lines • Show Last 20 Lines

test/Transforms/SimplifyCFG/speculate-math.ll

	; RUN: opt -S -simplifycfg -phi-node-folding-threshold=2 < %s \| FileCheck %s			; RUN: opt -S -simplifycfg -phi-node-folding-threshold=2 < %s \| FileCheck %s

	declare float @llvm.sqrt.f32(float) nounwind readonly			declare float @llvm.sqrt.f32(float) nounwind readonly
	declare float @llvm.fma.f32(float, float, float) nounwind readonly			declare float @llvm.fma.f32(float, float, float) nounwind readonly
	declare float @llvm.fmuladd.f32(float, float, float) nounwind readonly			declare float @llvm.fmuladd.f32(float, float, float) nounwind readonly
	declare float @llvm.fabs.f32(float) nounwind readonly			declare float @llvm.fabs.f32(float) nounwind readonly
	declare float @llvm.minnum.f32(float, float) nounwind readonly			declare float @llvm.minnum.f32(float, float) nounwind readonly
	declare float @llvm.maxnum.f32(float, float) nounwind readonly			declare float @llvm.maxnum.f32(float, float) nounwind readonly

				; Do not speculate fdiv by default because it is generally expensive.

				; CHECK-LABEL: @fdiv_test(
				; CHECK-NOT: select
				define double @fdiv_test(double %a, double %b) {
				entry:
				%cmp = fcmp ogt double %a, 0.0
				br i1 %cmp, label %cond.true, label %cond.end

				cond.true:
				%div = fdiv double %b, %a
				br label %cond.end

				cond.end:
				%cond = phi double [ %div, %cond.true ], [ 0.0, %entry ]
				ret double %cond
				}

	; CHECK-LABEL: @sqrt_test(			; CHECK-LABEL: @sqrt_test(
	; CHECK: select			; CHECK: select
	define void @sqrt_test(float addrspace(1)* noalias nocapture %out, float %a) nounwind {			define void @sqrt_test(float addrspace(1)* noalias nocapture %out, float %a) nounwind {
	entry:			entry:
	%cmp.i = fcmp olt float %a, 0.000000e+00			%cmp.i = fcmp olt float %a, 0.000000e+00
	br i1 %cmp.i, label %test_sqrt.exit, label %cond.else.i			br i1 %cmp.i, label %test_sqrt.exit, label %cond.else.i

	cond.else.i: ; preds = %entry			cond.else.i: ; preds = %entry
	▲ Show 20 Lines • Show All 93 Lines • Show Last 20 Lines