This is an archive of the discontinued LLVM Phabricator instance.

Paths

llvm/lib/Transforms/Utils/InlineFunction.cpp
llvm/test/Transforms/PhaseOrdering/inlining-alignment-assumptions.ll

Differential D76886

[InlineFunction] Disable emission of alignment assumptions by default
Closed · Public

Authored by nikic on Mar 26 2020, 1:31 PM.


Details

Reviewers

efriedma
hfinkel
jdoerfert
arsenm

Commits

rGb74c6d2c9d8e: [InlineFunction] Disable emission of alignment assumptions by default

Summary

In D74183, Clang started emitting alignment attributes for sret parameters unconditionally. This caused a 1.5% compile-time regression on tramp3d-v4. The reason is that we now generate many instances of IR like:

%ptrint = ptrtoint %class.GuardLayers* %guards_m to i64
%maskedptr = and i64 %ptrint, 3
%maskcond = icmp eq i64 %maskedptr, 0
tail call void @llvm.assume(i1 %maskcond)

to preserve the alignment information during inlining. Based on the size increase of the final binary, it is likely that these assumptions not only increase compile-time, but also regress optimizations (due to the usual issues with assumes).
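The `and`/`icmp eq 0` pair above tests that the low bits of the integer address are zero: masking with 3 checks 4-byte alignment, and an `align 8` parameter would be masked with 7. A minimal C++ sketch of the same predicate follows, assuming the pointer has already been converted to an integer as `ptrtoint` does; the function names are illustrative and not part of LLVM.

```cpp
#include <cassert>
#include <cstdint>

// True if Align is a non-zero power of two, the only kind of alignment
// an `align` attribute can express.
inline bool isPowerOf2(std::uint64_t Align) {
  return Align != 0 && (Align & (Align - 1)) == 0;
}

// Models the emitted sequence on an integer address:
//   %maskedptr = and i64 %ptrint, (Align - 1)
//   %maskcond  = icmp eq i64 %maskedptr, 0
// The condition holds exactly when the low log2(Align) bits are zero.
inline bool satisfiesAlignmentAssumption(std::uint64_t PtrInt,
                                         std::uint64_t Align) {
  assert(isPowerOf2(Align) && "alignment must be a power of two");
  const std::uint64_t Mask = Align - 1; // 3 for align 4, 7 for align 8
  return (PtrInt & Mask) == 0;
}
```

For instance, the address 0x1002 fails the 4-byte check because its low two bits are `10`.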

We already encountered the same problem in Rust, where we (unlike Clang) generally prefer to emit alignment information absolutely everywhere it is available. We were only able to do this after hardcoding -preserve-alignment-assumptions-during-inlining=false, because we were seeing significant optimization and compile-time regressions otherwise.

This patch disables -preserve-alignment-assumptions-during-inlining by default, because we should not be punishing people for adding more alignment annotations.

I think once the operand bundle work by @Tyker and @jdoerfert shakes out, it might be possible to use an operand bundle based assume for this instead and avoid some/most of the overhead.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

nikic created this revision. Mar 26 2020, 1:31 PM

Herald added a project: Restricted Project. Mar 26 2020, 1:31 PM

Herald added subscribers: llvm-commits, JDevlieghere.

Is there no phase-ordering-esque test for that?

Harbormaster failed remote builds in B50612: Diff 252963! Mar 26 2020, 2:09 PM

I'm fine with disabling this for now. The patch that makes IRBuilder create alignment assumptions as operand bundles already exists but is not merged (it is under an option).
We should measure that solution and then either enable this again by default or determine what else is needed.
FWIW, I expect us to need a pass that merges and eliminates llvm.assume calls with operand bundles soon; that should drive compile time down again in cases where we emit equivalent or subsuming assumptions.

Add phase ordering test.

In D76886#1944908, @lebedev.ri wrote:

Is there no phase-ordering-esque test for that?

After looking at IR diffs a bit more, one difference I noticed is that jump threading is not being performed in some places. In this case the reason is not multi-use, but cost heuristics based on instruction counts. The alignment assumption adds 4 extra instructions, and if the block duplication threshold is at 6 instructions, that can easily make a difference. The phase ordering test is loosely based around that idea. Does that look useful to you?

@jdoerfert @Tyker In case it isn't on the roadmap yet... I guess we need to switch all the code that is currently skipping debug intrinsics to skip debug intrinsics and assumes, to make sure they don't affect codegen. With operand bundles, that should remove any impact assumes have on instruction count heuristics. Without operand bundles, it will at least lessen it by not counting the assume itself.
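The threshold effect described above can be modeled with a toy cost function. This is a hypothetical sketch, not JumpThreading's actual code: instructions are reduced to three made-up categories, debug intrinsics are always free, and `SkipAssumes` models the proposal to also ignore the `llvm.assume` call itself. The threshold of 6 is taken from the comment above; the `<=` comparison is an assumption of the sketch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified instruction categories for this sketch only.
enum class InstKind { Ordinary, DebugIntrinsic, Assume };

// Counts the instructions a size heuristic would charge for. Debug
// intrinsics are always free; with SkipAssumes, the llvm.assume call is
// free too, but the instructions computing its condition are still counted.
inline unsigned duplicationCost(const std::vector<InstKind> &Block,
                                bool SkipAssumes) {
  unsigned Cost = 0;
  for (InstKind K : Block) {
    if (K == InstKind::DebugIntrinsic)
      continue;
    if (SkipAssumes && K == InstKind::Assume)
      continue;
    ++Cost;
  }
  return Cost;
}

// In this sketch a block may be duplicated only if its cost does not
// exceed the threshold.
inline bool withinThreshold(const std::vector<InstKind> &Block,
                            bool SkipAssumes, unsigned Threshold = 6) {
  return duplicationCost(Block, SkipAssumes) <= Threshold;
}
```

With three ordinary instructions the cost is 3. Appending the four-instruction assumption raises it to 7, over the threshold; skipping only the `llvm.assume` call brings it back to exactly 6, because the `ptrtoint`/`and`/`icmp` that feed it are still counted. A single-instruction, operand-bundle-style assume that is skipped would leave the cost at 3.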

In D76886#1946040, @nikic wrote:

In D76886#1944908, @lebedev.ri wrote:

Is there no phase-ordering-esque test for that?

After looking at IR diffs a bit more, one difference I noticed is that jump threading is not being performed in some places. In this case the reason is not multi-use, but cost heuristics based on instruction counts. The alignment assumption adds 4 extra instructions, and if the block duplication threshold is at 6 instructions, that can easily make a difference. The phase ordering test is loosely based around that idea. Does that look useful to you?

Yes, thank you.


lebedev.ri added inline comments. Mar 27 2020, 7:15 AM

test/Transforms/PhaseOrdering/inlining-alignment-assumptions.ll

Line 3 (On Diff #253104)

Err, one thing I forgot to add: I mainly wanted to see a test that exercises the *default* value of preserve-alignment-assumptions-during-inlining. So I think this needs one more RUN line, like:

; RUN: opt -S -O2 -preserve-alignment-assumptions-during-inlining=0 < %s | FileCheck %s --check-prefixes=CHECK,ASSUMPTIONS-OFF,FALLBACK-0
; RUN: opt -S -O2 -preserve-alignment-assumptions-during-inlining=1 < %s | FileCheck %s --check-prefixes=CHECK,ASSUMPTIONS-ON,FALLBACK-1
; RUN: opt -S -O2 < %s | FileCheck %s --check-prefixes=CHECK,ASSUMPTIONS-OFF,FALLBACK-DEFAULT

Add a RUN line for the default behavior as well.

I don't object to doing this, but...

After looking at IR diffs a bit more, one difference I noticed is that jump threading is not being performed in some places. In this case the reason is not multi-use, but cost heuristics based on instruction counts. The alignment assumption adds 4 extra instructions, and if the block duplication threshold is at 6 instructions, that can easily make a difference.

This is an independent problem that we might fix first. Jump threading should be collecting and ignoring ephemeral values for its heuristic. If it's not doing that, please fix that first and then reevaluate.
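Ephemeral values are instructions whose results feed only into assumptions, such as the `ptrtoint`/`and`/`icmp` chain of an alignment assumption. Below is a minimal sketch of collecting them with a backwards worklist, using a hypothetical toy IR (instructions referenced by index) rather than LLVM's classes. LLVM has its own helper for this purpose (`CodeMetrics::collectEphemeralValues`), which this sketch does not reproduce.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical toy IR: instruction I uses the instructions in Operands.
struct ToyInst {
  std::vector<std::size_t> Operands;
  bool HasSideEffects = false;
  bool IsAssume = false;
};

// Returns the indices of ephemeral instructions: every assume, plus any
// side-effect-free instruction all of whose users are already ephemeral.
inline std::set<std::size_t> collectEphemeral(const std::vector<ToyInst> &F) {
  // Invert the operand lists so each instruction knows its users.
  std::vector<std::vector<std::size_t>> Users(F.size());
  for (std::size_t I = 0; I < F.size(); ++I)
    for (std::size_t Op : F[I].Operands)
      Users[Op].push_back(I);

  std::set<std::size_t> Eph;
  std::vector<std::size_t> Worklist;
  for (std::size_t I = 0; I < F.size(); ++I) {
    if (!F[I].IsAssume)
      continue;
    Eph.insert(I);
    Worklist.insert(Worklist.end(), F[I].Operands.begin(), F[I].Operands.end());
  }

  // A value may be popped before all of its users are known to be ephemeral;
  // it is pushed again whenever another of its users becomes ephemeral.
  while (!Worklist.empty()) {
    std::size_t V = Worklist.back();
    Worklist.pop_back();
    if (Eph.count(V) || F[V].HasSideEffects)
      continue;
    bool AllUsersEphemeral = true;
    for (std::size_t U : Users[V])
      if (!Eph.count(U)) {
        AllUsersEphemeral = false;
        break;
      }
    if (!AllUsersEphemeral)
      continue;
    Eph.insert(V);
    Worklist.insert(Worklist.end(), F[V].Operands.begin(), F[V].Operands.end());
  }
  return Eph;
}
```

In the alignment-assumption pattern, the three instructions feeding the assume are ephemeral, but the pointer itself is not, because a store also uses it. Computing this set requires walking the def-use graph from every assume, which is the cost raised in the reply that follows.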

In D76886#1946476, @hfinkel wrote:

I don't object to doing this, but...

After looking at IR diffs a bit more, one difference I noticed is that jump threading is not being performed in some places. In this case the reason is not multi-use, but cost heuristics based on instruction counts. The alignment assumption adds 4 extra instructions, and if the block duplication threshold is at 6 instructions, that can easily make a difference.

This is an independent problem that we might fix first. Jump threading should be collecting and ignoring ephemeral values for its heuristic. If it's not doing that, please fix that first and then reevaluate.

Given the ongoing work to convert assumes to operand bundles, which would reduce ignoring of ephemeral values to ignoring the assume itself, I don't think it makes sense to do such a change at this time. Ignoring ephemeral values is not cheap, and we have numerous places that use limited-length instruction scans as a cheap heuristic. I don't want to start computing ephemeral values every time InstCombine does a backwards scan from a load or store.
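The concern about limited-length scans can be illustrated with a toy backward search that gives up after a fixed instruction budget. Everything here is hypothetical: the three instruction kinds, the function name, and the budget passed in by the caller. `SkipAssumeRelated` presumes the assumption-related instructions were already identified, which is exactly the analysis the reply above considers too expensive to run before every scan.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instruction kinds for this sketch only. "Other" never
// clobbers the pointer in this model.
enum class ScanKind { StoreToP, Other, AssumeRelated };

// Scans backwards from position From (exclusive) for a store to the same
// pointer, examining at most MaxScan instructions. Returns the store's index,
// or -1 if none was found within the budget. With SkipAssumeRelated, those
// instructions are not charged against the budget.
inline long findPriorStore(const std::vector<ScanKind> &Block, std::size_t From,
                           unsigned MaxScan, bool SkipAssumeRelated) {
  unsigned Scanned = 0;
  for (std::size_t I = From; I-- > 0;) {
    if (SkipAssumeRelated && Block[I] == ScanKind::AssumeRelated)
      continue;
    if (++Scanned > MaxScan)
      return -1; // budget exhausted: give up conservatively
    if (Block[I] == ScanKind::StoreToP)
      return static_cast<long>(I);
  }
  return -1;
}
```

With a budget of 6, a store three instructions back is found; inserting four assumption-related instructions in between pushes it out of reach unless they are skipped.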

I'm only using JumpThreading as an illustrative example for a test case here. As you well know, JumpThreading is neither the only nor the most important issue when it comes to handling of assumes. As things stand right now, fixing any particular place is not going to change anything about the big picture. And of course, it will do nothing about compile-time impact.

That said, we're already forced to disable this in Rust anyway, so I'm not particularly invested in making this change -- it makes no difference to us. I'm happy to abandon this revision if it does not seem sufficiently beneficial on its own.

I'm in favor of adding the test (and changing the default value). We have a clear path to re-enable it and it is not far away either.

ThomasRaoux added a subscriber: ThomasRaoux. Apr 1 2020, 4:27 PM

hgreving added a subscriber: hgreving. Apr 1 2020, 4:30 PM

The ptrtoint inhibits SROA. Don't we have a less cumbersome way to test for alignment?

In D76886#2008279, @arsenm wrote:

Don't we have a less cumbersome way to test for alignment?

That is the current canonical alignment assumption, at least until the attribute bundles are here.

The ptrtoint inhibits SROA.

That issue seems not entirely new to these changes.
Is there a bug with reproducer?

In D76886#2008383, @lebedev.ri wrote:

In D76886#2008279, @arsenm wrote:

Don't we have a less cumbersome way to test for alignment?

That is the current canonical alignment assumption, at least until the attribute bundles are here.

The ptrtoint inhibits SROA.

That issue seems not entirely new to these changes.
Is there a bug with reproducer?

Yes:

; RUN: opt -S -O3 < %s | FileCheck %s

; CHECK: @caller
; CHECK-NOT: alloca
; CHECK-NEXT: ret void

target datalayout = "e-p:64:64-p5:32:32-A5"

define amdgpu_kernel void @caller() {
  %alloca = alloca i64, align 8, addrspace(5)
  %cast = addrspacecast i64 addrspace(5)* %alloca to i64*
  call void @callee(i64* sret align 8 %cast)
  ret void
}

define internal void @callee(i64* noalias sret align 8 %arg) {
  store i64 0, i64* %arg, align 8
  ret void
}

cfang added a subscriber: cfang. Apr 28 2020, 1:40 PM

LGTM. This should be committed, at least until SROA is fixed.

This revision is now accepted and ready to land. Apr 29 2020, 3:24 PM

Herald added a subscriber: wdng. Apr 29 2020, 3:24 PM

@rampitec Thanks. I guess that means the situation here is somewhat worse for targets that use address spaces. As we cannot assume that an addrspace cast preserves alignment (as far as I know), we will end up inserting the alignment assumptions in many more cases. On targets that do not use address spaces, we would not insert these assumptions for allocas, and thus sidestep the issue.
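The decision described here can be sketched as: emit an assumption only when the callee's `align N` promises more than the caller can already prove. This is a toy model under stated assumptions, not the logic in InlineFunction.cpp: the argument kinds are made up, and an address space cast is conservatively treated as losing the alloca's alignment, as the comment above suggests.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified sources of a call argument in the caller.
enum class ArgSource { Alloca, AddrSpaceCastOfAlloca, Unknown };

struct ToyArg {
  ArgSource Source;
  std::uint64_t AllocaAlign; // alignment of the underlying alloca, if any
};

// Alignment the caller can prove in this model. An address space cast is
// assumed NOT to preserve the alloca's alignment, so it falls back to 1.
inline std::uint64_t knownAlignment(const ToyArg &A) {
  switch (A.Source) {
  case ArgSource::Alloca:
    return A.AllocaAlign;
  case ArgSource::AddrSpaceCastOfAlloca:
  case ArgSource::Unknown:
    return 1;
  }
  return 1;
}

// An assumption is only needed when the parameter's alignment exceeds
// what the caller already knows.
inline bool needsAlignmentAssumption(const ToyArg &A, std::uint64_t ParamAlign) {
  return knownAlignment(A) < ParamAlign;
}
```

Under this model, passing an 8-byte-aligned alloca directly needs no assumption, while passing the same alloca through `addrspacecast`, as in `@caller2` of the test in this revision, does.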

I've submitted https://bugs.llvm.org/show_bug.cgi?id=45763 to make sure the SROA issue is tracked somewhere.

Closed by commit rGb74c6d2c9d8e: [InlineFunction] Disable emission of alignment assumptions by default (authored by nikic). Apr 30 2020, 2:33 PM

This revision was automatically updated to reflect the committed changes.

Herald added a subscriber: hiraditya. Apr 30 2020, 2:33 PM

nikic mentioned this in rGafc287e0abec: Fix clang test after D76886. Apr 30 2020, 3:07 PM

Revision Contents

Path | Size
llvm/lib/Transforms/Utils/InlineFunction.cpp | 5 lines
llvm/test/Transforms/PhaseOrdering/inlining-alignment-assumptions.ll | 114 lines

Diff 261358

llvm/lib/Transforms/Utils/InlineFunction.cpp

 using namespace llvm;
 using ProfileCount = Function::ProfileCount;

 static cl::opt<bool>
 EnableNoAliasConversion("enable-noalias-to-md-conversion", cl::init(true),
     cl::Hidden,
     cl::desc("Convert noalias attributes to metadata during inlining."));

+// Disabled by default, because the added alignment assumptions may increase
+// compile-time and block optimizations. This option is not suitable for use
+// with frontends that emit comprehensive parameter alignment annotations.
 static cl::opt<bool>
 PreserveAlignmentAssumptions("preserve-alignment-assumptions-during-inlining",
-    cl::init(true), cl::Hidden,
+    cl::init(false), cl::Hidden,
     cl::desc("Convert align attributes to assumptions during inlining."));

 static cl::opt<bool> UpdateReturnAttributes(
     "update-return-attrs", cl::init(true), cl::Hidden,
     cl::desc("Update return attributes on calls within inlined body"));

 static cl::opt<unsigned> InlinerAttributeWindow(
     "max-inst-checked-for-throw-during-inlining", cl::Hidden,

llvm/test/Transforms/PhaseOrdering/inlining-alignment-assumptions.ll

This file was added.

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt -S -O2 -preserve-alignment-assumptions-during-inlining=0 < %s | FileCheck %s --check-prefixes=CHECK,ASSUMPTIONS-OFF,FALLBACK-0
; RUN: opt -S -O2 -preserve-alignment-assumptions-during-inlining=1 < %s | FileCheck %s --check-prefixes=CHECK,ASSUMPTIONS-ON,FALLBACK-1
; RUN: opt -S -O2 < %s | FileCheck %s --check-prefixes=CHECK,ASSUMPTIONS-OFF,FALLBACK-DEFAULT

target datalayout = "e-p:64:64-p5:32:32-A5"

; This illustrates an optimization difference caused by instruction counting
; heuristics, which are affected by the additional instructions of the
; alignment assumption.

define internal i1 @callee1(i1 %c, i64* align 8 %ptr) {
  store volatile i64 0, i64* %ptr
  ret i1 %c
}

define void @caller1(i1 %c, i64* align 1 %ptr) {
; ASSUMPTIONS-OFF-LABEL: @caller1(
; ASSUMPTIONS-OFF-NEXT: br i1 [[C:%.*]], label [[TRUE2:%.*]], label [[FALSE2:%.*]]
; ASSUMPTIONS-OFF: true2:
; ASSUMPTIONS-OFF-NEXT: store volatile i64 0, i64* [[PTR:%.*]], align 8
; ASSUMPTIONS-OFF-NEXT: store volatile i64 -1, i64* [[PTR]], align 4
; ASSUMPTIONS-OFF-NEXT: store volatile i64 -1, i64* [[PTR]], align 4
; ASSUMPTIONS-OFF-NEXT: store volatile i64 -1, i64* [[PTR]], align 4
; ASSUMPTIONS-OFF-NEXT: store volatile i64 -1, i64* [[PTR]], align 4
; ASSUMPTIONS-OFF-NEXT: store volatile i64 -1, i64* [[PTR]], align 4
; ASSUMPTIONS-OFF-NEXT: store volatile i64 2, i64* [[PTR]], align 4
; ASSUMPTIONS-OFF-NEXT: ret void
; ASSUMPTIONS-OFF: false2:
; ASSUMPTIONS-OFF-NEXT: store volatile i64 1, i64* [[PTR]], align 4
; ASSUMPTIONS-OFF-NEXT: store volatile i64 0, i64* [[PTR]], align 8
; ASSUMPTIONS-OFF-NEXT: store volatile i64 -1, i64* [[PTR]], align 4
; ASSUMPTIONS-OFF-NEXT: store volatile i64 -1, i64* [[PTR]], align 4
; ASSUMPTIONS-OFF-NEXT: store volatile i64 -1, i64* [[PTR]], align 4
; ASSUMPTIONS-OFF-NEXT: store volatile i64 -1, i64* [[PTR]], align 4
; ASSUMPTIONS-OFF-NEXT: store volatile i64 -1, i64* [[PTR]], align 4
; ASSUMPTIONS-OFF-NEXT: store volatile i64 3, i64* [[PTR]], align 4
; ASSUMPTIONS-OFF-NEXT: ret void
;
; ASSUMPTIONS-ON-LABEL: @caller1(
; ASSUMPTIONS-ON-NEXT: br i1 [[C:%.*]], label [[TRUE1:%.*]], label [[FALSE1:%.*]]
; ASSUMPTIONS-ON: true1:
; ASSUMPTIONS-ON-NEXT: [[C_PR:%.*]] = phi i1 [ false, [[FALSE1]] ], [ true, [[TMP0:%.*]] ]
; ASSUMPTIONS-ON-NEXT: [[PTRINT:%.*]] = ptrtoint i64* [[PTR:%.*]] to i64
; ASSUMPTIONS-ON-NEXT: [[MASKEDPTR:%.*]] = and i64 [[PTRINT]], 7
; ASSUMPTIONS-ON-NEXT: [[MASKCOND:%.*]] = icmp eq i64 [[MASKEDPTR]], 0
; ASSUMPTIONS-ON-NEXT: tail call void @llvm.assume(i1 [[MASKCOND]])
; ASSUMPTIONS-ON-NEXT: store volatile i64 0, i64* [[PTR]], align 8
; ASSUMPTIONS-ON-NEXT: store volatile i64 -1, i64* [[PTR]], align 8
; ASSUMPTIONS-ON-NEXT: store volatile i64 -1, i64* [[PTR]], align 8
; ASSUMPTIONS-ON-NEXT: store volatile i64 -1, i64* [[PTR]], align 8
; ASSUMPTIONS-ON-NEXT: store volatile i64 -1, i64* [[PTR]], align 8
; ASSUMPTIONS-ON-NEXT: store volatile i64 -1, i64* [[PTR]], align 8
; ASSUMPTIONS-ON-NEXT: br i1 [[C_PR]], label [[TRUE2:%.*]], label [[FALSE2:%.*]]
; ASSUMPTIONS-ON: false1:
; ASSUMPTIONS-ON-NEXT: store volatile i64 1, i64* [[PTR]], align 4
; ASSUMPTIONS-ON-NEXT: br label [[TRUE1]]
; ASSUMPTIONS-ON: true2:
; ASSUMPTIONS-ON-NEXT: store volatile i64 2, i64* [[PTR]], align 8
; ASSUMPTIONS-ON-NEXT: ret void
; ASSUMPTIONS-ON: false2:
; ASSUMPTIONS-ON-NEXT: store volatile i64 3, i64* [[PTR]], align 8
; ASSUMPTIONS-ON-NEXT: ret void
;
  br i1 %c, label %true1, label %false1

true1:
  %c2 = call i1 @callee1(i1 %c, i64* %ptr)
  store volatile i64 -1, i64* %ptr
  store volatile i64 -1, i64* %ptr
  store volatile i64 -1, i64* %ptr
  store volatile i64 -1, i64* %ptr
  store volatile i64 -1, i64* %ptr
  br i1 %c2, label %true2, label %false2

false1:
  store volatile i64 1, i64* %ptr
  br label %true1

true2:
  store volatile i64 2, i64* %ptr
  ret void

false2:
  store volatile i64 3, i64* %ptr
  ret void
}

; This test illustrates that alignment assumptions may prevent SROA.
; See PR45763.

define internal void @callee2(i64* noalias sret align 8 %arg) {
  store i64 0, i64* %arg, align 8
  ret void
}

define amdgpu_kernel void @caller2() {
; ASSUMPTIONS-OFF-LABEL: @caller2(
; ASSUMPTIONS-OFF-NEXT: ret void
;
; ASSUMPTIONS-ON-LABEL: @caller2(
; ASSUMPTIONS-ON-NEXT: [[ALLOCA:%.*]] = alloca i64, align 8, addrspace(5)
; ASSUMPTIONS-ON-NEXT: [[CAST:%.*]] = addrspacecast i64 addrspace(5)* [[ALLOCA]] to i64*
; ASSUMPTIONS-ON-NEXT: [[PTRINT:%.*]] = ptrtoint i64* [[CAST]] to i64
; ASSUMPTIONS-ON-NEXT: [[MASKEDPTR:%.*]] = and i64 [[PTRINT]], 7
; ASSUMPTIONS-ON-NEXT: [[MASKCOND:%.*]] = icmp eq i64 [[MASKEDPTR]], 0
; ASSUMPTIONS-ON-NEXT: call void @llvm.assume(i1 [[MASKCOND]])
; ASSUMPTIONS-ON-NEXT: ret void
;
  %alloca = alloca i64, align 8, addrspace(5)
  %cast = addrspacecast i64 addrspace(5)* %alloca to i64*
  call void @callee2(i64* sret align 8 %cast)
  ret void
}