This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][GlobalISel] Enable the localizer for optimized builds
Closed · Public

Authored by aemerson on Sep 6 2019, 2:27 PM.

Details

Reviewers

paquette
qcolombet

Commits

rGa1cf4d9795f2: [AArch64][GlobalISel] Enable the localizer for optimized builds.
rL371266: [AArch64][GlobalISel] Enable the localizer for optimized builds.

Summary

Despite the fact that the localizer's original motivation was to fix horrendous constant spilling at -O0, shortening live ranges still has net benefits even with optimizations enabled.

On an -Os build of CTMark, doing this improves code size by 0.5% geomean.

There are a few regressions, with bullet increasing in size by 0.5%. One case in bullet where code size increased slightly turned out to be GlobalISel now generating the same code as SelectionDAG. So we have an opportunity in future to implement better heuristics for localization and therefore be *better* than SDAG in some cases. Relative to other optimizations, though, that one is fairly minor.
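
To make the summary concrete, here is a minimal, self-contained C++ sketch of what localizing a constant means. It is a toy model, not the LLVM Localizer: the Inst, Block, and Function types and the blocksTouched, localizeConstants, and makeExample functions are all hypothetical names invented for illustration. The sketch clones each entry-block constant into every later block that reads it, so the original value no longer has to stay live across the blocks in between.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy IR (hypothetical): each instruction defines one virtual register.
struct Inst {
  int def;                // virtual register defined here
  std::string opcode;     // printable opcode, e.g. "fconst 1.0"
  std::vector<int> uses;  // virtual registers read here
  bool isConst;           // true for cheap constants that may be cloned
};
using Block = std::vector<Inst>;
using Function = std::vector<Block>;  // block 0 is the entry block

// Number of blocks that define or read `reg`: a crude proxy for how far
// the live range of `reg` is spread over the function.
int blocksTouched(const Function &f, int reg) {
  int count = 0;
  for (const Block &block : f) {
    bool touched = false;
    for (const Inst &inst : block) {
      if (inst.def == reg) touched = true;
      for (int use : inst.uses)
        if (use == reg) touched = true;
    }
    if (touched) ++count;
  }
  return count;
}

// Give every non-entry block its own copy of each entry-block constant it
// reads, placed right before the first reader, and rewrite the readers.
void localizeConstants(Function &f, int &nextReg) {
  if (f.empty()) return;
  std::map<int, std::string> entryConsts;  // register -> opcode text
  for (const Inst &inst : f[0])
    if (inst.isConst) entryConsts[inst.def] = inst.opcode;

  for (std::size_t b = 1; b < f.size(); ++b) {
    std::map<int, int> localCopy;  // entry register -> clone in this block
    Block rewritten;
    for (Inst inst : f[b]) {
      for (int &use : inst.uses) {
        auto constant = entryConsts.find(use);
        if (constant == entryConsts.end()) continue;
        auto known = localCopy.find(use);
        if (known == localCopy.end()) {
          rewritten.push_back(Inst{nextReg, constant->second, {}, true});
          known = localCopy.emplace(use, nextReg++).first;
        }
        use = known->second;
      }
      rewritten.push_back(inst);
    }
    f[b] = rewritten;
  }
}

// Example: a constant defined in bb.0 and read only in bb.2, after a call.
Function makeExample() {
  Block bb0 = {Inst{0, "fconst 1.0", {}, true}, Inst{1, "call", {}, false}};
  Block bb1 = {Inst{2, "nop", {}, false}};
  Block bb2 = {Inst{3, "fadd", {0, 0}, false}};
  return Function{bb0, bb1, bb2};
}
```

In makeExample, register 0 is mentioned in two blocks before the transformation. After localizeConstants, the fadd in bb.2 reads a fresh copy defined immediately before it, and register 0 is mentioned only in bb.0. The toy leaves the now-unused original in place; cleaning it up is out of scope for this sketch.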

Diff Detail

Repository: rL LLVM

Event Timeline

aemerson created this revision. Sep 6 2019, 2:27 PM

Herald added a project: Restricted Project. Sep 6 2019, 2:27 PM

Herald added subscribers: Petar.Avramovic, hiraditya, kristof.beyls, rovka.

LGTM

This revision is now accepted and ready to land. Sep 6 2019, 2:48 PM

Hi Amara,

Where are the benefits coming from for optimized builds?
I am guessing fewer copies/spills, and in that case I think we should try to fix the allocator, but that's more of a long-term plan.
File a PR with an example, though, so that we don't forget.

LGTM.

Cheers,
-Quentin

In D67303#1661664, @qcolombet wrote:

Hi Amara,

Where are the benefits coming from for optimized builds?
I am guessing fewer copies/spills, and in that case I think we should try to fix the allocator, but that's more of a long-term plan.
File a PR with an example, though, so that we don't forget.

LGTM.

Cheers,
-Quentin

The swifterror test change in this patch is one example. Live ranges over function calls can be problematic if under register pressure, which can cause additional moves, if not spills. There are others of course. The greedy allocator just doesn't seem to expect long live ranges across basic-block boundaries.

Filed https://bugs.llvm.org/show_bug.cgi?id=43247 for the regalloc issue.

Closed by commit rL371266: [AArch64][GlobalISel] Enable the localizer for optimized builds. (authored by aemerson). Sep 6 2019, 3:27 PM

This revision was automatically updated to reflect the committed changes.

The swifterror test change in this patch is one example. Live ranges over function calls can be problematic if under register pressure, which can cause additional moves, if not spills. There are others of course

Is it happening in this specific test (more moves or spill)?
I'd like to have something to reproduce the issue.

Revision Contents

Path                                                                      Size
llvm/trunk/lib/Target/AArch64/AArch64TargetMachine.cpp                    4 lines
llvm/trunk/test/CodeGen/AArch64/GlobalISel/gisel-commandline-option.ll    2 lines
llvm/trunk/test/CodeGen/AArch64/GlobalISel/localizer-in-O0-pipeline.mir   6 lines
llvm/trunk/test/CodeGen/AArch64/GlobalISel/swifterror.ll                  2 lines

Diff 219188

llvm/trunk/lib/Target/AArch64/AArch64TargetMachine.cpp

 }

 bool AArch64PassConfig::addRegBankSelect() {
   addPass(new RegBankSelect());
   return false;
 }

 void AArch64PassConfig::addPreGlobalInstructionSelect() {
-  // Workaround the deficiency of the fast register allocator.
-  if (TM->getOptLevel() == CodeGenOpt::None)
-    addPass(new Localizer());
+  addPass(new Localizer());
 }

 bool AArch64PassConfig::addGlobalInstructionSelect() {
   addPass(new InstructionSelect());
   return false;
 }

 bool AArch64PassConfig::addILPOpts() {
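
The effect of the change to addPreGlobalInstructionSelect can be sketched with a toy pipeline builder. This is not the LLVM pass manager API: OptLevel, buildGISelPipeline, hasPass, and the localizeOnlyAtO0 flag are hypothetical stand-ins, and the pass list is trimmed to names that appear in this review. Passing true for the flag models the behavior before the patch, false the behavior after it.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical optimization levels for the toy model.
enum class OptLevel { None, Less, Default, Aggressive };

// Build a simplified GlobalISel pass list as plain strings.
// localizeOnlyAtO0 == true  : Localizer is added only at OptLevel::None.
// localizeOnlyAtO0 == false : Localizer is added at every level.
std::vector<std::string> buildGISelPipeline(OptLevel level,
                                            bool localizeOnlyAtO0) {
  std::vector<std::string> passes = {"Legalizer", "RegBankSelect"};
  if (!localizeOnlyAtO0 || level == OptLevel::None)
    passes.push_back("Localizer");
  passes.push_back("InstructionSelect");
  return passes;
}

// True if `name` occurs anywhere in the pass list.
bool hasPass(const std::vector<std::string> &passes,
             const std::string &name) {
  return std::find(passes.begin(), passes.end(), name) != passes.end();
}
```

With the flag set to true, an OptLevel::Default pipeline has no Localizer entry; with it set to false, Localizer sits between RegBankSelect and InstructionSelect at every level, which is the ordering the updated gisel-commandline-option.ll test checks for.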

llvm/trunk/test/CodeGen/AArch64/GlobalISel/gisel-commandline-option.ll

 ; ENABLED-NEXT: Analysis for ComputingKnownBits
 ; ENABLED-NEXT: PreLegalizerCombiner
 ; VERIFY-NEXT: Verify generated machine code
 ; ENABLED-NEXT: Analysis containing CSE Info
 ; ENABLED-NEXT: Legalizer
 ; VERIFY-NEXT: Verify generated machine code
 ; ENABLED-NEXT: RegBankSelect
 ; VERIFY-NEXT: Verify generated machine code
-; ENABLED-O0-NEXT: Localizer
+; ENABLED-NEXT: Localizer
 ; VERIFY-O0-NEXT: Verify generated machine code
 ; ENABLED-NEXT: Analysis for ComputingKnownBits
 ; ENABLED-NEXT: InstructionSelect
 ; VERIFY-NEXT: Verify generated machine code
 ; ENABLED-NEXT: ResetMachineFunction

 ; FALLBACK: AArch64 Instruction Selection
 ; NOFALLBACK-NOT: AArch64 Instruction Selection

llvm/trunk/test/CodeGen/AArch64/GlobalISel/localizer-in-O0-pipeline.mir

 # First block remains untouched
 # CHECK: body
 # CHECK: %4:fpr(s32) = G_FCONSTANT float 1.000000e+00
 # CHECK: %5:fpr(s32) = G_FCONSTANT float 2.000000e+00

 # Second block will get the constant 1.0 when the localizer is enabled.
 # CHECK: bb.1.{{[a-zA-Z0-9]+}}:
-# OPT-NOT: G_FCONSTANT
+# OPT: [[FONE:%[0-9]+]]:fpr(s32) = G_FCONSTANT float 1.000000e+00
 # OPTNONE: [[FONE:%[0-9]+]]:fpr(s32) = G_FCONSTANT float 1.000000e+00
 # CHECK: G_BR %bb.3

 # Thrid block will get the constant 2.0 when the localizer is enabled.
 # CHECK: bb.2.{{[a-zA-Z0-9]+}}:
-# OPT-NOT: G_FCONSTANT
+# OPT: [[FTWO:%[0-9]+]]:fpr(s32) = G_FCONSTANT float 2.000000e+00
 # OPTNONE: [[FTWO:%[0-9]+]]:fpr(s32) = G_FCONSTANT float 2.000000e+00

 # CHECK: bb.3.end
 # OPTNONE: %2:fpr(s32) = PHI [[FONE]](s32), %bb.1, [[FTWO]](s32), %bb.2
-# OPT: %2:fpr(s32) = PHI %4(s32), %bb.1, %5(s32), %bb.2
+# OPT: %2:fpr(s32) = PHI [[FONE]](s32), %bb.1, [[FTWO]](s32), %bb.2
 # CHECK-NEXT: G_FADD %0, %2
 body: |
   bb.0 (%ir-block.0):
     liveins: $s0, $w0

     %0(s32) = COPY $s0
     %6(s32) = COPY $w0
     %1(s1) = G_TRUNC %6
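
The updated OPT lines expect the 1.0 constant in bb.1 and the 2.0 constant in bb.2, even though both values are only read by the PHI in bb.3. One way to read that expectation: a PHI operand counts as used at the end of its incoming block, so that is where the local copy is placed. Below is a small sketch of that placement rule. The Use struct and insertionBlock function are hypothetical names for illustration and are not taken from the LLVM sources.

```cpp
#include <cassert>
#include <string>

// One read of a value (hypothetical model).
struct Use {
  std::string userOpcode;  // opcode of the reading instruction, e.g. "PHI"
  int userBlock;           // block number containing the reader
  int incomingBlock;       // for PHI operands: the predecessor block the
                           // value arrives from; -1 for non-PHI readers
};

// Block in which a localized copy should be materialized for this use.
// PHI operands are placed in the incoming block, since a definition
// cannot be inserted ahead of a PHI inside the PHI's own block.
int insertionBlock(const Use &use) {
  return use.userOpcode == "PHI" ? use.incomingBlock : use.userBlock;
}
```

Applied to the test above, the PHI operand arriving from bb.1 maps to block 1 and the one arriving from bb.2 maps to block 2, while an ordinary reader such as G_FADD in bb.3 maps to its own block.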

llvm/trunk/test/CodeGen/AArch64/GlobalISel/swifterror.ll

 handler:
   ret float 1.0
 }

 ; "foo_if" is a function that takes a swifterror parameter, it sets swifterror
 ; under a certain condition.
 define float @foo_if(%swift_error** swifterror %error_ptr_ref, i32 %cc) {
 ; CHECK-LABEL: foo_if:
 ; CHECK: cbz w0
-; CHECK: mov [[ID:w[0-9]+]], #1
 ; CHECK: mov w0, #16
 ; CHECK: malloc
+; CHECK: mov [[ID:w[0-9]+]], #1
 ; CHECK: strb [[ID]], [x0, #8]
 ; CHECK: mov x21, x0
 ; CHECK-NOT: x21
 ; CHECK: ret

 entry:
   %cond = icmp ne i32 %cc, 0
   br i1 %cond, label %gen_error, label %normal