This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][GlobalISel] Eliminate redundant G_ZEXT when the source is implicitly zext-loaded
ClosedPublic

Authored by aemerson on Jul 26 2019, 6:06 PM.

Details

Reviewers

Commits

rG73752abeab1a: [AArch64][GlobalISel] Eliminate redundant G_ZEXT when the source is implicitly…
rL367723: [AArch64][GlobalISel] Eliminate redundant G_ZEXT when the source is implicitly…

Summary

These cases can come up when the extending loads combiner doesn't combine a zext(load) to a zextload op, due to some other operation being in between, which then gets simplified at a later stage.

This gives code size improvements on -O0 CTMark, geomean 0.1%.
sqlite3: -0.4%
consumer-typeset: -0.3%
clamscan: -0.2%
lencod: -0.1%

Diff Detail

Repository: rL LLVM

Event Timeline

aemerson created this revision. Jul 26 2019, 6:06 PM

Herald added a project: Restricted Project. Jul 26 2019, 6:06 PM

Herald added subscribers: Petar.Avramovic, hiraditya, kristof.beyls and 2 others.

Shouldn’t the combiner have already gotten this?

In D65360#1603454, @arsenm wrote:

Shouldn’t the combiner have already gotten this?

It does catch these cases, but sometimes these patterns arise later after the combiner due to some other optimization or DCE. By then the combiner has already run so we don’t merge these into G_ZEXTLOAD.

In D65360#1603526, @aemerson wrote:

In D65360#1603454, @arsenm wrote:

Shouldn’t the combiner have already gotten this?

It does catch these cases, but sometimes these patterns arise later after the combiner due to some other optimization or DCE. By then the combiner has already run so we don’t merge these into G_ZEXTLOAD.

I kind of think this is an issue of not running another combiner round, and the selector should do minimal work.

In D65360#1603790, @arsenm wrote:

In D65360#1603526, @aemerson wrote:

In D65360#1603454, @arsenm wrote:

Shouldn’t the combiner have already gotten this?

It does catch these cases, but sometimes these patterns arise later after the combiner due to some other optimization or DCE. By then the combiner has already run so we don’t merge these into G_ZEXTLOAD.

I kind of think this is an issue of not running another combiner round, and the selector should do minimal work.

Sure, for -O2 or -Os we will have another combiner run, but at -O0 we’re compile-time sensitive, so I think the selector should deal with simple cases where it can. Reducing code size still has benefits at -O0.

I think that for -O0 this makes sense.

(As for another combiner round, I wonder if it would make sense to have a "minimal combiner" for -O0 which only handles extremely simple transformations? Or is just adding a pass too much compile time overhead, even if it results in fewer instructions and thus less work for later passes?)

This revision is now accepted and ready to land. Jul 29 2019, 9:15 AM

I am a little concerned that continuing down this route is going to result in a bunch of work that every target will have to duplicate. I think it makes sense to come up with a plan to prevent that if possible.

For now, though, I feel like we don't really know how many things like this show up/how they show up. So, I don't know if it makes sense to run the combiner twice at -O0/add another more minimal pass/etc. yet.

(Do we know what the compile time/code size tradeoff of running the combiner twice is for -O0?)

In D65360#1604646, @paquette wrote:

I am a little concerned that continuing down this route is going to result in a bunch of work that every target will have to duplicate. I think it makes sense to come up with a plan to prevent that if possible.

For now, though, I feel like we don't really know how many things like this show up/how they show up. So, I don't know if it makes sense to run the combiner twice at -O0/add another more minimal pass/etc. yet.

(Do we know what the compile time/code size tradeoff of running the combiner twice is for -O0?)

It seems the issue here is that we have a minor but noticeable codegen issue at -O0, which IMO is not worth adding an entire new pass to the -O0 pipeline for on its own. However, if we decide to add another combiner run in future then we can revisit this.

Closed by commit rL367723: [AArch64][GlobalISel] Eliminate redundant G_ZEXT when the source is implicitly… (authored by aemerson). Aug 2 2019, 2:16 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                                          Size
llvm/trunk/lib/Target/AArch64/AArch64InstructionSelector.cpp                  17 lines
llvm/trunk/test/CodeGen/AArch64/GlobalISel/select-redundant-zext-of-load.mir  48 lines
llvm/trunk/test/CodeGen/AArch64/GlobalISel/select-zextload.mir                4 lines

Diff 213124

llvm/trunk/lib/Target/AArch64/AArch64InstructionSelector.cpp

     assert((*RBI.getRegBank(DefReg, MRI, TRI)).getID() ==
                AArch64::GPRRegBankID &&
            "Unexpected ext regbank");

     MachineIRBuilder MIB(I);
     MachineInstr *ExtI;
     if (DstTy.isVector())
       return false; // Should be handled by imported patterns.

+    // First check if we're extending the result of a load which has a dest type
+    // smaller than 32 bits, then this zext is redundant. GPR32 is the smallest
+    // GPR register on AArch64 and all loads which are smaller automatically
+    // zero-extend the upper bits. E.g.
+    // %v(s8) = G_LOAD %p, :: (load 1)
+    // %v2(s32) = G_ZEXT %v(s8)
+    if (!IsSigned) {
+      auto *LoadMI = getOpcodeDef(TargetOpcode::G_LOAD, SrcReg, MRI);
+      if (LoadMI &&
+          RBI.getRegBank(SrcReg, MRI, TRI)->getID() == AArch64::GPRRegBankID) {
+        const MachineMemOperand *MemOp = *LoadMI->memoperands_begin();
+        unsigned BytesLoaded = MemOp->getSize();
+        if (BytesLoaded < 4 && SrcTy.getSizeInBytes() == BytesLoaded)
+          return selectCopy(I, TII, MRI, TRI, RBI);
+      }
+    }
+
     if (DstSize == 64) {
       // FIXME: Can we avoid manually doing this?
       if (!RBI.constrainGenericRegister(SrcReg, AArch64::GPR32RegClass, MRI)) {
         LLVM_DEBUG(dbgs() << "Failed to constrain " << TII.getName(Opcode)
                           << " operand\n");
         return false;
       }
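The condition added to the selector in this diff can be modeled outside LLVM as a small predicate. The following is only a sketch under stated assumptions: `LoadInfo` and `isRedundantZExt` are hypothetical names that do not exist in LLVM, the register bank check is omitted, and the struct stands in for what the real code reads from the G_LOAD and its memory operand.

```cpp
#include <cassert>

// Hypothetical stand-in for the facts the selector reads from the G_LOAD
// and its memory operand. These names are illustrative, not LLVM API.
struct LoadInfo {
  unsigned BytesLoaded; // size of the memory access, in bytes
  unsigned ResultBytes; // size of the loaded value's type, in bytes
};

// Mirrors the shape of the check in the diff: only an unsigned extension
// qualifies, the access must be narrower than a 32-bit GPR, and the loaded
// type must be exactly as wide as the access.
constexpr bool isRedundantZExt(bool IsSigned, LoadInfo L) {
  return !IsSigned && L.BytesLoaded < 4 && L.ResultBytes == L.BytesLoaded;
}

// Compile-time examples matching the two cases in the new MIR test.
static_assert(isRedundantZExt(false, LoadInfo{1, 1}), "s8 load, then zext");
static_assert(isRedundantZExt(false, LoadInfo{2, 2}), "s16 load, then zext");

// Runtime counterexamples: each call violates one part of the condition.
inline void checkCounterexamples() {
  assert(!isRedundantZExt(true, LoadInfo{1, 1}));  // sign extension
  assert(!isRedundantZExt(false, LoadInfo{4, 4})); // access is not < 4 bytes
  assert(!isRedundantZExt(false, LoadInfo{1, 2})); // type wider than access
}
```

When the predicate holds, the diff selects the extension as a plain copy instead of emitting an extra instruction.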

llvm/trunk/test/CodeGen/AArch64/GlobalISel/select-redundant-zext-of-load.mir

+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -mtriple=aarch64-- -O0 -run-pass=instruction-select -verify-machineinstrs %s -global-isel-abort=1 -o - | FileCheck %s
+---
+name: redundant_zext_8
+legalized: true
+regBankSelected: true
+tracksRegLiveness: true
+body: |
+  bb.1:
+    liveins: $x0
+
+    ; CHECK-LABEL: name: redundant_zext_8
+    ; CHECK: liveins: $x0
+    ; CHECK: [[COPY:%[0-9]+]]:gpr64sp = COPY $x0
+    ; CHECK: [[LDRBBui:%[0-9]+]]:gpr32 = LDRBBui [[COPY]], 0 :: (load 1)
+    ; CHECK: [[COPY1:%[0-9]+]]:gpr32all = COPY [[LDRBBui]]
+    ; CHECK: $w0 = COPY [[COPY1]]
+    ; CHECK: RET_ReallyLR implicit $w0
+    %1:gpr(p0) = COPY $x0
+    %2:gpr(s8) = G_LOAD %1(p0) :: (load 1)
+    %3:gpr(s32) = G_ZEXT %2(s8)
+    $w0 = COPY %3(s32)
+    RET_ReallyLR implicit $w0
+
+...
+---
+name: redundant_zext_16
+legalized: true
+regBankSelected: true
+tracksRegLiveness: true
+body: |
+  bb.1:
+    liveins: $x0
+
+    ; CHECK-LABEL: name: redundant_zext_16
+    ; CHECK: liveins: $x0
+    ; CHECK: [[COPY:%[0-9]+]]:gpr64sp = COPY $x0
+    ; CHECK: [[LDRHHui:%[0-9]+]]:gpr32 = LDRHHui [[COPY]], 0 :: (load 2)
+    ; CHECK: [[COPY1:%[0-9]+]]:gpr32all = COPY [[LDRHHui]]
+    ; CHECK: $w0 = COPY [[COPY1]]
+    ; CHECK: RET_ReallyLR implicit $w0
+    %1:gpr(p0) = COPY $x0
+    %2:gpr(s16) = G_LOAD %1(p0) :: (load 2)
+    %3:gpr(s32) = G_ZEXT %2(s16)
+    $w0 = COPY %3(s32)
+    RET_ReallyLR implicit $w0
+
+...

llvm/trunk/test/CodeGen/AArch64/GlobalISel/select-zextload.mir

 body: |
   bb.0:
     liveins: $x0

     ; CHECK-LABEL: name: zextload_s32_from_s16_not_combined
     ; CHECK: [[COPY:%[0-9]+]]:gpr64sp = COPY $x0
     ; CHECK: [[LDRHHui:%[0-9]+]]:gpr32 = LDRHHui [[COPY]], 0 :: (load 2 from %ir.addr)
-    ; CHECK: [[UBFMWri:%[0-9]+]]:gpr32 = UBFMWri [[LDRHHui]], 0, 15
-    ; CHECK: $w0 = COPY [[UBFMWri]]
+    ; CHECK: [[COPY1:%[0-9]+]]:gpr32all = COPY [[LDRHHui]]
+    ; CHECK: $w0 = COPY [[COPY1]]
     %0:gpr(p0) = COPY $x0
     %1:gpr(s16) = G_LOAD %0 :: (load 2 from %ir.addr)
     %2:gpr(s32) = G_ZEXT %1
     $w0 = COPY %2(s32)
 ...
 ---
 name: i32_to_i64
 legalized: true
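The updated check lines in select-zextload.mir swap a `UBFMWri ..., 0, 15` for a plain `COPY`. A rough way to see why that is safe is to model both instructions on ordinary integers. This sketch makes no use of LLVM; `loadHalfZeroExtended`, `ubfmLow16`, and `maskIsNoOpAfterHalfLoad` are hypothetical helper names, and the model assumes only that a halfword load into a W register clears the upper 16 bits and that UBFM with immr 0 and imms 15 keeps bits [15:0].

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Models a 16-bit load into a 32-bit register: the halfword is read from
// memory and the upper 16 bits of the result are zero.
inline std::uint32_t loadHalfZeroExtended(const unsigned char *Addr) {
  std::uint16_t Half;
  std::memcpy(&Half, Addr, sizeof(Half));
  return Half; // unsigned integral conversion zero-extends
}

// Models UBFM Wd, Wn, #0, #15: keep bits [15:0] and clear the rest.
constexpr std::uint32_t ubfmLow16(std::uint32_t W) { return W & 0xFFFFu; }

// The mask cannot change a value whose upper 16 bits are already zero,
// which is why a copy is enough after the load.
inline bool maskIsNoOpAfterHalfLoad(const unsigned char *Addr) {
  const std::uint32_t W = loadHalfZeroExtended(Addr);
  return ubfmLow16(W) == W;
}
```

The mask still matters for a value that did not come from such a load, for example `ubfmLow16(0x12345678u)` yields `0x5678u`, so the selector change is limited to sources defined by a narrow G_LOAD.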