This is an archive of the discontinued LLVM Phabricator instance.

Differential D95645

[AArch64][GlobalISel] Add a combine to fold away truncate in: G_ICMP EQ/NE (G_TRUNC(v), 0)
ClosedPublic

Authored by aemerson on Jan 28 2021, 3:48 PM.

Details

Reviewers

paquette
arsenm
foad

Commits

rGbe62b3ba347d: [AArch64][GlobalISel] Add a combine to fold away truncate in: G_ICMP EQ/NE…

Summary

We try to do this optimization if we can determine that testing for the truncated bits with an eq/ne predicate results in the same thing as testing the lower bits.
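The condition in the summary can be checked on plain integers. The following is a minimal sketch, not the patch itself: `numSignBits`, `truncIsRedundantForZeroTest`, and `narrowIsZero` are hypothetical helpers written for illustration, and the wide type is assumed to be 64 bits. It models the idea that when the wide value has more sign bits than the truncate removes, testing the low bits against zero gives the same answer as testing the whole value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of leading bits equal to the sign bit,
// counting the sign bit itself (so 0 and -1 both yield 64).
unsigned numSignBits(int64_t V) {
  uint64_t U = static_cast<uint64_t>(V);
  unsigned SignBit = static_cast<unsigned>(U >> 63);
  unsigned N = 1;
  while (N < 64 && ((U >> (63 - N)) & 1) == SignBit)
    ++N;
  return N;
}

// Hypothetical model of the precondition: the truncate to NarrowBits is
// redundant for an eq/ne-zero test when the wide value has strictly more
// sign bits than the number of bits the truncate drops.
bool truncIsRedundantForZeroTest(int64_t Wide, unsigned NarrowBits) {
  return numSignBits(Wide) > 64 - NarrowBits;
}

// Models "G_ICMP eq (G_TRUNC Wide), 0": are the low NarrowBits all zero?
bool narrowIsZero(int64_t Wide, unsigned NarrowBits) {
  uint64_t Mask = NarrowBits >= 64 ? ~0ULL : ((1ULL << NarrowBits) - 1);
  return (static_cast<uint64_t>(Wide) & Mask) == 0;
}
```

For a sign-extended value such as -5, the precondition holds and `narrowIsZero(-5, 32)` agrees with `-5 == 0`. For `int64_t(1) << 32` the precondition fails, and indeed the low 32 bits are zero while the wide value is not, so the fold would be wrong there.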

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
250 ms	x64 debian > libarcher.races::task-dependency.c
250 ms	x64 debian > libarcher.races::task-taskgroup-unrelated.c
260 ms	x64 debian > libarcher.races::task-taskwait-nested.c
240 ms	x64 debian > libarcher.races::task-two.c
230 ms	x64 debian > libarcher.task::task-barrier.c
Total: 13 Failed

Event Timeline

aemerson created this revision. Jan 28 2021, 3:48 PM

Herald added subscribers: danielkiss, hiraditya, kristof.beyls, rovka. Jan 28 2021, 3:48 PM

aemerson requested review of this revision. Jan 28 2021, 3:48 PM

Herald added a subscriber: wdng. Jan 28 2021, 3:48 PM

This is a pretty niche optimization, mostly to mitigate some regressions with a later patch, so it's AArch64 specific right now.

This LGTM. Seems like it could be useful on other targets in general though.

llvm/lib/Target/AArch64/GISel/AArch64PreLegalizerCombiner.cpp
67	Nit: `ICmpInst` has a helper for this
78	Minor nit: maybe slightly nicer to pull the other `mi_match` into here: `if (!mi_match(...) || !mi_match(...)) return false;`

This revision is now accepted and ready to land. Jan 28 2021, 4:04 PM

This revision was landed with ongoing or failed builds. Jan 28 2021, 4:29 PM

Closed by commit rGbe62b3ba347d: [AArch64][GlobalISel] Add a combine to fold away truncate in: G_ICMP EQ/NE… (authored by aemerson).

This revision was automatically updated to reflect the committed changes.

aemerson added a commit: rGbe62b3ba347d: [AArch64][GlobalISel] Add a combine to fold away truncate in: G_ICMP EQ/NE….

Harbormaster completed remote builds in B87082: Diff 319978. Jan 28 2021, 5:28 PM

foad added inline comments. Jan 29 2021, 2:58 AM

llvm/lib/Target/AArch64/GISel/AArch64PreLegalizerCombiner.cpp
72	Is there any reason why this doesn't Just Work for vector types too?
79	There's also m_ZeroInt for this.
83	Could also do the optimisation if all the extended bits are known to be zero?
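The last question can be illustrated on plain integers. This is a sketch under stated assumptions, not something from the patch: `KnownZeroMask` and `highBitsKnownZero` are hypothetical stand-ins for known-bits information, and the wide type is assumed to be 64 bits. If every bit above the narrow width is known to be zero, the low bits are zero exactly when the whole value is zero, even in cases where the sign-bit count alone is too small (for example 0x80000000 has only 32 sign bits as a 64-bit value, which does not exceed the 32 bits a truncate to s32 removes).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for known-bits information about a 64-bit value:
// a set bit in Zero means that bit is known to be 0.
struct KnownZeroMask {
  uint64_t Zero;
};

// True when every bit above the low NarrowBits is known to be zero, so a
// compare of the truncated value against zero matches a compare of the
// wide value against zero.
bool highBitsKnownZero(KnownZeroMask Known, unsigned NarrowBits) {
  if (NarrowBits >= 64)
    return true; // Nothing is truncated away.
  uint64_t HighMask = ~0ULL << NarrowBits;
  return (Known.Zero & HighMask) == HighMask;
}
```

With the upper 32 bits known zero, `highBitsKnownZero` accepts a truncate to 32 bits but rejects a truncate to 16 bits, because bits 16 to 31 are not known to be zero.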

foad mentioned this in D95432: AMDGPU/GlobalISel: Combine zext(trunc x) to x after RegBankSelect. Jan 29 2021, 3:02 AM

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64Combine.td	10 lines
llvm/lib/Target/AArch64/GISel/AArch64PreLegalizerCombiner.cpp	54 lines
llvm/test/CodeGen/AArch64/GlobalISel/prelegalizercombiner-icmp-redundant-trunc.mir	107 lines

Diff 319978

llvm/lib/Target/AArch64/AArch64Combine.td

(11 lines not shown)
 include "llvm/Target/GlobalISel/Combine.td"

 def fconstant_to_constant : GICombineRule<
   (defs root:$root),
   (match (wip_match_opcode G_FCONSTANT):$root,
          [{ return matchFConstantToConstant(*${root}, MRI); }]),
   (apply [{ applyFConstantToConstant(*${root}); }])>;

+def icmp_redundant_trunc_matchdata : GIDefMatchData<"Register">;
+def icmp_redundant_trunc : GICombineRule<
+  (defs root:$root, icmp_redundant_trunc_matchdata:$matchinfo),
+  (match (wip_match_opcode G_ICMP):$root,
+         [{ return matchICmpRedundantTrunc(*${root}, MRI, Helper.getKnownBits(), ${matchinfo}); }]),
+  (apply [{ applyICmpRedundantTrunc(*${root}, MRI, B, Observer, ${matchinfo}); }])>;
+
 def AArch64PreLegalizerCombinerHelper: GICombinerHelper<
   "AArch64GenPreLegalizerCombinerHelper", [all_combines,
-                                           fconstant_to_constant]> {
+                                           fconstant_to_constant,
+                                           icmp_redundant_trunc]> {
   let DisableRuleOption = "aarch64prelegalizercombiner-disable-rule";
   let StateClass = "AArch64PreLegalizerCombinerHelperState";
   let AdditionalArguments = [];
 }

 // Matchdata for combines which replace a G_SHUFFLE_VECTOR with a
 // target-specific opcode.
 def shuffle_matchdata : GIDefMatchData<"ShuffleVectorPseudo">;
(112 lines not shown)

llvm/lib/Target/AArch64/GISel/AArch64PreLegalizerCombiner.cpp

(11 lines not shown)
 //===----------------------------------------------------------------------===//

 #include "AArch64TargetMachine.h"
 #include "llvm/CodeGen/GlobalISel/Combiner.h"
 #include "llvm/CodeGen/GlobalISel/CombinerHelper.h"
 #include "llvm/CodeGen/GlobalISel/CombinerInfo.h"
 #include "llvm/CodeGen/GlobalISel/GISelKnownBits.h"
 #include "llvm/CodeGen/GlobalISel/MIPatternMatch.h"
+#include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"
 #include "llvm/CodeGen/MachineDominators.h"
+#include "llvm/CodeGen/MachineFunction.h"
 #include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/CodeGen/MachineRegisterInfo.h"
 #include "llvm/CodeGen/TargetPassConfig.h"
 #include "llvm/Support/Debug.h"

 #define DEBUG_TYPE "aarch64-prelegalizer-combiner"

 using namespace llvm;
 using namespace MIPatternMatch;

(18 lines not shown)
 static void applyFConstantToConstant(MachineInstr &MI) {
   assert(MI.getOpcode() == TargetOpcode::G_FCONSTANT);
   MachineIRBuilder MIB(MI);
   const APFloat &ImmValAPF = MI.getOperand(1).getFPImm()->getValueAPF();
   MIB.buildConstant(MI.getOperand(0).getReg(), ImmValAPF.bitcastToAPInt());
   MI.eraseFromParent();
 }

+/// Try to match a G_ICMP of a G_TRUNC with zero, in which the truncated bits
+/// are sign bits. In this case, we can transform the G_ICMP to directly compare
+/// the wide value with a zero.
+static bool matchICmpRedundantTrunc(MachineInstr &MI, MachineRegisterInfo &MRI,
+                                    GISelKnownBits *KB, Register &MatchInfo) {
+  assert(MI.getOpcode() == TargetOpcode::G_ICMP && KB);
+  auto Pred = (CmpInst::Predicate)MI.getOperand(1).getPredicate();
+  if (Pred != ICmpInst::ICMP_NE && Pred != llvm::CmpInst::ICMP_EQ)

    paquette (Unsubmitted, Not Done):

       auto Pred = (CmpInst::Predicate)MI.getOperand(1).getPredicate();
    -  if (Pred != ICmpInst::ICMP_NE && Pred != llvm::CmpInst::ICMP_EQ)
    +  if (!ICmpInst::isEquality(Pred))
         return false;

    Nit: `ICmpInst` has a helper for this

+    return false;
+  LLT LHSTy = MRI.getType(LHS);
+  if (!LHSTy.isScalar())

    foad (Unsubmitted, Not Done):
    Is there any reason why this doesn't Just Work for vector types too?

+    return false;
+  if (!mi_match(LHS, MRI, m_GTrunc(m_Reg(WideReg))))

    paquette (Unsubmitted, Not Done):
    Minor nit: maybe slightly nicer to pull the other `mi_match` into here

      if (!mi_match(...) || !mi_match(...))
        return false;

+    return false;

    foad (Unsubmitted, Not Done):
    There's also m_ZeroInt for this.

+  if (!mi_match(RHS, MRI, m_SpecificICst(0)))
+    return false;
+  LLT WideTy = MRI.getType(WideReg);

    foad (Unsubmitted, Not Done):
    Could also do the optimisation if all the extended bits are known to be zero?

+  if (KB->computeNumSignBits(WideReg) <=
+      WideTy.getSizeInBits() - LHSTy.getSizeInBits())
+    return false;
+  MatchInfo = WideReg;
+  return true;
+}

+static bool applyICmpRedundantTrunc(MachineInstr &MI, MachineRegisterInfo &MRI,
+                                    MachineIRBuilder &Builder,
+                                    GISelChangeObserver &Observer,
+  assert(MI.getOpcode() == TargetOpcode::G_ICMP);
+  LLT WideTy = MRI.getType(WideReg);
+  // We're going to directly use the wide register as the LHS, and then use an
+  // equivalent size zero for RHS.
+  Builder.setInstrAndDebugLoc(MI);
+  auto WideZero = Builder.buildConstant(WideTy, 0);
+  Observer.changingInstr(MI);
+  MI.getOperand(2).setReg(WideReg);
+  MI.getOperand(3).setReg(WideZero.getReg(0));
+  Observer.changedInstr(MI);
+  return true;
+}
+
 class AArch64PreLegalizerCombinerHelperState {
 protected:
   CombinerHelper &Helper;

 public:
   AArch64PreLegalizerCombinerHelperState(CombinerHelper &Helper)
       : Helper(Helper) {}
 };

(131 lines not shown)

llvm/test/CodeGen/AArch64/GlobalISel/prelegalizercombiner-icmp-redundant-trunc.mir

This file was added.

# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
# RUN: llc -mtriple aarch64-apple-ios -run-pass=aarch64-prelegalizer-combiner %s -o - -verify-machineinstrs | FileCheck %s

# This test checks the optimization to remove the G_TRUNC if we can determine it's redundant.
---
name: icmp_trunc_sextload
tracksRegLiveness: true
body: |
  bb.1:
    liveins: $x0

    ; CHECK-LABEL: name: icmp_trunc_sextload
    ; CHECK: liveins: $x0
    ; CHECK: %v:_(p0) = COPY $x0
    ; CHECK: %load:_(s64) = G_SEXTLOAD %v(p0) :: (load 4)
    ; CHECK: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 0
    ; CHECK: %cmp:_(s1) = G_ICMP intpred(ne), %load(s64), [[C]]
    ; CHECK: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT %cmp(s1)
    ; CHECK: $w0 = COPY [[ANYEXT]](s32)
    ; CHECK: RET_ReallyLR implicit $w0
    %v:_(p0) = COPY $x0
    %load:_(s64) = G_SEXTLOAD %v:_(p0) :: (load 4)
    %trunc:_(s32) = G_TRUNC %load(s64)
    %zero:_(s32) = G_CONSTANT i32 0
    %cmp:_(s1) = G_ICMP intpred(ne), %trunc(s32), %zero
    %5:_(s32) = G_ANYEXT %cmp
    $w0 = COPY %5(s32)
    RET_ReallyLR implicit $w0
...
---
name: icmp_trunc_sextload_eq
tracksRegLiveness: true
body: |
  bb.1:
    liveins: $x0

    ; CHECK-LABEL: name: icmp_trunc_sextload_eq
    ; CHECK: liveins: $x0
    ; CHECK: %v:_(p0) = COPY $x0
    ; CHECK: %load:_(s64) = G_SEXTLOAD %v(p0) :: (load 4)
    ; CHECK: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 0
    ; CHECK: %cmp:_(s1) = G_ICMP intpred(eq), %load(s64), [[C]]
    ; CHECK: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT %cmp(s1)
    ; CHECK: $w0 = COPY [[ANYEXT]](s32)
    ; CHECK: RET_ReallyLR implicit $w0
    %v:_(p0) = COPY $x0
    %load:_(s64) = G_SEXTLOAD %v:_(p0) :: (load 4)
    %trunc:_(s32) = G_TRUNC %load(s64)
    %zero:_(s32) = G_CONSTANT i32 0
    %cmp:_(s1) = G_ICMP intpred(eq), %trunc(s32), %zero
    %5:_(s32) = G_ANYEXT %cmp
    $w0 = COPY %5(s32)
    RET_ReallyLR implicit $w0
...
---
name: icmp_trunc_sextload_wrongpred
tracksRegLiveness: true
body: |
  bb.1:
    liveins: $x0

    ; CHECK-LABEL: name: icmp_trunc_sextload_wrongpred
    ; CHECK: liveins: $x0
    ; CHECK: %v:_(p0) = COPY $x0
    ; CHECK: %load:_(s64) = G_SEXTLOAD %v(p0) :: (load 4)
    ; CHECK: %trunc:_(s32) = G_TRUNC %load(s64)
    ; CHECK: %zero:_(s32) = G_CONSTANT i32 0
    ; CHECK: %cmp:_(s1) = G_ICMP intpred(slt), %trunc(s32), %zero
    ; CHECK: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT %cmp(s1)
    ; CHECK: $w0 = COPY [[ANYEXT]](s32)
    ; CHECK: RET_ReallyLR implicit $w0
    %v:_(p0) = COPY $x0
    %load:_(s64) = G_SEXTLOAD %v:_(p0) :: (load 4)
    %trunc:_(s32) = G_TRUNC %load(s64)
    %zero:_(s32) = G_CONSTANT i32 0
    %cmp:_(s1) = G_ICMP intpred(slt), %trunc(s32), %zero
    %5:_(s32) = G_ANYEXT %cmp
    $w0 = COPY %5(s32)
    RET_ReallyLR implicit $w0
...
---
name: icmp_trunc_sextload_extend_mismatch
tracksRegLiveness: true
body: |
  bb.1:
    liveins: $x0

    ; CHECK-LABEL: name: icmp_trunc_sextload_extend_mismatch
    ; CHECK: liveins: $x0
    ; CHECK: %v:_(p0) = COPY $x0
    ; CHECK: %load:_(s64) = G_SEXTLOAD %v(p0) :: (load 4)
    ; CHECK: %trunc:_(s16) = G_TRUNC %load(s64)
    ; CHECK: %zero:_(s16) = G_CONSTANT i16 0
    ; CHECK: %cmp:_(s1) = G_ICMP intpred(ne), %trunc(s16), %zero
    ; CHECK: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT %cmp(s1)
    ; CHECK: $w0 = COPY [[ANYEXT]](s32)
    ; CHECK: RET_ReallyLR implicit $w0
    %v:_(p0) = COPY $x0
    %load:_(s64) = G_SEXTLOAD %v:_(p0) :: (load 4)
    %trunc:_(s16) = G_TRUNC %load(s64)
    %zero:_(s16) = G_CONSTANT i16 0
    %cmp:_(s1) = G_ICMP intpred(ne), %trunc(s16), %zero
    %5:_(s32) = G_ANYEXT %cmp
    $w0 = COPY %5(s32)
    RET_ReallyLR implicit $w0
...
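
The difference between the folded and unfolded cases in this test comes down to the bail-out comparison in the matcher. Below is a rough sketch of that arithmetic. `minSignBitsOfSext` and `canFold` are hypothetical names, and the sketch assumes that a 4-byte sign-extending load into s64 is reported as having 64 - 32 + 1 = 33 sign bits.

```cpp
#include <cassert>

// Hypothetical helper: the minimum number of sign bits in a value that was
// sign-extended from MemBits to WideBits (assumption: WideBits - MemBits + 1).
constexpr unsigned minSignBitsOfSext(unsigned WideBits, unsigned MemBits) {
  return WideBits - MemBits + 1;
}

// Mirrors the matcher's bail-out: it gives up when
// NumSignBits <= WideBits - NarrowBits, so folding needs strictly more.
constexpr bool canFold(unsigned NumSignBits, unsigned WideBits,
                       unsigned NarrowBits) {
  return NumSignBits > WideBits - NarrowBits;
}

// s64 -> s32 truncate of a 4-byte sextload: 33 > 32, so the fold applies.
static_assert(canFold(minSignBitsOfSext(64, 32), 64, 32), "s32 case folds");
// s64 -> s16 truncate of the same load: 33 <= 48, so the G_TRUNC stays.
static_assert(!canFold(minSignBitsOfSext(64, 32), 64, 16), "s16 case kept");
```

Under that assumption the s32 cases satisfy the condition, while `icmp_trunc_sextload_extend_mismatch` does not, which matches the CHECK lines that keep its G_TRUNC.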
