This is an archive of the discontinued LLVM Phabricator instance.

[GlobalISel][IRTranslator] Canonicalize G_ICMP to have constant operands last
Abandoned · Public

Authored by aemerson on Aug 28 2018, 8:34 AM.


Details

Reviewers

qcolombet
dsanders
rtereshin
igorb
tstellar
javed.absar
bogner
aditya_nandakumar
volkan

Summary

This makes it simpler to select the immediate forms of instructions. At -O1 and higher this already happens at the IR level, but at -O0 the operands end up in arbitrary order.
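To make the rule concrete, here is a minimal, self-contained C++ model of it. This is not LLVM code: `Pred`, `Operand` and `ICmp` are hypothetical stand-ins for `CmpInst::Predicate` and virtual registers. It mirrors the check the patch adds to `translateCompare`: the operands are exchanged, together with the swapped predicate, only when the first operand is a constant and the second is not.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Simplified model of an integer compare. The enumerators mirror the names of
// llvm::CmpInst's integer predicates, but this is not the LLVM type.
enum class Pred { EQ, NE, UGT, UGE, ULT, ULE, SGT, SGE, SLT, SLE };

// Predicate to use once the operands are exchanged: (a < b) == (b > a).
// EQ and NE are symmetric and stay as they are.
Pred swappedPredicate(Pred P) {
  switch (P) {
  case Pred::UGT: return Pred::ULT;
  case Pred::UGE: return Pred::ULE;
  case Pred::ULT: return Pred::UGT;
  case Pred::ULE: return Pred::UGE;
  case Pred::SGT: return Pred::SLT;
  case Pred::SGE: return Pred::SLE;
  case Pred::SLT: return Pred::SGT;
  case Pred::SLE: return Pred::SGE;
  default:        return P; // EQ, NE
  }
}

struct Operand {
  std::string Name; // e.g. "%a" or "42"
  bool IsConstant;  // true for compile-time constants
};

struct ICmp {
  Pred P;
  Operand LHS, RHS;
};

// If only the first operand is a constant, swap the operands and the
// predicate so that the constant ends up last. Otherwise leave it alone.
ICmp canonicalize(ICmp C) {
  if (C.LHS.IsConstant && !C.RHS.IsConstant) {
    C.P = swappedPredicate(C.P);
    std::swap(C.LHS, C.RHS);
  }
  return C;
}
```

With this rule, `icmp sgt i32 42, %a` becomes `slt %a, 42`, which is what the new `int_comparison_cst_op0` test checks for.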

Diff Detail

Repository: rL LLVM

Event Timeline

aemerson created this revision. · Aug 28 2018, 8:34 AM

Herald added a reviewer: javed.absar. · Aug 28 2018, 8:34 AM

Herald added subscribers: kristof.beyls, rovka.

Hi Amara,

That patch worries me, because I feel that we are going to end up pulling in all the LLVM IR code that does canonicalization.
The way I see it, the input IR should already be in canonical form, and thus we shouldn't have to do that.

If that's not the case, I would argue that bad output code is fine (garbage in, garbage out).

What do you think?

Cheers,
-Quentin

In D51362#1216865, @qcolombet wrote:

> That patch worries me, because I feel that we are going to end up pulling in all the LLVM IR code that does canonicalization.
> The way I see it, the input IR should already be in canonical form, and thus we shouldn't have to do that.
>
> If that's not the case, I would argue that bad output code is fine (garbage in, garbage out).
>
> What do you think?

GISel is currently causing some significant code size regressions, and while this isn't the biggest issue, it looked like something one would expect even -O0 to do. FastISel doesn't select immediate forms either; however, when FastISel falls back to SelectionDAG, the comparison between GISel enabled and disabled looks much worse, because SelectionDAG does select immediates, resulting in smaller code and lower register pressure.

If we don't do this here, then I'll extend the hand-written AArch64 G_ICMP instruction selection code instead, because doing so is extremely cheap/free. I'm OK with that at -O0; I just think other targets will end up doing the same thing, so canonicalizing earlier would be an overall code and complexity saver. Other optimizations will be on the way even at -O0 (like jump table creation), because being 30% larger in code size on some CTMark benchmarks is not acceptable at the moment.
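For comparison, a rough sketch of the selector-side alternative, assuming a simplified operand model. All names (`selectICmp`, `Operand`, `Selected`) are hypothetical; this is not the actual AArch64 instruction selector. The legality check follows the AArch64 arithmetic-immediate encoding used by CMP/SUBS (an unsigned 12-bit value, optionally shifted left by 12); negative constants, which could use CMN, are ignored here.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>

// Hypothetical selector-side fallback; not the real AArch64 selector.
// Only the predicates needed for the example are modelled.
enum class Pred { EQ, NE, SGT, SGE, SLT, SLE };

Pred swapped(Pred P) {
  switch (P) {
  case Pred::SGT: return Pred::SLT;
  case Pred::SGE: return Pred::SLE;
  case Pred::SLT: return Pred::SGT;
  case Pred::SLE: return Pred::SGE;
  default:        return P; // EQ and NE are symmetric
  }
}

// AArch64 arithmetic immediate (CMP/SUBS): unsigned 12-bit value,
// optionally shifted left by 12.
bool isLegalArithImmed(std::uint64_t C) {
  return (C >> 12) == 0 || ((C & 0xFFF) == 0 && (C >> 24) == 0);
}

struct Operand {
  unsigned Reg;                       // virtual register
  std::optional<std::uint64_t> Const; // set if Reg holds a known constant
};

struct Selected {
  Pred P;
  unsigned LHSReg;
  std::optional<std::uint64_t> Imm; // set: "cmp LHSReg, #Imm"
  unsigned RHSReg;                  // otherwise: "cmp LHSReg, RHSReg"
};

Selected selectICmp(Pred P, Operand L, Operand R) {
  // The immediate form only accepts the constant as its second operand,
  // so commute (swapping the predicate) when the constant comes first.
  if (L.Const && !R.Const) {
    P = swapped(P);
    std::swap(L, R);
  }
  if (R.Const && isLegalArithImmed(*R.Const))
    return {P, L.Reg, R.Const, 0};
  return {P, L.Reg, std::nullopt, R.Reg};
}
```

Every target with register/immediate compare forms would need an equivalent of the first `if`, which is the duplication the canonicalization is meant to avoid.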

I agree with the motivation but I don’t think this is the right approach. I would prefer we run a canonicalization step even at O0 before GISel rather than duplicating this logic between IR and MI.
Jump tables are a different beast and have their place in here because they don’t really make sense to do on IR.

What I am saying is code duplication is bad :)

I'm not against a separate IR canonicalization stage, even though it's a bit overkill for this particular case. At the moment it looks like InstCombine is doing the canonicalization for this case, and re-running it at -O0 is out of the question. A new pass looks to be on the cards. Anyone else have opinions on this?

I had a similar discussion with Quentin a while ago, and it seemed there were two solutions:

1. Add a pass to canonicalize MIR.
2. Extend MIRBuilder to canonicalize instructions.

I think it makes sense to handle this in MIRBuilder, because other passes may insert non-canonical instructions after a canonicalization pass has run.
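A toy model of the first option, to illustrate the concern about later passes. Everything here (`ICmpInst`, `canonicalizePass`, string predicates, the set of constant vregs) is hypothetical and only models operand order, not real MIR.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy MIR: every instruction is a G_ICMP; operands are vreg names.
// Hypothetical; it only models the operand-ordering problem.
struct ICmpInst {
  std::string Pred; // "eq", "ne", "slt", "sgt", "ult", ...
  std::string LHS, RHS;
};

using ConstSet = std::set<std::string>; // vregs defined by G_CONSTANT

std::string swappedPred(const std::string &P) {
  if (P == "slt") return "sgt";
  if (P == "sgt") return "slt";
  if (P == "sle") return "sge";
  if (P == "sge") return "sle";
  if (P == "ult") return "ugt";
  if (P == "ugt") return "ult";
  if (P == "ule") return "uge";
  if (P == "uge") return "ule";
  return P; // eq and ne are symmetric
}

bool needsSwap(const ICmpInst &I, const ConstSet &Consts) {
  return Consts.count(I.LHS) && !Consts.count(I.RHS);
}

// Option 1: a standalone pass that rewrites every compare once.
void canonicalizePass(std::vector<ICmpInst> &Insts, const ConstSet &Consts) {
  for (ICmpInst &I : Insts)
    if (needsSwap(I, Consts)) {
      I.Pred = swappedPred(I.Pred);
      std::swap(I.LHS, I.RHS);
    }
}

bool isCanonical(const std::vector<ICmpInst> &Insts, const ConstSet &Consts) {
  for (const ICmpInst &I : Insts)
    if (needsSwap(I, Consts))
      return false;
  return true;
}
```

After `canonicalizePass` runs, a compare appended later with the constant first (for example `{"ult", "%c42", "%b"}`) makes `isCanonical` false again: a one-shot pass only guarantees the form at the moment it runs.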

In D51362#1218394, @aemerson wrote:

> I'm not against a separate IR canonicalization stage, even though it's a bit overkill for this particular case. At the moment it looks like InstCombine is doing the canonicalization for this case, and re-running it at -O0 is out of the question. A new pass looks to be on the cards. Anyone else have opinions on this?

Does that mean that something after the last InstCombine produces icmps with non-canonical operand order? If so, maybe that something needs to be chased down and fixed instead.

In D51362#1218451, @rtereshin wrote:

> Does that mean that something after the last InstCombine produces icmps with non-canonical operand order? If so, maybe that something needs to be chased down and fixed instead.

No, I meant that at -O1 InstCombine is the pass that currently does the transformation. At -O0 InstCombine doesn't run at all; it's codegen that handles it for FastISel/SelectionDAG.

In D51362#1218451, @rtereshin wrote:

> Does that mean that something after the last InstCombine produces icmps with non-canonical operand order? If so, maybe that something needs to be chased down and fixed instead.

I like the idea of putting this into MIRBuilder then.

If IRTranslator itself uses a canonicalizing MIRBuilder, it will translate IR into canonical MIR seamlessly, with no evident increase in IRTranslator's complexity, as it's all encapsulated within the builder.

Then using the same builder everywhere else, including combines, will ensure that the canonical form is maintained across the GlobalISel pipeline. For instance, a combine doesn't need to worry explicitly about breaking the form, because it relies on the MIRBuilder, while the very next combine can rely on the MIR being in canonical form.
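A minimal sketch of what a canonicalizing builder could look like. `CanonicalizingBuilder`, `createVReg` and the other names are hypothetical, not the `MachineIRBuilder` API, and the sketch assumes the builder only knows about constants it built itself; a real implementation would look up each vreg's definition instead.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a canonicalizing MachineIRBuilder.
struct Inst {
  std::string Opcode;         // "G_CONSTANT" or "G_ICMP"
  std::string Pred;           // compare predicate, empty otherwise
  unsigned Def;               // vreg defined by this instruction
  std::vector<unsigned> Uses; // vreg operands
  std::int64_t Imm = 0;       // value of a G_CONSTANT
};

class CanonicalizingBuilder {
  std::vector<Inst> Insts;
  std::map<unsigned, std::int64_t> ConstDefs; // vreg -> constant value
  unsigned NextReg = 0;

  static std::string swappedPred(const std::string &P) {
    static const std::map<std::string, std::string> Swap = {
        {"slt", "sgt"}, {"sgt", "slt"}, {"sle", "sge"}, {"sge", "sle"},
        {"ult", "ugt"}, {"ugt", "ult"}, {"ule", "uge"}, {"uge", "ule"}};
    auto It = Swap.find(P);
    return It == Swap.end() ? P : It->second; // eq/ne are symmetric
  }

public:
  unsigned createVReg() { return NextReg++; }

  unsigned buildConstant(std::int64_t V) {
    unsigned R = createVReg();
    Insts.push_back({"G_CONSTANT", "", R, {}, V});
    ConstDefs[R] = V;
    return R;
  }

  // Every client (IRTranslator, combines, ...) goes through here, so the
  // constant-last form is established once and preserved automatically.
  unsigned buildICmp(std::string Pred, unsigned LHS, unsigned RHS) {
    if (ConstDefs.count(LHS) && !ConstDefs.count(RHS)) {
      Pred = swappedPred(Pred);
      std::swap(LHS, RHS);
    }
    unsigned R = createVReg();
    Insts.push_back({"G_ICMP", Pred, R, {LHS, RHS}});
    return R;
  }

  const std::vector<Inst> &insts() const { return Insts; }
};
```

A combine that matched its operands in the "wrong" order and calls `buildICmp("sgt", C, A)` with `C` a constant gets `slt A, C` without doing anything special itself.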

In D51362#1218504, @rtereshin wrote:

> I like the idea of putting this into MIRBuilder then.
>
> If IRTranslator itself uses a canonicalizing MIRBuilder, it will translate IR into canonical MIR seamlessly, with no evident increase in IRTranslator's complexity, as it's all encapsulated within the builder.
>
> Then using the same builder everywhere else, including combines, will ensure that the canonical form is maintained across the GlobalISel pipeline. For instance, a combine doesn't need to worry explicitly about breaking the form, because it relies on the MIRBuilder, while the very next combine can rely on the MIR being in canonical form.

I guess first we'll need to define what exactly canonicalization means for this - are we talking beyond just putting constants at the end?

> Does that mean that something after the last InstCombine produces icmps with non-canonical operand order? If so, maybe that something needs to be chased down and fixed instead.

That's exactly the point :).

I understand that O0 doesn't run InstCombine, hence the problem. It may be a good idea to identify what is canonicalization and what is optimization in InstCombine, and separate the two so that the canonicalization part can run at O0.

Regarding doing this in the MIRBuilder: that option has my preference, given that all MIR passes can benefit from it. That said, I would like us to make a conscious choice of what we put there, again to avoid code duplication as much as possible.
Basically, what I am saying is that we should ask ourselves: assuming the input IR is canonical, is this a pattern we may introduce ourselves, and therefore one we need to clean up one way or another? I would tend to think that when we create instructions with constants, we can discipline ourselves not to put them on the LHS.

Ideally, we could express the canonicalization rules in a higher-level language and use them for both IR and MIR. However, we don't want GISel's progress to depend on such a thing.
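One way to picture "write the rule once, use it for IR and MIR", sketched with C++ templates rather than a dedicated rule language: the rule is written a single time and instantiated for two operand representations that differ only in how constness is discovered. All types here are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// Hypothetical sketch: one canonicalization rule, instantiated for two
// operand representations. None of these types are LLVM's.
enum class Pred { EQ, NE, SLT, SGT, SLE, SGE };

Pred swapped(Pred P) {
  switch (P) {
  case Pred::SLT: return Pred::SGT;
  case Pred::SGT: return Pred::SLT;
  case Pred::SLE: return Pred::SGE;
  case Pred::SGE: return Pred::SLE;
  default:        return P; // EQ, NE
  }
}

// The shared rule: constants go last and the predicate is swapped to match.
// IsConst tells this representation whether an operand is a constant.
// Returns true if the compare was rewritten.
template <typename Operand, typename IsConstFn>
bool canonicalizeCompare(Pred &P, Operand &L, Operand &R, IsConstFn IsConst) {
  if (!IsConst(L) || IsConst(R))
    return false;
  P = swapped(P);
  std::swap(L, R);
  return true;
}

// "IR" flavour: operands carry their own constness.
struct IRValue {
  std::string Name;
  bool IsConstant;
};

// "MIR" flavour: operands are vreg numbers and constness is a side lookup,
// standing in for "is this vreg defined by a G_CONSTANT?".
struct MIRConstants {
  std::set<unsigned> ConstVRegs;
  bool operator()(unsigned Reg) const { return ConstVRegs.count(Reg) != 0; }
};
```

The rule body exists once; only the `IsConst` callable differs between the two instantiations.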

> I would tend to think that when we create instructions with constants, we can discipline ourselves not to put them on the LHS.

I'm not sure about this bit. Say it's a mid-GlobalISel combine we're talking about. It matches a number of vregs along the boundary of the pattern it's supposed to match, and then uses them to generate an equivalent sequence. All it has matched is a bunch of vregs from different places; it may not know which are constant and which are not, so how would it decide in which order to put those vregs in the generated instructions without explicitly checking for constants?
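The explicit check described here could look roughly like the following, assuming a simple vreg-to-definition table. The table and function names are hypothetical (LLVM has its own utilities for this, whose API is not reproduced here). The lookup follows COPYs back to the defining instruction and treats a G_CONSTANT as constant.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical def table: vreg -> defining instruction. Not LLVM's API.
struct Def {
  std::string Opcode;   // "G_CONSTANT", "COPY", or anything else
  std::int64_t Imm = 0; // value, if Opcode == "G_CONSTANT"
  unsigned Src = 0;     // source vreg, if Opcode == "COPY"
};

using DefMap = std::map<unsigned, Def>;

// Follow COPYs back to the defining instruction and report its value if it
// is a G_CONSTANT.
std::optional<std::int64_t> getConstantValue(const DefMap &Defs, unsigned Reg) {
  // Bounded walk so a malformed COPY cycle cannot loop forever.
  for (std::size_t Steps = 0; Steps <= Defs.size(); ++Steps) {
    auto It = Defs.find(Reg);
    if (It == Defs.end())
      return std::nullopt; // no def recorded, e.g. a function argument
    const Def &D = It->second;
    if (D.Opcode == "G_CONSTANT")
      return D.Imm;
    if (D.Opcode != "COPY")
      return std::nullopt;
    Reg = D.Src;
  }
  return std::nullopt;
}

// Constant-last ordering decided from the vregs alone.
bool shouldSwap(const DefMap &Defs, unsigned LHS, unsigned RHS) {
  return getConstantValue(Defs, LHS).has_value() &&
         !getConstantValue(Defs, RHS).has_value();
}
```

This is the per-combine cost being discussed: every producer that orders operands would need to call something like `shouldSwap` unless the builder does it centrally.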

> how would it decide in which order to put those vregs in the generated instructions without explicitly checking for constants?

Fair point.
I was thinking more of target-specific nodes, where you'll have to check anyway, otherwise you cannot match the immediate variant.

Going back to a higher-level description for all that stuff, I wonder if ISel could already handle this.

Indeed, it seems easy, at least conceptually, to teach tablegen to check for the commuted patterns to get the immediate version whenever possible before trying the non-immediate variant.

In D51362#1219573, @qcolombet wrote:

> Going back to a higher-level description for all that stuff, I wonder if ISel could already handle this.
>
> Indeed, it seems easy, at least conceptually, to teach tablegen to check for the commuted patterns to get the immediate version whenever possible before trying the non-immediate variant.

It can't do that all the time: with these G_ICMPs, for example, it also has to swap the predicate (not that we even have imported patterns that could possibly match for AArch64).

So overall I think we should have some form of IR canonicalization before translation for the majority of cases to be handled for -O0.

If the legalizer or combiner is causing issues with introducing non-canonical forms then I think we should either fix them, or if that's not feasible, we accept the worse codegen at -O0 and introduce re-canonicalization either through MIRBuilder or a separate pass when optimizations are enabled.
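The predicate point can be checked mechanically. This sketch uses a hypothetical `Pred` enum modelled on the LLVM integer predicates and 8-bit values. It verifies exhaustively that commuting the operands of a compare is correct only together with the swapped predicate; reusing the original predicate, or confusing "swapped" with "inverse", gives wrong results.

```cpp
#include <cassert>
#include <cstdint>

enum class Pred { EQ, NE, UGT, UGE, ULT, ULE, SGT, SGE, SLT, SLE };

// Reference semantics of each predicate on 8-bit values.
bool eval(Pred P, std::int8_t A, std::int8_t B) {
  const auto UA = static_cast<std::uint8_t>(A);
  const auto UB = static_cast<std::uint8_t>(B);
  switch (P) {
  case Pred::EQ:  return A == B;
  case Pred::NE:  return A != B;
  case Pred::UGT: return UA > UB;
  case Pred::UGE: return UA >= UB;
  case Pred::ULT: return UA < UB;
  case Pred::ULE: return UA <= UB;
  case Pred::SGT: return A > B;
  case Pred::SGE: return A >= B;
  case Pred::SLT: return A < B;
  case Pred::SLE: return A <= B;
  }
  return false;
}

Pred swapped(Pred P) {
  switch (P) {
  case Pred::UGT: return Pred::ULT;
  case Pred::UGE: return Pred::ULE;
  case Pred::ULT: return Pred::UGT;
  case Pred::ULE: return Pred::UGE;
  case Pred::SGT: return Pred::SLT;
  case Pred::SGE: return Pred::SLE;
  case Pred::SLT: return Pred::SGT;
  case Pred::SLE: return Pred::SGE;
  default:        return P; // EQ, NE
  }
}

// True if Q(B, A) equals P(A, B) for every pair of 8-bit values, i.e. Q is
// a correct predicate for the commuted pattern.
bool validAfterCommute(Pred P, Pred Q) {
  for (int A = -128; A <= 127; ++A)
    for (int B = -128; B <= 127; ++B) {
      const auto SA = static_cast<std::int8_t>(A);
      const auto SB = static_cast<std::int8_t>(B);
      if (eval(P, SA, SB) != eval(Q, SB, SA))
        return false;
    }
  return true;
}
```

Only EQ and NE survive commuting unchanged; every ordered predicate needs its swapped counterpart, which is the extra knowledge a commuted TableGen pattern would have to encode.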

> It can't do that all the time: with these G_ICMPs, for example, it also has to swap the predicate (not that we even have imported patterns that could possibly match for AArch64).

Why not?
I understand G_ICMPs are special, but we could come up with whatever complicated logic in TableGen.

> So overall I think we should have some form of IR canonicalization before translation for the majority of cases to be handled for -O0.
>
> If the legalizer or combiner is causing issues with introducing non-canonical forms then I think we should either fix them, or if that's not feasible, we accept the worse codegen at -O0 and introduce re-canonicalization either through MIRBuilder or a separate pass when optimizations are enabled.

Agreed.

In D51362#1221028, @qcolombet wrote:

> Why not?
> I understand G_ICMPs are special, but we could come up with whatever complicated logic in TableGen.

Some forms of an instruction, like certain G_ICMP predicates, may not be legal, and the target might want to transform them into a sequence of compares, for example.

If we agree that an input canonicalization pass is needed then doing this there adds much less complexity than teaching tablegen that G_ICMPs are special (or somehow adding legalisation rules to tablegen?).
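As an illustration of the legality point, assume a hypothetical target whose compare supports only EQ, SLT and ULT. The remaining predicates can then be expanded into a sequence of legal compares whose results are ORed, some with commuted operands. This sketches the idea only; it is not how any LLVM target actually legalizes G_ICMP.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical target whose compare only supports EQ, SLT and ULT.
enum class Pred { EQ, NE, UGT, UGE, ULT, ULE, SGT, SGE, SLT, SLE };

// One legal compare; Commute means "compare (B, A) instead of (A, B)".
struct LegalCmp {
  Pred P;
  bool Commute;
};

// Rewrite P(A, B) as an OR of legal compares.
std::vector<LegalCmp> expand(Pred P) {
  switch (P) {
  case Pred::EQ:  return {{Pred::EQ, false}};
  case Pred::NE:  return {{Pred::ULT, false}, {Pred::ULT, true}}; // a<b || b<a
  case Pred::ULT: return {{Pred::ULT, false}};
  case Pred::UGT: return {{Pred::ULT, true}};
  case Pred::ULE: return {{Pred::ULT, false}, {Pred::EQ, false}};
  case Pred::UGE: return {{Pred::ULT, true}, {Pred::EQ, false}};
  case Pred::SLT: return {{Pred::SLT, false}};
  case Pred::SGT: return {{Pred::SLT, true}};
  case Pred::SLE: return {{Pred::SLT, false}, {Pred::EQ, false}};
  case Pred::SGE: return {{Pred::SLT, true}, {Pred::EQ, false}};
  }
  return {};
}

// Reference semantics on 8-bit values.
bool eval(Pred P, std::int8_t A, std::int8_t B) {
  const auto UA = static_cast<std::uint8_t>(A);
  const auto UB = static_cast<std::uint8_t>(B);
  switch (P) {
  case Pred::EQ:  return A == B;
  case Pred::NE:  return A != B;
  case Pred::UGT: return UA > UB;
  case Pred::UGE: return UA >= UB;
  case Pred::ULT: return UA < UB;
  case Pred::ULE: return UA <= UB;
  case Pred::SGT: return A > B;
  case Pred::SGE: return A >= B;
  case Pred::SLT: return A < B;
  case Pred::SLE: return A <= B;
  }
  return false;
}

// What the target would compute: evaluate each legal compare and OR them.
bool evalExpanded(Pred P, std::int8_t A, std::int8_t B) {
  bool Result = false;
  for (const LegalCmp &C : expand(P))
    Result = Result || (C.Commute ? eval(C.P, B, A) : eval(C.P, A, B));
  return Result;
}
```

Rules like these depend on per-target legality, which is part of why encoding them in TableGen patterns is harder than canonicalizing the input once.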

> If we agree that an input canonicalization pass is needed then doing this there adds much less complexity than teaching tablegen that G_ICMPs are special (or somehow adding legalisation rules to tablegen?).

Agree.

aemerson mentioned this in D51953: [GlobalISel] Add a new IR canonicalization pass. · Sep 11 2018, 3:00 PM

I'd like to revive this patch, as the other IR canonicalization pass is much more controversial.

I still believe we shouldn't do this here, and more generally that O0 should stay away from transforming the code.

My main concern is if we allow this kind of combine here, where do we draw the line for future combines?
Thus, I would rather keep this pass dumb and bite the bullet at O0.

aemerson abandoned this revision. · Jan 24 2019, 2:20 PM

Revision Contents

Path                                                     Size
lib/CodeGen/GlobalISel/IRTranslator.cpp                  10 lines
test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll    16 lines

Diff 162880

lib/CodeGen/GlobalISel/IRTranslator.cpp

[298 unchanged lines hidden · in bool IRTranslator::translateCompare(const User &U,]
                                     MachineIRBuilder &MIRBuilder) {
   const CmpInst *CI = dyn_cast<CmpInst>(&U);
   unsigned Op0 = getOrCreateVReg(*U.getOperand(0));
   unsigned Op1 = getOrCreateVReg(*U.getOperand(1));
   unsigned Res = getOrCreateVReg(U);
   CmpInst::Predicate Pred =
       CI ? CI->getPredicate() : static_cast<CmpInst::Predicate>(
                                     cast<ConstantExpr>(U).getPredicate());
-  if (CmpInst::isIntPredicate(Pred))
+  if (CmpInst::isIntPredicate(Pred)) {
+    // Try to canonicalize constant operands to be last.
+    if (isa<Constant>(U.getOperand(0)) && !isa<Constant>(U.getOperand(1))) {
+      Pred = CmpInst::getSwappedPredicate(Pred);
+      std::swap(Op0, Op1);
+    }
     MIRBuilder.buildICmp(Pred, Res, Op0, Op1);
+  }
   else if (Pred == CmpInst::FCMP_FALSE)
     MIRBuilder.buildCopy(
         Res, getOrCreateVReg(*Constant::getNullValue(CI->getType())));
   else if (Pred == CmpInst::FCMP_TRUE)
     MIRBuilder.buildCopy(
         Res, getOrCreateVReg(*Constant::getAllOnesValue(CI->getType())));
   else
     MIRBuilder.buildFCmp(Pred, Res, Op0, Op1);
[55 unchanged lines hidden · in bool IRTranslator::translateSwitch(const User &U,]
   const SwitchInst &SwInst = cast<SwitchInst>(U);
   const unsigned SwCondValue = getOrCreateVReg(*SwInst.getCondition());
   const BasicBlock *OrigBB = SwInst.getParent();

   LLT LLTi1 = getLLTForType(Type::getInt1Ty(U.getContext()), DL);
   for (auto &CaseIt : SwInst.cases()) {
     const unsigned CaseValueReg = getOrCreateVReg(*CaseIt.getCaseValue());
     const unsigned Tst = MRI->createGenericVirtualRegister(LLTi1);
-    MIRBuilder.buildICmp(CmpInst::ICMP_EQ, Tst, CaseValueReg, SwCondValue);
+    MIRBuilder.buildICmp(CmpInst::ICMP_EQ, Tst, SwCondValue, CaseValueReg);
     MachineBasicBlock &CurMBB = MIRBuilder.getMBB();
     const BasicBlock *TrueBB = CaseIt.getCaseSuccessor();
     MachineBasicBlock &TrueMBB = getMBB(*TrueBB);

     MIRBuilder.buildBrCond(Tst, TrueMBB);
     CurMBB.addSuccessor(&TrueMBB);
     addMachineCFGPred({OrigBB, TrueBB}, &CurMBB);

[1,337 unchanged lines hidden]

test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll

[135 unchanged lines hidden]
 ; CHECK: bb.{{[a-zA-Z0-9.]+}}:
 ; CHECK-NEXT: successors: %[[BB_CASE100:bb.[0-9]+]](0x40000000), %[[BB_NOTCASE100_CHECKNEXT:bb.[0-9]+]](0x40000000)
 ; CHECK: %0:_(s32) = COPY $w0
 ; CHECK: %[[reg100:[0-9]+]]:_(s32) = G_CONSTANT i32 100
 ; CHECK: %[[reg200:[0-9]+]]:_(s32) = G_CONSTANT i32 200
 ; CHECK: %[[reg2:[0-9]+]]:_(s32) = G_CONSTANT i32 2
 ; CHECK: %[[reg1:[0-9]+]]:_(s32) = G_CONSTANT i32 1
 ; CHECK: %[[reg0:[0-9]+]]:_(s32) = G_CONSTANT i32 0
-; CHECK: %[[regicmp100:[0-9]+]]:_(s1) = G_ICMP intpred(eq), %[[reg100]](s32), %0
+; CHECK: %[[regicmp100:[0-9]+]]:_(s1) = G_ICMP intpred(eq), %0(s32), %[[reg100]]
 ; CHECK: G_BRCOND %[[regicmp100]](s1), %[[BB_CASE100]]
 ; CHECK: G_BR %[[BB_NOTCASE100_CHECKNEXT]]
 ;
 ; CHECK: [[BB_NOTCASE100_CHECKNEXT]].{{[a-zA-Z0-9.]+}}:
 ; CHECK-NEXT: successors: %[[BB_CASE200:bb.[0-9]+]](0x40000000), %[[BB_NOTCASE200_CHECKNEXT:bb.[0-9]+]](0x40000000)
-; CHECK: %[[regicmp200:[0-9]+]]:_(s1) = G_ICMP intpred(eq), %[[reg200]](s32), %0
+; CHECK: %[[regicmp200:[0-9]+]]:_(s1) = G_ICMP intpred(eq), %0(s32), %[[reg200]]
 ; CHECK: G_BRCOND %[[regicmp200]](s1), %[[BB_CASE200]]
 ; CHECK: G_BR %[[BB_NOTCASE200_CHECKNEXT]]
 ;
 ; CHECK: [[BB_NOTCASE200_CHECKNEXT]].{{[a-zA-Z0-9.]+}}:
 ; CHECK-NEXT: successors: %[[BB_DEFAULT:bb.[0-9]+]](0x80000000)
 ; CHECK: G_BR %[[BB_DEFAULT]]
 ;
 ; CHECK: [[BB_DEFAULT]].{{[a-zA-Z0-9.]+}}:
[593 unchanged lines hidden]
 ; CHECK: [[TST:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[LHS]](s32), [[RHS]]
 ; CHECK: G_STORE [[TST]](s1), [[ADDR]](p0)
 define void @int_comparison(i32 %a, i32 %b, i1* %addr) {
   %res = icmp ne i32 %a, %b
   store i1 %res, i1* %addr
   ret void
 }

+; CHECK-LABEL: name: int_comparison_cst_op0
+; CHECK: [[LHS:%[0-9]+]]:_(s32) = COPY $w0
+; CHECK: [[ADDR:%[0-9]+]]:_(p0) = COPY $x1
+; CHECK: [[CST:%[0-9]+]]:_(s32) = G_CONSTANT i32 42
+; CHECK: [[TST:%[0-9]+]]:_(s1) = G_ICMP intpred(slt), [[LHS]](s32), [[CST]]
+; CHECK: G_STORE [[TST]](s1), [[ADDR]](p0)
+define void @int_comparison_cst_op0(i32 %a, i1* %addr) {
+  %res = icmp sgt i32 42, %a
+  store i1 %res, i1* %addr
+  ret void
+}
+
 ; CHECK-LABEL: name: ptr_comparison
 ; CHECK: [[LHS:%[0-9]+]]:_(p0) = COPY $x0
 ; CHECK: [[RHS:%[0-9]+]]:_(p0) = COPY $x1
 ; CHECK: [[ADDR:%[0-9]+]]:_(p0) = COPY $x2
 ; CHECK: [[TST:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[LHS]](p0), [[RHS]]
 ; CHECK: G_STORE [[TST]](s1), [[ADDR]](p0)
 define void @ptr_comparison(i8* %a, i8* %b, i1* %addr) {
   %res = icmp eq i8* %a, %b
[1,444 unchanged lines hidden]