This is an archive of the discontinued LLVM Phabricator instance.


Differential D40074

[GISel] Canonicalize constants to RHS for commutative operations
Abandoned · Public

Authored by rovka on Nov 15 2017, 5:51 AM.


Details

Reviewers

t.p.northover
dsanders

Summary

Canonicalize (op imm, reg) to (op reg, imm) in the IRTranslator for
commutative operations.

This makes it easier to match sequences with constants in future passes,
since we don't need to check both possible orders. It is particularly
important for TableGen, since it only generates code to match (op reg,
imm), and there's no point in making it do any extra work to check for
the other variant too.

This change makes it possible for the ARM backend to select e.g.
'ADDri %reg, %imm' for both 'add i32 %reg, %imm' and
'add i32 %imm, %reg', whereas previously for the latter it would
materialize %imm in a register and generate an ADDrr. It should have
similar benefits for the other backends.
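
As a rough illustration of the transformation described in this summary, the sketch below models it outside LLVM. The `Operand`, `Inst`, `isCommutative` and `canonicalize` names are hypothetical stand-ins invented for this example, and the opcode list is an assumption; the patch itself works on virtual registers inside `IRTranslator::translateBinaryOp`.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical toy model, not LLVM's MachineInstr/MachineOperand.
struct Operand {
  bool IsImm;       // true for a constant, false for a register
  std::string Name; // e.g. "%x" or "16"
};

struct Inst {
  std::string Opcode; // e.g. "G_ADD"
  Operand LHS, RHS;
};

// Assumption for this sketch: a fixed list of commutative opcodes.
bool isCommutative(const std::string &Opcode) {
  return Opcode == "G_ADD" || Opcode == "G_MUL" || Opcode == "G_AND" ||
         Opcode == "G_OR" || Opcode == "G_XOR";
}

// Move a lone constant to the RHS of a commutative operation:
// (op imm, reg) becomes (op reg, imm); everything else is left alone.
Inst canonicalize(Inst I) {
  if (isCommutative(I.Opcode) && I.LHS.IsImm && !I.RHS.IsImm)
    std::swap(I.LHS, I.RHS);
  // Postcondition: no commutative op keeps a lone constant on the LHS.
  assert(!(isCommutative(I.Opcode) && I.LHS.IsImm && !I.RHS.IsImm));
  return I;
}
```

With this model, `add 16, %x` comes out as `G_ADD %x, 16`, while `sub 16, %x` is unchanged because subtraction is not commutative.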

Diff Detail

Event Timeline

rovka created this revision. Nov 15 2017, 5:51 AM

Herald added subscribers: kristof.beyls, igorb, javed.absar, aemerson. Nov 15 2017, 5:51 AM

We have canonicalization in other parts of LLVM too - even though it's not an area I understand very well.
If this is the first canonicalization (of potentially many to come) in the IRTranslator, would it make sense to try and document the canonical representation(s) somewhere and/or somehow?

In D40074#926085, @kristof.beyls wrote:

If this is the first canonicalization (of potentially many to come) in the IRTranslator, would it make sense to try and document the canonical representation(s) somewhere and/or somehow?

Even better would be a pass that verified canonical order, that could be inserted between other passes in debug mode to verify that optimisations preserve canonical-ness

Hi Diana,

It is particularly important for TableGen, since it only generates code to match (op reg,
imm), and there's no point in making it do any extra work to check for
the other variant too.

That's odd. TableGen appears to handle some commutativity but maybe the cases I've seen are special somehow. For example AArch64 emits these:

// (add:{ *:[i32] } GPR32sp:{ *:[i32] }:$Rn, addsub_shifted_imm32:{ *:[i32] }:$imm)  =>  (ADDWri:{ *:[i32] } GPR32sp:{ *:[i32] }:$Rn, addsub_shifted_imm32:{ *:[i32] }:$imm)
// (add:{ *:[i32] } addsub_shifted_imm32:{ *:[i32] }:$imm, GPR32sp:{ *:[i32] }:$Rn)  =>  (ADDWri:{ *:[i32] } GPR32sp:{ *:[i32] }:$Rn, addsub_shifted_imm32:{ *:[i32] }:$imm)

I believe both of these originate from the pattern in BaseAddSubImm. I also vaguely remember seeing some code that checked whether commutativity made a difference before adding a second TreePattern to handle the commutative case.

I'll see if I can spot why ARM's case is different.

I see why ARM is different from AArch64. ARM only sees one pattern being emitted because TableGen has some code that prevents commutativity being considered when one of the children of a commutative operator is either the imm operator or a plain integer (see OnlyOnRHSOfCommutative() which is used in canPatternMatch()). Meanwhile, AArch64's case is using a ComplexPattern so both cases get emitted.

So it appears that TableGen considers the commutative pattern but chooses to ignore the second pattern because it knows there's some canonicalization taking place.
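
The behaviour described in this comment can be modelled with a small sketch. This is not TableGen's code: `Kind`, `Pattern` and `expandCommutative` are hypothetical names, the operator is assumed to be commutative, and the `CullImmOnLHS` flag stands in for the assumption that an earlier stage has already moved plain immediates to the RHS.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy model of a two-operand selection pattern.
enum class Kind { Reg, Imm, Complex };

struct Pattern {
  std::string Op; // assumed to name a commutative operator
  Kind LHS, RHS;
};

// Returns the pattern plus its commuted variant, unless the variant is culled.
// With CullImmOnLHS set, a variant with a plain immediate on the LHS is
// dropped, on the assumption that canonicalization makes it unreachable.
std::vector<Pattern> expandCommutative(const Pattern &P, bool CullImmOnLHS) {
  assert(!P.Op.empty() && "pattern needs an operator");
  std::vector<Pattern> Out{P};
  if (P.LHS == P.RHS)
    return Out; // swapping would yield an identical pattern
  Pattern Swapped{P.Op, P.RHS, P.LHS};
  if (CullImmOnLHS && Swapped.LHS == Kind::Imm)
    return Out; // culled: relies on immediates having been moved to the RHS
  Out.push_back(Swapped);
  return Out;
}
```

In this model a `(add reg, imm)` pattern yields a single rule when culling is on, whereas a `(add reg, complex)` pattern yields both orders, which mirrors the ARM versus AArch64 difference noted above.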

In D40074#926470, @grandinj wrote:

In D40074#926085, @kristof.beyls wrote:

If this is the first canonicalization (of potentially many to come) in the IRTranslator, would it make sense to try and document the canonical representation(s) somewhere and/or somehow?

Even better would be a pass that verified canonical order, that could be inserted between other passes in debug mode to verify that optimisations preserve canonical-ness

I agree, documentation is good but it's far too easy to let it slip out of date. Would the MachineVerifier be a good place for this kind of check?
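
A debug-mode check of the kind suggested here might look roughly like the sketch below. It is a toy model, not MachineVerifier code: the `Inst` struct and both function names are hypothetical, and the only rule checked is the constant-on-RHS ordering discussed in this revision.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy instruction, not LLVM's MachineInstr.
struct Inst {
  bool Commutative; // whether the two operands may be swapped
  bool LHSIsImm;    // left operand is a constant
  bool RHSIsImm;    // right operand is a constant
};

// An instruction is canonical unless it is commutative and has a lone
// constant on the LHS.
bool isCanonical(const Inst &I) {
  return !(I.Commutative && I.LHSIsImm && !I.RHSIsImm);
}

// Debug-only check meant to run between passes; the assert is compiled out
// when NDEBUG is defined, so release builds pay nothing for it.
void verifyCanonical(const std::vector<Inst> &Insts) {
  for (const Inst &I : Insts) {
    assert(isCanonical(I) && "constant on LHS of commutative operation");
    (void)I; // avoids an unused-variable warning when NDEBUG is defined
  }
}
```

Keeping the predicate separate from the asserting wrapper means the same rule could back both a verifier and any documentation of the canonical form.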
For the record, I'm not sure this is the first canonicalization in the IRTranslator; for instance, the lowering of -0.0 - X to G_FNEG instead of just another G_FSUB could be seen as a canonicalization too.

In D40074#926496, @dsanders wrote:

I see why ARM is different from AArch64. ARM only sees one pattern being emitted because TableGen has some code that prevents commutativity being considered when one of the children of a commutative operator is either the imm operator or a plain integer (see OnlyOnRHSOfCommutative() which is used in canPatternMatch()). Meanwhile, AArch64's case is using a ComplexPattern so both cases get emitted.

So it appears that TableGen considers the commutative pattern but chooses to ignore the second pattern because it knows there's some canonicalization taking place.

Yes, the canonicalization takes place in the target-independent part of the SelectionDAG code (SDValue SelectionDAG::getNode(unsigned Opcode, const SDLoc &DL, EVT VT, SDValue N1, SDValue N2, const SDNodeFlags Flags)).

I guess one advantage that DAGISel has over GlobalISel is that the getNode helpers are used throughout instruction selection. If we wanted the same kind of behaviour for GlobalISel, we'd have to put this canonicalization in the MachineInstrBuilder, but that would affect the whole backend.

I think for now it's better to keep this in the IRTranslator since it's the least disruptive thing to do, and any subsequent passes (e.g. combiners and whatnot) can benefit from having a canonical form to work with most of the time. An alternative would be to move these things into their own canonicalization pass that we can then schedule whenever we see fit, but that might be a bit overkill since adding canonicalizations isn't really a priority at this point.

In D40074#926496, @dsanders wrote:

I see why ARM is different from AArch64. ARM only sees one pattern being emitted because TableGen has some code that prevents commutativity being considered when one of the children of a commutative operator is either the imm operator or a plain integer (see OnlyOnRHSOfCommutative() which is used in canPatternMatch()). Meanwhile, AArch64's case is using a ComplexPattern so both cases get emitted.

So it appears that TableGen considers the commutative pattern but chooses to ignore the second pattern because it knows there's some canonicalization taking place.

Yes, the canonicalization takes place in the target-independent part of the SelectionDAG code (SDValue SelectionDAG::getNode(unsigned Opcode, const SDLoc &DL, EVT VT, SDValue N1, SDValue N2, const SDNodeFlags Flags)).

I guess one advantage that DAGISel has over GlobalISel is that the getNode helpers are used throughout instruction selection. If we wanted the same kind of behaviour for GlobalISel, we'd have to put this canonicalization in the MachineInstrBuilder, but that would affect the whole backend.

I think for now it's better to keep this in the IRTranslator since it's the least disruptive thing to do, and any subsequent passes (e.g. combiners and whatnot) can benefit from having a canonical form to work with most of the time. An alternative would be to move these things into their own canonicalization pass that we can then schedule whenever we see fit, but that might be a bit overkill since adding canonicalizations isn't really a priority at this point.

Quentin and I were discussing this the other day and we were both instinctively leaning towards removing the culling from tablegen for GlobalISel. The main reason behind this is that we don't want to spend compile-time canonicalizing for builds where compile-time is the priority. However, it's possible that canonicalization ends up cheaper overall so we weren't sure which was the best decision in the long run.

Does disabling the culling of commutative rules in tablegen sound like a good solution to you?

FWIW, if we did end up introducing canonicalization, we'd prefer it to be a separate pass (so that it can be skipped) and we'd like it to be possible for targets to opt out of (at least some) canonicalizations. We've had a few occasions where we have to repeatedly de-canonicalize in DAGISel.

... can benefit from having a canonical form to work with most of the time.

I think that if we're going to have canonicalization then we ought to be consistently canonical. The reason for that is if we aren't consistent then we may have to pay the price of canonicalization as well as the price of handling non-canonical MIR.

In D40074#939620, @dsanders wrote:

FWIW, if we did end up introducing canonicalization, we'd prefer it to be a separate pass (so that it can be skipped) and we'd like it to be possible for targets to opt out of (at least some) canonicalizations. We've had a few occasions where we have to repeatedly de-canonicalize in DAGISel.

Ok, that makes sense in the long run.

... can benefit from having a canonical form to work with most of the time.

I think that if we're going to have canonicalization then we ought to be consistently canonical. The reason for that is if we aren't consistent then we may have to pay the price of canonicalization as well as the price of handling non-canonical MIR.

Fair enough.

In D40074#926496, @dsanders wrote:

I see why ARM is different from AArch64. ARM only sees one pattern being emitted because TableGen has some code that prevents commutativity being considered when one of the children of a commutative operator is either the imm operator or a plain integer (see OnlyOnRHSOfCommutative() which is used in canPatternMatch()). Meanwhile, AArch64's case is using a ComplexPattern so both cases get emitted.

So it appears that TableGen considers the commutative pattern but chooses to ignore the second pattern because it knows there's some canonicalization taking place.

Yes, the canonicalization takes place in the target-independent part of the SelectionDAG code (SDValue SelectionDAG::getNode(unsigned Opcode, const SDLoc &DL, EVT VT, SDValue N1, SDValue N2, const SDNodeFlags Flags)).

I guess one advantage that DAGISel has over GlobalISel is that the getNode helpers are used throughout instruction selection. If we wanted the same kind of behaviour for GlobalISel, we'd have to put this canonicalization in the MachineInstrBuilder, but that would affect the whole backend.

I think for now it's better to keep this in the IRTranslator since it's the least disruptive thing to do, and any subsequent passes (e.g. combiners and whatnot) can benefit from having a canonical form to work with most of the time. An alternative would be to move these things into their own canonicalization pass that we can then schedule whenever we see fit, but that might be a bit overkill since adding canonicalizations isn't really a priority at this point.

Quentin and I were discussing this the other day and we were both instinctively leaning towards removing the culling from tablegen for GlobalISel. The main reason behind this is that we don't want to spend compile-time canonicalizing for builds where compile-time is the priority.

Well, a canonicalization pass can be easily skipped for builds where compile-time is the priority, whereas TableGen culling would be less trivial to disable selectively (and much more opaque).

However, it's possible that canonicalization ends up cheaper overall so we weren't sure which was the best decision in the long run.

Does disabling the culling of commutative rules in tablegen sound like a good solution to you?

I'm not sure. It has the advantage that it's probably very easy to implement at this point, and it also reduces the burden on backend developers, since they won't have to decide which canonicalizations to enable and when. OTOH, a canonicalization pass is more flexible and doing canonicalization early on would help other passes that need to match instruction patterns, such as the combiner (but since that's not implemented yet, and since we don't have a large number of canonicalizations that we want to add just now, it's difficult to tell how much the combiner would actually benefit in practice from canonicalization).

All in all, I don't think I would oppose disabling the culling in TableGen, or even leaving things as they are until we have more substance to base a decision on. It would be nice to keep track of these trade-offs somewhere though.

This doesn't seem to be what we want to do in the long run.

Sorry, I'd completely forgotten about this revision. I think the best thing to do is to disable the culling for GlobalISel.

Revision Contents

Path                                                    Size
lib/CodeGen/GlobalISel/IRTranslator.cpp                 12 lines
test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll   18 lines
test/CodeGen/ARM/GlobalISel/arm-irtranslator.ll         24 lines

Diff 123017

lib/CodeGen/GlobalISel/IRTranslator.cpp

 bool IRTranslator::translateBinaryOp(unsigned Opcode, const User &U,

   // Get or create a virtual register for each value.
   // Unless the value is a Constant => loadimm cst?
   // or inline constant each time?
   // Creation of a virtual register needs to have a size.
   unsigned Op0 = getOrCreateVReg(*U.getOperand(0));
   unsigned Op1 = getOrCreateVReg(*U.getOperand(1));
   unsigned Res = getOrCreateVReg(U);
-  MIRBuilder.buildInstr(Opcode).addDef(Res).addUse(Op0).addUse(Op1);
+  bool Op0IsImm = dyn_cast<Constant>(U.getOperand(0));
+  bool Op1IsImm = dyn_cast<Constant>(U.getOperand(1));
+
+  // If the instruction is commutable and only one of the operands is a
+  // constant, canonicalize it to the RHS so it's easier to match by future
+  // passes.
+  auto MI = MIRBuilder.buildInstr(Opcode).addDef(Res);
+  if (MI->isCommutable() && Op0IsImm && !Op1IsImm)
+    std::swap(Op0, Op1);
+  MI.addUse(Op0).addUse(Op1);
   return true;
 }

 bool IRTranslator::translateFSub(const User &U, MachineIRBuilder &MIRBuilder) {
   // -0.0 - X --> G_FNEG
   if (isa<Constant>(U.getOperand(0)) &&
       U.getOperand(0) == ConstantFP::getZeroValueForNegation(U.getType())) {
     MIRBuilder.buildInstr(TargetOpcode::G_FNEG)

test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll

 ; CHECK-NEXT: [[RES:%[0-9]+]]:_(s64) = G_AND [[ARG1]], [[ARG2]]
 ; CHECK-NEXT: %x0 = COPY [[RES]]
 ; CHECK-NEXT: RET_ReallyLR implicit %x0
 define i64 @andi64(i64 %arg1, i64 %arg2) {
   %res = and i64 %arg1, %arg2
   ret i64 %res
 }

+; CHECK-LABEL: name: andi64imm
+; CHECK-DAG: [[ARG1:%[0-9]+]]:_(s64) = COPY %x0
+; CHECK-DAG: [[IMM:%[0-9]+]]:_(s64) = G_CONSTANT i64 1024
+; CHECK: G_AND [[ARG1]], [[IMM]]
+define i64 @andi64imm(i64 %arg1) {
+  %res = and i64 1024, %arg1
+  ret i64 %res
+}
+
 ; CHECK-LABEL: name: andi32
 ; CHECK: [[ARG1:%[0-9]+]]:_(s32) = COPY %w0
 ; CHECK-NEXT: [[ARG2:%[0-9]+]]:_(s32) = COPY %w1
 ; CHECK-NEXT: [[RES:%[0-9]+]]:_(s32) = G_AND [[ARG1]], [[ARG2]]
 ; CHECK-NEXT: %w0 = COPY [[RES]]
 ; CHECK-NEXT: RET_ReallyLR implicit %w0
 define i32 @andi32(i32 %arg1, i32 %arg2) {
   %res = and i32 %arg1, %arg2
   ret i32 %res
 }

 ; Tests for sub.
 ; CHECK-LABEL: name: subi64
 ; CHECK: [[ARG1:%[0-9]+]]:_(s64) = COPY %x0
 ; CHECK-NEXT: [[ARG2:%[0-9]+]]:_(s64) = COPY %x1
 ; CHECK-NEXT: [[RES:%[0-9]+]]:_(s64) = G_SUB [[ARG1]], [[ARG2]]
 ; CHECK-NEXT: %x0 = COPY [[RES]]
 ; CHECK-NEXT: RET_ReallyLR implicit %x0
 define i64 @subi64(i64 %arg1, i64 %arg2) {
   %res = sub i64 %arg1, %arg2
   ret i64 %res
 }

+; CHECK-LABEL: name: subi64imm
+; CHECK-DAG: [[ARG1:%[0-9]+]]:_(s64) = COPY %x0
+; CHECK-DAG: [[IMM:%[0-9]+]]:_(s64) = G_CONSTANT i64 128
+; CHECK: G_SUB [[IMM]], [[ARG1]]
+define i64 @subi64imm(i64 %arg1) {
+  %res = sub i64 128, %arg1
+  ret i64 %res
+}
+
 ; CHECK-LABEL: name: subi32
 ; CHECK: [[ARG1:%[0-9]+]]:_(s32) = COPY %w0
 ; CHECK-NEXT: [[ARG2:%[0-9]+]]:_(s32) = COPY %w1
 ; CHECK-NEXT: [[RES:%[0-9]+]]:_(s32) = G_SUB [[ARG1]], [[ARG2]]
 ; CHECK-NEXT: %w0 = COPY [[RES]]
 ; CHECK-NEXT: RET_ReallyLR implicit %w0
 define i32 @subi32(i32 %arg1, i32 %arg2) {
   %res = sub i32 %arg1, %arg2

test/CodeGen/ARM/GlobalISel/arm-irtranslator.ll

 ; CHECK: [[RES:%[0-9]+]]:_(s32) = G_SUB [[VREGX]], [[VREGY]]
 ; CHECK: %r0 = COPY [[RES]](s32)
 ; CHECK: BX_RET 14, _, implicit %r0
 entry:
   %res = sub i32 %x, %y
   ret i32 %res
 }

+define i32 @test_swap_imm_with_reg(i32 %x) {
+; Add is commutative, so we should see it canonicalized with the immediate
+; on the RHS.
+; CHECK-LABEL: name: test_swap_imm_with_reg
+; CHECK-DAG: [[REG:%[0-9]+]]:_(s32) = COPY %r0
+; CHECK-DAG: [[IMM:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
+; CHECK: G_ADD [[REG]], [[IMM]]
+entry:
+  %res = add i32 16, %x
+  ret i32 %res
+}
+
+define i32 @test_dont_swap_imm_with_reg(i32 %x) {
+; Sub is not commutative, so we should not see it canonicalized with the
+; immediate on the RHS.
+; CHECK-LABEL: name: test_dont_swap_imm_with_reg
+; CHECK-DAG: [[REG:%[0-9]+]]:_(s32) = COPY %r0
+; CHECK-DAG: [[IMM:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
+; CHECK: G_SUB [[IMM]], [[REG]]
+entry:
+  %res = sub i32 16, %x
+  ret i32 %res
+}
+
 define i32 @test_stack_args(i32 %p0, i32 %p1, i32 %p2, i32 %p3, i32 %p4, i32 %p5) {
 ; CHECK-LABEL: name: test_stack_args
 ; CHECK: fixedStack:
 ; CHECK-DAG: id: [[P4:[0-9]]]{{.}}offset: 0{{.}}size: 4
 ; CHECK-DAG: id: [[P5:[0-9]]]{{.}}offset: 4{{.}}size: 4
 ; CHECK: liveins: %r0, %r1, %r2, %r3
 ; CHECK: [[VREGP2:%[0-9]+]]:_(s32) = COPY %r2
 ; CHECK: [[FIP5:%[0-9]+]]:_(p0) = G_FRAME_INDEX %fixed-stack.[[P5]]
