Details

Reviewers

RKSimon
dmgreen
samparker
spatel

Commits

rGdad5f00e3b4d: [DAGCombine] Combine pattern for REV16

Summary

This adds another pattern matcher to the combiner to generate the REV16 instruction for a case that we were not handling.
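
For orientation, the code comment in the diff further down gives the pattern as `(or (and (shl (A, 8)), 0xff00ff00), (and (srl (A, 8)), 0x00ff00ff))`, rewritten to `(rotr (bswap A), 16)`. A minimal standalone sketch of why the two forms agree; the helper names `bswap32`, `rotr32` and `rev16Pattern` are made up for illustration and are not LLVM APIs:

```cpp
#include <cassert>
#include <cstdint>

// Portable 32-bit byte swap (hypothetical helper, not an LLVM API).
constexpr uint32_t bswap32(uint32_t X) {
  return (X >> 24) | ((X >> 8) & 0x0000ff00u) | ((X << 8) & 0x00ff0000u) |
         (X << 24);
}

// Rotate right by Amt bits; only valid for 0 < Amt < 32.
constexpr uint32_t rotr32(uint32_t X, unsigned Amt) {
  return (X >> Amt) | (X << (32 - Amt));
}

// The shift-and-mask form that the new combine looks for.
constexpr uint32_t rev16Pattern(uint32_t A) {
  return ((A << 8) & 0xff00ff00u) | ((A >> 8) & 0x00ff00ffu);
}

// Swapping the two bytes inside each halfword gives the same result as a
// full byte swap followed by a 16-bit rotate.
static_assert(rev16Pattern(0x11223344u) == 0x22114433u,
              "bytes are swapped within each halfword");
static_assert(rotr32(bswap32(0x11223344u), 16) == 0x22114433u,
              "bswap followed by rotr 16 gives the same value");
```

Both checks are evaluated at compile time, so the block only needs to compile to confirm the identity for that sample value.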

Diff Detail

Event Timeline

SjoerdMeijer created this revision. Feb 5 2020, 2:41 AM

Herald added a project: Restricted Project. Feb 5 2020, 2:41 AM

Herald added subscribers: hiraditya, kristof.beyls.

RKSimon added inline comments. Feb 5 2020, 3:58 AM

llvm/test/CodeGen/Thumb2/thumb2-rev16.ll
2	Might be best if you regenerate+commit this file against trunk and then rebase the patch so it shows the full codegen diff.

Ah yes, thanks for the tip.

RKSimon added inline comments. Feb 5 2020, 5:54 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5657	This looks like it only works for MVT::i32 cases, but there doesn't seem to be any VT check?
5663	Use isConstOrConstSplat instead of dyn_cast<ConstantSDNode> ?

SjoerdMeijer marked an inline comment as done. Feb 5 2020, 7:33 AM

SjoerdMeijer added inline comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5704	> This looks like it only works for MVT::i32 cases, but there doesn't seem to be any VT check?
That is checked here, but I will add an assert to `MatchBSwapHWordOrAndAnd`.

Thanks for looking. Comments addressed.

friendly ping

spatel added inline comments. Feb 11 2020, 9:15 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5655	Use current formatting rules, so lowerCamel: "matchBSwap..." (can fix the related function names as a preliminary step)
5658	Also assert that N->getOpcode() == ISD::OR ?
5667–5669	This is too specific - what if the operands of the "or" are commuted? This patch should have a test for that possibility:

define i32 @bswap_ror_commuted(i32 %a) {
    %l8 = shl i32 %a, 8
    %r8 = lshr i32 %a, 8
    %mask_l8 = and i32 %l8, 4278255360
    %mask_r8 = and i32 %r8, 16711935
    %tmp = or i32 %mask_r8, %mask_l8
    ret i32 %tmp
}
5674–5675	These could use more descriptive names: ShiftAmt{1/2}. These are using isConstOrConstSplat(), but the 'and' mask operands are not. Change the above code to match this?

Thanks for looking! Comments addressed. I have also added more tests for the cases that shouldn't trigger.

spatel added inline comments. Feb 13 2020, 9:38 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

5668–5671

This check works, but it makes me nervous because I think it requires demanded bits to alter the code for correctness.
Take this example where the masks are paired with the wrong shift op (and please add a test like this to the patch):

define i32 @not_rev16(i32 %a) {
    %l8 = shl i32 %a, 8
    %r8 = lshr i32 %a, 8
    %mask_r8 = and i32 %r8, 4278255360
    %mask_l8 = and i32 %l8, 16711935
    %tmp = or i32 %mask_r8, %mask_l8
    ret i32 %tmp
}

To be safe, I think we should enforce that the masks and shifts are paired correctly.
So we could do something like:

// Canonicalize mask ops to ensure that shift-left operand is on the left.
if (Mask2->getAPIntValue() == 0xff00ff00) {
  std::swap(N0, N1);
  std::swap(Mask1, Mask2);
}

or maybe better - go back to the earlier rev of this patch and call this function with the operands reversed:

if (SDValue BSwap = matchBSwapHWordOrAndAnd(TLI, DAG, N, N0, N1, VT,
                                            getShiftAmountTy(VT)))
  return BSwap;
// Try again with commuted operands.
if (SDValue BSwap = matchBSwapHWordOrAndAnd(TLI, DAG, N, N1, N0, VT,
                                            getShiftAmountTy(VT)))
  return BSwap;
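
The two points above (tie each mask to its own shift, and retry with the operands commuted) can be illustrated with a toy matcher. This is a sketch on a made-up `Node` type, not LLVM's `SDValue`/`SDNode`; every name in it (`Op`, `Node`, `matchMaskedShift`, `matchOrdered`, `matchRev16`) is hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Toy expression node standing in for an SDValue; not the LLVM class.
enum class Op { Var, Const, Shl, Srl, And, Or };

struct Node {
  Op Kind;
  uint32_t Value;  // only meaningful for Op::Const
  const Node *LHS; // first operand, or nullptr
  const Node *RHS; // second operand, or nullptr
};

inline bool isConst(const Node *N, uint32_t V) {
  return N && N->Kind == Op::Const && N->Value == V;
}

// Match (and (ShiftOp A, 8), Mask) and return A, or nullptr on failure.
inline const Node *matchMaskedShift(const Node *N, Op ShiftOp, uint32_t Mask) {
  if (!N || N->Kind != Op::And || !isConst(N->RHS, Mask))
    return nullptr;
  const Node *Shift = N->LHS;
  if (!Shift || Shift->Kind != ShiftOp || !isConst(Shift->RHS, 8))
    return nullptr;
  return Shift->LHS;
}

// Order-sensitive match: N0 must be the shl half and N1 the srl half, and
// both must shift the same value. Each mask is checked together with its
// own shift opcode, so a mis-paired mask is rejected.
inline const Node *matchOrdered(const Node *N0, const Node *N1) {
  const Node *A = matchMaskedShift(N0, Op::Shl, 0xff00ff00u);
  const Node *B = matchMaskedShift(N1, Op::Srl, 0x00ff00ffu);
  return (A && A == B) ? A : nullptr;
}

// Try the operands as written, then try again with them commuted.
inline const Node *matchRev16(const Node *Or) {
  if (!Or || Or->Kind != Op::Or)
    return nullptr;
  if (const Node *A = matchOrdered(Or->LHS, Or->RHS))
    return A;
  return matchOrdered(Or->RHS, Or->LHS);
}
```

Keeping the ordered matcher strict and doing the commuted retry in the caller means the pairing check is written once; in this toy model the `not_rev16` shape is rejected on both attempts because neither operand is a left shift masked with 0xff00ff00.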

Many thanks again for the feedback! Really liked that suggestion: so now calling the same function twice but with the operands swapped to check the commuted case. Have also added the suggested test case.

In D74032#1876021, @SjoerdMeijer wrote:

> Many thanks again for the feedback! Really liked that suggestion: so now calling the same function twice but with the operands swapped to check the commuted case. Have also added the suggested test case.

There's 1 more concern for this transform - do we need to limit it when the intermediate values have other uses? This will be interesting because the answer may be different for different targets. I.e., for ARM/Thumb/AArch64, we're able to reduce the whole sequence to a single rev16, so it's always a good transform. But for x86, we're going to need a bswap and ror.

So we need to:

1. Add a test like this:

define i32 @extra_maskop_uses2(i32 %a) {
    %l8 = shl i32 %a, 8
    %r8 = lshr i32 %a, 8
    %mask_l8 = and i32 %l8, 4278255360
    %mask_r8 = and i32 %r8, 16711935
    %or = or i32 %mask_r8, %mask_l8
    %mul = mul i32 %mask_r8, %mask_l8   ; use the mask ops for some other reason
    %r = mul i32 %mul, %or              ; and use that result for some other reason
    ret i32 %r
}

2. Copy that test (maybe the whole test file) over to llvm/test/CodeGen/X86 and generate CHECK lines with utils/update_llc_test_checks.py.
3. Possibly add some 'hasOneUse()' logic to the code to avoid regressions.

Please push the test changes to trunk before updating in this review (no need for pre-commit review unless you have questions/concerns).
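
To make the multi-use concern concrete, here is a toy node-count model. It is not how DAGCombiner weighs a transform, and the function names are hypothetical. It assumes that a mask with another user keeps both its `and` and the shift feeding it alive, and that the replacement costs one node on a target with a single rev16-style instruction and two nodes on a target that needs bswap plus a rotate:

```cpp
#include <cassert>

// Nodes in the original pattern: shl, srl, and, and, or.
constexpr unsigned nodesBefore() { return 5; }

// Nodes left after the rewrite. ReplacementNodes is 1 for a single
// rev16-style instruction and 2 for bswap + rotate (assumptions of this
// sketch, not measured costs).
constexpr unsigned nodesAfter(bool MaskLHasOtherUse, bool MaskRHasOtherUse,
                              unsigned ReplacementNodes) {
  // A mask with another user keeps its `and` and its shift alive.
  return (MaskLHasOtherUse ? 2u : 0u) + (MaskRHasOtherUse ? 2u : 0u) +
         ReplacementNodes;
}

static_assert(nodesAfter(false, false, 2) < nodesBefore(),
              "no extra uses: 2 nodes instead of 5");
static_assert(nodesAfter(true, false, 2) < nodesBefore(),
              "one extra use: 4 nodes instead of 5");
static_assert(nodesAfter(true, true, 1) == nodesBefore(),
              "both reused, single instruction: same count");
static_assert(nodesAfter(true, true, 2) > nodesBefore(),
              "both reused, bswap + rotate: 6 nodes instead of 5");
```

Under these assumptions the rewrite only adds nodes when both masks have other users and the target needs two instructions, which is the case the `extra_maskop_uses2` test is meant to exercise on x86.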

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5664–5665	Nit: the input variables are "0" and "1", but this shifts the indexing to "1" and "2". It would be better to make this consistent (same for the "Shift" variables below this). Most of the code in DAGCombiner uses "0"-based indexing.

Thanks, I will first add the tests (will indeed add the whole test file). Will do that on Monday, and after that, will return and revisit this patch.

SjoerdMeijer mentioned this in rGa02056c96070: [X86] New test to check rev16 patterns, prep step for D74032. NFC. Feb 17 2020, 1:15 AM

New x86 test committed in: https://reviews.llvm.org/rGa02056c96070

Fixed the off-by-1 error in the variable naming, and added hasOneUse checks for the AND instructions.

LGTM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5664	This check seems over-protective. If only one of the 'and' ops has extra uses, the transform is probably still worth doing. It's ok to add a TODO comment here if you want and make that a small follow-up patch after adding more tests to exercise those cases.
llvm/test/CodeGen/X86/rev16.ll
228 ↗	(On Diff #244946)	Remove TODO comment - this is working as expected. It also works with aarch64. The missed transform for arm/thumb seems to be because there's an early target-specific combine that creates "bfi", so we miss the generic pattern matching for bswap because it only runs with !LegalOperations. I'm not sure if/why we need that restriction.

This revision is now accepted and ready to land. Feb 17 2020, 6:07 AM

Thanks, this has been a good and useful introduction to DAGCombine for me :-)
Before committing, I will add a TODO saying that this hasOneUse is a bit over-protective for now (will follow up later), and will remove the TODO from the test case.

Closed by commit rGdad5f00e3b4d: [DAGCombine] Combine pattern for REV16 (authored by SjoerdMeijer). Feb 17 2020, 7:01 AM

This revision was automatically updated to reflect the committed changes.

Diff 244159

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

     if (!C || C->getAPIntValue() != 16)
       return false;
     Parts[0] = Parts[1] = N.getOperand(0).getOperand(0).getNode();
     return true;
   }

   return false;
 }

+// Match this pattern:
+// (or (and (shl (A, 8)), 0xff00ff00), (and (srl (A, 8)), 0x00ff00ff))
+// And rewrite this to:
+// (rotr (bswap A), 16)
+static SDValue matchBSwapHWordOrAndAnd(const TargetLowering &TLI,
+                                       SelectionDAG &DAG, SDNode *N, SDValue N0,
+                                       SDValue N1, EVT VT, EVT ShiftAmountTy) {
+  assert(N->getOpcode() == ISD::OR && VT == MVT::i32 &&
+         "MatchBSwapHWordOrAndAnd: expecting i32");
+  if (!TLI.isOperationLegalOrCustom(ISD::ROTR, VT))
+    return SDValue();
+  if (N0.getOpcode() != ISD::AND || N1.getOpcode() != ISD::AND)
+    return SDValue();
+  ConstantSDNode *Mask1 = isConstOrConstSplat(N0.getOperand(1));
+  ConstantSDNode *Mask2 = isConstOrConstSplat(N1.getOperand(1));
+  if (!Mask1 || !Mask2)
+    return SDValue();
+  if (!((Mask1->getAPIntValue() == 0xff00ff00 &&
+         Mask2->getAPIntValue() == 0x00ff00ff) ||
+        (Mask1->getAPIntValue() == 0x00ff00ff &&
+         Mask2->getAPIntValue() == 0xff00ff00)))
+    return SDValue();
+  SDValue Shift1 = N0.getOperand(0);
+  SDValue Shift2 = N1.getOperand(0);
+  if (!((Shift1.getOpcode() == ISD::SHL && Shift2.getOpcode() == ISD::SRL) ||
+        (Shift1.getOpcode() == ISD::SRL && Shift2.getOpcode() == ISD::SHL)))
+    return SDValue();
+  ConstantSDNode *ShiftAmt1 = isConstOrConstSplat(Shift1.getOperand(1));
+  ConstantSDNode *ShiftAmt2 = isConstOrConstSplat(Shift2.getOperand(1));
+  if (!ShiftAmt1 || !ShiftAmt2)
+    return SDValue();
+  if (ShiftAmt1->getAPIntValue() != 8 || ShiftAmt2->getAPIntValue() != 8)
+    return SDValue();
+  if (Shift1.getOperand(0) != Shift2.getOperand(0))
+    return SDValue();
+
+  SDLoc DL(N);
+  SDValue BSwap = DAG.getNode(ISD::BSWAP, DL, VT, Shift1.getOperand(0));
+  SDValue ShAmt = DAG.getConstant(16, DL, ShiftAmountTy);
+  return DAG.getNode(ISD::ROTR, DL, VT, BSwap, ShAmt);
+}
+
 /// Match a 32-bit packed halfword bswap. That is
 /// ((x & 0x000000ff) << 8) |
 /// ((x & 0x0000ff00) >> 8) |
 /// ((x & 0x00ff0000) << 8) |
 /// ((x & 0xff000000) >> 8)
 /// => (rotl (bswap x), 16)
 SDValue DAGCombiner::MatchBSwapHWord(SDNode *N, SDValue N0, SDValue N1) {
   if (!LegalOperations)
     return SDValue();

   EVT VT = N->getValueType(0);
   if (VT != MVT::i32)
     return SDValue();
   if (!TLI.isOperationLegalOrCustom(ISD::BSWAP, VT))
     return SDValue();

+  if (SDValue BSwap = matchBSwapHWordOrAndAnd(TLI, DAG, N, N0, N1, VT,
+                                              getShiftAmountTy(VT)))
+    return BSwap;
+
   // Look for either
   // (or (bswaphpair), (bswaphpair))
   // (or (or (bswaphpair), (and)), (and))
   // (or (or (and), (bswaphpair)), (and))
   SDNode *Parts[4] = {};

   if (isBSwapHWordPair(N0, Parts)) {
     // (or (or (and), (and)), (or (and), (and)))
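
The doc comment on `MatchBSwapHWord` describes a four-mask form and a rewrite to `rotl`, while the new helper matches a two-mask form and emits `rotr`. A small standalone sketch showing that both source forms compute the same value and that rotating a 32-bit value by 16 gives the same result in either direction; all helper names here are hypothetical and not LLVM APIs:

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t bswap32(uint32_t X) {
  return (X >> 24) | ((X >> 8) & 0x0000ff00u) | ((X << 8) & 0x00ff0000u) |
         (X << 24);
}

// Rotates; only valid for 0 < Amt < 32.
constexpr uint32_t rotl32(uint32_t X, unsigned Amt) {
  return (X << Amt) | (X >> (32 - Amt));
}
constexpr uint32_t rotr32(uint32_t X, unsigned Amt) {
  return (X >> Amt) | (X << (32 - Amt));
}

// Form from the MatchBSwapHWord doc comment: mask each byte, then shift.
constexpr uint32_t hwordSwapFourMasks(uint32_t X) {
  return ((X & 0x000000ffu) << 8) | ((X & 0x0000ff00u) >> 8) |
         ((X & 0x00ff0000u) << 8) | ((X & 0xff000000u) >> 8);
}

// Form matched by the new helper: shift first, then apply two wide masks.
constexpr uint32_t hwordSwapTwoMasks(uint32_t X) {
  return ((X << 8) & 0xff00ff00u) | ((X >> 8) & 0x00ff00ffu);
}

static_assert(hwordSwapFourMasks(0xaabbccddu) == 0xbbaaddccu,
              "four-mask form swaps bytes within each halfword");
static_assert(hwordSwapTwoMasks(0xaabbccddu) == 0xbbaaddccu,
              "two-mask form gives the same value");
static_assert(rotl32(bswap32(0xaabbccddu), 16) ==
                  rotr32(bswap32(0xaabbccddu), 16),
              "a 16-bit rotate of a 32-bit value is direction-independent");
```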

llvm/test/CodeGen/Thumb2/thumb2-rev16.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc < %s -mtriple=thumbv7m-none-eabi -o - | FileCheck %s

 ; 0xff00ff00 = 4278255360
 ; 0x00ff00ff = 16711935
 define i32 @f1(i32 %a) {
 ; CHECK-LABEL: f1:
 ; CHECK: @ %bb.0:
+; CHECK-NEXT: rev16 r0, r0
+; CHECK-NEXT: bx lr
+  %l8 = shl i32 %a, 8
+  %r8 = lshr i32 %a, 8
+  %mask_l8 = and i32 %l8, 4278255360
+  %mask_r8 = and i32 %r8, 16711935
+  %tmp = or i32 %mask_l8, %mask_r8
+  ret i32 %tmp
+}
+
+define i32 @bswap_ror_commuted(i32 %a) {
+; CHECK-LABEL: bswap_ror_commuted:
+; CHECK: @ %bb.0:
+; CHECK-NEXT: rev16 r0, r0
+; CHECK-NEXT: bx lr
+  %l8 = shl i32 %a, 8
+  %r8 = lshr i32 %a, 8
+  %mask_l8 = and i32 %l8, 4278255360
+  %mask_r8 = and i32 %r8, 16711935
+  %tmp = or i32 %mask_r8, %mask_l8
+  ret i32 %tmp
+}
+
+define i32 @different_shift_amount(i32 %a) {
+; CHECK-LABEL: different_shift_amount:
+; CHECK: @ %bb.0:
 ; CHECK-NEXT: mov.w r1, #16711935
-; CHECK-NEXT: mov.w r2, #-16711936
+; CHECK-NEXT: movw r2, #65024
 ; CHECK-NEXT: and.w r1, r1, r0, lsr #8
-; CHECK-NEXT: and.w r0, r2, r0, lsl #8
+; CHECK-NEXT: movt r2, #65280
+; CHECK-NEXT: and.w r0, r2, r0, lsl #9
 ; CHECK-NEXT: add r0, r1
 ; CHECK-NEXT: bx lr
+  %l8 = shl i32 %a, 9
+  %r8 = lshr i32 %a, 8
+  %mask_l8 = and i32 %l8, 4278255360
+  %mask_r8 = and i32 %r8, 16711935
+  %tmp = or i32 %mask_l8, %mask_r8
+  ret i32 %tmp
+}
+
+define i32 @different_constant(i32 %a) {
+; CHECK-LABEL: different_constant:
+; CHECK: @ %bb.0:
+; CHECK-NEXT: mov.w r1, #16711935
+; CHECK-NEXT: and.w r0, r1, r0, lsr #8
+; CHECK-NEXT: bx lr
   %l8 = shl i32 %a, 8
   %r8 = lshr i32 %a, 8
+  %mask_l8 = and i32 %l8, 42
+  %mask_r8 = and i32 %r8, 16711935
+  %tmp = or i32 %mask_l8, %mask_r8
+  ret i32 %tmp
+}
+
+define i32 @different_op(i32 %a) {
+; CHECK-LABEL: different_op:
+; CHECK: @ %bb.0:
+; CHECK-NEXT: mov.w r1, #16711935
+; CHECK-NEXT: movw r2, #256
+; CHECK-NEXT: and.w r1, r1, r0, lsr #8
+; CHECK-NEXT: movt r2, #255
+; CHECK-NEXT: add.w r0, r2, r0, lsl #8
+; CHECK-NEXT: orrs r0, r1
+; CHECK-NEXT: bx lr
+  %l8 = shl i32 %a, 8
+  %r8 = lshr i32 %a, 8
+  %mask_l8 = sub i32 %l8, 4278255360
+  %mask_r8 = and i32 %r8, 16711935
+  %tmp = or i32 %mask_l8, %mask_r8
+  ret i32 %tmp
+}
+
+define i32 @different_vars(i32 %a, i32 %b) {
+; CHECK-LABEL: different_vars:
+; CHECK: @ %bb.0:
+; CHECK-NEXT: mov.w r2, #16711935
+; CHECK-NEXT: and.w r1, r2, r1, lsr #8
+; CHECK-NEXT: mov.w r2, #-16711936
+; CHECK-NEXT: and.w r0, r2, r0, lsl #8
+; CHECK-NEXT: add r0, r1
+; CHECK-NEXT: bx lr
+  %l8 = shl i32 %a, 8
+  %r8 = lshr i32 %b, 8
   %mask_l8 = and i32 %l8, 4278255360
   %mask_r8 = and i32 %r8, 16711935
   %tmp = or i32 %mask_l8, %mask_r8
   ret i32 %tmp
 }


 ; FIXME: this rev16 pattern is not matching

 ; 0xff000000 = 4278190080
 ; 0x00ff0000 = 16711680
 ; 0x0000ff00 = 65280
 ; 0x000000ff = 255
 define i32 @f2(i32 %a) {
 ; CHECK-LABEL: f2:
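
Some of the constants in the expected output for the negative tests look unrelated to the IR at first glance. The arithmetic behind them can be checked in isolation. The sketch below only verifies the bit-level facts; that the backend arrives at these constants for these reasons is an inference, and the helper names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Value built by a movw (low 16 bits) / movt (high 16 bits) pair.
constexpr uint32_t movwMovt(uint32_t Low16, uint32_t High16) {
  return (High16 << 16) | Low16;
}

// different_constant: (A << 8) always has its low 8 bits clear and 42 only
// has bits in that range, so the left half of the `or` contributes nothing.
constexpr uint32_t differentConstant(uint32_t A) {
  return ((A << 8) & 42u) | ((A >> 8) & 0x00ff00ffu);
}
static_assert(differentConstant(0xffffffffu) == 0x00ff00ffu,
              "only the right-shift half survives");

// different_shift_amount: movw #65024 / movt #65280 builds 0xff00fe00, which
// masks (A << 9) exactly like 0xff00ff00 because bit 8 is already clear.
static_assert(movwMovt(65024, 65280) == 0xff00fe00u, "mask with bit 8 dropped");
static_assert(((0xffffffffu << 9) & 0xff00ff00u) ==
                  ((0xffffffffu << 9) & 0xff00fe00u),
              "both masks agree on a value shifted left by 9");

// different_op: subtracting 0xff00ff00 equals adding 0x00ff0100 modulo 2^32,
// and movw #256 / movt #255 builds that constant.
static_assert(uint32_t(0u - 0xff00ff00u) == 0x00ff0100u,
              "negated constant modulo 2^32");
static_assert(movwMovt(256, 255) == 0x00ff0100u, "constant used by add.w");
```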

This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombine][ARM] Combine pattern for REV16
ClosedPublic
