This is an archive of the discontinued LLVM Phabricator instance.

Differential D66805

[MIPS] For vectors, select `add %x, C` as `sub %x, -C` if it results in inline immediate
Closed, Public

Authored by lebedev.ri on Aug 27 2019, 8:09 AM.

Details

Reviewers

atanasyan
Petar.Avramovic
RKSimon

Commits

rGec6b91b6655a: [MIPS] For vectors, select `add %x, C` as `sub %x, -C` if it results in inline…
rL372254: [MIPS] For vectors, select `add %x, C` as `sub %x, -C` if it results in inline…

Summary

As discussed in https://reviews.llvm.org/D62341#1515637,
for MIPS, `add %x, -1` isn't optimal. Unlike X86, there
are no fast paths to materialize such -1/1 vector constants,
and `sub %x, 1` results in better codegen,
so undo the canonicalization.

Diff Detail

Repository: rL LLVM

Event Timeline

lebedev.ri created this revision. Aug 27 2019, 8:09 AM

Herald added subscribers: jrtc27, hiraditya, arichardson, sdardis. Aug 27 2019, 8:09 AM

lebedev.ri added a parent revision: D62341: [DAGCombine][X86][AArch64][AMDGPU][MIPS][PPC] (sub x, c) -> (add x, -c) vector edition. Aug 27 2019, 8:09 AM

lebedev.ri mentioned this in D62341: [DAGCombine][X86][AArch64][AMDGPU][MIPS][PPC] (sub x, c) -> (add x, -c) vector edition. Aug 27 2019, 8:11 AM

This patch targets very specific values (1, -1), but there are more. The trick for D62341 is that MSA vector add/sub immediate instructions accept a 5-bit unsigned immediate, so it is OK to switch from add imm to sub imm (and from sub to add) if that changes the immediate from negative to positive. I think it is cleanest to have some hook and ask the target whether it would like to transform sub into add (or add into sub) with an immediate.
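
To make the arithmetic behind this remark concrete, here is a minimal standalone sketch in plain C++ (not LLVM code; `V16i8`, `addSplat`, `subSplat` and `addEqualsSubOfNegation` are hypothetical names). It models 8-bit lanes with wrapping arithmetic. Under that model, adding a splat of -C gives the same lanes as subtracting a splat of C, which is why instruction selection is free to pick whichever form has an encodable immediate.

```cpp
#include <array>
#include <cstdint>

// Hypothetical model of a 16 x i8 vector register; not an LLVM or MSA type.
using V16i8 = std::array<uint8_t, 16>;

// Lane-wise addition of a splat constant, wrapping modulo 2^8.
V16i8 addSplat(V16i8 v, uint8_t c) {
  for (uint8_t &lane : v)
    lane = static_cast<uint8_t>(lane + c);
  return v;
}

// Lane-wise subtraction of a splat constant, wrapping modulo 2^8.
V16i8 subSplat(V16i8 v, uint8_t c) {
  for (uint8_t &lane : v)
    lane = static_cast<uint8_t>(lane - c);
  return v;
}

// True if `add v, splat(c)` and `sub v, splat(-c)` produce the same lanes.
bool addEqualsSubOfNegation(const V16i8 &v, uint8_t c) {
  uint8_t negC = static_cast<uint8_t>(0 - c);
  return addSplat(v, c) == subSplat(v, negC);
}
```

For example, `addSplat(v, 0xFF)` (adding -1 to each lane) and `subSplat(v, 1)` yield identical vectors, but only the second constant fits a 5-bit unsigned immediate.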

Thank you for taking a look.

In D66805#1647228, @Petar.Avramovic wrote:

This patch targets very specific values (1, -1), but there are more. The trick for D62341 is that MSA vector add/sub immediate instructions accept a 5-bit unsigned immediate, so it is OK to switch from add imm to sub imm (and from sub to add) if that changes the immediate from negative to positive.

Yeah, I found that in the ISA manual after posting the patch.

I think it is cleanest to have some hook and ask the target whether it would like to transform sub into add (or add into sub) with an immediate.

I understand where this is coming from, but no, that will not be OK.
The whole point of D62341 is that we really don't want `sub %x, C` in DAGCombine,
so while we could hide it behind a target hook, that defeats the whole purpose.
So this needs to be undone per-target in *ISelDAGToDAG.cpp.

Oh, I see. Then we have to undo combines here. Let's wait for Simon.

I forgot to mention: this `sub %X, C` -> `add %X, -C` transform is already always performed
by InstCombine in the middle-end, so such an undo transform is needed regardless of what DAGCombine does.

Generalized the check: the transform is no longer limited to -1 specifically,
but applies to all splat immediates whose negation is an inline immediate (5-bit unsigned).
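
The generalized condition can be sketched in isolation as follows. This is an illustrative model in plain C++ rather than the patch's APInt-based code. The helper names `fitsUimm5`, `negateLane` and `shouldSelectAddAsSub` are made up for this sketch, and lane values are passed as zero-extended integers together with the lane width in bits.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; not part of LLVM.

// Bit mask covering a lane of the given width (1..64 bits).
constexpr uint64_t laneMask(unsigned bits) {
  return bits >= 64 ? ~0ULL : ((1ULL << bits) - 1);
}

// True if the lane value, read as unsigned, fits in 5 bits (0..31).
constexpr bool fitsUimm5(uint64_t value, unsigned bits) {
  return (value & laneMask(bits)) < 32;
}

// Two's-complement negation within a lane of the given width.
constexpr uint64_t negateLane(uint64_t value, unsigned bits) {
  return (0 - value) & laneMask(bits);
}

// True when `add x, splat(c)` is better selected as `sub x, splat(-c)`:
// c itself is not encodable, but its negation is.
constexpr bool shouldSelectAddAsSub(uint64_t splat, unsigned bits) {
  if (fitsUimm5(splat, bits))
    return false; // Can already be encoded as an immediate.
  return fitsUimm5(negateLane(splat, bits), bits);
}

static_assert(shouldSelectAddAsSub(0xFF, 8), "-1 as i8 becomes sub 1");
static_assert(shouldSelectAddAsSub(0xFFE1, 16), "-31 as i16 becomes sub 31");
static_assert(!shouldSelectAddAsSub(0xFFE0, 16), "-32 negates to 32, too big");
static_assert(!shouldSelectAddAsSub(1, 32), "1 is already encodable");
```

The two early exits mirror the order used in the patch: first reject constants that are already encodable, then reject those whose negation still does not fit.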

Rebased.

LGTM

This revision is now accepted and ready to land. Sep 3 2019, 5:14 AM

In D66805#1655408, @atanasyan wrote:

LGTM

Thank you for the review!

lebedev.ri added inline comments. Sep 18 2019, 11:51 AM

llvm/test/CodeGen/Mips/msa/arithmetic.ll
220–241 ↗	(On Diff #218151)	Hm, this one didn't get recovered by the patch.

atanasyan added inline comments. Sep 18 2019, 12:32 PM

llvm/test/CodeGen/Mips/msa/arithmetic.ll
220–241 ↗	(On Diff #218151)	I saw this regression. Unfortunately, I could not quickly create a fix for it. As far as I remember, the problem is in this statement: `auto *BVN = dyn_cast<BuildVectorSDNode>(C)`. In `@sub_v2i64_i` the second operand (i.e. `C`) is a bitcast, so `dyn_cast<BuildVectorSDNode>` returns null. Right now I do not have time to dig into it. Maybe next week.

Closed by commit rL372254: [MIPS] For vectors, select `add %x, C` as `sub %x, -C` if it results in inline… (authored by lebedevri). Sep 18 2019, 12:32 PM

This revision was automatically updated to reflect the committed changes.

Diffusion mentioned this in rL372253: [CodeGen][MIPS][NFC] Some standalone tests for D66805 "or vectors, select `add….

lebedev.ri mentioned this in rG260b69490409: [CodeGen][MIPS][NFC] Some standalone tests for D66805 "or vectors, select `add…. Sep 18 2019, 12:41 PM

lebedev.ri marked an inline comment as done. Sep 18 2019, 12:58 PM

lebedev.ri added inline comments.

llvm/test/CodeGen/Mips/msa/arithmetic.ll

220–241 ↗

(On Diff #218151)

Honestly, I didn't notice until now that it didn't address all of the patterns.
We get:

t10: v2i64 = add t7, t16
  t7: v2i64,ch = load<(load 16 from %ir.a)> t0, t4, undef:i32
    t4: i32,ch = CopyFromReg t0, Register:i32 %1
      t3: i32 = Register %1
    t6: i32 = undef
  t16: v2i64 = bitcast t15
    t15: v4i32 = BUILD_VECTOR Constant:i32<-1>, Constant:i32<-31>, Constant:i32<-1>, Constant:i32<-31>
      t14: i32 = Constant<-1>
      t13: i32 = Constant<-31>
      t14: i32 = Constant<-1>
      t13: i32 = Constant<-31>

So yes, this is because of a bitcast.
Even if we look past it, isConstantSplat() will say "no, not a splat".
This likely means something other than isConstantSplat() should be used here.
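
As an illustration of what such a check would have to see through, the sketch below reassembles the four i32 lanes from the dump into two i64 lanes. It is plain C++ with hypothetical helper names, not a proposed fix and not LLVM API. It assumes that on a big-endian target the first i32 of each pair is the high half; the thread does not state which endianness produced the dump. Under that assumption each i64 lane is -31, whose negation 31 would fit the 5-bit immediate, while the little-endian pairing of the same lanes does not give such a value.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: view four 32-bit lanes as two 64-bit lanes.
// With bigEndian == true the first lane of each pair is the high half
// (an assumption of this sketch); otherwise it is the low half.
std::array<uint64_t, 2> bitcastV4I32ToV2I64(const std::array<uint32_t, 4> &lanes,
                                            bool bigEndian) {
  std::array<uint64_t, 2> out{};
  for (int i = 0; i < 2; ++i) {
    uint64_t first = lanes[2 * i];
    uint64_t second = lanes[2 * i + 1];
    out[i] = bigEndian ? (first << 32) | second : (second << 32) | first;
  }
  return out;
}

// Hypothetical check: both 64-bit lanes are equal, the value itself does not
// fit in 5 unsigned bits, but its two's-complement negation does.
bool negatedSplatFitsUimm5(const std::array<uint64_t, 2> &v) {
  if (v[0] != v[1])
    return false; // Not a splat.
  uint64_t negated = 0 - v[0];
  return v[0] >= 32 && negated < 32;
}
```

With the lanes from the dump (`-1` is `0xFFFFFFFF` and `-31` is `0xFFFFFFE1` as 32-bit values), the big-endian view gives `0xFFFFFFFFFFFFFFE1` in both lanes, and the little-endian view gives `0xFFFFFFE1FFFFFFFF`.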

Revision Contents

Path                                               Size
llvm/trunk/lib/Target/Mips/MipsISelDAGToDAG.h      5 lines
llvm/trunk/lib/Target/Mips/MipsISelDAGToDAG.cpp    51 lines
llvm/trunk/test/CodeGen/Mips/msa/arithmetic.ll     13 lines

Diff 220728

llvm/trunk/lib/Target/Mips/MipsISelDAGToDAG.h

[119 lines not shown; context: private:]
   virtual bool selectVSplatUimmInvPow2(SDValue N, SDValue &Imm) const;
   /// Select constant vector splats whose value is a run of set bits
   /// ending at the most significant bit
   virtual bool selectVSplatMaskL(SDValue N, SDValue &Imm) const;
   /// Select constant vector splats whose value is a run of set bits
   /// starting at bit zero.
   virtual bool selectVSplatMaskR(SDValue N, SDValue &Imm) const;
 
+  /// Convert vector addition with vector subtraction if that allows to encode
+  /// constant as an immediate and thus avoid extra 'ldi' instruction.
+  /// add X, <-1, -1...> --> sub X, <1, 1...>
+  bool selectVecAddAsVecSubIfProfitable(SDNode *Node);
+
   void Select(SDNode *N) override;
 
   virtual bool trySelect(SDNode *Node) = 0;
 
   // getImm - Return a target constant with the specified value.
   inline SDValue getImm(const SDNode *Node, uint64_t Imm) {
     return CurDAG->getTargetConstant(Imm, SDLoc(Node), Node->getValueType(0));
   }
[10 lines not shown]

llvm/trunk/lib/Target/Mips/MipsISelDAGToDAG.cpp

[211 lines not shown; context: bool MipsDAGToDAGISel::selectVSplatMaskL(SDValue N, SDValue &Imm) const {]
   return false;
 }
 
 bool MipsDAGToDAGISel::selectVSplatMaskR(SDValue N, SDValue &Imm) const {
   llvm_unreachable("Unimplemented function.");
   return false;
 }
 
+/// Convert vector addition with vector subtraction if that allows to encode
+/// constant as an immediate and thus avoid extra 'ldi' instruction.
+/// add X, <-1, -1...> --> sub X, <1, 1...>
+bool MipsDAGToDAGISel::selectVecAddAsVecSubIfProfitable(SDNode *Node) {
+  assert(Node->getOpcode() == ISD::ADD && "Should only get 'add' here.");
+
+  EVT VT = Node->getValueType(0);
+  assert(VT.isVector() && "Should only be called for vectors.");
+
+  SDValue X = Node->getOperand(0);
+  SDValue C = Node->getOperand(1);
+
+  auto *BVN = dyn_cast<BuildVectorSDNode>(C);
+  if (!BVN)
+    return false;
+
+  APInt SplatValue, SplatUndef;
+  unsigned SplatBitSize;
+  bool HasAnyUndefs;
+
+  if (!BVN->isConstantSplat(SplatValue, SplatUndef, SplatBitSize, HasAnyUndefs,
+                            8, !Subtarget->isLittle()))
+    return false;
+
+  auto IsInlineConstant = [](const APInt &Imm) { return Imm.isIntN(5); };
+
+  if (IsInlineConstant(SplatValue))
+    return false; // Can already be encoded as an immediate.
+
+  APInt NegSplatValue = 0 - SplatValue;
+  if (!IsInlineConstant(NegSplatValue))
+    return false; // Even if we negate it it won't help.
+
+  SDLoc DL(Node);
+
+  SDValue NegC = CurDAG->FoldConstantArithmetic(
+      ISD::SUB, DL, VT, CurDAG->getConstant(0, DL, VT).getNode(), C.getNode());
+  assert(NegC && "Constant-folding failed!");
+  SDValue NewNode = CurDAG->getNode(ISD::SUB, DL, VT, X, NegC);
+
+  ReplaceNode(Node, NewNode.getNode());
+  SelectCode(NewNode.getNode());
+  return true;
+}
+
 /// Select instructions not customized! Used for
 /// expanded, promoted and normal instructions
 void MipsDAGToDAGISel::Select(SDNode *Node) {
   unsigned Opcode = Node->getOpcode();
 
   // If we have a custom node, we already have selected!
   if (Node->isMachineOpcode()) {
     LLVM_DEBUG(errs() << "== "; Node->dump(CurDAG); errs() << "\n");
     Node->setNodeId(-1);
     return;
   }
 
   // See if subclasses can handle this node.
   if (trySelect(Node))
     return;
 
   switch(Opcode) {
   default: break;
 
+  case ISD::ADD:
+    if (Node->getSimpleValueType(0).isVector() &&
+        selectVecAddAsVecSubIfProfitable(Node))
+      return;
+    break;
+
   // Get target GOT address.
   case ISD::GLOBAL_OFFSET_TABLE:
     ReplaceNode(Node, getGlobalBaseReg());
     return;
 
 #ifndef NDEBUG
   case ISD::LOAD:
   case ISD::STORE:
[28 lines not shown]

llvm/trunk/test/CodeGen/Mips/msa/arithmetic.ll

[188 lines not shown; context: ; ALL-NEXT: st.b $w0, 0($4)]
   store <16 x i8> %2, <16 x i8>* %c
   ret void
 }
 
 define void @sub_v16i8_i_negated(<16 x i8>* %c, <16 x i8>* %a) nounwind {
 ; ALL-LABEL: sub_v16i8_i_negated:
 ; ALL: # %bb.0:
 ; ALL-NEXT: ld.b $w0, 0($5)
-; ALL-NEXT: ldi.b $w1, -1
-; ALL-NEXT: addv.b $w0, $w0, $w1
+; ALL-NEXT: subvi.b $w0, $w0, 1
 ; ALL-NEXT: jr $ra
 ; ALL-NEXT: st.b $w0, 0($4)
   %1 = load <16 x i8>, <16 x i8>* %a
   %2 = add <16 x i8> %1, <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1,
     i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>
   store <16 x i8> %2, <16 x i8>* %c
   ret void
 }
[10 lines not shown; context: %2 = sub <8 x i16> %1, <i16 1, i16 1, i16 1, i16 1,]
     i16 1, i16 1, i16 1, i16 1>
   store <8 x i16> %2, <8 x i16>* %c
   ret void
 }
 
 define void @sub_v8i16_i_negated(<8 x i16>* %c, <8 x i16>* %a) nounwind {
 ; ALL-LABEL: sub_v8i16_i_negated:
 ; ALL: # %bb.0:
-; ALL-NEXT: ldi.b $w0, -1
-; ALL-NEXT: ld.h $w1, 0($5)
-; ALL-NEXT: addv.h $w0, $w1, $w0
+; ALL-NEXT: ld.h $w0, 0($5)
+; ALL-NEXT: subvi.h $w0, $w0, 1
 ; ALL-NEXT: jr $ra
 ; ALL-NEXT: st.h $w0, 0($4)
   %1 = load <8 x i16>, <8 x i16>* %a
   %2 = add <8 x i16> %1, <i16 -1, i16 -1, i16 -1, i16 -1,
     i16 -1, i16 -1, i16 -1, i16 -1>
   store <8 x i16> %2, <8 x i16>* %c
   ret void
 }
[9 lines not shown; context: ; ALL-NEXT: st.w $w0, 0($4)]
   %2 = sub <4 x i32> %1, <i32 1, i32 1, i32 1, i32 1>
   store <4 x i32> %2, <4 x i32>* %c
   ret void
 }
 
 define void @sub_v4i32_i_negated(<4 x i32>* %c, <4 x i32>* %a) nounwind {
 ; ALL-LABEL: sub_v4i32_i_negated:
 ; ALL: # %bb.0:
-; ALL-NEXT: ldi.b $w0, -1
-; ALL-NEXT: ld.w $w1, 0($5)
-; ALL-NEXT: addv.w $w0, $w1, $w0
+; ALL-NEXT: ld.w $w0, 0($5)
+; ALL-NEXT: subvi.w $w0, $w0, 1
 ; ALL-NEXT: jr $ra
 ; ALL-NEXT: st.w $w0, 0($4)
   %1 = load <4 x i32>, <4 x i32>* %a
   %2 = add <4 x i32> %1, <i32 -1, i32 -1, i32 -1, i32 -1>
   store <4 x i32> %2, <4 x i32>* %c
   ret void
 }
 
[487 lines not shown]