This is an archive of the discontinued LLVM Phabricator instance.

Differential D45079

[PowerPC] allow D-form VSX load/store when accessing FrameIndex without offset
ClosedPublic

Authored by inouehrs on Mar 30 2018, 1:34 AM.

Details

Reviewers

echristo
timshen
kbarton
nemanjai
syzaara
sfertile
lei

Commits

rGa2eefb6d9a76: [PowerPC] allow D-form VSX load/store when accessing FrameIndex without offset
rL329377: [PowerPC] allow D-form VSX load/store when accessing FrameIndex without offset

Summary

The POWER9 VSX D-form load/store instructions require the offset to be a multiple of 16, and the helper `isOffsetMultipleOf` is used to check this.
So far, the helper handles the FrameIndex + offset case but not the FrameIndex-without-offset case. As a result, we miss opportunities to use D-form instructions when accessing an object or array allocated on the stack.
For example, the X-form store (stxvx) is used for int a[4] = {0}; instead of stxv. For larger arrays, a D-form instruction is not used when accessing the first 16 bytes. Using D-form instructions reduces the instruction count as well as register pressure.
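
The decision described in the summary can be sketched outside of LLVM as follows. This is only a simplified model under stated assumptions: `AddrKind`, `Addr`, `fitsInt16` and `offsetIsMultipleOf` are hypothetical names that stand in for the SelectionDAG node kinds and for the real `isOffsetMultipleOf` helper, and the model does not use any LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the address shapes the helper distinguishes.
enum class AddrKind { FrameIndex, FrameIndexPlusImm, RegPlusImm, CopyFromReg, Other };

// Hypothetical address description (not an LLVM type).
struct Addr {
  AddrKind kind;
  unsigned slotAlign = 0; // alignment of the stack object, when there is one
  int64_t imm = 0;        // immediate offset, when there is one
};

// True when the value fits in a signed 16-bit immediate field.
bool fitsInt16(int64_t v) { return v >= -32768 && v <= 32767; }

// Model of the check after this patch: a frame index is inspected for
// alignment whether or not an offset is added to it.
bool offsetIsMultipleOf(const Addr &a, unsigned val) {
  assert(val != 0 && "the required multiple must be non-zero");
  const bool hasFrameIndex = a.kind == AddrKind::FrameIndex ||
                             a.kind == AddrKind::FrameIndexPlusImm;
  const bool hasImm = a.kind == AddrKind::FrameIndexPlusImm ||
                      a.kind == AddrKind::RegPlusImm;
  if (hasFrameIndex) {
    // An under-aligned slot can end up at any displacement after frame layout.
    if (a.slotAlign % val != 0)
      return false;
    // Frame index without an offset: the case this patch adds.
    if (!hasImm)
      return true;
  }
  if (hasImm)
    return fitsInt16(a.imm) && a.imm % static_cast<int64_t>(val) == 0;
  // An address coming from outside the block has a zero offset.
  return a.kind == AddrKind::CopyFromReg;
}
```

In this model, a bare frame index with a 16-byte-aligned slot passes the check for `val = 16`, while the same frame index with an 8-byte-aligned slot does not.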

Diff Detail

Event Timeline

inouehrs created this revision. Mar 30 2018, 1:34 AM

inouehrs retitled this revision from [PowerPC] allow D-form load/store when accessing FrameIndex without offset to [PowerPC] allow D-form VSX load/store when accessing FrameIndex without offset. Mar 30 2018, 1:36 AM

nemanjai added inline comments. Apr 3 2018, 3:15 PM

lib/Target/PowerPC/PPCISelDAGToDAG.cpp
3961	Can we just combine this with the above? Perhaps with something like: `if (FrameIndexSDNode *FI = dyn_cast<FrameIndexSDNode>(AddrOp.getOpcode() == ISD::ADD ? AddrOp.getOperand(0) : AddrOp))`

inouehrs updated this revision to Diff 140904. Apr 3 2018, 10:30 PM

inouehrs marked an inline comment as done.

inouehrs added inline comments.

lib/Target/PowerPC/PPCISelDAGToDAG.cpp
3961	Thanks for the advice. Is this better?

I assume that there are cases where the frame index that doesn't have an offset actually ends up being an address with some displacement and that's the purpose of this patch. What I mean is that I assume that this will sometimes lead to something like:

li 4, 16
stxvx 2, 3, 4

Since we'd never have something like:

li 4, 0
stxvx 2, 3, 4

as that is simply:
stxvx 2, 0, 3

If we have cases of the former, please add a test case (i.e. FrameIndex is used without an offset, but we end up needing a non-zero immediate). If we have a case of the latter, that should be fixed more generally (i.e. we should never have a zero input to an instruction where zero in a register field means literal zero).
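
The point about a zero in the base register field can be illustrated with a toy model of X-form address computation. This is a sketch only: `Regs` and `xFormEA` are hypothetical names, and the model covers nothing beyond the effective-address rule that a base register field of 0 contributes the literal value zero rather than the contents of r0.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy register file with 32 general-purpose registers (hypothetical model).
using Regs = std::array<uint64_t, 32>;

// Effective address of an X-form access such as "stxvx XS, RA, RB":
// a base field of 0 means the literal value 0, not the contents of r0.
uint64_t xFormEA(const Regs &gpr, unsigned ra, unsigned rb) {
  assert(ra < 32 && rb < 32 && "register numbers must be in range");
  const uint64_t base = (ra == 0) ? 0 : gpr[ra];
  return base + gpr[rb];
}
```

With r3 holding an address and r4 holding 0, `xFormEA(regs, 3, 4)` and `xFormEA(regs, 0, 3)` yield the same address, which is why `li 4, 0` followed by `stxvx 2, 3, 4` would be redundant.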

Even if the offset into the stack object is zero, the offset from the stack pointer ($x1) is not zero. So we do not have li 4, 0 before stxvx, but we have something like addi 4, 1, 32.

For example, after the instruction selection without this patch, generated code for int a[12] = {0}; looks like:

%1:g8rc = ADDI8 %stack.0.a, 0
STXVX %0:vsrc, $zero8, %1:g8rc :: (store 16 into %ir.0)
STXV %0:vsrc, 32, %stack.0.a :: (store 16 into %ir.0 + 32)
STXV %0:vsrc, 16, %stack.0.a :: (store 16 into %ir.0 + 16)

Then %stack.0.a is resolved in the Prologue/Epilogue Insertion & Frame Finalization pass and the code becomes

renamable $x3 = ADDI8 $x1, 32
STXVX renamable $vsl0, $zero8, renamable $x3 :: (store 16 into %ir.0)
STXV renamable $vsl0, 64, $x1 :: (store 16 into %ir.0 + 32)
STXV killed renamable $vsl0, 48, $x1 :: (store 16 into %ir.0 + 16)

With this patch these snippets become

STXV %0:vsrc, 32, %stack.0.a :: (store 16 into %ir.0 + 32)
STXV %0:vsrc, 16, %stack.0.a :: (store 16 into %ir.0 + 16)
STXV %0:vsrc, 0, %stack.0.a :: (store 16 into %ir.0)

and

STXV renamable $vsl0, 64, $x1 :: (store 16 into %ir.0 + 32)
STXV renamable $vsl0, 48, $x1 :: (store 16 into %ir.0 + 16)
STXV killed renamable $vsl0, 32, $x1 :: (store 16 into %ir.0)
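
The arithmetic behind these displacements can be sketched as below. This is a hedged illustration, not code from the patch: `resolvedDisp`, `isMultipleOf16` and `checkThreadExample` are hypothetical names, the slot base of 32 is taken from the example above, and the slot base of 40 mentioned afterwards is an invented value used only to show why slot alignment matters.

```cpp
#include <cassert>
#include <cstdint>

// Displacement from the stack pointer once the frame layout is final:
// the slot's base offset plus the offset within the object.
int64_t resolvedDisp(int64_t slotBase, int64_t objOffset) {
  return slotBase + objOffset;
}

// The constraint discussed in this review: the displacement must be a
// multiple of 16.
bool isMultipleOf16(int64_t disp) { return disp % 16 == 0; }

// Reproduces the numbers from the example above: the object sits at $x1 + 32.
void checkThreadExample() {
  const int64_t slotBase = 32;
  assert(resolvedDisp(slotBase, 0) == 32);
  assert(resolvedDisp(slotBase, 16) == 48);
  assert(resolvedDisp(slotBase, 32) == 64);
  assert(isMultipleOf16(resolvedDisp(slotBase, 0)));
}
```

If the slot were instead placed at a hypothetical base of 40, an in-object offset of 16 would resolve to 56, which is not a multiple of 16. The slot number is unknown at instruction selection time, which is why the helper checks the object alignment rather than the final displacement.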

make unit tests more strict

LGTM.

This revision is now accepted and ready to land. Apr 5 2018, 6:59 AM

Closed by commit rL329377: [PowerPC] allow D-form VSX load/store when accessing FrameIndex without offset (authored by inouehrs). Apr 5 2018, 10:44 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/Target/PowerPC/PPCISelDAGToDAG.cpp	24 lines
test/CodeGen/PowerPC/vsx-p9.ll	31 lines

Diff 140904

lib/Target/PowerPC/PPCISelDAGToDAG.cpp

  bool PPCDAGToDAGISel::isOffsetMultipleOf(SDNode *N, unsigned Val) const {
    LoadSDNode *LDN = dyn_cast<LoadSDNode>(N);
    StoreSDNode *STN = dyn_cast<StoreSDNode>(N);
    SDValue AddrOp;
    if (LDN)
      AddrOp = LDN->getOperand(1);
    else if (STN)
      AddrOp = STN->getOperand(2);

+   // If the address points a frame object or a frame object with an offset,
+   // we need to check the object alignment.
    short Imm = 0;
-   if (AddrOp.getOpcode() == ISD::ADD) {
+   if (FrameIndexSDNode *FI = dyn_cast<FrameIndexSDNode>(
+           AddrOp.getOpcode() == ISD::ADD ? AddrOp.getOperand(0) :
+                                            AddrOp)) {
      // If op0 is a frame index that is under aligned, we can't do it either,
      // because it is translated to r31 or r1 + slot + offset. We won't know the
      // slot number until the stack frame is finalized.
-     if (FrameIndexSDNode *FI = dyn_cast<FrameIndexSDNode>(AddrOp.getOperand(0))) {
      const MachineFrameInfo &MFI = CurDAG->getMachineFunction().getFrameInfo();
      unsigned SlotAlign = MFI.getObjectAlignment(FI->getIndex());
      if ((SlotAlign % Val) != 0)
        return false;
+
+     // If we have an offset, we need further check on the offset.
+     if (AddrOp.getOpcode() != ISD::ADD)
+       return true;
    }
+
+   if (AddrOp.getOpcode() == ISD::ADD)
      return isIntS16Immediate(AddrOp.getOperand(1), Imm) && !(Imm % Val);
-   }

    // If the address comes from the outside, the offset will be zero.
    return AddrOp.getOpcode() == ISD::CopyFromReg;
  }

  void PPCDAGToDAGISel::transferMemOperands(SDNode *N, SDNode *Result) {
    // Transfer memoperands.
    MachineSDNode::mmo_iterator MemOp = MF->allocateMemRefsArray(1);
    MemOp[0] = cast<MemSDNode>(N)->getMemOperand();

test/CodeGen/PowerPC/vsx-p9.ll

  ; CHECK-LABEL: @test1
  ; CHECK: vnegd 2, 2
  ; CHECK: blr
  }

  declare void @sink(...)
+
+ ; stack object should be accessed using D-form load/store instead of X-form
+ define signext i32 @func1() {
+ ; CHECK-LABEL: @func1
+ ; CHECK-NOT: stxvx
+ ; CHECK: blr
+ entry:
+   %a = alloca [4 x i32], align 4
+   %0 = bitcast [4 x i32]* %a to i8*
+   call void @llvm.memset.p0i8.i64(i8* nonnull align 4 %0, i8 0, i64 16, i1 false)
+   %arraydecay = getelementptr inbounds [4 x i32], [4 x i32]* %a, i64 0, i64 0
+   %call = call signext i32 @callee(i32* nonnull %arraydecay) #3
+   ret i32 %call
+ }
+
+ ; stack object should be accessed using D-form load/store instead of X-form
+ define signext i32 @func2() {
+ ; CHECK-LABEL: @func2
+ ; CHECK-NOT: stxvx
+ ; CHECK: blr
+ entry:
+   %a = alloca [16 x i32], align 4
+   %0 = bitcast [16 x i32]* %a to i8*
+   call void @llvm.memset.p0i8.i64(i8* nonnull align 4 %0, i8 0, i64 64, i1 false)
+   %arraydecay = getelementptr inbounds [16 x i32], [16 x i32]* %a, i64 0, i64 0
+   %call = call signext i32 @callee(i32* nonnull %arraydecay) #3
+   ret i32 %call
+ }
+
+ declare void @llvm.memset.p0i8.i64(i8* nocapture writeonly, i8, i64, i1) #1
+ declare signext i32 @callee(i32*) local_unnamed_addr #2