
Details

Reviewers

hfinkel
nemanjai

Commits

rG9f0fe9a3f86d: If the arch is P9, we will select the DFLOADf32/DFLOADf64 pseudo instruction…
rL335024: If the arch is P9, we will select the DFLOADf32/DFLOADf64 pseudo instruction…

Summary

If the arch is P9, we select the DFLOADf32/DFLOADf64 pseudo instructions when loading a floating-point value, and expand them post-RA based on register pressure. However, we fail to do the add-imm peephole for these pseudo instructions.
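
To make the summary concrete, here is a simplified, self-contained model of an add-imm fold. It is not the LLVM implementation: `MemAccess`, `AddImm`, `isInt16`, and `foldAddImm` are hypothetical names, and relocation operands such as `@toc@l` are ignored. It only illustrates the two conditions visible in the diff: the combined offset must fit in a signed 16-bit field and, for opcodes flagged `RequiresMod4Offset`, must be a multiple of 4.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a memory access of the form `op Disp(BaseReg)`.
struct MemAccess {
  int BaseReg;
  int64_t Offset;
  bool RequiresMod4Offset; // set for opcodes whose displacement must be a multiple of 4
};

// Hypothetical stand-in for `addi DstReg, SrcReg, Imm`.
struct AddImm {
  int DstReg;
  int SrcReg;
  int64_t Imm;
};

// Signed 16-bit range check, analogous in spirit to isInt<16>(Offset).
constexpr bool isInt16(int64_t V) { return V >= -32768 && V <= 32767; }

// Try to fold the addi into the memory access that uses its result as base.
// On success Mem is rewritten in place and true is returned; otherwise Mem
// is left untouched.
bool foldAddImm(const AddImm &Add, MemAccess &Mem) {
  assert(isInt16(Mem.Offset) && "existing displacement must be encodable");
  if (Mem.BaseReg != Add.DstReg)
    return false; // the access does not use the addi result
  const int64_t Combined = Mem.Offset + Add.Imm;
  if (Mem.RequiresMod4Offset && Combined % 4 != 0)
    return false; // illegal for encodings that need a multiple of 4
  if (!isInt16(Combined))
    return false; // does not fit in the displacement field
  Mem.BaseReg = Add.SrcReg;
  Mem.Offset = Combined;
  return true;
}
```

In this model, an access with `RequiresMod4Offset` set rejects an addend of 6 but accepts 16, while an access without the flag accepts both.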

Diff Detail

Repository: rL LLVM

Event Timeline

steven.zhang created this revision. May 30 2018, 7:51 PM

Herald added subscribers: kbarton, hiraditya. May 30 2018, 7:51 PM

nemanjai requested changes to this revision. May 31 2018, 1:43 AM

nemanjai added inline comments.

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
2091 ↗	(On Diff #149228)	This is not adequate. We don't only convert this for constant pool loads, it will crash with something like this (also please add this as a test case):
float FArr[10];
float getF() { return FArr[3] + 3.4f; }
I think it's probably a good idea to implement something like:
#ifndef NDEBUG
static bool isAnImmediateOperand(const MachineOperand &MO) {
  return MO.isCPI() || MO.isGlobal() || MO.isImm();
}
#endif
and use that in this (and similar) assert(s).
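
A self-contained sketch of the suggested pattern, using mock types in place of `MachineOperand` (the `Operand` struct, the `OperandKind` enum, and `expandDFormPseudo` are assumptions for illustration, not LLVM types or functions). The helper is wrapped in `#ifndef NDEBUG` because its only caller is an `assert`; without the guard, a release build would likely warn about an unused static function.

```cpp
#include <cassert>

// Mock operand kinds; the real MachineOperand has many more.
enum class OperandKind { Register, Immediate, ConstantPoolIndex, GlobalAddress };

struct Operand {
  OperandKind Kind;
};

#ifndef NDEBUG
// Only referenced from assert(), so it is compiled out together with it.
static bool isAnImmediateOperand(const Operand &MO) {
  return MO.Kind == OperandKind::ConstantPoolIndex ||
         MO.Kind == OperandKind::GlobalAddress ||
         MO.Kind == OperandKind::Immediate;
}
#endif

// Stand-in for the expansion entry point: it checks the operand shapes in
// debug builds and then reports success.
bool expandDFormPseudo(const Operand &Base, const Operand &Disp) {
  assert(Base.Kind == OperandKind::Register && isAnImmediateOperand(Disp) &&
         "D-form op must have register and immediate operands");
  (void)Base; // silence unused-parameter warnings when NDEBUG is defined
  (void)Disp;
  return true;
}
```

With this shape, a displacement operand that is a global address or a constant pool index passes the check, which a plain "is immediate" test would reject.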

This revision now requires changes to proceed. May 31 2018, 1:43 AM

Thank you for the comment. I will fix that. BTW, I cannot reproduce the assertion with your case, but I can with the case below (am I missing some options?). It seems that clang puts the floating-point variable into the TOC only under some conditions, which is different from gcc/xlc. I will deliver a separate change to fix this issue.

float __attribute__((visibility("hidden"))) FArr[10];

float getF() {
  return FArr[3] + 3.4f;
}

There are two changes:

Add a query to check whether the operand is an immediate, as Nemanjai suggested. I didn't find other places that could use this query.
Add a new test case to cover the global case.

I'd typically be OK with these cosmetic changes being made on the commit, but there are a large enough number of changes that I'd prefer to see the updated patch. Thanks for fixing these.

llvm/test/CodeGen/PowerPC/toc-float.ll
1 ↗	(On Diff #149393)	Nit: There's only one RUN in this test case, no need for the check prefix - just use the default `CHECK` directives with no prefix. Also, it would be good to have all the following tests:
- Returning a `double` constant that can be represented as `float` (you already have this)
- Returning a `double` constant that cannot be represented as `float` (you already have this)
- Returning a `float` constant
- Accessing a global array of `float` (you already have this)
- Accessing a global array of `double`
- Accessing a global array of either `double` or `float` where the index of the access is large enough that a `D-Form` load cannot be used (perhaps above `4096` for `double`)
4 ↗	(On Diff #149393)	Nit: indentation.
12 ↗	(On Diff #149393)	Nit: please change the constant being returned to be significantly different from the one being returned from `bar()` so that it is obvious from a quick visual inspection that they're different. Also, you should add a comment that the constant cannot be represented exactly as `float` so a `double` is loaded from the constant pool. This can either be with a comment or by naming the function adequately.
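
As a rough illustration of what "can be represented as `float`" means for these test constants, the following sketch round-trips a `double` through `float` and compares the result. The function names are hypothetical, and this is not how LLVM decides which constant pool entry to emit; it only checks the property the reviewer is asking the test to document. The two values are the ones used in the committed `toc-float.ll`.

```cpp
#include <cassert>

// True when converting to float and back reproduces the double exactly.
// NaN inputs yield false because NaN never compares equal to itself.
bool isExactlyRepresentableAsFloat(double D) {
  return static_cast<double>(static_cast<float>(D)) == D;
}

// Checks the two constants returned by the test functions.
void checkTestConstants() {
  assert(isExactlyRepresentableAsFloat(1.400000e+01));  // doubleConstant1
  assert(!isExactlyRepresentableAsFloat(2.408904e+01)); // doubleConstant2
}
```

Picking constants that differ this visibly (14.0 versus 24.08904) also addresses the request that the two cases be easy to tell apart at a glance.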

steven.zhang updated this revision to Diff 150453. Jun 8 2018, 12:39 AM

Update the change based on Nemanjai's comment. Thank you.

steven.zhang updated this revision to Diff 150470. Jun 8 2018, 2:52 AM

Other than the minor nit, LGTM.

llvm/test/CodeGen/PowerPC/toc-float.ll
65 ↗	(On Diff #150470)	This comment is unclear to me. I'm not sure how we could use a D-Form instruction when the offset doesn't fit in the displacement field. In any case, clarify the comment or remove it.

This revision is now accepted and ready to land. Jun 8 2018, 3:14 PM

steven.zhang added inline comments. Jun 10 2018, 7:25 PM

llvm/test/CodeGen/PowerPC/toc-float.ll
65 ↗	(On Diff #150470)	LFD doesn't have the alignment restriction that LXSD has. When TOC_Entry is lowered, we assume that the displacement must be a multiple of 4, which is in fact not necessary for LFD. Therefore, we could do it post-RA. For this case, LFD 32768[REG] is valid.

steven.zhang marked an inline comment as done. Jun 10 2018, 10:03 PM

steven.zhang marked an inline comment as done. Jun 10 2018, 10:10 PM

steven.zhang added inline comments.

llvm/test/CodeGen/PowerPC/toc-float.ll
65 ↗	(On Diff #150470)	Sorry, please just ignore this comment. I didn't realize that the immediate for LFD is signed, with a range of -32768 to 32767, so LFD 32768[REG] is invalid. This has nothing to do with alignment.
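
The arithmetic behind this correction can be sketched as follows. The helper names are hypothetical; the only facts assumed are the signed 16-bit displacement range stated above and 8-byte `double` elements. Element 4096 of a `double` array sits at byte offset 32768, one past the largest encodable displacement, which is why the test for that index expects an indexed load (`lfdx`) instead.

```cpp
#include <cassert>
#include <cstdint>

// Signed 16-bit displacement range quoted in the comment above.
constexpr int64_t MinDisp = -32768;
constexpr int64_t MaxDisp = 32767;

constexpr bool fitsDFormDisplacement(int64_t ByteOffset) {
  return ByteOffset >= MinDisp && ByteOffset <= MaxDisp;
}

static_assert(fitsDFormDisplacement(32767), "largest encodable displacement");
static_assert(!fitsDFormDisplacement(32768), "one past the limit");

// Byte offset of element Index in an array whose elements are ElemSize bytes.
int64_t byteOffset(int64_t Index, int64_t ElemSize) {
  assert(ElemSize > 0 && "element size must be positive");
  return Index * ElemSize;
}
```

For example, `byteOffset(4095, 8)` is 32760 and still fits, while `byteOffset(4096, 8)` is 32768 and does not; for 4-byte `float` elements the first index that no longer fits is 8192.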

steven.zhang updated this revision to Diff 150674. Jun 10 2018, 10:14 PM

steven.zhang marked 2 inline comments as done. Jun 10 2018, 10:23 PM

Add the triple for the newly created test case toc-float.ll

nemanjai added inline comments. Jun 12 2018, 3:48 PM

llvm/test/CodeGen/PowerPC/toc-float.ll
4 ↗	(On Diff #150897)	The preference is to specify the triple on the command line in test cases and not in the IR. Also, please remove the `datalayout`. Of course, feel free to do this on the commit.

Remove the data layout and triple from the IR and specify the triple on the command line.

Closed by commit rL335024: If the arch is P9, we will select the DFLOADf32/DFLOADf64 pseudo instruction… (authored by qshanz). Jun 18 2018, 11:59 PM

This revision was automatically updated to reflect the committed changes.

Diff 151854

llvm/trunk/lib/Target/PowerPC/PPCISelDAGToDAG.cpp

[6,038 lines not shown; context: void PPCDAGToDAGISel::PeepholePPC64() {]
   while (Position != CurDAG->allnodes_begin()) {
     SDNode *N = &*--Position;
     // Skip dead nodes and any non-machine opcodes.
     if (N->use_empty() || !N->isMachineOpcode())
       continue;

     unsigned FirstOp;
     unsigned StorageOpcode = N->getMachineOpcode();
+    bool RequiresMod4Offset = false;

     switch (StorageOpcode) {
     default: continue;

+    case PPC::LWA:
+    case PPC::LD:
+    case PPC::DFLOADf64:
+    case PPC::DFLOADf32:
+      RequiresMod4Offset = true;
+      LLVM_FALLTHROUGH;
     case PPC::LBZ:
     case PPC::LBZ8:
-    case PPC::LD:
     case PPC::LFD:
     case PPC::LFS:
     case PPC::LHA:
     case PPC::LHA8:
     case PPC::LHZ:
     case PPC::LHZ8:
-    case PPC::LWA:
     case PPC::LWZ:
     case PPC::LWZ8:
       FirstOp = 0;
       break;

+    case PPC::STD:
+    case PPC::DFSTOREf64:
+    case PPC::DFSTOREf32:
+      RequiresMod4Offset = true;
+      LLVM_FALLTHROUGH;
     case PPC::STB:
     case PPC::STB8:
-    case PPC::STD:
     case PPC::STFD:
     case PPC::STFS:
     case PPC::STH:
     case PPC::STH8:
     case PPC::STW:
     case PPC::STW8:
       FirstOp = 1;
       break;
[30 lines not shown; context: while (Position != CurDAG->allnodes_begin()) {]
     case PPC::ADDI:
       // In some cases (such as TLS) the relocation information
       // is already in place on the operand, so copying the operand
       // is sufficient.
       ReplaceFlags = false;
       // For these cases, the immediate may not be divisible by 4, in
       // which case the fold is illegal for DS-form instructions. (The
       // other cases provide aligned addresses and are always safe.)
-      if ((StorageOpcode == PPC::LWA ||
-           StorageOpcode == PPC::LD ||
-           StorageOpcode == PPC::STD) &&
+      if (RequiresMod4Offset &&
           (!isa<ConstantSDNode>(Base.getOperand(1)) ||
            Base.getConstantOperandVal(1) % 4 != 0))
         continue;
       break;
     case PPC::ADDIdtprelL:
       Flags = PPCII::MO_DTPREL_LO;
       break;
     case PPC::ADDItlsldL:
[45 lines not shown; context: if (ReplaceFlags) {]
     // If we're directly folding the addend from an addi instruction, then:
     // 1. In general, the offset on the memory access must be zero.
     // 2. If the addend is a constant, then it can be combined with a
     // non-zero offset, but only if the result meets the encoding
     // requirements.
     if (auto *C = dyn_cast<ConstantSDNode>(ImmOpnd)) {
       Offset += C->getSExtValue();

-      if ((StorageOpcode == PPC::LWA || StorageOpcode == PPC::LD ||
-           StorageOpcode == PPC::STD) && (Offset % 4) != 0)
+      if (RequiresMod4Offset && (Offset % 4) != 0)
         continue;

       if (!isInt<16>(Offset))
         continue;

       ImmOpnd = CurDAG->getTargetConstant(Offset, SDLoc(ImmOpnd),
                                           ImmOpnd.getValueType());
     } else if (Offset != 0) {
[15 lines not shown; context: while (Position != CurDAG->allnodes_begin()) {]
     // immediate operand, add it now.
     if (ReplaceFlags) {
       if (GlobalAddressSDNode *GA = dyn_cast<GlobalAddressSDNode>(ImmOpnd)) {
         SDLoc dl(GA);
         const GlobalValue *GV = GA->getGlobal();
         // We can't perform this optimization for data whose alignment
         // is insufficient for the instruction encoding.
         if (GV->getAlignment() < 4 &&
-            (StorageOpcode == PPC::LD || StorageOpcode == PPC::STD ||
-             StorageOpcode == PPC::LWA || (Offset % 4) != 0)) {
+            (RequiresMod4Offset || (Offset % 4) != 0)) {
           LLVM_DEBUG(dbgs() << "Rejected this candidate for alignment.\n\n");
           continue;
         }
         ImmOpnd = CurDAG->getTargetGlobalAddress(GV, dl, MVT::i64, Offset, Flags);
       } else if (ConstantPoolSDNode *CP =
                  dyn_cast<ConstantPoolSDNode>(ImmOpnd)) {
         const Constant *C = CP->getConstVal();
         ImmOpnd = CurDAG->getTargetConstantPool(C, MVT::i64,
[29 lines not shown]

llvm/trunk/lib/Target/PowerPC/PPCInstrInfo.cpp

[2,059 lines not shown; context: if ((TargetReg >= PPC::F0 && TargetReg <= PPC::F31) ||]
       (TargetReg >= PPC::VSL0 && TargetReg <= PPC::VSL31))
     Opcode = LowerOpcode;
   else
     Opcode = UpperOpcode;
   MI.setDesc(get(Opcode));
   return true;
 }

+#ifndef NDEBUG
+static bool isAnImmediateOperand(const MachineOperand &MO) {
+  return MO.isCPI() || MO.isGlobal() || MO.isImm();
+}
+#endif
+
 bool PPCInstrInfo::expandPostRAPseudo(MachineInstr &MI) const {
   auto &MBB = *MI.getParent();
   auto DL = MI.getDebugLoc();

   switch (MI.getOpcode()) {
   case TargetOpcode::LOAD_STACK_GUARD: {
     assert(Subtarget.isTargetLinux() &&
            "Only Linux target is expected to contain LOAD_STACK_GUARD");
     const int64_t Offset = Subtarget.isPPC64() ? -0x7010 : -0x7008;
     const unsigned Reg = Subtarget.isPPC64() ? PPC::X13 : PPC::R2;
     MI.setDesc(get(Subtarget.isPPC64() ? PPC::LD : PPC::LWZ));
     MachineInstrBuilder(*MI.getParent()->getParent(), MI)
         .addImm(Offset)
         .addReg(Reg);
     return true;
   }
   case PPC::DFLOADf32:
   case PPC::DFLOADf64:
   case PPC::DFSTOREf32:
   case PPC::DFSTOREf64: {
     assert(Subtarget.hasP9Vector() &&
            "Invalid D-Form Pseudo-ops on Pre-P9 target.");
-    assert(MI.getOperand(2).isReg() && MI.getOperand(1).isImm() &&
+    assert(MI.getOperand(2).isReg() &&
+           isAnImmediateOperand(MI.getOperand(1)) &&
            "D-form op must have register and immediate operands");
     return expandVSXMemPseudo(MI);
   }
   case PPC::XFLOADf32:
   case PPC::XFSTOREf32:
   case PPC::LIWAX:
   case PPC::LIWZX:
   case PPC::STIWX: {
[1,252 lines not shown]

llvm/trunk/test/CodeGen/PowerPC/mcm-12.ll

[27 lines not shown]
 ; CHECK-VSX: addis [[REG1:[0-9]+]], 2, [[VAR]]@toc@ha
 ; CHECK-VSX: addi [[REG1]], {{[0-9]+}}, [[VAR]]@toc@l
 ; CHECK-VSX: lfdx {{[0-9]+}}, 0, [[REG1]]

 ; CHECK-P9: [[VAR:[a-z0-9A-Z_.]+]]:
 ; CHECK-P9: .quad 4562098671269285104
 ; CHECK-P9-LABEL: test_double_const:
 ; CHECK-P9: addis [[REG1:[0-9]+]], 2, [[VAR]]@toc@ha
-; CHECK-P9: addi [[REG1]], {{[0-9]+}}, [[VAR]]@toc@l
-; CHECK-P9: lfd {{[0-9]+}}, 0([[REG1]])
+; CHECK-P9: lfd {{[0-9]+}}, [[VAR]]@toc@l([[REG1]])

llvm/trunk/test/CodeGen/PowerPC/toc-float.ll

				; RUN: llc -verify-machineinstrs -mtriple=powerpc64le-unknown-linux-gnu -mcpu=pwr9 <%s | FileCheck %s

				; As the constant could be represented as float, a float is
				; loaded from constant pool.
				define double @doubleConstant1() {
				ret double 1.400000e+01
				}

				; CHECK-LABEL: doubleConstant1:
				; CHECK: addis [[REG1:[0-9]+]], 2, [[VAR:[a-z0-9A-Z_.]+]]@toc@ha
				; CHECK: lfs {{[0-9]+}}, [[VAR]]@toc@l([[REG1]])

				; As the constant couldn't be represented as float, a double is
				; loaded from constant pool.
				define double @doubleConstant2() {
				ret double 2.408904e+01
				}

				; CHECK-LABEL: doubleConstant2:
				; CHECK: addis [[REG1:[0-9]+]], 2, [[VAR:[a-z0-9A-Z_.]+]]@toc@ha
				; CHECK: lfd {{[0-9]+}}, [[VAR]]@toc@l([[REG1]])

				@FArr = hidden local_unnamed_addr global [10 x float] zeroinitializer, align 4

				define float @floatConstantArray() local_unnamed_addr {
				%1 = load float, float* getelementptr inbounds ([10 x float], [10 x float]* @FArr, i64 0, i64 3), align 4
				%2 = fadd float %1, 0x400B333340000000
				ret float %2
				}

				; CHECK-LABEL: floatConstantArray
				; CHECK: addis [[REG1:[0-9]+]], 2, [[VAR:[a-z0-9A-Z_.]+]]@toc@ha+[[REG2:[0-9]+]]
				; CHECK: lfs {{[0-9]+}}, [[VAR]]@toc@l+[[REG2]]([[REG1]])

				define float @floatConstant() {
				ret float 0x400470A3E0000000
				}

				; CHECK-LABEL: floatConstant:
				; CHECK: addis [[REG1:[0-9]+]], 2, [[VAR:[a-z0-9A-Z_.]+]]@toc@ha
				; CHECK: lfs {{[0-9]+}}, [[VAR]]@toc@l([[REG1]])

				; llvm put the hidden globals into the TOC table.
				; TODO - do some analysis and decide which globals could be put into TOC.
				@d = hidden local_unnamed_addr global [200 x double] zeroinitializer, align 8

				define double @doubleConstantArray() {
				%1 = load double, double* getelementptr inbounds ([200 x double], [200 x double]* @d, i64 0, i64 3), align 8
				%2 = fadd double %1, 6.880000e+00
				ret double %2
				}

				; CHECK-LABEL: doubleConstantArray
				; CHECK: addis [[REG1:[0-9]+]], 2, [[VAR:[a-z0-9A-Z_.]+]]@toc@ha+[[REG2:[0-9]+]]
				; CHECK: lfd {{[0-9]+}}, [[VAR]]@toc@l+[[REG2]]([[REG1]])

				@arr = hidden local_unnamed_addr global [20000 x double] zeroinitializer, align 8

				define double @doubleLargeConstantArray() {
				%1 = load double, double* getelementptr inbounds ([20000 x double], [20000 x double]* @arr, i64 0, i64 4096), align 8
				%2 = fadd double %1, 6.880000e+00
				ret double %2
				}

				; access element that out of range
				; CHECK-LABEL: doubleLargeConstantArray
				; CHECK: addis [[REG1:[0-9]+]], 2, [[VAR:[a-z0-9A-Z_.]+]]@toc@ha
				; CHECK: li [[REG2:[0-9]+]], 0
				; CHECK: addi [[REG3:[0-9]+]], [[REG1]], [[VAR:[a-z0-9A-Z_.]+]]@toc@l
				; CHECK: ori [[REG4:[0-9]+]], [[REG2]], 32768
				; CHECK: lfdx {{[0-9]+}}, [[REG3]], [[REG4]]

This is an archive of the discontinued LLVM Phabricator instance.

[Power9] Do the add-imm peephole for pseudo instruction DFLOADf32/DFLOADf64 and the store pair
ClosedPublic
