This is an archive of the discontinued LLVM Phabricator instance.

Differential D58378

[PowerPC] Leverage the addend in the TOC relocation to do the address calculation
Changes Planned · Public

Authored by steven.zhang on Feb 19 2019, 12:51 AM.

Details

Reviewers

nemanjai
hfinkel
stefanp
jsji

Group Reviewers

Restricted Project

Summary

Currently, we emit instructions to compute the address of an element of a global array. If the element's offset is too large (i.e. it does not fit in a signed 16-bit immediate), extra instructions are needed to do the calculation. For example:

double __attribute__((visibility("hidden"))) b[2000000000];
double foo() { return b[4096] ; }

This is the code sequence we get now:

addis 3, 2, b@toc@ha
li 4, 0
addi 3, 3, b@toc@l
ori 4, 4, 32768
lfdx 1, 4, 3

Because 32768 does not fit in a signed 16-bit immediate, we have to materialize it in a register and use the X-form load to access b[4096]. This patch leverages the addend in the relocation to do the address calculation instead. This is the new instruction sequence we want to produce:

addis 3, 2, b@toc@ha+32768
lfd 1, b@toc@l+32768(3)
blr

Note that, because this transformation takes up one extra TOC entry (b+32768), we only do it if the offset does not fit in 16 bits but does fit in 32 bits.
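
The offset window described in the summary can be sketched as a small predicate. This is a standalone illustration rather than the patch itself: `fitsSigned` is a hypothetical stand-in for LLVM's `isInt<N>`, and `shouldFoldIntoTOCAddend` and `doubleElementOffset` are made-up names used only for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm::isInt<N>: true if V fits in an N-bit
// signed integer.
template <unsigned N> constexpr bool fitsSigned(int64_t V) {
  static_assert(N > 0 && N < 64, "width out of range for this sketch");
  return V >= -(INT64_C(1) << (N - 1)) && V < (INT64_C(1) << (N - 1));
}

// Fold only when the offset needs more than 16 bits but still fits in 32.
// Offsets that fit in 16 bits already fit the displacement field, and
// offsets beyond 32 bits are not reachable through the relocation addend.
constexpr bool shouldFoldIntoTOCAddend(int64_t Addend) {
  return !fitsSigned<16>(Addend) && fitsSigned<32>(Addend);
}

// Byte offset of element Index in an array of double (8 bytes each).
constexpr int64_t doubleElementOffset(int64_t Index) { return Index * 8; }
```

With these definitions, `b[4096]` from the summary has a byte offset of 32768, which is one past the largest signed 16-bit value and therefore falls inside the window.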

Event Timeline

steven.zhang created this revision. Feb 19 2019, 12:51 AM

Herald added a project: Restricted Project. Feb 19 2019, 12:51 AM

Herald added subscribers: jdoerfert, kbarton, hiraditya.

nemanjai added inline comments. Feb 19 2019, 7:35 AM

llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
6583	It is not clear to me how we ensure `Offset` fits into a signed 16-bit immediate.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
15489	Line too long. If you are using Vim, you can run clang-format within the editor with something like: `:14506,14560 !clang-format` as long as you have `clang-format` in your `$PATH`.
15490	s/platform/platforms
15504	Is this actually needed? Is there not a canonical form (with the constant being the second operand)?
15541	Why do we need this? Wouldn't this always be the case? The ELFv2 ABI has no support for 32-bit addressing so why is it that we need this? Could it not just be an assert?

This can cause relocation overflows:

$ cat b.c 
double b[1LU << 33];
double foo() { return b[(1LU << _SH) - 1] ; }
void setfoo(double d) { b[(1LU << _SH) - 1] = d; }

$ cat main.c 
double foo();
void setfoo(double);
int main(void) {
  setfoo(445.2);
  return foo() == 445.2;
}

$ clang -O2 b.c main.c -D_SH=28
/tmp/b-9d97be.o: In function `foo':
b.c:(.text+0x8): relocation truncated to fit: R_PPC64_TOC16_HA against symbol `b' defined in COMMON section in /tmp/b-9d97be.o+7ffffff8
/tmp/b-9d97be.o: In function `setfoo':
b.c:(.text+0x28): relocation truncated to fit: R_PPC64_TOC16_HA against symbol `b' defined in COMMON section in /tmp/b-9d97be.o+7ffffff8
clang-9: error: linker command failed with exit code 1 (use -v to see invocation)

This revision now requires changes to proceed. Feb 19 2019, 8:01 AM

steven.zhang marked 3 inline comments as done. Feb 19 2019, 10:31 PM

steven.zhang added inline comments.

llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
6583	We don't need to ensure that Offset fits in the 16-bit immediate, as it is the offset of the global address.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
15489	Well, I will set up my IDE to avoid the format issue happening again.
15541	You are right, the ABI implies 64-bit. Thank you.

In D58378#1402373, @nemanjai wrote:

This can cause relocation overflows:

$ cat b.c 
double b[1LU << 33];
double foo() { return b[(1LU << _SH) - 1] ; }
void setfoo(double d) { b[(1LU << _SH) - 1] = d; }

$ cat main.c 
double foo();
void setfoo(double);
int main(void) {
  setfoo(445.2);
  return foo() == 445.2;
}

$ clang -O2 b.c main.c -D_SH=28
/tmp/b-9d97be.o: In function `foo':
b.c:(.text+0x8): relocation truncated to fit: R_PPC64_TOC16_HA against symbol `b' defined in COMMON section in /tmp/b-9d97be.o+7ffffff8
/tmp/b-9d97be.o: In function `setfoo':
b.c:(.text+0x28): relocation truncated to fit: R_PPC64_TOC16_HA against symbol `b' defined in COMMON section in /tmp/b-9d97be.o+7ffffff8
clang-9: error: linker command failed with exit code 1 (use -v to see invocation)

I need to double-check the ELF spec to see why it is limited to 27 bits. I also checked this with the LLVM linker (lld): it links successfully, but the program hits a segmentation fault at run time if _SH=28. It seems that lld is also missing this check.

steven.zhang planned changes to this revision. Nov 8 2019, 6:11 PM

Herald added subscribers: shchenz, wuzish. Nov 8 2019, 6:11 PM

Still need to investigate why the limit is 27 bits.

steven.zhang planned changes to this revision. Jan 1 2020, 10:21 PM

MaskRay added a comment.

b.c:(.text+0x8): relocation truncated to fit: R_PPC64_TOC16_HA against symbol `b' defined in COMMON section in /tmp/b-9d97be.o+7ffffff8

The r_addend is 0x7ffffff8, a value close to 2**31.

The distance between the TOC entry and the variable address cannot be too far. More accurately, -0x80008000 <= address - .TOC. + r_addend < 0x7fff8000

GNU ld correctly reports a relocation overflow. lld currently does not check R_PPC64_TOC16_HA overflow.

% powerpc64le-linux-gnu-ld -pie b.o main.o
powerpc64le-linux-gnu-ld: warning: cannot find entry symbol _start; defaulting to 0000000000000230
powerpc64le-linux-gnu-ld: b.o: in function `foo':
b.c:(.text+0x8): relocation truncated to fit: R_PPC64_TOC16_HA against symbol `b' defined in COMMON section in b.o+7ffffff8

The largest address a pair of HA/L can materialize is something like:

addis 3, 2, 32767  # adding 1 will overflow to -32768
lfd 1, 32767(3)
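
The range quoted above can be checked against the reproducer with a few lines of arithmetic. This is only a sketch: the function names are hypothetical, and it assumes `address - .TOC.` is zero so that the addend alone decides the outcome.

```cpp
#include <cassert>
#include <cstdint>

// Range quoted in the comment for the value a @toc@ha/@toc@l pair can
// reach: -0x80008000 <= V < 0x7fff8000, where
// V = address - .TOC. + r_addend.
constexpr bool fitsTocHaLoPair(int64_t V) {
  return V >= -INT64_C(0x80008000) && V < INT64_C(0x7fff8000);
}

// r_addend for b[(1 << Shift) - 1] when b is an array of double
// (8 bytes per element), as in the b.c reproducer.
constexpr int64_t lastElementAddend(unsigned Shift) {
  return ((INT64_C(1) << Shift) - 1) * 8;
}

// Largest value reachable with "addis 3, 2, 32767" plus "lfd 1, 32767(3)".
constexpr int64_t largestHaLoValue() {
  return INT64_C(32767) * 65536 + 32767;
}
```

Under that assumption, `_SH=28` gives the addend 0x7ffffff8 seen in the linker message, which lies outside the range, while `_SH=27` gives 0x3ffffff8, which lies inside it.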

In D58378#1800862, @MaskRay wrote:
b.c:(.text+0x8): relocation truncated to fit: R_PPC64_TOC16_HA against symbol `b' defined in COMMON section in /tmp/b-9d97be.o+7ffffff8

The r_addend is 0x7ffffff8, a value close to 2**31.

The distance between the TOC entry and the variable address cannot be too far. More accurately, -0x80008000 <= address - .TOC. + r_addend < 0x7fff8000

GNU ld correctly reports a relocation overflow. lld currently does not check R_PPC64_TOC16_HA overflow.
% powerpc64le-linux-gnu-ld -pie b.o main.o
powerpc64le-linux-gnu-ld: warning: cannot find entry symbol _start; defaulting to 0000000000000230
powerpc64le-linux-gnu-ld: b.o: in function `foo':
b.c:(.text+0x8): relocation truncated to fit: R_PPC64_TOC16_HA against symbol `b' defined in COMMON section in b.o+7ffffff8
The largest address a pair of HA/L can materialize is something like:
addis 3, 2, 32767  # adding 1 will overflow to -32768
lfd 1, 32767(3)

Thank you for this information! I missed the "double" type of the array elements (the limit is not 27 bits, but 30 bits). So, for some unknown reason, the linker reserves 0x8000 for special usage. The addend + 0x8000 should fit into a signed 32-bit value. I will split this patch into two parts:

1. Fix the missing offset handling in the ASM printer.
2. Add the combine rule to generate the global address that has an offset.

I get the reason about 0x8000 now.

#ha(value) Denotes the high adjusted value: bits 16 - 63 of the indicated value, compensating
for #lo() being treated as a signed number. That is:
#ha(x) = (x + 0x8000) >> 16

The TOC region commonly includes data items within the .got, .toc, .sdata, and .sbss sections. In the medium
code model, they can be addressed with 32-bit signed offsets from the TOC pointer register. The TOC pointer
register typically points to the beginning of the .got section + 0x8000, which permits a 2 GB TOC with the
medium and large code models.
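
The role of the 0x8000 adjustment can be illustrated with a short sketch of the `#ha`/`#lo` split. The helper names `lo16`, `ha16` and `recombine` are hypothetical; only the formula `#ha(x) = (x + 0x8000) >> 16` comes from the quoted ABI text.

```cpp
#include <cassert>
#include <cstdint>

// #lo(x): the low 16 bits of x, interpreted as a signed 16-bit value.
constexpr int64_t lo16(int64_t X) {
  return ((X & 0xffff) ^ 0x8000) - 0x8000;
}

// #ha(x) = (x + 0x8000) >> 16. The right shift of a negative value is an
// arithmetic shift on mainstream compilers and is guaranteed since C++20.
constexpr int64_t ha16(int64_t X) { return (X + 0x8000) >> 16; }

// What "addis rT, rA, ha" followed by a D-form access with displacement
// "lo" adds to rA in total.
constexpr int64_t recombine(int64_t X) {
  return ha16(X) * 65536 + lo16(X);
}
```

For the offset 32768 from the summary, `ha16` yields 1 and `lo16` yields -32768, so the pair still adds up to 32768. The high part stops fitting in a signed 16-bit field once the value reaches 0x7fff8000, which matches the upper bound quoted earlier in the thread.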

In D58378#1801102, @steven.zhang wrote:

I get the reason about 0x8000 now.

#ha(value) Denotes the high adjusted value: bits 16 - 63 of the indicated value, compensating
for #lo() being treated as a signed number. That is:
#ha(x) = (x + 0x8000) >> 16

Yes.

The TOC region commonly includes data items within the .got, .toc, .sdata, and .sbss sections. In the medium
code model, they can be addressed with 32-bit signed offsets from the TOC pointer register. The TOC pointer
register typically points to the beginning of the .got section + 0x8000, which permits a 2 GB TOC with the
medium and large code models.

-0x80008000 <= address - .TOC. + r_addend < 0x7fff8000

If address - .TOC. can be as large as 0x7fff8000 (this may happen with huge .data or .bss), then you cannot leverage any positive value of r_addend. I believe this situation is rare, though. You may try a smaller cut-off value, say, 0x100, and see if it is beneficial. Be aware that if the code references multiple elements of a global array, e.g. a[0] a[1] a[2] ... a[99], you should not just create 100 TOC entries.

steven.zhang added a comment. Jan 2 2020, 6:26 PM

This comment was removed by steven.zhang.

jsji added a reviewer: Restricted Project. Jan 30 2020, 7:17 AM

jsji added a project: Restricted Project.

jsji resigned from this revision. Jun 2 2022, 8:00 AM

Herald added a project: Restricted Project. Jun 2 2022, 8:00 AM

Revision Contents

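Path                                           Size
llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp      5 lines
llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp    6 lines
llvm/lib/Target/PowerPC/PPCISelLowering.h      1 line
llvm/lib/Target/PowerPC/PPCISelLowering.cpp    52 lines
llvm/test/CodeGen/PowerPC/toc-float.ll         37 lines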

Diff 235828

llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp

@@ case PPC::ADDItocL: {

     LLVM_DEBUG(assert(
         !(MO.isGlobal() && Subtarget->isGVIndirectSymbol(MO.getGlobal())) &&
         "Interposable definitions must use indirect access."));

     const MCExpr *Exp =
         MCSymbolRefExpr::create(getMCSymbolForTOCPseudoMO(MO),
                                 MCSymbolRefExpr::VK_PPC_TOC_LO, OutContext);
+    if (!MO.isJTI() && MO.getOffset())
+      Exp = MCBinaryExpr::createAdd(Exp,
+                                    MCConstantExpr::create(MO.getOffset(),
+                                                           OutContext),
+                                    OutContext);
     TmpInst.getOperand(2) = MCOperand::createExpr(Exp);
     EmitToStreamer(*OutStreamer, TmpInst);
     return;
   }
   case PPC::ADDISgotTprelHA: {
     // Transform: %xd = ADDISgotTprelHA %x2, @sym
     // Into: %xd = ADDIS8 %x2, sym@got@tlsgd@ha
     assert(IsPPC64 && "Not supported for 32-bit PowerPC");

llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp

@@ while (Position != CurDAG->allnodes_begin()) {
     LLVM_DEBUG(dbgs() << "\nN: ");
     LLVM_DEBUG(N->dump(CurDAG));
     LLVM_DEBUG(dbgs() << "\n");

     // If the relocation information isn't already present on the
     // immediate operand, add it now.
     if (ReplaceFlags) {
       if (GlobalAddressSDNode *GA = dyn_cast<GlobalAddressSDNode>(ImmOpnd)) {
+        // ADDI y, x, GA{off1}
+        // LFD z, off2(y)
+        // ==>
+        // LFD z, GA{off1+off2}(x)
+        Offset += GA->getOffset();
+
         SDLoc dl(GA);
         const GlobalValue *GV = GA->getGlobal();
         // We can't perform this optimization for data whose alignment
         // is insufficient for the instruction encoding.
         if (GV->getAlignment() < 4 &&
             (RequiresMod4Offset || (Offset % 4) != 0)) {
           LLVM_DEBUG(dbgs() << "Rejected this candidate for alignment.\n\n");
           continue;

llvm/lib/Target/PowerPC/PPCISelLowering.h

@@ private:
     SDValue combineADD(SDNode *N, DAGCombinerInfo &DCI) const;
     SDValue combineTRUNCATE(SDNode *N, DAGCombinerInfo &DCI) const;
     SDValue combineSetCC(SDNode *N, DAGCombinerInfo &DCI) const;
     SDValue combineABS(SDNode *N, DAGCombinerInfo &DCI) const;
     SDValue combineVSelect(SDNode *N, DAGCombinerInfo &DCI) const;
     SDValue combineVReverseMemOP(ShuffleVectorSDNode *SVN, LSBaseSDNode *LSBase,
                                  DAGCombinerInfo &DCI) const;

+    SDValue combineADDOnTOCEntry(SDNode *N, SelectionDAG &DAG) const;
     /// ConvertSETCCToSubtract - looks at SETCC that compares ints. It replaces
     /// SETCC with integer subtraction when (1) there is a legal way of doing it
     /// (2) keeping the result of comparison in GPR has performance benefit.
     SDValue ConvertSETCCToSubtract(SDNode *N, DAGCombinerInfo &DCI) const;

     SDValue getSqrtEstimate(SDValue Operand, SelectionDAG &DAG, int Enabled,
                             int &RefinementSteps, bool &UseOneConstNR,
                             bool Reciprocal) const override;

llvm/lib/Target/PowerPC/PPCISelLowering.cpp

@@ case ISD::SETEQ: {
     return DAG.getNode(ISD::ADDE, DL, VTs, LHS, DAG.getConstant(0, DL, MVT::i64),
                        SDValue(Subc.getNode(), 1));
     }
   }

   return SDValue();
 }

+SDValue PPCTargetLowering::combineADDOnTOCEntry(SDNode *N,
+                                                SelectionDAG &DAG) const {
+  // The addend in the TOC relocation isn't supported by all platforms.
+  if (!Subtarget.isELFv2ABI())
+    return SDValue();
+
+  // Combine the code seq:
+  // x = TOC_ENTRY<Global{offset1}>
+  // y = add x, offset2
+  // to
+  // y = TOC_ENTRY<Global{offset1 + offset2}>
+  SDValue Op0 = N->getOperand(0);
+  SDValue Op1 = N->getOperand(1);
+  ConstantSDNode *Offset = dyn_cast<ConstantSDNode>(Op1);
+  MemIntrinsicSDNode *TocEntry = dyn_cast<MemIntrinsicSDNode>(Op0);
+  if (!Offset || !TocEntry || TocEntry->getOpcode() != PPCISD::TOC_ENTRY)
+    return SDValue();
+
+  // Only combine the add TOC_ENTRY for globals.
+  SDValue GA = TocEntry->getOperand(0);
+  GlobalAddressSDNode *Addr = dyn_cast<GlobalAddressSDNode>(GA);
+  if (!Addr)
+    return SDValue();
+
+  // If the global is accessed as got-indirect, a load is needed to
+  // load the address of the global from TOC entry. It is unsafe to fold the
+  // offset into globals.
+  if (isAccessedAsGotIndirect(GA))
+    return SDValue();
+
+  // This combine will require the linker to use an additional TOC entry to
+  // compute the address. Therefore, do nothing for offset that fit in a
+  // 16-bit signed value already fit into the displacement field of LDtocL.
+  // Offsets larger than a 32-bit signed value will still not be reachable
+  // by this method. So we only combine if 16 < size of offset in bits < 32.
+  int64_t Addend = Addr->getOffset() + Offset->getSExtValue();
+  if (isInt<16>(Addend) || !isInt<32>(Addend))
+    return SDValue();
+
+  // Creating new global with offset, and new TOC with the new global.
+  assert(Addr->getValueType(0) == MVT::i64 && "The address must be i64");
+  SDValue NewAddr = DAG.getTargetGlobalAddress(Addr->getGlobal(),
+                                               SDLoc(Addr),
+                                               MVT::i64,
+                                               Addend,
+                                               Addr->getTargetFlags());
+  return getTOCEntry(DAG, SDLoc(TocEntry), NewAddr);
+}
+
 SDValue PPCTargetLowering::combineADD(SDNode *N, DAGCombinerInfo &DCI) const {
   if (auto Value = combineADDToADDZE(N, DCI.DAG, Subtarget))
     return Value;

+  if (auto Value = combineADDOnTOCEntry(N, DCI.DAG))
+    return Value;
+
   return SDValue();
 }

 // Detect TRUNCATE operations on bitcasts of float128 values.
 // What we are looking for here is the situtation where we extract a subset
 // of bits from a 128 bit float.
 // This can be of two forms:
 // 1) BITCAST of f128 feeding TRUNCATE

llvm/test/CodeGen/PowerPC/toc-float.ll

@@
 ; RUN: llc -relocation-model=pic -verify-machineinstrs -mtriple=powerpc64le-unknown-linux-gnu -mcpu=pwr9 <%s | FileCheck -check-prefix=CHECK-P9 %s
 ; RUN: llc -relocation-model=pic -verify-machineinstrs -mtriple=powerpc64le-unknown-linux-gnu -mcpu=pwr8 <%s | FileCheck -check-prefix=CHECK-P8 %s
+; RUN: llc -verify-machineinstrs -mtriple=powerpc64le-unknown-linux-gnu -mcpu=pwr8 -ppc-late-peephole=false <%s | FileCheck -check-prefix=CHECK-P8-NOPEEPHOLE %s

 ; As the constant could be represented as float, a float is
 ; loaded from constant pool.
 define double @doubleConstant1() {
   ret double 1.400000e+01

 ; CHECK-P9-LABEL: doubleConstant1:
 ; CHECK-P9: addis [[REG1:[0-9]+]], 2, [[VAR:[a-z0-9A-Z_.]+]]@toc@ha
@@

 define double @doubleLargeConstantArray() {
   %1 = load double, double* getelementptr inbounds ([20000 x double], [20000 x double]* @arr, i64 0, i64 4096), align 8
   %2 = fadd double %1, 6.880000e+00
   ret double %2

 ; Access an element with an offset that doesn't fit in the displacement field of LFD.
 ; CHECK-P9-LABEL: doubleLargeConstantArray
-; CHECK-P9: addis [[REG1:[0-9]+]], 2, [[VAR:[a-z0-9A-Z_.]+]]@toc@ha
-; CHECK-P9: li [[REG2:[0-9]+]], 0
-; CHECK-P9: addi [[REG3:[0-9]+]], [[REG1]], [[VAR:[a-z0-9A-Z_.]+]]@toc@l
-; CHECK-P9: ori [[REG4:[0-9]+]], [[REG2]], 32768
-; CHECK-P9: lfdx {{[0-9]+}}, [[REG3]], [[REG4]]
+; CHECK-P9: addis [[REG1:[0-9]+]], 2, [[VAR:[a-z0-9A-Z_.]+]]@toc@ha+[[ADDEND:[0-9]+]]
+; CHECK-P9: lfd {{[0-9]+}}, [[VAR]]@toc@l+[[ADDEND]]([[REG1]])
 ; CHECK-P8-LABEL: doubleLargeConstantArray
+; CHECK-P8: addis [[REG1:[0-9]+]], 2, [[VAR:[a-z0-9A-Z_.]+]]@toc@ha+[[ADDEND:[0-9]+]]
+; CHECK-P8: lfd {{[0-9]+}}, [[VAR]]@toc@l+[[ADDEND]]([[REG1]])
+; CHECK-P8-NOPEEPHOLE-LABEL: doubleLargeConstantArray
+; CHECK-P8-NOPEEPHOLE: addis [[REG1:[0-9]+]], 2, [[VAR:[a-z0-9A-Z_.]+]]@toc@ha+[[ADDEND:[0-9]+]]
+; CHECK-P8-NOPEEPHOLE: addi [[REG3:[0-9]+]], [[REG1]], [[VAR]]@toc@l+[[ADDEND]]
+; CHECK-P8-NOPEEPHOLE: lfdx {{[0-9]+}}, 0, [[REG3]]
+}
+
+@arr2 = hidden local_unnamed_addr global [20000 x double] zeroinitializer, align 8
+
+define double @doubleLargeConstantArray2() {
+  %1 = load double, double* getelementptr inbounds ([20000 x double], [20000 x double]* @arr2, i64 0, i64 0), align 8
+  %2 = load double, double* getelementptr inbounds ([20000 x double], [20000 x double]* @arr2, i64 0, i64 8095), align 8
+  %3 = fadd double %1, %2
+  ret double %3
+
+; CHECK-P8-LABEL: doubleLargeConstantArray2
 ; CHECK-P8: addis [[REG1:[0-9]+]], 2, [[VAR:[a-z0-9A-Z_.]+]]@toc@ha
-; CHECK-P8: li [[REG2:[0-9]+]], 0
-; CHECK-P8: addi [[REG3:[0-9]+]], [[REG1]], [[VAR:[a-z0-9A-Z_.]+]]@toc@l
-; CHECK-P8: ori [[REG4:[0-9]+]], [[REG2]], 32768
-; CHECK-P8: lfdx {{[0-9]+}}, [[REG3]], [[REG4]]
+; CHECK-P8: addis [[REG2:[0-9]+]], 2, [[VAR]]@toc@ha+[[ADDEND:[0-9]+]]
+; CHECK-P8: lfd {{[0-9]+}}, [[VAR]]@toc@l([[REG1]])
+; CHECK-P8: lfd {{[0-9]+}}, [[VAR]]@toc@l+[[ADDEND]]([[REG2]])
+; CHECK-P9-LABEL: doubleLargeConstantArray2
+; CHECK-P9: addis [[REG1:[0-9]+]], 2, [[VAR:[a-z0-9A-Z_.]+]]@toc@ha
+; CHECK-P9: lfd {{[0-9]+}}, [[VAR]]@toc@l([[REG1]])
+; CHECK-P9: addis [[REG2:[0-9]+]], 2, [[VAR]]@toc@ha+[[ADDEND:[0-9]+]]
+; CHECK-P9: lfd {{[0-9]+}}, [[VAR]]@toc@l+[[ADDEND]]([[REG2]])
 }

 @vec_arr = global [10 x <4 x i32>] zeroinitializer, align 16

 define <4 x i32> @vectorArray() #0 {
 entry:
   %0 = load <4 x i32>, <4 x i32>* getelementptr inbounds ([10 x <4 x i32>], [10 x <4 x i32>]* @vec_arr, i64 0, i64 2), align 16
   ret <4 x i32> %0