This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC]Leverage the addend in the TOC relocation to do the address calculation
Changes Planned, Public

Authored by steven.zhang on Feb 19 2019, 12:51 AM.

Details

Reviewers
nemanjai
hfinkel
stefanp
jsji
Group Reviewers
Restricted Project
Summary

Currently, we emit instructions to compute the address of an element of a global array. If the offset is too large (i.e. it does not fit in 16 bits), we have to add extra instructions to do the calculation. For example:

double __attribute__((visibility("hidden"))) b[2000000000];
double foo() { return b[4096] ; }
This is the code sequence we get now:

addis 3, 2, b@toc@ha
li 4, 0
addi 3, 3, b@toc@l
ori 4, 4, 32768
lfdx 1, 4, 3

Because 32768 does not fit in a signed 16-bit immediate, we have to use an X-form load to access b[4096]. This patch leverages the addend in the relocation to do the address calculation instead. This is the new instruction sequence we want to produce:

addis 3, 2, b@toc@ha+32768
lfd 1, b@toc@l+32768(3)
blr

Note that, because this transformation takes up one extra TOC entry (b+32768), we only do it if the offset does not fit in 16 bits but does fit in 32 bits.
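A minimal sketch of the condition stated in the summary, written as standalone C rather than as the patch's actual DAG-combine code. The names `fits_signed` and `should_fold_offset_into_toc` are hypothetical and do not come from the patch; the sketch only mirrors the "does not fit in 16 bits, fits in 32 bits" rule and ignores any further linker constraints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if v is representable as a signed immediate of the given width. */
static bool fits_signed(int64_t v, unsigned bits) {
    assert(bits >= 1 && bits <= 63);
    int64_t lim = (int64_t)1 << (bits - 1);
    return v >= -lim && v < lim;
}

/* Hypothetical predicate (assumption, not the patch's code): fold the byte
 * offset into the TOC relocation addend only when a D-form displacement
 * cannot hold it, but a signed 32-bit addend can. */
static bool should_fold_offset_into_toc(int64_t byte_offset) {
    return !fits_signed(byte_offset, 16) && fits_signed(byte_offset, 32);
}
```

For the example in the summary, b[4096] has a byte offset of 4096 * 8 = 32768, which is one past the largest signed 16-bit value, so the predicate holds.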

Diff Detail

Event Timeline

steven.zhang created this revision. Feb 19 2019, 12:51 AM
Herald added a project: Restricted Project. Feb 19 2019, 12:51 AM
nemanjai added inline comments. Feb 19 2019, 7:35 AM
llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
6583

It is not clear to me how we ensure Offset fits into a signed 16-bit immediate.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
15489

Line too long. If you are using Vim, you can run clang-format within the editor with something like:
:14506,14560 !clang-format
as long as you have clang-format in your $PATH.

15490

s/platform/platforms

15504

Is this actually needed? Is there no canonical form (with the constant being the second operand)?

15541

Why do we need this? Wouldn't this always be the case? The ELFv2 ABI has no support for 32-bit addressing so why is it that we need this? Could it not just be an assert?

nemanjai requested changes to this revision. Feb 19 2019, 8:01 AM

This can cause relocation overflows:

$ cat b.c 
double b[1LU << 33];
double foo() { return b[(1LU << _SH) - 1] ; }
void setfoo(double d) { b[(1LU << _SH) - 1] = d; }

$ cat main.c 
double foo();
void setfoo(double);
int main(void) {
  setfoo(445.2);
  return foo() == 445.2;
}

$ clang -O2 b.c main.c -D_SH=28
/tmp/b-9d97be.o: In function `foo':
b.c:(.text+0x8): relocation truncated to fit: R_PPC64_TOC16_HA against symbol `b' defined in COMMON section in /tmp/b-9d97be.o+7ffffff8
/tmp/b-9d97be.o: In function `setfoo':
b.c:(.text+0x28): relocation truncated to fit: R_PPC64_TOC16_HA against symbol `b' defined in COMMON section in /tmp/b-9d97be.o+7ffffff8
clang-9: error: linker command failed with exit code 1 (use -v to see invocation)
This revision now requires changes to proceed. Feb 19 2019, 8:01 AM
steven.zhang marked 3 inline comments as done. Feb 19 2019, 10:31 PM
steven.zhang added inline comments.
llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
6583

We don't need to ensure that Offset fits in a 16-bit immediate, because it is the offset of the global address.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
15489

Well, I will set up my IDE to avoid the format issue happening again.

15541

You are right, the ABI implies 64-bit. Thank you.

> This can cause relocation overflows:

I need to double-check the ELF spec to see why it is limited to 27 bits. I also checked this with the LLVM linker (lld): it links successfully, but the program hits a segmentation fault at run time if _SH=28. It seems that lld is also missing this check.

steven.zhang planned changes to this revision. Nov 8 2019, 6:11 PM

I still need to investigate why there is a 27-bit limit.

steven.zhang planned changes to this revision. Jan 1 2020, 10:21 PM
MaskRay added a subscriber: MaskRay. Edited Jan 2 2020, 12:10 AM

b.c:(.text+0x8): relocation truncated to fit: R_PPC64_TOC16_HA against symbol `b' defined in COMMON section in /tmp/b-9d97be.o+7ffffff8

The r_addend is 0x7ffffff8, a value close to 2**31.

The distance between the TOC entry and the variable address cannot be too far. More accurately, -0x80008000 <= address - .TOC. + r_addend < 0x7fff8000

GNU ld correctly reports a relocation overflow. lld currently does not check R_PPC64_TOC16_HA overflow.

% powerpc64le-linux-gnu-ld -pie b.o main.o
powerpc64le-linux-gnu-ld: warning: cannot find entry symbol _start; defaulting to 0000000000000230
powerpc64le-linux-gnu-ld: b.o: in function `foo':
b.c:(.text+0x8): relocation truncated to fit: R_PPC64_TOC16_HA against symbol `b' defined in COMMON section in b.o+7ffffff8

The largest address a pair of HA/L can materialize is something like:

addis 3, 2, 32767  # adding 1 will overflow to -32768
lfd 1, 32767(3)
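The bounds above can be checked with a small C sketch. The function names are hypothetical, and `sym_minus_toc` stands for `address - .TOC.`, which is only known at link time; the value 0 used in the usage note below is purely an illustrative assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bounds quoted in the comment above:
 * -0x80008000 <= address - .TOC. + r_addend < 0x7fff8000 */
#define TOC_HA_MIN      (-0x80008000LL)
#define TOC_HA_MAX_EXCL (0x7fff8000LL)

/* True if an HA/L pair can reach the target (hypothetical helper). */
static bool toc16_ha_in_range(int64_t sym_minus_toc, int64_t r_addend) {
    int64_t d = sym_minus_toc + r_addend;
    return d >= TOC_HA_MIN && d < TOC_HA_MAX_EXCL;
}

/* Byte offset of b[(1 << sh) - 1] in the reproducer, assuming
 * 8-byte double elements. */
static int64_t reproducer_addend(unsigned sh) {
    assert(sh < 60);
    return (((int64_t)1 << sh) - 1) * 8;
}
```

With `_SH=28` the addend is `0x7ffffff8`, matching the linker message, and it is already outside the range when `address - .TOC.` is assumed to be 0. With `_SH=27` the addend is `0x3ffffff8`, which is in range under the same assumption. The extremes `0x7fff7fff` and `-0x80008000` correspond to `addis` and displacement values of 32767/32767 and -32768/-32768 respectively.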

Thank you for this information! I overlooked the "double" element type of the array (so the limit is not 27 bits, but 30 bits). So, for some unknown reason, the linker reserves 0x8000 for special use. The addend + 0x8000 must fit into a signed 32-bit value. I will split this patch into two parts.

  1. Fix the missing offset handling in the ASM printer.
  2. Add the combine rule that generates a global address with an offset.

I understand the reason for the 0x8000 now.

#ha(value) Denotes the high adjusted value: bits 16 - 63 of the indicated value, compensating
for #lo() being treated as a signed number. That is:
#ha(x) = (x + 0x8000) >> 16
The TOC region commonly includes data items within the .got, .toc, .sdata, and .sbss sections. In the medium
code model, they can be addressed with 32-bit signed offsets from the TOC pointer register. The TOC pointer
register typically points to the beginning of the .got section + 0x8000, which permits a 2 GB TOC with the
medium and large code models.
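The #ha/#lo split quoted above can be illustrated with a short C sketch. The helper names `lo16`, `ha16`, and `toc_split` are hypothetical. To stay portable, `ha16` is computed as an exact division of `x - lo16(x)` by 0x10000, which gives the same result as `(x + 0x8000) >> 16` with an arithmetic shift.

```c
#include <assert.h>
#include <stdint.h>

/* #lo(x): the low 16 bits of x, interpreted as a signed number. */
static int64_t lo16(int64_t x) {
    int64_t low = x & 0xffff;
    return low >= 0x8000 ? low - 0x10000 : low;
}

/* #ha(x): the high part, adjusted so that (ha << 16) + lo == x.
 * x - lo16(x) is always a multiple of 0x10000, so the division is exact. */
static int64_t ha16(int64_t x) {
    return (x - lo16(x)) / 0x10000;
}

/* Split x into its ha/lo parts and check that they recombine to x. */
static void toc_split(int64_t x, int64_t *ha, int64_t *lo) {
    *lo = lo16(x);
    *ha = ha16(x);
    assert(*ha * 0x10000 + *lo == x);
}
```

For an offset of 32768, the low part is -32768 and the high part is 1, which is why the 0x8000 adjustment is needed. The high part still fits in a signed 16-bit field for 0x7fff7fff (it is 32767), but not for 0x7fff8000 (it becomes 32768).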

> I understand the reason for the 0x8000 now.
>
> #ha(value) Denotes the high adjusted value: bits 16 - 63 of the indicated value, compensating
> for #lo() being treated as a signed number. That is:
> #ha(x) = (x + 0x8000) >> 16

Yes.

> The TOC region commonly includes data items within the .got, .toc, .sdata, and .sbss sections. In the medium
> code model, they can be addressed with 32-bit signed offsets from the TOC pointer register. The TOC pointer
> register typically points to the beginning of the .got section + 0x8000, which permits a 2 GB TOC with the
> medium and large code models.

-0x80008000 <= address - .TOC. + r_addend < 0x7fff8000

If address - .TOC. can be as large as 0x7fff8000 (this may happen with huge .data or .bss), then you cannot leverage any positive value of r_addend. I believe this situation is rare, though. You may try a smaller cut-off value, say, 0x100, and see if it is beneficial. Be aware that if the code references multiple elements of a global array, e.g. a[0] a[1] a[2] ... a[99], you should not simply create 100 TOC entries.

This comment was removed by steven.zhang.
jsji added a reviewer: Restricted Project. Jan 30 2020, 7:17 AM
jsji added a project: Restricted Project.
jsji resigned from this revision. Jun 2 2022, 8:00 AM
Herald added a project: Restricted Project. Jun 2 2022, 8:00 AM