This is an archive of the discontinued LLVM Phabricator instance.

Differential D158197

[PowerPC][lld] Account for additional X-Forms -> D-Form/DS-Forms load/stores when relaxing initial-exec to local-exec
ClosedPublic

Authored by amyk on Aug 17 2023, 9:19 AM.

Details

Reviewers

sfertile
syzaara
stefanp
kamaub
nemanjai
MaskRay

Group Reviewers

Restricted Project

Commits

rG698b45aa902d: [PowerPC][lld] Account for additional X-Forms -> D-Form/DS-Forms load/stores…

Summary

D153645 added additional X-Form load/stores that can be generated for TLS accesses.
However, these added instructions have not been accounted for in lld. As a result,
lld does not know how to handle them and cannot relax initial-exec to local-exec
when the initial-exec sequence contains these additional load/stores.

This patch aims to resolve https://github.com/llvm/llvm-project/issues/64424.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

amyk created this revision. Aug 17 2023, 9:19 AM

Herald added a reviewer: MaskRay. Aug 17 2023, 9:19 AM

Herald added a project: Restricted Project.

Herald added subscribers: shchenz, kbarton, arichardson, emaste.

amyk requested review of this revision. Aug 17 2023, 9:19 AM

Herald added a subscriber: llvm-commits. Aug 17 2023, 9:19 AM

Harbormaster completed remote builds in B253245: Diff 551170. Aug 17 2023, 9:54 AM

nemanjai added inline comments. Aug 18 2023, 4:28 AM

lld/ELF/Arch/PPC64.cpp
846	I understand the motivation to return both the primary opcode and the last two bits here to differentiate `LWA` from `LD/STD`. However, I am not a fan of the different interface of these very similar and related functions. Do any users of `getPPCDFormOp()` need the primary opcode to be in the least significant bits? Could we change the interface to return the instruction with the primary opcode in the most significant bits regardless of whether it is `D-Form/DS-Form` or even (in the future) `DQ-Form`?

MaskRay added inline comments. Aug 18 2023, 9:43 AM

lld/ELF/Arch/PPC64.cpp
846	Suggestion: `static unsigned getPPCDSFormOp`. `getPPCDFormOp` is external just because PPC.cpp uses it as well.

Update getPPCDFormOp() to return the D-Form opcodes in the most significant bits (like getPPCDSFormOp()).

lld/ELF/Arch/PPC64.cpp
846	@nemanjai That's a good point, since the functions are very related. After taking a look, to me, it doesn't appear that any users of `getPPCDFormOp()` require the primary opcode to be in the least significant bits. I also tried shifting all of the D-Forms over to the most significant bits like the DS-Form ones. This works, but I'm uncertain if this is the right approach so I was wondering if you have any thoughts regarding this.
846	@MaskRay I might be misunderstanding something, but I am also using `getPPCDSFormOp()` in `PPC.cpp`, as well. Doesn't this mean this function also needs to be external?
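
To make the interface question above concrete, here is a minimal sketch of the two bit-field extractions the patch relies on and of what "primary opcode in the most significant bits" means. The helper names `primaryOpcode`, `secondaryOpcode`, and `opcodeInMsb` are hypothetical and are not lld functions; only the masks, shifts, and opcode numbers come from the diff.

```cpp
#include <cstdint>

// Primary opcode: the top 6 bits of a 32-bit PowerPC instruction.
constexpr uint32_t primaryOpcode(uint32_t insn) { return insn >> 26; }

// X-Form extended opcode (XO), bits 21-30 in IBM bit numbering.
// Same mask and shift as used in the patch.
constexpr uint32_t secondaryOpcode(uint32_t insn) {
  return (insn & 0x000007fe) >> 1;
}

// Hypothetical helper: move a 6-bit primary opcode into the most
// significant bits, which is the form the reworked lookup returns.
constexpr uint32_t opcodeInMsb(uint32_t primaryOp) { return primaryOp << 26; }

// lwzx 3, 4, 13: primary opcode 31, RT=3, RA=4, RB=13, XO=23 (LWZX).
constexpr uint32_t lwzx =
    (31u << 26) | (3u << 21) | (4u << 16) | (13u << 11) | (23u << 1);

static_assert(primaryOpcode(lwzx) == 31, "X-Form primary opcode");
static_assert(secondaryOpcode(lwzx) == 23, "LWZX extended opcode");
// LWZ has primary opcode 32, so the MSB form matches PPCLegacyInsn::LWZ.
static_assert(opcodeInMsb(32) == 0x80000000u, "LWZ in the top six bits");
```

With the opcode already shifted, callers can OR the returned value straight into the rewritten instruction, and a DS-Form entry can carry extra low bits (as `LWA` does) through the same return type.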

Harbormaster completed remote builds in B253624: Diff 551708. Aug 18 2023, 10:50 PM

This patch fixes a regression caused by a patch that is in v17. We will need to decide if we should backport this fix or if we should pull the original patch (which also has additional patches dependent on it that are in v17 as well). I think this patch is obvious enough to be backported, but I'll defer the final decision on that to @MaskRay once this is approved.
@amyk Perhaps it would help to perform a thorough test with the approved version of this patch applied to v17 to ensure it is safe to backport.

lld/ELF/Arch/PPC64.cpp
855	I forgot to mention it yesterday, but I don't think we should have a different signal for failure here (i.e. returning 1 instead of 0 like in `getPPCDFormOp()`). Let's just use zero for both.
944	Using `0x03FFFFFC` rather than `0x03FFFFFF` is presumably because the two least significant bits of the DS-Form must not be overwritten. While I think that makes sense, I have a bit of a concern with using a different value there:
We always had DS-Forms there (`LD/STD`) and didn't treat them specially.
Presumably, having either of those bits set in the incoming instruction is an indication that something has gone wrong with whatever produced the code, so masking out the bits just hides the issue.
Perhaps it would be appropriate to set the `finalInstr` the same way regardless of how we set the `dFormOp` and just have an assert (or possibly a fatal_error or warning is even better) in the DS-Form section that those bits are unset.

As Nemanja suggests, I will also apply this patch to the LLVM 17 release branch to test.

lld/ELF/Arch/PPC64.cpp
855	I will update this.
944	I'd like to understand this suggestion a bit more. Do you mean, we should just do:

    if (dFormOp == 0) { // Expecting a DS-Form instruction.
      dFormOp = getPPCDSFormOp(secondaryOp);
      if (dFormOp == 0)
        error("unrecognized instruction for IE to LE R_PPC64_TLS");

        // Check if the last two bits are unset. If set, we `errorOrWarn()`

        finalReloc = R_PPC64_TPREL16_LO_DS;
    } else
        finalReloc = R_PPC64_TPREL16_LO;
      write32(loc, (dFormOp | (read32(loc) & 0x03FFFFFF)));
      relocateNoSym(loc + offset, finalReloc, val);
    }

In any case, yes, if I understand correctly, `0x03FFFFFF` grabs bits 6-31 in the X-Form instruction, while `0x03FFFFFC` grabs bits 6-29, leaving the last two bits.

Also, to give a bit more context behind why I am treating the DS-Forms differently: I discussed the use of `0x03FFFFFC` in the DS-Form case with a few others before putting up the patch. This all came about because I was using a different relocation for DS-Form, `R_PPC64_TPREL16_LO_DS`. I found that when I used the different relocation with the original masking in the DS-Form case:

    write32(loc, (dFormOp | (read32(loc) & 0x03FFFFFF)));
    relocateNoSym(loc + offset, R_PPC64_TPREL16_LO_DS, val);

For the following situation:

    test8:
      addis 4, 2, l@got@tprel@ha
      ld 4, l@got@tprel@l(4)
      stdx 3, 4, l@tls

    .section .tdata,"awT",@progbits
    .globl l
    l:
    .quad 55

I'd get `stq` instead of `std` like we would expect.

    Disassembly of section .text:

    0000000010010200 <test8>:
    10010200:      	nop
    10010204:      	addis 4, 13, 0
    10010208:      	stq 3, -28672(4)

Presumably this is because `std` and `stq` share the same primary opcode, and the only other difference is the last two bits (00 for `std` whereas 10 for `stq`), so we found that `0x03FFFFFC` works with `R_PPC64_TPREL16_LO_DS`.
I don't quite remember the exact discussion that we also had, but I think we were saying before that `LD/STD` perhaps just happened to have worked when we did `0x03FFFFFF` with `R_PPC64_TPREL16_LO` in the past, since the last two bits in these instructions are both 00.

We will need to decide if we should backport this fix or if we should pull the original patch (which also has additional patches dependent on it that are in v17 as well).

Yes, rG598cccea80f5614869bf0dda4d09d68b2c64423c is also dependent on the original patch.

Additionally, after applying this patch to the release/17.x branch, my bootstrap build/run completed with no issues.
This includes building libc++ as well (since libc++ was originally where I found this issue: https://github.com/llvm/llvm-project/issues/64424).

MaskRay added inline comments. Aug 19 2023, 7:05 PM

lld/ELF/Arch/PPC64.cpp
849	ditto below
855	Mark as unresolved.
862	ditto below
lld/test/ELF/ppc64-tls-ie.s
157	Preferred style for new tests is that instructions are indented 2 columns more than `<test11>`:
# LE-LABEL: <test11>:
# LE-NEXT:   nop
# LE-NEXT:   addis 3, 13, 0
# LE-NEXT:   lha 3, -28670(3)
161	Consider changing the numbering to something more descriptive. Then, if we want to add a new test in the middle, we won't have to renumber all following `test*`.
166	ditto
175	ditto
lld/test/ELF/ppc64-tls-pcrel-ie.s
159	Consider changing the numbered section name and label name to something more descriptive. Then, if we want to add a new test in the middle, we won't have to renumber all following `.text_incrval` and `IEIncrementVal`.

Remove extra parentheses in PPC64.cpp
Update test cases to have more unique names
Make getPPCDSFormOp() return 0 for the default case

I've addressed all of the current comments, with the exception of Nemanja's comment on using a different mask for the DS-Form case (since we're still currently discussing that and I have asked for some clarification).

lld/ELF/Arch/PPC64.cpp
855	Thank you for pointing that out. I have now updated the patch, so the comment should be resolved.

Harbormaster completed remote builds in B253699: Diff 551804. Aug 19 2023, 10:33 PM

nemanjai added inline comments. Aug 20 2023, 4:17 AM

lld/ELF/Arch/PPC64.cpp
944	That `stq` vs. `std` issue is a bit alarming. Where do the low bits come from? The variable `l` is aligned at 8 bytes.

sfertile added inline comments. Aug 21 2023, 7:57 AM

lld/ELF/Arch/PPC64.cpp
944	> Where do the low bits come from?
From the `stdx` instruction. The XO field is 149, so bit 30 is 1, and bit 31 is indeterminate but typically zeroed, so we have `2` in the last 2 bits. Using `read32(loc) & 0x03FFFFFF` will leave those bits in, and then relocating with `R_PPC64_TPREL16_LO_DS` we end up preserving the non-zero value in the 'XO' bits of the instruction.
I can see why you might want to preserve what we were doing before because it was working for LD/STD, but this assumption is wrong:
> Presumably, having either of those bits set in the incoming instruction is an indication that something has gone wrong with whatever produced the code, so masking out the bits just hides the issue
The `stdx` instruction is the counterexample. That's why we updated the mask to reflect that we are actually working with a DS-Form instruction.
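
The arithmetic behind the `stq` result can be reproduced in isolation. The sketch below is not lld code: `applyLoDs` is a hypothetical stand-in that models the relocation the way this thread describes it (the displacement goes into bits 16-29 and the two low bits already in the instruction are kept), and `relaxWithMask` is likewise an illustrative name. The `stdx` encoding, the primary opcode 62, and the displacement -28672 (0x9000) come from the example above.

```cpp
#include <cstdint>

// DS-Form primary opcode shared by std and stq.
constexpr uint32_t STD_PRIMARY = 62;

// stdx 3, 4, 13: primary opcode 31, RS=3, RA=4, RB=13, XO=149.
constexpr uint32_t stdx =
    (31u << 26) | (3u << 21) | (4u << 16) | (13u << 11) | (149u << 1);

// Assumed model of an R_PPC64_TPREL16_LO_DS-style fixup: keep the upper
// halfword and the two low bits, write the displacement into bits 16-29.
constexpr uint32_t applyLoDs(uint32_t insn, uint16_t lo) {
  return (insn & 0xffff0003u) | (lo & 0xfffcu);
}

// Rewrite the X-Form instruction with the given mask, then apply the fixup.
constexpr uint32_t relaxWithMask(uint32_t insn, uint32_t mask, uint16_t lo) {
  return applyLoDs((STD_PRIMARY << 26) | (insn & mask), lo);
}
```

Under these assumptions, `relaxWithMask(stdx, 0x03ffffff, 0x9000)` ends with low bits `10`, which selects `stq`, while the `0x03fffffc` mask leaves them at `00`, which selects `std`. The stray bit is the least significant bit of the XO value 149.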

nemanjai added inline comments. Aug 22 2023, 8:53 AM

lld/ELF/Arch/PPC64.cpp
944	Ah, ok. This makes sense. Just out of curiosity, why aren't we masking out the XO field completely? It seems odd that we are masking out the bottom two bits (one bit from XO and one reserved bit) but we are leaving the rest of the XO field intact.

LGTM from my viewpoint if a PowerPC reviewer looks ok as well.

lld/test/ELF/ppc64-tls-pcrel-ie.s
52	This line can use `-NEXT:`
64	The symbol indexes are not so reliable if we change the symbol table order in the future. Omit the section indexes as well.
# LE-SYM: 0000000000000000 0 TLS GLOBAL DEFAULT [[#]] x

This revision is now accepted and ready to land. Aug 22 2023, 9:18 AM

sfertile added inline comments. Aug 22 2023, 10:04 AM

lld/ELF/Arch/PPC64.cpp
944	Yeah that's a good point. I think it was probably working because the relocate function would end up masking out the relocated bits but we should update the mask used here to reflect exactly what bits we want to extract from the existing instruction.

Address review comments:

Update checks for ppc64-tls-ie.s
Mask out all of the XO bits for DS-Form

lld/ELF/Arch/PPC64.cpp
944	Sure. I will update it so that the entire XO field is masked out. This also works.

Harbormaster completed remote builds in B254155: Diff 552453. Aug 22 2023, 12:35 PM

MaskRay added inline comments. Aug 22 2023, 12:38 PM

lld/ELF/Arch/PPC64.cpp
947	Nit: the lld/ELF codebase isn't consistent on uppercase and lowercase. But for newer code, prefer lowercase hexadecimals.

Address comment to make hexadecimals lowercase.

amyk marked an inline comment as done. Aug 22 2023, 4:04 PM

Harbormaster completed remote builds in B254211: Diff 552528. Aug 22 2023, 4:50 PM

nemanjai added inline comments. Aug 23 2023, 1:39 PM

lld/ELF/Arch/PPC64.cpp
944	I'm sorry Amy, but the purpose of my comment was to common up the masks between the D-Form and DS-Form updated instructions. If I am not mistaken, regardless of what the input instruction was, we want to mask out bits 21-31. I suppose we really want to mask out all the bits that will be replaced by the D/DS field, so presumably it should be bits 16-31.

Address Nemanja's comment by masking out bits 16-31 for both D-Form/DS-Forms.
Builds and tests successfully on both main and release/17.x.

lld/ELF/Arch/PPC64.cpp
944	Ah! Good point. Sorry, I had misunderstood your previous comment and thought it only applied to the DS-Forms. What you said makes sense, and I've updated it once again.
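
A small sketch of the common mask agreed on here, assuming the opcode lookup already returns the opcode in the most significant bits. `relaxCommon` is a hypothetical name; in the patch the PPC64 displacement is written by a separate `relocateNoSym` call, which this sketch folds into a single OR for illustration (PPC.cpp does OR in `lo(val)` directly).

```cpp
#include <cstdint>

// Keep only the register fields (bits 6-15 in IBM numbering, mask
// 0x03ff0000) of the X-Form instruction; everything in bits 16-31 is
// replaced by the D/DS displacement. For DS-Form the displacement must
// have its two low bits clear.
constexpr uint32_t relaxCommon(uint32_t insn, uint32_t opInMsb,
                               uint16_t disp) {
  return opInMsb | (insn & 0x03ff0000u) | disp;
}

// stdx 3, 4, 13 (XO=149) and lhax 3, 3, 13 (XO=343).
constexpr uint32_t stdx =
    (31u << 26) | (3u << 21) | (4u << 16) | (13u << 11) | (149u << 1);
constexpr uint32_t lhax =
    (31u << 26) | (3u << 21) | (3u << 16) | (13u << 11) | (343u << 1);

// std 3, -28672(4): STD = 62, displacement 0x9000.
static_assert(relaxCommon(stdx, 62u << 26, 0x9000) == 0xF8649000u,
              "DS-Form result keeps low bits 00");
// lha 3, -28670(3): LHA = 42, displacement 0x9002.
static_assert(relaxCommon(lhax, 42u << 26, 0x9002) == 0xA8639002u,
              "D-Form result");
```

Because no XO or RB bits survive the mask, the D-Form and DS-Form paths can share one `write32` expression, which is the simplification requested in this comment.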

Harbormaster completed remote builds in B254534: Diff 552984. Aug 23 2023, 10:09 PM

LGTM.

LGTM if you want to either backport this to release/17.x or consider this risky and revert the prior patch just in release/17.x :)

This revision was landed with ongoing or failed builds. Aug 31 2023, 6:45 AM

Closed by commit rG698b45aa902d: [PowerPC][lld] Account for additional X-Forms -> D-Form/DS-Forms load/stores… (authored by amyk).

This revision was automatically updated to reflect the committed changes.

amyk added a commit: rG698b45aa902d: [PowerPC][lld] Account for additional X-Forms -> D-Form/DS-Forms load/stores….

Revision Contents

Path                                Size
lld/ELF/Arch/PPC.cpp                12 lines
lld/ELF/Arch/PPC64.cpp              86 lines
lld/ELF/Target.h                    1 line
lld/test/ELF/ppc32-tls-ie.s         27 lines
lld/test/ELF/ppc64-tls-ie.s         72 lines
lld/test/ELF/ppc64-tls-pcrel-ie.s   132 lines

Diff 555027

lld/ELF/Arch/PPC.cpp

@@ 465 lines not shown @@ case R_PPC_GOT_TPREL16: {
     writeFromHalf16(loc, 0x3c020000 | rt | ha(val));
     break;
   }
   case R_PPC_TLS: {
     uint32_t insn = read32(loc);
     if (insn >> 26 != 31)
       error("unrecognized instruction for IE to LE R_PPC_TLS");
     // addi rT, rT, x@tls --> addi rT, rT, x@tprel@l
-    uint32_t dFormOp = getPPCDFormOp((read32(loc) & 0x000007fe) >> 1);
-    if (dFormOp == 0)
-      error("unrecognized instruction for IE to LE R_PPC_TLS");
-    write32(loc, (dFormOp << 26) | (insn & 0x03ff0000) | lo(val));
+    unsigned secondaryOp = (read32(loc) & 0x000007fe) >> 1;
+    uint32_t dFormOp = getPPCDFormOp(secondaryOp);
+    if (dFormOp == 0) { // Expecting a DS-Form instruction.
+      dFormOp = getPPCDSFormOp(secondaryOp);
+      if (dFormOp == 0)
+        error("unrecognized instruction for IE to LE R_PPC_TLS");
+    }
+    write32(loc, (dFormOp | (insn & 0x03ff0000) | lo(val)));
     break;
   }
   default:
     llvm_unreachable("unsupported relocation for TLS IE to LE relaxation");
   }
 }

 void PPC::relocateAlloc(InputSectionBase &sec, uint8_t *buf) const {
@@ 33 lines not shown @@
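
The lookup order in the hunk above (try the D-Form table first, fall back to the DS-Form table, treat 0 as "unrecognized") can be sketched on its own. This is a reduced illustration, not the lld implementation: `lookupDForm`, `lookupDSForm`, `lookup`, and `Relaxed` are hypothetical names, and only a few of the opcode pairs from the patch are included.

```cpp
#include <cstdint>

// A subset of the X-Form extended opcodes listed in the patch.
enum XFormOpcd : uint32_t { LDX = 21, LWZX = 23, STDX = 149, LWAX = 341, LHAX = 343 };

// D-Form counterparts, returned with the primary opcode in the top 6 bits.
uint32_t lookupDForm(uint32_t xo) {
  switch (xo) {
  case LWZX: return 32u << 26; // lwz
  case LHAX: return 42u << 26; // lha
  default:   return 0;         // not a D-Form candidate
  }
}

// DS-Form counterparts. lwa shares primary opcode 58 with ld and is
// distinguished by the value 2 in the two low bits.
uint32_t lookupDSForm(uint32_t xo) {
  switch (xo) {
  case LWAX: return (58u << 26) | 0x2; // lwa
  case LDX:  return 58u << 26;         // ld
  case STDX: return 62u << 26;         // std
  default:   return 0;                 // unrecognized
  }
}

struct Relaxed {
  uint32_t op;   // 0 means no replacement was found
  bool isDSForm; // selects the *_LO_DS relocation in the PPC64 path
};

// Try D-Form first, then fall back to DS-Form, mirroring the patch's order.
Relaxed lookup(uint32_t xo) {
  if (uint32_t op = lookupDForm(xo))
    return {op, false};
  return {lookupDSForm(xo), true};
}
```

Using 0 as the single failure value for both tables is what lets the caller chain the two lookups with the same `dFormOp == 0` check, as requested during review.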

lld/ELF/Arch/PPC64.cpp

@@ 31 lines not shown @@ enum XFormOpcd {
   LBZX = 87,
   LHZX = 279,
   LWZX = 23,
   LDX = 21,
   STBX = 215,
   STHX = 407,
   STWX = 151,
   STDX = 149,
+  LHAX = 343,
+  LWAX = 341,
+  LFSX = 535,
+  LFDX = 599,
+  STFSX = 663,
+  STFDX = 727,
   ADD = 266,
 };

 enum DFormOpcd {
   LBZ = 34,
   LBZU = 35,
   LHZ = 40,
   LHZU = 41,
   LHAU = 43,
   LWZ = 32,
   LWZU = 33,
   LFSU = 49,
-  LD = 58,
   LFDU = 51,
   STB = 38,
   STBU = 39,
   STH = 44,
   STHU = 45,
   STW = 36,
   STWU = 37,
   STFSU = 53,
   STFDU = 55,
-  STD = 62,
+  LHA = 42,
+  LFS = 48,
+  LFD = 50,
+  STFS = 52,
+  STFD = 54,
   ADDI = 14
 };

+enum DSFormOpcd {
+  LD = 58,
+  LWA = 58,
+  STD = 62
+};
+
 constexpr uint32_t NOP = 0x60000000;

 enum class PPCLegacyInsn : uint32_t {
   NOINSN = 0,
   // Loads.
   LBZ = 0x88000000,
   LHZ = 0xa0000000,
   LWZ = 0x80000000,

@@ 746 lines not shown @@ void PPC64::relaxTlsLdToLe(uint8_t *loc, const Relocation &rel,
   case R_PPC64_DTPREL34:
     relocate(loc, rel, val);
     break;
   default:
     llvm_unreachable("unsupported relocation for TLS LD to LE relaxation");
   }

+// Map X-Form instructions to their DS-Form counterparts, if applicable.
+// The full encoding is returned here to distinguish between the different
+// DS-Form instructions.
+unsigned elf::getPPCDSFormOp(unsigned secondaryOp) {

nemanjai (Done):

I understand the motivation to return both the primary opcode and the last two bits here to differentiate LWA from LD/STD. However, I am not a fan of the different interface of these very similar and related functions.

Do any users of getPPCDFormOp() need the primary opcode to be in the least significant bits? Could we change the interface to return the instruction with the primary opcode in the most significant bits regardless of whether it is D-Form/DS-Form or even (in the future) DQ-Form?

MaskRay (Done):

static unsigned getPPCDSFormOp

getPPCDFormOp is external just because PPC.cpp uses it as well.

amyk (Author, Done):

@MaskRay I might be misunderstanding something, but I am also using getPPCDSFormOp() in PPC.cpp, as well.
Doesn't this mean this function also needs to be external?

amyk (Author, Done):

@nemanjai That's a good point, since the functions are very related.

After taking a look, to me, it doesn't appear that any users of getPPCDFormOp() require the primary opcode to be in the least significant bits.

I also tried shifting all of the D-Forms over to the most significant bits like the DS-Form ones. This works, but I'm uncertain if this is the right approach so I was wondering if you have any thoughts regarding this.

+  switch (secondaryOp) {
+  case LWAX:
+    return (LWA << 26) | 0x2;

MaskRay (Done):

   case LWAX:
-    return ((LWA << 26) | 0x2);
+    return (LWA << 26) | 0x2;
   case LDX:

ditto below

+  case LDX:
+    return LD << 26;
+  case STDX:
+    return STD << 26;
+  default:
+    return 0;

nemanjai (Done):

I forgot to mention it yesterday, but I don't think we should have a different signal for failure here (i.e. returning 1 instead of 0 like in getPPCDFormOp()). Let's just use zero for both.

amyk (Author, Done):

I will update this.

MaskRay (Done):

Mark as unresolved.

amyk (Author, Done):

Thank you for pointing that out. I have now updated the patch, so the comment should be resolved.

+  }

 unsigned elf::getPPCDFormOp(unsigned secondaryOp) {
   switch (secondaryOp) {
   case LBZX:
-    return LBZ;
+    return LBZ << 26;

MaskRay (Done):

   case LBZX:
-    return (LBZ << 26);
+    return LBZ << 26;
   case LHZX:

ditto below

   case LHZX:
-    return LHZ;
+    return LHZ << 26;
   case LWZX:
-    return LWZ;
+    return LWZ << 26;
-  case LDX:
-    return LD;
   case STBX:
-    return STB;
+    return STB << 26;
   case STHX:
-    return STH;
+    return STH << 26;
   case STWX:
-    return STW;
+    return STW << 26;
-  case STDX:
-    return STD;
+  case LHAX:
+    return LHA << 26;
+  case LFSX:
+    return LFS << 26;
+  case LFDX:
+    return LFD << 26;
+  case STFSX:
+    return STFS << 26;
+  case STFDX:
+    return STFD << 26;
   case ADD:
-    return ADDI;
+    return ADDI << 26;
   default:
     return 0;
   }

 void PPC64::relaxTlsIeToLe(uint8_t *loc, const Relocation &rel,
                            uint64_t val) const {
   // The initial exec code sequence for a global `x` will look like:
@@ 37 lines not shown @@ void PPC64::relaxTlsIeToLe(uint8_t *loc, const Relocation &rel,
   case R_PPC64_TLS: {
     const uintptr_t locAsInt = reinterpret_cast<uintptr_t>(loc);
     if (locAsInt % 4 == 0) {
       uint32_t primaryOp = getPrimaryOpCode(read32(loc));
       if (primaryOp != 31)
         error("unrecognized instruction for IE to LE R_PPC64_TLS");
       uint32_t secondaryOp = (read32(loc) & 0x000007FE) >> 1; // bits 21-30
       uint32_t dFormOp = getPPCDFormOp(secondaryOp);
-      if (dFormOp == 0)
-        error("unrecognized instruction for IE to LE R_PPC64_TLS");
-      write32(loc, ((dFormOp << 26) | (read32(loc) & 0x03FFFFFF)));
-      relocateNoSym(loc + offset, R_PPC64_TPREL16_LO, val);
+      uint32_t finalReloc;
+      if (dFormOp == 0) { // Expecting a DS-Form instruction.
+        dFormOp = getPPCDSFormOp(secondaryOp);
+        if (dFormOp == 0)
+          error("unrecognized instruction for IE to LE R_PPC64_TLS");
+        finalReloc = R_PPC64_TPREL16_LO_DS;
+      } else

nemanjai (Done):

Using 0x03FFFFFC rather than 0x03FFFFFF is presumably because the two least significant bits of the DS-Form must not be overwritten. While I think that makes sense, I have a bit of a concern with using a different value there:

We always had DS-Forms there (LD/STD) and didn't treat them specially.
Presumably, having either of those bits set in the incoming instruction is an indication that something has gone wrong with whatever produced the code, so masking out the bits just hides the issue.

Perhaps it would be appropriate to set the finalInstr the same way regardless of how we set the dFormOp and just have an assert (or possibly a fatal_error or warning is even better) in the DS-Form section that those bits are unset.

amyk (Author, Done):

I'd like to understand this suggestion a bit more.

Do you mean, we should just do:

if (dFormOp == 0) { // Expecting a DS-Form instruction.
  dFormOp = getPPCDSFormOp(secondaryOp);
  if (dFormOp == 0)
    error("unrecognized instruction for IE to LE R_PPC64_TLS");
        
    // Check if the last two bits are unset. If set, we `errorOrWarn()`

    finalReloc = R_PPC64_TPREL16_LO_DS;
} else
    finalReloc = R_PPC64_TPREL16_LO;
  write32(loc, (dFormOp | (read32(loc) & 0x03FFFFFF)));
  relocateNoSym(loc + offset, finalReloc, val);
}

In any case, yes, if I understand correctly, 0x03FFFFFF grabs bits 6-31 in the X-Form instruction, while 0x03FFFFFC grabs bits 6-29, leaving the last two bits.

Also, to give a bit more context behind why I am treating the DS-Forms differently:

I discussed the use of 0x03FFFFFC in the DS-Form case with a few others before putting up the patch.
This all came about because I was using a different relocation for DS-Form, R_PPC64_TPREL16_LO_DS.
I found that when I used the different relocation with the original masking in the DS-Form case:

write32(loc, (dFormOp | (read32(loc) & 0x03FFFFFF)));
relocateNoSym(loc + offset, R_PPC64_TPREL16_LO_DS, val);

For the following situation:

test8:
  addis 4, 2, l@got@tprel@ha
  ld 4, l@got@tprel@l(4)
  stdx 3, 4, l@tls

.section .tdata,"awT",@progbits
.globl l
l:
.quad 55

I'd get stq instead of std like we would expect.

Disassembly of section .text:

0000000010010200 <test8>:
10010200:      	nop
10010204:      	addis 4, 13, 0
10010208:      	stq 3, -28672(4)

Presumably this is because std and stq share the same primary opcode, and the only other difference is the last two bits (00 for std whereas 10 for stq), so we found that 0x03FFFFFC works with R_PPC64_TPREL16_LO_DS.

I don't quite remember the exact discussion that we also had, but I think we were saying before that LD/STD perhaps just happened to have worked when we did 0x03FFFFFF with R_PPC64_TPREL16_LO in the past, since the last two bits in these instructions are both 00.

nemanjai (Done):

That stq vs. std issue is a bit alarming. Where do the low bits come from? The variable l is aligned at 8 bytes.

sfertile (Done):

> Where do the low bits come from?

From the stdx instruction. The XO field is 149, so bit 30 is 1, and bit 31 is indeterminate but typically zeroed, so we have 2 in the last 2 bits. Using read32(loc) & 0x03FFFFFF will leave those bits in, and then relocating with R_PPC64_TPREL16_LO_DS we end up preserving the non-zero value in the 'XO' bits of the instruction.

I can see why you might want to preserve what we were doing before because it was working for LD/STD, but this assumption is wrong:

> Presumably, having either of those bits set in the incoming instruction is an indication that something has gone wrong with whatever produced the code, so masking out the bits just hides the issue

The stdx instruction is the counterexample. That's why we updated the mask to reflect that we are actually working with a DS-Form instruction.

nemanjai (Done):

Ah, ok. This makes sense. Just out of curiosity, why aren't we masking out the XO field completely? It seems odd that we are masking out the bottom two bits (one bit from XO and one reserved bit) but we are leaving the rest of the XO field intact.

sfertile (Done):

Yeah that's a good point. I think it was probably working because the relocate function would end up masking out the relocated bits but we should update the mask used here to reflect exactly what bits we want to extract from the existing instruction.

amyk (Author, Done):

Sure. I will update it so that the entire XO field is masked out. This also works.

nemanjai (Done):

I'm sorry Amy, but the purpose of my comment was to common up the masks between the D-Form and DS-Form updated instructions. If I am not mistaken, regardless of what the input instruction was, we want to mask out bits 21-31. I suppose we really want to mask out all the bits that will be replaced by the D/DS field, so presumably it should be bits 16-31.

amyk (Author, Done):

Ah! Good point. Sorry, I had misunderstood your previous comment and thought it only applied to the DS-Forms.
What you said makes sense, and I've updated it once again.

+        finalReloc = R_PPC64_TPREL16_LO;
+      write32(loc, dFormOp | (read32(loc) & 0x03ff0000));
+      relocateNoSym(loc + offset, finalReloc, val);

MaskRay (Done):

Nit: the lld/ELF codebase isn't consistent on uppercase and lowercase. But for newer code, prefer lowercase hexadecimals.

     } else if (locAsInt % 4 == 1) {
       // If the offset is not 4 byte aligned then we have a PCRel type reloc.
       // This version of the relocation is offset by one byte from the
       // instruction it references.
       uint32_t tlsInstr = read32(loc - 1);
       uint32_t primaryOp = getPrimaryOpCode(tlsInstr);
       if (primaryOp != 31)
         errorOrWarn("unrecognized instruction for IE to LE R_PPC64_TLS");
       uint32_t secondaryOp = (tlsInstr & 0x000007FE) >> 1; // bits 21-30
       // The add is a special case and should be turned into a nop. The paddi
       // that comes before it will already have computed the address of the
       // symbol.
       if (secondaryOp == 266) {
         // Check if the add uses the same result register as the input register.
         uint32_t rt = (tlsInstr & 0x03E00000) >> 21; // bits 6-10
         uint32_t ra = (tlsInstr & 0x001F0000) >> 16; // bits 11-15
         if (ra == rt) {
           write32(loc - 1, NOP);
         } else {
           // mr rt, ra
           write32(loc - 1, 0x7C000378 | (rt << 16) | (ra << 21) | (ra << 11));
         }
       } else {
         uint32_t dFormOp = getPPCDFormOp(secondaryOp);
-        if (dFormOp == 0)
-          errorOrWarn("unrecognized instruction for IE to LE R_PPC64_TLS");
-        write32(loc - 1, ((dFormOp << 26) | (tlsInstr & 0x03FF0000)));
+        if (dFormOp == 0) { // Expecting a DS-Form instruction.
+          dFormOp = getPPCDSFormOp(secondaryOp);
+          if (dFormOp == 0)
+            errorOrWarn("unrecognized instruction for IE to LE R_PPC64_TLS");
+        }
+        write32(loc - 1, (dFormOp | (tlsInstr & 0x03ff0000)));
       }
     } else {
       errorOrWarn("R_PPC64_TLS must be either 4 byte aligned or one byte "
                   "offset from 4 byte aligned");
     }
     break;
   }
   default:
@@ 768 lines not shown @@
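
The unchanged `add` special case in the PC-relative branch above packs three register fields into an `or` encoding. The sketch below isolates that step; `relaxPcrelAdd` is a hypothetical name, while the constants and shifts are copied from the context lines of the diff.

```cpp
#include <cstdint>

constexpr uint32_t NOP = 0x60000000;

// Rewrite `add rt, ra, 13` from a PC-relative initial-exec sequence.
// If the source and destination registers match, the instruction is
// redundant and becomes a nop; otherwise it becomes `mr rt, ra`, which
// is encoded as `or rt, ra, ra` (base encoding 0x7C000378).
uint32_t relaxPcrelAdd(uint32_t tlsInstr) {
  uint32_t rt = (tlsInstr & 0x03E00000) >> 21; // bits 6-10
  uint32_t ra = (tlsInstr & 0x001F0000) >> 16; // bits 11-15
  if (ra == rt)
    return NOP;
  // In the `or` encoding the destination sits in bits 11-15 and the
  // source is repeated in bits 6-10 and 16-20.
  return 0x7C000378 | (rt << 16) | (ra << 21) | (ra << 11);
}
```

For example, `add 3, 3, 13` maps to `nop`, while `add 3, 4, 13` maps to 0x7C832378, the encoding of `mr 3, 4`.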

lld/ELF/Target.h

Show First 20 Lines • Show All 202 Lines • ▼ Show 20 Lines	static inline std::string getErrorLocation(const uint8_t *loc) {
return getErrorPlace(loc).loc;		return getErrorPlace(loc).loc;
}		}

void processArmCmseSymbols();		void processArmCmseSymbols();

void writePPC32GlinkSection(uint8_t *buf, size_t numEntries);		void writePPC32GlinkSection(uint8_t *buf, size_t numEntries);

unsigned getPPCDFormOp(unsigned secondaryOp);		unsigned getPPCDFormOp(unsigned secondaryOp);
		unsigned getPPCDSFormOp(unsigned secondaryOp);

// In the PowerPC64 Elf V2 abi a function can have 2 entry points. The first		// In the PowerPC64 Elf V2 abi a function can have 2 entry points. The first
// is a global entry point (GEP) which typically is used to initialize the TOC		// is a global entry point (GEP) which typically is used to initialize the TOC
// pointer in general purpose register 2. The second is a local entry		// pointer in general purpose register 2. The second is a local entry
// point (LEP) which bypasses the TOC pointer initialization code. The		// point (LEP) which bypasses the TOC pointer initialization code. The
// offset between GEP and LEP is encoded in a function's st_other flags.		// offset between GEP and LEP is encoded in a function's st_other flags.
// This function will return the offset (in bytes) from the global entry-point		// This function will return the offset (in bytes) from the global entry-point
// to the local entry-point.		// to the local entry-point.
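The comment above can be made concrete with a short sketch of the decoding it describes. This is an illustration under the ELFv2 ABI's encoding (the three most significant bits of `st_other`), not the declaration elided from this diff; `gepToLepOffset` is a hypothetical name, and returning -1 for the reserved value is a choice made only for this example.

```cpp
#include <cstdint>

// Hypothetical helper: distance in bytes from the global entry point to the
// local entry point, or -1 for the reserved encoding.
constexpr int gepToLepOffset(uint8_t stOther) {
  unsigned field = (stOther >> 5) & 7u; // bits 7..5 of st_other
  if (field < 2)
    return 0;          // 0 and 1: both entry points coincide
  if (field < 7)
    return 1 << field; // 2..6: 4, 8, 16, 32, or 64 bytes
  return -1;           // 7 is reserved
}
```

For instance, an `st_other` value of `0x60` has the field set to 3, so the local entry point sits 8 bytes (two instructions) after the global one.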
[109 lines not shown]

lld/test/ELF/ppc32-tls-ie.s

	# REQUIRES: ppc			# REQUIRES: ppc
	# RUN: llvm-mc -filetype=obj -triple=powerpc %s -o %t.o			# RUN: llvm-mc -filetype=obj -triple=powerpc %s -o %t.o

	# RUN: ld.lld -shared %t.o -o %t.so			# RUN: ld.lld -shared %t.o -o %t.so
	# RUN: llvm-readobj -d -r %t.so | FileCheck --check-prefix=IE-REL %s			# RUN: llvm-readobj -d -r %t.so | FileCheck --check-prefix=IE-REL %s
	# RUN: llvm-objdump -d --no-show-raw-insn %t.so | FileCheck --check-prefix=IE %s			# RUN: llvm-objdump -d --no-show-raw-insn %t.so | FileCheck --check-prefix=IE %s

	# RUN: ld.lld %t.o -o %t			# RUN: ld.lld %t.o -o %t
	# RUN: llvm-readelf -r %t | FileCheck --check-prefix=NOREL %s			# RUN: llvm-readelf -r %t | FileCheck --check-prefix=NOREL %s
	# RUN: llvm-objdump -d --no-show-raw-insn %t | FileCheck --check-prefix=LE %s			# RUN: llvm-objdump -d --no-show-raw-insn %t | FileCheck --check-prefix=LE %s

	# IE-REL: FLAGS STATIC_TLS			# IE-REL: FLAGS STATIC_TLS
	## A non-preemptable symbol (b) has 0 st_shndx.			## A non-preemptable symbol (b) has 0 st_shndx.
	# IE-REL: .rela.dyn {			# IE-REL: .rela.dyn {
	# IE-REL-NEXT: 0x20238 R_PPC_TPREL32 - 0xC			# IE-REL-NEXT: 0x20258 R_PPC_TPREL32 - 0xC
	# IE-REL-NEXT: 0x20234 R_PPC_TPREL32 a 0x0			# IE-REL-NEXT: 0x20254 R_PPC_TPREL32 a 0x0
	# IE-REL-NEXT: }			# IE-REL-NEXT: }

	## &.got[3] - _GLOBAL_OFFSET_TABLE_ = 12			## &.got[3] - _GLOBAL_OFFSET_TABLE_ = 12
	# IE: lwz 10, 12(9)			# IE: lwz 10, 12(9)
	# IE-NEXT: add 10, 10, 2			# IE-NEXT: add 10, 10, 2
	## &.got[4] - _GLOBAL_OFFSET_TABLE_ = 16			## &.got[4] - _GLOBAL_OFFSET_TABLE_ = 16
	# IE-NEXT: lwz 8, 16(7)			# IE-NEXT: lwz 8, 16(7)
	# IE-NEXT: lbzx 10, 8, 2			# IE-NEXT: lbzx 10, 8, 2
	[14 lines not shown]
	lbzx 10, 8, c@tls			lbzx 10, 8, c@tls

	## In IE, these instructions (op rT, rA, x@tls) are not changed.			## In IE, these instructions (op rT, rA, x@tls) are not changed.
	# IE-NEXT: lhzx 12, 2, 2			# IE-NEXT: lhzx 12, 2, 2
	# IE-NEXT: lwzx 13, 3, 2			# IE-NEXT: lwzx 13, 3, 2
	# IE-NEXT: stbx 14, 4, 2			# IE-NEXT: stbx 14, 4, 2
	# IE-NEXT: sthx 15, 5, 2			# IE-NEXT: sthx 15, 5, 2
	# IE-NEXT: stwx 16, 6, 2			# IE-NEXT: stwx 16, 6, 2
				# IE-NEXT: lhax 17, 7, 2
				# IE-NEXT: lwax 18, 8, 2
				# IE-NEXT: lfsx 19, 9, 2
				# IE-NEXT: lfdx 20, 10, 2
				# IE-NEXT: stfsx 21, 11, 2
				# IE-NEXT: stfdx 22, 12, 2

	## In LE, these X-Form instructions are changed to their corresponding D-Form.			## In LE, these X-Form instructions are changed to their corresponding D-Form.
	# LE-NEXT: lhz 12, -28660(2)			# LE-NEXT: lhz 12, -28660(2)
	# LE-NEXT: lwz 13, -28660(3)			# LE-NEXT: lwz 13, -28660(3)
	# LE-NEXT: stb 14, -28660(4)			# LE-NEXT: stb 14, -28660(4)
	# LE-NEXT: sth 15, -28660(5)			# LE-NEXT: sth 15, -28660(5)
	# LE-NEXT: stw 16, -28660(6)			# LE-NEXT: stw 16, -28660(6)
				# LE-NEXT: lha 17, -28660(7)
				# LE-NEXT: lwa 18, -28660(8)
				# LE-NEXT: lfs 19, -28660(9)
				# LE-NEXT: lfd 20, -28660(10)
				# LE-NEXT: stfs 21, -28660(11)
				# LE-NEXT: stfd 22, -28660(12)

	lhzx 12, 2, s@tls			lhzx 12, 2, s@tls
	lwzx 13, 3, i@tls			lwzx 13, 3, i@tls
	stbx 14, 4, c@tls			stbx 14, 4, c@tls
	sthx 15, 5, s@tls			sthx 15, 5, s@tls
	stwx 16, 6, i@tls			stwx 16, 6, i@tls
				lhax 17, 7, s@tls
				lwax 18, 8, i@tls
				lfsx 19, 9, f@tls
				lfdx 20, 10, d@tls
				stfsx 21, 11, f@tls
				stfdx 22, 12, d@tls
				ldx 23, 13, l@tls
				stdx 24, 14, l@tls

	.section .tbss			.section .tbss
	.globl a			.globl a
	.zero 8			.zero 8
	a:			a:
	.zero 4			.zero 4
	c:			c:
	s:			s:
	i:			i:
				f:
				d:
				l:

lld/test/ELF/ppc64-tls-ie.s

	[18 lines not shown]
	# RUN: llvm-objdump -d --no-show-raw-insn %t.so | FileCheck --check-prefix=IE %s			# RUN: llvm-objdump -d --no-show-raw-insn %t.so | FileCheck --check-prefix=IE %s
	## IE -> LE			## IE -> LE
	# RUN: ld.lld %t.o -o %t			# RUN: ld.lld %t.o -o %t
	# RUN: llvm-readelf -r %t | FileCheck --check-prefix=NOREL %s			# RUN: llvm-readelf -r %t | FileCheck --check-prefix=NOREL %s
	# RUN: llvm-objdump -d --no-show-raw-insn %t | FileCheck --check-prefix=LE %s			# RUN: llvm-objdump -d --no-show-raw-insn %t | FileCheck --check-prefix=LE %s

	# IE-REL: FLAGS STATIC_TLS			# IE-REL: FLAGS STATIC_TLS
	# IE-REL: .rela.dyn {			# IE-REL: .rela.dyn {
	# IE-REL-NEXT: 0x204C8 R_PPC64_TPREL64 c 0x0			# IE-REL-NEXT: 0x205A8 R_PPC64_TPREL64 c 0x0
	# IE-REL-NEXT: 0x204D0 R_PPC64_TPREL64 s 0x0			# IE-REL-NEXT: 0x205B0 R_PPC64_TPREL64 s 0x0
	# IE-REL-NEXT: 0x204D8 R_PPC64_TPREL64 i 0x0			# IE-REL-NEXT: 0x205B8 R_PPC64_TPREL64 i 0x0
	# IE-REL-NEXT: 0x204E0 R_PPC64_TPREL64 l 0x0			# IE-REL-NEXT: 0x205C0 R_PPC64_TPREL64 l 0x0
				# IE-REL-NEXT: 0x205C8 R_PPC64_TPREL64 f 0x0
				# IE-REL-NEXT: 0x205D0 R_PPC64_TPREL64 d 0x0
	# IE-REL-NEXT: }			# IE-REL-NEXT: }

	# INPUT-REL: R_PPC64_GOT_TPREL16_HA c 0x0			# INPUT-REL: R_PPC64_GOT_TPREL16_HA c 0x0
	# INPUT-REL: R_PPC64_GOT_TPREL16_LO_DS c 0x0			# INPUT-REL: R_PPC64_GOT_TPREL16_LO_DS c 0x0
	# INPUT-REL: R_PPC64_TLS c 0x0			# INPUT-REL: R_PPC64_TLS c 0x0
	## &.got[1] - .TOC. = -32760			## &.got[1] - .TOC. = -32760
	# IE-LABEL: <test1>:			# IE-LABEL: <test1>:
	# IE-NEXT: addis 3, 2, 0			# IE-NEXT: addis 3, 2, 0
	[108 lines not shown]

	# LE-LABEL: <test_ds>:			# LE-LABEL: <test_ds>:
	# LE-NEXT: addis 4, 13, 0			# LE-NEXT: addis 4, 13, 0
	# LE-NEXT: std 3, -28664(4)			# LE-NEXT: std 3, -28664(4)
	test_ds:			test_ds:
	ld 4, l@got@tprel(2)			ld 4, l@got@tprel(2)
	stdx 3, 4, l@tls			stdx 3, 4, l@tls

				# LE-LABEL: <test_lhax>:
				MaskRay: Preferred style for new tests is that instructions are indented 2 columns more than `<test11>`:
				# LE-LABEL: <test11>:
				# LE-NEXT:   nop
				# LE-NEXT:   addis 3, 13, 0
				# LE-NEXT:   lha 3, -28670(3)
				# LE-NEXT: nop
				# LE-NEXT: addis 3, 13, 0
				# LE-NEXT: lha 3, -28670(3)
				test_lhax:
				MaskRay: Consider changing the numbering to something more descriptive. Then, if we want to add a new test in the middle, we won't have to renumber all following `test`.
				addis 3, 2, s@got@tprel@ha
				ld 3, s@got@tprel@l(3)
				lhax 3, 3, s@tls

				# LE-LABEL: <test_lwax>:
				MaskRay: ditto
				# LE-NEXT: nop
				# LE-NEXT: addis 3, 13, 0
				# LE-NEXT: lwa 3, -28668(3)
				test_lwax:
				addis 3, 2, i@got@tprel@ha
				ld 3, i@got@tprel@l(3)
				lwax 3, 3, i@tls

				# LE-LABEL: <test_lfsx>:
				MaskRay: ditto
				# LE-NEXT: nop
				# LE-NEXT: addis 3, 13, 0
				# LE-NEXT: lfs 3, -28656(3)
				test_lfsx:
				addis 3, 2, f@got@tprel@ha
				ld 3, f@got@tprel@l(3)
				lfsx 3, 3, f@tls

				# LE-LABEL: <test_lfdx>:
				# LE-NEXT: nop
				# LE-NEXT: addis 3, 13, 0
				# LE-NEXT: lfd 3, -28648(3)
				test_lfdx:
				addis 3, 2, d@got@tprel@ha
				ld 3, d@got@tprel@l(3)
				lfdx 3, 3, d@tls

				# LE-LABEL: <test_stfsx>:
				# LE-NEXT: nop
				# LE-NEXT: addis 4, 13, 0
				# LE-NEXT: stfs 3, -28656(4)
				test_stfsx:
				addis 4, 2, f@got@tprel@ha
				ld 4, f@got@tprel@l(4)
				stfsx 3, 4, f@tls

				# LE-LABEL: <test_stfdx>:
				# LE-NEXT: nop
				# LE-NEXT: addis 4, 13, 0
				# LE-NEXT: stfd 3, -28648(4)
				test_stfdx:
				addis 4, 2, d@got@tprel@ha
				ld 4, d@got@tprel@l(4)
				stfdx 3, 4, d@tls

	# NOREL: There are no relocations in this file.			# NOREL: There are no relocations in this file.

	.section .tdata,"awT",@progbits			.section .tdata,"awT",@progbits
	.globl c, s, i, l			.globl c, s, i, l, f, d
	c:			c:
	.byte 97			.byte 97

	.p2align 1			.p2align 1
	s:			s:
	.short 55			.short 55

	.p2align 2			.p2align 2
	i:			i:
	.long 55			.long 55

	.p2align 3			.p2align 3
	l:			l:
	.quad 55			.quad 55
				f:
				.long 55

				.p2align 3
				d:
				.quad 55

lld/test/ELF/ppc64-tls-pcrel-ie.s

	[23 lines not shown]
	## done correctly.			## done correctly.

	#--- lds			#--- lds
	SECTIONS {			SECTIONS {
	.text_addr 0x1001000 : { *(.text_addr) }			.text_addr 0x1001000 : { *(.text_addr) }
	.text_val 0x1002000 : { *(.text_val) }			.text_val 0x1002000 : { *(.text_val) }
	.text_twoval 0x1003000 : { *(.text_twoval) }			.text_twoval 0x1003000 : { *(.text_twoval) }
	.text_incrval 0x1004000 : { *(.text_incrval) }			.text_incrval 0x1004000 : { *(.text_incrval) }
				.text_incrval_half 0x1005000 : { *(.text_incrval_half) }
				.text_incrval_word 0x1006000 : { *(.text_incrval_word) }
				.text_incrval_float 0x1007000 : { *(.text_incrval_float) }
				.text_incrval_double 0x1008000 : { *(.text_incrval_double) }
				.text_incrval_dword 0x1009000 : { *(.text_incrval_dword) }
				.text_incrval_half_zero 0x1010000 : { *(.text_incrval_half_zero) }
	}			}

	#--- defs			#--- defs
	.section .tbss,"awT",@nobits			.section .tbss,"awT",@nobits
	.globl x			.globl x
	x:			x:
	.long 0			.long 0
	.globl y			.globl y
	y:			y:
	.long 0			.long 0

	#--- asm			#--- asm
	# IE-RELOC: Relocation section '.rela.dyn' at offset 0x10090 contains 2 entries:			# IE-RELOC: Relocation section '.rela.dyn' at offset 0x10090 contains 2 entries:
	# IE-RELOC: 00000000010040f0 0000000100000049 R_PPC64_TPREL64 0000000000000000 x + 0			# IE-RELOC: 00000000010100f0 0000000100000049 R_PPC64_TPREL64 0000000000000000 x + 0
	# IE-RELOC: 00000000010040f8 0000000200000049 R_PPC64_TPREL64 0000000000000000 y + 0			# IE-RELOC-NEXT: 00000000010100f8 0000000200000049 R_PPC64_TPREL64 0000000000000000 y + 0
				MaskRay: This line can use `-NEXT:`

	# IE-SYM: Symbol table '.dynsym' contains 3 entries:			# IE-SYM: Symbol table '.dynsym' contains 3 entries:
	# IE-SYM: 1: 0000000000000000 0 TLS GLOBAL DEFAULT UND x			# IE-SYM: 1: 0000000000000000 0 TLS GLOBAL DEFAULT UND x
	# IE-SYM: 2: 0000000000000000 0 TLS GLOBAL DEFAULT UND y			# IE-SYM: 2: 0000000000000000 0 TLS GLOBAL DEFAULT UND y

	# IE-GOT: Hex dump of section '.got':			# IE-GOT: Hex dump of section '.got':
	# IE-GOT-NEXT: 0x010040e8 e8c00001 00000000 00000000 00000000			# IE-GOT-NEXT: 0x010100e8 e8800101 00000000 00000000 00000000

	# LE-RELOC: There are no relocations in this file.			# LE-RELOC: There are no relocations in this file.

	# LE-SYM: Symbol table '.symtab' contains 8 entries:			# LE-SYM: Symbol table '.symtab' contains 14 entries:
	# LE-SYM: 6: 0000000000000000 0 TLS GLOBAL DEFAULT 6 x			# LE-SYM: 0000000000000000 0 TLS GLOBAL DEFAULT [[#]] x
				MaskRay: The symbol indexes are not so reliable if we change the symbol table order in the future. Omit the section indexes as well.
				# LE-SYM: 0000000000000000 0 TLS GLOBAL DEFAULT [[#]] x
	# LE-SYM: 7: 0000000000000004 0 TLS GLOBAL DEFAULT 6 y			# LE-SYM: 0000000000000004 0 TLS GLOBAL DEFAULT [[#]] y

	# LE-GOT: could not find section '.got'			# LE-GOT: could not find section '.got'

	# IE-LABEL: <IEAddr>:			# IE-LABEL: <IEAddr>:
	# IE-NEXT: pld 3, 12528(0), 1			# IE-NEXT: pld 3, 61680(0), 1
	# IE-NEXT: add 3, 3, 13			# IE-NEXT: add 3, 3, 13
	# IE-NEXT: blr			# IE-NEXT: blr
	# LE-LABEL: <IEAddr>:			# LE-LABEL: <IEAddr>:
	# LE-NEXT: paddi 3, 13, -28672, 0			# LE-NEXT: paddi 3, 13, -28672, 0
	# LE-NEXT: nop			# LE-NEXT: nop
	# LE-NEXT: blr			# LE-NEXT: blr
	.section .text_addr, "ax", %progbits			.section .text_addr, "ax", %progbits
	IEAddr:			IEAddr:
	pld 3, x@got@tprel@pcrel(0), 1			pld 3, x@got@tprel@pcrel(0), 1
	add 3, 3, x@tls@pcrel			add 3, 3, x@tls@pcrel
	blr			blr

	# IE-LABEL: <IEAddrCopy>:			# IE-LABEL: <IEAddrCopy>:
	# IE-NEXT: pld 3, 12512(0), 1			# IE-NEXT: pld 3, 61664(0), 1
	# IE-NEXT: add 4, 3, 13			# IE-NEXT: add 4, 3, 13
	# IE-NEXT: blr			# IE-NEXT: blr
	# LE-LABEL: <IEAddrCopy>:			# LE-LABEL: <IEAddrCopy>:
	# LE-NEXT: paddi 3, 13, -28672, 0			# LE-NEXT: paddi 3, 13, -28672, 0
	# LE-NEXT: mr 4, 3			# LE-NEXT: mr 4, 3
	# LE-NEXT: blr			# LE-NEXT: blr
	.section .text_addr, "ax", %progbits			.section .text_addr, "ax", %progbits
	IEAddrCopy:			IEAddrCopy:
	pld 3, x@got@tprel@pcrel(0), 1			pld 3, x@got@tprel@pcrel(0), 1
	add 4, 3, x@tls@pcrel			add 4, 3, x@tls@pcrel
	blr			blr

	# IE-LABEL: <IEVal>:			# IE-LABEL: <IEVal>:
	# IE-NEXT: pld 3, 8432(0), 1			# IE-NEXT: pld 3, 57584(0), 1
	# IE-NEXT: lwzx 3, 3, 13			# IE-NEXT: lwzx 3, 3, 13
	# IE-NEXT: blr			# IE-NEXT: blr
	# LE-LABEL: <IEVal>:			# LE-LABEL: <IEVal>:
	# LE-NEXT: paddi 3, 13, -28672, 0			# LE-NEXT: paddi 3, 13, -28672, 0
	# LE-NEXT: lwz 3, 0(3)			# LE-NEXT: lwz 3, 0(3)
	# LE-NEXT: blr			# LE-NEXT: blr
	.section .text_val, "ax", %progbits			.section .text_val, "ax", %progbits
	IEVal:			IEVal:
	pld 3, x@got@tprel@pcrel(0), 1			pld 3, x@got@tprel@pcrel(0), 1
	lwzx 3, 3, x@tls@pcrel			lwzx 3, 3, x@tls@pcrel
	blr			blr

	# IE-LABEL: <IETwoVal>:			# IE-LABEL: <IETwoVal>:
	# IE-NEXT: pld 3, 4336(0), 1			# IE-NEXT: pld 3, 53488(0), 1
	# IE-NEXT: pld 4, 4336(0), 1			# IE-NEXT: pld 4, 53488(0), 1
	# IE-NEXT: lwzx 3, 3, 13			# IE-NEXT: lwzx 3, 3, 13
	# IE-NEXT: lwzx 4, 4, 13			# IE-NEXT: lwzx 4, 4, 13
	# IE-NEXT: blr			# IE-NEXT: blr
	# LE-LABEL: <IETwoVal>:			# LE-LABEL: <IETwoVal>:
	# LE-NEXT: paddi 3, 13, -28672, 0			# LE-NEXT: paddi 3, 13, -28672, 0
	# LE-NEXT: paddi 4, 13, -28668, 0			# LE-NEXT: paddi 4, 13, -28668, 0
	# LE-NEXT: lwz 3, 0(3)			# LE-NEXT: lwz 3, 0(3)
	# LE-NEXT: lwz 4, 0(4)			# LE-NEXT: lwz 4, 0(4)
	# LE-NEXT: blr			# LE-NEXT: blr
	.section .text_twoval, "ax", %progbits			.section .text_twoval, "ax", %progbits
	IETwoVal:			IETwoVal:
	pld 3, x@got@tprel@pcrel(0), 1			pld 3, x@got@tprel@pcrel(0), 1
	pld 4, y@got@tprel@pcrel(0), 1			pld 4, y@got@tprel@pcrel(0), 1
	lwzx 3, 3, x@tls@pcrel			lwzx 3, 3, x@tls@pcrel
	lwzx 4, 4, y@tls@pcrel			lwzx 4, 4, y@tls@pcrel
	blr			blr

	# IE-LABEL: <IEIncrementVal>:			# IE-LABEL: <IEIncrementVal>:
	# IE-NEXT: pld 4, 248(0), 1			# IE-NEXT: pld 4, 49400(0), 1
	# IE-NEXT: lwzx 3, 4, 13			# IE-NEXT: lwzx 3, 4, 13
	# IE-NEXT: stwx 3, 4, 13			# IE-NEXT: stwx 3, 4, 13
	# IE-NEXT: blr			# IE-NEXT: blr
	# LE-LABEL: <IEIncrementVal>:			# LE-LABEL: <IEIncrementVal>:
	# LE-NEXT: paddi 4, 13, -28668, 0			# LE-NEXT: paddi 4, 13, -28668, 0
	# LE-NEXT: lwz 3, 0(4)			# LE-NEXT: lwz 3, 0(4)
	# LE-NEXT: stw 3, 0(4)			# LE-NEXT: stw 3, 0(4)
	# LE-NEXT: blr			# LE-NEXT: blr
	.section .text_incrval, "ax", %progbits			.section .text_incrval, "ax", %progbits
	IEIncrementVal:			IEIncrementVal:
	pld 4, y@got@tprel@pcrel(0), 1			pld 4, y@got@tprel@pcrel(0), 1
	lwzx 3, 4, y@tls@pcrel			lwzx 3, 4, y@tls@pcrel
	stwx 3, 4, y@tls@pcrel			stwx 3, 4, y@tls@pcrel
	blr			blr

				# IE-LABEL: <IEIncrementValHalf>:
				# IE-NEXT: pld 4, 45304(0), 1
				# IE-NEXT: lhax 3, 4, 13
				# IE-NEXT: sthx 3, 4, 13
				# IE-NEXT: blr
				# LE-LABEL: <IEIncrementValHalf>:
				# LE-NEXT: paddi 4, 13, -28668, 0
				# LE-NEXT: lha 3, 0(4)
				# LE-NEXT: sth 3, 0(4)
				# LE-NEXT: blr
				.section .text_incrval_half, "ax", %progbits
				IEIncrementValHalf:
				MaskRay: Consider changing the numbered section name and label name to something more descriptive. Then, if we want to add a new test in the middle, we won't have to renumber all following `.text_incrval` and `IEIncrementVal`.
				pld 4, y@got@tprel@pcrel(0), 1
				lhax 3, 4, y@tls@pcrel
				sthx 3, 4, y@tls@pcrel
				blr

				# IE-LABEL: <IEIncrementValWord>:
				# IE-NEXT: pld 4, 41208(0), 1
				# IE-NEXT: lwax 3, 4, 13
				# IE-NEXT: stwx 3, 4, 13
				# IE-NEXT: blr
				# LE-LABEL: <IEIncrementValWord>:
				# LE-NEXT: paddi 4, 13, -28668, 0
				# LE-NEXT: lwa 3, 0(4)
				# LE-NEXT: stw 3, 0(4)
				# LE-NEXT: blr
				.section .text_incrval_word, "ax", %progbits
				IEIncrementValWord:
				pld 4, y@got@tprel@pcrel(0), 1
				lwax 3, 4, y@tls@pcrel
				stwx 3, 4, y@tls@pcrel
				blr

				# IE-LABEL: <IEIncrementValFloat>:
				# IE-NEXT: pld 4, 37112(0), 1
				# IE-NEXT: lfsx 3, 4, 13
				# IE-NEXT: stfsx 3, 4, 13
				# IE-NEXT: blr
				# LE-LABEL: <IEIncrementValFloat>:
				# LE-NEXT: paddi 4, 13, -28668, 0
				# LE-NEXT: lfs 3, 0(4)
				# LE-NEXT: stfs 3, 0(4)
				# LE-NEXT: blr
				.section .text_incrval_float, "ax", %progbits
				IEIncrementValFloat:
				pld 4, y@got@tprel@pcrel(0), 1
				lfsx 3, 4, y@tls@pcrel
				stfsx 3, 4, y@tls@pcrel
				blr

				# IE-LABEL: <IEIncrementValDouble>:
				# IE-NEXT: pld 4, 33016(0), 1
				# IE-NEXT: lfdx 3, 4, 13
				# IE-NEXT: stfdx 3, 4, 13
				# IE-NEXT: blr
				# LE-LABEL: <IEIncrementValDouble>:
				# LE-NEXT: paddi 4, 13, -28668, 0
				# LE-NEXT: lfd 3, 0(4)
				# LE-NEXT: stfd 3, 0(4)
				# LE-NEXT: blr
				.section .text_incrval_double, "ax", %progbits
				IEIncrementValDouble:
				pld 4, y@got@tprel@pcrel(0), 1
				lfdx 3, 4, y@tls@pcrel
				stfdx 3, 4, y@tls@pcrel
				blr

				# IE-LABEL: <IEIncrementValDword>:
				# IE-NEXT: pld 4, 28920(0), 1
				# IE-NEXT: ldx 3, 4, 13
				# IE-NEXT: stdx 3, 4, 13
				# IE-NEXT: blr
				# LE-LABEL: <IEIncrementValDword>:
				# LE-NEXT: paddi 4, 13, -28668, 0
				# LE-NEXT: ld 3, 0(4)
				# LE-NEXT: std 3, 0(4)
				# LE-NEXT: blr
				.section .text_incrval_dword, "ax", %progbits
				IEIncrementValDword:
				pld 4, y@got@tprel@pcrel(0), 1
				ldx 3, 4, y@tls@pcrel
				stdx 3, 4, y@tls@pcrel
				blr

				# IE-LABEL: <IEIncrementValHalfZero>:
				# IE-NEXT: pld 4, 248(0), 1
				# IE-NEXT: lhzx 3, 4, 13
				# IE-NEXT: sthx 3, 4, 13
				# IE-NEXT: blr
				# LE-LABEL: <IEIncrementValHalfZero>:
				# LE-NEXT: paddi 4, 13, -28668, 0
				# LE-NEXT: lhz 3, 0(4)
				# LE-NEXT: sth 3, 0(4)
				# LE-NEXT: blr
				.section .text_incrval_half_zero, "ax", %progbits
				IEIncrementValHalfZero:
				pld 4, y@got@tprel@pcrel(0), 1
				lhzx 3, 4, y@tls@pcrel
				sthx 3, 4, y@tls@pcrel
				blr
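The updated `pld` displacements in the `IE` checks follow from the linker script addresses and the new GOT location. As a sanity check, the sketch below recomputes them, assuming the displacement is simply the GOT entry address minus the address of the `pld` instruction, with the GOT entries for `x` and `y` taken from the `IE-RELOC` lines (0x10100f0 and 0x10100f8). `pcrelDisp`, `gotX`, and `gotY` are hypothetical names.

```cpp
#include <cstdint>

// PC-relative displacement from an instruction to its target.
constexpr int64_t pcrelDisp(uint64_t target, uint64_t insnAddr) {
  return static_cast<int64_t>(target - insnAddr);
}

constexpr uint64_t gotX = 0x10100f0; // R_PPC64_TPREL64 x
constexpr uint64_t gotY = 0x10100f8; // R_PPC64_TPREL64 y

static_assert(pcrelDisp(gotX, 0x1001000) == 61680, "IEAddr");
static_assert(pcrelDisp(gotX, 0x1001010) == 61664, "IEAddrCopy, 16 bytes later");
static_assert(pcrelDisp(gotX, 0x1002000) == 57584, "IEVal");
static_assert(pcrelDisp(gotY, 0x1004000) == 49400, "IEIncrementVal");
static_assert(pcrelDisp(gotY, 0x1005000) == 45304, "IEIncrementValHalf");
static_assert(pcrelDisp(gotY, 0x1010000) == 248, "IEIncrementValHalfZero");
```

The last section added to the linker script, `.text_incrval_half_zero` at 0x1010000, ends up immediately before the GOT, which is why its displacement matches the 248 that `IEIncrementVal` had before the new sections were inserted.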

Revision Contents

Diff 555027
