This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC][lld] Account for additional X-Forms -> D-Form/DS-Forms load/stores when relaxing initial-exec to local-exec
ClosedPublic

Authored by amyk on Aug 17 2023, 9:19 AM.

Details

Summary

D153645 added additional X-Form load/stores that can be generated for TLS accesses.
However, these added instructions have not been accounted for in lld. As a result,
lld does not know how to handle them and cannot relax initial-exec to local-exec
when the initial-exec sequence contains these additional load/stores.

This patch aims to resolve https://github.com/llvm/llvm-project/issues/64424.

Diff Detail

Event Timeline

amyk created this revision.Aug 17 2023, 9:19 AM
Herald added a project: Restricted Project. · View Herald Transcript
amyk requested review of this revision.Aug 17 2023, 9:19 AM
nemanjai added inline comments.Aug 18 2023, 4:28 AM
lld/ELF/Arch/PPC64.cpp
846

I understand the motivation to return both the primary opcode and the last two bits here to differentiate LWA from LD/STD. However, I am not a fan of the different interface of these very similar and related functions.

Do any users of getPPCDFormOp() need the primary opcode to be in the least significant bits? Could we change the interface to return the instruction with the primary opcode in the most significant bits regardless of whether it is D-Form/DS-Form or even (in the future) DQ-Form?

MaskRay added inline comments.Aug 18 2023, 9:43 AM
lld/ELF/Arch/PPC64.cpp
846

static unsigned getPPCDSFormOp

getPPCDFormOp is external just because PPC.cpp uses it as well.

amyk updated this revision to Diff 551708.Aug 18 2023, 10:40 PM

Update getPPCDFormOp() to return the D-Form opcodes in the most significant bits (like getPPCDSFormOp().

lld/ELF/Arch/PPC64.cpp
846

@nemanjai That's a good point, since the functions are very related.

After taking a look, to me, it doesn't appear that any users of getPPCDFormOp() require the primary opcode to be in the least significant bits.

I also tried shifting all of the D-Forms over to the most significant bits like the DS-Form ones. This works, but I'm uncertain if this is the right approach so I was wondering if you have any thoughts regarding this.

846

@MaskRay I might be misunderstanding something, but I am also using getPPCDSFormOp() in PPC.cpp, as well.
Doesn't this mean this function also needs to be external?

This patch fixes a regression caused by a patch that is in v17. We will need to decide if we should backport this fix or if we should pull the original patch (which also has additional patches dependent on it that are in v17 as well). I think this patch is obvious enough to be backported, but I'll defer the final decision on that to @MaskRay once this is approved.
@amyk Perhaps it would help to perform a thorough test with the approved version of this patch applied to v17 to ensure it is safe to backport.

lld/ELF/Arch/PPC64.cpp
855

I forgot to mention it yesterday, but I don't think we should have a different signal for failure here (i.e. returning 1 instead of 0 like in getPPCDFormOp()). Let's just use zero for both.

944

Using 0x03FFFFFC rather than 0x03FFFFFF is presumably because the two least significant bits of the DS-Form must not be overwritten. While I think that makes sense, I have a bit of a concern with using a different value there:

  1. We always had DS-Forms there (LD/STD) and didn't treat them specially
  2. Presumably, having either of those bits set in the incoming instruction is an indication that something has gone wrong with whatever produced the code, so masking out the bits just hides the issue

Perhaps it would be appropriate to set the finalInstr the same way regardless of how we set the dFormOp and just have an assert (or possibly a fatal_error or warning is even better) in the DS-Form section that those bits are unset.

amyk marked an inline comment as done.Aug 19 2023, 8:09 AM

As Nemanja suggests, I will also apply this patch to the LLVM 17 release branch to test.

lld/ELF/Arch/PPC64.cpp
855

I will update this.

944

I'd like to understand this suggestion a bit more.

Do you mean, we should just do:

if (dFormOp == 0) { // Expecting a DS-Form instruction.
  dFormOp = getPPCDSFormOp(secondaryOp);
  if (dFormOp == 0)
    error("unrecognized instruction for IE to LE R_PPC64_TLS");
        
    // Check if the last two bits are unset. If set, we `errorOrWarn()`

    finalReloc = R_PPC64_TPREL16_LO_DS;
} else
    finalReloc = R_PPC64_TPREL16_LO;
  write32(loc, (dFormOp | (read32(loc) & 0x03FFFFFF)));
  relocateNoSym(loc + offset, finalReloc, val);
}

In any case, yes, if I understand correctly, 0x03FFFFFF grabs bits 6-31 in the X-Form instruction, while 0x03FFFFFC grabs bits 6-29, leaving the last two bits.

Also, to give a bit more context behind why I am treating the DS-Forms differently:

I discussed the use of use of 0x03FFFFFC in the DS-Form case with a few others before putting up the patch.
This all came about because I was using a different relocation for DS-Form R_PPC64_TPREL16_LO_DS.
I found that when I used the different relocation with the original masking in the DS-Form case:

write32(loc, (dFormOp | (read32(loc) & 0x03FFFFFF)));
relocateNoSym(loc + offset, R_PPC64_TPREL16_LO_DS, val);

For the following situation:

test8:
  addis 4, 2, l@got@tprel@ha
  ld 4, l@got@tprel@l(4)
  stdx 3, 4, l@tls

.section .tdata,"awT",@progbits
.globl l
l:
.quad 55

I'd get stq instead of std like we would expect.

Disassembly of section .text:

0000000010010200 <test8>:
10010200:      	nop
10010204:      	addis 4, 13, 0
10010208:      	stq 3, -28672(4)

Presumably this is because std and stq share the same primary opcode, and the other thing that differs are the last two bits (00 for std whereas 10 for stq), so we found that 0x03FFFFFC works with R_PPC64_TPREL16_LO_DS.

I don't quite remember the exact discussion that we also had, but I think we were saying before that LD/STD perhaps just happened to have worked when we did 0x03FFFFFF with R_PPC64_TPREL16_LO in the past, since the last two bits in these instructions are both 00.

amyk marked an inline comment as done.Aug 19 2023, 11:04 AM

We will need to decide if we should backport this fix or if we should pull the original patch (which also has additional patches dependent on it that are in v17 as well).

Yes, rG598cccea80f5614869bf0dda4d09d68b2c64423c is also dependent on the original patch.

Additionally, after applying this patch to the release/17.x branch, my bootstrap build/run completed with no issues.
This includes building libc++ as well (since libc++ was originally where I found this issue: https://github.com/llvm/llvm-project/issues/64424).

MaskRay added inline comments.Aug 19 2023, 7:05 PM
lld/ELF/Arch/PPC64.cpp
849

ditto below

855

Mark as unresolved.

862

ditto below

lld/test/ELF/ppc64-tls-ie.s
157

Preferred style for new tests is that instructions are indented 2 column more than <test11>:

# LE-LABEL: <test11>:
# LE-NEXT:    nop
# LE-NEXT:    addis 3, 13, 0
# LE-NEXT:    lha 3, -28670(3)
161

Consider changing the numbering to something more descriptive. Then, if we want to add a new test in the middle, we won't have to renumber all following test*.

166

ditto

175

ditto

lld/test/ELF/ppc64-tls-pcrel-ie.s
159

Consider changing the numbered section name and label name to something more descriptive. Then, if we want to add a new test in the middle, we won't have to renumber all following .text_incrval* and IEIncrementVal*.

amyk updated this revision to Diff 551804.Aug 19 2023, 10:21 PM
amyk marked 10 inline comments as done.
  • Remove extra parentheses in PPC64.cpp
  • Update test cases to have more unique names
  • Make getPPCDSFormOp() return 0 for the default case
amyk added a comment.EditedAug 19 2023, 10:21 PM

I've addressed all of the current comments, with the exception of Nemanja's comment on using a different mask for the DS-Form case (since we're still currently discussing that and I have asked for some clarification).

lld/ELF/Arch/PPC64.cpp
855

Thank you for pointing that out. I have now updated the patch, so the comment should be resolved.

nemanjai added inline comments.Aug 20 2023, 4:17 AM
lld/ELF/Arch/PPC64.cpp
944

That stq vs. std issue is a bit alarming. Where do the low bits come from? The variable l is aligned at 8 bytes.

sfertile added inline comments.Aug 21 2023, 7:57 AM
lld/ELF/Arch/PPC64.cpp
944

Where do the low bits come from?

From the stdx instruction. The XO field is 149 so bit 30 is 1, and bit 31 is indeterminate but those are typically zeroed, so we have 2 in the last 2 bits. Using read32(loc) & 0x03FFFFFF will leave those bits in, and then relocating with R_PPC64_TPREL16_LO_DS we end preserving the non-zero value in the 'XO' bits of the instruction.

I can see why you might want to preserve what we were doing before because it was working for LD/STD but this assumption is wrong -

Presumably, having either of those bits set in the incoming instruction is an indication that something has gone wrong with whatever produced the code, so masking out the bits just hides the issue

The stdx instruction is the counter example. Thats why we updated the mask to reflect we are actually working with a DS-form instruction.

nemanjai added inline comments.Aug 22 2023, 8:53 AM
lld/ELF/Arch/PPC64.cpp
944

Ah, ok. This makes sense. Just out of curiosity, why aren't we masking out the XO field completely? It seems odd that we are masking out the bottom two bits (one bit from XO and one reserved bit) but we are leaving the rest of the XO field intact.

MaskRay accepted this revision.Aug 22 2023, 9:18 AM

LGTM from my viewpoint if a PowerPC reviewer looks ok as well.

lld/test/ELF/ppc64-tls-pcrel-ie.s
52

This line can use -NEXT:

64

The symbol indexes are not so reliable if we change the symbol table order in the future.
Omit the section indexes as well.

# LE-SYM: 0000000000000000     0 TLS     GLOBAL DEFAULT     [[#]] x
This revision is now accepted and ready to land.Aug 22 2023, 9:18 AM
sfertile added inline comments.Aug 22 2023, 10:04 AM
lld/ELF/Arch/PPC64.cpp
944

Yeah that's a good point. I think it was probably working because the relocate function would end up masking out the relocated bits but we should update the mask used here to reflect exactly what bits we want to extract from the existing instruction.

amyk updated this revision to Diff 552453.Aug 22 2023, 11:27 AM
amyk marked 7 inline comments as done.

Address review comments:

  • Update checks for ppc64-tls-ie.s
  • Mask out all of the XO bits for DS-Form
lld/ELF/Arch/PPC64.cpp
944

Sure. I will update it so that the entire XO field is masked out. This also works.

MaskRay added inline comments.Aug 22 2023, 12:38 PM
lld/ELF/Arch/PPC64.cpp
947

Nit: lld/ELF codebase isn's consistent on uppercase and lowercase. But for newer code, prefer lowercase hexidedicamls.

amyk updated this revision to Diff 552528.Aug 22 2023, 4:04 PM

Address comment to make hexadecimals lowercase.

amyk marked an inline comment as done.Aug 22 2023, 4:04 PM
nemanjai added inline comments.Aug 23 2023, 1:39 PM
lld/ELF/Arch/PPC64.cpp
944

I'm sorry Amy, but the purpose of my comment was to common up the masks between the D-Form and DS-Form updated instructions. If I am not mistaken, regardless of what the input instruction was, we want to mask out bits 21-31. I suppose we really want to mask out all the bits that will be replaced by the D/DS field, so presumably it should be bits 16-31.

amyk updated this revision to Diff 552984.Aug 23 2023, 9:46 PM
amyk marked an inline comment as done.

Address Nemanja's comment by masking out bits 16-31 for both D-Form/DS-Forms.
Builds and tests successfully on both main and release/17.x.

lld/ELF/Arch/PPC64.cpp
944

Ah! Good point. Sorry, I had misunderstood your previous comment and thought it only applied to the DS-Forms.
What you said makes sense, and I've updated it once again.

MaskRay accepted this revision.Aug 28 2023, 12:51 PM

LGTM if you want to either backport this to release/17.x or consider this risky and revert the prior patch just in release/17.x :)

This revision was landed with ongoing or failed builds.Aug 31 2023, 6:45 AM
This revision was automatically updated to reflect the committed changes.