This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Replace foul hackery with real calls to __tls_get_addr
ClosedPublic

Authored by wschmidt on Nov 10 2014, 8:29 PM.

Details

Summary

My original support for the general dynamic and local dynamic TLS models contained some fairly obtuse hacks to generate calls to __tls_get-addr when lowering a TargetGlobalTLSAddress. Rather than generating real calls, special GET_TLS_ADDR nodes were used to wrap the calls and only reveal them at assembly time. I attempted to provide correct parameter and return values by chaining CopyToReg and CopyFromReg nodes onto the GET_TLS_ADDR nodes, but this was also not fully correct. Problems were seen with two back-to-back stores to TLS variables, where the call sequences ended up overlapping, with unhappy results. Additionally, since there weren't real calls, the proper register side effects of a call were not recorded, so clobbered values were kept live across the calls.

The proper thing to do is to lower these into calls in the first place. This is relatively straightforward; see the changes to PPCTargetLowering::LowerGlobalTLSAddress(). The changes here are standard call lowering, except that we need to track the fact that these calls will require a relocation. This is done by adding a machine operand flag of MO_TLSLD or MO_TLSGD to the TargetGlobalAddress operand that appears earlier in the sequence.

The calls to LowerCallTo() eventually find their way to LowerCall_64SVR4() or LowerCall_32SVR4(), which call FinishCall(), which calls PrepareCall(). In PrepareCall(), we detect the calls to __tls_get_addr and immediately snag the TargetGlobalTLSAddress with the annotated relocation information. This becomes an extra operand on the call following the callee, which is expected for nodes of type tlscall. We change the call opcode to CALL_TLS for this case. Back in FinishCall(), we change it again to CALL_NOP_TLS for 64-bit only, since we require a TOC-restore nop following the call for the 64-bit ABIs.

During selection, patterns in PPCInstrInfo.td and PPCInstr64Bit.td convert the CALL_TLS nodes into BL_TLS nodes, and convert the CALL_NOP_TLS nodes into BL8_NOP_TLS nodes. This replaces the code removed from PPCAsmPrinter.cpp, as the BL_TLS and BL8_NOP_TLS nodes can now be emitted normally using their patterns and the associated printTLSCall print method.

Note that this removed code includes some special handling for 32-bit SVR4 PIC (vs. pic) code, to ensure the proper relocation is placed on the __tls_get_addr symbol. I believe the same effect is produced by the code I added in PPCTargetLowering::LowerGlobalTLSAddress() that places the PPCII::MO_PLT_OR_STUB flag on the symbol. I don't know whether or not this is working, because Justin apparently didn't add any test cases for this when he put that code in place. /fingerwag

>>> Justin, can you please add a test case for this so I can verify that I haven't broken your code? <<<

Finally, as a result of these changes, all references to get-tls-addr in its various guises are no longer used, so they have been removed.

There are existing TLS tests to verify that the changes haven't messed anything up (other than the 32-bit PIC stuff). I've added one new test that verifies that the problem with the original code has been fixed.

Diff Detail

Event Timeline

wschmidt updated this revision to Diff 16024.Nov 10 2014, 8:29 PM
wschmidt retitled this revision from to [PowerPC] Replace foul hackery with real calls to __tls_get_addr.
wschmidt updated this object.
wschmidt edited the test plan for this revision. (Show Details)
wschmidt added reviewers: uweigand, hfinkel, chmeee, seurer.
wschmidt set the repository for this revision to rL LLVM.
wschmidt added a subscriber: Unknown Object (MLST).

Tests for 32-bit are already included, as part of tls-pic.ll.

rafael added inline comments.
lib/Target/PowerPC/PPCISelLowering.cpp
3726

This patch is nice, but this part looks a bit too clever. Could you pass TGTAddr down the callchain maybe?

hfinkel added inline comments.Nov 10 2014, 10:35 PM
lib/Target/PowerPC/PPCISelLowering.cpp
782–784

Please add CALL_TLS, CALL_NOP_TLS here.

1770

This block of code seems the same as the one added above. Can you please refactor this?

In D6209#5, @jhibbits wrote:

Tests for 32-bit are already included, as part of tls-pic.ll.

Yes, I saw those. However, what I'm referring to is the use of VK_PLT in the PPCAsmPrinter::EmitInstruction() code, the purpose of which is to ensure that ELF::R_PPC_PLTREL24 is applied to the __tls_get_addr symbol rather than ELF::R_PPC_REL24. You don't have a relocation test for this. See test/MC/PowerPC/tls-gd-obj.s for an example of how to do this.

Though using llvm-mc followed by llvm-readobj isn't really sufficient here, since we need to test a longer path. Something like llc with integrated-asm producing a .o and then using llvm-readobj. I haven't tried to write a test like that.

In D6209#10, @wschmidt wrote:
In D6209#5, @jhibbits wrote:

Tests for 32-bit are already included, as part of tls-pic.ll.

Yes, I saw those. However, what I'm referring to is the use of VK_PLT in the PPCAsmPrinter::EmitInstruction() code, the purpose of which is to ensure that ELF::R_PPC_PLTREL24 is applied to the __tls_get_addr symbol rather than ELF::R_PPC_REL24. You don't have a relocation test for this. See test/MC/PowerPC/tls-gd-obj.s for an example of how to do this.

I don't follow. VK_PLT is a decorator for @PLT, and I already have tests for ensuring @PLT becomes R_PPC_PLTREL24 (ppc-reloc.s). Isn't a sufficient test for this simply to ensure that @PLT is generated when that code path is followed? Do you instead expect that the inline assembler will recognize a call to __tls_get_addr() and itself add VK_PLT? The other relocations for PLT are identical to those of powerpc64 in this case. Okay, maybe a test to generate R_PPC_GOT_TLSLD/R_PPC_GOT_TLSGD instead of R_PPC_GOT_TLSLD_LO/R_PPC_GOT_TLSGD_LO might be prudent, and I can readily add, but that's not what you requested.

I don't follow. VK_PLT is a decorator for @PLT, and I already have tests for ensuring @PLT becomes R_PPC_PLTREL24 (ppc-reloc.s). Isn't a sufficient test for this simply to ensure that @PLT is generated when that code path is followed? Do you instead expect that the inline assembler will recognize a call to __tls_get_addr() and itself add VK_PLT? The other relocations for PLT are identical to those of powerpc64 in this case. Okay, maybe a test to generate R_PPC_GOT_TLSLD/R_PPC_GOT_TLSGD instead of R_PPC_GOT_TLSLD_LO/R_PPC_GOT_TLSGD_LO might be prudent, and I can readily add, but that's not what you requested.

Normally, @PLT is how you get R_PPC_PLTREL24, but that is apparently not the case here. You added some special-case code in PPCAsmPrinter.cpp that specifically places the VK_PLT on the __tls_get_addr symbol ONLY, which is never emitted with @plt. (It's in the deleted chunks in this patch.) This appears to be a way of forcing the relocation that you want, and this path is not tested in your test cases.

From what I have been able to piece together, I believe that placing MO_PLT_OR_STUB on the TGA operand early will eventually cause VK_PLT to be used, and hence generate the correct relocation. I can't tell whether this regresses your code or not, though, because there wasn't a test.

The way I know there isn't a test is that I only discovered your code when I was inspecting the patch. I deleted your VK_PLT workaround with impunity and none of the tests failed. When I spotted that I added the MO_PLT_OR_STUB code.

lib/Target/PowerPC/PPCISelLowering.cpp
782–784

Good catch, thanks!

1770

Sure, good idea.

3726

I couldn't think of a way to pass it down directly -- I'd certainly welcome a suggestion, as I agree with your assessment. Hacking it onto the Args list seemed like a worse idea. I can't really do more with the chain argument because it's used by the call-lowering code. Thoughts?

At least the present solution is guaranteed to work because we're just looking at the sequence generated by the call-lowering, which is still in progress. I.e., no optimizations or anything can get in the way.

In D6209#13, @wschmidt wrote:

I don't follow. VK_PLT is a decorator for @PLT, and I already have tests for ensuring @PLT becomes R_PPC_PLTREL24 (ppc-reloc.s). Isn't a sufficient test for this simply to ensure that @PLT is generated when that code path is followed? Do you instead expect that the inline assembler will recognize a call to __tls_get_addr() and itself add VK_PLT? The other relocations for PLT are identical to those of powerpc64 in this case. Okay, maybe a test to generate R_PPC_GOT_TLSLD/R_PPC_GOT_TLSGD instead of R_PPC_GOT_TLSLD_LO/R_PPC_GOT_TLSGD_LO might be prudent, and I can readily add, but that's not what you requested.

Normally, @PLT is how you get R_PPC_PLTREL24, but that is apparently not the case here. You added some special-case code in PPCAsmPrinter.cpp that specifically places the VK_PLT on the __tls_get_addr symbol ONLY, which is never emitted with @plt. (It's in the deleted chunks in this patch.) This appears to be a way of forcing the relocation that you want, and this path is not tested in your test cases.

From what I have been able to piece together, I believe that placing MO_PLT_OR_STUB on the TGA operand early will eventually cause VK_PLT to be used, and hence generate the correct relocation. I can't tell whether this regresses your code or not, though, because there wasn't a test.

The way I know there isn't a test is that I only discovered your code when I was inspecting the patch. I deleted your VK_PLT workaround with impunity and none of the tests failed. When I spotted that I added the MO_PLT_OR_STUB code.

There are two reasons for this: I took the path of least code change, where you were already using VK_None, on the MCSymbolExpr, and the call was already lowered, we were constructing the branch by hand. The VK_PLT normally gets added in GetSymbolRef() in PPCMCInstLower.cpp. An alternative route, in hindsight, could have been to lower it after constructing the call, but again that's hindsight, and with your change becomes irrelevant. Without explicitly adding the VK_PLT, it was not generating the @PLT you see in tls-pic.ll (I checked). So, really, it was only a case of where in the path it was added, and in this case the flag was added late, so the kind was needed.

Since you already did a code inspection, you may have seen that in PrepareCall() PPCII::MO_PLT_OR_STUB gets added automatically, making your addition unnecessary, so I think you can drop that from the patch (if it already passes the tests, then it works).

In D6209#14, @jhibbits wrote:
In D6209#13, @wschmidt wrote:

I don't follow. VK_PLT is a decorator for @PLT, and I already have tests for ensuring @PLT becomes R_PPC_PLTREL24 (ppc-reloc.s). Isn't a sufficient test for this simply to ensure that @PLT is generated when that code path is followed? Do you instead expect that the inline assembler will recognize a call to __tls_get_addr() and itself add VK_PLT? The other relocations for PLT are identical to those of powerpc64 in this case. Okay, maybe a test to generate R_PPC_GOT_TLSLD/R_PPC_GOT_TLSGD instead of R_PPC_GOT_TLSLD_LO/R_PPC_GOT_TLSGD_LO might be prudent, and I can readily add, but that's not what you requested.

Normally, @PLT is how you get R_PPC_PLTREL24, but that is apparently not the case here. You added some special-case code in PPCAsmPrinter.cpp that specifically places the VK_PLT on the __tls_get_addr symbol ONLY, which is never emitted with @plt. (It's in the deleted chunks in this patch.) This appears to be a way of forcing the relocation that you want, and this path is not tested in your test cases.

From what I have been able to piece together, I believe that placing MO_PLT_OR_STUB on the TGA operand early will eventually cause VK_PLT to be used, and hence generate the correct relocation. I can't tell whether this regresses your code or not, though, because there wasn't a test.

The way I know there isn't a test is that I only discovered your code when I was inspecting the patch. I deleted your VK_PLT workaround with impunity and none of the tests failed. When I spotted that I added the MO_PLT_OR_STUB code.

There are two reasons for this: I took the path of least code change, where you were already using VK_None, on the MCSymbolExpr, and the call was already lowered, we were constructing the branch by hand. The VK_PLT normally gets added in GetSymbolRef() in PPCMCInstLower.cpp. An alternative route, in hindsight, could have been to lower it after constructing the call, but again that's hindsight, and with your change becomes irrelevant. Without explicitly adding the VK_PLT, it was not generating the @PLT you see in tls-pic.ll (I checked). So, really, it was only a case of where in the path it was added, and in this case the flag was added late, so the kind was needed.

Since you already did a code inspection, you may have seen that in PrepareCall() PPCII::MO_PLT_OR_STUB gets added automatically, making your addition unnecessary, so I think you can drop that from the patch (if it already passes the tests, then it works).

Mea culpa -- I missed the @PLT examples in the tls-pic.ll test. I misunderstood what was going on with the extra code there, and agree that I can remove the extra code (which makes things much cleaner, yay). Thanks for educating me, and putting up with the nonsense. ;) This stuff is so dispersed throughout the code that it's hard to keep all the threads together.

wschmidt updated this revision to Diff 16051.Nov 11 2014, 11:54 AM

This version addresses the comments I've received, with the exception of the one from Rafael. I looked at this again and still couldn't think of a good way to pass the TGA through LowerCallTo. I'm willing to do this if anyone has a good idea, but for now this is what I've got.

hfinkel accepted this revision.Nov 11 2014, 12:29 PM
hfinkel edited edge metadata.

LGTM.

This revision is now accepted and ready to land.Nov 11 2014, 12:29 PM
wschmidt closed this revision.Nov 11 2014, 12:36 PM

r221703, thanks!

Again, apologies to Justin for misunderstanding his test cases.