My original support for the general dynamic and local dynamic TLS models contained some fairly obtuse hacks to generate calls to __tls_get_addr when lowering a TargetGlobalTLSAddress. Rather than generating real calls, special GET_TLS_ADDR nodes were used to wrap the calls and only reveal them at assembly time. I attempted to provide correct parameter and return values by chaining CopyToReg and CopyFromReg nodes onto the GET_TLS_ADDR nodes, but this was also not fully correct. Problems were seen with two back-to-back stores to TLS variables, where the call sequences ended up overlapping, with unhappy results. Additionally, since there weren't real calls, the register clobbers of a call were not recorded, so values in call-clobbered registers were kept live across the calls.
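For readers who want a concrete picture of the failing shape, here is a minimal C++ sketch of two back-to-back stores to TLS variables. It is not the test added by this patch; the names tls_a, tls_b and store_both are hypothetical. The assumption is that it is compiled for a PowerPC ELF target as position-independent code for a shared library (for example with -fPIC), where the compiler would typically pick the general dynamic model, so each store needs its own __tls_get_addr call sequence.

```cpp
#include <cassert>

// Hypothetical thread-local variables. Built as PIC for a shared library,
// each access would typically go through a __tls_get_addr call
// (general dynamic model).
thread_local int tls_a = 0;
thread_local int tls_b = 0;

// Two back-to-back TLS stores: under the old GET_TLS_ADDR scheme the two
// call sequences could overlap; with real calls they are sequenced normally.
void store_both(int x, int y) {
  tls_a = x;
  tls_b = y;
}
```

Calling store_both(1, 2) on a thread should leave tls_a == 1 and tls_b == 2 for that thread only.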
The proper thing to do is to lower these into calls in the first place. This is relatively straightforward; see the changes to PPCTargetLowering::LowerGlobalTLSAddress(). These changes are standard call lowering, except that we need to track the fact that these calls will require a relocation. This is done by adding the machine operand flag MO_TLSGD (general dynamic) or MO_TLSLD (local dynamic) to the TargetGlobalAddress operand that appears earlier in the sequence.
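The flag selection can be modelled in a few lines. This is a toy sketch, not LLVM code: TLSModel, TLSFlag, GlobalOperand and tagForTLSCall are hypothetical stand-ins, and only the mapping itself (general dynamic to MO_TLSGD, local dynamic to MO_TLSLD) comes from the description above.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical, simplified stand-ins; these are NOT LLVM's actual types.
enum class TLSModel { GeneralDynamic, LocalDynamic, InitialExec, LocalExec };
enum class TLSFlag { None, MO_TLSGD, MO_TLSLD };

// A toy global-address operand: the symbol name plus its relocation flag.
struct GlobalOperand {
  const char *Name;
  TLSFlag Flag;
};

// Tag the operand so the later __tls_get_addr call knows which relocation it
// needs. Only the two dynamic models lower to a call at all.
GlobalOperand tagForTLSCall(const char *Name, TLSModel Model) {
  switch (Model) {
  case TLSModel::GeneralDynamic:
    return {Name, TLSFlag::MO_TLSGD};
  case TLSModel::LocalDynamic:
    return {Name, TLSFlag::MO_TLSLD};
  default:
    throw std::invalid_argument("model does not use __tls_get_addr");
  }
}
```

Keeping the flag on the operand rather than on a special node is the point: the call itself stays an ordinary call, and the relocation information travels with the symbol.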
The calls to LowerCallTo() eventually find their way to LowerCall_64SVR4() or LowerCall_32SVR4(), which call FinishCall(), which calls PrepareCall(). In PrepareCall(), we detect the calls to __tls_get_addr and immediately snag the TargetGlobalTLSAddress with the annotated relocation information. This becomes an extra operand on the call, immediately after the callee, as nodes of type tlscall expect. We change the call opcode to CALL_TLS for this case. Back in FinishCall(), we change it again to CALL_NOP_TLS for 64-bit only, since we require a TOC-restore nop following the call for the 64-bit ABIs.
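The two-step opcode rewrite can be sketched as follows. This is a simplified model, not the real PrepareCall()/FinishCall(): CallOpc, CallNode, prepareCall and finishCall are hypothetical, and ordinary (non-TLS) calls are deliberately left untouched here because the real rule for adding a TOC-restore nop to them is more involved.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical opcodes modelling the call nodes described above.
enum class CallOpc { CALL, CALL_TLS, CALL_NOP_TLS };

// A toy call node: opcode plus operands (callee first).
struct CallNode {
  CallOpc Opc;
  std::vector<std::string> Operands;
};

// Step 1 (PrepareCall in the text): a call to __tls_get_addr becomes CALL_TLS
// and carries the annotated TLS symbol as an extra operand after the callee.
CallNode prepareCall(const std::string &Callee, const std::string &TLSSym) {
  CallNode N{CallOpc::CALL, {Callee}};
  if (Callee == "__tls_get_addr") {
    N.Opc = CallOpc::CALL_TLS;
    N.Operands.push_back(TLSSym);
  }
  return N;
}

// Step 2 (FinishCall in the text): 64-bit ABIs need a TOC-restore nop after
// the call, so CALL_TLS becomes CALL_NOP_TLS there; 32-bit keeps CALL_TLS.
CallOpc finishCall(CallOpc Opc, bool Is64Bit) {
  if (Is64Bit && Opc == CallOpc::CALL_TLS)
    return CallOpc::CALL_NOP_TLS;
  return Opc;
}
```

For example, prepareCall("__tls_get_addr", "x") yields CALL_TLS with operands {"__tls_get_addr", "x"}, and finishCall turns that into CALL_NOP_TLS only when Is64Bit is true.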
During selection, patterns in PPCInstrInfo.td and PPCInstr64Bit.td convert the CALL_TLS nodes into BL_TLS nodes, and convert the CALL_NOP_TLS nodes into BL8_NOP_TLS nodes. This replaces the code removed from PPCAsmPrinter.cpp, as the BL_TLS and BL8_NOP_TLS nodes can now be emitted normally using their patterns and the associated printTLSCall print method.
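A rough model of what selection and printing now do is below. The functions selectTLSCall and printTLSCall here are hypothetical toys, not the TableGen patterns or the real printTLSCall method. The output string follows the usual PowerPC ELF assembler form `bl __tls_get_addr(sym@tlsgd)`, which is assumed here for illustration, and the 32-bit PIC `@PLT` variant is not modelled.

```cpp
#include <cassert>
#include <string>

// Hypothetical opcodes and flags for this toy model.
enum class CallOpc { CALL_TLS, CALL_NOP_TLS };
enum class TLSFlag { MO_TLSGD, MO_TLSLD };

// Toy counterpart of the selection patterns: pick the branch instruction.
std::string selectTLSCall(CallOpc Opc) {
  return Opc == CallOpc::CALL_NOP_TLS ? "BL8_NOP_TLS" : "BL_TLS";
}

// Toy counterpart of printing: the relocation rides on the call operand, and
// the 64-bit form is followed by a TOC-restore nop.
std::string printTLSCall(CallOpc Opc, const std::string &Sym, TLSFlag Flag) {
  std::string Reloc = (Flag == TLSFlag::MO_TLSGD) ? "@tlsgd" : "@tlsld";
  std::string Asm = "bl __tls_get_addr(" + Sym + Reloc + ")";
  if (Opc == CallOpc::CALL_NOP_TLS)
    Asm += "\nnop";
  return Asm;
}
```

Because the relocation is already attached to the operand, nothing in this path needs to know about GET_TLS_ADDR, which is why the AsmPrinter special case can go away.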
Note that this removed code included some special handling for 32-bit SVR4 PIC (vs. pic) code, to ensure the proper relocation is placed on the __tls_get_addr symbol. I believe the same effect is produced by the code I added in PPCTargetLowering::LowerGlobalTLSAddress() that places the PPCII::MO_PLT_OR_STUB flag on the symbol. I don't know whether this is working, because Justin apparently didn't add any test cases for this when he put that code in place. /fingerwag
>>> Justin, can you please add a test case for this so I can verify that I haven't broken your code? <<<
Finally, as a result of these changes, get-tls-addr in all its various guises is no longer used, so all references to it have been removed.
There are existing TLS tests to verify that the changes haven't messed anything up (other than the 32-bit PIC stuff). I've added one new test that verifies that the overlapping call sequences from back-to-back TLS stores no longer occur.
Please add CALL_TLS, CALL_NOP_TLS here.