This is an archive of the discontinued LLVM Phabricator instance.

[AIX][TLS] Generate optimized local-exec access code sequence using X-Form loads/stores
ClosedPublic

Authored by amyk on May 11 2023, 7:26 AM.

Details

Summary

This patch is a follow up to D149722, D152669 and D153645, where a slightly more optimized code
sequence is generated for 64-bit and 32-bit local-exec accesses when optimizations are turned on.

Handling is added PPCISelDAGToDAG.cpp in order to check if any D-form loads or stores that follow
an PPCISD::ADD_TLS can be optimized to use an X-Form load or store. In this particular situation,
this allows the ADD_TLS node to be removed completely.

Diff Detail

Event Timeline

amyk created this revision.May 11 2023, 7:26 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 11 2023, 7:26 AM
amyk requested review of this revision.May 11 2023, 7:26 AM
amyk updated this revision to Diff 521739.May 13 2023, 10:28 PM

Rebase patch.

amyk updated this revision to Diff 525155.May 24 2023, 6:38 AM

Update this patch with a simplified way of generating the X-Form instructions, leveraging some existing infrastructure in place.

I am planning on updating this patch once again to fix a potential issue.

amyk updated this revision to Diff 526052.May 26 2023, 6:49 AM
  • Update patch to produce lhax and lwax when the load is sign-extending.
  • Rebase patch.
amyk updated this revision to Diff 528836.Jun 6 2023, 6:47 AM

Rebase patch and update comments in the patch.

amyk updated this revision to Diff 530376.Jun 11 2023, 9:03 PM
amyk retitled this revision from [AIX][TLS] Generate optimized 64-bit local-exec access code sequence using X-Form loads/stores to [AIX][TLS] Generate optimized local-exec access code sequence using X-Form loads/stores.
amyk edited the summary of this revision. (Show Details)

Update patch to account for 32-bit implementation in D152669.

amyk updated this revision to Diff 530978.Jun 13 2023, 10:28 AM
  • Rebase patch.
  • Update patch comments.
amyk updated this revision to Diff 532095.Jun 16 2023, 4:51 AM
  • Rebase patch to account for updating D152669.
  • Use the correct PPCISD node (GET_TPOINTER) in canOptimizeTLSDFormToXFormOnAIX().
amyk updated this revision to Diff 532367.Jun 16 2023, 11:03 PM

Rebase patch and simplify relocation test checks.

amyk updated this revision to Diff 532902.Jun 20 2023, 6:52 AM

Patch rebased.

amyk added a comment.Jun 21 2023, 11:13 AM
  • Specify why we are doing this optimization in the description.
  • Check on why these TLS instructions were not previously added.
llvm/lib/Target/PowerPC/P10InstrResources.td
1299 ↗(On Diff #532902)

Check these definitions in the P10InstrResources.td.

llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
715
  • Fix this comment.
  • Can this function be general? Include both of the Linux and AIX parts.
744

Instead of explicitly using PPC::X13, is there something like Subtarget.getThreadPointerRegister() that we can use?

amyk edited the summary of this revision. (Show Details)Jun 21 2023, 9:54 PM
amyk updated this revision to Diff 533580.Jun 22 2023, 6:11 AM

Address review comments:

  • Simplify canOptimizeTLSDFormToXForm()
  • Add getThreadPointerRegister() into PPCSubtarget.h
  • Confirmed that adding the new *TLS load/store instructions are appropriate
amyk marked an inline comment as not done.Jun 22 2023, 6:11 AM
amyk added inline comments.
llvm/lib/Target/PowerPC/PPCSubtarget.h
279

Is it appropriate to add this in this patch? Or should this be separate?

amyk added a comment.Jun 22 2023, 7:53 AM

Like in D47382, it is possible that I need to add *TLS_ versions of the instructions that I added. I plan to update this patch to account for this.

amyk added a comment.Jun 22 2023, 11:40 AM

Group code review comments.

llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
759–762

These two checks for the offset can be combined.

781

Update this comment for GET_TPOINTER, and just mention it's produced for AIX on 32-bit mode.

792
  • Implement a static helper function, to check if the node of a thread pointer acquisition node (if one of the operands is the thread pointer - either R13, or __get_tpointer).
  • If conditions are satisfied, return true.
  • Otherwise, return false at the end of the static helper function.
854

A ternary condition for isSExt for here and below might be better.

llvm/lib/Target/PowerPC/PPCSubtarget.h
279

Discussed this with the team. It is appropriate to add this here.

llvm/test/CodeGen/PowerPC/aix-tls-le-ldst-double.ll
14

Instead of adding -O0 lines to every test, we can add one file to show the -O0 codegen.
That way, we can remove the -O0 run lines from these existing test cases.

amyk updated this revision to Diff 533997.Jun 23 2023, 9:53 AM

Update patch to:

  • Separate out the new instruction definitions (which have been added in https://reviews.llvm.org/D153645)
  • Update comments
  • Add a new static helper function to check if one of the operands of an ADD_TLS node uses the thread pointer
amyk edited the summary of this revision. (Show Details)Jun 23 2023, 10:00 AM
amyk added inline comments.Jun 23 2023, 10:08 AM
llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
759–762

Done.

781

Done.

792

isThreadPointerAcquisitionNode() is added to account for this.

854

Done.

llvm/test/CodeGen/PowerPC/aix-tls-le-ldst-double.ll
14

Done.

lei added a subscriber: lei.Jun 23 2023, 12:07 PM
lei added inline comments.
llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
740–741
784–787
amyk updated this revision to Diff 534061.Jun 23 2023, 1:33 PM
amyk marked 2 inline comments as done.
  • Rebase patch.
  • Address comments from Lei: combine two if conditions, and return with the isThreadPointerAcquisitionNode() function inside canOptimizeTLSDFormToXForm().
lei added inline comments.Jun 26 2023, 6:25 AM
llvm/test/CodeGen/PowerPC/aix-tls-le-ldst-longlong.ll
63–64

This transformation increased reg pressure with no obvious benefits. Please see if we can bail out for such cases.

amyk updated this revision to Diff 534954.Jun 27 2023, 6:36 AM

Update the patch to only do this optimization if all users of the ADD_TLS node are normal loads with an offset of 0.
This resolves an issue that Lei pointed out with respect to increasing the usage of registers when we have a 64-bit load/store on a 32-bit system - in such a situation, we no longer perform this optimization.

amyk marked an inline comment as done.Jun 27 2023, 6:37 AM
lei accepted this revision as: lei.Jun 29 2023, 6:16 PM

Just a few nits, otherwise LGTM.
Thanks for addressing the issue with increasing the usage of registers when we have a 64-bit load/store on a 32-bit system.

llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
788

Since Base is only used on line 788, consider just inlining it there vs defining a new variable.

837

Since Base only used here, can inline and remove declaration.

This revision is now accepted and ready to land.Jun 29 2023, 6:16 PM
amyk marked 2 inline comments as done.Jun 30 2023, 8:59 AM
amyk added inline comments.
llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
788

Actually, Base is also used here below:

SDValue Ops[] = {ST->getValue(), Base.getOperand(0), Base.getOperand(1),
                   Chain};
837

Actually, Base is also used here below, right?

SDValue Ops[] = {Base.getOperand(0), Base.getOperand(1), Chain};
This revision was landed with ongoing or failed builds.Jul 6 2023, 5:57 AM
This revision was automatically updated to reflect the committed changes.
amyk marked 2 inline comments as done.