This is an archive of the discontinued LLVM Phabricator instance.

[libunwind][PowerPC] Fix saving/restoring VSX registers on LE systems
ClosedPublic

Authored by nemanjai on Nov 7 2022, 4:16 PM.

Details

Reviewers
xingxue
MaskRay
compnerd
mstorsjo
Group Reviewers
Restricted Project
Restricted Project
Restricted Project
Commits
rG372820bf571c: [libunwind][PowerPC] Fix saving/restoring VSX registers on LE systems
Summary

Currently, libunwind just uses stxvd2x/lxvd2x to save/restore VSX registers respectively. This puts the registers in doubleword-reversed order into memory on little endian systems. If both the save and restore are done the same way, this isn't a problem. However if the unwinder is just restoring a callee-saved register, it will restore it in the wrong order (since function prologues save them in the correct order).

This patch fixes the order by:

  • Adding a check for ISA 3.0 if the GLIBC used at build time supports such a check
  • On machines that support ISA 3.0, it uses non-swapping DQ-Form instructions (stxv/lxv)
  • On machines that don't support ISA 3.0 (or on builds with GLIBC that is too old) swaps are added after the loads and before the stores

Diff Detail

Event Timeline

nemanjai created this revision.Nov 7 2022, 4:16 PM
Herald added 1 blocking reviewer(s): Restricted Project. · View Herald TranscriptNov 7 2022, 4:16 PM
Herald added a project: Restricted Project. · View Herald Transcript
nemanjai requested review of this revision.Nov 7 2022, 4:16 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 7 2022, 4:16 PM
xingxue added a comment.EditedNov 21 2022, 9:13 AM

The compiler generated prologue/epilogue uses instructions stvx/lvx to save/restore vector registers onto/from the stack. On the other hand, libunwind currently uses stxvd2x/lxvd2x to save/restore VSX vector to/from unwinder's context. Since stxvd2x/lxvd2x stores/loads VSX vector as 2 double words into/from memory, these instructions essentially change the byte orders of the contents if the machine is in little-endian mode. As a result, VSX vectors saved by prologues are not correctly represented in the unwinder context. PowerPC ISA 3.0 (Power9) has added instructions stxv/lxv that stores/loads VSX vectors and matches the behavior of stvx/lvx. Unfortunately, since supporting of older PowerPCs such as Power8 and earlier are still needed, we cannot replace stxvd2x/lxvd2x with stxv/lxv yet. Checking if the build system has the GLIBC facility for detecting the CPU architecture, and detecting at run time and going a different path add a lot of complicity to the code base. IMHO, we can simply add instruction xxswapd to before stxvd2x and after lxvd2x for little-endian like in your patch as a workaround for now. xxswapd is not an expensive instruction so it is unlikely to affect the performance in a significant way, noting that saving and restoring context are only executed when raising an exception and jumping to the landing pad. It will be good though to have a comment stating that we should replace them with stxv/lxv once only Power9 and higher are supported for little-endian PowerPC (RHEL 9.0 already supports minimum Power9). Thanks very much for working on this issue!

We probably can adapt test case libcxxabi/test/vendor/ibm/vec_reg_restore.pass.cpp to cover little-endian Linux on Power as well.

I created D139368 to simplify the repetitions with .irp.

libunwind/src/assembly.h
194 ↗(On Diff #473822)

This doesn't work as intended.

.quad __parse_hwcap_and_convert_at_platform without a definition creates an undefined symbol.
There it is un-referenced there won't be a undefined symbol/reference error.

nemanjai added inline comments.Jan 6 2023, 7:51 AM
libunwind/src/assembly.h
194 ↗(On Diff #473822)

I don't follow what you mean here. Under which conditions will this fail to produce an undefined symbol/reference error?

Perhaps it is a moot point since I will be removing this code as per Xing's comment, but I am just curious (especially since GCC also does this for emitting code for __builtin_cpu_is/__builtin_cpu_supports and Clang will follow GCC's implementation).

nemanjai updated this revision to Diff 486902.Jan 6 2023, 9:26 AM

Reduce complexity of the code at the expense of performance on Power9 and newer subtargets. The updated code uses {st|l}xvd2x for saving/restoring with the necessary swaps regardless of the subtarget.
Added a test case to libcxxabi.

Herald added a project: Restricted Project. · View Herald TranscriptJan 6 2023, 9:26 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript
xingxue accepted this revision.Jan 9 2023, 12:12 PM

LGTM; thanks!

Could I get someone from libunwind and libc++abi groups to review this?

As of tomorrow, this patch has been sitting in approved state for 1 month. I plan to commit it at this point without approval from anyone in the libc++-abi and libunwind groups.
This bug is blocking a real-world user of the code and I need to commit it and back-port it to the LLVM 16 branch.

mstorsjo accepted this revision.Feb 9 2023, 3:24 AM
mstorsjo added a subscriber: mstorsjo.

The change seems reasonable to me - thus LGTM.

The patch description talks about checks for ISA 3.0, but I don't see any references to that in the patch itself. If this isn't relevant, remember to update the message before pushing.

MaskRay accepted this revision.Feb 9 2023, 2:08 PM
MaskRay added inline comments.
libcxxabi/test/vendor/ibm/vec_reg_restore-le.pass.cpp
41

add blank line

61

blank line after a macro definition

This revision was not accepted when it landed; it landed in state Needs Review.Feb 16 2023, 10:38 AM
This revision was automatically updated to reflect the committed changes.