This is an archive of the discontinued LLVM Phabricator instance.

[Clang][MC][MachO]Only emits compact-unwind format for "canonical" personality symbols. For the rest, use DWARFs.
ClosedPublic

Authored by oontvoo on Feb 28 2023, 12:23 PM.

Details

Summary

Details: https://github.com/rust-lang/rust/issues/102754

The MachO format uses 2 bits to encode these personality funtions, with 0 reserved for "no-personality".
This means we can only have up to 3 personality. There are already three popular personalities: gxx_personality_v0, gcc_personality_v0, and __objc_personality_v0.
As a result, any system that needs custom-personality will run into a problem (case in point : Rust).

This patch implemented jyknight's proposal to simply force DWARFs for all non-canonical personality functions , ie., __gxx_personality_v0 and __objc_personality_v0 (intentionally left out __gcc_personality_v0 since it's rarely used anyway - there is no point in reserving a spot for it.)

Additionally, added a flag to let users request emitting compact-unwind for non-canonical personality functions if they know it'd be OK for their builds (We have one "free" slot left).

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
oontvoo marked 2 inline comments as done.Apr 26 2023, 1:29 PM

new change:

  • updated tests to set the flag that emits compact-unwind for non-canonical to preserve old testings
  • added new tests that does NOT emit compact-unwind for non-canonical
oontvoo updated this revision to Diff 517681.Apr 27 2023, 12:30 PM

fix build warnings (struct vs class in forward decl)

oontvoo updated this revision to Diff 518710.May 2 2023, 7:06 AM

updated clang test

Herald added a project: Restricted Project. · View Herald TranscriptMay 2 2023, 7:06 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
oontvoo updated this revision to Diff 519216.May 3 2023, 12:47 PM
  • defined equivalent flag in the clang driver

I wonder if we actually need to define a clang frontend flag for this; I suspect nobody will ever want to specify it, since the only non-canonical personality clang will ever generate that changes behavior is the pure-C destructor-only personality, __gnu_personality_v0. Otherwise, it could only arise from assembly or IR.

For a test-case, can just use -mllvm -emit-compact-unwind-non-canonical.

llvm/lib/MC/MCAsmBackend.cpp
119

This could use a comment as to why these two are "canonical" and why "__gcc_personality_v0" is excluded despite being similar.

oontvoo updated this revision to Diff 519849.May 5 2023, 7:26 AM

updated more clang test

oontvoo marked an inline comment as done.May 5 2023, 10:41 AM

I wonder if we actually need to define a clang frontend flag for this; I suspect nobody will ever want to specify it, since the only non-canonical personality clang will ever generate that changes behavior is the pure-C destructor-only personality, __gnu_personality_v0. Otherwise, it could only arise from assembly or IR.

For a test-case, can just use -mllvm -emit-compact-unwind-non-canonical.

Per offline discussion, passing it via -mllvm doesn't work because llvm-flags that populate MCTargetOptions flags aren't registered in Clang, so it needs to be at least a clang cc1 option.

lld/MachO/UnwindInfoSection.cpp
361

yes - good catch

oontvoo updated this revision to Diff 519939.May 5 2023, 12:00 PM

addressed review comments:

  • added explanation on why we left out _gcc personality
  • updated more clang tests

hopefully should be all green now!

Hi all! The CI builds are all clean now. Any further comment/review on this?
Would love to close this out soon. Thanks! :)

oontvoo retitled this revision from [RFC][MC][MachO]Only emits compact-unwind format for "canonical" personality symbols. For the rest, use DWARFs. to [Clang][MC][MachO]Only emits compact-unwind format for "canonical" personality symbols. For the rest, use DWARFs..May 9 2023, 11:35 AM
oontvoo edited the summary of this revision. (Show Details)

@int3 @thakis, et al - friendly 🔔 :)

int3 accepted this revision.May 17 2023, 10:00 AM

lgtm

lld/MachO/UnwindInfoSection.cpp
346–347

looks like the old comment can be retained

This revision is now accepted and ready to land.May 17 2023, 10:00 AM
oontvoo updated this revision to Diff 523147.May 17 2023, 1:12 PM
oontvoo marked an inline comment as done.
oontvoo edited the summary of this revision. (Show Details)

updated comment

oontvoo updated this revision to Diff 523354.May 18 2023, 6:12 AM

rebase again (windows failures in flang tests looked unrelated)

This makes lld crash when linking a libc++ with object files built by a clang that also contains the patch. Here's a reduced (to just libc++), stand-alone repro that repros the crash in 6 seconds on my m1 mbp: https://bugs.chromium.org/p/chromium/issues/detail?id=1446967#c10

Reverted in 4980eead4d0b4666d53dad07afb091375b3a13a0 for now.

(The assert only happens when targeting x86_64, not arm64. The repro script above has the right -arch flag to toggle it.)

oontvoo reopened this revision.May 31 2023, 11:28 AM

Reverted in 4980eead4d0b4666d53dad07afb091375b3a13a0 for now.

(The assert only happens when targeting x86_64, not arm64. The repro script above has the right -arch flag to toggle it.)

Thanks for the revert! This was caused by another bug (should be fixed in https://reviews.llvm.org/D151824)
Will reland this once that is landed.

llvm/lib/MC/MCAsmBackend.cpp
125

need an additional _

This revision is now accepted and ready to land.May 31 2023, 11:28 AM
oontvoo updated this revision to Diff 528965.Jun 6 2023, 11:44 AM

rebase for rolling forward. (Will wait ~1 day before pushing to avoid possible stacking reverts because this depends on D151824)

oontvoo closed this revision.Jun 7 2023, 7:07 AM

I believe this is breaking the LLDB tests:
https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/56106/changes
https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/buildTimeTrend

Build 56105 was passing, and the only relevant change seems to come from this review.

Note that the failure is due to a test that is now passing:

Unexpectedly Passed Tests (1):
  lldb-shell :: Unwind/prefer-debug-over-eh-frame.test

Maybe we forgot to re-enable that test?

I believe this is breaking the LLDB tests:
https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/56106/changes
https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/buildTimeTrend

Build 56105 was passing, and the only relevant change seems to come from this review.

Note that the failure is due to a test that is now passing:

Unexpectedly Passed Tests (1):
  lldb-shell :: Unwind/prefer-debug-over-eh-frame.test

Maybe we forgot to re-enable that test?

The behaviour has changed - I can update the test.
Question: Do you prefer to have it set a flag that goes back to the old behaviour? OR change to to expect-passing?

(My vote would be on have it expect the test to pass because, from reading the test, this is preferring debug-frame over eh-frame, which probably isn't ideal in the new code beacuse all non-canonical personality symbols will now be eh-frame)

I don't think this handles the no-personality case properly. For example this code leads to a DWARF entry now:

void bar(int *) noexcept;
void foo() {
  int arr;
  bar(&arr);
}
Michael137 added a subscriber: Michael137.EditedJun 9 2023, 4:12 AM

Looks like the latest reland of this patch (e60b30d5e3878e7d91f8872ec4c4dca00d4a2dfc) broke some debug-info cross-project-tests on the Arm64 macOS buildbots: https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake-as/263/execution/node/54/log/

Failed Tests (2):
  cross-project-tests :: debuginfo-tests/dexter-tests/optnone-fastmath.cpp
  cross-project-tests :: debuginfo-tests/dexter-tests/optnone-struct-and-methods.cpp

You can run those tests by adding cross-project-tests to LLVM_ENABLE_PROJECTS and running ninja check-debuginfo.

AFAICT, it also broke following LLDB test (https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake-as/263/execution/node/65/log/):

Failed Tests (1):
  lldb-api :: functionalities/step-avoids-no-debug/TestStepNoDebug.py

Let me know if you need help reproducing this

I don't think this handles the no-personality case properly. For example this code leads to a DWARF entry now:

void bar(int *) noexcept;
void foo() {
  int arr;
  bar(&arr);
}

Thanks! Sent the fix in D152540

Looks like the latest reland of this patch (e60b30d5e3878e7d91f8872ec4c4dca00d4a2dfc) broke some debug-info cross-project-tests on the Arm64 macOS buildbots: https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake-as/263/execution/node/54/log/

Failed Tests (2):
  cross-project-tests :: debuginfo-tests/dexter-tests/optnone-fastmath.cpp
  cross-project-tests :: debuginfo-tests/dexter-tests/optnone-struct-and-methods.cpp

You can run those tests by adding cross-project-tests to LLVM_ENABLE_PROJECTS and running ninja check-debuginfo.

AFAICT, it also broke following LLDB test (https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake-as/263/execution/node/65/log/):

Failed Tests (1):
  lldb-api :: functionalities/step-avoids-no-debug/TestStepNoDebug.py

Let me know if you need help reproducing this

Mind taking a look or reverting the patch until the tests pass?

Looks like the latest reland of this patch (e60b30d5e3878e7d91f8872ec4c4dca00d4a2dfc) broke some debug-info cross-project-tests on the Arm64 macOS buildbots: https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake-as/263/execution/node/54/log/

Failed Tests (2):
  cross-project-tests :: debuginfo-tests/dexter-tests/optnone-fastmath.cpp
  cross-project-tests :: debuginfo-tests/dexter-tests/optnone-struct-and-methods.cpp

You can run those tests by adding cross-project-tests to LLVM_ENABLE_PROJECTS and running ninja check-debuginfo.

AFAICT, it also broke following LLDB test (https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake-as/263/execution/node/65/log/):

Failed Tests (1):
  lldb-api :: functionalities/step-avoids-no-debug/TestStepNoDebug.py

Let me know if you need help reproducing this

Mind taking a look or reverting the patch until the tests pass?

Yes - having a look now - wasn't able to repro the test failure before because the build broke at HEAD:

/mnt/ssd/repo/llvm-project/llvm/include/llvm/CodeGen/RDFRegisters.h: In member function ‘constexpr size_t llvm::rdf::RegisterRef::hash() const’:
/mnt/ssd/repo/llvm-project/llvm/include/llvm/CodeGen/RDFRegisters.h:92:35: error: call to non-‘constexpr’ function ‘std::size_t std::hash<unsigned int>::operator()(unsigned int) const’
   92 |     return std::hash<RegisterId>{}(Reg) ^
      |            ~~~~~~~~~~~~~~~~~~~~~~~^~~~~

P.S This might have the same root-cause with the previous comment and could be fixed by D152540 as well.
(not able to repro the failures yet - don't have arm64 readily available, will test it later this evening. but on linux x86-64 and macos x86-64, ninja check-debuginfo passed for me)

Michael137 added a comment.EditedJun 12 2023, 9:36 AM

P.S This might have the same root-cause with the previous comment and could be fixed by D152540 as well.
(not able to repro the failures yet - don't have arm64 readily available, will test it later this evening. but on linux x86-64 and macos x86-64, ninja check-debuginfo passed for me)

FYI, still failing with D152540. I'm able to repro this on my arm64 machine.

The repro without running dexter is as follows:

./bin/clang++ cross-project-tests/debuginfo-tests/dexter-tests/optnone-struct-and-methods.cpp -g -O0 -o test.out
./bin/lldb test.out -o "br se -l 90" -o run -o s -o s -o s -o s -o "expr _data.b.other_b"
(lldb) expr _data.b.other_b 
(A::B) $0 = 0x10d50

We would however expect the value to be B_VALUE

The backtrace also looks wonky:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = step in
  * frame #0: 0x0000000100003d18 test.out`(anonymous namespace)::A::getData(this=0x0000000100011910) at optnone-struct-and-methods.cpp:87:5
    frame #1: 0x0000000187185058 dyld`start + 2224

Michael and I looked into this. This simple c++ file is resulting in eh_frame unwind info on aarch64 darwin instead of compact unwind info. The eh_frame instructions don't describe the epilogue, so lldb is stopping on the final RET instruction after it has adjusted $sp back to its original value and the unwind now fails to work in the debugger. That's always the problem with eh_frame/debug_frame from the debugger's point of view - it's only guaranteed to describe the unwind state at throwable locations. gdb lives exclusively off of eh_frame/debug_frame so in practice gcc/clang (at least on intel) describe both the prologue and the epilogue for the debugger's benefit. But that's not what this eh_frame includes on aarch64 darwin, and it breaks the debugger because it's trusting the eh_frame to be accurate at every instruction point.

The emission of eh_frame on aarch64 darwin instead of compact unwind, for this simple codegen, is a bug and cannot remain in the tree unfixed. It will increase binary size and reduce throw performance. It also happens to cause lldb regressions like this because we've never lived off of eh_frame as our primary unwind format on this platform, but that's a separate issue and I'm not going to dig in to finding a way to detect this & ignore the eh_frame on this target.

Looking at the failing test case that Michael has been debugging, none of these methods have a personality, and yet we're getting eh_frame instead of compact unwind. That's the bug.

Michael and I looked into this. This simple c++ file is resulting in eh_frame unwind info on aarch64 darwin instead of compact unwind info. The eh_frame instructions don't describe the epilogue, so lldb is stopping on the final RET instruction after it has adjusted $sp back to its original value and the unwind now fails to work in the debugger. That's always the problem with eh_frame/debug_frame from the debugger's point of view - it's only guaranteed to describe the unwind state at throwable locations. gdb lives exclusively off of eh_frame/debug_frame so in practice gcc/clang (at least on intel) describe both the prologue and the epilogue for the debugger's benefit. But that's not what this eh_frame includes on aarch64 darwin, and it breaks the debugger because it's trusting the eh_frame to be accurate at every instruction point.

The emission of eh_frame on aarch64 darwin instead of compact unwind, for this simple codegen, is a bug and cannot remain in the tree unfixed. It will increase binary size and reduce throw performance. It also happens to cause lldb regressions like this because we've never lived off of eh_frame as our primary unwind format on this platform, but that's a separate issue and I'm not going to dig in to finding a way to detect this & ignore the eh_frame on this target.

Thanks for the details! From your descriptions, I agree that this is a bug - the change in this patch is *NOT* intended to change the unwind format when there's no personality symbol - (it should only affect custom personality symbols, which is not the case in this example)
Still debugging - not sure why it only occurs for aarch64

Looking at the failing test case that Michael has been debugging, none of these methods have a personality, and yet we're getting eh_frame instead of compact unwind. That's the bug.

Michael and I looked into this. This simple c++ file is resulting in eh_frame unwind info on aarch64 darwin instead of compact unwind info.

P.S: still no luck reproducing this with TOT clang. Would you mind verifying my test case below (this was on M1 mac):

### clang is built with commit up to 5b1c62c0f2e9a739707429f650cb897c067f86c2
vyng-macbookpro3 /Users/vyng/repo/llvm-project/build_all$ ./bin/clang++  ../cross-project-tests/debuginfo-tests/dexter-tests/optnone-struct-and-methods.cpp -g -O0 -c -o test_tot.o

vyng-macbookpro3 /Users/vyng/repo/llvm-project/build_all$ objdump --macho --unwind-info --dwarf=frames  test_tot.o
Contents of __compact_unwind section:
  Entry at offset 0x0:
    start:                0x0 ltmp0
    length:               0x70
    compact encoding:     0x44000000
    personality function: 0x0 ___gxx_personality_v0
    LSDA:                 0x238 ltmp1
  Entry at offset 0x20:
    start:                0x70 __ZN12_GLOBAL__N_11AC1Ev
    length:               0x2c
    compact encoding:     0x04000000
  Entry at offset 0x40:
    start:                0x9c __ZN12_GLOBAL__N_11A7getDataEv
    length:               0x78
    compact encoding:     0x04000000
  Entry at offset 0x60:
    start:                0x114 __ZN12_GLOBAL__N_11AD1Ev
    length:               0x2c
    compact encoding:     0x04000000
  Entry at offset 0x80:
    start:                0x140 __ZN12_GLOBAL__N_11AC2Ev
    length:               0x2c
    compact encoding:     0x02001000
  Entry at offset 0xa0:
    start:                0x16c __ZN12_GLOBAL__N_11A12setOtherDataEv
    length:               0x74
    compact encoding:     0x04000000
  Entry at offset 0xc0:
    start:                0x1e0 __ZN12_GLOBAL__N_11A12getOtherDataEv
    length:               0x18
    compact encoding:     0x02001000
  Entry at offset 0xe0:
    start:                0x1f8 __ZN12_GLOBAL__N_11AD2Ev
    length:               0x40
    compact encoding:     0x04000000

.debug_frame contents:


.eh_frame contents:
<<<< nothing in eh_frame section, which is expected and matches what produced by base clang

Michael and I looked into this. This simple c++ file is resulting in eh_frame unwind info on aarch64 darwin instead of compact unwind info.

P.S: still no luck reproducing this with TOT clang. Would you mind verifying my test case below (this was on M1 mac):

### clang is built with commit up to 5b1c62c0f2e9a739707429f650cb897c067f86c2
vyng-macbookpro3 /Users/vyng/repo/llvm-project/build_all$ ./bin/clang++  ../cross-project-tests/debuginfo-tests/dexter-tests/optnone-struct-and-methods.cpp -g -O0 -c -o test_tot.o

vyng-macbookpro3 /Users/vyng/repo/llvm-project/build_all$ objdump --macho --unwind-info --dwarf=frames  test_tot.o
Contents of __compact_unwind section:
  Entry at offset 0x0:
    start:                0x0 ltmp0
    length:               0x70
    compact encoding:     0x44000000
    personality function: 0x0 ___gxx_personality_v0
    LSDA:                 0x238 ltmp1
  Entry at offset 0x20:
    start:                0x70 __ZN12_GLOBAL__N_11AC1Ev
    length:               0x2c
    compact encoding:     0x04000000
  Entry at offset 0x40:
    start:                0x9c __ZN12_GLOBAL__N_11A7getDataEv
    length:               0x78
    compact encoding:     0x04000000
  Entry at offset 0x60:
    start:                0x114 __ZN12_GLOBAL__N_11AD1Ev
    length:               0x2c
    compact encoding:     0x04000000
  Entry at offset 0x80:
    start:                0x140 __ZN12_GLOBAL__N_11AC2Ev
    length:               0x2c
    compact encoding:     0x02001000
  Entry at offset 0xa0:
    start:                0x16c __ZN12_GLOBAL__N_11A12setOtherDataEv
    length:               0x74
    compact encoding:     0x04000000
  Entry at offset 0xc0:
    start:                0x1e0 __ZN12_GLOBAL__N_11A12getOtherDataEv
    length:               0x18
    compact encoding:     0x02001000
  Entry at offset 0xe0:
    start:                0x1f8 __ZN12_GLOBAL__N_11AD2Ev
    length:               0x40
    compact encoding:     0x04000000

.debug_frame contents:


.eh_frame contents:
<<<< nothing in eh_frame section, which is expected and matches what produced by base clang

Sorry for the back-and-forth
Looks like D152540 did indeed resolve this. The bots apparently didn't pick that change up until today. And locally I must've been producing the binary without that commit

It's all good now, thanks!