This is an archive of the discontinued LLVM Phabricator instance.

[lld-macho] Initial support for EH Frames
ClosedPublic

Authored by int3 on Apr 8 2022, 7:27 PM.

Details

Summary

Background

llvm-mc generates unwind info in both compact unwind and DWARF
formats. LLD already handles the compact unwind format; this diff gets
us close to handling the DWARF format properly.

Caveats

It's not quite done yet, but I figure it's worth getting this reviewed
and landed first as it's shaping up to be a fairly large code change.

Known limitations of the current code:

Since the feature is not ready for real use yet, I've gated it behind a
flag that only gets toggled on during test suite runs. With most of the
new code disabled, we see just a hint of perf regression, so I don't
think it'd be remiss to land this as-is:

           base           diff           difference (95% CI)
sys_time   1.926 ± 0.168  1.979 ± 0.117  [  -1.2% ..   +6.6%]
user_time  3.590 ± 0.033  3.606 ± 0.028  [  +0.0% ..   +0.9%]
wall_time  7.104 ± 0.184  7.179 ± 0.151  [  -0.2% ..   +2.3%]
samples    30             31

Design

Like compact unwind entries, EH frames are also represented as regular
ConcatInputSections that get pointed to via Defined::unwindEntry. This
allows them to be handled generically by e.g. the MarkLive and ICF
code. (But note that unlike compact unwind subsections, EH frame
subsections do end up in the final binary.)

In order to make EH frames "look like" a regular ConcatInputSection,
some processing is required. First, we need to split the __eh_frame
section along EH frame boundaries rather than along symbol boundaries.
We do this by decoding the length field of each EH frame. Second, the
abs-ified relocations need to be turned into regular Relocs.

Next Steps

In order to support EH frames on ARM targets, we will either have to
teach LLD how to handle EH frames with explicit relocs, or we can try to
make llvm-mc emit abs-ified relocs for ARM as well. I'm hoping to do
the latter as I think it will make the LLD implementation both simpler
and faster to execute.

Misc

The obj-file-with-stabs.s test had to be updated as the previous
version would trip assertion errors in the code. It appears that in our
attempt to produce a minimal YAML test input, we created a file with
invalid EH frame data. I've fixed this by re-generating the YAML and not
doing any hand-pruning of it.

Diff Detail

Event Timeline

int3 created this revision.Apr 8 2022, 7:27 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptApr 8 2022, 7:27 PM
Herald added a reviewer: Restricted Project. · View Herald Transcript
int3 requested review of this revision.Apr 8 2022, 7:27 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 8 2022, 7:27 PM

Is there another EH frame parser in-tree? LLVM or LLDB?

int3 added a comment.Apr 9 2022, 10:03 AM

There is DWARFDataExtractor, but I don't need to handle the full generality of the DWARF format with the perf overhead that entails. LLD-ELF has its own custom DWARF parser too.

int3 updated this revision to Diff 422696.Apr 13 2022, 5:03 PM

fix alignment

int3 updated this revision to Diff 423728.Apr 19 2022, 1:49 PM

more comments

int3 updated this revision to Diff 424002.Apr 20 2022, 12:56 PM
int3 edited the summary of this revision. (Show Details)

remove -dead_strip dependency

Roger added a subscriber: Roger.Apr 21 2022, 2:57 PM

Nice! I noticed that the tests in eh-frame.s only contain failing cases. Are there already success cases tested elsewhere?

lld/MachO/EhFrame.cpp
23

Why is there a copy of *off for errors? Could we not use *off in the case of errors?

63

In the other functions, you use a > comparison. Why do you use an equality check here?

78–79

Is there no need to check that *off + 1 < data.size() (+ 1 to have room for the null symbol)? I think we could potentially segfault if *off is too large?

80

I could go either way, but what do you think about a comment like this just to make things obvious to the reader?

lld/MachO/InputFiles.cpp
1306–1308

I'm a bit surprised why we want to use it++ instead of just it given that we're in a for loop that already increments it for us. Could we have a comment explaining this behavior?

int3 marked 5 inline comments as done.Apr 21 2022, 4:44 PM

I noticed that the tests in eh-frame.s only contain failing cases. Are there already success cases tested elsewhere?

There's eh-frame.s and invalid/eh-frame.s. I'm guessing you only looked at the latter :)

lld/MachO/EhFrame.cpp
23

I thought it'd be a bit clearer to have the error message point at the start of a mis-encoded length field, rather than in the middle of the value.

Also for line 35 below, when saying that a CIE/FDE extends past the end, I think it would make sense to point to the start of the CIE/FDE, rather than in the middle

80

I went with something a bit more terse. Note that man strnlen already says

The strnlen() function returns
either the same result as strlen() or maxlen, whichever is smaller.

but -- I realized belatedly that this entire function is wrong. I was not checking the right maxlen... fixing + adding test

int3 updated this revision to Diff 424328.Apr 21 2022, 4:45 PM
int3 marked 2 inline comments as done.

address comments

Roger accepted this revision.Apr 22 2022, 11:18 AM
This revision is now accepted and ready to land.Apr 22 2022, 11:18 AM
int3 added a subscriber: oontvoo.Apr 22 2022, 1:01 PM
int3 added inline comments.
lld/MachO/InputFiles.cpp
1191

it appears that this cast is asserting when running on obj-file-with-stabs.yaml. Unfortunately it's a bit hard to debug when we only have the yaml without the source that generated it originally. @oontvoo do you happen to have a link to the original source of that file?

oontvoo added inline comments.Apr 26 2022, 7:46 AM
lld/MachO/InputFiles.cpp
1191

Yes - this was produced with the following:

echo "int main() {return 1;}" > test.cc
clang -c -g -o test.o test.c
ld -r -o test2.o test.o;

The trick is ld -r which kept the debug stuff (stabs) in the output .o file.

int3 updated this revision to Diff 425321.Apr 26 2022, 3:23 PM
int3 edited the summary of this revision. (Show Details)

fix obj-file-with-stabs.s test

This revision was landed with ongoing or failed builds.Jun 8 2022, 8:41 PM
This revision was automatically updated to reflect the committed changes.
thakis added a comment.Jun 9 2022, 4:23 AM

This breaks tests everywhere where assertions are enabled: http://45.33.8.238/linux/78091/step_11.txt

dyung added a subscriber: dyung.Jun 9 2022, 5:23 AM

This breaks tests everywhere: http://45.33.8.238/linux/78091/step_11.txt

I believe the follow-up commit 977d62c33e3343a394777c1754682761eebb66cd is the one ultimately causing the current problems, but this commit does seem to cause at least one problem on its own:

int3 added a comment.Jun 9 2022, 5:44 AM

Ugh, sorry about that

kstoimenov added a subscriber: kstoimenov.EditedJun 13 2022, 1:16 PM

Looks like this might be causing some msan bots to fail:

https://lab.llvm.org/buildbot/#/builders/74/builds/11446/steps/11/logs/stdio

******************** TEST 'lld :: MachO/icf-safe.ll' FAILED ********************
.........
Command Output (stderr):
--
ld64.lld: error: invalid FDE relocation in __eh_frame
DavidSpickett added inline comments.
lld/test/MachO/eh-frame.s
57

Hi! This is very late report but as you'll see I don't think your patch is wrong, perhaps llvm-objdump is doing strange things.

This test is failing on our 32 bit armv8 bot: https://lab.llvm.org/buildbot/#/builders/178/builds/2463

When I run the first objdump command from my AArch64 build I get:

<       [0]: function offset=0x00000508, encoding[1]=0x52010000
<       [1]: function offset=0x0000050c, encoding[0]=0x00000000
<       [2]: function offset=0x00000510, encoding[3]=0x14000020
<       [3]: function offset=0x00000514, encoding[2]=0x24000068
<       [4]: function offset=0x00000518, encoding[0]=0x00000000

When I do the same from a 32 bit Arm build I get:

>       [0]: function offset=0x00000508, encoding=0x52010000
>       [1]: function offset=0x0000050c, encoding=0x00000000
>       [2]: function offset=0x00000510, encoding=0x14000020
>       [3]: function offset=0x00000514, encoding=0x24000068
>       [4]: function offset=0x00000518, encoding=0x00000000

The missing [ ] after encoding is what causes lit to fail to match.

Now it looks to me like the [0] after encoding is actually a bug on 64 bit but I'm totally guessing here. Just going by the seemingly random order of the numbering.

Do you have any idea if those are actually expected to be there?

57

(I see the same [ ] checked for in lld/test/MachO/compact-unwind-stack-ind.s, in the same Second level index[0] section)

DavidSpickett added inline comments.Jul 8 2022, 3:34 AM
lld/test/MachO/eh-frame.s
57

As usual as soon as I write it out I see what it means. The number is the encoding number that's why the two 0x0..0 are labelled with [0]. I see there are 2 functions over in objdump one for compressed (which has these []) and regular, I'll investigate those for any 32 bit strangeness.

So your patch is fine sorry for the noise!

Turns out the fix for our issue was in lld I've put it up for review https://reviews.llvm.org/D129363.