This is an archive of the discontinued LLVM Phabricator instance.

[lld-macho] Overhaul map file code
ClosedPublic

Authored by int3 on Nov 3 2022, 3:37 PM.

Details

Summary

The previous map file code was modeled after LLD-ELF's
implementation. However, ld64's map file format differs quite a bit from
LLD-ELF's. I've revamped our map file implementation so it is better
able to emit ld64-style map files.

Notable differences:

  • ld64 doesn't demangle symbols in map files, regardless of whether -demangle is passed. So we don't have to bother with getSymbolStrings().
  • ld64 doesn't emit symbols in cstring sections; it emits just the literal values. Moreover, it emits these literal values regardless of whether they are labeled with a symbol.
  • ld64 emits map file entries for things that are not strictly symbols, such as unwind info, GOT entries, etc. That isn't handled in this diff, but this redesign makes them easy to implement.
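For reference, an ld64-style map file looks roughly like the following (illustrative only; all paths, addresses, and sizes here are invented), including a literal-string entry of the kind described in the second bullet:

```
# Path: /tmp/a.out
# Arch: x86_64
# Object files:
[  0] linker synthesized
[  1] /tmp/main.o
# Sections:
# Address       Size            Segment  Section
0x100003F50     0x00000020      __TEXT   __text
0x100003F70     0x0000000E      __TEXT   __cstring
# Symbols:
# Address       Size            File  Name
0x100003F50     0x00000020      [  1] _main
0x100003F70     0x0000000E      [  1] literal string: Hello world\n
```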

Additionally, the previous implementation sorted the symbols so as to
emit them in address order. This was slow and unnecessary -- the symbols
can already be traversed in address order by walking the list of
OutputSections. This brings significant speedups. Here are the numbers
from the chromium_framework_less_dwarf benchmark on my Mac Pro, with the
-map argument added to the response file:

           base            diff           difference (95% CI)
sys_time   2.922 ± 0.059   2.950 ± 0.085  [  -0.7% ..   +2.5%]
user_time  11.464 ± 0.191  8.290 ± 0.123  [ -28.7% ..  -26.7%]
wall_time  11.235 ± 0.175  9.184 ± 0.169  [ -19.3% ..  -17.2%]
samples    16              23

(It's worth noting that map files are written in parallel with the
output binary, but they often took longer to write than the binary
itself.)
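The no-sort traversal can be sketched as follows. This is a hedged illustration only; the struct and function names are invented for the example and are not LLD's real classes. The key property is that output sections are already laid out in address order, as are the entries within each section, so one linear walk visits everything in address order.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for the linker's real data structures.
struct MapSymbol {
  uint64_t addr;
  std::string name;
};

struct OutputSection {
  std::vector<MapSymbol> symbols; // already sorted by address within the section
};

// Returns symbol names in address order via a single linear pass --
// no O(n log n) sort of all symbols is needed.
std::vector<std::string>
symbolsInAddressOrder(const std::vector<OutputSection> &sections) {
  std::vector<std::string> out;
  for (const OutputSection &sec : sections) // sections sorted by address
    for (const MapSymbol &sym : sec.symbols)
      out.push_back(sym.name);
  return out;
}
```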

Finally, I did some further cleanup of the map-file.s test -- there was
no real need for a custom-named section, and there were alt_entry
symbol declarations with no corresponding definitions. In any case,
neither custom-named sections nor alt_entry symbols trigger special code
paths in our map file implementation.

Diff Detail

Event Timeline

int3 created this revision.Nov 3 2022, 3:37 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptNov 3 2022, 3:37 PM
int3 requested review of this revision.Nov 3 2022, 3:37 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 3 2022, 3:37 PM
Roger accepted this revision.Nov 8 2022, 1:01 PM
Roger added a subscriber: Roger.

Nice!

lld/MachO/MapFile.cpp
176–178
This revision is now accepted and ready to land.Nov 8 2022, 1:01 PM
This revision was automatically updated to reflect the committed changes.
int3 marked an inline comment as done.Nov 8 2022, 1:34 PM
thakis added a subscriber: thakis.Nov 8 2022, 1:42 PM

Sorry, I missed this!

I didn't look in detail, but from a distance, this looks like a great change.

Does this allow us to undo D115416? I dislike that change a bit – parallelizing things by turning for loops into parallelFor() seems much easier to reason about than manually dealing with futures and the like. (I found _that_ change when I wondered why some for loop somewhere wasn't a parallelFor a while ago – I've since forgotten the details).
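To illustrate the style being argued for here (a hedged sketch only: std::async stands in for LLVM's actual thread pool, and llvm::parallelFor and the linker's future-based code look different in practice), a data-parallel loop helper keeps the call site looking like an ordinary for loop, whereas hand-managed futures spread the start/wait logic across the codebase:

```cpp
#include <cstddef>
#include <future>
#include <vector>

// Minimal parallel-for helper: runs fn(0), fn(1), ..., fn(n-1) concurrently
// and waits for all of them before returning.
template <class Fn> void parallelForEach(size_t n, Fn fn) {
  std::vector<std::future<void>> tasks;
  tasks.reserve(n);
  for (size_t i = 0; i < n; ++i)
    tasks.push_back(std::async(std::launch::async, fn, i));
  for (std::future<void> &t : tasks)
    t.get(); // join, and propagate any exception from the worker
}
```

A call site then reads `parallelForEach(sections.size(), [&](size_t i) { writeSection(i); });`, which is structurally identical to the serial loop it replaced.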

int3 added a comment.Nov 8 2022, 2:00 PM

Does this allow us to undo D115416?

Possibly; I'll have to do some benchmarks. However, I think it is likely still a performance improvement to do it in parallel.

thakis added a comment.Nov 8 2022, 4:23 PM

Does this allow us to undo D115416?

Possibly; I'll have to do some benchmarks. However, I think it is likely still a performance improvement to do it in parallel.

Sure, but having simpler code makes it easier to parallelize other parts. Doing complicated parallelism too early makes it harder to parallelize things that would otherwise be easy, and might get us stuck in a local optimum.

(This is admittedly very hand-wavy!)

omjavaid reopened this revision.Nov 16 2022, 10:35 AM
omjavaid added a subscriber: omjavaid.

This patch series breaks lld's map-file.s test on ARMv7 Linux buildbots, e.g. https://lab.llvm.org/buildbot/#/builders/178/builds/3190.
Kindly suggest an appropriate fix.

This revision is now accepted and ready to land.Nov 16 2022, 10:35 AM
This revision was automatically updated to reflect the committed changes.
int3 added a comment.Dec 5 2022, 2:21 PM

@omjavaid considering the patch set was in trunk for a week before your comment, reverting after a day seems pretty hasty. This buildbot doesn't even seem to send out emails for failing builds, so I'm not sure how I'm supposed to have noticed that it was failing.

Anyway, I'm skeptical that the problem stems from code changes in this diff and not some bug in the 2-stage bootstrap. Re-landing to see if it still repros.

int3 added a comment.Dec 5 2022, 11:47 PM

Well, looks like the test is still broken. Is there a way to disable it just on that one specific buildbot?

This buildbot doesn't even seem to send out emails for failing builds, so I'm not sure how I'm supposed to have noticed that it was failing.

This is because once a buildbot is red it doesn't send new emails: the chance of it being the same failure over and over is pretty high. That's why we keep an eye on exactly what is making it red.

So you weren't supposed to notice, that's why we follow up like this :)

Anyway, I'm skeptical that the problem stems from code changes in this diff and not some bug in the 2-stage bootstrap. Re-landing to see if it still repros.

It does - https://lab.llvm.org/buildbot/#/builders/178/builds/3460

It also fails in stage 1, so either we have a bug in the released clang or, more likely, there's some 32/64-bit type confusion going on. I have seen this before in lld.

I will repro this when I get a chance. You may very well be correct that this change didn't add the issue, it just made it visible.

Well, looks like the test is still broken. Is there a way to disable it just on that one specific buildbot?

Unfortunately not by bot, but disabling the test on 32-bit Arm isn't a big deal. Realistically, this is one of the few reliable 32-bit Arm bots in any case. I will make that change.

int3 added a comment.Dec 6 2022, 11:07 AM

Ahh, thanks for the fix! I'd forgotten that integer promotion doesn't happen with format strings...

disabling the test on arm 32 bit isn't a big deal

Is it possible to disable tests on a per-host basis, though? From https://github.com/llvm/llvm-project/blob/main/llvm/utils/lit/lit/llvm/config.py#L97 it seems like the target triple is added to the lit feature set, but the host triple never is. I could certainly make that addition, but I'm not sure whether that was an intentional decision.

Is it possible to disable tests on a per-host basis, though?

Literally by hostname, not at this time. Disabling by host triple could be appropriate for some tests, but not if it's some physical aspect of the machine that's the problem, since there could be others with the same host triple that do not have that hardware issue.

it seems like the target triple is added to the lit features set, but the host triple never is

There are some sanitizer tests disabled for AArch64 due to suspected atomics issues, but the way they do it is to require native plus an AArch64 target triple, which amounts to the same thing.
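Concretely, a test can opt out along those lines with lit directives something like the following (a sketch only; the exact boolean-expression syntax accepted by REQUIRES/UNSUPPORTED has varied across LLVM versions, so treat this as illustrative rather than the directives those tests actually use):

```
# Skip only when the test would run natively on a 32-bit Arm target:
# UNSUPPORTED: target=arm{{.*}} && native
```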

int3 added a comment.Dec 7 2022, 3:57 AM

Got it, thanks for the explanations!

Does this allow us to undo D115416?

Possibly; I'll have to do some benchmarks. However, I think it is likely still a performance improvement to do it in parallel.

Sure, but having simpler code makes it easier to parallelize other parts. Doing complicated parallelism too early makes it harder to parallelize things that would otherwise be easy, and might get us stuck in a local optimum.

(This is admittedly very hand-wavy!)

@thakis the benchmarks on my end (especially on larger builds) bottleneck on writing map files than the binary writing stage, so this would be a regression on our end if D115416 is reverted. I do think we can find a happy medium where code is more extendable and keep the behavior parallelizing the map file stage. But before making any sudden moves, it would be nice to know what other general parallelization improvements you are thinking of so we can align on the correct path forward.