This is an archive of the discontinued LLVM Phabricator instance.

[lldb] Warn when Mach-O files have overlapping segments
ClosedPublic

Authored by bulbazord on Feb 21 2023, 6:17 PM.

Details

Summary

I recently came across a binary that for some reason had overlapping
sections. When debugging it, LLDB was able to get information about one
of the sections but not the other because SectionLoadList assumes that
each address maps to exactly one section. We have the capability to warn
about this, but it was not turned on.

rdar://105751700

Diff Detail

Event Timeline

bulbazord created this revision.Feb 21 2023, 6:17 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 21 2023, 6:17 PM
bulbazord requested review of this revision.Feb 21 2023, 6:17 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 21 2023, 6:17 PM
mib accepted this revision.Feb 21 2023, 6:28 PM

LGTM! May be we should throw an error when loading a binary with overlapping segments, instead of just showing an error.

This revision is now accepted and ready to land.Feb 21 2023, 6:28 PM
jasonmolenda accepted this revision.Feb 22 2023, 1:06 PM

I'm OK with giving this a try. SectionLoadList::SetSectionLoadAddress specifically notes that there are cases were sections overlap in the virtual address space. All of the binaries in the shared cache have a single LINKEDIT segment that is shared, and each binary's mach-o LC_SYMTAB/LC_DYSYMTAB load command will point to different part of that same LINKEDIT. From lldb's perspective, that means every Module in the shared cache has a LINKEDIT Section that overlaps with all the others.

LINKEDIT is special in that nothing in TEXT/DATA refers to anything in it. The dynamic linker (ld.so, dyld) needs to process it on userland processes, and the debugger needs to read it to create a symbol table, but it basically doesn't need to be loaded in memory at all until you make a call to an external binary and the dynamic linker needs to do something. With the Darwin kernel (xnu), it's not loaded in memory at all, it only exists in the original binary file that is processed into the in-memory kernel image.

tl;dr it's fine if LINKEDIT's "overlap" because lldb will never need to take an addr_t and figure out which Section it is located in. (because an addr_t in the LINKEDIT segment of the shared cache would point to EVERY ObjectFile in the shared cache, if it was all reported correctly.)

We may find that enabling this warning fires for some unintended situation that we're not looking at right now, but we can re-evaluate if that turns out to be the case.

I'm OK with giving this a try. SectionLoadList::SetSectionLoadAddress specifically notes that there are cases were sections overlap in the virtual address space. All of the binaries in the shared cache have a single LINKEDIT segment that is shared, and each binary's mach-o LC_SYMTAB/LC_DYSYMTAB load command will point to different part of that same LINKEDIT. From lldb's perspective, that means every Module in the shared cache has a LINKEDIT Section that overlaps with all the others.

LINKEDIT is special in that nothing in TEXT/DATA refers to anything in it. The dynamic linker (ld.so, dyld) needs to process it on userland processes, and the debugger needs to read it to create a symbol table, but it basically doesn't need to be loaded in memory at all until you make a call to an external binary and the dynamic linker needs to do something. With the Darwin kernel (xnu), it's not loaded in memory at all, it only exists in the original binary file that is processed into the in-memory kernel image.

tl;dr it's fine if LINKEDIT's "overlap" because lldb will never need to take an addr_t and figure out which Section it is located in. (because an addr_t in the LINKEDIT segment of the shared cache would point to EVERY ObjectFile in the shared cache, if it was all reported correctly.)

We may find that enabling this warning fires for some unintended situation that we're not looking at right now, but we can re-evaluate if that turns out to be the case.

Out of curiosity, is the scenario you're describing not already handled by DynamicLoaderDarwin::UpdateImageLoadAddress? That method specifically checks to see if a section is __LINKEDIT to figure out if we should be emitting a warning. Should I be doing the same thing here?

tl;dr it's fine if LINKEDIT's "overlap" because lldb will never need to take an addr_t and figure out which Section it is located in. (because an addr_t in the LINKEDIT segment of the shared cache would point to EVERY ObjectFile in the shared cache, if it was all reported correctly.)

We may find that enabling this warning fires for some unintended situation that we're not looking at right now, but we can re-evaluate if that turns out to be the case.

Out of curiosity, is the scenario you're describing not already handled by DynamicLoaderDarwin::UpdateImageLoadAddress? That method specifically checks to see if a section is __LINKEDIT to figure out if we should be emitting a warning. Should I be doing the same thing here?

We don't get any new warnings with your patch when it is run against a macOS etc process using a shared cache; it behaves correctly. The comment in SectionLoadList::SetSectionLoadAddress specifically talks about the LINKEDIT case, saying that the DynamicLoader will do the right thing, as you noted. But it made me try to think through if there might be similar metadata segments that aren't actually loaded in memory, or are inconsequential if they're loaded or not. We may find when some living on time that there are other cases, but I can't think of them right now - I say give the patch a try.

This revision was automatically updated to reflect the committed changes.