This is an archive of the discontinued LLVM Phabricator instance.

[LLD][COFF] Improve symbol table info for import thunk
ClosedPublic

Authored by alvinhochun on Sep 19 2022, 1:42 AM.

Details

Summary

Import thunks themselves contain a jump or branch, which is code by
nature. Therefore the import thunk symbol should be marked as function
type in the symbol table to help with debugging.

The __imp_ import symbol associated to the import thunk is also useful
for debugging. However, when the import symbol isn't directly referenced
outside of the import thunk, it doesn't normally get added to the symbol
table. This change teaches LLD to add the import symbol explicitly.

Diff Detail

Event Timeline

alvinhochun created this revision.Sep 19 2022, 1:42 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 19 2022, 1:42 AM
alvinhochun requested review of this revision.Sep 19 2022, 1:42 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 19 2022, 1:42 AM
alvinhochun edited the summary of this revision. (Show Details)Sep 19 2022, 1:51 AM

Import thunks themselves contain a jump or branch, which is code by nature. Therefore the import thunk symbol should be marked as function type in the symbol table to help with debugging.

Totally agree about this.

The __imp_ import symbol associated to the import thunk is also useful for debugging. However, when the import symbol isn't directly referenced outside of the import thunk, it doesn't normally get added to the symbol table. This change teaches LLD to add the import symbol explicitly.

My initial reaction here is that I wonder when you'd want to look at these symbols in "normal" application level debugging. But on the other hand, I presume we're already inconsistently adding some __imp_ symbols to the symbol table, but not all of them, so maybe it's just for the best to add all of them. Additionally, if it makes things more consistent, and is useful for lower level debugging (like when figuring out which part of the toolchain is misbehaving), then it's probably good to add. So yeah, +1 from me to this too.

lld/test/COFF/symtab.test
25

If the actual value isn't very important for the test, one can also just strip this down to CHECK-NEXT: Value:, leaving out the value itself.

My initial reaction here is that I wonder when you'd want to look at these symbols in "normal" application level debugging. But on the other hand, I presume we're already inconsistently adding some __imp_ symbols to the symbol table, but not all of them, so maybe it's just for the best to add all of them. Additionally, if it makes things more consistent, and is useful for lower level debugging (like when figuring out which part of the toolchain is misbehaving), then it's probably good to add. So yeah, +1 from me to this too.

It was more for the sake of completeness, and also because ld.bfd seems to do the same.

The real reason which led me to add this symbol is: When I was using LLDB to inspect binaries without the import thunk being marked as functions, calls into the import thunk do not have any comments, and disassembling the import thunk shows a bogus label with symbol+offset (from another import symbol). Although marking the import thunk symbol as function already solves the biggest issue, it just annoys me knowing that LLDB would still show the wrong label for the jump inside the import thunk when not having the __imp_ symbol. (There is another way to fix this, which is to have LLDB load symbols from the import table, but that's more complicated.)

lld/test/COFF/symtab.test
25

Right, I agree the value isn't really important.

Changed test to be less specific

mstorsjo accepted this revision.Sep 19 2022, 3:10 AM

My initial reaction here is that I wonder when you'd want to look at these symbols in "normal" application level debugging. But on the other hand, I presume we're already inconsistently adding some __imp_ symbols to the symbol table, but not all of them, so maybe it's just for the best to add all of them. Additionally, if it makes things more consistent, and is useful for lower level debugging (like when figuring out which part of the toolchain is misbehaving), then it's probably good to add. So yeah, +1 from me to this too.

It was more for the sake of completeness, and also because ld.bfd seems to do the same.

The real reason which led me to add this symbol is: When I was using LLDB to inspect binaries without the import thunk being marked as functions, calls into the import thunk do not have any comments, and disassembling the import thunk shows a bogus label with symbol+offset (from another import symbol). Although marking the import thunk symbol as function already solves the biggest issue, it just annoys me knowing that LLDB would still show the wrong label for the jump inside the import thunk when not having the __imp_ symbol. (There is another way to fix this, which is to have LLDB load symbols from the import table, but that's more complicated.)

Oh, that's a great reason for adding them indeed. Yes, for stepping through things like thunks, having it show the right symbol name can be very valuable.

LGTM, thanks!

This revision is now accepted and ready to land.Sep 19 2022, 3:10 AM
This revision was automatically updated to reflect the committed changes.