This is an archive of the discontinued LLVM Phabricator instance.

[DWARFVerifier] Allow ObjectiveC names in dwarf_debug tables
ClosedPublic

Authored by fdeazeve on Sep 6 2023, 1:21 PM.

Details

Summary

ObjectiveC has its own extra accelerator table entries that are helpful for the
debugger. This patch relaxes the DWARFVerifier so that it accepts those in DWARF
5's debug_names.

Diff Detail

Event Timeline

fdeazeve created this revision.Sep 6 2023, 1:21 PM
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a subscriber: hiraditya. · View Herald Transcript
fdeazeve requested review of this revision.Sep 6 2023, 1:21 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 6 2023, 1:21 PM
aprantl accepted this revision.Sep 6 2023, 3:18 PM
aprantl added inline comments.
llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp
1366

I'm a little confused by this comment. "first" as opposed to when else?

This revision is now accepted and ready to land.Sep 6 2023, 3:18 PM
fdeazeve added inline comments.Sep 6 2023, 3:39 PM
llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp
1366

As opposed to doing an Result.emplace_back(StrippedName), which would first ensure the vector has enough space, and then construct the std::string in-place from the (now possibly invalid) StringRef.

This is extremely subtle, but I will try to reword the comment

Just to register in case there are any concerns with performance here, I built Clang in Debug, generated a dsym with dsymutil build_Debug/bin/clang --accelerator Dwarf and measured the time it takes to verify it: build_Release/bin/llvm-dwarfdump --verify build_Debug/bin/clang.dSYM

Baseline:
  Time (mean ± σ):     51.624 s ±  0.772 s    [User: 50.222 s, System: 1.381 s]
  Range (min … max):   50.867 s … 53.138 s    10 runs
With Patch:
  Time (mean ± σ):     50.941 s ±  0.090 s    [User: 49.561 s, System: 1.342 s]
  Range (min … max):   50.808 s … 51.075 s    10 runs
fdeazeve added a comment.EditedSep 7 2023, 9:26 AM

This is exposing an issue with lldb/test/API/lang/objc/modules-objc-property/TestModulesObjCProperty.py.
This test compiles an ObjectiveC program and runs dsymutil on the binary.
By default, dsymutil verifies the output only if the input is valid.
With this patch, the input is now considered valid. So dsymutil verifies the output, which turns out to be invalid (for other reasons?)
If I don't apply this patch and force dsymutil to verify the output of that test, it also says the output is invalid.

build_Release/./bin/dsymutil"  -o "a.out.dSYM" "a.out"
Verifying .debug_abbrev...
Verifying .debug_info Unit Header Chain...
Verifying .debug_types Unit Header Chain...
Verifying non-dwo Units...
Verifying unit: 1 / 6, "_Builtin_stddef_max_align_t"
Verifying unit: 2 / 6, "Darwin"
Verifying unit: 3 / 6, "MachO"
Verifying unit: 4 / 6, "ObjectiveC"
Verifying unit: 5 / 6, "myModule"
Verifying unit: 6 / 6, "/Users/piovezan/workspace/llvm-project/worktrees/src1/lldb/test/API/lang/objc/modules-objc-property/main.m"
Verifying dwo Units...
Verifying .debug_line...
warning: .debug_line[0x00003b18].prologue.file_names[2] is a duplicate of file_names[0]
Verifying .debug_str_offsets...
error: .debug_str_offsets: contribution 0x0: index 0x0: invalid string offset *0x0 == 0x5C, is neither zero nor immediately following a null character
error: .debug_str_offsets: contribution 0x0: index 0x1: invalid string offset *0x4 == 0x5, is neither zero nor immediately following a null character
Verifying .debug_names...
error: output verification failed for arm64
make: *** [a.out.dSYM] Error 1

Since this is DWARF5/ObjectiveC only, and since it resolves more failures than it creates, I'll go ahead and merge it.

This revision was landed with ongoing or failed builds.Sep 7 2023, 11:27 AM
This revision was automatically updated to reflect the committed changes.
dblaikie added inline comments.
llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp
1367

Technically, that'd be the behavior of this code even if it used * instead of ->str(), right? Since push_back takes a T&& parameter the StringRef->std::string implicit conversion would happen first, before the function call.

So I'm not sure the use of ->str() is sufficiently more explicit to make it clear what's going on here. Maybe the comment is sufficient and we could use *, or maybe it isn't, and a named temporary (std::string X = *y; func(std::move(X);) might be better, or an explicit cast (I guess that's no better than the ->str())?

I'm not feeling strongly about any of this - if the comment and the slightly different syntax feels like the right tradeoff to you folks, I won't stand in the way.

fdeazeve added inline comments.Sep 18 2023, 3:55 AM
llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp
1367

Oh, you are right; I could have done just * instead of ->str(), I totally missed that! I can probably simplify the comment a bit with that in mind.

I think the only important thing we emphasize with the comment is that we need to push instead of emplacing the StringRef, as this is a trap a well-intentioned developer could fall into during future code changes.

Personally I don't think the temporary would improve things a lot, as in the future someone could think "oh, let me remove this temporary that is just used once", not realizing it was done because of this subtle problem.

dblaikie added inline comments.Sep 18 2023, 11:47 AM
llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp
1367

Fair enough - probably just leave/rephrase the comment (to try to reduce @aprantl's confusion) and replace the ->str() with *.