This is an archive of the discontinued LLVM Phabricator instance.

[llvm-dwp] Skip type unit debug info sections
ClosedPublic

Authored by kimanh on May 11 2021, 11:59 PM.

Details

Summary

This patch makes llvm-dwp skip debug info sections that may not be encoding a compile unit.
In DWARF5, debug info sections are also used for type units. As in preparation to support type units,
make llvm-dwp aware of other uses of debug info sections but skip them for now.

The patch first records all .debug_info sections, then goes through them one by one and records
the cu debug info section for writing the index unit, and copies that section to the final dwp output
info section. If it's not a compile unit, skip.

Diff Detail

Event Timeline

kimanh created this revision.May 11 2021, 11:59 PM
kimanh updated this revision to Diff 344694.May 12 2021, 12:01 AM

Upload whole range of commits.

kimanh updated this revision to Diff 344695.May 12 2021, 12:03 AM

Another try uploading whole range.

kimanh updated this revision to Diff 344701.May 12 2021, 12:12 AM

adapt comment

kimanh edited the summary of this revision. (Show Details)May 12 2021, 12:29 AM
kimanh updated this revision to Diff 344759.May 12 2021, 4:10 AM

Minor: add comment to unit type.

kimanh updated this revision to Diff 344761.May 12 2021, 4:14 AM

Minor: another comment.

kimanh added a subscriber: pfaffe.
kimanh published this revision for review.May 12 2021, 4:20 AM

Hello Dave,

I'm splitting up the CL in https://reviews.llvm.org/D101818 into a chain of 3 CLs, this is the first one. I'll add more info on the original comment that you made, ptal!

Herald added a project: Restricted Project. · View Herald TranscriptMay 12 2021, 4:20 AM

Maybe a bunch of the complexity in this patch comes from supporting multiple .debug_info.dwo sections in a .dwo file - could you check if that's needed? Could we fix that issue in LLVM instead of supporting it in llvm-dwp.

llvm/tools/llvm-dwp/llvm-dwp.cpp
156–159

Why different error handling here than for the rest of the header field parsing?

I'd have thought the MinHeaderLength check above would be adequate/intended to mean that all the rest of the header fields could be parsed without error checking?

Ah, but you don't know the length anymore at that point - since you have to wait until you've read the unit kind. OK.

But this error checking would be incomplete - since it's not the offset field that would fail to parse - it's the addrSize later on where we'd run out of bytes to read and fail there instead of here.

Test coverage would probably demonstrate this/help motivate changes to make this error checking correct.

603–604

The need to keep multiple info sections on input I think could reasonably be consider to be a bug in LLVM's output. At least as far as I'm concerned - we should produce a single .debug_info section with all the type and compile units in it. I don't think it's worth supporting multiple .debug_info sections in llvm-dwp - we should fix LLVM instead, if possible.

Hmm, now that I look at LLVM's output - I don't see multiple .debug_info.dwo sections. Do you?

Maybe you were testing GCC dwo files? I'm OK not working with those... though I guess I did add functionality to support them to llvm-dwp for type units. I think it's probably better that GCC improve their output (it improves compression too, compared to splitting things into separate sections).

Hmm, I'm going back and forth - we do support multiple .debug_types.dwo sections, so for consistency it probably makes sense to support multiple .debug_info.dwo sections too... ). Thinking...

Hmm, now that I look at LLVM's output - I don't see multiple .debug_info.dwo sections. Do you?

Maybe you were testing GCC dwo files? I'm OK not working with those... though I guess I did add functionality to support them to llvm-dwp for type units. I think it's probably better that GCC improve their output (it improves compression too, compared to splitting things into separate sections).

Hmm, I'm going back and forth - we do support multiple .debug_types.dwo sections, so for consistency it probably makes sense to support multiple .debug_info.dwo sections too... ). Thinking...

I was actually testing with LLVM dwo files (and no, I didn't see multiple .debug_info sections for LLVM). However, the DWARFv5 specification mentions multiple .debug_info sections:

"While a split DWARF object may contain multiple .debug_info.dwo sections, one for the compilation unit, and one for each type unit, a package file contains a single .debug_info.dwo section." (F.3, DWARFv5 spec).
On top of that this was also supported for the .debug_types section, so I added it for the .debug_info sections too. Looking at GCC, it indeed generates multiple .debug_info sections for the .dwo file.

I'll happily remove the support for multiple sections, if you think that's better for reducing the complexity. For our use case, we won't use GCC output, so we'd be fine not supporting it.

llvm/tools/llvm-dwp/llvm-dwp.cpp
156–159

Yes, exactly, it's because of the unit type that I need to know before knowing about the final length.

That's true (incomplete error checking), will change it!

kimanh updated this revision to Diff 346928.May 20 2021, 11:28 PM

Add test for type unit parsing and changed error checking
for type unit header.

kimanh marked an inline comment as done.May 20 2021, 11:36 PM

Hmm, now that I look at LLVM's output - I don't see multiple .debug_info.dwo sections. Do you?

Maybe you were testing GCC dwo files? I'm OK not working with those... though I guess I did add functionality to support them to llvm-dwp for type units. I think it's probably better that GCC improve their output (it improves compression too, compared to splitting things into separate sections).

Hmm, I'm going back and forth - we do support multiple .debug_types.dwo sections, so for consistency it probably makes sense to support multiple .debug_info.dwo sections too... ). Thinking...

I was actually testing with LLVM dwo files (and no, I didn't see multiple .debug_info sections for LLVM). However, the DWARFv5 specification mentions multiple .debug_info sections:

"While a split DWARF object may contain multiple .debug_info.dwo sections, one for the compilation unit, and one for each type unit, a package file contains a single .debug_info.dwo section." (F.3, DWARFv5 spec).
On top of that this was also supported for the .debug_types section, so I added it for the .debug_info sections too. Looking at GCC, it indeed generates multiple .debug_info sections for the .dwo file.

I'll happily remove the support for multiple sections, if you think that's better for reducing the complexity. For our use case, we won't use GCC output, so we'd be fine not supporting it.

Nah, no worries - you're right, we should support it since the spec says so and we do support the GCC behavior already for v4. Thanks for the refresher!

llvm/test/tools/llvm-dwp/X86/wrong-unit-type-info-v4.s
11 ↗(On Diff #344761)

Update this comment to correct the tag number and name?

llvm/tools/llvm-dwp/llvm-dwp.cpp
156–159

Looks like this issue is still pending (the error message is still not tested and the issue of where this would fail, etc, is still outstanding & hopefully illuminated by checking this with a test)

(also error messages are usually all lower case without a trailing '.', because they'll be concatenated after "error: " in classicy unix style error message output, eg: "error: type unit is missing a type offset")

271

Does "header" need to be passed by non-const reference? Could it be const ref?

727

Perhaps if CurInfoSection.size() == 1 we could skip all this searching and assume the info section is in that one section somewhere? (this'd work better for LLVM's output that only uses one section & I think it puts the types first, so this search would always be maximally suboptimal - having to search through all the units in LLVM's output and finding the CU on the very last one)

Oh, I guess it can't end early - because this has necessary side effects of finding the actual CU? OK then.

739–741

Looks like this has side effects (changing the values in the Entry? oh, but only a local copy of it? but that copy gets put in the IndexEntries map... )

Which is necessary to prune down the range for the CU index to only describe the CU itself?

767–769

Usually we skip braces on single statement blocks in LLVM.

779–781

Test coverage?

kimanh updated this revision to Diff 347694.May 25 2021, 8:44 AM
kimanh marked 4 inline comments as done.

Addressed comments:

  • Fixing tag number and name for one test
  • Remove braces for one-liner if case

Thanks again for the reviews! One test request is open (with a question on how to best write the test), I think otherwise I addressed/answered the questions.

llvm/tools/llvm-dwp/llvm-dwp.cpp
156–159

I've actually changed the code, and added a test case, see invalid_tu_header_length.s (but that was on Friday, so maybe you looked at a previous diff?). However, the dot is still there, so I'll remove the dot. Or am I missing something, maybe?

271

We are actually changing the header in this function (by setting the signature in ll.316 (for DWARFv4, the signature is in the abbrev section).

727

Yes, we need to search for it to properly prune the section offsets as you noted below.

739–741

Yes, we're generating a copy of it (I basically imitated the behavior in 'addAllTypes´: create a copy and update the .debug_info offsets for this particular entry).

Yes, for pruning down to only describe the CU itself and later for the TU itself.
Looking at the DWARF5 document, Appendix F10 and F11, the CU and TU indices for the debug_info section do not overlap (for example). But please correct me if I'm misundrestanding something!

779–781

I tried to add a *.s test for this case but I'm not entirely sure how to create one properly. If I add multiple .debug_info sections, after running llvm-mc there will only be one.

Is there a way to create a test with multiple .debug_info sections but one index than actually writing the byte code?

dblaikie added inline comments.May 25 2021, 2:40 PM
llvm/tools/llvm-dwp/llvm-dwp.cpp
779–781

I think the way this happens with GCC is that it produces a single .o file that has the usual comdat groups for type units - then it runs objcopy to split out the .dwo sections, and objcopy doesn't propagate the comdat groups but it does keep the separate sections.

Probably the easiest thing to do might be to have an input file use a comdat group? (try compiling without Split DWARF and check the assembly - you'll see the .debug_info section directive for the type unit will have something like .section .debug_types.dwo,"G",@progbits,wt.b0cf918beae249c9,comdat - so maybe you can do something like that - the consumer (llvm-dwp) won't care that it happens to be in a section group, I shouldn't think - also, maybe use an obvious made-up value for the hash (like 0xFDFDFDFDFD or whatever))

kimanh updated this revision to Diff 348883.May 31 2021, 11:00 PM

Add missing test on error reporting multiple debug_sections in a dwp file.

kimanh marked 2 inline comments as done.May 31 2021, 11:08 PM
kimanh added inline comments.
llvm/tools/llvm-dwp/llvm-dwp.cpp
779–781

Thanks for the guidance, that worked! I have added a test in the latest revision.

dblaikie accepted this revision.Jun 1 2021, 10:05 AM

Looks good, thanks!

This revision is now accepted and ready to land.Jun 1 2021, 10:05 AM
kimanh updated this revision to Diff 349183.Jun 1 2021, 11:57 PM
kimanh marked an inline comment as done.

Rebase on main to re-trigger pre-merge checks.

kimanh added a comment.Jun 2 2021, 8:54 AM

Thanks a lot for all the guidance and the help with the CLs!

I have rebased the patches and they pass the pre-merge checks. Could you help me to land them?

This revision was automatically updated to reflect the committed changes.