Page MenuHomePhabricator

[DebugInfo] Simplify DWARFDebugAddr.
ClosedPublic

Authored by ikudrin on Feb 6 2020, 10:48 PM.

Details

Summary

The patch removes unnecessary members of DWARFDebugAddr and further simplifies the implementation by separating parsing methods of tables in the DWARFv5 and pre-standard formats.

Diff Detail

Event Timeline

ikudrin created this revision.Feb 6 2020, 10:48 PM
aprantl added inline comments.Feb 7 2020, 8:44 AM
llvm/include/llvm/DebugInfo/DWARF/DWARFDebugAddr.h
65

Assuming that DWARF 6 doesn't change the format again, this naming scheme will look odd in the future.

How about we call this extractV2() and mention in the comment it is for 2 through 4? I.e., always use the minimum version that introduced the encoding in the name?

dblaikie added inline comments.Feb 7 2020, 11:31 AM
llvm/include/llvm/DebugInfo/DWARF/DWARFDebugAddr.h
65

Agreed - generally the different sections don't rev their versions until their own format changes (eg: debug_line still used v2 even though debug_info was v3 or v4) so I think it makes sense to name them as you're suggesting (with a comment about the valid range - "used up until DWARFv4", "used until at least DWARFv5(current)" or whatever seems good)

ikudrin marked an inline comment as done.Feb 7 2020, 5:32 PM
ikudrin added inline comments.
llvm/include/llvm/DebugInfo/DWARF/DWARFDebugAddr.h
65

If DWARF6 would not change the format of this section, they probably keep the value of the version field, so the function name will still sound accurate as "extract an address table in the format of version 5 (of address table)".

As for "PreV5", the section and its usage were defined only in DWARF5. Before that, there was only a proposal from GCC. So the name of the function should be read as "extract an address table as it was defined (in some other document) before DWARF5".

ikudrin updated this revision to Diff 243368.Feb 8 2020, 7:24 AM
  • Update to follow changes in D74196.
  • TmpLength -> DiagnosticLength.
jhenderson added inline comments.Feb 10 2020, 1:34 AM
llvm/include/llvm/DebugInfo/DWARF/DWARFDebugAddr.h
59–68

Perhaps this could be simplified to "Extract a DWARF V5 address table."

63

the address table in the pre-DWARF5 format -> a pre-DWARF V5 address table

I'd probably then change the next bit from ", which doesn't have a header and consists" to ". Such tables don't have a header and consist..."

65

How about extractPreStandard.

Also, I reckon a link pointing to the pre-standard spec this is using might be good.

llvm/lib/DebugInfo/DWARF/DWARFDebugAddr.cpp
21

Perhaps something for a later change, but the DataExtractor also supports sizes 1 and 2, and the latter at least is currently used by some architectures (possibly incorrectly - see D73961 and D73962).

31

Since your changing this already, I'd find hex sizes easier to read (especially since the data should be a multiple of 4 or 8 usually), so I'd prefer PRIx64.

39

Should this be getRelocatedAddress?

55–82

The debug line code uses getRelocatedValue here. I don't know if it's correct to do so (especially as it doesn't for the DWARF64 case...), but raising it just in case.

Also, as mentioned elsewhere, you could try passing an Error as the second argument to avoid needing to do the isValid... check. (Same goes elsewhere around here).

90

Do we have an unsupported version test case for version 6 (or more specifically, one higher than the current max supported version)?

llvm/test/tools/llvm-dwarfdump/X86/debug_addr_version_mismatch.s
0

The test name no longer really makes sense, as it isn't a mismatch error any more.

dblaikie added inline comments.Feb 10 2020, 10:06 AM
llvm/include/llvm/DebugInfo/DWARF/DWARFDebugAddr.h
59–68

nit: I think we've settled on "DWARFv5" (all one word, lowercase 'v') when naming DWARF specs in LLVM. (I seem to recall a conversation a few years ago - can go find it if it's useful)

llvm/lib/DebugInfo/DWARF/DWARFDebugAddr.cpp
39

Yes - and if someone's feeling fancy, it should probably use the optional section index to add section names/numbers to the dumping at some point. (this'd make it easier to eyeball possible efficiency opportunities in reducing the number of addresses in the address pool (any time more than one address from the same section is in the pool - and thus one could use an offset relative to the first to avoid needing the second), for instance)

I think an observable change (& thus a test that should be added) by using getRelocatedAddress would be that for a platform that stores the addend in the relocation record (not in the relocatable byte range), using getRelocatedAddress will apply the addend, whereas getUnsigned will not show the addend.

55–82

That sounds like a (harmless) bug in the debug line code - I don't think there's any reason the length would need to be relocatable. But I could be wrong (/maybe/ for platforms that do linker relaxation of some kind and use anything like a ULEB encoding for address ranges in debug_line (thus debug_line contribution could change size during linking & the length would then need a linker relocation to adjust for that) - but I don't think that's supported anywhere... ). Probably best to switch that to getU32 unless any tests fail/etc.

jhenderson added inline comments.Feb 11 2020, 12:56 AM
llvm/lib/DebugInfo/DWARF/DWARFDebugAddr.cpp
55–82

Now that I think about it, I did a prototype piece of work where I split the line table header into a separate section in the object from its body, as part of investigations into removing dead line table blocks for GC-ed sections. I think I needed it relocatable to allow for a symbol at the table end to be used to calculate the length in the unit_length field. However, it certainly didn't support DWARF64 or anything like that, so definitely had some inefficiencies. Perhaps the "correct" thing to do would be to make the DWARF64 length part a relocatable read? But in practice, I don't think it really matters.

ikudrin marked 4 inline comments as done.Feb 11 2020, 1:28 AM
ikudrin added a subscriber: labath.
ikudrin added inline comments.
llvm/lib/DebugInfo/DWARF/DWARFDebugAddr.cpp
21

As there is some uncertainty about that, I'd prefer not to rush with that change here.

55–82

Also, as mentioned elsewhere, you could try passing an Error as the second argument to avoid needing to do the isValid... check. (Same goes elsewhere around here).

I've tried that in D71704, but I can't say I really like the result. For my taste, explicit checking for the remaining size looks much more clear and helps to generate more accurate error messages. Let's postpone the changes in that area until @labath presents his solution.

55–82

Now that I think about it, I did a prototype piece of work where I split the line table header into a separate section in the object from its body, as part of investigations into removing dead line table blocks for GC-ed sections.

Sounds like your object file is not following the DWARF standard, no? Next, when you link your objects into a final binary, which should have normal DWARF sections, the linker should apply your relocation statically and remove it from the resulting binary, right? Thus, we still can use just getU32/getU64 to read the length from any DWARF-compliant file.

90

llvm/test/tools/llvm-dwarfdump/X86/debug_addr_unsupported_version.s

jhenderson added inline comments.Feb 11 2020, 1:52 AM
llvm/lib/DebugInfo/DWARF/DWARFDebugAddr.cpp
55–82

Right, it's not DWARF compliant in the object (but would be in the final linked output was the idea), so would require some extra work in llvm-dwarfdump to simulate concatenation of the sections for it to work. I guess more generally, there's technically nothing stopping an assembler emitting a relocation instead of writing the literal value itself, but overall, I don't think it really matters.

ikudrin updated this revision to Diff 243852.Feb 11 2020, 6:42 AM
  • Updated to reflect changes in parent revisions.
  • Updated comments according to @jhenderson's and @dblaikie's suggestions.
  • extractPreV5() -> extractPreStandard().
  • Added a reference to https://gcc.gnu.org/wiki/DebugFission. If anybody knows a better source, please let me know and I will update the reference.
  • Print data size in hex.
  • Merged debug_addr_version_mismatch.s into debug_addr_unsupported_version.s.
ikudrin edited the summary of this revision. (Show Details)Feb 11 2020, 6:44 AM
jhenderson accepted this revision.Feb 11 2020, 7:56 AM

All looks good to me, but best wait for somebody else too.

This revision is now accepted and ready to land.Feb 11 2020, 7:56 AM
dblaikie accepted this revision.Feb 11 2020, 9:07 AM

Looks alright - thanks!

llvm/lib/DebugInfo/DWARF/DWARFDebugAddr.cpp
55–82

Yeah, for now the error checking needs to be against the DWARF-encoded length, which means the DWARFDataExtractor error checking would be insufficient (the DWARFDataExtractor would read correctly/without error beyond the DWARF-encoded length & so would miss errors). I think?

But after that, the error handling would look more or less like the D71704 case - and I do think that's better than explicit length checking. It will make the code easier to update in the future (not having to touch two places - one to parse a new field, another to change the length expectations, etc).

This revision was automatically updated to reflect the committed changes.
ikudrin marked an inline comment as done.Feb 13 2020, 4:51 AM
ikudrin added a subscriber: HsiangKai.
ikudrin added inline comments.
llvm/lib/DebugInfo/DWARF/DWARFDebugAddr.cpp
55–82

Well, there was a revision where getU* was changed to getRelocatedValue for a Length field, D58335. Maybe @HsiangKai can tell more about using relocations against such fields.

dblaikie added inline comments.Feb 13 2020, 2:08 PM
llvm/lib/DebugInfo/DWARF/DWARFDebugAddr.cpp
55–82

That one I think's a bit clearer than any of the other DWARF sections as they're currently emitted by LLVM - as D58335 mentions, it's related to relaxation in the eh/debug_frame sections, which coudl cause them to change size/shrink. So a relocation would be used to update the length field to match that during linking.

As it currently stands, I haven't seen any example where a producer has produced linker-shrinking debug info in other sections. I'd skip it until there's an existence proof/example implementation that's doing this.