This is an archive of the discontinued LLVM Phabricator instance.

[DWARFv5] Support FORM_strp in .debug_line.dwo
ClosedPublic

Authored by probinson on Nov 9 2017, 11:17 AM.

Details

Summary

As a side effect, the .debug_line section will be dumped in physical
order, rather than in the order that compile units refer to their
associated portions of the .debug_line section. These are probably
always the same order anyway, and no tests noticed the difference.

Diff Detail

Event Timeline

probinson created this revision.Nov 9 2017, 11:17 AM
dblaikie added inline comments.Nov 9 2017, 11:35 AM
llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
213–217

Might make more sense for this function to return the map by value, rather than take it by reference, clear it, and populate it?

335

Probably be more specific/descriptive.

This also appears to be a regression - before your patch this code did the right thing (used the corresponding CU's address byte size for each CU) for non-DWO CUs, though it was probably wrong for DWO CUs by using this saved thing. But with your patch it now does the same sketchy thing for DWO and non-DWO CUs?

336–339

A loop that only runs once seems a bit odd, though I realize it's pretty compact.

probinson added inline comments.Nov 9 2017, 12:02 PM
llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
213–217

Agreed.

335

I had tried to fix this properly (allowing a v5 line table to be dumped with no Unit) but tripped over a different issue, and decided to defer it to the next patch. I think you are right that I could leave this set from the Unit whenever there is one, and pre-v5 basically there always has to be one.

336–339

Copied from the dwo case. Technically it could run zero times.
But if I set the size from the Unit, inside the dump loop, this goes away anyhow.

probinson updated this revision to Diff 122313.Nov 9 2017, 12:53 PM

Address @dblaikie comments.

probinson marked 3 inline comments as done.Nov 9 2017, 12:54 PM
dblaikie added inline comments.Nov 9 2017, 12:57 PM
llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
344–348

Would it make sense to sink this down into the LineTable parsing logic, so that the unit isn't looked up when it's not needed? (when using more recent line table formats that have a standalone description (if I'm understanding the DWARF5 line table support correctly))

probinson added inline comments.Nov 9 2017, 1:34 PM
llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
344–348

We need to look up the unit anyway. The unit is how the LineTable parsing finds the string section, and .debug_str_offsets (for FORM_strx). And the .debug_addr section, once I get to that. We could defer setting the data extractor's address size until we're down into LineTable, but we can't defer looking up the unit.

dblaikie added inline comments.Nov 9 2017, 1:46 PM
llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
344–348

Hmm, then I'm confused - I had thought one of the goals of the new line table work was to make it standalone/parsable without the debug_info? Any idea if that has been a goal? Or just something I totally misunderstood?

(Moving the why-do-we-need-the-unit discussion out of the inline comments.)

I do have a goal of being able to handle a line table without a unit, when possible. However, there are constraints, and many times you still end up needing a unit. Sadly, the caller of the line-table parser doesn't know whether the unit will be needed.

In fact, for dumping a .debug_line.dwo, the unit should *not* be needed. A .dwo doesn't have any addresses in it, and won't be v2, so all it needs to know is the 32/64 format, which is in the line-table header. However, because of how libDebugInfo works, the string-section handles are in DWARFUnit. So, for a v5 line-table header that uses any indirect string form, the parser still needs the DWARFUnit even though it doesn't need the .debug_info compile/type unit. I tilted at that windmill several times, and the windmill won every time.

If we defer looking up the unit until the line-table parser needs to know it, then the parser will frequently end up doing the linear scan through the units until it finds the right one (O(N*N/2)). It seemed less inefficient to scan all the units once and do a map lookup each time instead (O(N+NlogN)). I am guessing it will be unusual for a line table *not* to need the unit, so the unconditional lookup seemed like the better choice. And the parser doesn't know whether it should search the normal or .dwo CUs, so.... might as well just do it up front in the caller.

This does not mean my prior work was wasted, because with v5 using forms in the line-table header, the FormParams had to be split out anyway. But even with v5, if the header uses FORM_strx, or the line-number program wants to use the .debug_addr table, we still need the unit. (And again, given how libDebugInfo works, even FORM_strp needs the DWARFUnit, even though it doesn't need anything from the actual compile unit.)

As I mentioned elsewhere, for my next trick I will conjure up a standalone v5 line table, and parse it without the assistance of any units. But it will have to use all inline strings, until I can come back around to tracking the string sections independently of the DWARFUnit.

JDevlieghere added inline comments.Nov 10 2017, 6:45 AM
llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
341

Why don't we need the address byte size here anymore? I'm super excited you got rid of the "sketchy loop" below, but I'd like to understand why we can do without it.

probinson added inline comments.Nov 10 2017, 10:02 AM
llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
341

Theoretically it can vary per unit. The address size is still set below, inside the while loop, after we look up the unit associated with this part of the line-table section.

probinson added inline comments.Nov 10 2017, 10:30 AM
llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
341

Technically I should set the address size to zero for a .debug_line portion with no associated unit. If it's a v5 portion, we can establish the address size from the v5 header. If it's not a v5 portion, then we should still be able to parse/dump the header, but we'd have to skip over the line-number program. But I didn't want to get that far into things with this patch, there is some other lurking bug that will take a little time to track down.

Oh, head-smack! You don't need a unit to know the address size; the line table has always had that info, although indirectly.
DW_LNE_set_address (the only opcode that specifies an actual address) is an extended opcode, which has a ULEB telling you how long the operands are. Which is, one byte for the opcode, and.... the size of the address. Doh!

That means, DWARF v5 is the first time we actually *do* (sometimes) need a unit to parse the .debug_line section, instead of the first time we *don't* need it.

This patch is about one of the cases where we do need a DWARFUnit, so I think it's not particularly affected by this revelation, but there is likely to be some simplification that can be done as a follow-up.

Oh, head-smack! You don't need a unit to know the address size; the line table has always had that info, although indirectly.
DW_LNE_set_address (the only opcode that specifies an actual address) is an extended opcode, which has a ULEB telling you how long the operands are. Which is, one byte for the opcode, and.... the size of the address. Doh!

That means, DWARF v5 is the first time we actually *do* (sometimes) need a unit to parse the .debug_line section, instead of the first time we *don't* need it.

This patch is about one of the cases where we do need a DWARFUnit, so I think it's not particularly affected by this revelation, but there is likely to be some simplification that can be done as a follow-up.

I missed the call to setAddressSize below, but I'm glad that I did. Thanks for explaining Paul!

probinson updated this revision to Diff 122917.Nov 14 2017, 2:49 PM

Use address-size zero for the DWARFDataExtractor, in the .dwo section.
This required relaxing an assertion in the line-table parser. If there's no line-number program, the address size isn't needed, and if there is one, attempting to retrieve an address will assert farther down.

Technically we can derive the address size from the line-number program, but I wanted to leave that for the next patch.

Ping. I am moderately sure I addressed everything people asked about; if there are still questions please remind me.

JDevlieghere accepted this revision.Nov 21 2017, 12:18 PM

Looks good to me!

(Adrian is out until after Thanksgiving, just in case you're waiting for his blessing)

This revision is now accepted and ready to land.Nov 21 2017, 12:18 PM
This revision was automatically updated to reflect the committed changes.