This mainly required teaching DWARFFormValue to use relocations for a
section other than .debug_info (i.e., .debug_line). Also the version
and 32/64-bit format might be different for the .debug_line section.
Details
Diff Detail
Event Timeline
It turns out that the dumper doesn't visibly distinguish between in-line strings and referenced strings when it dumps the V5 line-table header. I think I can get that, but it should be a separate follow-up.
I am also thinking that in another follow-up, in DWARFDebugLine.cpp the static helpers to parse the different headers should become private methods. Then they can lose about half their parameters because they are class data members.
Let me know if you think these are bad ideas.
lib/DebugInfo/DWARF/DWARFFormValue.cpp | ||
---|---|---|
68 ↗ | (On Diff #98853) | It's specific to DWARFFormValue, which is why I stuck with the FV. |
122 ↗ | (On Diff #98853) | They are used now (e.g. in getAsCString). The old helper class didn't need them because the old helper class was used only in getFixedByteSize() and skipFormValue(). But the new helper is used in places where it used to call the Unit methods directly, so it needs all these. |
include/llvm/DebugInfo/DWARF/DWARFFormValue.h | ||
---|---|---|
57 ↗ | (On Diff #98864) | My best guess is that sizeof(DWARFFormValue) isn't so important this needs to be indirected. It looks like most/all DWARFFormValues are transient, when some code is visiting the current attribute or form - I don't see code manifesting in a container "all forms/attributes for a DIE" for example (nor all attributes/forms in an entire file, etc). So maybe having this as a value member, rather than a shared_ptr would reduce complexity? Also - while I understand it's convenient to wrap DWARFUnit in the proxy so existing code doesn't need to be touched, I'm not sure it's that much of an improvement in readability - but I could be wrong - compared to having a few members and using those directly? (indeed most code still wouldn't change because it could use the DWARFUnit *U that was there before/would remain - and a few places would use the few extra new state/variables) |
77–85 ↗ | (On Diff #98864) | Feel free to commit changes like this separately, without pre-commit review. Keeps patches small/targeted/easier to review/revert, etc. |
82 ↗ | (On Diff #98864) | Does the LLVM style guide say to document all parameters? (I guess the clang doxy warnings flag things like this if they're not document?) It'd be nice if there were a tidier way to just say "there's nothing to add here" - because I don't think "The DataExtractor to use" really adds anything compared to what the generated documentation or function signature already has. |
lib/DebugInfo/DWARF/DWARFContext.cpp | ||
193–196 ↗ | (On Diff #98864) | Guessing this probably isn't right for DWP files? Or potentially for a DWO containing multiple CUs that might have different CU properties that are relevant to the line table? Though that's perhaps an interesting question - since a DWO line table is just for type units, the CU won't have a reference to any line.dwo section, and multiple DWO TUs might refer to the same line table, I think? (I think LLVM produces only one, but I forget - it's been awhile since I implemented taht) so you'd want to pick one of the DWO TUs that references this line table to use here? |
lib/DebugInfo/DWARF/DWARFDebugLine.cpp | ||
143–144 ↗ | (On Diff #98864) | I know DWARF64 support is starting to be considered in a few places - but is this tested? |
lib/DebugInfo/DWARF/DWARFFormValue.cpp | ||
135 ↗ | (On Diff #98864) | avoid branch-to-unreachable, instead "assert(U)" at the start of the function (similarly for other cases above/in this patch) |
include/llvm/DebugInfo/DWARF/DWARFFormValue.h | ||
---|---|---|
57 ↗ | (On Diff #98864) | You have touched on a couple of different things here. Using a value member instead of shared_ptr seems fine, it would make the diff a whole lot longer but most of the extra changes would be replacing U->foo with U.foo. Yes we could build a helper class that has value members with all the right info. But then we are paying some nonzero cost to extract/compute those values (e.g., size of a ref-addr) when constructing the class, but most of the time we won't actually need all that. And remember that with all the new indirection in v5, we are talking about handles on quite a few different sections that we'd need to copy over into the new helper, on the off chance that we'd need them. Or make the constructor a lot more complicated, smart enough to work out which pieces it needs based on the Form... If we really wanted to get radical, DWARFFormValue should be handed the DWARFContext rather than the DWARFUnit, along with the few bits of unit-specific info (32/64 format, version, address size). Then we query the context as needed, instead of the unit. That minimizes copying while retaining the unit/line-table flexibility. I'm open to suggestions. |
82 ↗ | (On Diff #98864) | Seems to me I did get warnings or something when not all parameters were documents. |
lib/DebugInfo/DWARF/DWARFDebugLine.cpp | ||
143–144 ↗ | (On Diff #98864) | Huh. No, I missed that, I will add a test. |
lib/DebugInfo/DWARF/DWARFFormValue.cpp | ||
135 ↗ | (On Diff #98864) | Hmm so a release build would get a seg fault instead of a more intelligent error? I can do what you said but an actual diagnostic instead of a seg fault seems pretty cheap. |
include/llvm/DebugInfo/DWARF/DWARFFormValue.h | ||
---|---|---|
77–85 ↗ | (On Diff #98864) | r303214. |
lib/DebugInfo/DWARF/DWARFContext.cpp | ||
---|---|---|
193–196 ↗ | (On Diff #98864) | There are no CU properties relevant to a DWO v5 line table. The only reason we need a CU is because it is a convenient way to relay handles on the other sections that a v5 line table header might need (.debug_str etc). So it doesn't matter whether the CU is related to the line table. It's quite easy to argue that, therefore, we shouldn't pass a CU down at all, but something else that provides the same information. I thought about that for a while, and decided that if we do go that route then probably it should be done as a separate commit, because all the existing APIs are expecting a CU. |
lib/DebugInfo/DWARF/DWARFContext.cpp | ||
---|---|---|
193–196 ↗ | (On Diff #98864) | Ah, right - well, given the point was to make debug_line standalone, easier to preserve while stripping other things (if I recall/understand correctly) that seems like a good goal. It should be possible to dump a file that doesn't contain a debug_info section/any CUs and only line tables/debug_str/etc. As for ordering - I'm not too fussed either way. I'd probably/ideally have staged the work to separate the needed properties out of the CU abstractions first, but since we're here I can't see any reason to make that a necessity for this work. |
Make the helper class be copyable values rather than a shared_ptr.
Add a test for DWARF-64 line-table header.
Use assert() instead of unreachable in some cases.
I'd probably/ideally have staged the work to separate the needed properties out of the CU abstractions first,
Yeah, well, I didn't come to terms with that direction until I was really pretty invested in this piece. But hey, if I knew what the endpoint was up front, it wouldn't be iterative development.
Sadly, what has struck me just this minute (after posting the diff of course) is that what's currently the nested FormValueInfo class is probably going to have to be factored out on its own so callers can DTRT and then hand over a constructed helper to DWARFFormValue. Oh well.
include/llvm/DebugInfo/DWARF/DWARFFormValue.h | ||
---|---|---|
42 ↗ | (On Diff #99368) | = default |
58–66 ↗ | (On Diff #99368) | I'd consider keeping the ctors, dropping these Init functions, and where you have: x.Init(...); use: x = FormValueInfo(...); perhaps? since it's simple to copy/move, etc. Seems like less API surface, surprises (no/less chance that Init leaves some stale state, etc), etc :) |
86 ↗ | (On Diff #99368) | The other point I was making here was that maybe the members of FormValueInfo could be members of DWARFFormValue directly. I see there are a few cases where FormValueInfo objects are constructed separately from a DWARFFormValue - but they're all /inside/ a DWARFFormValue member function if I'm not mistaken. Why are those cases not using the FormValueInfo in the DWARFFormValue they're a member of? |
FYI I'm away at a conference next week and I might not get back to this until the week after.
include/llvm/DebugInfo/DWARF/DWARFFormValue.h | ||
---|---|---|
86 ↗ | (On Diff #99368) | Those cases are static member functions. it sounds like you would prefer each method to have several overloads for the different use-cases. I was thinking that FormValueInfo could be factored out into an independent class, constructed by the caller and then passed in, reducing the DWARFFormValue API surface (as there could be simply one copy of skipValue and getFixedByteSize instead of several of each). |
include/llvm/DebugInfo/DWARF/DWARFFormValue.h | ||
---|---|---|
58–66 ↗ | (On Diff #99368) | Still curious about this. Also this class doesn't seem to provide any encapsulation, so it's weird that it has init/ctor like behavior - since there's no invariants that can be held, etc. Should it have a narrower surface area? Or maybe wider - perhaps a couple of functions that return this struct? With names that make clear their use case (the two different ways of using this function that represent different situations, etc)? Will those use cases be unified in some way? Sorry, I'm still a bit fuzzy on the details/goals/end state. |
86 ↗ | (On Diff #99368) | I'm not sure what I'd prefer - mostly trying to understand the current motivations/use cases/etc. Thanks for pointing out/explaining the details. With regard to one function that takes a struct and behaves differently depending on the struct, versus two functions that behave differently/have different parameters: I'd tend to prefer not to make a decision dynamic when it's static (ie: if all the callers know, statically, which behavior they want - no need to make the caller do dynamic work (yeah, the inliner can fix this, but from a legibility standpoint too, it seems simpler)). But this doesn't always apply - if the callee does mostly common work and somewhere deep inside does one of two different things based on those callers, then yeah, it may make sense to have a single function with some kind of dynamic parameter. |
Sorry, I'm still a bit fuzzy on the details/goals/end state.
Yes, well. DWARFFormValue is a bit slippery, partly because forms are a bit slippery, and partly because it seems DWARFFormValue was a convenient place to stuff miscellaneous and not entirely related queries about forms.
Users of DWARFFormValue tend to want one of three things: (a) the data value described by the form, as extracted from a DataExtractor, and possibly from another section; (b) the *size* of the data value described by the form, because they want to skip over it; or (c) whether the size of a data value that *could* be described by the form is fixed-size, and if so, what size is that. The latter case comes up when parsing an abbrev table on its own, some consumers like to know ahead of time which abbrevs describe DIEs that are fixed-size and how big those are.
This gets trickier because sometimes the data value depends on external information (e.g. relocations) and sometimes the data value has a fixed size but that size depends on external information (e.g. the DWARF version or 32/64 format). Up through DWARF v4, all of that external information was available through the relevant DWARFUnit, because forms always described data from a Unit, and so passing a DWARFUnit pointer in to various methods was enough to hide all the details about bits and pieces from the callers.
Now in DWARF v5, forms become even more exciting, because sometimes the data value depends on even more external information (e.g. yet another object-file section such as the string-offset section or the address table). And when we are looking at forms describing information in the .debug_line section, the DWARF version and 32/64 format come from the line-table header not the Unit, and the relevant relocations are .debug_line relocations not .debug_info relocations. When we're dumping a .debug_line section in dwarfdump, really we shouldn't be relying on a Unit at all.
So, I am trying to package up the information that DWARFFormValue needs in order to answer the various queries that it gets, without DWARFFormValue itself needing to understand whether it is looking at .debug_info or .debug_line (it should be given the relevant info by its callers), and without making DWARFFormValue's API look too ridiculous.
Provide DWARFDebugLine parsers with the DWARFContext, not just a relocation map.
Add a second overload of DWARFFormValue::extractValue() for use by the line-table parser; both overloads call a common private implementation.
This allows the v5 line-table header to use DW_FORM_strp.
Ah, right, thanks for explaining/refreshing all that.
seems like anywhere that still has a DWARFCompileUnit in the DWARFFormValue is a possible bug waiting to happen for non-Unit uses of forms? I think it'd be good to port it all over so it's not in this hybrid/partially supported state. Perhaps the best thing would be if FormSizeHelper (hmm, wonder if there's a better name for that) had an implicit ctor from DWARFCompileUnit so DWARFFormValue could have one API in terms of FormSizeHelper only & callers can either construct the pieces they need, or have it constructed from a DWARFCompileUnit.
llvm/lib/DebugInfo/DWARF/DWARFFormValue.cpp | ||
---|---|---|
333 | Maybe reverse this to reduce indentation/early exit: if (!CU) return extractValueImpl(Data, OffsetPtr, nullptr); U = CU; ... | |
334 | The existence of this 'U' member makes it look like the dumping and other code may not work? Since the FormSizeHelper infrastructure won't be used in those cases. That seems problematic/inconsistent. | |
655–658 | This sort of fallback seems error prone or confusing. Are there cases where the first part is necessary? If so, it seems like there would be places where the second/new part is incorrect? |
I'm not super excited about the hybrid form, but always unpacking a Unit inside the DWARFFormValue methods felt kind of expensive.
I have, over the weeks, contemplated inventing a fake kind of Unit for the line table, which would have all the Right Stuff in it to let DWARFFormValue do its thing. This of course is getting away from the hybrid in the opposite way--make callers provide a Unit-like object, instead of making most callers unpack a Unit into some other kind of info/helper class. Building a fake Unit feels a little icky, and I think might not be especially cheap, but as it's only once per line-table it might be okay... I've been resisting it because a line table is-not-a Unit, what is your opinion? (Am I making sense?)
llvm/lib/DebugInfo/DWARF/DWARFFormValue.cpp | ||
---|---|---|
655–658 | I had discovered that the first part is necessary for .dwo cases (~10 tests), because of things like .debug_types.dwo wanting to get its strings from .debug_str.dwo not .debug_str, and so forth. The dumper constructs a Unit that has either the .dwo or non-.dwo section handles in it, so that getStringExtractor will point to the correct section. I can't get that directly out of the Context unless DWARFFormValue knows a whole lot more about its environment than I would really like. |
By passing a DWARFUnit into the line-table parser (if we have one), we allow it to handle indirect string forms, which require finding other sections. Previously the line-table parser couldn't do that (and didn't need to, prior to v5).
By passing DWARFFormParams to DWARFFormValue::extractValue() separately from the DWARFUnit, we allow interpreting forms according to the line-table's version/format, which could be different from the Unit's version/format.
This doesn't do the .dwo part yet, I'd like to leave that for a follow-up.
lld/ELF/InputFiles.cpp | ||
---|---|---|
80–83 ↗ | (On Diff #120642) | I don't fully understand this change, but it seems like if it was working before, Dwarf.getNumCompileUnits() >0 is guaranteed, no? |
llvm/test/DebugInfo/Inputs/dwarfdump-header-64.s | ||
---|---|---|
8 | Why do we need/want to check in the binary? Could it just be generated on the fly? It looks like this assembler file may be general enough to also assemble for am x86_64-apple-darwin triple so we could test Mach-O support, too? |
lld/ELF/InputFiles.cpp | ||
---|---|---|
80–83 ↗ | (On Diff #120642) | No. If you have no debug info, getNumCompileUnits() == 0, but we still go through this path. When we discover there's no debug info, lld falls back to using FILE symbols instead. |
llvm/test/DebugInfo/Inputs/dwarfdump-header-64.s | ||
8 | I thought checked-in binaries were usual; for example the DWARF32 version of the test is a checked-in binary. Generating the binary on the fly has a small test-time performance cost. |
llvm/test/DebugInfo/Inputs/dwarfdump-header-64.s | ||
---|---|---|
8 | We don't have a clear policy for binary files. My opinion is that we should avoid binaries where a sufficiently reliable alternative exists, to make it easier to modify and maintain testcases in the future. So unless we need to become independent from variability introduced by llvm-mc, I think just checking in the assembler sources is the way to go. A counterexample for when binaries are necessary are the LLVM bitcode compatibility tests. For completeness: another alternative is yam2obj, but it is even more low-level than assembler and in my experience too verbose for DWARF tests.
Good point, I forgot about those. Half-serious: we could run the C preprocessor over the assembler file to support alternate section names guarded by #ifdefs. |
llvm/test/DebugInfo/Inputs/dwarfdump-header-64.s | ||
---|---|---|
8 | I am pretty sure that with a checked-in binary, you don't need to build the right target architecture to be able to dump it. For sure assembling the file at test time will depend on having llvm-mc built with the correct target. But overall I think being able to show the cross-section stuff works for both Mach-O and ELF is probably more important than being able to run an ELF-only test on bots that don't build X86. So I will proceed with trying to find a reasonable way to make the assembly source work for both ELF and Mach-O. |
Thanks, that sounds great! As far as I know there are only very few bots that don't build the X86 backend at the moment so I wouldn't be too concerned about that.
Make the new test work for both ELF and Mach-O.
Simplify LLD change to never pass a Unit, as there's no test for it yet.
Ping.
I left the existing test (dwarfdump-header.s) as a checked-in binary, because it tests both normal and split headers, and I don't know what MachO section names to use for the .dwo sections. My intuition is that MachO will never bother supporting split DWARF so maybe it's fine to leave it as is. If somebody does want me to convert the test, I'll need a list of section names to use, and I would do it as a follow-up. Just thought I'd put that out there.
Thanks for converting the other testcase.
I left the existing test (dwarfdump-header.s) as a checked-in binary, because it tests both normal and split headers, and I don't know what MachO section names to use for the .dwo sections.
There is no need to test split DWARF + MachO since that combination is neither tested nor specified anywhere.
Why can't the other test be ELF-only assembler though? Otherwise this LGTM now.
llvm/test/DebugInfo/X86/dwarfdump-header-64.s | ||
---|---|---|
18 ↗ | (On Diff #121087) | Nice, thank you! |
Maybe reverse this to reduce indentation/early exit: