Currently, when using llvm as an assembler, DWARF debug information is only
generated for the .text section. This patch modifies this so that DWARF info
is emitted for all executable sections.
Details
Diff Detail
Event Timeline
Added -gdwarf-N options to llvm-mc.
Added error if trying to compile an asm file with multiple sections using -gdwarf-2.
Se also D2737, for clang part adding -gdwarf-N options to cc1as.
tools/llvm-mc/llvm-mc.cpp | ||
---|---|---|
412 ↗ | (On Diff #6982) | This will give preference to Dwarf 4 irrespective of how many other flags are in the command line, even if they're after the -gdwarf-4. |
I couldn't find any way to accept the last -gdwarf-N option in llvm-mc (without using clang's argument parsing), but this change makes it so that only one -gdwarf-N option can be used.
Hi Oliver,
I can't comment on the validity of taking only one option.
I think it's rather limiting, but this is an internal tool, so it should be fine.
I'll let others review the code.
cheers,
--renato
I think that only allowing a single -gdwarf-N option to llvm-mc seems sensible to me, until a specific requirement to support mulitple instances is raised (and the work to support it can then be implemented when there is a specific requirement).
The code to support describing mulitple code sections in DWARF looks OK to me.
Eric,
Things like:
$ llvm-mc -gdwarf-2 -gdwarf-3 -gdwarf-4
llvm-mc doesn't have the tools Clang has for dealing with multiple contradictory compiler flags. Since llvm-mc is an internal tool, we don't need to deal with it, so just make it an error.
cheers,
--renato
Rename the option then. llvm-mc isn't attempting to be compatible with (or used as) an assembler interface except for testing. Just make it -dwarf-version or something.
Haven't had a chance to check the rest of the patch yet.
Added some comments. Thanks!
include/llvm/MC/MCContext.h | ||
---|---|---|
25 | SetVector here perhaps? Stable iteration is important, but I'm not sure how many we expect here. I.e. are we expecting the full pain of large -ffunction-sections C++ or hand-written code? | |
138 | debug_ranges yes? | |
383 | 80-columns? | |
384–389 | We never check for NULL at the use. Either we should assert in the lookup routine or in the use, but we should check somehow. | |
lib/MC/MCDwarf.cpp | ||
567–568 | If we're using this for ranges and aranges then we should just pull it out into a separate "finalizeSections" routine or some such. | |
585–586 | Update comment. | |
611–625 | Needs a full sentence here :) | |
657 | EmitInt16. | |
691 | LineSectionSymbol? | |
806 | Use consistent variable naming please. | |
885–886 | Unrelatedly this seems like it should be near the top. |
lib/MC/MCDwarf.cpp | ||
---|---|---|
897 | Might want to make these two routines member functions rather than static and just have the variables be member variables. It should clean up a few conditionals in the patch. | |
912 | Reword? Seems confusing as written. | |
916–918 | Comment please. | |
lib/MC/MCParser/AsmParser.cpp | ||
1583–1584 | "for the current section and it contains code"? | |
1586–1587 | Maybe just check if the section is a text section instead of looking up in the set? | |
lib/MC/MCParser/ELFAsmParser.cpp | ||
548–565 | Formatting seems odd. clang-format? | |
554 | Formatting. | |
test/MC/ARM/dwarf-asm-multiple-sections.s | ||
56 ↗ | (On Diff #7059) | Check the relocations in the info section? |
test/MC/ARM/dwarf-asm-nonstandard-section.s | ||
44 ↗ | (On Diff #7059) | Same here. |
test/MC/ARM/dwarf-asm-single-section.s | ||
44 ↗ | (On Diff #7059) | Ditto. |
test/MC/ARM/ldr-pseudo.s | ||
19 ↗ | (On Diff #7059) | What's going on here? |
test/MC/ARM/ltorg.s | ||
16 ↗ | (On Diff #7059) | Ditto. |
tools/llvm-mc/llvm-mc.cpp | ||
156 ↗ | (On Diff #7059) | Let's just use a different command line option where you can set it. There's no need to make it identical to an assembler. |
include/llvm/MC/MCContext.h | ||
---|---|---|
25 | This set should stay small for sane use-cases. This code is only used when the input is an assembly file, there is a separate code path if we are generating native code from IR. Done. | |
138 | Both. Done. | |
383 | Done | |
384–389 | Done | |
lib/MC/MCDwarf.cpp | ||
567–568 | Done | |
585–586 | Done | |
611–625 | Done | |
657 | EmitInt16 is a method of AsmPrinter, and we are not outputting assembly here. Every other place in this file uses EmitIntValue. | |
691 | Done | |
806 | Done | |
885–886 | Done | |
897 | Everything in this class is currently static, and there doesn't seem to be a big enough advantage to make the change. | |
912 | Done | |
916–918 | Done | |
lib/MC/MCParser/AsmParser.cpp | ||
1583–1584 | We are emitting an instruction, so the section contains code (or will do very soon). | |
1586–1587 | Ditto | |
lib/MC/MCParser/ELFAsmParser.cpp | ||
548–565 | Done | |
554 | Done | |
test/MC/ARM/dwarf-asm-multiple-sections.s | ||
56 ↗ | (On Diff #7059) | Done |
test/MC/ARM/dwarf-asm-nonstandard-section.s | ||
44 ↗ | (On Diff #7059) | Done |
test/MC/ARM/dwarf-asm-single-section.s | ||
44 ↗ | (On Diff #7059) | Done |
test/MC/ARM/ldr-pseudo.s | ||
19 ↗ | (On Diff #7059) | These tests were broken by the extra symbol inserted to reference the beginning of each section, which changed anonymous symbol names. |
test/MC/ARM/ltorg.s | ||
16 ↗ | (On Diff #7059) | Ditto |
tools/llvm-mc/llvm-mc.cpp | ||
156 ↗ | (On Diff #7059) | We don't have to make it identical to the assembler, but I don't see a good reason to have a different option to clang, either. As it is now, we behave the exact same way as clang if the option is given 0 or 1 times, and give a sensible error message if it is given more than 1 time ("llvm-mc: for the -gdwarf-3 option: may only occur zero or one times"). |
Things mentioned in inline comments, plus:
- Don't check DWARF version if we are not generating DWARF
- Fix tests to not be dependent on length of compilation working dir
- Fix tests to match changes in output format of llvm-dwarfdump
Sorry, been trying to get to this. As you can see Dave is working in
the same file so you'll likely need to rebase your patch. I'd like to
get the refactoring I mentioned done first.
Let's split the testcase generalization out, and I really don't want
to add -gdwarf-N to llvm-mc.
-eric
lib/MC/MCDwarf.cpp | ||
---|---|---|
657 | Seems sad. I guess that can be a refactor. |
Rebased onto current trunk.
- Only minor changes needed, correcting offsets in the tests
Let's split the testcase generalization out
Done, see new dependency
I really don't want to add -gdwarf-N to llvm-mc
Why is that? I understand that we don't _have_ to maintain compatibility with clang, but I can't see a good reason not to. This way, the same options are accepted for the common cases (zero or one uses of -gdwarf-N), and an understandable error is thrown in the more complex case of multiple uses or it.
Seems sad. I guess that can be a refactor. (EmitIntValue -> EmitInt16)
Yes, but that would be a separate patch.
Hi Oliver,
Independent of the other comments by Eric, I agree that -dwarf-version N would be simpler to implement in this case. The main point here is to provide the functionality, not command-line compatibility.
cheers,
--renato
After rebasing, one test has started failing: test/MC/ELF/gen-dwarf.s. This was because we do not track which sections have had instructions emitted when we are emitting assembly (as opposed to object files). My patch was using this information to avoid emitting DWARF information for empty sections. Because of this, I have modified my patch to emit DWARF for all sections.
Could you pull the patch setting the dwarf version for the compile unit out into a separate patch? It seems pretty easily separable.
Some additional comments inline as well.
Thanks for all of the work here, it's shaping up nicely!
-eric
include/llvm/MC/MCContext.h | ||
---|---|---|
138 | Do you mean debug_aranges or debug_ranges? You use aranges here and then DW_AT_ranges below. | |
386 | Function names: http://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly Also these functions appear to be a bit heavyweight to be defined inline. Bring them out of line? | |
lib/MC/MCDwarf.cpp | ||
514 | Is the argument necessary or can you use the same lookup you do in other places? | |
523–528 | Same comment on ranges as below. | |
690 | You still need a DW_AT_low_pc as the offset for the for the ranges section. Take a look at the code that does this for compile units in DwarfDebug.cpp (as of right now line 986) for a bit of an explanation. | |
800 | I don't think we're likely to ever emit more than one range list for assembly files are we? | |
tools/llvm-mc/llvm-mc.cpp | ||
157 ↗ | (On Diff #8304) | 3 is a bit odd in that it's not the same as the defaults for any target :) |
General comments:
Test coverage:
- a test case with more than one section, but only one section with instructions (it'd be nice for this not to use ranges, but just high_pc/low_pc)
- a test case with more than one section, but at least two sections with instructions and at least one section without (just to make sure we can omit those sections while still including a ranges section with the other sections)
The refactoring into a single data structure containing a mapping from section to start/end symbols would be nice.
Probably a bridge too far, but this starts to look almost, sort of, like what we'd like to have for source level debugging handling too. It might be nice to consider a generic range list representation that can handle all the ranges and emission. Though source level debugging needs the concept of multiple ranges even in a single section, which is an added complication compared to this code.
include/llvm/MC/MCContext.h | ||
---|---|---|
139 | Having 3 data structures all with the same key and related values is a bit of a pity. Could we get a struct (or even just a pair) of start/end symbol and have one map to that pair? I assume "GenDwarfSections" is then just the key set from that map, right? So you don't need a separate data structure for that as well, you can just use the presence/absence in the map Also - I assume the iteration order over the keys is important for stability of the output of the ranges section, so this might need to be a MapVector rather than just a DenseMap? | |
382 | With the suggested data structure changes above, these get/set functions probably just turn into one or two functions (two if you need const overloads, one if you don't) that, given an MCSection* returns the pair/structure containing the start and end. | |
386 | I think we can move to using nullptr these days - could just fix this in all places in your patch, but not strictly necessary I suppose. | |
391 | If you do the "hasInstructions" check here then you could avoid creating end symbols and just remove the elements from the container entirely - that would address the issue of creating ranges with only one range later on, right? | |
lib/MC/MCDwarf.cpp | ||
573 | Perhaps rather than creating end labels, etc, for sections that don't have any instructions anyway, and using another data structure, we could just remove the element from the Sections set/map immediately - then iterate the map later on? | |
700 | Would this break on an empty asm file (or one with only data, perhaps?)? | |
811 | This SwitchSection call seems bogus/dangerous - if we ever leave the loop via the continue on the next line, then we'll emit the range terminator into the section we were just looking at, not the range section. | |
812 | So this code would result in us possibly emitting a ranges section with only one range/section, right? (if there were multiple sections but only one turns out to have instructions) That can probably be tidied up later, though. | |
852 | Again, seems easier to just remove the sections that aren't relevant (& I'm not sure why this pass and the other one that checks for instructions and builds another data structure and closes ranges is separate - perhaps once the sections are just removed there won't be a need for two passes) | |
858 | Seems like this could go in separately - if we have null begin/end data today (when we track only one range) we should do the same thing, right? Be good to just add that check (then rephrase it to the empty test here in the following commit) and test case first, possibly, maybe... | |
lib/MC/MCParser/ELFAsmParser.cpp | ||
553 | Rather than doing a count and then an insertion, you should just be able to insert and use the bool in the returned iterator/bool pair to determine if this is the first time the section has been added to the list. | |
557 | What does GCC do here? (silently fall back to just describing the .text section?) | |
560 | Shouldn't this condition be the same as the one above (ie: if the element isn't in the sections list it also doesn't have a start sym?) Seems unnecessary to do two lookups. (& based on the suggested data structure change this would be one insertion, check whether the insertion succeeded (because the element wasn't already present), then provide the necessary error and/or create the start symbol) pair = sections.insert(make_pair(section, range())); if (dwarf <= 2 && sections > 1) error start = createTempSymbol emit(start); pair.first->start = start; } Something like that anyway. |
You still need a DW_AT_low_pc as the offset for the for the ranges section.
I don't think we need DW_AT_low_pc when the input is assembly, as it is optional in the DWARF spec and we do not emit any debug info which could use it.
The two reasons to include DW_AT_low_pc as well as DW_AT_ranges are:
- Specifying the default base address for the .debug_ranges section. We do not need this as we have a base address entry for each section in .debug_ranges, so the default never gets used.
- Specifying the default base address for the .debug_loc section. We do not need this as we never emit the .debug_loc section for an assembly file.
What does GCC do here?
GCC seems to do the same as I am doing:
- If no sections contain code, don't emit any debug info (not even .debug_abbrev)
- If exactly one section contains code, use low_pc and high_pc
- Otherwise, use .debug_ranges
Looks pretty good. I'd split the DwarfVersion out of this commit and go ahead and put that in separately.
The rest of the patch looks good outside of the comments and the discussion about address base that is going on.
Thanks for all of the work!
lib/MC/MCContext.cpp | ||
---|---|---|
32 | Extraneous whitespace addition. | |
342 | Comment what's going on in the function. | |
345 | auto? | |
lib/MC/MCDwarf.cpp | ||
657 | This can go in separately yes? | |
lib/MC/MCParser/ELFAsmParser.cpp | ||
552 | Think this can probably go in separately along with the refactor you've done for the section handling. If it doesn't make sense don't worry about it. |
As far as the Address Base:
That's weird. It's strictly larger to do it this way and using an offset of 0 is the same effect for this case and what gcc does if you _don't_ just assemble it. (since, of course, it's actually gas making this decision).
So I don't see what using an offset gets us in this case versus using DW_AT_low_pc other than bit identical output to gas.
Sorry, I let this patch slip off my radar, but I'm still interested in progressing it.
So I don't see what using an offset gets us in this case versus using DW_AT_low_pc other than bit identical output to gas.
If I understand it correctly, you are suggesting using the DW_AT_low_pc as the base address for the first range, and a base address selection entry to set the base address of each subsequent range (each range covers one code section). This seems like unnecessary complexity for the sake of 4 bytes. Also, the size of the abbrev will be increased by adding DW_AT_low_pc, reducing the size difference.
SetVector here perhaps? Stable iteration is important, but I'm not sure how many we expect here. I.e. are we expecting the full pain of large -ffunction-sections C++ or hand-written code?