This is an archive of the discontinued LLVM Phabricator instance.

Emit DWARF info for all code section in an assembly file
ClosedPublic

Authored by olista01 on Feb 5 2014, 2:52 AM.

Details

Summary

Currently, when using llvm as an assembler, DWARF debug information is only
generated for the .text section. This patch modifies this so that DWARF info
is emitted for all executable sections.

Diff Detail

Event Timeline

olista01 updated this revision to Unknown Object (????).Feb 11 2014, 5:45 AM

Added -gdwarf-N options to llvm-mc.

Added error if trying to compile an asm file with multiple sections using -gdwarf-2.

Se also D2737, for clang part adding -gdwarf-N options to cc1as.

rengolin added inline comments.Feb 13 2014, 5:51 AM
tools/llvm-mc/llvm-mc.cpp
412 ↗(On Diff #6982)

This will give preference to Dwarf 4 irrespective of how many other flags are in the command line, even if they're after the -gdwarf-4.

olista01 updated this revision to Unknown Object (????).Feb 13 2014, 7:02 AM

I couldn't find any way to accept the last -gdwarf-N option in llvm-mc (without using clang's argument parsing), but this change makes it so that only one -gdwarf-N option can be used.

Hi Oliver,

I can't comment on the validity of taking only one option.

I think it's rather limiting, but this is an internal tool, so it should be fine.

I'll let others review the code.

cheers,
--renato

I think that only allowing a single -gdwarf-N option to llvm-mc seems sensible to me, until a specific requirement to support mulitple instances is raised (and the work to support it can then be implemented when there is a specific requirement).

The code to support describing mulitple code sections in DWARF looks OK to me.

Thanks Keith,

Oliver, LGTM.

--renato

rengolin accepted this revision.Feb 25 2014, 6:57 AM

What do you mean by a single option? What won't work?

Eric,

Things like:

$ llvm-mc -gdwarf-2 -gdwarf-3 -gdwarf-4

llvm-mc doesn't have the tools Clang has for dealing with multiple contradictory compiler flags. Since llvm-mc is an internal tool, we don't need to deal with it, so just make it an error.

cheers,
--renato

Rename the option then. llvm-mc isn't attempting to be compatible with (or used as) an assembler interface except for testing. Just make it -dwarf-version or something.

Haven't had a chance to check the rest of the patch yet.

That's another option, yes. I don't mind either way.

Added some comments. Thanks!

include/llvm/MC/MCContext.h
25

SetVector here perhaps? Stable iteration is important, but I'm not sure how many we expect here. I.e. are we expecting the full pain of large -ffunction-sections C++ or hand-written code?

138

debug_ranges yes?

383

80-columns?

384–389

We never check for NULL at the use. Either we should assert in the lookup routine or in the use, but we should check somehow.

lib/MC/MCDwarf.cpp
567–568

If we're using this for ranges and aranges then we should just pull it out into a separate "finalizeSections" routine or some such.

585–586

Update comment.

611–625

Needs a full sentence here :)

657

EmitInt16.

691

LineSectionSymbol?

806

Use consistent variable naming please.

885–886

Unrelatedly this seems like it should be near the top.

echristo added inline comments.Feb 25 2014, 5:03 PM
lib/MC/MCDwarf.cpp
897

Might want to make these two routines member functions rather than static and just have the variables be member variables. It should clean up a few conditionals in the patch.

912

Reword? Seems confusing as written.

916–918

Comment please.

lib/MC/MCParser/AsmParser.cpp
1583–1584

"for the current section and it contains code"?

1586–1587

Maybe just check if the section is a text section instead of looking up in the set?

lib/MC/MCParser/ELFAsmParser.cpp
548–565

Formatting seems odd. clang-format?

554

Formatting.

test/MC/ARM/dwarf-asm-multiple-sections.s
56 ↗(On Diff #7059)

Check the relocations in the info section?

test/MC/ARM/dwarf-asm-nonstandard-section.s
44 ↗(On Diff #7059)

Same here.

test/MC/ARM/dwarf-asm-single-section.s
44 ↗(On Diff #7059)

Ditto.

test/MC/ARM/ldr-pseudo.s
19 ↗(On Diff #7059)

What's going on here?

test/MC/ARM/ltorg.s
16 ↗(On Diff #7059)

Ditto.

tools/llvm-mc/llvm-mc.cpp
156 ↗(On Diff #7059)

Let's just use a different command line option where you can set it. There's no need to make it identical to an assembler.

I've replied to the patch in phabricator.

olista01 added inline comments.Feb 26 2014, 9:53 AM
include/llvm/MC/MCContext.h
25

This set should stay small for sane use-cases. This code is only used when the input is an assembly file, there is a separate code path if we are generating native code from IR.

Done.

138

Both. Done.

383

Done

384–389

Done

lib/MC/MCDwarf.cpp
567–568

Done

585–586

Done

611–625

Done

657

EmitInt16 is a method of AsmPrinter, and we are not outputting assembly here. Every other place in this file uses EmitIntValue.

691

Done

806

Done

885–886

Done

897

Everything in this class is currently static, and there doesn't seem to be a big enough advantage to make the change.

912

Done

916–918

Done

lib/MC/MCParser/AsmParser.cpp
1583–1584

We are emitting an instruction, so the section contains code (or will do very soon).

1586–1587

Ditto

lib/MC/MCParser/ELFAsmParser.cpp
548–565

Done

554

Done

test/MC/ARM/dwarf-asm-multiple-sections.s
56 ↗(On Diff #7059)

Done

test/MC/ARM/dwarf-asm-nonstandard-section.s
44 ↗(On Diff #7059)

Done

test/MC/ARM/dwarf-asm-single-section.s
44 ↗(On Diff #7059)

Done

test/MC/ARM/ldr-pseudo.s
19 ↗(On Diff #7059)

These tests were broken by the extra symbol inserted to reference the beginning of each section, which changed anonymous symbol names.

test/MC/ARM/ltorg.s
16 ↗(On Diff #7059)

Ditto

tools/llvm-mc/llvm-mc.cpp
156 ↗(On Diff #7059)

We don't have to make it identical to the assembler, but I don't see a good reason to have a different option to clang, either. As it is now, we behave the exact same way as clang if the option is given 0 or 1 times, and give a sensible error message if it is given more than 1 time ("llvm-mc: for the -gdwarf-3 option: may only occur zero or one times").

olista01 updated this revision to Unknown Object (????).Feb 26 2014, 9:56 AM

Things mentioned in inline comments, plus:

  • Don't check DWARF version if we are not generating DWARF
  • Fix tests to not be dependent on length of compilation working dir
  • Fix tests to match changes in output format of llvm-dwarfdump
olista01 requested a review of this revision.Mar 10 2014, 11:48 AM

Ping?

Sorry, been trying to get to this. As you can see Dave is working in
the same file so you'll likely need to rebase your patch. I'd like to
get the refactoring I mentioned done first.

Let's split the testcase generalization out, and I really don't want
to add -gdwarf-N to llvm-mc.

-eric

echristo requested changes to this revision.Mar 12 2014, 3:50 PM
echristo added inline comments.
lib/MC/MCDwarf.cpp
657

Seems sad. I guess that can be a refactor.

olista01 updated this revision to Unknown Object (????).Mar 13 2014, 8:44 AM

Rebased onto current trunk.

  • Only minor changes needed, correcting offsets in the tests

Let's split the testcase generalization out

Done, see new dependency

I really don't want to add -gdwarf-N to llvm-mc

Why is that? I understand that we don't _have_ to maintain compatibility with clang, but I can't see a good reason not to. This way, the same options are accepted for the common cases (zero or one uses of -gdwarf-N), and an understandable error is thrown in the more complex case of multiple uses or it.

Seems sad. I guess that can be a refactor. (EmitIntValue -> EmitInt16)

Yes, but that would be a separate patch.

Hi Oliver,

Independent of the other comments by Eric, I agree that -dwarf-version N would be simpler to implement in this case. The main point here is to provide the functionality, not command-line compatibility.

cheers,
--renato

olista01 updated this revision to Unknown Object (????).Mar 17 2014, 5:16 AM

Changed to make llvm-mc accept -dwarf-version N rather than -gdwarf-N.

rengolin accepted this revision.Mar 19 2014, 10:12 AM

Hi Oliver,

This looks good to me.

--renato

echristo: ping?

echristo accepted this revision.Mar 31 2014, 5:01 PM

This patch LGTM.

olista01 updated this revision to Unknown Object (????).Apr 2 2014, 9:26 AM

After rebasing, one test has started failing: test/MC/ELF/gen-dwarf.s. This was because we do not track which sections have had instructions emitted when we are emitting assembly (as opposed to object files). My patch was using this information to avoid emitting DWARF information for empty sections. Because of this, I have modified my patch to emit DWARF for all sections.

Could you pull the patch setting the dwarf version for the compile unit out into a separate patch? It seems pretty easily separable.

Some additional comments inline as well.

Thanks for all of the work here, it's shaping up nicely!

-eric

include/llvm/MC/MCContext.h
138

Do you mean debug_aranges or debug_ranges? You use aranges here and then DW_AT_ranges below.

386

Function names:

http://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly

Also these functions appear to be a bit heavyweight to be defined inline. Bring them out of line?

lib/MC/MCDwarf.cpp
514

Is the argument necessary or can you use the same lookup you do in other places?

523–528

Same comment on ranges as below.

690

You still need a DW_AT_low_pc as the offset for the for the ranges section. Take a look at the code that does this for compile units in DwarfDebug.cpp (as of right now line 986) for a bit of an explanation.

800

I don't think we're likely to ever emit more than one range list for assembly files are we?

tools/llvm-mc/llvm-mc.cpp
157 ↗(On Diff #8304)

3 is a bit odd in that it's not the same as the defaults for any target :)

General comments:

Test coverage:

  1. a test case with more than one section, but only one section with instructions (it'd be nice for this not to use ranges, but just high_pc/low_pc)
  2. a test case with more than one section, but at least two sections with instructions and at least one section without (just to make sure we can omit those sections while still including a ranges section with the other sections)

The refactoring into a single data structure containing a mapping from section to start/end symbols would be nice.

Probably a bridge too far, but this starts to look almost, sort of, like what we'd like to have for source level debugging handling too. It might be nice to consider a generic range list representation that can handle all the ranges and emission. Though source level debugging needs the concept of multiple ranges even in a single section, which is an added complication compared to this code.

include/llvm/MC/MCContext.h
139

Having 3 data structures all with the same key and related values is a bit of a pity. Could we get a struct (or even just a pair) of start/end symbol and have one map to that pair?

I assume "GenDwarfSections" is then just the key set from that map, right? So you don't need a separate data structure for that as well, you can just use the presence/absence in the map

Also - I assume the iteration order over the keys is important for stability of the output of the ranges section, so this might need to be a MapVector rather than just a DenseMap?

382

With the suggested data structure changes above, these get/set functions probably just turn into one or two functions (two if you need const overloads, one if you don't) that, given an MCSection* returns the pair/structure containing the start and end.

386

I think we can move to using nullptr these days - could just fix this in all places in your patch, but not strictly necessary I suppose.

391

If you do the "hasInstructions" check here then you could avoid creating end symbols and just remove the elements from the container entirely - that would address the issue of creating ranges with only one range later on, right?

lib/MC/MCDwarf.cpp
573

Perhaps rather than creating end labels, etc, for sections that don't have any instructions anyway, and using another data structure, we could just remove the element from the Sections set/map immediately - then iterate the map later on?

700

Would this break on an empty asm file (or one with only data, perhaps?)?

811

This SwitchSection call seems bogus/dangerous - if we ever leave the loop via the continue on the next line, then we'll emit the range terminator into the section we were just looking at, not the range section.

812

So this code would result in us possibly emitting a ranges section with only one range/section, right? (if there were multiple sections but only one turns out to have instructions)

That can probably be tidied up later, though.

852

Again, seems easier to just remove the sections that aren't relevant (& I'm not sure why this pass and the other one that checks for instructions and builds another data structure and closes ranges is separate - perhaps once the sections are just removed there won't be a need for two passes)

858

Seems like this could go in separately - if we have null begin/end data today (when we track only one range) we should do the same thing, right? Be good to just add that check (then rephrase it to the empty test here in the following commit) and test case first, possibly, maybe...

lib/MC/MCParser/ELFAsmParser.cpp
553

Rather than doing a count and then an insertion, you should just be able to insert and use the bool in the returned iterator/bool pair to determine if this is the first time the section has been added to the list.

557

What does GCC do here? (silently fall back to just describing the .text section?)

560

Shouldn't this condition be the same as the one above (ie: if the element isn't in the sections list it also doesn't have a start sym?) Seems unnecessary to do two lookups. (& based on the suggested data structure change this would be one insertion, check whether the insertion succeeded (because the element wasn't already present), then provide the necessary error and/or create the start symbol)

pair = sections.insert(make_pair(section, range()));
if (pair.second) {

if (dwarf <= 2 && sections > 1)
  error
start = createTempSymbol
emit(start);
pair.first->start = start;

}

Something like that anyway.

olista01 updated this revision to Unknown Object (????).Apr 10 2014, 8:24 AM

You still need a DW_AT_low_pc as the offset for the for the ranges section.

I don't think we need DW_AT_low_pc when the input is assembly, as it is optional in the DWARF spec and we do not emit any debug info which could use it.

The two reasons to include DW_AT_low_pc as well as DW_AT_ranges are:

  • Specifying the default base address for the .debug_ranges section. We do not need this as we have a base address entry for each section in .debug_ranges, so the default never gets used.
  • Specifying the default base address for the .debug_loc section. We do not need this as we never emit the .debug_loc section for an assembly file.

What does GCC do here?

GCC seems to do the same as I am doing:

  • If no sections contain code, don't emit any debug info (not even .debug_abbrev)
  • If exactly one section contains code, use low_pc and high_pc
  • Otherwise, use .debug_ranges

Looks pretty good. I'd split the DwarfVersion out of this commit and go ahead and put that in separately.

The rest of the patch looks good outside of the comments and the discussion about address base that is going on.

Thanks for all of the work!

lib/MC/MCContext.cpp
32

Extraneous whitespace addition.

342

Comment what's going on in the function.

345

auto?

lib/MC/MCDwarf.cpp
657

This can go in separately yes?

lib/MC/MCParser/ELFAsmParser.cpp
552

Think this can probably go in separately along with the refactor you've done for the section handling. If it doesn't make sense don't worry about it.

olista01 updated this revision to Unknown Object (????).Apr 16 2014, 5:49 AM

I have split out the setting of the dwarf version into D3399.

As far as the Address Base:

That's weird. It's strictly larger to do it this way and using an offset of 0 is the same effect for this case and what gcc does if you _don't_ just assemble it. (since, of course, it's actually gas making this decision).

So I don't see what using an offset gets us in this case versus using DW_AT_low_pc other than bit identical output to gas.

Sorry, I let this patch slip off my radar, but I'm still interested in progressing it.

So I don't see what using an offset gets us in this case versus using DW_AT_low_pc other than bit identical output to gas.

If I understand it correctly, you are suggesting using the DW_AT_low_pc as the base address for the first range, and a base address selection entry to set the base address of each subsequent range (each range covers one code section). This seems like unnecessary complexity for the sake of 4 bytes. Also, the size of the abbrev will be increased by adding DW_AT_low_pc, reducing the size difference.

olista01 closed this revision.Jun 19 2014, 9:01 AM

Thanks, committed revision 211273.

dblaikie edited edge metadata.Sep 8 2014, 8:12 AM

Sounds like this message caused a bug:
http://llvm.org/bugs/show_bug.cgi?id=20879

Via Old World