This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Generate DWARF v5 Accelerator Tables
ClosedPublic

Authored by labath on Feb 14 2018, 5:20 AM.

Details

Summary
This patch adds a DwarfAccelTableEmitter class, which generates an
accelerator table, as specified in DWARF v5 standard. At the moment it
only generates a DIE offset column and (if we are indexing more than one
compile unit) a CU column.

Indexing type units is not currently supported, as we don't even have
the ability to generate DWARF v5-compatible compile units.

The implementation is not data-source agnostic like the one generating
apple tables. This was not necessary as we currently only have one user
of this code, and without a second user it was not obvious to me how to
best abstract this. (The difference between these tables and the apple
ones is that they need a lot more metadata about the debug info they are
indexing).

The generation is triggered by the --dwarf-accel-tables argument, which
I've changed from a simple on/off switch into an enum, so one can choose
what kind of tables to generate.

This is tested by parsing the generated tables with llvm-dwarfdump and
I've also verified that GNU readelf is able to make sense of them. I
intend to do more testing when I implement the DWARF verifier for these
tables.

Diff Detail

Repository
rL LLVM

Event Timeline

labath created this revision.Feb 14 2018, 5:20 AM
labath updated this revision to Diff 137340.Mar 7 2018, 2:36 AM
  • add tests (one of them is fairly long, so I'm not sure if it's worth it)
  • add glue to generate proper sections
  • extend -dwarf-accel-tables option to enable selection between different accelerator tables

I think this should be ready for the first round of proper review.

labath retitled this revision from WIP: .debug_names generation to [CodeGen] Generate DWARF v5 Accelerator Tables.Mar 7 2018, 2:38 AM
labath edited the summary of this revision. (Show Details)
labath added reviewers: aprantl, probinson, dblaikie.
labath added subscribers: vleschuk, clayborg, echristo.

This is looking very good. A few small comment inline but I don't expect this will need a lot of additional work.

include/llvm/MC/MCObjectFileInfo.h
258 ↗(On Diff #137340)

Obsolete now we have the "real" thing? :-)

lib/CodeGen/AsmPrinter/AccelTable.cpp
15 ↗(On Diff #137340)

should be on top (clang-format should do this for you IIRC)

193 ↗(On Diff #137340)

Hmm, we should probably move this elsewhere and have Cmake generate it for us? This seems hard to maintain and quite a burden, wouldn't you agree?

233 ↗(On Diff #137340)

It seems odd that we sink the DwarfCompileUnits, why's that?

362 ↗(On Diff #137340)

This is different from the comments we currently generate for the Apple table header where we have "Header Version". I'm fine with either (or maybe I slightly prefer this). Reminder to self that I should unify this in a follow up commit.

lib/CodeGen/AsmPrinter/DwarfDebug.cpp
297 ↗(On Diff #137340)

This doesn't seem right, if you specify apple explicitly you'll end up with DWARF? Shouldn't we just omit the else?

Thanks for working on this!

include/llvm/CodeGen/AccelTable.h
249 ↗(On Diff #137340)

Perhaps add a small doxygen comment to the class?

255 ↗(On Diff #137340)

Perhaps use LLVM_DUMP_METHOD instead?

include/llvm/MC/MCObjectFileInfo.h
258 ↗(On Diff #137340)

That comment seems to be out-of-date now.

lib/CodeGen/AsmPrinter/AccelTable.cpp
181 ↗(On Diff #137340)

Small doxygen comment ?

193 ↗(On Diff #137340)

What does the '0700' derive from?

358 ↗(On Diff #137340)

Maybe this gets too long, but it might be nice to say "DWARF v5 Accel Header" instead of just "Header"

403 ↗(On Diff #137340)

should this be factored out?

408 ↗(On Diff #137340)

This should probably be a hard error rather than an assertion. We don't want the backend to be exploitable by malformed input more than necessary :-)

469 ↗(On Diff #137340)

Don't we have this somewhere else already and if not, should it be factored into a common utility function?

501 ↗(On Diff #137340)

Again, if it is possible to craft input that triggers this unreachable it should be a real error instead.

lib/CodeGen/AsmPrinter/DwarfDebug.h
195 ↗(On Diff #137340)

Doxygen comment explaining what these enumerators mean?

test/DebugInfo/Generic/debug-names-hash-collisions.ll
30 ↗(On Diff #137340)

CHECK-NEXT?

labath marked an inline comment as done.Mar 7 2018, 8:58 AM

I responded as many comments as I could to get another roundtrip before tomorrow morning. I will respond to the rest tomorrow.

include/llvm/CodeGen/AccelTable.h
255 ↗(On Diff #137340)

You mean, instead of #ifndef NDEBUG ?
I can, but this method is declared abstract this way in the base class, so I'd need to change the whole hierarchy.

lib/CodeGen/AsmPrinter/AccelTable.cpp
193 ↗(On Diff #137340)

If we always want to put the current version here, then that would definitely be a good thing to do. However, it wasn't clear to me whether we want to do that, or only bump the version when we make a substantial change in the information we generate (I hardcoded it because I assumed the latter). However, maybe we shouldn't even follow the llvm versions, but put some completely independent strings here ? I'm also starting to think the format should be more human-readable (e.g. "LLVM 7.0.0" instead of "LLVM0700").

Basically, I have no idea what would be a good thing to put here, because I don't know how (or if) will we end up using it. I am open to any suggestions...

233 ↗(On Diff #137340)

What do you mean by "sink" ? I am not taking the ownership of the compile units.. I couldn't even do that as the (non-mutable) ArrayRef prevents me from modifying the unique_ptrs in the list.

358 ↗(On Diff #137340)

Yeah, I think that would be too long. When you look at the generated assembly, it should be fairly obvious that you are in the .debug_names section.

408 ↗(On Diff #137340)

How do I do a "hard error"? (Also, technically, I don't think this should lead to UB, only to truncation of certain values (and generation of broken tables).)

469 ↗(On Diff #137340)

We have something in the DebugInfo library, but I haven't seen anything in CodeGen. I'll have another look tomorrow...

501 ↗(On Diff #137340)

This is not triggerable by user input. It just means somebody added a new entry to getUniformAttributes and forgot to update this function.

lib/CodeGen/AsmPrinter/DwarfDebug.cpp
297 ↗(On Diff #137340)

You'll end up with whatever was you specified on the command line. DwarfAccelTables is the cl::opt option. Maybe I should rename it to something else to make it more clear.

labath marked an inline comment as done.Mar 7 2018, 8:59 AM

Thanks for the quick review, BTW. :)

JDevlieghere added inline comments.Mar 7 2018, 9:09 AM
lib/CodeGen/AsmPrinter/AccelTable.cpp
193 ↗(On Diff #137340)

If we only want to change this when making changes to the format we'd probably want to use something other than the compiler version? Also, I'm guessing the last 0 is the null terminator, so that means we cant' change it for minor versions (which probably we shouldn't anyway, but who knows). Counting from zero/one also seems weird.

Anyway I think it's good to have this, just not sure what to put there. I'll sleep on it and get back to you tomorrow :-)

233 ↗(On Diff #137340)

Right, I missed the "Ref" part in ArrayRef!

469 ↗(On Diff #137340)

I've seen similar code quite a bit the last few weeks but always in different parts which made it hard to reuse. Maybe it's time to put something as general as possible in Dwarf.def? The problem is usually that there's no fixed length for (U)LEB128 encoded forms.

lib/CodeGen/AsmPrinter/DwarfDebug.cpp
297 ↗(On Diff #137340)

Yeah, I think that'd be less confusing.

labath updated this revision to Diff 137565.Mar 8 2018, 6:56 AM
labath marked 10 inline comments as done.

Respond to review comments and update code. Each comment should be either
resolved or waiting for more input.

labath added a comment.Mar 8 2018, 8:19 AM

I should also call out that this does not support type units and it does not represent the info that was present in the apple_objc accelerator tables. For the first part, that's mainly because we don't generate dwarf5-compatible type units right now. For the second item, it's because it was not clear to me how the apple_objc table maps to debug_names. It was fairly obvious that the other three tables should be merged into one, but it is not clear to me how to map the objc one, as it is doing something a bit strange: it uses the class name as a lookup key, but it points to DIEs of the individual member functions.

lib/CodeGen/AsmPrinter/AccelTable.cpp
15 ↗(On Diff #137340)

clang-format puts AccelTable.h on top because that's the header this cpp file implements. The reason this looks weird is because DwarfCompileUnit is in the lib folder, so I have to use the local include path. Otherwise, this would be down below with the rest of llvm/CodeGen includes. The other files in this folder have the same style as well.

193 ↗(On Diff #137340)

The last 0 is not the null terminator (we don't need it as we specify the length explicitly). But in any case, there's no upper bound on the augmentation string length. It only needs to be a multiple of four.

403 ↗(On Diff #137340)

I've put the form calculation it into a static function. Or did you mean something more fancy?

408 ↗(On Diff #137340)

I've replaced this with a call to abort(). Does that count as a hard error?
BTW, the reason why we cannot describe more than this many CUs is because of the CompUnitCount field, which is 32-bit (even for DWARF64). Right now we don't generate DWARF64, and for DWARF32 with this many compilation units, I would expect things to break way before we reach this part.

469 ↗(On Diff #137340)

It seems it should be fairly easy to move the DWARFFormParams class and the static DWARFFormValue::getFixedByteSize function somewhere into the BinaryFormat library. Jonas, can you give me pointers to the other places which do form size calculations?

test/DebugInfo/Generic/debug-names-hash-collisions.ll
30 ↗(On Diff #137340)

I've added check-next where it's possible right now (the String: line immediately follows the Hash: line). I can't use it everywhere without adding expectations about the stuff I don't care about for this test. I'm not sure where you were heading with this, but if the issue was to make sure the Buckets don't contain any extra names, i've added some CHECK-NOT lines to prevent that.

JDevlieghere added inline comments.Mar 8 2018, 8:49 AM
lib/CodeGen/AsmPrinter/AccelTable.cpp
15 ↗(On Diff #137340)

You're right, apologies for the noise!

193 ↗(On Diff #137340)

In that case this sounds reasonable, unless someone else has a better suggestion :-)

469 ↗(On Diff #137340)

Looks like it's not as prominent as I thought, or maybe I'm not grep'ing effectively:

  • DIEInteger::SizeOf
  • DIEEntry::SizeOf'
  • DWARFFormValue::getFixedByteSize

Theres's also DW_EH_PE_udata4/DW_EH_PE_sdata4 variants that could probably benefit from a similar function:

  • getSizeForEncoding in MCDwarf
  • DWARFDataExtractor::getEncodedPointer
labath updated this revision to Diff 137732.Mar 9 2018, 5:24 AM
labath marked 14 inline comments as done.
  • use DIEInteger functions to avoid computing form sizes manually
  • add llvm-dwarfdump -verify lines to check the generated output in tests
  • After my adventures on the parsing side (and reading the DWARF 5 form classes section), I've concluded that emitting the DIE offset as a DW_FORM_section_offset is not the right thing to do. The most correct class here seems to be the "reference" class (DW_FORM_refX), so I've changed it to that, although a case could be made for the DW_FORM_dataX forms (which are used by the apple tables) as well. I wish the dwarf standard could be more specific here.
lib/CodeGen/AsmPrinter/AccelTable.cpp
469 ↗(On Diff #137340)

Actually, it looks like the DIEInteger class does exactly what I want here, so I was able to remove both instances of size computation by delegating to it. There is still the opportunity to merge DWARFFormValue::getFixedByteSize and DIEInteger::SizeOf, which I will pursue separately.

After my adventures on the parsing side (and reading the DWARF 5 form classes section), I've concluded that emitting the DIE offset as a DW_FORM_section_offset is not the right thing to do. The most correct class here seems to be the "reference" class (DW_FORM_refX), so I've changed it to that, although a case could be made for the DW_FORM_dataX forms (which are used by the apple tables) as well. I wish the dwarf standard could be more specific here.

If you want to, we could bring this up on the dwarf-discuss mailing list. It will be pretty much the same people as here, but the outcome will have more of a binding character :-)

If you want to, we could bring this up on the dwarf-discuss mailing list. It will be pretty much the same people as here, but the outcome will have more of a binding character :-)

After nearly finishing my email, I've found that the forms actually are actually present in the spec, just in a different table (page 234) than what I was looking at. It confirms that the form class should be "reference", (which is a bit unfortunate as our current reference parsing code assumes you have a DWARFUnit pointer around, but that's not the problem of the spec). So, one less thing to worry about here..

labath marked 5 inline comments as done.Mar 20 2018, 7:06 AM

The way I read this review, there are no major issues with this patch. There are a couple of minor remarks, for which I am waiting for more feedback to see how to go about resolving them. I've added fresh replies to all comments that I think are still unresolved, so you can see what I mean.

Can you please take another look and let me know how to move this forward?

include/llvm/CodeGen/AccelTable.h
255 ↗(On Diff #137340)

If we want to go that way, I can swap ifndef NDEBUG for LLVM_DUMP_METHOD for the whole hierarchy in a separate patch.

lib/CodeGen/AsmPrinter/AccelTable.cpp
408 ↗(On Diff #137340)

As a consequence of using the DIEInteger class, there are no asserts here anymore (it will just automatically emit this field as DW_FORM_data8). However, someone does manage (which I doubt) to create 2^32 compilation units then we will generate broken tables, as the CompUnitCount field will be truncated. I don't think it's necessary, but it may still be interesting to add a call to abort() or similar in this case.

lib/CodeGen/AsmPrinter/DwarfDebug.cpp
297 ↗(On Diff #137340)

I've renamed the command line option to "AccelTables". I'm not sure if that makes things better or worse...

test/DebugInfo/Generic/debug-names-hash-collisions.ll
30 ↗(On Diff #137340)

Is this CHECK-NEXT-ified enough?

labath updated this revision to Diff 139117.Mar 20 2018, 7:07 AM
  • Add a couple of more CHECK-NEXTs to the tests.

I don't have any general objections and I'll defer to @JDevlieghere for any remaining details. Thanks!

labath added a comment.Apr 3 2018, 5:58 AM

Jonas: Could you take one more look at this please?

This revision is now accepted and ready to land.Apr 3 2018, 7:10 AM
labath added a comment.Apr 4 2018, 5:15 AM

Thank you.

This revision was automatically updated to reflect the committed changes.

I've seen the failure. Looks like msvc has a different idea of how enum
types should be referred to. I'll have that fixed shortly.

thakis added a comment.Apr 4 2018, 6:10 AM

It's not just MSVC, see the polly bot.

I tried to fix in 329184, 329186 but unsuccessfully. I've backed this out for now to green up the bots.

labath added a comment.Apr 4 2018, 6:18 AM

It's not just MSVC, see the polly bot.

I tried to fix in 329184, 329186 but unsuccessfully. I've backed this out for now to green up the bots.

Thanks. I see we've another compiler error with MSVC after you've fixed the enum issue. And on one of the arm bots, all of the tests fail for some reason.

I'm going to try to figure out what's going on here.