This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo] Process DBG_VALUE_LIST in LiveDebugValues
ClosedPublic

Authored by StephenTozer on Jul 15 2020, 10:58 AM.

Details

Summary

Continuing the work discussed in the RFC[0], this patch implements DBG_VALUE_LIST handling in the LiveDebugValues pass.

This is without question the largest change to a single pass in terms of both line count and complexity, largely due to the complexity of the pass and the extent to which it is currently coupled with single machine location variables. Although the line count for this change is high, the logical changes to the patch derive largely from a few basic changes.

First and foremost, the VarLocSet+VarLocMap model has changed:

Previously, VarLocMap acted as a simple bidirectional map between machine locations and VarLocs. Each VarLoc would be assigned a single unique LocIndex by VarLocMap, which could be used to look up that VarLoc in the map. This coupled with VarLocSet, which was a set of LocIndexes optimized for range lookups (using the CoalescingBitVector class). Due to how LocIndex was structured, this allowed VarLocSet to be used to look up every LocIndex associated with a given machine location - either a specific register, or one of several shared buckets for spill locations, entry values, entry value backups, or constants (undef or immediate). VarLocSet also supports all other basic set operations, and was used to track all live VarLocs at a given point.

In order to support DBG_VALUE_LIST, a few changes have been made to this model. VarLocMap no longer provides a single unique LocIndex for a VarLoc, but a set of them corresponding to each machine location used; a single VarLoc will still never have multiple indices in a single bucket, and every LocIndex of a VarLoc must be inserted into any associated VarLocSet. Any one of these indices can be used by itself to look up the VarLoc they are assigned to. Within this index set, every VarLoc has an index in the "constant" bucket; the purpose of this is to provide a means of looking up every current VarLoc in VarLocSet.
VarLocSet is no longer strictly a "set of VarLocs", but a set of LocIndexes (this probably warrants a name change), as each VarLoc may have multiple LocIndexes inserted into the VarLocSet. Functionally, this means that VarLocSet is used slightly differently; the methods for looking up every VarLoc that uses a given bucket (a specific register, spill location, entry value, or entry value backup) is unchanged. When looking up every active VarLoc, instead of fetching the entire set we simply fetch every LocIndex with Location=0 (the "constant" bucket), as every VarLoc appears exactly once in there. While these changes make up a significant bulk of the line count, there are few resulting changes outside of the functions that directly operate on VarLocSets - the signatures of these functions are mostly unchanged.

The other major change is the VarLoc class itself. This mostly consists of cleaning up its interface to work with multiple machine locations, as well as separating the old "Kind" enum into MachineLocKind (for machine locations) and EntryValueKind (describing the VarLoc as a whole). The majority of these changes are simply using more specific functions to do things that didn't need them before, such as replacing if (VarLoc.getSpillLoc()) with if (VarLoc.containsSpillLocs()), or if (VarLoc.getReg() == RegNo) with if (VarLoc.usesReg(RegNo).

[0] http://lists.llvm.org/pipermail/llvm-dev/2020-February/139376.html

Diff Detail

Event Timeline

StephenTozer created this revision.Jul 15 2020, 10:58 AM

Thanks for doing this!

Can you please update the top-level comment of the file, so we have the whole picture of the change?

llvm/lib/CodeGen/LiveDebugValues.cpp
263 ↗(On Diff #278233)

In stands for Indices here? I'd change that in order not to mix up terms with the In/Out...

267 ↗(On Diff #278233)

This should have more intuitive name, I guess...

308 ↗(On Diff #278233)

I don't understand why we need new Kind for the EntryValue here.

322 ↗(On Diff #278233)

why typedef here?

331 ↗(On Diff #278233)

Please add a comment here by describing the purpose of the structure.

1579 ↗(On Diff #278233)

This should be deleted?

1719 ↗(On Diff #278233)

We have declared this multiple times, therefore, can we declare it just once out of the methods?

llvm/test/DebugInfo/MIR/X86/dvl-livedebugvalues-clobber.mir
74

We don't need all the MF attributes. Most of them could be deleted (applies to all the tests).

StephenTozer marked 3 inline comments as done.Jul 17 2020, 7:49 AM
StephenTozer added inline comments.
llvm/lib/CodeGen/LiveDebugValues.cpp
263 ↗(On Diff #278233)

In this case, "In" just means "In", as in the VarLocs that are in a given range. I think there could be a better name for this as well as some of the other classes here; it might even be better to make them into full classes rather than just named sets/vectors. I'm not sure whether doing so would make it easier to understand what's going on, or just make it harder to compare to the previous behaviour.

308 ↗(On Diff #278233)

After completing the overall work I found the separation isn't strictly necessary, as entry values currently only ever consist of a single register. The logical distinction is that a complete VarLoc is or is not an EntryValue; a machine location doesn't have EntryValue-ness. Still: at the moment we lose no information by combining them, although if at any point we expand what is permissible for an entry value they should be separated.

322 ↗(On Diff #278233)

This union type is useful to have at the scope of VarLoc, since several VarLoc methods use variables with these values. The best alternative I think would be to move it into MachineLoc; regardless, having it as a typedef instead of an anonymous union is convenient in the functions below where it is used.

djtodoro added inline comments.Jul 20 2020, 1:13 AM
llvm/lib/CodeGen/LiveDebugValues.cpp
263 ↗(On Diff #278233)

OK. I thinks In/Out are well known already here, so I think we don't have to worry about that. We might want to update the comment above to remove any ambiguity.

308 ↗(On Diff #278233)

That makes sense then. There are targets where the entry value would be in multiple regs, but I am not sure we can face it with the current support. I am OK with the separation.

Remove unused data from tests, add/modify comments, move VarVec declaration up, fix error in duplicated-op-removal.

@StephenTozer Thanks for addressing the comments! Can you please share with us if you have any measurements with the respect to build time increase (e.g. on clang build itself) with this patch? I think that this patch is the only one from the stack that could give us potential performance issues.

  • Fixed a few minor errors, including a linux compile error
  • Rebased onto latest master, moving changes into VarLocBasedImpl.cpp.

Rebased onto recent master, no changes.

StephenTozer added a comment.EditedDec 8 2020, 9:48 AM

@StephenTozer Thanks for addressing the comments! Can you please share with us if you have any measurements with the respect to build time increase (e.g. on clang build itself) with this patch? I think that this patch is the only one from the stack that could give us potential performance issues.

I mentioned this in a comment on another review, but just to make sure the answer is available here: there doesn't appear to be a large compile time increase as a result of this change. I haven't yet measured LiveDebugValues specifically, but for the samples in the CTMark projects in the LLVM test suite the average compile time increase currently appears to be 0.5% at -O2 -g. At this point it's hard to say how much of this is down to LiveDebugValues, but it is probably the most significant pass in that regard.

Rebased onto recent master.

jmorse added a subscriber: jmorse.Jan 22 2021, 7:51 AM

Paging @vsk -- it was a while ago, but do you still have the bitcode from D7406 handy, and would you have spare cycles to test it against this (combined patch in D94631)?

Taking a high level view: because variadic variable locations can be part of more than one machine location, here @StephenTozer is adding a VarLoc to each machine location the variable location uses, and keeping an index of all VarLocs in LocIndex=0, so that a "cannonical" VarLoc can be looked up.

I'm not concerned about the average performance, as the cost of extra VarLocs will be proportionate to the number of variadic variable locations, which will be small (at least initially). However, keeping everything in a single index risks defeating the CoalescingBitVector improvements by fragmenting the collection of set bits more than it was before, increasing pointer chasing. Hence it'd be good to test against the original motivating example, if possible.

vsk added a comment.EditedJan 27 2021, 8:54 AM

Paging @vsk -- it was a while ago, but do you still have the bitcode from D7406 handy, and would you have spare cycles to test it against this (combined patch in D94631)?

Taking a high level view: because variadic variable locations can be part of more than one machine location, here @StephenTozer is adding a VarLoc to each machine location the variable location uses, and keeping an index of all VarLocs in LocIndex=0, so that a "cannonical" VarLoc can be looked up.

I'm not concerned about the average performance, as the cost of extra VarLocs will be proportionate to the number of variadic variable locations, which will be small (at least initially). However, keeping everything in a single index risks defeating the CoalescingBitVector improvements by fragmenting the collection of set bits more than it was before, increasing pointer chasing. Hence it'd be good to test against the original motivating example, if possible.

I do still have the bitcode handy, but wasn't able to apply D94631 cleanly to my checkout. Happy to report results back, just let me know which hash of llvm-project to apply D94631 onto.

Edit: FTR here's what I see with top-of-tree llc --

 % ./bin/llc -O3 glyphbench-metarenamed.bc -o /dev/null -filetype=obj -time-passes
===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 412.5068 seconds (484.1841 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  202.0015 ( 49.3%)   0.2855 ( 10.7%)  202.2870 ( 49.0%)  203.3913 ( 42.0%)  Machine Common Subexpression Elimination
  46.3881 ( 11.3%)   0.3938 ( 14.7%)  46.7819 ( 11.3%)  87.3571 ( 18.0%)  X86 DAG->DAG Instruction Selection
  46.3634 ( 11.3%)   0.1293 (  4.8%)  46.4927 ( 11.3%)  75.9819 ( 15.7%)  Merge disjoint stack slots
  72.4355 ( 17.7%)   0.3826 ( 14.3%)  72.8180 ( 17.7%)  73.1227 ( 15.1%)  Greedy Register Allocator
  25.1784 (  6.1%)   0.0254 (  1.0%)  25.2038 (  6.1%)  25.3227 (  5.2%)  Machine Instruction Scheduler
   6.6854 (  1.6%)   1.2851 ( 48.0%)   7.9705 (  1.9%)   8.0105 (  1.7%)  X86 Assembly Printer
   5.2662 (  1.3%)   0.0256 (  1.0%)   5.2918 (  1.3%)   5.3122 (  1.1%)  Debug Variable Analysis
   1.1159 (  0.3%)   0.0010 (  0.0%)   1.1169 (  0.3%)   1.1253 (  0.2%)  X86 Execution Dependency Fix
   0.7744 (  0.2%)   0.0106 (  0.4%)   0.7849 (  0.2%)   0.7908 (  0.2%)  Live DEBUG_VALUE analysis
   0.3617 (  0.1%)   0.0016 (  0.1%)   0.3633 (  0.1%)   0.3672 (  0.1%)  Virtual Register Rewriter
   0.2488 (  0.1%)   0.0050 (  0.2%)   0.2538 (  0.1%)   0.2539 (  0.1%)  Module Verifier #2
   0.2426 (  0.1%)   0.0083 (  0.3%)   0.2508 (  0.1%)   0.2508 (  0.1%)  Module Verifier
   0.2184 (  0.1%)   0.0081 (  0.3%)   0.2265 (  0.1%)   0.2265 (  0.0%)  Live Variable Analysis
[snip]
In D83890#2525566, @vsk wrote:

I do still have the bitcode handy, but wasn't able to apply D94631 cleanly to my checkout. Happy to report results back, just let me know which hash of llvm-project to apply D94631 onto.

The current version of the patch can be applied to 993c488ed, although I'll be updating it tomorrow onto the latest master, plus some bug fixes (also, D95463 is necessary for some source files to prevent an immense excess of debug values). I'd be very interested to see the results with just what we have now, however.

Finally, assuming that the bitcode you have is already optimized, we won't see the full impact of this change with it: the major feature being added is the support for DBG_VALUEs that reference more than one machine location (DBG_VALUE_LISTs), which makes the LiveDebugValues analysis more complicated. These DBG_VALUE_LISTs can currently only be produced when salvaging debug values in opt passes; with pre-optimized bitcode, the resulting MIR won't have any DBG_VALUE_LISTs and so won't give an indication of how well we handle the new instruction. However, it would still be very valuable just to know what impact this change has on simple DBG_VALUEs - preventing a regression here is vital.

@vsk Just pushed an update onto aforementioned patch bringing it up to 30b8f553 and fixing some small bugs; it does not conflict with the latest master (5d3dca24 as of this comment), but I've not finished building and testing yet, so the former revision should be fine.

jmorse added a comment.EditedFeb 1 2021, 10:38 AM

I'm going to tentatively "LGTM" this -- I've read through it a couple of times and think it works, it'd be great if someone else could chip in. There are various things that aren't ideal, but that stems from having to track things across multiple entries. It's encouraging that @vsk's benchmark didn't show a regression. I've added some minor comments inline.

I'd highly recommend giving the "location zero" index a fixed number, like kEntryValueBackupLocation, to identify all the parts of LiveDebugValues that rely on it.

There's a small amount of code to cope with scenarios where a DBG_VALUE_LIST has duplicate operands; if we're going to support this, there'll need to be a test that this is properly handled.

Could I also ask for a test featuring a DBG_VALUE_LIST with a $noreg operand, just to check that doesn't propagate any further.

llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp
233–1

Comment indicating these are only meaningful for NonEntryValueKind VarLocs?

233–1

std::tie for consistency?

233–1

This and nearby reformatting looks to be a no-op refactor, yes? IMO best to remove it, or add it as a follow-up, to avoid further review / audit burden.

233–1

Explanation of the magic zero would help, "Iterate over all VarLocs -- they will all have an entry at location zero"?

234

I worry that this will hit "unused type / name" warnings in later compilers, can we guard it with NDEBUG, like its uses?

234

Is now "later"? If this needs to stay it, better to document it and what its limitation is (reduces bus factor)

236

IMHO having a default location-bucket here is undesirable: the KillSet is meaningless without it, and it raises the possibility of accidentally clearing VarLocs from location-bucket zero, when a different location-bucket should have been used.

237

assertion for that? Strikes me as fragile otherwise.

239

I'm not sure what "true set" means here

241

Register rather than uint32_t?

277

nit: I would say transformed, as VarLocs themselves should be constant.

279–281

As I understand it, this is to cope with duplicate locations from a MachineInstr, yes? If so, that should go in the comment.

Rebased onto latest master, address review comments.

This revision was not accepted when it landed; it landed in state Needs Review.Mar 9 2021, 10:59 AM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
nikic added a subscriber: nikic.Mar 9 2021, 1:12 PM

It looks like this is a significant compile-time regression for optimized builds with debug info: https://llvm-compile-time-tracker.com/compare.php?from=b4825a6d9c18bdb3e241d709f2f76573aba9f91b&to=e2196ddcdbf13aa1051e0150036e103d23a03f13&stat=instructions SPASS.link is up 3.6%.

StephenTozer added a comment.EditedMar 10 2021, 4:29 AM

It looks like this is a significant compile-time regression for optimized builds with debug info: https://llvm-compile-time-tracker.com/compare.php?from=b4825a6d9c18bdb3e241d709f2f76573aba9f91b&to=e2196ddcdbf13aa1051e0150036e103d23a03f13&stat=instructions SPASS.link is up 3.6%.

Some compile-time regression was expected to take place as a result of this; the LiveDebugValues implementation was heavily optimized, and this change necessarily breaks part of that optimization. With that said, performance tests of this patch prior to merging did not give results that were this dramatic; total compile times for SPASS were closer to 1% than 3%. This is still a significant increase in compile time, but proportional to the expected increase in variable coverage (in aggregate). It is possible to optimize the LiveDebugValues implementation further however - this implementation represents a first pass attempt to allow LiveDebugValues to handle variables with more complex location expressions, but further optimizations can be made in subsequent patches.