
[NFC] Replace a linked list in LiveDebugVariables pass with a DenseMap

Authored by Orlando on Oct 10 2019, 10:10 AM.



This patch improves asan self-host build times slightly.

Self-host build times with and without this patch
Patch applied | Build type           | CPU time (mins)
No            | RelWithDebInfo, asan | 226
Yes           | RelWithDebInfo, asan | 218 (-3.5%)

In LiveDebugVariables.cpp:
Prior to this patch, UserValues were grouped into linked-list chains. Each
chain was the union of two sets: { A: Matching source variable } and
{ B: Matching virtual register }. A pointer to the head (or 'leader') of each
chain was kept in one map keyed by { Source variable } (the set A predicate)
and in another keyed by { Virtual register } (the set B predicate).

The function getUserValue searched through these chains for UserValues with a
matching { Source variable, Complex expression, Inlined-at location } -
essentially searching for a subset of A by walking the two interleaved linked
lists of sets A and B. Importantly, by design, that subset contains at most one
element: a UserValue is uniquely identified by the tuple { Source variable,
Complex expression, Inlined-at location }, if it exists.

This patch removes the linked list and instead uses a DenseMap to map
the tuple { Source variable, Complex expression, Inlined-at location }
to UserValue ptrs so that the getUserValue search predicate is this map key.
The virtual register map now maps a vreg to a SmallVector<UserValue *> so that
set B is still available for quick searches.
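The shape of the two new maps can be sketched with standard containers (the
real code uses DenseMap, SmallVector, and pointers to DILocalVariable,
DIExpression, and DILocation; the names below are illustrative stand-ins):

```cpp
#include <cassert>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical stand-in for the pass's UserValue.
struct UserValue { int Id; };

// Key: { Source variable, Complex expression, Inlined-at location }.
using UVKey = std::tuple<const void *, const void *, const void *>;

struct LDVSketch {
  // Replaces the linked-list walk in getUserValue with a single map lookup.
  std::map<UVKey, UserValue *> UserVarMap;
  // Set B: virtual register -> all UserValues using that vreg.
  std::map<unsigned, std::vector<UserValue *>> VirtRegToUserVals;

  // At most one UserValue per key, by design.
  UserValue *getUserValue(const UVKey &K) {
    auto It = UserVarMap.find(K);
    return It == UserVarMap.end() ? nullptr : It->second;
  }

  void mapVirtReg(unsigned VReg, UserValue *UV) {
    VirtRegToUserVals[VReg].push_back(UV);
  }
};
```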

Diff Detail

Event Timeline

Orlando created this revision.Oct 10 2019, 10:10 AM
Herald added a project: Restricted Project.Oct 10 2019, 10:10 AM

The numbers are much better than I expected but I haven't been able to prove
myself wrong after a bunch of testing. So, I've put this patch up for the
community's expert eyes.

This is a really nice performance win and the code looks nicer, too.



/// The debug info variable we are part of.
 const DILocalVariable *Variable;

FYI. If you don't want to implement all this, you can also inherit from std::pair<std::pair<>>. (With a single pair this is a more obvious win.)
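A hypothetical sketch of that suggestion using standard containers (in LLVM the
win is that DenseMapInfo for std::pair already exists; here the nested pairs
provide comparison for free, and all names are illustrative):

```cpp
#include <cassert>
#include <map>
#include <utility>

// Derive the key type from nested std::pairs so the comparison machinery
// (and, in LLVM, the DenseMapInfo specialization) comes for free instead of
// being written by hand.
struct VarLocKey
    : std::pair<std::pair<const void *, const void *>, const void *> {
  using Base = std::pair<std::pair<const void *, const void *>, const void *>;
  VarLocKey(const void *Var, const void *Expr, const void *InlinedAt)
      : Base(std::make_pair(Var, Expr), InlinedAt) {}
};
```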

aprantl added inline comments.Oct 10 2019, 10:27 AM

Not your code, but: add a FragmentInfo element?


How many elements does this have on average? Is a SmallDenseMap a win?

bjope added a subscriber: bjope.Oct 10 2019, 10:32 AM
dblaikie added inline comments.Oct 10 2019, 11:43 AM

Drop the inline keyword here and above - the linkage is implied by the definition being provided inside a class definition.
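A minimal illustration of that point (hypothetical names):

```cpp
#include <cassert>

// A member function defined inside the class body is implicitly inline, so
// writing `inline` there is redundant.
struct Sketch {
  // `inline bool match(int K) const { ... }` -- the keyword adds nothing:
  bool match(int K) const { return K == Stored; } // already implicitly inline
  int Stored = 7;
};
```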

Orlando updated this revision to Diff 224579.EditedOct 11 2019, 5:53 AM
Orlando edited the summary of this revision. (Show Details)

Addressed comments and fixed Summary (previously it referred to
LiveDebugValues.cpp instead of LiveDebugVariables.cpp).


Not your code, but: add a FragmentInfo element?

As you point out in D66415 we need to take care to track overlapping fragment
ranges too. This shouldn't be too hard but it makes more sense to me to rebase
D66415 on this and work from there.

How many elements does this have on average? Is a SmallDenseMap a win?

The following numbers are only anecdotal of course. I compiled two example
single file projects with -g -O2 (A: ~100,000 loc, B: ~20,000 loc) and in both
cases roughly 90% of UserVarMap.size() were <= 64.

Project A: UserVarMap size
Size <= x | Num maps | % of total maps
0         | 697      | 50
2         | 770      | 55.2
4         | 797      | 57.2
8         | 858      | 61.5
16        | 969      | 69.5
32        | 1109     | 79.6
64        | 1221     | 87.6
128       | 1311     | 94.0
256       | 1361     | 97.6
512       | 1379     | 98.9
1024      | 1388     | 99.6

Project B: UserVarMap size
Size <= x | Num maps | % of total maps
0         | 296      | 50.3
2         | 306      | 52.0
4         | 333      | 56.6
8         | 364      | 61.9
16        | 397      | 67.5
32        | 477      | 81.1
64        | 530      | 90.1
128       | 572      | 97.3
256       | 582      | 99.0
512       | 587      | 99.8
1024      | 588      | 100

I don't have any self-host build time stats for this yet but the default DenseMap
64-element allocation seems almost perfect for these examples. Assuming a
SmallDenseMap allocates space in the object (like SmallVector) and given that
LDVImpl seems to only ever be heap allocated, a SmallDenseMap<..., 64> would
likely give some (small) build time reduction. I'll add more info here
when I can get some self-host build times.

aprantl accepted this revision.Oct 11 2019, 3:02 PM

This is good to land with whatever conclusions we draw from the SmallDenseMap experiment. Thanks!

This revision is now accepted and ready to land.Oct 11 2019, 3:02 PM
This revision was automatically updated to reflect the committed changes.
Orlando marked 4 inline comments as done.
Orlando edited the summary of this revision. (Show Details)Oct 16 2019, 1:39 AM
Orlando added a comment.EditedOct 16 2019, 1:42 AM

After timing some more self-hosts it looks like my original numbers, which looked
too good to be true, sadly are just that. The self-host asan build time reduction
seems correct (-3.5%) but the non-asan builds are much less impacted than first
thought (about -1%). These numbers are more in line with my original expectation.

After doing some more self-host builds and building a smaller llvm-based project
16 times it seems like the SmallDenseMap makes essentially no difference.

bjope added a subscriber: uabelho.Oct 29 2019, 3:10 AM
bjope added inline comments.

The mapVirtReg call below adds elements to the DenseMap. When a DenseMap grows/shrinks it might be re-allocated. So the pointer to the SmallVector (stored inside the map elements), called UserVals here, may be a dangling pointer after the first call to mapVirtReg and the vector it points to may have been moved.

So the iteration over *UserVals here is not safe, and asan complains about accessing deallocated memory.

A quick workaround that I'm trying downstream is to copy the vector like this:

diff --git a/llvm/lib/CodeGen/LiveDebugVariables.cpp b/llvm/lib/CodeGen/LiveDebugVariables.cpp
index 6032bfe48af..4659ebea254 100644
--- a/llvm/lib/CodeGen/LiveDebugVariables.cpp
+++ b/llvm/lib/CodeGen/LiveDebugVariables.cpp
@@ -1144,10 +1144,16 @@ void LDVImpl::splitRegister(unsigned OldReg, ArrayRef<unsigned> NewRegs) {
   // Map all of the new virtual registers.
-  if (auto *UserVals = lookupVirtReg(OldReg))
-    for (auto *UV : *UserVals)
+  if (auto *UserVals = lookupVirtReg(OldReg)) {
+    // Make a copy of the vector. This is needed because VirtRegToUserVals is
+    // updated in the loop below, so the pointer into the VirtRegToUserVals
+    // DenseMap may become invalid in case the map is reallocated.
+    SmallVector<UserValue *, 4> UserValsCopy;
+    UserValsCopy.assign(UserVals->begin(), UserVals->end());
+    for (auto *UV : UserValsCopy)
       for (unsigned i = 0; i != NewRegs.size(); ++i)
         mapVirtReg(NewRegs[i], UV);
+  }
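The safe pattern in that workaround can be sketched with standard containers
(std::map here only stands in for the DenseMap, which is the container that
actually relocates its elements on growth; the names are illustrative):

```cpp
#include <map>
#include <vector>

// Illustrative stand-in; the real code uses
// DenseMap<unsigned, SmallVector<UserValue *, 4>> and UserValue pointers.
using VRegMap = std::map<unsigned, std::vector<int>>;

// Snapshot the vector before the loop: inserting into the map inside the
// loop may invalidate a pointer obtained by a lookupVirtReg-style access
// (true for DenseMap, which relocates elements when it grows).
void splitRegisterSketch(VRegMap &M, unsigned OldReg,
                         const std::vector<unsigned> &NewRegs) {
  auto It = M.find(OldReg);
  if (It == M.end())
    return;
  std::vector<int> Copy = It->second; // iterate the copy, not It->second
  for (int UV : Copy)
    for (unsigned R : NewRegs)
      M[R].push_back(UV); // may grow/reallocate the map
}
```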

No idea if this impacts the performance negatively (considering that the idea behind the refactoring into using a DenseMap seems to be to improve compile speed, maybe there are better solutions). But right now the code is broken, so I guess we need to submit a workaround or a revert.

Hi @bjope thank you for pointing this out and providing a workaround. I'll test out your change and investigate further.

I seem to have lost commit access in the move over to GitHub - thanks @andreadb for reverting this (67720e7bf7df).

Orlando marked an inline comment as done.Oct 31 2019, 4:15 AM
Orlando added a subscriber: jmorse.
Orlando added inline comments.

A quick workaround that I'm trying downstream is to copy the vector like this

Replacing the DenseMap LDVImpl::VirtRegToUserVals uses with a std::unordered_map looks like it should also work.

I started work on this patch because @jmorse found that during an asan build LDV consumed ~30% CPU time when building AMDGPUDisassembler.cpp. This worst-case is reduced to <3% for me with my faulty patch and there is no noticeable difference when adding either of these changes. I'll look at the build time impact of these two approaches on self host builds when I get the chance but I'm leaning towards the unordered_map because it makes it harder to run into this again.
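Why std::unordered_map would also avoid the bug can be shown directly: it is
node-based, so pointers and references to mapped values survive rehashing (the
standard guarantees reference stability; only iterators are invalidated),
whereas DenseMap stores elements inline and relocates them on growth. The
function name below is illustrative:

```cpp
#include <unordered_map>
#include <vector>

// Pointers into std::unordered_map stay valid across rehashes, unlike
// pointers into a DenseMap.
bool pointersSurviveRehash() {
  std::unordered_map<unsigned, std::vector<int>> M;
  M[0] = {1, 2, 3};
  const std::vector<int> *P = &M[0]; // like the UserVals pointer in the pass
  for (unsigned I = 1; I < 1000; ++I)
    M[I] = {}; // forces several rehashes
  return P == &M[0] && P->size() == 3; // still valid: node-based container
}
```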

Thanks again for the heads up!