This is an archive of the discontinued LLVM Phabricator instance.

Remove a std::map and std::set that show up in LLD profiles
ClosedPublic

Authored by rnk on Nov 3 2017, 11:33 AM.

Details

Summary

For GC roots, add a bit to SymbolBody to ensure that we don't add the
same root twice, and switch to a vector. This is an improvement, since
we iterate the GCRoot list later and it the order should be
deterministic.

For fixupExports, we can just use DenseMap. This is a simple string
uniquing task, and we don't iterate the map.

Diff Detail

Repository
rL LLVM

Event Timeline

rnk created this revision.Nov 3 2017, 11:33 AM
ruiu edited edge metadata.Nov 3 2017, 11:53 AM

Is Config->GCRoot that slow? If so, if you replace it with SetVector, is it still slow?

rnk added a comment.Nov 3 2017, 3:18 PM
In D39609#915412, @ruiu wrote:

Is Config->GCRoot that slow? If so, if you replace it with SetVector, is it still slow?

We spend 350ms out of 52s in the GCRoot->insert call, so I guess 0.6%.

I suspect that the majority of this comes from this loop over exports:

// Windows specific -- Make sure we resolve all dllexported symbols.
for (Export &E : Config->Exports) {
  if (!E.ForwardTo.empty())
    continue;
  E.Sym = addUndefined(E.Name);
  if (!E.Directives)
    Symtab->mangleMaybe(E.Sym);
}

Consider that there are many duplicate /export: directives in each object file. Therefore, most calls to addUndefined will be for symbols that we already have. The code I'm changing does one lookup of the string (this is necessary) and then a set lookup of the SymbolBody pointer. That set lookup is completely unnecessary, though, if we add the GCRoot bit to SymbolBody.

I haven't tried SetVector, I don't think the performance difference will be measurable. My goal isn't to compare containers, it's to eliminate the pointer set lookup completely.

ruiu added a comment.Nov 3 2017, 3:20 PM

If you can get a similar speedup using SetVector, you don't need to add a new member to Symbol, so I think it is worth a try.

rnk added a comment.Nov 9 2017, 6:07 PM
In D39609#915677, @ruiu wrote:

If you can get a similar speedup using SetVector, you don't need to add a new member to Symbol, so I think it is worth a try.

I switched to SetVector, and profiling still shows 46 samples in DenseMap insertion. It's a big improvement over 391 samples in binary tree insertion, but I don't see any reason to leave these samples on the table when we can replace it with a simple bit. This will save a DenseMap of pointers to all the exported symbols, so it seems worth it to me, assuming that there are many exports. I attached screenshots of the profiles.

ruiu accepted this revision.Nov 13 2017, 2:13 AM

LGTM

This revision is now accepted and ready to land.Nov 13 2017, 2:13 AM
This revision was automatically updated to reflect the committed changes.