Page MenuHomePhabricator

[LPM] A targeted but somewhat horrible fix to the legacy pass manager's querying of the pass registry.
ClosedPublic

Authored by chandlerc on Jan 27 2015, 3:41 PM.

Details

Summary

The pass manager relies on the static registry of PassInfo objects to
perform all manner of its functionality. I don't understand why it does
much of this. My very vague understanding is that this registry is
touched both during static initialization *and* while each pass is being
constructed. As a consequence it is hard to make accessing it not
require a acquiring some lock. This lock ends up in the hot path of
setting up, tearing down, and invaliditing analyses in the legacy pass
manager.

On most systems you can observe this as a non-trivial % of the time
spent in 'ninja check-llvm'. However, I haven't really seen it be more
than 1% in extreme cases of compiling more real-world software,
including LTO.

Unfortunately, some of the GPU JITs are seeing this taking essentially
all of their time because they have very small IR running through
a small pass pipeline very many times (at least, this is the vague
understanding I have of it).

This patch tries to minimize the cost of looking up PassInfo objects by
leveraging the fact that the objects themselves are immutable and they
are allocated separately on the heap and so don't have their address
change. It also requires a change I made the last time I tried to debug
this problem which removed the ability to de-register a pass from the
registry. This patch creates a single access path to these objects
inside the PMTopLevelManager which memoizes the result of querying the
registry. This is somewhat gross as I don't really know if
PMTopLevelManager is the *right* place to put it, and I dislike using
a mutable member to memoize things, but it seems to work.

For long-lived pass managers this should completely eliminate
the cost of acquiring locks to look into the pass registry once the
memoized cache is warm. For 'ninja check' I measured about 10% reduction
in CPU time, and about a 1% reduction in total time on a machine with 32
hardware threads. For normal compilation, I don't know how much this
will help, sadly. We will still pay the cost while we populate the
memoized cache. I don't think it will hurt though, and for LTO or
compiles with many small functions it should still be a win.

Diff Detail

Repository
rL LLVM

Event Timeline

chandlerc updated this revision to Diff 18862.Jan 27 2015, 3:41 PM
chandlerc retitled this revision from to [LPM] A targeted but somewhat horrible fix to the legacy pass manager's querying of the pass registry..
chandlerc updated this object.
chandlerc edited the test plan for this revision. (Show Details)
chandlerc added a reviewer: resistor.
chandlerc added a subscriber: Unknown Object (MLST).
resistor edited edge metadata.Jan 27 2015, 4:55 PM

Hi Chandler,

In my measurements, this captures about half of the regression, i.e. the regression goes from ~20% to ~10%. While the reader/writer locks are gone from the profile, several other PassManager related pieces of code have jumped up to take their place:

—Owen

To summarize the discussion on IRC:

This hot code path doesn't really seem to be due to my patch. If it is, I
can't explain why. I definitely can't explain why without some steps to
reproduce.

This patch essentially erases all the performance hits I can find any way
to measure of the pthread locks, so I'll wait for more information from you
or the others that have measured more significant issues here. Let me know
what you'd like me to do next.

This revision was automatically updated to reflect the committed changes.