This is an archive of the discontinued LLVM Phabricator instance.


Differential D94416

[PM] Avoid duplicates in the Used/Preserved/Required sets
Closed · Public

Authored by bjope on Jan 11 2021, 7:15 AM.


Details

Reviewers

foad
chandlerc

Commits

rG985b9b7e421a: [PM] Avoid duplicates in the Used/Preserved/Required sets

Summary

The pass analysis uses "sets" implemented using a SmallVector type
to keep track of Used, Preserved, Required and RequiredTransitive
passes. With nested analyses we could end up with duplicates
in those sets, as there were no checks to see if a pass already
existed in the "set" before pushing to the vectors. The idea with
this patch is to avoid such duplicates by not pushing elements
that are already contained when adding elements to those sets.
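The check described above can be sketched outside of LLVM as follows. This is a minimal illustration, assuming std::vector and std::find as stand-ins for SmallVector and llvm::is_contained; it is not the patch itself.

```cpp
#include <algorithm>
#include <vector>

// Stand-in for llvm::AnalysisID, an opaque pointer that identifies a pass.
using AnalysisID = const void *;

// Append ID only if it is not already present, so the vector behaves like an
// insertion-ordered set. The membership test is a linear scan.
inline void pushUnique(std::vector<AnalysisID> &Set, AnalysisID ID) {
  if (std::find(Set.begin(), Set.end(), ID) == Set.end())
    Set.push_back(ID);
}
```

The linear scan is only reasonable because these vectors are expected to stay small.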

To align with the above, PMDataManager::collectRequiredAndUsedAnalyses
is changed so that it no longer adds both the Required and the
RequiredTransitive passes to its result vectors (since RequiredTransitive
is always a subset of Required, we ended up with duplicates when
traversing both sets).
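A small sketch of why traversing both sets yields duplicates. UsageSketch, collectBoth and collectRequiredOnly are hypothetical names for illustration only; the real logic lives in AnalysisUsage and PMDataManager.

```cpp
#include <vector>

using AnalysisID = const void *;

// Hypothetical, simplified stand-in for AnalysisUsage: every transitive
// requirement is also recorded as a plain requirement.
struct UsageSketch {
  std::vector<AnalysisID> Required;
  std::vector<AnalysisID> RequiredTransitive;

  void addRequired(AnalysisID ID) { Required.push_back(ID); }
  void addRequiredTransitive(AnalysisID ID) {
    Required.push_back(ID);
    RequiredTransitive.push_back(ID);
  }
};

// Walks both sets, so each transitive entry is collected twice.
inline std::vector<AnalysisID> collectBoth(const UsageSketch &U) {
  std::vector<AnalysisID> Out(U.Required);
  Out.insert(Out.end(), U.RequiredTransitive.begin(),
             U.RequiredTransitive.end());
  return Out;
}

// Walks Required only, which already covers RequiredTransitive.
inline std::vector<AnalysisID> collectRequiredOnly(const UsageSketch &U) {
  return U.Required;
}
```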

The main goal is to avoid spending time verifying the same
analysis multiple times in PMDataManager::verifyPreservedAnalysis
when iterating over the Preserved "set". It is assumed that removing
duplicates from a "set" shouldn't have any other negative impact
(I have not seen any problems so far). If this ends up causing
problems, one could instead do some uniqueness filtering of the vector
being traversed in verifyPreservedAnalysis.
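The fallback mentioned above, filtering at traversal time instead of at insertion time, might look roughly like this. forEachUnique is a hypothetical helper and the callback stands in for the per-analysis verification; none of this is taken from the LLVM sources.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

using AnalysisID = const void *;

// Invoke Verify once per distinct ID, in first-occurrence order, and return
// how many times it was invoked. The input vector may contain duplicates.
template <typename Fn>
std::size_t forEachUnique(const std::vector<AnalysisID> &Preserved, Fn Verify) {
  std::unordered_set<AnalysisID> Seen;
  std::size_t Calls = 0;
  for (AnalysisID ID : Preserved) {
    if (!Seen.insert(ID).second)
      continue; // This analysis has already been handled.
    Verify(ID);
    ++Calls;
  }
  return Calls;
}
```

This keeps the stored vectors untouched but pays for the filtering on every traversal.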

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

bjope created this revision. Jan 11 2021, 7:15 AM

Herald added subscribers: dexonsmith, hiraditya. Jan 11 2021, 7:15 AM

bjope requested review of this revision. Jan 11 2021, 7:15 AM

Herald added a project: Restricted Project. Jan 11 2021, 7:15 AM

Harbormaster completed remote builds in B84691: Diff 315801. Jan 11 2021, 8:15 AM

bjope added a reviewer: foad. Jan 12 2021, 4:57 AM

Does this make any difference in practice? E.g. does the output of opt -O1 -debug-pass=Executions change, or can you measure any timing difference?

In D94416#2493099, @foad wrote:

Does this make any difference in practice? E.g. does the output of opt -O1 -debug-pass=Executions change, or can you measure any timing difference?

Currently -debug-pass doesn't show when verifyAnalysis is called. But if we add a debug printout in PMDataManager::verifyPreservedAnalysis that prints the analysis passes that are verified (either for -debug-pass=Executions or -debug-pass=Details), then it would be possible to create a test case that shows the difference in the number of verifyAnalysis calls that we get.

There might be some overhead when setting up the pipeline (due to the llvm::is_contained calls), while we also save some work as we no longer need to deal with duplicates when iterating over the different sets. To see a speedup due to fewer calls to verifyAnalysis, one would need to run lots of test cases with the verifiers enabled (and maybe also with EXPENSIVE_CHECKS=ON).
I haven't done any timing measurements. I've just assumed that it won't be worse when doing less duplicated work.

I created this patch as my earlier patch (https://reviews.llvm.org/D94138) resulted in even more duplicates in the preserved sets. So the idea was to compensate a bit for a potential speed regression, when using verifiers, by simply getting rid of all duplicates in the sets.

But maybe it isn't worth the churn to proceed with this, considering that the legacy PM is supposed to be phased out?

I think improving the legacy pass manager is fine (after all it is taking a very long time for the new pass manager to supersede it). But I think any patch that claims to do less work overall, or speed something up, needs *some* kind of evidence that it actually has the desired effect.

I've now tried to do some performance comparisons using perf stat -r 100 opt -O3 -o /dev/null --verify-dom-info -verify-assumption-cache -verify-loop-info.

I tried to find an in-tree test case that has several basic blocks (so that some time is actually spent in the verifiers). Here is the result ("opt-old" is without this patch and "opt-new" is with this patch):

Performance counter stats for 'opt-old -O3 -o /dev/null --verify-dom-info -verify-assumption-cache -verify-loop-info test/Analysis/LoopNestAnalysis/imperfectnest.ll' (100 runs):

            76.38 msec task-clock:u              #    0.973 CPUs utilized            ( +-  0.63% )
                0      context-switches:u        #    0.000 K/sec                  
                0      cpu-migrations:u          #    0.000 K/sec                  
            4,141      page-faults:u             #    0.054 M/sec                    ( +-  0.00% )
      217,856,171      cycles:u                  #    2.852 GHz                      ( +-  0.34% )
      425,842,775      instructions:u            #    1.95  insn per cycle           ( +-  0.05% )
       87,203,931      branches:u                # 1141.767 M/sec                    ( +-  0.05% )
        1,675,703      branch-misses:u           #    1.92% of all branches          ( +-  0.24% )

         0.078464 +- 0.000485 seconds time elapsed  ( +-  0.62% )


Performance counter stats for 'opt-new -O3 -o /dev/null --verify-dom-info -verify-assumption-cache -verify-loop-info test/Analysis/LoopNestAnalysis/imperfectnest.ll' (100 runs):

            63.83 msec task-clock:u              #    0.963 CPUs utilized            ( +-  0.97% )
                0      context-switches:u        #    0.000 K/sec                  
                0      cpu-migrations:u          #    0.000 K/sec                  
            4,124      page-faults:u             #    0.065 M/sec                    ( +-  0.00% )
      180,555,724      cycles:u                  #    2.828 GHz                      ( +-  1.01% )
      323,290,930      instructions:u            #    1.79  insn per cycle           ( +-  0.04% )
       66,711,896      branches:u                # 1045.074 M/sec                    ( +-  0.04% )
        1,532,334      branch-misses:u           #    2.30% of all branches          ( +-  0.35% )

         0.066310 +- 0.000631 seconds time elapsed  ( +-  0.95% )

So for that particular test this patch looks like a win.
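To put the two runs above side by side: going from 425,842,775 to 323,290,930 instructions is roughly 24% fewer, and going from 0.078464 to 0.066310 seconds elapsed is roughly 15% less. A small helper for that arithmetic (percentReduction is just an illustrative name):

```cpp
// Relative reduction from Old to New, in percent.
constexpr double percentReduction(double Old, double New) {
  return (Old - New) / Old * 100.0;
}
```

For example, percentReduction(425842775.0, 323290930.0) gives the instruction-count figure quoted above.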

I've also compared the result with an empty input and without verifiers. This is to see if there is an overhead when not using verifiers:

Performance counter stats for 'opt-old -O3 -o /dev/null /dev/null' (100 runs):

            12.71 msec task-clock:u              #    0.885 CPUs utilized            ( +-  0.96% )
                0      context-switches:u        #    0.000 K/sec                  
                0      cpu-migrations:u          #    0.000 K/sec                  
            2,774      page-faults:u             #    0.218 M/sec                    ( +-  0.00% )
       22,026,642      cycles:u                  #    1.733 GHz                      ( +-  1.02% )
       25,342,570      instructions:u            #    1.15  insn per cycle           ( +-  0.00% )
        6,191,303      branches:u                #  487.180 M/sec                    ( +-  0.00% )
          139,025      branch-misses:u           #    2.25% of all branches          ( +-  0.29% )

         0.014362 +- 0.000132 seconds time elapsed  ( +-  0.92% )

Performance counter stats for 'opt-new -O3 -o /dev/null /dev/null' (100 runs):

            12.80 msec task-clock:u              #    0.883 CPUs utilized            ( +-  1.02% )
                0      context-switches:u        #    0.000 K/sec                  
                0      cpu-migrations:u          #    0.000 K/sec                  
            2,760      page-faults:u             #    0.216 M/sec                    ( +-  0.00% )
       21,916,544      cycles:u                  #    1.712 GHz                      ( +-  1.19% )
       24,035,752      instructions:u            #    1.10  insn per cycle           ( +-  0.00% )
        5,788,238      branches:u                #  452.062 M/sec                    ( +-  0.00% )
          139,649      branch-misses:u           #    2.41% of all branches          ( +-  1.14% )

         0.014502 +- 0.000147 seconds time elapsed  ( +-  1.01% )

So even without anything to verify, the number of instructions/branches is smaller with the patch (cycles and task-clock are more inconclusive since they vary a bit on my test server, but I'd say that opt-old and opt-new perform equally well here).

Thanks for doing the timings. The patch seems reasonable to me. I'll approve it but it would be great if you could find at least one other reviewer to take a look too.

An alternative implementation would be to use SetVector instead of SmallVector for each of the Required, RequiredTransitive, Preserved and Used sets.
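The idea behind that alternative can be sketched without LLVM as an insertion-ordered set: a vector for deterministic iteration plus a hash set for membership tests. OrderedIDSet below is a hypothetical stand-in written for illustration; it is not llvm::SetVector and does not mirror its full interface.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

using AnalysisID = const void *;

// Minimal insertion-ordered set: Order keeps first-insertion order for
// iteration, Members answers "already present?" without a linear scan.
class OrderedIDSet {
  std::vector<AnalysisID> Order;
  std::unordered_set<AnalysisID> Members;

public:
  // Returns true if ID was newly inserted, false if it was already present.
  bool insert(AnalysisID ID) {
    if (!Members.insert(ID).second)
      return false;
    Order.push_back(ID);
    return true;
  }

  std::vector<AnalysisID>::const_iterator begin() const { return Order.begin(); }
  std::vector<AnalysisID>::const_iterator end() const { return Order.end(); }
  std::size_t size() const { return Order.size(); }
};
```

Compared with a scan-before-push vector, this trades extra memory per set for constant-time membership checks, which may or may not pay off for sets this small.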

Incidentally I tried a similar kind of optimisation here: https://reviews.llvm.org/differential/diff/316602/
It's still work in progress because (a) I still don't fully understand setLastUser and (b) I haven't done any performance measurements on it.

This revision is now accepted and ready to land. Jan 14 2021, 2:47 AM

bjope added a reviewer: chandlerc. Jan 18 2021, 6:55 AM

This revision was landed with ongoing or failed builds. Jan 20 2021, 4:56 AM

Closed by commit rG985b9b7e421a: [PM] Avoid duplicates in the Used/Preserved/Required sets (authored by bjope).

This revision was automatically updated to reflect the committed changes.

bjope added a commit: rG985b9b7e421a: [PM] Avoid duplicates in the Used/Preserved/Required sets.

Revision Contents

Path                                       Size
llvm/include/llvm/PassAnalysisSupport.h    20 lines
llvm/lib/IR/LegacyPassManager.cpp          6 lines
llvm/lib/IR/Pass.cpp                       11 lines

Diff 317843

llvm/include/llvm/PassAnalysisSupport.h

[11 lines not shown]
 // NO .CPP FILES SHOULD INCLUDE THIS FILE DIRECTLY
 //
 // Instead, #include Pass.h
 //
 //===----------------------------------------------------------------------===//

 #if !defined(LLVM_PASS_H) || defined(LLVM_PASSANALYSISSUPPORT_H)
 #error "Do not include <PassAnalysisSupport.h>; include <Pass.h> instead"
 #endif

 #ifndef LLVM_PASSANALYSISSUPPORT_H
 #define LLVM_PASSANALYSISSUPPORT_H

+#include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/SmallVector.h"
 #include <cassert>
 #include <tuple>
 #include <utility>
 #include <vector>

 namespace llvm {

[20 lines not shown, context: private:]
   // this usecase. The sizes were picked to minimize wasted space, but are
   // otherwise fairly meaningless.
   SmallVector<AnalysisID, 8> Required;
   SmallVector<AnalysisID, 2> RequiredTransitive;
   SmallVector<AnalysisID, 2> Preserved;
   SmallVector<AnalysisID, 0> Used;
   bool PreservesAll = false;

+  void pushUnique(VectorType &Set, AnalysisID ID) {
+    if (!llvm::is_contained(Set, ID))
+      Set.push_back(ID);
+  }
+
 public:
   AnalysisUsage() = default;

   ///@{
   /// Add the specified ID to the required set of the usage info for a pass.
   AnalysisUsage &addRequiredID(const void *ID);
   AnalysisUsage &addRequiredID(char &ID);
   template<class PassClass>
   AnalysisUsage &addRequired() {
     return addRequiredID(PassClass::ID);
   }

   AnalysisUsage &addRequiredTransitiveID(char &ID);
   template<class PassClass>
   AnalysisUsage &addRequiredTransitive() {
     return addRequiredTransitiveID(PassClass::ID);
   }
   ///@}

   ///@{
   /// Add the specified ID to the set of analyses preserved by this pass.
   AnalysisUsage &addPreservedID(const void *ID) {
-    Preserved.push_back(ID);
+    pushUnique(Preserved, ID);
     return *this;
   }
   AnalysisUsage &addPreservedID(char &ID) {
-    Preserved.push_back(&ID);
+    pushUnique(Preserved, &ID);
     return *this;
   }
   /// Add the specified Pass class to the set of analyses preserved by this pass.
   template<class PassClass>
   AnalysisUsage &addPreserved() {
-    Preserved.push_back(&PassClass::ID);
+    pushUnique(Preserved, &PassClass::ID);
     return *this;
   }
   ///@}

   ///@{
   /// Add the specified ID to the set of analyses used by this pass if they are
   /// available..
   AnalysisUsage &addUsedIfAvailableID(const void *ID) {
-    Used.push_back(ID);
+    pushUnique(Used, ID);
     return *this;
   }
   AnalysisUsage &addUsedIfAvailableID(char &ID) {
-    Used.push_back(&ID);
+    pushUnique(Used, &ID);
     return *this;
   }
   /// Add the specified Pass class to the set of analyses used by this pass.
   template<class PassClass>
   AnalysisUsage &addUsedIfAvailable() {
-    Used.push_back(&PassClass::ID);
+    pushUnique(Used, &PassClass::ID);
     return *this;
   }
   ///@}

   /// Add the Pass with the specified argument string to the set of analyses
   /// preserved by this pass. If no such Pass exists, do nothing. This can be
   /// useful when a pass is trivially preserved, but may not be linked in. Be
   /// careful about spelling!
[167 lines not shown]

llvm/lib/IR/LegacyPassManager.cpp

[1,104 lines not shown]
   for (const auto &UsedID : AnUsage->getUsedSet())
     if (Pass *AnalysisPass = findAnalysisPass(UsedID, true))
       UP.push_back(AnalysisPass);

   for (const auto &RequiredID : AnUsage->getRequiredSet())
     if (Pass *AnalysisPass = findAnalysisPass(RequiredID, true))
       UP.push_back(AnalysisPass);
     else
       RP_NotAvail.push_back(RequiredID);
-
-  for (const auto &RequiredID : AnUsage->getRequiredTransitiveSet())
-    if (Pass *AnalysisPass = findAnalysisPass(RequiredID, true))
-      UP.push_back(AnalysisPass);
-    else
-      RP_NotAvail.push_back(RequiredID);
 }

 // All Required analyses should be available to the pass as it runs! Here
 // we fill in the AnalysisImpls member of the pass so that it can
 // successfully use the getAnalysis() method to retrieve the
 // implementations it needs.
 //
 void PMDataManager::initializeAnalysisImpl(Pass *P) {
[649 lines not shown]

llvm/lib/IR/Pass.cpp

[253 lines not shown]
 void AnalysisUsage::setPreservesCFG() {
   // Since this transformation doesn't modify the CFG, it preserves all analyses
   // that only depend on the CFG (like dominators, loop info, etc...)
   GetCFGOnlyPasses(Preserved).enumeratePasses();
 }

 AnalysisUsage &AnalysisUsage::addPreserved(StringRef Arg) {
   const PassInfo *PI = Pass::lookupPassInfo(Arg);
   // If the pass exists, preserve it. Otherwise silently do nothing.
-  if (PI) Preserved.push_back(PI->getTypeInfo());
+  if (PI)
+    pushUnique(Preserved, PI->getTypeInfo());
   return *this;
 }

 AnalysisUsage &AnalysisUsage::addRequiredID(const void *ID) {
-  Required.push_back(ID);
+  pushUnique(Required, ID);
   return *this;
 }

 AnalysisUsage &AnalysisUsage::addRequiredID(char &ID) {
-  Required.push_back(&ID);
+  pushUnique(Required, &ID);
   return *this;
 }

 AnalysisUsage &AnalysisUsage::addRequiredTransitiveID(char &ID) {
-  Required.push_back(&ID);
-  RequiredTransitive.push_back(&ID);
+  pushUnique(Required, &ID);
+  pushUnique(RequiredTransitive, &ID);
   return *this;
 }