This is an archive of the discontinued LLVM Phabricator instance.

Differential D81716

Extend InlineFeatureAnalysis to more extract generic code features [Obsolete]
Accepted · Public

Authored by tarinduj on Jun 11 2020, 11:27 PM.

Details

Reviewers

jdoerfert
uenoku
ggeorgakoudis
mtrofin

Summary

This patch extends the InlineFeatureAnalysis pass and adds a printer pass for it. The plan is to add the ability to extract more code features, including the number of loops, loop depth, types of instructions, etc.

Event Timeline

tarinduj created this revision. Jun 11 2020, 11:27 PM

Herald added a project: Restricted Project. Jun 11 2020, 11:27 PM

Herald added subscribers: llvm-commits, hiraditya, mgorny.

tarinduj retitled this revision from Extend InlineFeatureAnalysis to extract generic code features to Extend InlineFeatureAnalysis to more extract generic code features. Jun 11 2020, 11:27 PM

@mtrofin We want to "rename" InlineFeatureAnalysis to a more generic name and extend it. This patch does the former, basically, and adds a printer pass. Extensions will follow soon. Is that generally OK with you?

@tarinduj I left some comments. We also need to replace the old InlineFeaturesAnalysis with the CodeFeature one everywhere, assuming @mtrofin doesn't have any concerns. Generally we minimize duplication ;)

llvm/include/llvm/Analysis/ML/CodeFeaturesAnalysis.h
1	Please use the same file comment style we have elsewhere.
llvm/lib/Analysis/ML/CodeFeaturesAnalysis.cpp
1	File comment missing too.
49	We need a lit test for the printer.

In D81716#2091417, @jdoerfert wrote:

@mtrofin We want to "rename" InlineFeatureAnalysis to a more generic name and extend it. This patch does the former, basically, and adds a printer pass. Extensions will follow soon. Is that generally OK with you?

@tarinduj I left some comments. We also need to replace the old InlineFeaturesAnalysis with the CodeFeature one everywhere, assuming @mtrofin doesn't have any concerns. Generally we minimize duplication ;)

Some feature calculations are computationally intensive. At least for inlining, we plan in a next step to look at regions of the call graph around a call site. I wouldn't want to burden other consumers with this extra cost.

As a first step, if the planned new features would add space or computational overhead, could the work requiring the extra features have its own analysis and consume both analysis results? As we have examples of more such analyses and understand the use cases and performance tradeoffs, we can very easily refactor / rename.

If the new features are simple and small, SGTM - the only issue I have is the name, it's too generic - I worry it invites grab-bag dumping of features. Can we find something more specific? What would the short/medium term consumers be? (That could help with a name.)

In D81716#2092826, @mtrofin wrote:

In D81716#2091417, @jdoerfert wrote:

@mtrofin We want to "rename" InlineFeatureAnalysis to a more generic name and extend it. This patch does the former, basically, and adds a printer pass. Extensions will follow soon. Is that generally OK with you?

@tarinduj I left some comments. We also need to replace the old InlineFeaturesAnalysis with the CodeFeature one everywhere, assuming @mtrofin doesn't have any concerns. Generally we minimize duplication ;)

Some feature calculations are computationally intensive. At least for inlining, we plan in a next step to look at regions of the call graph around a call site. I wouldn't want to burden other consumers with this extra cost.

That is fair, see below.

As a first step, if the planned new features would add space or computational overhead, could the work requiring the extra features have its own analysis and consume both analysis results? As we have examples of more such analyses and understand the use cases and performance tradeoffs, we can very easily refactor / rename.

While we can have more analysis if we want, I somehow doubt that will make it better. I was envisioning a lazy approach in which the analysis will not do anything until instructed to. Users can "ask" for what they want while reuse is still possible and code is kept at a single place. As an example, if users ask to findCFGstructureFeatures they will not get any instruction level results. Querying those could assert w/o extra cost. I don't see why this would cost us extra and I would prefer it very much over a multitude of passes. WDYT?

If the new features are simple and small, SGTM - the only issue I have is the name, it's too generic - I worry it invites grab-bag dumping of features. Can we find something more specific? What would the short/medium term consumers be? (That could help with a name.)

Short term consumers on our side are experiments. We hope to eventually set up heuristics or models that use more features.
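
The lazy approach described in this comment could look roughly like the following. This is a minimal, self-contained sketch and not code from the patch: `ToyFunction`, `LazyFunctionProperties`, and the two feature groups are hypothetical stand-ins (only the name findCFGstructureFeatures comes from the comment, spelled here as `findCFGStructureFeatures`), and a real implementation would walk an llvm::Function instead.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a function body. A real analysis would inspect
// an llvm::Function; this toy only records what the sketch needs.
struct ToyFunction {
  std::vector<int> InstructionsPerBlock; // one entry per basic block
};

// Sketch of a lazily populated result: nothing is computed until a caller
// asks for a specific group of features.
class LazyFunctionProperties {
  const ToyFunction &Fn;
  bool HasCFGFeatures = false;
  bool HasInstructionFeatures = false;
  int64_t BasicBlockCount = 0;
  int64_t InstructionCount = 0;

public:
  explicit LazyFunctionProperties(const ToyFunction &F) : Fn(F) {}

  // Compute only the CFG-level group.
  void findCFGStructureFeatures() {
    BasicBlockCount = static_cast<int64_t>(Fn.InstructionsPerBlock.size());
    HasCFGFeatures = true;
  }

  // Compute only the instruction-level group.
  void findInstructionFeatures() {
    InstructionCount = 0;
    for (int N : Fn.InstructionsPerBlock)
      InstructionCount += N;
    HasInstructionFeatures = true;
  }

  bool hasCFGFeatures() const { return HasCFGFeatures; }
  bool hasInstructionFeatures() const { return HasInstructionFeatures; }

  // Querying a group that was never requested trips an assertion, which
  // costs nothing in builds with assertions disabled.
  int64_t getBasicBlockCount() const {
    assert(HasCFGFeatures && "CFG features were not requested");
    return BasicBlockCount;
  }
  int64_t getInstructionCount() const {
    assert(HasInstructionFeatures && "instruction features were not requested");
    return InstructionCount;
  }
};
```

A caller that only needs CFG-level data would call `findCFGStructureFeatures()` and then `getBasicBlockCount()`; the instruction-level group stays uncomputed unless someone asks for it.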

In D81716#2094028, @jdoerfert wrote:

In D81716#2092826, @mtrofin wrote:

In D81716#2091417, @jdoerfert wrote:

@mtrofin We want to "rename" InlineFeatureAnalysis to a more generic name and extend it. This patch does the former, basically, and adds a printer pass. Extensions will follow soon. Is that generally OK with you?

@tarinduj I left some comments. We also need to replace the old InlineFeaturesAnalysis with the CodeFeature one everywhere, assuming @mtrofin doesn't have any concerns. Generally we minimize duplication ;)

Some feature calculations are computationally intensive. At least for inlining, we plan in a next step to look at regions of the call graph around a call site. I wouldn't want to burden other consumers with this extra cost.

That is fair, see below.

As a first step, if the planned new features would add space or computational overhead, could the work requiring the extra features have its own analysis and consume both analysis results? As we have examples of more such analyses and understand the use cases and performance tradeoffs, we can very easily refactor / rename.

While we can have more analysis if we want, I somehow doubt that will make it better. I was envisioning a lazy approach in which the analysis will not do anything until instructed to. Users can "ask" for what they want while reuse is still possible and code is kept at a single place. As an example, if users ask to findCFGstructureFeatures they will not get any instruction level results. Querying those could assert w/o extra cost. I don't see why this would cost us extra and I would prefer it very much over a multitude of passes. WDYT?

IIUC, that means the one analysis memoizes results. I'm speculating invalidation would be total - i.e. even if only some results were needed (and, thus, computed), invalidation clears everything. Getting results would be semantically equivalent to having many analyses. Invalidating would be semantically equivalent to going piecemeal through the analyses that should be cleared and doing that. I'm not sure if that'd be always desirable, I suppose we'll learn when we hit a specific problem.

How about this:

- we rename InlineFeaturesAnalysis to FunctionFeaturesAnalysis (just because "Code" is too general - do we use that term elsewhere in LLVM?)
- as long as the features are trivially computable, let's just eagerly compute them
- if this causes perf/space problems, or when we hit more complicated features, we figure out at that point what the best course of action may be.

wdyt?
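
To make the invalidation concern concrete, here is a small sketch under the assumption that a single memoizing result object holds every feature group. `ToyAnalysisCache`, `MemoizedProperties`, and the counts passed in by the caller are hypothetical; they stand in for the real analysis manager and for inspecting the IR.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical memoized result holding two independently computed groups.
struct MemoizedProperties {
  bool HasCFG = false;
  bool HasInstr = false;
  int64_t BasicBlockCount = 0;
  int64_t InstructionCount = 0;
};

// Hypothetical cache keyed by function name, standing in for an analysis
// manager. The "Actual*" arguments stand in for inspecting the IR.
class ToyAnalysisCache {
  std::map<std::string, MemoizedProperties> Results;
  int Computations = 0;

public:
  int64_t getBasicBlockCount(const std::string &Fn, int64_t ActualBlocks) {
    assert(ActualBlocks >= 0 && "block count cannot be negative");
    MemoizedProperties &P = Results[Fn];
    if (!P.HasCFG) { // computed on first request, then reused
      P.BasicBlockCount = ActualBlocks;
      P.HasCFG = true;
      ++Computations;
    }
    return P.BasicBlockCount;
  }

  int64_t getInstructionCount(const std::string &Fn, int64_t ActualInstrs) {
    assert(ActualInstrs >= 0 && "instruction count cannot be negative");
    MemoizedProperties &P = Results[Fn];
    if (!P.HasInstr) { // computed on first request, then reused
      P.InstructionCount = ActualInstrs;
      P.HasInstr = true;
      ++Computations;
    }
    return P.InstructionCount;
  }

  // Invalidation is total: every memoized group for Fn is dropped, even if
  // only one of them is actually stale.
  void invalidate(const std::string &Fn) { Results.erase(Fn); }

  bool hasCFG(const std::string &Fn) const {
    auto It = Results.find(Fn);
    return It != Results.end() && It->second.HasCFG;
  }
  bool hasInstr(const std::string &Fn) const {
    auto It = Results.find(Fn);
    return It != Results.end() && It->second.HasInstr;
  }
  int computations() const { return Computations; }
};
```

With separate analyses, each group could be invalidated on its own; with one memoizing result, `invalidate` discards both groups, so anything still needed has to be recomputed on the next query.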

If the new features are simple and small, SGTM - the only issue I have is the name, it's too generic - I worry it invites grab-bag dumping of features. Can we find something more specific? What would the short/medium term consumers be? (That could help with a name.)

Short term consumers on our side are experiments. We hope to eventually set up heuristics or models that use more features.

Is there a better name than features? I initially read this as having to do with subtarget features.

In D81716#2094081, @mtrofin wrote:

As a first step, if the planned new features would add space or computational overhead, could the work requiring the extra features have its own analysis and consume both analysis results? As we have examples of more such analyses and understand the use cases and performance tradeoffs, we can very easily refactor / rename.

While we can have more analysis if we want, I somehow doubt that will make it better. I was envisioning a lazy approach in which the analysis will not do anything until instructed to. Users can "ask" for what they want while reuse is still possible and code is kept at a single place. As an example, if users ask to findCFGstructureFeatures they will not get any instruction level results. Querying those could assert w/o extra cost. I don't see why this would cost us extra and I would prefer it very much over a multitude of passes. WDYT?

IIUC, that means the one analysis memoizes results. I'm speculating invalidation would be total - i.e. even if only some results were needed (and, thus, computed), invalidation clears everything. Getting results would be semantically equivalent to having many analyses. Invalidating would be semantically equivalent to going piecemeal through the analyses that should be cleared and doing that. I'm not sure if that'd be always desirable, I suppose we'll learn when we hit a specific problem.

How about this:

we rename InlineFeaturesAnalysis to FunctionFeaturesAnalysis (just because "Code" is too general - do we use that term elsewhere in LLVM?)

I think that is fine. @tarinduj, WDYT?

as long as the features are trivially computable, let's just eagerly compute them

Sure.

if this causes perf/space problems, or when we hit more complicated features, we figure out at that point what the best course of action may be.

Sounds good to me.

In D81716#2094112, @arsenm wrote:

Is there a better name than features? I initially read this as having to do with subtarget features.

Hm, properties?

In D81716#2094270, @jdoerfert wrote:

In D81716#2094081, @mtrofin wrote:

As a first step, if the planned new features would add space or computational overhead, could the work requiring the extra features have its own analysis and consume both analysis results? As we have examples of more such analyses and understand the use cases and performance tradeoffs, we can very easily refactor / rename.

While we can have more analysis if we want, I somehow doubt that will make it better. I was envisioning a lazy approach in which the analysis will not do anything until instructed to. Users can "ask" for what they want while reuse is still possible and code is kept at a single place. As an example, if users ask to findCFGstructureFeatures they will not get any instruction level results. Querying those could assert w/o extra cost. I don't see why this would cost us extra and I would prefer it very much over a multitude of passes. WDYT?

IIUC, that means the one analysis memoizes results. I'm speculating invalidation would be total - i.e. even if only some results were needed (and, thus, computed), invalidation clears everything. Getting results would be semantically equivalent to having many analyses. Invalidating would be semantically equivalent to going piecemeal through the analyses that should be cleared and doing that. I'm not sure if that'd be always desirable, I suppose we'll learn when we hit a specific problem.

How about this:

we rename InlineFeaturesAnalysis to FunctionFeaturesAnalysis (just because "Code" is too general - do we use that term elsewhere in LLVM?)

I think that is fine. @tarinduj, WDYT?

Yeah, looks good.

as long as the features are trivially computable, let's just eagerly compute them

Sure.

if this causes perf/space problems, or when we hit more complicated features, we figure out at that point what the best course of action may be.

Sounds good to me.

In D81716#2094112, @arsenm wrote:

Is there a better name than features? I initially read this as having to do with subtarget features.

Hm, properties?

So FunctionPropertiesAnalysis?

Replaced InlineFeatureAnalysis with FunctionPropertiesAnalysis.

added comments

The printer pass needs a test. @mtrofin WDYT?

In D81716#2096179, @jdoerfert wrote:

The printer pass needs a test. @mtrofin WDYT?

Yup - never hurts to add a test.

In D81716#2096300, @mtrofin wrote:

In D81716#2096179, @jdoerfert wrote:

The printer pass needs a test. @mtrofin WDYT?

Yup - never hurts to add a test.

My comment was formatted badly. The test is needed for sure, wanted to know if you are fine with the rename ;)

In D81716#2096404, @jdoerfert wrote:

In D81716#2096300, @mtrofin wrote:

In D81716#2096179, @jdoerfert wrote:

The printer pass needs a test. @mtrofin WDYT?

Yup - never hurts to add a test.

My comment was formatted badly. The test is needed for sure, wanted to know if you are fine with the rename ;)

Ah, yes - lgtm

One overall nit: could we split this into the stand-alone rename, followed by the addition of the new code? Thanks!

lgtm

This revision is now accepted and ready to land. Jun 18 2020, 9:26 AM

tarinduj retitled this revision from Extend InlineFeatureAnalysis to more extract generic code features to Extend InlineFeatureAnalysis to more extract generic code features [Obsolete]. Jun 25 2020, 12:24 AM

Revision Contents

Path                                                    Size
llvm/include/llvm/Analysis/ML/CodeFeaturesAnalysis.h    59 lines
llvm/lib/Analysis/ML/CMakeLists.txt                     1 line
llvm/lib/Analysis/ML/CodeFeaturesAnalysis.cpp           49 lines
llvm/lib/Passes/PassBuilder.cpp                         1 line
llvm/lib/Passes/PassRegistry.def                        2 lines

Diff 270309

llvm/include/llvm/Analysis/ML/CodeFeaturesAnalysis.h

This file was added.

	#ifndef LLVM_CODEFEATURESANALYSIS_H_
	[Inline comment, line 1] jdoerfert: Please use the same file comment style we have elsewhere.
	#define LLVM_CODEFEATURESANALYSIS_H_

	#include "llvm/IR/PassManager.h"

	namespace llvm {
	class Function;

	class CodeFeaturesInfo{
	public:
	  /// Number of basic blocks
	  int64_t BasicBlockCount = 0;

	  /// Number of blocks reached from a conditional instruction, or that are
	  /// 'cases' of a SwitchInstr.
	  // FIXME: We may want to replace this with a more meaningful metric, like
	  // number of conditionally executed blocks:
	  // 'if (a) s();' would be counted here as 2 blocks, just like
	  // 'if (a) s(); else s2(); s3();' would.
	  int64_t BlocksReachedFromConditionalInstruction = 0;

	  /// Number of uses of this function, plus 1 if the function is callable
	  /// outside the module.
	  int64_t Uses = 0;

	  /// Number of direct calls made from this function to other functions
	  /// defined in this module.
	  int64_t DirectCallsToDefinedFunctions = 0;

	  void calculate(const Function &F);

	  void print(raw_ostream &OS) const;
	};

	//Analysis pass
	class CodeFeaturesAnalysis
	    : public AnalysisInfoMixin<CodeFeaturesAnalysis> {

	public:
	  static AnalysisKey Key;

	  using Result = CodeFeaturesInfo;

	  Result run(const Function &F, FunctionAnalysisManager &FAM);
	};

	/// Printer pass for the CodeFeaturesAnalysis results.
	class CodeFeaturesPrinterPass
	    : public PassInfoMixin<CodeFeaturesPrinterPass> {
	  raw_ostream &OS;

	public:
	  explicit CodeFeaturesPrinterPass(raw_ostream &OS) : OS(OS) {}

	  PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);
	};

	} // namespace llvm
	#endif // LLVM_CODEFEATURESANALYSIS_H_
	\ No newline at end of file

llvm/lib/Analysis/ML/CMakeLists.txt

	 add_llvm_component_library(LLVMMLPolicies
	+  CodeFeaturesAnalysis.cpp
	   InlineFeaturesAnalysis.cpp

	   DEPENDS
	   intrinsics_gen
	   )

llvm/lib/Analysis/ML/CodeFeaturesAnalysis.cpp

This file was added.

	#include "llvm/Analysis/ML/CodeFeaturesAnalysis.h"
	[Inline comment, line 1] jdoerfert: File comment missing too.
	#include "llvm/IR/Instructions.h"
	#include "llvm/Passes/PassBuilder.h"
	#include "llvm/Passes/PassPlugin.h"
	#include "llvm/Support/raw_ostream.h"

	using namespace llvm;

	void CodeFeaturesInfo::calculate(const Function &F) {
	  Uses = ((!F.hasLocalLinkage()) ? 1 : 0) + F.getNumUses();
	  for (const auto &BB : F) {
	    ++BasicBlockCount;
	    if (const auto *BI = dyn_cast<BranchInst>(BB.getTerminator())) {
	      if (BI->isConditional())
	        BlocksReachedFromConditionalInstruction += BI->getNumSuccessors();
	    } else if (const auto *SI = dyn_cast<SwitchInst>(BB.getTerminator()))
	      BlocksReachedFromConditionalInstruction +=
	          (SI->getNumCases() + (nullptr != SI->getDefaultDest()));
	    for (const auto &I : BB)
	      if (auto *CS = dyn_cast<CallBase>(&I)) {
	        const auto *Callee = CS->getCalledFunction();
	        if (Callee && !Callee->isIntrinsic() && !Callee->isDeclaration())
	          ++DirectCallsToDefinedFunctions;
	      }
	  }
	}

	void CodeFeaturesInfo::print(raw_ostream &OS) const {
	  OS << "BasicBlockCount: " << BasicBlockCount << "\n"
	     << "BlocksReachedFromConditionalInstruction: " << BlocksReachedFromConditionalInstruction << "\n"
	     << "Uses: " << Uses << "\n"
	     << "DirectCallsToDefinedFunctions: " << DirectCallsToDefinedFunctions << "\n\n";
	}

	AnalysisKey CodeFeaturesAnalysis::Key;

	CodeFeaturesInfo CodeFeaturesAnalysis::run(const Function &F, FunctionAnalysisManager &FAM) {
	  CodeFeaturesInfo CFI;
	  CFI.calculate(F);
	  return CFI;
	}

	PreservedAnalyses CodeFeaturesPrinterPass::run(Function &F, FunctionAnalysisManager &AM) {
	  OS << "Printing analysis results of CFA for function "
	     << "'" << F.getName() << "':"
	     << "\n";
	  AM.getResult<CodeFeaturesAnalysis>(F).print(OS);
	  return PreservedAnalyses::all();
	}
	[Inline comment, line 49] jdoerfert: We need a lit test for the printer.
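
The block-counting rule in `CodeFeaturesInfo::calculate`, and the limitation noted in the header's FIXME, can be illustrated without LLVM. The sketch below is a standalone model, not the patch's code: `TermKind`, `ToyBlock`, `ToyCounts`, `countBlocks`, and the two CFG shapes are hypothetical, and the shapes assume the straightforward unoptimized lowering of each snippet.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of a block's terminator; stands in for the
// BranchInst / SwitchInst checks in the patch.
enum class TermKind { Return, UnconditionalBranch, ConditionalBranch, Switch };

struct ToyBlock {
  TermKind Kind;
  int NumCases; // only meaningful for Switch; excludes the default case
};

struct ToyCounts {
  int64_t BasicBlockCount;
  int64_t BlocksReachedFromConditionalInstruction;
};

// Mirrors the counting rule: a conditional branch contributes its two
// successors, a switch contributes its cases plus the default destination.
ToyCounts countBlocks(const std::vector<ToyBlock> &Blocks) {
  ToyCounts C = {0, 0};
  for (const ToyBlock &BB : Blocks) {
    assert(BB.NumCases >= 0 && "case count cannot be negative");
    ++C.BasicBlockCount;
    if (BB.Kind == TermKind::ConditionalBranch)
      C.BlocksReachedFromConditionalInstruction += 2;
    else if (BB.Kind == TermKind::Switch)
      C.BlocksReachedFromConditionalInstruction += BB.NumCases + 1;
  }
  return C;
}

// Assumed shape of 'if (a) s();': entry, then-block, merge block.
std::vector<ToyBlock> ifThen() {
  return {{TermKind::ConditionalBranch, 0},
          {TermKind::UnconditionalBranch, 0},
          {TermKind::Return, 0}};
}

// Assumed shape of 'if (a) s(); else s2(); s3();': entry, then-block,
// else-block, merge block.
std::vector<ToyBlock> ifThenElse() {
  return {{TermKind::ConditionalBranch, 0},
          {TermKind::UnconditionalBranch, 0},
          {TermKind::UnconditionalBranch, 0},
          {TermKind::Return, 0}};
}
```

Under these assumed shapes, both `ifThen()` and `ifThenElse()` yield the same value for `BlocksReachedFromConditionalInstruction`, which is the ambiguity the FIXME describes; only `BasicBlockCount` tells them apart.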

llvm/lib/Passes/PassBuilder.cpp

	[34 lines not shown]
	 #include "llvm/Analysis/IVUsers.h"
	 #include "llvm/Analysis/InlineAdvisor.h"
	 #include "llvm/Analysis/LazyCallGraph.h"
	 #include "llvm/Analysis/LazyValueInfo.h"
	 #include "llvm/Analysis/LoopAccessAnalysis.h"
	 #include "llvm/Analysis/LoopCacheAnalysis.h"
	 #include "llvm/Analysis/LoopInfo.h"
	 #include "llvm/Analysis/LoopNestAnalysis.h"
	+#include "llvm/Analysis/ML/CodeFeaturesAnalysis.h"
	 #include "llvm/Analysis/ML/InlineFeaturesAnalysis.h"
	 #include "llvm/Analysis/MemoryDependenceAnalysis.h"
	 #include "llvm/Analysis/MemorySSA.h"
	 #include "llvm/Analysis/ModuleSummaryAnalysis.h"
	 #include "llvm/Analysis/OptimizationRemarkEmitter.h"
	 #include "llvm/Analysis/PhiValues.h"
	 #include "llvm/Analysis/PostDominators.h"
	 #include "llvm/Analysis/ProfileSummaryInfo.h"
	[2,591 lines not shown]

llvm/lib/Passes/PassRegistry.def

	[120 lines not shown]

	 #ifndef FUNCTION_ANALYSIS
	 #define FUNCTION_ANALYSIS(NAME, CREATE_PASS)
	 #endif
	 FUNCTION_ANALYSIS("aa", AAManager())
	 FUNCTION_ANALYSIS("assumptions", AssumptionAnalysis())
	 FUNCTION_ANALYSIS("block-freq", BlockFrequencyAnalysis())
	 FUNCTION_ANALYSIS("branch-prob", BranchProbabilityAnalysis())
	+FUNCTION_ANALYSIS("code-features", CodeFeaturesAnalysis())
	 FUNCTION_ANALYSIS("domtree", DominatorTreeAnalysis())
	 FUNCTION_ANALYSIS("postdomtree", PostDominatorTreeAnalysis())
	 FUNCTION_ANALYSIS("demanded-bits", DemandedBitsAnalysis())
	 FUNCTION_ANALYSIS("domfrontier", DominanceFrontierAnalysis())
	 FUNCTION_ANALYSIS("loops", LoopAnalysis())
	 FUNCTION_ANALYSIS("lazy-value-info", LazyValueAnalysis())
	 FUNCTION_ANALYSIS("da", DependenceAnalysis())
	 FUNCTION_ANALYSIS("inliner-features", InlineFeaturesAnalysis())
	[86 lines not shown]
	 FUNCTION_PASS("loop-load-elim", LoopLoadEliminationPass())
	 FUNCTION_PASS("loop-fuse", LoopFusePass())
	 FUNCTION_PASS("loop-distribute", LoopDistributePass())
	 FUNCTION_PASS("pgo-memop-opt", PGOMemOPSizeOpt())
	 FUNCTION_PASS("print", PrintFunctionPass(dbgs()))
	 FUNCTION_PASS("print<assumptions>", AssumptionPrinterPass(dbgs()))
	 FUNCTION_PASS("print<block-freq>", BlockFrequencyPrinterPass(dbgs()))
	 FUNCTION_PASS("print<branch-prob>", BranchProbabilityPrinterPass(dbgs()))
	+FUNCTION_PASS("print<code-features>", CodeFeaturesPrinterPass(dbgs()))
	 FUNCTION_PASS("print<da>", DependenceAnalysisPrinterPass(dbgs()))
	 FUNCTION_PASS("print<domtree>", DominatorTreePrinterPass(dbgs()))
	 FUNCTION_PASS("print<postdomtree>", PostDominatorTreePrinterPass(dbgs()))
	 FUNCTION_PASS("print<demanded-bits>", DemandedBitsPrinterPass(dbgs()))
	 FUNCTION_PASS("print<domfrontier>", DominanceFrontierPrinterPass(dbgs()))
	 FUNCTION_PASS("print<loops>", LoopPrinterPass(dbgs()))
	 FUNCTION_PASS("print<memoryssa>", MemorySSAPrinterPass(dbgs()))
	 FUNCTION_PASS("print<phi-values>", PhiValuesPrinterPass(dbgs()))
	[111 lines not shown]