This is an archive of the discontinued LLVM Phabricator instance.

Extend InlineFeatureAnalysis to more extract generic code features [Obsolete]
AcceptedPublic

Authored by tarinduj on Jun 11 2020, 11:27 PM.

Details

Summary

This patch extends the InlineFeatureAnalysis pass adds a printer pass to it. Planning to add the ability extract more code features including #loops, loop depth, types of instructions, etc.

Diff Detail

Event Timeline

tarinduj created this revision.Jun 11 2020, 11:27 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 11 2020, 11:27 PM
tarinduj retitled this revision from Extend InlineFeatureAnalysis to extract generic code features to Extend InlineFeatureAnalysis to more extract generic code features.Jun 11 2020, 11:27 PM
jdoerfert added a subscriber: mtrofin.

@mtrofin We want to "rename" InlineFeatureAnalysis to a more generic name and extend it. This patch does the former, basically, and adds a printer pass. Extensions will follow soon. Is that generally OK with you?

@tarinduj I left some comments. We also need to replace the old InlineFeaturesAnalsysis with the CodeFeature one everywhere, assuming @mtrofin doesn't have any concerns. Generally we minimize duplication ;)

llvm/include/llvm/Analysis/ML/CodeFeaturesAnalysis.h
1 ↗(On Diff #270309)

Please use the same file comment style we have elsewhere.

llvm/lib/Analysis/ML/CodeFeaturesAnalysis.cpp
1 ↗(On Diff #270309)

File comment missing too.

49 ↗(On Diff #270309)

We need a lit test for the printer.

@mtrofin We want to "rename" InlineFeatureAnalysis to a more generic name and extend it. This patch does the former, basically, and adds a printer pass. Extensions will follow soon. Is that generally OK with you?

@tarinduj I left some comments. We also need to replace the old InlineFeaturesAnalsysis with the CodeFeature one everywhere, assuming @mtrofin doesn't have any concerns. Generally we minimize duplication ;)

Some feature calculations are computationally intensive. At least for inlining, we plan in a next step to look at regions of the call graph around a call site. I wouldn't want to burden other consumers with this extra cost.

As a first step, if the planned new features would add space or computational overhead, could the work requiring the extra features have its own analysis and consume both analysis results? As we have examples of more such analyses and understand the usecases and performance tradeoffs, we can very easily refactor / rename.

If the new features are simple and small, SGTM - the only issue I have is the name, it's too generic - I worry it invites grabbag dumping of features. Can we find something more specific? What would the short/medium term consumers be (that could help with a name)

@mtrofin We want to "rename" InlineFeatureAnalysis to a more generic name and extend it. This patch does the former, basically, and adds a printer pass. Extensions will follow soon. Is that generally OK with you?

@tarinduj I left some comments. We also need to replace the old InlineFeaturesAnalsysis with the CodeFeature one everywhere, assuming @mtrofin doesn't have any concerns. Generally we minimize duplication ;)

Some feature calculations are computationally intensive. At least for inlining, we plan in a next step to look at regions of the call graph around a call site. I wouldn't want to burden other consumers with this extra cost.

That is fair, see below.

As a first step, if the planned new features would add space or computational overhead, could the work requiring the extra features have its own analysis and consume both analysis results? As we have examples of more such analyses and understand the usecases and performance tradeoffs, we can very easily refactor / rename.

While we can have more analysis if we want, I somehow doubt that will make it better. I was envisioning a lazy approach in which the analysis will not do anything until instructed to. Users can "ask" for what they want while reuse is still possible and code is kept at a single place. As an example, if users ask to findCFGstructureFeatures they will not get any instruction level results. Querying those could assert w/o extra cost. I don't see why this would cost us extra and I would prefer it very much over a multitude of passes. WDYT?

If the new features are simple and small, SGTM - the only issue I have is the name, it's too generic - I worry it invites grabbag dumping of features. Can we find something more specific? What would the short/medium term consumers be (that could help with a name)

Short term consumers on our site are experiments. We hope to eventually set up heuristics or models that use more features.

@mtrofin We want to "rename" InlineFeatureAnalysis to a more generic name and extend it. This patch does the former, basically, and adds a printer pass. Extensions will follow soon. Is that generally OK with you?

@tarinduj I left some comments. We also need to replace the old InlineFeaturesAnalsysis with the CodeFeature one everywhere, assuming @mtrofin doesn't have any concerns. Generally we minimize duplication ;)

Some feature calculations are computationally intensive. At least for inlining, we plan in a next step to look at regions of the call graph around a call site. I wouldn't want to burden other consumers with this extra cost.

That is fair, see below.

As a first step, if the planned new features would add space or computational overhead, could the work requiring the extra features have its own analysis and consume both analysis results? As we have examples of more such analyses and understand the usecases and performance tradeoffs, we can very easily refactor / rename.

While we can have more analysis if we want, I somehow doubt that will make it better. I was envisioning a lazy approach in which the analysis will not do anything until instructed to. Users can "ask" for what they want while reuse is still possible and code is kept at a single place. As an example, if users ask to findCFGstructureFeatures they will not get any instruction level results. Querying those could assert w/o extra cost. I don't see why this would cost us extra and I would prefer it very much over a multitude of passes. WDYT?

IIUC, that means the one analysis memoizes results. I'm speculating invalidation would be total - i.e. even if only some results were needed (and, thus, computed), invalidation clears everything. Getting results would be semantically equivalent to having many analyses. Invalidating would be semantically equivalent to going piecemeal through the analyses that should be cleared and doing that. I'm not sure if that'd be always desirable, I suppose we'll learn when we hit a specific problem.

How about this:

  • we rename InlineFeaturesAnalysis to FunctionFeaturesAnalysis (just because "Code" is too general - do we use that term elsewhere in LLVM?)
  • as long as the features are trivially computable , let's just eagerly compute them
  • if this causes perf/space problems, or when we hit more complicated features, we figure out at that point what the best course of action may be.

wdyt?

If the new features are simple and small, SGTM - the only issue I have is the name, it's too generic - I worry it invites grabbag dumping of features. Can we find something more specific? What would the short/medium term consumers be (that could help with a name)

Short term consumers on our site are experiments. We hope to eventually set up heuristics or models that use more features.

arsenm added a subscriber: arsenm.Jun 15 2020, 2:47 PM

Is there a better name than features? I initially read this as having to do with subtarget features.

As a first step, if the planned new features would add space or computational overhead, could the work requiring the extra features have its own analysis and consume both analysis results? As we have examples of more such analyses and understand the usecases and performance tradeoffs, we can very easily refactor / rename.

While we can have more analysis if we want, I somehow doubt that will make it better. I was envisioning a lazy approach in which the analysis will not do anything until instructed to. Users can "ask" for what they want while reuse is still possible and code is kept at a single place. As an example, if users ask to findCFGstructureFeatures they will not get any instruction level results. Querying those could assert w/o extra cost. I don't see why this would cost us extra and I would prefer it very much over a multitude of passes. WDYT?

IIUC, that means the one analysis memoizes results. I'm speculating invalidation would be total - i.e. even if only some results were needed (and, thus, computed), invalidation clears everything. Getting results would be semantically equivalent to having many analyses. Invalidating would be semantically equivalent to going piecemeal through the analyses that should be cleared and doing that. I'm not sure if that'd be always desirable, I suppose we'll learn when we hit a specific problem.

How about this:

  • we rename InlineFeaturesAnalysis to FunctionFeaturesAnalysis (just because "Code" is too general - do we use that term elsewhere in LLVM?)

I think that is fine. @tarinduj, WDYT?

  • as long as the features are trivially computable , let's just eagerly compute them

Sure.

  • if this causes perf/space problems, or when we hit more complicated features, we figure out at that point what the best course of action may be.

Sounds good to me.

Is there a better name than features? I initially read this as having to do with subtarget features.

Hm, properties?

As a first step, if the planned new features would add space or computational overhead, could the work requiring the extra features have its own analysis and consume both analysis results? As we have examples of more such analyses and understand the usecases and performance tradeoffs, we can very easily refactor / rename.

While we can have more analysis if we want, I somehow doubt that will make it better. I was envisioning a lazy approach in which the analysis will not do anything until instructed to. Users can "ask" for what they want while reuse is still possible and code is kept at a single place. As an example, if users ask to findCFGstructureFeatures they will not get any instruction level results. Querying those could assert w/o extra cost. I don't see why this would cost us extra and I would prefer it very much over a multitude of passes. WDYT?

IIUC, that means the one analysis memoizes results. I'm speculating invalidation would be total - i.e. even if only some results were needed (and, thus, computed), invalidation clears everything. Getting results would be semantically equivalent to having many analyses. Invalidating would be semantically equivalent to going piecemeal through the analyses that should be cleared and doing that. I'm not sure if that'd be always desirable, I suppose we'll learn when we hit a specific problem.

How about this:

  • we rename InlineFeaturesAnalysis to FunctionFeaturesAnalysis (just because "Code" is too general - do we use that term elsewhere in LLVM?)

I think that is fine. @tarinduj, WDYT?

Yeah, looks good.

  • as long as the features are trivially computable , let's just eagerly compute them

Sure.

  • if this causes perf/space problems, or when we hit more complicated features, we figure out at that point what the best course of action may be.

Sounds good to me.

Is there a better name than features? I initially read this as having to do with subtarget features.

Hm, properties?

So FunctionPropertiesAnalysis?

tarinduj updated this revision to Diff 270981.Jun 16 2020, 1:42 AM

Replaced InlineFeatureAnalysis with FunctionPropertiesAnalysis.

tarinduj updated this revision to Diff 271023.Jun 16 2020, 3:51 AM

added comments

The printer pass needs a test. @mtrofin WDYT?

The printer pass needs a test. @mtrofin WDYT?

Yup - never hurts to add a test.

The printer pass needs a test. @mtrofin WDYT?

Yup - never hurts to add a test.

My comment was formatted badly. The test is needed for sure, wanted to know if you are fine with the rename ;)

The printer pass needs a test. @mtrofin WDYT?

Yup - never hurts to add a test.

My comment was formatted badly. The test is needed for sure, wanted to know if you are fine with the rename ;)

Ah, yes - lgtm

One overall nit: could we split this into the stand-alone rename, followed by the addition of the new code? Thanks!

mtrofin accepted this revision.Jun 18 2020, 9:26 AM

lgtm

This revision is now accepted and ready to land.Jun 18 2020, 9:26 AM
tarinduj retitled this revision from Extend InlineFeatureAnalysis to more extract generic code features to Extend InlineFeatureAnalysis to more extract generic code features [Obsolete].Jun 25 2020, 12:24 AM
llvm/unittests/Analysis/ML/FunctionPropertiesAnalysisTest.cpp