Page MenuHomePhabricator

Use @llvm.invariant.start/end intrinsics to extend basic AA with invariant range analysis for GVN-based load elimination purposes [Local objects only]
Needs ReviewPublic

Authored by lvoufo on Dec 1 2015, 12:14 PM.

Details

Summary

The invariant range analysis is supported under a new pass, -invariant-range-analysis, and uses both dominator tree and post-dominator tree analyses to determine if a given instruction modifies some given memory.
For starters, this is focused only on local variables and call instructions as potential clobbers.

To avoid efficiency concerns with postdom analysis, and pending upcoming benchmark results, the effects of this analysis are also localized to GVN only (via a flag that is set only in GVN).

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
nlewycky added inline comments.Dec 8 2015, 8:55 PM
include/llvm/Analysis/InvariantInfo.h
23–25

Please alphabetize.

30

I don't understand the connection between "writeonce" (which doesn't seem to be defined anywhere? link?) and invariant.start/end. I have have as many invariant intrinsics over the same memory as I like, there's nothing 'once' about it. There's also no guarantee that the memory ever gets written to at any point in particular.

31

Typo, "intrisic_start" should be "intrinsic_start".

45

I looked up the "certain preconditions" and they appear to be "if Addr is a global or an alloca". Why those preconditions? Why not llvm::isIdentifiedObject, for example? Why aren't these preconditions mentioned on SetStartInstruction?

Ttrue if removed, false if preconditions are not met ... what if the preconditions are met and it's not removed (because it's not a member)?

lib/Analysis/BasicAliasAnalysis.cpp
503

"associated to" --> "associated with".

503–504

I don't understand this comment. Globals, for instance, aren't local to the function.

507

You assert on invariant.start with a constant global variable, but that isn't a verifier failure. You would assert if queried on this well-defined code, right?

declare {}* @llvm.invariant.start(i64 %size, i8* nocapture %ptr)
@gv = internal constant i8 0
define void @test() {
  call {}* @llvm.invariant.start(i64 1, i8* @gv)
  ret void
}

I agree the construct is not useful, either we should make it fail the verifier (this would mean auditing all parts of llvm which mark globals constant to ensure they eliminate all possible invariant intrinsic calls that use the global), or we should delete them in some optimization pass (instcombine).

508–509

Please add a space after "or". As it is, the text from the two lines get concatenated producing "...instructions ornon-constant global...".

788–789

This ...

814–816

... and this look like solid changes on their own. Can you factor them out into their own review and write testcases for them?

lib/Analysis/InvariantInfo.cpp
30

Capitalize.

36–38

Does this code simplify to:

return InvariantMarkers.lookup(const_cast<Value *>(Addr));

?

55–57

Er, the code doesn't match the comment here, right? As written, this means that if it is set, then trying to set again will unset it. That doesn't ensure the new value is inserted.

60

Extra space between words in in "alloca instruction".

69–70

You need to add a

(void)Inserted;

for people building with assertions off and -Wunused-variable on.

lvoufo marked 22 inline comments as done.Dec 9 2015, 9:10 AM
lvoufo added inline comments.
include/llvm/Analysis/InvariantInfo.h
31

The design doc stated that this was all part of simulating 'writeonce' behavior in LLVM (+Clang)...
Ideally, invariant.start instructions are generated right after first write (e.g. construction) and LLVM would reject any write after an invariant_start and before a corresponding invariant_end, but we are not reinforcing this in LLVM. Documentations like this are meant to specify (and clarify) the intended use of invariant.start/end (for simulating wirteonce semantics).
If it makes things even clearer, I could update the LangRef to this effect later (?).

42–43

meant "address", not "Address".

45

Yeah. I think this is a case of poorly-named functions after code restructuring... I'll reorganize everything accordingly.

82–84

Right. This could be clearer.
I mentioned in earlier comments that I may be getting rid of this function altogether (in the next revision of this patch), if I can't rename it appropriately, as part of separating the extended logic from the order in which GVN processes instructions.

lib/Analysis/BasicAliasAnalysis.cpp
503–504

I think the operative word is "considered".
This is merely an attempt to follow the current documentation structure of the function pointsToConstantMemory(). Its documentation states:

/// Returns whether the given pointer value points to memory that is local to                                                       
/// the function, with global constants being considered local to all                                                               
/// functions.                                                                                                                      
bool BasicAAResult::pointsToConstantMemory(const MemoryLocation &Loc,
                                           bool OrLocal)
507

I have removed the assertion.
We just want to localized this extended logic to non-constant globals, and this should be checked in InvariantInfo::SetStartInstruction().

lib/Analysis/InvariantInfo.cpp
36–38

In theory, yes. But I like to edge on the side of safety with find() over lookup() when it comes to the default initialization of pointers (or other non-user-defined objects).

The documentation of lookup says:
"lookup - Return the entry for the specified key, or a default constructed value if no such entry exists".
This explicitly returns nullptr, in place of a "default constructed value"---the specification of which could in practice change at any time, "if no such entry exists".

55–57

Removed.

119

Oops. Residuals from multiple restructurings...

lib/Analysis/MemoryDependenceAnalysis.cpp
478–480

How so?
I think it will do exactly what we want. By manipulating invariant info here based on QueryInst, we are selectively choosing when to apply the extension in BasicAA.

  • A load from %i within an invariant range for %i temporarily unsets the marking for %i.
  • A load from %j within an invariant range for %i does nothing to the marking for %i.
  • A load from %j within an invariant range for %i should preserve the value of BasicAAResult::pointsToConstantMemory() for %j, whether it is nested within an invariant range for %j or not.
  • Meanwhile, A load from %i within an invariant range for %i always leads to BasicAAResult::pointsToConstantMemory for %i == true.

What is missing?

lvoufo marked 5 inline comments as done.Dec 9 2015, 9:33 AM
lvoufo added inline comments.
test/Transforms/LoadElim/invariant.ll
1 ↗(On Diff #41942)

(1) This test case is eventually merged with other passes besides -gvn, like -instcombine. See for example D15136. You can also take a look at earlier patches linked from the design doc where we explore even more complex pass combinations. Keeping things in their respective pass directories, like GVN/ or InstCombine/, essentially duplicates the same test case several times.
As we progress through the next incremental patches, there will be more test cases that will require the same kind of duplicate tests.
To avoid all this duplication. It looks fitting to just keep all the test cases in one new folder, LoadElim.
I should probably rename it to InvariantInfo since these test cases are really more about the effects of manipulating InvariantInfo's more than they are about -gvn, -instcombine, or -globalopt.

(2) -gvn already requires -basicaa. So, -basicaa would be redundant in '-basicaa -gvn'. I could add it for clarity if you prefer.

"(2) -gvn already requires -basicaa. So, -basicaa would be redundant in
'-basicaa -gvn'. I could add it for clarity if you prefer.
"
-gvn does not require -basicaa.
You've stated this a few times, and i'm not sure why you think this.

<from GVN>:
void getAnalysisUsage(AnalysisUsage &AU) const override {

AU.addRequired<AssumptionCacheTracker>();
AU.addRequired<DominatorTreeWrapperPass>();
AU.addRequired<TargetLibraryInfoWrapperPass>();
if (!NoLoads)
  AU.addRequired<MemoryDependenceAnalysis>();
AU.addRequired<AAResultsWrapperPass>();

}

GVN requires AC, DominatorTree, and TLI all the time.
It uses AA results, which is "whatever aa you have enabled".
It requires memorydependence if you want to optimize loads.
Memory dependence, in turn, does not require basicaa either:

void MemoryDependenceAnalysis::getAnalysisUsage(AnalysisUsage &AU) const {

AU.setPreservesAll();
AU.addRequired<AssumptionCacheTracker>();
AU.addRequiredTransitive<AAResultsWrapperPass>();
AU.addRequiredTransitive<TargetLibraryInfoWrapperPass>();

}

No passes *require* basicaa, because basicaa is an optional pass providing
AA results.

lvoufo added a comment.EditedDec 10 2015, 11:27 AM

"(2) -gvn already requires -basicaa. So, -basicaa would be redundant in
'-basicaa -gvn'. I could add it for clarity if you prefer.
"
-gvn does not require -basicaa.
You've stated this a few times, and i'm not sure why you think this.

<from GVN>:

void getAnalysisUsage(AnalysisUsage &AU) const override {
      AU.addRequired<AssumptionCacheTracker>();
      AU.addRequired<DominatorTreeWrapperPass>();
      AU.addRequired<TargetLibraryInfoWrapperPass>();
      if (!NoLoads)
        AU.addRequired<MemoryDependenceAnalysis>();
      AU.addRequired<AAResultsWrapperPass>();

}

GVN requires AC, DominatorTree, and TLI all the time.
It uses AA results, which is "whatever aa you have enabled".
It requires memorydependence if you want to optimize loads.
Memory dependence, in turn, does not require basicaa either:

void MemoryDependenceAnalysis::getAnalysisUsage(AnalysisUsage &AU) const {

  AU.setPreservesAll();
  AU.addRequired<AssumptionCacheTracker>();
  AU.addRequiredTransitive<AAResultsWrapperPass>();
  AU.addRequiredTransitive<TargetLibraryInfoWrapperPass>();
}

No passes *require* basicaa, because basicaa is an optional pass providing
AA results.

Point taken. I may be misusing "require" here, but I sure understand what is going on better now thanks to this.
To clarify my statements and for added clarity from your points above, I have 2 questions:

1 - What is one to make of the dependence on BasicAAWrapperPass in the following?

INITIALIZE_PASS_BEGIN(AAResultsWrapperPass, "aa",
                      "Function Alias Analysis Results", false, true)
INITIALIZE_PASS_DEPENDENCY(BasicAAWrapperPass)
INITIALIZE_PASS_DEPENDENCY(CFLAAWrapperPass)
INITIALIZE_PASS_DEPENDENCY(ExternalAAWrapperPass)
INITIALIZE_PASS_DEPENDENCY(GlobalsAAWrapperPass)
INITIALIZE_PASS_DEPENDENCY(ObjCARCAAWrapperPass)
INITIALIZE_PASS_DEPENDENCY(SCEVAAWrapperPass)
INITIALIZE_PASS_DEPENDENCY(ScopedNoAliasAAWrapperPass)
INITIALIZE_PASS_DEPENDENCY(TypeBasedAAWrapperPass)
INITIALIZE_PASS_END(AAResultsWrapperPass, "aa",
                    "Function Alias Analysis Results", false, true)

So far as I understand, this *will* initialize (and register) a BasicAAWrapperPass pass which, when run (either via run() or runOnFunction()), will build a BasicAAResult object.

2 - There does not seem to be a difference between opt -gvn and opt -basicaa -gvn. In fact, running opt -gvn -debug-pass=Arguments and opt -basicaa -gvn -debug-pass=Arguments both return the following, where -basicaa is automatically added.

Pass Arguments:  -targetlibinfo -tti -assumption-cache-tracker -basicaa -domtree -aa -memdep -gvn -verify
lvoufo updated this revision to Diff 43568.EditedDec 23 2015, 5:40 PM
lvoufo retitled this revision from Use @llvm.invariant.start/end intrinsics to extend the meaning of basic AA's pointsToConstantMemory(), for GVN-based load elimination purposes [Local objects only] to Use @llvm.invariant.start/end intrinsics to extend basic AA with invariant range analysis for GVN-based load elimination purposes [Local objects only].
lvoufo updated this object.
lvoufo added a reviewer: hfinkel.

I will be running benchmarks over next week and keep you posted. For now, I just want to see if this is headed in the right direction... Please let me know what you think.

Note: This patch does not fully implement the PostDominatorTree pass for use with the new pass manager. If this patch is viable, then one will have to fully migrate the PostDominatorTree from the legacy to the new pass manager.

Later patches will also adapt the logic of InvariantInfo::pointsToReadOnlyMemory() to other (applicable) uses of basic AA's pointsToConstantMemory(), e.g, in FunctionAttrs, LICM, for other kinds of instructions, etc... (See included FIXMEs/TODOs.)

lvoufo updated this revision to Diff 43977.Jan 5 2016, 3:05 AM

Ensure invariant info is properly shared between multiple passes, e.g., GVN and basic AA, or -On.

lvoufo updated this revision to Diff 44174.Jan 6 2016, 3:25 PM

Note that uses of invariant_start instructions can be other that invariant_end.

lvoufo updated this revision to Diff 44354.Jan 8 2016, 12:18 PM

Fix typo (resulting in warning) in PostDominators.h

chandlerc edited edge metadata.Jan 12 2016, 8:15 PM

First pass of comments. Also, see my reply to Hal on the earlier thread as pertains to post-dominance information.

While many of these issues are small, there are a couple of pretty significant design comments below. Happy to discuss those to try to make sure you have the right design before you work on another iteration.

include/llvm/Analysis/InvariantInfo.h
72

Using an init function is not terribly common in LLVM. Why not use a constructor?

82

The style of this patch seems mixed. Sometimes methods start with a lower case, some times an upper case. Please try to consistently follow the LLVM coding standards.

86–97

Why are these public methods? they only appear to be called from within the implementation or from diagnostic utilities.

196–219

Why use a pass at all? You don't seem to be getting much value from the pass infrastructure currently. It looks like currently, you could just have a function like "populateInfo" which returns an InvariantRangeInfo object after analyzing that function, and the InvariantRangeInfo object could have a method called "pointsToReadOnlyMemory"?

Do you have specific future changes that would require the pass infrastructure to work? It's entirely possible I'm missing something, but if so, it would be very helpful to explain this kind of detail in the comments.

include/llvm/Analysis/PostDominators.h
118–135

Please split the port of postdom to the new pass manager into a separate patch. Even if it is truly necessary, the port shouldn't be part of the patch that introduces the invariant stuff.

lib/Analysis/BasicAliasAnalysis.cpp
710

It isn't clear what value this helper function is providing.

Is it valid to pass a null pointer here? I find that surprising in and of itself.

Either way, I think it would be more clear to either teach the routine that is already provided by InvariantInfo.h to accept ImmutableCallSite objects, or to inline the extraction of the instruction into the places this is called below.

788–789

This formatting seems inconsistent. Run clang-format?

814–816

You haven't addressed Nick's comment that teaching BasicAA to ignore the invariant intrinsic calls themselves is a separable patch that should be split out and submitted independently. Please do so.

lib/Analysis/InvariantInfo.cpp
37–39

I don't really understand your concern here.

The comments are using imprecise language in that they're not using the standard's precise terminology, but the intent seems clear: it returns whatever "T()" produces. That *always* produces a null pointer when T is a pointer type.

All of LLVM's code (and much of Clang's) relies on the lookup method being reasonable to call when the result is a pointer. Please follow this convention as suggested by Nick originally.

90–96

You have two different API comments, one in the header, and one in the source. Please only have one comment. For public interfaces, it should be in the header. For private functions, you can put it near the implementation if you want.

lib/Analysis/PostDominators.cpp
53–68

This doesn't seem like it belongs in the postdom pass. Instead if anything it should live somewhere with the invariant handling code.

lib/Transforms/Scalar/GVN.cpp
2165–2171

This does not seem like the right design and seems over-engineered.

Here is an alternative that seems much simpler: pass your InvariantRangeAnalysis or InvariantRangeInfo in as an optional parameter to MemoryDependenceAnalysis::getSimplePointerDependencyFrom and the few APIs that call it. Then GVN can opt into building the invariant information structures and passing them into the query and other passes can skip that. Then, within memdep, where we query AA, also query the invariant analysis.

Perhaps there are other designs as well that would be simpler and more direct? I'm curious what others think.

lvoufo added a comment.EditedJan 13 2016, 9:26 PM

Alas, I wanted to address all these comments today, but I ended up being consumed with a bit of data crunching, while building clang with different versions of clang (with and without this patch).
For what it's worth---and my intention was for this to help this ongoing discussion, here are the results:
https://docs.google.com/spreadsheets/d/19W1167l9QFrXMXccX4-cValmC5EtlFfUbT7Tr3xaoJs/edit#gid=1063925865usp=sharing

More load instructions are deleted as expected, a little bit above 9% increase, and there are other gains and losses (highlighted in the spreadsheet) which I haven't gotten around to analyzing yet.
In general, there is either a slight compilation time increase, below 1%, when there is no decrease.
And there is a runtime improvement of 4% with clang binaries build with GCC.
Unexpectedly, there is a small runtime hit, below 0/5%, with clang binaries built with clang.

The times here were collected while other programs may have been running on my machine. I am going to rerun the scripts overnight (with all other programs turned off) and see if we get different outcomes in tomorrow...

In the meantime, I welcome any suggestion for better data crunching...

I will also be addressing all the above comments tomorrow.

I'm sorry, what is this data comparing? Specifically, what is extended in
clang? Is it just this patch applied?

lvoufo marked 15 inline comments as done.Jan 15 2016, 4:40 AM

I hope I addressed all the comments. Let me know if I missed anything.

include/llvm/Analysis/InvariantInfo.h
72

Perhaps set makes more sense here than init.

InvariantInfo serves two purposes. First, it tracks all invariant intrinsics in the input program. Second, it performs invariant region analysis *on demand*, from basic AA and GVN, based on tracked intrinsics. The region analysis requires dominance info, but tracking intrinsics, e.g. to print out or verify invariant info (via the passes InvariantInfoPrinterPass or InvariantInfoVerifierPass), does not need dominance info. So, it does not make much sense to require dominance info in the constructor of InvariantInfo.

Besides, requiring dominance info in the constructor does not work very well with the pass manager system(s). For example, look at the uses of InvariantInfo in the passes InvariantInfoAnalysis and InvariantRangeAnalysisPass. There is no guarantee that DominatorTree or PostDominatorTreeT instances will be created (and present) when these passes are instantiated. Even if the instances are present, there is no guarantee that they would be the instances that the range analysis needs to be run with.

Ideally, instead of introducing init() here, one would make both InvariantRangeAnalysisPass and PostDominatorTree dependencies of BasicAAWrapperPass (along with DominatorTreeWrapperPass), and pass the arguments of init() herein as additional arguments to pointsToReadOnlyMemory(). But this either stands in the way of or is redundant with two of our objectives, which are as follows.

  1. Keep as much of our changes -- and their effect -- as possible local to GVN, and only for purposes of load elimination (for now)---at least until we are okay with using post-dominance info more pervasively throughout LLVM.
  2. Ensure that both dominator and post-dominator passes are instantiated whenever invariant range analysis is enabled. (Note that we can do the analysis without both passes.)

To achieve Objective #1, we rely on calls to EnableInvariantRangeAnalysis() from GVN::runOnFunction() which appropriately enable and disable the analysis. This allows us to avoid adding a new field in BasicAAResults, and, more generally speaking, to avoid creating situations where implementation details are unnecessarily exposed at function call sites thereby hindering future-proof-ness. (In other words, the pointsToReadOnlyMemory() call from basic AA does not have to express that its implementation uses dominance info. That kind of detail should be left to InvariantInfo if possible.)

To achieve Objective #2, we rely on InitDependencies(), which throws a runtime exception if both passes are not available when invariant range analysis is enabled, and directly sets pointers to the passes in InvariantInfo. This is okay because we have made sure to add PostDominatorTree as a dependency of GVN that is explicitly required in getAnalysisUsage(). We cannot directly make the passes dependencies of InvariantRangeAnalysisPass because InvariantRangeAnalysisPass is an ImmutablePass...

Instead of relying on InitDependencies(), could we just not do range analysis if the passes are not available? Sure. But we just would not know if we've erroneously missed opportunities for load elimination.

As per init, InitDependencies should probably be renamed to SetDependencies.

82

My mistake... I'll fix that.

86–97

Right. I'll fix these too. (I was preoccupied with the functional aspect of the implementation.)

196–219

Ok. This was addressed in at least one previous thread. In principle, the -invariant-range-analysis pass is similar to the -assumption-tracker pass, except that it enables selective data analysis in addition to data tracking. (-assumption-tracker only does data tracking, enabling analysis by other passes.)

Like -assumption-tracker, -invariant-range-analysis populates and uses tracked data across different passes, in this case GVN and basic AA. GVN selectively enables the analysis while basic AA requests the analysis to be performed... cf. Objective #1 in earlier comment.

The selective process in GVN could be (incrementaly) adapted in other passes beyond GVN where it makes sense to extend basic AA with invariant range analysis, e.g., places where pointsToConstantMemory() is called with OrLocal == false.

include/llvm/Analysis/PostDominators.h
118–135

Yes. You probably missed this, but when this patch was submitted, the description indicated that the port will have to be done separately...
Indeed, this is a very minimal port, only intended for getting this patch working. The actual port will have to be a bit more substantial than that...
(I am ready to do it as soon as the invariant portion of this patch is agreed upon.)

lib/Analysis/BasicAliasAnalysis.cpp
503–504

I am marking these off because this extension has been moved out of here and into pointsToReadOnlyMemory()...

710

Ok.

788–789

Ok. Thought I did, but will try again, and pay closer attention...

788–789

This is necessary for the test cases below, where load instructions are merged into other load instructions across intrinsic calls.
Consider calling getModRef(Inst, Loc) from memdep where Inst is an intrinsic_start call. Without this, pointsToReadOnlyMemory() does not know that Inst does not modify Loc.Ptr if there is Inst itself is not already in another invariant range over Loc.Ptr.

814–816

I just addressed Nick's comment above. Sorry I missed it earlier.

I am not sure that all of the test cases below would still work without this... I think that keeping these around are actually an essential part of the extended logic for load elimination. I'll take another look and see...

lib/Analysis/InvariantInfo.cpp
37–39

Right. This comment is outdated by the way... The implementation was very different when at the time of Nick's comment.

90–96

Ok. Will do.

140

Marking these off because they are outdated...

lib/Analysis/PostDominators.cpp
53–68

Yes. This part should remain with this pass, while the rest of the postdom migration to the new pass manager should go in the postdom migration patch.

lib/Transforms/Scalar/GVN.cpp
2165–2171

Hmm... See my earlier reply about Objective #1. This sounds like it'd basically shift the selective process from GVN into memdep, and increase opportunities for errors along the way. Note that GVN does not call getSimplePointerDependencyFrom directly. It starts with getDependency which eventually calls getSimplePointerDependencyFrom through multiple hoops in the call chain.

I prefer to keep things as simple as possible, i.e., enable the analysis in one place and not worry about it again until the analysis is requested in basic AA...

Open for alternative suggestions too.

lvoufo marked an inline comment as done.EditedJan 15 2016, 5:46 PM

I'm sorry, what is this data comparing? Specifically, what is extended in
clang? Is it just this patch applied?

[Updating my earlier reply in email here -- to make sure it doesn't get lost...]
Yes, what is extended in clang is just this patch applied (i.e., anything named "opt*").
This is comparing build times, execution times, and statistical data from "-mllvm -stats" between the version of clang with this patch applied and a version of clang without the patch (i.e., anything named "cur*").

These comparison data are respectively in the sheets named "Build Clang", "Run Clang" and "Stats [*]".
Just in case, I also generated opt*- and cur*- clang binaries using both clang and gcc.
The sheets named "Stats [Clang]" and "Stats [GCC]" both represent statistical data for clang built with clang and gcc.

For run times, I simply ran regression tests. There may be other ways to get better runtime measurements (?). But I thought regression tests could give us something to start with.

I also put some descriptions under the "Legend" sheet. I recognize that this could be clearer. So, please don't hesitate to ask me for any clarification.

lvoufo added a comment.EditedJan 15 2016, 6:07 PM

I'm sorry, what is this data comparing? Specifically, what is extended in
clang? Is it just this patch applied?

[Updating my earlier reply in email here -- to make sure it doesn't get lost...]
Yes, what is extended in clang is just this patch applied (i.e., anything named "opt*").
This is comparing build times, execution times, and statistical data from "-mllvm -stats" between the version of clang with this patch applied and a version of clang without the patch (i.e., anything named "cur*).

These comparison data are respectively in the sheets named "Build Clang", "Run Clang" and "Stats [*]".
Just in case, I also generated opt*- and cur*- clang binaries using both clang and gcc.
The sheets named "Stats [Clang]" and "Stats [GCC]" both represent statistical data with clang build clang and gcc.

For run times, I simply ran regression tests. There may be other ways to get better runtime measurements (?). But I thought regression tests could give us something to start with.

I also put some descriptions under the "Legend" sheet. I recognize that this could be clearer. So, please don't hesitate to ask me for any clarification.

[Update] The second pass at the benchmarks script is still running, amidst a few hiccups which are now resolved... But with the results that I currently have, I think we have a new piece of observation, that is as follows:
For a little bit over 2000 invariant ranges generated, and an average of 433 invariant range analysis computed per range (also approximately the number of instructions within each range), it looks like we get:

  • a 9.26% improvement on the number of load instructions deleted (15 more loads / range),
  • a 3.51% improvement on the number of instructions deleted (21 more instrs / range), and
  • a 8.46% (or 2.85%) runtime improvement with clang-built (or gcc-built) binaries.

I do get slightly different stats numbers than with the first run, but with the same percentages. So, it is possible that I may have missed something while fidgeting with the hiccups. So, I will re-run all these again over the week-end and see if the numbers get more stable. I do not anticipate any more hiccups. So, if everything goes well, the new numbers should be in on Sunday evening.

I'm open to any comment. But if you prefer to wait until after the 3rd run, that's fine too. Will keep you posted.

Why post-dominance analysis? I'm glad you asked. :)
Check out this doc: https://docs.google.com/document/d/1R-gINRdpxzLy82EZK_ymnVNyOPO-tzW5ksNcHuRvxBs/edit?usp=sharing.
Comments welcome.

On the benchmark results: The previous runs have led to results that are inconsistent enough that I am revising the benchmarking environment in use. Will keep you posted once I have a more reliable environment and more consistent data.

Sorry for not mentioning it here sooner, but I've put some comments in the
document. Let me know where you'd like to conclude that discussion. I think
it will be hard to dig much further into the code without resolving that
fundamental question.

lvoufo added a subscriber: lvoufo.EditedJan 29 2016, 6:30 PM

Hello All,

A quick note to drop a quick summary of my meeting with Nick (below).

Stats at:
https://docs.google.com/spreadsheets/d/19W1167l9QFrXMXccX4-cValmC5EtlFfUbT7Tr3xaoJs/edit?usp=sharing#gid=1063925865

Summary/Highlights:

  • The number of generated intrinsics is significant enough to validate the analysis.
  • Also helps to know that the stats are the same regardless of whether the host compiler for the LLVM binaries is GCC or Clang.
  • Expected wins from GVN:
    • Number of equalities propagated: 416 : 1.24%
    • Number of instructions deleted: 43905 : 3.51%
    • Number of instructions simplified: 678 : 0.25%
    • Number of loads deleted: 30762 : 9.26%
    • Number of loads PRE’d: 9826 : 10.77%
  • Undesired losses from GVN
    • Number of blocks merged -301 : -6.51%
    • Number of instructions PRE’d -1023 : -2.94%
  • Losses could be due to the fact that other parts of the compilation pipeline do not know about invariant intrinsics yet.
    • Need to teach other passes about the intrinsics,
      • in particular, -sroa and -early-cse,
      • then -functionattrs, -inline, etc…
    • The current patch does not have to wait for other passes to learn about the intrinsics, but
    • The corresponding Clang patch (generating the intrinsics) will have to wait until the passes are properly taught.
  • A positive effect in -instcombine:
    • Number of constants folds, instructions combined, and instructions sunk.
    • GVN may have set this up well for -instcombine, even though -instcombine does not know about invariant intrinsics yet!

Have a good weekend! -- Larisse.

lvoufo added a comment.EditedJan 29 2016, 6:55 PM

In an effort to keep review comments within code review system, I'm uploading a PDF snapshot of the document arguing for post-dominance analysis at https://docs.google.com/document/d/1R-gINRdpxzLy82EZK_ymnVNyOPO-tzW5ksNcHuRvxBs/edit?usp=sharing here.

Here's the PDF snapshot:

lvoufo added a comment.EditedFeb 1 2016, 1:13 PM

Some thoughts going through my head... I'm reading through Keith D. Cooper and Linda Torczon's book, "Engineering a Compiler", 10.3.1 [Eliminating Useless Code]; and here are some excerpts:

  • "The algorithm [...] consists of two passes. The first [...] discovers the set of useful operations. The second [...] removes useless operations. [The first] relies on reverse dominance frontiers [...]"
  • "The treatment of operations other than branches or jumps is straightforward. The marking phase determines whether an operation is useful. The sweep phase removes operations that have not been marked as useful."
  • "The treatment of control-flow operations is more complex. [...] As the marking phase discovers useful operations, it also marks the appropriate branches as useful. To map from a marked operation to the branches that it makes useful, the algorithm relies on on the notion of control dependence."
  • "The definition of control dependence relies on postdominance. [...]. In a CFG, node j is control-dependent on node i if and only if (1) [...] j postdominates every node on the path after i. [...] and (2) j does not strictly postdominate i. [...]"
  • "The notion of control dependence is captured precisely by the reverse dominance frontier of j [...]. Reverse dominance frontiers are simply dominance frontiers computed on the reverse CFG."

My thoughts are:

  1. The main difference between the algorithm in the book and what we are doing is two folds:
    1. through getModRef() in -memdep, we identify useless operations instead of useful ones, and
    2. we do so using whether an invariant_end is control-dependent on a load-clobber, instead of whether a branch is control-dependent on a useful operation.
  2. control-dependence is said to "require postdominance" and to be "captured precisely by the reverse dominance frontier", which is "simply dominance frontier computed on the reverse CFG"; which is exactly what our current implementation of postdominator as a dual of dominator gives us.
  3. The book does not seem to reference doing control-dependence any other way. So I highly doubt that we can get to a complete and sound solution for this range analysis without postdominance.

Thoughts? Did I miss anything? If so, what did I miss?

Thanks,

  • Larisse.
lvoufo updated this revision to Diff 52373.Apr 1 2016, 7:38 AM
lvoufo edited edge metadata.

Update the patch with respect to API changes in LLVM since the last version. Of particular interest here:

  • The PostDominators API has been restructured as suggested from the last version of this patch, which simplifies this patch.
  • The count of invariant range analysis requests has been restricted to non-trivial requests.
lvoufo added a comment.Apr 1 2016, 7:53 AM

Update:
Over the past few weeks, I have been researching how to best benchmark this work, and am finalizing an infrastructure that will help with this and other similar extensions (where existing benchmarks do not cover enough of some given extensions' use case).
I am hoping to use that infrastructure next week or so to crunch some useful data for this discussion.

Benchmarking put aside, could someone please remind what the fundamental concern to pushing this in right now is?
The only thing that comes to mind is that this makes use of undefined behavior that is currently not handled by UBSan; which is thus likely to cause regressions on some systems. To get around this, until UBSan checks for const objects, we could disable the pass by default.

After this patch goes in, note that there is still a lot of work that must be done, incrementally, to:
1 - teach other LLVM passes about the invariant intrinsics,
2 - broaden the application of the intrinsics to more complex const objects like const member objects, and
3 - generate the intrinsics from the frontend.

So, the sooner this goes in, the quicker the progress in actually using "const" (and similar features) for optimization purposes.
The later the goes in, the more outdated my gazilion previous patches (which are waiting in the background), will be. :)

Please advice.

Thanks,

  • Larisse.
lvoufo updated this revision to Diff 52425.Apr 1 2016, 2:22 PM

Syntax change -- in the way that unused variable warnings are disabled.

reames resigned from this revision.Apr 18 2016, 6:07 PM
reames removed a reviewer: reames.
hfinkel edited edge metadata.Apr 26 2016, 3:23 PM

Benchmarking put aside, could someone please remind what the fundamental concern to pushing this in right now is?

It's been a while, so I apologize if I'm off base on this...

I recall being concerned about the use of single-invariant-end postdominance here, because it is not, fundamentally, the correct property. You don't care that a single invariant-end call postdominates the access, but that the set of all such calls do. As you allude to in the PDT change, once exceptions are enabled, there are multiple paths on which a destructor is called. If I assume that, in general, you want an invariant-start intrinsic after the constructor call, and an invariant-end call before the destructor call, then you need to deal with this property. Even if Clang always generates one cleanup block, and even when exceptions are disabled, nothing prevents other passes from "tail merging" the cleanup block. Given that, with exceptions enabled, we already have the more-general case of multiple ends per start, we should design this around a scheme that can handle that situation.

lib/Analysis/BasicAliasAnalysis.cpp
797

By "all call sites" what do you mean? Invokes? Please either make the FIXME more explanatory, or remove it.

lib/Analysis/PostDominators.cpp
53–68

I don't understand your response. Modifying the definition of post-dominance to add a special case for exception unwinding desired by the invariant analysis seems unwise - i.e. likely to lead to bugs in other users of this code.

lib/Transforms/InstCombine/InstructionCombining.cpp
1982

Why is Intrinsic::objectsize handling in this patch?