Page MenuHomePhabricator

Clean up: Refactoring the hardcoded value of 6 for FindAvailableLoadedValue()'s parameter MaxInstsToScan.
ClosedPublic

Authored by lvoufo on Sep 15 2015, 12:09 PM.

Details

Summary

Any idea why this is hardcoded?
Is it okay to switch the value from 6 to 8 (or else) to resolve the following problem?

Right now, in -O2, -early-cse replaces a load with a store, too early for -instcombine to recognize
subsequent duplicate loads as actual duplicates to be removed.

For example, consider the following code snippet:

%i = alloca %struct.A, align 4
%j = alloca %struct.A, align 4
%0 = bitcast %struct.A* %i to i8*
call void @llvm.lifetime.start(i64 4, i8* %0) #3
%call = call i32 @_Z3onev()
%a2.i = getelementptr inbounds %struct.A, %struct.A* %i, i64 0, i32 0
store i32 %call, i32* %a2.i, align 4, !tbaa !1
%1 = bitcast %struct.A* %i to i8*
%2 = call {}* @llvm.invariant.start(i64 4, i8* %1) #3
%3 = bitcast %struct.A* %j to i8*
call void @llvm.lifetime.start(i64 4, i8* %3) #3
%4 = getelementptr inbounds %struct.A, %struct.A* %i, i64 0, i32 0
%5 = getelementptr inbounds %struct.A, %struct.A* %j, i64 0, i32 0
%6 = load i32, i32* %4, align 4
store i32 %6, i32* %5, align 4, !tbaa !6
%7 = bitcast %struct.A* %j to i8*
%8 = call {}* @llvm.invariant.start(i64 4, i8* %7) #3
call void @_Z3bar1A(i32 %6)
call void @_Z3bar1A(i32 %6)
call void @_Z4foo2PK1AS1_(%struct.A* nonnull %j, %struct.A* nonnull %i)
%9 = getelementptr inbounds %struct.A, %struct.A* %i, i64 0, i32 0
%10 = load i32, i32* %9, align 4       ;  <--- duplicate. Should be removed.
call void @_Z3bar1A(i32 %10)

After -early-cse, the above becomes

%i = alloca %struct.A, align 4
%j = alloca %struct.A, align 4
%0 = bitcast %struct.A* %i to i8*
call void @llvm.lifetime.start(i64 4, i8* %0) #3
%call = call i32 @_Z3onev()
%a2.i = getelementptr inbounds %struct.A, %struct.A* %i, i64 0, i32 0
store i32 %call, i32* %a2.i, align 4, !tbaa !1
%1 = call {}* @llvm.invariant.start(i64 4, i8* %0) #3
%2 = bitcast %struct.A* %j to i8*
call void @llvm.lifetime.start(i64 4, i8* %2) #3
%3 = getelementptr inbounds %struct.A, %struct.A* %j, i64 0, i32 0
store i32 %call, i32* %3, align 4, !tbaa !6
%4 = call {}* @llvm.invariant.start(i64 4, i8* %2) #3
call void @_Z3bar1A(i32 %call)
call void @_Z3bar1A(i32 %call)
call void @_Z4foo2PK1AS1_(%struct.A* nonnull %j, %struct.A* nonnull %i)
%5 = load i32, i32* %a2.i, align 4       ;  <--- duplicate. Should be removed.
call void @_Z3bar1A(i32 %5)

where the first load from %i has been replaced with the store into %a2.i which points to %i.

For -instcombine to remove the duplicate load above, either

  • -early-cse should not merge the first load into the store -- thereby treating the load as not a "trivially redundant instruction", or
  • -instcombine should allow FindAvailableLoadedValue() to scan more than 6 instructions.

Note that @llvm.invariant.start() calls are ignored in the count, just like @llvm.lifetime.start().

A quick run has found the value of 8 to be the minimal needed, for this example case scenario.

Diff Detail

Event Timeline

lvoufo updated this revision to Diff 34824.Sep 15 2015, 12:09 PM
lvoufo retitled this revision from to Clean up: Refactoring the hardcoded value of 6 for FindAvailableLoadedValue()'s parameter MaxInstsToScan..
lvoufo updated this object.
lvoufo added reviewers: dberlin, nlewycky, resistor.
lvoufo added a subscriber: llvm-commits.
nlewycky edited edge metadata.Sep 15 2015, 4:06 PM

A few unconnected issues:

-1-
Please generate the patch with either:

git diff -U999999 other-branch

or

svn diff --diff-cmd=diff -x -U999999

per the instructions at http://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface . The symptom of not doing this is that I can't easily see the rest of the function around the parts you modified, and that makes it hard to review the code. I need to go find a copy elsewhere to read.

-2-
Does "opt -basicaa -gvn" clean up the redundant load? If so, is it okay that instcombine doesn't?

-3-
My first preference is to not have instcombine do this transform at all, but I have no suggestion for how to make that happen. The xform redundant with all of mem2reg, sroa, early-cse and gvn, but IIRC it exists to fix a pass ordering problem where one iteration of instcombine makes the transform possible, and doing this load removal makes a transform in the next iteration of instcombine possible.

My second preference is to not rely on a threshold. It's subject to problems like this one, where minor changes that aren't relevant to the semantic behaviour of the program will change the code we generate. We shouldn't have those. One possibility is to track loads as we scan forwards, much like early-cse does. It's a fair amount of code, but actually quite cheap in compile time especially when compared to everything else instcombine does. The idea is that you load, you add it to a map, when you perform an action which may write memory (non-readonly call, or store) you wipe the map. This way, you never need to scan backwards.

My third preference is to query a system designed to do this with caching of queries, such as MemoryDependenceAnalysis (still scans backwards) or MemorySSA (not ready yet). Yes, this makes Instcombine a real clone of the PRE half of GVN instead of just a half-way pretending piece that doesn't really do things right.

Fixing any of the above is a serious project that would take quite some time.

And finally we can bump 6 to 8.

include/llvm/Analysis/Loads.h
41–43

The usual way to do this in LLVM is "cl::opt". Take a look at UnrollCount for an example.

64

Using a cl::opt means you won't get to have a default here (unless you pick a sigil value like -1 to indicate "use the default"). However, I think every caller of this function passes in 6 today, so the default argument is dead anyways.

However, I'm not sure why this should be a caller-controlled value at all.

lvoufo updated this revision to Diff 34852.Sep 15 2015, 4:52 PM
lvoufo edited edge metadata.

Updating diff to suggested/preferred formatting.

A few unconnected issues:

-1-
Please generate the patch with either:

git diff -U999999 other-branch

or

svn diff --diff-cmd=diff -x -U999999

per the instructions at http://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface . The symptom of not doing this is that I can't easily see the rest of the function around the parts you modified, and that makes it hard to review the code. I need to go find a copy elsewhere to read.

Ok. Done.

-2-
Does "opt -basicaa -gvn" clean up the redundant load? If so, is it okay that instcombine doesn't?

It does not. Neither "-basicaa -gvn", nor "-basicaa -early-cse -gvn", "nor -basicaa -early-cse -gvn -instcombine", ...
the issue is that the load is actually eliminated with "-O1", but not with "-O2". Examining the passes in -O2, I was able to reduce the problem area down to "-basicaa -instcombine -inline -early-cse -instcombine" (on a lowered sample *.cc program, lowered with optimizations turned off).

-3-
My first preference is to not have instcombine do this transform at all, but I have no suggestion for how to make that happen. The xform redundant with all of mem2reg, sroa, early-cse and gvn, but IIRC it exists to fix a pass ordering problem where one iteration of instcombine makes the transform possible, and doing this load removal makes a transform in the next iteration of instcombine possible.

That may be so, but regardless of why it is the way it is at the moment, how it is affects compilation with -O2.
What is -instcombine designed for in the first place?
W.r.t. load instructions, I view it as a place where duplicate loads are combined so long as they are now too far apart from one another.

My second preference is to not rely on a threshold. It's subject to problems like this one, where minor changes that aren't relevant to the semantic behaviour of the program will change the code we generate. We shouldn't have those. One possibility is to track loads as we scan forwards, much like early-cse does. It's a fair amount of code, but actually quite cheap in compile time especially when compared to everything else instcombine does. The idea is that you load, you add it to a map, when you perform an action which may write memory (non-readonly call, or store) you wipe the map. This way, you never need to scan backwards.

A good starting point would be to a least understand the origin of the current semantics. What does '6' mean? Why '6'? Why not something else? Any documentation available?

My third preference is to query a system designed to do this with caching of queries, such as MemoryDependenceAnalysis (still scans backwards) or MemorySSA (not ready yet). Yes, this makes Instcombine a real clone of the PRE half of GVN instead of just a half-way pretending piece that doesn't really do things right.

Fixing any of the above is a serious project that would take quite some time.

And finally we can bump 6 to 8.

Ok. Could do this temporarily, and explore more future-proof solutions afterwards.

lvoufo added inline comments.Sep 15 2015, 5:12 PM
include/llvm/Analysis/Loads.h
42–44

Okay. Will do.

56

Not sure. DEF_MAX_INSTS_TO_SCAN was simply meant to highlight the very fact that the same value is reused everywhere anyway.

lvoufo updated this revision to Diff 34867.Sep 15 2015, 7:20 PM

Using cl::opt now.
This patch should focus only on the refactoring and documentation of the value 6.
I will change the value to 8 or consider alternative options (e.g. messing with -gvn instead of -instcombine)
in a separate patch (likely the initial const optimization patch that I'm working on).

lvoufo marked 2 inline comments as done.Sep 16 2015, 4:53 PM
lvoufo updated this revision to Diff 34950.Sep 16 2015, 5:36 PM
  • Typo fixes.
  • Any news on the documentation?
  • Is this ready to go?
  • What would make this ready to go (or not)?

-2-
Does "opt -basicaa -gvn" clean up the redundant load? If so, is it okay that instcombine doesn't?

Now it does, as of Diff 35065 of D11826. But it still does not make it okay imo since once might still expect
something like "-basicaa -instcombine -inline -early-cse -instcombine" to remove redundant loads when
"-basicaa -instcombine -early-cse -instcombine" does.

nlewycky accepted this revision.Sep 18 2015, 9:48 AM
nlewycky edited edge metadata.

LGTM with changes.

lib/Analysis/Loads.cpp
174

max-insts-to-scan is too generic, this name applies across the whole of llvm. Maybe "available-load-scan-limit"?

175–177

Mention FindAvailableLoadedValue, don't mention -instcombine.

This revision is now accepted and ready to land.Sep 18 2015, 9:48 AM
Eugene.Zelenko added a subscriber: Eugene.Zelenko.

Committed in rL248022.