This is an archive of the discontinued LLVM Phabricator instance.

Differential D84181

[GVN] Rewrite IsValueFullyAvailableInBlock()
Closed · Public

Authored by lebedev.ri on Jul 20 2020, 9:20 AM.

Details

Reviewers

fhahn
nikic
mkazantsev
jdoerfert
lattner

Commits

rGe40315d2b4ed: [GVN] Rewrite IsValueFullyAvailableInBlock(): no recursion, less false-negatives

Summary

While this doesn't appear to help with the perf issue being exposed by D84108,
the function as-is is very weird, convoluted, and, what's worse, recursive.

There was no need for SpeculativelyAvaliableAndUsedForSpeculation;
a tri-state choice is enough. We never even check for that state.

The basic idea here is that we need to perform a depth-first traversal of the
predecessors of the basic block in question, either finding a preexisting
state for the block in a map, or inserting a "placeholder" SpeculativelyAvailable.

If we encounter an Unavailable block, then we need to give up the search
and back-propagate the Unavailable state to each successor of said block,
more specifically to each SpeculativelyAvailable entry we've just created.

However, if we have traversed the entirety of the predecessors and have not
encountered an Unavailable block, then it must mean the value is fully available.
We could update each inserted SpeculativelyAvailable into an Available,
but we don't need to, as the assertions exercise: we can assume that if
we see a SpeculativelyAvailable entry, it is actually Available,
because if, at the time we produced it, we had found that it
has an Unavailable predecessor, we would have updated its successors,
including this block, to Unavailable.
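
The traversal described above can be sketched outside of LLVM roughly as follows. This is a simplified standalone model, not the code from the patch: blocks are plain integers, and `Graph`, `makeGraph`, and `isFullyAvailable` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <unordered_map>
#include <utility>
#include <vector>

enum class State { Unavailable, Available, SpeculativelyAvailable };

// Hypothetical stand-in for a CFG: blocks are the indices 0..N-1.
struct Graph {
  std::vector<std::vector<int>> Preds;
  std::vector<std::vector<int>> Succs;
};

// Builds predecessor and successor lists from (from, to) edges.
Graph makeGraph(int NumBlocks, const std::vector<std::pair<int, int>> &Edges) {
  Graph G;
  G.Preds.resize(NumBlocks);
  G.Succs.resize(NumBlocks);
  for (const auto &E : Edges) {
    G.Succs[E.first].push_back(E.second);
    G.Preds[E.second].push_back(E.first);
  }
  return G;
}

// Returns true if no Unavailable block is reached while walking the
// predecessors of BB. Known caches per-block states across queries.
bool isFullyAvailable(const Graph &G, int BB,
                      std::unordered_map<int, State> &Known) {
  assert(BB >= 0 && BB < static_cast<int>(G.Preds.size()));
  std::vector<int> Worklist{BB};
  int UnavailableBB = -1; // -1 means "none found".

  // Part 1: depth-first walk over predecessors.
  while (!Worklist.empty()) {
    int Curr = Worklist.back(); // LIFO, so depth-first.
    Worklist.pop_back();
    // One lookup: find the existing state or insert a speculative placeholder.
    auto Res = Known.insert({Curr, State::SpeculativelyAvailable});
    State &S = Res.first->second;
    if (!Res.second) {
      if (S == State::Unavailable) {
        UnavailableBB = Curr;
        break;
      }
      continue; // Available or already speculated on: nothing to explore.
    }
    if (G.Preds[Curr].empty()) {
      // No predecessors: the value cannot be live-in here.
      S = State::Unavailable;
      UnavailableBB = Curr;
      break;
    }
    for (int P : G.Preds[Curr])
      Worklist.push_back(P);
  }

  if (UnavailableBB < 0)
    return true; // Every speculative entry can now be read as available.

  // Part 2: flip speculative entries reachable from the unavailable block.
  Worklist = G.Succs[UnavailableBB];
  while (!Worklist.empty()) {
    int Curr = Worklist.back();
    Worklist.pop_back();
    auto It = Known.find(Curr); // Lookup only, never insert.
    if (It == Known.end() || It->second != State::SpeculativelyAvailable)
      continue;
    It->second = State::Unavailable;
    for (int Succ : G.Succs[Curr])
      Worklist.push_back(Succ);
  }
  return false;
}
```

For a diamond CFG (0 to 1 and 2, both to 3), querying block 3 succeeds when blocks 1 and 2 are already recorded as available, and fails when block 2 is unknown, because the walk then reaches the entry block, which has no predecessors.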

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

lebedev.ri created this revision. · Jul 20 2020, 9:20 AM

Herald added a subscriber: hiraditya. · Jul 20 2020, 9:20 AM

xbolva00 added a subscriber: xbolva00. · Jul 20 2020, 9:23 AM

xbolva00 added inline comments.

llvm/lib/Transforms/Scalar/GVN.cpp
683	Available (Same typo below)

Harbormaster failed remote builds in B64940: Diff 279268! · Jul 20 2020, 10:04 AM

Fix typo Avaliable->Available everywhere in the patch.

llvm/lib/Transforms/Scalar/GVN.cpp
683	Ugh, thanks!

While this doesn't appear to help with the perf issue being exposed by D84108, the function as-is is very weird, convoluted, and what's worse, recursive.

Would it be possible to share the pre-GVN IR for the problematic case? (Independently of this cleanup, just curious)

But i'm not sure instructions is *really* the right metric here, because the task-clock stat really improved, by -1.5% .. -5% all across the board.

This is just (very large) noise. I've added a disclaimer to the page that the task-clock/wall-time results are best ignored.

Harbormaster failed remote builds in B64953: Diff 279296! · Jul 20 2020, 9:16 PM

lattner resigned from this revision. · Jul 20 2020, 10:02 PM

lebedev.ri edited the summary of this revision. · Jul 21 2020, 2:14 AM

In D84181#2163120, @nikic wrote:

While this doesn't appear to help with the perf issue being exposed by D84108, the function as-is is very weird, convoluted, and what's worse, recursive.

Would it be possible to share the pre-GVN IR for the problematic case? (Independently of this cleanup, just curious)

I don't have any particular problematic case in mind, but here's pre-GVN (i disabled all passes in pipeline starting with GVN)
llvm-test-suite/MultiSource/Applications/JM/lencod/context_ini.c (produced with D84108)

lencod-context_ini-pre-gvn.ll.xz (55 KB)

But i'm not sure instructions is *really* the right metric here, because the task-clock stat really improved, by -1.5% .. -5% all across the board.

This is just (very large) noise. I've added a disclaimer to the page that the task-clock/wall-time results are best ignored.

And here's standalone results:

$ perf stat --repeat=100 /builddirs/llvm-project/build-Clang10-Release/bin/opt-old /tmp/lencod-context_ini-pre-gvn.bc -o /dev/null -gvn ; perf stat --repeat=100 /builddirs/llvm-project/build-Clang10-Release/bin/opt /tmp/lencod-context_ini-pre-gvn.bc -o /dev/null -gvn

 Performance counter stats for '/builddirs/llvm-project/build-Clang10-Release/bin/opt-old /tmp/lencod-context_ini-pre-gvn.bc -o /dev/null -gvn' (100 runs):

            795.35 msec task-clock                #    0.999 CPUs utilized            ( +-  0.04% )
                 3      context-switches          #    0.004 K/sec                    ( +-  4.66% )
                 0      cpu-migrations            #    0.000 K/sec                  
              2723      page-faults               #    0.003 M/sec                    ( +-  0.23% )
        3186511786      cycles                    #    4.006 GHz                      ( +-  0.04% )  (83.20%)
         522856359      stalled-cycles-frontend   #   16.41% frontend cycles idle     ( +-  0.09% )  (83.36%)
         956532477      stalled-cycles-backend    #   30.02% backend cycles idle      ( +-  0.11% )  (33.44%)
        2668685815      instructions              #    0.84  insn per cycle         
                                                  #    0.36  stalled cycles per insn  ( +-  0.04% )  (50.05%)
         514286258      branches                  #  646.619 M/sec                    ( +-  0.03% )  (66.64%)
          19609714      branch-misses             #    3.81% of all branches          ( +-  0.06% )  (83.24%)

          0.795919 +- 0.000313 seconds time elapsed  ( +-  0.04% )


 Performance counter stats for '/builddirs/llvm-project/build-Clang10-Release/bin/opt /tmp/lencod-context_ini-pre-gvn.bc -o /dev/null -gvn' (100 runs):

            787.80 msec task-clock                #    0.999 CPUs utilized            ( +-  0.04% )
                 3      context-switches          #    0.003 K/sec                    ( +-  6.11% )
                 0      cpu-migrations            #    0.000 K/sec                    ( +- 70.35% )
              2703      page-faults               #    0.003 M/sec                    ( +-  0.20% )
        3156293784      cycles                    #    4.006 GHz                      ( +-  0.04% )  (83.23%)
         520963102      stalled-cycles-frontend   #   16.51% frontend cycles idle     ( +-  0.09% )  (83.24%)
         931974714      stalled-cycles-backend    #   29.53% backend cycles idle      ( +-  0.14% )  (33.52%)
        2681503031      instructions              #    0.85  insn per cycle         
                                                  #    0.35  stalled cycles per insn  ( +-  0.05% )  (50.28%)
         515472885      branches                  #  654.318 M/sec                    ( +-  0.03% )  (66.96%)
          18427045      branch-misses             #    3.57% of all branches          ( +-  0.08% )  (83.43%)

          0.788365 +- 0.000320 seconds time elapsed  ( +-  0.04% )

So there it's an improvement of -0.949% +- 0.04%
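
The figure above appears to follow from the two "seconds time elapsed" values (0.795919 for opt-old, 0.788365 for opt). A minimal sketch of that arithmetic, with `relativeChangePercent` being a hypothetical helper name:

```cpp
#include <cassert>

// Relative change of `After` with respect to `Before`, in percent.
// Negative values mean the new run was faster.
double relativeChangePercent(double Before, double After) {
  assert(Before != 0.0 && "baseline must be non-zero");
  return (After - Before) / Before * 100.0;
}
```

Calling `relativeChangePercent(0.795919, 0.788365)` gives roughly -0.949, which matches the number quoted above.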

lebedev.ri mentioned this in D84108: [SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline. · Jul 21 2020, 5:29 AM

Though I don't see obvious problems in the new algorithm, I cannot wrap my mind around the old one and can't say if they are doing the same thing. I'd suggest the following reliable way to make sure it is NFC:

Instead of changing logic of IsValueFullyAvailableInBlock, introduce a new version of it and use it.
In debug mode, also call the old one and assert that the results are the same.
Then, after it has been in for a long enough while and we are certain those algorithms do the same thing, remove the old one.

WDYT?

llvm/lib/Transforms/Scalar/GVN.cpp
700–701	This name only makes sense because it is used as DFS traversal root. In function signature, `BB` is more clear.

In D84181#2168668, @mkazantsev wrote:

Though I don't see obvious problems in the new algorithm, I cannot wrap my mind around the old one

Yeah, i had that feeling myself :/ That's why i've put @lattner as reviewer,
since it was he who seems to have written/modified it in rL60408/rL60588 ~12 years ago.

and can't say if they are doing the same thing.

Any particular part that is causing trouble?
The code essentially consists of two separate parts, "maze solving" and backpropagation.

I think backpropagation is obviously identical, at least?
We start at the first block we've found to be unavailable.
We need to mark each successor block that isn't known to be available as unavailable.
Also, to not pay for what we don't use, we should only mark the blocks that we care about,
i.e. those that are already present in the FullyAvailableBlocks map.
So we put said block into the worklist and do the pretty typical worklist dance:
take a block from the worklist, and query (with insert!) its availability status in the map.
WARNING: in the old version, if the block wasn't in the map already, this adds it as unavailable!
Then, if we have found out that it was unavailable, we just go to the next entry in the worklist.
If it wasn't unavailable, we mark it as such
(WARNING: the old version would also mark available blocks as unavailable, which is obviously bogus,
while we should only do that for speculatively available blocks),
and proceed to further backpropagate that information to its successors (by putting them into the worklist).
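
To make the two WARNINGs concrete, here is a simplified side-by-side model of the two backpropagation variants. It is based only on the description above, not on the actual GVN.cpp code: blocks are integers, and `backPropagateOld` / `backPropagateNew` are hypothetical names.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

enum class State { Unavailable, Available, SpeculativelyAvailable };
using StateMap = std::unordered_map<int, State>;
using SuccList = std::vector<std::vector<int>>; // Succs[BB] = successors of BB

// Old behaviour as described: the query inserts, and any block that is
// not already Unavailable is flipped, including Available ones.
void backPropagateOld(const SuccList &Succs, int Start, StateMap &Known) {
  std::vector<int> Worklist{Start};
  while (!Worklist.empty()) {
    int BB = Worklist.back();
    Worklist.pop_back();
    // Query with insert: an unknown block is added as Unavailable.
    State &S = Known.insert({BB, State::Unavailable}).first->second;
    if (S == State::Unavailable)
      continue;
    S = State::Unavailable; // Also hits Available blocks.
    for (int Succ : Succs[BB])
      Worklist.push_back(Succ);
  }
}

// New behaviour as described: lookup only, and only speculative
// entries are flipped.
void backPropagateNew(const SuccList &Succs, int Start, StateMap &Known) {
  std::vector<int> Worklist{Start};
  while (!Worklist.empty()) {
    int BB = Worklist.back();
    Worklist.pop_back();
    auto It = Known.find(BB);
    if (It == Known.end() || It->second != State::SpeculativelyAvailable)
      continue;
    It->second = State::Unavailable;
    for (int Succ : Succs[BB])
      Worklist.push_back(Succ);
  }
}
```

With block 0 speculatively available, block 1 available, and block 2 unknown (both successors of 0), the old variant ends with all three blocks marked unavailable, while the new one flips only block 0 and leaves the map with two entries.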

The "maze solving" seems pretty intuitive in hindsight, too:
We start at some block, and we want to know if some value is available in that block.
So we see if the map already contains an entry for the block; if it does, we know our answer:
it is either explicitly marked as unavailable, or it's available/speculatively available.
Otherwise, as a perf optimization, we assume it will be available, so we add an entry to the map,
marking the block as speculatively available.
Now, we need to check if the value can be live-in from each(!) predecessor.
If there are no predecessors, the value can't be live-in, and the block is unavailable.
If there are predecessors, we need to visit each one and do it all over again.

As soon as we find out that a block is unavailable, we have a problem:
we may have marked blocks as speculatively available along the way to the unavailable block.
It is important to notice that we don't deviate from our exploration path,
so all the blocks we have just marked as speculatively available are successors of this block.

Otherwise, if we never encounter an unavailable block, then we're all good.
Since we would have retroactively fixed up any blocks we had set as speculatively available
to unavailable, from now on, all the remaining speculatively available blocks are available.

All this logic is verified with asserts at the end.

I'd suggest the following reliable way to make sure it is NFC:

Instead of changing logic of IsValueFullyAvailableInBlock, introduce a new version of it and use it.

In debug mode, also call the old one and assert that the results are the same.

Then, after it has been in for a long enough while and we are certain those algorithms do the same thing, remove the old one.

WDYT?

I'm afraid it's not that easy to do.

We'd need to work around the situation where the old code gave up due to the gvn-max-recurse-depth recursion limit.
We'll need to duplicate FullyAvailableBlocks so we pass an identical one into both the old version and the new one. Note that we can't equality-compare them afterwards, because the new code, unlike the old one, doesn't insert new nodes during backpropagation.
The new code doesn't (erroneously) mark previously available successor nodes of an unavailable node as unavailable. This technically, i guess, makes it a non-NFC patch.

This overall makes sense I think.

In D84181#2168702, @lebedev.ri wrote:

I'm afraid it's not that easy to do.

We'd need to work around the situation where the old code gave up due to the gvn-max-recurse-depth recursion limit.

I think we also need some kind of limit on the worklist size for the iterative algorithm. Without it we might traverse the whole function in some cases, which could lead to huge compile-times in some degenerate cases I think.

We'll need to duplicate FullyAvailableBlocks so we pass an identical one into both the old version and the new one. Note that we can't equality-compare them afterwards, because the new code, unlike the old one, doesn't insert new nodes during backpropagation.

The new code doesn't (erroneously) mark previously available successor nodes of an unavailable node as unavailable. This technically, i guess, makes it a non-NFC patch.

Yes I think this shouldn't be marked as NFC.

It would be interesting, however, to know how often the answers diverge (with respect to how the result is used in GVN). For example, it would be interesting to know how often PRE triggers in GVN on some large programs (e.g. MultiSource/SPEC/LLVM bootstrap) and how many codegen differences there are between the old and new implementation. Hopefully that would be a very small number, which might lead to a test case for the new behavior.

llvm/lib/Transforms/Scalar/GVN.cpp
701	Not sure about the name change. It's not clear to me what fixpoint means here. More accurately, it's the map of blocks with known states, right? Whatever name is chosen, the comment above needs updating and we should keep the name consistent at the call sites.
722	The placement of the comment next to the return seems a bit strange to me. IMO it would make more sense to have a comment above Worklist.append, stating that we queue the successors to continue back-propagating.
752	Again, it is not entirely clear what "non-fixpoint" refers to here. It only means blocks marked as SpeculativelyAvailable, right? Also, back-propagating seems a bit counter-intuitive here at first glance, if you think of the CFG as a directed graph (edges directed from BB to its successors). Consider saying something like: // Okay, we have encountered an "unavailable" block. // Mark SpeculativelyAvailable blocks reachable from UnavailableBB as unavailable as well. Paths are terminated when they reach blocks not in Fixpoints or not marked as SpeculativelyAvailable.
810	nit: could this just `return !UnavailableBB`?

In D84181#2168819, @fhahn wrote:

This overall makes sense I think.

I think we also need some kind of limit on the worklist size for the iterative algorithm. Without it we might traverse the whole function in some cases, which could lead to huge compile-times in some degenerate cases I think.

Okay, added one. Right now it's based on the number of times we find a previously unknown block,
mark it as speculatively available, and recurse into it. I think this is the correct way,
i wouldn't think we should just count every block queried, even if we'd found an entry in map?

I don't have SPEC, and didn't try it on whole LLVM, but on vanilla test-suite + RawSpeed + darktable,
said IsValueFullyAvailableInBlockNumSpeculationsMax stat is at max 300, so i've picked a limit of 600.
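
A rough standalone sketch of that counting scheme follows. It models only the predecessor walk (back-propagation is omitted), blocks are integers, and it assumes that running out of budget is handled the same way as finding an unavailable block; `WalkResult`, `walkPreds`, and `MaxSpeculations` are hypothetical names.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

enum class State { Unavailable, Available, SpeculativelyAvailable };

struct WalkResult {
  bool HitUnavailable;      // True if the walk stopped early.
  unsigned NumSpeculations; // Blocks newly inserted as speculative.
};

// Walks the predecessors of BB. Only previously unknown blocks consume
// budget; blocks already present in Known are free to query.
WalkResult walkPreds(const std::vector<std::vector<int>> &Preds, int BB,
                     std::unordered_map<int, State> &Known,
                     unsigned MaxSpeculations) {
  std::vector<int> Worklist{BB};
  unsigned NumSpeculations = 0;
  while (!Worklist.empty()) {
    int Curr = Worklist.back();
    Worklist.pop_back();
    auto Res = Known.insert({Curr, State::SpeculativelyAvailable});
    State &S = Res.first->second;
    if (!Res.second) {
      // Cached entry: does not count against the limit.
      if (S == State::Unavailable)
        return {true, NumSpeculations};
      continue;
    }
    ++NumSpeculations;
    bool OutOfBudget = NumSpeculations > MaxSpeculations;
    if (OutOfBudget || Preds[Curr].empty()) {
      // Assumption: exceeding the budget is treated as "unavailable".
      S = State::Unavailable;
      return {true, NumSpeculations};
    }
    for (int P : Preds[Curr])
      Worklist.push_back(P);
  }
  return {false, NumSpeculations};
}
```

On a five-block chain whose first block is already known to be available, a generous limit lets the walk finish after four speculations, while a limit of 2 stops it at the third newly discovered block.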

We'll need to duplicate FullyAvailableBlocks so we pass an identical one into both the old version and the new one. Note that we can't equality-compare them afterwards, because the new code, unlike the old one, doesn't insert new nodes during backpropagation.

The new code doesn't (erroneously) mark previously available successor nodes of an unavailable node as unavailable. This technically, i guess, makes it a non-NFC patch.

Yes I think this shouldn't be marked as NFC.

It would be interesting, however, to know how often the answers diverge (with respect to how the result is used in GVN). For example, it would be interesting to know how often PRE triggers in GVN on some large programs (e.g. MultiSource/SPEC/LLVM bootstrap) and how many codegen differences there are between the old and new implementation. Hopefully that would be a very small number, which might lead to a test case for the new behavior.

Let's see if i can catch such a case..

Harbormaster completed remote builds in B65352: Diff 280066. · Jul 23 2020, 4:28 AM

In D84181#2168983, @lebedev.ri wrote:

Let's see if i can catch such a case..

Done. It's rather horrible, but better than nothing i guess.

Harbormaster failed remote builds in B65461: Diff 280271! · Jul 23 2020, 3:28 PM

lebedev.ri mentioned this in rG0a5971139a01: [NFC][GVN] Add a (horrible) test for D84181 demonstrating non-NFC'ness. · Jul 23 2020, 3:29 PM

fhahn added inline comments. · Jul 24 2020, 1:37 AM

llvm/test/Transforms/GVN/loadpre-missed-opportunity.ll
4–36	those should not be needed.
4–36	Not used currently? I think the test below would benefit from having some actual users of `%i` & `i7` that cannot be folded away.
104–106	nit: unnecessary attribute/dso_local. Same for _Z2axv

@fhahn thanks for taking a look!
Indeed, the test is bad.

llvm/test/Transforms/GVN/loadpre-missed-opportunity.ll
4–36	It would benefit indeed, but that breaks the test :/ It's overreduced, but i'm not sure what extra interestingness tests i should have used.

Harbormaster completed remote builds in B65517: Diff 280378. · Jul 24 2020, 3:27 AM

lattner removed a subscriber: lattner. · Jul 24 2020, 8:31 AM

In D84181#2171662, @lebedev.ri wrote:

@fhahn thanks for taking a look!
Indeed, the test is bad.

Perhaps the problem was a missing readnone on the use calls?

E.g. something like the following seems to work as expected. Might still be possible to remove some parts of the cfg.

define i32 @loadpre_opportunity(i32** %arg, i1 %arg1, i1 %arg2, i1 %arg3) {
bb:
  %i = load i32*, i32** %arg, align 8
  %i4 = getelementptr inbounds i32, i32* %i, i64 0
  br label %bb5

bb5:
  %v.1 = call i32 @use(i32* %i4)
  br label %bb9

bb6:
  %i7 = load i32*, i32** %arg, align 8
  %i8 = getelementptr inbounds i32, i32* %i7, i64 0
  %v.2 = call i32 @use(i32* %i8)
  br label %bb9

bb9:
  %p = phi i32 [ %v.1, %bb5 ], [ %v.2, %bb6]
  br i1 %arg1, label %bb6, label %bb10

bb10:
  call void @somecall()
  br i1 %arg2, label %bb12, label %bb15

bb12:
  br label %bb13

bb13:
  br i1 %arg3, label %bb14, label %bb13

bb14:
  br label %bb15

bb15:
  %c = call i1 @cond()
  br i1 %c, label %bb6, label %exit

exit:
  ret i32 %p
}

declare void @somecall()
declare i32 @use(i32*) readnone
declare i1 @cond() readnone

@fhahn thanks! that is indeed better.

Harbormaster completed remote builds in B65596: Diff 280514. · Jul 24 2020, 11:57 AM

Thanks for the test case! I think this makes sense and is in the spirit of the original implementation, with some improvements. Some more comments, mostly related to wording inline.

llvm/lib/Transforms/Scalar/GVN.cpp
102	For consistency, I would suggest updating the wording to use a similar style to above, e.g. `Number of blocks speculated as available in IsVal..`. Same for the second stat.
123	Is the default not 600?
708–709	nit: maybe adjust with something like `as fixpoint yet (the Unavailable and Available states are fixpoints)` to make clear what we mean with fixpoints.
711	nit: use try_emplace to skip std::make_pair? (if that compiles)
725	As mentioned earlier, it seems odd to me to call propagating to successors as 'back-propagating'. I guess it makes sense if you think about it as back-propagating to the starting node. Might be good to clarify the comment. Might just say `Queue successors for further processing`.
llvm/test/Transforms/GVN/loadpre-missed-opportunity.ll
2–3	also add a line with the limit set so we don't perform the optimization, to check that the limit works?

Thank you for taking a look!

In D84181#2175234, @fhahn wrote:

Thanks for the test case! I think this makes sense and is in the spirit of the original implementation, with some improvements. Some more comments, mostly related to wording inline.

Nits addressed.

LGTM, thanks. Might be good to wait a day with committing, in case there are any additional concerns.

llvm/lib/Transforms/Scalar/GVN.cpp
710	nit: add period at end of sentence

This revision is now accepted and ready to land. · Jul 27 2020, 4:10 AM

Harbormaster failed remote builds in B65803: Diff 280851! · Jul 27 2020, 5:00 AM

In D84181#2175360, @fhahn wrote:

LGTM, thanks. Might be good to wait a day with committing, in case there are any additional concerns.

Thank you for the review!
I'll wait a bit.

This revision was landed with ongoing or failed builds. · Jul 28 2020, 12:27 AM

Closed by commit rGe40315d2b4ed: [GVN] Rewrite IsValueFullyAvailableInBlock(): no recursion, less false-negatives (authored by lebedev.ri).

This revision was automatically updated to reflect the committed changes.

lebedev.ri added a commit: rGe40315d2b4ed: [GVN] Rewrite IsValueFullyAvailableInBlock(): no recursion, less false-negatives.

Revision Contents

Path	Size
llvm/lib/Transforms/Scalar/GVN.cpp	198 lines
llvm/test/Transforms/GVN/loadpre-missed-opportunity.ll	34 lines

Diff 281129

llvm/lib/Transforms/Scalar/GVN.cpp

Show First 20 Lines • Show All 92 Lines • ▼ Show 20 Lines
STATISTIC(NumGVNInstr, "Number of instructions deleted");		STATISTIC(NumGVNInstr, "Number of instructions deleted");
STATISTIC(NumGVNLoad, "Number of loads deleted");		STATISTIC(NumGVNLoad, "Number of loads deleted");
STATISTIC(NumGVNPRE, "Number of instructions PRE'd");		STATISTIC(NumGVNPRE, "Number of instructions PRE'd");
STATISTIC(NumGVNBlocks, "Number of blocks merged");		STATISTIC(NumGVNBlocks, "Number of blocks merged");
STATISTIC(NumGVNSimpl, "Number of instructions simplified");		STATISTIC(NumGVNSimpl, "Number of instructions simplified");
STATISTIC(NumGVNEqProp, "Number of equalities propagated");		STATISTIC(NumGVNEqProp, "Number of equalities propagated");
STATISTIC(NumPRELoad, "Number of loads PRE'd");		STATISTIC(NumPRELoad, "Number of loads PRE'd");

		STATISTIC(IsValueFullyAvailableInBlockNumSpeculationsMax,
		"Number of blocks speculated as available in "
		fhahnUnsubmitted Done Reply Inline Actions For consistency, I would suggest updating the wording to use a similar style to above, e.g. `Number of blocks speculated as available in IsVal..`. Same for the second stat. fhahn: For consistency, I would suggest updating the wording to use a similar style to above, e.g.
		"IsValueFullyAvailableInBlock(), max");
		STATISTIC(MaxBBSpeculationCutoffReachedTimes,
		"Number of times we we reached gvn-max-block-speculations cut-off "
		"preventing further exploration");

static cl::opt<bool> GVNEnablePRE("enable-pre", cl::init(true), cl::Hidden);		static cl::opt<bool> GVNEnablePRE("enable-pre", cl::init(true), cl::Hidden);
static cl::opt<bool> GVNEnableLoadPRE("enable-load-pre", cl::init(true));		static cl::opt<bool> GVNEnableLoadPRE("enable-load-pre", cl::init(true));
static cl::opt<bool> GVNEnableLoadInLoopPRE("enable-load-in-loop-pre",		static cl::opt<bool> GVNEnableLoadInLoopPRE("enable-load-in-loop-pre",
cl::init(true));		cl::init(true));
static cl::opt<bool> GVNEnableMemDep("enable-gvn-memdep", cl::init(true));		static cl::opt<bool> GVNEnableMemDep("enable-gvn-memdep", cl::init(true));

// Maximum allowed recursion depth.
static cl::opt<uint32_t>
MaxRecurseDepth("gvn-max-recurse-depth", cl::Hidden, cl::init(1000), cl::ZeroOrMore,
cl::desc("Max recurse depth in GVN (default = 1000)"));

static cl::opt<uint32_t> MaxNumDeps(		static cl::opt<uint32_t> MaxNumDeps(
"gvn-max-num-deps", cl::Hidden, cl::init(100), cl::ZeroOrMore,		"gvn-max-num-deps", cl::Hidden, cl::init(100), cl::ZeroOrMore,
cl::desc("Max number of dependences to attempt Load PRE (default = 100)"));		cl::desc("Max number of dependences to attempt Load PRE (default = 100)"));

		// This is based on IsValueFullyAvailableInBlockNumSpeculationsMax stat.
		static cl::opt<uint32_t> MaxBBSpeculations(
		"gvn-max-block-speculations", cl::Hidden, cl::init(600), cl::ZeroOrMore,
		cl::desc("Max number of blocks we're willing to speculate on (and recurse "
		"into) when deducing if a value is fully avaliable or not in GVN "
		"(default = 600)"));
		fhahnUnsubmitted Done Reply Inline Actions Is the default not 600? fhahn: Is the default not 600?

struct llvm::GVN::Expression {		struct llvm::GVN::Expression {
uint32_t opcode;		uint32_t opcode;
bool commutative = false;		bool commutative = false;
Type *type = nullptr;		Type *type = nullptr;
SmallVector<uint32_t, 4> varargs;		SmallVector<uint32_t, 4> varargs;

Expression(uint32_t o = ~2U) : opcode(o) {}		Expression(uint32_t o = ~2U) : opcode(o) {}

▲ Show 20 Lines • Show All 540 Lines • ▼ Show 20 Lines	for (DenseMap<uint32_t, Value*>::iterator I = d.begin(),
I->second->dump();		I->second->dump();
}		}
errs() << "}\n";		errs() << "}\n";
}		}
#endif		#endif

enum class AvaliabilityState : char {		enum class AvaliabilityState : char {
/// We know the block is not fully available. This is a fixpoint.		/// We know the block is not fully available. This is a fixpoint.
Unavaliable = 0,		Unavailable = 0,
/// We know the block is fully available. This is a fixpoint.		/// We know the block is fully available. This is a fixpoint.
Avaliable = 1,		Available = 1,
		xbolva00Unsubmitted Done Reply Inline Actions Available (Same typo below) xbolva00: Available (Same typo below)
		lebedev.riAuthorUnsubmitted Done Reply Inline Actions Ugh, thanks! lebedev.ri: Ugh, thanks!
/// We do not know whether the block is fully available or not,		/// We do not know whether the block is fully available or not,
/// but we are currently speculating that it will be.		/// but we are currently speculating that it will be.
SpeculativelyAvaliable = 2,		/// If it would have turned out that the block was, in fact, not fully
/// We are speculating for this block and have used that		/// available, this would have been cleaned up into an Unavailable.
/// to speculate for other blocks.		SpeculativelyAvailable = 2,
SpeculativelyAvaliableAndUsedForSpeculation = 3,
};		};

/// Return true if we can prove that the value		/// Return true if we can prove that the value
/// we're analyzing is fully available in the specified block. As we go, keep		/// we're analyzing is fully available in the specified block. As we go, keep
/// track of which blocks we know are fully alive in FullyAvailableBlocks. This		/// track of which blocks we know are fully alive in FullyAvailableBlocks. This
/// map is actually a tri-state map with the following values:		/// map is actually a tri-state map with the following values:
/// 0) we know the block is not fully available.		/// 0) we know the block is not fully available.
/// 1) we know the block is fully available.		/// 1) we know the block is fully available.
/// 2) we do not know whether the block is fully available or not, but we are		/// 2) we do not know whether the block is fully available or not, but we are
/// currently speculating that it will be.		/// currently speculating that it will be.
/// 3) we are speculating for this block and have used that to speculate for
/// other blocks.
static bool IsValueFullyAvailableInBlock(		static bool IsValueFullyAvailableInBlock(
BasicBlock *BB,		BasicBlock *BB,
DenseMap<BasicBlock *, AvaliabilityState> &FullyAvailableBlocks,		DenseMap<BasicBlock *, AvaliabilityState> &FullyAvailableBlocks) {
		mkazantsevUnsubmitted Done Reply Inline Actions This name only makes sense because it is used as DFS traversal root. In function signature, `BB` is more clear. mkazantsev: This name only makes sense because it is used as DFS traversal root. In function signature…
		fhahnUnsubmitted Done Reply Inline Actions Not sure about the name change. It's not clear to me what fixpoint means here. More accurately, its the map of blocks with known states, right? Whatever name is chosen, the comment above needs updating and we should keep the name consistent at the call sites. fhahn: Not sure about the name change. It's not clear to me what fixpoint means here. More accurately…
uint32_t RecurseDepth) {		SmallVector<BasicBlock *, 32> Worklist;
if (RecurseDepth > MaxRecurseDepth)		Optional<BasicBlock *> UnavailableBB;
return false;
		// The number of times we didn't find an entry for a block in a map and
		// optimistically inserted an entry marking block as speculatively avaliable.
		unsigned NumNewNewSpeculativelyAvailableBBs = 0;

		#ifndef NDEBUG
		fhahnUnsubmitted Done Reply Inline Actions nit: maybe adjust with something like `as fixpoint yet (the Unavailable and Available states are fixpoints)` to make clear what we mean with fixpoints. fhahn: nit: maybe adjust with something like `as fixpoint yet (the Unavailable and Available states…
		SmallSet<BasicBlock *, 32> NewSpeculativelyAvailableBBs;
		fhahnUnsubmitted Not Done Reply Inline Actions nit: add period at end of sentence fhahn: nit: add period at end of sentence
		SmallVector<BasicBlock *, 32> AvailableBBs;
		fhahn (inline, Done): nit: use try_emplace to skip std::make_pair? (if that compiles)
		#endif

// Optimistically assume that the block is speculatively available and check		Worklist.emplace_back(BB);
		while (!Worklist.empty()) {
		BasicBlock *CurrBB = Worklist.pop_back_val(); // LIFO - depth-first!
		// Optimistically assume that the block is Speculatively Available and check
// to see if we already know about this block in one lookup.		// to see if we already know about this block in one lookup.
std::pair<DenseMap<BasicBlock *, AvaliabilityState>::iterator, bool> IV =		std::pair<DenseMap<BasicBlock *, AvaliabilityState>::iterator, bool> IV =
FullyAvailableBlocks.insert(		FullyAvailableBlocks.try_emplace(
std::make_pair(BB, AvaliabilityState::SpeculativelyAvaliable));		CurrBB, AvaliabilityState::SpeculativelyAvailable);
		AvaliabilityState &State = IV.first->second;
		fhahn (inline, Done): The placement of the comment next to the return seems a bit strange to me. IMO it would make more sense to have a comment above Worklist.append, stating that we queue the successors to continue back-propagating.

// If the entry already existed for this block, return the precomputed value.		// Did the entry already exist for this block?
if (!IV.second) {		if (!IV.second) {
		fhahn (inline, Done): As mentioned earlier, it seems odd to me to describe propagating to successors as 'back-propagating'. I guess it makes sense if you think about it as back-propagating to the starting node. It might be good to clarify the comment, or just say `Queue successors for further processing`.
// If this is a speculative "available" value, mark it as being used for		if (State == AvaliabilityState::Unavailable) {
// speculation of other blocks.		UnavailableBB = CurrBB;
if (IV.first->second == AvaliabilityState::SpeculativelyAvaliable)		break; // Backpropagate unavaliability info.
IV.first->second =
AvaliabilityState::SpeculativelyAvaliableAndUsedForSpeculation;
return IV.first->second != AvaliabilityState::Unavaliable;
}		}

// Otherwise, see if it is fully available in all predecessors.		#ifndef NDEBUG
pred_iterator PI = pred_begin(BB), PE = pred_end(BB);		AvailableBBs.emplace_back(CurrBB);
		#endif
// If this block has no predecessors, it isn't live-in here.		continue; // Don't recurse further, but continue processing worklist.
if (PI == PE)		}
goto SpeculationFailure;

for (; PI != PE; ++PI)
// If the value isn't fully available in one of our predecessors, then it
// isn't fully available in this block either. Undo our previous
// optimistic assumption and bail out.
if (!IsValueFullyAvailableInBlock(*PI, FullyAvailableBlocks,RecurseDepth+1))
goto SpeculationFailure;

return true;		// No entry found for block.
		++NumNewNewSpeculativelyAvailableBBs;
		bool OutOfBudget = NumNewNewSpeculativelyAvailableBBs > MaxBBSpeculations;

// If we get here, we found out that this is not, after		// If we have exhausted our budget, mark this block as unavailable.
// all, a fully-available block. We have a problem if we speculated on this and		// Also, if this block has no predecessors, the value isn't live-in here.
// used the speculation to mark other blocks as available.		if (OutOfBudget || pred_empty(CurrBB)) {
SpeculationFailure:		MaxBBSpeculationCutoffReachedTimes += (int)OutOfBudget;
AvaliabilityState &BBVal = FullyAvailableBlocks[BB];		State = AvaliabilityState::Unavailable;
		UnavailableBB = CurrBB;
		break; // Backpropagate unavaliability info.
		}

// If we didn't speculate on this, just return with it set to unavaliable.		// Tentatively consider this block as speculatively available.
if (BBVal == AvaliabilityState::SpeculativelyAvaliable) {		#ifndef NDEBUG
BBVal = AvaliabilityState::Unavaliable;		NewSpeculativelyAvailableBBs.insert(CurrBB);
		fhahn (inline, Done): Again, it is not entirely clear what "non-fixpoint" refers to here. It only means blocks marked as SpeculativelyAvailable, right? Also, back-propagating seems a bit counter-intuitive here at first glance, if you think of the CFG as a directed graph (edges directed from BB to its successors). Consider saying something like: // Okay, we have encountered an "unavailable" block. // Mark SpeculativelyAvailable blocks reachable from UnavailableBB as unavailable as well. Paths are terminated when they reach blocks not in Fixpoints or they are not marked as SpeculativelyAvailable.
return false;		#endif
		// And further recurse into block's predecessors, in depth-first order!
		Worklist.append(pred_begin(CurrBB), pred_end(CurrBB));
}		}

// If we did speculate on this value, we could have blocks set to		#if LLVM_ENABLE_STATS
// speculatively avaliable that are incorrect. Walk the (transitive)		IsValueFullyAvailableInBlockNumSpeculationsMax.updateMax(
// successors of this block and mark them as unavaliable instead.		NumNewNewSpeculativelyAvailableBBs);
SmallVector<BasicBlock*, 32> BBWorklist;		#endif
BBWorklist.push_back(BB);

do {		// If the block isn't marked as fixpoint yet
BasicBlock *Entry = BBWorklist.pop_back_val();		// (the Unavailable and Available states are fixpoints)
// Note that this sets blocks to unavailable if they happen to not		auto MarkAsFixpointAndEnqueueSuccessors =
// already be in FullyAvailableBlocks. This is safe.		[&](BasicBlock *BB, AvaliabilityState FixpointState) {
AvaliabilityState &EntryVal = FullyAvailableBlocks[Entry];		auto It = FullyAvailableBlocks.find(BB);
if (EntryVal == AvaliabilityState::Unavaliable)		if (It == FullyAvailableBlocks.end())
continue; // Already unavailable.		return; // Never queried this block, leave as-is.
		switch (AvaliabilityState &State = It->second) {
		case AvaliabilityState::Unavailable:
		case AvaliabilityState::Available:
		return; // Don't backpropagate further, continue processing worklist.
		case AvaliabilityState::SpeculativelyAvailable: // Fix it!
		State = FixpointState;
		#ifndef NDEBUG
		assert(NewSpeculativelyAvailableBBs.erase(BB) &&
		"Found a speculatively available successor leftover?");
		#endif
		// Queue successors for further processing.
		Worklist.append(succ_begin(BB), succ_end(BB));
		return;
		}
		};

// Mark as unavailable.		if (UnavailableBB) {
EntryVal = AvaliabilityState::Unavaliable;		// Okay, we have encountered an unavailable block.
		// Mark speculatively available blocks reachable from UnavailableBB as
		// unavailable as well. Paths are terminated when they reach blocks not in
		// FullyAvailableBlocks or they are not marked as speculatively available.
		Worklist.clear();
		Worklist.append(succ_begin(UnavailableBB), succ_end(UnavailableBB));
		while (!Worklist.empty())
		MarkAsFixpointAndEnqueueSuccessors(Worklist.pop_back_val(),
		AvaliabilityState::Unavailable);
		}

		#ifndef NDEBUG
		Worklist.clear();
		for (BasicBlock *AvailableBB : AvailableBBs)
		Worklist.append(succ_begin(AvailableBB), succ_end(AvailableBB));
		while (!Worklist.empty())
		MarkAsFixpointAndEnqueueSuccessors(Worklist.pop_back_val(),
		AvaliabilityState::Available);

BBWorklist.append(succ_begin(Entry), succ_end(Entry));		assert(NewSpeculativelyAvailableBBs.empty() &&
} while (!BBWorklist.empty());		"Must have fixed all the new speculatively available blocks.");
		#endif

return false;		return !UnavailableBB;
		fhahn (inline, Done): nit: could this just be `return !UnavailableBB`?
}		}

/// Given a set of loads specified by ValuesPerBlock,		/// Given a set of loads specified by ValuesPerBlock,
/// construct SSA form, allowing us to eliminate LI. This returns the value		/// construct SSA form, allowing us to eliminate LI. This returns the value
/// that should be used at LI's definition site.		/// that should be used at LI's definition site.
static Value *ConstructSSAForLoadSet(LoadInst *LI,		static Value *ConstructSSAForLoadSet(LoadInst *LI,
SmallVectorImpl<AvailableValueInBlock> &ValuesPerBlock,		SmallVectorImpl<AvailableValueInBlock> &ValuesPerBlock,
GVN &gvn) {		GVN &gvn) {
(348 lines hidden)	bool GVN::PerformLoadPRE(LoadInst *LI, AvailValInBlkVect &ValuesPerBlock,
assert(TmpBB);		assert(TmpBB);
LoadBB = TmpBB;		LoadBB = TmpBB;

// Check to see how many predecessors have the loaded value fully		// Check to see how many predecessors have the loaded value fully
// available.		// available.
MapVector<BasicBlock *, Value *> PredLoads;		MapVector<BasicBlock *, Value *> PredLoads;
DenseMap<BasicBlock *, AvaliabilityState> FullyAvailableBlocks;		DenseMap<BasicBlock *, AvaliabilityState> FullyAvailableBlocks;
for (const AvailableValueInBlock &AV : ValuesPerBlock)		for (const AvailableValueInBlock &AV : ValuesPerBlock)
FullyAvailableBlocks[AV.BB] = AvaliabilityState::Avaliable;		FullyAvailableBlocks[AV.BB] = AvaliabilityState::Available;
for (BasicBlock *UnavailableBB : UnavailableBlocks)		for (BasicBlock *UnavailableBB : UnavailableBlocks)
FullyAvailableBlocks[UnavailableBB] = AvaliabilityState::Unavaliable;		FullyAvailableBlocks[UnavailableBB] = AvaliabilityState::Unavailable;

SmallVector<BasicBlock *, 4> CriticalEdgePred;		SmallVector<BasicBlock *, 4> CriticalEdgePred;
for (BasicBlock *Pred : predecessors(LoadBB)) {		for (BasicBlock *Pred : predecessors(LoadBB)) {
// If any predecessor block is an EH pad that does not allow non-PHI		// If any predecessor block is an EH pad that does not allow non-PHI
// instructions before the terminator, we can't PRE the load.		// instructions before the terminator, we can't PRE the load.
if (Pred->getTerminator()->isEHPad()) {		if (Pred->getTerminator()->isEHPad()) {
LLVM_DEBUG(		LLVM_DEBUG(
dbgs() << "COULD NOT PRE LOAD BECAUSE OF AN EH PAD PREDECESSOR '"		dbgs() << "COULD NOT PRE LOAD BECAUSE OF AN EH PAD PREDECESSOR '"
<< Pred->getName() << "': " << *LI << '\n');		<< Pred->getName() << "': " << *LI << '\n');
return false;		return false;
}		}

if (IsValueFullyAvailableInBlock(Pred, FullyAvailableBlocks, 0)) {		if (IsValueFullyAvailableInBlock(Pred, FullyAvailableBlocks)) {
continue;		continue;
}		}

if (Pred->getTerminator()->getNumSuccessors() != 1) {		if (Pred->getTerminator()->getNumSuccessors() != 1) {
if (isa<IndirectBrInst>(Pred->getTerminator())) {		if (isa<IndirectBrInst>(Pred->getTerminator())) {
LLVM_DEBUG(		LLVM_DEBUG(
dbgs() << "COULD NOT PRE LOAD BECAUSE OF INDBR CRITICAL EDGE '"		dbgs() << "COULD NOT PRE LOAD BECAUSE OF INDBR CRITICAL EDGE '"
<< Pred->getName() << "': " << *LI << '\n');		<< Pred->getName() << "': " << *LI << '\n');
(remaining 1,639 lines hidden)
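
For readers who want to see the shape of the rewritten traversal without the LLVM plumbing, here is a minimal sketch on a toy CFG. It is not the patch itself: `ToyCFG`, `Avail` and `isFullyAvailable` are hypothetical names, blocks are plain integer ids, the speculation budget (`MaxBBSpeculations`) and the debug-only bookkeeping are left out, and C++17 is assumed. It follows the two phases described in the summary: a LIFO worklist walk over predecessors that optimistically inserts speculative entries, and, if an unavailable block is hit, a walk over successors that downgrades the speculative entries to unavailable.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <unordered_map>
#include <vector>

// Tri-state availability, mirroring the three states kept by the patch.
enum class Avail { Unavailable, Available, SpeculativelyAvailable };

// Hypothetical toy CFG: blocks are integer ids with explicit edge lists.
struct ToyCFG {
  std::vector<std::vector<int>> Preds, Succs;
  explicit ToyCFG(std::size_t N) : Preds(N), Succs(N) {}
  void addEdge(int From, int To) {
    Succs[From].push_back(To);
    Preds[To].push_back(From);
  }
};

// Returns true if every path into Root passes through a block that is
// already marked Available in States. No recursion is used.
bool isFullyAvailable(const ToyCFG &G, int Root,
                      std::unordered_map<int, Avail> &States) {
  assert(Root >= 0 && static_cast<std::size_t>(Root) < G.Preds.size());
  std::vector<int> Worklist{Root};
  std::optional<int> UnavailableBB;

  while (!Worklist.empty()) {
    int BB = Worklist.back(); // LIFO, so the walk is depth-first.
    Worklist.pop_back();
    // One lookup: either find the known state or insert a placeholder.
    auto [It, Inserted] = States.try_emplace(BB, Avail::SpeculativelyAvailable);
    if (!Inserted) {
      if (It->second == Avail::Unavailable) {
        UnavailableBB = BB;
        break;
      }
      continue; // Available or already speculated on: do not walk further.
    }
    if (G.Preds[BB].empty()) { // No predecessors: value is not live-in.
      It->second = Avail::Unavailable;
      UnavailableBB = BB;
      break;
    }
    Worklist.insert(Worklist.end(), G.Preds[BB].begin(), G.Preds[BB].end());
  }

  if (UnavailableBB) {
    // Downgrade speculative blocks reachable from the unavailable block.
    Worklist = G.Succs[*UnavailableBB];
    while (!Worklist.empty()) {
      int BB = Worklist.back();
      Worklist.pop_back();
      auto It = States.find(BB);
      if (It == States.end() || It->second != Avail::SpeculativelyAvailable)
        continue; // Never queried, or already a fixpoint state.
      It->second = Avail::Unavailable;
      Worklist.insert(Worklist.end(), G.Succs[BB].begin(), G.Succs[BB].end());
    }
  }
  return !UnavailableBB;
}
```

On a diamond CFG (0 to 1 and 2, both to 3) with block 0 marked Available, querying block 3 succeeds and leaves blocks 1, 2 and 3 as speculative entries, which the summary argues can be read as available. If block 2 is instead marked Unavailable, the query fails and block 3 is rewritten to Unavailable.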

llvm/test/Transforms/GVN/loadpre-missed-opportunity.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -gvn -S | FileCheck %s			; RUN: opt < %s -gvn -gvn-max-block-speculations=1 -S | FileCheck -check-prefixes=ALL,PRE %s
				; RUN: opt < %s -gvn -gvn-max-block-speculations=0 -S | FileCheck -check-prefixes=ALL,CHECK %s
				fhahn (inline, Done): Also add a line with the limit set so we don't perform the optimization, to check that the limit works?

	define i32 @loadpre_opportunity(i32** %arg, i1 %arg1, i1 %arg2, i1 %arg3) {			define i32 @loadpre_opportunity(i32** %arg, i1 %arg1, i1 %arg2, i1 %arg3) {
				; PRE-LABEL: @loadpre_opportunity(
				; PRE-NEXT: bb:
				; PRE-NEXT: [[I:%.*]] = load i32*, i32** [[ARG:%.*]], align 8
				; PRE-NEXT: [[I6:%.*]] = call i32 @use(i32* [[I]])
				; PRE-NEXT: br label [[BB11:%.*]]
				; PRE: bb7:
				; PRE-NEXT: [[I8:%.*]] = phi i32* [ [[I8_PRE:%.*]], [[BB17_BB7_CRIT_EDGE:%.*]] ], [ [[I81:%.*]], [[BB11]] ]
				; PRE-NEXT: [[I10:%.*]] = call i32 @use(i32* [[I8]])
				; PRE-NEXT: br label [[BB11]]
				; PRE: bb11:
				; PRE-NEXT: [[I81]] = phi i32* [ [[I]], [[BB:%.*]] ], [ [[I8]], [[BB7:%.*]] ]
				; PRE-NEXT: [[I12:%.*]] = phi i32 [ [[I6]], [[BB]] ], [ [[I10]], [[BB7]] ]
				; PRE-NEXT: br i1 [[ARG1:%.*]], label [[BB7]], label [[BB13:%.*]]
				; PRE: bb13:
				; PRE-NEXT: call void @somecall()
				; PRE-NEXT: br i1 [[ARG2:%.*]], label [[BB14:%.*]], label [[BB17:%.*]]
				; PRE: bb14:
				; PRE-NEXT: br label [[BB15:%.*]]
				; PRE: bb15:
				; PRE-NEXT: br i1 [[ARG3:%.*]], label [[BB16:%.*]], label [[BB15]]
				; PRE: bb16:
				; PRE-NEXT: br label [[BB17]]
				; PRE: bb17:
				; PRE-NEXT: [[I18:%.*]] = call i1 @cond()
				; PRE-NEXT: br i1 [[I18]], label [[BB17_BB7_CRIT_EDGE]], label [[BB19:%.*]]
				; PRE: bb17.bb7_crit_edge:
				; PRE-NEXT: [[I8_PRE]] = load i32*, i32** [[ARG]], align 8
				; PRE-NEXT: br label [[BB7]]
				; PRE: bb19:
				; PRE-NEXT: ret i32 [[I12]]
				;
				fhahn (inline, Done): Those should not be needed.
				fhahn (inline, Done): Not used currently? I think the test below would benefit from having some actual users of `%i` & `i7` that cannot be folded away.
				lebedev.ri (author, inline, Done): It would benefit indeed, but that breaks the test :/ It's overreduced, but I'm not sure what extra interestingness tests I should have used.
	; CHECK-LABEL: @loadpre_opportunity(			; CHECK-LABEL: @loadpre_opportunity(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: [[I:%.*]] = load i32*, i32** [[ARG:%.*]], align 8			; CHECK-NEXT: [[I:%.*]] = load i32*, i32** [[ARG:%.*]], align 8
	; CHECK-NEXT: [[I6:%.*]] = call i32 @use(i32* [[I]])			; CHECK-NEXT: [[I6:%.*]] = call i32 @use(i32* [[I]])
	; CHECK-NEXT: br label [[BB11:%.*]]			; CHECK-NEXT: br label [[BB11:%.*]]
	; CHECK: bb7:			; CHECK: bb7:
	; CHECK-NEXT: [[I8:%.*]] = load i32*, i32** [[ARG]], align 8			; CHECK-NEXT: [[I8:%.*]] = load i32*, i32** [[ARG]], align 8
	; CHECK-NEXT: [[I10:%.*]] = call i32 @use(i32* [[I8]])			; CHECK-NEXT: [[I10:%.*]] = call i32 @use(i32* [[I8]])
	(51 lines hidden)
	bb17:			bb17:
	%i18 = call i1 @cond()			%i18 = call i1 @cond()
	br i1 %i18, label %bb7, label %bb19			br i1 %i18, label %bb7, label %bb19

	bb19:			bb19:
	ret i32 %i12			ret i32 %i12
	}			}

	declare void @somecall()			declare void @somecall()
	declare i32 @use(i32*) readnone			declare i32 @use(i32*) readnone
	declare i1 @cond() readnone			declare i1 @cond() readnone
				fhahn (inline, Done): nit: unnecessary attribute/dso_local. Same for _Z2axv.