This is an archive of the discontinued LLVM Phabricator instance.

Differential D141712

[GVN] Improve PRE on load instructions
Closed, Public

Authored by Carrot on Jan 13 2023, 11:12 AM.

Details

Reviewers

mkazantsev
nikic
SixWeining
dyung
chapuni
nickdesaulniers

Commits

rG84bcfa0e1b34: [GVN] Improve PRE on load instructions
rGd6811826371d: [GVN] Improve PRE on load instructions
rG5f1448fe1585: [GVN] Improve PRE on load instructions

Summary

This patch implements the enhancement proposed by https://github.com/llvm/llvm-project/issues/59312.

Suppose we have the following code:

   v0 = load %addr
   br %LoadBB

LoadBB:
   v1 = load %addr
   ...

PredBB:
   ...
   br %cond, label %LoadBB, label %SuccBB

SuccBB:
   v2 = load %addr
   ...

Instruction v1 in LoadBB is partially redundant, and edge (PredBB, LoadBB) is a critical edge. SuccBB is another successor of PredBB, and it contains another load v2 that is identical to v1. GVN currently splits the critical edge (PredBB, LoadBB) and inserts a new load in the new block. A better approach is to move the load v2 into PredBB; v1 can then be replaced with a PHI instruction.

If there are two or more such predecessors, as in the test case in the bug entry, GVN currently gives up, because otherwise it would need to split multiple critical edges. Instead, we can move all the loads in the successor blocks into the predecessors.

This is the second attempt at D139582, with the following enhancements:

- Don't try to move load instructions across exception handling instructions.
- In function replaceValuesPerBlockEntry, ValuesPerBlock[BB] may not be the moved load's value; in that case we should not replace it.

Diff Detail

Unit Tests: Failed

	Time	Test
	80 ms	x64 debian > LLVM.Transforms/GVN/PRE::2018-06-08-pre-load-dbgloc-no-null-opt.ll

Event Timeline

Carrot created this revision. Jan 13 2023, 11:12 AM

Herald added a project: Restricted Project. Jan 13 2023, 11:12 AM

Herald added subscribers: StephenFan, hiraditya.

Carrot requested review of this revision. Jan 13 2023, 11:12 AM

Herald added a subscriber: llvm-commits.

Carrot edited the summary of this revision. Jan 13 2023, 11:14 AM

Harbormaster completed remote builds in B207691: Diff 489074. Jan 13 2023, 1:42 PM

Some minor nits & style proposals; could you please add more tests for weird corner cases, specifically same block being a pred multiple times?

llvm/lib/Transforms/Scalar/GVN.cpp
1395	Do you care about `switch` with 2 branches? If not, maybe then `match(m_Br(m_Value(), m_BasicBlock(IfTrue), m_BasicBlock(IfFalse)))`?
1398	Will there be a problem if `Term->getSuccessor(0) == Term->getSuccessor(1)`? Any tests for it?
1400	Do you really need to check `EHPad` after you've already checked for `isExceptionalTerminator`?
1403	nit: unsigned to signed conversion, might be warning in compiler
1404	Maybe:
	  for (Instruction &Inst : *SuccBB) {
	    if (!Inst.isIdenticalTo(Load))
	      continue;
	    // same code with reduced nest
	  }
1412	Add a comment that, if one identical load already depends on something, then there is no point to look further?
1488	Add some statistic here?
1591	What if the same block goes into LoadBB multiple times? Smth like
	  switch cond
	    case 1: LoadBB
	    case 2: LoadBB
	    case 3: LoadBB
	    default: LoadBB
	Will this work correctly for this case? Please add some tests for situations like this.
1631–1634	nit: `{ }` not needed
llvm/test/Transforms/GVN/PRE/2017-06-28-pre-load-dbgloc.ll
3	Why change that?

This revision now requires changes to proceed. Jan 16 2023, 9:41 PM

Carrot updated this revision to Diff 490011. Jan 17 2023, 6:19 PM

Carrot marked 5 inline comments as done.

Carrot added inline comments.

llvm/lib/Transforms/Scalar/GVN.cpp
1395	I prefer to include switch case, this is a benefit without any extra cost.
1398	If Term->getSuccessor(0) == Term->getSuccessor(1), then SuccBB will have two predecessors, the next statement should exit immediately. Added a test case for it.
1400	You are right, this is not necessary.
1591	This looks similar to your comment in findLoadToHoistIntoPred. Either the switch BB contains an identical load, in which case nothing needs to be handled, or the switch BB doesn't contain an identical load and findLoadToHoistIntoPred returns nullptr because of too many edges. Test added.
llvm/test/Transforms/GVN/PRE/2017-06-28-pre-load-dbgloc.ll
3	Because with this optimization, both cases generate the same result. The PRE opportunity for the load of %desc can now be detected, and the load is moved to the entry block.

Harbormaster completed remote builds in B208388: Diff 490011. Jan 17 2023, 7:52 PM

mkazantsev requested changes to this revision. Jan 18 2023, 4:12 AM

mkazantsev added inline comments.

llvm/lib/Transforms/Scalar/GVN.cpp
97	More specific name, smth related to critical edges maybe?
1643	Is it really needed? There is literally same check just below.
2703	I think this assert should not fail, and if it fails, you have a bug. `ICF` may keep cached information for it bound to its old parent, and will try to remove it from its new parent. You may get inconsistent state of `ICF` because of it. At least I don't see how you update it. You should not move instructions. The right approach is to create a new one.

This revision now requires changes to proceed. Jan 18 2023, 4:12 AM

I have a question regarding the code example: it looks like a case of hoisting a load from SuccBB and from the basic block created by splitting the (PredBB, LoadBB) critical edge. If so, the same result can be achieved by:

- Running SimplifyCFGOpt::HoistThenElseCodeToIf after GVN. Its implementation is very constrained (it can hoist only the first N exactly identical instructions from ThenBB and ElseBB, so it is very sensitive to instruction order).
- Running the GVNHoist pass after GVN, which is the most general solution. Unfortunately it is disabled by default and placed before GVN in the pass pipeline (although GVN exposes lots of hoisting opportunities, and its authors advise running it before and after GVN-PRE).

There was also an attempt to implement an intermediate solution (hoist only instructions from 2 BBs to their common successors, but match them with value numbers): https://discourse.llvm.org/t/rfc-simple-gvn-hoist, so there is a chance that https://reviews.llvm.org/D110822 also covers the same problem.

In general, I'm concerned that this patch tries to combine 2 different transformations: load PRE and hoisting. It can avoid critical edge splitting in some cases, but after hoisting we can also clean up the CFG.

In D141712#4062475, @kachkov98 wrote:

In general, I'm concerned that this patch tries to combine 2 different transformations: load PRE and hoisting. It can avoid critical edge splitting in some cases, but after hoisting we can also clean up the CFG.

Sort of. But most of the code is used to detect the optimization opportunities, and some changes reuse the existing code to insert the load into PredBB; only the changes in eliminatePartiallyRedundantLoad are specific to the hoisting part. I think it's too heavy a hammer for this purpose, especially since the expected following pass is still unavailable.

Carrot updated this revision to Diff 490288. Jan 18 2023, 2:13 PM

Carrot marked 2 inline comments as done.

Carrot added inline comments.

llvm/lib/Transforms/Scalar/GVN.cpp
2703	Added a call to ICF->insertInstructionTo for the newly created load instruction. For the deleted instruction, ICF->removeInstruction is called in line 2711. Maybe a naive question: does a load instruction impact implicit control flow?

Harbormaster completed remote builds in B208595: Diff 490288. Jan 18 2023, 3:13 PM

mkazantsev added inline comments. Jan 18 2023, 8:46 PM

llvm/lib/Transforms/Scalar/GVN.cpp
2703	Sorry, I didn't formulate the problem I'm seeing correctly. It's not that loads create implicit control flow. It's that by removing this assert, you allow instruction motion here. There is no check that you could only have moved a load, right? So potentially it leaves room for this kind of bug.

mkazantsev added inline comments. Jan 18 2023, 8:50 PM

llvm/lib/Transforms/Scalar/GVN.cpp
2703	Let's not scatter cache updates across the code. There can be new caches added in the future, not only `ICF` or whatever it is now. We don't want multiple places where we need to update them. This opens doors for bugs. Any serious reasons to move instruction rather than create a new one?

mkazantsev added inline comments. Jan 18 2023, 8:54 PM

llvm/lib/Transforms/Scalar/GVN.cpp
2703	Just imagine that someone accidentally moves an ICF instruction in a later change. Currently, the assert is protecting us from that. By giving it up, we make mistakes like this harder to find.

Carrot updated this revision to Diff 490663. Jan 19 2023, 2:32 PM

Carrot added inline comments.

llvm/lib/Transforms/Scalar/GVN.cpp
2703	Sounds reasonable. So now I just create a new load, replace all uses of old load with the new load. The dead old load instruction can be deleted in the next iteration of GVN.

Harbormaster completed remote builds in B208860: Diff 490663. Jan 19 2023, 3:55 PM

Looks good now, thanks for your efforts!

This revision is now accepted and ready to land. Jan 22 2023, 8:26 PM

This revision was landed with ongoing or failed builds. Jan 25 2023, 11:48 AM

Closed by commit rG5f1448fe1585: [GVN] Improve PRE on load instructions (authored by Carrot).

This revision was automatically updated to reflect the committed changes.

Carrot added a commit: rG5f1448fe1585: [GVN] Improve PRE on load instructions.

chapuni mentioned this in rG6595ef090012: GVN.cpp: Suppress a warning in D141712 [-Wunused-variable]. Jan 25 2023, 4:54 PM

It looks like the new version of the patch ended up being more expensive in terms of compile-time: Original, New

I wonder whether this is because not removing the load essentially always forces an extra GVN iteration?

llvm/lib/Transforms/Scalar/GVN.cpp
1495	If we're not removing the load, we probably shouldn't be removing it from the leader table either?

In D141712#4082217, @nikic wrote:

It looks like the new version of the patch ended up being more expensive in terms of compile-time: Original, New

I wonder whether this is because not removing the load essentially always forces an extra GVN iteration?

I think so; it causes every case that triggers the optimization to need another iteration.

llvm/lib/Transforms/Scalar/GVN.cpp
1495	Because we expect it to be deleted in the next iteration, and the same value is also available in the NewLoad instruction, so I think it should not be used by other optimizations, and assume it's not available in the leader table.

Hello!

It seems that with this patch GVN may yield different results when debug info is present.
I suspect the MaxNumInsnsPerBlock limit includes also debug instructions?
This is seen with the following example:

opt -passes=gvn bbi-78272.ll -S -o -

If you remove some dbg.declare from the input you get different output.

bbi-78272.ll (11 KB)

dstenb added a subscriber: dstenb. Jan 27 2023, 8:06 AM

@uabelho, thank you for reporting this! I think you are right. I'm preparing a patch.

Just a heads up that we've bisected a test failure to this revision: https://bugs.chromium.org/p/chromium/issues/detail?id=1411693
We're still investigating, but it would be interesting to know if others have also hit any issues.

I'm not sure it explains our test failure, but @reames raised a concern on the llvm-commits email:

I believe this patch to be incorrect, and need reverted.

The case I'm concerned about is that this patch appears to hoist a load 
over instructions which may throw.  The load at the original location 
may be dynamically dead because the exception path is always taken.  
Moving said load into the executed path can introduce a new fault in the 
program.

@hans, do you have a reduced test case to demonstrate the problem?

I exchanged emails with @reames

A throwing call is a terminator; can we find a load below it?

On Mon, Jan 30, 2023 at 11:49 AM Philip Reames <listmail@philipreames.com> wrote:
>
> You continue on any non-identical instruction.  That instruction can be
> a throwing call.
>
> Philip
>
> On 1/30/23 11:23, Carrot Wei wrote:
> > Hi Philip
> >
> > Do you have a reduced test case to show your problem?
> >
> > This patch tries to check exception throwing instructions in the
> > function findLoadToHoistIntoPred. What's your case that I missed?
> >
> > thanks
> > Guozhi Wei

At this point, please revert the patch immediately.

There's been a probable bug reported. There's been a failure reported.
These are probably the same. You should revert, investigate, and fix
offline.

Philip

@reames, I'm happy to revert it once I get your test case, otherwise I have nothing to investigate.

slightly modified from one of the test cases:

declare i1 @foo()
declare void @maybethrow() readnone

; %v3 is partially redundant, bb3 has multiple predecessors coming through
; critical edges. The other successors of those predecessors have same loads.
; We can move all loads into predecessors.

define void @test17(ptr %p1, ptr %p2, ptr %p3, ptr %p4) {
entry:
  %v1 = load i64, ptr %p1, align 8
  %cond1 = icmp sgt i64 %v1, 200
  br i1 %cond1, label %bb200, label %bb1

bb1:
  %cond2 = icmp sgt i64 %v1, 100
  br i1 %cond2, label %bb100, label %bb2

bb2:
  %v2 = add nsw i64 %v1, 1
  store i64 %v2, ptr %p1, align 8
  br label %bb3

bb3:
  %v3 = load i64, ptr %p1, align 8
  store i64 %v3, ptr %p2, align 8
  ret void

bb100:
  %cond3 = call i1 @foo()
  br i1 %cond3, label %bb3, label %bb101

bb101:
  %v4 = load i64, ptr %p1, align 8
  store i64 %v4, ptr %p3, align 8
  ret void

bb200:
  %cond4 = call i1 @foo()
  br i1 %cond4, label %bb3, label %bb201

bb201:
  %_ = call i1 @maybethrow()
  %v5 = load i64, ptr %p1, align 8
  store i64 %v5, ptr %p4, align 8
  ret void
}

becomes

declare i1 @foo()

; Function Attrs: memory(none)
declare void @maybethrow() #0

define void @test17(ptr %p1, ptr %p2, ptr %p3, ptr %p4) {
entry:
  %v1 = load i64, ptr %p1, align 8
  %cond1 = icmp sgt i64 %v1, 200
  br i1 %cond1, label %bb200, label %bb1

bb1:                                              ; preds = %entry
  %cond2 = icmp sgt i64 %v1, 100
  br i1 %cond2, label %bb100, label %bb2

bb2:                                              ; preds = %bb1
  %v2 = add nsw i64 %v1, 1
  store i64 %v2, ptr %p1, align 8
  br label %bb3

bb3:                                              ; preds = %bb200, %bb100, %bb2
  %v3 = phi i64 [ %v3.pre, %bb200 ], [ %v3.pre1, %bb100 ], [ %v2, %bb2 ]
  store i64 %v3, ptr %p2, align 8
  ret void

bb100:                                            ; preds = %bb1
  %cond3 = call i1 @foo()
  %v3.pre1 = load i64, ptr %p1, align 8
  br i1 %cond3, label %bb3, label %bb101

bb101:                                            ; preds = %bb100
  store i64 %v3.pre1, ptr %p3, align 8
  ret void

bb200:                                            ; preds = %entry
  %cond4 = call i1 @foo()
  %v3.pre = load i64, ptr %p1, align 8
  br i1 %cond4, label %bb3, label %bb201

bb201:                                            ; preds = %bb200
  %_ = call i1 @maybethrow()
  store i64 %v3.pre, ptr %p4, align 8
  ret void
}

attributes #0 = { memory(none) }

%v5 is hoisted above call @maybethrow()

Carrot added a reverting change: rG43969af627aa: Revert "[GVN] Improve PRE on load instructions". Feb 1 2023, 2:49 PM

It's reverted now. Thanks for the test case. Will improve it in another version.

Carrot reopened this revision. Feb 2 2023, 8:08 PM

This revision is now accepted and ready to land. Feb 2 2023, 8:08 PM

Check for implicit control flow instruction in function findLoadToHoistIntoPred, so we can avoid moving load across unexpected control flow.

@reames and @hans please try to see if this patch works for your code.
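
The hazard discussed in this thread can be illustrated with a toy model. The sketch below is not the GVN implementation: `ToyInst`, `findLoadSkippingEverything`, and `findLoadStoppingAtThrow` are hypothetical names, and a basic block is reduced to a vector of flags. It only shows why a scan that skips every non-identical instruction can pick a load that sits below a call that may throw, and how giving up at such a call avoids that.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an instruction in a successor block.
struct ToyInst {
  bool IsIdenticalLoad; // same address and type as the load being PRE'd
  bool MayThrow;        // e.g. a call that may unwind (implicit control flow)
};

// Unsafe scan: skips every non-identical instruction, including ones that
// may throw. Returns the index of the first identical load, or -1.
inline int findLoadSkippingEverything(const std::vector<ToyInst> &Block) {
  for (std::size_t I = 0; I < Block.size(); ++I)
    if (Block[I].IsIdenticalLoad)
      return static_cast<int>(I);
  return -1;
}

// Safer scan: gives up as soon as an instruction that may throw precedes
// the candidate load, because hoisting the load above it could execute a
// load that was dynamically dead in the original program.
inline int findLoadStoppingAtThrow(const std::vector<ToyInst> &Block) {
  for (std::size_t I = 0; I < Block.size(); ++I) {
    const ToyInst &Inst = Block[I];
    // Toy-model invariant: a plain load is never itself a throwing call.
    assert(!(Inst.IsIdenticalLoad && Inst.MayThrow));
    if (Inst.IsIdenticalLoad)
      return static_cast<int>(I);
    if (Inst.MayThrow)
      return -1;
  }
  return -1;
}
```

For a block modelled on bb201 from the test case above (a may-throw call followed by the load), the first scan returns index 1 and the second returns -1. For a block like bb101 that starts with the load, both return 0.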

Herald added a subscriber: ormris. Feb 2 2023, 8:10 PM

Harbormaster completed remote builds in B211635: Diff 494499. Feb 2 2023, 8:59 PM

This fixes (via the call to isDominatedByICFIFromSame) the issue I had identified in post commit review. I have not closely examined the patch more generally.

In D141712#4101594, @Carrot wrote:

Check for implicit control flow instruction in function findLoadToHoistIntoPred, so we can avoid moving load across unexpected control flow.

@reames and @hans please try to see if this patch works for your code.

Sadly, our test fails also with this version.

We don't have a reduced repro.

Would it be possible to try this out on some code internally in case any test there exposes the problem?

In private email communication, @hans told me he couldn't find any problem in his optimized code, and his failure disappeared after updating his source code. So there is no known problem with this version.

Could anybody take another look at this patch? Compared to last version there is only one line change in function findLoadToHoistIntoPred.

Works for me.

I believe the updated patch doesn't address the compile-time regression yet -- it's still running a whole extra GVN iteration to remove the load that is left behind.

To improve compile time, I now record the basic blocks that contain dead load instructions and, at the end of the same iteration, process these blocks again to delete the loads. This avoids the extra iteration over the whole function.

Harbormaster completed remote builds in B215852: Diff 500326. Feb 25 2023, 1:45 AM

Current compile time change
http://llvm-compile-time-tracker.com/compare.php?from=c8e5354f0063b090b6fa7ad4cfc2b9df78038454&to=d21dbfa65423bbd9251ab5948f6cc1baa2db61c8&stat=instructions:u

@nikic, the original version of this patch is 0.07% slower because more triggered optimizations caused more iterations on huge functions.
http://llvm-compile-time-tracker.com/compare.php?from=d38d6065584ee5dd837e9a629f90c731d8a7dffc&to=94fc2022ff32b63d6c744d3eeff4f304e1a81618&stat=instructions:u

The second version is 0.16% slower because it leaves the dead instructions to next iteration, so it causes an extra iteration on whole function. http://llvm-compile-time-tracker.com/compare.php?from=287508cd9c4396c8845d92310d258879202a179e&to=5f1448fe1585b5677d5f0064e4eeac3b493d8a18&stat=instructions%3Au

The current version is 0.08% slower, very close to the first version. This time I record the BBs that contain dead loads; instead of revisiting the whole function, I just revisit these recorded BBs. http://llvm-compile-time-tracker.com/compare.php?from=c8e5354f0063b090b6fa7ad4cfc2b9df78038454&to=d21dbfa65423bbd9251ab5948f6cc1baa2db61c8&stat=instructions:u

Is it OK now?
Thanks.

@nikic, do you have any other concern on this patch?

@mkazantsev, could you help to take a look at the latest version?

@nikic expressed a concern about compile time, so I sent out this version to improve it. But @nikic hasn't responded for several weeks.

The change is a new variable, AdditionalWorkSet, that tracks the BBs containing dead loads, so we can process these BBs from AdditionalWorkSet at the end of each iteration and avoid an extra iteration over the whole function.

@nikic, ping.

In D141712#4211777, @Carrot wrote:

@nikic, the original version of this patch is 0.07% slower because more triggered optimizations caused more iterations on huge functions.
http://llvm-compile-time-tracker.com/compare.php?from=d38d6065584ee5dd837e9a629f90c731d8a7dffc&to=94fc2022ff32b63d6c744d3eeff4f304e1a81618&stat=instructions:u

The second version is 0.16% slower because it leaves the dead instructions to next iteration, so it causes an extra iteration on whole function. http://llvm-compile-time-tracker.com/compare.php?from=287508cd9c4396c8845d92310d258879202a179e&to=5f1448fe1585b5677d5f0064e4eeac3b493d8a18&stat=instructions%3Au

The current version is 0.08% slower, very close to the first version. This time I record the BBs that contain dead loads; instead of revisiting the whole function, I just revisit these recorded BBs. http://llvm-compile-time-tracker.com/compare.php?from=c8e5354f0063b090b6fa7ad4cfc2b9df78038454&to=d21dbfa65423bbd9251ab5948f6cc1baa2db61c8&stat=instructions:u

Is it OK now?
Thanks.

0.08% compile time difference is too small to be considered significant. To double check, what is the clang self build time diff?

Sorry for the long silence. I think it's fine in terms of compile time, but I guess I still don't fully get why it is better to reprocess the block than remove the load. I understand that InstrsToErase doesn't work for this case because it's only for the current BB, but we basically have the same code copied at the end of performScalarPRE() as well, so it seems like it should be possible to extract that into a common helper and use it here as well. Maybe I'm missing some subtlety.

Extract the common code for deleting an instruction to a function removeInstruction, call it from several places as required.

Harbormaster completed remote builds in B228921: Diff 518049. Apr 28 2023, 3:01 PM

I believe the change to metadata.ll is a miscompile. The new range metadata in that case should be a union of both old ranges. This is missing a combineMetadataForCSE() somewhere.
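
To make the concern concrete: when one load replaces another, the surviving load may only keep `!range` information that holds for both. The sketch below models this with a hypothetical `ToyRange` (a half-open interval [Lo, Hi) that does not wrap) and a hypothetical `unionHull` helper. It is not LLVM's metadata code; it returns the smallest single interval covering both inputs, which is a conservative superset of their union.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model of a !range annotation: values lie in
// the half-open interval [Lo, Hi). Wrapping ranges are not modelled.
struct ToyRange {
  std::int64_t Lo;
  std::int64_t Hi;
};

// Smallest single interval that covers both A and B. Any value allowed by
// either input is allowed by the result, so the result is safe to attach
// to a load that stands in for both original loads.
inline ToyRange unionHull(const ToyRange &A, const ToyRange &B) {
  assert(A.Lo < A.Hi && B.Lo < B.Hi && "toy model: ranges must be non-empty");
  return ToyRange{std::min(A.Lo, B.Lo), std::max(A.Hi, B.Hi)};
}

// True if V is a value permitted by R.
inline bool contains(const ToyRange &R, std::int64_t V) {
  return R.Lo <= V && V < R.Hi;
}
```

In this model, keeping only one load's range would wrongly exclude values that the other load was allowed to produce, while the hull admits both.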

This revision now requires changes to proceed. May 9 2023, 6:24 AM

Combine the old load's metadata before deleting it.

Harbormaster completed remote builds in B231003: Diff 520874. May 9 2023, 6:31 PM

LGTM (Haven't reviewed the whole thing, but my concerns have been addressed.)

This revision is now accepted and ready to land. May 14 2023, 7:45 AM

This revision was landed with ongoing or failed builds. May 16 2023, 6:12 PM

Closed by commit rGd6811826371d: [GVN] Improve PRE on load instructions (authored by Carrot).

This revision was automatically updated to reflect the committed changes.

Carrot added a commit: rGd6811826371d: [GVN] Improve PRE on load instructions.

I strongly suspect that this patch is breaking the Rust compiler when building against LLVM head:

Instruction does not dominate all uses!
  %11 = load i32, ptr %10, align 4, !range !10, !alias.scope !6314, !noalias !6319
  %19 = phi i32 [ %8, %9 ], [ %11, %32 ]
in function _RINvNtCsbJsLLO9YF7d_9rustc_hir10intravisit9walk_bodyINtNtCs7c017R9yI2R_10rustc_lint4late18LateContextAndPassNtBT_33BuiltinCombinedModuleLateLintPassEEBT_
LLVM ERROR: Broken function found, compilation aborted!
error: could not compile `rustc_lint` (lib)

From https://buildkite.com/llvm-project/rust-llvm-integrate-prototype/builds/19373

Changes since the last successful run of that build bot: https://github.com/llvm/llvm-project/compare/d3b9d8b28f8e9fde41a4136c28f458347d9bb292...6a2f52a3a00bdfc8bd4edf6592099a7a749d324e

(I don't have a minimized repro yet).

Carrot added a reverting change: rGa3fbe5f7e6e6: Revert "[GVN] Improve PRE on load instructions". May 17 2023, 12:40 AM

@TimNN, I have reverted it, LLVM buildbot also reported a sanitized bootstrap failure.

If you have a simple repro, it will be a great help to me.

Thanks.

Hi, this patch also broke several aarch64 2stage bots:

Antoine

In D141712#4348919, @Carrot wrote:

@TimNN, I have reverted it, LLVM buildbot also reported a sanitized bootstrap failure.

If you have a simple repro, it will be a great help to me.

Thanks for reverting! I sadly don't have a good repro, and likely won't have the time to try and get one in the near future.

@maxim-kuvyrkov reported another build failure by this: https://ci.linaro.org/job/tcwg_kernel--llvm-master-arm-stable-allyesconfig-build/29/artifact/artifacts/notify/mail-body.txt/*view*/

The compiler crash didn't occur in my last commit, but occurred in this commit. It is caused by the interaction between the new instruction deletion method and the implementation of replaceValuesPerBlockEntry.

Consider the following case

; case1.

bb1:
   ...
   ; NewLoad is added here.
   br %cond1, label %bb2, label %bb3

bb2:
   OldLoad
   ...
   br label %bb3

bb3:
   Load
   ...

Once we decide to do the transformation, we take the following steps:

1. Insert NewLoad into bb1.
2. Replace ValuesPerBlock(OldLoad->getParent(), OldLoad) with ValuesPerBlock(OldLoad->getParent(), NewLoad), so we get ValuesPerBlock(%bb2, NewLoad).
3. Delete instruction OldLoad, and replace all its uses with NewLoad.
4. Replace instruction Load with a new PHI instruction; the required information comes from ValuesPerBlock.

If we change the case to

; case2.

bb1:
   ...
   ; NewLoad is added here.
   br %cond, label %bb2, label %bb3

bb2:
   OldLoad
   ...
   br %cond2, label %bb5, label %bb4

bb4:
   ...
   br label %bb3

bb3:
   Load
   ...

We take the following steps to do the transformation:

1. Insert NewLoad into bb1.
2. Try to replace ValuesPerBlock(OldLoad->getParent(), OldLoad) with ValuesPerBlock(OldLoad->getParent(), NewLoad). But there is no entry ValuesPerBlock(%bb2, OldLoad); instead there is an entry ValuesPerBlock(%bb4, OldLoad), and it is not changed.
3. Delete instruction OldLoad, and replace all its uses with NewLoad.
4. Replace instruction Load with a new PHI instruction; the required information comes from ValuesPerBlock. We then get the value OldLoad from %bb4 again; because OldLoad has already been deleted, it is an invalid value and causes the crash.

In my last version, OldLoad wasn't deleted immediately, so the reference to it in the PHI instruction was valid and correct. In the next iteration of GVN, OldLoad was found to be fully redundant; it was then deleted and its use in the PHI instruction was updated to NewLoad.

This should be fixed in replaceValuesPerBlockEntry: every entry with the value OldLoad should be changed to NewLoad. Because NewLoad dominates OldLoad, this change is safe.
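
The difference between the two replacement strategies can be sketched with a toy model. Everything here is hypothetical: `ToyValuesPerBlock` stands in for GVN's ValuesPerBlock as a list of (block name, value name) pairs, and the two helper functions only mimic the behaviour described in this comment; they are not the code in GVN.cpp.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for ValuesPerBlock: (block, available value).
using ToyValuesPerBlock = std::vector<std::pair<std::string, std::string>>;

// Behaviour described for the crashing version: only the entry whose block
// is OldLoad's parent block is rewritten.
inline void replaceEntryForParentOnly(ToyValuesPerBlock &Values,
                                      const std::string &ParentBB,
                                      const std::string &OldLoad,
                                      const std::string &NewLoad) {
  assert(OldLoad != NewLoad);
  for (auto &Entry : Values)
    if (Entry.first == ParentBB && Entry.second == OldLoad)
      Entry.second = NewLoad;
}

// Proposed fix: every entry holding OldLoad is rewritten, whichever block
// it is recorded for.
inline void replaceAllEntries(ToyValuesPerBlock &Values,
                              const std::string &OldLoad,
                              const std::string &NewLoad) {
  assert(OldLoad != NewLoad);
  for (auto &Entry : Values)
    if (Entry.second == OldLoad)
      Entry.second = NewLoad;
}

// True if any entry still refers to Value (e.g. a load that was deleted).
inline bool refersTo(const ToyValuesPerBlock &Values,
                     const std::string &Value) {
  for (const auto &Entry : Values)
    if (Entry.second == Value)
      return true;
  return false;
}
```

With the case2 layout, where the only entry is (bb4, OldLoad) and OldLoad's parent is bb2, the parent-only replacement leaves a reference to OldLoad behind, while replacing all entries removes it.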

Carrot reopened this revision. May 26 2023, 1:22 PM

This revision is now accepted and ready to land. May 26 2023, 1:22 PM

Updated the function replaceValuesPerBlockEntry to replace all OldLoad with NewLoad. So the deleted OldLoad instruction will not be used by a later PHI instruction.

@TimNN, @antmo, @nickdesaulniers, @maxim-kuvyrkov could you help to test if this version works for you?

thanks

Harbormaster completed remote builds in B234954: Diff 526181. May 26 2023, 2:40 PM

the new version looks ok here (no more crash on ByteCode.cpp)

In D141712#4377041, @Carrot wrote:

Updated the function replaceValuesPerBlockEntry to replace all OldLoad with NewLoad. So the deleted OldLoad instruction will not be used by later PHI instruction.

@TimNN, @antmo, @nickdesaulniers, @maxim-kuvyrkov could you help to test if this version works for you?

thanks

Sorry, before even being able to test the new patch, I'm running into numerous regressions that I need to sort out first:

It's unusual to have that many regressions over the weekend, but may we please have some time to resolve those so that we can re-test this patch properly (before you resubmit)?

@antmo, thank you for your verification.

@nickdesaulniers, no problem to me, thanks!

Thanks for your patience. I was able to verify that Diff 526181 no longer ICE's as was observed by @maxim-kuvyrkov in linux stable 6.3.y for ARCH=arm allyesconfig https://ci.linaro.org/job/tcwg_kernel--llvm-master-arm-stable-allyesconfig-build/29/artifact/artifacts/06-build_linux/console.log.xz.

I could _not_ reproduce the problems I previously saw when building the Rust compiler, so seems to be all good now :).

@nickdesaulniers, @TimNN, thank you for your verification.

@mkazantsev, @nikic, could you take a look at this version, the only modification is in function replaceValuesPerBlockEntry and a new test case.

LGTM

dtcxzyw added a subscriber: dtcxzyw. Jun 5 2023, 10:55 PM

This revision was landed with ongoing or failed builds. Jun 6 2023, 12:47 PM

Closed by commit rG84bcfa0e1b34: [GVN] Improve PRE on load instructions (authored by Carrot).

This revision was automatically updated to reflect the committed changes.

Carrot added a commit: rG84bcfa0e1b34: [GVN] Improve PRE on load instructions.

Hi,

The following starts crashing with this patch:

opt -passes="inline,function<eager-inv;no-rerun>(gvn<>)" bbi-83499.ll -o /dev/null -debug

It requires -debug to crash since there is a verifyRemoved call hidden in a LLVM_DEBUG.

bbi-83499.ll (687 B)

llvm/lib/Transforms/Scalar/GVN.cpp
3153	A bit strange to hide verification in LLVM_DEBUG? So we only run that with debug printouts turned on?

nikic mentioned this in rG282324aa4a6c: [GVN] Fix verifyRemoved() verification. Jun 12 2023, 6:17 AM

@uabelho Thanks for the report, should be fixed by https://github.com/llvm/llvm-project/commit/282324aa4a6c29d5ce31c66f8def15d9bd8e84e4.

(I did not add an additional test, because plenty of other tests already fail if we don't require -debug for this.)

In D141712#4413696, @nikic wrote:

@uabelho Thanks for the report, should be fixed by https://github.com/llvm/llvm-project/commit/282324aa4a6c29d5ce31c66f8def15d9bd8e84e4.

(I did not add an additional test, because plenty of other tests already fail if we don't require -debug for this.)

@nikic : Yes, I verified it doesn't crash anymore. Thanks, that was fast :)

@nikic Thanks for fixing!

It is not just a fix for -debug but a fix for an actual instability bug.
I saw a case where phantoms were found in VN, causing nondeterministic behavior with low probability.
(I guess a new Inst might point to a deleted one, but I'm not sure.)

chapuni mentioned this in D150923: [KnownBits] Factor out and improve the lowbit computation for {u,s}div. Jun 13 2023, 6:49 AM

In D141712#4413696, @nikic wrote:

@uabelho Thanks for the report, should be fixed by https://github.com/llvm/llvm-project/commit/282324aa4a6c29d5ce31c66f8def15d9bd8e84e4.

(I did not add an additional test, because plenty of other tests already fail if we don't require -debug for this.)

Does it make sense to use AssertingVH instead of bare Value* in GVNPass::ValueTable::valueNumbering (and probably in some other places as well) to prevent such bugs?

Revision Contents

Path	Size
llvm/include/llvm/Transforms/Scalar/GVN.h	8 lines
llvm/lib/Transforms/Scalar/GVN.cpp	135 lines
llvm/test/Transforms/GVN/PRE/2011-06-01-NonLocalMemdepMiscompile.ll	2 lines
llvm/test/Transforms/GVN/PRE/2017-06-28-pre-load-dbgloc.ll	13 lines
llvm/test/Transforms/GVN/PRE/pre-load.ll	199 lines
llvm/test/Transforms/GVN/PRE/volatile.ll	10 lines
llvm/test/Transforms/GVN/condprop.ll	12 lines

Diff 490663

llvm/include/llvm/Transforms/Scalar/GVN.h

Show First 20 Lines • Show All 324 Lines • ▼ Show 20 Lines	private:

/// Given a list of non-local dependencies, determine if a value is		/// Given a list of non-local dependencies, determine if a value is
/// available for the load in each specified block. If it is, add it to		/// available for the load in each specified block. If it is, add it to
/// ValuesPerBlock. If not, add it to UnavailableBlocks.		/// ValuesPerBlock. If not, add it to UnavailableBlocks.
void AnalyzeLoadAvailability(LoadInst *Load, LoadDepVect &Deps,		void AnalyzeLoadAvailability(LoadInst *Load, LoadDepVect &Deps,
AvailValInBlkVect &ValuesPerBlock,		AvailValInBlkVect &ValuesPerBlock,
UnavailBlkVect &UnavailableBlocks);		UnavailBlkVect &UnavailableBlocks);

		/// Given a critical edge from Pred to LoadBB, find a load instruction
		/// which is identical to Load from another successor of Pred.
		LoadInst findLoadToHoistIntoPred(BasicBlock Pred, BasicBlock *LoadBB,
		LoadInst *Load);

bool PerformLoadPRE(LoadInst *Load, AvailValInBlkVect &ValuesPerBlock,		bool PerformLoadPRE(LoadInst *Load, AvailValInBlkVect &ValuesPerBlock,
UnavailBlkVect &UnavailableBlocks);		UnavailBlkVect &UnavailableBlocks);

/// Try to replace a load which executes on each loop iteraiton with Phi		/// Try to replace a load which executes on each loop iteraiton with Phi
/// translation of load in preheader and load(s) in conditionally executed		/// translation of load in preheader and load(s) in conditionally executed
/// paths.		/// paths.
bool performLoopLoadPRE(LoadInst *Load, AvailValInBlkVect &ValuesPerBlock,		bool performLoopLoadPRE(LoadInst *Load, AvailValInBlkVect &ValuesPerBlock,
UnavailBlkVect &UnavailableBlocks);		UnavailBlkVect &UnavailableBlocks);

/// Eliminates partially redundant \p Load, replacing it with \p		/// Eliminates partially redundant \p Load, replacing it with \p
/// AvailableLoads (connected by Phis if needed).		/// AvailableLoads (connected by Phis if needed).
void eliminatePartiallyRedundantLoad(		void eliminatePartiallyRedundantLoad(
LoadInst *Load, AvailValInBlkVect &ValuesPerBlock,		LoadInst *Load, AvailValInBlkVect &ValuesPerBlock,
MapVector<BasicBlock *, Value *> &AvailableLoads);		MapVector<BasicBlock *, Value *> &AvailableLoads,
		MapVector<BasicBlock *, LoadInst *> *CriticalEdgePredAndLoad);

// Other helper routines		// Other helper routines
bool processInstruction(Instruction *I);		bool processInstruction(Instruction *I);
bool processBlock(BasicBlock *BB);		bool processBlock(BasicBlock *BB);
void dump(DenseMap<uint32_t, Value *> &d) const;		void dump(DenseMap<uint32_t, Value *> &d) const;
bool iterateOnFunction(Function &F);		bool iterateOnFunction(Function &F);
bool performPRE(Function &F);		bool performPRE(Function &F);
bool performScalarPRE(Instruction *I);		bool performScalarPRE(Instruction *I);
[... 37 lines not shown ...]

llvm/lib/Transforms/Scalar/GVN.cpp

[... 88 lines not shown ...]
STATISTIC(NumGVNInstr, "Number of instructions deleted");		STATISTIC(NumGVNInstr, "Number of instructions deleted");
STATISTIC(NumGVNLoad, "Number of loads deleted");		STATISTIC(NumGVNLoad, "Number of loads deleted");
STATISTIC(NumGVNPRE, "Number of instructions PRE'd");		STATISTIC(NumGVNPRE, "Number of instructions PRE'd");
STATISTIC(NumGVNBlocks, "Number of blocks merged");		STATISTIC(NumGVNBlocks, "Number of blocks merged");
STATISTIC(NumGVNSimpl, "Number of instructions simplified");		STATISTIC(NumGVNSimpl, "Number of instructions simplified");
STATISTIC(NumGVNEqProp, "Number of equalities propagated");		STATISTIC(NumGVNEqProp, "Number of equalities propagated");
STATISTIC(NumPRELoad, "Number of loads PRE'd");		STATISTIC(NumPRELoad, "Number of loads PRE'd");
STATISTIC(NumPRELoopLoad, "Number of loop loads PRE'd");		STATISTIC(NumPRELoopLoad, "Number of loop loads PRE'd");
		STATISTIC(NumPRELoadMoved2CEPred,
		mkazantsev (Done): More specific name, something related to critical edges maybe?
		"Number of loads moved to predecessor of a critical edge in PRE");

STATISTIC(IsValueFullyAvailableInBlockNumSpeculationsMax,		STATISTIC(IsValueFullyAvailableInBlockNumSpeculationsMax,
"Number of blocks speculated as available in "		"Number of blocks speculated as available in "
"IsValueFullyAvailableInBlock(), max");		"IsValueFullyAvailableInBlock(), max");
STATISTIC(MaxBBSpeculationCutoffReachedTimes,		STATISTIC(MaxBBSpeculationCutoffReachedTimes,
"Number of times we we reached gvn-max-block-speculations cut-off "		"Number of times we we reached gvn-max-block-speculations cut-off "
"preventing further exploration");		"preventing further exploration");

[... 12 lines not shown ...]

// This is based on IsValueFullyAvailableInBlockNumSpeculationsMax stat.		// This is based on IsValueFullyAvailableInBlockNumSpeculationsMax stat.
static cl::opt<uint32_t> MaxBBSpeculations(		static cl::opt<uint32_t> MaxBBSpeculations(
"gvn-max-block-speculations", cl::Hidden, cl::init(600),		"gvn-max-block-speculations", cl::Hidden, cl::init(600),
cl::desc("Max number of blocks we're willing to speculate on (and recurse "		cl::desc("Max number of blocks we're willing to speculate on (and recurse "
"into) when deducing if a value is fully available or not in GVN "		"into) when deducing if a value is fully available or not in GVN "
"(default = 600)"));		"(default = 600)"));

		static cl::opt<uint32_t> MaxNumInsnsPerBlock(
		"gvn-max-num-insns", cl::Hidden, cl::init(100),
		cl::desc("Max number of instructions to scan in each basic block in GVN "
		"(default = 100)"));

struct llvm::GVNPass::Expression {		struct llvm::GVNPass::Expression {
uint32_t opcode;		uint32_t opcode;
bool commutative = false;		bool commutative = false;
// The type is not necessarily the result type of the expression, it may be		// The type is not necessarily the result type of the expression, it may be
// any additional type needed to disambiguate the expression.		// any additional type needed to disambiguate the expression.
Type *type = nullptr;		Type *type = nullptr;
SmallVector<uint32_t, 4> varargs;		SmallVector<uint32_t, 4> varargs;

[... 782 lines not shown ...]	#ifndef NDEBUG

assert(NewSpeculativelyAvailableBBs.empty() &&		assert(NewSpeculativelyAvailableBBs.empty() &&
"Must have fixed all the new speculatively available blocks.");		"Must have fixed all the new speculatively available blocks.");
#endif		#endif

return !UnavailableBB;		return !UnavailableBB;
}		}

		/// If the specified (BB, OldValue) exists in ValuesPerBlock, replace its value
		/// with NewValue, otherwise we don't change it.
		static void replaceValuesPerBlockEntry(
		SmallVectorImpl<AvailableValueInBlock> &ValuesPerBlock, BasicBlock *BB,
		Value *OldValue, Value *NewValue) {
		for (AvailableValueInBlock &V : ValuesPerBlock) {
		if (V.BB == BB) {
		if ((V.AV.isSimpleValue() && V.AV.getSimpleValue() == OldValue) ||
		(V.AV.isCoercedLoadValue() && V.AV.getCoercedLoadValue() == OldValue))
		V = AvailableValueInBlock::get(BB, NewValue);
		return;
		}
		}
		}

/// Given a set of loads specified by ValuesPerBlock,		/// Given a set of loads specified by ValuesPerBlock,
/// construct SSA form, allowing us to eliminate Load. This returns the value		/// construct SSA form, allowing us to eliminate Load. This returns the value
/// that should be used at Load's definition site.		/// that should be used at Load's definition site.
static Value *		static Value *
ConstructSSAForLoadSet(LoadInst *Load,		ConstructSSAForLoadSet(LoadInst *Load,
SmallVectorImpl<AvailableValueInBlock> &ValuesPerBlock,		SmallVectorImpl<AvailableValueInBlock> &ValuesPerBlock,
GVNPass &gvn) {		GVNPass &gvn) {
// Check for the fully redundant, dominating load case. In this case, we can		// Check for the fully redundant, dominating load case. In this case, we can
[... 411 lines not shown ...]	if (AnalyzeLoadAvailability(Load, DepInfo, Address, AV)) {
UnavailableBlocks.push_back(DepBB);		UnavailableBlocks.push_back(DepBB);
}		}
}		}

assert(NumDeps == ValuesPerBlock.size() + UnavailableBlocks.size() &&		assert(NumDeps == ValuesPerBlock.size() + UnavailableBlocks.size() &&
"post condition violation");		"post condition violation");
}		}

		/// Given the following code, v1 is partially available on some edges, but not
		/// available on the edge from PredBB. This function tries to find if there is
		/// another identical load in the other successor of PredBB.
		///
		/// v0 = load %addr
		/// br %LoadBB
		///
		/// LoadBB:
		/// v1 = load %addr
		/// ...
		///
		/// PredBB:
		/// ...
		/// br %cond, label %LoadBB, label %SuccBB
		///
		/// SuccBB:
		/// v2 = load %addr
		/// ...
		///
		LoadInst *GVNPass::findLoadToHoistIntoPred(BasicBlock *Pred, BasicBlock *LoadBB,
		LoadInst *Load) {
		// For simplicity we handle a Pred has 2 successors only.
		auto *Term = Pred->getTerminator();
		if (Term->getNumSuccessors() != 2 || Term->isExceptionalTerminator())
		mkazantsev (Not Done): Do you care about `switch` with 2 branches? If not, maybe then `match(m_Br(m_Value(), m_BasicBlock(IfTrue), m_BasicBlock(IfFalse)))`?
		Carrot (author, Done): I prefer to include the switch case; it is a benefit without any extra cost.
		return nullptr;
		auto *SuccBB = Term->getSuccessor(0);
		if (SuccBB == LoadBB)
		mkazantsev (Not Done): Will there be a problem if `Term->getSuccessor(0) == Term->getSuccessor(1)`? Any tests for it?
		Carrot (author, Done): If Term->getSuccessor(0) == Term->getSuccessor(1), then SuccBB will have two predecessors, so the next statement should exit immediately. Added a test case for it.
		SuccBB = Term->getSuccessor(1);
		if (!SuccBB->getSinglePredecessor())
		mkazantsev (Not Done): Do you really need to check `EHPad` after you've already checked for `isExceptionalTerminator`?
		Carrot (author, Done): You are right, this is not necessary.
		return nullptr;

		unsigned int NumInsts = MaxNumInsnsPerBlock;
		mkazantsev (Done): nit: unsigned-to-signed conversion; the compiler might warn about this.
		for (Instruction &Inst : *SuccBB) {
		mkazantsev (Done): Maybe `for (Instruction &Inst : *SuccBB) { if (!Inst.isIdenticalTo(Load)) continue; // same code with reduced nest }`
		if (--NumInsts == 0)
		return nullptr;

		if (!Inst.isIdenticalTo(Load))
		continue;

		MemDepResult Dep = MD->getDependency(&Inst);
		// If an identical load doesn't depends on any local instructions, it can
		mkazantsev (Done): Add a comment that, if one identical load already depends on something, then there is no point in looking further?
		// be safely moved to PredBB.
		if (Dep.isNonLocal())
		return cast<LoadInst>(&Inst);

		// Otherwise there is something in the same BB clobbers the memory, we can't
		// move this and later load to PredBB.
		return nullptr;
		}

		return nullptr;
		}

void GVNPass::eliminatePartiallyRedundantLoad(		void GVNPass::eliminatePartiallyRedundantLoad(
LoadInst *Load, AvailValInBlkVect &ValuesPerBlock,		LoadInst *Load, AvailValInBlkVect &ValuesPerBlock,
MapVector<BasicBlock *, Value *> &AvailableLoads) {		MapVector<BasicBlock *, Value *> &AvailableLoads,
		MapVector<BasicBlock *, LoadInst *> *CriticalEdgePredAndLoad) {
for (const auto &AvailableLoad : AvailableLoads) {		for (const auto &AvailableLoad : AvailableLoads) {
BasicBlock *UnavailableBlock = AvailableLoad.first;		BasicBlock *UnavailableBlock = AvailableLoad.first;
Value *LoadPtr = AvailableLoad.second;		Value *LoadPtr = AvailableLoad.second;

auto *NewLoad =		auto *NewLoad =
new LoadInst(Load->getType(), LoadPtr, Load->getName() + ".pre",		new LoadInst(Load->getType(), LoadPtr, Load->getName() + ".pre",
Load->isVolatile(), Load->getAlign(), Load->getOrdering(),		Load->isVolatile(), Load->getAlign(), Load->getOrdering(),
Load->getSyncScopeID(), UnavailableBlock->getTerminator());		Load->getSyncScopeID(), UnavailableBlock->getTerminator());
[... 37 lines not shown ...]	for (const auto &AvailableLoad : AvailableLoads) {
// FIXME: How do we retain source locations without causing poor debugging		// FIXME: How do we retain source locations without causing poor debugging
// behavior?		// behavior?

// Add the newly created load.		// Add the newly created load.
ValuesPerBlock.push_back(		ValuesPerBlock.push_back(
AvailableValueInBlock::get(UnavailableBlock, NewLoad));		AvailableValueInBlock::get(UnavailableBlock, NewLoad));
MD->invalidateCachedPointerInfo(LoadPtr);		MD->invalidateCachedPointerInfo(LoadPtr);
LLVM_DEBUG(dbgs() << "GVN INSERTED " << *NewLoad << '\n');		LLVM_DEBUG(dbgs() << "GVN INSERTED " << *NewLoad << '\n');

		// For PredBB in CriticalEdgePredAndLoad we need to replace the uses of old
		// load instruction with the new created load instruction.
		if (CriticalEdgePredAndLoad) {
		auto I = CriticalEdgePredAndLoad->find(UnavailableBlock);
		if (I != CriticalEdgePredAndLoad->end()) {
		++NumPRELoadMoved2CEPred;
		mkazantsev (Done): Add some statistic here?
		ICF->insertInstructionTo(NewLoad, UnavailableBlock);
		LoadInst *OldLoad = I->second;
		OldLoad->replaceAllUsesWith(NewLoad);
		replaceValuesPerBlockEntry(ValuesPerBlock, OldLoad->getParent(),
		OldLoad, NewLoad);
		if (uint32_t ValNo = VN.lookup(OldLoad, false))
		removeFromLeaderTable(ValNo, OldLoad, OldLoad->getParent());
		nikic (Not Done): If we're not removing the load, we probably shouldn't be removing it from the leader table either?
		Carrot (author, Done): We expect it to be deleted in the next iteration, and the same value is also available in the NewLoad instruction, so I think it should not be used by other optimizations and can be treated as not available in the leader table.
		// To avoid deleting an instruction from different BB, we just leave
		// the dead load here, it will be deleted in next iteration.
		}
		}
}		}

// Perform PHI construction.		// Perform PHI construction.
Value *V = ConstructSSAForLoadSet(Load, ValuesPerBlock, *this);		Value *V = ConstructSSAForLoadSet(Load, ValuesPerBlock, *this);
Load->replaceAllUsesWith(V);		Load->replaceAllUsesWith(V);
if (isa<PHINode>(V))		if (isa<PHINode>(V))
V->takeName(Load);		V->takeName(Load);
if (Instruction *I = dyn_cast<Instruction>(V))		if (Instruction *I = dyn_cast<Instruction>(V))
[... 70 lines not shown ...]	bool GVNPass::PerformLoadPRE(LoadInst *Load, AvailValInBlkVect &ValuesPerBlock,
// available.		// available.
MapVector<BasicBlock *, Value *> PredLoads;		MapVector<BasicBlock *, Value *> PredLoads;
DenseMap<BasicBlock *, AvailabilityState> FullyAvailableBlocks;		DenseMap<BasicBlock *, AvailabilityState> FullyAvailableBlocks;
for (const AvailableValueInBlock &AV : ValuesPerBlock)		for (const AvailableValueInBlock &AV : ValuesPerBlock)
FullyAvailableBlocks[AV.BB] = AvailabilityState::Available;		FullyAvailableBlocks[AV.BB] = AvailabilityState::Available;
for (BasicBlock *UnavailableBB : UnavailableBlocks)		for (BasicBlock *UnavailableBB : UnavailableBlocks)
FullyAvailableBlocks[UnavailableBB] = AvailabilityState::Unavailable;		FullyAvailableBlocks[UnavailableBB] = AvailabilityState::Unavailable;

SmallVector<BasicBlock *, 4> CriticalEdgePred;		// The edge from Pred to LoadBB is a critical edge will be splitted.
		SmallVector<BasicBlock *, 4> CriticalEdgePredSplit;
		// The edge from Pred to LoadBB is a critical edge, another successor of Pred
		// contains a load can be moved to Pred. This data structure maps the Pred to
		// the movable load.
		MapVector<BasicBlock *, LoadInst *> CriticalEdgePredAndLoad;
		mkazantsev (Not Done): What if the same block goes into LoadBB multiple times? Something like
		    switch cond
		      case 1: LoadBB
		      case 2: LoadBB
		      case 3: LoadBB
		      default: LoadBB
		Will this work correctly for this case? Please add some tests for situations like this.
		Carrot (author, Done): It looks similar to your comment in findLoadToHoistIntoPred. Either the switch BB contains an identical load, in which case nothing needs to be handled, or the switch BB doesn't contain an identical load and findLoadToHoistIntoPred returns nullptr because of too many edges. Test added.
for (BasicBlock *Pred : predecessors(LoadBB)) {		for (BasicBlock *Pred : predecessors(LoadBB)) {
// If any predecessor block is an EH pad that does not allow non-PHI		// If any predecessor block is an EH pad that does not allow non-PHI
// instructions before the terminator, we can't PRE the load.		// instructions before the terminator, we can't PRE the load.
if (Pred->getTerminator()->isEHPad()) {		if (Pred->getTerminator()->isEHPad()) {
LLVM_DEBUG(		LLVM_DEBUG(
dbgs() << "COULD NOT PRE LOAD BECAUSE OF AN EH PAD PREDECESSOR '"		dbgs() << "COULD NOT PRE LOAD BECAUSE OF AN EH PAD PREDECESSOR '"
<< Pred->getName() << "': " << *Load << '\n');		<< Pred->getName() << "': " << *Load << '\n');
return false;		return false;
[... 23 lines not shown ...]	if (Pred->getTerminator()->getNumSuccessors() != 1) {
if (DT->dominates(LoadBB, Pred)) {		if (DT->dominates(LoadBB, Pred)) {
LLVM_DEBUG(		LLVM_DEBUG(
dbgs()		dbgs()
<< "COULD NOT PRE LOAD BECAUSE OF A BACKEDGE CRITICAL EDGE '"		<< "COULD NOT PRE LOAD BECAUSE OF A BACKEDGE CRITICAL EDGE '"
<< Pred->getName() << "': " << *Load << '\n');		<< Pred->getName() << "': " << *Load << '\n');
return false;		return false;
}		}

CriticalEdgePred.push_back(Pred);		if (LoadInst *LI = findLoadToHoistIntoPred(Pred, LoadBB, Load))
		CriticalEdgePredAndLoad[Pred] = LI;
		else
		CriticalEdgePredSplit.push_back(Pred);
		mkazantsev (Done): nit: `{ }` not needed
} else {		} else {
// Only add the predecessors that will not be split for now.		// Only add the predecessors that will not be split for now.
PredLoads[Pred] = nullptr;		PredLoads[Pred] = nullptr;
}		}
}		}

// Decide whether PRE is profitable for this load.		// Decide whether PRE is profitable for this load.
unsigned NumUnavailablePreds = PredLoads.size() + CriticalEdgePred.size();		unsigned NumInsertPreds = PredLoads.size() + CriticalEdgePredSplit.size();
		unsigned NumUnavailablePreds = NumInsertPreds +
		mkazantsev (Done): Is it really needed? There is literally the same check just below.
		CriticalEdgePredAndLoad.size();
assert(NumUnavailablePreds != 0 &&		assert(NumUnavailablePreds != 0 &&
"Fully available value should already be eliminated!");		"Fully available value should already be eliminated!");

// If this load is unavailable in multiple predecessors, reject it.		// If we need to insert new load in multiple predecessors, reject it.
// FIXME: If we could restructure the CFG, we could make a common pred with		// FIXME: If we could restructure the CFG, we could make a common pred with
// all the preds that don't have an available Load and insert a new load into		// all the preds that don't have an available Load and insert a new load into
// that one block.		// that one block.
if (NumUnavailablePreds != 1)		if (NumInsertPreds > 1)
return false;		return false;

// Now we know where we will insert load. We must ensure that it is safe		// Now we know where we will insert load. We must ensure that it is safe
// to speculatively execute the load at that points.		// to speculatively execute the load at that points.
if (MustEnsureSafetyOfSpeculativeExecution) {		if (MustEnsureSafetyOfSpeculativeExecution) {
if (CriticalEdgePred.size())		if (CriticalEdgePredSplit.size())
if (!isSafeToSpeculativelyExecute(Load, LoadBB->getFirstNonPHI(), AC, DT))		if (!isSafeToSpeculativelyExecute(Load, LoadBB->getFirstNonPHI(), AC, DT))
return false;		return false;
for (auto &PL : PredLoads)		for (auto &PL : PredLoads)
if (!isSafeToSpeculativelyExecute(Load, PL.first->getTerminator(), AC,		if (!isSafeToSpeculativelyExecute(Load, PL.first->getTerminator(), AC,
DT))		DT))
return false;		return false;
		for (auto &CEP : CriticalEdgePredAndLoad)
		if (!isSafeToSpeculativelyExecute(Load, CEP.first->getTerminator(), AC,
		DT))
		return false;
}		}

// Split critical edges, and update the unavailable predecessors accordingly.		// Split critical edges, and update the unavailable predecessors accordingly.
for (BasicBlock *OrigPred : CriticalEdgePred) {		for (BasicBlock *OrigPred : CriticalEdgePredSplit) {
BasicBlock *NewPred = splitCriticalEdges(OrigPred, LoadBB);		BasicBlock *NewPred = splitCriticalEdges(OrigPred, LoadBB);
assert(!PredLoads.count(OrigPred) && "Split edges shouldn't be in map!");		assert(!PredLoads.count(OrigPred) && "Split edges shouldn't be in map!");
PredLoads[NewPred] = nullptr;		PredLoads[NewPred] = nullptr;
LLVM_DEBUG(dbgs() << "Split critical edge " << OrigPred->getName() << "->"		LLVM_DEBUG(dbgs() << "Split critical edge " << OrigPred->getName() << "->"
<< LoadBB->getName() << '\n');		<< LoadBB->getName() << '\n');
}		}

		for (auto &CEP : CriticalEdgePredAndLoad)
		PredLoads[CEP.first] = nullptr;

// Check if the load can safely be moved to all the unavailable predecessors.		// Check if the load can safely be moved to all the unavailable predecessors.
bool CanDoPRE = true;		bool CanDoPRE = true;
const DataLayout &DL = Load->getModule()->getDataLayout();		const DataLayout &DL = Load->getModule()->getDataLayout();
SmallVector<Instruction*, 8> NewInsts;		SmallVector<Instruction*, 8> NewInsts;
for (auto &PredLoad : PredLoads) {		for (auto &PredLoad : PredLoads) {
BasicBlock *UnavailablePred = PredLoad.first;		BasicBlock *UnavailablePred = PredLoad.first;

// Do PHI translation to get its value in the predecessor if necessary. The		// Do PHI translation to get its value in the predecessor if necessary. The
[... 40 lines not shown ...]	while (!NewInsts.empty()) {
// trying to number them. PHI translation might insert instructions		// trying to number them. PHI translation might insert instructions
// in basic blocks other than the current one, and we delete them		// in basic blocks other than the current one, and we delete them
// directly, as markInstructionForDeletion only allows removing from the		// directly, as markInstructionForDeletion only allows removing from the
// current basic block.		// current basic block.
NewInsts.pop_back_val()->eraseFromParent();		NewInsts.pop_back_val()->eraseFromParent();
}		}
// HINT: Don't revert the edge-splitting as following transformation may		// HINT: Don't revert the edge-splitting as following transformation may
// also need to split these critical edges.		// also need to split these critical edges.
return !CriticalEdgePred.empty();		return !CriticalEdgePredSplit.empty();
}		}

// Okay, we can eliminate this load by inserting a reload in the predecessor		// Okay, we can eliminate this load by inserting a reload in the predecessor
// and using PHI construction to get the value in the other predecessors, do		// and using PHI construction to get the value in the other predecessors, do
// it.		// it.
LLVM_DEBUG(dbgs() << "GVN REMOVING PRE LOAD: " << *Load << '\n');		LLVM_DEBUG(dbgs() << "GVN REMOVING PRE LOAD: " << *Load << '\n');
LLVM_DEBUG(if (!NewInsts.empty()) dbgs() << "INSERTED " << NewInsts.size()		LLVM_DEBUG(if (!NewInsts.empty()) dbgs() << "INSERTED " << NewInsts.size()
<< " INSTS: " << *NewInsts.back()		<< " INSTS: " << *NewInsts.back()
<< '\n');		<< '\n');

// Assign value numbers to the new instructions.		// Assign value numbers to the new instructions.
for (Instruction *I : NewInsts) {		for (Instruction *I : NewInsts) {
// Instructions that have been inserted in predecessor(s) to materialize		// Instructions that have been inserted in predecessor(s) to materialize
// the load address do not retain their original debug locations. Doing		// the load address do not retain their original debug locations. Doing
// so could lead to confusing (but correct) source attributions.		// so could lead to confusing (but correct) source attributions.
I->updateLocationAfterHoist();		I->updateLocationAfterHoist();

// FIXME: We really _ought_ to insert these value numbers into their		// FIXME: We really _ought_ to insert these value numbers into their
// parent's availability map. However, in doing so, we risk getting into		// parent's availability map. However, in doing so, we risk getting into
// ordering issues. If a block hasn't been processed yet, we would be		// ordering issues. If a block hasn't been processed yet, we would be
// marking a value as AVAIL-IN, which isn't what we intend.		// marking a value as AVAIL-IN, which isn't what we intend.
VN.lookupOrAdd(I);		VN.lookupOrAdd(I);
}		}

eliminatePartiallyRedundantLoad(Load, ValuesPerBlock, PredLoads);		eliminatePartiallyRedundantLoad(Load, ValuesPerBlock, PredLoads,
		&CriticalEdgePredAndLoad);
++NumPRELoad;		++NumPRELoad;
return true;		return true;
}		}

bool GVNPass::performLoopLoadPRE(LoadInst *Load,		bool GVNPass::performLoopLoadPRE(LoadInst *Load,
AvailValInBlkVect &ValuesPerBlock,		AvailValInBlkVect &ValuesPerBlock,
UnavailBlkVect &UnavailableBlocks) {		UnavailBlkVect &UnavailableBlocks) {
if (!LI)		if (!LI)
[... 62 lines not shown ...]	if (LoadPtr->canBeFreed())
return false;		return false;

// TODO: Support critical edge splitting if blocker has more than 1 successor.		// TODO: Support critical edge splitting if blocker has more than 1 successor.
MapVector<BasicBlock *, Value *> AvailableLoads;		MapVector<BasicBlock *, Value *> AvailableLoads;
AvailableLoads[LoopBlock] = LoadPtr;		AvailableLoads[LoopBlock] = LoadPtr;
AvailableLoads[Preheader] = LoadPtr;		AvailableLoads[Preheader] = LoadPtr;

LLVM_DEBUG(dbgs() << "GVN REMOVING PRE LOOP LOAD: " << *Load << '\n');		LLVM_DEBUG(dbgs() << "GVN REMOVING PRE LOOP LOAD: " << *Load << '\n');
eliminatePartiallyRedundantLoad(Load, ValuesPerBlock, AvailableLoads);		eliminatePartiallyRedundantLoad(Load, ValuesPerBlock, AvailableLoads,
		/*CriticalEdgePredAndLoad*/ nullptr);
++NumPRELoopLoad;		++NumPRELoopLoad;
return true;		return true;
}		}

static void reportLoadElim(LoadInst *Load, Value *AvailableValue,		static void reportLoadElim(LoadInst *Load, Value *AvailableValue,
OptimizationRemarkEmitter *ORE) {		OptimizationRemarkEmitter *ORE) {
using namespace ore;		using namespace ore;

[... 954 lines not shown ...]	for (BasicBlock::iterator BI = BB->begin(), BE = BB->end();
NumGVNInstr += InstrsToErase.size();		NumGVNInstr += InstrsToErase.size();

// Avoid iterator invalidation.		// Avoid iterator invalidation.
bool AtStart = BI == BB->begin();		bool AtStart = BI == BB->begin();
if (!AtStart)		if (!AtStart)
--BI;		--BI;

for (auto *I : InstrsToErase) {		for (auto *I : InstrsToErase) {
assert(I->getParent() == BB && "Removing instruction from wrong block?");		assert(I->getParent() == BB && "Removing instruction from wrong block?");
mkazantsev (Not Done): I think this assert should not fail, and if it fails, you have a bug. `ICF` may keep cached information for it bound to its old parent, and will try to remove it from its new parent. You may get an inconsistent state of `ICF` because of it. At least I don't see how you update it. You should not move instructions. The right approach is to create a new one.
Carrot (author, Done): Added a call to ICF->insertInstructionTo for the newly created load instruction. For the deleted instruction, ICF->removeInstruction is called in line 2711. Maybe a naive question: does a load instruction impact implicit control flow?
mkazantsev (Not Done): Sorry, I didn't formulate the problem I'm seeing correctly. It's not that loads create implicit control flow. It's that by removing this assert, you allow instruction motion here. There is no check that you could only have moved a load, right? So it potentially creates room for this kind of bug.
mkazantsev (Not Done): Let's not scatter cache updates across the code. There can be new caches added in the future, not only `ICF` or whatever it is now. We don't want multiple places where we need to update them. This opens the door for bugs. Any serious reasons to move the instruction rather than create a new one?
mkazantsev (Not Done): Just imagine that someone accidentally moves an ICF instruction in a later change. Currently, the assert is protecting us from it. By giving it up, we make mistakes like this harder to find.
Carrot (author, Done): Sounds reasonable. So now I just create a new load and replace all uses of the old load with the new load. The dead old load instruction can be deleted in the next iteration of GVN.
LLVM_DEBUG(dbgs() << "GVN removed: " << *I << '\n');		LLVM_DEBUG(dbgs() << "GVN removed: " << *I << '\n');
salvageKnowledge(I, AC);		salvageKnowledge(I, AC);
salvageDebugInfo(*I);		salvageDebugInfo(*I);
if (MD) MD->removeInstruction(I);		if (MD) MD->removeInstruction(I);
if (MSSAU)		if (MSSAU)
MSSAU->removeMemoryAccess(I);		MSSAU->removeMemoryAccess(I);
LLVM_DEBUG(verifyRemoved(I));		LLVM_DEBUG(verifyRemoved(I));
ICF->removeInstruction(I);		ICF->removeInstruction(I);
[... 320 lines not shown ...]	void GVNPass::cleanupGlobalSets() {
ICF->clear();		ICF->clear();
InvalidBlockRPONumbers = true;		InvalidBlockRPONumbers = true;
}		}

/// Verify that the specified instruction does not occur in our		/// Verify that the specified instruction does not occur in our
/// internal data structures.		/// internal data structures.
void GVNPass::verifyRemoved(const Instruction *Inst) const {		void GVNPass::verifyRemoved(const Instruction *Inst) const {
VN.verifyRemoved(Inst);		VN.verifyRemoved(Inst);

		uabelho (Not Done): A bit strange to hide verification in LLVM_DEBUG? So we only run that with debug printouts turned on?
// Walk through the value number scope to make sure the instruction isn't		// Walk through the value number scope to make sure the instruction isn't
// ferreted away in it.		// ferreted away in it.
for (const auto &I : LeaderTable) {		for (const auto &I : LeaderTable) {
const LeaderTableEntry *Node = &I.second;		const LeaderTableEntry *Node = &I.second;
assert(Node->Val != Inst && "Inst still in value numbering scope!");		assert(Node->Val != Inst && "Inst still in value numbering scope!");

while (Node->Next) {		while (Node->Next) {
Node = Node->Next;		Node = Node->Next;
[... 199 lines not shown ...]
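
The successor-selection logic of findLoadToHoistIntoPred shown in the GVN.cpp diff above can be modeled outside LLVM roughly as follows. This is a minimal sketch under stated assumptions, not the patch's implementation: ToyInst, ToyBlock and findLoadToHoist are hypothetical names, a load is identified only by its address string, and the LocallyClobbered flag stands in for the MemoryDependence query (Dep.isNonLocal()) used in the patch. The exceptional-terminator check is omitted because the toy CFG has no such terminators.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for LLVM types; none of these are LLVM APIs.
struct ToyInst {
  std::string Addr;      // address loaded from; empty for non-load instructions
  bool LocallyClobbered; // true models a local (same-block) memory dependency
};

struct ToyBlock {
  std::vector<ToyInst> Insts;
  std::vector<ToyBlock *> Succs; // terminator successors, duplicates allowed
  std::vector<ToyBlock *> Preds; // one entry per incoming edge
};

// Follows the control flow of the patch's findLoadToHoistIntoPred: given the
// critical edge Pred -> LoadBB, look in the other successor of Pred for a
// load of the same address that has no local dependency.
const ToyInst *findLoadToHoist(const ToyBlock &Pred, const ToyBlock &LoadBB,
                               const std::string &Addr,
                               unsigned MaxInsts = 100) {
  // Only predecessors with exactly two successors are handled.
  if (Pred.Succs.size() != 2)
    return nullptr;

  // Pick the successor that is not LoadBB.
  const ToyBlock *SuccBB = Pred.Succs[0];
  if (SuccBB == &LoadBB)
    SuccBB = Pred.Succs[1];

  // The other successor must have a single incoming edge. Two edges from
  // the same block count as two predecessors here.
  if (SuccBB->Preds.size() != 1)
    return nullptr;

  unsigned NumInsts = MaxInsts;
  for (const ToyInst &I : SuccBB->Insts) {
    // Bounded scan, same counting scheme as in the patch.
    if (--NumInsts == 0)
      return nullptr;
    if (I.Addr != Addr)
      continue;
    // The first identical load decides: hoistable only when nothing in the
    // same block clobbers the memory before it.
    return I.LocallyClobbered ? nullptr : &I;
  }
  return nullptr;
}
```

Modeling Preds as one entry per edge is what makes the duplicate-successor case discussed in the review fall out of the single-predecessor check without extra code.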

llvm/test/Transforms/GVN/PRE/2011-06-01-NonLocalMemdepMiscompile.ll

	[... 51 lines not shown ...]

	; CHECK-LABEL: bb6:			; CHECK-LABEL: bb6:
	; CHECK: br i1 undef, label %bb15split, label %bb10			; CHECK: br i1 undef, label %bb15split, label %bb10

	; CHECK-LABEL: bb15split: ; preds = %bb6			; CHECK-LABEL: bb15split: ; preds = %bb6
	; CHECK-NEXT: br label %bb15			; CHECK-NEXT: br label %bb15

	; CHECK-LABEL: bb15:			; CHECK-LABEL: bb15:
	; CHECK: %tmp17 = phi i8 [ %tmp8, %bb15split ], [ %tmp17.pre, %bb1.bb15_crit_edge ]			; CHECK: %tmp17 = phi i8 [ %tmp12.pre3, %bb15split ], [ %tmp17.pre, %bb1.bb15_crit_edge ]

	bb19: ; preds = %bb15			bb19: ; preds = %bb15
	ret i1 %tmp18			ret i1 %tmp18
	}			}

	declare void @isalnum() nounwind inlinehint ssp			declare void @isalnum() nounwind inlinehint ssp

llvm/test/Transforms/GVN/PRE/2017-06-28-pre-load-dbgloc.ll

	; This test checks if debug loc is propagated to load/store created by GVN/Instcombine.			; This test checks if debug loc is propagated to load/store created by GVN/Instcombine.
	; RUN: opt < %s -passes=gvn -S | FileCheck %s --check-prefixes=ALL,GVN			; RUN: opt < %s -passes=gvn -S | FileCheck %s --check-prefixes=ALL
	; RUN: opt < %s -passes=gvn,instcombine -S | FileCheck %s --check-prefixes=ALL,INSTCOMBINE			; RUN: opt < %s -passes=gvn,instcombine -S | FileCheck %s --check-prefixes=ALL
	mkazantsev: Why change that?
	Carrot (author): Because with this optimization, both cases generate the same result. The PRE of the load from %desc can now be detected, and the load is moved to the entry block.

	; struct node {			; struct node {
	; int *v;			; int *v;
	; struct desc *descs;			; struct desc *descs;
	; };			; };

	; struct desc {			; struct desc {
	; struct node *node;			; struct node *node;
	[18 lines not shown]

	%struct.desc = type { ptr }			%struct.desc = type { ptr }
	%struct.node = type { ptr, ptr }			%struct.node = type { ptr, ptr }

	define i32 @test(ptr readonly %desc) local_unnamed_addr #0 !dbg !4 {			define i32 @test(ptr readonly %desc) local_unnamed_addr #0 !dbg !4 {
	entry:			entry:
	%tobool = icmp eq ptr %desc, null			%tobool = icmp eq ptr %desc, null
	br i1 %tobool, label %cond.end, label %cond.false, !dbg !9			br i1 %tobool, label %cond.end, label %cond.false, !dbg !9
	; ALL: br i1 %tobool, label %entry.cond.end_crit_edge, label %cond.false, !dbg [[LOC_15_6:![0-9]+]]			; ALL: %.pre = load ptr, ptr %desc, align 8, !dbg [[LOC_16_13:![0-9]+]]
	; ALL: entry.cond.end_crit_edge:			; ALL: br i1 %tobool, label %cond.end, label %cond.false, !dbg [[LOC_15_6:![0-9]+]]
	; GVN: %.pre = load ptr, ptr null, align 8, !dbg [[LOC_16_13:![0-9]+]]			; ALL: cond.false:
	; INSTCOMBINE:store ptr poison, ptr null, align 4294967296, !dbg [[LOC_16_13:![0-9]+]]

	cond.false:			cond.false:
	%0 = load ptr, ptr %desc, align 8, !dbg !11			%0 = load ptr, ptr %desc, align 8, !dbg !11
	%1 = load ptr, ptr %0, align 8			%1 = load ptr, ptr %0, align 8
	br label %cond.end, !dbg !9			br label %cond.end, !dbg !9

	cond.end:			cond.end:
	%2 = phi ptr [ %1, %cond.false ], [ null, %entry ], !dbg !9			%2 = phi ptr [ %1, %cond.false ], [ null, %entry ], !dbg !9
	[17 lines not shown]
	!6 = !{!7}			!6 = !{!7}
	!7 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)			!7 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
	!8 = !{}			!8 = !{}
	!9 = !DILocation(line: 15, column: 6, scope: !4)			!9 = !DILocation(line: 15, column: 6, scope: !4)
	!10 = !DILocation(line: 16, column: 13, scope: !4)			!10 = !DILocation(line: 16, column: 13, scope: !4)
	!11 = !DILocation(line: 15, column: 34, scope: !4)			!11 = !DILocation(line: 15, column: 34, scope: !4)

	;ALL: [[SCOPE:![0-9]+]] = distinct !DISubprogram(name: "test",{{.*}}			;ALL: [[SCOPE:![0-9]+]] = distinct !DISubprogram(name: "test",{{.*}}
	;ALL: [[LOC_15_6]] = !DILocation(line: 15, column: 6, scope: [[SCOPE]])
	;ALL: [[LOC_16_13]] = !DILocation(line: 16, column: 13, scope: [[SCOPE]])			;ALL: [[LOC_16_13]] = !DILocation(line: 16, column: 13, scope: [[SCOPE]])
				;ALL: [[LOC_15_6]] = !DILocation(line: 15, column: 6, scope: [[SCOPE]])

llvm/test/Transforms/GVN/PRE/pre-load.ll

; Same as test13, but %x here is dereferenceable. A pointer that is		; Same as test13, but %x here is dereferenceable. A pointer that is
; dereferenceable can be loaded from speculatively without a risk of trapping.		; dereferenceable can be loaded from speculatively without a risk of trapping.
; Since it is OK to speculate, PRE is allowed.		; Since it is OK to speculate, PRE is allowed.

define i32 @test15(ptr noalias nocapture readonly dereferenceable(8) align 4 %x, ptr noalias nocapture %r, i32 %a) nofree nosync {		define i32 @test15(ptr noalias nocapture readonly dereferenceable(8) align 4 %x, ptr noalias nocapture %r, i32 %a) nofree nosync {
; CHECK-LABEL: @test15(		; CHECK-LABEL: @test15(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[TOBOOL:%.*]] = icmp eq i32 [[A:%.*]], 0		; CHECK-NEXT: [[TOBOOL:%.*]] = icmp eq i32 [[A:%.*]], 0
; CHECK-NEXT: br i1 [[TOBOOL]], label [[ENTRY_IF_END_CRIT_EDGE:%.*]], label [[IF_THEN:%.*]]
; CHECK: entry.if.end_crit_edge:
; CHECK-NEXT: [[VV_PRE:%.*]] = load i32, ptr [[X:%.*]], align 4		; CHECK-NEXT: [[VV_PRE:%.*]] = load i32, ptr [[X:%.*]], align 4
; CHECK-NEXT: br label [[IF_END:%.*]]		; CHECK-NEXT: br i1 [[TOBOOL]], label [[IF_END:%.*]], label [[IF_THEN:%.*]]
; CHECK: if.then:		; CHECK: if.then:
; CHECK-NEXT: [[UU:%.*]] = load i32, ptr [[X]], align 4		; CHECK-NEXT: store i32 [[VV_PRE]], ptr [[R:%.*]], align 4
; CHECK-NEXT: store i32 [[UU]], ptr [[R:%.*]], align 4
; CHECK-NEXT: br label [[IF_END]]		; CHECK-NEXT: br label [[IF_END]]
; CHECK: if.end:		; CHECK: if.end:
; CHECK-NEXT: [[VV:%.*]] = phi i32 [ [[VV_PRE]], [[ENTRY_IF_END_CRIT_EDGE]] ], [ [[UU]], [[IF_THEN]] ]
; CHECK-NEXT: call void @f()		; CHECK-NEXT: call void @f()
; CHECK-NEXT: ret i32 [[VV]]		; CHECK-NEXT: ret i32 [[VV_PRE]]
;		;

entry:		entry:
%tobool = icmp eq i32 %a, 0		%tobool = icmp eq i32 %a, 0
br i1 %tobool, label %if.end, label %if.then		br i1 %tobool, label %if.end, label %if.then


if.then:		if.then:
[13 lines not shown]
; Same as test14, but %x here is dereferenceable. A pointer that is		; Same as test14, but %x here is dereferenceable. A pointer that is
; dereferenceable can be loaded from speculatively without a risk of trapping.		; dereferenceable can be loaded from speculatively without a risk of trapping.
; Since it is OK to speculate, PRE is allowed.		; Since it is OK to speculate, PRE is allowed.

define i32 @test16(ptr noalias nocapture readonly dereferenceable(8) align 4 %x, ptr noalias nocapture %r, i32 %a) nofree nosync {		define i32 @test16(ptr noalias nocapture readonly dereferenceable(8) align 4 %x, ptr noalias nocapture %r, i32 %a) nofree nosync {
; CHECK-LABEL: @test16(		; CHECK-LABEL: @test16(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[TOBOOL:%.*]] = icmp eq i32 [[A:%.*]], 0		; CHECK-NEXT: [[TOBOOL:%.*]] = icmp eq i32 [[A:%.*]], 0
; CHECK-NEXT: br i1 [[TOBOOL]], label [[ENTRY_IF_END_CRIT_EDGE:%.*]], label [[IF_THEN:%.*]]
; CHECK: entry.if.end_crit_edge:
; CHECK-NEXT: [[VV_PRE:%.*]] = load i32, ptr [[X:%.*]], align 4		; CHECK-NEXT: [[VV_PRE:%.*]] = load i32, ptr [[X:%.*]], align 4
; CHECK-NEXT: br label [[IF_END:%.*]]		; CHECK-NEXT: br i1 [[TOBOOL]], label [[IF_END:%.*]], label [[IF_THEN:%.*]]
; CHECK: if.then:		; CHECK: if.then:
; CHECK-NEXT: [[UU:%.*]] = load i32, ptr [[X]], align 4		; CHECK-NEXT: store i32 [[VV_PRE]], ptr [[R:%.*]], align 4
; CHECK-NEXT: store i32 [[UU]], ptr [[R:%.*]], align 4
; CHECK-NEXT: br label [[IF_END]]		; CHECK-NEXT: br label [[IF_END]]
; CHECK: if.end:		; CHECK: if.end:
; CHECK-NEXT: [[VV:%.*]] = phi i32 [ [[VV_PRE]], [[ENTRY_IF_END_CRIT_EDGE]] ], [ [[UU]], [[IF_THEN]] ]
; CHECK-NEXT: call void @f()		; CHECK-NEXT: call void @f()
; CHECK-NEXT: ret i32 [[VV]]		; CHECK-NEXT: ret i32 [[VV_PRE]]
;		;

entry:		entry:
%tobool = icmp eq i32 %a, 0		%tobool = icmp eq i32 %a, 0
br i1 %tobool, label %if.end, label %if.then		br i1 %tobool, label %if.end, label %if.then


if.then:		if.then:
[31 lines not shown]
; CHECK: bb1:		; CHECK: bb1:
; CHECK-NEXT: [[COND2:%.*]] = icmp sgt i64 [[V1]], 100		; CHECK-NEXT: [[COND2:%.*]] = icmp sgt i64 [[V1]], 100
; CHECK-NEXT: br i1 [[COND2]], label [[BB100:%.*]], label [[BB2:%.*]]		; CHECK-NEXT: br i1 [[COND2]], label [[BB100:%.*]], label [[BB2:%.*]]
; CHECK: bb2:		; CHECK: bb2:
; CHECK-NEXT: [[V2:%.*]] = add nsw i64 [[V1]], 1		; CHECK-NEXT: [[V2:%.*]] = add nsw i64 [[V1]], 1
; CHECK-NEXT: store i64 [[V2]], ptr [[P1]], align 8		; CHECK-NEXT: store i64 [[V2]], ptr [[P1]], align 8
; CHECK-NEXT: br label [[BB3:%.*]]		; CHECK-NEXT: br label [[BB3:%.*]]
; CHECK: bb3:		; CHECK: bb3:
; CHECK-NEXT: [[V3:%.*]] = load i64, ptr [[P1]], align 8		; CHECK-NEXT: [[V3:%.*]] = phi i64 [ [[V3_PRE:%.*]], [[BB200]] ], [ [[V3_PRE1:%.*]], [[BB100]] ], [ [[V2]], [[BB2]] ]
; CHECK-NEXT: store i64 [[V3]], ptr [[P2:%.*]], align 8		; CHECK-NEXT: store i64 [[V3]], ptr [[P2:%.*]], align 8
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
; CHECK: bb100:		; CHECK: bb100:
; CHECK-NEXT: [[COND3:%.*]] = call i1 @foo()		; CHECK-NEXT: [[COND3:%.*]] = call i1 @foo()
		; CHECK-NEXT: [[V3_PRE1]] = load i64, ptr [[P1]], align 8
; CHECK-NEXT: br i1 [[COND3]], label [[BB3]], label [[BB101:%.*]]		; CHECK-NEXT: br i1 [[COND3]], label [[BB3]], label [[BB101:%.*]]
; CHECK: bb101:		; CHECK: bb101:
; CHECK-NEXT: [[V4:%.*]] = load i64, ptr [[P1]], align 8		; CHECK-NEXT: store i64 [[V3_PRE1]], ptr [[P3:%.*]], align 8
; CHECK-NEXT: store i64 [[V4]], ptr [[P3:%.*]], align 8
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
; CHECK: bb200:		; CHECK: bb200:
; CHECK-NEXT: [[COND4:%.*]] = call i1 @bar()		; CHECK-NEXT: [[COND4:%.*]] = call i1 @bar()
		; CHECK-NEXT: [[V3_PRE]] = load i64, ptr [[P1]], align 8
; CHECK-NEXT: br i1 [[COND4]], label [[BB3]], label [[BB201:%.*]]		; CHECK-NEXT: br i1 [[COND4]], label [[BB3]], label [[BB201:%.*]]
; CHECK: bb201:		; CHECK: bb201:
; CHECK-NEXT: [[V5:%.*]] = load i64, ptr [[P1]], align 8		; CHECK-NEXT: store i64 [[V3_PRE]], ptr [[P4:%.*]], align 8
; CHECK-NEXT: store i64 [[V5]], ptr [[P4:%.*]], align 8
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
{		{
entry:		entry:
%v1 = load i64, ptr %p1, align 8		%v1 = load i64, ptr %p1, align 8
%cond1 = icmp sgt i64 %v1, 200		%cond1 = icmp sgt i64 %v1, 200
br i1 %cond1, label %bb200, label %bb1		br i1 %cond1, label %bb200, label %bb1

[24 lines not shown]
bb200:		bb200:
%cond4 = call i1 @bar()		%cond4 = call i1 @bar()
br i1 %cond4, label %bb3, label %bb201		br i1 %cond4, label %bb3, label %bb201

bb201:		bb201:
%v5 = load i64, ptr %p1, align 8		%v5 = load i64, ptr %p1, align 8
store i64 %v5, ptr %p4, align 8		store i64 %v5, ptr %p4, align 8
ret void		ret void
}		}

		; The output value from %if.then block is %dec, not loaded %v1.
		; So ValuesPerBlock[%if.then] should not be replaced when the load instruction
		; is moved to %entry.
		define void @test18(i1 %cond, ptr %p1, ptr %p2) {
		; CHECK-LABEL: @test18(
		; CHECK-NEXT: entry:
		; CHECK-NEXT: [[V2_PRE:%.*]] = load i16, ptr [[P1:%.*]], align 2
		; CHECK-NEXT: br i1 [[COND:%.*]], label [[IF_END:%.*]], label [[IF_THEN:%.*]]
		; CHECK: if.then:
		; CHECK-NEXT: [[DEC:%.*]] = add i16 [[V2_PRE]], -1
		; CHECK-NEXT: store i16 [[DEC]], ptr [[P1]], align 2
		; CHECK-NEXT: br label [[IF_END]]
		; CHECK: if.end:
		; CHECK-NEXT: [[V2:%.*]] = phi i16 [ [[DEC]], [[IF_THEN]] ], [ [[V2_PRE]], [[ENTRY:%.*]] ]
		; CHECK-NEXT: store i16 [[V2]], ptr [[P2:%.*]], align 2
		; CHECK-NEXT: ret void
		;
		entry:
		br i1 %cond, label %if.end, label %if.then

		if.then:
		%v1 = load i16, ptr %p1
		%dec = add i16 %v1, -1
		store i16 %dec, ptr %p1
		br label %if.end

		if.end:
		%v2 = load i16, ptr %p1
		store i16 %v2, ptr %p2
		ret void
		}
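The situation test18 guards against can be sketched as follows. This is a simplified model, not the code from GVN.cpp: `AvailableEntry`, `replaceIfMatches`, and the string-based values are hypothetical stand-ins. It shows only the property described in the test comment: an entry is rewritten when it actually holds the moved load, so the `%dec` entry for `%if.then` is left alone.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified model of "value available at the end of a block".
struct AvailableEntry {
  std::string Block;
  std::string Value;
};

// Replace OldValue with NewValue only in entries that currently hold OldValue.
// Entries holding some other value (e.g. a stored %dec) are left untouched.
static void replaceIfMatches(std::vector<AvailableEntry> &Entries,
                             const std::string &OldValue,
                             const std::string &NewValue) {
  for (AvailableEntry &E : Entries)
    if (E.Value == OldValue)
      E.Value = NewValue;
}
```

An unconditional replacement would instead overwrite `%dec` with the hoisted load and feed the wrong value into the PHI in `%if.end`.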

		; PRE of load instructions should not cross exception handling instructions.
		define void @test19(i1 %cond, ptr %p1, ptr %p2)
		; CHECK-LABEL: @test19(
		; CHECK-NEXT: entry:
		; CHECK-NEXT: br i1 [[COND:%.*]], label [[THEN:%.*]], label [[ELSE:%.*]]
		; CHECK: then:
		; CHECK-NEXT: [[V2:%.*]] = load i64, ptr [[P2:%.*]], align 8
		; CHECK-NEXT: [[ADD:%.*]] = add i64 [[V2]], 1
		; CHECK-NEXT: store i64 [[ADD]], ptr [[P1:%.*]], align 8
		; CHECK-NEXT: br label [[END:%.*]]
		; CHECK: else:
		; CHECK-NEXT: invoke void @f()
		; CHECK-NEXT: to label [[ELSE_END_CRIT_EDGE:%.*]] unwind label [[LPAD:%.*]]
		; CHECK: else.end_crit_edge:
		; CHECK-NEXT: [[V1_PRE:%.*]] = load i64, ptr [[P1]], align 8
		; CHECK-NEXT: br label [[END]]
		; CHECK: end:
		; CHECK-NEXT: [[V1:%.*]] = phi i64 [ [[V1_PRE]], [[ELSE_END_CRIT_EDGE]] ], [ [[ADD]], [[THEN]] ]
		; CHECK-NEXT: [[AND:%.*]] = and i64 [[V1]], 100
		; CHECK-NEXT: store i64 [[AND]], ptr [[P2]], align 8
		; CHECK-NEXT: ret void
		; CHECK: lpad:
		; CHECK-NEXT: [[LP:%.*]] = landingpad { ptr, i32 }
		; CHECK-NEXT: cleanup
		; CHECK-NEXT: [[V3:%.*]] = load i64, ptr [[P1]], align 8
		; CHECK-NEXT: [[OR:%.*]] = or i64 [[V3]], 200
		; CHECK-NEXT: store i64 [[OR]], ptr [[P1]], align 8
		; CHECK-NEXT: resume { ptr, i32 } [[LP]]
		;
		personality ptr @__CxxFrameHandler3 {
		entry:
		br i1 %cond, label %then, label %else

		then:
		%v2 = load i64, ptr %p2
		%add = add i64 %v2, 1
		store i64 %add, ptr %p1
		br label %end

		else:
		invoke void @f()
		to label %end unwind label %lpad

		end:
		%v1 = load i64, ptr %p1
		%and = and i64 %v1, 100
		store i64 %and, ptr %p2
		ret void

		lpad:
		%lp = landingpad { ptr, i32 }
		cleanup
		%v3 = load i64, ptr %p1
		%or = or i64 %v3, 200
		store i64 %or, ptr %p1
		resume { ptr, i32 } %lp
		}

		; A predecessor BB has both successors to the same BB, for simplicity we don't
		; handle it, nothing should be changed.
		define void @test20(i1 %cond, i1 %cond2, ptr %p1, ptr %p2) {
		; CHECK-LABEL: @test20(
		; CHECK-NEXT: entry:
		; CHECK-NEXT: br i1 [[COND:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
		; CHECK: if.then:
		; CHECK-NEXT: [[V1:%.*]] = load i16, ptr [[P1:%.*]], align 2
		; CHECK-NEXT: [[DEC:%.*]] = add i16 [[V1]], -1
		; CHECK-NEXT: store i16 [[DEC]], ptr [[P1]], align 2
		; CHECK-NEXT: br label [[IF_END:%.*]]
		; CHECK: if.else:
		; CHECK-NEXT: br i1 [[COND2:%.*]], label [[IF_END]], label [[IF_END]]
		; CHECK: if.end:
		; CHECK-NEXT: [[V2:%.*]] = load i16, ptr [[P1]], align 2
		; CHECK-NEXT: store i16 [[V2]], ptr [[P2:%.*]], align 2
		; CHECK-NEXT: ret void
		;
		entry:
		br i1 %cond, label %if.then, label %if.else

		if.then:
		%v1 = load i16, ptr %p1
		%dec = add i16 %v1, -1
		store i16 %dec, ptr %p1
		br label %if.end

		if.else:
		br i1 %cond2, label %if.end, label %if.end

		if.end:
		%v2 = load i16, ptr %p1
		store i16 %v2, ptr %p2
		ret void
		}

		; More edges from the same BB to LoadBB. Don't change anything.
		define void @test21(i1 %cond, i32 %code, ptr %p1, ptr %p2) {
		; CHECK-LABEL: @test21(
		; CHECK-NEXT: entry:
		; CHECK-NEXT: br i1 [[COND:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
		; CHECK: if.then:
		; CHECK-NEXT: [[V1:%.*]] = load i16, ptr [[P1:%.*]], align 2
		; CHECK-NEXT: [[DEC:%.*]] = add i16 [[V1]], -1
		; CHECK-NEXT: store i16 [[DEC]], ptr [[P1]], align 2
		; CHECK-NEXT: br label [[IF_END:%.*]]
		; CHECK: if.else:
		; CHECK-NEXT: switch i32 [[CODE:%.*]], label [[IF_END]] [
		; CHECK-NEXT: i32 1, label [[IF_END]]
		; CHECK-NEXT: i32 2, label [[IF_END]]
		; CHECK-NEXT: i32 3, label [[IF_END]]
		; CHECK-NEXT: ]
		; CHECK: if.end:
		; CHECK-NEXT: [[V2:%.*]] = load i16, ptr [[P1]], align 2
		; CHECK-NEXT: store i16 [[V2]], ptr [[P2:%.*]], align 2
		; CHECK-NEXT: ret void
		;
		entry:
		br i1 %cond, label %if.then, label %if.else

		if.then:
		%v1 = load i16, ptr %p1
		%dec = add i16 %v1, -1
		store i16 %dec, ptr %p1
		br label %if.end

		if.else:
		switch i32 %code, label %if.end [
		i32 1, label %if.end
		i32 2, label %if.end
		i32 3, label %if.end
		]

		if.end:
		%v2 = load i16, ptr %p1
		store i16 %v2, ptr %p2
		ret void
		}
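The bail-out exercised by test20 and test21 can be pictured as an edge count. The helpers below are a hypothetical, self-contained model (block names are plain strings; `countEdgesTo` and `hasSingleEdgeTo` are not LLVM functions) and capture only what the two test comments describe: a predecessor with more than one edge into the block containing the load is not handled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Count how many successor edges of a block point at Target.
static int countEdgesTo(const std::vector<std::string> &Successors,
                        const std::string &Target) {
  int N = 0;
  for (const std::string &S : Successors)
    if (S == Target)
      ++N;
  return N;
}

// Hypothetical filter: accept a predecessor only if it has a single edge
// into the block that contains the partially redundant load.
static bool hasSingleEdgeTo(const std::vector<std::string> &Successors,
                            const std::string &LoadBB) {
  return countEdgesTo(Successors, LoadBB) == 1;
}
```

In this model, `%if.else` in test20 (a conditional branch with both targets `%if.end`) and in test21 (a switch whose every case targets `%if.end`) are both rejected.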

llvm/test/Transforms/GVN/PRE/volatile.ll

	exit:			exit:
	ret i32 %add			ret i32 %add
	}			}

	; Does cross block PRE work with volatiles?			; Does cross block PRE work with volatiles?
	define i32 @test7(i1 %c, ptr noalias nocapture %p, ptr noalias nocapture %q) {			define i32 @test7(i1 %c, ptr noalias nocapture %p, ptr noalias nocapture %q) {
	; CHECK-LABEL: @test7(			; CHECK-LABEL: @test7(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 [[C:%.*]], label [[ENTRY_HEADER_CRIT_EDGE:%.*]], label [[SKIP:%.*]]
	; CHECK: entry.header_crit_edge:
	; CHECK-NEXT: [[Y_PRE:%.*]] = load i32, ptr [[P:%.*]], align 4			; CHECK-NEXT: [[Y_PRE:%.*]] = load i32, ptr [[P:%.*]], align 4
	; CHECK-NEXT: br label [[HEADER:%.*]]			; CHECK-NEXT: br i1 [[C:%.*]], label [[HEADER:%.*]], label [[SKIP:%.*]]
	; CHECK: skip:			; CHECK: skip:
	; CHECK-NEXT: [[Y1:%.*]] = load i32, ptr [[P]], align 4			; CHECK-NEXT: call void @use(i32 [[Y_PRE]])
	; CHECK-NEXT: call void @use(i32 [[Y1]])
	; CHECK-NEXT: br label [[HEADER]]			; CHECK-NEXT: br label [[HEADER]]
	; CHECK: header:			; CHECK: header:
	; CHECK-NEXT: [[Y:%.*]] = phi i32 [ [[Y_PRE]], [[ENTRY_HEADER_CRIT_EDGE]] ], [ [[Y]], [[HEADER]] ], [ [[Y1]], [[SKIP]] ]
	; CHECK-NEXT: [[X:%.*]] = load volatile i32, ptr [[Q:%.*]], align 4			; CHECK-NEXT: [[X:%.*]] = load volatile i32, ptr [[Q:%.*]], align 4
	; CHECK-NEXT: [[ADD:%.*]] = sub i32 [[Y]], [[X]]			; CHECK-NEXT: [[ADD:%.*]] = sub i32 [[Y_PRE]], [[X]]
	; CHECK-NEXT: [[CND:%.*]] = icmp eq i32 [[ADD]], 0			; CHECK-NEXT: [[CND:%.*]] = icmp eq i32 [[ADD]], 0
	; CHECK-NEXT: br i1 [[CND]], label [[EXIT:%.*]], label [[HEADER]]			; CHECK-NEXT: br i1 [[CND]], label [[EXIT:%.*]], label [[HEADER]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	entry:			entry:
	br i1 %c, label %header, label %skip			br i1 %c, label %header, label %skip
	skip:			skip:

llvm/test/Transforms/GVN/condprop.ll

	; that gep2 does not alias ptr1 on that path (as it would require that			; that gep2 does not alias ptr1 on that path (as it would require that
	; ptr2==ptr2+2), so we can perform PRE of the load.			; ptr2==ptr2+2), so we can perform PRE of the load.
	define i32 @test13(ptr %ptr1, ptr %ptr2) {			define i32 @test13(ptr %ptr1, ptr %ptr2) {
	; CHECK-LABEL: @test13(			; CHECK-LABEL: @test13(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[GEP1:%.*]] = getelementptr i32, ptr [[PTR2:%.*]], i32 1			; CHECK-NEXT: [[GEP1:%.*]] = getelementptr i32, ptr [[PTR2:%.*]], i32 1
	; CHECK-NEXT: [[GEP2:%.*]] = getelementptr i32, ptr [[PTR2]], i32 2			; CHECK-NEXT: [[GEP2:%.*]] = getelementptr i32, ptr [[PTR2]], i32 2
	; CHECK-NEXT: [[CMP:%.*]] = icmp eq ptr [[PTR1:%.*]], [[PTR2]]			; CHECK-NEXT: [[CMP:%.*]] = icmp eq ptr [[PTR1:%.*]], [[PTR2]]
	; CHECK-NEXT: br i1 [[CMP]], label [[IF:%.*]], label [[ENTRY_END_CRIT_EDGE:%.*]]
	; CHECK: entry.end_crit_edge:
	; CHECK-NEXT: [[VAL2_PRE:%.*]] = load i32, ptr [[GEP2]], align 4			; CHECK-NEXT: [[VAL2_PRE:%.*]] = load i32, ptr [[GEP2]], align 4
	; CHECK-NEXT: br label [[END:%.*]]			; CHECK-NEXT: br i1 [[CMP]], label [[IF:%.*]], label [[END:%.*]]
	; CHECK: if:			; CHECK: if:
	; CHECK-NEXT: [[VAL1:%.*]] = load i32, ptr [[GEP2]], align 4
	; CHECK-NEXT: br label [[END]]			; CHECK-NEXT: br label [[END]]
	; CHECK: end:			; CHECK: end:
	; CHECK-NEXT: [[VAL2:%.*]] = phi i32 [ [[VAL1]], [[IF]] ], [ [[VAL2_PRE]], [[ENTRY_END_CRIT_EDGE]] ]			; CHECK-NEXT: [[PHI1:%.*]] = phi ptr [ [[PTR2]], [[IF]] ], [ [[GEP1]], [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[PHI1:%.*]] = phi ptr [ [[PTR2]], [[IF]] ], [ [[GEP1]], [[ENTRY_END_CRIT_EDGE]] ]			; CHECK-NEXT: [[PHI2:%.*]] = phi i32 [ [[VAL2_PRE]], [[IF]] ], [ 0, [[ENTRY]] ]
	; CHECK-NEXT: [[PHI2:%.*]] = phi i32 [ [[VAL1]], [[IF]] ], [ 0, [[ENTRY_END_CRIT_EDGE]] ]
	; CHECK-NEXT: store i32 0, ptr [[PHI1]], align 4			; CHECK-NEXT: store i32 0, ptr [[PHI1]], align 4
	; CHECK-NEXT: [[RET:%.*]] = add i32 [[PHI2]], [[VAL2]]			; CHECK-NEXT: [[RET:%.*]] = add i32 [[PHI2]], [[VAL2_PRE]]
	; CHECK-NEXT: ret i32 [[RET]]			; CHECK-NEXT: ret i32 [[RET]]
	;			;
	entry:			entry:
	%gep1 = getelementptr i32, ptr %ptr2, i32 1			%gep1 = getelementptr i32, ptr %ptr2, i32 1
	%gep2 = getelementptr i32, ptr %ptr2, i32 2			%gep2 = getelementptr i32, ptr %ptr2, i32 2
	%cmp = icmp eq ptr %ptr1, %ptr2			%cmp = icmp eq ptr %ptr1, %ptr2
	br i1 %cmp, label %if, label %end			br i1 %cmp, label %if, label %end



Unit Tests: Failed


Diff 490663
