This is an archive of the discontinued LLVM Phabricator instance.

Paths

llvm/include/llvm/Transforms/Utils/MoveAutoInit.h
llvm/lib/Passes/PassBuilder.cpp
llvm/lib/Passes/PassBuilderPipelines.cpp
llvm/lib/Passes/PassRegistry.def
llvm/lib/Transforms/Utils/CMakeLists.txt
llvm/lib/Transforms/Utils/MoveAutoInit.cpp
llvm/test/Other/new-pm-defaults.ll
llvm/test/Other/new-pm-lto-defaults.ll
llvm/test/Other/new-pm-thinlto-postlink-defaults.ll
llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
llvm/test/Other/new-pm-thinlto-prelink-defaults.ll
llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll
llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
llvm/test/Transforms/MoveAutoInit/branch.ll
llvm/test/Transforms/MoveAutoInit/clobber.ll
llvm/test/Transforms/MoveAutoInit/fence.ll
llvm/test/Transforms/MoveAutoInit/loop.ll
llvm/test/Transforms/MoveAutoInit/scalar.ll

Differential D137707

Move "auto-init" instructions to the dominator of their users
ClosedPublic

Authored by serge-sans-paille on Nov 9 2022, 5:11 AM.

Details

Reviewers

jfb
nickdesaulniers
void
efriedma
nikic
asbirlea

Commits

rG50b2a113db19: Move "auto-init" instructions to the dominator of their users
rGcca01008cc31: Move "auto-init" instructions to the dominator of their users

Summary

As a result of -ftrivial-auto-var-init, clang generates instructions to set alloca'd memory to a given pattern, right after the allocation site. In some cases, this (somewhat costly) operation could be delayed, so that it is only executed conditionally.

This is not an uncommon situation: it happens ~550 times on the cPython code base, and much more often on the LLVM codebase. The benefit varies greatly depending on the execution path, but it should not regress performance.
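
To make the summary concrete, here is a minimal source-level sketch of the effect, not the pass itself. The `patternFill` helper, the `FillCount` counter and the 0xAA fill byte are assumptions made for illustration; the real transformation works on LLVM IR instructions annotated as auto-init.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical counter standing in for the run-time cost of the fill.
static int FillCount = 0;

// Hypothetical stand-in for the compiler-generated pattern initialization.
static void patternFill(unsigned char *Buf, std::size_t Size) {
  assert(Buf != nullptr && "buffer must exist");
  std::memset(Buf, 0xAA, Size);
  ++FillCount;
}

// Shape of the code as emitted: the fill sits right after the allocation,
// so it runs even on the early-return path that never touches Buf.
int eagerInit(bool UseBuffer) {
  unsigned char Buf[64];
  patternFill(Buf, sizeof(Buf));
  if (!UseBuffer)
    return 0;
  Buf[0] = 1;
  return Buf[0] + Buf[1];
}

// Shape after sinking: the fill runs only on the path that uses Buf.
int sunkInit(bool UseBuffer) {
  unsigned char Buf[64];
  if (!UseBuffer)
    return 0;
  patternFill(Buf, sizeof(Buf));
  Buf[0] = 1;
  return Buf[0] + Buf[1];
}
```

Both functions return the same value for every input; only the number of fills executed on the early-return path differs.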

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

No comment on the approach. Minor drive by comments on style. But I'd like to see IR tests added for this pass. I haven't run it yet on the Linux kernel but can do so to provide measurements.

llvm/include/llvm/InitializePasses.h
201 ↗	(On Diff #474231)	drop this hunk
llvm/include/llvm/LinkAllPasses.h
116 ↗	(On Diff #474231)	drop this hunk
139 ↗	(On Diff #474231)	add space before cast
llvm/lib/Transforms/Utils/MoveAutoInit.cpp
50	delete
73	auto
81	delete. See comment below about ternaries, which also applies here.
126	delete
130	delete
151–155	Please use `if (x) y() else z()` rather than `if (!x) z() else y()`. Additionally, since we're assigning to the same variable, consider using a ternary expression: `DominatingPredecessor = (DominatingPredecessor ? DT.find... : Pred)`

This revision now requires changes to proceed. Nov 10 2022, 11:47 AM

This looks overly specific.
Is this not yet another manifestation of the lack of a generic sinking pass?

In D137707#3920060, @lebedev.ri wrote:

This looks overly specific.
Is this not yet another manifestation of the lack of a generic sinking pass?

I'd appreciate a pointer to the literature on "sinking pass". That being said, it's pretty clear to me that this pass generalizes far beyond "auto-init". There's an advantage in auto-init though: we can work under the assumption of "no aliasing", which helps a lot. I'm interested in making that pass more generic, probably as a second step though.

In D137707#3919991, @nickdesaulniers wrote:

No comment on the approach. Minor drive by comments on style. But I'd like to see IR tests added for this pass. I haven't run it yet on the Linux kernel but can do so to provide measurements.

+1 for the tests and thanks for the review. On the Firefox codebase, we get decent speedups compared to raw -ftrivial-auto-var-init. Some more data here: https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=77549e4eef51c70336d2ca9e3d086bf7767f8196&newProject=try&newRevision=b9fc0eb0e1e29c37364d69a9da203c00e87df5b2&page=1&framework=13&showOnlyConfident=1 where "Base is without this commit and New with it". The benchmarks are to be taken with a grain of salt though, they are not terribly stable.

Fix nits and address reviews
Fix bug when looking for unique predecessor
Add basic tests

@nickdesaulniers I'm really curious about the impact on ClangBuiltLinux

Harbormaster completed remote builds in B197968: Diff 475770. Nov 16 2022, 5:24 AM

I tested D137707 Diff 475770 against an x86_64 defconfig build of the Linux kernel (commit 81ac25651a62 ("Merge tag 'nfsd-6.1-5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux")), which sets CONFIG_INIT_STACK_ALL_ZERO=y. Booted fine.

$ du -h vmlinux.orig vmlinux.D137707.475770
63M	vmlinux.orig
63M	vmlinux.D137707.475770
$ bloaty vmlinux.D137707.475770 -- vmlinux.orig
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  +0.0%    +336  [ = ]       0    .rela.orc_unwind_ip
  +0.0%    +144  [ = ]       0    .rela.text
  +2.6%    +141  +2.6%    +141    [LOAD #3 [RWX]]
  +0.0%     +84  +0.0%     +84    .orc_unwind
  +0.0%     +56  +0.0%     +56    .orc_unwind_ip
  +0.0%     +24  [ = ]       0    .rela.smp_locks
  +0.3%      +3  [ = ]       0    .shstrtab
  -0.0%     -11  [ = ]       0    .strtab
  -0.0%     -24  [ = ]       0    .rela.return_sites
  -0.0%     -24  [ = ]       0    .symtab
  -8.8%    -140  -8.8%    -140    [LOAD #1 [RW]]
  -0.0%    -141  -0.0%    -141    .init.text
  +0.0%    +448  [ = ]       0    TOTAL
$ llvm-readelf -S vmlinux.orig
...
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
...
  [ 1] .text             PROGBITS        ffffffff81000000 200000 10032cb 00  AX  0   0 4096
...
$ llvm-readelf -S vmlinux.D137707.475770
...
  [ 1] .text             PROGBITS        ffffffff81000000 200000 10032cb 00  AX  0   0 4096

So it appears that this patch made no difference to the size of the .text section. I triple checked this w/ and w/o D137707 applied.

When I build with make LLVM=1 -j128 KCFLAGS="-mllvm -stats", then process the stats from move-auto-init, I observe 49810 moves! Feel free to add that measurement into the commit description.

$ make LLVM=1 -j128 KCFLAGS="-mllvm -stats" &> log.txt
$ grep move-auto-init log.txt | tr -s ' ' | cut -d ' ' -f 2 | python3 -c "import sys; print(sum((float(l) for l in sys.stdin)))"
49810.0
# Triple check with sed+bc rather than python3
$ grep move-auto-init log.txt | tr -s ' ' | cut -d ' ' -f 2 | sed ':a;N;s/\n/+/;ta' |bc    
49810

Please let me know if there's any other measurements you'd like me to make.

Also, it might be nice if the tests demonstrated diffs against existing (or precommitted changes) to better demonstrate how this pass changes the generated code. Mind breaking the newly added tests into 2 patches:

child patch that adds them BEFORE this patch
rebase this patch on that to demonstrate how this patch differs?

llvm/lib/Passes/PassBuilderPipelines.cpp
974	Is it possible to skip this pass if `-ftrivial-auto-var-init=zero` wasn't specified?
llvm/lib/Transforms/Utils/MoveAutoInit.cpp
93	is there a faster way to do this rather than having to scan the metadata of every operand?

nickdesaulniers added reviewers: void, efriedma. Nov 17 2022, 12:02 PM

I'd appreciate a pointer to the literature on "sinking pass".

Look for PRE (partial redundancy elimination) of stores or partial dead store elimination. Maybe start with https://dl.acm.org/doi/10.1145/277650.277659 . See also D29865 .

So it appears that this patch made no difference to the size of the .text section.

This isn't fundamentally surprising; you have the same number of stores before and after sinking.

In D137707#3935019, @efriedma wrote:

I'd appreciate a pointer to the literature on "sinking pass".

Look for PRE (partial redundancy elimination) of stores or partial dead store elimination. Maybe start with https://dl.acm.org/doi/10.1145/277650.277659 . See also D29865 .

Thanks.

So it appears that this patch made no difference to the size of the .text section.

This isn't fundamentally surprising; you have the same number of stores before and after sinking.

That's my understanding too. @nickdesaulniers does this patch have an impact on some kernel benchmark, maybe one where -ftrivial-auto-var-init brought some slowdown?

llvm/lib/Passes/PassBuilderPipelines.cpp
974	It would be possible through a module flag or something, but if we're going to extend that pass beyond auto-var-init, that's going to be useless, and that would mean a greater coupling with the front end. The following shows no big impact on compile time when the flag is not used, so I don't think it's worth the extra step: https://llvm-compile-time-tracker.com/compare.php?from=f71d32a0eea47b3d2bb43d6be15cf09d47ef6971&to=6ea450ef673666ed6b8a8257ca56f093aae7469c&stat=instructions:u
llvm/lib/Transforms/Utils/MoveAutoInit.cpp
93	Well, there's an early exit on the instruction metadata, I guess that's enough?

aeubanks added a subscriber: aeubanks. Nov 21 2022, 10:05 PM

aeubanks added inline comments.

llvm/lib/Transforms/Utils/MoveAutoInit.cpp
204	any reason for all the legacy pass infra? if it's going in the optimization pipeline then no need for a legacy pass

Remove legacy pass manager support
Minor nits

@nickdesaulniers I don't think it makes sense to update the test cases: as they only run the considered pass, the diff before/after is pretty clear based on the CHECK: lines.

I personally consider this ready to land :-) I'm obviously open to changes.

serge-sans-paille retitled this revision from [WIP] Move "auto-init" instructions to the dominator of their users to Move "auto-init" instructions to the dominator of their users. Nov 22 2022, 2:00 PM

Harbormaster completed remote builds in B199047: Diff 477293. Nov 22 2022, 2:40 PM

Are there any worries about moving stores that are on atomic/volatile "objects", either because the auto-init store itself is atomic/volatile, or because a subsequent access is marked as such? I don't think there's a worry because auto-init stores are technically outside the abstract machine (they "don't exist"), but I'm not 100% convinced.

llvm/lib/Transforms/Utils/MoveAutoInit.cpp
120	Typo "want"

Are there any worries about moving stores that are on atomic/volatile "objects", either because the auto-init store itself is atomic/volatile, or because a subsequent access is marked as such? I don't think there's a worry because auto-init stores are technically outside the abstract machine (they "don't exist"), but I'm not 100% convinced.

The thing I'd be concerned about is sinking the store past an atomic fence.

Allocating/initializing an object is never atomic. The requirement of the abstract machine is that the allocation/initialization of the variable has to have a happens-before relationship with any access to it. Consider something like the following:

int x = 3;
atomic_thread_fence(memory_order_release);
atomic_store_explicit(y, &x, memory_order_relaxed);

After the atomic store, it's legal for another thread to access the value "3" through the pointer y, or to write another value to x. (The operations on x don't need to be atomic, as long as the operations on y form a happens-before edge.) So it's illegal to sink the store of "3" past the fence.

If instead x is uninitialized, you run into basically the same thing with implicit zero init: it's illegal to sink the implicit zero-init of x past the fence, because it could race with stores on another thread.
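
The publication pattern discussed here can be sketched as a self-contained C++ function. The name `publishThroughFence` and the reader thread are assumptions added for illustration; the acquire fence on the reader side is the usual counterpart of the release fence and is not spelled out in the comment above.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Publishes the address of a stack variable to another thread and returns
// the value that thread observed through the pointer.
int publishThroughFence() {
  int X;                          // stack variable, initialized below
  std::atomic<int *> Y{nullptr};  // channel used to publish &X
  int Seen = -1;

  std::thread Reader([&] {
    int *P = nullptr;
    // Spin until the writer has published the address.
    while ((P = Y.load(std::memory_order_relaxed)) == nullptr)
      std::this_thread::yield();
    assert(P != nullptr);
    // Pairs with the release fence in the writer.
    std::atomic_thread_fence(std::memory_order_acquire);
    Seen = *P;  // non-atomic read, ordered after the initialization of X
  });

  X = 3;  // the initialization that must not sink below the fence
  std::atomic_thread_fence(std::memory_order_release);
  Y.store(&X, std::memory_order_relaxed);

  Reader.join();
  return Seen;
}
```

If the store to `X` were moved after the release fence, nothing would order it before the reader's dereference, and the non-atomic accesses to `X` would race.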

Thanks @efriedma for the (frightening) answer.

If instead x is uninitialized, you run into basically the same thing with implicit zero init: it's illegal to sink the implicit zero-init of x past the fence, because it could race with stores on another thread.

As x is a stack-allocated variable, in order to have it shared with another thread (and potentially generate a race), we need to *use* it, which implies the initialization has already happened. Why does it matter whether it's done before or after the fence?

If I understand correctly, the fence is global, which means basically any function call we don't have strong guarantees on could imply a memory fence, and prevent the sink, correct?

As x is a stack-allocated variable, in order to have it shared with another thread (and potentially generate a race), we need to *use* it, which implies the initialization has already happened.

To share a variable with another thread, you need to use its address. That isn't directly connected to the memory that contains the variable itself. Without the fence, neither the compiler nor the CPU itself is aware that the initialization of the variable has to be visible to other CPUs before the address of the variable.

If I understand correctly, the fence is global, which means basically any function call we don't have strong guarantees on could imply a memory fence, and prevent the sink, correct?

I think so? I don't see any way to avoid that conclusion.

I guess in theory, you could insert a fence after the zero-initialization to solve the issue: all fences are equivalent, so you don't need the original fence if you insert another one. Not completely sure that works, though, and not sure what effects the extra fence would have.

If you can prove the address doesn't escape, then the whole multi-threaded thing becomes irrelevant, but I guess we SROA most of those cases anyway, so not sure how helpful that is.

serge-sans-paille mentioned this in D138898: [Nomination] Adding Mozilla representative to security group. Nov 28 2022, 11:02 PM

Thanks @efriedma for the clarification. I understand your arguments and I think they apply to generic code motion, but I also think we have strong enough hypotheses for this particular pass to not be concerned by these issues.

If you can prove the address doesn't escape, then the whole multi-threaded thing becomes irrelevant, but I guess we SROA most of those cases anyway, so not sure how helpful that is.

I think we are in that situation

I think that the arguments are:

1. we only move initializers marked as auto-init and whose argument is an alloca.
2. we compute the dominator of the set of uses of this alloca (excluding the memory initialization marked as "auto-init")
3. we move the memory initialization marked as "auto-init" before that dominator

As a consequence of 2. and 3. there cannot be sharing of the initialized memory before it is initialized, so we can safely cross a memory barrier (and we're not breaking an existing semantic as the auto-init is inserted by the compiler).
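
The "dominator of the uses" step can be sketched on a toy dominator tree. `ToyDomTree`, `usersDominator` and the integer block numbering are hypothetical names for illustration only; in LLVM the pairwise query is provided by `DominatorTree::findNearestCommonDominator`, and this sketch says nothing about the memory-ordering concerns raised in this thread.

```cpp
#include <cassert>
#include <vector>

// Toy dominator tree: Idom[B] is the immediate dominator of block B.
// The entry block is its own immediate dominator.
struct ToyDomTree {
  std::vector<int> Idom;

  // Number of edges between B and the entry block in the dominator tree.
  int depth(int B) const {
    int D = 0;
    while (Idom[B] != B) {
      B = Idom[B];
      ++D;
    }
    return D;
  }

  // Walk both blocks up to the same depth, then up together until they meet.
  int nearestCommonDominator(int A, int B) const {
    int DA = depth(A), DB = depth(B);
    while (DA > DB) { A = Idom[A]; --DA; }
    while (DB > DA) { B = Idom[B]; --DB; }
    while (A != B) { A = Idom[A]; B = Idom[B]; }
    return A;
  }
};

// Fold the blocks containing the users of an alloca into one block that
// dominates all of them; the init would be placed in that block.
int usersDominator(const ToyDomTree &DT, const std::vector<int> &UserBlocks) {
  assert(!UserBlocks.empty() && "need at least one user");
  int Dom = UserBlocks.front();
  for (int B : UserBlocks)
    Dom = DT.nearestCommonDominator(Dom, B);
  return Dom;
}
```

With blocks 0 -> 1 -> {2, 3} -> 4, users in blocks 2 and 3 fold to block 1, while a single user in block 2 keeps the initialization in block 2.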

Of course all this would collapse without these strong no-sharing assumptions.

Am I missing something?

we move the memory initialization marked as "auto-init" before that dominator

The problem with this is that "before" in the sense of dominators isn't sufficient; that only dictates ordering on the CPU the code is executing on. You need happens-before, i.e. the ordering visible to all CPUs, to ensure you don't have a race.

The user is responsible for ensuring there's a barrier between the allocation of the variable and the use, but if the auto-init is in a different place, you can't depend on the barrier the user wrote.

serge-sans-paille mentioned this in rGd14c2d408dcc: [Nomination] Adding Mozilla representative to security group. Nov 30 2022, 11:01 AM

Take into account memory fences, call sites etc so that the instruction move respects memory order enforced by memory barriers.

This is achieved through MemorySSA, which makes the pass effortlessly more generic.

There are still a few tests to be done on my side, but I'd love to have feedback from @efriedma on the approach first.

The approach seems reasonable.

llvm/lib/Transforms/Utils/MoveAutoInit.cpp
94	Can we avoid iterating over the function if it doesn't have any auto-init?
llvm/test/Other/new-pm-defaults.ll
109	It would be good to avoid introducing an additional run of MemorySSA.

fhahn added a subscriber: fhahn. Dec 5 2022, 3:15 PM

fhahn added inline comments.

llvm/lib/Passes/PassBuilderPipelines.cpp
976	What's the rationale behind placing it here and not closer to other memory optimizations like DSE/MemCpyOpt?
976	(moving it closer to those will also allow re-using existing memorySSA)
llvm/lib/Transforms/Utils/MoveAutoInit.cpp
94	It's not clear to me from the FIXME why we would want to limit this to `auto-init` instructions. Shouldn't this be beneficial for all memory instructions?
183	This pass moves only memory instructions, so could it be marked as preserving all CFG analyses? This should ideally also preserve MemorySSA.
llvm/test/Transforms/MoveAutoInit/branch.ll
6	does this depend on the triple? If it does, this needs `REQUIRES:...` otherwise it will fail if the X86 backend is not built. (same for all tests)

it'd be good to put this through llvm-compile-time-tracker. a branch with a commit that turns on -ftrivial-auto-var-init, then this commit, to see how much this adds to compile time when using -ftrivial-auto-var-init

it'd be interesting to remove the "auto-init" restriction and see what effects on compile time this has in general, and maybe benchmark results

llvm/lib/Transforms/Utils/MoveAutoInit.cpp
25–26	can remove these now that there's no legacy pass
104–105	this sentence doesn't make sense to me
113	want
llvm/test/Transforms/MoveAutoInit/branch.ll
10	test nit: could you remove `dso_local`/`noundef`

Harbormaster completed remote builds in B201208: Diff 480228. Dec 5 2022, 8:32 PM

serge-sans-paille added inline comments. Dec 5 2022, 10:52 PM

llvm/lib/Passes/PassBuilderPipelines.cpp
976	+1 for that. The previous implementation was only using dominator trees, so it made sense to activate it early in the pipeline. This version is noticeably more costly, so it will probably appear at a higher optimization level.
llvm/lib/Transforms/Utils/MoveAutoInit.cpp
94	Can we avoid iterating over the function if it doesn't have any auto-init? I could try a module pass and walk through the users of the auto-init metadata?
94	The algorithm beyond auto-init requires more care (it probably needs a reverse traversal of the CFG, and some optimization to avoid rebuilding the dominator of the users for each instruction). I'd rather start small with that pass and then upgrade it to a full-fledged version.

fhahn added inline comments. Dec 6 2022, 2:19 AM

llvm/lib/Transforms/Utils/MoveAutoInit.cpp
94	AFAIK the `!annotation` metadata at the moment is information-only and doesn't provide any semantic guarantees. So relying on it for correctness at the moment doesn't sound ideal. From the documentation in the pass, it is not clear what property you are relying on. Could you make this clear in both the documentation in the pass and the commit description?

Move the pass to the O2 pipeline and update test accordingly
Declare preserved analyses, although I'm not sure of the validity for MemorySSA

The new pass catches far fewer cases than the previous one (~150 on the cPython codebase) but at least it's valid now.

serge-sans-paille marked 5 inline comments as done. Dec 15 2022, 8:56 AM

Harbormaster completed remote builds in B203370: Diff 483208. Dec 15 2022, 11:39 AM

@nickdesaulniers could you give this new version a shot? It's significantly different from the first one.

Extra numbers: this is the result of compiling the code from compile-time-tracker with -ftrivial-auto-var-init=pattern:

-ftrivial-auto-var-init=none vs -ftrivial-auto-var-init=pattern:

https://llvm-compile-time-tracker.com/compare.php?from=50a1c9b1073d7842ef687e486dc842ffea39457c&to=0cc74fe5d7f455e8dd2a34c4cfd9c276aae9ee57&stat=instructions:u

Adding the transformation from this patch on top of -ftrivial-auto-var-init=pattern adds a very slight overhead:

-ftrivial-auto-var-init=pattern vs -ftrivial-auto-var-init=pattern + this pass:

https://llvm-compile-time-tracker.com/compare.php?from=0cc74fe5d7f455e8dd2a34c4cfd9c276aae9ee57&to=67cd4b64768d74a2335d5268967951558bca3226&stat=instructions:u

Basically as is, this pass doesn't add much overhead in compilation time.

Sorry for the delay in testing this. Do you mind rebasing it? arc patch D137707 is having a hard time applying it to ToT.

The test cases LGTM but I'd like @efriedma to review the memory ordering stuff; I don't understand that stuff as much as I would like to.

llvm/lib/Transforms/Utils/MoveAutoInit.cpp
162	Should we take a reference to the std::pair here? `auto &Job`?

Address review + rebase patch

Harbormaster completed remote builds in B209722: Diff 491873. Jan 24 2023, 5:00 PM

nickdesaulniers added inline comments. Jan 31 2023, 10:06 AM

llvm/lib/Transforms/Utils/MoveAutoInit.cpp
24	are you still using this header? I didn't see any references to any intrinsics. Can you recheck these? DebugInfo.h I would think is also unused.
45	Mind adding a comment for this function?
52–56	You might be able to eliminate this for loop by using llvm::make_pointer_range (llvm/include/llvm/ADT/iterator.h) to initialize the Worklist. You'd probably need to move the cast to the below loop though.
64	Instruction
73–74	Are you able to use a range-for here with `MemoryPhi::incoming_values()`?
104–105	I think you can just pass `successors(UsersDominator)`?
134–135	This sentence could be reworded. Looks like it was partially updated at some point?

Take into account @nickdesaulniers review.

@efriedma : any comment / opinion on this now that it's based on MemorySSA?

llvm/lib/Transforms/Utils/MoveAutoInit.cpp
73–74	Unfortunately not, `SmallVector::append` doesn't accept `iterator_range`. I'll implement that.

Harbormaster completed remote builds in B211472: Diff 494280. Feb 2 2023, 7:40 AM

nickdesaulniers added inline comments. Feb 2 2023, 11:03 AM

llvm/lib/Transforms/Utils/MoveAutoInit.cpp
73–74	I was just thinking for (MemoryAccess *MA : M->incoming_values()) Worklist.push_back(MA);

serge-sans-paille marked an inline comment as done. Feb 2 2023, 4:28 PM

serge-sans-paille added inline comments.

llvm/lib/Transforms/Utils/MoveAutoInit.cpp
73–74	Yeah, but incoming_values iterates over `Use*` so I'd need a cast, so I'd rather keep current implementation.

@efriedma : gentle ping :-) Any thoughts on this version, now based on MemorySSA?

efriedma added inline comments. Feb 14 2023, 1:20 PM

llvm/lib/Transforms/Utils/MoveAutoInit.cpp
73	I'm not sure I understand the PHI handling here... the incoming values of a PHI are essentially the original store itself, and other stores that store to the same location. I'm not sure how analyzing the other stores helps here. You want the store to dominate the incoming edge of the PHI, not the other stores.

Fix MemoryPhi handling

llvm/lib/Transforms/Utils/MoveAutoInit.cpp
73	Indeed, I had it the other way around. Fixed in latest revision of the patch.

efriedma added inline comments. Feb 14 2023, 3:09 PM

llvm/lib/Transforms/Utils/MoveAutoInit.cpp
66	Is it possible to have a MemoryUseOrDef without an associated instruction? What would that represent?
73	I'm not confident that looking through PHIs like this is actually what you want. A MemoryPHI implies there's a loop that modifies the value in question. Maybe the cycle protection code protects you from running into issues with that, though? I haven't really worked with MemorySSA much; if someone wants to jump in to review this aspect, that would be welcome. I'm generally happy with the approach using MemorySSA, just not sure about the finer details.

nickdesaulniers added inline comments. Feb 14 2023, 3:20 PM

llvm/lib/Transforms/Utils/MoveAutoInit.cpp
73	Is there someone who would have worked more with MemorySSA that can provide such review? It's certainly not me.

efriedma added reviewers: nikic, asbirlea. Feb 14 2023, 3:29 PM

Herald added a subscriber: StephenFan. Feb 14 2023, 3:29 PM

Harbormaster completed remote builds in B213740: Diff 497447. Feb 14 2023, 3:36 PM

No need to check if getMemoryInst returns nullptr or not.

I'd like to move forward with this, and try to implement a more general approach that considers any instruction, and not only the instruction generated for auto init. But I need to unlock that review first :-)

Harbormaster completed remote builds in B215509: Diff 499839. Feb 23 2023, 8:42 AM

High level notes:

I'm not sure I fully understand the motivation for using MemorySSA here. Unless I'm missing something, it seems like when it comes to optimizing allocas, it would be sufficient to look at non-transparent users of the alloca (by which I mean, in first approximation, look through bitcast, gep, count everything else). The review mentions fences, but wouldn't these be covered as long as you consider pointer captures as "uses"?
This is missing some tests where there are other memory instructions beyond the auto-init stores. You should find that, as implemented, these are going to block the transform even if they are "obviously" unrelated. If they are MemoryUses, you can recover this by requesting optimized use form. However, MemoryDefs always depend on the preceding MemoryDef, even if they are non-clobbering. This means that your optimization will always stop at the next MemoryDef, even if it is to a different alloca. This is a big limitation. You could do alias checks to skip those, but that's also the point where this becomes an expensive transform.
There is a pending patch to add sinking support to DSE (https://reviews.llvm.org/D136218). I haven't really looked at it yet, but I expect it does the MSSA-based transform in the right way (i.e. skipping non-clobbering instructions). This brings me back to the first point...

llvm/lib/Transforms/Utils/MoveAutoInit.cpp
37	Use `static` instead of anon namespace (see https://llvm.org/docs/CodingStandards.html#anonymous-namespaces). Also please prefer the more typical `const Instruction &I` spelling.
67	The lifetime intrinsic check is missing test coverage.
76	`append_range(WorkList, UsersAsMemoryAccesses)`
94	Assert or check that use_empty(), the transform relies on it. I'd also assert or check that the instruction is not volatile. (Not sure if auto-init can be volatile...)
156	Is it possible for this to be a catchswitch block?
165	Drop braces.
169	`verifyMemorySSA()` here -- pretty sure there will be verification errors. E.g. you might be moving past unrelated MemoryInsts.
llvm/test/Transforms/MoveAutoInit/branch.ll
5	Shouldn't be relevant.
31	Zero-index GEP is redundant.
llvm/test/Transforms/MoveAutoInit/fence.ll
25	To make this test more meaningful, you probably want to capture `%val` before the fence? Otherwise the initialization is unobservable.

In D137707#4164689, @nikic wrote:

High level notes:

I'm not sure I fully understand the motivation for using MemorySSA here. Unless I'm missing something, it seems like when it comes to optimizing allocas, it would be sufficient to look at non-transparent users of the alloca (by which I mean, in first approximation, look through bitcast, gep, count everything else). The review mentions fences, but wouldn't these be covered as long as you consider pointer captures as "uses"?

I could cowardly let @efriedma answer there, because he motivated this change. You can refer to https://reviews.llvm.org/D137707#3945415 for his initial thoughts.

This is missing some tests where there are other memory instructions beyond the auto-init stores. You should find that, as implemented, these are going to block the transform even if they are "obviously" unrelated. If they are MemoryUses, you can recover this by requesting optimized use form. However, MemoryDefs always depend on the preceding MemoryDef, even if they are non-clobbering. This means that your optimization will always stop at the next MemoryDef, even if it is to a different alloca. This is a big limitation. You could do alias checks to skip those, but that's also the point where this becomes an expensive transform.

Yeah, test coverage is currently not sufficient, I'll add some and get a better understanding of that particular part.

There is a pending patch to add sinking support to DSE (https://reviews.llvm.org/D136218). I haven't really looked at it yet, but I expect it does the MSSA-based transform in the right way (i.e. skipping non-clobbering instructions). This brings me back to the first point...

Thanks for sharing!

Use Alias Analysis to filter out non-clobbering memory accesses, as suggested by @nikic. Also added a test case (clobber.ll) to ensure this works as expected.
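
The idea of skipping non-clobbering accesses can be sketched on a toy model. Everything here is an assumption for illustration: `ToyAccess` tags each memory access with the stack slot it touches, and slot equality stands in for an alias-analysis query; the real pass asks LLVM's alias analysis instead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical memory access, identified only by the stack slot it touches.
struct ToyAccess {
  int Slot;
};

// Starting after the auto-init at index Init, return the index of the first
// access that touches the same slot. Accesses to other slots are skipped
// instead of blocking the search. Returns Accesses.size() if none is found.
std::size_t firstRelevantUser(const std::vector<ToyAccess> &Accesses,
                              std::size_t Init) {
  assert(Init < Accesses.size() && "Init must index an existing access");
  for (std::size_t I = Init + 1; I < Accesses.size(); ++I)
    if (Accesses[I].Slot == Accesses[Init].Slot)
      return I;
  return Accesses.size();
}
```

Without the filter, the access immediately following the initialization would always be treated as its user, even when it writes to an unrelated slot.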

@nikic : I've not addressed your inline comments yet, but I've improved the approach through alias analysis, this indeed looks better now, at least to me. What do you think?

Harbormaster completed remote builds in B218016: Diff 503256. Mar 8 2023, 1:41 AM

Take into account all @nikic comments.

llvm/lib/Transforms/Utils/MoveAutoInit.cpp
94	"Assert or check that use_empty(), the transform relies on it": I'm not quite sure which use you're referring to, can you explain?

Harbormaster completed remote builds in B220408: Diff 506540. Mar 20 2023, 5:53 AM

Update test case according to @nikic review.

Extra round of tests on my side: the current version leads to 550 instructions moved when compiling cpython with -ftrivial-auto-var-init=pattern. @nikic I think the patch is ready for you to look at it again :-)

Harbormaster completed remote builds in B220968: Diff 507292. Mar 22 2023, 3:05 AM

gentle ping :-)

I redid quick Linux kernel (mainline, x86_64 and arm64 defconfig, arm64 thinlto) build+boot tests with this version. Clearing my previous -1. Thanks for the work on this patch! You may want to see if @efriedma or @nikic have any last minute thoughts before landing though (within a reasonable delay; we can always fix things after they've landed).

This revision is now accepted and ready to land. Mar 28 2023, 11:47 AM

MaskRay added a subscriber: MaskRay. Mar 28 2023, 12:19 PM

MaskRay added inline comments.

llvm/lib/Transforms/Utils/MoveAutoInit.cpp
61	To assert non-null, just use a reference.
73	For a work list, we should add elements to Visited before pushing elements to WorkList, otherwise WorkList may contain duplicates.
104	unneeded blank line
138	unneeded blank line
150	unneeded blank line
llvm/test/Transforms/MoveAutoInit/clobber.ll
2	Add a file-level comment explaining what this test is about.
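
The worklist remark on line 73 above can be illustrated with a toy traversal. The graph encoding and the name `visitAll` are hypothetical; the point is only that a node is recorded in `Visited` at the moment it is pushed, so it can never sit in the worklist twice.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy graph: Succs[N] lists the successors of node N.
// Returns the nodes reachable from Start, each exactly once.
std::vector<int> visitAll(const std::vector<std::vector<int>> &Succs,
                          int Start) {
  assert(Start >= 0 && static_cast<std::size_t>(Start) < Succs.size());
  std::set<int> Visited;
  std::vector<int> WorkList;
  std::vector<int> Order;

  Visited.insert(Start);
  WorkList.push_back(Start);
  while (!WorkList.empty()) {
    int N = WorkList.back();
    WorkList.pop_back();
    Order.push_back(N);
    for (int S : Succs[N])
      // insert() reports whether S is new; mark it before pushing so a node
      // reachable through several edges is queued only once.
      if (Visited.insert(S).second)
        WorkList.push_back(S);
  }
  return Order;
}
```

Marking nodes only when they are popped would still terminate, but a node with several predecessors could be pushed once per predecessor.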

MaskRay added inline comments. Mar 28 2023, 12:49 PM

llvm/lib/Transforms/Utils/MoveAutoInit.cpp
47	delete blank line. `// If I exists and is deeper ...` What does `deeper` mean? In the loop body, `if (AA.getModRefInfo(MI, ML) != ModRefInfo::NoModRef && !MI->isLifetimeStartOrEnd() && MI != I) {` does something unclear but does not have a comment.
54	For auto-init instructions, MemIntrinsic is sufficient and AnyMemTransferInst is unneeded.
105	The pass name is MoveAutoInit. Handling just auto-init instructions is what the pass is designed for, so I don't think not generalizing it deserves a FIXME. You can drop FIXME.

Address reviews

serge-sans-paille updated this revision to Diff 509403. Mar 29 2023, 9:53 AM

Harbormaster completed remote builds in B222547: Diff 509403. Mar 29 2023, 12:53 PM

nikic added inline comments. Mar 31 2023, 4:14 AM

llvm/lib/Transforms/Utils/MoveAutoInit.cpp
59
69	if (Visited.size() > 128) return nullptr; for some value of 128 :)
77	Doesn't this mean we may potentially move the initialization past the lifetime.end?
138	This will miss cases where the dominator is not the cycle header, right? Possible improvement for the future.
172	The common dominator of all the predecessors is probably just the idom?
194	Could you please also add `-verify-memoryssa` to the tests?

Add threshold option + minor nits

fix warnings

Harbormaster completed remote builds in B223026: Diff 510050. Mar 31 2023, 11:48 AM

This revision was landed with ongoing or failed builds. Apr 3 2023, 6:28 AM

Closed by commit rGcca01008cc31: Move "auto-init" instructions to the dominator of their users (authored by serge-sans-paille).

This revision was automatically updated to reflect the committed changes.

serge-sans-paille added a commit: rGcca01008cc31: Move "auto-init" instructions to the dominator of their users.

serge-sans-paille added a reverting change: rG11ae47dfc675: Revert "Move "auto-init" instructions to the dominator of their users". Apr 3 2023, 6:46 AM

serge-sans-paille added a commit: rG50b2a113db19: Move "auto-init" instructions to the dominator of their users. Apr 3 2023, 10:30 PM

Hi. I think the reland broke our mac builder at https://ci.chromium.org/ui/p/fuchsia/builders/ci/clang_toolchain.ci.core.x64-host_test_only-mac-subbuild/b8784753709492580801/overview with:

Assertion failed: (!I.isVolatile() && "auto init instructions cannot be volatile."), function runMoveAutoInit, file llvm/lib/Transforms/Utils/MoveAutoInit.cpp, line 112.

Would you be able to take a look and send out a fix or revert?

In D137707#4243796, @leonardchan wrote:
Hi. I think the reland broke our mac builder at https://ci.chromium.org/ui/p/fuchsia/builders/ci/clang_toolchain.ci.core.x64-host_test_only-mac-subbuild/b8784753709492580801/overview with:
Assertion failed: (!I.isVolatile() && "auto init instructions cannot be volatile."), function runMoveAutoInit, file llvm/lib/Transforms/Utils/MoveAutoInit.cpp, line 112.

Thanks for reporting. I thought this situation wasn't possible; I'll send a fix that just skips these instructions for now.

serge-sans-paille mentioned this in rGad9ad3735c48: Do not move "auto-init" instruction if they're volatile.Apr 4 2023, 11:42 AM
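The follow-up described above replaces the assertion with a skip. The control flow can be sketched in plain C++ as below, assuming a hypothetical `Inst` record in place of `llvm::Instruction`; the field names and `collectMovable` are illustrative and are not LLVM API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an IR instruction.
struct Inst {
  bool HasAutoInit; // Carries the "auto-init" annotation.
  bool IsVolatile;  // Is a volatile memory access.
};

// Returns the indices of entry-block instructions that remain candidates
// for moving. Volatile auto-init instructions are skipped rather than
// asserted on, so unexpected input no longer aborts the compiler.
std::vector<int> collectMovable(const std::vector<Inst> &Entry) {
  std::vector<int> Movable;
  for (int I = 0, E = static_cast<int>(Entry.size()); I != E; ++I) {
    if (!Entry[I].HasAutoInit)
      continue;
    if (Entry[I].IsVolatile)
      continue; // Previously an assert(!IsVolatile) lived here.
    Movable.push_back(I);
  }
  return Movable;
}
```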

bjope added a subscriber: bjope.Apr 4 2023, 12:42 PM

In D137707#4243969, @serge-sans-paille wrote:
In D137707#4243796, @leonardchan wrote:
Hi. I think the reland broke our mac builder at https://ci.chromium.org/ui/p/fuchsia/builders/ci/clang_toolchain.ci.core.x64-host_test_only-mac-subbuild/b8784753709492580801/overview with:
Assertion failed: (!I.isVolatile() && "auto init instructions cannot be volatile."), function runMoveAutoInit, file llvm/lib/Transforms/Utils/MoveAutoInit.cpp, line 112.
Thanks for reporting. I thought this situation wasn't possible; I'll send a fix that just skips these instructions for now.

FYI, Chrome saw this crash as well: https://crbug.com/1430570. I verified that your commit rGad9ad3735c4821ff4651fab7537a75b8f0bb60f8 fixes it.

uabelho added a subscriber: uabelho.Apr 4 2023, 9:58 PM

This is also causing another regression in Chrome that is *not* fixed by rGad9ad3735c48: https://crbug.com/1431366#c3

In D137707#4253255, @ayzhao wrote:

This is also causing another regression in Chrome that is *not* fixed by rGad9ad3735c48: https://crbug.com/1431366#c3

Can you provide a minimal reproducer?

In D137707#4253764, @serge-sans-paille wrote:

In D137707#4253255, @ayzhao wrote:

This is also causing another regression in Chrome that is *not* fixed by rGad9ad3735c48: https://crbug.com/1431366#c3

Can you provide a minimal reproducer?

Currently working on it, but I'd like to point out that this is also causing another (unrelated?) failure in Chrome/V8: https://crbug.com/1431489

In D137707#4256648, @ayzhao wrote:

In D137707#4253764, @serge-sans-paille wrote:

In D137707#4253255, @ayzhao wrote:

This is also causing another regression in Chrome that is *not* fixed by rGad9ad3735c48: https://crbug.com/1431366#c3

Can you provide a minimal reproducer?

Currently working on it, but I'd like to point out that this is also causing another (unrelated?) failure in Chrome/V8: https://crbug.com/1431489

I now have a reproducible (but non-reduced) testcase: https://crbug.com/1431366#c5

This looks like a miscompile; the return parameter is not being initialized if we don't take the branch.

ayzhao added a subscriber: hans.Apr 11 2023, 4:26 PM

In D137707#4259567, @ayzhao wrote:

I now have a reproducible (but non-reduced) testcase: https://crbug.com/1431366#c5

This looks like a miscompile; the return parameter is not being initialized if we don't take the branch.

Here's a small repro based on that:

$ cat /tmp/a.cc
struct S {
  unsigned long long x;  
};

S g();

S f(int a) {
  S ret;
  if (a == 42)
    ret = g();
  return ret;
}

$ build/bin/clang.bad -target i686-linux-gnu -c -ftrivial-auto-var-init=pattern -O2 /tmp/a.cc -S -emit-llvm -o -
[...]
define dso_local void @_Z1fi(ptr noalias nocapture writeonly sret(%struct.S) align 4 %agg.result, i32 noundef %a) local_unnamed_addr #0 {
entry:
  %ref.tmp = alloca %struct.S, align 8
  %cmp = icmp eq i32 %a, 42
  br i1 %cmp, label %if.then, label %if.end

if.then:                                          ; preds = %entry
  store i64 -1, ptr %agg.result, align 4, !annotation !6        <------ This used to be in the %entry block.
  call void @llvm.lifetime.start.p0(i64 8, ptr nonnull %ref.tmp) #3
  call void @_Z1gv(ptr nonnull sret(%struct.S) align 4 %ref.tmp)
  %0 = load i64, ptr %ref.tmp, align 8, !tbaa !7
  store i64 %0, ptr %agg.result, align 4, !tbaa !7
  call void @llvm.lifetime.end.p0(i64 8, ptr nonnull %ref.tmp) #3
  br label %if.end

if.end:                                           ; preds = %if.then, %entry
  ret void
}

This patch moved the store i64 -1, ptr %agg.result instruction from the %entry block to %if.then, meaning the return value doesn't always get initialized.

The assumption of this patch was that auto-init only occurs on stores to allocas, but apparently that assumption isn't right. We of course shouldn't move a store to an sret argument.
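The effect can be modeled in plain C++ by writing the sret slot through an explicit pointer. This is only a simulation of the IR above, not compiler output: `fBefore` and `fAfter` are hypothetical names for the function before and after the pass, and the body of `g` is an arbitrary stand-in.

```cpp
#include <cassert>
#include <cstdint>

struct S {
  std::uint64_t x;
};

// Arbitrary stand-in for the external callee g().
S g() { return S{7}; }

// Before the pass: the pattern store to the caller-provided return slot
// runs unconditionally in the entry block.
void fBefore(S *Result, int A) {
  Result->x = ~std::uint64_t{0}; // store i64 -1, ptr %agg.result
  if (A == 42)
    *Result = g();
}

// After the pass: the pattern store was sunk into the branch, so the
// caller-visible slot is left untouched when the branch is not taken.
void fAfter(S *Result, int A) {
  if (A == 42) {
    Result->x = ~std::uint64_t{0};
    *Result = g();
  }
}
```

Because the slot belongs to the caller, the caller observes whatever was there before the call when `fAfter` skips the branch, which is exactly the uninitialized return value reported above.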

It's actually worse than my repro above showed. Since "real" initialization of the sret arg can get folded in with the "auto" initialization, this patch can also cause an actually initialized return value to become uninitialized: https://bugs.chromium.org/p/chromium/issues/detail?id=1431366#c8

I'll revert this for now.

hans added a reverting change: rGa6d9730f403a: Revert "Move "auto-init" instructions to the dominator of their users".Apr 12 2023, 4:37 AM

In D137707#4261118, @hans wrote:

It's actually worse than my repro above showed. Since "real" initialization of the sret arg can get folded in with the "auto" initialization, this patch can also cause an actually initialized return value to become uninitialized: https://bugs.chromium.org/p/chromium/issues/detail?id=1431366#c8

I'll revert this for now.

This is from clang --target=i686-linux-gnu. @serge-sans-paille Testing -m32 applications will create a lot of sret opportunities (structs and unions are returned indirectly, even if small) and help sort out such bugs.

Follow up patch: https://reviews.llvm.org/D148507

Revision Contents

Path                                                                  Size
llvm/include/llvm/Transforms/Utils/MoveAutoInit.h                     29 lines
llvm/lib/Passes/PassBuilder.cpp                                       1 line
llvm/lib/Passes/PassBuilderPipelines.cpp                              4 lines
llvm/lib/Passes/PassRegistry.def                                      1 line
llvm/lib/Transforms/Utils/CMakeLists.txt                              1 line
llvm/lib/Transforms/Utils/MoveAutoInit.cpp                            220 lines
llvm/test/Other/new-pm-defaults.ll                                    1 line
llvm/test/Other/new-pm-lto-defaults.ll                                1 line
llvm/test/Other/new-pm-thinlto-postlink-defaults.ll                   1 line
llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll               1 line
llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll         1 line
llvm/test/Other/new-pm-thinlto-prelink-defaults.ll                    1 line
llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll                1 line
llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll          1 line
llvm/test/Transforms/MoveAutoInit/ (5 files)                          41, 100, 70, 102, and 36 lines

Diff 510479

llvm/include/llvm/Transforms/Utils/MoveAutoInit.h

This file was added.

//===- MoveAutoInit.h - Move insts marked as auto-init Pass --*- C++ -*-======//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This pass moves instructions marked as auto-init closer to their use if
// profitable, generally because it moves them under a guard, potentially
// skipping the overhead of the auto-init under some execution paths.
//
//
//===----------------------------------------------------------------------===//

#ifndef LLVM_TRANSFORMS_UTILS_MOVEAUTOINIT_H
#define LLVM_TRANSFORMS_UTILS_MOVEAUTOINIT_H

#include "llvm/IR/PassManager.h"

namespace llvm {

class MoveAutoInitPass : public PassInfoMixin<MoveAutoInitPass> {
public:
  PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);
};
} // end namespace llvm

#endif // LLVM_TRANSFORMS_UTILS_MOVEAUTOINIT_H

llvm/lib/Passes/PassBuilder.cpp

[...]
  #include "llvm/Transforms/Utils/LoopSimplify.h"
  #include "llvm/Transforms/Utils/LoopVersioning.h"
  #include "llvm/Transforms/Utils/LowerGlobalDtors.h"
  #include "llvm/Transforms/Utils/LowerIFunc.h"
  #include "llvm/Transforms/Utils/LowerInvoke.h"
  #include "llvm/Transforms/Utils/LowerSwitch.h"
  #include "llvm/Transforms/Utils/Mem2Reg.h"
  #include "llvm/Transforms/Utils/MetaRenamer.h"
+ #include "llvm/Transforms/Utils/MoveAutoInit.h"
  #include "llvm/Transforms/Utils/NameAnonGlobals.h"
  #include "llvm/Transforms/Utils/PredicateInfo.h"
  #include "llvm/Transforms/Utils/RelLookupTableConverter.h"
  #include "llvm/Transforms/Utils/StripGCRelocates.h"
  #include "llvm/Transforms/Utils/StripNonLineTableDebugInfo.h"
  #include "llvm/Transforms/Utils/SymbolRewriter.h"
  #include "llvm/Transforms/Utils/UnifyFunctionExitNodes.h"
  #include "llvm/Transforms/Utils/UnifyLoopExits.h"
[...]

llvm/lib/Passes/PassBuilderPipelines.cpp

[...]
  #include "llvm/Transforms/Scalar/WarnMissedTransforms.h"
  #include "llvm/Transforms/Utils/AddDiscriminators.h"
  #include "llvm/Transforms/Utils/AssumeBundleBuilder.h"
  #include "llvm/Transforms/Utils/CanonicalizeAliases.h"
  #include "llvm/Transforms/Utils/CountVisits.h"
  #include "llvm/Transforms/Utils/InjectTLIMappings.h"
  #include "llvm/Transforms/Utils/LibCallsShrinkWrap.h"
  #include "llvm/Transforms/Utils/Mem2Reg.h"
+ #include "llvm/Transforms/Utils/MoveAutoInit.h"
  #include "llvm/Transforms/Utils/NameAnonGlobals.h"
  #include "llvm/Transforms/Utils/RelLookupTableConverter.h"
  #include "llvm/Transforms/Utils/SimplifyCFGOptions.h"
  #include "llvm/Transforms/Vectorize/LoopVectorize.h"
  #include "llvm/Transforms/Vectorize/SLPVectorizer.h"
  #include "llvm/Transforms/Vectorize/VectorCombine.h"

  using namespace llvm;
[...] PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level,
    // the simplifications and basic cleanup after all the simplifications.
    // TODO: Investigate if this is too expensive.
    FPM.addPass(ADCEPass());

    // Specially optimize memory movement as it doesn't look like dataflow in SSA.
    FPM.addPass(MemCpyOptPass());

    FPM.addPass(DSEPass());
+   FPM.addPass(MoveAutoInitPass());

    FPM.addPass(createFunctionToLoopPassAdaptor(
        LICMPass(PTO.LicmMssaOptCap, PTO.LicmMssaNoAccForPromotionCap,
                 /*AllowSpeculation=*/true),
        /*UseMemorySSA=*/true, /*UseBlockFrequencyInfo=*/true));

    FPM.addPass(CoroElidePass());

    for (auto &C : ScalarOptimizerLateEPCallbacks)
[...] PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,

    // Create an early function pass manager to cleanup the output of the
    // frontend.
    FunctionPassManager EarlyFPM;
    // Lower llvm.expect to metadata before attempting transforms.
    // Compare/branch metadata may alter the behavior of passes like SimplifyCFG.
    EarlyFPM.addPass(LowerExpectIntrinsicPass());
    EarlyFPM.addPass(SimplifyCFGPass());
    EarlyFPM.addPass(SROAPass(SROAOptions::ModifyCFG));

nickdesaulniers (Done): Is it possible to skip this pass if `-ftrivial-auto-var-init=zero` wasn't specified?

serge-sans-paille (Author, Done): It would be possible through a module flag or something, but
1. if we're going to extend that pass beyond auto-var-init, that's going to be useless
2. that would mean a greater coupling with the front end
The following shows no big impact on compile time when the flag is not used, so I don't think it's worth the extra step: https://llvm-compile-time-tracker.com/compare.php?from=f71d32a0eea47b3d2bb43d6be15cf09d47ef6971&to=6ea450ef673666ed6b8a8257ca56f093aae7469c&stat=instructions:u

    EarlyFPM.addPass(EarlyCSEPass());
    if (Level == OptimizationLevel::O3)

fhahn (Done): What's the rationale behind placing it here and not closer to other memory optimizations like DSE/MemCpyOpt?

fhahn (Done): (moving it closer to those will also allow re-using existing MemorySSA)

serge-sans-paille (Author, Done): +1 for that. The previous implementation was only using dominator trees, so it made sense to activate it early in the pipeline. This version is noticeably more costly, so it will probably appear at a higher optimization level.

      EarlyFPM.addPass(CallSiteSplittingPass());

    MPM.addPass(createModuleToFunctionPassAdaptor(std::move(EarlyFPM),
                                                  PTO.EagerlyInvalidateAnalyses));

    if (LoadSampleProfile) {
      // Annotate sample profile right after early FPM to ensure freshness of
      // the debug info.
[...] PassBuilder::buildLTODefaultPipeline(OptimizationLevel Level,
    else
      MainFPM.addPass(GVNPass());

    // Remove dead memcpy()'s.
    MainFPM.addPass(MemCpyOptPass());

    // Nuke dead stores.
    MainFPM.addPass(DSEPass());
+   MainFPM.addPass(MoveAutoInitPass());
    MainFPM.addPass(MergedLoadStoreMotionPass());

    LoopPassManager LPM;
    if (EnableLoopFlatten && Level.getSpeedupLevel() > 1)
      LPM.addPass(LoopFlattenPass());
    LPM.addPass(IndVarSimplifyPass());
    LPM.addPass(LoopDeletionPass());
    // FIXME: Add loop interchange.
[...]

llvm/lib/Passes/PassRegistry.def

[...]
  FUNCTION_PASS("loop-simplify", LoopSimplifyPass())
  FUNCTION_PASS("loop-sink", LoopSinkPass())
  FUNCTION_PASS("lowerinvoke", LowerInvokePass())
  FUNCTION_PASS("lowerswitch", LowerSwitchPass())
  FUNCTION_PASS("mem2reg", PromotePass())
  FUNCTION_PASS("memcpyopt", MemCpyOptPass())
  FUNCTION_PASS("mergeicmps", MergeICmpsPass())
  FUNCTION_PASS("mergereturn", UnifyFunctionExitNodesPass())
+ FUNCTION_PASS("move-auto-init", MoveAutoInitPass())
  FUNCTION_PASS("nary-reassociate", NaryReassociatePass())
  FUNCTION_PASS("newgvn", NewGVNPass())
  FUNCTION_PASS("jump-threading", JumpThreadingPass())
  FUNCTION_PASS("partially-inline-libcalls", PartiallyInlineLibCallsPass())
  FUNCTION_PASS("kcfi", KCFIPass())
  FUNCTION_PASS("lcssa", LCSSAPass())
  FUNCTION_PASS("loop-data-prefetch", LoopDataPrefetchPass())
  FUNCTION_PASS("loop-load-elim", LoopLoadEliminationPass())
[...]

llvm/lib/Transforms/Utils/CMakeLists.txt

[...] add_llvm_component_library(LLVMTransformUtils
    LowerSwitch.cpp
    MatrixUtils.cpp
    MemoryOpRemark.cpp
    MemoryTaggingSupport.cpp
    Mem2Reg.cpp
    MetaRenamer.cpp
    MisExpect.cpp
    ModuleUtils.cpp
+   MoveAutoInit.cpp
    NameAnonGlobals.cpp
    PredicateInfo.cpp
    PromoteMemoryToRegister.cpp
    RelLookupTableConverter.cpp
    ScalarEvolutionExpander.cpp
    SCCPSolver.cpp
    StripGCRelocates.cpp
    SSAUpdater.cpp
[...]

llvm/lib/Transforms/Utils/MoveAutoInit.cpp

This file was added.

//===-- MoveAutoInit.cpp - move auto-init inst closer to their use site----===//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//===----------------------------------------------------------------------===//
// This pass moves instructions marked as auto-init closer to the basic block
// that uses them, possibly removing them from some control paths of the
// function.
//===----------------------------------------------------------------------===//

#include "llvm/Transforms/Utils/MoveAutoInit.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/Statistic.h"
#include "llvm/ADT/StringSet.h"
#include "llvm/Analysis/MemorySSA.h"
#include "llvm/Analysis/MemorySSAUpdater.h"
#include "llvm/IR/DebugInfo.h"
#include "llvm/IR/Dominators.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"

nickdesaulniers (Done): are you still using this header? I didn't see any references to any intrinsics. Can you recheck these? DebugInfo.h I would think is also unused.

#include "llvm/Support/CommandLine.h"
#include "llvm/Transforms/Utils.h"

aeubanks (Done): can remove these now that there's no legacy pass

#include "llvm/Transforms/Utils/LoopUtils.h"

using namespace llvm;

#define DEBUG_TYPE "move-auto-init"

STATISTIC(NumMoved, "Number of instructions moved");

static cl::opt<unsigned> MoveAutoInitThreshold(
    "move-auto-init-threshold", cl::Hidden, cl::init(128),
    cl::desc("Maximum instructions to analyze per moved initialization"));

nikic (Done): Use `static` instead of anon namespace (see https://llvm.org/docs/CodingStandards.html#anonymous-namespaces). Also please prefer the more typical `const Instruction &I` spelling.

static bool hasAutoInitMetadata(const Instruction &I) {
  return I.hasMetadata(LLVMContext::MD_annotation) &&
         any_of(I.getMetadata(LLVMContext::MD_annotation)->operands(),
                [](const MDOperand &Op) {
                  return cast<MDString>(Op.get())->getString() == "auto-init";
                });
}

nickdesaulniers (Done): Mind adding a comment for this function?

/// Finds a BasicBlock in the CFG where instruction `I` can be moved to while

MaskRay (Not Done): delete blank line. `// If I exists and is deeper ...` What does `deeper` mean? In the loop body, `if (AA.getModRefInfo(MI, ML) != ModRefInfo::NoModRef && !MI->isLifetimeStartOrEnd() && MI != I) {` does something unclear but does not have a comment.

/// not changing the Memory SSA ordering and being guarded by at least one
/// condition.
static BasicBlock *usersDominator(Instruction *I, DominatorTree &DT,

nickdesaulniers (Done): delete

                                  MemorySSA &MSSA) {

  BasicBlock *CurrentDominator = nullptr;
  MemoryLocation ML;
  if (auto *MI = dyn_cast<MemIntrinsic>(I))

MaskRay (Done): For auto-init instructions, MemIntrinsic is sufficient and AnyMemTransferInst is unneeded.

    ML = MemoryLocation::getForDest(MI);
  else if (auto *SI = dyn_cast<StoreInst>(I))

nickdesaulniers (Done): You might be able to eliminate this for loop by using llvm::make_pointer_range (llvm/include/llvm/ADT/iterator.h) to initialize the Worklist. You'd probably need to move the cast to the below loop though.

    ML = MemoryLocation::get(SI);
  else
    assert(false && "memory location set");

nikic (Done), suggested change:
  MemoryUseOrDef &IMA = *MSSA.getMemoryAccess(I);
- auto &AA = MSSA.getAA();
+ BatchAAResults AA(MSSA.getAA());
  SmallPtrSet<MemoryAccess *, 8> Visited;

  MemoryUseOrDef &IMA = *MSSA.getMemoryAccess(I);

MaskRay (Done): To assert non-null, just use a reference.

  BatchAAResults AA(MSSA.getAA());
  SmallPtrSet<MemoryAccess *, 8> Visited;

nickdesaulniers (Done): Instruction

  auto AsMemoryAccess = [](User *U) { return cast<MemoryAccess>(U); };

efriedma (Done): Is it possible to have a MemoryUseOrDef without an associated instruction? What would that represent?

  SmallVector<MemoryAccess *> WorkList(map_range(IMA.users(), AsMemoryAccess));

nikic (Done): The lifetime intrinsic check is missing test coverage.

  while (!WorkList.empty()) {

nikic (Done):
if (Visited.size() > 128)
  return nullptr;
for some value of 128 :)

    MemoryAccess *MA = WorkList.pop_back_val();
    if (!Visited.insert(MA).second)
      continue;

nickdesaulniers (Done): auto

efriedma (Done): I'm not sure I understand the PHI handling here... the incoming values of a PHI are essentially the original store itself, and other stores that store to the same location. I'm not sure how analyzing the other stores helps here. You want the store to dominate the incoming edge of the PHI, not the other stores.

serge-sans-paille (Author, Done): Indeed, I had it the other way around. Fixed in the latest revision of the patch.

efriedma (Done): I'm not confident that looking through PHIs like this is actually what you want. A MemoryPHI implies there's a loop that modifies the value in question. Maybe the cycle protection code protects you from running into issues with that, though?
I haven't really worked with MemorySSA much; if someone wants to jump in to review this aspect, that would be welcome. I'm generally happy with the approach using MemorySSA, just not sure about the finer details.

nickdesaulniers (Done): Is there someone who would have worked more with MemorySSA that can provide such review? It's certainly not me.

MaskRay (Not Done): For a work list, we should add elements to Visited before pushing elements to WorkList, otherwise WorkList may contain duplicates.

    if (Visited.size() > MoveAutoInitThreshold)

nickdesaulniers (Done): Are you able to use a range-for here with `MemoryPhi::incoming_values()`?

serge-sans-paille (Author, Done): Unfortunately not, `SmallVector::append` doesn't accept `iterator_range`. I'll implement that.

nickdesaulniers (Done): I was just thinking
for (MemoryAccess *MA : M->incoming_values())
  Worklist.push_back(MA);

serge-sans-paille (Author, Done): Yeah, but incoming_values iterates over `Use*` so I'd need a cast, so I'd rather keep the current implementation.

      return nullptr;

nikic (Done): `append_range(WorkList, UsersAsMemoryAccesses)`

bool FoundClobberingUser = false;

nikicUnsubmitted

Not Done

Doesn't this mean we may potentially move the initialization past the lifetime.end?

nikic: Doesn't this mean we may potentially move the initialization past the lifetime.end?

if (auto *M = dyn_cast<MemoryUseOrDef>(MA)) {

Instruction *MI = M->getMemoryInst();

// If this memory instruction may not clobber `I`, we can skip it.

nickdesaulniersUnsubmitted

Done

delete.

See comment below about ternaries, which also applies here.

nickdesaulniers: delete. See comment below about ternaries, which also applies here.

// LifetimeEnd is a valid user, but we do not want it in the user

// dominator.

if (AA.getModRefInfo(MI, ML) != ModRefInfo::NoModRef &&

!MI->isLifetimeStartOrEnd() && MI != I) {

FoundClobberingUser = true;

CurrentDominator = CurrentDominator

? DT.findNearestCommonDominator(CurrentDominator,

MI->getParent())

: MI->getParent();

}

if (!FoundClobberingUser) {

nickdesaulniersUnsubmitted

Done

is there a faster way to do this rather than having to scan the metadata of every operand?

nickdesaulniers: is there a faster way to do this rather than having to scan the metadata of every operand?

serge-sans-pailleAuthorUnsubmitted

Done

Well, there's an early exit on the instruction metadata, I guess that's enough?

serge-sans-paille: Well, there's an early exit on the instruction metadata, I guess that's enough?

auto UsersAsMemoryAccesses = map_range(MA->users(), AsMemoryAccess);

efriedmaUnsubmitted

Not Done

Can we avoid iterating over the function if it doesn't have any auto-init?

efriedma: Can we avoid iterating over the function if it doesn't have any auto-init?

fhahnUnsubmitted

Not Done

It's not clear to me from the FIXME why we would want to limit this to auto-init instructions. Shouldn't this be beneficial for all memory instructions?

fhahn: It's not clear to me from the FIXME why we would want to limit this to `auto-init` instructions.

serge-sans-pailleAuthorUnsubmitted

Done

The algorithm beyond auto-init requires more care (it probably needs a reverse transversal of the CFG, and some optimization to avoid rebuilding the dominator of the users for each instruction). I'd rather start small with that pass and then upgrade it to a full-fledge version.

serge-sans-paille: The algorithm beyond auto-init requires more care (it probably needs a reverse transversal of…

fhahnUnsubmitted

Not Done

AFAIK the !annotation metadata at the moment is information-only and doesn't provide any semantic guarantees. So relying on it for correctness at the moment doesn't sound ideal. From the documentation in the pass, it is not clear what property you are relying on. Could you make this clear in both the documentation in the pass and the commit description?

fhahn: AFAIK the `!annotation` metadata at the moment is information-only and doesn't provide any…

serge-sans-pailleAuthorUnsubmitted

Done

Can we avoid iterating over the function if it doesn't have any auto-init?

I could try a module pass and walk through the users of the auto-init metadata?

serge-sans-paille: > Can we avoid iterating over the function if it doesn't have any auto-init? I could try a…

nikic (Not Done): Assert or check that use_empty(), the transform relies on it.

I'd also assert or check that the instruction is not volatile. (Not sure if auto-init can be volatile...)

serge-sans-paille (Author, Done):

> Assert or check that use_empty(), the transform relies on it

I'm not quite sure which use you're referring to, can you explain?

    append_range(WorkList, UsersAsMemoryAccesses);
  }
  return CurrentDominator;
}

static bool runMoveAutoInit(Function &F, DominatorTree &DT, MemorySSA &MSSA) {
  BasicBlock &EntryBB = F.getEntryBlock();
  SmallVector<std::pair<Instruction *, Instruction *>> JobList;

MaskRay (Not Done): unneeded blank line

aeubanks (Done): this sentence doesn't make sense to me

nickdesaulniers (Done): I think you can just pass `successors(UsersDominator)`?

MaskRay (Not Done): The pass name is MoveAutoInit. Handling just auto-init instructions is what the pass is designed for, so I don't think not generalizing it deserves a FIXME. You can drop FIXME.

  // Compute movable instructions.
  for (Instruction &I : EntryBB) {
    if (!hasAutoInitMetadata(I))
      continue;
    assert(!I.isVolatile() && "auto init instructions cannot be volatile.");

aeubanks (Done): want

    BasicBlock *UsersDominator = usersDominator(&I, DT, MSSA);
    if (!UsersDominator)
      continue;
    if (UsersDominator == &EntryBB)
      continue;

jfb (Done): Typo "want"

    // Traverse the CFG to detect cycles `UsersDominator` would be part of.
    SmallPtrSet<BasicBlock *, 8> TransitiveSuccessors;
    SmallVector<BasicBlock *> WorkList(successors(UsersDominator));
    bool HasCycle = false;
    while (!WorkList.empty()) {
      BasicBlock *CurrBB = WorkList.pop_back_val();

nickdesaulniers (Done): delete

      if (CurrBB == UsersDominator)
        // No early exit because we want to compute the full set of transitive
        // successors.
        HasCycle = true;

nickdesaulniers (Done): delete

      for (BasicBlock *Successor : successors(CurrBB)) {
        if (!TransitiveSuccessors.insert(Successor).second)
          continue;
        WorkList.push_back(Successor);
      }

nickdesaulniers (Done): This sentence could be reworded. Looks like it was partially updated at some point?

    }
    // Don't insert if that could create multiple execution of I,

MaskRay (Not Done): unneeded blank line

nikic (Not Done): This will miss cases where the dominator is not the cycle header, right? Possible improvement for the future.

    // but we can insert it in the non back-edge predecessors, if it exists.
    if (HasCycle) {
      BasicBlock *UsersDominatorHead = UsersDominator;
      while (BasicBlock *UniquePredecessor =
                 UsersDominatorHead->getUniquePredecessor())
        UsersDominatorHead = UniquePredecessor;
      if (UsersDominatorHead == &EntryBB)
        continue;
      BasicBlock *DominatingPredecessor = nullptr;
      for (BasicBlock *Pred : predecessors(UsersDominatorHead)) {

MaskRay (Not Done): unneeded blank line

        // If one of the predecessor of the dominator also transitively is a
        // successor, moving to the dominator would do the inverse of loop
        // hoisting, and we don't want that.
        if (TransitiveSuccessors.count(Pred))
          continue;

nickdesaulniers (Done): Please use

if (x)
  y()
else
  z()

rather than

if (!x)
  z()
else
  y()

Additionally, since we're assigning to the same variable, consider using a ternary statement.

DominatingPredecessor = (DominatingPredecessor ? DT.find... : Pred)

nikic (Done): Is it possible for this to be a catchswitch block?

        DominatingPredecessor =
            DominatingPredecessor
                ? DT.findNearestCommonDominator(DominatingPredecessor, Pred)
                : Pred;
      }

nickdesaulniers (Done): Should we take a reference to the std::pair here? `auto &Job`?

      if (!DominatingPredecessor || DominatingPredecessor == &EntryBB)
        continue;

nikic (Done): Drop braces.

      UsersDominator = DominatingPredecessor;
    }
    // We finally found a place where I can be moved while not introducing extra

nikic (Done): `verifyMemorySSA()` here -- pretty sure there will be verification errors. E.g. you might be moving past unrelated MemoryInsts.

    // execution, and guarded by at least one condition.
    Instruction *UsersDominatorInsertionPt =
        &*UsersDominator->getFirstInsertionPt();

nikic (Not Done): The common dominator of all the predecessors is probably just the idom?
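To make the predecessor fold discussed in this thread easier to follow, here is a minimal standalone sketch on a toy dominator tree. It does not use LLVM: `IDomTree`, `depth`, `nearestCommonDominator` and `dominatingPredecessor` are hypothetical names chosen for illustration, and the depth-walk is just one simple way to find a common ancestor, not a claim about how `DominatorTree::findNearestCommonDominator` is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <set>
#include <vector>

// Toy dominator tree (hypothetical layout): IDom[B] is the immediate dominator
// of block B, and the entry block is its own immediate dominator.
using IDomTree = std::vector<std::size_t>;

// Number of idom links between B and the entry block.
std::size_t depth(const IDomTree &IDom, std::size_t B) {
  assert(B < IDom.size() && "block index out of range");
  std::size_t D = 0;
  while (IDom[B] != B) {
    B = IDom[B];
    ++D;
  }
  return D;
}

// Walk the deeper block up until both are at the same depth, then walk both
// up together until they meet.
std::size_t nearestCommonDominator(const IDomTree &IDom, std::size_t A,
                                   std::size_t B) {
  std::size_t DA = depth(IDom, A), DB = depth(IDom, B);
  while (DA > DB) { A = IDom[A]; --DA; }
  while (DB > DA) { B = IDom[B]; --DB; }
  while (A != B) { A = IDom[A]; B = IDom[B]; }
  return A;
}

// Fold the predecessors that are not in Skip (standing in for the back-edge
// predecessors) into their nearest common dominator, using the ternary form
// suggested in the review. Returns nullopt when every predecessor is skipped.
std::optional<std::size_t>
dominatingPredecessor(const IDomTree &IDom,
                      const std::vector<std::size_t> &Preds,
                      const std::set<std::size_t> &Skip) {
  std::optional<std::size_t> Dom;
  for (std::size_t Pred : Preds) {
    if (Skip.count(Pred))
      continue;
    Dom = Dom ? nearestCommonDominator(IDom, *Dom, Pred) : Pred;
  }
  return Dom;
}
```

For a toy CFG with edges 0->1, 1->2, 1->3, 2->4, 3->4, 4->5 and 5->4, the tree is `{0, 0, 1, 1, 1, 4}`. Folding the predecessors `{2, 3, 5}` of block 4 while skipping the back-edge predecessor 5 gives block 1, which is also the immediate dominator of block 4 in this example. That agrees with the intuition in the comment for this one graph; it is not a proof of the general case.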

    // CatchSwitchInst blocks can only have one instruction, so they are not
    // good candidates for insertion.
    while (isa<CatchSwitchInst>(UsersDominatorInsertionPt)) {
      for (BasicBlock *Pred : predecessors(UsersDominator))
        UsersDominator = DT.findNearestCommonDominator(UsersDominator, Pred);
      UsersDominatorInsertionPt = &*UsersDominator->getFirstInsertionPt();
    }
    JobList.emplace_back(&I, UsersDominatorInsertionPt);
  }

fhahn (Done): This pass moves only memory instructions, so could it be marked as preserving all CFG analyses?

This should ideally also preserve MemorySSA.

  // Perform the actual substitution.
  if (JobList.empty())
    return false;
  MemorySSAUpdater MSSAU(&MSSA);
  for (auto &Job : JobList) {
    Job.first->moveBefore(Job.second);

nikic (Not Done): Could you please also add `-verify-memoryssa` to the tests?

    MSSAU.moveToPlace(MSSA.getMemoryAccess(Job.first), Job.second->getParent(),
                      MemorySSA::InsertionPlace::Beginning);
  }
  if (VerifyMemorySSA)
    MSSA.verifyMemorySSA();
  NumMoved += JobList.size();
  return true;

aeubanks (Done): any reason for all the legacy pass infra? if it's going in the optimization pipeline then no need for a legacy pass

}

PreservedAnalyses MoveAutoInitPass::run(Function &F,
                                        FunctionAnalysisManager &AM) {
  auto &DT = AM.getResult<DominatorTreeAnalysis>(F);
  auto &MSSA = AM.getResult<MemorySSAAnalysis>(F).getMSSA();
  if (!runMoveAutoInit(F, DT, MSSA))
    return PreservedAnalyses::all();
  PreservedAnalyses PA;
  PA.preserve<DominatorTreeAnalysis>();
  PA.preserve<MemorySSAAnalysis>();
  PA.preserveSet<CFGAnalyses>();
  return PA;
}
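
The worklist traversal that `runMoveAutoInit` uses to detect cycles can be tried outside LLVM. The sketch below mirrors the shape of that loop on a toy CFG stored as adjacency lists; `Graph`, `Reachability` and `reachableFrom` are hypothetical names introduced for illustration, and plain indices stand in for `BasicBlock *`.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy CFG: Succs[B] lists the successors of block B (blocks are indices).
using Graph = std::vector<std::vector<std::size_t>>;

struct Reachability {
  // Blocks seen as a successor of some visited block. As in the loop above,
  // the initial successors of Start are only recorded here if another visited
  // block also branches to them.
  std::set<std::size_t> TransitiveSuccessors;
  // True if Start can be reached again from one of its successors.
  bool HasCycle = false;
};

// Seed the worklist with the successors of Start, then keep expanding. There
// is no early exit when Start is found, so the whole set gets computed.
Reachability reachableFrom(const Graph &Succs, std::size_t Start) {
  assert(Start < Succs.size() && "Start must be a valid block index");
  Reachability R;
  std::vector<std::size_t> WorkList(Succs[Start].begin(), Succs[Start].end());
  while (!WorkList.empty()) {
    std::size_t Curr = WorkList.back();
    WorkList.pop_back();
    if (Curr == Start)
      R.HasCycle = true;
    for (std::size_t S : Succs[Curr])
      if (R.TransitiveSuccessors.insert(S).second)
        WorkList.push_back(S);
  }
  return R;
}
```

With the graph `{{1}, {2, 3}, {1}, {}}` (edges 0->1, 1->2, 1->3, 2->1), block 1 sits on a cycle through block 2 while block 0 does not, so only the query starting at block 1 reports `HasCycle`.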

llvm/test/Other/new-pm-defaults.ll

	(100 lines not shown)
	; CHECK-O-NEXT: Running pass: SimplifyCFGPass
	; CHECK-O-NEXT: Running analysis: TargetIRAnalysis
	; CHECK-O-NEXT: Running analysis: AssumptionAnalysis
	; CHECK-O-NEXT: Running pass: SROAPass
	; CHECK-O-NEXT: Running analysis: DominatorTreeAnalysis
	; CHECK-O-NEXT: Running pass: EarlyCSEPass
	; CHECK-O-NEXT: Running analysis: TargetLibraryAnalysis
	; CHECK-O3-NEXT: Running pass: CallSiteSplittingPass
	; CHECK-O-NEXT: Running pass: OpenMPOptPass
				efriedma (Done): It would be good to avoid introducing an additional run of MemorySSA.
	; CHECK-EP-PIPELINE-EARLY-SIMPLIFICATION-NEXT: Running pass: NoOpModulePass
	; CHECK-O-NEXT: Running pass: IPSCCPPass
	; CHECK-FUNC-SPEC-NEXT: Running analysis: LoopAnalysis
	; CHECK-O-NEXT: Running pass: CalledValuePropagationPass
	; CHECK-O-NEXT: Running pass: GlobalOptPass
	; CHECK-O-NEXT: Running pass: PromotePass
	; CHECK-O-NEXT: Running pass: InstCombinePass
	; CHECK-O-NEXT: Running analysis: OptimizationRemarkEmitterAnalysis
	(83 lines not shown)
	; CHECK-O23SZ-NEXT: Running analysis: LazyValueAnalysis
	; CHECK-O23SZ-NEXT: Running pass: CorrelatedValuePropagationPass
	; CHECK-O23SZ-NEXT: Invalidating analysis: LazyValueAnalysis
	; CHECK-O1-NEXT: Running pass: CoroElidePass
	; CHECK-O-NEXT: Running pass: ADCEPass
	; CHECK-O-NEXT: Running analysis: PostDominatorTreeAnalysis
	; CHECK-O23SZ-NEXT: Running pass: MemCpyOptPass
	; CHECK-O23SZ-NEXT: Running pass: DSEPass
				; CHECK-O23SZ-NEXT: Running pass: MoveAutoInitPass on foo
	; CHECK-O23SZ-NEXT: Running pass: LoopSimplifyPass
	; CHECK-O23SZ-NEXT: Running pass: LCSSAPass
	; CHECK-O23SZ-NEXT: Running pass: LICMPass
	; CHECK-O23SZ-NEXT: Running pass: CoroElidePass
	; CHECK-EP-SCALAR-LATE-NEXT: Running pass: NoOpFunctionPass
	; CHECK-O-NEXT: Running pass: SimplifyCFGPass
	; CHECK-O-NEXT: Running pass: InstCombinePass
	; CHECK-EP-PEEPHOLE-NEXT: Running pass: NoOpFunctionPass
	(97 lines not shown)

llvm/test/Other/new-pm-lto-defaults.ll

	(102 lines not shown)
	; CHECK-O23SZ-NEXT: Running analysis: ScalarEvolutionAnalysis on foo
	; CHECK-O23SZ-NEXT: Running analysis: InnerAnalysisManagerProxy
	; CHECK-O23SZ-NEXT: Running pass: LICMPass on loop
	; CHECK-O23SZ-NEXT: Running pass: GVNPass on foo
	; CHECK-O23SZ-NEXT: Running analysis: MemoryDependenceAnalysis on foo
	; CHECK-O23SZ-NEXT: Running pass: MemCpyOptPass on foo
	; CHECK-O23SZ-NEXT: Running pass: DSEPass on foo
	; CHECK-O23SZ-NEXT: Running analysis: PostDominatorTreeAnalysis on foo
				; CHECK-O23SZ-NEXT: Running pass: MoveAutoInitPass on foo
	; CHECK-O23SZ-NEXT: Running pass: MergedLoadStoreMotionPass on foo
	; CHECK-O23SZ-NEXT: Running pass: LoopSimplifyPass on foo
	; CHECK-O23SZ-NEXT: Running pass: LCSSAPass on foo
	; CHECK-O23SZ-NEXT: Running pass: IndVarSimplifyPass on loop
	; CHECK-O23SZ-NEXT: Running pass: LoopDeletionPass on loop
	; CHECK-O23SZ-NEXT: Running pass: LoopFullUnrollPass on loop
	; CHECK-O23SZ-NEXT: Running pass: LoopDistributePass on foo
	; CHECK-O23SZ-NEXT: Running analysis: LoopAccessAnalysis on foo
	(57 lines not shown)

llvm/test/Other/new-pm-thinlto-postlink-defaults.ll

	(143 lines not shown)
	; CHECK-O23SZ-NEXT: Running analysis: LazyValueAnalysis
	; CHECK-O23SZ-NEXT: Running pass: CorrelatedValuePropagationPass
	; CHECK-O23SZ-NEXT: Invalidating analysis: LazyValueAnalysis
	; CHECK-O1-NEXT: Running pass: CoroElidePass
	; CHECK-O-NEXT: Running pass: ADCEPass
	; CHECK-O-NEXT: Running analysis: PostDominatorTreeAnalysis
	; CHECK-O23SZ-NEXT: Running pass: MemCpyOptPass
	; CHECK-O23SZ-NEXT: Running pass: DSEPass
				; CHECK-O23SZ-NEXT: Running pass: MoveAutoInitPass on foo
	; CHECK-O23SZ-NEXT: Running pass: LoopSimplifyPass
	; CHECK-O23SZ-NEXT: Running pass: LCSSAPass
	; CHECK-O23SZ-NEXT: Running pass: LICMPass on loop
	; CHECK-O23SZ-NEXT: Running pass: CoroElidePass
	; CHECK-O-NEXT: Running pass: SimplifyCFGPass
	; CHECK-O-NEXT: Running pass: InstCombinePass
	; CHECK-O-NEXT: Running pass: PostOrderFunctionAttrsPass
	; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ShouldNotRunFunctionPassesAnalysis
	(86 lines not shown)

llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll

	(130 lines not shown)
	; CHECK-O23SZ-NEXT: Running pass: JumpThreadingPass
	; CHECK-O23SZ-NEXT: Running analysis: LazyValueAnalysis
	; CHECK-O23SZ-NEXT: Running pass: CorrelatedValuePropagationPass
	; CHECK-O23SZ-NEXT: Invalidating analysis: LazyValueAnalysis
	; CHECK-O1-NEXT: Running pass: CoroElidePass
	; CHECK-O-NEXT: Running pass: ADCEPass
	; CHECK-O23SZ-NEXT: Running pass: MemCpyOptPass
	; CHECK-O23SZ-NEXT: Running pass: DSEPass
				; CHECK-O23SZ-NEXT: Running pass: MoveAutoInitPass on foo
	; CHECK-O23SZ-NEXT: Running pass: LoopSimplifyPass
	; CHECK-O23SZ-NEXT: Running pass: LCSSAPass
	; CHECK-O23SZ-NEXT: Running pass: LICMPass
	; CHECK-O23SZ-NEXT: Running pass: CoroElidePass
	; CHECK-O-NEXT: Running pass: SimplifyCFGPass
	; CHECK-O-NEXT: Running pass: InstCombinePass
	; CHECK-O-NEXT: Running pass: PostOrderFunctionAttrsPass
	; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ShouldNotRunFunctionPassesAnalysis
	(116 lines not shown)

llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll

	(137 lines not shown)
	; CHECK-O23SZ-NEXT: Running pass: JumpThreadingPass
	; CHECK-O23SZ-NEXT: Running analysis: LazyValueAnalysis
	; CHECK-O23SZ-NEXT: Running pass: CorrelatedValuePropagationPass
	; CHECK-O23SZ-NEXT: Invalidating analysis: LazyValueAnalysis
	; CHECK-O1-NEXT: Running pass: CoroElidePass
	; CHECK-O-NEXT: Running pass: ADCEPass
	; CHECK-O23SZ-NEXT: Running pass: MemCpyOptPass
	; CHECK-O23SZ-NEXT: Running pass: DSEPass
				; CHECK-O23SZ-NEXT: Running pass: MoveAutoInitPass
	; CHECK-O23SZ-NEXT: Running pass: LoopSimplifyPass
	; CHECK-O23SZ-NEXT: Running pass: LCSSAPass
	; CHECK-O23SZ-NEXT: Running pass: LICMPass
	; CHECK-O23SZ-NEXT: Running pass: CoroElidePass
	; CHECK-O-NEXT: Running pass: SimplifyCFGPass
	; CHECK-O-NEXT: Running pass: InstCombinePass
	; CHECK-O-NEXT: Running pass: PostOrderFunctionAttrsPass
	; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ShouldNotRunFunctionPassesAnalysis
	(86 lines not shown)

llvm/test/Other/new-pm-thinlto-prelink-defaults.ll

	(141 lines not shown)
	; CHECK-O23SZ-NEXT: Running analysis: LazyValueAnalysis
	; CHECK-O23SZ-NEXT: Running pass: CorrelatedValuePropagationPass
	; CHECK-O23SZ-NEXT: Invalidating analysis: LazyValueAnalysis
	; CHECK-O1-NEXT: Running pass: CoroElidePass
	; CHECK-O-NEXT: Running pass: ADCEPass
	; CHECK-O-NEXT: Running analysis: PostDominatorTreeAnalysis
	; CHECK-O23SZ-NEXT: Running pass: MemCpyOptPass
	; CHECK-O23SZ-NEXT: Running pass: DSEPass
				; CHECK-O23SZ-NEXT: Running pass: MoveAutoInitPass
	; CHECK-O23SZ-NEXT: Running pass: LoopSimplifyPass
	; CHECK-O23SZ-NEXT: Running pass: LCSSAPass
	; CHECK-O23SZ-NEXT: Running pass: LICMPass on loop
	; CHECK-O23SZ-NEXT: Running pass: CoroElidePass
	; CHECK-O-NEXT: Running pass: SimplifyCFGPass
	; CHECK-O-NEXT: Running pass: InstCombinePass
	; CHECK-O-NEXT: Running pass: PostOrderFunctionAttrsPass
	; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ShouldNotRunFunctionPassesAnalysis
	(45 lines not shown)

llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll

	(166 lines not shown)
	; CHECK-O23SZ-NEXT: Running pass: JumpThreadingPass
	; CHECK-O23SZ-NEXT: Running analysis: LazyValueAnalysis
	; CHECK-O23SZ-NEXT: Running pass: CorrelatedValuePropagationPass
	; CHECK-O23SZ-NEXT: Invalidating analysis: LazyValueAnalysis
	; CHECK-O1-NEXT: Running pass: CoroElidePass
	; CHECK-O-NEXT: Running pass: ADCEPass
	; CHECK-O23SZ-NEXT: Running pass: MemCpyOptPass
	; CHECK-O23SZ-NEXT: Running pass: DSEPass
				; CHECK-O23SZ-NEXT: Running pass: MoveAutoInitPass on foo
	; CHECK-O23SZ-NEXT: Running pass: LoopSimplifyPass
	; CHECK-O23SZ-NEXT: Running pass: LCSSAPass
	; CHECK-O23SZ-NEXT: Running pass: LICMPass
	; CHECK-O23SZ-NEXT: Running pass: CoroElidePass
	; CHECK-O-NEXT: Running pass: SimplifyCFGPass
	; CHECK-O-NEXT: Running pass: InstCombinePass
	; CHECK-O-NEXT: Running pass: PostOrderFunctionAttrsPass
	; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ShouldNotRunFunctionPassesAnalysis
	(46 lines not shown)

llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll

	(131 lines not shown)
	; CHECK-O23SZ-NEXT: Running pass: JumpThreadingPass
	; CHECK-O23SZ-NEXT: Running analysis: LazyValueAnalysis
	; CHECK-O23SZ-NEXT: Running pass: CorrelatedValuePropagationPass
	; CHECK-O23SZ-NEXT: Invalidating analysis: LazyValueAnalysis
	; CHECK-O1-NEXT: Running pass: CoroElidePass
	; CHECK-O-NEXT: Running pass: ADCEPass
	; CHECK-O23SZ-NEXT: Running pass: MemCpyOptPass
	; CHECK-O23SZ-NEXT: Running pass: DSEPass
				; CHECK-O23SZ-NEXT: Running pass: MoveAutoInitPass on foo
	; CHECK-O23SZ-NEXT: Running pass: LoopSimplifyPass
	; CHECK-O23SZ-NEXT: Running pass: LCSSAPass
	; CHECK-O23SZ-NEXT: Running pass: LICMPass
	; CHECK-O23SZ-NEXT: Running pass: CoroElidePass
	; CHECK-O-NEXT: Running pass: SimplifyCFGPass
	; CHECK-O-NEXT: Running pass: InstCombinePass
	; CHECK-O-NEXT: Running pass: PostOrderFunctionAttrsPass
	; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ShouldNotRunFunctionPassesAnalysis
	(44 lines not shown)

llvm/test/Transforms/MoveAutoInit/branch.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt < %s -S -passes='move-auto-init' -verify-memoryssa | FileCheck %s

				@__const.foo.buffer = private unnamed_addr constant [8 x i32] [i32 -1431655766, i32 -1431655766, i32 -1431655766, i32 -1431655766, i32 -1431655766, i32 -1431655766, i32 -1431655766, i32 -1431655766], align 16

				nikic (Done): Shouldn't be relevant.
				define void @foo(i32 %x) {
				fhahn (Done): Does this depend on the triple? If it does, this needs `REQUIRES:...`; otherwise it will fail if the X86 backend is not built. (Same for all tests.)
				; CHECK-LABEL: @foo(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[BUFFER:%.*]] = alloca [8 x i32], align 16
				; CHECK-NEXT: [[TOBOOL:%.*]] = icmp ne i32 [[X:%.*]], 0
				aeubanks (Done): test nit: could you remove `dso_local`/`noundef`
				; CHECK-NEXT: br i1 [[TOBOOL]], label [[IF_THEN:%.*]], label [[IF_END:%.*]]
				; CHECK: if.then:
				; CHECK-NEXT: call void @llvm.memcpy.p0.p0.i64(ptr align 16 [[BUFFER]], ptr align 16 @__const.foo.buffer, i64 32, i1 false), !annotation !0
				; CHECK-NEXT: call void @dump(ptr [[BUFFER]])
				; CHECK-NEXT: br label [[IF_END]]
				; CHECK: if.end:
				; CHECK-NEXT: ret void
				;

				entry:
				%buffer = alloca [8 x i32], align 16
				call void @llvm.memcpy.p0.p0.i64(ptr align 16 %buffer, ptr align 16 @__const.foo.buffer, i64 32, i1 false), !annotation !0
				%tobool = icmp ne i32 %x, 0
				br i1 %tobool, label %if.then, label %if.end

				if.then: ; preds = %entry
				call void @dump(ptr %buffer)
				br label %if.end

				if.end: ; preds = %if.then, %entry
				ret void
				nikic (Not Done): Zero-index GEP is redundant.
				}



				declare void @llvm.memcpy.p0.p0.i64(ptr noalias nocapture writeonly, ptr noalias nocapture readonly, i64, i1 immarg)

				declare void @dump(ptr)

				!0 = !{!"auto-init"}
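
At the source level, the effect checked by branch.ll can be pictured roughly as follows. This is a hedged sketch, not compiler output: it writes the pattern initialization by hand with `memset`, and `dump`, `fooEager` and `fooDelayed` are hypothetical stand-ins (the real `@dump` is an opaque external function returning void; here it returns a sum so the result is observable). The byte `0xAA` is used because four such bytes form the `i32 -1431655766` constant that appears in the test.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Pattern byte assumed for illustration: four 0xAA bytes read as a 32-bit
// two's complement integer give -1431655766, the constant in the test.
constexpr unsigned char PatternByte = 0xAA;

// Stand-in for the external @dump: sums the eight elements.
std::int64_t dump(const std::int32_t *Buffer) {
  assert(Buffer != nullptr && "dump expects a valid buffer");
  std::int64_t Sum = 0;
  for (int I = 0; I < 8; ++I)
    Sum += Buffer[I];
  return Sum;
}

// Shape of the input IR: the buffer is initialized in the entry block, even
// on the path that never reads it.
std::int64_t fooEager(int X) {
  std::int32_t Buffer[8];
  std::memset(Buffer, PatternByte, sizeof(Buffer));
  if (X)
    return dump(Buffer);
  return 0;
}

// Shape of the expected output: the initialization is delayed into the only
// branch that uses the buffer.
std::int64_t fooDelayed(int X) {
  std::int32_t Buffer[8];
  if (X) {
    std::memset(Buffer, PatternByte, sizeof(Buffer));
    return dump(Buffer);
  }
  return 0;
}
```

Both functions return the same value for every `X`; the delayed form simply skips the initialization when `X` is zero, which is the saving the summary describes.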

llvm/test/Transforms/MoveAutoInit/clobber.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; Checks that move-auto-init can move instruction passed unclobbering memory
				; instructions.
				MaskRay (Not Done): Add a file-level comment what this test is about.
				; RUN: opt < %s -S -passes='move-auto-init' -verify-memoryssa | FileCheck %s

				target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"

				define i32 @foo(i32 noundef %0, i32 noundef %1, i32 noundef %2) #0 {
				; CHECK-LABEL: @foo(
				; CHECK-NEXT: [[TMP4:%.*]] = alloca [100 x i8], align 16
				; CHECK-NEXT: [[TMP5:%.*]] = alloca [2 x i8], align 1
				; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds [100 x i8], ptr [[TMP4]], i64 0, i64 0
				; CHECK-NEXT: call void @llvm.lifetime.start.p0(i64 100, ptr nonnull [[TMP6]]) #[[ATTR3:[0-9]+]]
				; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds [2 x i8], ptr [[TMP5]], i64 0, i64 0
				; CHECK-NEXT: call void @llvm.lifetime.start.p0(i64 2, ptr nonnull [[TMP7]]) #[[ATTR3]]
				; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds [2 x i8], ptr [[TMP5]], i64 0, i64 1
				; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP1:%.*]], 0
				; CHECK-NEXT: br i1 [[TMP9]], label [[TMP15:%.*]], label [[TMP10:%.*]]
				; CHECK: 10:
				; CHECK-NEXT: call void @llvm.memset.p0.i64(ptr noundef nonnull align 16 dereferenceable(100) [[TMP6]], i8 -86, i64 100, i1 false), !annotation !0
				; CHECK-NEXT: [[TMP11:%.*]] = sext i32 [[TMP0:%.*]] to i64
				; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds [100 x i8], ptr [[TMP4]], i64 0, i64 [[TMP11]]
				; CHECK-NEXT: store i8 12, ptr [[TMP12]], align 1
				; CHECK-NEXT: [[TMP13:%.*]] = load i8, ptr [[TMP6]], align 16
				; CHECK-NEXT: [[TMP14:%.*]] = sext i8 [[TMP13]] to i32
				; CHECK-NEXT: br label [[TMP22:%.*]]
				; CHECK: 15:
				; CHECK-NEXT: [[TMP16:%.*]] = icmp eq i32 [[TMP2:%.*]], 0
				; CHECK-NEXT: br i1 [[TMP16]], label [[TMP22]], label [[TMP17:%.*]]
				; CHECK: 17:
				; CHECK-NEXT: store i8 -86, ptr [[TMP7]], align 1, !annotation !0
				; CHECK-NEXT: store i8 -86, ptr [[TMP8]], align 1, !annotation !0
				; CHECK-NEXT: [[TMP18:%.*]] = sext i32 [[TMP0]] to i64
				; CHECK-NEXT: [[TMP19:%.*]] = getelementptr inbounds [2 x i8], ptr [[TMP5]], i64 0, i64 [[TMP18]]
				; CHECK-NEXT: store i8 12, ptr [[TMP19]], align 1
				; CHECK-NEXT: [[TMP20:%.*]] = load i8, ptr [[TMP7]], align 1
				; CHECK-NEXT: [[TMP21:%.*]] = sext i8 [[TMP20]] to i32
				; CHECK-NEXT: br label [[TMP22]]
				; CHECK: 22:
				; CHECK-NEXT: [[TMP23:%.*]] = phi i32 [ [[TMP14]], [[TMP10]] ], [ [[TMP21]], [[TMP17]] ], [ 0, [[TMP15]] ]
				; CHECK-NEXT: call void @llvm.lifetime.end.p0(i64 2, ptr nonnull [[TMP7]]) #[[ATTR3]]
				; CHECK-NEXT: call void @llvm.lifetime.end.p0(i64 100, ptr nonnull [[TMP6]]) #[[ATTR3]]
				; CHECK-NEXT: ret i32 [[TMP23]]
				;

				%4 = alloca [100 x i8], align 16
				%5 = alloca [2 x i8], align 1
				%6 = getelementptr inbounds [100 x i8], [100 x i8]* %4, i64 0, i64 0
				call void @llvm.lifetime.start.p0i8(i64 100, i8* nonnull %6) #3
				; This memset must move.
				call void @llvm.memset.p0i8.i64(i8* noundef nonnull align 16 dereferenceable(100) %6, i8 -86, i64 100, i1 false), !annotation !0
				%7 = getelementptr inbounds [2 x i8], [2 x i8]* %5, i64 0, i64 0
				call void @llvm.lifetime.start.p0i8(i64 2, i8* nonnull %7) #3
				; This store must move.
				store i8 -86, i8* %7, align 1, !annotation !0
				%8 = getelementptr inbounds [2 x i8], [2 x i8]* %5, i64 0, i64 1
				; This store must move.
				store i8 -86, i8* %8, align 1, !annotation !0
				%9 = icmp eq i32 %1, 0
				br i1 %9, label %15, label %10

				10:
				%11 = sext i32 %0 to i64
				%12 = getelementptr inbounds [100 x i8], [100 x i8]* %4, i64 0, i64 %11
				store i8 12, i8* %12, align 1
				%13 = load i8, i8* %6, align 16
				%14 = sext i8 %13 to i32
				br label %22

				15:
				%16 = icmp eq i32 %2, 0
				br i1 %16, label %22, label %17

				17:
				%18 = sext i32 %0 to i64
				%19 = getelementptr inbounds [2 x i8], [2 x i8]* %5, i64 0, i64 %18
				store i8 12, i8* %19, align 1
				%20 = load i8, i8* %7, align 1
				%21 = sext i8 %20 to i32
				br label %22

				22:
				%23 = phi i32 [ %14, %10 ], [ %21, %17 ], [ 0, %15 ]
				call void @llvm.lifetime.end.p0i8(i64 2, i8* nonnull %7) #3
				call void @llvm.lifetime.end.p0i8(i64 100, i8* nonnull %6) #3
				ret i32 %23
				}

				declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture) #1

				declare void @llvm.memset.p0i8.i64(i8* nocapture writeonly, i8, i64, i1 immarg) #2

				declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture) #1

				attributes #0 = { mustprogress nofree nosync nounwind readnone uwtable willreturn }
				attributes #1 = { argmemonly mustprogress nofree nosync nounwind willreturn }
				attributes #2 = { argmemonly mustprogress nofree nounwind willreturn writeonly }
				attributes #3 = { nounwind }

				!0 = !{!"auto-init"}

llvm/test/Transforms/MoveAutoInit/fence.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt < %s -S -passes='move-auto-init' -verify-memoryssa | FileCheck %s

				target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"

				; In that case, the store to %val happens before the fence and cannot go past
				; it.
				define void @foo(i32 %x) {
				; CHECK-LABEL: @foo(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[VAL:%.*]] = alloca i32, align 4
				; CHECK-NEXT: store i32 -1431655766, ptr [[VAL]], align 4, !annotation !0
				; CHECK-NEXT: [[TOBOOL:%.*]] = icmp ne i32 [[X:%.*]], 0
				; CHECK-NEXT: fence acquire
				; CHECK-NEXT: br i1 [[TOBOOL]], label [[IF_THEN:%.*]], label [[IF_END:%.*]]
				; CHECK: if.then:
				; CHECK-NEXT: call void @dump(ptr [[VAL]])
				; CHECK-NEXT: br label [[IF_END]]
				; CHECK: if.end:
				; CHECK-NEXT: ret void
				;
				entry:
				%val = alloca i32, align 4
				store i32 -1431655766, ptr %val, align 4, !annotation !0
				%tobool = icmp ne i32 %x, 0
				; Inline comment (nikic): To make this test more meaningful, you probably want to capture `%val` before the fence? Otherwise the initialization is unobservable.
				fence acquire
				br i1 %tobool, label %if.then, label %if.end

				if.then: ; preds = %entry
				call void @dump(ptr %val)
				br label %if.end

				if.end: ; preds = %if.then, %entry
				ret void
				}

				; In that case, the store to %val happens after the fence and it is moved within
				; the true branch as expected.
				define void @bar(i32 %x) {
				; CHECK-LABEL: @bar(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[VAL:%.*]] = alloca i32, align 4
				; CHECK-NEXT: [[TOBOOL:%.*]] = icmp ne i32 [[X:%.*]], 0
				; CHECK-NEXT: fence acquire
				; CHECK-NEXT: br i1 [[TOBOOL]], label [[IF_THEN:%.*]], label [[IF_END:%.*]]
				; CHECK: if.then:
				; CHECK-NEXT: store i32 -1431655766, ptr [[VAL]], align 4, !annotation !0
				; CHECK-NEXT: call void @dump(ptr [[VAL]])
				; CHECK-NEXT: br label [[IF_END]]
				; CHECK: if.end:
				; CHECK-NEXT: ret void
				;
				entry:
				%val = alloca i32, align 4
				%tobool = icmp ne i32 %x, 0
				fence acquire
				store i32 -1431655766, ptr %val, align 4, !annotation !0
				br i1 %tobool, label %if.then, label %if.end

				if.then: ; preds = %entry
				call void @dump(ptr %val)
				br label %if.end

				if.end: ; preds = %if.then, %entry
				ret void
				}

				declare void @dump(ptr)

				!0 = !{!"auto-init"}
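The C source behind fence.ll is not shown in the review. The following is a minimal sketch of what it might look like, assuming that `-ftrivial-auto-var-init=pattern` emits the initializing store at the point where `val` is declared, so that declaring it before or after the fence yields the two orderings tested above. The counting `dump` is a hypothetical stand-in for the external `@dump`.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the external @dump: it only counts its calls. */
int dump_calls = 0;

void dump(int *p) {
    assert(p != 0); /* only the address is used; the value is never read */
    dump_calls++;
}

/* Shape of @foo: val is declared before the fence, so (by assumption) its
 * auto-init store precedes the fence and the test expects it to stay put. */
void foo(int x) {
    int val;
    atomic_thread_fence(memory_order_acquire);
    if (x)
        dump(&val);
}

/* Shape of @bar: val is declared after the fence, so (by assumption) its
 * auto-init store follows the fence and the test expects it to sink into
 * the branch that calls dump. */
void bar(int x) {
    atomic_thread_fence(memory_order_acquire);
    int val;
    if (x)
        dump(&val);
}
```

Both functions behave the same at the C level; only the position of the auto-init store relative to the fence differs, which is what the two CHECK sequences distinguish.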

llvm/test/Transforms/MoveAutoInit/loop.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt < %s -S -passes='move-auto-init' -verify-memoryssa | FileCheck %s

				target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"

				define void @foo(i32 %x) {
				; CHECK-LABEL: @foo(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[BUFFER:%.*]] = alloca [80 x i32], align 16
				; CHECK-NEXT: call void @llvm.memset.p0.i64(ptr align 16 [[BUFFER]], i8 -86, i64 320, i1 false), !annotation !0
				; CHECK-NEXT: br label [[DO_BODY:%.*]]
				; CHECK: do.body:
				; CHECK-NEXT: [[X_ADDR_0:%.*]] = phi i32 [ [[X:%.*]], [[ENTRY:%.*]] ], [ [[DEC:%.*]], [[DO_COND:%.*]] ]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [80 x i32], ptr [[BUFFER]], i64 0, i64 0
				; CHECK-NEXT: call void @dump(ptr [[ARRAYIDX]])
				; CHECK-NEXT: br label [[DO_COND]]
				; CHECK: do.cond:
				; CHECK-NEXT: [[DEC]] = add nsw i32 [[X_ADDR_0]], -1
				; CHECK-NEXT: [[TOBOOL:%.*]] = icmp ne i32 [[X_ADDR_0]], 0
				; CHECK-NEXT: br i1 [[TOBOOL]], label [[DO_BODY]], label [[DO_END:%.*]]
				; CHECK: do.end:
				; CHECK-NEXT: ret void
				;

				entry:
				%buffer = alloca [80 x i32], align 16
				call void @llvm.memset.p0.i64(ptr align 16 %buffer, i8 -86, i64 320, i1 false), !annotation !0
				br label %do.body

				do.body: ; preds = %do.cond, %entry
				%x.addr.0 = phi i32 [ %x, %entry ], [ %dec, %do.cond ]
				%arrayidx = getelementptr inbounds [80 x i32], ptr %buffer, i64 0, i64 0
				call void @dump(ptr %arrayidx)
				br label %do.cond

				do.cond: ; preds = %do.body
				%dec = add nsw i32 %x.addr.0, -1
				%tobool = icmp ne i32 %x.addr.0, 0
				br i1 %tobool, label %do.body, label %do.end

				do.end: ; preds = %do.cond
				ret void
				}

				declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg)

				declare void @dump(ptr )

				define void @bar(i32 %x, i32 %y) {
				; CHECK-LABEL: @bar(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[BUFFER:%.*]] = alloca [80 x i32], align 16
				; CHECK-NEXT: [[TOBOOL:%.*]] = icmp ne i32 [[Y:%.*]], 0
				; CHECK-NEXT: br i1 [[TOBOOL]], label [[IF_THEN:%.*]], label [[IF_END:%.*]]
				; CHECK: if.then:
				; CHECK-NEXT: call void @llvm.memset.p0.i64(ptr align 16 [[BUFFER]], i8 -86, i64 320, i1 false), !annotation !0
				; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[X:%.*]], [[Y]]
				; CHECK-NEXT: br label [[DO_BODY:%.*]]
				; CHECK: do.body:
				; CHECK-NEXT: [[X_ADDR_0:%.*]] = phi i32 [ [[ADD]], [[IF_THEN]] ], [ [[DEC:%.*]], [[DO_COND:%.*]] ]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [80 x i32], ptr [[BUFFER]], i64 0, i64 0
				; CHECK-NEXT: call void @dump(ptr [[ARRAYIDX]])
				; CHECK-NEXT: br label [[DO_COND]]
				; CHECK: do.cond:
				; CHECK-NEXT: [[DEC]] = add nsw i32 [[X_ADDR_0]], -1
				; CHECK-NEXT: [[TOBOOL1:%.*]] = icmp ne i32 [[X_ADDR_0]], 0
				; CHECK-NEXT: br i1 [[TOBOOL1]], label [[DO_BODY]], label [[DO_END:%.*]]
				; CHECK: do.end:
				; CHECK-NEXT: br label [[IF_END]]
				; CHECK: if.end:
				; CHECK-NEXT: ret void
				;

				entry:
				%buffer = alloca [80 x i32], align 16
				call void @llvm.memset.p0.i64(ptr align 16 %buffer, i8 -86, i64 320, i1 false), !annotation !0
				%tobool = icmp ne i32 %y, 0
				br i1 %tobool, label %if.then, label %if.end

				if.then: ; preds = %entry
				%add = add nsw i32 %x, %y
				br label %do.body

				do.body: ; preds = %do.cond, %if.then
				%x.addr.0 = phi i32 [ %add, %if.then ], [ %dec, %do.cond ]
				%arrayidx = getelementptr inbounds [80 x i32], ptr %buffer, i64 0, i64 0
				call void @dump(ptr %arrayidx)
				br label %do.cond

				do.cond: ; preds = %do.body
				%dec = add nsw i32 %x.addr.0, -1
				%tobool1 = icmp ne i32 %x.addr.0, 0
				br i1 %tobool1, label %do.body, label %do.end

				do.end: ; preds = %do.cond
				br label %if.end

				if.end: ; preds = %do.end, %entry
				ret void
				}

				!0 = !{!"auto-init"}
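The constants in these tests can be checked by hand. The sketch below assumes the auto-init pattern is the repeated byte 0xAA, which is consistent with the `i8 -86` memset value here and with the `i32 -1431655766` stores in the scalar and fence tests. The helper names are illustrative and do not come from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed pattern byte: 0xAA, which LLVM IR prints as the signed value i8 -86. */
enum { PATTERN_BYTE = 0xAA };

/* Size in bytes of a [80 x i32] buffer, i.e. the memset length in loop.ll. */
size_t buffer_bytes(void) { return 80 * sizeof(int32_t); }

/* The pattern byte reinterpreted as a signed 8-bit value. */
int8_t pattern_as_i8(void) {
    int8_t v;
    memset(&v, PATTERN_BYTE, sizeof v);
    return v;
}

/* Four pattern bytes reinterpreted as a signed 32-bit value. */
int32_t pattern_as_i32(void) {
    int32_t v;
    memset(&v, PATTERN_BYTE, sizeof v);
    return v;
}

/* Checks the three constants that appear in the test files. */
void check_pattern_constants(void) {
    assert(buffer_bytes() == 320);
    assert(pattern_as_i8() == -86);
    assert(pattern_as_i32() == -1431655766);
}
```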

llvm/test/Transforms/MoveAutoInit/scalar.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt < %s -S -passes='move-auto-init' -verify-memoryssa | FileCheck %s

				target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"

				define void @foo(i32 %x) {
				; CHECK-LABEL: @foo(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[VAL:%.*]] = alloca i32, align 4
				; CHECK-NEXT: [[TOBOOL:%.*]] = icmp ne i32 [[X:%.*]], 0
				; CHECK-NEXT: br i1 [[TOBOOL]], label [[IF_THEN:%.*]], label [[IF_END:%.*]]
				; CHECK: if.then:
				; CHECK-NEXT: store i32 -1431655766, ptr [[VAL]], align 4, !annotation !0
				; CHECK-NEXT: call void @dump(ptr [[VAL]])
				; CHECK-NEXT: br label [[IF_END]]
				; CHECK: if.end:
				; CHECK-NEXT: ret void
				;

				entry:
				%val = alloca i32, align 4
				store i32 -1431655766, ptr %val, align 4, !annotation !0
				%tobool = icmp ne i32 %x, 0
				br i1 %tobool, label %if.then, label %if.end

				if.then: ; preds = %entry
				call void @dump(ptr %val)
				br label %if.end

				if.end: ; preds = %if.then, %entry
				ret void
				}

				declare void @dump(ptr)

				!0 = !{!"auto-init"}
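The effect checked by scalar.ll can be restated at the C level. This is a sketch only: the pass works on IR, and `foo_before`, `foo_after`, and the recording `dump` are hypothetical names used to model the input and the expected output of the test.

```c
#include <assert.h>
#include <stdint.h>

/* The i32 constant stored by the auto-init instruction in the test. */
#define PATTERN ((int32_t)-1431655766)

/* Hypothetical stand-in for @dump: records the value it observes. */
int32_t last_seen = 0;
int dump_calls = 0;

void dump(int32_t *p) {
    assert(p != 0);
    last_seen = *p;
    dump_calls++;
}

/* Models the input IR: the initializing store runs on every path. */
void foo_before(int x) {
    int32_t val = PATTERN;
    if (x)
        dump(&val);
}

/* Models the expected output: the store runs only on the path that uses val. */
void foo_after(int x) {
    int32_t val;
    if (x) {
        val = PATTERN;
        dump(&val);
    }
}
```

In both versions `dump` observes the same value whenever it is called; the second version simply skips the store when `x` is zero.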

Diff 510479
