This is an archive of the discontinued LLVM Phabricator instance.

Differential D109749

Experimental Partial Mem2Reg
Abandoned, Public

Authored by huntergr on Sep 14 2021, 2:32 AM.

Details

Reviewers

chandlerc
jdoerfert
kiranchandramohan
Meinersbur
ftynse
lebedev.ri

Summary

Clang's current lowering of OpenMP parallel worksharing loops with a reduction clause blocks many optimization opportunities: the address of the stack variable holding the reduction is passed to an OpenMP runtime function after the loop, so SROA/mem2reg treat the variable as captured and do not promote it to SSA form.

The intent of this work is to partially promote the reduction variable to SSA form up to the point where the runtime call takes place, so that optimizations such as vectorization can be applied to loops like the following:

int loop(int data[restrict 128U]) {
  int retval = 0;

#pragma omp parallel for simd schedule(simd:static) default(none) shared(data) reduction(+:retval)
  for (int i = 0; i < 128; i++) {
    int n = 0;

    if (data[i]) {
      n = 1;
      retval += n;
    }
  }
  return retval;
}
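A minimal source-level sketch of the effect this work is after, assuming a hypothetical `reduce_runtime` that stands in for the OpenMP runtime call (in the real lowering the address reaches `__kmpc_reduce_nowait` indirectly). In the first function every loop update conceptually goes through the stack slot because its address escapes afterwards; in the second, the loop works on a promoted copy and the slot is written once, immediately before the capture. The two functions behave identically; the sketch only illustrates the IR shape partial promotion aims for.

```cpp
#include <cassert>

// Hypothetical stand-in for the OpenMP runtime reduction call. Like the real
// entry point it receives the variable's address, which is what makes the
// stack slot look captured to SROA/mem2reg.
static int g_seen_by_runtime = -1;
static void reduce_runtime(int *addr) { g_seen_by_runtime = *addr; }

// Shape today: the accumulator's address escapes after the loop, so at the
// IR level every update in the loop stays a load/store on the stack slot.
int loop_in_memory(const int *data, int n) {
  int retval = 0;
  for (int i = 0; i < n; i++)
    if (data[i])
      retval += 1;
  reduce_runtime(&retval);
  return retval;
}

// Shape after partial promotion: the loop updates a promoted copy, and the
// stack slot is written once, immediately before its address escapes.
int loop_partially_promoted(const int *data, int n) {
  int retval = 0;    // stack slot that is captured below
  int acc = retval;  // promoted copy; the loop no longer updates the slot
  for (int i = 0; i < n; i++)
    if (data[i])
      acc += 1;
  retval = acc;      // write back before the capture point
  reduce_runtime(&retval);
  return retval;
}
```

With 128 elements initialized as `data[i] = i % 3`, both functions return 85 and the runtime stand-in observes 85.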

The current code was written to avoid clashing with other code, to keep downstream maintenance costs low; I expect to refactor it considerably, but I would like to hear from reviewers before undertaking that work.

I have a few questions to resolve first:

- Is this feature something the community wants, or am I overcomplicating things? Is there an easier way to get the above loop to vectorize?
- I've been a bit paranoid about ensuring ordering here and used the PostDominatorTree; I think it may be possible to do this by modifying the IDF (iterated dominance frontier) algorithm used in mem2reg, but I haven't worked through it yet. Can anyone with more experience of it help guide that?
- This is currently a separate pass, but it could be implemented as part of the normal SROA/mem2reg pass. Would that be preferred? Does the answer to the previous question about post-dominator trees affect that?

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

huntergr created this revision. Sep 14 2021, 2:32 AM

Herald added subscribers: mgrang, hiraditya, mgorny. Sep 14 2021, 2:32 AM

huntergr requested review of this revision. Sep 14 2021, 2:32 AM

Herald added a project: Restricted Project. Sep 14 2021, 2:32 AM

Herald added a subscriber: sstefan1.

Harbormaster completed remote builds in B123818: Diff 372440. Sep 14 2021, 3:24 AM

I have seen cases where this would be beneficial; some of those are just due to lack of inlining, but not all.

I strongly believe this should be part of SROA. It should analyze the allocas while ignoring captures, and if an alloca is otherwise promotable, it should:

1. duplicate the original alloca (only for simplicity; this is fine since we know the old alloca goes away)
2. before each capture, load the contents of the old alloca and store them into the new alloca
3. after each capture, load the contents of the new alloca and store them into the old alloca
4. change the captures to refer to the new alloca
5. run AggLoadStoreRewriter on the new alloca, so that all the uses of the old alloca we've just introduced are analyzable by SROA
6. proceed with normal handling of the old alloca; mem2reg will now succeed
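The suggested rewrite can be pictured at source level as follows. This is only an analogy, assuming a callee that may read and write through the pointer but does not retain it (a `nocapture` argument); `sum_original`, `sum_with_shadow` and `Callee` are hypothetical names. After the rewrite the original slot has no captured uses, so mem2reg could promote it, while the callee only ever sees the shadow copy. The AggLoadStoreRewriter and final promotion steps operate on IR and have no source-level counterpart here.

```cpp
#include <cassert>

// Hypothetical callee type: it may read and write through the pointer but
// does not retain it after returning (a nocapture argument).
using Callee = void (*)(int *);

// Before: `slot` is passed to the callee, so it is treated as captured and
// none of its loads/stores can be promoted.
int sum_original(int start, int n, Callee f) {
  int slot = start;
  for (int i = 0; i < n; i++)
    slot += i;
  f(&slot);
  slot *= 2;
  return slot;
}

// After: only `shadow` escapes, so every remaining access to `slot` is a
// plain load or store and `slot` becomes promotable.
int sum_with_shadow(int start, int n, Callee f) {
  int slot = start;  // old alloca
  int shadow;        // duplicate of the original alloca
  for (int i = 0; i < n; i++)
    slot += i;
  shadow = slot;     // before the capture: copy old -> new
  f(&shadow);        // the capture now refers to the new alloca
  slot = shadow;     // after the capture: copy new -> old
  slot *= 2;
  return slot;
}
```

Copying back after the call is what keeps the rewrite correct even when the callee writes through the pointer: with a callee that increments its argument, both functions return 20 for `start = 3, n = 4`.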

kiranchandramohan added reviewers: Meinersbur, ftynse. Sep 14 2021, 3:49 AM

I agree this should be part of mem2reg/SROA unless there is a specific reason against it (e.g. computational complexity high enough that it should not run with every occurrence of SROA/mem2reg in the default pipeline).

Your motivating code looks like it should be processable by LICM, so that it is promoted to registers while in the loop and then vectorized. Do you know why this doesn't happen?

I searched the diff for nosync and found no hits. I doubt any of this reasoning is valid if there can be synchronization between threads.

That said, I think we need to use the fact that we know the value stored in the alloca is not captured. There was an email thread on this problem, and others on how we could encode that it is not captured.
Given that this occurs in the OpenMP context, nosync is probably not an alternative.

In D109749#2999389, @lebedev.ri wrote:

I strongly believe this should be part of SROA, [...]

Hi, thanks for the suggestion (and sorry for the delay in responding).

I've implemented something similar to what you've suggested, but with a slight difference to fit the problem at hand: the OpenMP reduction in the loop. There's a key detail I didn't state in my initial summary (though it was present in the unit test), which is how the alloca is captured. It isn't passed directly as an argument to the function; instead, the pointer is first stored into another local memory location, and the pointer to that second location is then passed to __kmpc_reduce_nowait. This makes the code somewhat messy, as I have to check that the store of the pointer dominates the call, that there are no other uses of the second alloca that might interfere with conversion, etc.

The way that's done makes me wonder whether libomp needs a lighter-weight interface for reductions involving a single scalar value, rather than just a single generic interface which accepts an arbitrary number of reduction variables. (For comparison, I looked into what gcc does -- it passes a pointer to a shared reduction variable into the outlined function, and it just performs the atomic operation directly instead of calling to the runtime).
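For comparison, here is a minimal sketch of the gcc-style scheme described above, assuming one call of a hypothetical outlined function per thread: the partial result stays in a private local that never escapes, and the shared variable is updated once per thread with an atomic add instead of a runtime call. It uses `std::atomic` purely for illustration and runs the chunks sequentially; it is not the code gcc or libgomp actually emits.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical outlined body: one call per thread/chunk. The shared
// reduction variable is passed by pointer, but the private accumulator
// never escapes, so it can stay in a register throughout the loop.
void outlined_chunk(const int *data, int begin, int end,
                    std::atomic<int> *shared_retval) {
  int local = 0;  // private partial result
  for (int i = begin; i < end; i++)
    if (data[i])
      local += 1;
  // Combine directly with an atomic operation instead of a runtime call.
  shared_retval->fetch_add(local, std::memory_order_relaxed);
}

// Sequential driver standing in for the parallel region (chunks > 0).
int reduce_all(const int *data, int n, int chunks) {
  std::atomic<int> retval{0};
  int step = (n + chunks - 1) / chunks;
  for (int b = 0; b < n; b += step)
    outlined_chunk(data, b, b + step < n ? b + step : n, &retval);
  return retval.load();
}
```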

So I think that I'll repurpose this patch to only cover the direct case of an alloca being used in a call and separate out the libomp side of things for another patch. I'll update the diff once I've implemented that.

In D109749#2999682, @Meinersbur wrote:

Your motivating code looks like it should be processable by LICM, so that it is promoted to registers while in the loop and then vectorized. Do you know why this doesn't happen?

mem2reg handles promotion to registers, but for LICM specifically there are a couple of things that would stop it:

- Although the address is loop invariant, the data isn't.
- For this loop in particular, the store is conditional, so it might never happen. We *could* add a second boolean reduction to determine whether or not to actually perform a store after the loop, but that's a bit more complicated than just letting mem2reg do what it should.
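The "second boolean reduction" idea can be sketched at source level as below (function names are hypothetical). The value is carried in a local across the loop, and a flag records whether the conditional store ever executed, so memory is written after the loop only if the original loop would have written it. An unconditional write-back could introduce a store the program never performed, which is unsafe if another thread may access the location. The sketch assumes `*p` can be loaded speculatively before the loop, which holds for an alloca but not for an arbitrary pointer.

```cpp
#include <cassert>

// Original: conditional read-modify-write of memory inside the loop.
void count_original(int *p, const int *data, int n) {
  for (int i = 0; i < n; i++)
    if (data[i])
      *p += 1;
}

// Scalar-promoted form with a boolean "did we store?" reduction.
void count_promoted(int *p, const int *data, int n) {
  int v = *p;           // speculative load: assumes *p is dereferenceable
  bool stored = false;  // second reduction: did the store ever execute?
  for (int i = 0; i < n; i++) {
    if (data[i]) {
      v += 1;
      stored = true;
    }
  }
  if (stored)           // write memory only if the original loop would have
    *p = v;
}
```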

In D109749#2999818, @jdoerfert wrote:

I scanned the diff for nosync without hits. I doubt any of this reasoning is valid if I can have synchronization between threads.

That's part of the reason my original patch only changed uses before a capture (the other being possible aliasing within a thread -- a terrible idea, but someone somewhere has probably written something which relies on it). I could restrict it to avoid converting any allocas which use atomic operations.

In D109749#2999837, @jdoerfert wrote:

That said, I think we need to use the fact that we know the value stored in the alloca is not captured. There was an email thread on this problem and email threads on how we could encode that it is not captured.
Given that this occurs in the OpenMP context, nosync is probably not an alternative.

I think we can use Roman's approach when the alloca is passed as a 'nocapture' argument at least, which will give us some benefit even if it doesn't solve all of my initial problem. Do you agree?

I'm not sure about the best way of marking the store of the first alloca pointer into the second alloca's memory as nocapture, though. If we have a way of doing it, then I can extend the work in a later patch to cover that case; if not, maybe we can change the way Clang and libomp handle OpenMP reductions to make it easier to optimize outlined functions.

In D109749#3077297, @huntergr wrote:

Although the address is loop invariant, the data isn't.

LICM does scalar promotion (controlled by -disable-licm-promotion), as in "promote memory location to register". It doesn't matter whether the value at the location is invariant. Whether this belongs into a pass called "Loop Invariant Code Motion" is a different question.

For this loop in particular, the store is conditional so might never happen. We *could* add a second boolean reduction to determine whether or not to actually perform a store after the loop, but that's a bit more complicated than just letting mem2reg do what it should.

This patch adds another pass rather than making mem2reg do it. LICM currently does not handle conditional control flow for scalar promotion, but it should require much less code to change that. See the use of isGuaranteedToExecute in llvm::promoteLoopAccessesToScalars.

PartialMemToReg uses isAllocaPromotable to ensure that the target is write-accessible and no bit is needed; why not do the same for LICM?

llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp
77–78	This is not a sufficient condition for captures. I doubt that we can detect that something has been generated from a CapturedStmt just by looking at the IR.
llvm/test/Transforms/Mem2Reg/partial-mem2reg.ll
3	This tests too many passes at once

jdoerfert added inline comments. Oct 21 2021, 8:32 AM

llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp
634	I doubt this logic works in loops. H: I = use(alloca); C: store alloca into mem; if (...) goto H; The capture (C) post-dominates the user (I), but it is executed both after and before the use, just not in the same iteration of the loop headed by H. Once the alloca is captured, you cannot reason about it anymore without a lot more analysis (incl. nosync). To salvage this, reachability, not post-dominance, is what you are looking for. All that said, I still believe the problem at hand should be solved by marking the reduction thing as not capturing.
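The counterexample can be reproduced on a four-block CFG (Entry, H, C, Exit) with a back edge from C to H: C post-dominates H, yet C can reach H again, so a use in H may run after the capture. The sketch below is a toy reachability check on adjacency lists, not LLVM code; within LLVM, `llvm::isPotentiallyReachable` (declared in `llvm/Analysis/CFG.h`) answers the same kind of question on real IR.

```cpp
#include <cassert>
#include <vector>

// Minimal CFG as adjacency lists; block indices are illustrative.
using CFG = std::vector<std::vector<int>>;

// True if `to` is reachable from `from` by following at least one edge.
bool reachable(const CFG &g, int from, int to) {
  std::vector<bool> seen(g.size(), false);
  std::vector<int> work(g[from].begin(), g[from].end());
  while (!work.empty()) {
    int b = work.back();
    work.pop_back();
    if (b == to)
      return true;
    if (seen[b])
      continue;
    seen[b] = true;
    for (int s : g[b])
      work.push_back(s);
  }
  return false;
}

// A use may be rewritten from the pre-capture value only if the capturing
// block can never execute before it, i.e. the use is unreachable from the
// capture. Post-dominance of the use by the capture is not enough.
bool useSafeBeforeCapture(const CFG &g, int capture, int use) {
  return !reachable(g, capture, use);
}
```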

Does this work for you:

diff --git a/llvm/lib/Analysis/CaptureTracking.cpp b/llvm/lib/Analysis/CaptureTracking.cpp
index 8955658cb9e7..41251d2676e6 100644
--- a/llvm/lib/Analysis/CaptureTracking.cpp
+++ b/llvm/lib/Analysis/CaptureTracking.cpp
@@ -373,9 +373,13 @@ void llvm::PointerMayBeCaptured(const Value *V, CaptureTracker *Tracker,
     case Instruction::Store:
       // Stored the pointer - conservatively assume it may be captured.
       // Volatile stores make the address observable.
-      if (U->getOperandNo() == 0 || cast<StoreInst>(I)->isVolatile())
+      if (U->getOperandNo() == 0 || cast<StoreInst>(I)->isVolatile()) {
+        if (auto *AI = dyn_cast<AllocaInst>(I->getOperand(1)->stripInBoundsOffsets()))
+          if (AI->hasMetadata("nocapture_storage"))
+            break;
         if (Tracker->captured(U))
           return;
+      }
       break;
     case Instruction::AtomicRMW: {
       // atomicrmw conceptually includes both a load and store from

And then add !nocapture_storage !0 after the alloca in your example, as well as !0 = !{!0} at the end of that file.
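A simplified model of the rule the proposed change implements, assuming a hypothetical flattened list of pointer uses rather than LLVM's `Use`/`User` classes: storing the pointer value itself normally counts as a capture, except when the destination is an alloca tagged with `!nocapture_storage` (which, as noted later in the thread, is not yet part of the IR). Volatile stores and the other cases handled by `PointerMayBeCaptured` are deliberately left out.

```cpp
#include <cassert>
#include <vector>

// Toy description of one use of a tracked pointer (not LLVM's Use class).
struct PtrUse {
  enum Kind { Load, StoreInto, StoreOf, CallNoCapture, CallMayCapture };
  Kind kind;
  // Only meaningful for StoreOf: does the destination alloca carry the
  // hypothetical !nocapture_storage metadata?
  bool destHasNoCaptureStorage;
};

// Storing the pointer value itself is normally a capture, except when it is
// stored into an alloca tagged !nocapture_storage, mirroring the early
// `break` in the proposed CaptureTracking change.
bool mayBeCaptured(const std::vector<PtrUse> &uses) {
  for (const PtrUse &u : uses) {
    switch (u.kind) {
    case PtrUse::Load:
    case PtrUse::StoreInto:      // writing through the pointer
    case PtrUse::CallNoCapture:
      break;
    case PtrUse::StoreOf:        // writing the pointer value somewhere
      if (!u.destHasNoCaptureStorage)
        return true;
      break;
    case PtrUse::CallMayCapture:
      return true;
    }
  }
  return false;
}
```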

In D109749#3078046, @Meinersbur wrote:

This patch adds another pass rather than making mem2reg do it. LICM currently does not handle conditional control flow for scalar promotion, but it should require much less code to change that. See the use of isGuaranteedToExecute in llvm::promoteLoopAccessesToScalars.

Sorry, I should have made it more clear -- I'm dropping the new pass and using Roman's suggestion of improving SROA. I have implemented that but found the code a bit messy due to the store -> call separation.

In D109749#3078176, @jdoerfert wrote:

Does this work for you: [...]

And then add !nocapture_storage !0 after the alloca in your example, as well as !0 = !{!0} at the end of that file.

Ah, the 'nocapture_storage' metadata is what I've been missing, thanks. I'll update the diff once I've added that and adjusted the tests.

In D109749#3078247, @huntergr wrote:

Ah, the 'nocapture_storage' metadata is what I've been missing, thanks. I'll update the diff once I've added that and adjusted the tests.

Technically, this is not yet something we have in the IR. We can reply to the old thread in which different solutions were discussed and propose this one again, then modify Clang to emit the metadata for the reduction case and land the diff I posted. All that said, it works for your case, right?

Updated the diff based on the suggestion from @lebedev.ri

This patch now only deals with the case of an alloca being passed directly to a call which doesn't capture it.

In D109749#3085994, @jdoerfert wrote:

Technically, this is not yet something we have in the IR. [...] All that said, it works for your case, right?

It does, yes. I'll have a look for the mailing list thread.

Nice! I like this, but now I have a fundamental concern.

llvm/lib/Transforms/Scalar/SROA.cpp
4709–4710	(On Diff #384025)
4715–4720	(On Diff #384025)	This isn't right. What if not the `alloca`, but `gep(alloca)`, is passed into the function?
4736–4739	(On Diff #384025)	Same, this should recurse down the uses of an alloca.
llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp
109–110

This revision now requires changes to proceed. Nov 2 2021, 3:41 AM

llvm/lib/Transforms/Scalar/SROA.cpp
4715–4720	(On Diff #384025)	Ah, good catch. I'll fix that and add some more test cases.

Harbormaster completed remote builds in B131924: Diff 384025. Nov 2 2021, 3:52 AM

Added the ability to look through GEPs to find a call. I've limited this to GEPs whose indices are all zero, and only for single-value types, so I'm not sure how often we'll encounter it. The restriction on the indices does match the existing checks for whether promotion is allowed.
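A toy version of looking through GEPs, following the review comment that the check should recurse down the uses of an alloca: the walk descends through GEPs whose indices are all zero (they address the alloca's own location) and gives up on any other GEP, mirroring the restriction described above. The `Node` type and `collectCalls` are hypothetical and do not correspond to LLVM classes.

```cpp
#include <cassert>
#include <vector>

// Toy model of an alloca's transitive users; not LLVM's Use/User classes.
struct Node {
  enum Kind { Alloca, Load, Store, GEP, Call } kind;
  std::vector<long> gepIndices;     // only meaningful for GEP
  std::vector<const Node *> users;
};

static bool allZero(const std::vector<long> &idx) {
  for (long i : idx)
    if (i != 0)
      return false;
  return true;
}

// Recurse through the users of `n`, looking through all-zero-index GEPs and
// collecting the calls reached that way. Returns false on any GEP with a
// non-zero index, which this sketch treats as unsupported.
bool collectCalls(const Node &n, std::vector<const Node *> &calls) {
  for (const Node *u : n.users) {
    switch (u->kind) {
    case Node::Call:
      calls.push_back(u);
      break;
    case Node::GEP:
      if (!allZero(u->gepIndices) || !collectCalls(*u, calls))
        return false;
      break;
    default:
      break;  // loads/stores are left to the normal promotion logic
    }
  }
  return true;
}
```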

It should be possible to perform this optimization for a single field in a struct and allow SROA to replace the rest of the struct, but I think there's at least some C code which relies on being able to cast back to the struct (I'm not sure that's fully legal in C, but such code does exist). Going further may need analysis of the callee function to see whether it just treats the pointer as a single value or assumes it can access more than that.
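The cast-back pattern that makes per-field promotion risky looks like this in a minimal sketch (`Pair`, `sneaky` and `use_pair` are hypothetical). A callee that receives a pointer to the first field can legally reach the rest of a standard-layout struct, because in C++ a pointer to the first member is pointer-interconvertible with a pointer to the struct (C has an equivalent guarantee for a structure and its initial member). If SROA replaced `second` with an SSA value, the callee's write would be lost.

```cpp
#include <cassert>

// Hypothetical two-field aggregate living in an alloca.
struct Pair {
  int first;
  int second;
};

// Looks like it only needs one int, but casts back to the enclosing struct.
// Valid because Pair is standard-layout and `first` is its first member.
void sneaky(int *field) {
  Pair *whole = reinterpret_cast<Pair *>(field);
  whole->second += *field;
}

int use_pair() {
  Pair p{2, 3};
  sneaky(&p.first);  // in IR, a GEP of the alloca with all-zero indices
  return p.second;   // must observe the callee's write
}
```

Here `use_pair()` returns 5, so `second` has to stay in memory across the call even though only `&p.first` was passed.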

Harbormaster completed remote builds in B132643: Diff 385009. Nov 5 2021, 4:35 AM

FWIW, I restarted the thread [0] in order to get that solution in for APIs. Doesn't mean we cannot teach SROA new tricks though.

[0] https://lists.llvm.org/pipermail/llvm-dev/2021-November/153622.html

lebedev.ri mentioned this in rG1000245e3a4f: [NFC][SROA] Precommit tests for promotion-with-spilling. Nov 9 2021, 2:54 PM

lebedev.ri mentioned this in D113520: [SROA] Maintain shadow/backing alloca when some slices are noncapturnig read-only calls to allow alloca partitioning/promotion. Nov 9 2021, 3:01 PM

Superseded by @lebedev.ri's patches. I'll continue looking at the metadata angle separately.

a.elovikov added a subscriber: a.elovikov. Nov 30 2021, 11:27 AM

lebedev.ri mentioned this in rG703240c71fd6: [SROA] Maintain shadow/backing alloca when some slices are noncapturnig read…. Mar 4 2022, 10:09 AM

lebedev.ri mentioned this in rGadc0984d81f5: Reland [SROA] Maintain shadow/backing alloca when some slices are noncapturnig…. Mar 4 2022, 1:14 PM

Revision Contents

Path                                                    Size
llvm/include/llvm/InitializePasses.h                    1 line
llvm/include/llvm/Transforms/Scalar.h                   7 lines
llvm/include/llvm/Transforms/Scalar/PartialMemToReg.h   53 lines
llvm/include/llvm/Transforms/Utils/PromoteMemToReg.h    7 lines
llvm/lib/Passes/PassBuilder.cpp                         9 lines
llvm/lib/Passes/PassRegistry.def                        1 line
llvm/lib/Transforms/Scalar/CMakeLists.txt               1 line
llvm/lib/Transforms/Scalar/PartialMemToReg.cpp          130 lines
llvm/lib/Transforms/Scalar/Scalar.cpp                   1 line
llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp   398 lines
llvm/test/Transforms/Mem2Reg/partial-mem2reg.ll         348 lines

Diff 372440

llvm/include/llvm/InitializePasses.h

@@ ... @@
 void initializeObjCARCAAWrapperPassPass(PassRegistry&);
 void initializeObjCARCAPElimPass(PassRegistry&);
 void initializeObjCARCContractLegacyPassPass(PassRegistry &);
 void initializeObjCARCExpandPass(PassRegistry&);
 void initializeObjCARCOptLegacyPassPass(PassRegistry &);
 void initializeOptimizationRemarkEmitterWrapperPassPass(PassRegistry&);
 void initializeOptimizePHIsPass(PassRegistry&);
 void initializePAEvalPass(PassRegistry&);
+void initializePartialMemToRegLegacyPassPass(PassRegistry &);
 void initializePEIPass(PassRegistry&);
 void initializePGOIndirectCallPromotionLegacyPassPass(PassRegistry&);
 void initializePGOInstrumentationGenLegacyPassPass(PassRegistry&);
 void initializePGOInstrumentationUseLegacyPassPass(PassRegistry&);
 void initializePGOInstrumentationGenCreateVarLegacyPassPass(PassRegistry&);
 void initializePGOMemOPSizeOptLegacyPassPass(PassRegistry&);
 void initializePHIEliminationPass(PassRegistry&);
 void initializePartialInlinerLegacyPassPass(PassRegistry&);

llvm/include/llvm/Transforms/Scalar.h

@@ ... @@
 //===----------------------------------------------------------------------===//
 //
 // SROA - Replace aggregates or pieces of aggregates with scalar SSA values.
 //
 FunctionPass *createSROAPass();
 
 //===----------------------------------------------------------------------===//
 //
+// PartialMemToReg - Converts alloca uses into phi nodes until the address
+// is (potentially) captured.
+//
+FunctionPass *createPartialMemToRegPass();
+
+//===----------------------------------------------------------------------===//
+//
 // InductiveRangeCheckElimination - Transform loops to elide range checks on
 // linear functions of the induction variable.
 //
 Pass *createInductiveRangeCheckEliminationPass();
 
 //===----------------------------------------------------------------------===//
 //
 // InductionVariableSimplify - Transform induction variables in a program to all

llvm/include/llvm/Transforms/Scalar/PartialMemToReg.h

This file was added.

//===- PartialMemToReg.h ----------------------------------------- C++ --===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
/// \file
/// Provides a pass which runs a partial mem2reg operation on allocas which
/// are deemed to be captured at some point but are used extensively
/// beforehand.
///
//===----------------------------------------------------------------------===//

#ifndef LLVM_TRANSFORMS_SCALAR_PARTIALMEMTOREG_H
#define LLVM_TRANSFORMS_SCALAR_PARTIALMEMTOREG_H

#include "llvm/ADT/SetVector.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/PassManager.h"
#include "llvm/IR/ValueHandle.h"
#include <vector>

namespace llvm {

class AssumptionCache;
class DominatorTree;
class Function;
class LoopInfo;
class PostDominatorTree;

class PartialMemToRegLegacyPass;

class PartialMemToReg : public PassInfoMixin<PartialMemToReg> {

public:
  PartialMemToReg() = default;

  /// Run the pass over the function.
  PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);

private:
  friend class PartialMemToRegLegacyPass;

  /// Helper used by both the public run method and by the legacy pass.
  PreservedAnalyses runImpl(Function &F, DominatorTree &DT,
                            PostDominatorTree &PDT, AssumptionCache &AC,
                            LoopInfo &LI);
};

} // end namespace llvm

#endif // LLVM_TRANSFORMS_SCALAR_PARTIALMEMTOREG_H

llvm/include/llvm/Transforms/Utils/PromoteMemToReg.h

@@ ... @@
 #ifndef LLVM_TRANSFORMS_UTILS_PROMOTEMEMTOREG_H
 #define LLVM_TRANSFORMS_UTILS_PROMOTEMEMTOREG_H
 
 namespace llvm {
 
 template <typename T> class ArrayRef;
 class AllocaInst;
+class LoopInfo;
 class DominatorTree;
+class PostDominatorTree;
 class AssumptionCache;
 
 /// Return true if this alloca is legal for promotion.
 ///
 /// This is true if there are only loads, stores, and lifetime markers
 /// (transitively) using this alloca. This also enforces that there is only
 /// ever one layer of bitcasts or GEPs between the alloca and the lifetime
 /// markers.
-bool isAllocaPromotable(const AllocaInst *AI);
+bool isAllocaPromotable(const AllocaInst *AI, bool AllowCaptures = false);
 
 /// Promote the specified list of alloca instructions into scalar
 /// registers, inserting PHI nodes as appropriate.
 ///
 /// This function makes use of DominanceFrontier information. This function
 /// does not modify the CFG of the function at all. All allocas must be from
 /// the same function.
 ///
 void PromoteMemToReg(ArrayRef<AllocaInst *> Allocas, DominatorTree &DT,
                      AssumptionCache *AC = nullptr);
 
+bool partialPromoteMemToReg(ArrayRef<AllocaInst *> Allocas, LoopInfo &LI,
+                            DominatorTree &DT, PostDominatorTree &PDT,
+                            AssumptionCache *AC = nullptr);
 } // End llvm namespace
 
 #endif

llvm/lib/Passes/PassBuilder.cpp

Show First 20 Lines • Show All 187 Lines • ▼ Show 20 Lines
#include "llvm/Transforms/Scalar/LowerMatrixIntrinsics.h"		#include "llvm/Transforms/Scalar/LowerMatrixIntrinsics.h"
#include "llvm/Transforms/Scalar/LowerWidenableCondition.h"		#include "llvm/Transforms/Scalar/LowerWidenableCondition.h"
#include "llvm/Transforms/Scalar/MakeGuardsExplicit.h"		#include "llvm/Transforms/Scalar/MakeGuardsExplicit.h"
#include "llvm/Transforms/Scalar/MemCpyOptimizer.h"		#include "llvm/Transforms/Scalar/MemCpyOptimizer.h"
#include "llvm/Transforms/Scalar/MergeICmps.h"		#include "llvm/Transforms/Scalar/MergeICmps.h"
#include "llvm/Transforms/Scalar/MergedLoadStoreMotion.h"		#include "llvm/Transforms/Scalar/MergedLoadStoreMotion.h"
#include "llvm/Transforms/Scalar/NaryReassociate.h"		#include "llvm/Transforms/Scalar/NaryReassociate.h"
#include "llvm/Transforms/Scalar/NewGVN.h"		#include "llvm/Transforms/Scalar/NewGVN.h"
		#include "llvm/Transforms/Scalar/PartialMemToReg.h"
#include "llvm/Transforms/Scalar/PartiallyInlineLibCalls.h"		#include "llvm/Transforms/Scalar/PartiallyInlineLibCalls.h"
#include "llvm/Transforms/Scalar/Reassociate.h"		#include "llvm/Transforms/Scalar/Reassociate.h"
#include "llvm/Transforms/Scalar/Reg2Mem.h"		#include "llvm/Transforms/Scalar/Reg2Mem.h"
#include "llvm/Transforms/Scalar/RewriteStatepointsForGC.h"		#include "llvm/Transforms/Scalar/RewriteStatepointsForGC.h"
#include "llvm/Transforms/Scalar/SCCP.h"		#include "llvm/Transforms/Scalar/SCCP.h"
#include "llvm/Transforms/Scalar/SROA.h"		#include "llvm/Transforms/Scalar/SROA.h"
#include "llvm/Transforms/Scalar/ScalarizeMaskedMemIntrin.h"		#include "llvm/Transforms/Scalar/ScalarizeMaskedMemIntrin.h"
#include "llvm/Transforms/Scalar/Scalarizer.h"		#include "llvm/Transforms/Scalar/Scalarizer.h"
▲ Show 20 Lines • Show All 70 Lines • ▼ Show 20 Lines	static cl::opt<bool> PerformMandatoryInliningsFirst(
"mandatory-inlining-first", cl::init(true), cl::Hidden, cl::ZeroOrMore,		"mandatory-inlining-first", cl::init(true), cl::Hidden, cl::ZeroOrMore,
cl::desc("Perform mandatory inlinings module-wide, before performing "		cl::desc("Perform mandatory inlinings module-wide, before performing "
"inlining."));		"inlining."));

static cl::opt<bool> EnableO3NonTrivialUnswitching(		static cl::opt<bool> EnableO3NonTrivialUnswitching(
"enable-npm-O3-nontrivial-unswitch", cl::init(true), cl::Hidden,		"enable-npm-O3-nontrivial-unswitch", cl::init(true), cl::Hidden,
cl::ZeroOrMore, cl::desc("Enable non-trivial loop unswitching for -O3"));		cl::ZeroOrMore, cl::desc("Enable non-trivial loop unswitching for -O3"));

		static cl::opt<bool> EnablePartialMemToReg(
		"enable-partial-mem2reg", cl::init(false), cl::Hidden, cl::ZeroOrMore,
		cl::desc("Enable partial mem2reg SSA transformation before captures."));

PipelineTuningOptions::PipelineTuningOptions() {		PipelineTuningOptions::PipelineTuningOptions() {
LoopInterleaving = true;		LoopInterleaving = true;
LoopVectorization = true;		LoopVectorization = true;
SLPVectorization = false;		SLPVectorization = false;
LoopUnrolling = true;		LoopUnrolling = true;
ForgetAllSCEVInLoopUnroll = ForgetSCEVInLoopUnroll;		ForgetAllSCEVInLoopUnroll = ForgetSCEVInLoopUnroll;
LicmMssaOptCap = SetLicmMssaOptCap;		LicmMssaOptCap = SetLicmMssaOptCap;
LicmMssaNoAccForPromotionCap = SetLicmMssaNoAccForPromotionCap;		LicmMssaNoAccForPromotionCap = SetLicmMssaNoAccForPromotionCap;
[521 lines not shown]	PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level,
// All loop passes must preserve it, in order to be able to use it.		// All loop passes must preserve it, in order to be able to use it.
FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM2),		FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM2),
/*UseMemorySSA=*/false,		/*UseMemorySSA=*/false,
/*UseBlockFrequencyInfo=*/false));		/*UseBlockFrequencyInfo=*/false));

// Delete small array after loop unroll.		// Delete small array after loop unroll.
FPM.addPass(SROA());		FPM.addPass(SROA());

		// Partially promote some captured allocas to SSA form.
		if (EnablePartialMemToReg)
		FPM.addPass(PartialMemToReg());

// Eliminate redundancies.		// Eliminate redundancies.
FPM.addPass(MergedLoadStoreMotionPass());		FPM.addPass(MergedLoadStoreMotionPass());
if (RunNewGVN)		if (RunNewGVN)
FPM.addPass(NewGVNPass());		FPM.addPass(NewGVNPass());
else		else
FPM.addPass(GVN());		FPM.addPass(GVN());

// Sparse conditional constant propagation.		// Sparse conditional constant propagation.
[2,505 lines not shown]

llvm/lib/Passes/PassRegistry.def

[273 lines not shown]
	FUNCTION_PASS("loop-data-prefetch", LoopDataPrefetchPass())			FUNCTION_PASS("loop-data-prefetch", LoopDataPrefetchPass())
	FUNCTION_PASS("loop-load-elim", LoopLoadEliminationPass())			FUNCTION_PASS("loop-load-elim", LoopLoadEliminationPass())
	FUNCTION_PASS("loop-fusion", LoopFusePass())			FUNCTION_PASS("loop-fusion", LoopFusePass())
	FUNCTION_PASS("loop-distribute", LoopDistributePass())			FUNCTION_PASS("loop-distribute", LoopDistributePass())
	FUNCTION_PASS("loop-versioning", LoopVersioningPass())			FUNCTION_PASS("loop-versioning", LoopVersioningPass())
	FUNCTION_PASS("objc-arc", ObjCARCOptPass())			FUNCTION_PASS("objc-arc", ObjCARCOptPass())
	FUNCTION_PASS("objc-arc-contract", ObjCARCContractPass())			FUNCTION_PASS("objc-arc-contract", ObjCARCContractPass())
	FUNCTION_PASS("objc-arc-expand", ObjCARCExpandPass())			FUNCTION_PASS("objc-arc-expand", ObjCARCExpandPass())
				FUNCTION_PASS("partial-mem2reg", PartialMemToReg())
	FUNCTION_PASS("pgo-memop-opt", PGOMemOPSizeOpt())			FUNCTION_PASS("pgo-memop-opt", PGOMemOPSizeOpt())
	FUNCTION_PASS("print", PrintFunctionPass(dbgs()))			FUNCTION_PASS("print", PrintFunctionPass(dbgs()))
	FUNCTION_PASS("print<assumptions>", AssumptionPrinterPass(dbgs()))			FUNCTION_PASS("print<assumptions>", AssumptionPrinterPass(dbgs()))
	FUNCTION_PASS("print<block-freq>", BlockFrequencyPrinterPass(dbgs()))			FUNCTION_PASS("print<block-freq>", BlockFrequencyPrinterPass(dbgs()))
	FUNCTION_PASS("print<branch-prob>", BranchProbabilityPrinterPass(dbgs()))			FUNCTION_PASS("print<branch-prob>", BranchProbabilityPrinterPass(dbgs()))
	FUNCTION_PASS("print<cost-model>", CostModelPrinterPass(dbgs()))			FUNCTION_PASS("print<cost-model>", CostModelPrinterPass(dbgs()))
	FUNCTION_PASS("print<da>", DependenceAnalysisPrinterPass(dbgs()))			FUNCTION_PASS("print<da>", DependenceAnalysisPrinterPass(dbgs()))
	FUNCTION_PASS("print<divergence>", DivergenceAnalysisPrinterPass(dbgs()))			FUNCTION_PASS("print<divergence>", DivergenceAnalysisPrinterPass(dbgs()))
[204 lines not shown]

llvm/lib/Transforms/Scalar/CMakeLists.txt

[54 lines not shown]	add_llvm_component_library(LLVMScalarOpts
LowerWidenableCondition.cpp		LowerWidenableCondition.cpp
MakeGuardsExplicit.cpp		MakeGuardsExplicit.cpp
MemCpyOptimizer.cpp		MemCpyOptimizer.cpp
MergeICmps.cpp		MergeICmps.cpp
MergedLoadStoreMotion.cpp		MergedLoadStoreMotion.cpp
NaryReassociate.cpp		NaryReassociate.cpp
NewGVN.cpp		NewGVN.cpp
PartiallyInlineLibCalls.cpp		PartiallyInlineLibCalls.cpp
		PartialMemToReg.cpp
PlaceSafepoints.cpp		PlaceSafepoints.cpp
Reassociate.cpp		Reassociate.cpp
Reg2Mem.cpp		Reg2Mem.cpp
RewriteStatepointsForGC.cpp		RewriteStatepointsForGC.cpp
SCCP.cpp		SCCP.cpp
SROA.cpp		SROA.cpp
Scalar.cpp		Scalar.cpp
Scalarizer.cpp		Scalarizer.cpp
[29 lines not shown]

llvm/lib/Transforms/Scalar/PartialMemToReg.cpp

This file was added.

				//===- PartialMemToReg.cpp --------------------------------------- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				/// \file
				/// Provides a pass which runs a partial mem2reg operation on allocas which
				/// are deemed to be captured at some point but are used extensively
				/// beforehand.
				///
				//===----------------------------------------------------------------------===//

				#include "llvm/Transforms/Scalar/PartialMemToReg.h"
				#include "llvm/ADT/SetVector.h"
				#include "llvm/ADT/SmallVector.h"
				#include "llvm/Analysis/AssumptionCache.h"
				#include "llvm/Analysis/GlobalsModRef.h"
				#include "llvm/Analysis/LoopInfo.h"
				#include "llvm/Analysis/PostDominators.h"
				#include "llvm/Analysis/ValueTracking.h"
				#include "llvm/IR/Dominators.h"
				#include "llvm/IR/InstrTypes.h"
				#include "llvm/IR/Instruction.h"
				#include "llvm/IR/Instructions.h"
				#include "llvm/IR/IntrinsicInst.h"
				#include "llvm/IR/Intrinsics.h"
				#include "llvm/IR/PassManager.h"
				#include "llvm/IR/Type.h"
				#include "llvm/IR/Use.h"
				#include "llvm/IR/User.h"
				#include "llvm/IR/Value.h"
				#include "llvm/InitializePasses.h"
				#include "llvm/Pass.h"
				#include "llvm/Support/Casting.h"
				#include "llvm/Support/CommandLine.h"
				#include "llvm/Support/Compiler.h"
				#include "llvm/Support/Debug.h"
				#include "llvm/Support/ErrorHandling.h"
				#include "llvm/Support/MathExtras.h"
				#include "llvm/Support/raw_ostream.h"
				#include "llvm/Transforms/Scalar.h"
				#include "llvm/Transforms/Utils/PromoteMemToReg.h"

				using namespace llvm;

				#define DEBUG_TYPE "partial-mem2reg"

				namespace llvm {

				PreservedAnalyses PartialMemToReg::run(Function &F,
				FunctionAnalysisManager &AM) {
				auto &DT = AM.getResult<DominatorTreeAnalysis>(F);
				auto &PDT = AM.getResult<PostDominatorTreeAnalysis>(F);
				auto &AC = AM.getResult<AssumptionAnalysis>(F);
				auto &LI = AM.getResult<LoopAnalysis>(F);

				return runImpl(F, DT, PDT, AC, LI);
				}

				PreservedAnalyses PartialMemToReg::runImpl(Function &F, DominatorTree &DT,
				PostDominatorTree &PDT,
				AssumptionCache &AC, LoopInfo &LI) {
				LLVM_DEBUG(dbgs() << "PartialMem2Reg on: " << F.getName() << "\n");
				SmallVector<AllocaInst *, 16> Worklist;

				BasicBlock &EntryBB = F.getEntryBlock();
				for (Instruction &I : EntryBB)
				if (AllocaInst *AI = dyn_cast<AllocaInst>(&I))
				if (!isa<ScalableVectorType>(AI->getAllocatedType()))
				Worklist.push_back(AI);

				if (!partialPromoteMemToReg(Worklist, LI, DT, PDT))
				return PreservedAnalyses::all();

				PreservedAnalyses PA;
				PA.preserveSet<CFGAnalyses>();
				PA.preserve<GlobalsAA>();
				return PA;
				}

				class PartialMemToRegLegacyPass : public FunctionPass {
				PartialMemToReg Impl;

				public:
				static char ID;

				PartialMemToRegLegacyPass() : FunctionPass(ID) {
				initializePartialMemToRegLegacyPassPass(*PassRegistry::getPassRegistry());
				}

				bool runOnFunction(Function &F) override {
				if (skipFunction(F))
				return false;

				auto PA = Impl.runImpl(
				F, getAnalysis<DominatorTreeWrapperPass>().getDomTree(),
				getAnalysis<PostDominatorTreeWrapperPass>().getPostDomTree(),
				getAnalysis<AssumptionCacheTracker>().getAssumptionCache(F),
				getAnalysis<LoopInfoWrapperPass>().getLoopInfo());

				return !PA.areAllPreserved();
				}

				void getAnalysisUsage(AnalysisUsage &AU) const override {
				AU.addRequired<AssumptionCacheTracker>();
				AU.addRequired<DominatorTreeWrapperPass>();
				AU.addRequired<PostDominatorTreeWrapperPass>();
				AU.addRequired<LoopInfoWrapperPass>();
				AU.addPreserved<GlobalsAAWrapperPass>();
				AU.setPreservesCFG();
				}

				StringRef getPassName() const override { return "PartialMemToReg"; }
				};
				} // end namespace llvm

				char llvm::PartialMemToRegLegacyPass::ID = 0;

				FunctionPass *llvm::createPartialMemToRegPass() {
				return new PartialMemToRegLegacyPass();
				}

				INITIALIZE_PASS_BEGIN(PartialMemToRegLegacyPass, "partial-mem2reg",
				"PartialMemToReg", false, false)
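				INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)
				INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
				INITIALIZE_PASS_DEPENDENCY(LoopInfoWrapperPass)
				INITIALIZE_PASS_DEPENDENCY(PostDominatorTreeWrapperPass)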
				INITIALIZE_PASS_END(PartialMemToRegLegacyPass, "partial-mem2reg",
				"PartialMemToReg", false, false)
				No newline at end of file
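At the source level, the effect the pass is after for the reduction loop in the summary can be pictured roughly as follows. This is a hedged C++ model of the IR-level behavior, not code from the patch. `runtimeReduce` is a hypothetical stand-in for the OpenMP runtime call that receives the variable's address. The "before" and "after" functions only mimic where the loads and stores end up.

```cpp
#include <cassert>

// Hypothetical stand-in for the runtime call that is handed the address of
// the reduction variable (the "capture"); it reads the value through it.
int runtimeReduce(const int *Partial) { return *Partial; }

// Before: at the IR level Acc stays in memory for the whole loop because its
// address escapes afterwards, so each update is a load plus a store.
int countBefore(const int *Data, int N) {
  int Acc = 0;
  for (int I = 0; I < N; ++I)
    if (Data[I])
      Acc += 1;
  return runtimeReduce(&Acc);
}

// After (conceptually): the loop updates a value held in a register (a PHI in
// the loop header at the IR level), and one store publishes it to the
// variable's memory just before the capture.
int countAfter(const int *Data, int N) {
  int Acc = 0;
  int Reg = Acc;
  for (int I = 0; I < N; ++I)
    if (Data[I])
      Reg += 1;
  Acc = Reg; // the pre-capture store
  return runtimeReduce(&Acc);
}
```

Both functions compute the same count. The difference the pass targets is that, in the second form, the loop body no longer touches memory, which should make the loop amenable to vectorization.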

llvm/lib/Transforms/Scalar/Scalar.cpp

[86 lines not shown]	void llvm::initializeScalarOpts(PassRegistry &Registry) {
initializeLowerMatrixIntrinsicsLegacyPassPass(Registry);		initializeLowerMatrixIntrinsicsLegacyPassPass(Registry);
initializeLowerMatrixIntrinsicsMinimalLegacyPassPass(Registry);		initializeLowerMatrixIntrinsicsMinimalLegacyPassPass(Registry);
initializeLowerWidenableConditionLegacyPassPass(Registry);		initializeLowerWidenableConditionLegacyPassPass(Registry);
initializeMemCpyOptLegacyPassPass(Registry);		initializeMemCpyOptLegacyPassPass(Registry);
initializeMergeICmpsLegacyPassPass(Registry);		initializeMergeICmpsLegacyPassPass(Registry);
initializeMergedLoadStoreMotionLegacyPassPass(Registry);		initializeMergedLoadStoreMotionLegacyPassPass(Registry);
initializeNaryReassociateLegacyPassPass(Registry);		initializeNaryReassociateLegacyPassPass(Registry);
initializePartiallyInlineLibCallsLegacyPassPass(Registry);		initializePartiallyInlineLibCallsLegacyPassPass(Registry);
		initializePartialMemToRegLegacyPassPass(Registry);
initializeReassociateLegacyPassPass(Registry);		initializeReassociateLegacyPassPass(Registry);
initializeRedundantDbgInstEliminationPass(Registry);		initializeRedundantDbgInstEliminationPass(Registry);
initializeRegToMemLegacyPass(Registry);		initializeRegToMemLegacyPass(Registry);
initializeRewriteStatepointsForGCLegacyPassPass(Registry);		initializeRewriteStatepointsForGCLegacyPassPass(Registry);
initializeScalarizeMaskedMemIntrinLegacyPassPass(Registry);		initializeScalarizeMaskedMemIntrinLegacyPassPass(Registry);
initializeSCCPLegacyPassPass(Registry);		initializeSCCPLegacyPassPass(Registry);
initializeSROALegacyPassPass(Registry);		initializeSROALegacyPassPass(Registry);
initializeCFGSimplifyPassPass(Registry);		initializeCFGSimplifyPassPass(Registry);
[206 lines not shown]

llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp

[19 lines not shown]

#include "llvm/ADT/SmallPtrSet.h"

#include "llvm/ADT/SmallVector.h"

#include "llvm/ADT/Statistic.h"

#include "llvm/ADT/TinyPtrVector.h"

#include "llvm/ADT/Twine.h"

#include "llvm/Analysis/AssumptionCache.h"

#include "llvm/Analysis/InstructionSimplify.h"

#include "llvm/Analysis/IteratedDominanceFrontier.h"

#include "llvm/Transforms/Utils/Local.h"

#include "llvm/Analysis/LoopInfo.h"

#include "llvm/Analysis/PostDominators.h"

#include "llvm/Analysis/ValueTracking.h"

#include "llvm/IR/BasicBlock.h"

#include "llvm/IR/CFG.h"

#include "llvm/IR/Constant.h"

#include "llvm/IR/Constants.h"

#include "llvm/IR/DIBuilder.h"

#include "llvm/IR/DerivedTypes.h"

#include "llvm/IR/Dominators.h"

#include "llvm/IR/Function.h"

#include "llvm/IR/InstrTypes.h"

#include "llvm/IR/Instruction.h"

#include "llvm/IR/Instructions.h"

#include "llvm/IR/IntrinsicInst.h"

#include "llvm/IR/Intrinsics.h"

#include "llvm/IR/LLVMContext.h"

#include "llvm/IR/Module.h"

#include "llvm/IR/Type.h"

#include "llvm/IR/User.h"

#include "llvm/Support/Casting.h"

#include "llvm/Transforms/Utils/Local.h"

#include "llvm/Transforms/Utils/PromoteMemToReg.h"

#include <algorithm>

#include <cassert>

#include <iterator>

#include <utility>

#include <vector>

using namespace llvm;

#define DEBUG_TYPE "mem2reg"

STATISTIC(NumLocalPromoted, "Number of alloca's promoted within one block");

STATISTIC(NumSingleStore, "Number of alloca's promoted with a single store");

STATISTIC(NumDeadAlloca, "Number of dead alloca's removed");

STATISTIC(NumPHIInsert, "Number of PHI nodes inserted");

bool llvm::isAllocaPromotable(const AllocaInst *AI) {

bool llvm::isAllocaPromotable(const AllocaInst *AI, bool AllowCaptures) {

// Only allow direct and non-volatile loads and stores...

for (const User *U : AI->users()) {

if (const LoadInst *LI = dyn_cast<LoadInst>(U)) {

// Note that atomic loads can be transformed; atomic semantics do

// not have any meaning for a local alloca.

if (LI->isVolatile())

return false;

} else if (const StoreInst *SI = dyn_cast<StoreInst>(U)) {

if (SI->getValueOperand() == AI ||

// Don't allow a store OF the AI, only INTO the AI, unless we're

SI->getValueOperand()->getType() != AI->getAllocatedType())

// looking for captures during partialmem2reg.

return false; // Don't allow a store OF the AI, only INTO the AI.

if ((SI->getValueOperand() == AI ||

SI->getValueOperand()->getType() != AI->getAllocatedType()) &&

Meinersbur: This is not a sufficient condition for captures. I doubt that we can detect that something has been generated from a CapturedStmt just by looking at the IR.

!AllowCaptures)

return false;

// Note that atomic stores can be transformed; atomic semantics do

// not have any meaning for a local alloca.

if (SI->isVolatile())

return false;

} else if (const IntrinsicInst *II = dyn_cast<IntrinsicInst>(U)) {

if (!II->isLifetimeStartOrEnd() && !II->isDroppable())

return false;

} else if (const BitCastInst *BCI = dyn_cast<BitCastInst>(U)) {

[12 lines not shown]

for (const User *U : AI->users()) {

}

return true;

}

namespace {

struct AllocaInfo {

using DbgUserVec = SmallVector<DbgVariableIntrinsic *, 1>;

lebedev.ri:

bool NoCaptures = true;

- for (unsigned i = 0; i < CI->arg_size(); ++i)

- if (CI->getArgOperand(i) == AI)

+ for (Value *Arg : CI->args())

+ if (Arg == AI)

NoCaptures &= CI->paramHasAttr(i, Attribute::NoCapture);
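As written, the suggested range-based loop appears to drop the index `i` that `paramHasAttr(i, ...)` on the last line still needs. The index therefore has to be kept some other way; in LLVM, for example, by iterating over `llvm::enumerate(CI->args())`. Below is a minimal, LLVM-free model of the intended check. `CallModel` and `allUsesNoCapture` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a call site: each argument is a pointer value, and
// ParamNoCapture[I] says whether parameter I carries a nocapture attribute.
struct CallModel {
  std::vector<const void *> Args;
  std::vector<bool> ParamNoCapture;
};

// Returns true if every argument slot that receives Ptr is nocapture. The
// index is kept because the attribute is looked up per parameter position.
bool allUsesNoCapture(const CallModel &CI, const void *Ptr) {
  bool NoCaptures = true;
  for (std::size_t I = 0; I < CI.Args.size(); ++I)
    if (CI.Args[I] == Ptr)
      NoCaptures = NoCaptures && CI.ParamNoCapture[I];
  return NoCaptures;
}
```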

SmallVector<BasicBlock *, 32> DefiningBlocks;

SmallVector<BasicBlock *, 32> UsingBlocks;

StoreInst *OnlyStore;

BasicBlock *OnlyBlock;

bool OnlyUsedInOneBlock;

DbgUserVec DbgUsers;

void clear() {

DefiningBlocks.clear();

UsingBlocks.clear();

OnlyStore = nullptr;

OnlyBlock = nullptr;

OnlyUsedInOneBlock = true;

DbgUsers.clear();

}

/// Scan the uses of the specified alloca, filling in the AllocaInfo used

/// by the rest of the pass to reason about the uses of this alloca.

void AnalyzeAlloca(AllocaInst *AI) {

void AnalyzeAlloca(AllocaInst *AI, bool Capturing = false,

Lint: Pre-merge checks

clang-tidy: warning: invalid case style for function 'AnalyzeAlloca' [readability-identifier-naming]

PostDominatorTree *PDT = nullptr,

StoreInst *Capture = nullptr) {

clear();

// As we scan the uses of the alloca instruction, keep track of stores,

// and decide whether all of the loads and stores to the alloca are within

// the same basic block.

for (User *U : AI->users()) {

Instruction *User = cast<Instruction>(U);

// If we're analyzing a captured alloca, only consider the users that

// are postdominated by the capture. Anything after the capture or

// in a block which may bypass the capture should not be converted

// to SSA form.

if (Capturing && !PDT->dominates(Capture, User))

continue;

if (StoreInst *SI = dyn_cast<StoreInst>(User)) {

// Remember the basic blocks which define new values for the alloca

if (SI->getOperand(0) == AI) {

assert(Capturing && "Unexpected capture for non-captured alloca.");

UsingBlocks.push_back(SI->getParent());

} else {

DefiningBlocks.push_back(SI->getParent());

OnlyStore = SI;

} else {

}

LoadInst *LI = cast<LoadInst>(User);

} else if (LoadInst *LI = dyn_cast<LoadInst>(User)) {

// Otherwise it must be a load instruction, keep track of variable

// reads.

UsingBlocks.push_back(LI->getParent());

}

} else

assert(Capturing && "Unexpected user for non-captured alloca.");

if (OnlyUsedInOneBlock) {

if (!OnlyBlock)

OnlyBlock = User->getParent();

else if (OnlyBlock != User->getParent())

OnlyUsedInOneBlock = false;

}

[84 lines not shown]

struct PromoteMem2Reg {

/// The PhiNodes we're adding.

///

/// That map is used to simplify some Phi nodes as we iterate over it, so

/// it should have deterministic iterators. We could use a MapVector, but

/// since we already maintain a map from BasicBlock* to a stable numbering

/// (BBNumbers), the DenseMap is more efficient (also supports removal).

DenseMap<std::pair<unsigned, unsigned>, PHINode *> NewPhiNodes;

DenseMap<unsigned, StoreInst *> NewPreCaptureStores;

DenseMap<unsigned, Instruction *> Captures;

/// For each PHI node, keep track of which entry in Allocas it corresponds

/// to.

DenseMap<PHINode *, unsigned> PhiToAllocaMap;

/// For each alloca, we keep track of the dbg.declare intrinsic that

/// describes it, if any, so that we can convert it to a dbg.value

/// intrinsic if the alloca gets promoted.

SmallVector<AllocaInfo::DbgUserVec, 8> AllocaDbgUsers;

[12 lines not shown]

public:

PromoteMem2Reg(ArrayRef<AllocaInst *> Allocas, DominatorTree &DT,

AssumptionCache *AC)

: Allocas(Allocas.begin(), Allocas.end()), DT(DT),

DIB(*DT.getRoot()->getParent()->getParent(), /*AllowUnresolved*/ false),

AC(AC), SQ(DT.getRoot()->getParent()->getParent()->getDataLayout(),

nullptr, &DT, AC) {}

void run();

bool runPartial(PostDominatorTree &PDT, LoopInfo &LI);

private:

void RemoveFromAllocasList(unsigned &AllocaIdx) {

Allocas[AllocaIdx] = Allocas.back();

Allocas.pop_back();

--AllocaIdx;

}

unsigned getNumPreds(const BasicBlock *BB) {

unsigned &NP = BBNumPreds[BB];

if (NP == 0)

NP = pred_size(BB) + 1;

return NP - 1;

}

void ComputeLiveInBlocks(AllocaInst *AI, AllocaInfo &Info,

const SmallPtrSetImpl<BasicBlock *> &DefBlocks,

SmallPtrSetImpl<BasicBlock *> &LiveInBlocks);

void RenamePass(BasicBlock *BB, BasicBlock *Pred,

RenamePassData::ValVector &IncVals,

RenamePassData::LocationVector &IncLocs,

std::vector<RenamePassData> &Worklist);

bool QueuePhiNode(BasicBlock *BB, unsigned AllocaIdx, unsigned &Version);

bool QueuePreCaptureStore(BasicBlock *BB, unsigned AllocaIdx);

Lint: Pre-merge checks

clang-tidy: warning: invalid case style for function 'QueuePreCaptureStore' [readability-identifier-naming]

};

} // end anonymous namespace
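The two small helpers `RemoveFromAllocasList` and `getNumPreds` rely on tricks that are easy to miss. The first is swap-and-pop removal during an index-based loop, including unsigned wraparound when the index is 0. The second is caching `count + 1` so that a zero entry means "not computed yet". Below is a minimal model with standard containers; `removeAt`, `keepOdd`, and `PredCountCache` are hypothetical names.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Swap-and-pop removal while iterating by index: the removed slot is refilled
// from the back and the index is stepped back so the caller's ++Idx revisits
// that slot. For Idx == 0 the unsigned decrement wraps around and the
// caller's ++ wraps it back to 0.
void removeAt(std::vector<int> &V, unsigned &Idx) {
  V[Idx] = V.back();
  V.pop_back();
  --Idx;
}

// Keeps only the odd values; the order of the survivors is not preserved.
std::vector<int> keepOdd(std::vector<int> V) {
  for (unsigned Idx = 0; Idx != V.size(); ++Idx)
    if (V[Idx] % 2 == 0)
      removeAt(V, Idx);
  return V;
}

// Caches "count + 1" so that the zero that operator[] value-initializes
// means "not computed yet", even for blocks with no predecessors.
struct PredCountCache {
  std::unordered_map<int, unsigned> Cache;
  unsigned Computations = 0;

  unsigned numPreds(int Block, const std::vector<std::vector<int>> &Preds) {
    unsigned &NP = Cache[Block];
    if (NP == 0) {
      ++Computations;
      NP = static_cast<unsigned>(Preds[Block].size()) + 1;
    }
    return NP - 1;
  }
};
```

Storing `count + 1` is what lets a block with no predecessors, such as the entry block, be cached. Storing the raw count would recompute it on every call.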

/// Given a LoadInst LI this adds assume(LI != null) after it.

static void addAssumeNonNull(AssumptionCache *AC, LoadInst *LI) {

Function *AssumeIntrinsic =

Intrinsic::getDeclaration(LI->getModule(), Intrinsic::assume);

[233 lines not shown]

static bool promoteSingleBlockAlloca(AllocaInst *AI, const AllocaInfo &Info,

for (DbgVariableIntrinsic *DII : Info.DbgUsers)

if (DII->isAddressOfVariable() || DII->getExpression()->startsWithDeref())

DII->eraseFromParent();

++NumLocalPromoted;

return true;

}

bool PromoteMem2Reg::runPartial(PostDominatorTree &PDT, LoopInfo &LI) {

Function &F = *DT.getRoot()->getParent();

AllocaDbgUsers.resize(Allocas.size());

AllocaInfo Info;

LargeBlockInfo LBI;

ForwardIDFCalculator IDF(DT);

// Create a stable numbering for basic blocks to avoid any non-deterministic

// behaviour with ordering.

if (BBNumbers.empty()) {

unsigned ID = 0;

for (auto &BB : F)

BBNumbers[&BB] = ID++;

}

for (unsigned AllocaNum = 0; AllocaNum != Allocas.size(); ++AllocaNum) {

AllocaInst *AI = Allocas[AllocaNum];

StoreInst *Capture = nullptr;

LLVM_DEBUG(dbgs() << "PM2R: Analyzing alloca: " << *AI << "\n");

if (!isAllocaPromotable(AI, /*AllowCaptures=*/true)) {

LLVM_DEBUG(dbgs() << "PM2R: Unhandled uses.\n");

RemoveFromAllocasList(AllocaNum);

continue;

}

// For now, be a bit paranoid and only consider allocas with a single

// capture user, and only stores as a capture.

// TODO: Allow for multiple captures and captures in call instructions.

bool MultiCapture = false;

for (User *U : AI->users()) {

if (StoreInst *SI = dyn_cast<StoreInst>(U))

if (SI->getValueOperand() == AI) {

if (!Capture)

Capture = SI;

else

MultiCapture = true;

}

}

if (!Capture || MultiCapture) {

LLVM_DEBUG(dbgs() << "PM2R: No capture or multiple captures.\n");

RemoveFromAllocasList(AllocaNum);

continue;

}

Captures[AllocaNum] = Capture;

// Find the set of users that are postdominated by the capture.

SmallVector<Instruction *, 8> PreCaptureUsers;

for (User *U : AI->users()) {

Instruction *I = cast<Instruction>(U);

// Ignore the capture itself.

if (Capture == I)

continue;

// We only consider users that are postdominated by the capture -- that

// is, we _know_ that the capture will definitely be executed after the

// user. For users that are executed after the capture, or users where

// the subsequent execution path might not go through the block containing

// the capture, we don't want to convert right now.

if (PDT.dominates(Capture, I))

PreCaptureUsers.push_back(I);

}

jdoerfert: I doubt this logic works in loops.

    H:
       I = use(alloca);
    C: store alloca into mem
       if (...) goto H;

Capture (C) post-dominates the user (I), but it is executed *after* and *before* the use, just not in the same iteration of the loop defined by H.

Once the alloca is captured you cannot judge anymore without a lot more analysis (incl. nosync). To salvage this, reachability, not post-dominance, is what you are looking for.

All that said, I still believe the problem at hand should be solved by marking the reduction thing as not capturing.
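jdoerfert's counterexample can be checked mechanically. The sketch below computes post-dominator sets with the textbook iterative data-flow equations, then runs a plain reachability query. It uses a hypothetical four-block CFG:

- 0 = entry
- 1 = H, holding the use
- 2 = C, holding the capture and the back edge
- 3 = exit

It assumes a single exit that every block reaches. The function names are illustrative, not LLVM APIs.

```cpp
#include <cassert>
#include <set>
#include <vector>

using Graph = std::vector<std::vector<int>>; // successor lists

// Post-dominator sets via the iterative data-flow formulation:
// PDom(Exit) = {Exit}; PDom(B) = {B} union (intersection of PDom(S) over
// all successors S of B). Assumes one exit reachable from every block.
std::vector<std::set<int>> postDominators(const Graph &G, int Exit) {
  const int N = static_cast<int>(G.size());
  std::set<int> All;
  for (int B = 0; B < N; ++B)
    All.insert(B);
  std::vector<std::set<int>> PDom(N, All);
  PDom[Exit] = {Exit};
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (int B = 0; B < N; ++B) {
      if (B == Exit)
        continue;
      std::set<int> Meet = All;
      for (int S : G[B]) {
        std::set<int> Tmp;
        for (int X : Meet)
          if (PDom[S].count(X))
            Tmp.insert(X);
        Meet = Tmp;
      }
      Meet.insert(B);
      if (Meet != PDom[B]) {
        PDom[B] = Meet;
        Changed = true;
      }
    }
  }
  return PDom;
}

// True if To can be reached from From by following at least one edge.
bool reachableAfter(const Graph &G, int From, int To) {
  std::vector<bool> Seen(G.size(), false);
  std::vector<int> Stack(G[From].begin(), G[From].end());
  while (!Stack.empty()) {
    int B = Stack.back();
    Stack.pop_back();
    if (B == To)
      return true;
    if (Seen[B])
      continue;
    Seen[B] = true;
    for (int S : G[B])
      Stack.push_back(S);
  }
  return false;
}
```

The capture block post-dominates the loop header, yet the header is reachable from it, so a use there can run after the capture. Asking whether the capture can reach the user is the check the comment points to.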

// If there are no users postdominated by the capture, we won't try this

// since anything after the capture could be reached without storing

// the value into the alloca location.

// TODO: We should be able to identify all blocks which need to store

// the value into memory before entering code that may follow a capture.

if (PreCaptureUsers.empty()) {

LLVM_DEBUG(dbgs() << "PM2R: No users postdominated by capture.\n");

RemoveFromAllocasList(AllocaNum);

continue;

}

// For now, only perform the partial conversion if some of the uses are

// present in a loop -- while it may be worthwhile to do this anyway,

// we're currently interested in enabling loop transformations that would

// otherwise be prevented by the presence of loads/stores to the alloca

// within the loop.

bool UsedInLoop = false;

for (Loop *L : LI)

UsedInLoop |= any_of(PreCaptureUsers,

[&L](Instruction *I) { return L->contains(I); });

if (!UsedInLoop) {

LLVM_DEBUG(dbgs() << "PM2R: No users in loops.\n");

RemoveFromAllocasList(AllocaNum);

continue;

}

// Determine which blocks define, use, and/or capture the alloca.

Info.AnalyzeAlloca(AI, /* Capturing == */ true, &PDT, Capture);

// Unique the set of defining blocks for efficient lookup.

SmallPtrSet<BasicBlock *, 32> DefBlocks(Info.DefiningBlocks.begin(),

Info.DefiningBlocks.end());

// Determine which blocks the value is live in. These are blocks which lead

// to uses.

SmallPtrSet<BasicBlock *, 32> LiveInBlocks;

ComputeLiveInBlocks(AI, Info, DefBlocks, LiveInBlocks);

if (!all_of(LiveInBlocks, [&PDT, &Capture](BasicBlock *BB) {

return PDT.dominates(Capture->getParent(), BB);

})) {

LLVM_DEBUG(

dbgs()

<< "PM2R: not all live blocks are postdominated by capture.\n");

RemoveFromAllocasList(AllocaNum);

continue;

}

// Ok, we've passed all our criteria for partially promoting an alloca.

// Proceed with figuring out what to do with it but throw in a few extra

// checks out of an abundance of caution.

// Calculate dominance frontiers so we know where to plant phi nodes for

// SSA conversion.

IDF.setLiveInBlocks(LiveInBlocks);

IDF.setDefiningBlocks(DefBlocks);

SmallVector<BasicBlock *, 32> PHIBlocks;

IDF.calculate(PHIBlocks);

llvm::sort(PHIBlocks, [this](BasicBlock *A, BasicBlock *B) {

return BBNumbers.find(A)->second < BBNumbers.find(B)->second;

});

if (PHIBlocks.empty()) {

LLVM_DEBUG(dbgs() << "PM2R: could not identify a usable phi block\n");

RemoveFromAllocasList(AllocaNum);

continue;

}

// Only proceed if all phi blocks are postdominated by the capture.

// Maybe this should be an assert?

if (!all_of(PHIBlocks, [&PDT, &Capture](BasicBlock *BB) {

return PDT.dominates(Capture->getParent(), BB);

})) {

LLVM_DEBUG(

dbgs() << "PM2R: not all phi blocks are postdominated by capture.\n");

RemoveFromAllocasList(AllocaNum);

continue;

}

// We may want to find a better way of doing this in future, but for

// now just be paranoid and sort the blocks by postdomination order

// before adding the store to ensure the correct value is in place

// for the capture.

// TODO: Add support for phis on diverging paths that are still

// postdominated by the capture.

SmallVector<BasicBlock *, 32> PDOrderBlocks(PHIBlocks.begin(),

PHIBlocks.end());

llvm::sort(PDOrderBlocks, [&PDT](BasicBlock *A, BasicBlock *B) {

// Use the strict relation: a sort comparator must return false for A == B.

return PDT.properlyDominates(B, A);

});

BasicBlock *DefBlock = PDOrderBlocks.back();

// Maybe this should be an assert?

if (!all_of(PDOrderBlocks, [&PDT, &DefBlock](BasicBlock *BB) {

return PDT.dominates(DefBlock, BB);

})) {

LLVM_DEBUG(dbgs() << "PM2R: definition block for capture doesn't "

"postdominate all other phi blocks.\n");

RemoveFromAllocasList(AllocaNum);

continue;

}

// Point of no return; we're making changes to the IR now.

// Remember the dbg.declare intrinsic describing this alloca, if any.

if (!Info.DbgUsers.empty())

AllocaDbgUsers[AllocaNum] = Info.DbgUsers;

LLVM_DEBUG(dbgs() << "PM2R: Partially promoting alloca: " << *AI << "\n");

// Keep the reverse mapping of the 'Allocas' array for the rename pass.

AllocaLookup[Allocas[AllocaNum]] = AllocaNum;

unsigned CurrentVersion = 0;

for (BasicBlock *BB : PHIBlocks)

QueuePhiNode(BB, AllocaNum, CurrentVersion);

// Create the store that will set the allocated memory to the right value

// before the capture occurs.

QueuePreCaptureStore(DefBlock, AllocaNum);

}
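The `PDOrderBlocks` step sorts PHI blocks by post-dominance. However, `llvm::sort` (like `std::sort`) requires a strict weak ordering. A comparator built from a reflexive relation such as `dominates` is invalid, and even the strict relation only partially orders blocks on diverging paths.

One alternative is to find the block that post-dominates all the others with a single scan, then verify it. The later `DefBlock` check tests that same property anyway. The sketch below works on a hypothetical integer-indexed post-dominator tree, given as immediate-post-dominator links.

```cpp
#include <cassert>
#include <vector>

// Hypothetical post-dominator tree given as parent links: IPDom[B] is the
// immediate post-dominator of B, or -1 for the root.
bool postDominates(const std::vector<int> &IPDom, int A, int B) {
  for (int N = B; N != -1; N = IPDom[N])
    if (N == A)
      return true; // reflexive: every block post-dominates itself
  return false;
}

// Returns the element of Blocks that post-dominates all the others, or -1 if
// there is none. One linear scan plus a verification pass; no sort needed.
int blockPostDominatingAll(const std::vector<int> &IPDom,
                           const std::vector<int> &Blocks) {
  if (Blocks.empty())
    return -1;
  int Cand = Blocks.front();
  for (int B : Blocks)
    if (postDominates(IPDom, B, Cand))
      Cand = B;
  for (int B : Blocks)
    if (!postDominates(IPDom, Cand, B))
      return -1;
  return Cand;
}
```

If some block post-dominates all the others, the scan must end on it, because post-dominance is antisymmetric. Otherwise the verification fails, and the caller can bail out as the existing code does.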

if (Allocas.empty())

return false;

RenamePassData::ValVector Values(Allocas.size());

for (unsigned i = 0, e = Allocas.size(); i != e; ++i)

Lint: Pre-merge checks

clang-tidy: warning: invalid case style for variable 'i' [readability-identifier-naming]
clang-tidy: warning: invalid case style for variable 'e' [readability-identifier-naming]

Values[i] = UndefValue::get(Allocas[i]->getAllocatedType());

// When handling debug info, treat all incoming values as if they have unknown

// locations until proven otherwise.

RenamePassData::LocationVector Locations(Allocas.size());

// Walks all basic blocks in the function performing the SSA rename algorithm

// and inserting the phi nodes we marked as necessary

std::vector<RenamePassData> RenamePassWorkList;

RenamePassWorkList.emplace_back(&F.front(), nullptr, std::move(Values),

std::move(Locations));

do {

RenamePassData RPD = std::move(RenamePassWorkList.back());

RenamePassWorkList.pop_back();

// RenamePass may add new worklist entries.

RenamePass(RPD.BB, RPD.Pred, RPD.Values, RPD.Locations, RenamePassWorkList);

} while (!RenamePassWorkList.empty());

// The renamer uses the Visited set to avoid infinite loops. Clear it now.

Visited.clear();

// Loop over all of the PHI nodes and see if there are any that we can get

// rid of because they merge all of the same incoming values. This can

// happen due to undef values coming into the PHI nodes. This process is

// iterative, because eliminating one PHI node can cause others to be removed.

bool EliminatedAPHI = true;

while (EliminatedAPHI) {

EliminatedAPHI = false;

// Iterating over NewPhiNodes is deterministic, so it is safe to try to

// simplify and RAUW them as we go. If it was not, we could add uses to

// the values we replace with in a non-deterministic order, thus creating

// non-deterministic def->use chains.

for (DenseMap<std::pair<unsigned, unsigned>, PHINode *>::iterator

I = NewPhiNodes.begin(),

E = NewPhiNodes.end();

I != E;) {

PHINode *PN = I->second;

// If this PHI node merges one value and/or undefs, get the value.

if (Value *V = SimplifyInstruction(PN, SQ)) {

PN->replaceAllUsesWith(V);

PN->eraseFromParent();

NewPhiNodes.erase(I++);

EliminatedAPHI = true;

continue;

}

++I;

}

}

// At this point, the renamer has added entries to PHI nodes for all reachable

// code. Unfortunately, there may be unreachable blocks which the renamer

// hasn't traversed. If this is the case, the PHI nodes may not

// have incoming values for all predecessors. Loop over all PHI nodes we have

// created, inserting undef values if they are missing any incoming values.

for (DenseMap<std::pair<unsigned, unsigned>, PHINode *>::iterator

I = NewPhiNodes.begin(),

E = NewPhiNodes.end();

I != E; ++I) {

// We want to do this once per basic block. As such, only process a block

// when we find the PHI that is the first entry in the block.

PHINode *SomePHI = I->second;

BasicBlock *BB = SomePHI->getParent();

if (&BB->front() != SomePHI)

continue;

// Only do work here if the PHI nodes are missing incoming values. We

// know that all PHI nodes that were inserted in a block will have the same

// number of incoming values, so we can just check any of them.

if (SomePHI->getNumIncomingValues() == getNumPreds(BB))

continue;

// Get the preds for BB.

SmallVector<BasicBlock *, 16> Preds(predecessors(BB));

// Ok, now we know that all of the PHI nodes are missing entries for some

// basic blocks. Start by sorting the incoming predecessors for efficient

// access.

auto CompareBBNumbers = [this](BasicBlock *A, BasicBlock *B) {

return BBNumbers.find(A)->second < BBNumbers.find(B)->second;

};

llvm::sort(Preds, CompareBBNumbers);

// Now we loop through all BB's which have entries in SomePHI and remove

// them from the Preds list.

for (unsigned i = 0, e = SomePHI->getNumIncomingValues(); i != e; ++i) {

Lint: Pre-merge checks

clang-tidy: warning: invalid case style for variable 'i' [readability-identifier-naming]
clang-tidy: warning: invalid case style for variable 'e' [readability-identifier-naming]

// Do a log(n) search of the Preds list for the entry we want.

SmallVectorImpl<BasicBlock *>::iterator EntIt = llvm::lower_bound(

Preds, SomePHI->getIncomingBlock(i), CompareBBNumbers);

assert(EntIt != Preds.end() && *EntIt == SomePHI->getIncomingBlock(i) &&

"PHI node has entry for a block which is not a predecessor!");

// Remove the entry

Preds.erase(EntIt);

}

// At this point, the blocks left in the preds list must have dummy

// entries inserted into every PHI node for the block. Update all the phi

// nodes in this block that we are inserting (there could be phis before

// mem2reg runs).

unsigned NumBadPreds = SomePHI->getNumIncomingValues();

BasicBlock::iterator BBI = BB->begin();

while ((SomePHI = dyn_cast<PHINode>(BBI++)) &&

SomePHI->getNumIncomingValues() == NumBadPreds) {

Value *UndefVal = UndefValue::get(SomePHI->getType());

for (BasicBlock *Pred : Preds)

SomePHI->addIncoming(UndefVal, Pred);

}

NewPhiNodes.clear();

return true;

}

void PromoteMem2Reg::run() {
  Function &F = *DT.getRoot()->getParent();

  AllocaDbgUsers.resize(Allocas.size());

  AllocaInfo Info;
  LargeBlockInfo LBI;
  ForwardIDFCalculator IDF(DT);

[311 lines not shown]

bool PromoteMem2Reg::QueuePhiNode(BasicBlock *BB, unsigned AllocaNo,
  PN = PHINode::Create(Allocas[AllocaNo]->getAllocatedType(), getNumPreds(BB),
                       Allocas[AllocaNo]->getName() + "." + Twine(Version++),
                       &BB->front());
  ++NumPHIInsert;
  PhiToAllocaMap[PN] = AllocaNo;
  return true;
}

/// Adds a store after the last phi node before a capturing store, so that
/// the value is up-to-date before the capture of the alloca.
bool PromoteMem2Reg::QueuePreCaptureStore(BasicBlock *BB, unsigned AllocaIdx) {
  StoreInst *&SI = NewPreCaptureStores[AllocaIdx];
  PHINode *&PN = NewPhiNodes[std::make_pair(BBNumbers[BB], AllocaIdx)];
  AllocaInst *AI = Allocas[AllocaIdx];

  if (SI)
    return false;

  assert(PN && "No Phi node available for capture!\n");
  Instruction *InsertBefore = BB->getFirstNonPHI();
  SI = new StoreInst(PN, AI, InsertBefore);
  return true;
}

/// Update the debug location of a phi. \p ApplyMergedLoc indicates whether to
/// create a merged location incorporating \p DL, or to set \p DL directly.
static void updateForIncomingValueLocation(PHINode *PN, DebugLoc DL,
                                           bool ApplyMergedLoc) {
  if (ApplyMergedLoc)
    PN->applyMergedLocation(PN->getDebugLoc(), DL);
  else
    PN->setDebugLoc(DL);

[68 lines not shown]

    if (LoadInst *LI = dyn_cast<LoadInst>(I)) {
      AllocaInst *Src = dyn_cast<AllocaInst>(LI->getPointerOperand());
      if (!Src)
        continue;

      DenseMap<AllocaInst *, unsigned>::iterator AI = AllocaLookup.find(Src);
      if (AI == AllocaLookup.end())
        continue;

      unsigned AllocaNo = AI->second;

      // If this is a load after the capture (but potentially in the same block)
      // then we must not convert it.
      if (Captures[AllocaNo] && DT.dominates(Captures[AllocaNo], LI))
        continue;

      Value *V = IncomingVals[AllocaNo];

      // If the load was marked as nonnull we don't want to lose
      // that information when we erase this Load. So we preserve
      // it with an assume.
      if (AC && LI->getMetadata(LLVMContext::MD_nonnull) &&
          !isKnownNonZero(V, SQ.DL, 0, AC, LI, &DT))
        addAssumeNonNull(AC, LI);

      // Anything using the load now uses the current value.
      LI->replaceAllUsesWith(V);
      BB->getInstList().erase(LI);
    } else if (StoreInst *SI = dyn_cast<StoreInst>(I)) {
      // Delete this instruction and mark the name as the current holder of the
      // value
      AllocaInst *Dest = dyn_cast<AllocaInst>(SI->getPointerOperand());
      if (!Dest)
        continue;

      DenseMap<AllocaInst *, unsigned>::iterator AI = AllocaLookup.find(Dest);
      if (AI == AllocaLookup.end())
        continue;

      unsigned AllocaNo = AI->second;
      StoreInst *&PCS = NewPreCaptureStores[AllocaNo];

      // If this is for a capture, then we don't want to remove it.
      if (PCS && SI == PCS)
        continue;

      // what value were we writing?
      IncomingVals[AllocaNo] = SI->getOperand(0);

      // Record debuginfo for the store before removing it.
      IncomingLocs[AllocaNo] = SI->getDebugLoc();
      for (DbgVariableIntrinsic *DII : AllocaDbgUsers[AllocaNo])
        if (DII->isAddressOfVariable())
          ConvertDebugDeclareToDebugValue(DII, SI, DIB);
      BB->getInstList().erase(SI);
    }
  }

  // 'Recurse' to our successors.
  succ_iterator I = succ_begin(BB), E = succ_end(BB);

[19 lines not shown]

void llvm::PromoteMemToReg(ArrayRef<AllocaInst *> Allocas, DominatorTree &DT,
                           AssumptionCache *AC) {
  // If there is nothing to do, bail out...
  if (Allocas.empty())
    return;

  PromoteMem2Reg(Allocas, DT, AC).run();
}

bool llvm::partialPromoteMemToReg(ArrayRef<AllocaInst *> Allocas, LoopInfo &LI,
                                  DominatorTree &DT, PostDominatorTree &PDT,
                                  AssumptionCache *AC) {
  return PromoteMem2Reg(Allocas, DT, AC).runPartial(PDT, LI);
}

No newline at end of file
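A standalone sketch of the predecessor bookkeeping in the PHI fix-up loop above: sort the predecessors once by block number, binary-search and erase each block that already has an incoming entry, and treat whatever remains as needing a placeholder (`undef`) entry. It assumes blocks are identified by name and numbered through a plain `std::unordered_map`; `BlockNumbering` and `missingIncoming` are hypothetical names for illustration, not LLVM APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for BBNumbers: maps a block name to its position.
using BlockNumbering = std::unordered_map<std::string, unsigned>;

// Returns the predecessors that have no incoming entry in a PHI, mirroring
// the sort + lower_bound + erase pattern of the fix-up loop.
std::vector<std::string>
missingIncoming(std::vector<std::string> Preds,
                const std::vector<std::string> &Incoming,
                const BlockNumbering &Numbers) {
  auto Compare = [&Numbers](const std::string &A, const std::string &B) {
    return Numbers.at(A) < Numbers.at(B);
  };

  // Sort once so that each lookup below is a binary search.
  std::sort(Preds.begin(), Preds.end(), Compare);

  for (const std::string &Block : Incoming) {
    auto It = std::lower_bound(Preds.begin(), Preds.end(), Block, Compare);
    assert(It != Preds.end() && *It == Block &&
           "PHI node has entry for a block which is not a predecessor!");
    // Erase exactly one copy: each predecessor edge has its own PHI entry.
    Preds.erase(It);
  }

  // Whatever is left still needs a placeholder incoming value.
  return Preds;
}
```

Erasing a single match per incoming entry keeps duplicate predecessors correct (for example a `switch` with several edges to the same block), since a PHI carries one entry per incoming edge.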

llvm/test/Transforms/Mem2Reg/partial-mem2reg.ll

This file was added.

				; RUN: opt -partial-mem2reg -debug-only=partial-mem2reg,mem2reg -S < %s 2>&1 | FileCheck %s --check-prefix=DEBUG
				; RUN: opt -partial-mem2reg -S < %s 2>&1 | FileCheck %s --check-prefix=XFORM
				; RUN: opt -partial-mem2reg -gvn -loop-vectorize -debug-only=loop-vectorize -S < %s 2>&1 | FileCheck %s --check-prefix=VDEBUG
Inline comment by Meinersbur (unsubmitted): This tests too many passes at once.
				; REQUIRES: asserts

				target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
				target triple = "aarch64-unknown-linux-gnu"

				;; Tests based on the following C code, simplified. The OpenMP codegen from
				;; clang created an outlined function where the alloca for the reduction
				;; was captured by a call to the runtime and couldn't be promoted to register
				;; SSA form. This prevented vectorization.
				;;
				;; We can now vectorize this by partially promoting the alloca, just converting
				;; definitions and uses that are postdominated by the capture. This is a
				;; conservative first attempt at this optimization.
				;;
				;; int loop(int data[restrict 128U])
				;; {
				;; int retval = 0;
				;;
				;; #pragma omp parallel for simd schedule(simd:static) default(none) shared(data) reduction(+:retval)
				;; for (int i = 0; i < 128; i++) {
				;; int n = 0;
				;;
				;; if (data[i]) {
				;; n = 1;
				;; retval += n;
				;; }
				;; }
				;; return retval;
				;; }

				; DEBUG-LABEL: PartialMem2Reg on: captured_reduction
				; DEBUG-NEXT: PM2R: Analyzing alloca: %retval = alloca i32, align 4
				; DEBUG-NEXT: PM2R: Partially promoting alloca: %retval = alloca i32, align 4
				; DEBUG-NEXT: PM2R: Analyzing alloca: %red_list = alloca i32*, align 8
				; DEBUG-NEXT: PM2R: Unhandled uses.

				; VDEBUG-LABEL: LV: Checking a loop in "captured_reduction" from <stdin>
				; VDEBUG: LV: We can vectorize this loop!

				; XFORM-LABEL: @captured_reduction
				define i32 @captured_reduction(i32* nocapture nonnull readonly %data, i32 %n) {
				entry:
				%retval = alloca i32, align 4
				%red_list = alloca i32*, align 8
				store i32 0, i32* %retval, align 4
				%limit = zext i32 %n to i64
				br label %loop.ph

				loop.ph:
				%iter.check = icmp ugt i64 %limit, 0
				br i1 %iter.check, label %loop.body, label %loop.exit

				; XFORM: loop.body:
				; XFORM-NEXT: %retval.0 = phi i32 [ 0, %loop.ph ], [ %retval.1, %if.end ]
				loop.body:
				%indvars.iv = phi i64 [ 0, %loop.ph ], [ %indvars.iv.next, %if.end ]
				%arrayidx = getelementptr inbounds i32, i32* %data, i64 %indvars.iv
				%pred = load i32, i32* %arrayidx, align 4
				%tobool.not = icmp eq i32 %pred, 0
				br i1 %tobool.not, label %if.end, label %if.then

				if.then:
				%rdx = load i32, i32* %retval, align 4
				%rdx.inc = add nsw i32 %rdx, 1
				store i32 %rdx.inc, i32* %retval, align 4
				br label %if.end

				; XFORM: if.end:
				; XFORM-NEXT: %retval.1 = phi i32 [ %retval.0, %loop.body ], [ %rdx.inc, %if.then ]
				if.end:
				%indvars.iv.next = add nsw i64 %indvars.iv, 1
				%exitcond = icmp ne i64 %indvars.iv.next, %limit
				br i1 %exitcond, label %loop.body, label %loop.exit

				; XFORM: loop.exit:
				; XFORM-NEXT: %retval.2 = phi i32 [ %retval.1, %if.end ], [ 0, %loop.ph ]
				; XFORM-NEXT: store i32 %retval.2, i32* %retval, align 4
				loop.exit:
				br label %capture

				capture:
				store i32* %retval, i32** %red_list, align 8
				%0 = call i32 @capturin_ur_allocas(i32** nonnull %red_list)
				%1 = load i32, i32* %retval, align 4
				ret i32 %1
				}

				; DEBUG-LABEL: PartialMem2Reg on: too_many_captures
				; DEBUG-NEXT: PM2R: Analyzing alloca: %retval = alloca i32, align 4
				; DEBUG-NEXT: PM2R: No capture or multiple captures.
				; DEBUG-NEXT: PM2R: Analyzing alloca: %red_list_2 = alloca i32*, align 8
				; DEBUG-NEXT: PM2R: Unhandled uses.
				; DEBUG-NEXT: PM2R: Analyzing alloca: %red_list = alloca i32*, align 8
				; DEBUG-NEXT: PM2R: Unhandled uses.

				; VDEBUG-LABEL: LV: Checking a loop in "too_many_captures" from <stdin>
				; VDEBUG: LV: Can't vectorize the instructions or CFG
				; VDEBUG: LV: Not vectorizing: Cannot prove legality.

				define i32 @too_many_captures(i32* nocapture nonnull readonly %data, i32 %n) {
				entry:
				%retval = alloca i32, align 4
				%red_list = alloca i32*, align 8
				%red_list_2 = alloca i32*, align 8
				store i32 0, i32* %retval, align 4
				%limit = zext i32 %n to i64
				br label %loop.ph

				loop.ph:
				%iter.check = icmp ugt i64 %limit, 0
				br i1 %iter.check, label %loop.body, label %loop.exit

				loop.body:
				%indvars.iv = phi i64 [ 0, %loop.ph ], [ %indvars.iv.next, %if.end ]
				%arrayidx = getelementptr inbounds i32, i32* %data, i64 %indvars.iv
				%pred = load i32, i32* %arrayidx, align 4
				%tobool.not = icmp eq i32 %pred, 0
				br i1 %tobool.not, label %if.end, label %if.then

				if.then:
				%rdx = load i32, i32* %retval, align 4
				%rdx.inc = add nsw i32 %rdx, 1
				store i32 %rdx.inc, i32* %retval, align 4
				br label %if.end

				if.end:
				%indvars.iv.next = add nsw i64 %indvars.iv, 1
				%exitcond = icmp ne i64 %indvars.iv.next, %limit
				br i1 %exitcond, label %loop.body, label %loop.exit

				loop.exit:
				br label %capture

				capture:
				store i32* %retval, i32** %red_list, align 8
				%0 = call i32 @capturin_ur_allocas(i32** nonnull %red_list)
				store i32* %retval, i32** %red_list_2, align 8
				%1 = call i32 @capturin_ur_allocas(i32** nonnull %red_list_2)
				%2 = load i32, i32* %retval, align 4
				ret i32 %2
				}

				; DEBUG-LABEL: PartialMem2Reg on: no_captures
				; DEBUG-NEXT: PM2R: Analyzing alloca: %retval = alloca i32, align 4
				; DEBUG-NEXT: PM2R: No capture or multiple captures.

				; VDEBUG-LABEL: LV: Checking a loop in "no_captures" from <stdin>
				; VDEBUG: LV: Can't vectorize the instructions or CFG
				; VDEBUG: LV: Not vectorizing: Cannot prove legality.

				define i32 @no_captures(i32* nocapture nonnull readonly %data, i32 %n) {
				entry:
				%retval = alloca i32, align 4
				store i32 0, i32* %retval, align 4
				%limit = zext i32 %n to i64
				br label %loop.ph

				loop.ph:
				%iter.check = icmp ugt i64 %limit, 0
				br i1 %iter.check, label %loop.body, label %loop.exit

				loop.body:
				%indvars.iv = phi i64 [ 0, %loop.ph ], [ %indvars.iv.next, %if.end ]
				%arrayidx = getelementptr inbounds i32, i32* %data, i64 %indvars.iv
				%pred = load i32, i32* %arrayidx, align 4
				%tobool.not = icmp eq i32 %pred, 0
				br i1 %tobool.not, label %if.end, label %if.then

				if.then:
				%rdx = load i32, i32* %retval, align 4
				%rdx.inc = add nsw i32 %rdx, 1
				store i32 %rdx.inc, i32* %retval, align 4
				br label %if.end

				if.end:
				%indvars.iv.next = add nsw i64 %indvars.iv, 1
				%exitcond = icmp ne i64 %indvars.iv.next, %limit
				br i1 %exitcond, label %loop.body, label %loop.exit

				loop.exit:
				br label %nocapture

				nocapture:
				%0 = load i32, i32* %retval, align 4
				ret i32 %0
				}

				; DEBUG-LABEL: PartialMem2Reg on: no_postdominated_users
				; DEBUG-NEXT: PM2R: Analyzing alloca: %retval = alloca i32, align 4
				; DEBUG-NEXT: PM2R: No users postdominated by capture.
				; DEBUG-NEXT: PM2R: Analyzing alloca: %red_list = alloca i32*, align 8
				; DEBUG-NEXT: PM2R: Unhandled uses.

				; VDEBUG-LABEL: LV: Checking a loop in "no_postdominated_users" from <stdin>
				; VDEBUG: LV: We can vectorize this loop!

				define i32 @no_postdominated_users(i32* nocapture nonnull readonly %data, i32 %n) {
				entry:
				%retval = alloca i32, align 4
				%red_list = alloca i32*, align 8
				%limit = zext i32 %n to i64
				br label %loop.ph

				loop.ph:
				%iter.check = icmp ugt i64 %limit, 0
				br i1 %iter.check, label %loop.body, label %loop.exit

				loop.body:
				%indvars.iv = phi i64 [ 0, %loop.ph ], [ %indvars.iv.next, %if.end ]
				%arrayidx = getelementptr inbounds i32, i32* %data, i64 %indvars.iv
				%pred = load i32, i32* %arrayidx, align 4
				%tobool.not = icmp eq i32 %pred, 0
				br i1 %tobool.not, label %if.end, label %if.then

				if.then:
				br label %if.end

				if.end:
				%indvars.iv.next = add nsw i64 %indvars.iv, 1
				%exitcond = icmp ne i64 %indvars.iv.next, %limit
				br i1 %exitcond, label %loop.body, label %loop.exit

				loop.exit:
				br label %capture

				capture:
				store i32* %retval, i32** %red_list, align 8
				%0 = call i32 @capturin_ur_allocas(i32** nonnull %red_list)
				store i32 0, i32* %retval, align 4
				%1 = load i32, i32* %retval, align 4
				ret i32 %1
				}

				; DEBUG-LABEL: PartialMem2Reg on: no_loops
				; DEBUG-NEXT: PM2R: Analyzing alloca: %retval = alloca i32, align 4
				; DEBUG-NEXT: PM2R: No users in loops.
				; DEBUG-NEXT: PM2R: Analyzing alloca: %red_list = alloca i32*, align 8
				; DEBUG-NEXT: PM2R: Unhandled uses.

				define i32 @no_loops(i32* nocapture nonnull readonly %data, i32 %n) {
				entry:
				%retval = alloca i32, align 4
				%red_list = alloca i32*, align 8
				store i32 0, i32* %retval, align 4
				%limit = zext i32 %n to i64
				br label %capture

				capture:
				store i32* %retval, i32** %red_list, align 8
				%0 = call i32 @capturin_ur_allocas(i32** nonnull %red_list)
				store i32 0, i32* %retval, align 4
				%1 = load i32, i32* %retval, align 4
				ret i32 %1
				}

				; DEBUG-LABEL: PartialMem2Reg on: live_block_not_postdominated
				; DEBUG-NEXT: PM2R: Analyzing alloca: %retval = alloca i32, align 4
				; DEBUG-NEXT: PM2R: not all live blocks are postdominated by capture.
				; DEBUG-NEXT: PM2R: Analyzing alloca: %red_list = alloca i32*, align 8
				; DEBUG-NEXT: PM2R: Unhandled uses.

				; VDEBUG-LABEL: LV: Checking a loop in "live_block_not_postdominated" from <stdin>
				; VDEBUG: LV: Can't vectorize the instructions or CFG
				; VDEBUG: LV: Not vectorizing: Cannot prove legality.
				; VDEBUG: LV: Checking a loop in "live_block_not_postdominated" from <stdin>
				; VDEBUG: LV: Can't vectorize the instructions or CFG
				; VDEBUG: LV: Not vectorizing: Cannot prove legality.

				define i32 @live_block_not_postdominated(i32* nocapture nonnull readonly %data, i32 %n, i32 %cond1, i32 %cond2) {
				entry:
				%retval = alloca i32, align 4
				%red_list = alloca i32*, align 8
				store i32 0, i32* %retval, align 4
				%limit = zext i32 %n to i64
				br label %cond1.check

				cond1.check:
				%cmp1 = icmp ugt i32 %cond1, 77
				br i1 %cmp1, label %loop.ph, label %loop2.ph

				loop.ph:
				%iter.check = icmp ugt i64 %limit, 0
				br i1 %iter.check, label %loop.body, label %loop2.exit

				loop.body:
				%indvars.iv = phi i64 [ 0, %loop.ph ], [ %indvars.iv.next, %if.end ]
				%arrayidx = getelementptr inbounds i32, i32* %data, i64 %indvars.iv
				%pred = load i32, i32* %arrayidx, align 4
				%tobool.not = icmp eq i32 %pred, 0
				br i1 %tobool.not, label %if.end, label %if.then

				if.then:
				%rdx = load i32, i32* %retval, align 4
				%rdx.inc = add nsw i32 %rdx, 1
				store i32 %rdx.inc, i32* %retval, align 4
				br label %if.end

				if.end:
				%indvars.iv.next = add nsw i64 %indvars.iv, 1
				%exitcond = icmp ne i64 %indvars.iv.next, %limit
				br i1 %exitcond, label %loop.body, label %loop.exit

				loop.exit:
				br label %capture

				loop2.ph:
				%iter.check2 = icmp ugt i64 %limit, 0
				br i1 %iter.check2, label %loop2.body, label %loop2.exit

				loop2.body:
				%indvars.iv2 = phi i64 [ 0, %loop2.ph ], [ %indvars.iv.next2, %if.end2 ]
				%arrayidx2 = getelementptr inbounds i32, i32* %data, i64 %indvars.iv2
				%pred2 = load i32, i32* %arrayidx2, align 4
				%tobool.not2 = icmp eq i32 %pred2, 0
				br i1 %tobool.not2, label %if.end2, label %if.then2

				if.then2:
				%rdx2 = load i32, i32* %retval, align 4
				%rdx.inc2 = add nsw i32 %rdx2, 1
				store i32 %rdx.inc2, i32* %retval, align 4
				br label %if.end2

				if.end2:
				%indvars.iv.next2 = add nsw i64 %indvars.iv2, 1
				%exitcond2 = icmp ne i64 %indvars.iv.next2, %limit
				br i1 %exitcond2, label %loop2.body, label %loop2.exit

				loop2.exit:
				br label %cond2.check

				cond2.check:
				%cmp2 = icmp eq i32 %cond2, 403
				br i1 %cmp2, label %post.capture, label %capture

				capture:
				store i32* %retval, i32** %red_list, align 8
				%0 = call i32 @capturin_ur_allocas(i32** nonnull %red_list)
				br label %post.capture

				post.capture:
				%1 = load i32, i32* %retval, align 4
				ret i32 %1
				}

				declare dso_local i32 @capturin_ur_allocas(i32** nonnull)
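The test comments state the legality condition as the blocks that use the alloca being postdominated by the capture. A toy model of that check, assuming a CFG given as successor lists with a single exit block that every block can reach, computes postdominator sets with the textbook iterative dataflow equation and then tests each use block. The pass itself relies on LLVM's `PostDominatorTree`; `CFG`, `postDominators`, and `allUsesPostdominatedBy` are hypothetical helpers for illustration only, and the sketch ignores the intra-block ordering around the capture that the pass handles separately.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Hypothetical toy CFG: Succs[B] lists the successors of block B. Assumes a
// single exit block (no successors) that every block can reach.
using CFG = std::vector<std::vector<int>>;

// Iterative dataflow for postdominator sets:
//   PDom(exit) = {exit}
//   PDom(B)    = {B} union (intersection of PDom(S) over successors S of B)
// Starting non-exit blocks at "all blocks" yields the greatest fixed point.
std::vector<std::set<int>> postDominators(const CFG &Succs) {
  const int N = static_cast<int>(Succs.size());
  std::set<int> All;
  for (int B = 0; B < N; ++B)
    All.insert(B);

  std::vector<std::set<int>> PDom(N, All);
  for (int B = 0; B < N; ++B)
    if (Succs[B].empty())
      PDom[B] = {B};

  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (int B = 0; B < N; ++B) {
      if (Succs[B].empty())
        continue;
      // Meet: intersect the sets of all successors.
      std::set<int> New = PDom[Succs[B].front()];
      for (int S : Succs[B]) {
        std::set<int> Meet;
        for (int X : New)
          if (PDom[S].count(X))
            Meet.insert(X);
        New = std::move(Meet);
      }
      New.insert(B);
      if (New != PDom[B]) {
        PDom[B] = std::move(New);
        Changed = true;
      }
    }
  }
  return PDom;
}

// Conservative check: every block that uses the alloca must be
// postdominated by the block containing the capture.
bool allUsesPostdominatedBy(const CFG &Succs, const std::vector<int> &UseBlocks,
                            int CaptureBlock) {
  const std::vector<std::set<int>> PDom = postDominators(Succs);
  for (int B : UseBlocks)
    if (PDom[B].count(CaptureBlock) == 0)
      return false;
  return true;
}
```

Numbering the blocks of `@captured_reduction` as entry, loop.ph, loop.body, if.then, if.end, loop.exit, capture (0 through 6), every use block is postdominated by block 6. In a CFG shaped roughly like `@live_block_not_postdominated` (entry, first loop, second loop, capture, cond2.check, post.capture as 0 through 5), the second loop can reach the exit without passing through the capture, so the check fails, matching the "not all live blocks are postdominated by capture" case.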

Diff 372440