This is an archive of the discontinued LLVM Phabricator instance.

Differential D119013

[ArgPromotion][AMDGPU] New MSSA-based function argument promotion pass with input/output argument support
Needs Revision · Public

Authored by vpykhtin on Feb 4 2022, 9:34 AM.

Details

Reviewers

rampitec
arsenm
foad
jdoerfert
asbirlea
aeubanks
nikic

Group Reviewers

Restricted Project

Summary

Targets like AMDGPU can benefit in performance when an argument passed by
reference is replaced with an argument passed by value on input and/or a return value on output.
This not only simplifies the function body but also allows SROA for the allocas behind such
pointer arguments and helps analysis passes that give up on escaping pointers.
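
As a rough source-level illustration of the transformation described above: the pass itself works on LLVM IR, and the function names below are hypothetical, not taken from the patch. An in/out pointer argument becomes a by-value input plus a returned output:

```cpp
#include <cassert>

// Before: the argument is passed by reference and used for input and output.
// The caller must keep `x` in addressable memory, and its address escapes
// into the call.
void scaleInPlace(int *p) { *p = *p * 2 + 1; }

// After (hand-written equivalent of the promoted form): the input is passed
// by value and the output is the return value, so no pointer is involved.
int scalePromoted(int v) { return v * 2 + 1; }

// Caller of the original form: takes the address of a local.
int callerBefore(int x) {
  scaleInPlace(&x);
  return x;
}

// Caller of the promoted form: `x` never has its address taken.
int callerAfter(int x) {
  x = scalePromoted(x);
  return x;
}
```

Both callers compute the same value; the difference is that `callerAfter` no longer needs a stack slot for `x`, which is the situation SROA and escape-sensitive analyses can exploit.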

Although submitted very late, this can be viewed as an early preview. It definitely
requires more testing, but I would like to know if I'm missing something fundamental.
That said, this code has already been tested on some applications and seems to work
to some extent.

I decided not to modify the existing ArgumentPromotion pass because the change is substantial
and a new pass allows the concerned targets to adopt it gradually. The biggest changes
are the use of MSSA for clobber testing and support for input/output arguments. So far GEPs
aren't supported, but they are supposed to be.

The ArgumentPromotion code was used as a source of inspiration and of code fragments. Another
source is the 'promoteLoopAccessesToScalars' function from LICM, which does a very similar thing
for loops. The way a pointer is judged safe to promote as an output argument
has been borrowed from there.

The original ArgumentPromotion was submitted by Chris Lattner in 2004 and has had another 78
contributors since then, so I'm not really sure who to add as a reviewer here; please advise or add.

Clobber testing using MSSA is the most critical part here, not only for performance reasons but
also for correctness. I would like it to be thoroughly reviewed.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	60,100 ms	x64 debian > AddressSanitizer-x86_64-linux.TestCases::scariness_score_test.cpp
	60,160 ms	x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics::vloxseg.c
	60,190 ms	x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics::vluxseg.c
	60,170 ms	x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics-overloaded::vloxseg.c
	60,170 ms	x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics-overloaded::vluxseg.c
		View Full Test Results (10 Failed)

Event Timeline

vpykhtin created this revision.Feb 4 2022, 9:34 AM

Herald added subscribers: ormris, kerbowa, asbirlea and 10 others. · View Herald TranscriptFeb 4 2022, 9:34 AM

vpykhtin requested review of this revision.Feb 4 2022, 9:34 AM

Herald added a project: Restricted Project. · View Herald TranscriptFeb 4 2022, 9:34 AM

Herald added subscribers: llvm-commits, wdng. · View Herald Transcript

Harbormaster completed remote builds in B147656: Diff 406002.Feb 4 2022, 11:03 AM

This looks like a better version of AMDGPURewriteOutArguments, which was never enabled. I wanted to replace that to use MSSA.

Can you check if the testcases in llvm/test/CodeGen/AMDGPU/rewrite-out-arguments.ll are redundant and/or covered by the ones you add here?

llvm/lib/Transforms/IPO/MSSAArgPromotion.cpp
176	dyn_cast instead of isa and cast
179	dyn_cast instead of isa and cast
185	No else after return
325–327	Weird formatting
814	The terminator can't be null
825	I would assume this happens in the regular verifier with expensive checks?
857	Demorgan this
913	Capitalize
929	Braces
955	Braces
1386	It's weird to have raw deletes in llvm code, why do you need this here?
llvm/test/Transforms/ArgumentPromotion/inoutargs.ll
1104	I'd like to see some tests with more exotic memory operations (atomics, memcpy/memset, target memory intrinsics etc.). I also don't see negative tests for some of the skipped conditions (e.g. musttail). Can you also add tests for invoke and for a captured function address?
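
The "dyn_cast instead of isa and cast" and "No else after return" comments above refer to common LLVM coding idioms. The sketch below imitates them in plain C++ so that it compiles without LLVM: `Inst`, `Load`, `Store`, `dynCast` and `alignmentOf` are hypothetical stand-ins, not the real `llvm::dyn_cast` or any code from the patch.

```cpp
#include <cassert>

// Minimal stand-ins for LLVM-style RTTI (kind tag plus classof).
struct Inst {
  enum Kind { LoadKind, StoreKind };
  explicit Inst(Kind K) : TheKind(K) {}
  Kind getKind() const { return TheKind; }

private:
  Kind TheKind;
};

struct Load : Inst {
  Load() : Inst(LoadKind) {}
  static bool classof(const Inst *I) { return I->getKind() == LoadKind; }
  int alignment() const { return 4; }
};

struct Store : Inst {
  Store() : Inst(StoreKind) {}
  static bool classof(const Inst *I) { return I->getKind() == StoreKind; }
  int alignment() const { return 8; }
};

// Checks the kind and casts in one step; returns nullptr on a mismatch.
template <typename To> const To *dynCast(const Inst *I) {
  return To::classof(I) ? static_cast<const To *>(I) : nullptr;
}

// Style asked for in the review: one checked cast per type instead of a
// separate type test followed by an unchecked cast, and no `else` after
// `return`.
int alignmentOf(const Inst *I) {
  if (const auto *L = dynCast<Load>(I))
    return L->alignment();
  if (const auto *S = dynCast<Store>(I))
    return S->alignment();
  return 0;
}
```

Each branch tests and converts once, and the early returns keep the function flat.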

arsenm added inline comments.Feb 4 2022, 11:10 AM

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
746	Why require all the way to -O3? -O2?

rampitec added inline comments.Feb 4 2022, 12:04 PM

llvm/lib/Transforms/IPO/MSSAArgPromotion.cpp
308	else after continue is not needed.
780	A single function and BBEntryValue/BBExitValue as an argument?
1345	Don't you need to add at least MemorySSAWrapperPass and AAResultsWrapperPass?

This probably needs to be limited to only private and flat pointers in case of the AMDGPU. A target callback may be needed to check if a pointer argument is beneficial to promote.

In D119013#3297857, @rampitec wrote:

This probably needs to be limited to only private and flat pointers in case of the AMDGPU. A target callback may be needed to check if a pointer argument is beneficial to promote.

I don't see why the type matters at all

In D119013#3297939, @arsenm wrote:

In D119013#3297857, @rampitec wrote:

This probably needs to be limited to only private and flat pointers in case of the AMDGPU. A target callback may be needed to check if a pointer argument is beneficial to promote.

I don't see why the type matters at all

The idea is to allow SROA in a caller, what's the point of doing it on non-alloca pointers?

Please ensure your implementation does not depend on calls to getPointerElementType() -- you can determine the type based on load/store instructions.

This revision now requires changes to proceed.Feb 7 2022, 12:30 AM

Addressed some of the review issues.

In D119013#3297983, @rampitec wrote:

In D119013#3297939, @arsenm wrote:

In D119013#3297857, @rampitec wrote:

This probably needs to be limited to only private and flat pointers in case of the AMDGPU. A target callback may be needed to check if a pointer argument is beneficial to promote.

I don't see why the type matters at all

The idea is to allow SROA in a caller, what's the point of doing it on non-alloca pointers?

For return values, we can support way more values returned in registers than in pointer passed parameters. The same applies to passed parameters, we could pull more arguments into registers

Can you check if the testcases in llvm/test/CodeGen/AMDGPU/rewrite-out-arguments.ll are redundant and/or covered by the ones you add here?

@arsenm I looked into the test and it's not directly suitable for this pass, as the pass behaves differently; however, the cases can be used for testing.

Please ensure your implementation does not depend on calls to getPointerElementType() -- you can determine the type based on load/store instructions.

@nikic Thanks, why is this better? (half done BTW)

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
746	I thought it's quite an "aggressive" kind of opt; maybe O2 would suit better.
llvm/lib/Transforms/IPO/MSSAArgPromotion.cpp
325–327	Lint, however, tries to convince me the previous formatting was better.
825	Those checks aren't enough here because I modify MSSA and then reuse it for the next promoted argument.
1345	Well, I cannot use it because this is a CallGraph pass and those analyses are available per function. Instead I use FunctionAnalysisManager to get per-function results; the drawback is that I have to invalidate it. However, this is only for the legacy pass manager.
1386	It was taken from the ArgumentPromotion code; I will take a look at whether it's really needed.
llvm/test/Transforms/ArgumentPromotion/inoutargs.ll
1104	Right, will add more tests. However, the current implementation is quite conservative as it relies on MSSA to find clobbers; it requires additional false-clobber checks, as was done by Stas in https://reviews.llvm.org/D118419.

Harbormaster completed remote builds in B147950: Diff 406419.Feb 7 2022, 6:46 AM

ormris removed a subscriber: ormris.Feb 7 2022, 10:29 AM

In D119013#3300842, @arsenm wrote:

In D119013#3297983, @rampitec wrote:

In D119013#3297939, @arsenm wrote:

In D119013#3297857, @rampitec wrote:

This probably needs to be limited to only private and flat pointers in case of the AMDGPU. A target callback may be needed to check if a pointer argument is beneficial to promote.

I don't see why the type matters at all

The idea is to allow SROA in a caller, what's the point of doing it on non-alloca pointers?

For return values, we can support way more values returned in registers than in pointer passed parameters. The same applies to passed parameters, we could pull more arguments into registers

Hm... That makes sense. We have discussed it offline with Valery; likely we do not need a target check for profitability. We would, however, want to limit the number of such promotions so that we do not accidentally turn a global store into a private store if we run out of output registers.

In D119013#3300906, @vpykhtin wrote:

Please ensure your implementation does not depend on calls to getPointerElementType() -- you can determine the type based on load/store instructions.

@nikic Thanks, why is this better? (half done BTW)

This method is deprecated and will be removed in the future, see https://llvm.org/docs/OpaquePointers.html for context.

In D119013#3303944, @nikic wrote:

In D119013#3300906, @vpykhtin wrote:

Please ensure your implementation does not depend on calls to getPointerElementType() -- you can determine the type based on load/store instructions.

@nikic Thanks, why is this better? (half done BTW)

This method is deprecated and will be removed in the future, see https://llvm.org/docs/OpaquePointers.html for context.

See https://reviews.llvm.org/D118685 changes to the original argpromo pass to make it work with opaque pointers (which makes it nicer in general).

@nikic please take a look now, I've addressed the issue with opaque pointers.

Also changed namings for arguments and loaded values so they match existing pass to simplify testing.

Harbormaster completed remote builds in B149359: Diff 408386.Feb 14 2022, 6:52 AM

Updated opaque-ptr.ll test to use ALL prefix to simplify checks

xbolva00 added a reviewer: lebedev.ri.Feb 14 2022, 9:46 AM

Harbormaster completed remote builds in B149415: Diff 408460.Feb 14 2022, 10:28 AM

@arsenm I've added the inoutargs2.ll test, which is a modified rewrite-out-arguments.ll. The original checks are saved with the REF prefix and differences are marked with a "; ***" prefix; please take a look. Most of the differences are related to bitcasts, which my patch doesn't support yet. I'm not sure if that support should be included in this commit.

Harbormaster completed remote builds in B150209: Diff 409608.Feb 17 2022, 6:04 AM

rampitec added inline comments.Feb 17 2022, 10:54 AM

llvm/test/Transforms/ArgumentPromotion/inoutargs2.ll
1 ↗	(On Diff #409608)	Fix file mode.

lebedev.ri removed a reviewer: lebedev.ri.Feb 17 2022, 3:46 PM

nikic added inline comments.Feb 18 2022, 3:41 AM

llvm/lib/Transforms/IPO/MSSAArgPromotion.cpp
712	For the actual promotion, have you considered making use of PromoteMemToReg? Basically, replace the old argument with an alloca that is stored on entry and read on exit, and then run mem2reg on that alloca?
define i32 @test(i32 %arg) {
  %old.arg = alloca i32
  store i32 %arg, i32* %old.arg
  ; Code uses %old.arg
  %ret = load i32, i32* %old.arg
  ret i32 %ret
}
llvm/test/Transforms/ArgumentPromotion/inoutargs.ll
2	Please use update_test_checks.py.
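
The PromoteMemToReg suggestion above can be pictured at source level as follows. This is only an analogy under assumed names (`originalBody` and `promotedWrapper` are hypothetical): a local variable plays the role of the alloca, and an optimizer is then free to keep it in a register.

```cpp
#include <cassert>

// Body of the original function, left untouched: it still works on a pointer.
static void originalBody(int *oldArg) { *oldArg += 5; }

// Function with the promoted signature. `oldArg` stands in for the alloca:
// it is stored on entry, used by the unchanged body, and loaded on exit.
int promotedWrapper(int arg) {
  int oldArg = arg;      // store the incoming value on entry
  originalBody(&oldArg); // existing code keeps using the old pointer
  return oldArg;         // load the final value on exit
}
```

The attraction of this shape is that the rewrite of the body is trivial, and removing the temporary storage is delegated to an existing, well-tested utility.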

vpykhtin added inline comments.Feb 18 2022, 11:25 AM

llvm/lib/Transforms/IPO/MSSAArgPromotion.cpp
712	I think it is a good idea to reuse the working code; however, I would rather refactor it than try to please it :) There is similar code in LICM's promoteLoopAccessesToScalars, so it seems we would benefit from such a mem2reg framework.

tschuett added a subscriber: tschuett.Feb 18 2022, 11:28 AM

xbolva00 added a subscriber: xbolva00.Feb 20 2022, 2:40 AM

xbolva00 added inline comments.

llvm/lib/Transforms/IPO/MSSAArgPromotion.cpp
2	So.. better version of ArgumentPromotion? Any plans to replace ArgumentPromotion with MSSAArgPromotionPass?

vpykhtin added inline comments.Feb 20 2022, 3:11 AM

llvm/lib/Transforms/IPO/MSSAArgPromotion.cpp
2	For now it lacks GEP, bitcast and byval struct passing support, but these will eventually be added. I'm not sure every target would benefit from [in/]out argument support and assume it would be enabled on a per-target basis: at the least it would need to take the added register pressure into account (if it's an issue) via a target-specific callback. There is no such callback in this patch yet; I'm going to add it later.

From a cursory look at the implementation, does this handle unwinding properly?

store i32 0, ptr %arg
call void @may_unwind() readnone
store i32 1, ptr %arg

I believe promotion is not possible in this case, because there's no good way to provide the new value on the unwind path to the caller.
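
A source-level analogue of the unwinding problem described above, as a hedged sketch: C++ exceptions stand in for IR-level unwinding, and all function names are hypothetical. With the pointer form the caller observes the first store when the call unwinds; with a naively promoted form that value has no way back to the caller.

```cpp
#include <cassert>
#include <stdexcept>

// Stand-in for a call that may unwind.
void mayUnwind(bool doThrow) {
  if (doThrow)
    throw std::runtime_error("unwind");
}

// Original form: both stores reach the caller's memory, including the first
// one when the call in between unwinds.
void storeByRef(int *arg, bool doThrow) {
  *arg = 0;
  mayUnwind(doThrow);
  *arg = 1;
}

// Naively promoted form: the output only reaches the caller through the
// return value, which does not exist on the unwind path.
int storePromoted(bool doThrow) {
  int value = 0;
  mayUnwind(doThrow);
  value = 1;
  return value;
}

// What a caller observes in `x` after calling the original form.
int observeByRef(bool doThrow) {
  int x = 42;
  try {
    storeByRef(&x, doThrow);
  } catch (const std::runtime_error &) {
  }
  return x;
}

// What a caller observes in `x` after calling the promoted form.
int observePromoted(bool doThrow) {
  int x = 42;
  try {
    x = storePromoted(doThrow);
  } catch (const std::runtime_error &) {
  }
  return x;
}
```

On the normal path both observers agree. On the unwind path `observeByRef` sees the first store while `observePromoted` still sees the caller's old value, so the two forms are not equivalent.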

llvm/lib/Transforms/IPO/MSSAArgPromotion.cpp
712	LICM uses the LoadAndStorePromoter utility instead -- I'm not really familiar with the differences between these approaches. From what I can tell, LoadAndStorePromoter is SSAUpdater-based and can work without DomTree, while PromoteMemToReg is more similar to the code you use here, in that it uses IDF.
llvm/test/Transforms/ArgumentPromotion/inoutargs2.ll
5 ↗	(On Diff #409608)	What's up with these REF labels?

From a cursory look at the implementation, does this handle unwinding properly?

store i32 0, ptr %arg
call void @may_unwind() readnone
store i32 1, ptr %arg

I believe promotion is not possible in this case, because there's no good way to provide the new value on the unwind path to the caller.

Thanks, I completely missed this part and it's not working correctly now; I will fix that.

llvm/lib/Transforms/IPO/MSSAArgPromotion.cpp
712	Yes, you're right, PromoteMemToReg is good here and it does much more: it handles intrinsics, debug info etc. It's alloca-centered, but I think it's relatively easy to make it more generic: separate the part which decides if the transformation is safe from the part that actually performs it. It uses LargeBlockInfo to keep track of instruction ordering, which can be substituted with similar MSSA functionality if it's present.
llvm/test/Transforms/ArgumentPromotion/inoutargs2.ll
5 ↗	(On Diff #409608)	We have the AMDGPURewriteOutArguments class, which has similar functionality but works differently. @arsenm has tests for it and I put them here (modified to suit my pass) so we could compare the covered testcases. REFs are the original checks from rewrite-out-arguments.ll so we can see the difference; maybe they should be removed later.

tschuett mentioned this in D137497: [ArgumentPromotion] Allow the frontend to specify the maximum number of elements to promote on a per-function basis via metadata..Nov 6 2022, 6:16 AM

arsenm added inline comments.Nov 16 2022, 3:41 PM

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
746	I think it should be in the default, which is -O2

Herald added a project: Restricted Project. · View Herald TranscriptNov 16 2022, 3:41 PM

Herald added subscribers: nlopes, kosarev. · View Herald Transcript

nlopes added inline comments.Nov 17 2022, 12:21 AM

llvm/lib/Transforms/IPO/MSSAArgPromotion.cpp
793	Please use PoisonValue instead of UndefValue whenever possible as we are trying to remove undef from LLVM. Thank you!

Marking this as changes requested per the unwinding comment above.

I think I also got a mail at some point about the removed isMustSet() function. I think the best way to do that now is to use BatchAA and do a separate isMustAlias() query, which should be cached at that point.

This revision now requires changes to proceed.Mar 7 2023, 1:38 AM

Herald added subscribers: kmitropoulou, StephenFan. · View Herald TranscriptMar 7 2023, 1:38 AM

Going to reclaim this work

Revision Contents

Path	Size
llvm/include/llvm/InitializePasses.h	1 line
llvm/include/llvm/Transforms/IPO.h	5 lines
llvm/include/llvm/Transforms/IPO/MSSAArgPromotion.h	26 lines
llvm/lib/Passes/PassBuilder.cpp	1 line
llvm/lib/Passes/PassRegistry.def	1 line
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp	13 lines
llvm/lib/Transforms/IPO/CMakeLists.txt	1 line
llvm/lib/Transforms/IPO/IPO.cpp	1 line
llvm/lib/Transforms/IPO/MSSAArgPromotion.cpp	1397 lines
llvm/test/CodeGen/AMDGPU/opt-pipeline.ll	1 line
llvm/test/Transforms/ArgumentPromotion/inoutargs.ll	1103 lines

Diff 406419

llvm/include/llvm/InitializePasses.h

(319 lines not shown)
  void initializeMergeFunctionsLegacyPassPass(PassRegistry&);
  void initializeMergeICmpsLegacyPassPass(PassRegistry &);
  void initializeMergedLoadStoreMotionLegacyPassPass(PassRegistry&);
  void initializeMetaRenamerPass(PassRegistry&);
  void initializeModuleDebugInfoLegacyPrinterPass(PassRegistry &);
  void initializeModuleMemProfilerLegacyPassPass(PassRegistry &);
  void initializeModuleSummaryIndexWrapperPassPass(PassRegistry&);
  void initializeModuloScheduleTestPass(PassRegistry&);
+ void initializeMSSAArgPromotionPass(PassRegistry &);
  void initializeMustExecutePrinterPass(PassRegistry&);
  void initializeMustBeExecutedContextPrinterPass(PassRegistry&);
  void initializeNameAnonGlobalLegacyPassPass(PassRegistry&);
  void initializeNaryReassociateLegacyPassPass(PassRegistry&);
  void initializeNewGVNLegacyPassPass(PassRegistry&);
  void initializeObjCARCAAWrapperPassPass(PassRegistry&);
  void initializeObjCARCAPElimPass(PassRegistry&);
  void initializeObjCARCContractLegacyPassPass(PassRegistry &);
(134 lines not shown)

llvm/include/llvm/Transforms/IPO.h

(153 lines not shown)
  //===----------------------------------------------------------------------===//
  /// createArgumentPromotionPass - This pass promotes "by reference" arguments to
  /// be passed by value if the number of elements passed is smaller or
  /// equal to maxElements (maxElements == 0 means always promote).
  ///
  Pass *createArgumentPromotionPass(unsigned maxElements = 3);

  //===----------------------------------------------------------------------===//
+ /// createMSSAArgPromotionPass - This pass promotes "by reference" arguments to
+ /// be passed by value. Input or/and Output arguments supported.
+ Pass *createMSSAArgPromotionPass();
+
+ //===----------------------------------------------------------------------===//
  /// createOpenMPOptLegacyPass - OpenMP specific optimizations.
  Pass *createOpenMPOptCGSCCLegacyPass();

  //===----------------------------------------------------------------------===//
  /// createIPSCCPPass - This pass propagates constants from call sites into the
  /// bodies of functions, and keeps track of whether basic blocks are executable
  /// in the process.
  ///
(129 lines not shown)

llvm/include/llvm/Transforms/IPO/MSSAArgPromotion.h

This file was added.

//===- MSSAArgPromotionPass.cpp - Promote by-reference arguments -----===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
//
//===----------------------------------------------------------------------===//

#include "llvm/Analysis/CGSCCPassManager.h"
#include "llvm/Analysis/LazyCallGraph.h"
#include "llvm/IR/PassManager.h"

namespace llvm {

class MSSAArgPromotionPass : public PassInfoMixin<MSSAArgPromotionPass> {
public:
  MSSAArgPromotionPass() {}

  PreservedAnalyses run(LazyCallGraph::SCC &C, CGSCCAnalysisManager &AM,
                        LazyCallGraph &CG, CGSCCUpdateResult &UR);
};

} // end namespace llvm

llvm/lib/Passes/PassBuilder.cpp

(105 lines not shown)
  #include "llvm/Transforms/IPO/GlobalSplit.h"
  #include "llvm/Transforms/IPO/HotColdSplitting.h"
  #include "llvm/Transforms/IPO/IROutliner.h"
  #include "llvm/Transforms/IPO/InferFunctionAttrs.h"
  #include "llvm/Transforms/IPO/Inliner.h"
  #include "llvm/Transforms/IPO/Internalize.h"
  #include "llvm/Transforms/IPO/LoopExtractor.h"
  #include "llvm/Transforms/IPO/LowerTypeTests.h"
+ #include "llvm/Transforms/IPO/MSSAArgPromotion.h"
  #include "llvm/Transforms/IPO/MergeFunctions.h"
  #include "llvm/Transforms/IPO/ModuleInliner.h"
  #include "llvm/Transforms/IPO/OpenMPOpt.h"
  #include "llvm/Transforms/IPO/PartialInlining.h"
  #include "llvm/Transforms/IPO/SCCP.h"
  #include "llvm/Transforms/IPO/SampleProfile.h"
  #include "llvm/Transforms/IPO/SampleProfileProbe.h"
  #include "llvm/Transforms/IPO/StripDeadPrototypes.h"
(1,711 lines not shown)

llvm/lib/Passes/PassRegistry.def

(155 lines not shown)
  CGSCC_ANALYSIS("fam-proxy", FunctionAnalysisManagerCGSCCProxy())
  CGSCC_ANALYSIS("pass-instrumentation", PassInstrumentationAnalysis(PIC))
  #undef CGSCC_ANALYSIS

  #ifndef CGSCC_PASS
  #define CGSCC_PASS(NAME, CREATE_PASS)
  #endif
  CGSCC_PASS("argpromotion", ArgumentPromotionPass())
+ CGSCC_PASS("mssaargpromotion", MSSAArgPromotionPass())
  CGSCC_PASS("invalidate<all>", InvalidateAllAnalysesPass())
  CGSCC_PASS("function-attrs", PostOrderFunctionAttrsPass())
  CGSCC_PASS("attributor-cgscc", AttributorCGSCCPass())
  CGSCC_PASS("openmp-opt-cgscc", OpenMPOptCGSCCPass())
  CGSCC_PASS("coro-split", CoroSplitPass())
  CGSCC_PASS("no-op-cgscc", NoOpCGSCCPass())
  #undef CGSCC_PASS

(354 lines not shown)

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

(41 lines not shown)
  #include "llvm/IR/PatternMatch.h"
  #include "llvm/InitializePasses.h"
  #include "llvm/MC/TargetRegistry.h"
  #include "llvm/Passes/PassBuilder.h"
  #include "llvm/Transforms/IPO.h"
  #include "llvm/Transforms/IPO/AlwaysInliner.h"
  #include "llvm/Transforms/IPO/GlobalDCE.h"
  #include "llvm/Transforms/IPO/Internalize.h"
+ #include "llvm/Transforms/IPO/MSSAArgPromotion.h"
  #include "llvm/Transforms/IPO/PassManagerBuilder.h"
  #include "llvm/Transforms/Scalar.h"
  #include "llvm/Transforms/Scalar/GVN.h"
  #include "llvm/Transforms/Scalar/InferAddressSpaces.h"
  #include "llvm/Transforms/Utils.h"
  #include "llvm/Transforms/Utils/SimplifyLibCalls.h"
  #include "llvm/Transforms/Vectorize.h"

(511 lines not shown) [AMDGPUAA, LibCallSimplify, this](const PassManagerBuilder &,
          PM.add(createAMDGPUExternalAAWrapperPass());
        }
        PM.add(llvm::createAMDGPUPropagateAttributesEarlyPass(this));
        PM.add(llvm::createAMDGPUUseNativeCallsPass());
        if (LibCallSimplify)
          PM.add(llvm::createAMDGPUSimplifyLibCallsPass(this));
    });

+   bool EnableAggressiveOpt = getOptLevel() >= CodeGenOpt::Aggressive;
    Builder.addExtension(
      PassManagerBuilder::EP_CGSCCOptimizerLate,
Lint: Pre-merge checks: clang-format: please reformat the code
-     [EnableOpt, PromoteKernelArguments](const PassManagerBuilder &,
-                                         legacy::PassManagerBase &PM) {
+     [=](const PassManagerBuilder &, legacy::PassManagerBase &PM) {
+       // Add pass to promote arguments passed by reference
+       if (EnableAggressiveOpt)
+         PM.add(createMSSAArgPromotionPass());
+
        // Add promote kernel arguments pass to the opt pipeline right before
        // infer address spaces which is needed to do actual address space
        // rewriting.
        if (PromoteKernelArguments)
          PM.add(createAMDGPUPromoteKernelArgumentsPass());

        // Add infer address spaces pass to the opt pipeline after inlining
        // but before SROA to increase SROA opportunities.
(143 lines not shown) PB.registerCGSCCOptimizerLateEPCallback(

          if (Level != OptimizationLevel::O0) {
            // Promote alloca to vector before SROA and loop unroll. If we
            // manage to eliminate allocas before unroll we may choose to unroll
            // less.
            FPM.addPass(AMDGPUPromoteAllocaToVectorPass(*this));
          }

+         // Add pass to promote arguments passed by reference
+         if (Level.getSpeedupLevel() >= OptimizationLevel::O3.getSpeedupLevel())
arsenm: Why require all the way to -O3? -O2?
vpykhtin: I thought it's quite an "aggressive" kind of opt; maybe O2 would suit better.
arsenm: I think it should be in the default, which is -O2
+           PM.addPass(MSSAArgPromotionPass());

          PM.addPass(createCGSCCToFunctionPassAdaptor(std::move(FPM)));
        });
  }

  int64_t AMDGPUTargetMachine::getNullPointerValue(unsigned AddrSpace) {
    return (AddrSpace == AMDGPUAS::LOCAL_ADDRESS ||
            AddrSpace == AMDGPUAS::PRIVATE_ADDRESS ||
            AddrSpace == AMDGPUAS::REGION_ADDRESS)
(797 lines not shown)

llvm/lib/Transforms/IPO/CMakeLists.txt

(24 lines not shown) add_llvm_component_library(LLVMipo
    InferFunctionAttrs.cpp
    InlineSimple.cpp
    Inliner.cpp
    Internalize.cpp
    LoopExtractor.cpp
    LowerTypeTests.cpp
    MergeFunctions.cpp
    ModuleInliner.cpp
+   MSSAArgPromotion.cpp
    OpenMPOpt.cpp
    PartialInlining.cpp
    PassManagerBuilder.cpp
    PruneEH.cpp
    SampleContextTracker.cpp
    SampleProfile.cpp
    SampleProfileProbe.cpp
    SCCP.cpp
(36 lines not shown)

llvm/lib/Transforms/IPO/IPO.cpp

(41 lines not shown) void llvm::initializeIPO(PassRegistry &Registry) {
    initializeSimpleInlinerPass(Registry);
    initializeInferFunctionAttrsLegacyPassPass(Registry);
    initializeInternalizeLegacyPassPass(Registry);
    initializeLoopExtractorLegacyPassPass(Registry);
    initializeBlockExtractorLegacyPassPass(Registry);
    initializeSingleLoopExtractorPass(Registry);
    initializeLowerTypeTestsPass(Registry);
    initializeMergeFunctionsLegacyPassPass(Registry);
+   initializeMSSAArgPromotionPass(Registry);
    initializePartialInlinerLegacyPassPass(Registry);
    initializeAttributorLegacyPassPass(Registry);
    initializeAttributorCGSCCLegacyPassPass(Registry);
    initializePostOrderFunctionAttrsLegacyPassPass(Registry);
    initializeReversePostOrderFunctionAttrsLegacyPassPass(Registry);
    initializePruneEHPass(Registry);
    initializeIPSCCPLegacyPassPass(Registry);
    initializeStripDeadPrototypesLegacyPassPass(Registry);
(86 lines not shown)

llvm/lib/Transforms/IPO/MSSAArgPromotion.cpp

This file was added.

Property	Old Value	New Value
File Mode	null	100755

				//===------ MSSAArgPromotion.cpp - Promote by-reference arguments ---------===//
				//
				xbolva00 (Not Done): So.. better version of ArgumentPromotion? Any plans to replace ArgumentPromotion with MSSAArgPromotionPass?
				vpykhtin (Author, Done): For now it lacks GEP, bitcast and byval struct passing support, but eventually they will be added. I'm not sure every target would benefit from [in/]out argument support, and I assume it would be enabled on a per-target basis: at least it would require taking the added register pressure into account (if it's an issue) with a target-specific callback. There is no such callback in this patch yet; I'm going to add it later.
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This pass promotes function arguments passed by reference:
				// 1. Input argument: if the argument is read it is promoted to the argument
				// passed by value. Callers load the argument's value and pass it to the
				// function.
				// 2. Output argument: if the argument is modified the function return type is
				// transformed into an aggregate and the final argument's value is returned
				// as a component of the return value. Callers store the returned value
				// using the original argument pointer.
				// 3. Input/Output argument: the combination of the above.
				//
				// int foo(int a, int *x) {
				// *x += 2;
				// return a;
				// }
				// int MemVar;
				// int X = foo(1, &MemVar);
				//
				// into:
				//
				// struct { int, int } foo (int a, int x) {
				// return { a, x + 2 };
				// }
				// int MemVar;
				// struct { int, int } S = foo(1, MemVar);
				// int X = S.first;
				// MemVar = S.second;
				//
				//===----------------------------------------------------------------------===//
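
Reader's note: the rewrite described in the header comment can be checked outside LLVM with a small plain C++ sketch. It only mirrors the `foo` example from the comment; the names `foo_byref`, `foo_promoted`, `FooResult`, `call_promoted` and `check_equivalence` are hypothetical and are not part of the patch.

```cpp
#include <cassert>

// Original form: 'x' is an in/out argument passed by reference.
int foo_byref(int a, int *x) {
  *x += 2;
  return a;
}

// Promoted form (illustrative): the incoming value is passed by value and
// the final value comes back as an extra component of the return aggregate.
struct FooResult {
  int ret; // original return value
  int x;   // final value of the promoted argument
};

FooResult foo_promoted(int a, int x) { return {a, x + 2}; }

// Caller-side rewrite: load before the call, store after it.
int call_promoted(int a, int *mem) {
  FooResult s = foo_promoted(a, *mem); // caller loads the argument's value
  *mem = s.x;                          // caller stores the returned value
  return s.ret;
}

// Checks that both forms agree on the return value and on memory contents.
void check_equivalence(int a, int init) {
  int m1 = init, m2 = init;
  int r1 = foo_byref(a, &m1);
  int r2 = call_promoted(a, &m2);
  assert(r1 == r2 && m1 == m2);
}
```

The caller-side load and store are only equivalent when nothing else observes or modifies the pointed-to memory during the call, which is what the clobber checks later in the file are meant to establish.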

				#include "llvm/Transforms/IPO/MSSAArgPromotion.h"
				#include "llvm/ADT/DepthFirstIterator.h"
				#include "llvm/ADT/None.h"
				#include "llvm/ADT/Optional.h"
				#include "llvm/ADT/STLExtras.h"
				#include "llvm/ADT/ScopeExit.h"
				#include "llvm/ADT/SmallPtrSet.h"
				#include "llvm/ADT/SmallVector.h"
				#include "llvm/ADT/Statistic.h"
				#include "llvm/ADT/Twine.h"
				#include "llvm/Analysis/AssumptionCache.h"
				#include "llvm/Analysis/BasicAliasAnalysis.h"
				#include "llvm/Analysis/CGSCCPassManager.h"
				#include "llvm/Analysis/CallGraph.h"
				#include "llvm/Analysis/CallGraphSCCPass.h"
				#include "llvm/Analysis/CaptureTracking.h"
				#include "llvm/Analysis/InstructionSimplify.h"
				#include "llvm/Analysis/IteratedDominanceFrontier.h"
				#include "llvm/Analysis/LazyCallGraph.h"
				#include "llvm/Analysis/Loads.h"
				#include "llvm/Analysis/MemoryBuiltins.h"
				#include "llvm/Analysis/MemoryLocation.h"
				#include "llvm/Analysis/MemorySSA.h"
				#include "llvm/Analysis/MemorySSAUpdater.h"
				#include "llvm/Analysis/TargetLibraryInfo.h"
				#include "llvm/Analysis/TargetTransformInfo.h"
				#include "llvm/Analysis/ValueTracking.h"
				#include "llvm/IR/Argument.h"
				#include "llvm/IR/Attributes.h"
				#include "llvm/IR/BasicBlock.h"
				#include "llvm/IR/CFG.h"
				#include "llvm/IR/Constants.h"
				#include "llvm/IR/DataLayout.h"
				#include "llvm/IR/DerivedTypes.h"
				#include "llvm/IR/Dominators.h"
				#include "llvm/IR/Function.h"
				#include "llvm/IR/IRBuilder.h"
				#include "llvm/IR/IRPrintingPasses.h"
				#include "llvm/IR/InstrTypes.h"
				#include "llvm/IR/Instruction.h"
				#include "llvm/IR/Instructions.h"
				#include "llvm/IR/Metadata.h"
				#include "llvm/IR/Module.h"
				#include "llvm/IR/NoFolder.h"
				#include "llvm/IR/PassManager.h"
				#include "llvm/IR/Type.h"
				#include "llvm/IR/Use.h"
				#include "llvm/IR/User.h"
				#include "llvm/IR/Value.h"
				#include "llvm/InitializePasses.h"
				#include "llvm/Pass.h"
				#include "llvm/Support/Casting.h"
				#include "llvm/Support/Debug.h"
				#include "llvm/Support/FormatVariadic.h"
				#include "llvm/Support/raw_ostream.h"
				#include "llvm/Transforms/IPO.h"
				#include "llvm/Transforms/Utils/SSAUpdater.h"
				#include <algorithm>
				#include <cassert>
				#include <cstdint>
				#include <deque>
				#include <functional>
				#include <iterator>
				#include <map>
				#include <set>
				#include <string>
				#include <utility>
				#include <vector>

				using namespace llvm;

				#define DEBUG_TYPE "mssaargpromotion"

				STATISTIC(NumInArgCandidates, "Number of input argument candidates found");
				STATISTIC(NumInArgPromoted, "Number of input arguments promoted");
				STATISTIC(NumInOutArgCandidates, "Number of in/out argument candidates found");
				STATISTIC(NumInOutArgPromoted, "Number of in/out arguments promoted");

				// When searching for a clobber for an argument we constrain the number of
				// expensive uncached MSSA walks.
				static cl::opt<unsigned> MaxMSSAWalksNum(
				"argpromo-mssa-walks-limit", cl::Hidden, cl::init(10000),
				cl::desc(
				"Function argument promotion pass: the maximum number of MSSA walks"
				" per argument on a clobber search (default = 10000)"));

				// Return dot prefixed string twine if S isn't empty (used for BB's names).
				static inline Twine dot(const StringRef &S) {
				return !S.empty() ? Twine('.') + S : Twine();
				}

				// Structure describing argument for promotion.
				struct ArgPromotionInfo {
				Argument *Arg;
				Type *ArgType;
				Align ArgAlign;
				uint32_t Preload : 1; // Argument requires initial value to be passed to
				// the function.
				uint32_t Return : 1; // Argument should be returned by the function.

				// When the argument is promoted we need a new argument for the incoming
				// preloaded value but the new function signature isn't known yet and
				// therefore isn't created. We use a dummy argument to start with and
				// after the new function is created it's RAUWed with the function's
				// argument, see createNewFunction.
				std::unique_ptr<Argument> PreloadArgDummy;

				// Index of the value in the aggregated return type (insert/extract_value idx)
				unsigned ReturnValueIndex = (unsigned)-1;

				// If one candidate clobbers another this field denotes the relationship.
				// Used to find "declobbering" promotion sequence.
				ArgPromotionInfo *ClobberedBy = nullptr;

				AAMDNodes AAMD; // Merged AA metadata for the load/store.

				ArgPromotionInfo(Argument *Arg_ = nullptr, Type *ArgType_ = nullptr,
				Align ArgAlign_ = Align())
				: Arg(Arg_), ArgType(ArgType_), ArgAlign(ArgAlign_) {
				Preload = Return = 0;
				}

				unsigned getArgNo() const { return Arg->getArgNo(); }

				bool isUnusedArg() const { return !Preload && !Return; }

				// Return true if this argument is promoted.
				bool isPromoted() const {
				return PreloadArgDummy || ReturnValueIndex != (unsigned)-1;
				}

				// TODO: this is a placeholder for checking GEP indexes
				bool isMyPtr(Value *Ptr) const { return Ptr && Ptr == Arg; }

				// Predicates returning true if the value is a load or store by this
				// argument (TODO: this will check GEPs later).
				bool isMyLoad(Value *V) const {
				LoadInst *LI = dyn_cast<LoadInst>(V);
				return LI ? isMyPtr(LI->getPointerOperand()) : false;
				arsenm (Done): dyn_cast instead of isa and cast
				}
				bool isMyStore(Value *V) const {
				StoreInst *SI = dyn_cast<StoreInst>(V);
				arsenm (Done): dyn_cast instead of isa and cast
				return SI ? isMyPtr(SI->getPointerOperand()) : false;
				}
				bool isMyLoadOrStore(Value *V) const {
				if (LoadInst *LI = dyn_cast<LoadInst>(V))
				return isMyPtr(LI->getPointerOperand());
				if (StoreInst *SI = dyn_cast<StoreInst>(V))
				arsenm (Done): No else after return
				return isMyPtr(SI->getPointerOperand());
				return false;
				}

				MemoryLocation getMemLoc() const {
				const auto &DL = Arg->getParent()->getParent()->getDataLayout();
				return MemoryLocation(Arg,
				LocationSize::precise(DL.getTypeStoreSize(ArgType)));
				}

				bool isClobberedBy(const ArgPromotionInfo &A) const {
				const ArgPromotionInfo *P = this;
				while ((P = P->ClobberedBy)) {
				if (&A == P)
				return true;
				}
				return false;
				}

				Twine getParamName(StringRef &&LifeTimeOwner = StringRef()) const {
				// The problem with a twine is that StringRef it references should be alive
				// when the twine is alive: use LifeTimeOwner to keep the StringRef alive
				// at least for the lifetime of the full expression.
				LifeTimeOwner = Arg->getName();
				return LifeTimeOwner + ".val";
				}

				#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
				LLVM_DUMP_METHOD StringRef getKindStr() const {
				if (Preload && Return)
				return "inout";
				return Preload ? "in" : "out";
				}
				#endif

				Argument *getOrCreatePreloadArgDummy() {
				if (!PreloadArgDummy)
				PreloadArgDummy = std::make_unique<Argument>(ArgType);
				return PreloadArgDummy.get();
				}

				LoadInst *createLoad(IRBuilder<NoFolder> &IRB, Value *Ptr,
				const StringRef &Name) const {
				LoadInst *Load = IRB.CreateLoad(ArgType, Ptr, Name + ".val.pre");
				Load->setAlignment(ArgAlign);
				if (AAMD)
				Load->setAAMetadata(AAMD);
				return Load;
				}

				StoreInst *createStore(IRBuilder<NoFolder> &IRB, Value *V, Value *Ptr) const {
				StoreInst *Store = IRB.CreateStore(V, Ptr);
				Store->setAlignment(ArgAlign);
				if (AAMD)
				Store->setAAMetadata(AAMD);
				return Store;
				}

				// Iterator hiding the details of iterating over a promoted argument's users,
				// especially once GEPs are supported; for now a minimal trivial implementation.
				class user_iterator
				: public iterator_facade_base<user_iterator, std::forward_iterator_tag,
				Value *> {
				Argument::user_iterator ArgUserI;
				friend struct ArgPromotionInfo;
				user_iterator(const Argument::user_iterator &I) : ArgUserI(I) {}

				public:
				value_type operator*() const { return *ArgUserI; }
				user_iterator &operator++() {
				++ArgUserI;
				return *this;
				}
				user_iterator operator++(int) {
				auto R = *this;
				++ArgUserI;
				return R;
				}
				bool operator==(const user_iterator &RHS) const {
				return ArgUserI == RHS.ArgUserI;
				}
				};
				user_iterator user_begin() const { return user_iterator(Arg->user_begin()); }
				user_iterator user_end() const { return user_iterator(Arg->user_end()); }
				iterator_range<user_iterator> users() const {
				return make_range(user_begin(), user_end());
				}
				};

				// Return true if Pred is true for all callers passing P.Arg.
				static bool allCallersPass(
				const ArgPromotionInfo &P,
				function_ref<bool(CallBase *, Value *, const ArgPromotionInfo &)> Pred) {
				Function *Callee = P.Arg->getParent();
				for (User *U : Callee->users()) {
				assert(isa<CallBase>(U));
				CallBase *CB = cast<CallBase>(U);
				if (!Pred(CB, CB->getArgOperand(P.getArgNo()), P))
				return false;
				}
				return true;
				}

				// Given the function pointer argument that is only used by loads
				// return true if the value pointed by the argument can be loaded before the
				// function call and passed in:
				// either the value is loaded by the ptr arg on every function path
				// or the pointer is valid for all callsites in the program.
				static bool isROCandidate(ArgPromotionInfo &Candidate) {
				SmallPtrSet<BasicBlock *, 16> ReadPerBB;
				for (Value *U : Candidate.users()) {
				assert(Candidate.isMyLoad(U));
				ReadPerBB.insert(cast<Instruction>(U)->getParent());
				}
				bool HasLoadOnEveryPath = true;
				Function *F = Candidate.Arg->getParent();
				auto *EntryBB = &F->getEntryBlock();
				for (auto DFI = df_begin(EntryBB), E = df_end(EntryBB); DFI != E;) {
				BasicBlock *BB = *DFI;
				if (ReadPerBB.count(BB)) {
				DFI.skipChildren(); // This path already has a load - skip its children.
				continue;
				}
				rampitec (Done): else after continue is not needed.
				if (isa<ReturnInst>(BB->getTerminator())) {
				HasLoadOnEveryPath = false;
				break;
				}
				++DFI;
				}

				// Return true if we can prove that the caller passes in a valid pointer.
				const DataLayout &DL = F->getParent()->getDataLayout();
				auto ValidPtr = [&DL](CallBase *, Value *ActualPtr,
				const ArgPromotionInfo &P) {
				return isDereferenceablePointer(ActualPtr, P.ArgType, DL);
				};

				Candidate.Preload = HasLoadOnEveryPath || allCallersPass(Candidate, ValidPtr);

				LLVM_DEBUG(dbgs() << " - ";
				if (HasLoadOnEveryPath)
				Lint: Pre-merge checks: clang-format: please reformat the code
				- if (HasLoadOnEveryPath)
				- dbgs() << "has a load on every path,";
				- else
				- dbgs() << (Candidate.Preload ? "" : "not")
				- << " all callers pass a valid dereferenceable ptr,");
				+ if (HasLoadOnEveryPath) dbgs() << "has a load on every path,";
				+ else dbgs() << (Candidate.Preload ? "" : "not")
				+ << " all callers pass a valid dereferenceable ptr,");
				dbgs() << "has a load on every path,";
				arsenm (Done): Weird formatting
				vpykhtin (Author, Done): lint however tries to convince me the previous formatting was better.
				else
				dbgs() << (Candidate.Preload ? "" : "not")
				<< " all callers pass a valid dereferenceable ptr,");

				return Candidate.Preload;
				}
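
Reader's note: the "load on every path" test in `isROCandidate` can be sketched on a toy control-flow graph without LLVM. This is only an illustration of the idea (a depth-first walk that does not descend below blocks that already contain a load); `ToyCFG` and `hasLoadOnEveryPath` are hypothetical names, and block 0 is assumed to be the entry block.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy CFG: block i has successors Succs[i]; block 0 is the entry.
struct ToyCFG {
  std::vector<std::vector<int>> Succs;
  std::vector<bool> HasLoad;  // block contains a load through the argument
  std::vector<bool> IsReturn; // block ends in a return
};

// Returns true if every entry-to-return path passes through a block that
// loads through the argument.
bool hasLoadOnEveryPath(const ToyCFG &G) {
  assert(G.HasLoad.size() == G.Succs.size() &&
         G.IsReturn.size() == G.Succs.size());
  if (G.Succs.empty())
    return true;
  std::vector<bool> Visited(G.Succs.size(), false);
  std::vector<int> Stack{0};
  while (!Stack.empty()) {
    int BB = Stack.back();
    Stack.pop_back();
    if (Visited[BB])
      continue;
    Visited[BB] = true;
    if (G.HasLoad[BB])
      continue; // paths through BB are covered: do not descend further
    if (G.IsReturn[BB])
      return false; // reached a return without seeing a load
    for (int S : G.Succs[BB])
      Stack.push_back(S);
  }
  return true;
}
```

In a diamond-shaped graph with a load in only one arm, the sketch reports `false`, which corresponds to the case where the pass falls back to proving that every caller passes a dereferenceable pointer.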

				// Given the function pointer argument that is only used by stores and maybe
				// loads return true if the value pointed by the argument can be stored after
				// and loaded before the function call and passed in/returned by the function:
				// either the value is stored on every function path
				// or the pointer points to a thread local memory that doesn't escape before
				// the function call for every callsite in the program.
				// Check is made if a load precedes stores on any path so the initial value
				// should be passed in as a parameter.
				static bool isRWCandidate(FunctionAnalysisManager &FAM,
				ArgPromotionInfo &Candidate) {
				SmallDenseMap<BasicBlock *, unsigned, 16> RWPerBB;
				enum { HasReads = 1, HasWrites = 2 };
				for (Value *U : Candidate.users()) {
				assert(Candidate.isMyLoadOrStore(U));
				RWPerBB[cast<Instruction>(U)->getParent()] |=
				isa<LoadInst>(U) ? HasReads : HasWrites;
				}
				bool HasLoadBeforeStore = false;
				bool HasStoreOnEveryPath = true;
				Function *F = Candidate.Arg->getParent();
				auto *EntryBB = &F->getEntryBlock();
				for (auto DFI = df_begin(EntryBB), E = df_end(EntryBB); DFI != E;) {
				BasicBlock *BB = *DFI;
				auto RW = RWPerBB.find(BB);
				if (RW != RWPerBB.end()) { // There is load or store within the BB.
				if (!HasLoadBeforeStore && (RW->second & HasReads)) {
				if (RW->second & HasWrites) {
				// Determine if load locally dominates store.
				auto LorS = find_if(*BB, [&Candidate](Instruction &I) -> bool {
				return Candidate.isMyLoadOrStore(&I);
				});
				assert(LorS != BB->end());
				HasLoadBeforeStore = isa<LoadInst>(*LorS);
				} else
				HasLoadBeforeStore = true;
				}
				if (RW->second & HasWrites) {
				DFI.skipChildren(); // This path already has a store - skip its children.
				continue;
				}
				}
				if (isa<ReturnInst>(BB->getTerminator()))
				HasStoreOnEveryPath = false;

				// Short-circuit: all the info is collected - nothing left to do.
				if (HasLoadBeforeStore && !HasStoreOnEveryPath)
				break;
				++DFI;
				}

				auto ValidThreadLocalPtr = [&FAM, F](CallBase *CallInst, Value *ActualPtr,
				const ArgPromotionInfo &P) {
				Value *Object = getUnderlyingObject(ActualPtr);
				if (!isa<AllocaInst>(Object) &&
				!isAllocLikeFn(Object, &FAM.getResult<TargetLibraryAnalysis>(*F)))
				return false;

				return !PointerMayBeCapturedBefore(
				Object, /* ReturnCaptures */ false,
				/* StoreCaptures */ true, CallInst,
				&FAM.getResult<DominatorTreeAnalysis>(*F));
				};

				if (HasStoreOnEveryPath) {
				Candidate.Preload = HasLoadBeforeStore;
				Candidate.Return = true;
				LLVM_DEBUG(dbgs() << " - has store on every path,");
				} else {
				// Preload the value so it can be returned unchanged on some path.
				Candidate.Preload = Candidate.Return =
				allCallersPass(Candidate, ValidThreadLocalPtr);
				LLVM_DEBUG(dbgs() << " - " << (Candidate.Return ? "" : "not")
				<< " all callers pass a valid thread local ptr,");
				}
				return Candidate.Return;
				}
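
Reader's note: the two facts that `isRWCandidate` collects (is there a load before any store on some path, and is there a store on every path) can be sketched on a toy graph. This is a simplified model under stated assumptions, not the pass itself: each block lists its accesses through the argument as a string of `'L'` and `'S'` in program order, block 0 is the entry, and `ToyBlock`, `RWSummary` and `classify` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct ToyBlock {
  std::string Ops;        // 'L' = load, 'S' = store through the argument
  std::vector<int> Succs; // successor block indices
  bool IsReturn = false;  // block ends in a return
};

struct RWSummary {
  bool HasLoadBeforeStore = false; // initial value must be passed in
  bool HasStoreOnEveryPath = true; // final value is always defined
};

RWSummary classify(const std::vector<ToyBlock> &CFG) {
  RWSummary R;
  if (CFG.empty())
    return R;
  std::vector<bool> Visited(CFG.size(), false);
  std::vector<int> Stack{0};
  while (!Stack.empty()) {
    int BB = Stack.back();
    Stack.pop_back();
    if (Visited[BB])
      continue;
    Visited[BB] = true;
    const ToyBlock &B = CFG[BB];
    // Only blocks reachable without crossing a store are visited, so a
    // leading load here reads the value that was live on function entry.
    if (!B.Ops.empty() && B.Ops.front() == 'L')
      R.HasLoadBeforeStore = true;
    if (B.Ops.find('S') != std::string::npos)
      continue; // a store covers every path through this block
    if (B.IsReturn)
      R.HasStoreOnEveryPath = false;
    for (int S : B.Succs) {
      assert(S >= 0 && S < static_cast<int>(CFG.size()));
      Stack.push_back(S);
    }
  }
  return R;
}
```

In terms of the source, `HasStoreOnEveryPath` lets the argument be returned without further proof, and `HasLoadBeforeStore` decides whether it also has to be preloaded.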

				// Fill Candidates with the list of arguments potentially suitable for promotion
				static bool
				getPromotionCandidates(FunctionAnalysisManager &FAM, Argument *PtrArg,
				SmallVectorImpl<ArgPromotionInfo> &Candidates,
				bool InArgsOnly) {
				LLVM_DEBUG(dbgs() << " Trying arg: " << *PtrArg);

				Type *ValueTy = cast<PointerType>(PtrArg->getType())->getPointerElementType();
				if (!ValueTy->isSingleValueType()) {
				LLVM_DEBUG(dbgs() << " - unsupported type " << *ValueTy << '\n');
				return false;
				}

				unsigned NumLoads = 0, NumStores = 0;
				Align ArgAlign; // Receives max alignment among the instructions.
				for (auto *U : PtrArg->users()) {
				const unsigned NumLoadStoresBefore = NumLoads + NumStores;
				if (auto *LI = dyn_cast<LoadInst>(U)) {
				if (LI->isSimple()) {
				ArgAlign = std::max(ArgAlign, LI->getAlign());
				++NumLoads;
				}
				} else if (auto *SI = dyn_cast<StoreInst>(U)) {
				if (SI->isSimple() && SI->getValueOperand() != PtrArg && !InArgsOnly) {
				ArgAlign = std::max(ArgAlign, SI->getAlign());
				++NumStores;
				}
				}
				if (NumLoads + NumStores == NumLoadStoresBefore) {
				LLVM_DEBUG(dbgs() << " - unsupported use " << *U << '\n');
				return false;
				}
				}

				Candidates.emplace_back(PtrArg, ValueTy, ArgAlign);
				if (NumLoads + NumStores) {
				auto &C = Candidates.back();
				if (!(NumStores ? isRWCandidate(FAM, C) : isROCandidate(C))) {
				Candidates.pop_back();
				LLVM_DEBUG(dbgs() << " discard\n");
				return false;
				}
				LLVM_DEBUG(dbgs() << " promote as " << C.getKindStr() << " arg\n");
				} else {
				// Otherwise - useless argument - to get rid of later.
				LLVM_DEBUG(dbgs() << " - unused arg, remove\n");
				}
				return true;
				}

				class ArgumentPromoter {
				Function *F;
				FunctionAnalysisManager &FAM;
				MemorySSA &MSSA;
				unsigned NumMSSAWalksLeft;
				SmallPtrSet<MemoryAccess *, 16> VisitedMA;

				enum ClobberTestResult {
				CheckOtherPhiPath,
				ContinueThisPhiPath,
				FoundClobber
				};
				using ClobberTestFx = enum ClobberTestResult(MemoryAccess *);

				MemoryAccess *getClobber(MemoryAccess *MA, const MemoryLocation &Loc,
				function_ref<ClobberTestFx> ClobberTest,
				SmallPtrSetImpl<MemoryAccess *> &Visited);

				MemoryAccess *getClobber(Instruction *I,
				function_ref<ClobberTestFx> ClobberTest,
				SmallPtrSetImpl<MemoryAccess *> &Visited);

				MemoryAccess *getInOutArgClobber(const ArgPromotionInfo &ArgInfo);

				using RetValuesMap =
				SmallDenseMap<ReturnInst *, SmallVector<TrackingVH<Value>, 4>>;
				void promoteInOutArg(ArgPromotionInfo &ArgInfo, RetValuesMap &RetValues);

				Type *promoteInOutCandidates(
				SmallVectorImpl<ArgPromotionInfo> &Candidates,
				SmallVectorImpl<ArgPromotionInfo *> &RetValuesStoreOrder);

				bool isInArgClobbered(const ArgPromotionInfo &ArgInfo);
				void promoteInArg(ArgPromotionInfo &ArgInfo);

				static Function *
				createNewFunction(Function *OldF, Type *RetTy,
				const SmallVectorImpl<ArgPromotionInfo *> &PromotedArgs);

				static void promoteCallsite(
				CallBase &CB, Function *NF,
				const SmallVectorImpl<ArgPromotionInfo *> &PromotedArgs,
				const SmallVectorImpl<ArgPromotionInfo *> &RetValuesStoreOrder);

				public:
				ArgumentPromoter(Function *F_, FunctionAnalysisManager &FAM_)
				: F(F_), FAM(FAM_), MSSA(FAM.getResult<MemorySSAAnalysis>(*F).getMSSA()) {
				}

				Function *run(SmallVectorImpl<ArgPromotionInfo> &Candidates);
				};

				// Search for a memory access that clobbers Loc, starting from MA. Does a BFS
				// over phi paths. ClobberTest is run on every clobber found, and its return
				// value decides how the search proceeds:
				// FoundClobber - stop the search and return the found clobber;
				// ContinueThisPhiPath - skip the found clobber and continue along the path;
				// CheckOtherPhiPath - skip the found clobber and try other phi paths, if any.
				// Returns the found clobber, LiveOnEntryDef if there is no clobber, or nullptr
				// if the maximum number of uncached MSSA walks is reached.
				MemoryAccess *
				ArgumentPromoter::getClobber(MemoryAccess *MA, const MemoryLocation &Loc,
				function_ref<ClobberTestFx> ClobberTest,
				SmallPtrSetImpl<MemoryAccess *> &Visited) {
				std::deque<MemoryAccess *> FIFO;
				do {
				while (true) {
				if (!Visited.insert(MA).second)
				break;
				if (MemoryPhi *Phi = dyn_cast<MemoryPhi>(MA)) {
				for (auto *DefMA : make_range(Phi->defs_begin(), Phi->defs_end()))
				FIFO.push_back(DefMA);
				break;
				}
				if (--NumMSSAWalksLeft == 0) // Constrain the number of uncached walks.
				return nullptr;
				auto *ClobberMA = MSSA.getWalker()->getClobberingMemoryAccess(MA, Loc);
				if (isa<MemoryPhi>(ClobberMA)) {
				MA = ClobberMA;
				} else if (!MSSA.isLiveOnEntryDef(ClobberMA)) {
				ClobberTestResult R = ClobberTest(ClobberMA);
				if (R == FoundClobber)
				return ClobberMA;
				else if (R == ContinueThisPhiPath)
				MA = cast<MemoryUseOrDef>(ClobberMA)->getDefiningAccess();
				else
				break; // CheckOtherPhiPath
				}
				}
				if (FIFO.empty())
				break;
				MA = FIFO.front();
				FIFO.pop_front();
				} while (true);
				return MSSA.getLiveOnEntryDef();
				}
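
Reader's note: the shape of this search (walk one def chain upwards, fan out breadth-first at phis, let a callback decide what to do with each clobber, and stop when a walk budget runs out) can be sketched on a toy access graph. This is a deliberately simplified model and not MemorySSA: there is no alias-aware walker, every def marked `MayClobber` counts as a clobber, node 0 stands in for LiveOnEntryDef, and all names (`ToyAccess`, `WalkStep`, `findClobber`, `LimitReached`) are hypothetical.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <set>
#include <vector>

enum class WalkStep { CheckOtherPhiPath, ContinueThisPhiPath, FoundClobber };

// Toy memory access. A def has exactly one entry in Incoming (its defining
// access); a phi has one entry per incoming path.
struct ToyAccess {
  bool IsPhi = false;
  std::vector<int> Incoming;
  bool MayClobber = false; // def only: may write the queried location
};

constexpr int LiveOnEntry = 0;   // node 0 models LiveOnEntryDef
constexpr int LimitReached = -1; // models the nullptr result

int findClobber(const std::vector<ToyAccess> &G, int Start, unsigned Budget,
                const std::function<WalkStep(int)> &ClobberTest) {
  assert(Start >= 0 && Start < static_cast<int>(G.size()));
  std::set<int> Visited;
  std::deque<int> Queue{Start};
  while (!Queue.empty()) {
    int MA = Queue.front();
    Queue.pop_front();
    // Walk one path upwards until it ends, repeats, or forks at a phi.
    while (MA != LiveOnEntry && Visited.insert(MA).second) {
      if (G[MA].IsPhi) {
        for (int In : G[MA].Incoming)
          Queue.push_back(In); // explore phi operands breadth-first
        break;
      }
      if (Budget-- == 0)
        return LimitReached; // constrain the number of def visits
      if (G[MA].MayClobber) {
        WalkStep R = ClobberTest(MA);
        if (R == WalkStep::FoundClobber)
          return MA;
        if (R == WalkStep::CheckOtherPhiPath)
          break; // abandon this path, keep the queued ones
      }
      MA = G[MA].Incoming[0]; // step to the defining access
    }
  }
  return LiveOnEntry;
}
```

A callback that returns `CheckOtherPhiPath` for the argument's own stores and `FoundClobber` for everything else plays the role of `SkipMyStore` in `getInOutArgClobber`.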

				// Similar to the previous routine, but searches for the memory access that
				// clobbers the memory accessed by instruction I.
				MemoryAccess *
				ArgumentPromoter::getClobber(Instruction *I,
				function_ref<ClobberTestFx> ClobberTest,
				SmallPtrSetImpl<MemoryAccess *> &Visited) {
				assert(MemoryLocation::getOrNone(I).hasValue());
				auto *ClobberMA = MSSA.getWalker()->getClobberingMemoryAccess(I);
				if (MSSA.isLiveOnEntryDef(ClobberMA))
				return ClobberMA;
				if (isa<MemoryPhi>(ClobberMA))
				return getClobber(ClobberMA, MemoryLocation::get(I), ClobberTest, Visited);

				switch (ClobberTest(ClobberMA)) {
				case FoundClobber:
				return ClobberMA;
				case CheckOtherPhiPath:
				break; // No other path to test.
				case ContinueThisPhiPath:
				return getClobber(cast<MemoryUseOrDef>(ClobberMA)->getDefiningAccess(),
				MemoryLocation::get(I), ClobberTest, Visited);
				}
				return MSSA.getLiveOnEntryDef();
				}

				// TODO: move this to the MemorySSA class
				// Find last memory def or phi in the BB or in its dominating predecessors.
				// Note that a def in non-dominating predecessor would create phi in the BB.
				static MemoryAccess *getLastDef(BasicBlock *BB, MemorySSA &MSSA) {
				if (auto *Defs = MSSA.getBlockDefs(BB))
				return const_cast<MemoryAccess *>(&*Defs->rbegin());

				DomTreeNode *Node = MSSA.getDomTree().getNode(BB);
				while ((Node = Node->getIDom()))
				if (auto *Defs = MSSA.getBlockDefs(Node->getBlock()))
				return const_cast<MemoryAccess *>(&*Defs->rbegin());
				return MSSA.getLiveOnEntryDef();
				}
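
Reader's note: the lookup in `getLastDef` is a walk up the immediate-dominator chain until a block with a memory def or phi is found. A minimal sketch of that walk on plain arrays, assuming blocks are numbered, `IDom[B]` is the immediate dominator of `B` (`-1` for the entry block) and `LastDefInBlock[B]` is an id of the last def in `B` or `NoDef`; all of these names are hypothetical.

```cpp
#include <cassert>
#include <vector>

constexpr int NoDef = -1;         // block has no memory def or phi
constexpr int LiveOnEntryDef = 0; // id used for the live-on-entry definition

// Returns the last def in BB, else in its nearest dominator that has one,
// else the live-on-entry definition.
int lastDefFor(int BB, const std::vector<int> &IDom,
               const std::vector<int> &LastDefInBlock) {
  assert(IDom.size() == LastDefInBlock.size());
  for (int B = BB; B != -1; B = IDom[B])
    if (LastDefInBlock[B] != NoDef)
      return LastDefInBlock[B];
  return LiveOnEntryDef;
}
```

Looking only at dominators is enough because, as the source comment notes, a def in a non-dominating predecessor would have produced a phi in the block itself.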

				#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
				LLVM_DUMP_METHOD
				static void printClobber(raw_ostream &os, MemoryAccess *ClobberMA,
				Instruction *I) {
				if (!ClobberMA) {
				os << "clobber search reached limit\n";
				return;
				}
				auto *ClobberI = cast<MemoryUseOrDef>(ClobberMA)->getMemoryInst();
				os << "found clobber:" << *I << '@' << I->getParent()->getName()
				<< " is clobbered by" << *ClobberI << '@'
				<< ClobberI->getParent()->getName() << '\n';
				}
				#endif

				// Check if memops by the argument are clobbered by or clobber other memops.
				// Return found clobber, LiveOnEntryDef if no clobber or nullptr if the maximum
				// number of uncached MSSA walks reached.
				MemoryAccess *
				ArgumentPromoter::getInOutArgClobber(const ArgPromotionInfo &ArgInfo) {
				LLVM_DEBUG(dbgs() << " Searching for a clobber for " << ArgInfo.getKindStr()
				<< " arg " << *ArgInfo.Arg << ": ");
				auto SkipMyStore = [&ArgInfo](MemoryAccess *MA) -> ClobberTestResult {
				return ArgInfo.isMyStore(cast<MemoryUseOrDef>(MA)->getMemoryInst())
				? CheckOtherPhiPath
				: FoundClobber;
				};
				VisitedMA.clear(); // Using VisitedMA to track SkipMyStore condition tests.
				// Check if a load by the argument is clobbered by something else than
				// a store by the argument.
				for (Value *U : ArgInfo.users()) {
				assert(ArgInfo.isMyLoadOrStore(U));
				if (LoadInst *LI = dyn_cast<LoadInst>(U)) {
				auto *Clob = getClobber(LI, SkipMyStore, VisitedMA);
				if (!MSSA.isLiveOnEntryDef(Clob)) {
				LLVM_DEBUG(printClobber(dbgs(), Clob, LI));
				return Clob;
				}
				}
				}
				// Check if the argument has been clobbered between last store by the arg
				// and return on any path.
				MemoryLocation Loc(ArgInfo.getMemLoc());
				for (auto &BB : *F) {
				if (!isa<ReturnInst>(BB.getTerminator()))
				continue;
				auto *Clob = getClobber(getLastDef(&BB, MSSA), Loc, SkipMyStore, VisitedMA);
				if (!MSSA.isLiveOnEntryDef(Clob)) {
				LLVM_DEBUG(printClobber(dbgs(), Clob, BB.getTerminator()));
				return Clob;
				}
				}
				// Check if any other load is clobbered by a store by the argument.
				AliasAnalysis &AA = FAM.getResult<AAManager>(*F);
				for (auto &BB : *F) {
				if (auto *L = MSSA.getBlockAccesses(&BB)) {
				for (auto &MA : *L) {
				if (auto *MU = dyn_cast<MemoryUse>(&MA)) {
				Instruction *UseI = MU->getMemoryInst();
				if (ArgInfo.isMyLoad(UseI))
				continue;
				auto UseLoc = MemoryLocation::getOrNone(UseI);
				if (!UseLoc.hasValue()) {
				LLVM_DEBUG(dbgs() << "cannot get memloc for " << *UseI << '\n');
				// Conservatively consider this as a clobber.
				return const_cast<MemoryUse *>(MU);
				}
				auto FindMyStore = [&](MemoryAccess *MA) -> ClobberTestResult {
				Instruction *DefI = cast<MemoryUseOrDef>(MA)->getMemoryInst();
				if (ArgInfo.isMyStore(DefI))
				return FoundClobber;
				// If UseI's location is definitely overwritten by the clobber
				// we can skip this path; otherwise it may have been clobbered earlier.
				ModRefInfo MRI = AA.getModRefInfo(DefI, UseLoc);
				return (isMustSet(MRI) && isModSet(MRI)) ? CheckOtherPhiPath
				: ContinueThisPhiPath;
				};
				VisitedMA.clear();
				auto *Clob = getClobber(UseI, FindMyStore, VisitedMA);
				if (!MSSA.isLiveOnEntryDef(Clob)) {
				LLVM_DEBUG(printClobber(dbgs(), Clob, UseI));
				return Clob;
				}
				}
				}
				}
				}
				LLVM_DEBUG(dbgs() << "no clobber\n");
				return MSSA.getLiveOnEntryDef();
				}

				// Annotate each return with a value for the argument ArgInfo.
				// Creates phis and rewrites the code.
				void ArgumentPromoter::promoteInOutArg(ArgPromotionInfo &ArgInfo,
				RetValuesMap &RetValues) {
				SmallDenseMap<BasicBlock *, SmallVector<Instruction *, 4>, 16> MemInsts;
				SmallPtrSet<BasicBlock *, 4> DefBB;
				for (Value *U : ArgInfo.users()) {
				assert(ArgInfo.isMyLoadOrStore(U));

				Instruction *I = cast<Instruction>(U);
				if (MemInsts.empty())
				ArgInfo.AAMD = I->getAAMetadata();
				else if (ArgInfo.AAMD) // Merging AA metadata BTW.
				ArgInfo.AAMD.merge(I->getAAMetadata());

				BasicBlock *BB = I->getParent();
				if (isa<StoreInst>(I))
				DefBB.insert(BB);
				MemInsts[BB].push_back(I);
				}

				SmallDenseMap<BasicBlock *, TrackingVH<Value>, 16> BBExitValue;

				// Processing stores.
				nikic (Not Done): For the actual promotion, have you considered making use of PromoteMemToReg? Basically, replace the old argument with an alloca that is stored on entry and read on exit, and then run mem2reg on that alloca?
				define i32 @test(i32 %arg) {
				  %old.arg = alloca i32
				  store i32, i32* %old.arg
				  // Code uses old.arg
				  %ret = load i32, i32* %old.arg
				  ret i32 %ret
				}
				vpykhtin (Author, Done): I think it is a good idea to reuse the working code; however, I would rather refactor it than try to please it :) There is similar code in LICM promoteLoopAccessesToScalars, so it seems we would benefit from such a mem2reg framework.
				nikic (Not Done): LICM uses the LoadAndStorePromoter utility instead -- I'm not really familiar with the differences between these approaches. From what I can tell, LoadAndStorePromoter is SSAUpdater-based and can work without DomTree, while PromoteMemToReg is more similar to the code you use here, in that it uses IDF.
				vpykhtin (Author, Done): Yes, you're right: PromoteMemToReg is good here and it does much more: it handles intrinsics, debug info, etc. It's alloca-centered, but I think it's relatively easy to make it more generic: separate the part which decides if the transformation is safe from the part that actually performs it. It uses LargeBlockInfo to keep track of instruction ordering, which can be substituted with similar MSSA functionality if it's present.
				for (BasicBlock *BB : DefBB) {
				auto &BBMemInsts = MemInsts[BB];
				// Sort mem instructions in the program order.
				sort(BBMemInsts, [this](Instruction *A, Instruction *B) {
				return MSSA.locallyDominates(MSSA.getMemoryAccess(A),
				MSSA.getMemoryAccess(B));
				});
				// Propagate store values down to the end of the basic block,
				// loads preceding the first store will be processed later.
				auto FirstStore =
				find_if(BBMemInsts, [](Instruction *I) { return isa<StoreInst>(I); });
				assert(FirstStore != BBMemInsts.end());
				Value *V = nullptr;
				for (Instruction *I : make_range(FirstStore, BBMemInsts.end())) {
				if (isa<LoadInst>(I)) {
				assert(V); // Since we started with a store.
				I->replaceAllUsesWith(V);
				} else
				V = cast<StoreInst>(I)->getValueOperand();
				}
				assert(V);
				BBExitValue[BB] = V;
				}
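
Reader's note: the per-block store processing above amounts to forwarding the most recent stored value to later loads and remembering the last stored value as the block's exit value. A minimal sketch on a toy instruction list, assuming the accesses are already in program order; `ToyOp` and `forwardStores` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

struct ToyOp {
  bool IsStore;
  int Value; // store: value written; load: receives the forwarded value
};

// Forwards each stored value to the loads that follow it within one block.
// Returns false if the block has no store; otherwise sets ExitValue to the
// last stored value. Loads before the first store are left untouched, since
// they need the block's entry value, which is resolved in a later step.
bool forwardStores(std::vector<ToyOp> &Ops, int &ExitValue) {
  bool SeenStore = false;
  int Cur = 0;
  for (ToyOp &Op : Ops) {
    if (Op.IsStore) {
      Cur = Op.Value;
      SeenStore = true;
    } else if (SeenStore) {
      Op.Value = Cur; // the load is replaced by the last stored value
    }
  }
  if (SeenStore)
    ExitValue = Cur;
  return SeenStore;
}
```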

				SmallDenseMap<BasicBlock *, TrackingVH<Value>, 16> BBEntryValue;
				auto setEntryValue = [&](BasicBlock *BB, Value *V) {
				BBEntryValue[BB] = V;
				// Keep BBExitValue left from store processing.
				BBExitValue.try_emplace(BB, V);
				};

				if (ArgInfo.Preload)
				setEntryValue(&F->getEntryBlock(), ArgInfo.getOrCreatePreloadArgDummy());

				{ // Inserting phis.
				SmallVector<BasicBlock *, 16> PHIBlocks;
				ForwardIDFCalculator IDF(MSSA.getDomTree());
				IDF.setDefiningBlocks(DefBB);
				IDF.calculate(PHIBlocks);

				for (auto *JoinBB : PHIBlocks) {
				auto P = MemInsts.find(JoinBB);
				// If JoinBB starts with a store then phi value isn't used.
				if (P == MemInsts.end() || isa<LoadInst>(P->second.front())) {
				PHINode *Phi = PHINode::Create(ArgInfo.ArgType, 2,
				ArgInfo.getParamName() +
				dot(JoinBB->getName()) + ".phi",
				&JoinBB->front());
				setEntryValue(JoinBB, Phi);
				}
				}
				}

				auto findIncomingValue = [&](BasicBlock *BB) -> Value * {
				DomTreeNode *Node = MSSA.getDomTree().getNode(BB);
				while ((Node = Node->getIDom())) {
				auto I = BBExitValue.find(Node->getBlock());
				if (I != BBExitValue.end())
				return I->second;
				}
				return UndefValue::get(ArgInfo.ArgType);
				};

				auto getBBExitValue = [&](BasicBlock *BB) -> Value * {
				auto I = BBExitValue.find(BB);
				if (I != BBExitValue.end())
				return I->second;
				return findIncomingValue(BB);
				rampitec (Not Done): A single function and BBEntryValue/BBExitValue as an argument?
				};

				auto getBBEntryValue = [&](BasicBlock *BB) -> Value * {
				auto I = BBEntryValue.find(BB);
				if (I != BBEntryValue.end())
				return I->second;
				return findIncomingValue(BB);
				};

				// Processing phis.
				const DataLayout &DL = F->getParent()->getDataLayout();
				for (auto &P : BBEntryValue)
				if (PHINode *Phi = dyn_cast<PHINode>(&*P.second)) {
				nlopes (Not Done): Please use PoisonValue instead of UndefValue whenever possible as we are trying to remove undef from LLVM. Thank you!
				for (BasicBlock *PredBB : predecessors(P.first))
				Phi->addIncoming(getBBExitValue(PredBB), PredBB);

				if (Value *V = SimplifyInstruction(Phi, DL)) {
				Phi->replaceAllUsesWith(V);
				Phi->eraseFromParent();
				}
				}

				// Processing loads.
				for (auto &P : MemInsts) {
				auto &BBMemInsts = P.second;
				if (!isa<LoadInst>(BBMemInsts.front()))
				continue;
				Value *V = getBBEntryValue(P.first);
				auto I = BBMemInsts.begin(), E = BBMemInsts.end();
				do {
				(*I)->replaceAllUsesWith(V);
				} while (++I != E && isa<LoadInst>(*I));
				}

				arsenm (Done): The terminator can't be null
				// Annotate returns.
				for (BasicBlock &BB : *F)
				if (auto *RetInst = dyn_cast<ReturnInst>(BB.getTerminator()))
				RetValues[RetInst].push_back(getBBExitValue(&BB));

				// Finally erase load/stores.
				MemorySSAUpdater UMSSA(&MSSA);
				for (Value *U : make_early_inc_range(ArgInfo.users())) {
				assert(ArgInfo.isMyLoadOrStore(U));
				UMSSA.removeMemoryAccess(cast<Instruction>(U));
				cast<Instruction>(U)->eraseFromParent();
				arsenm (Done): I would assume this happens in the regular verifier with expensive checks?
				vpykhtin (Author, Done): Those checks aren't enough here because I modify MSSA and then reuse it for the next promoted argument.
				}
				#ifndef NDEBUG
				MSSA.verifyMemorySSA();
				#endif
				}
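
The store propagation loop above (the one that fills `BBExitValue`) can be pictured with a small standalone model. This is a sketch only, not the patch's code: `MemOp`, `NoValue`, and `forwardStoresInBlock` are hypothetical names, and SSA values are modeled as non-negative integer ids.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

constexpr int NoValue = -1; // Hypothetical marker: load not yet forwarded.

// One memory instruction of a single basic block, in program order.
struct MemOp {
  bool IsStore;
  int Value; // Stores: id of the stored value. Loads: forwarded id or NoValue.
};

// Starting at the first store, forwards the most recent stored value to every
// later load and returns the value that is live at the end of the block.
// Loads that precede the first store are left untouched for a later step.
int forwardStoresInBlock(std::vector<MemOp> &Ops) {
  auto FirstStore = std::find_if(
      Ops.begin(), Ops.end(), [](const MemOp &Op) { return Op.IsStore; });
  assert(FirstStore != Ops.end() && "block is expected to contain a store");
  int Current = NoValue;
  for (auto It = FirstStore; It != Ops.end(); ++It) {
    if (It->IsStore)
      Current = It->Value;
    else
      It->Value = Current;
  }
  return Current;
}
```

In this model a block `load, store 7, load, store 9, load` leaves the first load unresolved, forwards 7 and 9 to the later loads, and reports 9 as the exit value.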

				// Tries to promote [input/]output ptr arguments. It may happen that store
				// instructions for several arguments clobber one another, to solve this
				// an attempt to find an "unclobbering" promotion sequence is made.
				// For example:
				// store PtrArgA(may alias), 1;
				// store PtrArgB(may alias), 0; <- clobbers store PtrArgA
				//
				// First PtrArgB is promoted unclobbering PtrArgA which is promoted second.
				// Notice that it is only possible if such stores obey the same order in every
				// basic block, otherwise we cannot unclobber these at all. Promoted stores are
				// then placed in the caller in the same order making the transformation safe.
				//
				// This could be left for the following passes, but it is better to perform
				// such unclobbering all at once: it saves compile time and also simplifies
				// the return value of the function. Otherwise we would have to deal with an
				// onion-like aggregated return type and a bulky INSERT_VALUE/EXTRACT_VALUE
				// sequence.
				Type *ArgumentPromoter::promoteInOutCandidates(
				SmallVectorImpl<ArgPromotionInfo> &Candidates,
				SmallVectorImpl<ArgPromotionInfo *> &RetValuesStoreOrder) {

				// Priority queue is ordered so that clobbered candidates pop last.
				struct ClobberedPopLast {
				// Returns true if its first argument comes before its second argument in a
				// weak ordering. But because the priority queue outputs largest elements
				// first, the elements that "come before" are actually output last.
				arsenm (Done): Demorgan this
				bool operator()(const ArgPromotionInfo *A1,
				const ArgPromotionInfo *A2) const {
				assert(!A1->isClobberedBy(*A2) || !A2->isClobberedBy(*A1));
				return A1->isClobberedBy(*A2);
				}
				};
				struct CandidateQueue
				: std::priority_queue<ArgPromotionInfo *,
				SmallVector<ArgPromotionInfo *, 4>,
				ClobberedPopLast> {
				CandidateQueue(SmallVectorImpl<ArgPromotionInfo> &Candidates) {
				// This might seem like a dirty hack, but until ClobberedBy is set no order
				// on candidates can be established, so just store them as is
				for (auto &C : Candidates) {
				if (C.Return) {
				assert(!C.ClobberedBy); // but let's be careful.
				c.push_back(&C);
				}
				}
				}
				// This is placed here because priority_queue container is protected.
				ArgPromotionInfo *findClobber(StoreInst *SI) const {
				auto Clobber =
				std::find_if(c.begin(), c.end(), [SI](const ArgPromotionInfo *A) {
				return A->isMyStore(SI);
				});
				return Clobber != c.end() ? *Clobber : nullptr;
				}
				} Queue(Candidates);

				RetValuesMap RetValues;
				unsigned NumPromoted = 0;
				while (!Queue.empty()) {
				ArgPromotionInfo &C = *Queue.top();
				Queue.pop();
				if (C.ClobberedBy && !C.ClobberedBy->isPromoted()) // [1]
				continue; // the clobber isn't gone
				MemoryAccess *ClobberMA = getInOutArgClobber(C);
				if (MSSA.isLiveOnEntryDef(ClobberMA)) {
				promoteInOutArg(C, RetValues);
				// ReturnValueIndex is used as the index of the arg's value in the map
				// up until this function's exit, see below.
				C.ReturnValueIndex = NumPromoted++;
				continue;
				}
				MemoryDef *MDef = dyn_cast_or_null<MemoryDef>(ClobberMA);
				if (!MDef)
				continue;
				// If the clobbering store belongs to another candidate in the queue
				// enqueue the current candidate back with the ClobberedBy set so we can
				// retry it after the clobbering candidate has been promoted.
				StoreInst *SI = dyn_cast<StoreInst>(MDef->getMemoryInst());
				if (!SI || !SI->isSimple() || C.isMyStore(SI))
				continue;
				if (ArgPromotionInfo *Clobber = Queue.findClobber(SI)) {
				C.ClobberedBy = Clobber;
				arsenm (Done): Capitalize
				if (!Clobber->isClobberedBy(C))
				Queue.push(&C);
				// Otherwise this is a circular dependency, other candidates will be
				// removed by the condition [1].
				}
				}

				Type *OldRetTy = F->getReturnType();
				if (!NumPromoted)
				return OldRetTy;

				SmallVector<Type *, 5> ReturnArgTypes;
				ReturnArgTypes.reserve(NumPromoted + 1);
				if (!OldRetTy->isVoidTy())
				ReturnArgTypes.push_back(OldRetTy);

				arsenm (Done): Braces
				SmallVector<ArgPromotionInfo *, 4> ReturnArgs;
				ReturnArgs.reserve(NumPromoted);
				for (ArgPromotionInfo &C : Candidates) {
				if (C.isPromoted()) {
				assert(C.Return);
				ReturnArgs.push_back(&C);
				ReturnArgTypes.push_back(C.ArgType);
				}
				}

				Type *RetTy = ReturnArgTypes.size() > 1
				? StructType::get(F->getContext(), ReturnArgTypes)
				: ReturnArgTypes.front();

				// Replace old return instructions using annotated return values.
				for (auto &P : RetValues) {
				ReturnInst *OldRetInst = P.first;
				const auto &Values = P.second;
				assert(Values.size() == NumPromoted);
				Value *RetValue;
				if (OldRetTy->isVoidTy() && NumPromoted == 1)
				RetValue = Values[0];
				else {
				SmallString<256> NameData;
				StringRef Name =
				(F->getName() + dot(OldRetInst->getParent()->getName()) + ".ret")
				arsenm (Done): Braces
				.toStringRef(NameData);
				RetValue = UndefValue::get(RetTy);
				unsigned I = 0;
				if (!OldRetTy->isVoidTy()) {
				RetValue = InsertValueInst::Create(
				RetValue, OldRetInst->getReturnValue(), {I++}, Name, OldRetInst);
				}
				for (const ArgPromotionInfo *C : ReturnArgs) {
				RetValue =
				InsertValueInst::Create(RetValue, Values[C->ReturnValueIndex], {I},
				Name + Twine(I), OldRetInst);
				++I;
				}
				}
				ReturnInst::Create(OldRetInst->getContext(), RetValue, OldRetInst);
				OldRetInst->eraseFromParent();
				}

				RetValuesStoreOrder.resize(NumPromoted);
				for (unsigned I = 0; I < NumPromoted; I++) {
				ArgPromotionInfo *C = ReturnArgs[I];
				RetValuesStoreOrder[NumPromoted - 1 - C->ReturnValueIndex] = C;
				// ReturnValueIndex is now the index in the aggregated return type.
				C->ReturnValueIndex = I + (OldRetTy->isVoidTy() ? 0 : 1);
				}
				return RetTy;
				}
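
The queue used above relies on two standard-library details: `std::priority_queue` is a max-heap, so elements that compare "less" pop last, and its underlying container is the protected member `c`, which a derived class can reach. Below is a minimal standalone sketch of both points. It assumes a simplified `Candidate` with a boolean `Clobbered` flag in place of the patch's `isClobberedBy` relation, and all names in it are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical stand-in for ArgPromotionInfo.
struct Candidate {
  int Id;
  bool Clobbered;
};

// "Less than" means "pops later": a clobbered candidate compares less than an
// unclobbered one, so the max-heap outputs it last.
struct ClobberedPopLast {
  bool operator()(const Candidate *A, const Candidate *B) const {
    return A->Clobbered && !B->Clobbered;
  }
};

// Deriving from std::priority_queue gives access to the protected container
// `c`, which allows searching the elements that are still queued.
struct CandidateQueue
    : std::priority_queue<Candidate *, std::vector<Candidate *>,
                          ClobberedPopLast> {
  Candidate *findById(int Id) const {
    auto It = std::find_if(c.begin(), c.end(),
                           [Id](const Candidate *C) { return C->Id == Id; });
    return It != c.end() ? *It : nullptr;
  }
};

// Pops every element and returns the ids in pop order.
std::vector<int> drain(CandidateQueue &Q) {
  std::vector<int> Order;
  while (!Q.empty()) {
    Order.push_back(Q.top()->Id);
    Q.pop();
  }
  return Order;
}
```

With one clobbered and one unclobbered candidate pushed, `drain` yields the unclobbered one first. The flag-based comparator is chosen here because it is a strict weak ordering, which `std::priority_queue` requires of its comparator.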

				bool ArgumentPromoter::isInArgClobbered(const ArgPromotionInfo &ArgInfo) {
				LLVM_DEBUG(dbgs() << " Searching for a clobber for in arg " << *ArgInfo.Arg
				<< ": ");
				assert(!ArgInfo.Return && ArgInfo.Preload);
				auto *Walker = MSSA.getWalker();
				for (Value *U : ArgInfo.users()) {
				assert(ArgInfo.isMyLoad(U));
				LoadInst *LI = cast<LoadInst>(U);
				auto *ClobberMA = Walker->getClobberingMemoryAccess(LI);
				if (!MSSA.isLiveOnEntryDef(ClobberMA)) {
				LLVM_DEBUG(printClobber(dbgs(), ClobberMA, LI));
				return true;
				}
				}
				return false;
				}

				void ArgumentPromoter::promoteInArg(ArgPromotionInfo &ArgInfo) {
				assert(!ArgInfo.Return && ArgInfo.Preload);
				MemorySSAUpdater UMSSA(&MSSA);
				bool FirstAAMD = true;
				for (Value *U : make_early_inc_range(ArgInfo.users())) {
				assert(ArgInfo.isMyLoad(U));
				LoadInst *LI = cast<LoadInst>(U);
				if (FirstAAMD) {
				ArgInfo.AAMD = LI->getAAMetadata();
				FirstAAMD = false;
				} else if (ArgInfo.AAMD)
				ArgInfo.AAMD.merge(LI->getAAMetadata());
				LI->replaceAllUsesWith(ArgInfo.getOrCreatePreloadArgDummy());
				UMSSA.removeMemoryAccess(LI);
				LI->eraseFromParent();
				}
				#ifndef NDEBUG
				MSSA.verifyMemorySSA();
				#endif
				}

				// Create the function with the new signature.
				Function *ArgumentPromoter::createNewFunction(
				Function *OldF, Type *RetTy,
				const SmallVectorImpl<ArgPromotionInfo *> &PromotedArgs) {

				SmallVector<Type *, 8> Params;
				SmallVector<AttributeSet, 8> ParamAttr;
				AttributeList PAL = OldF->getAttributes();
				auto PA = PromotedArgs.begin();
				for (unsigned ArgNo = 0; ArgNo < OldF->arg_size(); ++ArgNo) {
				if (PA != PromotedArgs.end() && (*PA)->getArgNo() == ArgNo) {
				assert((*PA)->isPromoted() || (*PA)->isUnusedArg());
				if ((*PA)->PreloadArgDummy) {
				Params.push_back((*PA)->ArgType);
				ParamAttr.push_back(AttributeSet());
				}
				++PA;
				} else {
				Params.push_back(OldF->getArg(ArgNo)->getType());
				ParamAttr.push_back(PAL.getParamAttrs(ArgNo));
				}
				}
				assert(PA == PromotedArgs.end());

				FunctionType *OldFTy = OldF->getFunctionType();
				FunctionType *NFTy = FunctionType::get(RetTy, Params, OldFTy->isVarArg());
				Function *NF = Function::Create(NFTy, OldF->getLinkage(),
				OldF->getAddressSpace(), OldF->getName());
				NF->copyAttributesFrom(OldF);
				NF->copyMetadata(OldF, 0);
				NF->setAttributes(AttributeList::get(OldF->getContext(), PAL.getFnAttrs(),
				PAL.getRetAttrs(), ParamAttr));

				// The new function will have the !dbg metadata copied from the original
				// function. The original function may not be deleted, and dbg metadata need
				// to be unique so we need to drop it.
				OldF->setSubprogram(nullptr);
				OldF->getParent()->getFunctionList().insert(OldF->getIterator(), NF);
				NF->takeName(OldF);
				NF->getBasicBlockList().splice(NF->begin(), OldF->getBasicBlockList());

				auto NewArgI = NF->arg_begin();
				PA = PromotedArgs.begin();
				for (unsigned ArgNo = 0; ArgNo < OldF->arg_size(); ++ArgNo) {
				Argument &OldArg = *OldF->getArg(ArgNo);
				if (PA != PromotedArgs.end() && (*PA)->getArgNo() == ArgNo) {
				assert((*PA)->isPromoted() || (*PA)->isUnusedArg());
				if ((*PA)->PreloadArgDummy) {
				(*PA)->PreloadArgDummy->replaceAllUsesWith(NewArgI);
				NewArgI->setName((*PA)->getParamName());
				// Replace potential metadata uses (like llvm.dbg.value) with undef.
				OldArg.replaceAllUsesWith(UndefValue::get(OldArg.getType()));
				++NewArgI;
				}
				++PA;
				} else {
				OldArg.replaceAllUsesWith(&*NewArgI);
				NewArgI->takeName(&OldArg);
				++NewArgI;
				}
				}
				assert(PA == PromotedArgs.end());
				return NF;
				}

				// Promote callsite to call the new function signature inserting loads and
				// stores before and after the callsite.
				void ArgumentPromoter::promoteCallsite(
				CallBase &CB, Function *NF,
				const SmallVectorImpl<ArgPromotionInfo *> &PromotedArgs,
				const SmallVectorImpl<ArgPromotionInfo *> &RetValuesStoreOrder) {

				SmallVector<Value *, 16> Args;
				SmallVector<AttributeSet, 8> ArgsAttr;
				const AttributeList &CallPAL = CB.getAttributes();
				IRBuilder<NoFolder> IRB(&CB);
				auto PA = PromotedArgs.begin();
				for (unsigned ArgNo = 0; ArgNo < CB.arg_size(); ++ArgNo) {
				Value *CallOp = CB.getArgOperand(ArgNo);
				if (PA != PromotedArgs.end() && (*PA)->getArgNo() == ArgNo) {
				assert((*PA)->isPromoted() || (*PA)->isUnusedArg());
				if ((*PA)->PreloadArgDummy) {
				Args.push_back((*PA)->createLoad(IRB, CallOp, CallOp->getName()));
				ArgsAttr.push_back(AttributeSet());
				}
				++PA;
				} else {
				Args.push_back(CallOp);
				ArgsAttr.push_back(CallPAL.getParamAttrs(ArgNo));
				}
				}
				assert(PA == PromotedArgs.end());

				SmallVector<OperandBundleDef, 1> OpBundles;
				CB.getOperandBundlesAsDefs(OpBundles);
				CallBase *NewCS = nullptr;
				if (InvokeInst *II = dyn_cast<InvokeInst>(&CB)) {
				NewCS = InvokeInst::Create(NF, II->getNormalDest(), II->getUnwindDest(),
				Args, OpBundles, "", &CB);
				} else {
				auto *NewCall = CallInst::Create(NF, Args, OpBundles, "", &CB);
				NewCall->setTailCallKind(cast<CallInst>(&CB)->getTailCallKind());
				NewCS = NewCall;
				}
				NewCS->setCallingConv(CB.getCallingConv());
				NewCS->copyMetadata(CB, {LLVMContext::MD_prof, LLVMContext::MD_dbg});
				NewCS->takeName(&CB);
				NewCS->setAttributes(AttributeList::get(
				NF->getContext(), CallPAL.getFnAttrs(), CallPAL.getRetAttrs(), ArgsAttr));

				if (RetValuesStoreOrder.empty()) {
				CB.replaceAllUsesWith(NewCS);
				return;
				}

				// Processing return values.
				bool OldRetTyIsVoid = CB.getCalledFunction()->getReturnType()->isVoidTy();
				if (OldRetTyIsVoid && RetValuesStoreOrder.size() == 1) {
				const ArgPromotionInfo *A = RetValuesStoreOrder.front();
				A->createStore(IRB, NewCS, CB.getArgOperand(A->getArgNo()));
				} else {
				if (!OldRetTyIsVoid && !CB.user_empty())
				CB.replaceAllUsesWith(
				IRB.CreateExtractValue(NewCS, {0}, NewCS->getName() + ".ret"));
				for (const ArgPromotionInfo *A : RetValuesStoreOrder) {
				Value *CallOp = CB.getArgOperand(A->getArgNo());
				Value *RetVal = IRB.CreateExtractValue(NewCS, {A->ReturnValueIndex},
				CallOp->getName() + ".val.ret");
				A->createStore(IRB, RetVal, CallOp);
				}
				}
				}

				// Try to promote function argument candidates and update callsites.
				Function *ArgumentPromoter::run(SmallVectorImpl<ArgPromotionInfo> &Candidates) {
				// Reload MSSA uncached walks constraint.
				NumMSSAWalksLeft = MaxMSSAWalksNum * Candidates.size();

				SmallVector<ArgPromotionInfo *, 4> RetValuesStoreOrder;
				Type *RetType = promoteInOutCandidates(Candidates, RetValuesStoreOrder);

				SmallVector<ArgPromotionInfo *, 4> PromotedArgs;
				for (ArgPromotionInfo &C : Candidates) {
				if (C.Return) {
				++NumInOutArgCandidates;
				if (C.isPromoted()) {
				PromotedArgs.push_back(&C);
				++NumInOutArgPromoted;
				}
				} else if (C.Preload) {
				++NumInArgCandidates;
				if (!isInArgClobbered(C)) {
				promoteInArg(C);
				PromotedArgs.push_back(&C);
				++NumInArgPromoted;
				}
				} else {
				assert(C.isUnusedArg());
				PromotedArgs.push_back(&C); // Will be removed from the func signature.
				}
				}

				if (PromotedArgs.empty())
				return nullptr;

				Function *NF = createNewFunction(F, RetType, PromotedArgs);

				// Update callsites.
				for (auto *U : make_early_inc_range(F->users())) {
				assert(isa<CallBase>(U));
				CallBase &CB = *cast<CallBase>(U);
				assert(CB.getCalledFunction() == F && CB.getParent()->getParent() != F);
				promoteCallsite(CB, NF, PromotedArgs, RetValuesStoreOrder);
				CB.eraseFromParent();
				}
				return NF;
				}

				// This method checks the specified function to see if there're any
				// promotable arguments and if it is safe to promote the function (for
				// example, all callers are direct) and performs the promotion.
				static Function *promoteArguments(Function *F, FunctionAnalysisManager &FAM) {
				// Don't perform argument promotion for naked functions; otherwise we can end
				// up removing parameters that are seemingly 'not used' as they are referred
				// to in the assembly.
				if (F->hasFnAttribute(Attribute::Naked))
				return nullptr;

				// Make sure that it is local to this module.
				if (!F->hasLocalLinkage())
				return nullptr;

				// Don't promote arguments for variadic functions. Adding, removing, or
				// changing non-pack parameters can change the classification of pack
				// parameters. Frontends encode that classification at the call site in the
				// IR, while in the callee the classification is determined dynamically based
				// on the number of registers consumed so far.
				if (F->isVarArg())
				return nullptr;

				// Don't transform functions that receive inallocas, as the transformation may
				// not be safe depending on calling convention.
				if (F->getAttributes().hasAttrSomewhere(Attribute::InAlloca))
				return nullptr;

				// See if there are any pointer arguments.
				if (F->args().end() == find_if(F->args(), [](Argument &A) {
				return A.getType()->isPointerTy();
				}))
				return nullptr;

				LLVM_DEBUG(dbgs() << "Trying to promote arguments for " << F->getName()
				<< '\n');

				// If the function has attributes for the return value they most likely
				// would not make sense for the aggregated return value, so we discard any
				// in/out arguments. The same applies to the return attributes at callsites.
				bool InArgsOnly = F->getAttributes().getRetAttrs().hasAttributes();

				for (Use &U : F->uses()) {
				CallBase *CB = dyn_cast<CallBase>(U.getUser());
				// Must be a direct call.
				if (CB == nullptr || !CB->isCallee(&U)) // [1]
				return nullptr;

				// Can't change signature of musttail callee
				if (CB->isMustTailCall())
				return nullptr;

				if (!InArgsOnly && CB->getAttributes().getRetAttrs().hasAttributes())
				InArgsOnly = true;
				}

				// Can't change signature of musttail caller
				for (BasicBlock &BB : *F)
				if (BB.getTerminatingMustTailCall())
				return nullptr;

				SmallVector<ArgPromotionInfo, 4> Candidates;
				for (Argument &A : F->args())
				if (A.getType()->isPointerTy())
				getPromotionCandidates(FAM, &A, Candidates, InArgsOnly);

				if (Candidates.empty())
				return nullptr;

				{ // Make sure preloaded arguments are ABI compatible.
				// TODO: Check individual arguments so we can promote a subset?
				SmallVector<Type *, 32> Types;
				for (auto &C : Candidates) {
				if (C.Preload)
				Types.push_back(C.ArgType);
				}
				if (!Types.empty()) {
				const TargetTransformInfo &TTI = FAM.getResult<TargetIRAnalysis>(*F);
				for (const Use &U : F->uses()) {
				CallBase *CB = cast<CallBase>(U.getUser()); // due to check [1]
				if (!TTI.areTypesABICompatible(CB->getCaller(), F, Types))
				return nullptr;
				}
				}
				}

				return ArgumentPromoter(F, FAM).run(Candidates);
				}

				PreservedAnalyses MSSAArgPromotionPass::run(LazyCallGraph::SCC &C,
				CGSCCAnalysisManager &AM,
				LazyCallGraph &CG,
				CGSCCUpdateResult &UR) {
				bool Changed = false, LocalChange;
				do { // Iterate until we stop promoting from this SCC.
				LocalChange = false;
				for (LazyCallGraph::Node &N : C) {
				Function &OldF = N.getFunction();
				FunctionAnalysisManager &FAM =
				AM.getResult<FunctionAnalysisManagerCGSCCProxy>(C, CG).getManager();
				if (Function *NewF = promoteArguments(&OldF, FAM)) {
				// Directly substitute the functions in the call graph. Note that this
				// requires the old function to be completely dead and completely
				// replaced by the new function. It does no call graph updates, it
				// merely swaps out the particular function mapped to a particular node
				// in the graph.
				C.getOuterRefSCC().replaceNodeFunction(N, *NewF);
				FAM.clear(OldF, OldF.getName());
				OldF.eraseFromParent();
				LocalChange = true;
				}
				}
				Changed |= LocalChange;
				} while (LocalChange);

				if (!Changed)
				return PreservedAnalyses::all();

				return PreservedAnalyses::none(); // Since the function signature is changed.
				}

				namespace {
				struct MSSAArgPromotion : public CallGraphSCCPass {
				static char ID;

				FunctionAnalysisManager FAM;

				explicit MSSAArgPromotion() : CallGraphSCCPass(ID) {
				initializeMSSAArgPromotionPass(*PassRegistry::getPassRegistry());
				FAM.registerPass([&] { return PassInstrumentationAnalysis(); });
				FAM.registerPass([&] { return TargetIRAnalysis(); });
				FAM.registerPass([&] { return TargetLibraryAnalysis(); });
				FAM.registerPass([&] { return AAManager(); });
				FAM.registerPass([&] { return DominatorTreeAnalysis(); });
				FAM.registerPass([&] { return MemorySSAAnalysis(); });
				}

				void getAnalysisUsage(AnalysisUsage &AU) const override {
				CallGraphSCCPass::getAnalysisUsage(AU);
				}

				bool runOnSCC(CallGraphSCC &SCC) override;
				};
				} // end anonymous namespace

				char MSSAArgPromotion::ID = 0;

				rampitec (Not Done): Don't you need to add at least MemorySSAWrapperPass and AAResultsWrapperPass?
				vpykhtin (Author, Done): Well, I cannot use them because this is a CallGraph pass and those analyses are available per function. Instead I use a FunctionAnalysisManager to get per-function results; the drawback is that I have to invalidate it. However, this is only for the legacy pass manager.
				INITIALIZE_PASS_BEGIN(MSSAArgPromotion, "mssaargpromotion",
				"MSSA Promote 'by reference' arguments to scalars", false,
				false)
				INITIALIZE_PASS_DEPENDENCY(CallGraphWrapperPass)
				INITIALIZE_PASS_END(MSSAArgPromotion, "mssaargpromotion",
				"MSSA Promote 'by reference' arguments to scalars", false,
				false)

				Pass *llvm::createMSSAArgPromotionPass() { return new MSSAArgPromotion(); }

				bool MSSAArgPromotion::runOnSCC(CallGraphSCC &SCC) {
				if (skipSCC(SCC))
				return false;

				CallGraph &CG = getAnalysis<CallGraphWrapperPass>().getCallGraph();
				bool Changed = false, LocalChange;
				do {
				LocalChange = false;
				for (CallGraphNode *OldNode : SCC) {
				Function *OldF = OldNode->getFunction();
				if (!OldF)
				continue;

				// Clear FAM but preserve immutable results.
				FAM.invalidate(*OldF, PreservedAnalyses::none());

				if (Function *NewF = promoteArguments(OldF, FAM)) {
				LocalChange = true;

				// Update the call graph for the newly promoted function.
				CallGraphNode *NewNode = CG.getOrInsertFunction(NewF);
				NewNode->stealCalledFunctionsFrom(OldNode);

				// Update call edges
				SmallDenseSet<CallGraphNode *> ClearedNodes;
				for (auto *U : make_early_inc_range(NewF->users())) {
				assert(isa<CallBase>(U));
				CallBase &CB = *cast<CallBase>(U);
				CallGraphNode *CallerNode = CG[CB.getParent()->getParent()];
				if (ClearedNodes.insert(CallerNode).second)
				CallerNode->removeAnyCallEdgeTo(OldNode);
				arsenm (Not Done): It's weird to have raw deletes in llvm code, why do you need this here?
				vpykhtin (Author, Done): It was taken from the ArgumentPromotion code; I will take a look at whether it's really needed.
				CallerNode->addCalledFunction(&CB, NewNode);
				}
				assert(OldNode->getNumReferences() == 0);
				delete CG.removeFunctionFromModule(OldNode);
				SCC.ReplaceNode(OldNode, NewNode);
				}
				}
				Changed |= LocalChange;
				} while (LocalChange);
				return Changed;
				}

llvm/test/CodeGen/AMDGPU/opt-pipeline.ll

	  ; GCN-O3-NEXT: CallGraph Construction
	  ; GCN-O3-NEXT: Globals Alias Analysis
	  ; GCN-O3-NEXT: Call Graph SCC Pass Manager
	  ; GCN-O3-NEXT: Remove unused exception handling info
	  ; GCN-O3-NEXT: Function Integration/Inlining
	  ; GCN-O3-NEXT: OpenMP specific optimizations
	  ; GCN-O3-NEXT: Deduce function attributes
	  ; GCN-O3-NEXT: Promote 'by reference' arguments to scalars
	+ ; GCN-O3-NEXT: MSSA Promote 'by reference' arguments to scalars
	  ; GCN-O3-NEXT: FunctionPass Manager
	  ; GCN-O3-NEXT: Dominator Tree Construction
	  ; GCN-O3-NEXT: Basic Alias Analysis (stateless AA impl)
	  ; GCN-O3-NEXT: Function Alias Analysis Results
	  ; GCN-O3-NEXT: Memory SSA
	  ; GCN-O3-NEXT: AMDGPU Promote Kernel Arguments
	  ; GCN-O3-NEXT: Infer address spaces
	  ; GCN-O3-NEXT: AMDGPU Kernel Attributes

llvm/test/Transforms/ArgumentPromotion/inoutargs.ll

This file was added.

				; RUN: opt < %s -passes=mssaargpromotion -S | FileCheck %s

				nikic (Not Done): Please use update_test_checks.py.
				;------- chain of calls
				; CHECK-LABEL: define internal i32 @inner_fx(i32 %P.val) {
				; CHECK-NEXT: %V = add i32 %P.val, 1
				; CHECK-NEXT: ret i32 %V
				define internal void @inner_fx(i32* %P) {
				%L = load i32, i32* %P;
				%V = add i32 %L, 1;
				store i32 %V, i32* %P
				ret void
				}

				; CHECK-LABEL: define internal i32 @outer_fx(i32 %P.val) {
				; CHECK-NEXT: %V1 = add i32 %P.val, 2
				; CHECK-NEXT: %1 = call i32 @inner_fx(i32 %V1)
				; CHECK-NEXT: %V2 = add i32 %1, 3
				; CHECK-NEXT: ret i32 %V2
				define internal void @outer_fx(i32* %P) {
				%L1 = load i32, i32* %P;
				%V1 = add i32 %L1, 2;
				store i32 %V1, i32* %P
				call void @inner_fx(i32* %P)
				%L2 = load i32, i32* %P;
				%V2 = add i32 %L2, 3;
				store i32 %V2, i32* %P
				ret void
				}

				; CHECK-LABEL: define void @test_chain_of_calls(i32* %P) {
				; CHECK-NEXT: %P.val.pre = load i32, i32* %P, align 4
				; CHECK-NEXT: %1 = call i32 @outer_fx(i32 %P.val.pre)
				; CHECK-NEXT: store i32 %1, i32* %P, align 4
				define void @test_chain_of_calls(i32* %P) {
				call void @outer_fx(i32* %P)
				ret void
				}
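
For orientation, the effect that the chain-of-calls test checks can be pictured at the source level. This is only a C++ sketch of the IR above, assuming `%P` behaves like a plain `int` pointer. The `_promoted` names and `checkSameResult` are hypothetical and are not produced by the pass.

```cpp
#include <cassert>

// Before promotion: the pointer argument is both read and written.
void inner_fx(int *P) { *P = *P + 1; }

void outer_fx(int *P) {
  *P = *P + 2;
  inner_fx(P);
  *P = *P + 3;
}

// After promotion (hypothetical names): the pointee is passed by value and
// the final stored value becomes the return value.
int inner_fx_promoted(int PVal) { return PVal + 1; }

int outer_fx_promoted(int PVal) {
  int V1 = PVal + 2;
  int R = inner_fx_promoted(V1);
  return R + 3;
}

// The externally visible caller keeps its signature: it loads the value
// before the call and stores the returned value back afterwards.
void test_chain_of_calls(int *P) {
  int PValPre = *P;
  *P = outer_fx_promoted(PValPre);
}

// Both shapes are expected to leave the same value behind.
void checkSameResult(int Start) {
  int A = Start, B = Start;
  outer_fx(&A);
  test_chain_of_calls(&B);
  assert(A == B);
}
```

The load before the call and the store after it correspond to `%P.val.pre` and the trailing `store` in the CHECK lines for `@test_chain_of_calls`.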


				;-------
				;CHECK-LABEL: define internal { i32, i32 } @test_not_all_path_store(i1 %c, i32 %P.val) {
				;CHECK-NEXT: br i1 %c, label %exit1, label %exit2
				;CHECK-LABEL: exit1:
				;CHECK-NEXT: %test_not_all_path_store.exit1.ret = insertvalue { i32, i32 } undef, i32 1, 0
				;CHECK-NEXT: %test_not_all_path_store.exit1.ret1 = insertvalue { i32, i32 } %test_not_all_path_store.exit1.ret, i32 42, 1
				;CHECK-NEXT: ret { i32, i32 } %test_not_all_path_store.exit1.ret1
				;CHECK-LABEL: exit2:
				;CHECK-NEXT: %test_not_all_path_store.exit2.ret = insertvalue { i32, i32 } undef, i32 2, 0
				;CHECK-NEXT: %test_not_all_path_store.exit2.ret1 = insertvalue { i32, i32 } %test_not_all_path_store.exit2.ret, i32 %P.val, 1
				;CHECK-NEXT: ret { i32, i32 } %test_not_all_path_store.exit2.ret1
				define internal i32 @test_not_all_path_store(i1 %c, i32* %P) {
				br i1 %c, label %exit1, label %exit2

				exit1:
				store i32 42, i32* %P
				ret i32 1

				exit2:
				ret i32 2
				}

				;CHECK-LABEL: define i32 @test_not_all_path_store_caller(i1 %c) {
				;CHECK-NEXT: %M = alloca i32, align 4
				;CHECK-NEXT: %M.val.pre = load i32, i32* %M, align 4
				;CHECK-NEXT: %R = call { i32, i32 } @test_not_all_path_store(i1 %c, i32 %M.val.pre)
				;CHECK-NEXT: %R.ret = extractvalue { i32, i32 } %R, 0
				;CHECK-NEXT: %M.val.ret = extractvalue { i32, i32 } %R, 1
				;CHECK-NEXT: store i32 %M.val.ret, i32* %M, align 4
				;CHECK-NEXT: %V = load i32, i32* %M, align 4
				;CHECK-NEXT: %Sum = add i32 %R.ret, %V
				;CHECK-NEXT: ret i32 %Sum
				define i32 @test_not_all_path_store_caller(i1 %c) {
				%M = alloca i32;
				%R = call i32 @test_not_all_path_store(i1 %c, i32* %M)
				%V = load i32, i32* %M
				%Sum = add i32 %R, %V
				ret i32 %Sum
				}
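
The test above covers a store that happens on only one path, combined with an existing return value. A source-level C++ sketch of that shape follows. It assumes `M` is initialized by the caller (the IR test loads from an uninitialized alloca, which has no direct C++ equivalent), and `RetAndP` plus the `_promoted` names are hypothetical.

```cpp
#include <cassert>

// Before promotion: P is written only when C is true.
int not_all_path_store(bool C, int *P) {
  if (C) {
    *P = 42;
    return 1;
  }
  return 2;
}

// After promotion (hypothetical names): the original return value and the
// final pointee value travel together in an aggregate. On the path without a
// store, the preloaded input value is passed through unchanged.
struct RetAndP {
  int Ret;
  int PVal;
};

RetAndP not_all_path_store_promoted(bool C, int PVal) {
  if (C)
    return {1, 42};
  return {2, PVal};
}

int caller(bool C, int M) {
  int R = not_all_path_store(C, &M);
  return R + M;
}

int caller_promoted(bool C, int M) {
  int MValPre = M; // Preload before the call.
  RetAndP R = not_all_path_store_promoted(C, MValPre);
  M = R.PVal; // Store the returned pointee value back.
  assert(R.Ret == 1 || R.Ret == 2);
  return R.Ret + M;
}
```

The pass-through on the `exit2` path is why the promoted function in the CHECK lines still takes `%P.val` as an input even though the original function never loads from `%P`.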

				;------- test that clobber of L2 load by P1 store is detected
				;CHECK-LABEL: define internal void @test_getInOutArgClobber_visited
				define internal void @test_getInOutArgClobber_visited(i1 %c, i32* %P1, i32* %P2) { ; P1 may alias P2
				br i1 %c, label %left, label %right

				;CHECK-LABEL: left:
				;CHECK-NEXT: store
				left:
				store i32 1, i32* %P1 ; clobbers L2 load
				br i1 %c, label %exit1, label %exit2

				right:
				br i1 %c, label %exit1, label %exit2

				exit1:
				%L1 = load i32, i32* %P1
				ret void

				exit2:
				%L2 = load i32, i32* %P2
				ret void
				}

				define void @test_getInOutArgClobber_visited_caller(i1 %c, i32* %P2) {
				%M = alloca i32
				call void @test_getInOutArgClobber_visited(i1 %c, i32* %M, i32* %P2);
				ret void
				}

				;------- check store clobbering other loads

				;CHECK-LABEL: define internal void @test_store_clobber1
				define internal void @test_store_clobber1(i32* %P1, i32* %P2, i32* %P3) { ; P1, P2, P3 may alias
				;CHECK: store i32 42, i32* %P1
				;CHECK: store i32 1, i32* %P3
				;CHECK: %V2 = load i32, i32* %P2
				store i32 42, i32* %P1 ; this store clobbers V2 load (e.g. P1 == P2 != P3)
				store i32 1, i32* %P3
				%V2 = load i32, i32* %P2
				store i32 43, i32* %P1 ; this store makes sure that P1 pointee isn't clobbered by P3 store
				ret void
				}

				define void @test_store_clobber1_caller(i32* %P1, i32* %P2, i32* %P3) {
				call void @test_store_clobber1(i32* %P1, i32* %P2, i32* %P3)
				ret void
				}


				;CHECK-LABEL: define internal { i32, i32 } @test_store_no_clobber1
				define internal void @test_store_no_clobber1(i32* %P1, i32* %P2) { ; P1 may alias P2
				store i32 42, i32* %P1
				store i32 1, i32* %P2
				%V2 = load i32, i32* %P2
				store i32 43, i32* %P1 ; this store makes sure that P1 pointee isn't clobbered by P2 store
				ret void
				}

				;CHECK-LABEL: define void @test_store_no_clobber1_caller
				define void @test_store_no_clobber1_caller(i32* %P1, i32* %P2) {
				call void @test_store_no_clobber1(i32* %P1, i32* %P2)
				;CHECK: %1 = call { i32, i32 } @test_store_no_clobber1
				; store P2 and then P1 to preserve order in @test_store_no_clobber1
				;CHECK: store i32 %P2
				;CHECK: store i32 %P1
				ret void
				}


				;CHECK-LABEL: define internal { i32, i32 } @test_store_diamond_clobber1
				define internal void @test_store_diamond_clobber1(i1 %c, i32* %P1, i32* %P2) { ; P1 may alias P2
				br i1 %c, label %st1, label %st2
				;CHECK-LABEL: st1:
				;CHECK-NEXT: br
				st1:
				store i32 42, i32* %P1
				br label %exit

				st2:
				br label %exit

				exit:
				store i32 1, i32* %P2
				%V2 = load i32, i32* %P2
				store i32 43, i32* %P1 ; this store makes sure that P1 pointee isn't clobbered by P2 store
				ret void
				}

				;CHECK-LABEL: define void @test_store_diamond_clobber1_caller
				define void @test_store_diamond_clobber1_caller(i1 %c, i32* %P1, i32* %P2) {
				call void @test_store_diamond_clobber1(i1 %c, i32* %P1, i32* %P2)
				;CHECK: %1 = call { i32, i32 } @test_store_diamond_clobber1
				; store P2 and then P1 to preserve order in @test_store_diamond_clobber1
				;CHECK: store i32 %P2
				;CHECK: store i32 %P1
				ret void
				}


				;CHECK-LABEL: define internal { i32, i32 } @test_store_diamond_clobber2
				define internal void @test_store_diamond_clobber2(i1 %c, i32* %P1, i32* %P2) { ; P1 may alias P2
				;CHECK-NEXT: br
				store i32 42, i32* %P1
				store i32 1, i32* %P2
				%V2 = load i32, i32* %P2
				br i1 %c, label %st1, label %st2

				st1:
				br label %exit

				st2:
				br label %exit

				exit:
				store i32 43, i32* %P1 ; this store makes sure that P1 pointee isn't clobbered by P2 store
				ret void
				}

				;CHECK-LABEL: define void @test_store_diamond_clobber2_caller
				define void @test_store_diamond_clobber2_caller(i1 %c, i32* %P1, i32* %P2) {
				call void @test_store_diamond_clobber2(i1 %c, i32* %P1, i32* %P2)
				;CHECK: %1 = call { i32, i32 } @test_store_diamond_clobber2
				; store P2 and then P1 to preserve order in @test_store_diamond_clobber2
				;CHECK: store i32 %P2
				;CHECK: store i32 %P1
				ret void
				}


				;CHECK-LABEL: define internal { i32, i32 } @test_store_diamond_clobber3
				define internal void @test_store_diamond_clobber3(i1 %c, i32* %P1, i32* %P2) { ; P1 may alias P2
				;CHECK-NEXT: br
				store i32 42, i32* %P1
				store i32 1, i32* %P2
				br i1 %c, label %st1, label %st2

				;CHECK-LABEL: st1:
				;CHECK-NEXT: br
				st1:
				%V2 = load i32, i32* %P2
				br label %exit

				st2:
				br label %exit

				exit:
				store i32 43, i32* %P1 ; this store makes sure that P1 pointee isn't clobbered by P2 store
				ret void
				}

				;CHECK-LABEL: define void @test_store_diamond_clobber3_caller
				define void @test_store_diamond_clobber3_caller(i1 %c, i32* %P1, i32* %P2) {
				call void @test_store_diamond_clobber3(i1 %c, i32* %P1, i32* %P2)
				;CHECK: %1 = call { i32, i32 } @test_store_diamond_clobber3
				; store P2 and then P1 to preserve order in @test_store_diamond_clobber3
				;CHECK: store i32 %P2
				;CHECK: store i32 %P1
				ret void
				}


				;CHECK-LABEL: define internal i32 @test_store_diamond_clobber4
				define internal void @test_store_diamond_clobber4(i1 %c, i32* %P1, i32* %P2) { ; P1 may alias P2
				;CHECK-NEXT: br
				store i32 42, i32* %P1
				br i1 %c, label %st1, label %st2

				st1:
				store i32 1, i32* %P2
				%V2 = load i32, i32* %P2
				br label %exit

				st2:
				br label %exit

				exit:
				store i32 43, i32* %P1 ; this store makes sure that P1 pointee isn't clobbered by P2 store
				ret void
				}

				define void @test_store_diamond_clobber4_caller(i1 %c, i32* %P1, i32* %P2) {
				call void @test_store_diamond_clobber4(i1 %c, i32* %P1, i32* %P2)
				ret void
				}


				;CHECK-LABEL: define internal void @test_store_diamond_clobber5
				define internal void @test_store_diamond_clobber5(i1 %c, i32* %P1, i32* %P2) { ; P1 may alias P2
				;CHECK-NEXT: store
				store i32 42, i32* %P1 ; clobbers V2 load on st2 path
				br i1 %c, label %st1, label %st2

				;CHECK-LABEL: st1:
				;CHECK-NEXT: store
				st1:
				store i32 1, i32* %P2
				br label %exit

				st2:
				br label %exit

				exit:
				%V2 = load i32, i32* %P2
				store i32 43, i32* %P1 ; this store makes sure that P1 pointee isn't clobbered by P2 store
				ret void
				}

				define void @test_store_diamond_clobber5_caller(i1 %c, i32* %P1, i32* %P2) {
				call void @test_store_diamond_clobber5(i1 %c, i32* %P1, i32* %P2)
				ret void
				}


				;CHECK-LABEL: define internal i32 @test_store_diamond_clobber6
				define internal void @test_store_diamond_clobber6(i1 %c, i32* %P1, i32* %P2) { ; P1 may alias P2
				;CHECK-NEXT: br
				br i1 %c, label %st1, label %st2

				;CHECK-LABEL: st1:
				;CHECK-NEXT: store i32 1, i32* %P2
				st1:
				store i32 42, i32* %P1
				store i32 1, i32* %P2
				br label %exit

				st2:
				br label %exit

				exit:
				%V2 = load i32, i32* %P2
				store i32 43, i32* %P1 ; this store makes sure that P1 pointee isn't clobbered by P2 store
				ret void
				}

				define void @test_store_diamond_clobber6_caller(i1 %c, i32* %P1, i32* %P2) {
				call void @test_store_diamond_clobber6(i1 %c, i32* %P1, i32* %P2)
				ret void
				}


				;------- check clobbering in diamond

				;CHECK-LABEL: define internal void @test_clobber1
				define internal void @test_clobber1(i1 %c, i32* %P1, i32* %P2) { ; P1 may alias P2
				;CHECK-NEXT: store
				store i32 42, i32* %P2 ; clobbered by P1 stores, cannot promote
				%V1 = load i32, i32* %P1 ; clobbered by P2 store above, cannot promote
				br i1 %c, label %st1, label %st2

				;CHECK-LABEL: st1:
				;CHECK-NEXT: store
				st1:
				store i32 1, i32* %P1
				br label %exit

				;CHECK-LABEL: st2:
				;CHECK-NEXT: store
				st2:
				store i32 2, i32* %P1
				br label %exit

				;CHECK-LABEL: exit:
				;CHECK-NEXT: load
				exit:
				%V2 = load i32, i32* %P1
				ret void
				}

				define void @test_clobber1_caller(i1 %c, i32* %P1, i32* %P2) {
				call void @test_clobber1(i1 %c, i32* %P1, i32* %P2)
				ret void
				}


				;CHECK-LABEL: define internal { i32, i32 } @test_clobber2
				define internal void @test_clobber2(i1 %c, i32* %P1, i32* %P2) { ; P1 may alias P2
				%V1 = load i32, i32* %P1 ; no clobber
				store i32 42, i32* %P2 ; clobbered by P1 stores, but unclobbered by P1 promotion
				br i1 %c, label %st1, label %st2

				;CHECK-LABEL: st1:
				;CHECK-NEXT: br
				st1:
				store i32 1, i32* %P1
				br label %exit

				;CHECK-LABEL: st2:
				;CHECK-NEXT: br
				st2:
				store i32 2, i32* %P1
				br label %exit

				exit:
				%V2 = load i32, i32* %P1 ; no clobber as every path writes through P1
				ret void
				}

				;CHECK-LABEL: define void @test_clobber2_caller
				define void @test_clobber2_caller(i1 %c, i32* %P1, i32* %P2) {
				call void @test_clobber2(i1 %c, i32* %P1, i32* %P2)
				;CHECK: %1 = call { i32, i32 } @test_clobber2
				; store P2 and then P1 to preserve order in @test_clobber2
				;CHECK: store i32 %P2
				;CHECK: store i32 %P1
				ret void
				}


				;CHECK-LABEL: define internal i32 @test_clobber3
				define internal void @test_clobber3(i1 %c, i32* %P1, i32* %P2) { ; P1 may alias P2
				%V1 = load i32, i32* %P1 ; no clobber
				br i1 %c, label %st1, label %st2

				;CHECK-LABEL: st1:
				;CHECK-NEXT: store i32 42, i32* %P2
				;CHECK-NEXT: br
				st1:
				; P2 isn't selected for promotion: not all paths have stores and
				; it is not a valid thread-local ptr
				store i32 42, i32* %P2
				store i32 1, i32* %P1
				br label %exit

				;CHECK-LABEL: st2:
				;CHECK-NEXT: br
				st2:
				store i32 2, i32* %P1
				br label %exit

				exit:
				%V2 = load i32, i32* %P1 ; no clobber as every path writes through P1
				ret void
				}

				;CHECK-LABEL: define void @test_clobber3_caller
				define void @test_clobber3_caller(i1 %c, i32* %P1, i32* %P2) {
				call void @test_clobber3(i1 %c, i32* %P1, i32* %P2)
				;CHECK: %1 = call i32 @test_clobber3
				;CHECK: store i32 %1, i32* %P1
				ret void
				}


				;CHECK-LABEL: define internal void @test_clobber4
				define internal void @test_clobber4(i1 %c, i32* %P1, i32* %P2) { ; P1 may alias P2
				%V1 = load i32, i32* %P1 ; no clobber
				br i1 %c, label %st1, label %st2

				;CHECK-LABEL: st1:
				;CHECK-NEXT: store i32 1, i32* %P1
				;CHECK-NEXT: store i32 42, i32* %P2
				;CHECK-NEXT: br
				st1:
				store i32 1, i32* %P1
				; P2 isn't selected for promotion: not all paths have stores and
				; it is not a valid thread-local ptr
				store i32 42, i32* %P2
				br label %exit

				;CHECK-LABEL: st2:
				;CHECK-NEXT: store i32 2, i32* %P1
				st2:
				store i32 2, i32* %P1
				br label %exit

				exit:
				%V2 = load i32, i32* %P1 ; clobbered by P2 write
				ret void
				}

				define void @test_clobber4_caller(i1 %c, i32* %P1, i32* %P2) {
				call void @test_clobber4(i1 %c, i32* %P1, i32* %P2)
				ret void
				}


				;CHECK-LABEL: define internal void @test_clobber5
				define internal void @test_clobber5(i1 %c, i32* %P1, i32* %P2) { ; P1 may alias P2
				%V1 = load i32, i32* %P1 ; no clobber
				br i1 %c, label %st1, label %st2

				;CHECK-LABEL: st1:
				;CHECK-NEXT: store
				st1:
				store i32 1, i32* %P1
				br label %exit

				;CHECK-LABEL: st2:
				;CHECK-NEXT: store
				st2:
				store i32 2, i32* %P1
				br label %exit

				exit:
				store i32 42, i32* %P2 ; clobbers V2 load
				%V2 = load i32, i32* %P1 ; clobbered by P2 write
				ret void
				}

				define void @test_clobber5_caller(i1 %c, i32* %P1, i32* %P2) {
				call void @test_clobber5(i1 %c, i32* %P1, i32* %P2)
				ret void
				}


				;CHECK-LABEL: define internal { i32, i32 } @test_clobber6
				define internal void @test_clobber6(i1 %c, i32* %P1, i32* %P2) { ; P1 may alias P2
				%V1 = load i32, i32* %P1 ; no clobber
				br i1 %c, label %st1, label %st2

				;CHECK-LABEL: st1:
				;CHECK-NEXT: br
				st1:
				store i32 1, i32* %P1
				br label %exit

				;CHECK-LABEL: st2:
				;CHECK-NEXT: br
				st2:
				store i32 2, i32* %P1
				br label %exit

				;CHECK-LABEL: exit:
				;CHECK-NOT: load
				exit:
				%V2 = load i32, i32* %P1
				; P1 pointee is clobbered by P2 write, but unclobbered after P2 promotion
				store i32 42, i32* %P2
				ret void
				}

				;CHECK-LABEL: define void @test_clobber6_caller
				define void @test_clobber6_caller(i1 %c, i32* %P1, i32* %P2) {
				call void @test_clobber6(i1 %c, i32* %P1, i32* %P2)
				;CHECK: %1 = call { i32, i32 } @test_clobber6
				;CHECK: store i32 %P1
				;CHECK: store i32 %P2
				ret void
				}

				;------- check clobbering in loops

				;CHECK-LABEL: define internal void @test_loop_clobber1
				define internal void @test_loop_clobber1(i32 %n, i32* %P1, i32* %P2) { ; P1 may alias P2
				;CHECK-LABEL: entry:
				;CHECK-NEXT: store i32 42, i32* %P2
				;CHECK-NEXT: %V1 = load i32, i32* %P1
				entry:
				store i32 42, i32* %P2 ; clobbers V1 load
				%V1 = load i32, i32* %P1
				store i32 1, i32* %P1 ; clobbers P2 store
				br label %loop_header

				loop_header:
				%i = phi i32 [%i.next, %loop], [%n, %entry]
				%c = icmp eq i32 %i, 0
				br i1 %c, label %exit, label %loop

				loop:
				store i32 2, i32* %P1
				%i.next = sub i32 %i, 1
				br label %loop_header

				exit:
				%V2 = load i32, i32* %P1
				ret void
				}

				define void @test_loop_clobber1_caller(i32 %n, i32* %P1, i32* %P2) {
				call void @test_loop_clobber1(i32 %n, i32* %P1, i32* %P2)
				ret void
				}


				;CHECK-LABEL: define internal { i32, i32 } @test_loop_clobber2
				define internal void @test_loop_clobber2(i32 %n, i32* %P1, i32* %P2) { ; P1 may alias P2
				;CHECK-LABEL: entry:
				;CHECK-NEXT: br
				entry:
				%V1 = load i32, i32* %P1
				store i32 42, i32* %P2 ; clobbered by P1 store, but then unclobbered
				store i32 1, i32* %P1
				br label %loop_header

				loop_header:
				%i = phi i32 [%i.next, %loop], [%n, %entry]
				%c = icmp eq i32 %i, 0
				br i1 %c, label %exit, label %loop

				;CHECK-LABEL: loop:
				;CHECK-NEXT: %i.next = sub i32 %i, 1
				loop:
				store i32 2, i32* %P1
				%i.next = sub i32 %i, 1
				br label %loop_header

				exit:
				%V2 = load i32, i32* %P1
				ret void
				}

				;CHECK-LABEL: define void @test_loop_clobber2_caller
				define void @test_loop_clobber2_caller(i32 %n, i32* %P1, i32* %P2) {
				call void @test_loop_clobber2(i32 %n, i32* %P1, i32* %P2)
				;CHECK: %1 = call { i32, i32 } @test_loop_clobber2
				;CHECK: store i32 %P2
				;CHECK: store i32 %P1
				ret void
				}


				;CHECK-LABEL: define internal void @test_loop_clobber3
				define internal void @test_loop_clobber3(i32 %n, i32* %P1, i32* %P2) { ; P1 may alias P2
				;CHECK-LABEL: entry:
				;CHECK-NEXT: %V1 = load i32, i32* %P1
				;CHECK-NEXT: store i32 1, i32* %P1
				;CHECK-NEXT: store i32 42, i32* %P2
				entry:
				%V1 = load i32, i32* %P1
				store i32 1, i32* %P1
				store i32 42, i32* %P2 ; clobbered by P1 store in loop BB
				br label %loop_header

				loop_header:
				%i = phi i32 [%i.next, %loop], [%n, %entry]
				%c = icmp eq i32 %i, 0
				br i1 %c, label %exit, label %loop

				loop:
				store i32 2, i32* %P1
				%i.next = sub i32 %i, 1
				br label %loop_header

				exit:
				%V2 = load i32, i32* %P1 ; clobbered by P2 store
				ret void
				}

				define void @test_loop_clobber3_caller(i32 %n, i32* %P1, i32* %P2) {
				call void @test_loop_clobber3(i32 %n, i32* %P1, i32* %P2)
				ret void
				}


				;CHECK-LABEL: define internal void @test_loop_clobber4
				define internal void @test_loop_clobber4(i32 %n, i32* %P1, i32* %P2) { ; P1 may alias P2
				;CHECK-LABEL: entry:
				;CHECK-NEXT: %V1 = load i32, i32* %P1
				;CHECK-NEXT: store i32 1, i32* %P1
				entry:
				%V1 = load i32, i32* %P1
				store i32 1, i32* %P1
				br label %loop_header

				;CHECK-LABEL: loop_header:
				;CHECK: store i32 42, i32* %P2
				loop_header:
				%i = phi i32 [%i.next, %loop], [%n, %entry]
				store i32 42, i32* %P2 ; clobbered by P1 store in loop BB
				%c = icmp eq i32 %i, 0
				br i1 %c, label %exit, label %loop

				loop:
				store i32 2, i32* %P1
				%i.next = sub i32 %i, 1
				br label %loop_header

				exit:
				%V2 = load i32, i32* %P1 ; clobbered by P2 store
				ret void
				}

				define void @test_loop_clobber4_caller(i32 %n, i32* %P1, i32* %P2) {
				call void @test_loop_clobber4(i32 %n, i32* %P1, i32* %P2)
				ret void
				}


				;CHECK-LABEL: define internal i32 @test_loop_clobber5
				define internal void @test_loop_clobber5(i32 %n, i32* %P1, i32* %P2) { ; P1 may alias P2
				;CHECK-LABEL: entry:
				;CHECK-NEXT: br
				entry:
				%V1 = load i32, i32* %P1
				store i32 1, i32* %P1
				br label %loop_header

				;CHECK-LABEL: loop_header:
				;CHECK-NEXT: %P1.val.loop_header.phi = phi i32 [ 2, %loop ], [ 1, %entry ]
				loop_header:
				%i = phi i32 [%i.next, %loop], [%n, %entry]
				%c = icmp eq i32 %i, 0
				br i1 %c, label %exit, label %loop

				;CHECK-LABEL: loop:
				;CHECK-NEXT: store i32 42, i32* %P2
				;CHECK-NEXT: %i.next = sub i32 %i, 1
				loop:
				store i32 42, i32* %P2 ; not selected for promotion (not every path has a store)
				store i32 2, i32* %P1
				%i.next = sub i32 %i, 1
				br label %loop_header

				exit:
				%V2 = load i32, i32* %P1
				ret void
				}

				;CHECK-LABEL: define void @test_loop_clobber5_caller
				define void @test_loop_clobber5_caller(i32 %n, i32* %P1, i32* %P2) {
				call void @test_loop_clobber5(i32 %n, i32* %P1, i32* %P2)
				;CHECK: %1 = call i32 @test_loop_clobber5
				;CHECK: store i32 %1, i32* %P1
				ret void
				}


				;CHECK-LABEL: define internal void @test_loop_clobber6
				define internal void @test_loop_clobber6(i32 %n, i32* %P1, i32* %P2) { ; P1 may alias P2
				;CHECK-LABEL: entry:
				;CHECK-NEXT: %V1 = load i32, i32* %P1
				;CHECK-NEXT: store i32 1, i32* %P1
				entry:
				%V1 = load i32, i32* %P1
				store i32 1, i32* %P1
				br label %loop_header

				loop_header:
				%i = phi i32 [%i.next, %loop], [%n, %entry]
				%c = icmp eq i32 %i, 0
				br i1 %c, label %exit, label %loop

				;CHECK-LABEL: loop:
				;CHECK-NEXT: store i32 2, i32* %P1
				;CHECK-NEXT: store i32 42, i32* %P2
				loop:
				store i32 2, i32* %P1
				store i32 42, i32* %P2 ; not selected for promotion (not every path has a store)
				%i.next = sub i32 %i, 1
				br label %loop_header

				exit:
				%V2 = load i32, i32* %P1 ; clobbered by P2 store
				ret void
				}

				define void @test_loop_clobber6_caller(i32 %n, i32* %P1, i32* %P2) {
				call void @test_loop_clobber6(i32 %n, i32* %P1, i32* %P2)
				ret void
				}


				;CHECK-LABEL: define internal void @test_loop_clobber7
				define internal void @test_loop_clobber7(i32 %n, i32* %P1, i32* %P2) { ; P1 may alias P2
				;CHECK-LABEL: entry:
				;CHECK-NEXT: %V1 = load i32, i32* %P1
				;CHECK-NEXT: store i32 1, i32* %P1
				entry:
				%V1 = load i32, i32* %P1
				store i32 1, i32* %P1
				br label %loop_header

				loop_header:
				%i = phi i32 [%i.next, %loop], [%n, %entry]
				%c = icmp eq i32 %i, 0
				br i1 %c, label %exit, label %loop

				loop:
				store i32 2, i32* %P1
				%i.next = sub i32 %i, 1
				br label %loop_header

				;CHECK-LABEL: exit:
				;CHECK-NEXT: store i32 42, i32* %P2
				exit:
				store i32 42, i32* %P2 ; clobbers V2 load
				%V2 = load i32, i32* %P1 ; clobbered by P2 store
				ret void
				}

				define void @test_loop_clobber7_caller(i32 %n, i32* %P1, i32* %P2) {
				call void @test_loop_clobber7(i32 %n, i32* %P1, i32* %P2)
				ret void
				}


				;CHECK-LABEL: define internal { i32, i32 } @test_loop_clobber8
				define internal void @test_loop_clobber8(i32 %n, i32* %P1, i32* %P2) { ; P1 may alias P2
				;CHECK-LABEL: entry:
				;CHECK-NEXT: br
				entry:
				%V1 = load i32, i32* %P1
				store i32 1, i32* %P1
				br label %loop_header

				loop_header:
				%i = phi i32 [%i.next, %loop], [%n, %entry]
				%c = icmp eq i32 %i, 0
				br i1 %c, label %exit, label %loop

				;CHECK-LABEL: loop:
				;CHECK-NEXT: %i.next = sub i32 %i, 1
				loop:
				store i32 2, i32* %P1
				%i.next = sub i32 %i, 1
				br label %loop_header

				exit:
				%V2 = load i32, i32* %P1
				store i32 42, i32* %P2 ; clobbers P1 pointee but it is unclobbered after P2 promotion
				ret void
				}

				;CHECK-LABEL: define void @test_loop_clobber8_caller
				define void @test_loop_clobber8_caller(i32 %n, i32* %P1, i32* %P2) {
				call void @test_loop_clobber8(i32 %n, i32* %P1, i32* %P2)
				;CHECK: %1 = call { i32, i32 } @test_loop_clobber8
				;CHECK: store i32 %P1
				;CHECK: store i32 %P2
				ret void
				}

				; -----------------------------------------------------------------------------
				; Test declobbering sequences

				;CHECK-LABEL: define internal { i32, i32, i32 } @test_store_unclobber1
				define internal void @test_store_unclobber1(i32* %P1, i32* %P2, i32* %P3) { ; P1, P2, P3 may alias
				store i32 1, i32* %P1
				store i32 2, i32* %P2
				store i32 3, i32* %P3
				; note that values are inserted in the order of arguments of the function
				;CHECK: [[R:%[a-zA-Z0-9_]+]].ret0 = insertvalue { i32, i32, i32 } undef, i32 1, 0
				;CHECK-DAG: [[R]].ret1 = insertvalue { i32, i32, i32 } [[R]].ret0, i32 2, 1
				;CHECK-DAG: [[R]].ret2 = insertvalue { i32, i32, i32 } [[R]].ret1, i32 3, 2
				ret void
				}

				;CHECK-LABEL: define void @test_store_unclobber1_caller
				define void @test_store_unclobber1_caller(i32* %P1, i32* %P2, i32* %P3) {
				call void @test_store_unclobber1(i32* %P1, i32* %P2, i32* %P3)
				;CHECK: %1 = call { i32, i32, i32 } @test_store_unclobber1
				;CHECK: store i32 %P1
				;CHECK: store i32 %P2
				;CHECK: store i32 %P3
				ret void
				}

				;CHECK-LABEL: define internal { i32, i32, i32 } @test_store_unclobber2
				define internal void @test_store_unclobber2(i32* %P1, i32* %P2, i32* %P3) { ; P1, P2, P3 may alias
				store i32 1, i32* %P1
				store i32 3, i32* %P3
				store i32 2, i32* %P2
				; note that values are inserted in the order of arguments of the function
				;CHECK: [[R:%[a-zA-Z0-9_]+]].ret0 = insertvalue { i32, i32, i32 } undef, i32 1, 0
				;CHECK-DAG: [[R]].ret1 = insertvalue { i32, i32, i32 } [[R]].ret0, i32 2, 1
				;CHECK-DAG: [[R]].ret2 = insertvalue { i32, i32, i32 } [[R]].ret1, i32 3, 2
				ret void
				}

				;CHECK-LABEL: define void @test_store_unclobber2_caller
				define void @test_store_unclobber2_caller(i32* %P1, i32* %P2, i32* %P3) {
				call void @test_store_unclobber2(i32* %P1, i32* %P2, i32* %P3)
				;CHECK: %1 = call { i32, i32, i32 } @test_store_unclobber2
				;CHECK: store i32 %P1
				;CHECK: store i32 %P3
				;CHECK: store i32 %P2
				ret void
				}

				;CHECK-LABEL: define internal { i32, i32, i32 } @test_store_unclobber3
				define internal void @test_store_unclobber3(i32* %P1, i32* %P2, i32* %P3) { ; P1, P2, P3 may alias
				store i32 2, i32* %P2
				store i32 1, i32* %P1
				store i32 3, i32* %P3
				; note that values are inserted in the order of arguments of the function
				;CHECK: [[R:%[a-zA-Z0-9_]+]].ret0 = insertvalue { i32, i32, i32 } undef, i32 1, 0
				;CHECK-DAG: [[R]].ret1 = insertvalue { i32, i32, i32 } [[R]].ret0, i32 2, 1
				;CHECK-DAG: [[R]].ret2 = insertvalue { i32, i32, i32 } [[R]].ret1, i32 3, 2
				ret void
				}

				;CHECK-LABEL: define void @test_store_unclobber3_caller
				define void @test_store_unclobber3_caller(i32* %P1, i32* %P2, i32* %P3) {
				call void @test_store_unclobber3(i32* %P1, i32* %P2, i32* %P3)
				;CHECK: %1 = call { i32, i32, i32 } @test_store_unclobber3
				;CHECK: store i32 %P2
				;CHECK: store i32 %P1
				;CHECK: store i32 %P3
				ret void
				}

				;CHECK-LABEL: define internal { i32, i32, i32 } @test_store_unclobber4
				define internal void @test_store_unclobber4(i32* %P1, i32* %P2, i32* %P3) { ; P1, P2, P3 may alias
				store i32 2, i32* %P2
				store i32 3, i32* %P3
				store i32 1, i32* %P1
				; note that values are inserted in the order of arguments of the function
				;CHECK: [[R:%[a-zA-Z0-9_]+]].ret0 = insertvalue { i32, i32, i32 } undef, i32 1, 0
				;CHECK-DAG: [[R]].ret1 = insertvalue { i32, i32, i32 } [[R]].ret0, i32 2, 1
				;CHECK-DAG: [[R]].ret2 = insertvalue { i32, i32, i32 } [[R]].ret1, i32 3, 2
				ret void
				}

				;CHECK-LABEL: define void @test_store_unclobber4_caller
				define void @test_store_unclobber4_caller(i32* %P1, i32* %P2, i32* %P3) {
				call void @test_store_unclobber4(i32* %P1, i32* %P2, i32* %P3)
				;CHECK: %1 = call { i32, i32, i32 } @test_store_unclobber4
				;CHECK: store i32 %P2
				;CHECK: store i32 %P3
				;CHECK: store i32 %P1
				ret void
				}

				;CHECK-LABEL: define internal { i32, i32, i32 } @test_store_unclobber5
				define internal void @test_store_unclobber5(i32* %P1, i32* %P2, i32* %P3) { ; P1, P2, P3 may alias
				store i32 3, i32* %P3
				store i32 1, i32* %P1
				store i32 2, i32* %P2
				; note that values are inserted in the order of arguments of the function
				;CHECK: [[R:%[a-zA-Z0-9_]+]].ret0 = insertvalue { i32, i32, i32 } undef, i32 1, 0
				;CHECK-DAG: [[R]].ret1 = insertvalue { i32, i32, i32 } [[R]].ret0, i32 2, 1
				;CHECK-DAG: [[R]].ret2 = insertvalue { i32, i32, i32 } [[R]].ret1, i32 3, 2
				ret void
				}

				;CHECK-LABEL: define void @test_store_unclobber5_caller
				define void @test_store_unclobber5_caller(i32* %P1, i32* %P2, i32* %P3) {
				call void @test_store_unclobber5(i32* %P1, i32* %P2, i32* %P3)
				;CHECK: %1 = call { i32, i32, i32 } @test_store_unclobber5
				;CHECK: store i32 %P3
				;CHECK: store i32 %P1
				;CHECK: store i32 %P2
				ret void
				}

				;CHECK-LABEL: define internal { i32, i32, i32 } @test_store_unclobber6
				define internal void @test_store_unclobber6(i32* %P1, i32* %P2, i32* %P3) { ; P1, P2, P3 may alias
				store i32 3, i32* %P3
				store i32 2, i32* %P2
				store i32 1, i32* %P1
				; note that values are inserted in the order of arguments of the function
				;CHECK: [[R:%[a-zA-Z0-9_]+]].ret0 = insertvalue { i32, i32, i32 } undef, i32 1, 0
				;CHECK-DAG: [[R]].ret1 = insertvalue { i32, i32, i32 } [[R]].ret0, i32 2, 1
				;CHECK-DAG: [[R]].ret2 = insertvalue { i32, i32, i32 } [[R]].ret1, i32 3, 2
				ret void
				}

				;CHECK-LABEL: define void @test_store_unclobber6_caller
				define void @test_store_unclobber6_caller(i32* %P1, i32* %P2, i32* %P3) {
				call void @test_store_unclobber6(i32* %P1, i32* %P2, i32* %P3)
				;CHECK: %1 = call { i32, i32, i32 } @test_store_unclobber6
				;CHECK: store i32 %P3
				;CHECK: store i32 %P2
				;CHECK: store i32 %P1
				ret void
				}

				;CHECK-LABEL: define internal { i32, i32, i32 } @test_store_unclobber6_2x
				define internal void @test_store_unclobber6_2x(i32* %P1, i32* %P2, i32* %P3) { ; P1, P2, P3 may alias
				store i32 3, i32* %P3
				store i32 2, i32* %P2
				store i32 1, i32* %P1

				store i32 5, i32* %P3
				store i32 6, i32* %P2
				store i32 4, i32* %P1
				; note that values are inserted in the order of arguments of the function
				;CHECK: [[R:%[a-zA-Z0-9_]+]].ret0 = insertvalue { i32, i32, i32 } undef, i32 4, 0
				;CHECK-DAG: [[R]].ret1 = insertvalue { i32, i32, i32 } [[R]].ret0, i32 6, 1
				;CHECK-DAG: [[R]].ret2 = insertvalue { i32, i32, i32 } [[R]].ret1, i32 5, 2
				ret void
				}

				;CHECK-LABEL: define void @test_store_unclobber6_2x_caller
				define void @test_store_unclobber6_2x_caller(i32* %P1, i32* %P2, i32* %P3) {
				call void @test_store_unclobber6_2x(i32* %P1, i32* %P2, i32* %P3)
				;CHECK: %1 = call { i32, i32, i32 }
				;CHECK: store i32 %P3
				;CHECK: store i32 %P2
				;CHECK: store i32 %P1
				ret void
				}

				;CHECK-LABEL: define internal void @test_store_unclobber_fail1
				define internal void @test_store_unclobber_fail1(i1 %c, i32* %P1, i32* %P2, i32* %P3) { ; P1, P2, P3 may alias
				store i32 3, i32* %P1
				br i1 %c, label %st1, label %st2
				st1:
				store i32 1, i32* %P2
				store i32 2, i32* %P3
				br label %exit
				st2:
				store i32 1, i32* %P3
				store i32 2, i32* %P2
				br label %exit
				exit:
				ret void
				}

				;CHECK-LABEL: define void @test_store_unclobber_fail1_caller
				define void @test_store_unclobber_fail1_caller(i1 %c, i32* %P1, i32* %P2, i32* %P3) {
				; CHECK: call void
				call void @test_store_unclobber_fail1(i1 %c, i32* %P1, i32* %P2, i32* %P3)
				ret void
				}


				;CHECK-LABEL: define internal void @test_store_unclobber_fail2
				define internal void @test_store_unclobber_fail2(i1 %c, i32* %P1, i32* %P2, i32* %P3) { ; P1, P2, P3 may alias
				store i32 1, i32* %P2
				br i1 %c, label %st1, label %st2
				st1:
				store i32 3, i32* %P1
				store i32 2, i32* %P3
				br label %exit
				st2:
				store i32 1, i32* %P3
				store i32 3, i32* %P1
				br label %exit
				exit:
				ret void
				}

				;CHECK-LABEL: define void @test_store_unclobber_fail2_caller
				define void @test_store_unclobber_fail2_caller(i1 %c, i32* %P1, i32* %P2, i32* %P3) {
				; CHECK: call void
				call void @test_store_unclobber_fail2(i1 %c, i32* %P1, i32* %P2, i32* %P3)
				ret void
				}


				;CHECK-LABEL: define internal void @test_store_unclobber_fail3
				define internal void @test_store_unclobber_fail3(i1 %c, i32* %P1, i32* %P2, i32* %P3) { ; P1, P2, P3 may alias
				store i32 2, i32* %P3
				br i1 %c, label %st1, label %st2
				st1:
				store i32 3, i32* %P1
				store i32 1, i32* %P2
				br label %exit
				st2:
				store i32 1, i32* %P2
				store i32 3, i32* %P1
				br label %exit
				exit:
				ret void
				}

				;CHECK-LABEL: define void @test_store_unclobber_fail3_caller
				define void @test_store_unclobber_fail3_caller(i1 %c, i32* %P1, i32* %P2, i32* %P3) {
				; CHECK: call void
				call void @test_store_unclobber_fail3(i1 %c, i32* %P1, i32* %P2, i32* %P3)
				ret void
				}

				; -----------------------------------------------------------------------------
				; Test declobbering in a more complicated CFG
				;CHECK-LABEL: define internal { i32, i32, i32 } @nested_diamond
				define internal i32 @nested_diamond(i1 %D1C, i1 %D2C, i32 %X, i32 %Y, i32* %P1, i32* %P2) {
				;       D1
				;      /  \
				;    D2    \
				; D2L  D2R  D1R
				;    D2E    /
				;       \  /
				;        D1E
				D1:
				br i1 %D1C, label %D2, label %D1R

				D2:
				br i1 %D2C, label %D2L, label %D2R

				D2L:
				;CHECK-LABEL: D2L:
				;CHECK-NEXT: br
				store i32 %Y, i32* %P1
				store i32 %X, i32* %P1
				store i32 %X, i32* %P2
				store i32 %Y, i32* %P2
				br label %D2E

				D2R:
				;CHECK-LABEL: D2R:
				;CHECK-NEXT: br
				store i32 %Y, i32* %P1
				store i32 %X, i32* %P2
				br label %D2E

				D2E:
				;CHECK-LABEL: D2E:
				;CHECK-NEXT: %P1.val.D2E.phi = phi i32 [ %Y, %D2R ], [ %X, %D2L ]
				;CHECK-NEXT: %P2.val.D2E.phi = phi i32 [ %X, %D2R ], [ %Y, %D2L ]
				br label %D1E

				D1R:
				;CHECK-LABEL: D1R:
				;CHECK-NEXT: br
				store i32 %X, i32* %P1
				store i32 %Y, i32* %P2
				br label %D1E

				D1E:
				;CHECK-LABEL: D1E:
				;CHECK-NEXT: %P1.val.D1E.phi = phi i32 [ %X, %D1R ], [ %P1.val.D2E.phi, %D2E ]
				;CHECK-NEXT: %P2.val.D1E.phi = phi i32 [ %Y, %D1R ], [ %P2.val.D2E.phi, %D2E ]
				;CHECK-NEXT: [[R1:%.*]] = insertvalue { i32, i32, i32 } undef, i32 42, 0
				;CHECK-NEXT: [[R2:%.*]] = insertvalue { i32, i32, i32 } [[R1]], i32 %P1.val.D1E.phi, 1
				;CHECK-NEXT: [[R3:%.*]] = insertvalue { i32, i32, i32 } [[R2]], i32 %P2.val.D1E.phi, 2
				;CHECK-NEXT: ret { i32, i32, i32 } [[R3]]
				ret i32 42
				}

				;CHECK-LABEL: define i32 @nested_diamond_caller
				define i32 @nested_diamond_caller(i1 %D1C, i1 %D2C, i32 %X, i32 %Y, i32* %P1, i32* %P2) {
				%C = call i32 @nested_diamond(i1 %D1C, i1 %D2C, i32 %X, i32 %Y, i32* %P1, i32* %P2)
				; CHECK: %C = call { i32, i32, i32 } @nested_diamond(i1 %D1C, i1 %D2C, i32 %X, i32 %Y)
				; CHECK-NEXT: %P1.val.ret = extractvalue { i32, i32, i32 } %C, 1
				; CHECK-NEXT: store i32 %P1.val.ret, i32* %P1, align 4
				; CHECK-NEXT: %P2.val.ret = extractvalue { i32, i32, i32 } %C, 2
				; CHECK-NEXT: store i32 %P2.val.ret, i32* %P2, align 4
				%V1 = load i32, i32* %P1
				%V2 = load i32, i32* %P2
				%Sum = add i32 %V1, %V2
				ret i32 %Sum
				}
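The caller-side CHECK lines in the tests above expect the values returned by a promoted callee to be stored back in the order of the callee's original stores (see the "store P2 and then P1 to preserve order" comments). The following is a minimal C++ sketch of why that order matters when the pointer arguments may alias. It is not code from the patch: `byReference`, `promoted`, `Result`, and `promotedCaller` are hypothetical names chosen for illustration.

```cpp
#include <cassert>

// Original shape: two out-arguments that may alias. The callee stores
// through P2 first and through P1 last.
void byReference(int *P1, int *P2) {
  *P2 = 2;
  *P1 = 1;
}

// Hypothetical promoted shape: the callee returns the final value of each
// former out-argument instead of writing memory.
struct Result {
  int P1Val;
  int P2Val;
};

Result promoted() { return Result{1, 2}; }

// Caller after promotion: replays the stores in the callee's original order
// (P2, then P1) so that an aliasing pair ends with the same value as before.
void promotedCaller(int *P1, int *P2) {
  assert(P1 && P2 && "out-arguments must be valid pointers");
  Result R = promoted();
  *P2 = R.P2Val;
  *P1 = R.P1Val;
}
```

With distinct pointers both versions agree in any store order. With `P1 == P2` the last store wins, so a caller that stored P1 first and P2 last would leave 2 in memory where the original function left 1.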
				arsenm: I'd like to see some tests with more exotic memory operations (atomics, memcpy/memset, target memory intrinsics etc.). I also don't see negative tests for some of the skipped conditions (e.g. musttail). Can you also add a test for invoke, and for a captured function address?
				vpykhtin (author): Right, I will add more tests. However, the current implementation is quite conservative because it relies on MSSA to find clobbers; it requires additional false-clobber checks, as was done by Stas in https://reviews.llvm.org/D118419.
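As a rough illustration of what "conservative" means for the straight-line cases in the test file (for example `@test_store_clobber1` versus `@test_store_no_clobber1`), here is a toy C++ model. It assumes every pointer argument may alias every other one and only handles a single basic block. It does not use the MemorySSA API and is not the patch's algorithm; `Access`, `Kind`, and `isLoadClobbered` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy straight-line model: every pointer argument may alias every other one.
enum class Kind { Load, Store };

struct Access {
  Kind K;
  int Ptr; // index of the pointer argument being accessed
};

// Returns true if the nearest store preceding the load at LoadIdx goes
// through a different pointer, so the loaded value may have been overwritten
// by it. A preceding store through the same pointer defines the loaded value,
// and reaching function entry means the load reads the incoming value; in
// both of those cases the load is treated as not clobbered.
bool isLoadClobbered(const std::vector<Access> &Body, std::size_t LoadIdx) {
  assert(LoadIdx < Body.size() && Body[LoadIdx].K == Kind::Load);
  for (std::size_t I = LoadIdx; I-- > 0;) {
    if (Body[I].K != Kind::Store)
      continue;
    return Body[I].Ptr != Body[LoadIdx].Ptr;
  }
  return false;
}
```

Under this model the load of P2 in `store P1; store P3; load P2` is reported as clobbered, while the load in `store P1; store P2; load P2` is not, which matches the outcomes the two tests check for.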

Unit Tests: Failed

Event Timeline

Revision Contents

Diff 406419

llvm/include/llvm/InitializePasses.h

llvm/include/llvm/Transforms/IPO.h

llvm/include/llvm/Transforms/IPO/MSSAArgPromotion.h

llvm/lib/Passes/PassBuilder.cpp

llvm/lib/Passes/PassRegistry.def

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

llvm/lib/Transforms/IPO/CMakeLists.txt

llvm/lib/Transforms/IPO/IPO.cpp

llvm/lib/Transforms/IPO/MSSAArgPromotion.cpp

llvm/test/CodeGen/AMDGPU/opt-pipeline.ll

llvm/test/Transforms/ArgumentPromotion/inoutargs.ll
