This is an archive of the discontinued LLVM Phabricator instance.

Differential D96854

[CodeExtractor] Enable partial aggregate arguments
ClosedPublic

Authored by ggeorgakoudis on Feb 17 2021, 3:43 AM.

Details

Reviewers

jdoerfert
vsk

Commits

rG95b981ca2ae3: [CodeExtractor] Enable partial aggregate arguments

Summary

Enable CodeExtractor to construct output functions that partially aggregate inputs/outputs in their argument list. A use case is the OMPIRBuilder to create outlined functions for parallel regions that aggregate in a struct the payload variables for the region while passing as scalars thread and bound identifiers.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ggeorgakoudis created this revision. Feb 17 2021, 3:43 AM

Herald added a subscriber: hiraditya. Feb 17 2021, 3:43 AM

ggeorgakoudis requested review of this revision. Feb 17 2021, 3:43 AM

Herald added a project: Restricted Project. Feb 17 2021, 3:43 AM

Herald added a subscriber: llvm-commits.

ggeorgakoudis edited the summary of this revision. Feb 17 2021, 3:48 AM

ggeorgakoudis added reviewers: jdoerfert, vsk.

Harbormaster completed remote builds in B89527: Diff 324260. Feb 17 2021, 5:18 AM

Sorry it's taken me so long to get to this.

partially aggregate inputs/outputs in their argument list

Could you explain what this means, and what the pros/cons might be compared to any alternatives? It'd also help to see a test case.

Add test

In D96854#2594840, @vsk wrote:

Sorry it's taken me so long to get to this.

partially aggregate inputs/outputs in their argument list

Could you explain what this means, and what the pros/cons might be compared to any alternatives? It'd also help to see a test case.

Hi @vsk,

No problem! Let me make the OpenMP IR builder use case concrete; I'll make some simplifications that do not affect the point. Currently, the OMPIRBuilder uses CodeExtractor to outline an OpenMP callback as:

void omp.outlined(int global_tid, int bound_tid, int* arg0, int* arg1, ..., int* argn)

where global_tid and bound_tid are values filled in by the OpenMP runtime and passed to the outlined function, and arg0, arg1, ..., argn are the inputs/outputs found by CodeExtractor. To implement parallel execution of the outlined function, OMPIRBuilder emits a call to the OpenMP runtime fork_join function, which uses an ellipsis to pass a variable number of parameters to the OpenMP runtime:

__kmpc_fork_call(int argc, omp.outlined, ...)

so the ellipsis contains the arguments to the outlined function (arg0, arg1, ..., argn). The OpenMP runtime library fills the values for the preceding arguments global_tid, bound_tid when calling omp.outlined and forwards the rest of the arguments through a cumbersome dispatch function that unwraps the variadic arguments and uses a switch-case to call the function pointer of the callback as in:

switch(argc) {
  case 1: fp_to_omp.outlined(global_tid, bound_tid, vararg[0]); return;
  case 2: fp_to_omp.outlined(global_tid, bound_tid, vararg[0], vararg[1]); return;
  ...
}

We would like to remove this ellipsis interface because it creates various problems: there is a hardcoded limit on the number of arguments that the runtime forwards (limited by the switch-case style unwrapping), it has been the source of ABI bugs, and it makes it hard to analyze and optimize OpenMP code in LLVM. To that end, we would like to aggregate the input/output arguments to the outlined function but leave the runtime-filled arguments unaggregated:

void omp.outlined(int global_tid, int bound_tid, struct structArg)

This patch makes it possible to exclude arguments from the aggregate by extending extractCodeRegion in CodeExtractor with a parameter specifying which arguments to exclude (assuming AggregateArgs has been set when creating the CodeExtractor). In this specific OpenMP use case, the excluded arguments are global_tid and bound_tid.

I have added a unit test that tests this functionality. When we complete the change in OMPIRBuilder to use partial aggregation from CodeExtractor, we will also add IR tests that exercise this functionality.
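
To make the shape of the target signature concrete, here is a minimal, self-contained C++ sketch. It does not use the OpenMP runtime or CodeExtractor: `Payload`, `outlined`, `toy_fork_call`, and `run_region` are hypothetical names that only stand in for the struct argument, the outlined function, and the runtime entry point described above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical payload: the inputs/outputs an extractor would aggregate.
struct Payload {
  int *in;   // input captured by the region
  int *out;  // output buffer written by the region
};

// Fixed-arity callback type: runtime-filled scalars first, then one pointer
// to the aggregate. There is no ellipsis, so no per-arity switch is needed.
using OutlinedFn = void (*)(int global_tid, int bound_tid, void *payload);

// Stand-in for omp.outlined(global_tid, bound_tid, structArg).
void outlined(int global_tid, int bound_tid, void *payload) {
  (void)bound_tid;  // unused in this toy region
  auto *p = static_cast<Payload *>(payload);
  p->out[global_tid] = *p->in + global_tid;
}

// Toy "runtime": it supplies the thread ids itself and forwards the payload
// pointer unchanged. The "threads" run sequentially for simplicity.
void toy_fork_call(OutlinedFn fn, void *payload, int num_threads) {
  for (int tid = 0; tid < num_threads; ++tid)
    fn(tid, /*bound_tid=*/0, payload);
}

// Example driver: packs the payload once and returns per-thread results.
std::vector<int> run_region(int input, int num_threads) {
  std::vector<int> results(num_threads, 0);
  Payload p{&input, results.data()};
  toy_fork_call(outlined, &p, num_threads);
  return results;
}
```

Because the callback arity is fixed, the toy runtime forwards any number of captured variables through a single pointer, which is the property the comment above is after.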

Harbormaster completed remote builds in B91986: Diff 328048. Mar 4 2021, 9:03 AM

ggeorgakoudis mentioned this in D91556: [OpenMPIRBuilder} Add capturing of parameters to pass to omp::parallel. Mar 14 2021, 11:22 AM

Ping!

Thanks for explaining. I'd suggest making ExcludedAggArgs part of a CodeExtractor instance's internal state: e.g. the client may call CE.addArgExludedFromAggregate(Value *) some number of times before CE.extractCodeRegion(). This way, the client doesn't need to maintain a SetVector, and the rest of the interface isn't polluted with an option that's specific to the AggregateArg case.

In D96854#2645458, @vsk wrote:

Thanks for explaining. I'd suggest making ExcludedAggArgs part of a CodeExtractor instance's internal state: e.g. the client may call CE.addArgExludedFromAggregate(Value *) some number of times before CE.extractCodeRegion(). This way, the client doesn't need to maintain a SetVector, and the rest of the interface isn't polluted with an option that's specific to the AggregateArg case.

Hi @vsk, thank you for your feedback. I think the proposed interface is more flexible and easier to support than making exclusions internal to CodeExtractor and having repeated calls through addArgExludedFromAggregate to set them. My line of thinking is that (1) it is more flexible to let the client manage the state and request extraction through extractCodeRegion with it, for example it makes it possible to extract the same region with different exclusions without creating another CodeExtractor instance, (2) there is only one change to the external interface of CodeExtractor, namely to extractCodeRegion, and it is backwards compatible in behavior, so changes in other methods (constructFunction, emitCallAndSwitchStatement) are internal and changes do not pollute this external API. Please let me know what you think.

for example it makes it possible to extract the same region with different exclusions without creating another CodeExtractor instance,

As extractCodeRegion mutates the original function, I assumed it was not possible to reuse a CodeExtractor instance in this way. Is there an in-tree example of CE instance reuse I can take a look at?

so changes in other methods (constructFunction, emitCallAndSwitchStatement) are internal and changes do not pollute this external API

I don't quite follow. Wouldn't introducing a separate API for adding arg exclusions also be backwards compatible?

In D96854#2695426, @vsk wrote:

for example it makes it possible to extract the same region with different exclusions without creating another CodeExtractor instance,

As extractCodeRegion mutates the original function, I assumed it was not possible to reuse a CodeExtractor instance in this way. Is there an in-tree example of CE instance reuse I can take a look at?

Unfortunately, there isn't an example so far. The CodeExtractor instance can be re-used because the analysis of CE stays the same. The exclusion from aggregates affects only the extraction of the outlined function when calling extractCodeRegion.

so changes in other methods (constructFunction, emitCallAndSwitchStatement) are internal and changes do not pollute this external API

I don't quite follow. Wouldn't introducing a separate API for adding arg exclusions also be backwards compatible?

It will be backwards compatible but IMO it unnecessarily binds the CE instance to a specific way of argument creation for the outlined function. In the same sense, the AggregateArgs that is part of the CE constructor could be only an argument to extractCodeRegion since CE analysis is orthogonal to it.

@vsk I'll restructure the implementation to use internal state and an addArgExludedFromAggregate interface. I don't have, and can't think of right now, a use case for the flexibility I'm proposing.

In D96854#2707304, @ggeorgakoudis wrote:

In D96854#2695426, @vsk wrote:

for example it makes it possible to extract the same region with different exclusions without creating another CodeExtractor instance,

As extractCodeRegion mutates the original function, I assumed it was not possible to reuse a CodeExtractor instance in this way. Is there an in-tree example of CE instance reuse I can take a look at?

Unfortunately, there isn't an example so far. The CodeExtractor instance can be re-used because the analysis of CE stays the same. The exclusion from aggregates affects only the extraction of the outlined function when calling extractCodeRegion.

Oh, I think I see what you meant - was it that the CE analysis cache can be re-used? The motivation there was eliminating quadratic compile-time for repeated outlining (D68616); with that change some cached analysis of the caller became reusable, but not the extractor instance itself.

so changes in other methods (constructFunction, emitCallAndSwitchStatement) are internal and changes do not pollute this external API

I don't quite follow. Wouldn't introducing a separate API for adding arg exclusions also be backwards compatible?

It will be backwards compatible but IMO it unnecessarily binds the CE instance to a specific way of argument creation for the outlined function.
[snip]

I believe this would be true of any API we pick today.

In D96854#2709726, @vsk wrote:

Oh, I think I see what you meant - was it that the CE analysis cache can be re-used? The motivation there was eliminating quadratic compile-time for repeated outlining (D68616); with that change some cached analysis of the caller became reusable, but not the extractor instance itself.

Yes, that's what I meant! The CE analysis is reused and the extractCodeRegion should only define the way to extract the analyzed region to an outlined function.

I believe this would be true of any API we pick today.

I think it doesn't have to be this way if we move the AggregateArgs flag to be an argument to extractCodeRegion along with the exclusions. For the use case in the OpenMP IR builder, either interface will work. Since you have a more holistic view of the clients of CodeExtractor, it makes sense that you make the call. Shall I go ahead with the addArgExludedFromAggregate interface?

Ping @vsk! Waiting for your decision :)

@ggeorgakoudis apologies again for the delay here.

My opinion hasn't changed, I still feel it'd make for a simpler interface to introduce a "CE.addArgExcludedFromAggregate(Value *)" API (though I admit that that API name isn't ideal).
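
The stateful interface favored here can be sketched with a toy model. This is not the CodeExtractor implementation: `ToyExtractor`, its `extract` method, and `demo` are hypothetical, and values are modeled as plain strings. Only the method name `excludeArgFromAggregate` matches the declaration that appears in the diff further down.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for an extractor that keeps exclusions as internal state.
class ToyExtractor {
public:
  explicit ToyExtractor(bool AggregateArgs) : AggregateArgs(AggregateArgs) {}

  // Record a value that must stay a scalar parameter even when aggregating.
  void excludeArgFromAggregate(const std::string &Arg) { Excluded.insert(Arg); }

  // Return the parameter list an outlined function would receive: scalars
  // in input order, followed by one pointer to the aggregate (if any).
  std::vector<std::string>
  extract(const std::vector<std::string> &Inputs) const {
    std::vector<std::string> Params;
    std::string Agg;
    for (const std::string &V : Inputs) {
      if (AggregateArgs && !Excluded.count(V))
        Agg += (Agg.empty() ? "" : ", ") + V;
      else
        Params.push_back(V);
    }
    if (!Agg.empty())
      Params.push_back("struct{" + Agg + "}*");
    return Params;
  }

private:
  bool AggregateArgs;
  std::set<std::string> Excluded;
};

// Client code: exclusions are registered before extraction is requested.
std::vector<std::string> demo() {
  ToyExtractor CE(/*AggregateArgs=*/true);
  CE.excludeArgFromAggregate("global_tid");
  CE.excludeArgFromAggregate("bound_tid");
  return CE.extract({"global_tid", "bound_tid", "arg0", "arg1"});
}
```

In this sketch the client never builds its own set of excluded values, which is the simplification argued for in the comment above. The cost, as raised earlier in the thread, is that the exclusions are tied to the extractor instance.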

Update interface for comments

Harbormaster completed remote builds in B124437: Diff 373276. Sep 17 2021, 11:04 AM

ggeorgakoudis added a child revision: D110114: [OMPIRBuilder] Generate aggregate argument for parallel region outlined functions. Sep 20 2021, 4:46 PM

ggeorgakoudis mentioned this in D110114: [OMPIRBuilder] Generate aggregate argument for parallel region outlined functions. Sep 20 2021, 4:50 PM

Ping @vsk

vsk added subscribers: paquette, AndrewLitteken, jroelofs. Sep 22 2021, 10:05 AM

vsk added inline comments.

llvm/lib/Transforms/Utils/CodeExtractor.cpp
833	I notice there's a `paramTy = ScalarParamTy`: can we delete `paramTy` entirely? Having two copies of the same thing creates a risk of the copies diverging, adding complexity.
860	Suggest - `assert(StructValues.empty() || AggregateArgs && "StructValues updated for arg structs only")`
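
As a side note on how the suggested assertion parses, here is a small standalone sketch. The function `structValuesInvariant` is hypothetical and takes plain booleans instead of the real containers. Since `&&` binds tighter than `||` and a string literal converts to `true`, the condition reduces to "StructValues is empty, or AggregateArgs is set". The parentheses below are added only for readability; some compilers warn when `&&` and `||` are mixed without them.

```cpp
#include <cassert>

// Models the suggested check: struct values may only exist when aggregating.
// The string literal is always truthy, so it acts purely as a message and
// the result equals `StructValuesEmpty || AggregateArgs`.
bool structValuesInvariant(bool StructValuesEmpty, bool AggregateArgs) {
  return StructValuesEmpty ||
         (AggregateArgs && "StructValues updated for arg structs only");
}
```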

Update for comments

ggeorgakoudis marked 2 inline comments as done. Nov 1 2021, 11:05 AM

ggeorgakoudis added inline comments.

llvm/lib/Transforms/Utils/CodeExtractor.cpp
833	In `paramTy` we concatenate the scalar parameter types and the aggregate type. The comment on diverging copies makes sense. I'll change `paramTy` to `ParamTy`, remove `ScalarParamTy`, and directly push non-aggregate arguments and concat the aggregate type.
860	Makes sense, will add the check
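
The bookkeeping discussed in these inline comments (scalars pushed directly, aggregated values collected separately, and a separate running index for struct fields) can be illustrated with a toy model. `Slot` and `assignSlots` are hypothetical and model values as strings. This is a sketch of the indexing idea under those assumptions, not the code from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Where a value ends up in the outlined function's signature.
struct Slot {
  bool InStruct;   // true: field of the aggregate; false: scalar parameter
  unsigned Index;  // struct field index, or scalar parameter index
};

// Walk the values in order and keep two independent counters, one for
// scalar parameters and one for aggregate fields.
std::vector<Slot> assignSlots(const std::vector<std::string> &Values,
                              const std::set<std::string> &Excluded,
                              bool AggregateArgs) {
  std::vector<Slot> Slots;
  unsigned ScalarIdx = 0, AggIdx = 0;
  for (const std::string &V : Values) {
    if (AggregateArgs && !Excluded.count(V))
      Slots.push_back({true, AggIdx++});
    else
      Slots.push_back({false, ScalarIdx++});
  }
  // Invariant in the spirit of the patch's assertion: every value lands in
  // exactly one of the two groups.
  assert(ScalarIdx + AggIdx == Values.size());
  return Slots;
}
```

With exclusions in play, the position of a value in the input list no longer equals its struct field index, which is why a dedicated aggregate counter is needed instead of the loop index.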

Harbormaster completed remote builds in B131769: Diff 383840. Nov 1 2021, 12:09 PM

Ping @vsk :)

Looks good to me, thanks.
(And apologies for more delay: I'm not working on llvm much these days, Jessica or Jon (both cc'd) may be able to provide better turnaround time going forward.)

Was accepted before, now mark it as such.

This revision is now accepted and ready to land. Jan 25 2022, 4:05 PM

This revision was landed with ongoing or failed builds. Jan 25 2022, 6:25 PM

Closed by commit rG95b981ca2ae3: [CodeExtractor] Enable partial aggregate arguments (authored by ggeorgakoudis, committed by jhuber6).

This revision was automatically updated to reflect the committed changes.

jhuber6 added a commit: rG95b981ca2ae3: [CodeExtractor] Enable partial aggregate arguments.

jhuber6 mentioned this in rG7cb4c2617391: [OMPIRBuilder] Generate aggregate argument for parallel region outlined….

Meinersbur mentioned this in D115218: [CodeExtractor] Refactor extractCodeRegion, fix parameter index confusion. Mar 31 2022, 1:28 PM

Revision Contents

Path	Size
llvm/include/llvm/Transforms/Utils/CodeExtractor.h	8 lines
llvm/lib/Transforms/Utils/CodeExtractor.cpp	179 lines
llvm/unittests/Transforms/Utils/CodeExtractorTest.cpp	54 lines

Diff 403100

llvm/include/llvm/Transforms/Utils/CodeExtractor.h

[162 lines not shown]	public:
static bool verifyAssumptionCache(const Function &OldFunc,		static bool verifyAssumptionCache(const Function &OldFunc,
const Function &NewFunc,		const Function &NewFunc,
AssumptionCache *AC);		AssumptionCache *AC);

/// Test whether this code extractor is eligible.		/// Test whether this code extractor is eligible.
///		///
/// Based on the blocks used when constructing the code extractor,		/// Based on the blocks used when constructing the code extractor,
/// determine whether it is eligible for extraction.		/// determine whether it is eligible for extraction.
///		///
/// Checks that varargs handling (with vastart and vaend) is only done in		/// Checks that varargs handling (with vastart and vaend) is only done in
/// the outlined blocks.		/// the outlined blocks.
bool isEligible() const;		bool isEligible() const;

/// Compute the set of input values and output values for the code.		/// Compute the set of input values and output values for the code.
///		///
/// These can be used either when performing the extraction or to evaluate		/// These can be used either when performing the extraction or to evaluate
/// the expected size of a call to the extracted function. Note that this		/// the expected size of a call to the extracted function. Note that this
[29 lines not shown]	public:
///		///
/// CommonExitBlock is block outside the outline region. It is the common		/// CommonExitBlock is block outside the outline region. It is the common
/// successor of blocks inside the region. If there exists a single block		/// successor of blocks inside the region. If there exists a single block
/// inside the region that is the predecessor of CommonExitBlock, that block		/// inside the region that is the predecessor of CommonExitBlock, that block
/// will be returned. Otherwise CommonExitBlock will be split and the		/// will be returned. Otherwise CommonExitBlock will be split and the
/// original block will be added to the outline region.		/// original block will be added to the outline region.
BasicBlock *findOrCreateBlockForHoisting(BasicBlock *CommonExitBlock);		BasicBlock *findOrCreateBlockForHoisting(BasicBlock *CommonExitBlock);

		/// Exclude a value from aggregate argument passing when extracting a code
		/// region, passing it instead as a scalar.
		void excludeArgFromAggregate(Value *Arg);

private:		private:
struct LifetimeMarkerInfo {		struct LifetimeMarkerInfo {
bool SinkLifeStart = false;		bool SinkLifeStart = false;
bool HoistLifeEnd = false;		bool HoistLifeEnd = false;
Instruction *LifeStart = nullptr;		Instruction *LifeStart = nullptr;
Instruction *LifeEnd = nullptr;		Instruction *LifeEnd = nullptr;
};		};

		ValueSet ExcludeArgsFromAggregate;

LifetimeMarkerInfo		LifetimeMarkerInfo
getLifetimeMarkers(const CodeExtractorAnalysisCache &CEAC,		getLifetimeMarkers(const CodeExtractorAnalysisCache &CEAC,
Instruction *Addr, BasicBlock *ExitBlock) const;		Instruction *Addr, BasicBlock *ExitBlock) const;

void severSplitPHINodesOfEntry(BasicBlock *&Header);		void severSplitPHINodesOfEntry(BasicBlock *&Header);
void severSplitPHINodesOfExits(const SmallPtrSetImpl<BasicBlock *> &Exits);		void severSplitPHINodesOfExits(const SmallPtrSetImpl<BasicBlock *> &Exits);
void splitReturnBlocks();		void splitReturnBlocks();

[21 lines not shown]

llvm/lib/Transforms/Utils/CodeExtractor.cpp

[823 lines not shown]	Function *CodeExtractor::constructFunction(const ValueSet &inputs,
// This function returns unsigned, outputs will go back by reference.		// This function returns unsigned, outputs will go back by reference.
switch (NumExitBlocks) {		switch (NumExitBlocks) {
case 0:		case 0:
case 1: RetTy = Type::getVoidTy(header->getContext()); break;		case 1: RetTy = Type::getVoidTy(header->getContext()); break;
case 2: RetTy = Type::getInt1Ty(header->getContext()); break;		case 2: RetTy = Type::getInt1Ty(header->getContext()); break;
default: RetTy = Type::getInt16Ty(header->getContext()); break;		default: RetTy = Type::getInt16Ty(header->getContext()); break;
}		}

std::vector<Type *> paramTy;		std::vector<Type *> ParamTy;
		std::vector<Type *> AggParamTy;
		ValueSet StructValues;

// Add the types of the input values to the function's argument list		// Add the types of the input values to the function's argument list
for (Value *value : inputs) {		for (Value *value : inputs) {
LLVM_DEBUG(dbgs() << "value used in func: " << *value << "\n");		LLVM_DEBUG(dbgs() << "value used in func: " << *value << "\n");
paramTy.push_back(value->getType());		if (AggregateArgs && !ExcludeArgsFromAggregate.contains(value)) {
		AggParamTy.push_back(value->getType());
		StructValues.insert(value);
		} else
		ParamTy.push_back(value->getType());
}		}

// Add the types of the output values to the function's argument list.		// Add the types of the output values to the function's argument list.
for (Value *output : outputs) {		for (Value *output : outputs) {
LLVM_DEBUG(dbgs() << "instr used in func: " << *output << "\n");		LLVM_DEBUG(dbgs() << "instr used in func: " << *output << "\n");
if (AggregateArgs)		if (AggregateArgs && !ExcludeArgsFromAggregate.contains(output)) {
paramTy.push_back(output->getType());		AggParamTy.push_back(output->getType());
else		StructValues.insert(output);
paramTy.push_back(PointerType::getUnqual(output->getType()));		} else
		ParamTy.push_back(PointerType::getUnqual(output->getType()));
		}

		assert(
		(ParamTy.size() + AggParamTy.size()) ==
		(inputs.size() + outputs.size()) &&
		"Number of scalar and aggregate params does not match inputs, outputs");
		assert(StructValues.empty() ||
		AggregateArgs && "Expeced StructValues only with AggregateArgs set");

		// Concatenate scalar and aggregate params in ParamTy.
		size_t NumScalarParams = ParamTy.size();
		StructType *StructTy = nullptr;
		if (AggregateArgs && !AggParamTy.empty()) {
		StructTy = StructType::get(M->getContext(), AggParamTy);
		ParamTy.push_back(PointerType::getUnqual(StructTy));
}		}

LLVM_DEBUG({		LLVM_DEBUG({
dbgs() << "Function type: " << *RetTy << " f(";		dbgs() << "Function type: " << *RetTy << " f(";
for (Type *i : paramTy)		for (Type *i : ParamTy)
dbgs() << *i << ", ";		dbgs() << *i << ", ";
dbgs() << ")\n";		dbgs() << ")\n";
});		});

StructType *StructTy = nullptr;		FunctionType *funcType = FunctionType::get(
if (AggregateArgs && (inputs.size() + outputs.size() > 0)) {		RetTy, ParamTy, AllowVarArgs && oldFunction->isVarArg());
StructTy = StructType::get(M->getContext(), paramTy);
paramTy.clear();
paramTy.push_back(PointerType::getUnqual(StructTy));
}
FunctionType *funcType =
FunctionType::get(RetTy, paramTy,
AllowVarArgs && oldFunction->isVarArg());

std::string SuffixToUse =		std::string SuffixToUse =
Suffix.empty()		Suffix.empty()
? (header->getName().empty() ? "extracted" : header->getName().str())		? (header->getName().empty() ? "extracted" : header->getName().str())
: Suffix;		: Suffix;
// Create the new function		// Create the new function
Function *newFunction = Function::Create(		Function *newFunction = Function::Create(
funcType, GlobalValue::InternalLinkage, oldFunction->getAddressSpace(),		funcType, GlobalValue::InternalLinkage, oldFunction->getAddressSpace(),
[103 lines not shown]	if (Attr.isStringAttribute()) {
case Attribute::TombstoneKey:		case Attribute::TombstoneKey:
llvm_unreachable("Not a function attribute");		llvm_unreachable("Not a function attribute");
}		}

newFunction->addFnAttr(Attr);		newFunction->addFnAttr(Attr);
}		}
newFunction->getBasicBlockList().push_back(newRootNode);		newFunction->getBasicBlockList().push_back(newRootNode);

// Create an iterator to name all of the arguments we inserted.		// Create scalar and aggregate iterators to name all of the arguments we
Function::arg_iterator AI = newFunction->arg_begin();		// inserted.
		Function::arg_iterator ScalarAI = newFunction->arg_begin();
		Function::arg_iterator AggAI = std::next(ScalarAI, NumScalarParams);

// Rewrite all users of the inputs in the extracted region to use the		// Rewrite all users of the inputs in the extracted region to use the
// arguments (or appropriate addressing into struct) instead.		// arguments (or appropriate addressing into struct) instead.
for (unsigned i = 0, e = inputs.size(); i != e; ++i) {		for (unsigned i = 0, e = inputs.size(), aggIdx = 0; i != e; ++i) {
Value *RewriteVal;		Value *RewriteVal;
if (AggregateArgs) {		if (AggregateArgs && StructValues.contains(inputs[i])) {
Value *Idx[2];		Value *Idx[2];
Idx[0] = Constant::getNullValue(Type::getInt32Ty(header->getContext()));		Idx[0] = Constant::getNullValue(Type::getInt32Ty(header->getContext()));
Idx[1] = ConstantInt::get(Type::getInt32Ty(header->getContext()), i);		Idx[1] = ConstantInt::get(Type::getInt32Ty(header->getContext()), aggIdx);
Instruction *TI = newFunction->begin()->getTerminator();		Instruction *TI = newFunction->begin()->getTerminator();
GetElementPtrInst *GEP = GetElementPtrInst::Create(		GetElementPtrInst *GEP = GetElementPtrInst::Create(
StructTy, &*AI, Idx, "gep_" + inputs[i]->getName(), TI);		StructTy, &*AggAI, Idx, "gep_" + inputs[i]->getName(), TI);
RewriteVal = new LoadInst(StructTy->getElementType(i), GEP,		RewriteVal = new LoadInst(StructTy->getElementType(aggIdx), GEP,
"loadgep_" + inputs[i]->getName(), TI);		"loadgep_" + inputs[i]->getName(), TI);
		++aggIdx;
} else		} else
RewriteVal = &*AI++;		RewriteVal = &*ScalarAI++;

std::vector<User *> Users(inputs[i]->user_begin(), inputs[i]->user_end());		std::vector<User *> Users(inputs[i]->user_begin(), inputs[i]->user_end());
for (User *use : Users)		for (User *use : Users)
if (Instruction *inst = dyn_cast<Instruction>(use))		if (Instruction *inst = dyn_cast<Instruction>(use))
if (Blocks.count(inst->getParent()))		if (Blocks.count(inst->getParent()))
inst->replaceUsesOfWith(inputs[i], RewriteVal);		inst->replaceUsesOfWith(inputs[i], RewriteVal);
}		}

// Set names for input and output arguments.		// Set names for input and output arguments.
if (!AggregateArgs) {		if (NumScalarParams) {
AI = newFunction->arg_begin();		ScalarAI = newFunction->arg_begin();
for (unsigned i = 0, e = inputs.size(); i != e; ++i, ++AI)		for (unsigned i = 0, e = inputs.size(); i != e; ++i, ++ScalarAI)
AI->setName(inputs[i]->getName());		if (!StructValues.contains(inputs[i]))
for (unsigned i = 0, e = outputs.size(); i != e; ++i, ++AI)		ScalarAI->setName(inputs[i]->getName());
AI->setName(outputs[i]->getName()+".out");		for (unsigned i = 0, e = outputs.size(); i != e; ++i, ++ScalarAI)
		if (!StructValues.contains(outputs[i]))
		ScalarAI->setName(outputs[i]->getName() + ".out");
}		}

// Rewrite branches to basic blocks outside of the loop to new dummy blocks		// Rewrite branches to basic blocks outside of the loop to new dummy blocks
// within the new function. This must be done before we lose track of which		// within the new function. This must be done before we lose track of which
// blocks were originally in the code region.		// blocks were originally in the code region.
std::vector<User *> Users(header->user_begin(), header->user_end());		std::vector<User *> Users(header->user_begin(), header->user_end());
for (auto &U : Users)		for (auto &U : Users)
// The BasicBlock which contains the branch is not in the region		// The BasicBlock which contains the branch is not in the region
[91 lines not shown]
/// the call instruction, splitting any PHI nodes in the header block as		/// the call instruction, splitting any PHI nodes in the header block as
/// necessary.		/// necessary.
CallInst *CodeExtractor::emitCallAndSwitchStatement(Function *newFunction,		CallInst *CodeExtractor::emitCallAndSwitchStatement(Function *newFunction,
BasicBlock *codeReplacer,		BasicBlock *codeReplacer,
ValueSet &inputs,		ValueSet &inputs,
ValueSet &outputs) {		ValueSet &outputs) {
// Emit a call to the new function, passing in: *pointer to struct (if		// Emit a call to the new function, passing in: *pointer to struct (if
// aggregating parameters), or plan inputs and allocated memory for outputs		// aggregating parameters), or plan inputs and allocated memory for outputs
std::vector<Value *> params, StructValues, ReloadOutputs, Reloads;		std::vector<Value *> params, ReloadOutputs, Reloads;
		ValueSet StructValues;

Module *M = newFunction->getParent();		Module *M = newFunction->getParent();
LLVMContext &Context = M->getContext();		LLVMContext &Context = M->getContext();
const DataLayout &DL = M->getDataLayout();		const DataLayout &DL = M->getDataLayout();
CallInst *call = nullptr;		CallInst *call = nullptr;

// Add inputs as params, or to be filled into the struct		// Add inputs as params, or to be filled into the struct
unsigned ArgNo = 0;		unsigned ScalarInputArgNo = 0;
SmallVector<unsigned, 1> SwiftErrorArgs;		SmallVector<unsigned, 1> SwiftErrorArgs;
for (Value *input : inputs) {		for (Value *input : inputs) {
if (AggregateArgs)		if (AggregateArgs && !ExcludeArgsFromAggregate.contains(input))
StructValues.push_back(input);		StructValues.insert(input);
else {		else {
params.push_back(input);		params.push_back(input);
if (input->isSwiftError())		if (input->isSwiftError())
SwiftErrorArgs.push_back(ArgNo);		SwiftErrorArgs.push_back(ScalarInputArgNo);
}		}
++ArgNo;		++ScalarInputArgNo;
}		}

// Create allocas for the outputs		// Create allocas for the outputs
		unsigned ScalarOutputArgNo = 0;
for (Value *output : outputs) {		for (Value *output : outputs) {
if (AggregateArgs) {		if (AggregateArgs && !ExcludeArgsFromAggregate.contains(output)) {
StructValues.push_back(output);		StructValues.insert(output);
} else {		} else {
AllocaInst *alloca =		AllocaInst *alloca =
new AllocaInst(output->getType(), DL.getAllocaAddrSpace(),		new AllocaInst(output->getType(), DL.getAllocaAddrSpace(),
nullptr, output->getName() + ".loc",		nullptr, output->getName() + ".loc",
&codeReplacer->getParent()->front().front());		&codeReplacer->getParent()->front().front());
ReloadOutputs.push_back(alloca);		ReloadOutputs.push_back(alloca);
params.push_back(alloca);		params.push_back(alloca);
		++ScalarOutputArgNo;
}		}
}		}

StructType *StructArgTy = nullptr;		StructType *StructArgTy = nullptr;
AllocaInst *Struct = nullptr;		AllocaInst *Struct = nullptr;
if (AggregateArgs && (inputs.size() + outputs.size() > 0)) {		unsigned NumAggregatedInputs = 0;
		if (AggregateArgs && !StructValues.empty()) {
std::vector<Type *> ArgTypes;		std::vector<Type *> ArgTypes;
for (Value *V : StructValues)		for (Value *V : StructValues)
ArgTypes.push_back(V->getType());		ArgTypes.push_back(V->getType());

// Allocate a struct at the beginning of this function		// Allocate a struct at the beginning of this function
StructArgTy = StructType::get(newFunction->getContext(), ArgTypes);		StructArgTy = StructType::get(newFunction->getContext(), ArgTypes);
Struct = new AllocaInst(StructArgTy, DL.getAllocaAddrSpace(), nullptr,		Struct = new AllocaInst(StructArgTy, DL.getAllocaAddrSpace(), nullptr,
"structArg",		"structArg",
&codeReplacer->getParent()->front().front());		&codeReplacer->getParent()->front().front());
params.push_back(Struct);		params.push_back(Struct);

for (unsigned i = 0, e = inputs.size(); i != e; ++i) {		// Store aggregated inputs in the struct.
		for (unsigned i = 0, e = StructValues.size(); i != e; ++i) {
		if (inputs.contains(StructValues[i])) {
Value *Idx[2];		Value *Idx[2];
Idx[0] = Constant::getNullValue(Type::getInt32Ty(Context));		Idx[0] = Constant::getNullValue(Type::getInt32Ty(Context));
Idx[1] = ConstantInt::get(Type::getInt32Ty(Context), i);		Idx[1] = ConstantInt::get(Type::getInt32Ty(Context), i);
GetElementPtrInst *GEP = GetElementPtrInst::Create(		GetElementPtrInst *GEP = GetElementPtrInst::Create(
StructArgTy, Struct, Idx, "gep_" + StructValues[i]->getName());		StructArgTy, Struct, Idx, "gep_" + StructValues[i]->getName());
codeReplacer->getInstList().push_back(GEP);		codeReplacer->getInstList().push_back(GEP);
new StoreInst(StructValues[i], GEP, codeReplacer);		new StoreInst(StructValues[i], GEP, codeReplacer);
		NumAggregatedInputs++;
		}
}		}
}		}

// Emit the call to the function		// Emit the call to the function
call = CallInst::Create(newFunction, params,		call = CallInst::Create(newFunction, params,
NumExitBlocks > 1 ? "targetBlock" : "");		NumExitBlocks > 1 ? "targetBlock" : "");
// Add debug location to the new call, if the original function has debug		// Add debug location to the new call, if the original function has debug
// info. In that case, the terminator of the entry block of the extracted		// info. In that case, the terminator of the entry block of the extracted
// function contains the first debug location of the extracted function,		// function contains the first debug location of the extracted function,
// set in extractCodeRegion.		// set in extractCodeRegion.
if (codeReplacer->getParent()->getSubprogram()) {		if (codeReplacer->getParent()->getSubprogram()) {
if (auto DL = newFunction->getEntryBlock().getTerminator()->getDebugLoc())		if (auto DL = newFunction->getEntryBlock().getTerminator()->getDebugLoc())
call->setDebugLoc(DL);		call->setDebugLoc(DL);
}		}
codeReplacer->getInstList().push_back(call);		codeReplacer->getInstList().push_back(call);

// Set swifterror parameter attributes.		// Set swifterror parameter attributes.
for (unsigned SwiftErrArgNo : SwiftErrorArgs) {		for (unsigned SwiftErrArgNo : SwiftErrorArgs) {
call->addParamAttr(SwiftErrArgNo, Attribute::SwiftError);		call->addParamAttr(SwiftErrArgNo, Attribute::SwiftError);
newFunction->addParamAttr(SwiftErrArgNo, Attribute::SwiftError);		newFunction->addParamAttr(SwiftErrArgNo, Attribute::SwiftError);
}		}

Function::arg_iterator OutputArgBegin = newFunction->arg_begin();		// Reload the outputs passed in by reference, use the struct if output is in
unsigned FirstOut = inputs.size();		// the aggregate or reload from the scalar argument.
if (!AggregateArgs)		for (unsigned i = 0, e = outputs.size(), scalarIdx = 0,
std::advance(OutputArgBegin, inputs.size());		aggIdx = NumAggregatedInputs;
		i != e; ++i) {
// Reload the outputs passed in by reference.
for (unsigned i = 0, e = outputs.size(); i != e; ++i) {
Value *Output = nullptr;		Value *Output = nullptr;
if (AggregateArgs) {		if (AggregateArgs && StructValues.contains(outputs[i])) {
Value *Idx[2];		Value *Idx[2];
Idx[0] = Constant::getNullValue(Type::getInt32Ty(Context));		Idx[0] = Constant::getNullValue(Type::getInt32Ty(Context));
Idx[1] = ConstantInt::get(Type::getInt32Ty(Context), FirstOut + i);		Idx[1] = ConstantInt::get(Type::getInt32Ty(Context), aggIdx);
GetElementPtrInst *GEP = GetElementPtrInst::Create(		GetElementPtrInst *GEP = GetElementPtrInst::Create(
StructArgTy, Struct, Idx, "gep_reload_" + outputs[i]->getName());		StructArgTy, Struct, Idx, "gep_reload_" + outputs[i]->getName());
codeReplacer->getInstList().push_back(GEP);		codeReplacer->getInstList().push_back(GEP);
Output = GEP;		Output = GEP;
		++aggIdx;
} else {		} else {
Output = ReloadOutputs[i];		Output = ReloadOutputs[scalarIdx];
		++scalarIdx;
}		}
LoadInst *load = new LoadInst(outputs[i]->getType(), Output,		LoadInst *load = new LoadInst(outputs[i]->getType(), Output,
outputs[i]->getName() + ".reload",		outputs[i]->getName() + ".reload",
codeReplacer);		codeReplacer);
Reloads.push_back(load);		Reloads.push_back(load);
std::vector<User *> Users(outputs[i]->user_begin(), outputs[i]->user_end());		std::vector<User *> Users(outputs[i]->user_begin(), outputs[i]->user_end());
for (unsigned u = 0, e = Users.size(); u != e; ++u) {		for (unsigned u = 0, e = Users.size(); u != e; ++u) {
Instruction *inst = cast<Instruction>(Users[u]);		Instruction *inst = cast<Instruction>(Users[u]);
[65 lines elided; enclosing context: for (unsigned i = 0, e = TI->getNumSuccessors(); i != e; ++i) {]
// rewrite the original branch instruction with this new target		// rewrite the original branch instruction with this new target
TI->setSuccessor(i, NewTarget);		TI->setSuccessor(i, NewTarget);
}		}
}		}

// Store the arguments right after the definition of output value.		// Store the arguments right after the definition of output value.
// This should be proceeded after creating exit stubs to be ensure that invoke		// This should be proceeded after creating exit stubs to be ensure that invoke
// result restore will be placed in the outlined function.		// result restore will be placed in the outlined function.
Function::arg_iterator OAI = OutputArgBegin;		Function::arg_iterator ScalarOutputArgBegin = newFunction->arg_begin();
for (unsigned i = 0, e = outputs.size(); i != e; ++i) {		std::advance(ScalarOutputArgBegin, ScalarInputArgNo);
		Function::arg_iterator AggOutputArgBegin = newFunction->arg_begin();
		std::advance(AggOutputArgBegin, ScalarInputArgNo + ScalarOutputArgNo);

		for (unsigned i = 0, e = outputs.size(), aggIdx = NumAggregatedInputs; i != e;
		++i) {
auto *OutI = dyn_cast<Instruction>(outputs[i]);		auto *OutI = dyn_cast<Instruction>(outputs[i]);
if (!OutI)		if (!OutI)
continue;		continue;

// Find proper insertion point.		// Find proper insertion point.
BasicBlock::iterator InsertPt;		BasicBlock::iterator InsertPt;
// In case OutI is an invoke, we insert the store at the beginning in the		// In case OutI is an invoke, we insert the store at the beginning in the
// 'normal destination' BB. Otherwise we insert the store right after OutI.		// 'normal destination' BB. Otherwise we insert the store right after OutI.
if (auto *InvokeI = dyn_cast<InvokeInst>(OutI))		if (auto *InvokeI = dyn_cast<InvokeInst>(OutI))
InsertPt = InvokeI->getNormalDest()->getFirstInsertionPt();		InsertPt = InvokeI->getNormalDest()->getFirstInsertionPt();
else if (auto *Phi = dyn_cast<PHINode>(OutI))		else if (auto *Phi = dyn_cast<PHINode>(OutI))
InsertPt = Phi->getParent()->getFirstInsertionPt();		InsertPt = Phi->getParent()->getFirstInsertionPt();
else		else
InsertPt = std::next(OutI->getIterator());		InsertPt = std::next(OutI->getIterator());

Instruction *InsertBefore = &*InsertPt;		Instruction *InsertBefore = &*InsertPt;
assert((InsertBefore->getFunction() == newFunction ||		assert((InsertBefore->getFunction() == newFunction ||
Blocks.count(InsertBefore->getParent())) &&		Blocks.count(InsertBefore->getParent())) &&
"InsertPt should be in new function");		"InsertPt should be in new function");
assert(OAI != newFunction->arg_end() &&		if (AggregateArgs && StructValues.contains(outputs[i])) {
"Number of output arguments should match "		assert(AggOutputArgBegin != newFunction->arg_end() &&
"the amount of defined values");		"Number of aggregate output arguments should match "
if (AggregateArgs) {		"the number of defined values");
Value *Idx[2];		Value *Idx[2];
Idx[0] = Constant::getNullValue(Type::getInt32Ty(Context));		Idx[0] = Constant::getNullValue(Type::getInt32Ty(Context));
Idx[1] = ConstantInt::get(Type::getInt32Ty(Context), FirstOut + i);		Idx[1] = ConstantInt::get(Type::getInt32Ty(Context), aggIdx);
GetElementPtrInst *GEP = GetElementPtrInst::Create(		GetElementPtrInst *GEP = GetElementPtrInst::Create(
StructArgTy, &*OAI, Idx, "gep_" + outputs[i]->getName(),		StructArgTy, &*AggOutputArgBegin, Idx, "gep_" + outputs[i]->getName(),
InsertBefore);		InsertBefore);
new StoreInst(outputs[i], GEP, InsertBefore);		new StoreInst(outputs[i], GEP, InsertBefore);
		++aggIdx;
// Since there should be only one struct argument aggregating		// Since there should be only one struct argument aggregating
// all the output values, we shouldn't increment OAI, which always		// all the output values, we shouldn't increment AggOutputArgBegin, which
// points to the struct argument, in this case.		// always points to the struct argument, in this case.
} else {		} else {
new StoreInst(outputs[i], &*OAI, InsertBefore);		assert(ScalarOutputArgBegin != newFunction->arg_end() &&
++OAI;		"Number of scalar output arguments should match "
		"the number of defined values");
		new StoreInst(outputs[i], &*ScalarOutputArgBegin, InsertBefore);
		++ScalarOutputArgBegin;
}		}
}		}

// Now that we've done the deed, simplify the switch instruction.		// Now that we've done the deed, simplify the switch instruction.
Type *OldFnRetTy = TheSwitch->getParent()->getParent()->getReturnType();		Type *OldFnRetTy = TheSwitch->getParent()->getParent()->getReturnType();
switch (NumExitBlocks) {		switch (NumExitBlocks) {
case 0:		case 0:
// There are no successors (the block containing the switch itself), which		// There are no successors (the block containing the switch itself), which
[482 lines elided; enclosing context: for (auto AffectedValVH : AC->assumptionsFor(I->getOperand(0))) {]
return true;		return true;
auto *AssumedInst = cast<Instruction>(AffectedCI->getOperand(0));		auto *AssumedInst = cast<Instruction>(AffectedCI->getOperand(0));
if (AssumedInst->getFunction() != &OldFunc)		if (AssumedInst->getFunction() != &OldFunc)
return true;		return true;
}		}
}		}
return false;		return false;
}		}

		void CodeExtractor::excludeArgFromAggregate(Value *Arg) {
		ExcludeArgsFromAggregate.insert(Arg);
		}

llvm/unittests/Transforms/Utils/CodeExtractorTest.cpp

[182 lines elided; enclosing context: TEST(CodeExtractor, ExitBlockOrderingPhis) {]
ConstantInt *CIFirst = dyn_cast<ConstantInt>(FirstReturn->getReturnValue());		ConstantInt *CIFirst = dyn_cast<ConstantInt>(FirstReturn->getReturnValue());
EXPECT_TRUE(CIFirst->getLimitedValue() == 1u);		EXPECT_TRUE(CIFirst->getLimitedValue() == 1u);

Instruction *NextTerm = NextExitStub->getTerminator();		Instruction *NextTerm = NextExitStub->getTerminator();
ReturnInst *NextReturn = dyn_cast<ReturnInst>(NextTerm);		ReturnInst *NextReturn = dyn_cast<ReturnInst>(NextTerm);
EXPECT_TRUE(NextReturn);		EXPECT_TRUE(NextReturn);
ConstantInt *CINext = dyn_cast<ConstantInt>(NextReturn->getReturnValue());		ConstantInt *CINext = dyn_cast<ConstantInt>(NextReturn->getReturnValue());
EXPECT_TRUE(CINext->getLimitedValue() == 0u);		EXPECT_TRUE(CINext->getLimitedValue() == 0u);

EXPECT_FALSE(verifyFunction(*Outlined));		EXPECT_FALSE(verifyFunction(*Outlined));
EXPECT_FALSE(verifyFunction(*Func));		EXPECT_FALSE(verifyFunction(*Func));
}		}

TEST(CodeExtractor, ExitBlockOrdering) {		TEST(CodeExtractor, ExitBlockOrdering) {
LLVMContext Ctx;		LLVMContext Ctx;
SMDiagnostic Err;		SMDiagnostic Err;
std::unique_ptr<Module> M(parseAssemblyString(R"invalid(		std::unique_ptr<Module> M(parseAssemblyString(R"invalid(
[40 lines elided; enclosing context: TEST(CodeExtractor, ExitBlockOrdering) {]
ConstantInt *CIFirst = dyn_cast<ConstantInt>(FirstReturn->getReturnValue());		ConstantInt *CIFirst = dyn_cast<ConstantInt>(FirstReturn->getReturnValue());
EXPECT_TRUE(CIFirst->getLimitedValue() == 1u);		EXPECT_TRUE(CIFirst->getLimitedValue() == 1u);

Instruction *NextTerm = NextExitStub->getTerminator();		Instruction *NextTerm = NextExitStub->getTerminator();
ReturnInst *NextReturn = dyn_cast<ReturnInst>(NextTerm);		ReturnInst *NextReturn = dyn_cast<ReturnInst>(NextTerm);
EXPECT_TRUE(NextReturn);		EXPECT_TRUE(NextReturn);
ConstantInt *CINext = dyn_cast<ConstantInt>(NextReturn->getReturnValue());		ConstantInt *CINext = dyn_cast<ConstantInt>(NextReturn->getReturnValue());
EXPECT_TRUE(CINext->getLimitedValue() == 0u);		EXPECT_TRUE(CINext->getLimitedValue() == 0u);

EXPECT_FALSE(verifyFunction(*Outlined));		EXPECT_FALSE(verifyFunction(*Outlined));
EXPECT_FALSE(verifyFunction(*Func));		EXPECT_FALSE(verifyFunction(*Func));
}		}

TEST(CodeExtractor, ExitPHIOnePredFromRegion) {		TEST(CodeExtractor, ExitPHIOnePredFromRegion) {
LLVMContext Ctx;		LLVMContext Ctx;
SMDiagnostic Err;		SMDiagnostic Err;
std::unique_ptr<Module> M(parseAssemblyString(R"invalid(		std::unique_ptr<Module> M(parseAssemblyString(R"invalid(
[242 lines elided; enclosing context: TEST(CodeExtractor, RemoveBitcastUsesFromOuterLifetimeMarkers) {]
CE.findInputsOutputs(Inputs, Outputs, SinkingCands);		CE.findInputsOutputs(Inputs, Outputs, SinkingCands);
EXPECT_EQ(Outputs.size(), 0U);		EXPECT_EQ(Outputs.size(), 0U);

Function *Outlined = CE.extractCodeRegion(CEAC);		Function *Outlined = CE.extractCodeRegion(CEAC);
EXPECT_TRUE(Outlined);		EXPECT_TRUE(Outlined);
EXPECT_FALSE(verifyFunction(*Outlined));		EXPECT_FALSE(verifyFunction(*Outlined));
EXPECT_FALSE(verifyFunction(*Func));		EXPECT_FALSE(verifyFunction(*Func));
}		}

		TEST(CodeExtractor, PartialAggregateArgs) {
		LLVMContext Ctx;
		SMDiagnostic Err;
		std::unique_ptr<Module> M(parseAssemblyString(R"ir(
		target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
		target triple = "x86_64-unknown-linux-gnu"

		declare void @use(i32)

		define void @foo(i32 %a, i32 %b, i32 %c) {
		entry:
		br label %extract

		extract:
		call void @use(i32 %a)
		call void @use(i32 %b)
		call void @use(i32 %c)
		br label %exit

		exit:
		ret void
		}
		)ir",
		Err, Ctx));

		Function *Func = M->getFunction("foo");
		SmallVector<BasicBlock *, 1> Blocks{getBlockByName(Func, "extract")};

		// Create the CodeExtractor with arguments aggregation enabled.
		CodeExtractor CE(Blocks, /* DominatorTree */ nullptr,
		/* AggregateArgs */ true);
		EXPECT_TRUE(CE.isEligible());

		CodeExtractorAnalysisCache CEAC(*Func);
		SetVector<Value *> Inputs, Outputs, SinkingCands, HoistingCands;
		BasicBlock *CommonExit = nullptr;
		CE.findAllocas(CEAC, SinkingCands, HoistingCands, CommonExit);
		CE.findInputsOutputs(Inputs, Outputs, SinkingCands);
		// Exclude the first input from the argument aggregate.
		CE.excludeArgFromAggregate(Inputs[0]);

		Function *Outlined = CE.extractCodeRegion(CEAC, Inputs, Outputs);
		EXPECT_TRUE(Outlined);
		// Expect 2 arguments in the outlined function: the excluded input and the
		// struct aggregate for the remaining inputs.
		EXPECT_EQ(Outlined->arg_size(), 2U);
		EXPECT_FALSE(verifyFunction(*Outlined));
		EXPECT_FALSE(verifyFunction(*Func));
		}
} // end anonymous namespace		} // end anonymous namespace
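
As a rough source-level analogy for the shape the PartialAggregateArgs test expects, the sketch below uses plain C++ rather than LLVM IR. The names `Payload`, `outlinedRegion`, and `foo` are hypothetical stand-ins, and the returned vector merely records the values that the IR passes to `@use`. It assumes the excluded input is `a` and that `b` and `c` are packed into the struct.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the struct aggregate holding the remaining inputs.
struct Payload {
  int b;
  int c;
};

// Stand-in for the outlined function: one scalar argument plus one struct
// pointer, i.e. two arguments in total.
std::vector<int> outlinedRegion(int a, const Payload *structArg) {
  // Each element corresponds to one `call void @use(i32 ...)` in the test IR.
  return {a, structArg->b, structArg->c};
}

// Stand-in for the call site after extraction: pack b and c, pass a directly.
std::vector<int> foo(int a, int b, int c) {
  Payload structArg{b, c};
  return outlinedRegion(a, &structArg);
}
```

In the OMPIRBuilder use case from the summary, the scalar slot would carry identifiers such as the thread and bound ids, while the struct carries the region's payload variables.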

Diff 403100