This is an archive of the discontinued LLVM Phabricator instance.


Differential D12269

Add a pass to lift aggregate into allocas so SROA can get rid of them.
AbandonedPublic

Authored by deadalnix on Aug 23 2015, 12:47 AM.


Details

Reviewers

reames
majnemer
mehdi_amini
hfinkel

Summary

As per title. After various discussions, it seems that this may be a better approach than transforming them into scalars. The pass itself does not do very much to optimize, but it transforms the IR in a way that subsequent passes, mainly SROA, can consume and optimize.

Diff Detail

Event Timeline

deadalnix updated this revision to Diff 32921.Aug 23 2015, 12:47 AM

deadalnix retitled this revision from to Add a pass to lift aggregate into allocas so SROA can get rid of them..

deadalnix updated this object.

deadalnix added reviewers: hfinkel, majnemer, reames, mehdi_amini.

deadalnix added a subscriber: llvm-commits.

Hi Amaury,

Overall that seems pretty good, I'm glad you found a way to leverage SROA.

I feel the implementation could benefit from more comments/description. You'll find some comments inline.

Best,

Mehdi

lib/Transforms/IPO/PassManagerBuilder.cpp
211	I think this should probably go just before SROA, either at the three places it is inserted in this file, or only before the one in the FPM. I tend to think the latter is more appropriate, unless there is room for the inliner to expose more cases than can be handled. What is the impact of this transformation on the code generated by clang? Should it be enabled by a flag only?
lib/Transforms/Scalar/AggregateLifter.cpp
11	s/ans/and/
12	s/rewrting/rewriting/
78	It seems this is needed because `FixupWorklist` is a map. But it is not clear to me why are you needing the map behavior?
92	for-range?
112	`getTypeStoreSize()` is fine for the `memcpy` I think. But since you get it just before creating the alloca, the name `Size` for the variable can be ambiguous. What about either moving it closer to its use, or renaming it something like `StoreSize`.
133	What about refactoring with an `AggregateLifterPass` that creates a transient object `AggregateLifter` which is not a pass, takes a reference to a Function and keeps it as a member, and initializes a DataLayout reference member? (And make `liftStore` a private member.) This pattern avoids having to dereference multiple pointers all over the place to get back to the parent Function and the DataLayout. Also, having less state in the Pass itself seems cleaner to me in general. I see the pass as a wrapper for the transformation only (single responsibility). I also believe it makes it easier to reuse, for instance in the new PassManager interface.
137	Smaller is not a correctness issue right? I think larger might be more problematic. (Also at this point you might be storing somewhere else than at the beginning of the alloca, processing a GEP)
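
The refactoring suggested at line 133 could look roughly like the sketch below. It is written without LLVM headers so it stands alone: `Function`, `DataLayout`, `Lifter`, and `LifterPass` are simplified stand-ins invented for illustration, not the LLVM classes, and `liftLoad` only pretends to do the work.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-ins for illustration only; these are NOT the LLVM classes.
struct DataLayout { unsigned PointerSize = 8; };
struct Function {
  std::string Name;
  DataLayout DL;
  std::vector<std::string> AggregateLoads; // pretend list of candidates
};

// Transient per-function state: built, run once, then discarded.
class Lifter {
  Function &F;
  const DataLayout &DL; // resolved once instead of at every use

public:
  explicit Lifter(Function &F) : F(F), DL(F.DL) {}

  // Returns true if anything changed.
  bool run() {
    bool Changed = false;
    for (const std::string &Load : F.AggregateLoads)
      Changed |= liftLoad(Load);
    F.AggregateLoads.clear();
    return Changed;
  }

private:
  bool liftLoad(const std::string &Load) {
    // A real implementation would emit an alloca and a memcpy here.
    return !Load.empty() && DL.PointerSize != 0;
  }
};

// The pass is only a thin wrapper and keeps no per-function state.
struct LifterPass {
  bool runOnFunction(Function &F) { return Lifter(F).run(); }
};
```

With this split, the helper methods no longer need to walk from an instruction back to its parent function to find the data layout, and the same `Lifter` could be driven from a different pass manager wrapper.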

deadalnix added inline comments.Aug 23 2015, 10:37 PM

lib/Transforms/IPO/PassManagerBuilder.cpp
211	Yeah, I inserted it pretty randomly. clang should not be affected by this, as it doesn't generate aggregate loads/stores.
lib/Transforms/Scalar/AggregateLifter.cpp
78	In this diff, I don't really need it. But you only have the "top-down" part of the problem here: you get a load and lift from there. When you get a store of a value that does not come from a load to begin with, you need to go bottom-up, i.e. you lift starting from the store. While going bottom-up, you need to check whether a plan already exists for the other aggregates you encounter, and this is where the map semantics are needed.
92	Is there a simple way to go backward using a range? Constructs à la range(rbegin, rend) are not very handy.
133	Sounds good.
137	Yeah, the comment is wrong. Also, I'm not sure what happens in that case, which is exactly why the comment is there. Thinking more about it, there may be a correctness concern here; I have to think more about this.
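
On the question at line 92 about walking backward with a range-based for loop: a small adaptor is enough. The sketch below uses only the standard library; `ReverseRange`, `reversed`, and `visitBackward` are hypothetical names chosen for illustration. Recent LLVM versions ship a similar helper, `llvm::reverse`, in `llvm/ADT/STLExtras.h`.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper for illustration: wraps a container so that a
// range-based for loop walks it from back to front.
template <typename Container> class ReverseRange {
  Container &C;

public:
  explicit ReverseRange(Container &C) : C(C) {}
  decltype(std::declval<Container &>().rbegin()) begin() const {
    return C.rbegin();
  }
  decltype(std::declval<Container &>().rend()) end() const {
    return C.rend();
  }
};

template <typename Container>
ReverseRange<Container> reversed(Container &C) {
  return ReverseRange<Container>(C);
}

// Records the visiting order to show that iteration runs last to first.
std::vector<int> visitBackward(std::vector<int> &Items) {
  std::vector<int> Order;
  for (int Item : reversed(Items))
    Order.push_back(Item);
  return Order;
}
```

The same shape would let the erase loop over `InstrsToErase` be written as a range-based loop instead of relying on unsigned wrap-around in the index.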

majnemer added inline comments.Aug 23 2015, 10:51 PM

lib/Transforms/Scalar/AggregateLifter.cpp
133	I imagine we'd want to run the utility from within CGP so that frontends can rely on CodeGen succeeding in -O0 situations.

deadalnix added inline comments.Aug 25 2015, 1:01 AM

lib/Transforms/Scalar/AggregateLifter.cpp
133	I'm not sure. This pass most likely makes things worse by itself. Without getting other passes to optimize the result, I'm not sure this is worth doing. Maybe it could be integrated into SROA itself.

deadalnix added a reviewer: chandlerc.Aug 25 2015, 11:36 PM

Various nits.
Use FunctionLifter as a per function state.

Is the approach agreed upon? If so, what is the best place to put this?
@chandlerc You seem to me to be the SROA guy; do you think this could be integrated into SROA? If yes, what is the best way to proceed? If no, what is a good place to put this?

I'm really not sure this is the right approach, despite liking a number of
aspects and being optimistic early-on. But I've not yet had time to put my
thoughts here into a nice cohesive email. I'll do that as soon as I can to
help resolve this.


ping ?

Rebase. @chandlerc, could you elaborate on your concerns? Without any feedback from you, this is stuck in limbo.

Rebase, ping ping ping

Chandler, please do comment on the approach. We should get this settled.

lib/Transforms/Scalar/AggregateLifter.cpp
14	You should provide some small examples here of what this actually means.
118	Don't need { } if there is only one conditional statement. Same comment applies to several places below.
134	Looks like we're dropping 'volatile' and any associated metadata here when we convert these loads.

ping

deadalnix added inline comments.Sep 26 2015, 12:51 AM

lib/Transforms/Scalar/AggregateLifter.cpp
134	Aren't volatile accesses considered not simple? In which case, the transformation shouldn't happen on them.
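
For reference, `isSimple()` on an LLVM load or store is true only when the access is neither atomic nor volatile, so the `isSimple()` guard in `runOnLoad` and `liftStore` does skip volatile accesses. The sketch below models that with a stand-in `Load` struct (not `llvm::LoadInst`); `isCandidate` is a hypothetical name. It does not address the other half of the remark, which is that metadata attached to the original load is not carried over to the memcpy.

```cpp
#include <cassert>

// Stand-in for illustration; not llvm::LoadInst.
enum class Ordering { NotAtomic, Monotonic, SequentiallyConsistent };

struct Load {
  bool Volatile;
  Ordering Order;

  bool isVolatile() const { return Volatile; }
  bool isAtomic() const { return Order != Ordering::NotAtomic; }
  // Mirrors the meaning of isSimple(): a plain, non-volatile access.
  bool isSimple() const { return !isAtomic() && !isVolatile(); }
};

// Same guard as in runOnLoad(): only simple loads are candidates.
bool isCandidate(const Load &L) { return L.isSimple(); }
```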

I indicated previously that I wasn't sure this is the correct approach. I wanted to clarify that.

The problem I see with the approach of lowering things as loads and stores of (perhaps excessively large) integers is that I think it creates a bad canonicalization problem. Let's consider what happens when we lower as i128 loads and stores of an aggregate. Later on, we may inline and end up accessing the low and high 64 bits as f64 types. But in order to do that, we would have to use shift and trunc instructions on the integer values. As a consequence of these shifts and truncs, we would have integer operations on the bits and avoid properly canonicalizing loads and stores or the basic domain used. =/ I'm moderately worried about this.

The crux of the problem is that we would introduce *strongly typed* operations like 'shl', 'lshr' to extract the narrow regions from the former aggregate. The extractvalue (and insertvalue) instructions quite usefully can attach the type of the extracted or inserted value rather than any particular integer type. As a consequence, I think that we should model this by extracting each value from the FCA, and storing it separately.
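
The concern can be illustrated at a smaller scale in C++. This is a sketch under assumptions, not anything from the patch: a pair of 32-bit floats packed into one 64-bit integer stands in for the i128/f64 case, and all names (`Pair`, `packAsInteger`, `highViaShift`, `highViaField`) are hypothetical. Once the aggregate travels as an integer, getting a field back takes an integer shift, a truncation, and a bitcast, whereas the field access keeps the element type throughout.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "sketch assumes 32-bit float");

struct Pair { float Lo; float Hi; };

// Reinterpret the bits of a float as an integer (a bitcast).
std::uint32_t bitsOf(float F) {
  std::uint32_t B;
  std::memcpy(&B, &F, sizeof(B));
  return B;
}

// Integer view: the whole aggregate travels as one 64-bit integer.
std::uint64_t packAsInteger(const Pair &P) {
  return (static_cast<std::uint64_t>(bitsOf(P.Hi)) << 32) | bitsOf(P.Lo);
}

// Recovering the second field needs integer operations on the bits.
float highViaShift(std::uint64_t Bits) {
  std::uint32_t HiBits = static_cast<std::uint32_t>(Bits >> 32); // lshr + trunc
  float F;
  std::memcpy(&F, &HiBits, sizeof(F)); // bitcast i32 -> float
  return F;
}

// Typed access, analogous to extractvalue: no integer operations involved.
float highViaField(const Pair &P) { return P.Hi; }
```

Both paths return the same value, but only the second one tells later passes that the data is floating point from start to finish.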

For function arguments, this logic doesn't really seem to apply. There, I feel like frontends should be able to fully decompose things or fuse things into proper primitive types.

The only remaining reason why aggregates really exist are to support return types. I actually think that's OK, they're not a terrible syntax and mechanism for modeling multiple return types. I would particularly like it if the only thing you could reasonably do is extractvalue the independent values. This would make aggregates just the LLVM mechanism for doing multiple return types with proper SSA form. I think that's fine. Maybe some day we would even want to replace it with TokenType, for now aggregates provide a nice layer of type checking and sanity checking for returns.

Now, some may wonder what about my vehement arguments about bitfields using large integer types for loads and stores and the memory model implications. Fundamentally, I think these are different concerns. There are two core differences. First, bitfields are *defined* as a single memory location by the frontend language, and so it is reasonable for us to go to some lengths to operate on them as such. I don't think that aggregates are actually so defined in any source languages I'm aware of (C, C++, ObjC, Go, Swift, CUDA, to name a few). But as an example, if a frontend wants to *specifically define* a load or store of an aggregate as an atomic load or store of N bits of memory, I'd rather the frontend choose to express that in an N-bit integer operation with the IR rather than a transform pass doing it.

The second, and perhaps overriding reason why I think all of the arguments line up a different way for bitfields is that bitfields quite fundamentally *are* integers. We lose nothing by modeling them as such and create no canonicalization problems.

Does this make sense to folks?

(Again, sorry for the delay writing this up. A decent chunk of it was taking quite a few hours to carefully think through all the consequences and convince myself that they were OK.)

-Chandler

In D12269#256309, @chandlerc wrote:

I indicated previously that I wasn't sure this is the correct approach. I wanted to clarify that.

The problem I see with the approach of lowering things as loads and stores of (perhaps excessively large) integers is that I think it creates a bad canonicalization problem. Let's consider what happens when we lower as i128 loads and stores of an aggregate. Later on, we may inline and end up accessing the low and high 64 bits as f64 types. But in order to do that, we would have to use shift and trunc instructions on the integer values. As a consequence of these shifts and truncs, we would have integer operations on the bits and avoid properly canonicalizing loads and stores or the basic domain used. =/ I'm moderately worried about this.

You seem to be operating under the previous proposal, i.e. promoting aggregates to large integer loads/stores. This is a new proposal that transforms the load/store into a memcpy to an alloca and lets SROA do its job from there. This way of doing things would avoid the problem you mention.

The only remaining reason why aggregates really exist are to support return types. I actually think that's OK, they're not a terrible syntax and mechanism for modeling multiple return types. I would particularly like it if the only thing you could reasonably do is extractvalue the independent values. This would make aggregates just the LLVM mechanism for doing multiple return types with proper SSA form. I think that's fine. Maybe some day we would even want to replace it with TokenType, for now aggregates provide a nice layer of type checking and sanity checking for returns.

I do not think this is the only reason. Aggregates have a memory layout that depends on various target characteristics and requires some logic to compute the offset and alignment of the various fields in them. Duplicating that logic in every frontend seems like a net loss to me.
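
To give an idea of the logic a frontend would have to duplicate, here is a minimal sketch of a struct layout computation. It assumes a simple non-packed struct, takes per-field size and alignment as inputs (a real target supplies these through its data layout), and uses hypothetical names throughout (`FieldInfo`, `Layout`, `alignTo`, `computeLayout`).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one field. Size and alignment are inputs
// here; on a real target they come from the data layout.
struct FieldInfo { std::size_t Size; std::size_t Align; };

struct Layout {
  std::vector<std::size_t> Offsets; // byte offset of each field
  std::size_t Size = 0;             // total size, padded to Align
  std::size_t Align = 1;            // largest field alignment
};

// Round Value up to the next multiple of Align.
std::size_t alignTo(std::size_t Value, std::size_t Align) {
  return (Value + Align - 1) / Align * Align;
}

Layout computeLayout(const std::vector<FieldInfo> &Fields) {
  Layout L;
  std::size_t Offset = 0;
  for (const FieldInfo &F : Fields) {
    Offset = alignTo(Offset, F.Align); // insert padding before the field
    L.Offsets.push_back(Offset);
    Offset += F.Size;
    if (F.Align > L.Align)
      L.Align = F.Align;
  }
  L.Size = alignTo(Offset, L.Align); // tail padding
  return L;
}
```

The same two fields (one byte, then eight bytes) land at offsets 0 and 8 when the second field is 8-byte aligned, but at 0 and 4 when the target only requires 4-byte alignment for it, which is the target dependence referred to above.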

I'm not sure what you mean by TokenType, but in the current state of affairs, it seems like we should support aggregate operations, at least.

The second, and perhaps overriding reason why I think all of the arguments line up a different way for bitfields is that bitfields quite fundamentally *are* integers. We lose nothing by modeling them as such and create no canonicalization problems.

I do think this proposal, which loads from allocas, does not suffer from this problem.

In D12269#257155, @deadalnix wrote:

In D12269#256309, @chandlerc wrote:

I indicated previously that I wasn't sure this is the correct approach. I wanted to clarify that.

The problem I see with the approach of lowering things as loads and stores of (perhaps excessively large) integers is that I think it creates a bad canonicalization problem. Let's consider what happens when we lower as i128 loads and stores of an aggregate. Later on, we may inline and end up accessing the low and high 64 bits as f64 types. But in order to do that, we would have to use shift and trunc instructions on the integer values. As a consequence of these shifts and truncs, we would have integer operations on the bits and avoid properly canonicalizing loads and stores or the basic domain used. =/ I'm moderately worried about this.

You seem to be operating under the previous proposal, i.e. promoting aggregates to large integer loads/stores. This is a new proposal that transforms the load/store into a memcpy to an alloca and lets SROA do its job from there. This way of doing things would avoid the problem you mention.

Sorry that I replied to the prior proposal. Anyways, this is... interesting.

So this *only* handles loads and stores of globals? Are those common? They aren't the common case of aggregate loads and stores that I have seen...

The only remaining reason why aggregates really exist are to support return types. I actually think that's OK, they're not a terrible syntax and mechanism for modeling multiple return types. I would particularly like it if the only thing you could reasonably do is extractvalue the independent values. This would make aggregates just the LLVM mechanism for doing multiple return types with proper SSA form. I think that's fine. Maybe some day we would even want to replace it with TokenType, for now aggregates provide a nice layer of type checking and sanity checking for returns.

I do not think this is the only reason. Aggregates have a memory layout that depends on various target characteristics and requires some logic to compute the offset and alignment of the various fields in them. Duplicating that logic in every frontend seems like a net loss to me.

I guess we disagree then, but we don't need to debate that on this thread.

I'm not sure what you mean by TokenType, but in the current state of affairs, it seems like we should support aggregate operations, at least.

The second, and perhaps overriding reason why I think all of the arguments line up a different way for bitfields is that bitfields quite fundamentally *are* integers. We lose nothing by modeling them as such and create no canonicalization problems.

I do think this proposal, which loads from allocas, does not suffer from this problem.

Sure, this doesn't make anything worse. Again, sorry for missing the change in design, the threads have become fairly fragmented.

But I don't really understand what this is trying to improve so it is hard for me to evaluate it. It seems really silly to memcpy into allocas just so that SROA can split them and promote them again. That is a lot of compile time and effort for something that we already know how to handle.

If *all* you want to handle are loads and stores of aggregates from globals, you could take the FCA splitter that is already used in SROA and promote it to a separate pass and run it over non-alloca instructions.... This would work today AFAICT? (You would need to make it a utility as SROA would also want to run it internally.)

What makes you think it is limited to globals? It is for any load and/or store from memory in general. SROA doesn't touch these. Personally, the case I'm trying to solve is aggregate accesses to memory that is freshly allocated. Other people have voiced interest in having this kind of load/store optimized; I can't speak for their reasons, but overall, this seems to be of interest. I used globals in the test cases as it was easy, but this should not be limited to globals, or that would be fairly useless.

Right now, this kind of operation is plainly ignored by the optimizer. What I'm trying to do is transform these into something the rest of the pipeline understands and will process nicely. Lifting these values into allocas creates something that is not identical, but similar enough to what clang generates that the rest of the pipeline picks it up and optimizes it nicely.

This basically transforms :

%1 = load { i8*, i64 }, { i8*, i64 }* %ptr
%2 = extractvalue { i8*, i64 } %1, 0
%3 = extractvalue { i8*, i64 } %1, 1

Into something like

%1 = alloca { i8*, i64 }
call void @llvm.memcpy(%1, %ptr, sizeof({ i8*, i64 })) ; pseudo code, you get the idea.
%2.lifted = getelementptr inbounds { i8*, i64 }, { i8*, i64 }* %1, i32 0, i32 0
%2 = load i8*, i8** %2.lifted
%3.lifted = getelementptr inbounds { i8*, i64 }, { i8*, i64 }* %1, i32 0, i32 1
%3 = load i64, i64* %3.lifted

As a result, instead of having aggregate loads and stores, you get a memcpy and a set of non-aggregate loads and stores. The rest of the pipeline picks up on this nicely and is able to optimize this away.
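
The same rewrite can be mirrored in C++ as a rough analogy. This is a sketch for illustration only, not what the pass emits: `Slice`, `lengthViaAggregateLoad`, and `lengthViaLiftedCopy` are hypothetical names, with `Slice` playing the role of `{ i8*, i64 }`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct Slice { const char *Ptr; std::uint64_t Len; };

// "Before": load the whole aggregate, then pick a field out of the value.
std::uint64_t lengthViaAggregateLoad(const Slice *P) {
  Slice Whole = *P;  // aggregate load
  return Whole.Len;  // extractvalue
}

// "After": copy the bytes into a local slot, then load the scalar from it.
std::uint64_t lengthViaLiftedCopy(const Slice *P) {
  Slice Lifted;                               // plays the role of the alloca
  std::memcpy(&Lifted, P, sizeof(Slice));     // plays the role of llvm.memcpy
  const std::uint64_t *LenAddr = &Lifted.Len; // plays the role of the GEP
  return *LenAddr;                            // scalar load
}
```

Both functions compute the same result; the second form only contains operations on a local slot and scalar loads, which is the shape SROA is expected to handle.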

This has been made its own pass so as to have a POC up and see if it works well (it does, that is the model I got the best results with so far), but I am indeed wondering whether making SROA do it is a better idea.

OK, so it is trying to optimize everything that isn't a load or store from an alloca.

I replied to the general concept at the bottom, and I think that still applies -- I don't think it's the right design to create allocas and memcpy instructions just for SROA to clean up. We should directly optimize the aggregate loads and stores.

@chandlerc, do you think it makes sense to add this to SROA, so that we don't lift things and then let SROA optimize them, but rather optimize right away in SROA? If so, do you have some pointers on how to implement this in SROA?

deadalnix mentioned this in D13379: Add an utility to lift aggregate loads in SROA..Oct 2 2015, 1:39 AM

Doing it in SROA in D13379 . Seems like a better place than in its own pass.

deadalnix abandoned this revision.Oct 12 2015, 4:07 PM

Revision Contents

Path                                               Size
include/llvm/InitializePasses.h                    1 line
include/llvm/Transforms/Scalar.h                   6 lines
lib/Transforms/IPO/PassManagerBuilder.cpp          3 lines
lib/Transforms/Scalar/AggregateLifter.cpp          227 lines
lib/Transforms/Scalar/CMakeLists.txt               1 line
lib/Transforms/Scalar/Scalar.cpp                   1 line
test/Transforms/AggregateLifter/extractvalue.ll    42 lines
test/Transforms/AggregateLifter/store.ll           42 lines

Diff 35305

include/llvm/InitializePasses.h

	void initializePlaceBackedgeSafepointsImplPass(PassRegistry&);			void initializePlaceBackedgeSafepointsImplPass(PassRegistry&);
	void initializePlaceSafepointsPass(PassRegistry&);			void initializePlaceSafepointsPass(PassRegistry&);
	void initializeDwarfEHPreparePass(PassRegistry&);			void initializeDwarfEHPreparePass(PassRegistry&);
	void initializeFloat2IntPass(PassRegistry&);			void initializeFloat2IntPass(PassRegistry&);
	void initializeLoopDistributePass(PassRegistry&);			void initializeLoopDistributePass(PassRegistry&);
	void initializeSjLjEHPreparePass(PassRegistry&);			void initializeSjLjEHPreparePass(PassRegistry&);
	void initializeDemandedBitsPass(PassRegistry&);			void initializeDemandedBitsPass(PassRegistry&);
	void initializeFuncletLayoutPass(PassRegistry &);			void initializeFuncletLayoutPass(PassRegistry &);
				void initializeAggregateLifterPass(PassRegistry&);
	}			}

	#endif			#endif

include/llvm/Transforms/Scalar.h

	FunctionPass *createNaryReassociatePass();			FunctionPass *createNaryReassociatePass();

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// LoopDistribute - Distribute loops.			// LoopDistribute - Distribute loops.
	//			//
	FunctionPass *createLoopDistributePass();			FunctionPass *createLoopDistributePass();

				//===----------------------------------------------------------------------===//
				//
				// AggregateLifter - Convert aggregate load/store into
				// optimizable operations in alloca.
				FunctionPass *createAggregateLifterPass();

	} // End llvm namespace			} // End llvm namespace

	#endif			#endif

lib/Transforms/IPO/PassManagerBuilder.cpp

if (OptLevel == 0) {		if (OptLevel == 0) {
addExtensionsToPM(EP_EnabledOnOptLevel0, MPM);		addExtensionsToPM(EP_EnabledOnOptLevel0, MPM);
return;		return;
}		}

// Add LibraryInfo if we have some.		// Add LibraryInfo if we have some.
if (LibraryInfo)		if (LibraryInfo)
MPM.add(new TargetLibraryInfoWrapperPass(*LibraryInfo));		MPM.add(new TargetLibraryInfoWrapperPass(*LibraryInfo));

		// Lift aggregate load/store into alloca
		MPM.add(createAggregateLifterPass());

addInitialAliasAnalysisPasses(MPM);		addInitialAliasAnalysisPasses(MPM);

if (!DisableUnitAtATime) {		if (!DisableUnitAtATime) {
addExtensionsToPM(EP_ModuleOptimizerEarly, MPM);		addExtensionsToPM(EP_ModuleOptimizerEarly, MPM);

MPM.add(createIPSCCPPass()); // IP SCCP		MPM.add(createIPSCCPPass()); // IP SCCP
MPM.add(createGlobalOptimizerPass()); // Optimize out global vars		MPM.add(createGlobalOptimizerPass()); // Optimize out global vars


lib/Transforms/Scalar/AggregateLifter.cpp

This file was added.

				//===- AggregateLifter.cpp - Lift aggregate loads/stores into alloca ------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				/// \file
				/// This transform removes loads and stores of aggregate type by replacing them
				/// with a memcpy into an alloca and rewriting the IR to use it. This will allow
				/// subsequent passes, notably SROA, to optimize properly.
				///
				//===----------------------------------------------------------------------===//

				#include "llvm/Transforms/Scalar.h"

				#include "llvm/ADT/Statistic.h"
				#include "llvm/Pass.h"
				#include "llvm/ADT/DenseMap.h"
				#include "llvm/IR/Function.h"
				#include "llvm/IR/InstIterator.h"
				#include "llvm/IR/Instructions.h"
				#include "llvm/IR/IRBuilder.h"
				#include "llvm/IR/Module.h"
				#include "llvm/Support/Debug.h"
				#include "llvm/Support/raw_ostream.h"

				using namespace llvm;

				#define DEBUG_TYPE "aggregate-lifter"

				namespace {

				class AggregateLifter : public FunctionPass {

				public:
				AggregateLifter(): FunctionPass(ID) { }

				bool runOnFunction(Function &F) override;
				void getAnalysisUsage(AnalysisUsage &AU) const override;

				const char *getPassName() const override { return "AggregateLifter"; }
				static char ID;
				};

				typedef std::pair<Instruction *, Value *> FixupPair;
				typedef DenseMap<Instruction *, Value *> FixupMap;

				class FunctionLifter {
				Function &F;
				const DataLayout &DL;
				IRBuilder<> AtEntry;

				FixupMap FixupWorklist;
				SmallVector<Instruction*, 8> InstrsToErase;

				public:
				FunctionLifter(Function &F): F(F),
				DL(F.getParent()->getDataLayout()),
				AtEntry(F.getEntryBlock().begin()) {}

				bool run();

				private:
				void runOnLoad(LoadInst *LI);
				void liftAggregate(Instruction *I, Value *A);

				bool liftStore(StoreInst *SI, Value *A);
				bool liftExtractValue(ExtractValueInst *EVI, Value *A);
				};

				}

				bool AggregateLifter::runOnFunction(Function &F) {
				if (skipOptnoneFunction(F))
				return false;

				DEBUG(dbgs() << "AggregateLifter function: " << F.getName() << "\n");

				FunctionLifter L(F);
				return L.run();
				}

				bool FunctionLifter::run() {
				for (auto &I : instructions(F)) {
				LoadInst* LI = dyn_cast<LoadInst>(&I);
				if (!LI) {
				continue;
				}

				runOnLoad(LI);
				}

				while (!FixupWorklist.empty()) {
				FixupMap CurrentWorklist;
				std::swap(CurrentWorklist, FixupWorklist);

				for (auto &I : CurrentWorklist) {
				liftAggregate(I.first, I.second);
				}
				}

				if (InstrsToErase.empty()) {
				return false;
				}

				unsigned Count = InstrsToErase.size();
				for (unsigned i = Count - 1; i < Count; --i) {
				InstrsToErase[i]->eraseFromParent();
				}

				return true;
				}

				void FunctionLifter::runOnLoad(LoadInst *LI) {
				if (!LI->isSimple()) {
				return;
				}

				Type *T = LI->getType();
				if (!T->isAggregateType()) {
				return;
				}

				unsigned Size = DL.getTypeStoreSize(T); // getTypeAllocSize ?
				unsigned Align = LI->getAlignment();

				AllocaInst* A = AtEntry.CreateAlloca(T, nullptr,
				LI->getName() + ".lifted");
				A->setAlignment(Align);

				IRBuilder<> Builder(LI);
				Builder.CreateMemCpy(A, LI->getPointerOperand(), Size, Align);

				InstrsToErase.push_back(LI);
				FixupWorklist.insert(FixupPair(LI, A));
				}

				bool FunctionLifter::liftStore(StoreInst *SI, Value *A) {
				if (!SI->isSimple()) {
				return false;
				}

				Type *T = SI->getValueOperand()->getType();
				unsigned Size = DL.getTypeStoreSize(T); // getTypeAllocSize ?
				// What if this is smaller than alloca's align ?
				unsigned Align = SI->getAlignment();

				IRBuilder<> Builder(SI);
				Builder.CreateMemCpy(SI->getPointerOperand(), A, Size, Align);

				return true;
				}

				bool FunctionLifter::liftExtractValue(ExtractValueInst *EVI, Value *A) {
				IRBuilder<> Builder(EVI);

				SmallVector<Value*, 8> IdxList;
				IdxList.push_back(Builder.getInt32(0));

				for (auto Idx : EVI->getIndices()) {
				IdxList.push_back(Builder.getInt32(Idx));
				}

				Type *AT = EVI->getAggregateOperand()->getType();
				Value *GEP = Builder.CreateInBoundsGEP(AT, A, IdxList,
				EVI->getName() + ".lifted");
				Type *T = EVI->getType();
				if (T->isAggregateType()) {
				FixupWorklist.insert(FixupPair(EVI, GEP));
				return true;
				}

				LoadInst *LI = Builder.CreateLoad(GEP, EVI->getName());
				EVI->replaceAllUsesWith(LI);

				return true;
				}

				void FunctionLifter::liftAggregate(Instruction *I, Value *A) {
				Type *T = I->getType();

				DEBUG(dbgs() << "T: " << *T << "\n");
				assert(T->isAggregateType() && "T is expected to be an aggregate");

				DEBUG(dbgs() << "Lifting " << *I << " into " << *A << "\n");

				LoadInst *LI = nullptr;

				for (User *U : I->users()) {
				StoreInst* SI = dyn_cast<StoreInst>(U);
				if (SI && liftStore(SI, A)) {
				InstrsToErase.push_back(SI);
				continue;
				}

				ExtractValueInst *EVI = dyn_cast<ExtractValueInst>(U);
				if (EVI && liftExtractValue(EVI, A)) {
				InstrsToErase.push_back(EVI);
				continue;
				}

				// No luck, fallback on loading the aggregate and hope SROA knows better.
				if (LI == nullptr) {
				IRBuilder<> Builder(I);
				LI = Builder.CreateLoad(A, I->getName());
				}

				U->replaceUsesOfWith(I, LI);
				}
				}

				void AggregateLifter::getAnalysisUsage(AnalysisUsage &AU) const {
				AU.setPreservesCFG();
				}

				char AggregateLifter::ID = 0;

				FunctionPass *llvm::createAggregateLifterPass() {
				return new AggregateLifter();
				}

				INITIALIZE_PASS(AggregateLifter, "aggregate-lifter",
				"Lift aggregate store/load into alloca.",
				false, false)
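
The `getTypeStoreSize` versus `getTypeAllocSize` question left open in `liftStore` can be illustrated without LLVM. The sketch below is a simplified model under stated assumptions, not the LLVM implementation: `storeSizeBytes` and `allocSizeBytes` are hypothetical helper names, and the only rules modelled are that the store size is the bit width rounded up to whole bytes and the alloc size is the store size rounded up to the ABI alignment.

```cpp
#include <cstdint>

// Hypothetical helpers (not LLVM APIs) modelling the two size queries.

// Bytes that a store of a value with `bits` bits may touch:
// the bit width rounded up to whole bytes.
constexpr std::uint64_t storeSizeBytes(std::uint64_t bits) {
  return (bits + 7) / 8;
}

// Bytes reserved per object of the type: the store size rounded up
// to a multiple of the ABI alignment (given in bytes, must be nonzero).
constexpr std::uint64_t allocSizeBytes(std::uint64_t bits,
                                       std::uint64_t abiAlignBytes) {
  return (storeSizeBytes(bits) + abiAlignBytes - 1) / abiAlignBytes *
         abiAlignBytes;
}
```

Under this model, an 80-bit value with the `f80:128` alignment used in the tests' datalayout has a store size of 10 bytes and an alloc size of 16 bytes, so the two queries can disagree. For `%A = type { i8*, i64 }`, assuming 8-byte pointers, both come out as 16 bytes.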

lib/Transforms/Scalar/CMakeLists.txt

 add_llvm_library(LLVMScalarOpts
 ADCE.cpp
+AggregateLifter.cpp
 AlignmentFromAssumptions.cpp
 BDCE.cpp
 ConstantHoisting.cpp
 ConstantProp.cpp
 CorrelatedValuePropagation.cpp
 DCE.cpp
 DeadStoreElimination.cpp
 EarlyCSE.cpp

lib/Transforms/Scalar/Scalar.cpp

 void llvm::initializeScalarOpts(PassRegistry &Registry) {
 ...
 initializeSeparateConstOffsetFromGEPPass(Registry);
 initializeSpeculativeExecutionPass(Registry);
 initializeStraightLineStrengthReducePass(Registry);
 initializeLoadCombinePass(Registry);
 initializePlaceBackedgeSafepointsImplPass(Registry);
 initializePlaceSafepointsPass(Registry);
 initializeFloat2IntPass(Registry);
 initializeLoopDistributePass(Registry);
+initializeAggregateLifterPass(Registry);
 }

 void LLVMInitializeScalarOpts(LLVMPassRegistryRef R) {
 initializeScalarOpts(*unwrap(R));
 }

 void LLVMAddAggressiveDCEPass(LLVMPassManagerRef PM) {
 unwrap(PM)->add(createAggressiveDCEPass());

test/Transforms/AggregateLifter/extractvalue.ll

This file was added.

				; RUN: opt -aggregate-lifter -S < %s | FileCheck %s
				; RUN: opt -O3 -S < %s | FileCheck %s -check-prefix=CHECK-OPT

				target datalayout = "e-i64:64-f80:128-n8:16:32:64"
				target triple = "x86_64-unknown-linux-gnu"

				%A = type { i8*, i64 }
				%B = type { %A, i64 }

				@a0 = global %A zeroinitializer
				@a1 = global %A zeroinitializer
				@b0 = global %B zeroinitializer

				; CHECK-LABEL: @return(
				; CHECK-OPT-LABEL: @return(
				define i64 @return() {
				%1 = load %A, %A* @a0
				%2 = extractvalue %A %1, 1
				ret i64 %2
				; CHECK: alloca %A
				; CHECK: call void @llvm.memcpy.p0i8.p0i8.i64
				; CHECK-NEXT: getelementptr inbounds %A, %A* %.lifted, i32 0, i32 1
				; CHECK-NEXT: load i64, i64*
				; CHECK-NEXT: ret i64
				; CHECK-OPT-NEXT: load i64, i64* getelementptr inbounds
				; CHECK-OPT-NEXT: ret i64
				}

				; CHECK-LABEL: @nested(
				; CHECK-OPT-LABEL: @nested(
				define i64 @nested() {
				%1 = load %B, %B* @b0
				%2 = extractvalue %B %1, 0
				%3 = extractvalue %A %2, 1
				ret i64 %3
				; CHECK: alloca %B
				; CHECK: call void @llvm.memcpy.p0i8.p0i8.i64
				; CHECK-NEXT: getelementptr inbounds %B, %B*
				; CHECK-NEXT: getelementptr inbounds %A, %A*
				; CHECK-OPT-NEXT: load i64, i64* getelementptr inbounds
				; CHECK-OPT-NEXT: ret i64
				}
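
The GEP index lists expected by the CHECK lines above follow from how `liftExtractValue` builds `IdxList`: a leading zero that steps through the pointer to the lifted alloca, followed by the `extractvalue` indices unchanged. A minimal standalone sketch of that mapping, using plain integers and `std::vector` instead of LLVM's `SmallVector<Value*, 8>` (the name `gepIndices` is hypothetical and not part of the patch):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper mirroring the index-list construction in
// liftExtractValue, with plain integers standing in for i32 constants.
std::vector<std::uint32_t>
gepIndices(const std::vector<std::uint32_t> &extractIdx) {
  std::vector<std::uint32_t> idx;
  idx.reserve(extractIdx.size() + 1);
  idx.push_back(0); // dereference the pointer to the lifted aggregate
  idx.insert(idx.end(), extractIdx.begin(), extractIdx.end());
  return idx;
}
```

For `extractvalue %A %1, 1` this gives the indices `0, 1`, which matches the `i32 0, i32 1` operands checked in `@return`.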

test/Transforms/AggregateLifter/store.ll

This file was added.

				; RUN: opt -aggregate-lifter -S < %s | FileCheck %s
				; RUN: opt -O3 -S < %s | FileCheck %s -check-prefix=CHECK-OPT

				target datalayout = "e-i64:64-f80:128-n8:16:32:64"
				target triple = "x86_64-unknown-linux-gnu"

				%A = type { i8*, i64 }
				%B = type { %A, i64 }

				@a0 = global %A zeroinitializer
				@a1 = global %A zeroinitializer
				@b0 = global %B zeroinitializer

				; CHECK-LABEL: @forward(
				; CHECK-OPT-LABEL: @forward(
				define void @forward() {
				%1 = load %A, %A* @a0
				store %A %1, %A* @a1
				ret void
				; CHECK: alloca %A
				; CHECK: call void @llvm.memcpy.p0i8.p0i8.i64
				; CHECK: call void @llvm.memcpy.p0i8.p0i8.i64
				; CHECK-NEXT: ret void
				; CHECK-OPT-NEXT: call void @llvm.memcpy.p0i8.p0i8.i64
				; CHECK-OPT-NEXT: ret void
				}

				; CHECK-LABEL: @forwardelement(
				; CHECK-OPT-LABEL: @forwardelement(
				define void @forwardelement() {
				%1 = load %B, %B* @b0
				%2 = extractvalue %B %1, 0
				store %A %2, %A* @a0
				ret void
				; CHECK: alloca %B
				; CHECK: call void @llvm.memcpy.p0i8.p0i8.i64
				; CHECK-NEXT: getelementptr inbounds %B, %B*
				; CHECK: call void @llvm.memcpy.p0i8.p0i8.i64
				; CHECK-NEXT: ret void
				; CHECK-OPT-NEXT: call void @llvm.memcpy.p0i8.p0i8.i64
				; CHECK-OPT-NEXT: ret void
				}

Revision Contents

Diff 35305

include/llvm/InitializePasses.h

include/llvm/Transforms/Scalar.h

lib/Transforms/IPO/PassManagerBuilder.cpp

lib/Transforms/Scalar/AggregateLifter.cpp

lib/Transforms/Scalar/CMakeLists.txt

lib/Transforms/Scalar/Scalar.cpp

test/Transforms/AggregateLifter/extractvalue.ll

test/Transforms/AggregateLifter/store.ll