This is an archive of the discontinued LLVM Phabricator instance.

[GlobalMerge] Look at uses to create smaller global sets.
ClosedPublic

Authored by ab on Mar 4 2015, 5:39 PM.

Details

Reviewers: None
Commits: rG279e3ee954fd: [GlobalMerge] Look at uses to create smaller global sets.
rL235249: [GlobalMerge] Look at uses to create smaller global sets.

Summary

As discussed in the RFC, I've been trying out a bunch of improvements to GlobalMerge. Whenever you try to improve it, it ends up being too conservative for SPEC, but this one was good enough for SPEC and some other "modern" code I've looked at.

The functionality is hidden under two flags, both enabled by default in the patch:
-global-merge-group-by-use: instead of merging everything together (current behavior, available by passing =false), look at the users of GlobalVariables, and try to group them by function, to create sets of globals used together.
-global-merge-ignore-single-use: going further, keep merging everything together *except* globals that are only ever used alone, that is, for which it is obviously non-profitable to merge them with others.
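
As an illustration of the second flag, here is a minimal standalone sketch of the "ignore globals only used alone" idea. It is not the patch's code. It assumes each global is a bit position in a 64-bit mask and that each function is summarized by the mask of globals it uses; `globalsWorthMerging` and `UsesByFunction` are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: UsesByFunction[f] is a bitmask of the globals that
// function f uses (bit i set means global #i is used in f).
// Returns the mask of globals that appear together with at least one other
// global in some function. Globals that are only ever used alone are left out.
std::uint64_t
globalsWorthMerging(const std::vector<std::uint64_t> &UsesByFunction) {
  std::uint64_t All = 0;
  for (std::uint64_t Mask : UsesByFunction) {
    // Mask & (Mask - 1) clears the lowest set bit, so the result is
    // nonzero only when the mask has two or more bits set.
    bool MoreThanOne = (Mask & (Mask - 1)) != 0;
    if (MoreThanOne)
      All |= Mask;
  }
  return All;
}
```

For example, numbering the globals of the ignore-single-use test as m1=0, n1=1, o1=2, m2=3, n2=4, o2=5, the per-function masks are 3, 7, 24 and 32. The result is 31: everything except o2, which is only ever used alone.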

The first flag by itself seems to create some regressions in benchmarks, but ignoring globals used alone is just always better. I measured on AArch64 with -O3 and -flto, and got, depending on what you look at, a -0.01% to -0.17% geomean improvement.
The overwhelming majority of the changes is just "adrp+add+add" turning into "adrp+add" (all this is without LOHs). These are the biggest improvements:

SingleSource/Benchmarks/Adobe-C++/stepanov_vector                           6.3374   6.3139   -0.37%  0.0001
External/SPEC/CINT2006/400_perlbench/400_perlbench                          12.0686  11.9385  -1.08%  0.0029
External/Povray/povray                                                      4.2761   4.1903   -2.01%  0.0002
External/SPEC/CINT2006/456_hmmer/456_hmmer                                  4.9376   4.8260   -2.26%  0.0002
External/SPEC/CINT2000/253_perlbmk/253_perlbmk                              11.2362  10.7393  -4.42%  0.0045

There are some regressions as well, with the exact same changes (one less add), so I'm blaming those on bad luck: code and/or data alignment, etc. (Yeah I know, bias, and all that; and yes, the numbers are pretty stable and reproducible.)

The patch itself is pretty simple: look at all uses of globals, create sets of globals used together in a function, and finally pick the "biggest" (very braindead: usage count * size of the set) such sets.
I'll add tests if I stumble upon other interesting cases.
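
The "pick the biggest sets" step can be sketched in standalone C++ as follows. This is a simplified illustration under stated assumptions, not the code in the patch: globals are bits in a 64-bit mask, the usage count is the number of functions using exactly that set (the patch counts uses incrementally), and the names `UsedSet`, `popCount` and `pickMergeSets` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for a set of globals used together: a bitmask of
// global indices plus the number of functions that use exactly this set.
struct UsedSet {
  std::uint64_t Globals;
  unsigned UsageCount;
};

// Count the bits set in a mask (the size of the set).
static unsigned popCount(std::uint64_t Bits) {
  unsigned N = 0;
  for (; Bits; Bits &= Bits - 1)
    ++N;
  return N;
}

// UsesByFunction[f] is the bitmask of globals used in function f.
// Returns the sets chosen for merging, best score first.
std::vector<std::uint64_t>
pickMergeSets(const std::vector<std::uint64_t> &UsesByFunction) {
  // Group functions by the exact set of globals they use.
  std::map<std::uint64_t, unsigned> Counts;
  for (std::uint64_t Mask : UsesByFunction)
    if (Mask)
      ++Counts[Mask];

  std::vector<UsedSet> Sets;
  for (const auto &KV : Counts)
    Sets.push_back({KV.first, KV.second});

  // Crude profitability: usage count * size of the set, biggest first.
  std::stable_sort(Sets.begin(), Sets.end(),
                   [](const UsedSet &A, const UsedSet &B) {
                     return popCount(A.Globals) * A.UsageCount >
                            popCount(B.Globals) * B.UsageCount;
                   });

  // Greedily take the first compatible (non-overlapping) sets.
  std::uint64_t Picked = 0;
  std::vector<std::uint64_t> Result;
  for (const UsedSet &S : Sets) {
    if (Picked & S.Globals)
      continue; // shares a global with an already-picked set
    Picked |= S.Globals;
    if (popCount(S.Globals) < 2)
      continue; // a single global is reserved but never merged
    Result.push_back(S.Globals);
  }
  return Result;
}
```

With the five functions of the group-by-use test (masks 3, 28, 96, 448 and 512 when the globals are numbered in declaration order), this sketch picks 28, 448 and 3: {m3, n3} loses to the larger {n3, m4, n4}, and o5 is left alone.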

Diff Detail

Event Timeline

ab updated this revision to Diff 21247. Mar 4 2015, 5:39 PM

ab retitled this revision to [GlobalMerge] Look at uses to create smaller global sets.

ab updated this object.

ab edited the test plan for this revision.

ab added subscribers: Unknown Object (MLST), qcolombet, Jiangning.

Herald added a subscriber: aemerson. Mar 4 2015, 5:39 PM

Hi Ahmed,

I think this is starting to look pretty good. Could you add a nice big block comment describing the algorithm somewhere in the file? Without one, it's a bit hard to suss out what you're doing in some cases. One silly comment inline as well.

Thanks and sorry for the delay!

-eric

lib/CodeGen/GlobalMerge.cpp
191	I find the "Enable" here actively hurts my readability on this code, seems to work better without, but up to you.

You're right Eric, the "Enable" was redundant, removed. I also added a couple of block comments explaining 1) why we don't just merge everything and call it a day, and 2) an overview of the algorithm used to look at the uses.

Thanks!

-Ahmed

One nit, but at this point it looks good to me. I don't think I missed anything terrible algorithmically, but no one else has stepped up either.

-eric

lib/CodeGen/GlobalMerge.cpp
267	Might be nice to comment the loops as to what they're each doing.

Closed by commit rL235249: [GlobalMerge] Look at uses to create smaller global sets. (authored by ab). Apr 17 2015, 6:25 PM

This revision was automatically updated to reflect the committed changes.

r235249, thanks Eric!

I should note I had to update a few of the existing ARM testcases (I tested on AArch64 only), but none of them were really testing the merging heuristics, so I think it's fine.

-Ahmed

Revision Contents

Path

Size

lib/

CodeGen/

GlobalMerge.cpp

247 lines

test/

CodeGen/

AArch64/

global-merge-group-by-use.ll

92 lines

global-merge-ignore-single-use.ll

61 lines

Diff 23357

lib/CodeGen/GlobalMerge.cpp

(43 lines elided)
// and in ARM code this becomes:		// and in ARM code this becomes:
//		//
// ldr r0, [r5, #40]		// ldr r0, [r5, #40]
// ldr r1, [r5, #80]		// ldr r1, [r5, #80]
// mul r0, r1, r0		// mul r0, r1, r0
// str r0, [r5], #4		// str r0, [r5], #4
//		//
// note that we saved 2 registers here almost "for free".		// note that we saved 2 registers here almost "for free".
		//
		// However, merging globals can have tradeoffs:
		// - it confuses debuggers, tools, and users
		// - it makes linker optimizations less useful (order files, LOHs, ...)
		// - it forces usage of indexed addressing (which isn't necessarily "free")
		// - it can increase register pressure when the uses are disparate enough.
		//
		// We use heuristics to discover the best global grouping we can (cf cl::opts).
// ===---------------------------------------------------------------------===//		// ===---------------------------------------------------------------------===//

#include "llvm/Transforms/Scalar.h"		#include "llvm/Transforms/Scalar.h"
		#include "llvm/ADT/DenseMap.h"
		#include "llvm/ADT/SmallBitVector.h"
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/CodeGen/Passes.h"		#include "llvm/CodeGen/Passes.h"
#include "llvm/IR/Attributes.h"		#include "llvm/IR/Attributes.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
#include "llvm/IR/DataLayout.h"		#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DerivedTypes.h"		#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
#include "llvm/IR/GlobalVariable.h"		#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/Instructions.h"		#include "llvm/IR/Instructions.h"
#include "llvm/IR/Intrinsics.h"		#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/Module.h"		#include "llvm/IR/Module.h"
#include "llvm/Pass.h"		#include "llvm/Pass.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Target/TargetLowering.h"		#include "llvm/Target/TargetLowering.h"
#include "llvm/Target/TargetLoweringObjectFile.h"		#include "llvm/Target/TargetLoweringObjectFile.h"
#include "llvm/Target/TargetSubtargetInfo.h"		#include "llvm/Target/TargetSubtargetInfo.h"
		#include <algorithm>
using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "global-merge"		#define DEBUG_TYPE "global-merge"

static cl::opt<bool>		static cl::opt<bool>
EnableGlobalMerge("enable-global-merge", cl::Hidden,		EnableGlobalMerge("enable-global-merge", cl::Hidden,
cl::desc("Enable global merge pass"),		cl::desc("Enable global merge pass"),
cl::init(true));		cl::init(true));

		static cl::opt<bool> GlobalMergeGroupByUse(
		"global-merge-group-by-use", cl::Hidden,
		cl::desc("Improve global merge pass to look at uses"), cl::init(true));

		static cl::opt<bool> GlobalMergeIgnoreSingleUse(
		"global-merge-ignore-single-use", cl::Hidden,
		cl::desc("Improve global merge pass to ignore globals only used alone"),
		cl::init(true));

static cl::opt<bool>		static cl::opt<bool>
EnableGlobalMergeOnConst("global-merge-on-const", cl::Hidden,		EnableGlobalMergeOnConst("global-merge-on-const", cl::Hidden,
cl::desc("Enable global merge pass on constants"),		cl::desc("Enable global merge pass on constants"),
cl::init(false));		cl::init(false));

// FIXME: this could be a transitional option, and we probably need to remove		// FIXME: this could be a transitional option, and we probably need to remove
// it if only we are sure this optimization could always benefit all targets.		// it if only we are sure this optimization could always benefit all targets.
static cl::opt<bool>		static cl::opt<bool>
(9 lines elided)	class GlobalMerge : public FunctionPass {
// FIXME: Infer the maximum possible offset depending on the actual users		// FIXME: Infer the maximum possible offset depending on the actual users
// (these max offsets are different for the users inside Thumb or ARM		// (these max offsets are different for the users inside Thumb or ARM
// functions), see the code that passes in the offset in the ARM backend		// functions), see the code that passes in the offset in the ARM backend
// for more information.		// for more information.
unsigned MaxOffset;		unsigned MaxOffset;

bool doMerge(SmallVectorImpl<GlobalVariable*> &Globals,		bool doMerge(SmallVectorImpl<GlobalVariable*> &Globals,
Module &M, bool isConst, unsigned AddrSpace) const;		Module &M, bool isConst, unsigned AddrSpace) const;
		/// \brief Merge everything in \p Globals for which the corresponding bit
		/// in \p GlobalSet is set.
		bool doMerge(SmallVectorImpl<GlobalVariable *> &Globals,
		const BitVector &GlobalSet, Module &M, bool isConst,
		unsigned AddrSpace) const;

/// \brief Check if the given variable has been identified as must keep		/// \brief Check if the given variable has been identified as must keep
/// \pre setMustKeepGlobalVariables must have been called on the Module that		/// \pre setMustKeepGlobalVariables must have been called on the Module that
/// contains GV		/// contains GV
bool isMustKeepGlobalVariable(const GlobalVariable *GV) const {		bool isMustKeepGlobalVariable(const GlobalVariable *GV) const {
return MustKeepGlobalVariables.count(GV);		return MustKeepGlobalVariables.count(GV);
}		}

(43 lines elided)	bool GlobalMerge::doMerge(SmallVectorImpl<GlobalVariable*> &Globals,
std::stable_sort(Globals.begin(), Globals.end(),		std::stable_sort(Globals.begin(), Globals.end(),
[this](const GlobalVariable *GV1, const GlobalVariable *GV2) {		[this](const GlobalVariable *GV1, const GlobalVariable *GV2) {
Type *Ty1 = cast<PointerType>(GV1->getType())->getElementType();		Type *Ty1 = cast<PointerType>(GV1->getType())->getElementType();
Type *Ty2 = cast<PointerType>(GV2->getType())->getElementType();		Type *Ty2 = cast<PointerType>(GV2->getType())->getElementType();

return (DL->getTypeAllocSize(Ty1) < DL->getTypeAllocSize(Ty2));		return (DL->getTypeAllocSize(Ty1) < DL->getTypeAllocSize(Ty2));
});		});

		// If we want to just blindly group all globals together, do so.
		if (!GlobalMergeGroupByUse) {
		BitVector AllGlobals(Globals.size());
		AllGlobals.set();
		return doMerge(Globals, AllGlobals, M, isConst, AddrSpace);
		}

		// If we want to be smarter, look at all uses of each global, to try to
		// discover all sets of globals used together, and how many times each of
		// these sets occurred.
		//
		// Keep this reasonably efficient, by having an append-only list of all sets
		// discovered so far (UsedGlobalSet), and mapping each "together-ness" unit of
		// code (currently, a Function) to the set of globals seen so far that are
		// used together in that unit (GlobalUsesByFunction).
		//
		// When we look at the Nth global, we know that any new set is either:
		// - the singleton set {N}, containing this global only, or
		// - the union of {N} and a previously-discovered set, containing some
		// combination of the previous N-1 globals.
		// Using that knowledge, when looking at the Nth global, we can keep:
		// - a reference to the singleton set {N} (CurGVOnlySetIdx)
		// - a list mapping each previous set to its union with {N} (EncounteredUGS),
		// if it actually occurs.

		// We keep track of the sets of globals used together "close enough".
		struct UsedGlobalSet {
		UsedGlobalSet(size_t Size) : Globals(Size), UsageCount(1) {}
		BitVector Globals;
		unsigned UsageCount;
		};

		// Each set is unique in UsedGlobalSets.
		std::vector<UsedGlobalSet> UsedGlobalSets;

		// Avoid repeating the create-global-set pattern.
		auto CreateGlobalSet = [&]() -> UsedGlobalSet & {
		UsedGlobalSets.emplace_back(Globals.size());
		return UsedGlobalSets.back();
		};

		// The first set is the empty set.
		CreateGlobalSet().UsageCount = 0;

		// We define "close enough" to be "in the same function".
		// FIXME: Grouping uses by function is way too aggressive, so we should have
		// a better metric for distance between uses.
		// The obvious alternative would be to group by BasicBlock, but that's in
		// turn too conservative..
		// Anything in between wouldn't be trivial to compute, so just stick with
		// per-function grouping.

		// The value type is an index into UsedGlobalSets.
		// The default (0) conveniently points to the empty set.
		DenseMap<Function *, size_t /*UsedGlobalSetIdx*/> GlobalUsesByFunction;

		// Now, look at each merge-eligible global in turn.

		// Keep track of the sets we already encountered to which we added the
		// current global.
		// Each element matches the same-index element in UsedGlobalSets.
		// This lets us efficiently tell whether a set has already been expanded to
		// include the current global.
		std::vector<size_t> EncounteredUGS;

		for (size_t GI = 0, GE = Globals.size(); GI != GE; ++GI) {
		GlobalVariable *GV = Globals[GI];

		// Reset the encountered sets for this global..
		std::fill(EncounteredUGS.begin(), EncounteredUGS.end(), 0);
		// ..and grow it in case we created new sets for the previous global.
		EncounteredUGS.resize(UsedGlobalSets.size());

		// We might need to create a set that only consists of the current global.
		// Keep track of its index into UsedGlobalSets.
		size_t CurGVOnlySetIdx = 0;

		for (auto &U : GV->uses()) {
		// This Use might be a ConstantExpr. We're interested in instruction
		// users, so look at ConstantExpr users as well.
		Use *UI, *UE;
		if (ConstantExpr *CE = dyn_cast<ConstantExpr>(U.getUser())) {
		UI = &*CE->use_begin();
		UE = nullptr;
		} else if (isa<Instruction>(U.getUser())) {
		UI = &U;
		UE = UI->getNext();
		} else {
		continue;
		}

		for (; UI != UE; UI = UI->getNext()) {
		Instruction *I = dyn_cast<Instruction>(UI->getUser());
		if (!I)
		continue;

		Function *ParentFn = I->getParent()->getParent();
		size_t UGSIdx = GlobalUsesByFunction[ParentFn];

		// If this is the first global the function uses, map it to the set
		// consisting of this global only.
		if (!UGSIdx) {
		// If that set doesn't exist yet, create it.
		if (!CurGVOnlySetIdx) {
		CurGVOnlySetIdx = UsedGlobalSets.size();
		CreateGlobalSet().Globals.set(GI);
		} else {
		++UsedGlobalSets[CurGVOnlySetIdx].UsageCount;
		}

		GlobalUsesByFunction[ParentFn] = CurGVOnlySetIdx;
		continue;
		}

		// If this function's set already contains this global, just increment
		// the counter.
		if (UsedGlobalSets[UGSIdx].Globals.test(GI)) {
		++UsedGlobalSets[UGSIdx].UsageCount;
		continue;
		}

		// If not, the previous set wasn't actually used in this function.
		--UsedGlobalSets[UGSIdx].UsageCount;

		// If we already expanded the previous set to include this global, just
		// reuse that expanded set.
		if (size_t ExpandedIdx = EncounteredUGS[UGSIdx]) {
		++UsedGlobalSets[ExpandedIdx].UsageCount;
		GlobalUsesByFunction[ParentFn] = ExpandedIdx;
		continue;
		}

		// If not, create a new set consisting of the union of the previous set
		// and this global. Mark it as encountered, so we can reuse it later.
		GlobalUsesByFunction[ParentFn] = EncounteredUGS[UGSIdx] =
		UsedGlobalSets.size();

		UsedGlobalSet &NewUGS = CreateGlobalSet();
		NewUGS.Globals.set(GI);
		NewUGS.Globals |= UsedGlobalSets[UGSIdx].Globals;
		}
		}
		}

		// Now we found a bunch of sets of globals used together. We accumulated
		// the number of times we encountered the sets (i.e., the number of blocks
		// that use that exact set of globals).
		//
		// Multiply that by the size of the set to give us a crude profitability
		// metric.
		std::sort(UsedGlobalSets.begin(), UsedGlobalSets.end(),
		[](const UsedGlobalSet &UGS1, const UsedGlobalSet &UGS2) {
		return UGS1.Globals.count() * UGS1.UsageCount <
		UGS2.Globals.count() * UGS2.UsageCount;
		});

		// We can choose to merge all globals together, but ignore globals never used
		// with another global. This catches the obviously non-profitable cases of
		// having a single global, but is aggressive enough for any other case.
		if (GlobalMergeIgnoreSingleUse) {
		BitVector AllGlobals(Globals.size());
		for (size_t i = 0, e = UsedGlobalSets.size(); i != e; ++i) {
		const UsedGlobalSet &UGS = UsedGlobalSets[e - i - 1];
		if (UGS.UsageCount == 0)
		continue;
		if (UGS.Globals.count() > 1)
		AllGlobals |= UGS.Globals;
		}
		return doMerge(Globals, AllGlobals, M, isConst, AddrSpace);
		}

		// Starting from the sets with the best (=biggest) profitability, find a
		// good combination.
		// The ideal (i.e., expensive) solution can only be found by trying all
		// combinations, looking for the one with the best profitability.
		// Don't be smart about it, and just pick the first compatible combination,
		// starting with the sets with the best profitability.
		BitVector PickedGlobals(Globals.size());
		bool Changed = false;

		for (size_t i = 0, e = UsedGlobalSets.size(); i != e; ++i) {
		const UsedGlobalSet &UGS = UsedGlobalSets[e - i - 1];
		if (UGS.UsageCount == 0)
		continue;
		if (PickedGlobals.anyCommon(UGS.Globals))
		continue;
		PickedGlobals |= UGS.Globals;
		// If the set only contains one global, there's no point in merging.
		// Ignore the global for inclusion in other sets though, so keep it in
		// PickedGlobals.
		if (UGS.Globals.count() < 2)
		continue;
		Changed |= doMerge(Globals, UGS.Globals, M, isConst, AddrSpace);
		}

		return Changed;
		}

		bool GlobalMerge::doMerge(SmallVectorImpl<GlobalVariable *> &Globals,
		const BitVector &GlobalSet, Module &M, bool isConst,
		unsigned AddrSpace) const {

Type *Int32Ty = Type::getInt32Ty(M.getContext());		Type *Int32Ty = Type::getInt32Ty(M.getContext());

assert(Globals.size() > 1);		assert(Globals.size() > 1);

// FIXME: This simple solution merges globals all together as maximum as		DEBUG(dbgs() << " Trying to merge set, starts with #"
// possible. However, with this solution it would be hard to remove dead		<< GlobalSet.find_first() << "\n");
// global symbols at link-time. An alternative solution could be checking
// global symbols references function by function, and make the symbols		ssize_t i = GlobalSet.find_first();
// being referred in the same function merged and we would probably need		while (i != -1) {
// to introduce heuristic algorithm to solve the merge conflict from		ssize_t j = 0;
// different functions.
for (size_t i = 0, e = Globals.size(); i != e; ) {
size_t j = 0;
uint64_t MergedSize = 0;		uint64_t MergedSize = 0;
std::vector<Type*> Tys;		std::vector<Type*> Tys;
std::vector<Constant*> Inits;		std::vector<Constant*> Inits;

bool HasExternal = false;		bool HasExternal = false;
GlobalVariable *TheFirstExternal = 0;		GlobalVariable *TheFirstExternal = 0;
for (j = i; j != e; ++j) {		for (j = i; j != -1; j = GlobalSet.find_next(j)) {
Type *Ty = Globals[j]->getType()->getElementType();		Type *Ty = Globals[j]->getType()->getElementType();
MergedSize += DL->getTypeAllocSize(Ty);		MergedSize += DL->getTypeAllocSize(Ty);
if (MergedSize > MaxOffset) {		if (MergedSize > MaxOffset) {
break;		break;
}		}
Tys.push_back(Ty);		Tys.push_back(Ty);
Inits.push_back(Globals[j]->getInitializer());		Inits.push_back(Globals[j]->getInitializer());

(16 lines elided)	while (i != -1) {
// first variable merged as the suffix of global symbol name. This would		// first variable merged as the suffix of global symbol name. This would
// be able to avoid the link-time naming conflict for global symbols.		// be able to avoid the link-time naming conflict for global symbols.
GlobalVariable *MergedGV = new GlobalVariable(		GlobalVariable *MergedGV = new GlobalVariable(
M, MergedTy, isConst, Linkage, MergedInit,		M, MergedTy, isConst, Linkage, MergedInit,
HasExternal ? "_MergedGlobals_" + TheFirstExternal->getName()		HasExternal ? "_MergedGlobals_" + TheFirstExternal->getName()
: "_MergedGlobals",		: "_MergedGlobals",
nullptr, GlobalVariable::NotThreadLocal, AddrSpace);		nullptr, GlobalVariable::NotThreadLocal, AddrSpace);

for (size_t k = i; k < j; ++k) {		for (ssize_t k = i, idx = 0; k != j; k = GlobalSet.find_next(k)) {
GlobalValue::LinkageTypes Linkage = Globals[k]->getLinkage();		GlobalValue::LinkageTypes Linkage = Globals[k]->getLinkage();
std::string Name = Globals[k]->getName();		std::string Name = Globals[k]->getName();

Constant *Idx[2] = {		Constant *Idx[2] = {
ConstantInt::get(Int32Ty, 0),		ConstantInt::get(Int32Ty, 0),
ConstantInt::get(Int32Ty, k-i)		ConstantInt::get(Int32Ty, idx++)
};		};
Constant *GEP = ConstantExpr::getInBoundsGetElementPtr(MergedGV, Idx);		Constant *GEP = ConstantExpr::getInBoundsGetElementPtr(MergedGV, Idx);
Globals[k]->replaceAllUsesWith(GEP);		Globals[k]->replaceAllUsesWith(GEP);
Globals[k]->eraseFromParent();		Globals[k]->eraseFromParent();

if (Linkage != GlobalValue::InternalLinkage) {		if (Linkage != GlobalValue::InternalLinkage) {
// Generate a new alias...		// Generate a new alias...
auto *PTy = cast<PointerType>(GEP->getType());		auto *PTy = cast<PointerType>(GEP->getType());
(130 lines elided)

test/CodeGen/AArch64/global-merge-group-by-use.ll

				; RUN: llc %s -mtriple=aarch64-apple-ios -enable-global-merge -global-merge-group-by-use=true -global-merge-ignore-single-use=false -asm-verbose=false -o - | FileCheck %s

				; We assume that globals of the same size aren't reordered inside a set.

				; Check that we create two MergedGlobal instances for two functions using
				; disjoint sets of globals

				@m1 = internal global i32 0, align 4
				@n1 = internal global i32 0, align 4

				; CHECK-LABEL: f1:
				define void @f1(i32 %a1, i32 %a2) #0 {
				; CHECK-NEXT: adrp x8, [[SET1:__MergedGlobals[0-9]*]]@PAGE
				; CHECK-NEXT: add x8, x8, [[SET1]]@PAGEOFF
				; CHECK-NEXT: stp w0, w1, [x8]
				; CHECK-NEXT: ret
				store i32 %a1, i32* @m1, align 4
				store i32 %a2, i32* @n1, align 4
				ret void
				}

				@m2 = internal global i32 0, align 4
				@n2 = internal global i32 0, align 4
				@o2 = internal global i32 0, align 4

				; CHECK-LABEL: f2:
				define void @f2(i32 %a1, i32 %a2, i32 %a3) #0 {
				; CHECK-NEXT: adrp x8, [[SET2:__MergedGlobals[0-9]*]]@PAGE
				; CHECK-NEXT: add x8, x8, [[SET2]]@PAGEOFF
				; CHECK-NEXT: stp w0, w1, [x8]
				; CHECK-NEXT: str w2, [x8, #8]
				; CHECK-NEXT: ret
				store i32 %a1, i32* @m2, align 4
				store i32 %a2, i32* @n2, align 4
				store i32 %a3, i32* @o2, align 4
				ret void
				}

				; Sanity-check (don't worry about cost models) that we pick the biggest subset
				; of all global used "together" directly or indirectly. Here, that means
				; merging n3, m4, and n4 together, but ignoring m3.

				@m3 = internal global i32 0, align 4
				@n3 = internal global i32 0, align 4

				; CHECK-LABEL: f3:
				define void @f3(i32 %a1, i32 %a2) #0 {
				; CHECK-NEXT: adrp x8, _m3@PAGE
				; CHECK-NEXT: adrp x9, [[SET3:__MergedGlobals[0-9]*]]@PAGE
				; CHECK-NEXT: str w0, [x8, _m3@PAGEOFF]
				; CHECK-NEXT: str w1, [x9, [[SET3]]@PAGEOFF]
				; CHECK-NEXT: ret
				store i32 %a1, i32* @m3, align 4
				store i32 %a2, i32* @n3, align 4
				ret void
				}

				@m4 = internal global i32 0, align 4
				@n4 = internal global i32 0, align 4

				; CHECK-LABEL: f4:
				define void @f4(i32 %a1, i32 %a2, i32 %a3) #0 {
				; CHECK-NEXT: adrp x8, [[SET3]]@PAGE
				; CHECK-NEXT: add x8, x8, [[SET3]]@PAGEOFF
				; CHECK-NEXT: stp w0, w1, [x8, #4]
				; CHECK-NEXT: str w2, [x8]
				; CHECK-NEXT: ret
				store i32 %a1, i32* @m4, align 4
				store i32 %a2, i32* @n4, align 4
				store i32 %a3, i32* @n3, align 4
				ret void
				}

				; Finally, check that we don't do anything with one-element global sets.
				@o5 = internal global i32 0, align 4

				; CHECK-LABEL: f5:
				define void @f5(i32 %a1) #0 {
				; CHECK-NEXT: adrp x8, _o5@PAGE
				; CHECK-NEXT: str w0, [x8, _o5@PAGEOFF]
				; CHECK-NEXT: ret
				store i32 %a1, i32* @o5, align 4
				ret void
				}

				; CHECK-DAG: .zerofill __DATA,__bss,_o5,4,2

				; CHECK-DAG: .zerofill __DATA,__bss,[[SET1]],8,3
				; CHECK-DAG: .zerofill __DATA,__bss,[[SET2]],12,3
				; CHECK-DAG: .zerofill __DATA,__bss,[[SET3]],12,3

				attributes #0 = { nounwind }

test/CodeGen/AArch64/global-merge-ignore-single-use.ll

				; RUN: llc %s -mtriple=aarch64-apple-ios -enable-global-merge -global-merge-group-by-use=true -global-merge-ignore-single-use=true -asm-verbose=false -o - | FileCheck %s

				; We assume that globals of the same size aren't reordered inside a set.

				@m1 = internal global i32 0, align 4
				@n1 = internal global i32 0, align 4
				@o1 = internal global i32 0, align 4

				; CHECK-LABEL: f1:
				define void @f1(i32 %a1, i32 %a2) #0 {
				; CHECK-NEXT: adrp x8, [[SET:__MergedGlobals]]@PAGE
				; CHECK-NEXT: add x8, x8, [[SET]]@PAGEOFF
				; CHECK-NEXT: stp w0, w1, [x8]
				; CHECK-NEXT: ret
				store i32 %a1, i32* @m1, align 4
				store i32 %a2, i32* @n1, align 4
				ret void
				}

				@m2 = internal global i32 0, align 4
				@n2 = internal global i32 0, align 4

				; CHECK-LABEL: f2:
				define void @f2(i32 %a1, i32 %a2, i32 %a3) #0 {
				; CHECK-NEXT: adrp x8, [[SET]]@PAGE
				; CHECK-NEXT: add x8, x8, [[SET]]@PAGEOFF
				; CHECK-NEXT: stp w0, w1, [x8]
				; CHECK-NEXT: str w2, [x8, #8]
				; CHECK-NEXT: ret
				store i32 %a1, i32* @m1, align 4
				store i32 %a2, i32* @n1, align 4
				store i32 %a3, i32* @o1, align 4
				ret void
				}

				; CHECK-LABEL: f3:
				define void @f3(i32 %a1, i32 %a2) #0 {
				; CHECK-NEXT: adrp x8, [[SET]]@PAGE
				; CHECK-NEXT: add x8, x8, [[SET]]@PAGEOFF
				; CHECK-NEXT: stp w0, w1, [x8, #12]
				; CHECK-NEXT: ret
				store i32 %a1, i32* @m2, align 4
				store i32 %a2, i32* @n2, align 4
				ret void
				}

				@o2 = internal global i32 0, align 4

				; CHECK-LABEL: f4:
				define void @f4(i32 %a1) #0 {
				; CHECK-NEXT: adrp x8, _o2@PAGE
				; CHECK-NEXT: str w0, [x8, _o2@PAGEOFF]
				; CHECK-NEXT: ret
				store i32 %a1, i32* @o2, align 4
				ret void
				}

				; CHECK-DAG: .zerofill __DATA,__bss,[[SET]],20,4
				; CHECK-DAG: .zerofill __DATA,__bss,_o2,4,2

				attributes #0 = { nounwind }