This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Align stack objects passed to memory intrinsics
ClosedPublic

Authored by john.brawn on Feb 26 2015, 6:11 AM.

Details

Reviewers

aadg
hfinkel

Commits

rG0dbcd6544223: [ARM] Align stack objects passed to memory intrinsics
rL232627: [ARM] Align stack objects passed to memory intrinsics

Summary

Memcpy and other memory intrinsics typically try to use LDM/STM if the source and target addresses are 4-byte aligned. In CodeGenPrepare, look for calls to memory intrinsics and, if an object passed to one is on the stack, 4-byte align it when it is large enough that we expect memcpy would want to use LDM/STM to copy it.

Diff Detail

Repository: rL LLVM

Event Timeline

john.brawn updated this revision to Diff 20755. Feb 26 2015, 6:11 AM

john.brawn retitled this revision from to [ARM] Align stack objects that may be memcpy'd.

john.brawn updated this object.

john.brawn edited the test plan for this revision.

john.brawn set the repository for this revision to rL LLVM.

john.brawn added a subscriber: Unknown Object (MLST).

Herald added a subscriber: aemerson. Feb 26 2015, 6:11 AM

Hi John,

Have you done some stack consumption analysis? I remember Arnaud trying to reduce the size of the stack and this patch seems to go against it.

I agree it's a good idea to align it at 4, but we need to be careful, especially in -Os/z.

cheers,
--renato

rengolin added a reviewer: aadg. Feb 26 2015, 7:40 AM

Zero change in stack consumption in any EEMBC benchmark (I haven't checked SPEC). Also by CodeGen time we don't know if we're compiling -Os/-Oz as we just have CodeGenOpt::Level.

In D7908#130548, @john.brawn wrote:

Zero change in stack consumption in any EEMBC benchmark (I haven't checked SPEC). Also by CodeGen time we don't know if we're compiling -Os/-Oz as we just have CodeGenOpt::Level.

No, the size optimization attribute is independent of CodeGenOpt::Level. You can get at it from CodeGen by querying:

MF.getFunction()->hasFnAttribute(Attribute::OptimizeForSize);

That having been said, code size and stack size, in this context, might be inversely related.

It might be better to implement this in CodeGenPrep. You could look directly for allocas that are used by memcpy intrinsics or as byval function argument parameters, and increase the alignment of only those directly.

Also worth noting, is that in getMemcpyLoadsAndStores in lib/CodeGen/SelectionDAG/SelectionDAG.cpp, there already is code that attempts to reset the alignment of a destination stack object when possible. Perhaps this should just be extended to do the same for the source.

In D7908#131187, @hfinkel wrote:

Also worth noting, is that in getMemcpyLoadsAndStores in lib/CodeGen/SelectionDAG/SelectionDAG.cpp, there already is code that attempts to reset the alignment of a destination stack object when possible. Perhaps this should just be extended to do the same for the source.

The goal is to align not just on direct calls to memcpy, but also calls to functions that then may call memcpy. It looks like doing it as a CodeGenPrep pass can do both - we first align arguments to a call, then check if the call is actually a memcpy and up its alignment if possible.

New patch attached that does this instead as a CodeGenPrepare pass.

Can you please put this into CodeGenPrep proper (with some target hook to enable it). I likely want to use this for PowerPC too, and the actual logic is fairly simple (so far).

lib/Target/ARM/ARMAlignAllocaPass.cpp
32 ↗	(On Diff #21012)	Line too long?
62 ↗	(On Diff #21012)	What are you actually trying to check for here? I assume you're mostly interested in looking for GEPs here (and not, for example, ptrtoint). Also, for GEPs, do you want to check that the offset to the underlying alloca is divisible by the alignment of interest? Maybe you should use something like Val->stripAndAccumulateInBoundsConstantOffsets() -- true, you'd miss the dynamic offset case, but I don't know if you have examples where that's important.
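
The divisibility check raised in the comment above can be sketched in isolation. This is a minimal sketch, assuming the alignment of interest is a nonzero power of two; the helper name offsetPreservesAlignment is hypothetical and is not an LLVM API.

```cpp
#include <cstdint>

// Hypothetical helper, not an LLVM API: returns true if a pointer at
// base + Offset is still PrefAlign-aligned once the base object itself
// has been aligned to PrefAlign. Assumes PrefAlign is a nonzero power of
// two, so the remainder can be computed with a mask.
bool offsetPreservesAlignment(uint64_t Offset, unsigned PrefAlign) {
  return (Offset & (uint64_t(PrefAlign) - 1)) == 0;
}
```

With a constant offset of 4, for instance, aligning the underlying object to 4 bytes keeps the derived pointer aligned, while aligning it to 8 bytes would not.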

New patch attached that moves this into CodeGenPrepare, enabled by shouldAlignAllocaArgs in TargetLowering.

rengolin added inline comments. Mar 6 2015, 7:37 AM

lib/CodeGen/CodeGenPrepare.cpp
1236 ↗	(On Diff #21355)	I'm a bit averse to setting parameters for the computation inside a conditional and having to assert right after every call. I'd rather have (or reuse) some default size/alignment parameters from elsewhere.

john.brawn added inline comments. Mar 6 2015, 7:58 AM

lib/CodeGen/CodeGenPrepare.cpp
1236 ↗	(On Diff #21355)	If an implementation of shouldAlignAllocaArgs were to not set AllocaAlign then that indicates that it's not doing what it should, so failing an assert is better than hiding that by silently using a default value.

rengolin added inline comments. Mar 6 2015, 8:26 AM

lib/CodeGen/CodeGenPrepare.cpp
1236 ↗	(On Diff #21355)	I'm not against the assert, I'm against passing a reference for AllocaSize and AllocaAlign on a check call. The call is not explicit that it's setting the two values here, nor is it on TargetLowering's default implementation. A more clear way would be to make a function whose only job is to set those variables and make it explicit on its name, like "setAlignAllocaArgs(CI, Size, Align)" and add "Size > 0" to the conditional. You can avoid the assert in that case, too. Also, your check for TLI is redundant, since you already check for TD.

john.brawn added inline comments. Mar 6 2015, 10:08 AM

lib/CodeGen/CodeGenPrepare.cpp
1236 ↗	(On Diff #21355)	Doing it this way (return a bool, set a reference/pointer argument only if returning true) seems to be the standard way these queries are done in TargetInfo - see allowsMisalignedMemoryAccesses, getStackCookieLocation, GetAddrModeArguments.

hfinkel added inline comments. Mar 6 2015, 10:11 AM

include/llvm/Target/TargetLowering.h
980 ↗	(On Diff #21355)	Please name AllocaSize -> MinAllocaSize to make its purpose clearer.
lib/CodeGen/CodeGenPrepare.cpp
1235 ↗	(On Diff #21355)	Don't initialize these here; that makes it clear that they must be set by the TLI call, and allows mistakes to be caught by tools that detect use of undefined values (valgrind, etc.).
1236 ↗	(On Diff #21355)	Quoting rengolin: "I'm not against the assert, I'm against passing a reference for AllocaSize and AllocaAlign on a check call. The call is not explicit that it's setting the two values here, nor is it on TargetLowering's default implementation. A more clear way would be to make a function whose only job is to set those variables and make it explicit on its name, like "setAlignAllocaArgs(CI, Size, Align)" and add "Size > 0" to the conditional. You can avoid the assert in that case, too." I disagree. So long as the values are not initialized above the call, it is clear that the call must be setting them (and returning true if it has done so). This pattern is used by a number of other callbacks, even some in this file (getStackCookieLocation, getPreIndexedAddressParts, getPostIndexedAddressParts, ShrinkDemandedConstant, etc.), and I think it is appropriate here.
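
The callback pattern debated above (return a bool, and set the reference arguments only when returning true) can be sketched without any LLVM dependency. All types and names below are hypothetical stand-ins for illustration; they are not the LLVM classes, and the values 8 and 4 are chosen only as an example.

```cpp
// Hypothetical stand-in for a call instruction; not an LLVM class.
struct Call {
  bool IsMemIntrinsic;
};

// Hypothetical stand-in for the target hook interface.
struct TargetHooks {
  virtual ~TargetHooks() = default;
  // Default: never request alignment, and leave both out-parameters
  // untouched.
  virtual bool shouldAlignPointerArgs(const Call &, unsigned & /*MinSize*/,
                                      unsigned & /*PrefAlign*/) const {
    return false;
  }
};

// A target that opts in. It sets both out-parameters on the true path only.
struct ExampleTarget : TargetHooks {
  bool shouldAlignPointerArgs(const Call &C, unsigned &MinSize,
                              unsigned &PrefAlign) const override {
    if (!C.IsMemIntrinsic)
      return false;
    MinSize = 8;   // illustrative value
    PrefAlign = 4; // illustrative value
    return true;
  }
};

// Caller: the out-parameters are deliberately left uninitialized and are
// read only when the hook returned true.
unsigned preferredAlignOrZero(const TargetHooks &T, const Call &C) {
  unsigned MinSize, PrefAlign;
  if (T.shouldAlignPointerArgs(C, MinSize, PrefAlign))
    return PrefAlign;
  return 0;
}
```

Leaving the locals uninitialized in the caller is the point made in the review: a hook that returns true without setting them would then be reported by tools that detect reads of undefined values, instead of being masked by a default.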

I may be reworking this patch to align not just allocas, but also global variables defined in this translation unit. The ultimate thing I'm trying to optimize is cases like

#include <string.h>

void fn() {
  char arr[31];
  strcpy(arr, "some string");
  // do something with arr
}

This patch aligns arr, but the variable generated for the string needs to be aligned as well. The plan was to do that in clang, but I'm thinking that maybe doing it here would be better.

Or maybe it would be better to get this checked in then do that, but the name shouldAlignAllocaArgs would have to be changed (maybe shouldAlignPointerArgs).

New patch that addresses review comments. It also renames things slightly in expectation that we'll be also aligning GlobalVariables. I'll be doing that in an upcoming patch.

You're currently doing this only for pointers that are captured as call arguments. Do you care about pointers captured in any other way (by having their address stored somewhere, or returned, for example)?

lib/CodeGen/CodeGenPrepare.cpp
1256 ↗	(On Diff #21488)	may have improved -> may be able to improve

In D7908#136838, @hfinkel wrote:

You're currently doing this only for pointers that are captured as call arguments. Do you care about pointers captured in any other way (by having their address stored somewhere, or returned, for example)?

Maybe? The original patch did that as it just aligned everything that was used in a FrameIndex instruction. I guess the question is, how much do we care if we over-align things pointlessly? When it's an argument to a memcpy call it's definitely a good idea, but in an argument to some other call or when the address is stored somewhere it may turn out to be pointless (which is possibly an argument that I should be restricting this to only memcpy/move etc. calls).

Agreed.

I don't have a strong opinion here. I suppose I'm inclined to go for other kinds of captures as well, or to stick with just memcpy/move/set. Picking only function calls, which amounts to some, but not all, captures, seems like an inconsistency likely to cause different kinds of abstractions to get different optimizations, and that's likely not good. Let's go to one extreme or the other ;)

New patch attached that goes for the approach of just aligning memory intrinsics.

john.brawn updated this object. Mar 10 2015, 6:51 AM

hfinkel added inline comments. Mar 10 2015, 2:14 PM

lib/CodeGen/CodeGenPrepare.cpp
1252 ↗	(On Diff #21573)	If this is not an inbounds GEP, then this subtraction could underflow (it could overflow otherwise too, but without inbounds it could underflow even in a case where the code does not have undefined behavior). I recommend making sure the value being subtracted is not greater than the alloc size. Please add a test case for this.

john.brawn added inline comments. Mar 11 2015, 4:03 AM

lib/CodeGen/CodeGenPrepare.cpp
1252 ↗	(On Diff #21573)	stripAndAccumulateInBoundsConstantOffsets will stop if it hits a non-inbounds GEP, so we wouldn't get here in that case. Adding a test sounds like a good idea though, I'll do that.

hfinkel added inline comments. Mar 11 2015, 6:52 AM

lib/CodeGen/CodeGenPrepare.cpp
1252 ↗	(On Diff #21573)	Yea, but even for an inbounds gep, we should do the right thing (the code may have undefined behavior, but having that reset the alignment of some objects seems a bit silly).

Attached patch handles size<offset by doing the comparison in a different way (by checking size >= minsize+offset), and adds a bunch of tests of offset handling.
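
The difference between the two forms of the comparison can be sketched as follows. This is a minimal sketch with hypothetical helper names (neither function exists in LLVM), and it assumes MinSize + Offset does not itself overflow 64 bits.

```cpp
#include <cstdint>

// Subtracting form: when Offset > Size, the unsigned subtraction wraps
// around to a very large value, so the threshold check wrongly passes.
bool meetsThresholdNaive(uint64_t Size, uint64_t Offset, uint64_t MinSize) {
  return Size - Offset >= MinSize;
}

// Rearranged form: there is no subtraction, so an offset past the end of
// the object simply fails the check.
// Assumption: MinSize + Offset does not overflow.
bool meetsThreshold(uint64_t Size, uint64_t Offset, uint64_t MinSize) {
  return Size >= MinSize + Offset;
}
```

For a 13-byte object, an offset of 16, and a minimum size of 8, the subtracting form accepts the object and the rearranged form rejects it.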

While working on the next part of this (aligning global variables) I realised that we want to 8-byte align on some targets as it's faster, so the attached patch does that.

Looks good as far as I'm concerned. You might want to do the global variable handling as a separate patch.

This revision is now accepted and ready to land. Mar 17 2015, 11:13 AM

Closed by commit rL232627: [ARM] Align stack objects passed to memory intrinsics (authored by john.brawn). Mar 18 2015, 5:04 AM

This revision was automatically updated to reflect the committed changes.

dmgreen mentioned this in D132233: [CGP][ARM] Dont align memcpy args when optimization for size. Aug 19 2022, 7:38 AM

arichardson mentioned this in D134168: [RISCV] Make preferred alignment of PointerArgs for MemIntrinsic. Sep 20 2022, 3:32 AM

Revision Contents

Path	Size
llvm/trunk/include/llvm/Target/TargetLowering.h	9 lines
llvm/trunk/lib/CodeGen/CodeGenPrepare.cpp	36 lines
llvm/trunk/lib/Target/ARM/ARMISelLowering.h	3 lines
llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp	15 lines
llvm/trunk/test/CodeGen/ARM/memfunc.ll	297 lines
llvm/trunk/test/CodeGen/ARM/memset-inline.ll	2 lines

Diff 22173

llvm/trunk/include/llvm/Target/TargetLowering.h

[970 lines hidden] virtual bool getStackCookieLocation(unsigned &/*AddressSpace*/,
     return false;
   }

   /// Returns true if a cast between SrcAS and DestAS is a noop.
   virtual bool isNoopAddrSpaceCast(unsigned SrcAS, unsigned DestAS) const {
     return false;
   }

+  /// Return true if the pointer arguments to CI should be aligned by aligning
+  /// the object whose address is being passed. If so then MinSize is set to the
+  /// minimum size the object must be to be aligned and PrefAlign is set to the
+  /// preferred alignment.
+  virtual bool shouldAlignPointerArgs(CallInst * /*CI*/, unsigned & /*MinSize*/,
+                                      unsigned & /*PrefAlign*/) const {
+    return false;
+  }
+
   //===--------------------------------------------------------------------===//
   /// \name Helpers for TargetTransformInfo implementations
   /// @{

   /// Get the ISD node that corresponds to the Instruction class opcode.
   int InstructionOpcodeToISD(unsigned Opcode) const;

   /// Estimate the cost of type-legalization and the legalized type.
[1,781 lines hidden]

llvm/trunk/lib/CodeGen/CodeGenPrepare.cpp

[1,222 lines hidden] if (TLI->ExpandInlineAsm(CI)) {
       SunkAddrs.clear();
       return true;
     }
     // Sink address computing for memory operands into the block.
     if (OptimizeInlineAsmInst(CI))
       return true;
   }

+  const DataLayout *TD = TLI ? TLI->getDataLayout() : nullptr;
+
+  // Align the pointer arguments to this call if the target thinks it's a good
+  // idea
+  unsigned MinSize, PrefAlign;
+  if (TLI && TD && TLI->shouldAlignPointerArgs(CI, MinSize, PrefAlign)) {
+    for (auto &Arg : CI->arg_operands()) {
+      // We want to align both objects whose address is used directly and
+      // objects whose address is used in casts and GEPs, though it only makes
+      // sense for GEPs if the offset is a multiple of the desired alignment and
+      // if size - offset meets the size threshold.
+      if (!Arg->getType()->isPointerTy())
+        continue;
+      APInt Offset(TD->getPointerSizeInBits(
+                     cast<PointerType>(Arg->getType())->getAddressSpace()), 0);
+      Value *Val = Arg->stripAndAccumulateInBoundsConstantOffsets(*TD, Offset);
+      uint64_t Offset2 = Offset.getLimitedValue();
+      AllocaInst *AI;
+      if ((Offset2 & (PrefAlign-1)) == 0 &&
+          (AI = dyn_cast<AllocaInst>(Val)) &&
+          AI->getAlignment() < PrefAlign &&
+          TD->getTypeAllocSize(AI->getAllocatedType()) >= MinSize + Offset2)
+        AI->setAlignment(PrefAlign);
+      // TODO: Also align GlobalVariables
+    }
+    // If this is a memcpy (or similar) then we may be able to improve the
+    // alignment
+    if (MemIntrinsic *MI = dyn_cast<MemIntrinsic>(CI)) {
+      unsigned Align = getKnownAlignment(MI->getDest(), *TD);
+      if (MemTransferInst *MTI = dyn_cast<MemTransferInst>(MI))
+        Align = std::min(Align, getKnownAlignment(MTI->getSource(), *TD));
+      if (Align > MI->getAlignment())
+        MI->setAlignment(ConstantInt::get(MI->getAlignmentType(), Align));
+    }
+  }
+
   IntrinsicInst *II = dyn_cast<IntrinsicInst>(CI);
   if (II) {
     switch (II->getIntrinsicID()) {
     default: break;
     case Intrinsic::objectsize: {
       // Lower all uses of llvm.objectsize.*
       bool Min = (cast<ConstantInt>(II->getArgOperand(1))->getZExtValue() == 1);
       Type *ReturnTy = CI->getType();
[3,415 lines hidden]

llvm/trunk/lib/Target/ARM/ARMISelLowering.h

[362 lines hidden] public:
   const TargetRegisterClass *getRegClassFor(MVT VT) const override;

   /// Returns true if a cast between SrcAS and DestAS is a noop.
   bool isNoopAddrSpaceCast(unsigned SrcAS, unsigned DestAS) const override {
     // Addrspacecasts are always noops.
     return true;
   }

+  bool shouldAlignPointerArgs(CallInst *CI, unsigned &MinSize,
+                              unsigned &PrefAlign) const override;
+
   /// createFastISel - This method returns a target specific FastISel object,
   /// or null if the target does not support "fast" ISel.
   FastISel *createFastISel(FunctionLoweringInfo &funcInfo,
                            const TargetLibraryInfo *libInfo) const override;

   Sched::Preference getSchedulingPreference(SDNode *N) const override;

   bool
[243 lines hidden]

llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp

[36 lines hidden]
 #include "llvm/IR/CallingConv.h"
 #include "llvm/IR/Constants.h"
 #include "llvm/IR/Function.h"
 #include "llvm/IR/GlobalValue.h"
 #include "llvm/IR/IRBuilder.h"
 #include "llvm/IR/Instruction.h"
 #include "llvm/IR/Instructions.h"
 #include "llvm/IR/Intrinsics.h"
+#include "llvm/IR/IntrinsicInst.h"
 #include "llvm/IR/Type.h"
 #include "llvm/MC/MCSectionMachO.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/ErrorHandling.h"
 #include "llvm/Support/MathExtras.h"
 #include "llvm/Target/TargetOptions.h"
 #include <utility>
[1,105 lines hidden] if (Subtarget->hasNEON()) {
     if (VT == MVT::v4i64)
       return &ARM::QQPRRegClass;
     if (VT == MVT::v8i64)
       return &ARM::QQQQPRRegClass;
   }
   return TargetLowering::getRegClassFor(VT);
 }

+// memcpy, and other memory intrinsics, typically tries to use LDM/STM if the
+// source/dest is aligned and the copy size is large enough. We therefore want
+// to align such objects passed to memory intrinsics.
+bool ARMTargetLowering::shouldAlignPointerArgs(CallInst *CI, unsigned &MinSize,
+                                               unsigned &PrefAlign) const {
+  if (!isa<MemIntrinsic>(CI))
+    return false;
+  MinSize = 8;
+  // On ARM11 onwards (excluding M class) 8-byte aligned LDM is typically 1
+  // cycle faster than 4-byte aligned LDM.
+  PrefAlign = (Subtarget->hasV6Ops() && !Subtarget->isMClass() ? 8 : 4);
+  return true;
+}
+
 // Create a fast isel object.
 FastISel *
 ARMTargetLowering::createFastISel(FunctionLoweringInfo &funcInfo,
                                   const TargetLibraryInfo *libInfo) const {
   return ARM::createFastISel(funcInfo, libInfo);
 }

 Sched::Preference ARMTargetLowering::getSchedulingPreference(SDNode *N) const {
[10,122 lines hidden]

llvm/trunk/test/CodeGen/ARM/memfunc.ll

-; RUN: llc < %s -mtriple=armv7-apple-ios -o - | FileCheck %s
-; RUN: llc < %s -mtriple=thumbv7m-none-macho -o - | FileCheck %s --check-prefix=DARWIN
-; RUN: llc < %s -mtriple=arm-none-eabi -o - | FileCheck --check-prefix=EABI %s
-; RUN: llc < %s -mtriple=arm-none-eabihf -o - | FileCheck --check-prefix=EABI %s
+; RUN: llc < %s -mtriple=armv7-apple-ios -disable-post-ra -o - | FileCheck %s --check-prefix=CHECK-IOS --check-prefix=CHECK
+; RUN: llc < %s -mtriple=thumbv7m-none-macho -disable-post-ra -o - | FileCheck %s --check-prefix=CHECK-DARWIN --check-prefix=CHECK
+; RUN: llc < %s -mtriple=arm-none-eabi -disable-post-ra -o - | FileCheck %s --check-prefix=CHECK-EABI --check-prefix=CHECK
+; RUN: llc < %s -mtriple=arm-none-eabihf -disable-post-ra -o - | FileCheck %s --check-prefix=CHECK-EABI --check-prefix=CHECK

 @from = common global [500 x i32] zeroinitializer, align 4
 @to = common global [500 x i32] zeroinitializer, align 4

-define void @f() {
+define void @f1() {
 entry:
+; CHECK-LABEL: f1

-; CHECK: memmove
-; EABI: __aeabi_memmove
+; CHECK-IOS: memmove
+; CHECK-DARWIN: memmove
+; CHECK-EABI: __aeabi_memmove
 call void @llvm.memmove.p0i8.p0i8.i32(i8* bitcast ([500 x i32]* @from to i8*), i8* bitcast ([500 x i32]* @to to i8*), i32 500, i32 0, i1 false)

-; CHECK: memcpy
-; EABI: __aeabi_memcpy
+; CHECK-IOS: memcpy
+; CHECK-DARWIN: memcpy
+; CHECK-EABI: __aeabi_memcpy
 call void @llvm.memcpy.p0i8.p0i8.i32(i8* bitcast ([500 x i32]* @from to i8*), i8* bitcast ([500 x i32]* @to to i8*), i32 500, i32 0, i1 false)

 ; EABI memset swaps arguments
-; CHECK: mov r1, #0
-; CHECK: memset
-; DARWIN: movs r1, #0
-; DARWIN: memset
-; EABI: mov r2, #0
-; EABI: __aeabi_memset
+; CHECK-IOS: mov r1, #0
+; CHECK-IOS: memset
+; CHECK-DARWIN: movs r1, #0
+; CHECK-DARWIN: memset
+; CHECK-EABI: mov r2, #0
+; CHECK-EABI: __aeabi_memset
 call void @llvm.memset.p0i8.i32(i8* bitcast ([500 x i32]* @from to i8*), i8 0, i32 500, i32 0, i1 false)
 unreachable
 }

+; Check that alloca arguments to memory intrinsics are automatically aligned if at least 8 bytes in size
+define void @f2(i8* %dest, i32 %n) {
+entry:
+; CHECK-LABEL: f2
+
+; IOS (ARMv7) should 8-byte align, others should 4-byte align
+; CHECK-IOS: add r1, sp, #32
+; CHECK-IOS: memmove
+; CHECK-DARWIN: add r1, sp, #28
+; CHECK-DARWIN: memmove
+; CHECK-EABI: add r1, sp, #28
+; CHECK-EABI: __aeabi_memmove
+%arr0 = alloca [9 x i8], align 1
+%0 = bitcast [9 x i8]* %arr0 to i8*
+call void @llvm.memmove.p0i8.p0i8.i32(i8* %dest, i8* %0, i32 %n, i32 0, i1 false)
+
+; CHECK: add r1, sp, #16
+; CHECK-IOS: memcpy
+; CHECK-DARWIN: memcpy
+; CHECK-EABI: __aeabi_memcpy
+%arr1 = alloca [9 x i8], align 1
+%1 = bitcast [9 x i8]* %arr1 to i8*
+call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %1, i32 %n, i32 0, i1 false)
+
+; CHECK-IOS: mov r0, sp
+; CHECK-IOS: mov r1, #0
+; CHECK-IOS: memset
+; CHECK-DARINW: add r0, sp, #4
+; CHECK-DARWIN: movs r1, #0
+; CHECK-DARWIN: memset
+; CHECK-EABI: add r0, sp, #4
+; CHECK-EABI: mov r2, #0
+; CHECK-EABI: __aeabi_memset
+%arr2 = alloca [9 x i8], align 1
+%2 = bitcast [9 x i8]* %arr2 to i8*
+call void @llvm.memset.p0i8.i32(i8* %2, i8 0, i32 %n, i32 0, i1 false)
+
+unreachable
+}
+
+; Check that alloca arguments are not aligned if less than 8 bytes in size
+define void @f3(i8* %dest, i32 %n) {
+entry:
+; CHECK-LABEL: f3
+
+; CHECK: {{add(.w)? r1, sp, #17|sub(.w)? r1, r7, #15}}
+; CHECK-IOS: memmove
+; CHECK-DARWIN: memmove
+; CHECK-EABI: __aeabi_memmove
+%arr0 = alloca [7 x i8], align 1
+%0 = bitcast [7 x i8]* %arr0 to i8*
+call void @llvm.memmove.p0i8.p0i8.i32(i8* %dest, i8* %0, i32 %n, i32 0, i1 false)
+
+; CHECK: {{add(.w)? r1, sp, #10}}
+; CHECK-IOS: memcpy
+; CHECK-DARWIN: memcpy
+; CHECK-EABI: __aeabi_memcpy
+%arr1 = alloca [7 x i8], align 1
+%1 = bitcast [7 x i8]* %arr1 to i8*
+call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %1, i32 %n, i32 0, i1 false)
+
+; CHECK: {{add(.w)? r0, sp, #3}}
+; CHECK-IOS: mov r1, #0
+; CHECK-IOS: memset
+; CHECK-DARWIN: movs r1, #0
+; CHECK-DARWIN: memset
+; CHECK-EABI: mov r2, #0
+; CHECK-EABI: __aeabi_memset
+%arr2 = alloca [7 x i8], align 1
+%2 = bitcast [7 x i8]* %arr2 to i8*
+call void @llvm.memset.p0i8.i32(i8* %2, i8 0, i32 %n, i32 0, i1 false)
+
+unreachable
+}
+
+; Check that alloca arguments are not aligned if size+offset is less than 8 bytes
+define void @f4(i8* %dest, i32 %n) {
+entry:
+; CHECK-LABEL: f4
+
+; CHECK: {{add(.w)? r., sp, #23|sub(.w)? r., r7, #17}}
+; CHECK-IOS: memmove
+; CHECK-DARWIN: memmove
+; CHECK-EABI: __aeabi_memmove
+%arr0 = alloca [9 x i8], align 1
+%0 = getelementptr inbounds [9 x i8], [9 x i8]* %arr0, i32 0, i32 4
+call void @llvm.memmove.p0i8.p0i8.i32(i8* %dest, i8* %0, i32 %n, i32 0, i1 false)
+
+; CHECK: {{add(.w)? r., sp, #(10|14)}}
+; CHECK-IOS: memcpy
+; CHECK-DARWIN: memcpy
+; CHECK-EABI: __aeabi_memcpy
+%arr1 = alloca [9 x i8], align 1
+%1 = getelementptr inbounds [9 x i8], [9 x i8]* %arr1, i32 0, i32 4
+call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %1, i32 %n, i32 0, i1 false)
+
+; CHECK: {{add(.w)? r., sp, #(1|5)}}
+; CHECK-IOS: mov r1, #0
+; CHECK-IOS: memset
+; CHECK-DARWIN: movs r1, #0
+; CHECK-DARWIN: memset
+; CHECK-EABI: mov r2, #0
+; CHECK-EABI: __aeabi_memset
+%arr2 = alloca [9 x i8], align 1
+%2 = getelementptr inbounds [9 x i8], [9 x i8]* %arr2, i32 0, i32 4
+call void @llvm.memset.p0i8.i32(i8* %2, i8 0, i32 %n, i32 0, i1 false)
+
+unreachable
+}
+
+; Check that alloca arguments are not aligned if the offset is not a multiple of 4
+define void @f5(i8* %dest, i32 %n) {
+entry:
+; CHECK-LABEL: f5
+
+; CHECK: {{add(.w)? r., sp, #27|sub(.w)? r., r7, #21}}
+; CHECK-IOS: memmove
+; CHECK-DARWIN: memmove
+; CHECK-EABI: __aeabi_memmove
+%arr0 = alloca [13 x i8], align 1
+%0 = getelementptr inbounds [13 x i8], [13 x i8]* %arr0, i32 0, i32 1
+call void @llvm.memmove.p0i8.p0i8.i32(i8* %dest, i8* %0, i32 %n, i32 0, i1 false)
+
+; CHECK: {{add(.w)? r., sp, #(10|14)}}
+; CHECK-IOS: memcpy
+; CHECK-DARWIN: memcpy
+; CHECK-EABI: __aeabi_memcpy
+%arr1 = alloca [13 x i8], align 1
+%1 = getelementptr inbounds [13 x i8], [13 x i8]* %arr1, i32 0, i32 1
+call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %1, i32 %n, i32 0, i1 false)
+
+; CHECK: {{add(.w)? r., sp, #(1|5)}}
+; CHECK-IOS: mov r1, #0
+; CHECK-IOS: memset
+; CHECK-DARWIN: movs r1, #0
+; CHECK-DARWIN: memset
+; CHECK-EABI: mov r2, #0
+; CHECK-EABI: __aeabi_memset
+%arr2 = alloca [13 x i8], align 1
+%2 = getelementptr inbounds [13 x i8], [13 x i8]* %arr2, i32 0, i32 1
+call void @llvm.memset.p0i8.i32(i8* %2, i8 0, i32 %n, i32 0, i1 false)
+
+unreachable
+}
+
+; Check that alloca arguments are not aligned if the offset is unknown
+define void @f6(i8* %dest, i32 %n, i32 %i) {
+entry:
+; CHECK-LABEL: f6
+
+; CHECK: {{add(.w)? r., sp, #27|sub(.w)? r., r7, #25}}
+; CHECK-IOS: memmove
+; CHECK-DARWIN: memmove
+; CHECK-EABI: __aeabi_memmove
+%arr0 = alloca [13 x i8], align 1
+%0 = getelementptr inbounds [13 x i8], [13 x i8]* %arr0, i32 0, i32 %i
+call void @llvm.memmove.p0i8.p0i8.i32(i8* %dest, i8* %0, i32 %n, i32 0, i1 false)
+
+; CHECK: {{add(.w)? r., sp, #(10|14)}}
+; CHECK-IOS: memcpy
+; CHECK-DARWIN: memcpy
+; CHECK-EABI: __aeabi_memcpy
+%arr1 = alloca [13 x i8], align 1
+%1 = getelementptr inbounds [13 x i8], [13 x i8]* %arr1, i32 0, i32 %i
+call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %1, i32 %n, i32 0, i1 false)
+
+; CHECK: {{add(.w)? r., sp, #(1|5)}}
+; CHECK-IOS: mov r1, #0
+; CHECK-IOS: memset
+; CHECK-DARWIN: movs r1, #0
+; CHECK-DARWIN: memset
+; CHECK-EABI: mov r2, #0
+; CHECK-EABI: __aeabi_memset
+%arr2 = alloca [13 x i8], align 1
+%2 = getelementptr inbounds [13 x i8], [13 x i8]* %arr2, i32 0, i32 %i
+call void @llvm.memset.p0i8.i32(i8* %2, i8 0, i32 %n, i32 0, i1 false)
+
+unreachable
+}
+
+; Check that alloca arguments are not aligned if the GEP is not inbounds
+define void @f7(i8* %dest, i32 %n) {
+entry:
+; CHECK-LABEL: f7
+
+; CHECK: {{add(.w)? r., sp, #27|sub(.w)? r., r7, #21}}
+; CHECK-IOS: memmove
+; CHECK-DARWIN: memmove
+; CHECK-EABI: __aeabi_memmove
+%arr0 = alloca [13 x i8], align 1
+%0 = getelementptr [13 x i8], [13 x i8]* %arr0, i32 0, i32 4
+call void @llvm.memmove.p0i8.p0i8.i32(i8* %dest, i8* %0, i32 %n, i32 0, i1 false)
+
+; CHECK: {{add(.w)? r., sp, #(10|14)}}
+; CHECK-IOS: memcpy
+; CHECK-DARWIN: memcpy
+; CHECK-EABI: __aeabi_memcpy
+%arr1 = alloca [13 x i8], align 1
+%1 = getelementptr [13 x i8], [13 x i8]* %arr1, i32 0, i32 4
+call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %1, i32 %n, i32 0, i1 false)
+
+; CHECK: {{add(.w)? r., sp, #(1|5)}}
+; CHECK-IOS: mov r1, #0
+; CHECK-IOS: memset
+; CHECK-DARWIN: movs r1, #0
+; CHECK-DARWIN: memset
+; CHECK-EABI: mov r2, #0
+; CHECK-EABI: __aeabi_memset
+%arr2 = alloca [13 x i8], align 1
+%2 = getelementptr [13 x i8], [13 x i8]* %arr2, i32 0, i32 4
+call void @llvm.memset.p0i8.i32(i8* %2, i8 0, i32 %n, i32 0, i1 false)
+
+unreachable
+}
+
+; Check that alloca arguments are not aligned when the offset is past the end of the allocation
+define void @f8(i8* %dest, i32 %n) {
+entry:
+; CHECK-LABEL: f8
+
+; CHECK: {{add(.w)? r., sp, #27|sub(.w)? r., r7, #21}}
+; CHECK-IOS: memmove
+; CHECK-DARWIN: memmove
+; CHECK-EABI: __aeabi_memmove
+%arr0 = alloca [13 x i8], align 1
+%0 = getelementptr inbounds [13 x i8], [13 x i8]* %arr0, i32 0, i32 16
+call void @llvm.memmove.p0i8.p0i8.i32(i8* %dest, i8* %0, i32 %n, i32 0, i1 false)
+
+; CHECK: {{add(.w)? r., sp, #(10|14)}}
+; CHECK-IOS: memcpy
+; CHECK-DARWIN: memcpy
+; CHECK-EABI: __aeabi_memcpy
+%arr1 = alloca [13 x i8], align 1
+%1 = getelementptr inbounds [13 x i8], [13 x i8]* %arr1, i32 0, i32 16
+call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %1, i32 %n, i32 0, i1 false)
+
+; CHECK: {{add(.w)? r., sp, #(1|5)}}
+; CHECK-IOS: mov r1, #0
+; CHECK-IOS: memset
+; CHECK-DARWIN: movs r1, #0
+; CHECK-DARWIN: memset
+; CHECK-EABI: mov r2, #0
+; CHECK-EABI: __aeabi_memset
+%arr2 = alloca [13 x i8], align 1
+%2 = getelementptr inbounds [13 x i8], [13 x i8]* %arr2, i32 0, i32 16
+call void @llvm.memset.p0i8.i32(i8* %2, i8 0, i32 %n, i32 0, i1 false)
+
+unreachable
+}
+
 declare void @llvm.memmove.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i32, i1) nounwind
 declare void @llvm.memcpy.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i32, i1) nounwind
 declare void @llvm.memset.p0i8.i32(i8* nocapture, i8, i32, i32, i1) nounwind

llvm/trunk/test/CodeGen/ARM/memset-inline.ll

[11 lines hidden]
 }

 define void @t2() nounwind ssp {
 entry:
 ; CHECK-LABEL: t2:
 ; CHECK: add.w r1, r0, #10
 ; CHECK: vmov.i32 {{q[0-9]+}}, #0x0
 ; CHECK: vst1.16 {d{{[0-9]+}}, d{{[0-9]+}}}, [r1]
-; CHECK: vst1.32 {d{{[0-9]+}}, d{{[0-9]+}}}, [r0]
+; CHECK: vst1.64 {d{{[0-9]+}}, d{{[0-9]+}}}, [r0]
 %buf = alloca [26 x i8], align 1
 %0 = getelementptr inbounds [26 x i8], [26 x i8]* %buf, i32 0, i32 0
 call void @llvm.memset.p0i8.i32(i8* %0, i8 0, i32 26, i32 1, i1 false)
 call void @something(i8* %0) nounwind
 ret void
 }

 declare void @something(i8*) nounwind
 declare void @llvm.memset.p0i8.i32(i8* nocapture, i8, i32, i32, i1) nounwind
 declare void @llvm.memset.p0i8.i64(i8* nocapture, i8, i64, i32, i1) nounwind