Ignoring for a moment how I would like to use this, I think this is a nice self-contained improvement (or perhaps even a bug fix), so that getInstructionCost/getIntrinsicCost returns something a lot more reasonable for memcpys. I think the dependent refactoring patches are general improvements too. So, a friendly ping: any opinions on this?

olista01 added inline comments. Mar 29 2019, 3:28 AM

lib/Target/ARM/ARMTargetTransformInfo.cpp
Line 404 (on Diff #192345): I can't find the old definition or any uses of this function in the existing code or in the linked patches (D59785, D59766). Have I missed a patch?

Ah, oops, that was an experiment in my local branch. I will get rid of it and upload a new diff soon. Thanks for spotting this; you haven't missed anything!

Cleaned up the diff and tweaked the cost for generating a library call.

SjoerdMeijer mentioned this in D59785: [TargetLowering] Change getOptimalMemOpType to take a function attribute list. Apr 1 2019, 11:53 AM

Ran the code-size benchmark CSiBe, which triggered an assert in findOptimalMemOpLowering; fixed that.

These are the code-size results:

App	Before	After	Diff
csibe/OpenTCP-1.0.4	21879	21895	0.07%
csibe/bzip2-1.0.2	47350	47390	0.08%
csibe/cg_compiler_opensrc	96212	96184	-0.03%
csibe/compiler	19286	19297	0.06%
csibe/jikespg-1.3	170538	170199	-0.20%
csibe/jpeg-6b	96431	96441	0.01%
csibe/libpng-1.2.5	77702	77902	0.26%
csibe/lwip-0.5.3.preproc	68611	68627	0.02%
csibe/mpeg2dec-0.3.1	38988	38924	-0.16%
csibe/mpgcut-1.1	7196	7198	0.03%
csibe/teem-1.6.0-src	1033815	1034595	0.08%
csibe/ttt-0.10.1.preproc	12711	12703	-0.06%
csibe/unrarlib-0.4.0	10394	10415	0.20%
csibe/zlib-1.1.4	29229	29228	0.00%
total	1730343	1730999	0.04%
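The Diff column is the relative size change, (After - Before) / Before * 100, rounded to two decimals. A small C++ helper reproduces it; the function names are illustrative only and not part of the patch:

```cpp
#include <cassert>
#include <cmath>

// Relative code-size change in percent: (after - before) / before * 100.
double percentChange(double before, double after) {
  return (after - before) / before * 100.0;
}

// Round to two decimal places, the precision used in the Diff column.
double roundTo2(double x) { return std::round(x * 100.0) / 100.0; }

// Example rows from the table:
//   OpenTCP: percentChange(21879, 21895) is about 0.073, shown as 0.07%
//   libpng:  percentChange(77702, 77902) is about 0.257, shown as 0.26%
```

The same helper gives -0.20% for jikespg and 0.04% for the total row.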

Overall, the differences are small and the change is neutral, but I will look into libpng to see what is happening there.

Removed the change in findOptimalMemOpLowering, because that belongs to D59766 which I have just updated.

About the regression: my local tree had a logic error in findOptimalMemOpLowering, i.e. the condition to bail early was wrong. It was correct in the upstream diffs, so somehow I managed to mess that up locally.
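The early bail sits at the top of findOptimalMemOpLowering. As far as I can tell from the sources of that period, it rejects inline expansion whenever the source is known to be less aligned than the destination. A minimal standalone sketch of that predicate, assuming this form; mayExpandInline is a hypothetical name, not an LLVM function:

```cpp
#include <cassert>

// Sketch of the early-bail condition, assumed to mirror
//   if (!(SrcAlign == 0 || SrcAlign >= DstAlign)) return false;
// SrcAlign == 0 means no load is needed (memset, or memcpy from a constant
// string). A false result means "do not expand inline": the caller then
// emits a library call instead.
bool mayExpandInline(unsigned DstAlign, unsigned SrcAlign) {
  return SrcAlign == 0 || SrcAlign >= DstAlign;
}
```

Under this condition, memcpy_1_al41 in memcpy.ll (destination align 4, source align 1) gets the library-call cost of 4, while memcpy_1_al14 (destination align 1, source align 4) is expanded inline.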

Now, the good news is that the regressions disappear. In fact, there are no changes at all anymore: because of the logic error, it was triggering more often, also for other intrinsics. Thus, this is now almost a non-functional change, which is what I expected, as this is just the initial plumbing to model the cost correctly.

Investigating the "regressions" was a useful exercise, though. I learned that in libpng it was actually making the right decisions, but we were unlucky with register allocation: it allocated high registers such as R9 and R10, which rule out the narrow (16-bit) encodings of LDRB and STRB. The other regression, in TEEM, occurred because memset intrinsics were also being transformed into __aeabi_memclr4 calls. This was simply wrong, and a result of not bailing early in findOptimalMemOpLowering. It does show, though, that I want to look at removing this early bail in a follow-up patch: with the memclr issue fixed, that should be a win overall and reduce code size.
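On Thumb, the 16-bit (T1) encodings of LDRB/STRB with an immediate offset can only name the low registers r0-r7 and take an unsigned 5-bit byte offset (0-31). With R9 or R10 the 32-bit ldrb.w/strb.w forms are needed instead, so libpng grew even though the expansion decisions were right. A rough size model; the helper names are hypothetical, register-offset and negative-offset forms are ignored, and the offset is assumed to fit the 32-bit encoding:

```cpp
#include <cassert>

// True if LDRB/STRB Rt, [Rn, #Offset] fits the 16-bit Thumb T1 encoding:
// both registers must be low (r0-r7) and the offset an unsigned imm5.
bool fitsNarrowLdrbStrb(unsigned Rt, unsigned Rn, int Offset) {
  return Rt <= 7 && Rn <= 7 && Offset >= 0 && Offset <= 31;
}

// Encoded size in bytes: 2 for the narrow form, otherwise 4 (ldrb.w/strb.w),
// assuming the offset is small enough for the 32-bit encoding.
unsigned ldrbStrbSize(unsigned Rt, unsigned Rn, int Offset) {
  return fitsNarrowLdrbStrb(Rt, Rn, Offset) ? 2u : 4u;
}
```

The strict-align memcpy_4 expansion in memcpy.ll shows the effect: the accesses through r12 are ldrb.w/strb.w (4 bytes each), while those through r1-r3 use the 2-byte forms.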

To summarise: this causes no code-size changes for two code bases, CSiBe and Mbed, and is thus almost a non-functional change, which was my intention.

dmgreen added inline comments. Apr 24 2019, 1:58 PM

lib/Target/ARM/ARMTargetTransformInfo.cpp
Line 412 (on Diff #193888): Can you explain why this is more than 6? I would imagine it would be something like 4; 1 (for the call) + a few for argument setup.
Line 436 (on Diff #193888): Does this need to be clang-formatted?
test/Analysis/CostModel/ARM/memcpy.ll
Line 373 (on Diff #196444): Is it worth adding a few strict-align tests too?

Hi Dave, many thanks for taking a look!

Can you explain why this is more than 6? I would imagine it would be something like 4; 1 (for the call) + a few for argument setup.

6 was arbitrary: I wanted to make it slightly more costly than TCC_Expensive, but I don't have a good reason for that at the moment, so I've changed it back to 4.

Does this need to be clang-formatted?

I did, and made a few changes. However, to keep the inline comments next to their arguments readable, I kept them on separate lines (clang-format puts them one after another).

Is it worth adding a few strict-align tests too?

Yes, thanks for the suggestion. Done!

dmgreen added inline comments. Apr 26 2019, 3:16 AM

lib/Target/ARM/ARMTargetTransformInfo.cpp
Line 420 (on Diff #196632): How come this is needed now? I was under the impression that findOptimalMemOpLowering handles unaligned cases. And if the memory is aligned (but size >= 8), we can still expand the memcpy.
Line 434 (on Diff #196632): Should we be using the same value as getMaxStoresPerMemcpy here? That way we may not need the "Size >= 32 && ..." checks above.

Thanks for the suggestion to use getMaxStoresPerMemcpy; that greatly simplified and cleaned things up. I have also added more tests (all functions are now tested with and without strict-align).

Nice. LGTM

lib/Target/ARM/ARMTargetTransformInfo.cpp
Line 422 (on Diff #197094): F->hasMinSize()
test/Analysis/CostModel/ARM/memcpy.ll
Line 180 (on Diff #197094): This can be removed?

This revision is now accepted and ready to land. Apr 29 2019, 8:17 AM

SjoerdMeijer mentioned this in rL359537: [TargetLowering] Change getOptimalMemOpType to take a function attribute list. Apr 30 2019, 1:37 AM

SjoerdMeijer mentioned this in rG180f1ae57c9d: [TargetLowering] Change getOptimalMemOpType to take a function attribute list.

SjoerdMeijer mentioned this in rG0ed4619679b5: [TargetLowering] findOptimalMemOpLowering. NFCI. Apr 30 2019, 3:09 AM

SjoerdMeijer mentioned this in rL359543: [TargetLowering] findOptimalMemOpLowering. NFCI.

Closed by commit rL359547: [ARM] Implement TTI::getMemcpyCost (authored by SjoerdMeijer). Apr 30 2019, 3:28 AM

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. Apr 30 2019, 3:28 AM

Diff 197285

llvm/trunk/lib/Analysis/TargetTransformInfo.cpp

@@ ... @@
 int TargetTransformInfo::getAddressComputationCost(Type *Tp,
                                                    ScalarEvolution *SE,
                                                    const SCEV *Ptr) const {
   int Cost = TTIImpl->getAddressComputationCost(Tp, SE, Ptr);
   assert(Cost >= 0 && "TTI should not produce negative costs!");
   return Cost;
 }
 
+int TargetTransformInfo::getMemcpyCost(const Instruction *I) const {
+  int Cost = TTIImpl->getMemcpyCost(I);
+  assert(Cost >= 0 && "TTI should not produce negative costs!");
+  return Cost;
+}
+
 int TargetTransformInfo::getArithmeticReductionCost(unsigned Opcode, Type *Ty,
                                                     bool IsPairwiseForm) const {
   int Cost = TTIImpl->getArithmeticReductionCost(Opcode, Ty, IsPairwiseForm);
   assert(Cost >= 0 && "TTI should not produce negative costs!");
   return Cost;
 }
 
 int TargetTransformInfo::getMinMaxReductionCost(Type *Ty, Type *CondTy,
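The TargetTransformInfo.cpp change follows the usual pattern: the public wrapper forwards to the target implementation (TTIImpl) and asserts that the returned cost is non-negative. A self-contained sketch of that forwarding pattern; all classes here are hypothetical stand-ins, not LLVM's real TargetTransformInfo or Instruction types:

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Instruction {}; // placeholder for llvm::Instruction

// Interface a target implements (stand-in for the TTI implementation).
struct CostModelImpl {
  virtual ~CostModelImpl() = default;
  virtual int getMemcpyCost(const Instruction *I) const = 0;
};

// Public wrapper: forwards to the implementation and checks the result.
class CostModel {
  std::unique_ptr<CostModelImpl> Impl;

public:
  explicit CostModel(std::unique_ptr<CostModelImpl> TargetImpl)
      : Impl(std::move(TargetImpl)) {}

  int getMemcpyCost(const Instruction *I) const {
    int Cost = Impl->getMemcpyCost(I);
    assert(Cost >= 0 && "TTI should not produce negative costs!");
    return Cost;
  }
};

// Toy target returning a fixed cost, e.g. the library-call cost of 4.
struct FixedCostImpl : CostModelImpl {
  int Cost;
  explicit FixedCostImpl(int C) : Cost(C) {}
  int getMemcpyCost(const Instruction *) const override { return Cost; }
};
```

Keeping the check in the wrapper means every target hook is validated in one place, rather than each target repeating the assertion.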

llvm/trunk/lib/Target/ARM/ARMTargetTransformInfo.h

@@ ... @@ unsigned getRegisterBitWidth(bool Vector) const {
 
     return 32;
   }
 
   unsigned getMaxInterleaveFactor(unsigned VF) {
     return ST->getMaxInterleaveFactor();
   }
 
+  int getMemcpyCost(const Instruction *I);
+
   int getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index, Type *SubTp);
 
   int getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
                        const Instruction *I = nullptr);
 
   int getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy,
                          const Instruction *I = nullptr);

llvm/trunk/lib/Target/ARM/ARMTargetTransformInfo.cpp

@@ ... @@
 #include "llvm/CodeGen/ISDOpcodes.h"
 #include "llvm/CodeGen/ValueTypes.h"
 #include "llvm/IR/BasicBlock.h"
 #include "llvm/IR/CallSite.h"
 #include "llvm/IR/DataLayout.h"
 #include "llvm/IR/DerivedTypes.h"
 #include "llvm/IR/Instruction.h"
 #include "llvm/IR/Instructions.h"
+#include "llvm/IR/IntrinsicInst.h"
 #include "llvm/IR/Type.h"
 #include "llvm/MC/SubtargetFeature.h"
 #include "llvm/Support/Casting.h"
 #include "llvm/Support/MachineValueType.h"
 #include "llvm/Target/TargetMachine.h"
 #include <algorithm>
 #include <cassert>
 #include <cstdint>
@@ ... @@ if (Ty->isVectorTy() && SE &&
       !BaseT::isConstantStridedAccessLessThan(SE, Ptr, MaxMergeDistance + 1))
     return NumVectorInstToHideOverhead;
 
   // In many cases the address computation is not merged into the instruction
   // addressing mode.
   return 1;
 }
 
+int ARMTTIImpl::getMemcpyCost(const Instruction *I) {
+  const MemCpyInst *MI = dyn_cast<MemCpyInst>(I);
+  assert(MI && "MemcpyInst expected");
+  ConstantInt *C = dyn_cast<ConstantInt>(MI->getLength());
+
+  // To model the cost of a library call, we assume 1 for the call, and
+  // 3 for the argument setup.
+  const unsigned LibCallCost = 4;
+
+  // If 'size' is not a constant, a library call will be generated.
+  if (!C)
+    return LibCallCost;
+
+  const unsigned Size = C->getValue().getZExtValue();
+  const unsigned DstAlign = MI->getDestAlignment();
+  const unsigned SrcAlign = MI->getSourceAlignment();
+  const Function *F = I->getParent()->getParent();
+  const unsigned Limit = TLI->getMaxStoresPerMemmove(F->hasMinSize());
+  std::vector<EVT> MemOps;
+
+  // MemOps will be populated with a list of data types that need to be
+  // loaded and stored. That's why we multiply the number of elements by 2 to
+  // get the cost for this memcpy.
+  if (getTLI()->findOptimalMemOpLowering(
+          MemOps, Limit, Size, DstAlign, SrcAlign, false /*IsMemset*/,
+          false /*ZeroMemset*/, false /*MemcpyStrSrc*/, false /*AllowOverlap*/,
+          MI->getDestAddressSpace(), MI->getSourceAddressSpace(),
+          F->getAttributes()))
+    return MemOps.size() * 2;
+
+  // If we can't find an optimal memop lowering, return the default cost
+  return LibCallCost;
+}
+
 int ARMTTIImpl::getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index,
                                Type *SubTp) {
   if (Kind == TTI::SK_Broadcast) {
     static const CostTblEntry NEONDupTbl[] = {
         // VDUP handles these cases.
         {ISD::VECTOR_SHUFFLE, MVT::v2i32, 1},
         {ISD::VECTOR_SHUFFLE, MVT::v2f32, 1},
         {ISD::VECTOR_SHUFFLE, MVT::v2i64, 1},
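To see where the expected costs in memcpy.ll come from, the getMemcpyCost logic can be modelled without LLVM. This is a simplified sketch under stated assumptions: a Thumb2 core without NEON (like the thumbv7m test triple), so the widest access is 4 bytes; a store limit of 4 standing in for the getMaxStoresPerMem* value (an assumption that matches the test expectations when minsize is not set); and a greedy split that mimics, but is not, findOptimalMemOpLowering. countMemOps and memcpyCost are hypothetical names.

```cpp
#include <cassert>
#include <optional>

// 1 for the call + 3 for argument setup, as in the patch.
constexpr int LibCallCost = 4;

// Greedily split Size bytes into the widest allowed chunks. Returns the
// number of load/store pairs, or std::nullopt if no inline expansion is used.
std::optional<unsigned> countMemOps(unsigned Size, unsigned DstAlign,
                                    unsigned SrcAlign, bool StrictAlign,
                                    unsigned Limit) {
  assert(DstAlign != 0 && SrcAlign != 0 && "model needs known alignments");
  // Early bail: source less aligned than destination.
  if (SrcAlign < DstAlign)
    return std::nullopt;
  // Widest chunk: 4 bytes if unaligned access is allowed or the destination
  // is 4-byte aligned; otherwise limited by the destination alignment.
  unsigned Chunk = 4;
  if (StrictAlign && DstAlign < 4)
    Chunk = (DstAlign >= 2) ? 2 : 1;
  unsigned NumOps = 0;
  while (Size != 0) {
    while (Chunk > Size)
      Chunk /= 2; // shrink to the next narrower type (no overlapping ops)
    if (++NumOps > Limit)
      return std::nullopt; // too many stores: fall back to a library call
    Size -= Chunk;
  }
  return NumOps;
}

// Cost = one load plus one store per memop, or the library-call cost.
// Size is std::nullopt when the length is not a compile-time constant.
int memcpyCost(std::optional<unsigned> Size, unsigned DstAlign,
               unsigned SrcAlign, bool StrictAlign, unsigned Limit = 4) {
  if (!Size)
    return LibCallCost;
  if (auto NumOps = countMemOps(*Size, DstAlign, SrcAlign, StrictAlign, Limit))
    return static_cast<int>(*NumOps) * 2;
  return LibCallCost;
}
```

For example, a 16-byte copy with align 1 costs 8 without strict-align (four word-sized load/store pairs) but 4 with +strict-align, because sixteen byte-sized pairs exceed the limit and a call to __aeabi_memcpy is assumed instead; this matches memcpy_16 in memcpy.ll.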

llvm/trunk/test/Analysis/CostModel/ARM/memcpy.ll

-; RUN: opt < %s -cost-model -analyze -cost-kind=code-size | FileCheck %s
+; RUN: opt < %s -cost-model -analyze -cost-kind=code-size | \
+; RUN: FileCheck %s --check-prefixes=COMMON,CHECK-NO-SA
+; RUN: opt < %s -cost-model -analyze -cost-kind=code-size -mattr=+strict-align | \
+; RUN: FileCheck %s --check-prefixes=COMMON,CHECK-SA
 
 target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
 target triple = "thumbv7m-arm-unknown-eabi"
 
-define void @memcpy(i8* %d, i8* %s, i32 %N) {
+;;;;;;;;;;;;
				; Align 1, 1
				;;;;;;;;;;;;

				define void @memcpy_1(i8* %d, i8* %s) {
				;
				; with/without strict-align:
				;
				; ldrb r1, [r1]
				; strb r1, [r0]
				;
				; COMMON: function 'memcpy_1'
				; CHECK-NO-SA-NEXT: cost of 2 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				; CHECK-SA-NEXT: cost of 2 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				;
				entry:
				call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %d, i8* align 1 %s, i32 1, i1 false)
				ret void
				}

				define void @memcpy_2(i8* %d, i8* %s) {
				;
				; no strict-align:
				;
				; ldrh r1, [r1]
				; strh r1, [r0]
				;
				; strict-align:
				;
				; ldrb r2, [r1]
				; ldrb r1, [r1, #1]
				; strb r1, [r0, #1]
				; strb r2, [r0]
				;
				; COMMON: function 'memcpy_2'
				; CHECK-NO-SA-NEXT: cost of 2 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				; CHECK-SA-NEXT: cost of 4 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				;
				entry:
				call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %d, i8* align 1 %s, i32 2, i1 false)
				ret void
				}

				define void @memcpy_3(i8* %d, i8* %s) {
				;
				; no strict-align:
				;
				; ldrb r2, [r1, #2]
				; strb r2, [r0, #2]
				; ldrh r1, [r1]
				; strh r1, [r0]
				;
				; strict-align:
				;
				; ldrb r2, [r1]
				; ldrb r3, [r1, #1]
				; ldrb r1, [r1, #2]
				; strb r1, [r0, #2]
				; strb r3, [r0, #1]
				; strb r2, [r0]
				;
				; COMMON: function 'memcpy_3'
				; CHECK-NO-SA-NEXT: cost of 4 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				; CHECK-SA-NEXT: cost of 6 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				;
				entry:
				call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %d, i8* align 1 %s, i32 3, i1 false)
				ret void
				}

				define void @memcpy_4(i8* %d, i8* %s) {
				;
				; no strict-align:
				;
				; ldr r1, [r1]
				; str r1, [r0]
				;
				; strict-align:
				;
				; ldrb.w r12, [r1]
				; ldrb r3, [r1, #1]
				; ldrb r2, [r1, #2]
				; ldrb r1, [r1, #3]
				; strb r1, [r0, #3]
				; strb r2, [r0, #2]
				; strb r3, [r0, #1]
				; strb.w r12, [r0]
				;
				; COMMON: function 'memcpy_4'
				; CHECK-NO-SA-NEXT: cost of 2 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				; CHECK-SA-NEXT: cost of 8 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				;
				entry:
				call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %d, i8* align 1 %s, i32 4, i1 false)
				ret void
				}

				define void @memcpy_8(i8* %d, i8* %s) {
				;
				; no strict-align:
				;
				; ldr r2, [r1]
				; ldr r1, [r1, #4]
				; str r1, [r0, #4]
				; str r2, [r0]
				;
				; strict-align:
				;
				; push {r7, lr}
				; movs r2, #8
				; bl __aeabi_memcpy
				; pop {r7, pc}
				;
				; COMMON: function 'memcpy_8'
				; COMMON-NEXT: cost of 4 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				;
				entry:
				call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %d, i8* align 1 %s, i32 8, i1 false)
				ret void
				}

				define void @memcpy_16(i8* %d, i8* %s) {
				;
				; no strict-align:
				;
				; ldr.w r12, [r1]
				; ldr r3, [r1, #4]
				; ldr r2, [r1, #8]
				; ldr r1, [r1, #12]
				; str r1, [r0, #12]
				; str r2, [r0, #8]
				; str r3, [r0, #4]
				; str.w r12, [r0]
				;
				; strict-align:
				;
				; push {r7, lr}
				; movs r2, #16
				; bl __aeabi_memcpy
				; pop {r7, pc}
				;
				; COMMON: function 'memcpy_16'
				; CHECK-NO-SA-NEXT: cost of 8 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				; CHECK-SA-NEXT: cost of 4 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				;
				entry:
				call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %d, i8* align 1 %s, i32 16, i1 false)
				ret void
				}

				define void @memcpy_32(i8* %d, i8* %s, i32 %N) {
				;
				; with/without strict-align:
				;
				; movs r2, #32
				; bl __aeabi_memcpy
				;
				; COMMON: function 'memcpy_32'
				; COMMON-NEXT: cost of 4 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				;
				entry:
				call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %d, i8* align 1 %s, i32 32, i1 false)
				ret void
				}

				define void @memcpy_N(i8* %d, i8* %s, i32 %N) {
				;
				; with/without strict-align:
				;
				; bl __aeabi_memcpy
				;
				; COMMON: function 'memcpy_N'
				; COMMON-NEXT: cost of 4 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				;
				entry:
				call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %d, i8* align 1 %s, i32 %N, i1 false)
				ret void
				}

				;;;;;;;;;;;;;
				; Align 2, 2
				;;;;;;;;;;;;;

				define void @memcpy_1_al2(i8* %d, i8* %s) {
				;
				; with/without strict-align:
				;
				; ldrb r1, [r1]
				; strb r1, [r0]
				;
				; COMMON: function 'memcpy_1_al2'
				; COMMON-NEXT: cost of 2 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				;
				entry:
				call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 2 %d, i8* align 2 %s, i32 1, i1 false)
				ret void
				}

				define void @memcpy_2_al2(i8* %d, i8* %s) {
				;
				; with/without strict-align:
				;
				; ldrh r1, [r1]
				; strh r1, [r0]
				;
				; COMMON: function 'memcpy_2_al2'
				; COMMON-NEXT: cost of 2 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				;
				entry:
				call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 2 %d, i8* align 2 %s, i32 2, i1 false)
				ret void
				}

				define void @memcpy_3_al2(i8* %d, i8* %s) {
				;
				; with/without strict-align:
				;
				; ldrb r2, [r1, #2]
				; strb r2, [r0, #2]
				; ldrh r1, [r1]
				; strh r1, [r0]
				;
				; COMMON: function 'memcpy_3_al2'
				; COMMON-NEXT: cost of 4 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				;
				entry:
				call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 2 %d, i8* align 2 %s, i32 3, i1 false)
				ret void
				}

				define void @memcpy_4_al2(i8* %d, i8* %s) {
				;
				; no strict-align:
				;
				; ldr r1, [r1]
				; str r1, [r0]
				;
				; strict-align:
				;
				; ldrh r2, [r1, #2]
				; strh r2, [r0, #2]
				; ldrh r1, [r1]
				; strh r1, [r0]
				;
				; COMMON: function 'memcpy_4_al2'
				; CHECK-NO-SA-NEXT: cost of 2 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				; CHECK-SA-NEXT: cost of 4 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				;
				entry:
				call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 2 %d, i8* align 2 %s, i32 4, i1 false)
				ret void
				}

				define void @memcpy_8_al2(i8* %d, i8* %s) {
				;
				; no strict-align:
				;
				; ldr r2, [r1]
				; ldr r1, [r1, #4]
				; str r1, [r0, #4]
				; str r2, [r0]
				;
				; strict-align:
				;
				; ldrh r2, [r1, #6]
				; strh r2, [r0, #6]
				; ldrh r2, [r1, #4]
				; strh r2, [r0, #4]
				; ldrh r2, [r1, #2]
				; strh r2, [r0, #2]
				; ldrh r1, [r1]
				; strh r1, [r0]
				;
				; COMMON: function 'memcpy_8_al2'
				; CHECK-NO-SA-NEXT: cost of 4 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				; CHECK-SA-NEXT: cost of 8 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				;
				entry:
				call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 2 %d, i8* align 2 %s, i32 8, i1 false)
				ret void
				}

				define void @memcpy_16_al2(i8* %d, i8* %s) {
				;
				; no strict-align:
				;
				; ldr.w r12, [r1]
				; ldr r3, [r1, #4]
				; ldr r2, [r1, #8]
				; ldr r1, [r1, #12]
				; str r1, [r0, #12]
				; str r2, [r0, #8]
				; str r3, [r0, #4]
				; str.w r12, [r0]
				;
				; strict-align:
				;
				; movs r2, #16
				; bl __aeabi_memcpy
				;
				; COMMON: function 'memcpy_16_al2'
				; CHECK-NO-SA-NEXT: cost of 8 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				; CHECK-SA-NEXT: cost of 4 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				;
				entry:
				call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 2 %d, i8* align 2 %s, i32 16, i1 false)
				ret void
				}

				define void @memcpy_32_al2(i8* %d, i8* %s, i32 %N) {
				;
				; with/without strict-align:
				;
				; movs r2, #32
				; bl __aeabi_memcpy
				;
				; COMMON: function 'memcpy_32_al2'
				; COMMON-NEXT: cost of 4 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				;
				entry:
				call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 2 %d, i8* align 2 %s, i32 32, i1 false)
				ret void
				}

				define void @memcpy_N_al2(i8* %d, i8* %s, i32 %N) {
				;
				; with/without strict-align:
				;
				; bl __aeabi_memcpy
				;
				; COMMON: function 'memcpy_N_al2'
				; COMMON-NEXT: cost of 4 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				;
				entry:
				call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 2 %d, i8* align 2 %s, i32 %N, i1 false)
				ret void
				}

				;;;;;;;;;;;;;
				; Align 4, 4
				;;;;;;;;;;;;;

				define void @memcpy_1_al4(i8* %d, i8* %s) {
				;
				; with/without strict-align:
				;
				; ldrb r1, [r1]
				; strb r1, [r0]
				;
				; COMMON: function 'memcpy_1_al4'
				; COMMON-NEXT: cost of 2 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				;
				entry:
				call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %d, i8* align 4 %s, i32 1, i1 false)
				ret void
				}

				define void @memcpy_2_al4(i8* %d, i8* %s) {
				;
				; with/without strict-align:
				;
				; ldrh r1, [r1]
				; strh r1, [r0]
				;
				; COMMON: function 'memcpy_2_al4'
				; COMMON-NEXT: cost of 2 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				;
				entry:
				call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %d, i8* align 4 %s, i32 2, i1 false)
				ret void
				}

				define void @memcpy_3_al4(i8* %d, i8* %s) {
				;
				; with/without strict-align:
				;
				; ldrb r2, [r1, #2]
				; strb r2, [r0, #2]
				; ldrh r1, [r1]
				; strh r1, [r0]
				;
				; COMMON: function 'memcpy_3_al4'
				; COMMON-NEXT: cost of 4 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				;
				entry:
				call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %d, i8* align 4 %s, i32 3, i1 false)
				ret void
				}

				define void @memcpy_4_al4(i8* %d, i8* %s) {
				;
				; with/without strict-align:
				;
				; ldr r1, [r1]
				; str r1, [r0]
				;
				; COMMON: function 'memcpy_4_al4'
				; COMMON-NEXT: cost of 2 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				;
				entry:
				call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %d, i8* align 4 %s, i32 4, i1 false)
				ret void
				}

				define void @memcpy_8_al4(i8* %d, i8* %s) {
				;
				; with/without strict-align:
				;
				; ldrd r2, r1, [r1]
				; strd r2, r1, [r0]
				;
				; COMMON: function 'memcpy_8_al4'
				; COMMON-NEXT: cost of 4 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				;
				entry:
				call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %d, i8* align 4 %s, i32 8, i1 false)
				ret void
				}

				define void @memcpy_16_al4(i8* %d, i8* %s) {
				;
				; with/without strict-align:
				;
				; ldm.w r1, {r2, r3, r12}
				; ldr r1, [r1, #12]
				; stm.w r0, {r2, r3, r12}
				; str r1, [r0, #12]
				;
				; COMMON: function 'memcpy_16_al4'
				; COMMON-NEXT: cost of 8 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				;
				entry:
				call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %d, i8* align 4 %s, i32 16, i1 false)
				ret void
				}

				define void @memcpy_32_al4(i8* %d, i8* %s, i32 %N) {
				;
				; with/without strict-align:
				;
				; ldm.w r1!, {r2, r3, r12, lr}
				; stm.w r0!, {r2, r3, r12, lr}
				; ldm.w r1, {r2, r3, r12, lr}
				; stm.w r0, {r2, r3, r12, lr}
				;
				; COMMON: function 'memcpy_32_al4'
				; COMMON-NEXT: cost of 4 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				;
				entry:
				call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %d, i8* align 4 %s, i32 32, i1 false)
				ret void
				}

				define void @memcpy_N_al4(i8* %d, i8* %s, i32 %N) {
				;
				; with/without strict-align:
				;
				; bl __aeabi_memcpy4
				;
				; COMMON: function 'memcpy_N_al4'
				; COMMON-NEXT: cost of 4 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				;
				entry:
				call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %d, i8* align 4 %s, i32 %N, i1 false)
				ret void
				}

				;;;;;;;;;;;;;
				; Align 1, 4
				;;;;;;;;;;;;;

				define void @memcpy_1_al14(i8* %d, i8* %s) {
				;
				; with/without strict-align:
				;
				; ldrb r1, [r1]
				; strb r1, [r0]
				;
				; COMMON: function 'memcpy_1_al14'
				; COMMON-NEXT: cost of 2 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				;
				entry:
				call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %d, i8* align 4 %s, i32 1, i1 false)
				ret void
				}

				define void @memcpy_2_al14(i8* %d, i8* %s) {
				;
				; no strict-align:
				;
				; ldrh r1, [r1]
				; strh r1, [r0]
				;
				; strict-align:
				;
				; ldrb r2, [r1]
				; ldrb r1, [r1, #1]
				; strb r1, [r0, #1]
				; strb r2, [r0]
				;
				; COMMON: function 'memcpy_2_al14'
				; CHECK-NO-SA-NEXT: cost of 2 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				; CHECK-SA-NEXT: cost of 4 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				;
				entry:
				call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %d, i8* align 4 %s, i32 2, i1 false)
				ret void
				}

				define void @memcpy_3_al14(i8* %d, i8* %s) {
				;
				; no strict-align:
				;
				; ldrb r2, [r1, #2]
				; strb r2, [r0, #2]
				; ldrh r1, [r1]
				; strh r1, [r0]
				;
				; strict-align:
				;
				; ldrb r2, [r1]
				; ldrb r3, [r1, #1]
				; ldrb r1, [r1, #2]
				; strb r1, [r0, #2]
				; strb r3, [r0, #1]
				; strb r2, [r0]
				;
				; COMMON: function 'memcpy_3_al14'
				; CHECK-NO-SA-NEXT: cost of 4 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				; CHECK-SA-NEXT: cost of 6 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				;
				entry:
				call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %d, i8* align 4 %s, i32 3, i1 false)
				ret void
				}

				define void @memcpy_4_al14(i8* %d, i8* %s) {
				;
				; no strict-align:
				;
				; ldr r1, [r1]
				; str r1, [r0]
				;
				; strict-align:
				;
				; ldrb.w r12, [r1]
				; ldrb r3, [r1, #1]
				; ldrb r2, [r1, #2]
				; ldrb r1, [r1, #3]
				; strb r1, [r0, #3]
				; strb r2, [r0, #2]
				; strb r3, [r0, #1]
				; strb.w r12, [r0]
				;
				; COMMON: function 'memcpy_4_al14'
				; CHECK-NO-SA-NEXT: cost of 2 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				; CHECK-SA-NEXT: cost of 8 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				;
				entry:
				call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %d, i8* align 4 %s, i32 4, i1 false)
				ret void
				}

				define void @memcpy_8_al14(i8* %d, i8* %s) {
				;
				; no strict-align:
				;
				; ldr r2, [r1]
				; ldr r1, [r1, #4]
				; str r1, [r0, #4]
				; str r2, [r0]
				;
				; strict-align:
				;
				; push {r7, lr}
				; movs r2, #8
				; bl __aeabi_memcpy
				; pop {r7, pc}
				;
				; COMMON: function 'memcpy_8_al14'
				; COMMON-NEXT: cost of 4 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				;
				entry:
				call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %d, i8* align 4 %s, i32 8, i1 false)
				ret void
				}

				define void @memcpy_16_al14(i8* %d, i8* %s) {
				;
				; no strict-align:
				;
				; ldr.w r12, [r1]
				; ldr r3, [r1, #4]
				; ldr r2, [r1, #8]
				; ldr r1, [r1, #12]
				; str r1, [r0, #12]
				; str r2, [r0, #8]
				; str r3, [r0, #4]
				; str.w r12, [r0]
				;
				; strict-align:
				;
				; movs r2, #16
				; bl __aeabi_memcpy
				;
				; COMMON: function 'memcpy_16_al14'
				; CHECK-NO-SA-NEXT: cost of 8 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				; CHECK-SA-NEXT: cost of 4 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				;
				entry:
				call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %d, i8* align 4 %s, i32 16, i1 false)
				ret void
				}

				define void @memcpy_32_al14(i8* %d, i8* %s) {
				;
				; with/without strict-align:
				;
				; movs r2, #32
				; bl __aeabi_memcpy
				;
				; COMMON: function 'memcpy_32_al14'
				; COMMON-NEXT: cost of 4 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				;
				entry:
				call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %d, i8* align 4 %s, i32 32, i1 false)
				ret void
				}

				define void @memcpy_N_al14(i8* %d, i8* %s, i32 %N) {
				;
				; with/without strict-align:
				;
				; bl __aeabi_memcpy4
				;
				; COMMON: function 'memcpy_N_al14'
				; COMMON-NEXT: cost of 4 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				;
				entry:
				call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %d, i8* align 4 %s, i32 %N, i1 false)
				ret void
				}

				;;;;;;;;;;;;;
				; Align 4, 1
				;;;;;;;;;;;;;

				define void @memcpy_1_al41(i8* %d, i8* %s) {
				;
				; with/without strict-align:
				;
				; ldrb r1, [r1]
				; strb r1, [r0]
				;
				; COMMON: function 'memcpy_1_al41'
				; COMMON-NEXT: cost of 4 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
				;
 entry:
-; CHECK: cost of 4 for instruction: call void @llvm.memcpy.p0i8.p0i8.i32
-  call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %d, i8* align 1 %s, i32 36, i1 false)
+  call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %d, i8* align 1 %s, i32 1, i1 false)
   ret void
 }
 
 declare void @llvm.memcpy.p0i8.p0i8.i32(i8* nocapture writeonly, i8* nocapture readonly, i32, i1) #1

This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Implement TTI::getMemcpyCost
Closed, Public
