This is an archive of the discontinued LLVM Phabricator instance.

Differential D12035

[ARM] Improve cost model to handle sdiv by a pow-of-two.
Abandoned · Public

Authored by mcrosier on Aug 14 2015, 9:47 AM.


Details

Reviewers

t.p.northover
jmolloy
MatzeB
Gerolf

Summary

This patch improves the target-specific cost model to better handle signed
division by a power of two. The immediate result is that this enables the SLP
vectorizer and loop vectorizer to do a better job.

Just something I saw in passing. This is already done for the X86 and AArch64 backends.

Diff Detail

Repository: rL LLVM

Event Timeline

mcrosier updated this revision to Diff 32159. Aug 14 2015, 9:47 AM

mcrosier retitled this revision to [ARM] Improve cost model to handle sdiv by a pow-of-two.

mcrosier updated this object.

mcrosier added reviewers: t.p.northover, jmolloy, Gerolf.

mcrosier set the repository for this revision to rL LLVM.

mcrosier added subscribers: gberry, mssimpso, junbuml, bmakam.

Herald added subscribers: rengolin, aemerson. Aug 14 2015, 9:47 AM

mcrosier added a subscriber: llvm-commits. Aug 14 2015, 9:48 AM

mcrosier added a reviewer: MatzeB. Aug 17 2015, 5:39 AM

It seems like we may want to factor this code out (perhaps into a protected TargetTransformInfo function) so that all targets that use this DAG combine (which looks to be most of them) can use it as well.

I see a couple of potential issues with the cost estimation itself:

- The DAG combine adds an additional SUB if the divisor is negative, which isn't accounted for.
- The OperandValueKinds (Op1Info, Op2Info) are used for all of the operations, but they do not describe the actual operands of the intermediate instructions, since those instructions are not operating on the original operands of the SDIV. For example, the ADD instruction is an add of two intermediate results which we know nothing about (so they should be marked as OK_AnyValue), but operand 2 of the ADD will be marked as OK_UniformConstant. This may not make a difference for this particular pattern on the targets in question, but could potentially be incorrect for other targets.
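
The two points above can be sketched as a toy cost function. This is an illustration only, not the patch's code: `Op`, `OperandKind`, `instrCost`, and `sdivPow2Cost` are hypothetical stand-ins for the TargetTransformInfo machinery, and the flat cost of 1 per instruction is an assumption. The sketch charges the extra SUB for a negative divisor and describes the ADD's operands as unknown values instead of reusing the SDIV's operand kinds.

```cpp
// Hypothetical stand-ins for the TTI opcode and operand-kind enums.
enum class Op { AShr, LShr, Add, Sub };
enum class OperandKind { AnyValue, UniformConstantValue };

// Hypothetical per-instruction cost hook. A real target would consult its
// cost tables here; a flat cost of 1 is assumed for illustration.
inline int instrCost(Op, OperandKind, OperandKind) { return 1; }

// Cost of a scalar sdiv by a constant power of two, modelled as the
// SRA + SRL + ADD + SRA expansion, plus a SUB when the divisor is negative.
inline int sdivPow2Cost(bool divisorIsNegative) {
  using K = OperandKind;
  int cost = 0;
  // sign = x >> (bits - 1): shift amount is a constant.
  cost += instrCost(Op::AShr, K::AnyValue, K::UniformConstantValue);
  // bias = sign >>> (bits - k): shift amount is a constant.
  cost += instrCost(Op::LShr, K::AnyValue, K::UniformConstantValue);
  // x + bias: both operands are runtime values, so neither is a constant.
  cost += instrCost(Op::Add, K::AnyValue, K::AnyValue);
  // (x + bias) >> k: shift amount is a constant.
  cost += instrCost(Op::AShr, K::AnyValue, K::UniformConstantValue);
  // A negative divisor needs the quotient negated: 0 - q.
  if (divisorIsNegative)
    cost += instrCost(Op::Sub, K::UniformConstantValue, K::AnyValue);
  return cost;
}
```

Under these assumptions `sdivPow2Cost(false)` evaluates to 4 and `sdivPow2Cost(true)` to 5.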

Thanks for the review, Geoff. Unfortunately, I don't have time to revisit this change (again, mostly a drive-by patch). Perhaps someone else in the community would like to push this one through. :)

I agree with Geoff that this should be factored out and thought through carefully.

Not only should such a cost calculation be used everywhere, but this also looks like a special case of its own. Without knowing how this cost change affects other patterns, it's hard to decide whether it is truly beneficial overall or just a local spike in this use case.

I agree that the cost tables are not relying enough on knowledge of the instructions and timings and there's a lot of fudge already to help the vectorizer do a better job, but that fudge is in place after general agreements from benchmarks and large code bases. I don't like it, but I also don't have a better solution right now.

Unfortunately, any change needs to come with some benchmarking results, even those that would make the cost table better. :/

For the patch specifically, I'd suggest factoring this out into a higher level (for all scalar divide needs) and also taking into account platforms that have HW divide.

@haicheng: Feel free to take a look at this..

mcrosier abandoned this revision. Oct 6 2015, 6:49 AM

Revision Contents

Path                                               Size
lib/Target/ARM/ARMTargetTransformInfo.cpp          22 lines
test/Transforms/LoopVectorize/ARM/powof2sdiv.ll    29 lines
test/Transforms/SLPVectorizer/ARM/sdiv-pow2.ll     40 lines

Diff 32159

lib/Target/ARM/ARMTargetTransformInfo.cpp

Context not available.
    TTI::OperandValueProperties Opd2PropInfo) {

  int ISDOpcode = TLI->InstructionOpcodeToISD(Opcode);
  assert(ISDOpcode && "Invalid opcode");

  if (ISDOpcode == ISD::SDIV &&
      Op2Info == TargetTransformInfo::OK_UniformConstantValue &&
      Opd2PropInfo == TargetTransformInfo::OP_PowerOf2) {
    // On ARM, scalar signed division by a power-of-two constant is
    // normally expanded to the sequence SRA + SRL + ADD + SRA.
    // The OperandValue properties may not be the same as those of the
    // previous operation; conservatively assume OP_None.
    int Cost = 2 * getArithmeticInstrCost(Instruction::AShr, Ty, Op1Info,
                                          Op2Info, TargetTransformInfo::OP_None,
                                          TargetTransformInfo::OP_None);
    Cost += getArithmeticInstrCost(Instruction::LShr, Ty, Op1Info, Op2Info,
                                   TargetTransformInfo::OP_None,
                                   TargetTransformInfo::OP_None);
    Cost += getArithmeticInstrCost(Instruction::Add, Ty, Op1Info, Op2Info,
                                   TargetTransformInfo::OP_None,
                                   TargetTransformInfo::OP_None);

    return Cost;
  }

  std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, Ty);

  const unsigned FunctionCallDivCost = 20;
Context not available.
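
For reference, the SRA + SRL + ADD + SRA sequence named in the comment above can be written out for a 32-bit value as follows. This is a minimal sketch, not code from the patch: `sdivPow2` is a hypothetical name, the shift amount `k` is assumed to be in the range 1 to 31, and right-shifting a negative `int32_t` is assumed to be an arithmetic shift (guaranteed from C++20, and what mainstream compilers did before that).

```cpp
#include <cstdint>

// Signed division of x by 2^k (1 <= k <= 31), rounding toward zero,
// spelled out as the four-instruction shift/add sequence.
inline int32_t sdivPow2(int32_t x, unsigned k) {
  // SRA: all ones if x is negative, zero otherwise.
  int32_t sign = x >> 31;
  // SRL: 2^k - 1 if x is negative, zero otherwise.
  uint32_t bias = static_cast<uint32_t>(sign) >> (32 - k);
  // ADD: bias negative inputs so the final shift rounds toward zero.
  int32_t biased = static_cast<int32_t>(static_cast<uint32_t>(x) + bias);
  // SRA: the actual division by 2^k.
  return biased >> k;
}
```

For example, `sdivPow2(-7, 1)` yields -3, matching `-7 / 2` in C++, whereas a single arithmetic shift of -7 by one bit would give -4. A negative divisor such as -2 would additionally need the result negated, which is the extra SUB raised in the review.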

test/Transforms/LoopVectorize/ARM/powof2sdiv.ll

This file was added.

; RUN: opt < %s -loop-vectorize -mtriple=thumbv7-unknown-linux-gnu -S | FileCheck %s

%struct.anon = type { [100 x i32], i32, [100 x i32] }

@Foo = common global %struct.anon zeroinitializer, align 4

; CHECK-LABEL: @foo(
; CHECK: load <4 x i32>, <4 x i32>*
; CHECK: sdiv <4 x i32>
; CHECK: store <4 x i32>

define void @foo(){
entry:
  br label %for.body

for.body: ; preds = %for.body, %entry
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
  %arrayidx = getelementptr inbounds %struct.anon, %struct.anon* @Foo, i64 0, i32 2, i64 %indvars.iv
  %0 = load i32, i32* %arrayidx, align 4
  %div = sdiv i32 %0, 2
  %arrayidx2 = getelementptr inbounds %struct.anon, %struct.anon* @Foo, i64 0, i32 0, i64 %indvars.iv
  store i32 %div, i32* %arrayidx2, align 4
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond = icmp eq i64 %indvars.iv.next, 100
  br i1 %exitcond, label %for.end, label %for.body

for.end: ; preds = %for.body
  ret void
}

test/Transforms/SLPVectorizer/ARM/sdiv-pow2.ll

This file was added.

; RUN: opt < %s -basicaa -slp-vectorizer -S -mtriple=thumbv7-unknown-linux-gnu | FileCheck %s

; CHECK-LABEL: @test1
; CHECK: load <4 x i32>
; CHECK: add nsw <4 x i32>
; CHECK: sdiv <4 x i32>

define void @test1(i32* noalias nocapture %a, i32* noalias nocapture readonly %b, i32* noalias nocapture readonly %c) {
entry:
  %0 = load i32, i32* %b, align 4
  %1 = load i32, i32* %c, align 4
  %add = add nsw i32 %1, %0
  %div = sdiv i32 %add, 2
  store i32 %div, i32* %a, align 4
  %arrayidx3 = getelementptr inbounds i32, i32* %b, i64 1
  %2 = load i32, i32* %arrayidx3, align 4
  %arrayidx4 = getelementptr inbounds i32, i32* %c, i64 1
  %3 = load i32, i32* %arrayidx4, align 4
  %add5 = add nsw i32 %3, %2
  %div6 = sdiv i32 %add5, 2
  %arrayidx7 = getelementptr inbounds i32, i32* %a, i64 1
  store i32 %div6, i32* %arrayidx7, align 4
  %arrayidx8 = getelementptr inbounds i32, i32* %b, i64 2
  %4 = load i32, i32* %arrayidx8, align 4
  %arrayidx9 = getelementptr inbounds i32, i32* %c, i64 2
  %5 = load i32, i32* %arrayidx9, align 4
  %add10 = add nsw i32 %5, %4
  %div11 = sdiv i32 %add10, 2
  %arrayidx12 = getelementptr inbounds i32, i32* %a, i64 2
  store i32 %div11, i32* %arrayidx12, align 4
  %arrayidx13 = getelementptr inbounds i32, i32* %b, i64 3
  %6 = load i32, i32* %arrayidx13, align 4
  %arrayidx14 = getelementptr inbounds i32, i32* %c, i64 3
  %7 = load i32, i32* %arrayidx14, align 4
  %add15 = add nsw i32 %7, %6
  %div16 = sdiv i32 %add15, 2
  %arrayidx17 = getelementptr inbounds i32, i32* %a, i64 3
  store i32 %div16, i32* %arrayidx17, align 4
  ret void
}
