This is an archive of the discontinued LLVM Phabricator instance.

Differential D48048

[LV] Prevent LV to run cost model twice for VF=2
ClosedPublic

Authored by dcaballe on Jun 11 2018, 1:15 PM.

Details

Reviewers

xusx595
hsaito
fhahn
mkuper

Commits

rG68795245cfb3: [LV] Prevent LV to run cost model twice for VF=2
rL334840: [LV] Prevent LV to run cost model twice for VF=2

Summary

This is a minor fix for the LV cost model: when vectorization of a loop is forced without specifying a VF, the cost for VF=2 is computed twice. It was reported by @xusx595 on the mailing list.
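The redundancy described in the summary can be reproduced in isolation. The following is a minimal sketch, not the LLVM code itself: `ToyCostModel`, `selectBefore`, and `selectAfter` are hypothetical names, the cost numbers are made up, vectorization is assumed to be forced, and the `VectorCost < Cost` update is an assumption about the part of the loop body that the diff view truncates.

```cpp
#include <limits>
#include <map>

// Hypothetical stand-in for the cost model: returns a made-up total loop
// cost for a given VF and counts how many times each VF is queried.
struct ToyCostModel {
  std::map<unsigned, unsigned> Calls;
  float expectedCost(unsigned VF) {
    ++Calls[VF];
    // Scalar loop costs 4; every vector width costs 6 per lane, so the
    // vector loop is never cheaper than the scalar one in this toy setup.
    return VF == 1 ? 4.0f : 6.0f * VF;
  }
};

// Shape of the selection before the patch (vectorization assumed forced).
unsigned selectBefore(ToyCostModel &CM, unsigned MaxVF) {
  float Cost = CM.expectedCost(1);
  unsigned Width = 1;
  if (MaxVF > 1) {
    Width = 2;
    Cost = CM.expectedCost(Width) / (float)Width; // first query for VF=2
  }
  for (unsigned i = 2; i <= MaxVF; i *= 2) {
    float VectorCost = CM.expectedCost(i) / (float)i; // VF=2 queried again
    if (VectorCost < Cost) { // assumed update rule
      Cost = VectorCost;
      Width = i;
    }
  }
  return Width;
}

// Shape of the selection after the patch (vectorization assumed forced).
unsigned selectAfter(ToyCostModel &CM, unsigned MaxVF) {
  float Cost = CM.expectedCost(1);
  unsigned Width = 1;
  if (MaxVF > 1)
    Cost = std::numeric_limits<float>::max(); // any VF=2 cost beats this
  for (unsigned i = 2; i <= MaxVF; i *= 2) {
    float VectorCost = CM.expectedCost(i) / (float)i;
    if (VectorCost < Cost) { // assumed update rule
      Cost = VectorCost;
      Width = i;
    }
  }
  return Width;
}
```

In this sketch both variants pick a width of 2 even though the scalar loop is cheaper, which is the intent of forcing vectorization. The only difference is that the first variant queries the VF=2 cost twice.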

Event Timeline

dcaballe created this revision. Jun 11 2018, 1:15 PM

Herald added subscribers: llvm-commits, rogfer01. Jun 11 2018, 1:15 PM

LGTM.

This revision is now accepted and ready to land. Jun 11 2018, 2:40 PM

LGTM. Thanks for this patch and having me as a reviewer. As my work on vectorization is not based on the llvm-trunk but still on top of VPlan, it is not straightforward for me to create patches.

Thank you both for the review!

Ayal added a subscriber: Ayal. Jun 12 2018, 1:02 PM

Ayal added inline comments.

lib/Transforms/Vectorize/LoopVectorize.cpp
5033	Could have alternatively started from `unsigned i = 2 * Width`, as the condition above is essentially peeling the first iteration. Ideally computing `expectedCost(1)` would also be saved in this case.

Thanks for the comments, Ayal! Please, let me know if you have any other concerns.

lib/Transforms/Vectorize/LoopVectorize.cpp
5033	Good points!

> Could have alternatively started from `unsigned i = 2 * Width`, as the condition above is essentially peeling the first iteration.

I initially did that, but then I had to replicate the LLVM_DEBUG line in the peeled iteration to be consistent. For that reason I opted for this approach, which doesn't need that replication.

> Ideally computing `expectedCost(1)` would also be saved in this case.

Agreed. Unfortunately, ScalarCost is necessary below. I also thought that it'd be interesting to have the debug information about the scalar cost for these cases.
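The trade-off discussed in this exchange can be sketched side by side. This is an illustrative toy, not the reviewed code: `ToyModel`, `report` (standing in for the LLVM_DEBUG line), `selectPeeled`, and `selectCommitted` are hypothetical names, the costs are made up, vectorization is assumed to be forced, and the `VectorCost < Cost` update is assumed.

```cpp
#include <limits>
#include <string>
#include <vector>

// Hypothetical model: made-up costs, a counter for VF=2 queries, and a log
// that stands in for the LLVM_DEBUG output.
struct ToyModel {
  std::vector<std::string> Log;
  unsigned Vf2Queries = 0;
  float expectedCost(unsigned VF) {
    if (VF == 2)
      ++Vf2Queries;
    return VF == 1 ? 4.0f : 4.0f + VF; // VF=2 -> 6 total, VF=4 -> 8 total
  }
  void report(unsigned VF, float Cost) {
    Log.push_back("width " + std::to_string(VF) + " costs " +
                  std::to_string((int)Cost));
  }
};

// Alternative suggested in review: peel VF=2 and start the loop at 2 * Width.
unsigned selectPeeled(ToyModel &M, unsigned MaxVF) {
  float Cost = M.expectedCost(1);
  unsigned Width = 1;
  if (MaxVF > 1) {
    Width = 2;
    Cost = M.expectedCost(Width) / (float)Width;
    M.report(Width, Cost); // debug line has to be replicated here
  }
  for (unsigned i = 2 * Width; i <= MaxVF; i *= 2) {
    float VectorCost = M.expectedCost(i) / (float)i;
    M.report(i, VectorCost);
    if (VectorCost < Cost) { // assumed update rule
      Cost = VectorCost;
      Width = i;
    }
  }
  return Width;
}

// Approach taken in the patch: seed Cost with max and keep a single loop.
unsigned selectCommitted(ToyModel &M, unsigned MaxVF) {
  float Cost = M.expectedCost(1);
  unsigned Width = 1;
  if (MaxVF > 1)
    Cost = std::numeric_limits<float>::max();
  for (unsigned i = 2; i <= MaxVF; i *= 2) {
    float VectorCost = M.expectedCost(i) / (float)i;
    M.report(i, VectorCost); // single debug line covers every width
    if (VectorCost < Cost) { // assumed update rule
      Cost = VectorCost;
      Width = i;
    }
  }
  return Width;
}
```

Under these assumptions both variants query VF=2 once, produce the same log, and choose the same width. The peeled form simply needs a second copy of the reporting call, which is the replication the author wanted to avoid.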

If no more comments, I'll proceed with the commit.

Thank you all!
Diego

Ayal added inline comments. Jun 14 2018, 3:16 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
5033	Having a designated LLVM_DEBUG line explaining that the first VF considered is 2 instead of 1, might be helpful. If cost is to be computed just for supplying it as debug information, it should appear under LLVM_DEBUG. Anyway, mtcw, feel free to proceed.

dcaballe added inline comments. Jun 14 2018, 3:36 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
5033	Thanks, Ayal.

> If cost is to be computed just for supplying it as debug information, it should appear under LLVM_DEBUG.

Not only for debug. Look at line 5059. Not sure if it would be executed when vectorization is forced, though, but I'd prefer not to change the behavior w.r.t. the scalar cost in this patch. I'll proceed then.

Closed by commit rL334840: [LV] Prevent LV to run cost model twice for VF=2 (authored by dcaballe). Jun 15 2018, 9:26 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/Transforms/Vectorize/LoopVectorize.cpp	7 lines
test/Transforms/LoopVectorize/redundant-vf2-cost.ll	34 lines

Diff 150820

lib/Transforms/Vectorize/LoopVectorize.cpp

 VectorizationFactor
 LoopVectorizationCostModel::selectVectorizationFactor(unsigned MaxVF) {
   float Cost = expectedCost(1).first;
   const float ScalarCost = Cost;
   unsigned Width = 1;
   LLVM_DEBUG(dbgs() << "LV: Scalar loop costs: " << (int)ScalarCost << ".\n");

   bool ForceVectorization = Hints->getForce() == LoopVectorizeHints::FK_Enabled;
-  // Ignore scalar width, because the user explicitly wants vectorization.
   if (ForceVectorization && MaxVF > 1) {
-    Width = 2;
-    Cost = expectedCost(Width).first / (float)Width;
+    // Ignore scalar width, because the user explicitly wants vectorization.
+    // Initialize cost to max so that VF = 2 is, at least, chosen during cost
+    // evaluation.
+    Cost = std::numeric_limits<float>::max();
   }

   for (unsigned i = 2; i <= MaxVF; i *= 2) {
     // Notice that the vector loop needs to be executed less times, so
     // we need to divide the cost of the vector loops by the width of
     // the vector elements.
     VectorizationCostTy C = expectedCost(i);
     float VectorCost = C.first / (float)i;
     LLVM_DEBUG(dbgs() << "LV: Vector loop of width " << i
                       << " costs: " << (int)VectorCost << ".\n");
     if (!C.second && !ForceVectorization) {

test/Transforms/LoopVectorize/redundant-vf2-cost.ll

; RUN: opt < %s -loop-vectorize -mtriple x86_64 -debug -disable-output 2>&1 | FileCheck %s
; REQUIRES: asserts

; Check that cost model is not executed twice for VF=2 when vectorization is
; forced for a particular loop.

; CHECK: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For instruction: %{{[0-9]+}} = load i32
; CHECK: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For instruction: store i32
; CHECK-NOT: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For instruction: %{{[0-9]+}} = load i32
; CHECK-NOT: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For instruction: store i32
; CHECK: LV: Vector loop of width 2 costs: {{[0-9]+}}.

define i32 @foo(i32* %A, i32 %n) {
entry:
  %cmp3.i = icmp eq i32 %n, 0
  br i1 %cmp3.i, label %exit, label %for.body.i

for.body.i:
  %iv = phi i32 [ %add.i, %for.body.i ], [ 0, %entry ]
  %ld_addr = getelementptr inbounds i32, i32* %A, i32 %iv
  %0 = load i32, i32* %ld_addr, align 4
  %val = add i32 %0, 1
  store i32 %val, i32* %ld_addr, align 4
  %add.i = add nsw i32 %iv, 1
  %cmp.i = icmp eq i32 %add.i, %n
  br i1 %cmp.i, label %exit, label %for.body.i, !llvm.loop !0

exit:
  %__init.addr.0.lcssa.i = phi i32 [ 0, %entry ], [ %add.i, %for.body.i ]
  ret i32 %__init.addr.0.lcssa.i
}

!0 = !{!0, !1}
!1 = !{!"llvm.loop.vectorize.enable", i1 true}
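For readers less used to IR, the loop in this test corresponds roughly to the C++ below. This is a hand-written approximation and not necessarily the source the test was derived from. It relies on Clang's `#pragma clang loop vectorize(enable)` hint, which Clang lowers to `llvm.loop.vectorize.enable` loop metadata: vectorization is requested but no width is given, which is the situation the patch addresses. The pragma is guarded so that other compilers simply ignore it.

```cpp
// Approximate source-level equivalent of @foo in the test: add 1 to each of
// the first n elements of A and return the final induction value.
int foo(int *A, int n) {
  int i = 0;
#if defined(__clang__)
// Request vectorization without specifying a vectorization width.
#pragma clang loop vectorize(enable)
#endif
  for (; i != n; ++i)
    A[i] += 1;
  return i;
}
```

Adding `vectorize_width(N)` to the same pragma would pin the VF instead, which is a different situation from the one this test covers.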