This is an archive of the discontinued LLVM Phabricator instance.

[LV] Do not vectorize loops with a low dynamic tripcount, as determined by profile information
Needs Review · Public

Authored by mkuper on Nov 18 2016, 3:02 PM.

Details

Reviewers

davidxl
danielcdh
gilr
mssimpso

Summary

This is somewhat limited at this point - there are two known sources of inaccuracy:

We still don't have a code duplication factor, so, for sampling-based FDO, we'll get the wrong trip count if the loop was vectorized in the sampled binary.
Loops that are dynamically dead in the profile will still be vectorized, since getLoopEstimatedTripCount() still can't distinguish "loop was never entered" from "no information".

Both of these will need to be fixed on the "estimate trip count" side.
Dehao, David, do you think it's worth waiting with this until we have the duplication factors?
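
To make the estimate concrete, here is a minimal C++ sketch of how a trip count can be derived from the latch branch weights. It is a simplified model written for this discussion, not the getLoopEstimatedTripCount() implementation: the name estimateTripCount and its parameters are hypothetical, and both the round-to-nearest division and the handling of zero weights are assumptions. It also illustrates the second inaccuracy: in this model, a loop whose profile says it never ran and a loop with no profile at all both produce an empty result.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper, not an LLVM API. Models a profile-based trip count
// estimate as (backedge weight / exit weight), rounded to nearest.
std::optional<uint64_t>
estimateTripCount(std::optional<uint64_t> BackedgeWeight,
                  std::optional<uint64_t> ExitWeight) {
  // No branch weights on the latch: there is nothing to estimate from.
  if (!BackedgeWeight || !ExitWeight)
    return std::nullopt;

  // Assumption: a zero weight is reported the same way as missing data.
  // This is the ambiguity described above: "never entered" and
  // "no information" collapse into one answer.
  if (*BackedgeWeight == 0 || *ExitWeight == 0)
    return std::nullopt;

  // Integer division, rounded to nearest by adding half the divisor.
  return (*BackedgeWeight + *ExitWeight / 2) / *ExitWeight;
}
```

With the branch weights used in the test further down (backedge 4001, exit 1001), this model gives 4, which matches the trip count in the expected remark; with 400001 and 1001 it gives 400.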

Event Timeline

mkuper updated this revision to Diff 78587. Nov 18 2016, 3:02 PM

mkuper retitled this revision to [LV] Do not vectorize loops with a low dynamic tripcount, as determined by profile information.

mkuper updated this object.

mkuper added reviewers: mssimpso, gilr, danielcdh, davidxl.

mkuper added a subscriber: llvm-commits.

Herald added a subscriber: mzolotukhin. Nov 18 2016, 3:02 PM

anemet added a subscriber: anemet. Nov 18 2016, 4:16 PM

anemet added inline comments.

lib/Transforms/Vectorize/LoopVectorize.cpp
7203–7206	While you're here, can you please improve this message to actually mention low-trip count?
test/Transforms/LoopVectorize/X86/runtime-trip-count.ll
3	We usually try to formulate these tests without relying on asserts so that we get coverage with a no-assert build as well.

mkuper added inline comments. Nov 18 2016, 4:32 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
7203–7206	Sure.
test/Transforms/LoopVectorize/X86/runtime-trip-count.ll
3	I realize that, but from a testing perspective, I actually want to verify the reason it didn't get vectorized, not only that it's not vectorized. Do you think it would be better to duplicate the test and have an asserts and a non-asserts version? Do you know if "UNSUPPORTED: asserts" works? (Is there a way to require asserts only for a specific run line, as opposed to the whole test? That would solve the problem.)

anemet added inline comments. Nov 18 2016, 4:42 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
7203–7206	Thanks.
test/Transforms/LoopVectorize/X86/runtime-trip-count.ll
3	I'd say just use opt remarks then (-pass-remarks-missed=loop-vectorize). In the opt output, you won't have the function name (only the source line, but that requires debug info). If you want the function name, you could generate the YAML output, which has everything including the function name.

mkuper added inline comments. Nov 18 2016, 4:45 PM

test/Transforms/LoopVectorize/X86/runtime-trip-count.ll
3	Ok, I guess opt remarks is a reasonable solution. I don't actually need the function name, since I only care about seeing a remark for the low case.

Updated per Adam's comments.

This is fine with me now. I'll let the others comment on your initial question.

Does the test need to be target-specific? Otherwise, this looks good to me as well.

No, I think I can move the test out, thanks Matt!

davidxl added inline comments. Feb 1 2017, 9:50 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
7182	This looks like a reusable/common utility function (combine static count and profile count). Probably extract it out?
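
One possible shape for such a utility, as a self-contained C++ sketch. The names knownOrEstimatedTripCount and isTripCountTooSmall are hypothetical and do not exist in LLVM; the parameters stand in for the results of SE->getSmallConstantTripCount(L), F->getEntryCount() and getLoopEstimatedTripCount(L) in the patch, so this only models the control flow, not the real types.

```cpp
#include <optional>

// Hypothetical helper, not an LLVM API. Prefers the statically known trip
// count and falls back to the profile-based estimate only when the static
// count is unknown (reported as 0) and the function has profile data.
std::optional<unsigned>
knownOrEstimatedTripCount(unsigned StaticTC, bool HasProfile,
                          std::optional<unsigned> ProfileTC) {
  if (StaticTC != 0)
    return StaticTC;
  if (HasProfile && ProfileTC.has_value())
    return ProfileTC;
  return std::nullopt; // Trip count is neither known nor estimated.
}

// Hypothetical helper: a loop is "too small" only when a trip count is
// available and it is below the threshold.
bool isTripCountTooSmall(std::optional<unsigned> TC, unsigned Threshold) {
  return TC.has_value() && *TC < Threshold;
}
```

The threshold is passed in explicitly here; in the patch the comparison is against TinyTripCountVectorThreshold. Keeping the "unknown" state as an empty optional, instead of a separate KnownTC flag, is a design choice of this sketch and not something the review settled on.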

Ayal mentioned this in D34373: [LV] Optimize for size when vectorizing loops with tiny trip count. Jun 28 2017, 10:33 AM

Revision Contents

Path	Size
lib/Transforms/Vectorize/LoopVectorize.cpp	27 lines
test/Transforms/LoopVectorize/X86/runtime-trip-count.ll	65 lines

Diff 78607

lib/Transforms/Vectorize/LoopVectorize.cpp


 #endif /* NDEBUG */

   if (!Hints.allowVectorization(F, L, AlwaysVectorize)) {
     DEBUG(dbgs() << "LV: Loop hints prevent vectorization.\n");
     return false;
   }

   // Check the loop for a trip count threshold:
   // do not vectorize loops with a tiny trip count.
-  const unsigned TC = SE->getSmallConstantTripCount(L);
-  if (TC > 0u && TC < TinyTripCountVectorThreshold) {
-    DEBUG(dbgs() << "LV: Found a loop with a very small trip count. "
-                 << "This loop is not worth vectorizing.");
+  bool KnownTC = false;
+  unsigned TC = SE->getSmallConstantTripCount(L);
+  if (TC) {
+    KnownTC = true;
+  } else if (F->getEntryCount()) {
+    // If the tripcount is unknown, but profile information is available,
+    // use a profile-based estimate.
+    auto EstimatedTC = getLoopEstimatedTripCount(L);
+    if (EstimatedTC) {
+      TC = *EstimatedTC;
+      KnownTC = true;
+    }
+  }
+
+  if (KnownTC && TC < TinyTripCountVectorThreshold) {
+    DEBUG(dbgs() << "LV: Found a loop with small trip count: " << TC
+                 << ". This loop is not worth vectorizing.");
     if (Hints.getForce() == LoopVectorizeHints::FK_Enabled)
       DEBUG(dbgs() << " But vectorizing was explicitly forced.\n");
     else {
       DEBUG(dbgs() << "\n");
       ORE->emit(createMissedAnalysis(Hints.vectorizeAnalysisPassName(),
-                                     "NotBeneficial", L)
-                << "vectorization is not beneficial "
-                   "and is not explicitly forced");
+                                     "SmallTripCount", L)
+                << "not beneficial due to small (" << ore::NV("TripCount", TC)
+                << ") trip count, and is not explicitly forced");
       return false;
     }
   }

   PredicatedScalarEvolution PSE(SE, L);

   // Check if it is legal to vectorize the loop.
   LoopVectorizationRequirements Requirements(*ORE);

test/Transforms/LoopVectorize/X86/runtime-trip-count.ll

				; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=4 -S -pass-remarks-missed=loop-vectorize 2>&1 | FileCheck %s

				; CHECK: remark: low_dynamic.c:1:1: loop not vectorized: not beneficial due to small (4) trip count

				target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; CHECK-LABEL: @high_dynamic
				; CHECK: fadd <4 x float>
				define void @high_dynamic(float* nocapture %a, i32 %k) !prof !0 {
				entry:
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%arrayidx = getelementptr inbounds float, float* %a, i64 %indvars.iv
				%0 = load float, float* %arrayidx, align 4
				%add = fadd float %0, 1.000000e+00
				store float %add, float* %arrayidx, align 4
				%indvars.iv.next = add i64 %indvars.iv, 1
				%lftr.wideiv = trunc i64 %indvars.iv.next to i32
				%exitcond = icmp eq i32 %lftr.wideiv, %k
				br i1 %exitcond, label %for.end, label %for.body, !prof !1

				for.end: ; preds = %for.body
				ret void
				}

				; CHECK-LABEL: @low_dynamic
				; CHECK-NOT: <4 x float>
				define void @low_dynamic(float* nocapture %a, i32 %k) !prof !0 {
				entry:
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%arrayidx = getelementptr inbounds float, float* %a, i64 %indvars.iv
				%0 = load float, float* %arrayidx, align 4
				%add = fadd float %0, 1.000000e+00
				store float %add, float* %arrayidx, align 4
				%indvars.iv.next = add i64 %indvars.iv, 1
				%lftr.wideiv = trunc i64 %indvars.iv.next to i32
				%exitcond = icmp eq i32 %lftr.wideiv, %k
				br i1 %exitcond, label %for.end, label %for.body, !prof !2, !dbg !10

				for.end: ; preds = %for.body
				ret void
				}


				!llvm.module.flags = !{!3, !4}
				!llvm.dbg.cu = !{!5}

				!0 = !{!"function_entry_count", i64 1}
				!1 = !{!"branch_weights", i32 1001, i32 400001}
				!2 = !{!"branch_weights", i32 1001, i32 4001}
				!3 = !{i32 2, !"Dwarf Version", i32 2}
				!4 = !{i32 2, !"Debug Info Version", i32 3}
				!5 = distinct !DICompileUnit(language: DW_LANG_C99, producer: "clang version 3.6.0", isOptimized: true, emissionKind: LineTablesOnly, file: !6, enums: !7, retainedTypes: !7, globals: !7, imports: !7)
				!6 = !DIFile(filename: "low_dynamic.c", directory: ".")
				!7 = !{}
				!8 = distinct !DISubprogram(name: "low_dynamic", line: 1, isLocal: false, isDefinition: true, virtualIndex: 6, flags: DIFlagPrototyped, isOptimized: true, unit: !5, scopeLine: 1, file: !6, scope: !6, type: !9, variables: !7)
				!9 = !DISubroutineType(types: !7)
				!10 = !DILocation(line: 1, column: 1, scope: !8)