This is an archive of the discontinued LLVM Phabricator instance.

[LoopVectorizer] give more advice in remark about failure to vectorize call
Closed · Public

Authored by spatel on Jan 10 2019, 9:31 AM.


Details

Reviewers

hfinkel
Ayal
efriedma

Commits

rG7d65fe5cd551: [LoopVectorizer] give more advice in remark about failure to vectorize call
rL351010: [LoopVectorizer] give more advice in remark about failure to vectorize call

Summary

Something like this is requested by:
https://bugs.llvm.org/show_bug.cgi?id=40265
...and it seems like a common enough case that we should acknowledge it. Not sure if this crosses the line for wordiness in an optimization remark though.

Diff Detail

Event Timeline

spatel created this revision. Jan 10 2019, 9:31 AM

Herald added a subscriber: mcrosier. Jan 10 2019, 9:31 AM

Can we check whether the function could be vectorized if fast math were enabled, so we only show the advice when it's relevant?

"relaxing the floating-point model" is a little confusing... can we explicitly say "consider turning on fast math" or something like that?

Patch updated:

Try to distinguish a vectorizable libcall from an arbitrary call (I don't see an exact mapping, but "hasOptimizedCodeGen()" looks close).
Add tests to show that we correctly differentiate the 2 cases.

hfinkel added inline comments. Jan 10 2019, 6:05 PM

lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
Line 726: I'd prefer it say "fast-math mode" instead of just "fast-math". It would be nice if we could also point users to -fno-math-errno, as that might fix this problem for them and they might not be able to use -ffast-math for the whole translation unit. Now, we already have a problem in the vectorizer because it has a lot of optimization remarks that mention Clang-specific things (flags, pragmas, etc.). The intent of the optimization-remark design was that the frontend callback handler would handle such cases by adding frontend-specific information in the frontend (and not have it embedded here). That didn't happen, and while we should clean this up, in the meantime we might just make the problem incrementally worse and mention flags here too: "try compiling with -fno-math-errno or -ffast-math".

spatel marked an inline comment as done. Jan 11 2019, 6:12 AM

spatel added inline comments.

lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
Line 726: Yes, I was trying to avoid clang-specific language here. And in the motivating bug report, you're exactly right -- we only needed -fno-math-errno to overcome the limitation (that's why I was using the likely too vague "relaxed FP" vocabulary in the previous rev).
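Some context on why -fno-math-errno alone was enough in the bug report: under the default C/C++ math error handling, a call such as sqrtf(-1.0f) may write EDOM to errno, so it is not a side-effect-free function. As we understand it, clang then emits a plain call to the library function rather than the llvm.sqrt intrinsic that the vectorizer can widen (the @sqrtf declaration in the test below carries no attributes, matching that case). The standalone C++ sketch below is not LLVM code; it assumes only a hosted C++11 standard library and observes that side effect. Whether errno is written at all is platform-dependent, which is what `math_errhandling` reports; some C libraries signal domain errors only through floating-point exceptions.

```cpp
#include <cerrno>
#include <cmath>

// True if this C library reports math domain errors through errno
// (the MATH_ERRNO bit of math_errhandling, from C99/C11 and mirrored in <cmath>).
bool mathLibraryUsesErrno() {
  return (math_errhandling & MATH_ERRNO) != 0;
}

// Calls std::sqrt on X and reports whether the call wrote EDOM to errno,
// i.e. whether it had a visible side effect besides its return value.
bool sqrtWroteErrno(double X) {
  errno = 0;
  // Volatile input keeps the compiler from constant-folding the call away.
  volatile double In = X;
  volatile double Out = std::sqrt(static_cast<double>(In));
  (void)Out;
  return errno == EDOM;
}
```

A non-negative argument never touches errno; a negative one must set EDOM wherever `mathLibraryUsesErrno()` is true. Compiling with -fno-math-errno tells the compiler it may assume such calls do not set errno, which is the freedom the vectorizer needs to treat sqrtf like an intrinsic.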

Patch updated:

Added an FP-type constraint to the mathlib check (no point suggesting FP flags if it's not an FP call).
Changed remark text to include clang-specific flags (and suggest/hope that users can translate those to their actual front-end options if this isn't a clang-based invocation).

LGTM

This revision is now accepted and ready to land. Jan 11 2019, 8:38 AM

This LGTM too, just adding mtcw wondering if these extra checks for more accurate reporting are worth placing under allowExtraAnalysis(); and/or if TLI->isFunctionVectorizable() shouldn't be the one informing the cause of its failure when returning false.

In D56551#1354518, @Ayal wrote:

> This LGTM too, just adding mtcw wondering if these extra checks for more accurate reporting are worth placing under allowExtraAnalysis(); and/or if TLI->isFunctionVectorizable() shouldn't be the one informing the cause of its failure when returning false.

Those are good questions/comments. I'm not too familiar with the code organization here, but I'll add that to the 'TODO' comment for now, so we don't lose it.
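Ayal's two suggestions can be sketched together: a library-info query that returns why a function is not vectorizable instead of a bare bool, and a flag that decides whether the detailed advice is produced, in the spirit of gating extra work on `allowExtraAnalysis()`. Everything below (`VecFailReason`, `ToyLibraryInfo`, `whyNotVectorizable`, `checkCall`) is hypothetical and standalone; LLVM's `TargetLibraryInfo::isFunctionVectorizable()` used in this patch returns only a bool.

```cpp
#include <string>
#include <unordered_set>
#include <utility>

// Hypothetical failure reasons for a call that cannot be widened.
enum class VecFailReason { None, ScalarLibFuncOnly, UnknownFunction };

// Hypothetical stand-in for a library-info object: it knows which names are
// library functions and which of those have a vector version.
class ToyLibraryInfo {
public:
  ToyLibraryInfo(std::unordered_set<std::string> LibFuncs,
                 std::unordered_set<std::string> Vectorizable)
      : LibFuncs(std::move(LibFuncs)), Vectorizable(std::move(Vectorizable)) {}

  // Returns None when a vector version exists, otherwise the reason why not.
  VecFailReason whyNotVectorizable(const std::string &Name) const {
    if (Vectorizable.count(Name))
      return VecFailReason::None;
    if (LibFuncs.count(Name))
      return VecFailReason::ScalarLibFuncOnly;
    return VecFailReason::UnknownFunction;
  }

private:
  std::unordered_set<std::string> LibFuncs;
  std::unordered_set<std::string> Vectorizable;
};

// Returns true if the call can be widened. On failure it fills Remark; the
// flag advice is only chosen when ExtraAnalysis is set (hypothetical gate).
bool checkCall(const ToyLibraryInfo &TLI, const std::string &Name,
               bool ExtraAnalysis, std::string &Remark) {
  VecFailReason Why = TLI.whyNotVectorizable(Name);
  if (Why == VecFailReason::None)
    return true;
  if (ExtraAnalysis && Why == VecFailReason::ScalarLibFuncOnly)
    Remark = "library call cannot be vectorized. "
             "Try compiling with -fno-math-errno, -ffast-math, "
             "or similar flags";
  else
    Remark = "call instruction cannot be vectorized";
  return false;
}
```

In this model the reason falls out of the same lookup that answers the yes/no question, so the caller needs no separate libfunc probe to pick the message; the gate only controls how much detail is reported.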

Closed by commit rL351010: [LoopVectorizer] give more advice in remark about failure to vectorize call (authored by spatel). Jan 12 2019, 7:32 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                      Size
lib/Transforms/Vectorize/LoopVectorizationLegality.cpp    24 lines
test/Transforms/LoopVectorize/libcall-remark.ll           52 lines

Diff 181266

lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

@@ -709,20 +709,38 @@ for (Instruction &I : *BB) {
       // * Are debug info intrinsics.
       // * Have a mapping to an IR intrinsic.
       // * Have a vector version available.
       auto *CI = dyn_cast<CallInst>(&I);
       if (CI && !getVectorIntrinsicIDForCall(CI, TLI) &&
           !isa<DbgInfoIntrinsic>(CI) &&
           !(CI->getCalledFunction() && TLI &&
             TLI->isFunctionVectorizable(CI->getCalledFunction()->getName()))) {
-        ORE->emit(createMissedAnalysis("CantVectorizeCall", CI)
-                  << "call instruction cannot be vectorized");
+        // If the call is a recognized math library call, it is likely that
+        // we can vectorize it given loosened floating-point constraints.
+        LibFunc Func;
+        bool IsMathLibCall =
+            TLI && CI->getCalledFunction() &&
+            CI->getType()->isFloatingPointTy() &&
+            TLI->getLibFunc(CI->getCalledFunction()->getName(), Func) &&
+            TLI->hasOptimizedCodeGen(Func);
+
+        if (IsMathLibCall) {
+          // TODO: Ideally, we should not use clang-specific language here,
+          // but it's hard to provide meaningful yet generic advice.
+          ORE->emit(createMissedAnalysis("CantVectorizeLibcall", CI)
+              << "library call cannot be vectorized. "
+                 "Try compiling with -fno-math-errno, -ffast-math, "
+                 "or similar flags");
+        } else {
+          ORE->emit(createMissedAnalysis("CantVectorizeCall", CI)
+                    << "call instruction cannot be vectorized");
+        }
         LLVM_DEBUG(
-            dbgs() << "LV: Found a non-intrinsic, non-libfunc callsite.\n");
+            dbgs() << "LV: Found a non-intrinsic callsite.\n");
         return false;
       }
 
       // Intrinsics such as powi,cttz and ctlz are legal to vectorize if the
       // second argument is the same (i.e. loop invariant)
       if (CI && hasVectorInstrinsicScalarOpd(
                     getVectorIntrinsicIDForCall(CI, TLI), 1)) {
         auto *SE = PSE.getSE();
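The decision added in this hunk can be modeled outside LLVM to see which calls receive the flag advice. In the sketch below, `CallSiteFacts`, `CallVerdict`, `classifyCall`, and `remarkText` are hypothetical names; each boolean stands in for the LLVM query named in its comment, and the `TLI` and `getCalledFunction()` null checks from the real code are assumed to have already passed.

```cpp
#include <string>

// Hypothetical, simplified facts about one call site. In LLVM these come from
// the queries named in the comments; here they are plain booleans.
struct CallSiteFacts {
  bool mapsToVectorIntrinsic; // getVectorIntrinsicIDForCall(CI, TLI) != 0
  bool isDebugIntrinsic;      // isa<DbgInfoIntrinsic>(CI)
  bool hasVectorLibVersion;   // TLI->isFunctionVectorizable(Name)
  bool returnsFloatingPoint;  // CI->getType()->isFloatingPointTy()
  bool isKnownLibFunc;        // TLI->getLibFunc(Name, Func)
  bool hasOptimizedCodeGen;   // TLI->hasOptimizedCodeGen(Func)
};

enum class CallVerdict { PassesCallCheck, CantVectorizeLibcall, CantVectorizeCall };

// Mirrors the shape of the patched check: a call that cannot be widened is
// reported as a library call only if it is an FP-typed, recognized libfunc
// with optimized codegen; everything else gets the generic remark.
CallVerdict classifyCall(const CallSiteFacts &C) {
  if (C.mapsToVectorIntrinsic || C.isDebugIntrinsic || C.hasVectorLibVersion)
    return CallVerdict::PassesCallCheck;
  bool IsMathLibCall =
      C.returnsFloatingPoint && C.isKnownLibFunc && C.hasOptimizedCodeGen;
  return IsMathLibCall ? CallVerdict::CantVectorizeLibcall
                       : CallVerdict::CantVectorizeCall;
}

// Remark text per verdict, copied from the patch.
std::string remarkText(CallVerdict V) {
  switch (V) {
  case CallVerdict::CantVectorizeLibcall:
    return "library call cannot be vectorized. "
           "Try compiling with -fno-math-errno, -ffast-math, "
           "or similar flags";
  case CallVerdict::CantVectorizeCall:
    return "call instruction cannot be vectorized";
  case CallVerdict::PassesCallCheck:
    break;
  }
  return ""; // no remark: this call does not block vectorization
}
```

The `returnsFloatingPoint` condition is what keeps a non-FP libcall from being told to try FP flags, matching the second patch update in the timeline.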

test/Transforms/LoopVectorize/libcall-remark.ll

; RUN: opt -S -loop-vectorize < %s 2>&1 -pass-remarks-analysis=.* | FileCheck %s

; Test the optimization remark emitter for recognition
; of a mathlib function vs. an arbitrary function.

target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.14.0"
@data = external local_unnamed_addr global [32768 x float], align 16

; CHECK: loop not vectorized: library call cannot be vectorized

define void @libcall_blocks_vectorization() {
entry:
  br label %for.body

for.cond.cleanup:
  ret void

for.body:
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
  %arrayidx = getelementptr inbounds [32768 x float], [32768 x float]* @data, i64 0, i64 %indvars.iv
  %t0 = load float, float* %arrayidx, align 4
  %sqrtf = tail call float @sqrtf(float %t0)
  store float %sqrtf, float* %arrayidx, align 4
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond = icmp eq i64 %indvars.iv.next, 32768
  br i1 %exitcond, label %for.cond.cleanup, label %for.body
}

; CHECK: loop not vectorized: call instruction cannot be vectorized

define void @arbitrary_call_blocks_vectorization() {
entry:
  br label %for.body

for.cond.cleanup:
  ret void

for.body:
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
  %arrayidx = getelementptr inbounds [32768 x float], [32768 x float]* @data, i64 0, i64 %indvars.iv
  %t0 = load float, float* %arrayidx, align 4
  %sqrtf = tail call float @arbitrary(float %t0)
  store float %sqrtf, float* %arrayidx, align 4
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond = icmp eq i64 %indvars.iv.next, 32768
  br i1 %exitcond, label %for.cond.cleanup, label %for.body
}

declare float @sqrtf(float)
declare float @arbitrary(float)