This is an archive of the discontinued LLVM Phabricator instance.


Differential D85759

[SLPVectorizer] Fix regression from cost model refactoring
Abandoned · Public

Authored by tlively on Aug 11 2020, 10:49 AM.


Details

Reviewers

samparker
dfukalov
aheejin
dschuff
ABataev

Summary

8cc911fa5b06 refactored the getIntrinsicInstrCost function and was
meant to be a nonfunctional change, but it accidentally changed how
costs were calculated in the SLP vectorizer, which regressed
WebAssembly codegen and resulted in a downstream bug report at
https://github.com/emscripten-core/emscripten/issues/11449. This patch
is meant to restore the original behavior.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
30 ms	linux > Flang.Preprocessing::compiler_defined_macros.F90

Event Timeline

tlively created this revision. Aug 11 2020, 10:49 AM

Herald added a project: Restricted Project. Aug 11 2020, 10:49 AM

Herald added subscribers: llvm-commits, sunfish, hiraditya and 2 others.

tlively requested review of this revision. Aug 11 2020, 10:49 AM

Pre-commit the test case and rebase to show the diff?

This regression is in the release branch too, right? So is it worth backporting?

@hans

@RKSimon Sure, I can do that.

@xbolva00 Yes, I think it would be good to backport this. I will handle attaching a bug to the release meta-bug when this lands.

tlively mentioned this in rG94791970de10: [SLPVectorizer] Pre-commit a test for D85759. Aug 11 2020, 11:30 AM

rebase on pre-committed test

tlively mentioned this in rG52b71aa8b1a0: Revert "[SLPVectorizer] Pre-commit a test for D85759". Aug 11 2020, 12:11 PM

Huh, the precommitted test failed on multiple bots even though it passes for me locally. I've reverted the commit that added the test for now while I investigate.

In D85759#2211128, @tlively wrote:

Huh, the precommitted test failed on multiple bots even though it passes for me locally. I've reverted the commit that added the test for now while I investigate.

I think you have to add a REQUIRES for the WebAssembly target, otherwise bots that do not build that target will fail with the wasm32 triple.

tlively mentioned this in rGf969734c21e8: Reland "[SLPVectorizer] Pre-commit a test for D85759". Aug 11 2020, 12:18 PM

In D85759#2211128, @tlively wrote:

Huh, the precommitted test failed on multiple bots even though it passes for me locally. I've reverted the commit that added the test for now while I investigate.

I had forgotten to include a lit.local.cfg file disabling the test when WebAssembly is not present. The test has been relanded.

In D85759#2211133, @fhahn wrote:

In D85759#2211128, @tlively wrote:

Huh, the precommitted test failed on multiple bots even though it passes for me locally. I've reverted the commit that added the test for now while I investigate.

I think you have to add a REQUIRES for the WebAssembly target, otherwise bots that do not build that target will fail with the wasm32 triple.

Yes, thanks for the catch!

Harbormaster completed remote builds in B67961: Diff 284817. Aug 11 2020, 12:58 PM

Harbormaster completed remote builds in B67972: Diff 284834. Aug 11 2020, 1:38 PM

I think D79941 is supposed to work; what it did was move that logic into the constructors of IntrinsicCostAttributes. For example, this constructor saves FMF and argument types, which are supposed to be used later in the same way. I think it'd be better to figure that out and fix it than to restore the old code? Maybe the original author @samparker has some ideas on why :)

There were at least two main paths, through different APIs, and then the logic diverged depending on whether arguments or just their types were passed, and I have no good idea why. But looking at my original patch again, I also don't know why I wouldn't have just done what is proposed here... I made almost the same change in getVectorCallCosts too, so it might be worth looking there as well. I know that @sanwou01 saw an important uplift from this change though, so it would be nice if we could maintain that (not sure if he committed a test) and fix this regression.

In D85759#2212136, @aheejin wrote:

I think D79941 is supposed to work; what it did was move that logic into the constructors of IntrinsicCostAttributes. For example, this constructor saves FMF and argument types, which are supposed to be used later in the same way. I think it'd be better to figure that out and fix it than to restore the old code? Maybe the original author @samparker has some ideas on why :)

Before the original change and after this patch, the intrinsic cost calculation here uses the type-based path. Using the constructor you pointed out is what the regressed code did, and it changed the behavior to use the argument-based path because that constructor also records the argument values. There is no constructor available that encapsulates this exact logic but doesn't also record the argument values.
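The point about constructors can be illustrated with a small toy model. This is only a sketch, assuming that the path is selected by whether any argument values were recorded; ToyCostAttrs, CostPath and selectCostPath are hypothetical names and are not the LLVM IntrinsicCostAttributes or TargetTransformInfo API.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a cost-attributes object. This is NOT the LLVM
// IntrinsicCostAttributes class; every name here is hypothetical.
struct ToyCostAttrs {
  std::vector<std::string> ArgTypes;  // recorded by both constructors
  std::vector<std::string> ArgValues; // recorded only when built from a call

  // Types-only construction: no argument values are kept.
  explicit ToyCostAttrs(std::vector<std::string> Types)
      : ArgTypes(std::move(Types)) {}

  // Call-based construction: keeps the types and the argument values.
  ToyCostAttrs(std::vector<std::string> Types, std::vector<std::string> Values)
      : ArgTypes(std::move(Types)), ArgValues(std::move(Values)) {
    assert(ArgTypes.size() == ArgValues.size() && "one type per argument");
  }
};

enum class CostPath { TypeBased, ArgBased };

// Assumed dispatch rule: the presence of argument values selects the path.
CostPath selectCostPath(const ToyCostAttrs &Attrs) {
  return Attrs.ArgValues.empty() ? CostPath::TypeBased : CostPath::ArgBased;
}
```

Under this assumption, switching a caller from the types-only constructor to the call-based one silently changes which path is taken, even though the types passed are identical.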

In D85759#2212471, @samparker wrote:

I know that @sanwou01 saw an important uplift from this change though, so it would be nice if we could maintain that (not sure if he committed a test) and fix this regression.

I'm not sure we should be trying to preserve an improvement that the previous change introduced, given that it was meant to be a non-functional change. There doesn't seem to be a test for it, anyway. On the flip side, I could also probably fix the WebAssembly regression by adding logic to the WebAssembly TTI if folks think that this fix is a code maintenance regression and that it's ok that 8cc911fa5b06 was a functional change after all.

aheejin added inline comments. Aug 13 2020, 3:43 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3596–3605	Why are the type-based path and the arg-based path different? Are they supposed to be, or is it a bug? But for this patch, maybe we can change this single line and be done?

tlively added inline comments. Aug 13 2020, 4:45 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3596–3605	I don't know why they are different, although there is a TODO to get rid of the difference. Unfortunately this fix doesn't work because `getTypeBasedIntrinsicInstrCost` is not in the TargetTransformInfo interface. I think the motivation for the previous refactoring was to remove it from the public interface.

I took a closer look at this and all of this cost model is very confusing. :( First, I don't understand why the lists of intrinsics in the switch-cases of getIntrinsicInstrCost and getTypeBasedIntrinsicInstrCost are different. fshr is handled only in getIntrinsicInstrCost, not in getTypeBasedIntrinsicInstrCost. The former uses more information than just types (such as whether some value is a power of 2), but even when nothing can be assumed, it returns a far greater number (5 in this case) because it computes and accumulates a cost for each of its arguments, while getTypeBasedIntrinsicInstrCost has no case for fshr, so it just assumes it is cheap and returns 1.

Not sure why the code has evolved like this and I'm sure one of the paths here is wrong. At a glance getIntrinsicInstrCost looks more correct, because it has at least a specialized handling routine for fshr, while getTypeBasedIntrinsicInstrCost blindly assumes it is cheap and returns 1. But @tlively says the previous version, which is using getTypeBasedIntrinsicInstrCost and returning 1, was correct. Can you elaborate why this has to be 1?

The routines here look like they were written by many people over a long period and are not very consistent, so I'm not sure what the correct action is here.
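To see why a scalar element cost of 1 versus 5 can flip the vectorizer's decision, here is a minimal sketch. It assumes a simplified rule (vectorize only when the vector call is strictly cheaper than the scalar calls it replaces); toyShouldVectorize is a hypothetical helper, not an LLVM function, and the vector call cost of 6 used in the note below is invented purely for illustration. Only the values 1 and 5 come from the comment above.

```cpp
#include <cassert>

// Hypothetical helper, not an LLVM function. It mirrors the shape of the
// ScalarCallCost computation in the diff: the per-element scalar cost is
// multiplied by the number of vector lanes and compared to the vector cost.
bool toyShouldVectorize(int ScalarEltCost, int NumElts, int VecCallCost) {
  assert(NumElts > 0 && "need at least one lane");
  int ScalarCallCost = NumElts * ScalarEltCost;
  // Simplifying assumption: vectorize only if strictly cheaper.
  return VecCallCost < ScalarCallCost;
}
```

With two lanes and an assumed vector call cost of 6, toyShouldVectorize(1, 2, 6) is false (scalar total 2), while toyShouldVectorize(5, 2, 6) is true (scalar total 10). That is the kind of flip that would turn the two scalar rotates in the test into a vector rotate.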

I took a closer look at this and all of this cost model is very confusing.

It is... people have lazily added stuff to TTI, so I spent a few months trying to clean it up. The intrinsics were by far the hardest, and it's still a mess! I don't know why there were ever distinct paths for types and/or arguments; I think it may have grown from scalar and vector code paths... But the fact that it's still a mess is probably a different discussion from what this patch is trying to achieve. This change looks good to me, purely because it looks like I just really messed up and this fixes it...

Yes, I only made this patch because the previous one was intended to be NFC. However, I agree (based on my cursory understanding of the relevant code) that it would be better to have just one code path, but this patch enforces an additional use of the second code path. It would make further cleanup of this code easier if I left this as-is and just fixed the issue in the WebAssembly backend, so I think I will abandon this revision and do that instead. @samparker and @aheejin, does that sound reasonable to you?

Yeah, that sounds better, if that does not involve too much effort.

In D85759#2210894, @xbolva00 wrote:

This regression is in the release branch too, right? So is it worth backporting?

@hans

It needs to land first :-) What's the status here?

The status is that we decided to leave the behavior as-is despite the WebAssembly regression. I will (eventually) do a WebAssembly-specific fix, but there will be no need to backport that. No further action should be required here.

Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp	10 lines
llvm/test/Transforms/SLPVectorizer/WebAssembly/no-vectorize-rotate.ll	41 lines

Diff 284817

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp


(3,587 lines not shown) case Instruction::Store: {

      }
      return ReuseShuffleCost + VecStCost - ScalarStCost;
    }
    case Instruction::Call: {
      CallInst *CI = cast<CallInst>(VL0);
      Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);

      // Calculate the cost of the scalar and vector calls.
-     IntrinsicCostAttributes CostAttrs(ID, *CI, 1, 1);
+     SmallVector<Type *, 4> ScalarTys;
+     for (unsigned op = 0, opc = CI->getNumArgOperands(); op != opc; ++op)
+       ScalarTys.push_back(CI->getArgOperand(op)->getType());
+
+     FastMathFlags FMF;
+     if (auto *FPMO = dyn_cast<FPMathOperator>(CI))
+       FMF = FPMO->getFastMathFlags();
+
+     IntrinsicCostAttributes CostAttrs(ID, ScalarTy, ScalarTys, FMF, 1);
      int ScalarEltCost = TTI->getIntrinsicInstrCost(CostAttrs, CostKind);
      if (NeedToShuffleReuses) {
        ReuseShuffleCost -= (ReuseShuffleNumbers - VL.size()) * ScalarEltCost;
      }
      int ScalarCallCost = VecTy->getNumElements() * ScalarEltCost;
      auto VecCallCosts = getVectorCallCosts(CI, VecTy, TTI, TLI);
      int VecCallCost = std::min(VecCallCosts.first, VecCallCosts.second);

(4,114 lines not shown)

Lint: Pre-merge checks (on the added `for` loop line)

clang-tidy: warning: invalid case style for variable 'op' [readability-identifier-naming]
clang-tidy: warning: invalid case style for variable 'opc' [readability-identifier-naming]

Inline comments (on the added lines ending at the `ScalarEltCost` computation)

aheejin (Unsubmitted, Not Done)

// Calculate the cost of the scalar and vector calls.
- SmallVector<Type *, 4> ScalarTys;
- for (unsigned op = 0, opc = CI->getNumArgOperands(); op != opc; ++op)
- ScalarTys.push_back(CI->getArgOperand(op)->getType());
- FastMathFlags FMF;
- if (auto *FPMO = dyn_cast<FPMathOperator>(CI))
- FMF = FPMO->getFastMathFlags();
- IntrinsicCostAttributes CostAttrs(ID, ScalarTy, ScalarTys, FMF, 1);
- int ScalarEltCost = TTI->getIntrinsicInstrCost(CostAttrs, CostKind);
+ int ScalarEltCost = TTI->getTypeBasedIntrinsicInstrCost(CostAttrs, CostKind);
if (NeedToShuffleReuses) {

Why are the type-based path and the arg-based path different? Are they supposed to be, or is it a bug? But for this patch, maybe we can change this single line and be done?

tlively (Author, Unsubmitted, Done)

I don't know why they are different, although there is a TODO to get rid of the difference. Unfortunately this fix doesn't work because getTypeBasedIntrinsicInstrCost is not in the TargetTransformInfo interface. I think the motivation for the previous refactoring was to remove it from the public interface.

llvm/test/Transforms/SLPVectorizer/WebAssembly/no-vectorize-rotate.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt < %s -slp-vectorizer -instcombine -S | FileCheck %s

				; Regression test for a bug in the SLP vectorizer that was causing
				; these rotates to be incorrectly combined into a vector rotate.

				target datalayout = "e-m:e-p:32:32-i64:64-n32:64-S128"
				target triple = "wasm32-unknown-unknown"

				define void @foo(<2 x i64> %x, <4 x i32> %y, i64* %out) #0 {
				; CHECK-LABEL: @foo(
				; CHECK-NEXT: [[A:%.*]] = extractelement <2 x i64> [[X:%.*]], i32 0
				; CHECK-NEXT: [[B:%.*]] = extractelement <4 x i32> [[Y:%.*]], i32 2
				; CHECK-NEXT: [[CONV6:%.*]] = zext i32 [[B]] to i64
				; CHECK-NEXT: [[C:%.*]] = tail call i64 @llvm.fshl.i64(i64 [[A]], i64 [[A]], i64 [[CONV6]])
				; CHECK-NEXT: store i64 [[C]], i64* [[OUT:%.*]], align 8
				; CHECK-NEXT: [[D:%.*]] = extractelement <2 x i64> [[X]], i32 1
				; CHECK-NEXT: [[E:%.*]] = extractelement <4 x i32> [[Y]], i32 3
				; CHECK-NEXT: [[CONV17:%.*]] = zext i32 [[E]] to i64
				; CHECK-NEXT: [[F:%.*]] = tail call i64 @llvm.fshl.i64(i64 [[D]], i64 [[D]], i64 [[CONV17]])
				; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i64, i64* [[OUT]], i32 1
				; CHECK-NEXT: store i64 [[F]], i64* [[ARRAYIDX2]], align 8
				; CHECK-NEXT: ret void
				;
				%a = extractelement <2 x i64> %x, i32 0
				%b = extractelement <4 x i32> %y, i32 2
				%conv6 = zext i32 %b to i64
				%c = tail call i64 @llvm.fshl.i64(i64 %a, i64 %a, i64 %conv6)
				store i64 %c, i64* %out
				%d = extractelement <2 x i64> %x, i32 1
				%e = extractelement <4 x i32> %y, i32 3
				%conv17 = zext i32 %e to i64
				%f = tail call i64 @llvm.fshl.i64(i64 %d, i64 %d, i64 %conv17)
				%arrayidx2 = getelementptr inbounds i64, i64* %out, i32 1
				store i64 %f, i64* %arrayidx2
				ret void
				}

				declare i64 @llvm.fshl.i64(i64, i64, i64)

				attributes #0 = {"target-cpu"="generic" "target-features"="+simd128"}