This is an archive of the discontinued LLVM Phabricator instance.

[SLPVectorizer] Fix regression from cost model refactoring
AbandonedPublic

Authored by tlively on Aug 11 2020, 10:49 AM.

Details

Summary

8cc911fa5b06 refactored the getIntrinsicInstrCost function and was
meant to be a nonfunctional change, but it accidentally changed how
costs were calculated in the SLP vectorizer, which regressed
WebAssembly codegen and resulted in a downstream bug report at
https://github.com/emscripten-core/emscripten/issues/11449. This patch
is meant to restore the original behavior.

Diff Detail

Event Timeline

tlively created this revision.Aug 11 2020, 10:49 AM
tlively requested review of this revision.Aug 11 2020, 10:49 AM
RKSimon added a subscriber: RKSimon.

Pre-commit the test case and rebase to show the diff?

xbolva00 added a subscriber: hans.Aug 11 2020, 11:01 AM
xbolva00 added a subscriber: xbolva00.

This regression is in release branch too, right? so worth to backport it?

@hans

@RKSimon Sure, I can do that.

@xbolva00 Yes, I think it would be good to backport this. I will handle attaching a bug to the release meta-bug when this lands.

tlively updated this revision to Diff 284834.Aug 11 2020, 11:37 AM
  • rebase on pre-committed test

Huh, the precommitted test failed on multiple bots even though it passes for me locally. I've reverted the commit that added the test for now while I investigate.

fhahn added a subscriber: fhahn.Aug 11 2020, 12:16 PM

Huh, the precommitted test failed on multiple bots even though it passes for me locally. I've reverted the commit that added the test for now while I investigate.

I think you have to add a REQUIRES for the WebAssembly target, otherwise bots that do not build that target will fail with the wasm32 triple.

Huh, the precommitted test failed on multiple bots even though it passes for me locally. I've reverted the commit that added the test for now while I investigate.

I had forgotten to include a lit.local.cfg file disabling the test when WebAssembly is not present. The test has been relanded.

Huh, the precommitted test failed on multiple bots even though it passes for me locally. I've reverted the commit that added the test for now while I investigate.

I think you have to add a REQUIRES for the WebAssembly target, otherwise bots that do not build that target will fail with the wasm32 triple.

Yes, thanks for the catch!

aheejin added a comment.EditedAug 11 2020, 7:27 PM

I think D79941 is supposed to work; what it did is move those logics into constructors of IntrinsicCostAttributes. For example, this constructor saves FMF and argument types, which are supposed to be used later in the same way. I think it'd be better to figure that out and fix it than restore the old code? Maybe the original author @samparker has some ideas on why :)

There were at least two main paths, through different APIs, and then the logic diverged depending on whether arguments, or just their types were passed - and I have no good idea why. But looking at my original patch again, I also don't know why I wouldn't have just done what is proposed here... I made almost the same change in getVectorCallCosts too, so it might be worth looking there too. I know what @sanwou01 saw an important uplift from this change though, so it would be nice if we could maintain that (not sure if he committed a test) and fix this regression.

I think D79941 is supposed to work; what it did is move those logics into constructors of IntrinsicCostAttributes. For example, this constructor saves FMF and argument types, which are supposed to be used later in the same way. I think it'd be better to figure that out and fix it than restore the old code? Maybe the original author @samparker has some ideas on why :)

Before the original change and after this patch, the intrinsic cost calculation here uses the type-based path. Using the constructor you pointed out is what the regressed code did, and it changed the behavior to use the argument-based path because that constructor also records the argument values. There is no constructor available that encapsulates this exact logic but doesn't also record the argument values.

I know what @sanwou01 saw an important uplift from this change though, so it would be nice if we could maintain that (not sure if he committed a test) and fix this regression.

I'm not sure we should be trying to preserve an improvement that the previous change introduced, given that it was meant to be a non-functional change. There doesn't seem to be a test for it, anyway. On the flip side, I could also probably fix the WebAssembly regression by adding logic to the WebAssembly TTI if folks think that this fix is a code maintenance regression and that it's ok that 8cc911fa5b06 was a functional change after all.

aheejin added inline comments.Aug 13 2020, 3:43 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3596–3605

Why are the type based path and the arg based path different? Are they supposed to, or it is a bug? But for this patch, maybe we can change this single line and get done?

tlively added inline comments.Aug 13 2020, 4:45 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3596–3605

I don't know why they are different, although there is a TODO to get rid of the difference. Unfortunately this fix doesn't work because getTypeBasedIntrinsicInstrCost is not in the TargetTransformInfo interface. I think the motivation for the previous refactoring was to remove it from the public interface.

aheejin added a comment.EditedAug 13 2020, 11:38 PM

I took a little more look at this and all of this cost model is very confusing. :( First I don't understand why the lists of intrinsics listed in switch-cases in getIntrinsicInstrCost and getTypeBasedIntrinsicInstrCost are different. fshr is listed only in getIntrinsicInstrCost but not in getTypeBasedIntrinsicInstrCost. It uses some more info than just types (like if some value is power of 2 or something), but even if we don't assume anything, it returns a far greater number (in this case 5) because it computes and accumulates cost for each of its argument while getTypeBasedIntrinsicInstrCost does not have a case handling for fshr so it just assumes it is cheap and returns 1.

Not sure why the code has evolved like this and I'm sure one of the paths here is wrong. At a glance getIntrinsicInstrCost looks more correct, because it has at least a specialized handling routine for fshr, while getTypeBasedIntrinsicInstrCost blindly assumes it is cheap and returns 1. But @tlively says the previous version, which is using getTypeBasedIntrinsicInstrCost and returning 1, was correct. Can you elaborate why this has to be 1?

The routines here look like written by many people throughout a long period and not very consistent, so not sure what the correct action here.

I took a little more look at this and all of this cost model is very confusing.

It is... people have lazily added stuff to TTI so I spent a few months trying to clean it up and the intrinsics was by far the hardest, and it's still a mess! I don't know why there was ever distinct paths for types and/or arguments, I think it may have grown from scalar and vector code paths... But the fact that it's still a mess is probably a different discussion from what this patch is trying to achieve though. This change looks good to me, purely because it looks like I just really messed up and this fixes it...

Yes, I only made this patch because the previous one was intended to be NFC. However, I agree (based on my cursory understanding of the relevant code) that it would be better to have just one code path, but this patch enforces an additional use of the second code path. It would make further cleanup of this code easier if I left this as-is and just fixed the issue in the WebAssembly backend, so I think I will abandon this revision and do that instead. @samparker and @aheejin, does that sound reasonable to you?

Yeah, that sounds better, if that does not involve too much effort.

hans added a comment.Aug 25 2020, 10:51 AM

This regression is in release branch too, right? so worth to backport it?

@hans

It needs to land first :-) What's the status here?

tlively abandoned this revision.Aug 25 2020, 10:56 AM

The status is that we decided to leave the behavior as-is despite the WebAssembly regression. I will (eventually) do a WebAssembly-specific fix, but there will be no need to backport that. No further action should be required here.