8cc911fa5b06 refactored the getIntrinsicInstrCost function and was
meant to be a nonfunctional change, but it accidentally changed how
costs were calculated in the SLP vectorizer, which regressed
WebAssembly codegen and resulted in a downstream bug report at
https://github.com/emscripten-core/emscripten/issues/11449. This patch
is meant to restore the original behavior.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
Huh, the precommitted test failed on multiple bots even though it passes for me locally. I've reverted the commit that added the test for now while I investigate.
I think you have to add a REQUIRES for the WebAssembly target, otherwise bots that do not build that target will fail with the wasm32 triple.
I had forgotten to include a lit.local.cfg file disabling the test when WebAssembly is not present. The test has been relanded.
I think D79941 is supposed to work; what it did is move those logics into constructors of IntrinsicCostAttributes. For example, this constructor saves FMF and argument types, which are supposed to be used later in the same way. I think it'd be better to figure that out and fix it than restore the old code? Maybe the original author @samparker has some ideas on why :)
There were at least two main paths, through different APIs, and then the logic diverged depending on whether arguments, or just their types were passed - and I have no good idea why. But looking at my original patch again, I also don't know why I wouldn't have just done what is proposed here... I made almost the same change in getVectorCallCosts too, so it might be worth looking there too. I know what @sanwou01 saw an important uplift from this change though, so it would be nice if we could maintain that (not sure if he committed a test) and fix this regression.
Before the original change and after this patch, the intrinsic cost calculation here uses the type-based path. Using the constructor you pointed out is what the regressed code did, and it changed the behavior to use the argument-based path because that constructor also records the argument values. There is no constructor available that encapsulates this exact logic but doesn't also record the argument values.
I'm not sure we should be trying to preserve an improvement that the previous change introduced, given that it was meant to be a non-functional change. There doesn't seem to be a test for it, anyway. On the flip side, I could also probably fix the WebAssembly regression by adding logic to the WebAssembly TTI if folks think that this fix is a code maintenance regression and that it's ok that 8cc911fa5b06 was a functional change after all.
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
3596–3605 | Why are the type based path and the arg based path different? Are they supposed to, or it is a bug? But for this patch, maybe we can change this single line and get done? |
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
3596–3605 | I don't know why they are different, although there is a TODO to get rid of the difference. Unfortunately this fix doesn't work because getTypeBasedIntrinsicInstrCost is not in the TargetTransformInfo interface. I think the motivation for the previous refactoring was to remove it from the public interface. |
I took a little more look at this and all of this cost model is very confusing. :( First I don't understand why the lists of intrinsics listed in switch-cases in getIntrinsicInstrCost and getTypeBasedIntrinsicInstrCost are different. fshr is listed only in getIntrinsicInstrCost but not in getTypeBasedIntrinsicInstrCost. It uses some more info than just types (like if some value is power of 2 or something), but even if we don't assume anything, it returns a far greater number (in this case 5) because it computes and accumulates cost for each of its argument while getTypeBasedIntrinsicInstrCost does not have a case handling for fshr so it just assumes it is cheap and returns 1.
Not sure why the code has evolved like this and I'm sure one of the paths here is wrong. At a glance getIntrinsicInstrCost looks more correct, because it has at least a specialized handling routine for fshr, while getTypeBasedIntrinsicInstrCost blindly assumes it is cheap and returns 1. But @tlively says the previous version, which is using getTypeBasedIntrinsicInstrCost and returning 1, was correct. Can you elaborate why this has to be 1?
The routines here look like written by many people throughout a long period and not very consistent, so not sure what the correct action here.
I took a little more look at this and all of this cost model is very confusing.
It is... people have lazily added stuff to TTI so I spent a few months trying to clean it up and the intrinsics was by far the hardest, and it's still a mess! I don't know why there was ever distinct paths for types and/or arguments, I think it may have grown from scalar and vector code paths... But the fact that it's still a mess is probably a different discussion from what this patch is trying to achieve though. This change looks good to me, purely because it looks like I just really messed up and this fixes it...
Yes, I only made this patch because the previous one was intended to be NFC. However, I agree (based on my cursory understanding of the relevant code) that it would be better to have just one code path, but this patch enforces an additional use of the second code path. It would make further cleanup of this code easier if I left this as-is and just fixed the issue in the WebAssembly backend, so I think I will abandon this revision and do that instead. @samparker and @aheejin, does that sound reasonable to you?
The status is that we decided to leave the behavior as-is despite the WebAssembly regression. I will (eventually) do a WebAssembly-specific fix, but there will be no need to backport that. No further action should be required here.
clang-tidy: warning: invalid case style for variable 'op' [readability-identifier-naming]
not useful
clang-tidy: warning: invalid case style for variable 'opc' [readability-identifier-naming]
not useful