I found that we currently treat every intrinsic as a function call, which costs us vectorization opportunities. On AArch64 Cortex-A53 we assume that llvm.fmuladd.f64 is a function call, which is wrong. Instead, we can ask the backend for the cost of such an intrinsic to determine whether it is lowered to instruction(s) or to a function call. This is one of the reasons why LNT's matmul_f64_4x4.c is not vectorized on AArch64.
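The check described above can be sketched in isolation as follows. This is only an illustrative model, not the patch itself: `IntrinsicCosts` and `isNoCallIntrinsic` are hypothetical names, and the two cost fields stand in for values that the real code would obtain from the target's cost model.

```cpp
#include <cstdint>

// Hypothetical stand-in for the two costs a target cost model would report
// for the same intrinsic. In the patch these values come from the backend;
// here they are plain numbers supplied by the caller.
struct IntrinsicCosts {
  std::int64_t IntrCost; // cost of the intrinsic as the target lowers it
  std::int64_t CallCost; // cost of an ordinary call with the same signature
};

// Mirrors the comparison quoted in the review (IntrCost < CallCost): the
// intrinsic is assumed not to become a real call only when it is strictly
// cheaper than a call would be.
constexpr bool isNoCallIntrinsic(const IntrinsicCosts &C) {
  return C.IntrCost < C.CallCost;
}
```

With made-up costs, `isNoCallIntrinsic({1, 10})` is true (treated as instructions), while `isNoCallIntrinsic({10, 10})` is false (still treated as a call).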
Event Timeline
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
7570–7591 | Can you transform all this code into a lambda? | |
7571–7572 | if (auto *II = dyn_cast<IntrinsicInst>(&*PrevInstIt)) | |
7574–7576 | Sink this code to the else substatement. | |
7578–7580 | Braces | |
7586–7591 | Do we have a better way to check whether the intrinsic is lowered as an instruction rather than a call? NoCallIntrinsic = IntrCost < CallCost; | |
llvm/test/Transforms/SLPVectorizer/AArch64/fmulladd.ll | ||
1 | Need to precommit the test | |
3–4 | I assume this can be removed | |
9 | Clean up the attrs, I assume they are not required. | |
75 | Remove attribute reference |
Rebased, addressed remarks.
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
7586–7591 | No, this looks to me like the only way. We could duplicate the code from getTypeBasedIntrinsicInstrCost() and ask the target about the legality of the operation, but that would be too much duplication. We could also hardcode a minimal call cost, for example 10, but that would look incorrect to me too. |
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
7576 | Do you still need this cast to get the list of arguments? |
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
7576 | Yes, in order to get a correctly prepared ICA. Otherwise we could sometimes get an incorrect cost, or a compile-time error if we provide an empty set of arguments. |
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
7576 | II is already an IntrinsicInst, which has CallInst as the base class. I mean, you don't need to do another cast here, you can use II instead of CI |
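The point about the redundant cast can be illustrated with a small standalone sketch. The `CallInst` and `IntrinsicInst` structs below are simplified, hypothetical stand-ins that copy only the inheritance relationship from the LLVM classes. `countArgs` and `countIntrinsicArgs` are made-up helpers.

```cpp
#include <cstddef>
#include <vector>

// Simplified stand-ins for the LLVM classes (assumption: only the
// base/derived relationship matters for this illustration).
struct CallInst {
  std::vector<int> Args; // placeholder for the call's operands
  std::size_t arg_size() const { return Args.size(); }
};

struct IntrinsicInst : CallInst {
  int ID = 0; // placeholder for the intrinsic ID
};

// A helper that only needs the base-class interface.
std::size_t countArgs(const CallInst &CI) { return CI.arg_size(); }

std::size_t countIntrinsicArgs(const IntrinsicInst *II) {
  // No extra cast is needed: a derived object converts implicitly to its
  // base class, so II can be used wherever a CallInst is expected.
  return countArgs(*II);
}
```

In the same way, the pointer obtained from the `dyn_cast<IntrinsicInst>` can be passed directly wherever the argument list of the call is needed.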
LG with a nit
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
7593 | Remove extra parens around isa call. |