The generic cost of logical or/and reductions should be the cost of a bitcast from <ReduxWidth x i1> to iReduxWidth plus a cmp eq|ne on iReduxWidth.
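For concreteness, here is a minimal IR sketch of the scalar pattern this default cost is meant to model, assuming a <4 x i1> input and the current llvm.vector.reduce.and/or intrinsic names (both assumptions and the function names are illustrative, not part of the patch): the and-reduction checks that all mask bits are set, the or-reduction that any bit is set.

  ; reduction being costed:
  ;   %r = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> %m)
  ; scalar pattern the default cost models for the and case:
  define i1 @reduce_and_as_modeled(<4 x i1> %m) {
    %bits = bitcast <4 x i1> %m to i4   ; <ReduxWidth x i1> -> iReduxWidth
    %all = icmp eq i4 %bits, -1         ; all bits set => all lanes true
    ret i1 %all
  }

  ; and for the or case:
  define i1 @reduce_or_as_modeled(<4 x i1> %m) {
    %bits = bitcast <4 x i1> %m to i4
    %any = icmp ne i4 %bits, 0          ; any bit set => some lane true
    ret i1 %any
  }

So the generic estimate charges one vector-to-integer bitcast plus one scalar compare; targets where an <N x i1> mask is not held as N packed bits can still override this in their own TTI, as discussed below.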
Diff Detail
Repository: rG LLVM Github Monorepo
Event Timeline
llvm/test/Analysis/CostModel/SystemZ/reduce-and.ll, line 13:
The cost model for SystemZ is not complete/correct and needs to be fixed; that's why there is a regression for it.
llvm/include/llvm/CodeGen/BasicTTIImpl.h, line 1909:
The SystemZ cost model does not implement vector-to-int bitcast and crashes. That's why we have to use Base::getCastInstrCost() here rather than thisT()->getCastInstrCost().
llvm/include/llvm/CodeGen/BasicTTIImpl.h, line 1902:
I'm not sure this is always true because some backends (e.g. AArch64) promote i1 to larger integers. The costs for AArch64 still look a bit odd to be honest. I tried them out manually and I observe about 8 instructions for AND reductions using <4 x i1> vectors, since we have lots of bytewise moves of -1 into the vector lanes of a <4 x i32> vector.
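To make this concern concrete, here is a small IR sketch (my own illustration, not from the patch) of how such a <4 x i1> mask typically arises: the lane predicates come from a vector compare and are held as lane-wide masks in a vector register, so the flat <4 x i1> -> i4 bitcast assumed by the generic model has no direct in-register equivalent on targets that promote i1 lanes.

  define i1 @all_lanes_equal(<4 x i32> %a, <4 x i32> %b) {
    ; on NEON-style targets %m is materialized as four 32-bit
    ; all-ones/all-zeros lanes rather than four packed bits
    %m = icmp eq <4 x i32> %a, %b
    %r = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> %m)
    ret i1 %r
  }

  declare i1 @llvm.vector.reduce.and.v4i1(<4 x i1>)

Feeding an example like this to the AArch64 backend is one way to reproduce the instruction counts described above.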
llvm/include/llvm/CodeGen/BasicTTIImpl.h, line 1902:
This is a known problem, see
https://bugs.llvm.org/show_bug.cgi?id=41636
https://bugs.llvm.org/show_bug.cgi?id=41635
https://bugs.llvm.org/show_bug.cgi?id=41634
Looks like the construct is not lowered properly on some targets.
llvm/include/llvm/CodeGen/BasicTTIImpl.h, line 1902:
Sure, I totally agree the codegen for ARM and AArch64 is awful, and I take your point. I was just wondering if this assumption was a problem:

  %val = bitcast <ReduxWidth x i1> to iReduxWidth

as I don't think it is true for targets that promote i1 to i32 or something like that. In the bug shown above (https://bugs.llvm.org/show_bug.cgi?id=41636) even the optimal code is still operating on vectors of i8 types. I guess targets that do promote i1->iX can come up with their own cost in the target-specific getArithmeticReductionCost, so maybe this isn't really a problem?
llvm/include/llvm/CodeGen/BasicTTIImpl.h, line 1902:
Yes, this is the idea. This patch provides just the basic cost estimation for this particular case; if the target's cost is different, it should define its own cost for this case.
llvm/include/llvm/CodeGen/BasicTTIImpl.h, line 1909:
How much of a task would it be to tweak the SystemZ TTI to avoid this? I worry that this kind of thing gets forgotten about and could cause other problems in the future.
LGTM - cheers.
Please can you ping those bugs mentioning this default cost change?