Some code optimisations for lowering i64 vector types for the min/max instruction in ISel.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
7159 | I think you can move this if block to the start of the function, i.e. unsigned Opcode = Op.getOpcode(); if (...) { switch (Opcode) { ... } } | |
7160 | I think you might be able to just do this: if (useSVEForFixedLengthVectorVT(VT, /*OverrideNEON=*/true)) { instead? | |
llvm/test/Analysis/CostModel/AArch64/min-max.ll | ||
-87 | nit: It might be easier to review the patch if you first introduced a NFC change to update the tests using utils/update_analyze_test_checks.py? | |
llvm/test/CodeGen/AArch64/vecreduce-umax-legalization.ll | ||
90–94 | Hi @Rin, at first glance this code looks like it might be worse than before? I realise the instruction count is the same, but I wonder if the cost of 'ext' might be higher? |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
4833–4834 | These can use fallthroughs to the next case, if they all call LowerMinMax in the same way. | |
7159 | Is this VT check needed? | |
llvm/test/Analysis/CostModel/AArch64/min-max.ll | ||
-87 | I was updating this file the other day. I think if you rebase the changes may disappear. |
llvm/test/CodeGen/AArch64/vecreduce-umax-legalization.ll | ||
---|---|---|
90–94 | Why would the cost of an ext be higher? |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
7160 | Hi @Rin, apologies please ignore my previous comment about changing the if statement. I see now we also want scalable vector types to fall into the block too! | |
llvm/test/CodeGen/AArch64/vecreduce-umax-legalization.ll | ||
90–94 | @dmgreen it might not be! I just wondered if ext was worse than the mov that's all. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
7159 |
Nope, I'll get rid of it |
Address review comments
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
7160 | No worries :) |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
1043 | Can these go into a loop, looping over the UMAX/UMIN/... | |
llvm/test/Analysis/CostModel/AArch64/min-max.ll | ||
-87 | It turns out I had not updated that test yet.. it should now have the changes I mentioned, if you rebase the patch. Can you make sure the instructions have a proper cost now too? They should be 2 I think. There is already some code in AArch64TTIImpl::getIntrinsicInstrCost for it. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
7180 | If I'm understanding correctly, if we mark both ISD::SMAX and ISD::VSELECT as "Expand", vector legalization decides to unroll it. So we mark ISD::SMAX as "Custom", then explicitly lower to a VSELECT to get the code we want. This seems kind of silly, given VSELECT is equivalent to AArch64ISD::BSP. For the sake of making changes to target-independent code easier, maybe we should consider marking ISD::VSELECT "custom"? Or add a target hook to indicate whether the operation is cheap? |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
7180 | Yes, we didn't mention that anywhere here. There are three ways we thought of fixing this - either make vselect custom/legal, adding a target hook for the expand code or custom lowering the type. Because it was only a single type, custom lowering seemed like the simplest route forward. The target hook felt messy for one type on one operation, and custom lowering vselect could be a much larger change - one that would easily cause regressions if a lot of other transforms were not added. I guess this turned into more code than I expected, with it being mixed up with SVE lowering. Perhaps custom lowering vselect would be better in the long run if someone was to optimize the BSP's in all the cases it would need, but this patch seems fine to me for the problem it is solving. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
7180 | Okay. I'm not really concerned about this patch on its own, just thinking about what we might want to do in the future. |
Can these go into a loop, looping over the UMAX/UMIN/...