While get_active_lane_mask lowering it uses WHILELO instruction, but for constant range suitable for PTRUE then we could issue PTRUE instruction instead.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
| llvm/test/CodeGen/AArch64/active_lane_mask.ll | ||
|---|---|---|
| 486 | Please can you add a test which shows what happens when the %n parameter of active.lane.mask goes out of the range that can be represented as a ptrue immediate. I expect this should fall back to the old codegen? | |
This transformation is more subtle than you expect. See https://developer.arm.com/documentation/ddi0602/2022-09/SVE-Instructions/PTRUE--predicate---Initialise-predicate-from-named-constraint-, specifically the part that reads If the constraint specifies more elements than are available at the current vector length then all elements of the destination predicate are set to false..
| llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
|---|---|---|
| 17640 | Notwithstanding Paul's comment, I think something like this also needs to account for the mapping between PredPattern constants and the integer %n constant. Note that 1-8 have a one-to-one correspondance and above that they don't. | |
| llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
|---|---|---|
| 17639–17640 | Unless I'm missing something, this condition doesn't look right. I suspect this combine could happen before legalization in which case you could have vscale x [big] x something, and the [big] you're testing against here could exceed the runtime vector length -- and then you're hitting the problem Paul pointed out that ptrue would return an empty predicate. I take to account for this properly you're going to need to query the minimum vscale via SubTarget->getMinSVEVectorSizeInBits(). | |
| llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
|---|---|---|
| 4755 | unsigned | |
| 4755–4757 | std::max(Subtarget->getMinSVEVectorSizeInBits(), 128u) | |
| 4759 | NumActiveElems? | |
| 4759–4760 | Op.getConstantOperandVal(2) - Op.getConstantOperandVal(1); | |
| 4762 | this should be <= | |
| 4763 | this should be the encoded pattern returned by getSVEPredPatternFromNumElements, see elsewhere in ISelLowering for examples. | |
| llvm/test/CodeGen/AArch64/active_lane_mask.ll | ||
| 478 | these tests aren't sufficient, it would be good to have tests for
| |
| 495 | active.lane.mask is an unsigned less-than comparison so this generates a predicate with 4 active lanes, which for a minimum vector width of 128 and esize of 32 can be represented with ptrue p0.s, vl4? | |
| llvm/test/CodeGen/AArch64/active_lane_mask.ll | ||
|---|---|---|
| 495 | correct. | |
| llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
|---|---|---|
| 4758–4759 | This doesn't look correct because while can take 64-bit operation. This code can truncate the result in such a way that a ptrue might incorrectly look like a viable replacement. | |
| 4763 | It likely doesn't really matter but given getSVEPredPatternFromNumElements takes an unsigned (because it's linked to the interface used by getVectorMinNumElements) perhaps it's clean to perform the <= check before calling getSVEPredPatternFromNumElements, that way we know the truncation is not losing any precision. | |
| llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
|---|---|---|
| 4758–4759 | Sorry I meant "while can take 64-bit operands". | |
| llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
|---|---|---|
| 4758–4759 | ok, as whilelo operands are i32, I think we have to zero extend those operands to unsigned by cast<ConstantSDNode>(Op)->getZExtValue(). | |
| llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
|---|---|---|
| 4758–4759 | whilelo operands are not restricted to i32, which can be seen by looking at IntrinsicsAArch64.td. Although the definition suggests any integer type is allowed, that's just a deficiency of how the td file is parsed. Realistically these intrinsics only accept either i32 or i64 operands, which are tested by sve-intrinsics-while.ll. Perhaps you can keep things as APInt up until the comparison, at which point you know an unsigned will be big enough to hold the length. | |
unsigned