This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Use PTRUE instruction for get_active_lane_mask intrinsic if the range is appropriate for predicator constant
ClosedPublic

Authored by dtemirbulatov on Nov 7 2022, 5:50 AM.

Details

Summary

While get_active_lane_mask lowering it uses WHILELO instruction, but for constant range suitable for PTRUE then we could issue PTRUE instruction instead.

Diff Detail

Event Timeline

dtemirbulatov created this revision.Nov 7 2022, 5:50 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 7 2022, 5:50 AM
dtemirbulatov requested review of this revision.Nov 7 2022, 5:50 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 7 2022, 5:50 AM
dtemirbulatov retitled this revision from Use ptrue instruction for get_active_lane_mask instrinis if range is from 0 to SVE prdicator constant to Use PTRUE instruction for get_active_lane_mask instrinis if range is from 0 to SVE prdicator constant.Nov 7 2022, 5:55 AM
dtemirbulatov edited the summary of this revision. (Show Details)
dtemirbulatov retitled this revision from Use PTRUE instruction for get_active_lane_mask instrinis if range is from 0 to SVE prdicator constant to [AArch64][SVE] Use PTRUE instruction for get_active_lane_mask instrinis if range is from 0 to SVE prdicator constant.Nov 7 2022, 6:00 AM
dtemirbulatov retitled this revision from [AArch64][SVE] Use PTRUE instruction for get_active_lane_mask instrinis if range is from 0 to SVE prdicator constant to [AArch64][SVE] Use PTRUE instruction for get_active_lane_mask intrinsic if the range is from 0 to SVE prdicator constant.Nov 7 2022, 8:20 AM
dtemirbulatov edited the summary of this revision. (Show Details)
dtemirbulatov retitled this revision from [AArch64][SVE] Use PTRUE instruction for get_active_lane_mask intrinsic if the range is from 0 to SVE prdicator constant to [AArch64][SVE] Use PTRUE instruction for get_active_lane_mask intrinsic if the range is from 0 to SVE predicator constant.
dtemirbulatov edited the summary of this revision. (Show Details)
dtemirbulatov edited the summary of this revision. (Show Details)Nov 7 2022, 8:23 AM
peterwaller-arm added inline comments.Nov 7 2022, 8:34 AM
llvm/test/CodeGen/AArch64/active_lane_mask.ll
486

Please can you add a test which shows what happens when the %n parameter of active.lane.mask goes out of the range that can be represented as a ptrue immediate. I expect this should fall back to the old codegen?

This transformation is more subtle than you expect. See https://developer.arm.com/documentation/ddi0602/2022-09/SVE-Instructions/PTRUE--predicate---Initialise-predicate-from-named-constraint-, specifically the part that reads If the constraint specifies more elements than are available at the current vector length then all elements of the destination predicate are set to false..

peterwaller-arm added inline comments.Nov 7 2022, 8:59 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17640

Notwithstanding Paul's comment, I think something like this also needs to account for the mapping between PredPattern constants and the integer %n constant.

https://github.com/llvm/llvm-project/blob/2e999b7dd1934a44d38c3a753460f1e5a217e9a5/llvm/lib/Target/AArch64/Utils/AArch64BaseInfo.h#L530-L556

Note that 1-8 have a one-to-one correspondance and above that they don't.

georges added a subscriber: georges.Nov 7 2022, 9:28 AM
Matt added a subscriber: Matt.Nov 7 2022, 12:39 PM

Fixed remarks.

dtemirbulatov marked 2 inline comments as done.Nov 7 2022, 6:01 PM
peterwaller-arm added inline comments.Nov 8 2022, 4:49 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17639–17640

Unless I'm missing something, this condition doesn't look right.

I suspect this combine could happen before legalization in which case you could have vscale x [big] x something, and the [big] you're testing against here could exceed the runtime vector length -- and then you're hitting the problem Paul pointed out that ptrue would return an empty predicate.

I take to account for this properly you're going to need to query the minimum vscale via SubTarget->getMinSVEVectorSizeInBits().

dtemirbulatov updated this revision to Diff 475420.EditedNov 15 2022, 4:19 AM

Rebased, addressed comments.

dtemirbulatov retitled this revision from [AArch64][SVE] Use PTRUE instruction for get_active_lane_mask intrinsic if the range is from 0 to SVE predicator constant to [AArch64][SVE] Use PTRUE instruction for get_active_lane_mask intrinsic if the range is appropriate for predicator constant.Nov 16 2022, 5:44 AM
dtemirbulatov edited the summary of this revision. (Show Details)

Add support for any range not just from 0.

dtemirbulatov marked an inline comment as done.Nov 16 2022, 5:47 AM
c-rhodes added inline comments.Nov 17 2022, 4:46 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
4755

unsigned

4755–4757

std::max(Subtarget->getMinSVEVectorSizeInBits(), 128u)

4759

NumActiveElems?

4759–4760

Op.getConstantOperandVal(2) - Op.getConstantOperandVal(1);

4762

this should be <=

4763

this should be the encoded pattern returned by getSVEPredPatternFromNumElements, see elsewhere in ISelLowering for examples.

llvm/test/CodeGen/AArch64/active_lane_mask.ll
478

these tests aren't sufficient, it would be good to have tests for

  • the upper bound of active elements vl256, this can be tested with a <vscale x 16 x i1> predicate and minimum SVE vector width of 2048 specified via vscale_range(16, 16) function attribute. Also add a test for this vector width that exceeds 256 active elements.
  • A non-zero %base argument for active.lane.mask.
495

active.lane.mask is an unsigned less-than comparison so this generates a predicate with 4 active lanes, which for a minimum vector width of 128 and esize of 32 can be represented with ptrue p0.s, vl4?

dtemirbulatov marked an inline comment as done.

Addressed remarks.

dtemirbulatov marked 7 inline comments as done.Nov 17 2022, 4:25 PM
dtemirbulatov added inline comments.
llvm/test/CodeGen/AArch64/active_lane_mask.ll
495

correct.

This revision is now accepted and ready to land.Nov 18 2022, 3:25 AM
This revision was landed with ongoing or failed builds.Nov 18 2022, 8:21 AM
This revision was automatically updated to reflect the committed changes.
paulwalker-arm added inline comments.Nov 24 2022, 6:42 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
4758–4759

This doesn't look correct because while can take 64-bit operation. This code can truncate the result in such a way that a ptrue might incorrectly look like a viable replacement.

4763

It likely doesn't really matter but given getSVEPredPatternFromNumElements takes an unsigned (because it's linked to the interface used by getVectorMinNumElements) perhaps it's clean to perform the <= check before calling getSVEPredPatternFromNumElements, that way we know the truncation is not losing any precision.

paulwalker-arm added inline comments.Nov 24 2022, 7:39 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
4758–4759

Sorry I meant "while can take 64-bit operands".

dtemirbulatov added inline comments.Nov 24 2022, 5:49 PM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
4758–4759

ok, as whilelo operands are i32, I think we have to zero extend those operands to unsigned by cast<ConstantSDNode>(Op)->getZExtValue().

paulwalker-arm added inline comments.Nov 25 2022, 2:46 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
4758–4759

whilelo operands are not restricted to i32, which can be seen by looking at IntrinsicsAArch64.td. Although the definition suggests any integer type is allowed, that's just a deficiency of how the td file is parsed. Realistically these intrinsics only accept either i32 or i64 operands, which are tested by sve-intrinsics-while.ll.

Perhaps you can keep things as APInt up until the comparison, at which point you know an unsigned will be big enough to hold the length.