In the AArch64 target, the WHILEWR instruction is enabled in SVE2.
Event Timeline
Hi
I think the idea of using whilewr is a nice one. I think this should work similarly to the active.lane.mask intrinsics:
- We define a generic intrinsic, argue about the name and the exact semantics, adding details to the language ref.
- Add a target hook from the vectorizer to opt into using them for runtime checks.
- Lower them generically to a series of compares and whatnot in DAG (this may be difficult depending on the exact semantics)
- Under AArch64 we expand it to a whilewr and a csel last (I think), which can then hopefully be optimized to use b.last.
At least that is how I would expect it to work, with an intrinsic that accepts two pointers or integers of pointer size and produces an i1. The alternative would be to just match it in the backend. Unfortunately the semantics of whilewr don't seem super obvious. I think the b variant performs ((VL - 1) < zext(B) - zext(A)) | (zext(B) - zext(A)) > 0 for the last lane, which is a little odd for values where A+VL wraps around 0 and probably makes direct matching difficult.
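To make the wrap-around concern concrete, here is a rough scalar sketch in Python. It models only the first term of the last-lane expression above, `(VL - 1) < zext(B) - zext(A)`, and compares it with a hypothetical wrapping 64-bit form of the same check, `A + VL <= B`. That second form, the function names, and the example addresses are all assumptions made for illustration. They are not taken from the patch or from the architecture reference.

```python
MASK64 = (1 << 64) - 1  # pointers are modelled as unsigned 64-bit integers


def last_lane_wide(a: int, b: int, vl: int) -> bool:
    """First term of the expression in the comment: (VL - 1) < zext(B) - zext(A).

    Python integers are unbounded, so the subtraction cannot wrap; that
    stands in for the zero-extension.
    """
    return (vl - 1) < (b - a)


def last_lane_wrapping(a: int, b: int, vl: int) -> bool:
    """Hypothetical 64-bit form a matcher might see: A + VL <= B, with wrap."""
    return ((a + vl) & MASK64) <= b


# Ordinary addresses: the two forms agree.
print(last_lane_wide(0x1000, 0x1010, 16), last_lane_wrapping(0x1000, 0x1010, 16))

# A sits 8 bytes below 2**64, so A + VL wraps around 0 and the forms disagree.
a = (1 << 64) - 8
b = (1 << 64) - 4
print(last_lane_wide(a, b, 16), last_lane_wrapping(a, b, 16))
```

If this model is roughly right, a backend pattern keyed on the wrapping form would not be equivalent to the wide form at the top of the address space. That is one reason an explicit intrinsic with documented semantics looks safer than direct matching.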
We would also need to account for the vectorizer's unroll factor (UF) correctly, which might be possible using a different element size.