This change allows to lower WHILEop operations with constant operands to PTRUE instructions, it also fixes issue with unsigned integer overflow for WHILELO support already commited in https://reviews.llvm.org/D137547.
Details
Diff Detail
Event Timeline
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
4848–4859 | This isn't quite what I had in mind when suggesting you can increase commonality. I had in mind a helper function that takes parameters like IsSigned and IsLess or perhaps even makes use of the existing AArch64CC::CondCode enum? |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
4327 | nit: Move ElementSize closer to its use. | |
4329 | nit: Can you move these out into separate variables, e.g. X and Y? | |
4339 | Should this Overflow test be moved below the if (IsOpEqualOrSame) { ... } condition? | |
4847–4848 | Is it worth just moving all the behaviour into that function, and passing IsSigned, IsLess and IsEqual as parameters, such that you get: case Intrinsic::aarch64_sve_whilelo: return optimizeWhile(Op.getOperand(1), Op.getOperand(2), /*IsSigned=*/false, /*IsLess=*/true, /*IsEqual=*/false); case Intrinsic::aarch64_sve_whilels: return optimizeWhile(Op.getOperand(1), Op.getOperand(2), /*IsSigned=*/false, /*IsLess=*/true, /*IsEqual=*/true); case Intrinsic::aarch64_sve_whilelt: return optimizeWhile(Op.getOperand(1), Op.getOperand(2), /*IsSigned=*/true, /*IsLess=*/true, /*IsEqual=*/false); case Intrinsic::aarch64_sve_whilele: return optimizeWhile(Op.getOperand(1), Op.getOperand(2), /*IsSigned=*/true, /*IsLess=*/true, /*IsEqual=*/true); ... |
Addressed remarks, avoided to check overflow while adding one to the range for equal or same conditions.
Thanks that's looking a lot better!
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
4602 | nit: add a newline before the if? | |
4605 | I guess this can still overflow right? I think you had the right code for that before, it only needed moving the if (Overflow) return SDValue(); after the increment. It would also be good to add a test for the overflow case (both signed and unsigned, and both 'less' and 'less or equal') |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
4605 | With the increment case, I don't think we have to worry about overflow/underflow. it either world be 0 or too large/too small number that are not representable with PTRUE VL instruction. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
4605 | With the increment case, I don't think we have to worry about overflow/underflow. it either world be 0 or too large/too small number that are not representable with PTRUE VL instruction. |
llvm/test/CodeGen/AArch64/sve-intrinsics-while.ll | ||
---|---|---|
87 | I guess this doesn't match: ptrue p0.d, vl4 because NumActiveElems.getZExtValue() <= (MinSVEVectorSize / ElementSize) evaluates to false. Can you clarify the reason these intrinsics don't fold in the name of these tests (e.g. whilele_d_ii_dont_fold_to_ptrue_larger_than_minvec) and/or add a comment to explain it? | |
101 | Same question here: rename this to whilele_b_ii_dont_fold_to_ptrue_nonexistent_vl9 or something? | |
121 | same question about renaming this test, e.g. whilele_b_ii_dont_fold_to_ptrue_overflow ? | |
270 | For whilelo (unsigned), this is the same as -1, and is therefore not much different than the test above. | |
410 | For whilels (unsigned), this is the same as -1, and is therefore not much different than the test above. |
llvm/test/CodeGen/AArch64/sve2-intrinsics-while.ll | ||
---|---|---|
141 | Similar comment as for whilegt, it's better to pick some numbers that actually end up as a valid vl if you remove the overflow check, e.g. @llvm.aarch64.sve.whilege.nxv16i1.i32(i32 -2147483641, i32 2147483647) | |
258 | Similar comment as for whilehi, this doesn't overflow. You could use @llvm.aarch64.sve.whilehs.nxv16i1.i32(i32 6, i32 4294967295) ; this would wrap around to vl8 | |
399 | It's better to pick some numbers that actually end up as a valid vl if you remove the overflow check, e.g. @llvm.aarch64.sve.whilegt.nxv16i1.i32(i32 -2147483641, i32 2147483647) Can you use similar numbers whilege? | |
527 | This doesn't overflow right? 4294967295 - 6 => 4294967289 (no overflow here). For whilehi, it would overflow if the start value would be lower than the end value, e.g. @llvm.aarch64.sve.whilehi.nxv16i1.i32(i32 7, i32 4294967295) ; this would wrap around to vl8 |
nit: Move ElementSize closer to its use.