Details
- Reviewers: paulwalker-arm, dmgreen, david-arm
Event Timeline
Is this optimisation valid? The merging SVE intrinsics have strict rules about what happens to inactive lanes. For llvm.aarch64.sve.and, the inactive lanes of the result are set to the matching lanes of the first operand. This means the inactive lanes of the second operand play no role in the operation, and thus the example in and_i64_zero_comm is not a zeroing AND.
However, given the inactive lanes of the second operand play no role, this effectively means the select is redundant and can be optimised away as an instcombine before it gets to code generation. So I guess the question is whether you are seeing this issue in real code and thus it's worth implementing the instcombine.
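To make the point about inactive lanes concrete, here is a minimal scalar sketch of the merging semantics described above. It is only a model, not LLVM or hardware behaviour: the helper names `sve_and_merging` and `select` are hypothetical, vectors are plain Python lists, and the test case is assumed to feed `select(pg, op2, zero)` into the second operand.

```python
def sve_and_merging(pg, op1, op2):
    """Model of a merging predicated AND.

    Active lanes produce op1 & op2; inactive lanes keep the op1 value.
    """
    return [a & b if p else a for p, a, b in zip(pg, op1, op2)]


def select(pg, on_true, on_false):
    """Model of a per-lane select on a predicate."""
    return [t if p else f for p, t, f in zip(pg, on_true, on_false)]


pg = [True, False, True, False]
a = [0b1100, 0b1010, 0b1111, 0b0001]
b = [0b1010, 0b0110, 0b0101, 0b1111]
zero = [0] * len(pg)

# Select on the SECOND operand: its inactive lanes are never read,
# so the result matches the plain merging AND.
with_select = sve_and_merging(pg, a, select(pg, b, zero))
without_select = sve_and_merging(pg, a, b)

# Select on the FIRST operand: inactive lanes of the result become zero,
# which is what a zeroing AND would actually require.
zeroing = sve_and_merging(pg, select(pg, a, zero), b)

print(with_select == without_select)  # the select on op2 is redundant
print(zeroing)
```

In this model, zeroing only the second operand leaves the result unchanged, whereas zeroing the first operand is what changes the inactive lanes.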
Oh, sorry, and thanks @paulwalker-arm for the reminder. I forgot to check the final assembly end to end with clang. At first I thought generating movprfx would give better performance, without realizing that the select is redundant in this case.
Indeed, the instructions generated for the s113_tuned version are more efficient; see https://gcc.godbolt.org/z/P14sb6MPq
Just trying to clean up my review list, since we've agreed that a different approach is required.