This is an archive of the discontinued LLVM Phabricator instance.

[WIP][AArch64][DAGCombiner] Swap the operations of logical operation AND to match movprfx
Abandoned · Public

Authored by Allen on Jun 26 2022, 3:59 AM.

Event Timeline

Allen created this revision. Jun 26 2022, 3:59 AM
Herald added a project: Restricted Project. Jun 26 2022, 3:59 AM
Allen requested review of this revision. Jun 26 2022, 3:59 AM

Is this optimisation valid? The merging SVE intrinsics have strict rules about what happens to inactive lanes. For the llvm.aarch64.sve.and the inactive lanes are set to the matching lanes of the first operand. This means that the inactive lanes of the second operand play no role in the operation and thus the example in and_i64_zero_comm is not a zeroing and.

However, given the inactive lanes of the second operand play no role, this effectively means the select is redundant and can be optimised away as an instcombine before it gets to code generation. So I guess the question is whether you are seeing this issue in real code and thus it's worth implementing the instcombine.
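
The argument above can be checked with a small lane-by-lane model. This is only a sketch: the helpers `sel` and `and_merging` are hypothetical scalar stand-ins (not LLVM or ACLE APIs), and the model assumes exactly the semantics described above, namely that active lanes receive `op1 & op2` and inactive lanes keep the value of the first operand.

```python
from itertools import product


def sel(pg, a, b):
    """Lane-wise select (hypothetical helper): a[i] where pg[i] is true, else b[i]."""
    return [x if p else y for p, x, y in zip(pg, a, b)]


def and_merging(pg, op1, op2):
    """Scalar model of a merging predicated AND (assumed semantics):
    active lanes get op1 & op2, inactive lanes keep op1."""
    return [x & y if p else x for p, x, y in zip(pg, op1, op2)]


def select_on_op2_is_redundant(lanes=2, values=(0, 1, 3)):
    """Exhaustively check that zeroing the inactive lanes of the SECOND
    operand never changes the result of the merging AND."""
    zeros = [0] * lanes
    for pg in product([False, True], repeat=lanes):
        for op1 in product(values, repeat=lanes):
            for op2 in product(values, repeat=lanes):
                plain = and_merging(pg, op1, op2)
                with_select = and_merging(pg, op1, sel(pg, op2, zeros))
                if plain != with_select:
                    return False
    return True


pg = [True, False]
op1 = [0b1100, 0b1010]
op2 = [0b1010, 0b0110]
zeros = [0, 0]

# Select on the second operand: inactive lane still keeps op1, so not a zeroing AND.
select_second = and_merging(pg, op1, sel(pg, op2, zeros))
# Select on the first operand: inactive lane really is zeroed.
select_first = and_merging(pg, sel(pg, op1, zeros), op2)
```

In this model only a select on the first operand produces zeroed inactive lanes; a select on the second operand leaves the result identical to the plain merging AND, which is why it could be folded away before code generation.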

Matt added a subscriber: Matt. Jun 28 2022, 2:22 PM
Allen added a comment. Edited Jun 30 2022, 5:07 AM

> Is this optimisation valid? The merging SVE intrinsics have strict rules about what happens to inactive lanes. For the llvm.aarch64.sve.and the inactive lanes are set to the matching lanes of the first operand. This means that the inactive lanes of the second operand play no role in the operation and thus the example in and_i64_zero_comm is not a zeroing and.
>
> However, given the inactive lanes of the second operand play no role, this effectively means the select is redundant and can be optimised away as an instcombine before it gets to code generation. So I guess the question is whether you are seeing this issue in real code and thus it's worth implementing the instcombine.

Oh, sorry, and thanks @paulwalker-arm for the reminder. I forgot to run clang end-to-end to confirm the final assembly. At first I thought generating movprfx would give better performance, without realizing that the select is redundant in this case.
Indeed, the instructions generated for the s113_tuned version at https://gcc.godbolt.org/z/P14sb6MPq are more efficient.

Allen retitled this revision from [AArch64][DAGCombiner] Swap the operations of logical operation AND to match movprfx to [WIP][AArch64][DAGCombiner] Swap the operations of logical operation AND to match movprfx. Jun 30 2022, 5:11 AM
paulwalker-arm requested changes to this revision. Jul 15 2022, 9:04 AM

Just trying to clean up my review list, since we've agreed a different approach is required.

This revision now requires changes to proceed. Jul 15 2022, 9:04 AM
Allen abandoned this revision. Jul 15 2022, 5:55 PM

Sorry, I forgot to abandon this revision.