This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE][ISel] Combine dup of load to replicating load
ClosedPublic

Authored by peterwaller-arm on Dec 8 2022, 8:36 AM.

Details

Summary

(dup (load) z_or_x_passthrough) => (replicating load)

Diff Detail

Event Timeline

peterwaller-arm created this revision.Dec 8 2022, 8:36 AM
Herald added a reviewer: efriedma. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript
peterwaller-arm requested review of this revision.Dec 8 2022, 8:36 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 8 2022, 8:36 AM
paulwalker-arm added inline comments.Dec 8 2022, 10:34 AM
llvm/test/tools/llvm-mca/AArch64/A64FX/A64FX-sve-instructions.s
2574–2575

I've pushed https://reviews.llvm.org/rGb4028fbc1a88 which fixes this issue.

llvm/test/tools/llvm-mca/AArch64/A64FX/A64FX-sve-instructions.s
2574–2575

Thanks, much appreciated. Rebased.

Matt added a subscriber: Matt.Dec 8 2022, 1:06 PM
paulwalker-arm accepted this revision.Dec 12 2022, 2:00 AM

Some test recommendations but otherwise looks good.

llvm/test/CodeGen/AArch64/sve-ld1r.ll
843

To cover all the new variants you also want sign/zext forms for the integer types.

873

To cover all the new variants you also want unpacked forms for the floating-point types.

963–973

I don't think dup.x tests are necessary because we don't have any isel patterns for them. Instead we always lower such intrinsics to splat_vector, which is already tested?

That said, the dup.x tests are validating the immediate range so perhaps they can be converted to use dup into undef. That that said, I suppose there's an argument that given the immediate operand type is shared by the other pattern within the multiclass it's already being sufficiently tested. It's up to you depending on how paranoid you want to be.

This revision is now accepted and ready to land.Dec 12 2022, 2:00 AM
  • Test zext/sext.
  • Test unpacked forms (no negative tests, since dup with unpacked types is not specifically supported currently as there is currently no known way to get a DUP_MERGE_PASSTHRU from C/C++ code).
  • Drop immediate tests since those are covered.
  • Remove 'q' accidentally copied into the test names.
This revision was landed with ongoing or failed builds.Dec 14 2022, 2:38 AM
This revision was automatically updated to reflect the committed changes.
Allen added a subscriber: Allen.Jan 19 2023, 12:49 AM
Allen added inline comments.
llvm/test/tools/llvm-mca/AArch64/Neoverse/N2-sve-instructions.s
4475–4490

Excuse me, I'm curious why the above changes affect the SideEffects attribute here. Would you give me some guidance? Thank you.

dmgreen added inline comments.
llvm/test/tools/llvm-mca/AArch64/Neoverse/N2-sve-instructions.s
4475–4490

The way that hasSideEffects for an instruction works is that:

  • It defaults to true.
  • It gets set to false if there is a tablegen pattern that generated that instruction (I believe it might needs to be a single instruction generated by the pattern).
  • It can be overridden with let hasSideEffects=1/0 on the instruction.

If you are looking to remove the side effects flags, and the instruction doesn't have a pattern, adding hasSideEffects=0 is usually a good way to go. Removing the side effects can help improve scheduling freedom and other code motion in the backend, so is usually good to do so long as the instruction doesn't do anything odd.

Allen added inline comments.Jan 19 2023, 1:31 AM
llvm/test/tools/llvm-mca/AArch64/Neoverse/N2-sve-instructions.s
4475–4490

Thansk @dmgreen very much.

  • As the above changes don't have some overridden with let hasSideEffects=1/0 , so I think the 2nd guidelines works in this case.
  • Is the def : LD1RPat<nxv4i32, load, LD1RW_IMM, PTRUE_S, i32, am_indexed32_6b, uimm6s4>; before the change a tablegen pattern, and it also should set the instruction to false?
dmgreen added inline comments.Jan 19 2023, 3:19 AM
llvm/test/tools/llvm-mca/AArch64/Neoverse/N2-sve-instructions.s
4475–4490

Yeah - IIRC, I think the logic in Tablegen might only apply when the pattern produces a single instruction. So the pattern with (load (ptrue 31), .. would not set hasSideEffects=0 in the same way as the new (load $pg, .. pattern.

paulwalker-arm added inline comments.Jan 19 2023, 4:54 AM
llvm/test/tools/llvm-mca/AArch64/Neoverse/N2-sve-instructions.s
4475–4490

Does hasSideEffects default to false? I investigated this after D140680 and spotted bit hasSideEffects = ?; with none of the base SVE instructions classes that sit above it setting it. I suspect there was some AArch64 upstream refactoring between our original downstream implementation and when it was upstreamed that we missed and so this property has been dandling ever since (either that or it's just always been wrong).

Eitherway, as agreed on D140680 I'll have a patch in the next few days that should resolve it. For this patch you can just accept the new value as has been the case of other patches where this flip has occurred.

paulwalker-arm added inline comments.Jan 19 2023, 4:56 AM
llvm/test/tools/llvm-mca/AArch64/Neoverse/N2-sve-instructions.s
4475–4490

Oops. Please ignore the final comment, I hadn't spotted this was not a new patch.