This is an archive of the discontinued LLVM Phabricator instance.

[WIP][LegalizeTypes][LegalizeDAG] Use misaligned load/store to optimize memory access with non-power2 integer types.
AbandonedPublic

Authored by bcl5980 on Sep 7 2022, 2:33 AM.

Details

Summary

When load integer types with non-power2 aligned size, we will split the load to some no overlap legal type loads.
For example, if we trying to legalize i56, it will be split to i32 + i16 + i8 three loads.
This change trying to use two i32 load with 8bits overlap to reduce the load number.

The motivation comes from ARM64EC (https://reviews.llvm.org/D125418#inline-1267564).
For now, we don't apply it to store because it will involve extra dependency in CPU load store queue.

Diff Detail

Event Timeline

bcl5980 created this revision.Sep 7 2022, 2:33 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 7 2022, 2:33 AM
bcl5980 requested review of this revision.Sep 7 2022, 2:33 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 7 2022, 2:33 AM
bcl5980 updated this revision to Diff 458409.EditedSep 7 2022, 3:43 AM

fix test fail.

efriedma added inline comments.Sep 7 2022, 10:42 AM
llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
801

The alignment you're passing in here doesn't seem right; you want to pass in the alignment of the load you're planning to generate, right?

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
2045 ↗(On Diff #458409)

None of this logic looks like it's target-specific; can we just do this in target-independent code?

bcl5980 added inline comments.Sep 7 2022, 9:43 PM
llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
801

Ah, yeah. I missed it. Thanks for the catch.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
2045 ↗(On Diff #458409)

Yes. Of course, we can move this code into target-independent layer.
But there are some regressions on other platform like x86, comes from load/store pair.
For now, I haven't enabled the optimization for store. So, load pattern is different from store if we have a load/store pair.
For example,

define void @i56_or(ptr %a) {
  %aa = load i56, ptr %a, align 1
  %b = or i56 %aa, 384
  store i56 %b, ptr %a, align 1
  ret void
}

Actually, this is a general issue but the gain from load can cover the extra instructions on ARM but x86 can't. So, I limited this to ARM.
Maybe we can enable the optimization for store also to fix the issue. But I still have a little concern on it.

bcl5980 updated this revision to Diff 458686.Sep 8 2022, 2:01 AM
bcl5980 retitled this revision from [AArch64] Use misaligned load/store to optimize memory access with non-power2 integer types. to [LegalizeTypes][LegalizeDAG] Use misaligned load/store to optimize memory access with non-power2 integer types..
bcl5980 added reviewers: RKSimon, craig.topper.

Move the code to independent layer.
Send the patch to review but for now x86 still has some regressions.

Do we have test coverage for load+store patterns involving integers larger than 56 bits?

If the code quality issues are specific to 56 bits, it might make sense to make the initial patch skip handling 56-bit integers, and try to address them in a followup.

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
815

The alignment on this load isn't right.

The "alignment" of a load SDNode isn't the alignment of the load itself; it's the alignment of the base of the MachinePointerInfo. So if you're using getWithOffset() to adjust the MachinePointerInfo, the alignment doesn't change based on the offset. (This distinction helps make alias analysis more accurate.)

bcl5980 updated this revision to Diff 458946.Sep 8 2022, 7:27 PM

Update load+store patterns with integers larger than 56 bits. Looks even worse than 56bits.

bcl5980 marked 2 inline comments as done.Sep 8 2022, 7:27 PM
bcl5980 planned changes to this revision.Sep 8 2022, 7:28 PM
bcl5980 retitled this revision from [LegalizeTypes][LegalizeDAG] Use misaligned load/store to optimize memory access with non-power2 integer types. to [WIP][LegalizeTypes][LegalizeDAG] Use misaligned load/store to optimize memory access with non-power2 integer types..
bcl5980 abandoned this revision.Oct 24 2022, 10:45 PM