Page MenuHomePhabricator

[AArch64][CodeGen] Use interleave store for streaming compatible functions

Authored by CarolineConcatto on Mar 28 2023, 4:16 AM.



The previous patch, D135564, was too conservative to avoid store interleave
for streaming-compatible functions/mode.

In this patch, we allow using the interleave store but using scalable vector.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptMar 28 2023, 4:16 AM
CarolineConcatto requested review of this revision.Mar 28 2023, 4:16 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 28 2023, 4:16 AM
sdesmalen added inline comments.Mar 28 2023, 4:31 AM

I assume we'll need a similar capability for interleaved loads?


It would be nice to have some more test coverage. Maybe something where the test doesn't depending on a splat, and some different vector lengths (e.g. where vector legalisation is required)

david-arm added inline comments.Mar 29 2023, 1:51 AM

Interestingly this code is faster than using st2w! That makes me wonder if we're missing some DAG combines somewhere for interleaving stores of splats. It's probably something unlikely to happen in practice though. I don't think you have to do anything in this patch, but it's something we may want to revisit at some point.

I agree with @sdesmalen here - it would be good to have a more generic test case (without splats) here because otherwise this particular test is fragile.

  • Add more tests to interleave store with streaming compatible function

Thank you for pointing that out.
We don't need to do nothing in both functions, because isLegalInterleavedAccessType
already sets UseScalable to true when Subtarget->forceStreamingCompatibleSVE().

Thanks for the new tests @CarolineConcatto! I just had a couple more suggestions on possibly improving the tests a bit more ...


I don't think you need the second %v2 argument here, since it's never actually used. You can rewrite the IR below to just be:

%interleaved = shufflevector <8 x i32> %v1, <8 x i32> undef, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
store <8 x i32> %interleaved, ptr %a, align 1

This test has the same problem as @hang_when_merging_stores_after_legalisation, because it's using splats. I think you can do this:

define void @interleave_store_legalization(ptr %p, <8 x i32> %a, <8 x i32> %b) #0 {
  %interleaved = shufflevector <8 x i32> %a, <8 x i32> %b, <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>
  store <16 x i32> %interleaved, ptr %p, align 1
  ret void
  • Fix the test with wrong size of the input vector and remove the splat in the second test

Yes, you are right. I should have changed the size of the input vectors. I think now it is better. Right?


I don't fully understand why the splat in not a good example, but I did changed. Hope it is better now,

david-arm accepted this revision.Mar 29 2023, 6:06 AM

LGTM! Eccelente! Thanks for making the changes @CarolineConcatto.


Well, it's for the same reason as @hang_when_merging_stores_after_legalisation, because the splat version may get optimised in future to be a pair of stp instructions that's all.

This revision is now accepted and ready to land.Mar 29 2023, 6:06 AM
sdesmalen accepted this revision.Apr 12 2023, 5:36 AM
This revision was landed with ongoing or failed builds.Apr 13 2023, 1:44 AM
This revision was automatically updated to reflect the committed changes.