This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] add a peephole to remove redundant swap instructions created by expandVSXStoreForLE
ClosedPublic

Authored by tingwang on Dec 8 2022, 10:42 PM.

Details

Summary

The scenario is vector store of splat value on Power8 LE. It is not expected that we see swap instruction before the store.

Vector store on P8 little endian created by expandVSXStoreForLE() will have swap instruction added before the store in PPCISelLowring. If the vector is generated by splat, the swap instruction can be eliminated.

Ideally those swaps should have been removed by ppc-vsx-swaps pass. However due to its limitation, that removal did not happen. This patch does the removal in peephole.

Diff Detail

Event Timeline

tingwang created this revision.Dec 8 2022, 10:42 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 8 2022, 10:42 PM
tingwang requested review of this revision.Dec 8 2022, 10:42 PM
shchenz added inline comments.Dec 13 2022, 5:46 PM
llvm/lib/Target/PowerPC/PPCMIPeephole.cpp
644

Can this branch be merged to the above one at line 627? Splat a double word and the double word is a splat of a small unit should also can be removed?

tingwang added inline comments.Dec 13 2022, 7:02 PM
llvm/lib/Target/PowerPC/PPCMIPeephole.cpp
644

Sure, I can do the merge to reuse logic.

Are you referring to instructions like xxspltib? I think those are introduced in v3.0, and they will not appear on P8, so I did not include those.

shchenz added inline comments.Dec 13 2022, 7:10 PM
llvm/lib/Target/PowerPC/PPCMIPeephole.cpp
644

Oh, I meant the XXPERMDI instruction can be removed like the above branch(at like 627) does. Not referring to any other instructions.

tingwang added inline comments.Dec 14 2022, 12:11 AM
llvm/lib/Target/PowerPC/PPCMIPeephole.cpp
644

IMHO, we saw these additional swaps on P8 because of stores, and that's why only saw Immed == 2. I'm not sure of any origin that can produce splat (i.e. Immed == 0 || Immed == 3 like 627) and reach peephole. If there is any such origin, it maybe more interesting to fix in the origin. To me, it is a little bit confused why need to handle splat here?

shchenz accepted this revision as: shchenz.Dec 15 2022, 8:59 PM

LGTM. Maybe wait for some days for other reviews.

llvm/lib/Target/PowerPC/PPCMIPeephole.cpp
644

OK, I see. I did some hacks and seems indeed XXPERMDI (as a splat) is never be fed by another splat. So let's keep as it is. And continuously improve this when a case is found.

This revision is now accepted and ready to land.Dec 15 2022, 8:59 PM
This revision was landed with ongoing or failed builds.Feb 2 2023, 5:53 PM
This revision was automatically updated to reflect the committed changes.

Why do we need this?

  1. For LE, why doesn't the swap removal pass remove the swaps rather than relying on this peephole?
  2. For AIX, why do we have swaps? We certainly don't swap values we store by default on big endian, so the description of this doesn't explain it at all.

Why do we need this?

  1. For LE, why doesn't the swap removal pass remove the swaps rather than relying on this peephole?
  2. For AIX, why do we have swaps? We certainly don't swap values we store by default on big endian, so the description of this doesn't explain it at all.

Hi Nemanja,

This scenario indirectly originated from D138883. These additional swaps observed on LE were not removed by ppc-vsx-swaps pass due to its limitations. Since the instruction combination is clear and simple, I weighted peephole a more reliable approach to remove these swaps than ppc-vsx-swaps pass. (IMHO, it cost more to fix in ppc-vsx-swaps pass, not even mention that ppc-vsx-swaps pass sometimes backfires and creates more swaps. I'm a little bit worried that enabling more patterns in ppc-vsx-swaps pass may cause it backfire at me...)

This was intended for LE, and all pattern changes here are related to LE. Sorry I should have high lighted this point, and will provider more information in the description.

By the way, thank you for providing help on D138883. As you suggested, I realized canCombineStoreAndExtract() could be used for the purpose, and did extend the API. Not sure if that serves the purpose well or not?

tingwang retitled this revision from [PowerPC] add a peephole to remove redundant swap instructions after vector splats on P8 to [PowerPC] add a peephole to remove redundant swap instructions created by expandVSXStoreForLE.Feb 5 2023, 6:02 PM
tingwang edited the summary of this revision. (Show Details)