This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Break up larger shuffle-masks into legal sizes in getShuffleCost
ClosedPublic

Authored by dmgreen on Apr 8 2022, 10:42 AM.

Details

Summary

Given a larger-than-legal shuffle mask, the final codegen will split into multiple sub-vectors. This attempts to model that in AArch64TTIImpl::getShuffleCost, splitting masks up according to the size of the legalized vectors. If the sub-masks have at most 2 input sources we can call getShuffleCost on them and sum the costs, to get a more accurate final cost for the entire shuffle. The call to improveShuffleKindFromMask helps to improve the shuffle kind for the sub-mask cost call.

Diff Detail

Event Timeline

dmgreen created this revision.Apr 8 2022, 10:42 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 8 2022, 10:42 AM
dmgreen requested review of this revision.Apr 8 2022, 10:42 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 8 2022, 10:42 AM
samtebbs added inline comments.Apr 13 2022, 3:10 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2654

I might be misunderstanding this part of the code but I don't see how Source != Source1 will ever be true, since Source1 is assigned to Source when NumSources == 0, which is always the case in the first iteration. Same for the other else-if below.

@dmgreen Might llvm::processShuffleMasks from D115653 be of use here?

dmgreen updated this revision to Diff 423587.Apr 19 2022, 4:03 AM

Update and rebase.

@dmgreen Might llvm::processShuffleMasks from D115653 be of use here?

Hmm. I'm not sure. I remember thinking I should give it a try, I remember deciding it wasn't a good idea, but I don't remember why now. I think it won't give us the information we really need. We would need to process the mask going in, and then process ">=2 source" back into "2 source" and ">2 source" masks, as it will currently build >2 source masks out of 2 source masks. Plus going into and out of lambdas can make the structure difficult to follow, as I had this code it seemed simpler. Perhaps I've misunderstood how it should be used.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2654

The first unique Sources we find will get into the first if (NumSources == 0). The next time we find a Source that isn't == Source1 we will get into this if, and if we find more than 2 unique Sources, we just start to accumulate the total number of sources. If the Source is already one of Source1 or Source2 then we have already counted it and don't need to process. it in these checks again. Because NumSources increases it is kind of like a state machine.

samtebbs added inline comments.Apr 20 2022, 6:50 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2654

I think I somehow missed the assignment to Source when I looked over it the first time. Thanks for the explanation.

From looking at D100486, it appears that processShuffleMasks only handles single-input shuffles. I think I would recommend going with this version for the moment, and re-evaluating to use processShuffleMasks if it changes in the future to handle multiple-input shuffles.

This revision is now accepted and ready to land.Apr 26 2022, 3:27 AM
This revision was landed with ongoing or failed builds.Apr 27 2022, 5:51 AM
This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Apr 27 2022, 7:00 AM

This makes clang crash building aarch64 bare metal builtins one windows: http://45.33.8.238/win/57078/step_4.txt

Please take a look and revert for now if it takes a while to fix.

This makes clang crash building aarch64 bare metal builtins one windows: http://45.33.8.238/win/57078/step_4.txt

Please take a look and revert for now if it takes a while to fix.

Thanks for the report.

You got here before any the bots reported it. https://github.com/llvm/llvm-project/commit/46cef9a82df8d4029ef409cf1fd0329e8c1696e3 had the fix, which looks like it was down to fp128 types. I'll add a test too.