Given a larger-than-legal shuffle mask, the final codegen will split into multiple sub-vectors. This attempts to model that in AArch64TTIImpl::getShuffleCost, splitting masks up according to the size of the legalized vectors. If the sub-masks have at most 2 input sources we can call getShuffleCost on them and sum the costs, to get a more accurate final cost for the entire shuffle. The call to improveShuffleKindFromMask helps to improve the shuffle kind for the sub-mask cost call.
Details
Diff Detail
Event Timeline
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
---|---|---|
2648 | I might be misunderstanding this part of the code but I don't see how Source != Source1 will ever be true, since Source1 is assigned to Source when NumSources == 0, which is always the case in the first iteration. Same for the other else-if below. |
Hmm. I'm not sure. I remember thinking I should give it a try, I remember deciding it wasn't a good idea, but I don't remember why now. I think it won't give us the information we really need. We would need to process the mask going in, and then process ">=2 source" back into "2 source" and ">2 source" masks, as it will currently build >2 source masks out of 2 source masks. Plus going into and out of lambdas can make the structure difficult to follow, as I had this code it seemed simpler. Perhaps I've misunderstood how it should be used.
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
---|---|---|
2648 | The first unique Sources we find will get into the first if (NumSources == 0). The next time we find a Source that isn't == Source1 we will get into this if, and if we find more than 2 unique Sources, we just start to accumulate the total number of sources. If the Source is already one of Source1 or Source2 then we have already counted it and don't need to process. it in these checks again. Because NumSources increases it is kind of like a state machine. |
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
---|---|---|
2648 | I think I somehow missed the assignment to Source when I looked over it the first time. Thanks for the explanation. |
From looking at D100486, it appears that processShuffleMasks only handles single-input shuffles. I think I would recommend going with this version for the moment, and re-evaluating to use processShuffleMasks if it changes in the future to handle multiple-input shuffles.
This makes clang crash building aarch64 bare metal builtins one windows: http://45.33.8.238/win/57078/step_4.txt
Please take a look and revert for now if it takes a while to fix.
Thanks for the report.
You got here before any the bots reported it. https://github.com/llvm/llvm-project/commit/46cef9a82df8d4029ef409cf1fd0329e8c1696e3 had the fix, which looks like it was down to fp128 types. I'll add a test too.
I might be misunderstanding this part of the code but I don't see how Source != Source1 will ever be true, since Source1 is assigned to Source when NumSources == 0, which is always the case in the first iteration. Same for the other else-if below.