[SLP] Workaround for vectorizing loads with more than one store uses.
Needs Review · Public

Authored by vporpo on Aug 26 2022, 5:00 PM.

Details

Summary
  L0        L1
 /  \      /  \
S0  S0'   S1  S1'

On AArch64, this simple pattern of loads with multiple store users is not
vectorized because of the cost of the external uses.
This patch is a workaround that reduces the cost of an external use if the def
is a load and the user can be found in the Stores seeds map.
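
For illustration, a source-level shape that could give rise to this pattern is sketched below. The function name `storeTwice` and its parameters are hypothetical and are not taken from the patch or its tests; whether a compiler actually ends up with two loads that each have two store users depends on the rest of the optimization pipeline.

```cpp
// Hypothetical example: each loaded value feeds two stores,
// mirroring the L0/L1 and S0/S0'/S1/S1' diagram above.
void storeTwice(const double *src, double *dstA, double *dstB) {
  double l0 = src[0]; // L0
  double l1 = src[1]; // L1
  dstA[0] = l0;       // S0
  dstA[1] = l1;       // S1
  dstB[0] = l0;       // S0'
  dstB[1] = l1;       // S1'
}
```

If the vectorizer starts from the seed pair S0, S1, the loads L0 and L1 are also used by S0' and S1', which lie outside that tree. Those are the external uses whose cost the summary refers to.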

Event Timeline

vporpo created this revision. Aug 26 2022, 5:00 PM
Herald added a project: Restricted Project. Aug 26 2022, 5:00 PM
vporpo requested review of this revision. Aug 26 2022, 5:00 PM

This is a very optimistic estimate. It does not check whether the external stores can be vectorized, and it does not account for the cost of reordering shuffles. It may lead to regressions in many cases.

Yeah I agree, it is too optimistic. What if I also check the contents of the tree to make sure it is a store-load only, and also check that the ordering of the external stores matches the loads?
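
A minimal sketch of the two checks proposed here, written against a toy model rather than LLVM's data structures. The types `Kind` and `ExternalStore` and the function `isSafeToDiscount` are hypothetical names introduced for illustration; store addresses are modeled as plain element offsets and loads as lane indices.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-ins for SLP tree entries; these are not LLVM types.
enum class Kind { Load, Store, Other };

struct ExternalStore {
  std::size_t loadLane; // lane of the load whose value this store writes
  long offset;          // element offset of the store's address
};

// Returns true when (1) the tree contains only loads and stores and
// (2) the external stores, listed one per lane in lane order, write to
// consecutive offsets, so their order matches the order of the loads.
bool isSafeToDiscount(const std::vector<Kind> &tree,
                      const std::vector<ExternalStore> &ext) {
  for (Kind k : tree)
    if (k != Kind::Load && k != Kind::Store)
      return false; // something other than a store-load tree
  if (ext.empty())
    return false; // nothing to discount
  for (std::size_t i = 0; i < ext.size(); ++i) {
    if (ext[i].loadLane != i)
      return false; // external stores are not in load order
    if (ext[i].offset != ext[0].offset + static_cast<long>(i))
      return false; // external stores are not consecutive
  }
  return true;
}
```

This sketch only answers whether a discount would be plausible. It does not model the actual cost values or whether the target can vectorize the external stores.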

Maybe try something like this: for each unsuccessful vectorization attempt, record the stores that would build the vectorized store, and compare the external uses against these stores. The last stores will be vectorized, because all of their users are potentially vectorizable. The process then repeats because the basic block has changed, and the first stores are vectorized too. You can also estimate the reordering cost in this case.
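
One possible reading of this suggestion, as a toy model only. `SeedGroup` and `vectorizeWithRecordedStores` are hypothetical names, stores are plain integer ids, and the real cost comparison is reduced to a single question: are all external users of the group's loads either recorded from an earlier attempt or already vectorized? Reordering cost is not modeled.

```cpp
#include <set>
#include <vector>

// Toy model of one store seed group; not an LLVM type.
struct SeedGroup {
  std::vector<int> stores;        // ids of the stores forming the vector store
  std::vector<int> externalUsers; // ids of stores outside the group that use its loads
  bool vectorized = false;
};

// Repeats over the groups until nothing changes. A group is vectorized
// once every external user is a store seen in an earlier attempt
// (failed or successful). Returns the number of groups vectorized.
int vectorizeWithRecordedStores(std::vector<SeedGroup> &groups) {
  std::set<int> recorded; // stores from earlier attempts
  int count = 0;
  bool changed = true;
  while (changed) {
    changed = false;
    for (SeedGroup &g : groups) {
      if (g.vectorized)
        continue;
      bool allCovered = true;
      for (int user : g.externalUsers) {
        if (recorded.count(user) == 0) {
          allCovered = false;
          break;
        }
      }
      if (allCovered) {
        g.vectorized = true;
        ++count;
        changed = true; // the block changed, so run another round
      }
      // Record the stores whether or not the attempt succeeded.
      recorded.insert(g.stores.begin(), g.stores.end());
    }
  }
  return count;
}
```

With two groups that are each other's external users, the first attempt fails and is recorded, the second succeeds, and the first succeeds on the next round. A group whose loads are used by a store that never appears in any group stays scalar.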

Vasileios, Abataev's suggestion seems reasonable. Can you try it?

@davidxl I tried it, but it is not as simple as it seems. It won't vectorize the code fully unless we also modify the visiting order of the seeds.

Naive question -- can this be done as a post-processing step with proper reordering?