[LV] Tail folded inloop reductions.
Needs Review · Public

Authored by dmgreen on Thu, Jul 23, 11:10 AM.

Details

Summary

This expands upon D75069, allowing in-loop reductions to be inserted into tail-folded loops. Reductions are generated in the form:

x = select(mask, vecop, zero)
v = vecreduce.add(x)

Where zero here is chosen as the identity value for add reductions. The backend is then expected to fold the select and the vecreduce into a single predicated instruction.
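As a rough illustration of why selecting the identity value is safe, here is a scalar C++ model of the select-then-reduce form in a tail-folded loop. It is only a sketch: the helper names (`selectOrIdentity`, `reduceAdd`, `tailFoldedSum`) and the junk value placed in inactive lanes are assumptions made for this example, not anything from the patch or from LLVM.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model of `x = select(mask, vecop, zero)`: lanes whose mask bit is
// false are replaced by the reduction's identity value (0 for add).
std::vector<uint32_t> selectOrIdentity(const std::vector<bool> &mask,
                                       const std::vector<uint32_t> &vecop,
                                       uint32_t identity) {
  assert(mask.size() == vecop.size() && "mask and vector widths must match");
  std::vector<uint32_t> x(vecop.size());
  for (std::size_t lane = 0; lane < vecop.size(); ++lane)
    x[lane] = mask[lane] ? vecop[lane] : identity;
  return x;
}

// Scalar model of `v = vecreduce.add(x)`: wrapping sum of all lanes.
uint32_t reduceAdd(const std::vector<uint32_t> &x) {
  uint32_t sum = 0;
  for (uint32_t lane : x)
    sum += lane;
  return sum;
}

// Hypothetical tail-folded in-loop add reduction: every iteration handles
// `vf` lanes, and the last iteration masks off lanes past the end of `data`.
uint32_t tailFoldedSum(const std::vector<uint32_t> &data, std::size_t vf) {
  assert(vf > 0 && "vector width must be non-zero");
  uint32_t acc = 0;
  for (std::size_t base = 0; base < data.size(); base += vf) {
    std::vector<bool> mask(vf);
    std::vector<uint32_t> vecop(vf);
    for (std::size_t lane = 0; lane < vf; ++lane) {
      mask[lane] = base + lane < data.size();
      // Inactive lanes hold junk to show that the select hides them.
      vecop[lane] = mask[lane] ? data[base + lane] : 0xDEADBEEFu;
    }
    acc += reduceAdd(selectOrIdentity(mask, vecop, /*identity=*/0));
  }
  return acc;
}
```

With a width of 4 and seven input elements, the second iteration has one inactive lane. It contributes 0 to the sum, so the result matches a plain scalar loop.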

Most of the code is fairly straightforward, except for the creation of block masks, which must be created in dominance order, i.e. before any recipe that uses them.
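Within a single straight-line block, "dominance order" reduces to "each value is defined before its first use". The toy check below sketches that property. The `Recipe` struct and `firstUseBeforeDef` function are hypothetical stand-ins invented for this example. They are not the real VPlan classes, which the patch manipulates very differently.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a recipe: the name of the value it defines and
// the names of the values it uses.
struct Recipe {
  std::string defines;
  std::vector<std::string> operands;
};

// Returns the index of the first recipe that uses a value which has not been
// defined yet (neither live-in nor produced by an earlier recipe), or
// std::nullopt if the sequence is in a valid def-before-use order.
std::optional<std::size_t>
firstUseBeforeDef(const std::vector<Recipe> &block,
                  const std::unordered_set<std::string> &liveIns) {
  std::unordered_set<std::string> defined = liveIns;
  for (std::size_t i = 0; i < block.size(); ++i) {
    for (const std::string &op : block[i].operands)
      if (defined.count(op) == 0)
        return i; // operand does not dominate this use
    defined.insert(block[i].defines);
  }
  return std::nullopt;
}
```

In this model, a sequence that places the select ahead of the mask it reads is rejected at the select, which is the situation the mask-creation ordering is meant to prevent.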

Event Timeline

dmgreen created this revision. · Thu, Jul 23, 11:10 AM
Herald added a project: Restricted Project. · Thu, Jul 23, 11:10 AM
dmgreen marked an inline comment as done. · Thu, Jul 23, 11:12 AM
dmgreen added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
6956–6967

This code is a bit of a mess, and is causing a lot of the test differences. It's here to try to enforce that the block mask is always created before the reduction intrinsic, even if the reduction is inserted at the point of an existing binary operation. Without this we can end up with situations where the block mask does not dominate the select that uses it.

I don't really like this change - it puts too much of a required ordering into these nodes - but I'm not sure of a better way to handle it. I tried a few ways without much success. Because we usually only get access to a VPValue, not the VPRecipe it represents, moving the recipe (and all dependent recipes) after the fact doesn't seem to be simple either.

Any thoughts/suggestions?