This patch splits the tail-folding logic off from general mask handling
into its own transform, which
- introduces the required header mask into the VPlan,
- updates existing masks to be the AND of the existing mask and the mask for the header block,
- adds masks to recipes that require them under tail folding.
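
As a rough illustration of the three steps above, here is a minimal toy model in Python. It is not LLVM's API: `Recipe`, `make_header_mask`, `fold_tail`, and the `needs_mask_when_folded` flag are hypothetical names invented for this sketch, and masks are modelled as plain per-lane boolean lists rather than VPlan values.

```python
from dataclasses import dataclass, replace
from typing import List, Optional

Mask = List[bool]  # one boolean per vector lane


@dataclass(frozen=True)
class Recipe:
    """Hypothetical stand-in for a VPlan recipe (not an LLVM class)."""
    name: str
    mask: Optional[Mask] = None           # None means the recipe is unmasked
    needs_mask_when_folded: bool = False  # assumed flag, e.g. for memory accesses


def make_header_mask(first_index: int, vf: int, trip_count: int) -> Mask:
    """Step 1: a lane is active only if its scalar iteration is in range."""
    return [first_index + lane < trip_count for lane in range(vf)]


def fold_tail(recipes: List[Recipe], header_mask: Mask) -> List[Recipe]:
    """Steps 2 and 3: combine or attach the header mask as needed."""
    folded = []
    for r in recipes:
        if r.mask is not None:
            # Step 2: existing mask becomes (header mask AND existing mask).
            new_mask = [h and m for h, m in zip(header_mask, r.mask)]
        elif r.needs_mask_when_folded:
            # Step 3: unmasked recipe that must be masked under tail folding.
            new_mask = list(header_mask)
        else:
            # Recipe stays unmasked.
            new_mask = None
        folded.append(replace(r, mask=new_mask))
    return folded
```

For example, with a vector factor of 4 and a trip count of 10, the final vector iteration starts at index 8, so only the first two lanes are active; an unconditional store would receive that header mask, while an already-masked recipe would have its mask narrowed by it.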
This makes tail folding part of the gradual lowering and refinement
of a VPlan. Moving tail folding to a VPlan2VPlan transform makes it
independent of the underlying IR and allows it to be applied to generic
VPlans.
What about floating-point division?