Page MenuHomePhabricator

[LV] Tail folded inloop reductions.

Authored by dmgreen on Jul 23 2020, 11:10 AM.



This expands upon D75069, allowing them to be inserted into tail folded loops. Reductions are generates with the form:

x = select(mask, vecop, zero)
v = vecreduce.add(x)

Where zero here is chosen as the identity value for add reductions. The backend is then expected to fold the select and the vecreduce into a single predicated instruction.

Most of the code is fairly straight forward, except for the creation of blockmasks which need to ensure they are created in dominance order.

Diff Detail

Event Timeline

dmgreen created this revision.Jul 23 2020, 11:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 23 2020, 11:10 AM
dmgreen marked an inline comment as done.Jul 23 2020, 11:12 AM
dmgreen added inline comments.

This code is a bit of a mess, and is causing a lot of the test differences. It's here to try and enforce that the block mask is always created before the reduction intrinsic, even if the reduction is inserted at the point of an existing binary operation. Without this we can end up with situations where the select does not dominate the block mask it is using.

I don't really like this change - it puts too much of a required ordering into these nodes - but I'm not sure of a better way to handle it. I tried a few ways without much success. Because we usually only get access to a VPValue, not the VPRecipe it represents, moving the recipe (and all dependent recipes) after the fact doesn't seem to be simple either.

Any thoughts/suggestions?

dmgreen updated this revision to Diff 288849.Aug 30 2020, 3:44 AM

Rebase and ping.

I'm still unsure how to proceed with VPRecipeBuilder::createBlockInMask, and what the advantages are of having vprecipe and vpvalue being separate are.

dmgreen updated this revision to Diff 293368.Sep 22 2020, 1:01 AM
dmgreen added a reviewer: SjoerdMeijer.

I have updated this to include a getFirstNonPhi() method on VPBasicBlock, which helps to move some of this logic into a nicer place. This way the Block Mask is still created before any other instructions, but after the phi nodes in the block.

dmgreen added inline comments.Sep 22 2020, 1:02 AM

I've tried to clean this up using a getFirstNonPhi and InsertPointGuard.

Just some minor questions inline.


nit: perhaps a comment about the in-loop reductions.


Is that a Phi?


Not important, but just out of curiousity, why is this moved up?

438 ↗(On Diff #293368)

Maye a bit off topic for this patch, but are we tail-predicating this loop for MVE? Could we do that? Reason I am asking is that I am looking at this icmp of the induction and the BTC, for which we could emit, so we would get the tail-predication?

dmgreen updated this revision to Diff 296912.Thu, Oct 8, 3:27 AM
dmgreen marked 3 inline comments as done.

Rebase over recent reduction changes and add additional comments.

Thanks for taking a look.


A VPWidenIntOrFpInductionSC? It looks like an induction variable, which I think will be a PHI, yeah.


These all moved up because we now create the blocks predicates after the phi recipes, to ensure they are before all the potentially predicated instructions that will need to use them.

438 ↗(On Diff #293368)

I think this test just needed a rebase.

SjoerdMeijer accepted this revision.Thu, Oct 8, 3:40 AM

Cheers, looks like a good change to me. Perhaps wait a day with committing just in case there are more/other opinions on this.

This revision is now accepted and ready to land.Thu, Oct 8, 3:40 AM
This revision was automatically updated to reflect the committed changes.