This patch updates codegen to use VPValues to manage the generated
scalarized instructions.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Adjust vplan printing test and split off changes to use VPTransformState in a few more places to D95757. This is required to ensure we pick up the scalarized values managed directly in VPTransformState.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
2929 | nit: Should it be Builder.getInsertBlock() to avoid calling getParent on BasicBlock::end() iterator? It might be impossible to have the end iterator here, but the getInsertBlock version is generally safer, IMO. | |
2947 | nit: I think commenting it out would be misleading for the reader. Can we just remove it? |
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
9046 | No need to pass as arguments now that State itself is passed. | |
9208 | This is porting the resetVectorValue to State operations, which I think is fine by itself (I tend to agree with passing nullptr as the underlying value to VPPredInstPHI), but something else bothers me: It seems VPPredInstPHI doesn't take over the VPReplicate as the VPValue mapped to the predicated IR value during VPlan construction, which means that recipes constructed for users of predicated instructions use the ReplicateRecipe although it doesn't dominate them. We get away with this by overriding the IR's vector value, but for correct VP def-use we should have VPPredInstPHI take over as the registered VPValue in VPlan during construction. Does all that make sense? |
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
9208 | I think the problem here is how we generate code for a scalarization region, like the one below. Uses after the region use the PHI-PREDICATED-INSTRUCTION VPValue. But when generating code for the region itself, we execute the same blocks once for each vector element. So in the second iteration "REPLICATE ir<%div> = udiv ir<16>, ir<%counter1.0> (S->V)\l" currently needs to pack its result in the result vector from the previous iteration. Currently that is done by overwriting the state of the def "REPLICATE ir<%div> = udiv ir<16>, ir<%counter1.0> (S->V)\l" with the predicated PHI from the previous iteration, so the vector-pack logic picks it up. It might be possible to instead change the code to just have scalar phis and do the packing as part of the phi creation. I think for transformations, the VPValue def-use chain should be correct, and it is 'just' a code-gen improvement. Do you think we need to fix this before this one goes in? subgraph cluster_N2 { fontname=Courier label="\<xVFxUF\> pred.udiv" N1 [label = "pred.udiv.entry:\n" + " + "BRANCH-ON-MASK vp<%4>\l"\l" + "CondBit: vp<%4> (for.body3)\l" ] N1 -> N3 [ label="T"] N1 -> N4 [ label="F"] N3 [label = "pred.udiv.if:\n" + "REPLICATE ir<%div> = udiv ir<16>, ir<%counter1.0> (S->V)\l" ] N3 -> N4 [ label=""] N4 [label = "pred.udiv.continue:\n" + "PHI-PREDICATED-INSTRUCTION vp<%6> = ir<%div>\l" ] } |
Ping :)
Rebased and added a note about why we are re-setting the value for operands of VPPRedPHI. Also added a few comments to set/reset functions and assertions.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
9046 |
Does that make sense? | |
9208 | Yes, thanks! | |
9208 |
Right, I missed the part where you replace the predicated instruction's VPValue with the new PredInstRecipe.
Not sure recipes should create values in other places than their designated location. Anyway, this shouldn't block this patch. |
Thanks Gil! I removed the UF/VF arguments from widenPHIInstruction, I totally missed that I did not address this comment originally.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
9046 |
Yes, sorry I totally missed that I did not update this so far! Fixed now. |
nit: Should it be Builder.getInsertBlock() to avoid calling getParent on BasicBlock::end() iterator? It might be impossible to have the end iterator here, but the getInsertBlock version is generally safer, IMO.