Page MenuHomePhabricator

[LV] Track all IR blocks corresponding to VPBasicBlock
AbandonedPublic

Authored by reames on Aug 2 2022, 1:25 PM.

Details

Summary

When working with edges entering or leaving a VPBasicBlock, we need to use either the first or last IR block corresponding to the VPBasicBlock. The existing code is only correct when a VPBasicBlock in the header or latch position corresponds to exactly one BasicBlock. This happens to be true as the only VPBasicBlock which corresponds to more than one BasicBlock today is a VPReplicateRecipe which (as an implementation detail) can't be either a loop header or latch.

I decided to track the whole set for simplicity. It feels odd to not have a way to map IR blocks to VP blocks and vice versa, so since I was changing it anyways, figured I'd just track the whole set.

Diff Detail

Unit TestsFailed

TimeTest
60,100 msx64 debian > AddressSanitizer-x86_64-linux-dynamic.TestCases::scariness_score_test.cpp
Script: -- : 'RUN: at line 4'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang --driver-mode=g++ -fsanitize=address -mno-omit-leaf-frame-pointer -fno-omit-frame-pointer -fno-optimize-sibling-calls -gline-tables-only -m64 -shared-libasan -O0 /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/asan/TestCases/scariness_score_test.cpp -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/asan/X86_64LinuxDynamicConfig/TestCases/Output/scariness_score_test.cpp.tmp
60,080 msx64 debian > AddressSanitizer-x86_64-linux.TestCases::scariness_score_test.cpp
Script: -- : 'RUN: at line 4'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang --driver-mode=g++ -fsanitize=address -mno-omit-leaf-frame-pointer -fno-omit-frame-pointer -fno-optimize-sibling-calls -gline-tables-only -m64 -O0 /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/asan/TestCases/scariness_score_test.cpp -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/asan/X86_64LinuxConfig/TestCases/Output/scariness_score_test.cpp.tmp

Event Timeline

reames created this revision.Aug 2 2022, 1:25 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 2 2022, 1:25 PM
reames requested review of this revision.Aug 2 2022, 1:25 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 2 2022, 1:25 PM
Herald added a subscriber: vkmr. · View Herald Transcript

The patch seems fine to me, but I think Florian probably understands this part of the codebase better than I.

The patch seems fine to me, but I think Florian probably understands this part of the codebase better than I.

@fhahn ping

The existing code is only correct when a VPBasicBlock in the header or latch position corresponds to exactly one BasicBlock. This happens to be true as the only VPBasicBlock which corresponds to more than one BasicBlock today is a VPReplicateRecipe which (as an implementation detail) can't be either a loop header or latch.

I am not sure this is accurate, at the moment a non-predicated VPReplicateRecipe can be in any block I think. Predicated VPReplicateRecipes must be in a VPBasicBlock in a VPRegionBlock. IIUC this may become an issue after D131118 if regular VPReplicateRecipes could be expanded to multiple basic blocks?

If that's the issue, it would probably be clearer if this is modeled explicitly, by putting such recipes into their own region, more faithfully representing the fact that a loop will be generated for them.

reames abandoned this revision.Wed, Sep 7, 12:35 PM

The existing code is only correct when a VPBasicBlock in the header or latch position corresponds to exactly one BasicBlock. This happens to be true as the only VPBasicBlock which corresponds to more than one BasicBlock today is a VPReplicateRecipe which (as an implementation detail) can't be either a loop header or latch.

I am not sure this is accurate, at the moment a non-predicated VPReplicateRecipe can be in any block I think. Predicated VPReplicateRecipes must be in a VPBasicBlock in a VPRegionBlock. IIUC this may become an issue after D131118 if regular VPReplicateRecipes could be expanded to multiple basic blocks?

If that's the issue, it would probably be clearer if this is modeled explicitly, by putting such recipes into their own region, more faithfully representing the fact that a loop will be generated for them.

The original comment was correct. The code as written is incorrect if a VPReplicateRecipe which corresponds to two or more BasicBlocks if that recipe were either header or exiting block of the loop.

However, as my comment said, this is impossible in the current code. A predicated replicate recipe can't be either of those positions (since it's contained by the VPRegionBlock), and a non-predicated one always corresponds to one basic block.

Your suggestion is to essentially add an invariant that a replicate region only contains more than one BB if it's contained in a VPRegionBlock. I am not opposed to that, but I'm also not motivated to make that change. It's significantly more invasive, and I just don't care that much.

Ayal added inline comments.Wed, Sep 7, 1:47 PM
llvm/lib/Transforms/Vectorize/VPlan.cpp
373

Hmm, VPBasicBlock::execute() is called once per VPBB when generating IR, so how could VPBB2IRBB record multiple/overwriting NewBB's here per same this? If there are such cases, perhaps its better to simplify them and retain a single IRBB per VPBB, adding an assert that no overwriting takes place. VPBB's contain only recipes and are free of control-flow - which is modeled using multiple VPBB's and VPRegions - so should fill a single IRBB.

A related issue are A-B-C cases above that try to reuse the same IRBB for pairs of back-to-back VPBB's (which may best be avoided for clarity and left to subsequent simplifyCFG to fold instead), but these are multiple VPBB's filling one IRBB, rather than the converse.

Another related issue is the correspondence between original IRBB's and VPBB's during VPlan construction rather than execution. There multiple replicate-and-predicate recipes that stem from the same IRBB are each assigned separate VPBB's (of a replicating region), as can be seen by trying to assign each such VPBB a unique name associated with its original IRBB (VPBBsForBB). Again yielding multiple VPBB's per one IRBB.

Ayal added inline comments.Sun, Sep 11, 7:31 AM
llvm/lib/Transforms/Vectorize/VPlan.cpp
373

Hmm, VPBasicBlock::execute() is called once per VPBB when generating IR, so how could VPBB2IRBB record multiple/overwriting NewBB's here per same this?

Ahh, a Replicating Region executes each of its VPBB's VF*UF times, effectively performing complete unrolling at VPlan execution time. It may indeed be clearer to either fully unroll in VPlan (when setting VF and UF?) or emit a loop instead of unrolling it.