This is an archive of the discontinued LLVM Phabricator instance.

[LoopVectorize] Fix scalable vector crash in VPReplicateRecipe::execute
AbandonedPublic

Authored by david-arm on Jun 30 2021, 8:34 AM.

Details

Summary

When trying to vectorise certain loops using scalable vectors we try to
use the replicate recipe even for non-uniform cases. The recipe does
not handle scalable vectors correctly because it tries to scalarise
the vector instruction by generating a scalar instance for each lane
and packing them into a vector. We don't know the number of lanes at
runtime so this is not an option.

I've decided to create a new scalable replicate recipe instead called
VPScalableReplicate, which also calls a new overloaded version of
scalarizeInstruction that generates a whole vector part in one go,
instead of generating N scalar instances for N lanes. The new version
of scalarizeInstruction is based on the original version, and currently
only supports certain cases, such as GEP instructions, or instructions
with loop-invariant operands.

Tests have been added here:

Transforms/LoopVectorize/AArch64/sve-vpreplicate.ll

Diff Detail

Event Timeline

david-arm created this revision.Jun 30 2021, 8:34 AM
david-arm requested review of this revision.Jun 30 2021, 8:34 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 30 2021, 8:34 AM

Hey David,
Thank you for adding me as a reviewer.
I try to add some comments that I believe are valid, hope they make sense.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3069

Do we need this test also for scalable recipe in scalarizeInstructions

if (IfPredicateInstr)

?

8964–8965

I believe you don't need the assert
See lib/Transforms/Vectorize/VPlan.h
struct VFRange;

8966

Why do we need R and Recipe here?

8967

Should we test if this is not predicated instruction here?
I don't see any test for that in the recipe.

llvm/test/Transforms/LoopVectorize/AArch64/sve-vpreplicate.ll
2

Is it possible to use that with for all architectures?
Or only for now for AArch64?

35

nit:
s/while.body189/loop.body/
s/while.end192.loopexit/exit/

136

Why do you need 4,5,6 and 7?

david-arm added inline comments.Jul 5 2021, 1:07 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3069

Good question! So in VPRecipeBuilder::handleReplication I have explicitly asserted that we should never create this recipe with predication, because I think the only cases we'd hit are divides (which we don't support) or loads/stores. For the latter we'll widen the instruction using masked load/store intrinsics and shouldn't use this recipe I think.

8964–8965

Ah I see now - good spot thank you! I hadn't realised we already asserted this in the VFRange constructor.

8966

The problem here is caused by complicated multiple inheritance, i.e.

class VPReplicateRecipe : public VPRecipeBase, public VPValue

and the fact the functions below take pointers to only one of the inherited base classes, i.e.

void setRecipe(Instruction *I, VPRecipeBase *R) {

void addVPValue(Value *V, VPValue *VPV)

In order to have a single common block here I would have to restructure the recipes to have the same common base, i.e.

class VPReplicateBaseRecipe : public VPRecipeBase, public VPValue

class VPScalableReplicateRecipe : public VPReplicateBaseRecipe {

class VPReplicateRecipe : public VPReplicateBaseRecipe {

This would mean I could then write:

VPReplicateBaseRecipe *Recipe;
if (IsScalable && !IsUniform)
  Recipe = new VPScalableReplicateRecipe(I, Plan->mapToVPValues(I->operands()));
else
  Recipe = new VPReplicateRecipe(I, Plan->mapToVPValues(I->operands()),
                                  IsUniform, IsPredicated);
setRecipe(I, R);
Plan->addVPValue(I, R);

However, I wasn't sure which was more acceptable here - rewrite the class structures to be more complex or simply have two blocks of code here?

Alternatively, if anyone knows some magic C++ that allows me to write a common block with rewriting the class structure that would be great too! I could well be missing some trick here.

8967

Ah you're right! I added an assert in the for loop below, but it's not quite the same thing. I'll add one here too.

llvm/test/Transforms/LoopVectorize/AArch64/sve-vpreplicate.ll
2

Possibly so? I thought I might have tried that when I first started the patch and hit issues, but I can try again.

136

This is needed for one of the loops above that contains metadata:

tail call void @llvm.experimental.noalias.scope.decl(metadata !4)

This line is explicitly testing one of the code paths in the new scalarizeInstruction() function.

fhahn added a comment.Jul 5 2021, 1:18 AM

IIUC from the description, this recipe effectively either widens the instruction or just computes the first lane if it is uniform. Does it handle any other case?

IIUC from the description, this recipe effectively either widens the instruction or just computes the first lane if it is uniform. Does it handle any other case?

Yes. If you look at the new scalarizeInstruction() it also broadcasts a value into a new instruction if the result is non-void and *all* operands are loop invariant. This is a it different to the original scalarizeInstruction function that simply creates a scalar instruction per lane, generating the same value each time. This tests the extractvalue case in one of the tests.

david-arm updated this revision to Diff 356484.Jul 5 2021, 5:11 AM
  • Addressed review comments. Removed unnecessary VFRange assert, added new one for IsPredicated and tidied up the test.
david-arm marked 7 inline comments as done.Jul 5 2021, 5:13 AM
david-arm added inline comments.
llvm/test/Transforms/LoopVectorize/AArch64/sve-vpreplicate.ll
2

After some discussion downstream we think it's better to keep such tests in the AArch64 directory because without a target it leads to problems related to not knowing about vector widths, scalable properties of the target, etc.

kmclaughlin added inline comments.Jul 7 2021, 9:39 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3077

I'm a bit confused by the message for this assert, as I thought vectors are not considered to be aggregate types?

3131

This switch statement currently only has one case, was this added so that we can support more instructions in the future? If so, would it be better to instead just try a dyn_cast<GetElementPtrInst> here and then add the switch again later when we need to cover other instruction types?

david-arm updated this revision to Diff 357176.Jul 8 2021, 2:40 AM
  • Removed isAggregateTy assert for new scalarizeInstruction as I don't think we need to worry about this. I also updated the assert message for the original scalarizeInstruction function.
  • Specialised the new scalarizeInstruction function for GEPs, since that's the only case we currently deal with. It's more readable and efficient this way.
david-arm marked 2 inline comments as done.Jul 8 2021, 2:42 AM
david-arm added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3077

That's a good point! I've removed the assert anyway as I don't think it's necessary, and I've also corrected the message for the assert in the original scalarizeInstruction function

fhahn added a comment.Jul 8 2021, 3:17 AM

IIUC from the description, this recipe effectively either widens the instruction or just computes the first lane if it is uniform. Does it handle any other case?

Yes. If you look at the new scalarizeInstruction() it also broadcasts a value into a new instruction if the result is non-void and *all* operands are loop invariant. This is a it different to the original scalarizeInstruction function that simply creates a scalar instruction per lane, generating the same value each time. This tests the extractvalue case in one of the tests.

Ok thanks. So effectively this handles 3 different cases (the instruction could be widened and only the first lane needs to be computed, with calls to llvm.experimental.noalias.scope.decl being a special case, where generating it once in the loop is also sufficient ), which should not really specific to scalable vectors, but for scalable vectors it crashes, right?

I'm not sure if we actually need a new recipe for those cases and we might be able to adjust the recipes in the plan to similar effect. Let sketch an alternative and share it later today or tomorrow.

fhahn added a comment.Jul 8 2021, 3:19 AM

... Let sketch an alternative and share it later today or tomorrow.

This should be Let me ... sketch

david-arm marked an inline comment as done.Jul 9 2021, 1:36 AM

... Let sketch an alternative and share it later today or tomorrow.

This should be Let me ... sketch

Hi @fhahn, if you want I'm also happy to sketch out an alternative if you can give me some pointers? I wrote the code this way as it wasn't obvious what other way I'd do this - at least for the GEP it's due to the decision to treat it as being "scalarised after vectorisation" very early on in the process.

fhahn added a comment.Jul 12 2021, 1:13 AM

It should be is possible to fix/adjust the recipes in the plan after initial generation, instead of anticipating those cases up front.

This should be easier in this case and I think the existing recipes should cover the functionality required. For example, the widening part for GEPs can be done in D105784. Note that this may also be beneficial for fixed-width vectors, pending a cost check. We should be able to do this not only for GEPs, but any widen able instruction. For the other cases, we only need to compute lane 0 or nothing in the intrinsic case. We can also adjust the recipes in a similar way.

Hi @david-arm, the case for the first test (phi_multiple_use) seems like an existing bug in collectLoopScalars where the LV incorrectly assumes that a value is scalar after vectorization even when it's also needed as a widened value.
The code already seemed to guard against similar cases, but some values still slipped through the cracks, as your test points out!

I've created a patch to address that in D106164.

Matt added a subscriber: Matt.Jul 16 2021, 11:38 AM
david-arm abandoned this revision.Jul 27 2021, 7:14 AM

This is being implemented differently in a series of other patches currently being worked on.