This is an archive of the discontinued LLVM Phabricator instance.

Differential D99569

[LoopVectorize] Fix bug where predicated loads/stores were dropped
ClosedPublic

Authored by joechrisellis on Mar 30 2021, 2:08 AM.

Details

Reviewers

efriedma
peterwaller-arm
bsmith
DavidTruby
echristo
rengolin
gilr
fhahn
sdesmalen
craig.topper
david-arm

Commits

rG2c551aedcf8b: [LoopVectorize] Fix bug where predicated loads/stores were dropped

Summary

This commit fixes a bug where the loop vectoriser fails to predicate
loads/stores when interleaving for targets that support masked
loads and stores.

Code such as:

void foo(int *restrict data1, int *restrict data2)
{
  int counter = 1024;
  while (counter--)
    if (data1[counter] > data2[counter])
      data1[counter] = data2[counter];
}

... could previously be transformed in such a way that the predicated
store implied by:

if (data1[counter] > data2[counter])
   data1[counter] = data2[counter];

... was lost, resulting in miscompiles.

This bug was causing some tests in llvm-test-suite to fail when built
for SVE.
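
To make the failure mode concrete, here is a minimal sketch in plain C++ of one way a lost store predicate could show up when the loop is interleaved by two. It is not the vectoriser's actual output; the names `reference`, `interleavedPredicated`, `interleavedUnpredicated` and `demoDroppedPredicate`, the array length and the input values are all made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t N = 8; // even, so the loop can be interleaved by two
using Buf = std::array<int, N>;

// Scalar reference: the store executes only when the condition holds.
Buf reference(Buf data1, const Buf &data2) {
  for (std::size_t i = N; i-- > 0;)
    if (data1[i] > data2[i])
      data1[i] = data2[i];
  return data1;
}

// Two iterations per trip, each store still guarded by its own condition.
Buf interleavedPredicated(Buf data1, const Buf &data2) {
  for (std::size_t i = N; i >= 2; i -= 2) {
    const bool m0 = data1[i - 1] > data2[i - 1];
    const bool m1 = data1[i - 2] > data2[i - 2];
    if (m0) data1[i - 1] = data2[i - 1];
    if (m1) data1[i - 2] = data2[i - 2];
  }
  return data1;
}

// Same shape with the guards lost (assumed failure mode): every element
// is overwritten, whether or not the condition held.
Buf interleavedUnpredicated(Buf data1, const Buf &data2) {
  for (std::size_t i = N; i >= 2; i -= 2) {
    data1[i - 1] = data2[i - 1];
    data1[i - 2] = data2[i - 2];
  }
  return data1;
}

// Hypothetical inputs where the condition is false for some elements.
void demoDroppedPredicate() {
  const Buf data1{5, 1, 7, 3, 9, 2, 8, 4};
  const Buf data2{4, 6, 2, 8, 1, 7, 3, 9};
  assert(interleavedPredicated(data1, data2) == reference(data1, data2));
  assert(interleavedUnpredicated(data1, data2) != reference(data1, data2));
}
```

Any input where the comparison is false for at least one element separates the two interleaved variants, which is why a miscompile of this kind can go unnoticed on data where the condition always holds.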

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
70 ms	x64 debian > LLVM.Transforms/LoopVectorize/X86::x86-predication.ll
140 ms	x64 windows > LLVM.Transforms/LoopVectorize/X86::x86-predication.ll
120 ms	x64 windows > LLVM.tools/dsymutil/X86::label2.test

Event Timeline

joechrisellis created this revision. · Mar 30 2021, 2:08 AM

Herald added a reviewer: efriedma. · Mar 30 2021, 2:08 AM

Herald added subscribers: psnobl, hiraditya, kristof.beyls, tschuett.

joechrisellis requested review of this revision. · Mar 30 2021, 2:08 AM

Herald added a project: Restricted Project. · Mar 30 2021, 2:08 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B96267: Diff 334081. · Mar 30 2021, 3:00 AM

Hmm. I anticipated there might be some test failures for this one.

Minor change to x86-predication.ll test.

The codegen from before the patch had a simpler CFG because we weren’t
predicating on loads. In this particular case, it turns out that it’s actually
fine to not explicitly predicate on the loads, but only because they exist in a
basic block next to a udiv which does require predication, so we get it ‘for
free’ in a sense. This is not guaranteed to always be the case, though, so we
have to be careful.

The result of this patch is that the IR CFG for scalarize_and_sink_gather in
x86-predication.ll is larger, and so the test needs to be updated to reflect
that. Ultimately, though, the machine code lowering is the same, so no loss in
performance.

joechrisellis retitled this revision from "[AArch64][SVE] Fix vectoriser bug where predicated stores were dropped" to "[LoopVectorize] Fix bug where predicated loads/stores were dropped". · Mar 31 2021, 5:29 AM

joechrisellis edited the summary of this revision.

peterwaller-arm added a subscriber: peterwaller-arm. · Mar 31 2021, 5:53 AM

peterwaller-arm added inline comments.

llvm/test/Transforms/LoopVectorize/AArch64/scalarize-store-with-predication.ll
4	I suspect using `-debug` needs a `REQUIRES: asserts` line, since the tests will be run presumably with a release build where -debug is unavailable. The output of: rg -C3 -- 'RUN.* -debug-only' ... in the llvm/test directory appears to show that they generally have such a REQUIRES line. I find this a little surprising since presumably this doesn't work on an asserts+release build; but my reading is that enabling assertions may enable debug output: https://github.com/llvm/llvm-project/blob/8396aeb07cddd8ab9a6a154a4ab7ac56fc24bda5/llvm/cmake/modules/HandleLLVMOptions.cmake#L61-L76

Harbormaster completed remote builds in B96515: Diff 334418. · Mar 31 2021, 5:59 AM

joechrisellis added reviewers: peterwaller-arm, bsmith, DavidTruby, echristo, rengolin. · Mar 31 2021, 6:31 AM

Improve test and address review comments.

@peterwaller-arm: add REQUIRES: asserts line to new test.

Harbormaster completed remote builds in B96522: Diff 334428. · Mar 31 2021, 7:09 AM

peterwaller-arm added a reviewer: gilr. · Mar 31 2021, 7:53 AM

joechrisellis added a reviewer: fhahn. · Apr 1 2021, 6:12 AM

david-arm added a subscriber: david-arm. · Apr 1 2021, 6:52 AM

david-arm added inline comments.

llvm/test/Transforms/LoopVectorize/AArch64/scalarize-store-with-predication.ll
4	@peterwaller-arm yep you're right this needs: REQUIRES: asserts and it will enable debug.
llvm/test/Transforms/LoopVectorize/X86/x86-predication.ll
69 ↗	(On Diff #334428)	Hi @joechrisellis you've lost the variable `T0` here that was used to ensure the load fed into the udiv. It would be good if you could re-add this.

Address comments.

@david-arm: re-add T0 variable into test.

Harbormaster completed remote builds in B97282: Diff 335483. · Apr 6 2021, 5:29 AM

Gentle ping. 🙂

llvm/test/Transforms/LoopVectorize/X86/x86-predication.ll
69 ↗	(On Diff #334428)	Sure, will re-add this. For context, I removed it because the control flow is slightly different now that the loads are predicated, so I thought the cleanest thing to do would be to remove the use of this variable. I've added it back now.

joechrisellis added a reviewer: sdesmalen. · Apr 12 2021, 4:10 AM

fhahn added inline comments. · Apr 12 2021, 4:39 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1515	I'm not sure about the default. Why does it make sense to default to VF = 1? Can all users of the function instead pass the right VF?
llvm/test/Transforms/LoopVectorize/AArch64/scalarize-store-with-predication.ll
2	As you are only checking specifically for small IR snippets of the predicated blocks, do we actually need `-dce -instcombine`? Running those potentially makes the test more fragile, because it may be impacted by changes to `-dce -instcombine`.
6	Can we instead just do `2>&1 | FileCheck %s` in one go? Splitting them up makes it a bit more inconvenient to reproduce a failure, because 2 lines need to be copied/executed.
22	this seems already out of sync with the IR below.... Shouldn't the `restrict` be translated to `noalias` in the function definition below? FWIW I think it would be more helpful if you'd add a comment to the test saying this checks that the store in the `if.then` block gets properly predicated if VF = 1 or something like that, as the IR is already quite compact :)
46	Is this check crucial? Seems like the IR checks already check this and we could remove it and execute it even with asserts.
57	do we need 2 loads here or could one be removed and compare to a constant?
73	If it's not SVE specific, can we have 2 run lines, one with `-mattr=+sve` and one without?

Address review comments.

@fhahn: test changes.

Hi @fhahn -- thanks for the review!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1515	The reason I added this optional parameter is because `isPredicatedInst` is used in this context:

  bool IsPredicated = LoopVectorizationPlanner::getDecisionAndClampRange(
      [&](ElementCount VF) { return CM.isPredicatedInst(I, VF); }, Range);

I wanted to make sure that the VF (which is used to make decisions on how to clamp the range) is made available to `isScalarWithPredication` in the fallback case for this particular call, but keep the same signature for calls elsewhere. Given what I think the function is supposed to do, I can see why it might not make sense to pass a VF parameter for the general case, but here I _think_ we need it.

Other references to `isPredicatedInst` do not supply a VF parameter, so in the fallback cases, they will default to VF = 1 when we reach the call for `isScalarWithPredication`. This is just pushing the optional parameter up the stack so that it can be used for `isPredicatedInst` too. If there's a better thing to do here I'm happy to make changes. 😄
llvm/test/Transforms/LoopVectorize/AArch64/scalarize-store-with-predication.ll
22	Good spot. I'll remove the `restrict` in the C code example since it isn't strictly necessary here anyways.
46	Fair enough, done!
57	I would like to keep this IR as-is for consistency with the C code at the top of the test. 🙂
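
The role the VF plays in clamping the range can be modelled in a few lines. The sketch below is a simplified stand-in, not the LLVM implementation: it uses plain `unsigned` values instead of `ElementCount`, and the names `VFRangeModel`, `decideAndClamp` and `demoClamp` are hypothetical. It assumes the helper evaluates the query at the start of the range and cuts the range short at the first power-of-two VF where the answer changes.

```cpp
#include <cassert>
#include <functional>

// Half-open range [Start, End) of power-of-two vectorization factors.
struct VFRangeModel {
  unsigned Start;
  unsigned End;
};

// Returns the query's answer at Range.Start and shrinks Range.End so that
// the answer is the same for every VF left in the range.
bool decideAndClamp(const std::function<bool(unsigned)> &Predicate,
                    VFRangeModel &Range) {
  assert(Range.Start > 0 && Range.Start < Range.End && "empty VF range");
  const bool AtStart = Predicate(Range.Start);
  for (unsigned VF = Range.Start * 2; VF < Range.End; VF *= 2) {
    if (Predicate(VF) != AtStart) {
      Range.End = VF; // first VF that disagrees ends the range
      break;
    }
  }
  return AtStart;
}

void demoClamp() {
  // Candidate VFs 1, 2, 4, 8, 16; the query below is a made-up example
  // whose answer flips at VF = 4.
  VFRangeModel Range{1, 17};
  const bool Decision =
      decideAndClamp([](unsigned VF) { return VF < 4; }, Range);
  assert(Decision);
  assert(Range.Start == 1 && Range.End == 4);

  // A query that ignores its VF argument can never clamp the range.
  VFRangeModel Untouched{1, 17};
  decideAndClamp([](unsigned) { return true; }, Untouched);
  assert(Untouched.End == 17);
}
```

In this model a query that never sees the real VF gives the same answer for the whole range, which is the situation the comment above is trying to avoid by forwarding VF to the fallback.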

Harbormaster completed remote builds in B98649: Diff 337381. · Apr 14 2021, 3:33 AM

joechrisellis added a reviewer: craig.topper. · Apr 16 2021, 6:46 AM

LGTM! I think there is potentially a regression in the X86 test 'scalarize_and_sink_gather' in terms of number of generated lines of assembly, but the fix looks right for now. We could potentially improve the code quality in a future patch that looks for instructions with matching predicates and folds them into the same predicated block.

This revision is now accepted and ready to land. · Apr 19 2021, 5:59 AM

fhahn added inline comments. · Apr 19 2021, 6:11 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1515	> This is just pushing the optional parameter up the stack so that it can be used for isPredicatedInst too.
That's fair enough, but my point was that there are only 2 uses of the function I think, so the default parameter is not really that useful and I think it would be better to just explicitly pass the VF in at both call sites. Also, the default of VF(1) is surprising (but I realize it is used elsewhere) and it's probably better to force the caller to think about what VF to pass in.
llvm/test/Transforms/LoopVectorize/AArch64/scalarize-store-with-predication.ll
57	> I would like to keep this IR as-is for consistency with the C code at the top of the test. 🙂
Personally I think that the C example code should not stand in the way of having a simpler IR test. If you really want to keep the C code, can't you simplify the C as well? (With the even simpler IR I think the C code adds more noise than value.)
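
As a rough illustration of the required-parameter form suggested here, the sketch below mirrors the three-way decision in `isPredicatedInst` using toy types. Everything in it is a stand-in: `InstModel`, `isScalarWithPredicationModel`, `isPredicatedInstModel` and `demoRequiredVF` are hypothetical names, and the rule "scalarized with predication only when VF > 1" is an arbitrary assumption chosen so that the answer depends on VF.

```cpp
#include <cassert>

// Toy stand-in for an instruction; all fields are hypothetical.
struct InstModel {
  bool InPredicatedBlock; // models blockNeedsPredication(I->getParent())
  bool IsLoadOrStore;     // models isa<LoadInst>(I) || isa<StoreInst>(I)
  bool MaskRequired;      // models Legal->isMaskRequired(I)
};

// Toy fallback whose answer depends on VF (assumed rule, not LLVM's).
bool isScalarWithPredicationModel(const InstModel &, unsigned VF) {
  return VF > 1;
}

// VF has no default, so every caller must state which VF it is asking about.
bool isPredicatedInstModel(const InstModel &I, unsigned VF) {
  if (!I.InPredicatedBlock)
    return false;
  // Loads and stores are decided by the mask requirement alone.
  if (I.IsLoadOrStore)
    return I.MaskRequired;
  // Everything else falls back to the VF-dependent query.
  return isScalarWithPredicationModel(I, VF);
}

void demoRequiredVF() {
  const InstModel Div{true, false, false};
  assert(!isPredicatedInstModel(Div, 1)); // what a VF = 1 default would see
  assert(isPredicatedInstModel(Div, 4));  // the answer for the real VF

  const InstModel MaskedStore{true, true, true};
  assert(isPredicatedInstModel(MaskedStore, 1));

  const InstModel Unconditional{false, true, true};
  assert(!isPredicatedInstModel(Unconditional, 4));
}
```

With a defaulted VF, a call that omits the argument would still compile and quietly get the VF = 1 answer; dropping the default turns that omission into a compile error.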

Address review comments.

@fhahn:
- make VF parameter to isPredicatedInst required.
- simplify IR test.

Harbormaster completed remote builds in B99657: Diff 338779. · Apr 20 2021, 4:17 AM

This revision was landed with ongoing or failed builds. · Apr 22 2021, 8:06 AM

Closed by commit rG2c551aedcf8b: [LoopVectorize] Fix bug where predicated loads/stores were dropped (authored by joechrisellis).

This revision was automatically updated to reflect the committed changes.

joechrisellis added a commit: rG2c551aedcf8b: [LoopVectorize] Fix bug where predicated loads/stores were dropped.

Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	10 lines
llvm/test/Transforms/LoopVectorize/AArch64/scalarize-store-with-predication.ll	76 lines

Diff 334081

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

public:
[...]
 /// predication for that VF.
 bool
 isScalarWithPredication(Instruction *I,
                         ElementCount VF = ElementCount::getFixed(1)) const;

 // Returns true if \p I is an instruction that will be predicated either
 // through scalar predication or masked load/store or masked gather/scatter.
 // Superset of instructions that return true for isScalarWithPredication.
-bool isPredicatedInst(Instruction *I) {
+// Optional parameter \p VF is unused by this function, and passed directly to
+// isScalarWithPredication in the fall-back case.
+bool isPredicatedInst(Instruction *I,
+                      ElementCount VF = ElementCount::getFixed(1)) {
   if (!blockNeedsPredication(I->getParent()))
     return false;
   // Loads and stores that need some form of masked operation are predicated
   // instructions.
   if (isa<LoadInst>(I) || isa<StoreInst>(I))
     return Legal->isMaskRequired(I);
-  return isScalarWithPredication(I);
+  return isScalarWithPredication(I, VF);
 }

 /// Returns true if \p I is a memory instruction with consecutive memory
 /// access that can be widened.
 bool
 memoryInstructionCanBeWidened(Instruction *I,
                               ElementCount VF = ElementCount::getFixed(1));

[...]
 VPBasicBlock *VPRecipeBuilder::handleReplication(
     Instruction *I, VFRange &Range, VPBasicBlock *VPBB,
     VPlanPtr &Plan) {
   bool IsUniform = LoopVectorizationPlanner::getDecisionAndClampRange(
       [&](ElementCount VF) { return CM.isUniformAfterVectorization(I, VF); },
       Range);

   bool IsPredicated = LoopVectorizationPlanner::getDecisionAndClampRange(
-      [&](ElementCount VF) { return CM.isScalarWithPredication(I, VF); },
-      Range);
+      [&](ElementCount VF) { return CM.isPredicatedInst(I, VF); }, Range);

   auto *Recipe = new VPReplicateRecipe(I, Plan->mapToVPValues(I->operands()),
                                        IsUniform, IsPredicated);
   setRecipe(I, Recipe);
   Plan->addVPValue(I, Recipe);

   // Find if I uses a predicated instruction. If so, it will use its scalar
   // value. Avoid hoisting the insert-element which packs the scalar value into
[...]

llvm/test/Transforms/LoopVectorize/AArch64/scalarize-store-with-predication.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt -loop-vectorize -dce -instcombine \
				; RUN: -debug-only=loop-vectorize \
				; RUN: -S -o - 2>%t < %s | FileCheck %s
				; RUN: FileCheck --check-prefix=DBG %s < %t

				target triple = "aarch64-unknown-linux-gnu"

				;
				; IR generated from (approximately):
				;
				; void foo(int *restrict data1, int *restrict data2)
				; {
				;   int counter = 1024;
				;   while (counter--)
				;     if (data1[counter] > data2[counter])
				;       data1[counter] = data2[counter];
				; }
				;

				define void @foo(i32* %data1, i32* %data2) #0 {
				; CHECK-LABEL: @foo(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br label [[WHILE_BODY:%.*]]
				; CHECK: while.body:
				; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 1023, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[IF_END:%.*]] ]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[DATA1:%.*]], i64 [[INDVARS_IV]]
				; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[ARRAYIDX]], align 4
				; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, i32* [[DATA2:%.*]], i64 [[INDVARS_IV]]
				; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[ARRAYIDX2]], align 4
				; CHECK-NEXT: [[CMP:%.*]] = icmp sgt i32 [[TMP0]], [[TMP1]]
				; CHECK-NEXT: br i1 [[CMP]], label [[IF_THEN:%.*]], label [[IF_END]]
				; CHECK: if.then:
				; CHECK-NEXT: store i32 [[TMP1]], i32* [[ARRAYIDX]], align 4
				; CHECK-NEXT: br label [[IF_END]]
				; CHECK: if.end:
				; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nsw i64 [[INDVARS_IV]], -1
				; CHECK-NEXT: [[TOBOOL_NOT:%.*]] = icmp eq i64 [[INDVARS_IV]], 0
				; CHECK-NEXT: br i1 [[TOBOOL_NOT]], label [[WHILE_END:%.*]], label [[WHILE_BODY]]
				; CHECK: while.end:
				; CHECK-NEXT: ret void
				;
				; DBG: LV: Scalarizing: %arrayidx = getelementptr inbounds i32, i32* %data1, i64 %indvars.iv
				; DBG-NEXT: LV: Scalarizing: %0 = load i32, i32* %arrayidx, align 4
				; DBG-NEXT: LV: Scalarizing: %arrayidx2 = getelementptr inbounds i32, i32* %data2, i64 %indvars.iv
				; DBG-NEXT: LV: Scalarizing: %1 = load i32, i32* %arrayidx2, align 4
				; DBG-NEXT: LV: Scalarizing: %cmp = icmp sgt i32 %0, %1
				; DBG-NEXT: LV: Scalarizing and predicating: store i32 %1, i32* %arrayidx, align 4
				; DBG-NEXT: LV: Scalarizing: %arrayidx = getelementptr inbounds i32, i32* %data1, i64 %indvars.iv
				; DBG-NEXT: LV: Scalarizing: %arrayidx2 = getelementptr inbounds i32, i32* %data2, i64 %indvars.iv
				entry:
				br label %while.body

				while.body:
				%indvars.iv = phi i64 [ 1023, %entry ], [ %indvars.iv.next, %if.end ]
				%arrayidx = getelementptr inbounds i32, i32* %data1, i64 %indvars.iv
				%0 = load i32, i32* %arrayidx, align 4
				%arrayidx2 = getelementptr inbounds i32, i32* %data2, i64 %indvars.iv
				%1 = load i32, i32* %arrayidx2, align 4
				%cmp = icmp sgt i32 %0, %1
				br i1 %cmp, label %if.then, label %if.end

				if.then:
				store i32 %1, i32* %arrayidx, align 4
				br label %if.end

				if.end:
				%indvars.iv.next = add nsw i64 %indvars.iv, -1
				%tobool.not = icmp eq i64 %indvars.iv, 0
				br i1 %tobool.not, label %while.end, label %while.body

				while.end:
				ret void
				}

				attributes #0 = { "target-features"="+sve" }