Page MenuHomePhabricator

[LoopVectorize] Fix bug where predicated loads/stores were dropped
ClosedPublic

Authored by joechrisellis on Mar 30 2021, 2:08 AM.

Details

Summary

This commit fixes a bug where the loop vectoriser fails to predicate
loads/stores when interleaving for targets that support masked
loads and stores.

Code such as:

1  void foo(int *restrict data1, int *restrict data2)
2  {
3    int counter = 1024;
4    while (counter--)
5      if (data1[counter] > data2[counter])
6        data1[counter] = data2[counter];
7  }

... could previously be transformed in such a way that the predicated
store implied by:

if (data1[counter] > data2[counter])
   data1[counter] = data2[counter];

... was lost, resulting in miscompiles.

This bug was causing some tests in llvm-test-suite to fail when built
for SVE.

Diff Detail

Unit TestsFailed

TimeTest
100 msx64 windows > LLVM.tools/dsymutil/X86::label2.test
Script: -- : 'RUN: at line 14'; c:\ws\w16c2-2\llvm-project\premerge-checks\build\bin\dsymutil.exe -oso-prepend-path C:\ws\w16c2-2\llvm-project\premerge-checks\llvm\test\tools\dsymutil\X86/../Inputs C:\ws\w16c2-2\llvm-project\premerge-checks\llvm\test\tools\dsymutil\X86/../Inputs/private/tmp/label/label.out -o C:\ws\w16c2-2\llvm-project\premerge-checks\build\test\tools\dsymutil\X86\Output\label2.test.tmp.dSYM

Event Timeline

joechrisellis created this revision.Mar 30 2021, 2:08 AM
joechrisellis requested review of this revision.Mar 30 2021, 2:08 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 30 2021, 2:08 AM
joechrisellis planned changes to this revision.Mar 30 2021, 3:07 AM

Hmm. I anticipated there might be some test failures for this one.

Minor change to x86-predication.ll test.

The codegen from before the patch had a simpler CFG because we weren’t
predicating on loads. In this particular case, it turns out that it’s actually
fine to not explicitly predicate on the loads, but only because they exist in a
basic block next to a udiv which does require predication, so we get it ‘for
free’ in a sense. This is not guaranteed to always be the case, though, so we
have to be careful.

The result of this patch is that the IR CFG for scalarize_and_sink_gather in
x86-predication.ll is larger, and so the test needs to be updated to reflect
that. Ultimately, though, the machine code lowering is the same, so no loss in
performance.

joechrisellis retitled this revision from [AArch64][SVE] Fix vectoriser bug where predicated stores were dropped to [LoopVectorize] Fix bug where predicated loads/stores were dropped.Mar 31 2021, 5:29 AM
joechrisellis edited the summary of this revision. (Show Details)
peterwaller-arm added inline comments.
llvm/test/Transforms/LoopVectorize/AArch64/scalarize-store-with-predication.ll
4

I suspect using -debug needs a REQUIRES: asserts line, since the tests will be run presumably with a release build where -debug is unavailable.

The output of:

rg -C3 -- 'RUN.* -debug-only'

... in the llvm/test directory appears to show that they generally have such a REQUIRES line.

I find this a little surprising since presumably this doesn't work on an asserts+release build; but my reading is that enabling assertions may enable debug output:

https://github.com/llvm/llvm-project/blob/8396aeb07cddd8ab9a6a154a4ab7ac56fc24bda5/llvm/cmake/modules/HandleLLVMOptions.cmake#L61-L76

joechrisellis marked an inline comment as done.

Improve test and address review comments.

david-arm added inline comments.
llvm/test/Transforms/LoopVectorize/AArch64/scalarize-store-with-predication.ll
4

@peterwaller-arm yep you're right this needs:

REQUIRES: asserts

and it will enable debug.

llvm/test/Transforms/LoopVectorize/X86/x86-predication.ll
69

Hi @joechrisellis you've lost the variable T0 here that was used to ensure the load fed into the udiv. It would be good if you could re-add this.

joechrisellis marked an inline comment as done.

Address comments.

Gentle ping. 🙂

llvm/test/Transforms/LoopVectorize/X86/x86-predication.ll
69

Sure, will re-add this. For context, I removed this because the control flow is now slightly different now that the loads are predicated, so I thought the cleanest thing to do would be to remove the use of this variable. I've added it back now.

fhahn added inline comments.Apr 12 2021, 4:39 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1515

I'm not sure about the default. Why does it make sense to default to VF = 1? Can all users of the function instead pass the right VF?

llvm/test/Transforms/LoopVectorize/AArch64/scalarize-store-with-predication.ll
2

as you are only checking specifically for small IR snippets of the predicated blocks, do we actually need -dce -instcombine? Running those potentially make the test more fragile, because it may be impacted by changes to -dce -instcombine.

6

can we instead just do 2>&1 | FileCheck %s in one go? Splitting them up makes it a bit more inconvenient to reproduce a failure, because 2 lines need to copied/executed?

22

this seems already out of sync with the IR below.... Shouldn't the restrict be translated to noalias in the function definition below? FWIW I think it would be more helpful if you'd add a comment to the test saying this checks that the store in the if.then block gets properly predicated if VF = 1 or something like that, as the IR is already quite compact :)

46

Is this check crucial? Seems like the IR checks already check this and we could remove it and execute it even with asserts.

57

do we need 2 loads here or could one be removed and compare to a constant?

73

If it's not SVE specific, can we have 2 run lines, one with -mattr=+sve and one without?

joechrisellis marked 5 inline comments as done.

Address review comments.

Hi @fhahn -- thanks for the review!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1515

The reason I added this optional parameter is because isPredicatedInst is used in this context:

bool IsPredicated = LoopVectorizationPlanner::getDecisionAndClampRange(
    [&](ElementCount VF) { return CM.isPredicatedInst(I, VF); }, Range);

I wanted to make sure that the VF (which is used to make decisions on how to clamp the range) is made available to isScalarWithPredication in the fallback case for this particular call, but keep the same signature for calls elsewhere. Given what I think the function is supposed to do, I can see why it might not make sense to pass a VF parameter for the general case, but here I _think_ we need it. Other references to isPredicatedInst do not supply a VF parameter, so in the fallback cases, they will default to VF = 1 when we reach the call for isScalarWithPredication. This is just pushing the optional parameter up the stack so that it can be used for isPredicatedInst too.

If there's a better thing to do here I'm happy to make changes. 😄

llvm/test/Transforms/LoopVectorize/AArch64/scalarize-store-with-predication.ll
22

Good spot. I'll remove the restrict in the C code example since it isn't strictly necessary here anyways.

46

Fair enough, done!

57

I would like to keep this IR as-is for consistency with the C code at the top of the test. 🙂

david-arm accepted this revision.Apr 19 2021, 5:59 AM

LGTM! I think there is potentially a regression in the X86 test 'scalarize_and_sink_gather' in terms of number of generated lines of assembly, but the fix looks right for now. We could potentially improve the code quality in a future patch that looks for instructions with matching predicates and folds them into the same predicated block.

This revision is now accepted and ready to land.Apr 19 2021, 5:59 AM
fhahn added inline comments.Apr 19 2021, 6:11 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1515

his is just pushing the optional parameter up the stack so that it can be used for isPredicatedInst too.

That's fair enough, but my point was that there are only 2 uses of the function I think, so the default parameter is not really that useful and I think it would be better to just explicitly pass the VF in at both call sites. Also, the default of VF(1) is surprising (but I realize it is used elsewhere) and it's probably better to force the caller to think about what VF to pass in.

llvm/test/Transforms/LoopVectorize/AArch64/scalarize-store-with-predication.ll
57

I would like to keep this IR as-is for consistency with the C code at the top of the test. 🙂

Personally I think that the C example code should not stand in the way of having a simpler IR test. If you really want to keep the C code, can't you simplify the C as well? (With the even simpler IR I think the C code adds more noise than value it adds).

Address review comments.

  • @fhahn:
    • make VF parameter to isPredicatedInst required.
    • simplify IR test.
This revision was landed with ongoing or failed builds.Apr 22 2021, 8:06 AM
This revision was automatically updated to reflect the committed changes.