Page MenuHomePhabricator

[LoopVectorize][SVE] Fix crash when vectorising FP negation
AcceptedPublic

Authored by david-arm on Mar 5 2021, 9:21 AM.

Details

Summary

This patch fixes a crash encountered when vectorising the following loop:

void foo(float *dst, float *src, long long n) {
  for (long long i = 0; i < n; i++)
    dst[i] = -src[i];
}

using scalable vectors. I've added a test to

Transforms/LoopVectorize/AArch64/sve-basic-vec.ll

as well as cleaned up the other tests in the same file.

Diff Detail

Unit TestsFailed

TimeTest
2,440 msx64 debian > libarcher.races::lock-unrelated.c
Script: -- : 'RUN: at line 13'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests -I /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/src -L /mnt/disks/ssd0/agent/llvm-project/build/lib -Wl,-rpath,/mnt/disks/ssd0/agent/llvm-project/build/lib /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/races/lock-unrelated.c -o /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/lock-unrelated.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/deflake.bash /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/lock-unrelated.c.tmp 2>&1 | tee /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/lock-unrelated.c.tmp.log | /mnt/disks/ssd0/agent/llvm-project/build/./bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/races/lock-unrelated.c

Event Timeline

david-arm created this revision.Mar 5 2021, 9:21 AM
david-arm requested review of this revision.Mar 5 2021, 9:21 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 5 2021, 9:21 AM
david-arm edited the summary of this revision. (Show Details)Mar 5 2021, 9:22 AM
sdesmalen added inline comments.Mar 8 2021, 4:33 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7381–7382

From looking at isScalarAfterVectorization and collectLoopScalars, if the node is scalar after vectorization, it could be that:

  1. VF is scalar (i.e. VF.getFixedValue() == 1).
  2. VF is a vector, and the operation remains scalar. e.g. induction variable updates in the loop, which all remain scalar. There is special code in collectLoopScalars that does that analysis.
  3. VF is a vector, and the operation is scalarized (i.e. getWideningDecision(.., VF) == CM_Scalarize).

For the case where the result is scalar after vectorization, I would have expected VectorTy->getVectorElementType() to be passed to TTI.getArithmeticInstrCost, not VectorTy, so I wonder if that's a bug.

Also, for case 2, we shouldn't be multiplying the cost by N. This code should instead check explicitly if the node is scalarized instead of relying on the more broader-defined isScalarAfterVectorization.

When scalarization does happen, the cost must be multiplied with VF.getFixedValue() instead (which has the implicit assert that VF is not scalable), so that you can avoid adding an unnecessary branch.

david-arm updated this revision to Diff 330250.Fri, Mar 12, 8:22 AM
  • Attempted to address @sdesmalen 's comments about the cost multiplier N. I created a new patch to remove the multiplier (D98512), since after some investigation it seems that it should always be 1.

Hi @sdesmalen, thanks for pointing this out! It looks like for the cases you listed above we actually have:

  1. VF is scalar, in which case N=1. The code in getInstructionCost is misleading since VectorTy is actually the element type in this case!
  2. VF is a vector. If this is an instruction to be scalarised then it will live in InstsInScalarize and the higher level getInstructionCost will deal with this separately. It seems that isScalarAfterVectorization returns true if the instruction is a member of the Scalars variable. From what I can tell the members of Scalars fall into two main categories: GEPs/bitcasts with scalar uses, induction variable updates with scalar uses. In these cases the instruction will remain as a single scalar instruction.
sdesmalen accepted this revision.Fri, Mar 12, 8:33 AM

This patch LGTM now, thanks for the investigation!

This revision is now accepted and ready to land.Fri, Mar 12, 8:33 AM