# [LV] Use the known trip count when costing non-tail folded VFsClosedPublic

Authored by dmgreen on Apr 6 2023, 9:29 AM.

# Details

Reviewers
 fhahn Ayal SjoerdMeijer sdesmalen david-arm bmahjour
Commits
rG1869a9c225c7: [LV] Use the known trip count when costing non-tail folded VFs
Summary

Now that we store the ScalarCost in the VectorizationFactor it is possible to use it to get a slightly more accurate cost in isMoreProfitable between two vector factors. This extends the logic added in D101726 to non-tail-folded cases, using the costs of VecCost * (TripCount / VF) + ScalarCost * (TripCount % VF) to compare VFs where the TripCount is known.

This shouldn't alter very much as small trip counts are usually not vectorized, but does seem to help in the testcase where 4 * VF4 is chosen as profitable compared to 2 * VF8 + 4* scalar.

# Diff Detail

### Event Timeline

dmgreen created this revision.Apr 6 2023, 9:29 AM
Herald added a project: Restricted Project. Apr 6 2023, 9:29 AM
dmgreen requested review of this revision.Apr 6 2023, 9:29 AM
Herald added a project: Restricted Project. Apr 6 2023, 9:29 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5368

Hi @dmgreen, perhaps I've missed something here but it looks like there is a divide-by-zero in the code?

If A.Width.getFixedValue() is non-zero then we calculate (CostA * divideCeil(MaxTripCount, A.Width.getFixedValue())), and if it is zero then we do (CostA * (MaxTripCount / A.Width.getFixedValue()) + ...

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5368

Oh yeah, that was wrong. It was meant to be based on foldTailByMasking. That must have been confusing. I've reran the testing and it still looks OK, this patch doesnt usually alter a lot.

dmgreen updated this revision to Diff 516316.Apr 24 2023, 2:30 AM

Update this to what it should have been. One of the tests has changed because 12345 as a i8 is really 57, which was too many scalar iterations to pick v16 over v8. I've changed it to 241 so that it keeps testing the same thing.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5372–5376

nit: Is it worth using a lambda for this, e.g.

```auto GetCostForTC = [MaxTripCount, this](unsigned VF, InstructionCost VectorCost,
InstructionCost ScalarCost) {
VectorCost * divideCeil(MaxTripCount, VF);
VectorCost * (MaxTripCount / VF) + ScalarCost * (MaxTripCount % VF);
};

auto RTCostA = GetCostForTC(A.Width.getFixedValue(), CostA, A.ScalarCost);
auto RTCostB = GetCostForTC(B.Width.getFixedValue(), CostB, B.ScalarCost);```
llvm/test/Transforms/LoopVectorize/AArch64/smallest-and-widest-types.ll
98 ↗(On Diff #516316)

I'm curious why this test needed changing. What VF does it pick with 12345?

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5372–5376

Sounds good I'll do that now.

llvm/test/Transforms/LoopVectorize/AArch64/smallest-and-widest-types.ll
98 ↗(On Diff #516316)

12345 as a i8 is really 57, which was too many scalar iterations to pick v16 over v8. It is 3*vf16 + 9*vf1 vs 7*vf8 + 1*vf1.

I've changed it to 241 so that it keeps testing the same thing. And is a i8 value.

dmgreen updated this revision to Diff 516408.Apr 24 2023, 7:43 AM

Update as suggested

sdesmalen accepted this revision.Apr 24 2023, 7:58 AM

LGTM!

llvm/test/Transforms/LoopVectorize/AArch64/smallest-and-widest-types.ll
98 ↗(On Diff #516316)

Ha, I didn't spot it said i8 12345, that's silly. Your explanation makes sense, thanks for clarifying!

This revision is now accepted and ready to land.Apr 24 2023, 7:58 AM
This revision was landed with ongoing or failed builds.Apr 24 2023, 2:02 PM
This revision was automatically updated to reflect the committed changes.