Page MenuHomePhabricator

[AArch64][SVE]Add cost model for vector reduce for scalable vector

Authored by CarolineConcatto on Dec 21 2020, 8:07 AM.



It uses MaxVScale to compute the maximum size of the scalable vector,
and assumes vector.reduce.<operand> uses tree-wise reduction algorithm
to compute the cost model.

Depends on D93030

Diff Detail

Event Timeline

CarolineConcatto requested review of this revision.Dec 21 2020, 8:07 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 21 2020, 8:07 AM
sdesmalen added inline comments.Jan 5 2021, 9:58 AM

My understanding is that the PairWise form is not something that can be expressed with the LLVM IR intrinsic and is therefore specific to fixed-width vectors which can use shufflevector.
Perhaps you can just assert IsPairWise is false. @david-arm any thoughts?


The scalable property should match for Ty and CondTy, so that can be an assert. With the if-condition changed to only check Ty.


remove TODO.


nit: s/getMaxVScale().getValue()/*getMaxVScale()/


nit: unsigned CmpOpcode = Ty->isFPOrFPVectorTy() ? Instruction::FCmp : Instruction::ICmp;


I think this should be:

Log2(LT.first) * MinMaxVectorCost + NumReduxLevels * MinMaxScalarCost

Because the number of legalized vectors will do min/max operations amongst the vectors (to end up with a single vector), before it will do the min/max reduction on the elements within the vector.
That also means distinguishing the cost for CmpSel on vectors and CmpSel on scalars.


remove TODO.


nit: You can just as well inline this into the expression below.

Have a couple nits, but seems reasonable to me. Somebody else should review as well.


please name the type




why not just return invalid cost now and assert at the call site?


NIT: isa<ScalableVectorType>(ValTy)

david-arm added inline comments.Jan 6 2021, 12:25 AM

I think we can probably assert IsPairwiseForm=false for now.


Remove TODO here as max vscale should always be a valid value for AArch64 + scalable types?


I don't think we need to use max vscale at all here to be honest. I've discussed this with Carol privately and getTypeLegalizationCost below already returns the number of parts needed (LT.first) along with the MVT for the legal vector type (LT.second). Since we have instructions that do both min/max operations (SMIN,etc.) as well as the horizontal vector reductions for min/max (SMINV, etc.) we can just do something similar to what X86 did and have a very simple cost lookup table for the legal horizontal reduction (SMINV), then add on a legalisation cost. i.e. something like:

unsigned LegalizationCost = 0;
if (not legal) {

// reduce to a single vector, i.e.
// res0 = SMIN part0, part1
// res1 = SMIN res0, part2
// ...
LegalizationCost = getMinMaxCost(); // SMIN, SMAX, etc.
LegalizationCost *= LT.first - 1;


// do cost lookup for SMINV, etc.
return LegalizationCost + LegalReductionCost;


As mentioned above, I think we can avoid using max vscale and calculating redux levels.


Similar to my comment above about max vscale, I think we can do this in a much simpler way. Rather than try to calculate a complex cost in this way using selects and compares, etc., I think we can just do it the same way that X86 does. We can have a simple cost lookup table for legal reductions (SMIN/MAX,UMIN/MAX, etc.), then add on a legalisation factor.


Remove TODO?


Same comments as for the MinMax case. We can avoid all reference to max vscale and redux levels, by treating this as the summation of:

LegalizationCost (sequence of FADD,ADD,etc. to reduce to single vector) + LegalReductionCost (single horizontal reduction instruction, e.g. FADDV).

I think the only case we don't support here is mul reductions, but we can just put in a high cost estimate for now, right?

-Split the cost on legalization and horizontal vector reduction cost as
suggested in the review

sdesmalen added inline comments.Jan 8 2021, 3:21 AM

I don't think this condition is necessary because LT.first > 1 already implies splitting is required.


You can use EVT::getTypeForEVT instead.


Sure, we can do that. I was more thinking that we'd then lose scaling of the cost if we know something about vscale. Reductions requires successive inter-lane operations (as opposed to e.g. fadd, which can be done in parallel), so the number of lanes probably has some impact on the cost. If we make it dependent on MaxVscale, the cost-model could return a lower cost if MaxVL=128bits than if MaxVL is 1024bits (so we can tweak the costs with -msve-vector-bits=<bits>). X86 doesn't have this problem, because the vector size is always fixed, so it can use a look-up-table.
I'm not saying that I'm completely against a simple fixed cost initially, just sharing my thoughts.


I don't know what default could be here other than llvm_unreachable. And if that is unreachable, then you can just as well return the final cost on line 1137.


same as above, I think this is implied by LT.first > 1.


Use EVT::getTypeForEVT for LT.second ?


move this to the default in switch below?


For Min/Max you use a fixed-cost of 1 and here you use 2 and 16. Does that mean Min/Max should also be 2?
And should 16 actually be Invalid when this function will use InstructionCost because it cannot be expanded for SVE? If so, can you add a TODO?

  • use EVT instead of MVT to get the vector type
  • use unreachable for the MinMax switch
  • add TODO for Arithmetic cost when the ISD is not supported
  • change MinMax horizontal reduction cost to be 2
CarolineConcatto marked 4 inline comments as done.

-remove redundant code when computing Type for MinMax Cost

Thank you @sdesmalen for helping to refine the code.
I hope it is near to the end now.


I've added unreachable to the switch for default as it is not expected to be reached at any time.
Without the default case the compiler complains:
warning: control reaches end of non-void function
as I did not added no return after the switch.


So, I have changed the cost of MinMax and Arithmetic to be 2.
I imagine it is fine for them to be the same.

sdesmalen added inline comments.Jan 13 2021, 7:10 AM

We already know Ty is a VectorType (see line 1104), so this cast is redundant.


Nit: can you call this LegalVTy?

With the two comments above, that would become:

Type *LegalVTy = EVT(LT.second).getTypeForEVT(Ty->getContext();

The switch statement below and selection of the ISD seems unnecessary if the cost is the same, you can just write:

return LegalizationCost + /*Cost of reduction*/ 2;


We already know Ty is a VectorType (see line 1151), so this cast is redundant.


nit: s/getArithmeticReductionCostScalableVectorType/getArithmeticReductionCostSVE

sdesmalen added inline comments.Jan 13 2021, 9:52 AM

@CarolineConcatto pointed out to me that earlier on in this patch @ctetreau suggested the opposite.

My argument for having 'SVE' in the name over 'ScalableVectorType' is that the reduction cost is different when SVE is available, regardless of whether the type passed is a scalable or fixed-width vector. For fixed-width vectors, the compiler may still choose to use one of the SVE reduction instructions if available. The code in that function is not specific to scalable vectors either.

-remove redundant code
-change getArithmeticReductionCostScalableVectorType to getArithmeticReductionCostSVE
-replace unsigned by int

CarolineConcatto marked 13 inline comments as done.Jan 13 2021, 10:22 AM

Thanks for the changes so far, I think the patch is nearly there now.


SVE has no instructions for MUL/FMUL reductions, so these should fall under default (return Invalid).


Can you make two tests for each reduction:

  • One with a legal type (<vscale x 2 x i64> (or <vscale x 2 x double> for fp reductions))
  • One with an illegal type that needs splitting (<vscale x 8 x i64> (or <vscale x 8 x double> for fp reductions))
  • remove cost for vector.reduce.{mul"fmul}
  • add regression tests for not legal vectors
CarolineConcatto marked 2 inline comments as done.

-add missing test for fmax,fadd with integer

@Thank you Sander,
I added more tests. Now we have tests for legal and not legal, besides the test with integer for FP operands.

  • change vector size to 128bits
sdesmalen added inline comments.Jan 18 2021, 1:10 AM

These tests can be removed because Floating Point reductions need an FP vector as their input, not an integer vector.


I was actually hoping for the test file to be organised in pairs of tests with a legal type, followed by an illegal (too wide) type, as follows:

define i64 @mul.i64.nxv2i64(<vscale x 2 x i64> %v) {
; CHECK-LABEL: 'mul.i64.nxv2i64'
; CHECK-NEXT: Cost Model: Found an estimated cost of .... for instruction:   %r = call i64 @llvm.vector.reduce.mul.nxv2i64(<vscale x 2 x i64> %v)
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction:   ret i64 %r

  %r = call i64 @llvm.vector.reduce.mul.nxv2i64(<vscale x 2 x i64> %v)
  ret i64 %r

define i64 @mul.i64.nxv8i64(<vscale x 8 x i64> %v) {
; CHECK-LABEL: 'mul.i64.nxv8i64'
; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction:   %r = call i64 @llvm.vector.reduce.mul.nxv8i64(<vscale x 8 x i64> %v)
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction:   ret i64 %r

  %r = call i64 @llvm.vector.reduce.mul.nxv8i64(<vscale x 8 x i64> %v)
  ret i64 %r

where the only difference is the number of elements (not the element type). This then clearly tests the cost on a legal type followed by a test where the code for legalization cost is exercised.

-re-arrange the test
-remove f<operand> tests with integer

sdesmalen accepted this revision.Jan 18 2021, 4:13 AM



This is different from what I suggested (increasing the number of elements), but the result of increasing the element-width is the same for the cost function, so I guess it's okay.

This revision is now accepted and ready to land.Jan 18 2021, 4:13 AM
This revision was automatically updated to reflect the committed changes.
CarolineConcatto marked 2 inline comments as done.