This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Add cost modelling for vector widenning integer reduction.
ClosedPublic

Authored by jacquesguan on Jul 18 2022, 2:53 AM.

Details

Summary

In RVV, we use vwredsum.vs and vwredsumu.vs for vecreduce.add(ext(Ty A)) if the result type's width is twice of the input vector's SEW-width. In this situation, the cost of extended add reduction should be same as single-width add reduction.

Diff Detail

Event Timeline

jacquesguan created this revision.Jul 18 2022, 2:53 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 18 2022, 2:53 AM
jacquesguan requested review of this revision.Jul 18 2022, 2:53 AM
reames requested changes to this revision.Jul 27 2022, 1:05 PM

Tests?

Also, how are reduction opcodes other than mul and add handled?

This revision now requires changes to proceed.Jul 27 2022, 1:05 PM

Tests?

Also, how are reduction opcodes other than mul and add handled?

Sorry, I don't get your point, getExtendedAddReductionCost is only used for gettring the cost of vecreduce.add(ext) or vecreduce.add(mul(ext, ext)) if IsMLA is true. So here we only need to handle ADD.

And for the test, I do not know well about CostModel test, vecreduce.add(ext) needs at least 2 instruction to make, but the test case would show the cost of each instruction, so any advice for making the case?

Tests?

Also, how are reduction opcodes other than mul and add handled?

Sorry, I don't get your point, getExtendedAddReductionCost is only used for gettring the cost of vecreduce.add(ext) or vecreduce.add(mul(ext, ext)) if IsMLA is true. So here we only need to handle ADD.

I had missed the Add in the function name and was assuming this was generic to widening reductions. I was thinking this API would be used for the floating point widening reduction variants as well, which appears not to be the case.

Honestly, this seems like a pretty poor API design. Not having thought about this extensively, it seems like splitting this into two APIs one which handles the generic mixed width reduction case (e.g. vecreduce.opcode(ext(Ty A))), and one which adds the dot-product acceleration case (e.g. vecreduce.add(mul(ext(Ty A), ext(Ty B))) would make more sense. Having a getExtendedReductionCost API would allow you to handle the FADD case, and potentially any future widen reduction instructions. (I don't have any particular extension in mind here, just a general concern.)

Huh, actually it's worse than that. The loop vectorizer appears to also be using this function for both reduce(ext(mul(ext(A), ext(B))) and reduce(mul(ext(A), ext(B)) - note difference in outer extend - and is inconsistent about values for MLA are passed in. The current usage appears to contract the documented interface for the existing routine.

You should fix that before we continue with this patch.

And for the test, I do not know well about CostModel test, vecreduce.add(ext) needs at least 2 instruction to make, but the test case would show the cost of each instruction, so any advice for making the case?

If nothing else, you could write a vectorizer test which checked the cost-model output. Not ideal, but possible.

jacquesguan added a comment.EditedJul 31 2022, 11:27 PM

Tests?

Also, how are reduction opcodes other than mul and add handled?

Sorry, I don't get your point, getExtendedAddReductionCost is only used for gettring the cost of vecreduce.add(ext) or vecreduce.add(mul(ext, ext)) if IsMLA is true. So here we only need to handle ADD.

I had missed the Add in the function name and was assuming this was generic to widening reductions. I was thinking this API would be used for the floating point widening reduction variants as well, which appears not to be the case.

Honestly, this seems like a pretty poor API design. Not having thought about this extensively, it seems like splitting this into two APIs one which handles the generic mixed width reduction case (e.g. vecreduce.opcode(ext(Ty A))), and one which adds the dot-product acceleration case (e.g. vecreduce.add(mul(ext(Ty A), ext(Ty B))) would make more sense. Having a getExtendedReductionCost API would allow you to handle the FADD case, and potentially any future widen reduction instructions. (I don't have any particular extension in mind here, just a general concern.)

Huh, actually it's worse than that. The loop vectorizer appears to also be using this function for both reduce(ext(mul(ext(A), ext(B))) and reduce(mul(ext(A), ext(B)) - note difference in outer extend - and is inconsistent about values for MLA are passed in. The current usage appears to contract the documented interface for the existing routine.

You should fix that before we continue with this patch.

And for the test, I do not know well about CostModel test, vecreduce.add(ext) needs at least 2 instruction to make, but the test case would show the cost of each instruction, so any advice for making the case?

If nothing else, you could write a vectorizer test which checked the cost-model output. Not ideal, but possible.

I created a new revision https://reviews.llvm.org/D130868 to try to refactor this API, would you like to review it?

Refactor the code.

reames requested changes to this revision.Aug 2 2022, 10:26 AM

Thanks for the rework of the API, much improved!

I'd still *prefer* a vectorizer test here, but won't strictly require one.

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
397

It's not really clear to me why you bother to translate to ISD here. The code would be much clearer as:

if (Opcode != Add && Opcode != FAdd)
  return ...
if (width check fails)
  return ...

std::pair<InstructionCost, MVT> LT = TLI->getTypeLegalizationCost(DL, ValTy);
return (LT.first - 1) +
             getArithmeticReductionCost(Opcode, ValTy, FMF, CostKind);
This revision now requires changes to proceed.Aug 2 2022, 10:26 AM

Address comment and add test.

Thanks for the rework of the API, much improved!

I'd still *prefer* a vectorizer test here, but won't strictly require one.

I add a test which will show the difference between the inloop reduction and outloop. In RISCV, we could use widening reduction instruction if we do the inloop reduction. But the API preferInLoopReduction maybe need some adjustment to enable the inloop reduction for the extended reduction situation, I will change it in a new revision.

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
397

Done.

reames accepted this revision.Aug 3 2022, 7:26 AM

LGTM

If I read your last comment correctly, the new test doesn't actually hit the relevant cost model change without follow up work right? If so, please land it separately - as having cover for inloop reduction seems useful - and then land the code change on its own. Given the false impression that the code is tested by landing them together would be potentially confusing,

Side note, I don't think we're going to ever prefer inloop reductions over out of loop ones. Reduction instructions are generally log(VL) complexity or worse. Unless maybe there's a case I'm missing?

This revision is now accepted and ready to land.Aug 3 2022, 7:26 AM
This revision was landed with ongoing or failed builds.Aug 4 2022, 12:32 AM
This revision was automatically updated to reflect the committed changes.