In RVV, we use vwredsum.vs and vwredsumu.vs for vecreduce.add(ext(Ty A)) if the result type's width is twice of the input vector's SEW-width. In this situation, the cost of extended add reduction should be same as single-width add reduction.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Time | Test | |
---|---|---|
60,070 ms | x64 debian > BOLT.runtime/X86::exceptions-pic.test | |
60,240 ms | x64 debian > Clang.Driver::emit-reproducer.c | |
60,520 ms | x64 debian > Clang.Driver::fsanitize.c |
Event Timeline
Sorry, I don't get your point, getExtendedAddReductionCost is only used for gettring the cost of vecreduce.add(ext) or vecreduce.add(mul(ext, ext)) if IsMLA is true. So here we only need to handle ADD.
And for the test, I do not know well about CostModel test, vecreduce.add(ext) needs at least 2 instruction to make, but the test case would show the cost of each instruction, so any advice for making the case?
I had missed the Add in the function name and was assuming this was generic to widening reductions. I was thinking this API would be used for the floating point widening reduction variants as well, which appears not to be the case.
Honestly, this seems like a pretty poor API design. Not having thought about this extensively, it seems like splitting this into two APIs one which handles the generic mixed width reduction case (e.g. vecreduce.opcode(ext(Ty A))), and one which adds the dot-product acceleration case (e.g. vecreduce.add(mul(ext(Ty A), ext(Ty B))) would make more sense. Having a getExtendedReductionCost API would allow you to handle the FADD case, and potentially any future widen reduction instructions. (I don't have any particular extension in mind here, just a general concern.)
Huh, actually it's worse than that. The loop vectorizer appears to also be using this function for both reduce(ext(mul(ext(A), ext(B))) and reduce(mul(ext(A), ext(B)) - note difference in outer extend - and is inconsistent about values for MLA are passed in. The current usage appears to contract the documented interface for the existing routine.
You should fix that before we continue with this patch.
And for the test, I do not know well about CostModel test, vecreduce.add(ext) needs at least 2 instruction to make, but the test case would show the cost of each instruction, so any advice for making the case?
If nothing else, you could write a vectorizer test which checked the cost-model output. Not ideal, but possible.
I created a new revision https://reviews.llvm.org/D130868 to try to refactor this API, would you like to review it?
Thanks for the rework of the API, much improved!
I'd still *prefer* a vectorizer test here, but won't strictly require one.
llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp | ||
---|---|---|
391 | It's not really clear to me why you bother to translate to ISD here. The code would be much clearer as: if (Opcode != Add && Opcode != FAdd) return ... if (width check fails) return ... std::pair<InstructionCost, MVT> LT = TLI->getTypeLegalizationCost(DL, ValTy); return (LT.first - 1) + getArithmeticReductionCost(Opcode, ValTy, FMF, CostKind); |
I add a test which will show the difference between the inloop reduction and outloop. In RISCV, we could use widening reduction instruction if we do the inloop reduction. But the API preferInLoopReduction maybe need some adjustment to enable the inloop reduction for the extended reduction situation, I will change it in a new revision.
llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp | ||
---|---|---|
391 | Done. |
LGTM
If I read your last comment correctly, the new test doesn't actually hit the relevant cost model change without follow up work right? If so, please land it separately - as having cover for inloop reduction seems useful - and then land the code change on its own. Given the false impression that the code is tested by landing them together would be potentially confusing,
Side note, I don't think we're going to ever prefer inloop reductions over out of loop ones. Reduction instructions are generally log(VL) complexity or worse. Unless maybe there's a case I'm missing?
It's not really clear to me why you bother to translate to ISD here. The code would be much clearer as: