This is an archive of the discontinued LLVM Phabricator instance.

[LV][ARM] Improve reduction costmodel for mismatching extension types.
ClosedPublic

Authored by dmgreen on Dec 9 2021, 12:57 AM.

Details

Summary

Given a MLA reduction from two different types (say i8 and i16), we were previously failing to find the reduction pattern, often making us chose the lower vector factor. This improves that by using the largest of the two extension types, allowing us to use the larger VF as the type of the reduction.

As per https://godbolt.org/z/KP549EEYM the backend handles this valiantly, leading to better performance.

Diff Detail

Event Timeline

dmgreen created this revision.Dec 9 2021, 12:57 AM
dmgreen requested review of this revision.Dec 9 2021, 12:57 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 9 2021, 12:57 AM
sdesmalen added inline comments.Dec 9 2021, 2:56 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7012

Is there any rationale for using the largest instead of the smallest type here? And would changing it to the smallest change the result of the test?

dmgreen added inline comments.Dec 9 2021, 6:55 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7012

Yeah - the largest type is the correct one to use. We end up with, for example, add(mul(ext(A), ext(ext(B))), where add(mul(ext, ext) is the reduction pattern. There is then an extra ext - I haven't added the cost of that, but we might as well. It will often be free or cheap under MVE in reality, but I don't think it should be a problem to add it, so long as the correct vector factor is chosen.

dmgreen updated this revision to Diff 393147.Dec 9 2021, 6:57 AM

Update with extra ext cost.

sdesmalen added inline comments.Dec 10 2021, 2:17 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7047

From what I can see, these two are mutually exclusive, i.e. (Op0Ty != LargestOpTy && Op1Ty != LargestOpTy) <=> false. Can you rewrite this as:

Value *ExtraExtOp = (Op0Ty != LargestOpTy) ? Op0 : Op1;
InstructionCost ExtraExtCost =
    TTI.getCastInstrCost(ExtraExtOp->getOpcode(), ExtType,
                         VectorType::get(ExtraExtOp->getType(), VectorTy),
                         TTI::CastContextHint::None, CostKind, ExtraExtOp);
dmgreen updated this revision to Diff 393449.Dec 10 2021, 4:57 AM

Thanks for the suggestion. Updated to have a single getCastInstrCost.

sdesmalen accepted this revision.Dec 10 2021, 5:30 AM

Thanks, LGTM!

This revision is now accepted and ready to land.Dec 10 2021, 5:30 AM