This is an archive of the discontinued LLVM Phabricator instance.

[SLP]Fix cost estimation for buildvectors with extracts and/or constants.
ClosedPublic

Authored by ABataev on Apr 14 2023, 12:00 PM.

Details

Summary

If the partial matching is found and some other scalars must be
inserted, need to account the cost of the extractelements, transformed
to shuffles, and/or reused entries and calculate the cost of inserting
constants properly into the non-poison vectors.
Also, fixed the cost calculation for final gather/buildvector sequence.

Diff Detail

Event Timeline

ABataev created this revision.Apr 14 2023, 12:00 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 14 2023, 12:00 PM
ABataev requested review of this revision.Apr 14 2023, 12:00 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 14 2023, 12:00 PM

Thanks for looking at this. The performance certainly seems to have improved.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6984

Is this to work around the AArch64 cost model?

8799–8800

This comment looks out of date now.

ABataev added inline comments.Apr 17 2023, 12:38 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6984

Yes, because it has some kind of strange estimation if HasUse == false, the cost of insertelement is 0. Original cost estimation estimates the cost of the deleted extractelement instruction to be 3, while the insertelement instruction to be 0. Actually, it would be good to fix this problem in AArch64 cost model. The cost must be considered free, only if the operand0 is undef/poison, otherwise it is not zero.

I'm working on another solution, which should generate better shuffles, hope it will fix the regression for AArch64 and improve final emission for other targets.

Please can you rebase after D148279 ?

ABataev updated this revision to Diff 514658.Apr 18 2023, 8:52 AM

Restored original extractcost calculation, reworked estimation of the buildvector with non-undef initial vector.

RKSimon accepted this revision.Apr 19 2023, 2:46 AM

LGTM with one optional minor

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2275–2277

Pull out the HasRealUse computation.

bool HasRealUse = Opcode == Instruction::InsertElement && Op0 && !isa<UndefValue>(Op0);
return getVectorInstrCostHelper(nullptr, Val, Index, HasRealIUse);
This revision is now accepted and ready to land.Apr 19 2023, 2:46 AM
This revision was landed with ongoing or failed builds.Apr 19 2023, 5:57 AM
This revision was automatically updated to reflect the committed changes.
dmgreen added inline comments.Apr 19 2023, 7:53 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6984

Yeah that code has always been a bit off. I think once upon a time someone accidentally applied the "zero-lane insert/extract cost 0" to integers as well as floats, and since then it has happened to give better performance in many cases to keep the inaccuracy around. I will look into removing it if I can.