This is an archive of the discontinued LLVM Phabricator instance.

[CostModel][X86] Improve masked load/store AVX1/AVX2 costs
ClosedPublic

Authored by RKSimon on Apr 29 2019, 5:48 AM.

Details

Summary

A mixture of internal tests and a review of the scheduler models indicates that we're overestimating the cost of a masked load, which we currently estimate at 4x a regular memory op. More realistic values indicate that it's closer to 2x. Masked store costs are a lot more diverse, but 8x is roughly in the middle of the range.
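As a rough illustration of what these multipliers mean, here is a minimal sketch. It assumes a regular memory op costs 1 per legal vector operation, and that the masked cost scales linearly with the number of legal operations an access splits into after type legalization. The names `MaskedKind`, `oldMaskedCost` and `newMaskedCost` are hypothetical; this is not the X86 TTI code itself.

```cpp
// Hypothetical model of pre-AVX512 (AVX1/AVX2) masked memory op costs.
// Assumption: a regular memory op costs 1 per legal vector op, and
// NumLegalOps is how many legal ops the access splits into
// (e.g. <16 x float> becomes two 256-bit ops on AVX).
enum class MaskedKind { Load, Store };

// Old behaviour: a single 4x multiplier for both loads and stores.
constexpr unsigned oldMaskedCost(unsigned NumLegalOps) {
  return NumLegalOps * 4u;
}

// New behaviour: ~2x for masked loads, ~8x for masked stores.
constexpr unsigned newMaskedCost(MaskedKind Kind, unsigned NumLegalOps) {
  return NumLegalOps * (Kind == MaskedKind::Load ? 2u : 8u);
}

// <8 x float>: a single 256-bit op.
static_assert(newMaskedCost(MaskedKind::Load, 1) == 2, "masked load: 2x");
static_assert(newMaskedCost(MaskedKind::Store, 1) == 8, "masked store: 8x");

// <16 x float>: split into two 256-bit ops.
static_assert(oldMaskedCost(2) == 8, "old: 4x per legal op");
static_assert(newMaskedCost(MaskedKind::Load, 2) == 4, "new load cost");
static_assert(newMaskedCost(MaskedKind::Store, 2) == 16, "new store cost");
```

Splitting the single multiplier into a per-kind one is what lets the load cost drop while the store cost rises.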

e.g. SandyBridge
defm : X86WriteRes<WriteFMaskedLoad, [SBPort23,SBPort05], 8, [1,2], 3>;
defm : X86WriteRes<WriteFMaskedLoadY, [SBPort23,SBPort05], 9, [1,2], 3>;
defm : X86WriteRes<WriteFMaskedStore, [SBPort4,SBPort01,SBPort23], 5, [1,1,1], 3>;
defm : X86WriteRes<WriteFMaskedStoreY, [SBPort4,SBPort01,SBPort23], 5, [1,1,1], 3>;

e.g. Btver2
defm : X86WriteRes<WriteFMaskedLoad, [JLAGU, JFPU01, JFPX], 6, [1, 2, 2], 1>;
defm : X86WriteRes<WriteFMaskedLoadY, [JLAGU, JFPU01, JFPX], 6, [2, 4, 4], 2>;
defm : X86WriteRes<WriteFMaskedStore, [JSAGU, JFPU01, JFPX], 6, [1, 1, 4], 1>;
defm : X86WriteRes<WriteFMaskedStoreY, [JSAGU, JFPU01, JFPX], 6, [2, 2, 4], 2>;
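To make these entries easier to compare, here is an illustrative C++ mirror of the X86WriteRes arguments, assuming the argument order used above: SchedWrite, execution ports, latency, resource cycles per listed port, and micro-op count. The port names themselves are omitted. The `WriteRes` struct and the `maxResourceCycles` helper are hypothetical. Taking the busiest listed resource is only a crude occupancy bound, since it ignores how many units a port group such as SBPort05 contains.

```cpp
#include <cstddef>

// Hypothetical mirror of the X86WriteRes fields:
//   X86WriteRes<SchedWrite, [ExePorts], Latency, [ResourceCycles], NumMicroOps>
// Port names are dropped; up to three ports are listed, unused slots are 0.
struct WriteRes {
  const char *Name;
  unsigned Latency;
  unsigned ResourceCycles[3];
  unsigned NumMicroOps;
};

// Busiest listed resource: a crude bound that ignores port-group widths.
constexpr unsigned maxResourceCycles(const WriteRes &W) {
  unsigned Max = 0;
  for (std::size_t I = 0; I < 3; ++I)
    if (W.ResourceCycles[I] > Max)
      Max = W.ResourceCycles[I];
  return Max;
}

// Values copied from the SandyBridge entries above.
constexpr WriteRes SandyBridge[] = {
    {"WriteFMaskedLoad", 8, {1, 2, 0}, 3},
    {"WriteFMaskedLoadY", 9, {1, 2, 0}, 3},
    {"WriteFMaskedStore", 5, {1, 1, 1}, 3},
    {"WriteFMaskedStoreY", 5, {1, 1, 1}, 3},
};

// Values copied from the Btver2 entries above.
constexpr WriteRes Btver2[] = {
    {"WriteFMaskedLoad", 6, {1, 2, 2}, 1},
    {"WriteFMaskedLoadY", 6, {2, 4, 4}, 2},
    {"WriteFMaskedStore", 6, {1, 1, 4}, 1},
    {"WriteFMaskedStoreY", 6, {2, 2, 4}, 2},
};

// On Btver2 the 256-bit (Y) variants double the micro-op count.
static_assert(Btver2[1].NumMicroOps == 2 * Btver2[0].NumMicroOps, "");
static_assert(Btver2[3].NumMicroOps == 2 * Btver2[2].NumMicroOps, "");
```

Even this small sample shows the spread the summary mentions: the Btver2 masked store keeps one resource busy for 4 cycles, while the SandyBridge store entries spend a single cycle on each listed port.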

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon created this revision. Apr 29 2019, 5:48 AM
Herald added a project: Restricted Project. Apr 29 2019, 5:48 AM

This sounds good for @llvm.masked.load, but not @llvm.masked.store.

As per https://www.agner.org/optimize/instruction_tables.pdf, even Ryzen has crazy high costs for masked stores, at ~30.
And that distinction seems consistent with other CPUs.

I'll rephrase - the masked load cost change looks good.

RKSimon updated this revision to Diff 202615. Jun 2 2019, 11:15 AM
RKSimon retitled this revision from [CostModel][X86] Reduce masked load/store AVX1/AVX2 costs to [CostModel][X86] Improve masked load/store AVX1/AVX2 costs.

Split the load/store cost multipliers: we can keep the x2 for loads, and I've used x8 for stores, which tbh is a pretty vague median value between the AMD and Intel costs.

RKSimon edited the summary of this revision. Jun 2 2019, 11:17 AM
lebedev.ri accepted this revision. Jun 2 2019, 11:22 AM

Looks better overall.

This revision is now accepted and ready to land. Jun 2 2019, 11:22 AM
This revision was automatically updated to reflect the committed changes.