This is an archive of the discontinued LLVM Phabricator instance.

Cost model for VPMemory operations on PowerPC.
ClosedPublic

Authored by bmahjour on Sep 7 2021, 9:34 PM.

Details

Summary

Implemented getVPMemoryOpCost for PowerPC. Depends on D109416

Diff Detail

Event Timeline

hussainjk created this revision.Sep 7 2021, 9:34 PM
hussainjk requested review of this revision.Sep 7 2021, 9:34 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 7 2021, 9:34 PM
bmahjour added inline comments.Sep 8 2021, 3:27 PM
llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h
146

make this a static const member variable and include "P9" in the name.

bmahjour commandeered this revision.Sep 10 2021, 10:54 AM
bmahjour edited reviewers, added: hussainjk; removed: bmahjour.
bmahjour updated this revision to Diff 372008.Sep 10 2021, 1:56 PM

turn the macro into a named constant and fix formatting issues.

RolandF requested changes to this revision.Oct 20 2021, 12:02 PM
RolandF added inline comments.
llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
1362

These variables are only used in the if block and should be declared there.

1363

This variable is only used once - the expression can be used directly instead.

1367

The Instruction pointer I is optional but is dereferenced unconditionally. The is likely to segfault. There is a dyn_cast but the result is not checked for nullptr so there may also be a segfault if I is something else. If a valid vp intrinsic pointer is expected for this target implementation then there should be an assert to check the dyn_cast result.

1367

Having the entire middle of the function occupied by a path that one probably never wants to execute is slightly painful. I would suggest handling the shouldDoNothing path first instead.

1402

If this is made an or condition with not pwr9 then the pwr10 case can return here so there aren't two return LT.first lines.

1405

Missing vectorCostAdjustment to account for P9.

This revision now requires changes to proceed.Oct 20 2021, 12:02 PM
bmahjour updated this revision to Diff 381314.Oct 21 2021, 10:37 AM
bmahjour marked 5 inline comments as done.

Address all but one of Roland's comments.

bmahjour added inline comments.Oct 21 2021, 10:43 AM
llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
1367

Actually the getVPLegalizationStrategy doesn't seem like the right interface to use (because we may not have the VP intrinsics available at the time we do cost-analysis). What we really need is something that can tell us if the opcode/type combination is considered legal on a target (ie PPC). I think I asked this question before, and someone told me to use hasActiveVectorLength(). However, I think we need to change that interface to take in more pieces of info. I'm going to look into that next.

bmahjour updated this revision to Diff 381544.Oct 22 2021, 7:36 AM
bmahjour marked an inline comment as done.

Implement and use hasActiveVectorLength() for PPC target.

RolandF added inline comments.Oct 28 2021, 12:12 PM
llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
1395

I don't follow this reasoning. Maybe 64-bit data is aligned half the time, but how is that true for char data for instance?

1431

The scalar loads have scalar type results. I assume if the code is being vectorized and the function of a vector load is being replaced, the results have to go in a vector. I think there is some insert cost here.

bmahjour added inline comments.Nov 25 2021, 1:00 PM
llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
1395

Does this sound better?

return P9PipelineFlushEstimate / ((Alignment/8) + 1);

It would also be an equivalent of saying:

return ((Alignment == 8) ? P9PipelineFlushEstimate / 2 : P9PipelineFlushEstimate);

unless we cast the values to float in that formula. Any other suggestion?

bmahjour added inline comments.Nov 26 2021, 11:51 AM
llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
1431

The call to getScalarizationOverhead is supposed to account for insert vs extract cost (notice we pass IsLoad vs !isLoad to indicate whether we need insert or extract cost respectively).

There is an existing function, getCommonMaskedMemoryOpCost() which computes cost in a very similar manor and is called by getMaskedMemoryOpCost(). I'm thinking of replacing lines 1412-1437 with a call to that function instead. Any thoughts @RolandF ?

bmahjour updated this revision to Diff 390113.Nov 26 2021, 1:19 PM

Made the following changes:

  • Made hasActiveVectorLength() return false for anything other than load/stores.
  • Only halve the cost of pipeline flush for 8-byte aligned accesses. For smaller alignments we consider the full cost.
  • Replaced the calculation of the scalarization cost with getMaskedMemoryOpCost().
RolandF added inline comments.Nov 29 2021, 2:51 PM
llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
1395

I think that it makes sense to use float. Something like

Misaligned = (NumElements - 1) / NumElements
Result = Misaligned * P9PipelineFlushEstimate + (1 - Misaligned) * Cost

The present calculation I think implies that the aligned cost is zero.

bmahjour updated this revision to Diff 390734.Nov 30 2021, 8:22 AM
bmahjour marked 3 inline comments as done.

Addressed comments.

RolandF added inline comments.Nov 30 2021, 1:14 PM
llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
1400

I like this approach better. But I think the AlignmentProb calculation is incorrect. This would calculate 0% aligned for align 1 and 7/8 aligned for align 8 instead of 50%. If you want to key off alignment rather than number of elements then I think you want to do size / alignment to figure out the number of possible offsets and use that to calculate probability.

bmahjour updated this revision to Diff 391027.Dec 1 2021, 7:50 AM
bmahjour added inline comments.
llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
1400

The accesses are consecutive and the minimum alignment of the address is specified on the load/store instruction, which is passed to this function, so I don't think the element size or the size of the vector matters.

I've changed the alignment probability calculation to simply divide the specified alignment by the desired alignment. This way we'll get 50% probability for 8-byte aligned accesses, 25% probability for 4-byte aligned accesses and so forth.

RolandF accepted this revision.Dec 1 2021, 12:02 PM

LGTM

This revision is now accepted and ready to land.Dec 1 2021, 12:02 PM
This revision was landed with ongoing or failed builds.Dec 7 2021, 11:20 AM
This revision was automatically updated to reflect the committed changes.