This is an archive of the discontinued LLVM Phabricator instance.

[CostModel][X86] Add support for broadcast shuffle costs
ClosedPublic

Authored by RKSimon on Dec 15 2016, 8:06 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon updated this revision to Diff 81588.Dec 15 2016, 8:06 AM
RKSimon retitled this revision from to [CostModel][X86] Add support for broadcast shuffle costs.
RKSimon updated this object.
RKSimon added reviewers: mkuper, hfinkel, spatel, andreadb.
RKSimon set the repository for this revision to rL LLVM.
RKSimon added a subscriber: llvm-commits.
andreadb edited edge metadata.Dec 15 2016, 8:24 AM

Hi Simon,

I noticed that getShuffleCost is becoming quite big.
Do you think it makes sense to split the logic in getShuffleCost in three parts (a function for each supported ShuffleKind)?. In case, getShuffleCost could be refactored in a separate commit.

lib/Target/X86/X86TargetTransformInfo.cpp
614 ↗(On Diff #81588)

Shouldn't this be LT.first * Entry->Cost ?
I think that you are not accounting for the type legalization cost.

626 ↗(On Diff #81588)

Same.

640 ↗(On Diff #81588)

Same (also, see lines 654, 664, 677 and 686).

679–686 ↗(On Diff #81588)

Can we not just simplify this code into something like this?

if (ST->hasSSE1() && LT.second == MVT::v4f32)
  return LT.first;

You are basically performing a lookup on a table with just a single entry.

I noticed that getShuffleCost is becoming quite big.

You're not kidding ;-)

Do you think it makes sense to split the logic in getShuffleCost in three parts (a function for each supported ShuffleKind)?. In case, getShuffleCost could be refactored in a separate commit.

I have considered hijacking the 'ISD::VECTOR_SHUFFLE' int entry in both CostTbleEmtry and CostTableLookup to do lookup based on TTI::ShuffleKind instead - what do you think?

lib/Target/X86/X86TargetTransformInfo.cpp
614 ↗(On Diff #81588)

That's what I meant in the disclaimer at the top - as we're broadcasting we only reference the first input register and all the outputs are the same - so the costs aren't multiplied by the LT.first scale factor (num vectors). It doesn't account for any register moves that occur but that's true for most throughput costs.

mkuper added inline comments.Dec 15 2016, 9:24 AM
lib/Analysis/CostModel.cpp
93 ↗(On Diff #81588)

We already have this helper in CGP (that version also checks if you're splatting *any* element, not just element 0.)
Maybe make it common? Not entirely sure what the appropriate place for it yes, though. Would it make sense for CGP to use CostModel?

I noticed that getShuffleCost is becoming quite big.

You're not kidding ;-)

Do you think it makes sense to split the logic in getShuffleCost in three parts (a function for each supported ShuffleKind)?. In case, getShuffleCost could be refactored in a separate commit.

I have considered hijacking the 'ISD::VECTOR_SHUFFLE' int entry in both CostTbleEmtry and CostTableLookup to do lookup based on TTI::ShuffleKind instead - what do you think?

I think it is a good idea :-). After all, the opcode can only be ISD::VECTOR_SHUFFLE in this method. So, that bit of information doesn't really need to be stored in any entries.

lib/Target/X86/X86TargetTransformInfo.cpp
614 ↗(On Diff #81588)

Ah I see. That makes sense.

RKSimon added inline comments.Dec 15 2016, 10:09 AM
lib/Analysis/CostModel.cpp
93 ↗(On Diff #81588)

Adding isSplatMask/isSplat/getSplatIndex support to ShuffleVectorInst (worth adding them to all to match ShuffleVectorSDNode ?) would be trivial, then both CodeGenPrepare and CostModel could use it easily.

It would probably be worth upgrading SK_Broadcast at the same time to support broadcasting any Index value (not just 0) - given that almost nothing uses it so far that shouldn't be a problem.

What other cost model cases did you have in mind? CodeGenPrepare::optimizeShuffleVectorInst seems quite limited.

mkuper added inline comments.Dec 15 2016, 11:44 AM
lib/Analysis/CostModel.cpp
93 ↗(On Diff #81588)

It would probably be worth upgrading SK_Broadcast at the same time to support broadcasting any Index value (not just 0) - given that almost nothing uses it so far that shouldn't be a problem.

Thinking about it a bit more, I'm not entirely sure about this. Ignoring what we do in the DAG for a moment, in IR, I'd expect the canonical insert + splat pattern to use index 0.

What other cost model cases did you have in mind? CodeGenPrepare::optimizeShuffleVectorInst seems quite limited.

I didn't, really, just trying to avoid code duplication.

RKSimon updated this revision to Diff 82260.Dec 21 2016, 1:13 PM
RKSimon edited edge metadata.

Added ShuffleVectorInst::isSplat as suggested.

Regarding the refactor to use TTI::ShuffleKind instead of ISD::SHUFFLE_VECTOR in the LUTs - is everyone happy with me to commit this to trunk and I'll then update this patch with the new scheme?

delena added a subscriber: delena.Dec 29 2016, 3:19 AM

I noticed that getShuffleCost is becoming quite big.

You're not kidding ;-)

Do you think it makes sense to split the logic in getShuffleCost in three parts (a function for each supported ShuffleKind)?. In case, getShuffleCost could be refactored in a separate commit.

I have considered hijacking the 'ISD::VECTOR_SHUFFLE' int entry in both CostTbleEmtry and CostTableLookup to do lookup based on TTI::ShuffleKind instead - what do you think?

I think it is a good idea :-). After all, the opcode can only be ISD::VECTOR_SHUFFLE in this method. So, that bit of information doesn't really need to be stored in any entries.

May I ask you to postpone refactoring of getShuffleCost() to the next commit? Andrea, I have another patch https://reviews.llvm.org/D28118 that also changes ShuffleCost for X86 targets.

May I ask you to postpone refactoring of getShuffleCost() to the next commit? Andrea, I have another patch https://reviews.llvm.org/D28118 that also changes ShuffleCost for X86 targets.

No problem, I'll do the cleanup/refactor after your interleaving patch

delena added inline comments.Jan 2 2017, 1:04 AM
lib/Analysis/CostModel.cpp
498 ↗(On Diff #82260)

I suggest to simplify the code - just one function:

static bool isZeroEltBroadcastVectorMask(SmallVectorImpl<int> &Mask) {
  for (unsigned i = 0; i < Mask.size(); ++i)
    if (Mask[i] > 0)
      return false;
  return true;
}
RKSimon updated this revision to Diff 83079.Jan 4 2017, 10:25 AM

Updated having merged the Broadcast/Alternate/Reverse shuffle costs into a single set of LUTs.

andreadb added inline comments.Jan 4 2017, 10:59 AM
lib/Analysis/CostModel.cpp
520–522 ↗(On Diff #83079)

Is this code still needed?
r290810 introduced a check for `isZeroEltBroadcastVectorMask' at line 530.

RKSimon added inline comments.Jan 5 2017, 3:28 AM
lib/Analysis/CostModel.cpp
520–522 ↗(On Diff #83079)

Thanks - I missed that for some reason.

What should we do? Keep with the separate Broadcast matching helpers or merge and use ShuffleVectorInst::isSplat ?

andreadb added inline comments.Jan 5 2017, 3:53 AM
lib/Analysis/CostModel.cpp
520–522 ↗(On Diff #83079)

Good question. I don't have a strong opinion since both approaches sound reasonable to me.

In general, I like the idea of having separate helpers to check shuffle masks, and I am not particularly worried about the small code duplication in isZeroEltBroadcastVectorMask (since that helper is very simple).

delena added inline comments.Jan 5 2017, 4:15 AM
lib/Target/X86/X86TargetTransformInfo.cpp
683 ↗(On Diff #83079)

theoretically, v16i16 should not be more expensive than v32i8, vpshufb should work for v8i16 as it works for v16i8. (I don't know how it is implemented).

RKSimon added inline comments.Jan 5 2017, 6:30 AM
lib/Target/X86/X86TargetTransformInfo.cpp
683 ↗(On Diff #83079)

It does currently have the vpshuflw + vpshufd + vinsertf128 pattern - I'll look improving this in shuffle combining.

RKSimon updated this revision to Diff 83232.Jan 5 2017, 7:11 AM

Use existing broadcast shuffle matching

andreadb accepted this revision.Jan 5 2017, 7:20 AM
andreadb edited edge metadata.

LGTM. Thanks Simon!

This revision is now accepted and ready to land.Jan 5 2017, 7:20 AM
This revision was automatically updated to reflect the committed changes.