Currently only for broadcasts with input and output of the same width.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Hi Simon,
I noticed that getShuffleCost is becoming quite big.
Do you think it makes sense to split the logic in getShuffleCost in three parts (a function for each supported ShuffleKind)?. In case, getShuffleCost could be refactored in a separate commit.
lib/Target/X86/X86TargetTransformInfo.cpp | ||
---|---|---|
614 ↗ | (On Diff #81588) | Shouldn't this be LT.first * Entry->Cost ? |
626 ↗ | (On Diff #81588) | Same. |
640 ↗ | (On Diff #81588) | Same (also, see lines 654, 664, 677 and 686). |
679–686 ↗ | (On Diff #81588) | Can we not just simplify this code into something like this? if (ST->hasSSE1() && LT.second == MVT::v4f32) return LT.first; You are basically performing a lookup on a table with just a single entry. |
You're not kidding ;-)
Do you think it makes sense to split the logic in getShuffleCost in three parts (a function for each supported ShuffleKind)?. In case, getShuffleCost could be refactored in a separate commit.
I have considered hijacking the 'ISD::VECTOR_SHUFFLE' int entry in both CostTbleEmtry and CostTableLookup to do lookup based on TTI::ShuffleKind instead - what do you think?
lib/Target/X86/X86TargetTransformInfo.cpp | ||
---|---|---|
614 ↗ | (On Diff #81588) | That's what I meant in the disclaimer at the top - as we're broadcasting we only reference the first input register and all the outputs are the same - so the costs aren't multiplied by the LT.first scale factor (num vectors). It doesn't account for any register moves that occur but that's true for most throughput costs. |
lib/Analysis/CostModel.cpp | ||
---|---|---|
93 ↗ | (On Diff #81588) | We already have this helper in CGP (that version also checks if you're splatting *any* element, not just element 0.) |
I think it is a good idea :-). After all, the opcode can only be ISD::VECTOR_SHUFFLE in this method. So, that bit of information doesn't really need to be stored in any entries.
lib/Target/X86/X86TargetTransformInfo.cpp | ||
---|---|---|
614 ↗ | (On Diff #81588) | Ah I see. That makes sense. |
lib/Analysis/CostModel.cpp | ||
---|---|---|
93 ↗ | (On Diff #81588) | Adding isSplatMask/isSplat/getSplatIndex support to ShuffleVectorInst (worth adding them to all to match ShuffleVectorSDNode ?) would be trivial, then both CodeGenPrepare and CostModel could use it easily. It would probably be worth upgrading SK_Broadcast at the same time to support broadcasting any Index value (not just 0) - given that almost nothing uses it so far that shouldn't be a problem. What other cost model cases did you have in mind? CodeGenPrepare::optimizeShuffleVectorInst seems quite limited. |
lib/Analysis/CostModel.cpp | ||
---|---|---|
93 ↗ | (On Diff #81588) |
Thinking about it a bit more, I'm not entirely sure about this. Ignoring what we do in the DAG for a moment, in IR, I'd expect the canonical insert + splat pattern to use index 0.
I didn't, really, just trying to avoid code duplication. |
Added ShuffleVectorInst::isSplat as suggested.
Regarding the refactor to use TTI::ShuffleKind instead of ISD::SHUFFLE_VECTOR in the LUTs - is everyone happy with me to commit this to trunk and I'll then update this patch with the new scheme?
May I ask you to postpone refactoring of getShuffleCost() to the next commit? Andrea, I have another patch https://reviews.llvm.org/D28118 that also changes ShuffleCost for X86 targets.
lib/Analysis/CostModel.cpp | ||
---|---|---|
498 ↗ | (On Diff #82260) | I suggest to simplify the code - just one function: static bool isZeroEltBroadcastVectorMask(SmallVectorImpl<int> &Mask) { for (unsigned i = 0; i < Mask.size(); ++i) if (Mask[i] > 0) return false; return true; } |
Updated having merged the Broadcast/Alternate/Reverse shuffle costs into a single set of LUTs.
lib/Analysis/CostModel.cpp | ||
---|---|---|
520–522 ↗ | (On Diff #83079) | Is this code still needed? |
lib/Analysis/CostModel.cpp | ||
---|---|---|
520–522 ↗ | (On Diff #83079) | Thanks - I missed that for some reason. What should we do? Keep with the separate Broadcast matching helpers or merge and use ShuffleVectorInst::isSplat ? |
lib/Analysis/CostModel.cpp | ||
---|---|---|
520–522 ↗ | (On Diff #83079) | Good question. I don't have a strong opinion since both approaches sound reasonable to me. In general, I like the idea of having separate helpers to check shuffle masks, and I am not particularly worried about the small code duplication in isZeroEltBroadcastVectorMask (since that helper is very simple). |
lib/Target/X86/X86TargetTransformInfo.cpp | ||
---|---|---|
683 ↗ | (On Diff #83079) | theoretically, v16i16 should not be more expensive than v32i8, vpshufb should work for v8i16 as it works for v16i8. (I don't know how it is implemented). |
lib/Target/X86/X86TargetTransformInfo.cpp | ||
---|---|---|
683 ↗ | (On Diff #83079) | It does currently have the vpshuflw + vpshufd + vinsertf128 pattern - I'll look improving this in shuffle combining. |