Page MenuHomePhabricator

[X86][Costmodel] `trunc v32i16 to v64i8` can appear after legalization, cost is same as for `trunc v32i16 to v32i8`
ClosedPublic

Authored by lebedev.ri on Sun, Nov 14, 10:08 AM.

Details

Summary

Some of the costs get larger here, so i thought i should get a second opinion.

Diff Detail

Event Timeline

lebedev.ri created this revision.Sun, Nov 14, 10:08 AM

Have you been able to step through the mul fix/overflow cost calcs - I assume its the trunc <64 x i16> -> <64 x i8> cost calc that is causing this?

lebedev.ri added inline comments.Mon, Nov 15, 4:56 AM
llvm/test/Analysis/CostModel/X86/mul.ll
321

It looks like what we are missing is *ext MVT::v64i8 -> MVT::v32i16,
at least that is what getCastInstrCost() is queried about and it doesn't know the answer.
But adding even bogus cost still leads to cost for this snippet being 20.

diff --git a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
index 73e68be97a2d..fdf0c93817ca 100644
--- a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
+++ b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
@@ -1564,6 +1564,8 @@ InstructionCost X86TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst,
                                              TTI::CastContextHint CCH,
                                              TTI::TargetCostKind CostKind,
                                              const Instruction *I) {
+  errs() << "getCastInstrCost " << Opcode << " " << *Src << " to " << *Dst << "\n";
+
   int ISD = TLI->InstructionOpcodeToISD(Opcode);
   assert(ISD && "Invalid opcode");
 
@@ -1738,7 +1740,9 @@ InstructionCost X86TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst,
     { ISD::ZERO_EXTEND, MVT::v8i64,  MVT::v8i32,  1 },
 
     { ISD::SIGN_EXTEND, MVT::v32i16, MVT::v32i8,  3 }, // FIXME: May not be right
+    { ISD::SIGN_EXTEND, MVT::v32i16, MVT::v64i8,  1 }, // FIXME: May not be right
     { ISD::ZERO_EXTEND, MVT::v32i16, MVT::v32i8,  3 }, // FIXME: May not be right
+    { ISD::ZERO_EXTEND, MVT::v32i16, MVT::v64i8,  1 }, // FIXME: May not be right
 
     { ISD::SINT_TO_FP,  MVT::v8f64,  MVT::v8i1,   4 },
     { ISD::SINT_TO_FP,  MVT::v16f32, MVT::v16i1,  3 },
@@ -2416,6 +2420,10 @@ InstructionCost X86TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst,
                             TTI::CastContextHint::None, CostKind);
   }
 
+  EVT src = LTSrc.second;
+  EVT tgt = LTDest.second;
+  errs() << "\n{ "<<ISD<<", " << *(tgt.getTypeForEVT(Src->getContext())) << "," << *(src.getTypeForEVT(Src->getContext())) << ",  ? },\n";
+
   return AdjustCost(
       BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I));
 }

Have you been able to step through the mul fix/overflow cost calcs - I assume its the trunc <64 x i16> -> <64 x i8> cost calc that is causing this?

Thinking about it, i think this is working as expected,
e.g. %V64i8 = mul <64 x i8> undef, <i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16>
queries zext <64 x i8> to <64 x i16> + trunc <64 x i16> to <64 x i8> costs,
and the patch adds said trunc cost, so we no longer fallback to scalarization cost,
which can be arbitrarily wrong. Not quite sure if anything is broken..

RKSimon accepted this revision.Mon, Nov 15, 7:34 AM

LGTM

This revision is now accepted and ready to land.Mon, Nov 15, 7:34 AM

Thanks for double-checking my reasoning.

LGTM

Thank you for the review!

This revision was landed with ongoing or failed builds.Mon, Nov 15, 8:04 AM
This revision was automatically updated to reflect the committed changes.