Some of the costs get larger here, so i thought i should get a second opinion.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Have you been able to step through the mul fix/overflow cost calcs - I assume its the trunc <64 x i16> -> <64 x i8> cost calc that is causing this?
llvm/test/Analysis/CostModel/X86/mul.ll | ||
---|---|---|
321 | It looks like what we are missing is *ext MVT::v64i8 -> MVT::v32i16, diff --git a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp index 73e68be97a2d..fdf0c93817ca 100644 --- a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp +++ b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp @@ -1564,6 +1564,8 @@ InstructionCost X86TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst, TTI::CastContextHint CCH, TTI::TargetCostKind CostKind, const Instruction *I) { + errs() << "getCastInstrCost " << Opcode << " " << *Src << " to " << *Dst << "\n"; + int ISD = TLI->InstructionOpcodeToISD(Opcode); assert(ISD && "Invalid opcode"); @@ -1738,7 +1740,9 @@ InstructionCost X86TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst, { ISD::ZERO_EXTEND, MVT::v8i64, MVT::v8i32, 1 }, { ISD::SIGN_EXTEND, MVT::v32i16, MVT::v32i8, 3 }, // FIXME: May not be right + { ISD::SIGN_EXTEND, MVT::v32i16, MVT::v64i8, 1 }, // FIXME: May not be right { ISD::ZERO_EXTEND, MVT::v32i16, MVT::v32i8, 3 }, // FIXME: May not be right + { ISD::ZERO_EXTEND, MVT::v32i16, MVT::v64i8, 1 }, // FIXME: May not be right { ISD::SINT_TO_FP, MVT::v8f64, MVT::v8i1, 4 }, { ISD::SINT_TO_FP, MVT::v16f32, MVT::v16i1, 3 }, @@ -2416,6 +2420,10 @@ InstructionCost X86TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst, TTI::CastContextHint::None, CostKind); } + EVT src = LTSrc.second; + EVT tgt = LTDest.second; + errs() << "\n{ "<<ISD<<", " << *(tgt.getTypeForEVT(Src->getContext())) << "," << *(src.getTypeForEVT(Src->getContext())) << ", ? },\n"; + return AdjustCost( BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I)); } |
Thinking about it, i think this is working as expected,
e.g. %V64i8 = mul <64 x i8> undef, <i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16>
queries zext <64 x i8> to <64 x i16> + trunc <64 x i16> to <64 x i8> costs,
and the patch adds said trunc cost, so we no longer fallback to scalarization cost,
which can be arbitrarily wrong. Not quite sure if anything is broken..
It looks like what we are missing is *ext MVT::v64i8 -> MVT::v32i16,
at least that is what getCastInstrCost() is queried about and it doesn't know the answer.
But adding even bogus cost still leads to cost for this snippet being 20.