Some of the costs get larger here, so i thought i should get a second opinion.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Have you been able to step through the mul fix/overflow cost calcs - I assume its the trunc <64 x i16> -> <64 x i8> cost calc that is causing this?
llvm/test/Analysis/CostModel/X86/mul.ll | ||
---|---|---|
321 | It looks like what we are missing is *ext MVT::v64i8 -> MVT::v32i16, diff --git a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp index 73e68be97a2d..fdf0c93817ca 100644 --- a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp +++ b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp @@ -1564,6 +1564,8 @@ InstructionCost X86TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst, TTI::CastContextHint CCH, TTI::TargetCostKind CostKind, const Instruction *I) { + errs() << "getCastInstrCost " << Opcode << " " << *Src << " to " << *Dst << "\n"; + int ISD = TLI->InstructionOpcodeToISD(Opcode); assert(ISD && "Invalid opcode"); @@ -1738,7 +1740,9 @@ InstructionCost X86TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst, { ISD::ZERO_EXTEND, MVT::v8i64, MVT::v8i32, 1 }, { ISD::SIGN_EXTEND, MVT::v32i16, MVT::v32i8, 3 }, // FIXME: May not be right + { ISD::SIGN_EXTEND, MVT::v32i16, MVT::v64i8, 1 }, // FIXME: May not be right { ISD::ZERO_EXTEND, MVT::v32i16, MVT::v32i8, 3 }, // FIXME: May not be right + { ISD::ZERO_EXTEND, MVT::v32i16, MVT::v64i8, 1 }, // FIXME: May not be right { ISD::SINT_TO_FP, MVT::v8f64, MVT::v8i1, 4 }, { ISD::SINT_TO_FP, MVT::v16f32, MVT::v16i1, 3 }, @@ -2416,6 +2420,10 @@ InstructionCost X86TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst, TTI::CastContextHint::None, CostKind); } + EVT src = LTSrc.second; + EVT tgt = LTDest.second; + errs() << "\n{ "<<ISD<<", " << *(tgt.getTypeForEVT(Src->getContext())) << "," << *(src.getTypeForEVT(Src->getContext())) << ", ? },\n"; + return AdjustCost( BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I)); } |
Thinking about it, i think this is working as expected,
e.g. %V64i8 = mul <64 x i8> undef, <i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16>
queries zext <64 x i8> to <64 x i16> + trunc <64 x i16> to <64 x i8> costs,
and the patch adds said trunc cost, so we no longer fallback to scalarization cost,
which can be arbitrarily wrong. Not quite sure if anything is broken..
clang-format: please reformat the code
256 diff lines are omitted. See full path.