getCommonMaskedMemoryOpCost tries to cast<FixedVectorType>
on a scalable vector type. Return invalid instead.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I try to support scalable vectors in getGatherScatterOpCost, but I don't know how to support scalable vectors.
https://reviews.llvm.org/D115143 SVE adds on an overhead cost for gathers and scatters, which is a rough estimate based on performance investigations.
Does RVV add similar values?
I actually hit this same issue the day you posted this, which is fun. But I fear this is quite a lot of work to get watertight. I've left some comments showing where we'd still crash. I've also seen us crash on getIntrinsicInstrCost when given, e.g., llvm.cttz.nxv1i8.
Looking at D119529, I'm not sure I see the value in accepting that the BaseTTI version is just allowed to crash on scalable vectors. It's a very common idiom that implementations fall back to that. It also just meaning extra work for new targets supporting scalable vectorization: now they need a fully-fledged TTI implementation to avoid crashing? Seems questionable to me.
llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp | ||
---|---|---|
198 | We can still crash here: I've seen this in CodeMetrics:analyzeBasicBlock because it calls with TCK_CodeSize. | |
201 | We'd still crash here? |
Can adding a cost model for llvm.cttz.nxv1i8 solve this problem? I would like to try.
In my experience, the minimum we need to handle is:
- any scalable-vector intrinsics in getTypeBasedIntrinsicInstrCost which crashes in the base version. This would fix cttz.
- scalable-vector fshl,fshr,experimental_stepvector,experimental_vector_insert and experimental_vector_extract in getIntrinsicInstrCost
In #2, these intrinsics all call BaseT::getIntrinsicInstrCost on scalable-vectors, so we need to catch them ahead of time. The other intrinsics either return some kind of cost or fall through to getTypeBasedIntrinsicInstrCost which we'll handle in #1.
This is what I mean about it looking like a lot of work. I've done the bare minimum in our downstream to fix crashes but the costs I used are wildly inaccurate: I just don't expect us to crash on IR containing intrinsics.
We can still crash here: I've seen this in CodeMetrics:analyzeBasicBlock because it calls with TCK_CodeSize.