vector.contract enforces that the lhs and rhs values have the same
element type. In cases where the operand element types differ, we should
materialize arith.extf ops that extend the operands to the accumulation
type.
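A minimal sketch of the intended rewrite (the shapes, maps, and the f16/bf16 element types below are illustrative, not taken from the patch):

```mlir
// Hypothetical mixed-type contraction: lhs is f16, rhs is bf16,
// accumulating into f32. Both operands are extended to the
// accumulation type so the contract sees matching element types.
%lhs_ext = arith.extf %lhs : vector<4x4xf16> to vector<4x4xf32>
%rhs_ext = arith.extf %rhs : vector<4x4xbf16> to vector<4x4xf32>
%res = vector.contract {
    indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>,
                     affine_map<(d0, d1, d2) -> (d2, d1)>,
                     affine_map<(d0, d1, d2) -> (d0, d1)>],
    iterator_types = ["parallel", "parallel", "reduction"],
    kind = #vector.kind<add>
  } %lhs_ext, %rhs_ext, %acc
  : vector<4x4xf32>, vector<4x4xf32> into vector<4x4xf32>
```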
Details
- Reviewers
aartbik nicolasvasilache dcaballe
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Hey Rob! Thanks for sending this patch.
> E.g. i8 and i16 may accumulate into an i32 legally, despite not matching in bitwidth.
I think if that is the case, we should extend i8 to i16 first and then use an i16, i16 -> i32 contract. We have to keep the number of variants under control at this level of abstraction. Otherwise, the transformations and contract lowerings would have to deal with too many cases.
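Roughly something like this (an untested sketch; the 1-D dot-product shape is just for illustration):

```mlir
// Extend the narrower i8 operand to i16 first, then use an
// i16, i16 -> i32 contraction (assuming signed inputs; arith.extui
// would be the unsigned counterpart).
%lhs_ext = arith.extsi %lhs : vector<8xi8> to vector<8xi16>
%res = vector.contract {
    indexing_maps = [affine_map<(d0) -> (d0)>,
                     affine_map<(d0) -> (d0)>,
                     affine_map<(d0) -> ()>],
    iterator_types = ["reduction"],
    kind = #vector.kind<add>
  } %lhs_ext, %rhs, %acc : vector<8xi16>, vector<8xi16> into i32
```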
Would that work for your case?
Thanks,
Diego
So there is actually a slight problem with that. I used the i8 and i16 case as a simple example. The actual case I am encountering is f16 and bf16, which have lossy casting in both directions. Happy to look into an alternate solution, but none comes to mind.
Ha, that's interesting. I guess it's up to the framework to define what happens when we operate on an f16 and a bf16 (i.e., what the direction of the conversion is). It sounds like we want to disambiguate that as early as possible instead of pushing the problem further down the pipeline. That call would have to be made by the specific spec in charge.
Alright. I feel like the best option is to arith.extf the operands to the reduction type when they do not match. I cannot see a better option, and it seems to align with how we treated linalg.matmul internal reductions.
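For reference, a generalized mixed-precision linalg.matmul already materializes the extension inside its reduction body, roughly like this (a sketch; shapes are illustrative):

```mlir
// Generalized form of an f16 x f16 -> f32 matmul: the extension to
// the accumulation type lives inside the reduction body.
%0 = linalg.generic {
    indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>,
                     affine_map<(d0, d1, d2) -> (d2, d1)>,
                     affine_map<(d0, d1, d2) -> (d0, d1)>],
    iterator_types = ["parallel", "parallel", "reduction"]
  } ins(%a, %b : tensor<4x8xf16>, tensor<8x4xf16>)
    outs(%c : tensor<4x4xf32>) {
  ^bb0(%in: f16, %in_1: f16, %out: f32):
    %1 = arith.extf %in : f16 to f32
    %2 = arith.extf %in_1 : f16 to f32
    %3 = arith.mulf %1, %2 : f32
    %4 = arith.addf %out, %3 : f32
    linalg.yield %4 : f32
  } -> tensor<4x4xf32>
```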
Thanks, Rob! I would like to know what @nicolasvasilache thinks, in case I'm missing something, but I think neither linalg.matmul nor vector.contract should allow inputs with different element types at all. That should be checked in the op verification, and the proper conversion should happen higher up in the pipeline, when we generate the linalg.matmul op in the first place. Does that make sense to you?
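In other words, something like this at the point where the matmul is created (a sketch that assumes the frontend decides both sides widen to f32):

```mlir
// The producer resolves the mixed types before forming the op, so
// linalg.matmul only ever sees matching element types. Note this
// materializes full f32 copies of both operands up front.
%a_ext = arith.extf %a : tensor<4x8xf16> to tensor<4x8xf32>
%b_ext = arith.extf %b : tensor<8x4xbf16> to tensor<8x4xf32>
%0 = linalg.matmul ins(%a_ext, %b_ext : tensor<4x8xf32>, tensor<8x4xf32>)
                   outs(%c : tensor<4x4xf32>) -> tensor<4x4xf32>
```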
I am not entirely certain I agree, but let's wait to hear from Nicolas. My assumption is that most matmuls are bounded by I/O, so decreasing tile sizes would take priority. If we handle the extf prior to the linalg.matmul, it could result in materializing a larger tensor, which would increase memory load times. I assume we would expect the arith.extf to fuse into the inner matmul in the end; however, this feels potentially more complex than supporting mixed types.
That could potentially happen with any binary operation; matmuls are just one example. I don't see a strong reason to make an exception for matmuls given the complexity added to lowerings and transformations. Even when targeting some of the RISC-V widening ops that allow wider accumulators, we have been able to deal with the extension instruction without too much hassle.