This is an archive of the discontinued LLVM Phabricator instance.

[mlir][vector] Materialize out non-matching vector.contract operands
Abandoned · Public

Authored by rsuderman on Jul 18 2023, 2:28 PM.

Details

Summary

vector.contract enforces that the lhs and rhs operands have the same
element type. In cases where this requirement does not hold, we should
materialize out an arith.extf to the accumulation type.
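
As a sketch of the intended output IR (the shapes, maps, and function name here are illustrative, not taken from the patch), the motivating mixed-type case would vectorize to something like the following, with both operands extended to the f32 accumulation type so that vector.contract itself only ever sees matching lhs/rhs element types:

```mlir
#lhs_map = affine_map<(d0, d1, d2) -> (d0, d2)>
#rhs_map = affine_map<(d0, d1, d2) -> (d2, d1)>
#acc_map = affine_map<(d0, d1, d2) -> (d0, d1)>

func.func @mixed_matmul(%lhs: vector<4x8xf16>, %rhs: vector<8x4xbf16>,
                        %acc: vector<4x4xf32>) -> vector<4x4xf32> {
  // Materialize both operands out to the f32 accumulation type.
  %lhs_f32 = arith.extf %lhs : vector<4x8xf16> to vector<4x8xf32>
  %rhs_f32 = arith.extf %rhs : vector<8x4xbf16> to vector<8x4xf32>
  // The contraction now has matching lhs/rhs element types.
  %res = vector.contract {
           indexing_maps = [#lhs_map, #rhs_map, #acc_map],
           iterator_types = ["parallel", "parallel", "reduction"],
           kind = #vector.kind<add>}
         %lhs_f32, %rhs_f32, %acc
         : vector<4x8xf32>, vector<8x4xf32> into vector<4x4xf32>
  return %res : vector<4x4xf32>
}
```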

Event Timeline

rsuderman created this revision.Jul 18 2023, 2:28 PM
Herald added a project: Restricted Project.
rsuderman requested review of this revision.Jul 18 2023, 2:28 PM

Hey Rob! Thanks for sending this patch.

E.g. i8 and i16 may accumulate into an i32 legally, despite not matching in bitwidth.

I think if that is the case, we should extend i8 to i16 first and then use an i16, i16 -> i32 contract. We have to keep the number of variants under control at this level of abstraction. Otherwise, the transformations and contract lowerings would have to deal with too many cases.
Would that work for your case?

Thanks,
Diego
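
As a concrete (hypothetical) illustration of that suggestion, using 1-D shapes and arith.extsi for the sign extension, the narrower operand would be widened first so the contraction is a plain i16, i16 -> i32 reduction:

```mlir
#id_map = affine_map<(d0) -> (d0)>
#scalar_map = affine_map<(d0) -> ()>

func.func @widening_dot(%a: vector<16xi8>, %b: vector<16xi16>,
                        %acc: i32) -> i32 {
  // Extend the narrower operand so lhs/rhs element types match.
  %a_i16 = arith.extsi %a : vector<16xi8> to vector<16xi16>
  %0 = vector.contract {
         indexing_maps = [#id_map, #id_map, #scalar_map],
         iterator_types = ["reduction"],
         kind = #vector.kind<add>}
       %a_i16, %b, %acc : vector<16xi16>, vector<16xi16> into i32
  return %0 : i32
}
```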

So there is actually a slight problem with that. I used the i8 and i16 case as a simple example. The actual case I am encountering is f16 and bf16, which have lossy casting in both directions. Happy to look into an alternate solution, but none are coming to mind.
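
For context on why both directions are lossy: f16 has 5 exponent bits and 10 mantissa bits, while bf16 has 8 exponent bits and 7 mantissa bits, so f16 -> bf16 drops mantissa precision and bf16 -> f16 can overflow the narrower exponent range. Only widening both to f32 is value-preserving.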

Ha, that's interesting. I guess it's up to the framework to define what happens when we operate on an f16 and a bf16 (i.e., which direction the conversion goes). It sounds like we want to disambiguate that as early as possible instead of pushing the decision further down the pipeline. That call would have to be made by whichever spec is in charge.

Alright. I feel like the best option is to arith.extf the operands to the reduction type if they do not match. I cannot see a better option, and it seems to align with how we treated linalg.matmul internal reductions.

Updated to remove the requirement change and instead update the vectorization code

rsuderman retitled this revision from [mlir][vector] Drop matching element type requirement on `vector.contract` to [mlir][vector] Materialize out non-matching vector.contract operands.Jul 19 2023, 12:37 PM
rsuderman edited the summary of this revision.

Thanks, Rob! I would like to know what @nicolasvasilache thinks, in case I'm missing something, but I think both linalg.matmul and vector.contract shouldn't allow inputs with different types at all. That should be something to check in the op verification, and the proper conversion should happen further up in the pipeline, when we generate the linalg.matmul op in the first place. Does that make sense to you?

I am not entirely certain I agree, but let's wait to hear from Nicolas. My assumption is that most matmuls are I/O bound, so decreasing tile sizes would take priority. If we handle the extf prior to the linalg.matmul, it could result in materializing out a larger tensor, which would increase memory load times. I assume we would expect the arith.extf to fuse into the inner matmul in the end; however, this feels potentially more complex than supporting mixed types.
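
To put rough (hypothetical) numbers on that concern: a 1024x1024 f16 operand is 2 MiB, so extending it to f32 ahead of the linalg.matmul materializes a 4 MiB buffer and doubles the bytes read for that operand, whereas extending per vector tile after vectorization keeps the loads at the narrow width and performs the widening in registers.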

That could potentially happen with any binary operation, not just matmuls. I don't see a strong reason to make an exception for matmuls given the complexity added to lowerings and transformations. Even when targeting some of the RISC-V widening ops that allow wider accumulators, we have been able to deal with the extension instruction without too much of a hassle.

rsuderman abandoned this revision.Aug 31 2023, 10:01 AM