This enables constructs like the following in the DSL, which generate a named op that is polymorphic over the numeric type variables T and U and emits the correct arithmetic casts at construction time:
  @tc_def_op
  def polymorphic_matmul(A=TensorDef(T, S.M, S.K),
                         B=TensorDef(T, S.K, S.N),
                         C=TensorDef(U, S.M, S.N, output=True)):
    implements(ContractionOpInterface)
    C[D.m, D.n] += cast(U, A[D.m, D.k]) * cast(U, B[D.k, D.n])
Presently, this only supports type variables bound to the element type of one of the arguments; a further extension that allows binding a type variable to an attribute would add more expressiveness and may be useful for some formulations, but that is left to a future patch. In addition, this patch does not yet materialize the verifier support that ensures types are bound correctly (for simple examples like these, failing to bind them correctly will yield IR that fails verification; it just will not yet fail with a precise error).
Note that the full grid of extension/truncation/int<->float conversions is supported, but many of them are lossy, so higher-level code needs to be mindful of numerics (that is not the job of this level).
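For illustration only, here is a minimal sketch of how such a conversion grid could be resolved at construction time. The helper name select_cast and the cast spellings (sext/trunc/sitofp/fptosi/extf/truncf) are assumptions for the sketch, not the actual OpDSL implementation:

  # Hypothetical sketch: pick a cast kind between two scalar element types.
  # Not the actual implementation; names of casts are assumed.
  def select_cast(src_type: str, dst_type: str) -> str:
      def is_int(t): return t.startswith("i")
      def width(t): return int(t[1:]) if is_int(t) else {"f16": 16, "f32": 32, "f64": 64}[t]

      if src_type == dst_type:
          return "noop"
      if is_int(src_type) and is_int(dst_type):
          # int <-> int: sign-extend when widening, truncate (lossy) when narrowing.
          return "sext" if width(dst_type) > width(src_type) else "trunc"
      if is_int(src_type):
          return "sitofp"   # int -> float (may lose precision for wide integers)
      if is_int(dst_type):
          return "fptosi"   # float -> int (lossy)
      # float <-> float: extend or truncate.
      return "extf" if width(dst_type) > width(src_type) else "truncf"

  # Example: casting an i8 operand to an i32 accumulator selects a sign extension.
  assert select_cast("i8", "i32") == "sext"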
As-is, this should be sufficient for most integer matmul scenarios we work with in typical quantization schemes.
Can we generalize this to T1, T2 and create an example with e.g. i8 * i16 -> i32?
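One possible shape of that generalization is sketched below. It assumes the DSL accepts an independent type variable per input (written T1 and T2 here), which is the extension being asked about rather than something this patch provides:

  # Hypothetical sketch of the requested generalization; T1/T2 as independent
  # type variables are an assumption, not something this patch implements.
  @tc_def_op
  def polymorphic_matmul(A=TensorDef(T1, S.M, S.K),
                         B=TensorDef(T2, S.K, S.N),
                         C=TensorDef(U, S.M, S.N, output=True)):
    implements(ContractionOpInterface)
    # Each operand is cast to the accumulator type before the multiply-add,
    # so e.g. A: i8, B: i16, C: i32 would extend both inputs to i32.
    C[D.m, D.n] += cast(U, A[D.m, D.k]) * cast(U, B[D.k, D.n])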