This adds the first strict element-wise named op to Linalg.
The op does not allow implicit casts or implicit broadcasts, and it
restricts the operation to identical types. Any other semantics must be
expressed as explicit surrounding operations on the operands, to avoid
ambiguity.
Examples:

  // Cast int-to-fp
  %0 = linalg.copy ins(%in: tensor<32x32xi32>)
                   outs(%out: tensor<32x32xf32>)
  %1 = linalg.add  ins(%arg, %0: tensor<32x32xf32>, tensor<32x32xf32>)
                   outs(%0: tensor<32x32xf32>)

  // This can be lowered to
  %1 = linalg.generic {...}
            ins(%arg, %in: tensor<32x32xf32>, tensor<32x32xi32>)
            outs(%0: tensor<32x32xf32>) {
    ^bb0(%a: f32, %i: i32, %out: f32):
      %f = arith.uitofp %i : f32
      %0 = arith.addf %a, %f : f32
      linalg.yield %0 : f32
  }

  // Broadcast
  %0 = linalg.broadcast ins(%in: tensor<32xf32>)
                        init(%out: tensor<32x32xf32>)
  %1 = linalg.add  ins(%arg, %0: tensor<32x32xf32>, tensor<32x32xf32>)
                   outs(%0: tensor<32x32xf32>)

  // This can be lowered to
  #bcast_map = affine_map<(d0, d1) -> (d0)>
  %1 = linalg.generic {... #bcast_map] }
            ins(%arg, %in: tensor<32x32xf32>, tensor<32xf32>)
            outs(%0: tensor<32x32xf32>) {
    ^bb0(%a: f32, %b: f32, %out: f32):
      %0 = arith.addf %a, %b : f32
      linalg.yield %0 : f32
  }

Once this is accepted, the other arithmetic and math operations will be
added with the same semantics.