Generalize the construction of 'fragments' of MMA operations to provide common primitives for building the ops.
This will make it easier to add new variants of the instructions that operate on integer types.
Use nested foreach loops, which give better control over the naming of the intrinsics.
This patch does not affect LLVM's output, so there are no test changes.
What's _geom / _frag / _ptx_type? Maybe have some examples, or a comment like one of these down below:
wmma.load.[a|b|c].sync.[row|col].m16n16k16[|.global|.shared].[f16|f32]
or maybe a regex in the comment?
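To make the suggestion concrete, here is a rough Python sketch (not the TableGen from the patch) that expands the pattern above with nested loops and gives an equivalent regex. The tuple names and `wmma_load_names` are hypothetical, and treating them as the counterparts of _frag / _geom / _ptx_type is only a guess. The sketch expands the pattern literally and does not check which combinations the hardware actually accepts.

```python
import re
from itertools import product

# Hypothetical value lists, read directly off the pattern in the comment above.
FRAGS = ("a", "b", "c")               # possibly what _frag stands for
LAYOUTS = ("row", "col")
GEOMS = ("m16n16k16",)                # possibly what _geom stands for
SPACES = ("", ".global", ".shared")   # empty string = no address-space suffix
TYPES = ("f16", "f32")                # possibly what _ptx_type stands for


def wmma_load_names():
    """Expand the pattern into every name it literally describes."""
    return [
        f"wmma.load.{frag}.sync.{layout}.{geom}{space}.{ty}"
        for frag, layout, geom, space, ty in product(
            FRAGS, LAYOUTS, GEOMS, SPACES, TYPES
        )
    ]


# The same pattern written as a regex that could go in a comment.
WMMA_LOAD_RE = re.compile(
    r"wmma\.load\.[abc]\.sync\.(?:row|col)\.m16n16k16"
    r"(?:\.global|\.shared)?\.(?:f16|f32)"
)

if __name__ == "__main__":
    for name in wmma_load_names():
        print(name)
```

Either form, a couple of expanded names or the regex, would be enough in the comment to show how the pieces combine into an intrinsic name.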