This revision refactors the implementation of mapping to threads to additionally allow warps and linear ids to be specified.
warp_dims is currently specified along with block_dims as a transform attribute.
Linear ids, on the other hand, use the flattened block_dims to predicate on the first (linearized) k threads.
An additional GPULinearIdMappingAttr is added to the GPU dialect to allow specifying loops mapped to this new scheme.
Various implementation and transform op semantics cleanups are also applied.
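
For illustration, the linear id predication described above can be sketched in plain C++. This is a minimal sketch, not the code in this patch: `Dim3`, `linearThreadId` and `isActive` are hypothetical names, and the x-fastest linearization order is an assumption.

```cpp
#include <cstdint>

// Hypothetical helper type for illustration only; not part of MLIR.
struct Dim3 {
  int64_t x, y, z;
};

// Flatten a 3-D thread id against block_dims.
// Assumption: x varies fastest, then y, then z.
constexpr int64_t linearThreadId(Dim3 tid, Dim3 blockDims) {
  return tid.x + blockDims.x * (tid.y + blockDims.y * tid.z);
}

// Predicate: true only for the first k linearized threads.
constexpr bool isActive(Dim3 tid, Dim3 blockDims, int64_t k) {
  return linearThreadId(tid, blockDims) < k;
}
```

Under these assumptions, with block_dims = (4, 8, 1) and k = 10, thread (1, 2, 0) has linear id 9 and stays active, while thread (2, 2, 0) has linear id 10 and is predicated off.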
This doesn't have to be addressed in this patch, but DimX, DimY and DimZ don't really make sense for linearID. I wonder if we could just have an index to give the order of distribution?