This revision refactors the implementation of mapping to threads so that warps and linear ids can also be specified.
warp_dims is currently specified along with block_dims as a transform attribute.
Linear ids, on the other hand, use the flattened block_dims to predicate on the first (linearized) k threads.
An additional GPULinearIdMappingAttr is added to the GPU dialect to allow specifying loops mapped to this new scheme.
Various implementation and transform op semantics cleanups are also applied.
Doc comment.
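
For illustration only, the linear-id predication described above can be sketched in plain Python. This is not the code in this revision: the helper names `linear_thread_id` and `is_active` are hypothetical, and the x-fastest linearization order is an assumption made for the example.

```python
# Minimal sketch of predicating on the first k linearized threads.
# Assumption: thread ids are flattened with x varying fastest, then y, then z.

def linear_thread_id(tid, block_dims):
    """Flatten a 3-D thread id (x, y, z) into a single linear id."""
    x, y, z = tid
    bx, by, _ = block_dims
    return x + bx * (y + by * z)

def is_active(tid, block_dims, k):
    """Return True if this thread is among the first k linearized threads."""
    return linear_thread_id(tid, block_dims) < k

# Example: a 4x2x1 block where only the first 5 linearized threads do work.
block_dims = (4, 2, 1)
active = [
    (x, y, z)
    for z in range(block_dims[2])
    for y in range(block_dims[1])
    for x in range(block_dims[0])
    if is_active((x, y, z), block_dims, 5)
]
```

With these assumptions, the active threads are the whole first row of the block plus the first thread of the second row; every other thread is predicated off.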