This revision significantly simplifies the specification and implementation of mapping loops to GPU ids.
Each type of mapping (block, warpgroup, warp, thread) now comes with 2 mapping modes:
- a 3-D "grid-like" mode, subject to alignment considerations on threadIdx.x, on which predication may occur on a per-dimension 3-D sub-rectangle basis.
- a n-D linearized mode, on which predication may only occur on a linear basis.
In the process, better size and alignment requirement inference are introduced along with improved runtime verification messages.
The warp_dims attribute was deemed confusing and is removed from the transform in favor of better size inference.
Should we just make it a proper (non-enum) attribute parameterized by an integer?