The op creates a tensor map descriptor object representing a tiled memory region. The descriptor is used by the Tensor Memory Accelerator (TMA). The tensor operand is the source tensor to be tiled, and boxDimensions gives the size of the tiled memory region in each dimension.
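To make the role of boxDimensions concrete, here is a small C sketch (not part of this patch) that counts how many boxes are needed to cover a tensor. Each dimension contributes ceil(shape / box), and a partial box at an edge still counts as one tile. The helper `count_tiles` is hypothetical. The only external facts it encodes are the CUDA driver limits for cuTensorMapEncodeTiled: tensor rank between 1 and 5, and each box dimension between 1 and 256.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not the MLIR or CUDA API.
   Returns the number of boxes of size box_dims[i] needed to cover a tensor
   of shape shape[i], i.e. the product of ceil(shape[i] / box_dims[i]).
   Returns 0 if any box dimension is outside the 1..256 range accepted by
   cuTensorMapEncodeTiled. */
static uint64_t count_tiles(int rank, const uint64_t *shape,
                            const uint32_t *box_dims) {
  /* cuTensorMapEncodeTiled supports tensor ranks 1..5. */
  assert(rank >= 1 && rank <= 5);
  uint64_t tiles = 1;
  for (int i = 0; i < rank; ++i) {
    if (box_dims[i] == 0 || box_dims[i] > 256)
      return 0;
    /* Ceiling division: a partial box at the edge still needs a full tile. */
    tiles *= (shape[i] + box_dims[i] - 1) / box_dims[i];
  }
  return tiles;
}
```

For example, a 64x128 tensor with a 64x64 box needs 1 * 2 = 2 boxes.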
The pattern here lowers tma.create.descriptor to a runtime function call that eventually calls the CUDA Driver's cuTensorMapEncodeTiled. For more information, see:
https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TENSOR__MEMORY.html
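As a rough illustration of the argument shuffling such a runtime wrapper has to do, the following C sketch assumes a dense, row-major source tensor and converts its shape into the arrays cuTensorMapEncodeTiled expects. The driver takes globalDim ordered innermost (fastest-varying) dimension first, plus a globalStrides array of rank - 1 byte strides, each of which must be a multiple of 16 in the non-interleaved case. The helper `make_tma_dims` is hypothetical and is not the actual MLIR runtime function. It does not call the driver, so it runs without a GPU.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not the MLIR runtime API.
   shape[] is row-major (outermost dimension first). Fills global_dim[rank]
   innermost-first and global_strides[rank - 1] with byte strides, as
   cuTensorMapEncodeTiled expects. Returns 0 on success, or -1 if a stride
   is not a multiple of 16 bytes. */
static int make_tma_dims(int rank, const uint64_t *shape, uint64_t elem_bytes,
                         uint64_t *global_dim, uint64_t *global_strides) {
  /* cuTensorMapEncodeTiled supports tensor ranks 1..5. */
  assert(rank >= 1 && rank <= 5);
  /* Byte stride of the dimension currently being visited. */
  uint64_t stride = elem_bytes;
  for (int i = 0; i < rank; ++i) {
    /* Reverse the order: dimension 0 is the innermost one. */
    global_dim[i] = shape[rank - 1 - i];
    if (i > 0) {
      /* Dimension 0's stride is implied by the element size, so entry
         i - 1 holds the byte stride of dimension i. */
      if (stride % 16 != 0)
        return -1;
      global_strides[i - 1] = stride;
    }
    stride *= global_dim[i];
  }
  return 0;
}
```

For a 64x128 f16 tensor (2-byte elements) this yields globalDim = {128, 64} and globalStrides = {256}.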
Depends on D155453
Please add some minor doc comments.