In the ROCm runtime (and probably CUDA as well), all kernel arguments
are aligned. Therefore, enable using bare pointers for memref
arguments to kernels when these memrefs have static shape and a
trivial layout.
This is a substantial optimization to launching kernels that use
memrefs with known, static sizes, since it causes the kernel launch
packet to no longer include information already known to the kernel,
which can enable packing the kernel launch arguments into launch
packets instead of having to allocate an entire separate structure to
hold unneeded memref information.