The PTX programming model provides several performance-tuning directives; see https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#performance-tuning-directives
The downstream compiler, ptxas, uses this information to guide register allocation and other resource management, which can improve performance.
This revision introduces all of the kernel-level directives to MLIR's NVVM dialect. They are listed below:
- maxnreg -> maximum number of registers per thread in the CTA
- maxntid -> maximum number of threads per CTA
- reqntid -> exact (required) number of threads per CTA
- minnctapersm -> minimum number of CTAs per SM
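To make the mapping to PTX concrete, here is a small C++ sketch that renders the four directives as PTX text. The `KernelTuningDirectives` struct and `emitDirectives` function are hypothetical illustrations, not part of MLIR's NVVM dialect. The output follows the PTX directive syntax (for example `.maxntid nx, ny, nz`), which appears between a kernel's `.entry` header and its body. For simplicity the sketch always prints three dimensions for `maxntid` and `reqntid`, although PTX also accepts one or two.

```cpp
#include <array>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical container for the kernel-level PTX performance-tuning
// directives; each field is optional because a kernel may use any subset.
struct KernelTuningDirectives {
  std::optional<uint32_t> maxnreg;                 // .maxnreg n
  std::optional<std::array<uint32_t, 3>> maxntid;  // .maxntid nx, ny, nz
  std::optional<std::array<uint32_t, 3>> reqntid;  // .reqntid nx, ny, nz
  std::optional<uint32_t> minnctapersm;            // .minnctapersm n
};

// Renders the directives that are set, one per line, in the form they
// would take between a PTX ".entry" header and the kernel body.
std::string emitDirectives(const KernelTuningDirectives &d) {
  std::ostringstream os;
  // Helper for the two directives that take (up to) three dimensions.
  auto dims = [&os](const char *name, const std::array<uint32_t, 3> &v) {
    os << name << ' ' << v[0] << ", " << v[1] << ", " << v[2] << '\n';
  };
  if (d.maxnreg) os << ".maxnreg " << *d.maxnreg << '\n';
  if (d.maxntid) dims(".maxntid", *d.maxntid);
  if (d.reqntid) dims(".reqntid", *d.reqntid);
  if (d.minnctapersm) os << ".minnctapersm " << *d.minnctapersm << '\n';
  return os.str();
}
```

For example, setting `maxntid` to `{256, 1, 1}` and `minnctapersm` to `4` yields the two lines `.maxntid 256, 1, 1` and `.minnctapersm 4`.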
Can we add a dialect attribute verifier to check that these attributes have the right type?
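As a sketch of the check such a verifier might perform, assuming `maxnreg` and `minnctapersm` are single positive integers and `maxntid`/`reqntid` are lists of one to three positive integers, the rule can be modeled independently of MLIR's APIs. The `AttrValue` type and `verifyTuningAttr` function below are hypothetical; in MLIR the equivalent logic would presumably live in the NVVM dialect's `verifyOperationAttribute` hook and operate on real attribute types.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for an attribute value: either a scalar integer
// or an array of integers.
using AttrValue = std::variant<int64_t, std::vector<int64_t>>;

// Returns an error message if `value` has the wrong shape for the
// directive `name`, or std::nullopt if it is acceptable.
std::optional<std::string> verifyTuningAttr(const std::string &name,
                                            const AttrValue &value) {
  if (name == "maxnreg" || name == "minnctapersm") {
    // Scalar directives: must be a single positive integer.
    const auto *n = std::get_if<int64_t>(&value);
    if (!n) return name + " must be a scalar integer";
    if (*n <= 0) return name + " must be positive";
    return std::nullopt;
  }
  if (name == "maxntid" || name == "reqntid") {
    // Thread-count directives: 1 to 3 positive dimensions (x, y, z).
    const auto *dims = std::get_if<std::vector<int64_t>>(&value);
    if (!dims) return name + " must be an array of integers";
    if (dims->empty() || dims->size() > 3)
      return name + " must have 1 to 3 dimensions";
    for (int64_t d : *dims)
      if (d <= 0) return name + " dimensions must be positive";
    return std::nullopt;
  }
  return "unknown directive " + name;
}
```

Returning a message rather than a bool mirrors how a verifier would report a diagnostic on the offending operation.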