ROCm device libs can emit permlanex with the +gfx10-insts attribute, and they rely on the optimizer to remove the call when the GPU is older than GFX10.
When built at -O0, where the optimizer does not remove the call, this caused codegen issues: Clang let the intrinsic through with just +gfx10-insts, but the backend additionally required the GPU itself to be GFX10 or newer.
This patch allows selecting that intrinsic when only the minimum required attributes are present, that is, +gfx10-insts and +16-bit-insts.
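To make the mismatch concrete, here is a simplified C++ model of the selection rule before and after this change. Everything in it (Generation, FunctionTarget, canSelectBefore, canSelectAfter) is hypothetical and only approximates the behaviour described above; it is not LLVM's actual TableGen predicate or subtarget API.

```cpp
// Simplified model (hypothetical names, not LLVM's real predicates) of when
// the backend may select a permlanex-style intrinsic for a function.

// GPU generations, ordered oldest to newest (hypothetical enum).
enum class Generation { GFX8, GFX9, GFX10, GFX11 };

// Per-function view of the target: the GPU being compiled for, plus the
// features enabled on this particular function via its target attributes.
struct FunctionTarget {
  Generation gpu;
  bool gfx10Insts;  // +gfx10-insts
  bool bit16Insts;  // +16-bit-insts
};

// Roughly the old behaviour: the feature alone was not enough, the GPU
// itself also had to be GFX10 or newer.
constexpr bool canSelectBefore(const FunctionTarget &t) {
  return t.gfx10Insts && t.gpu >= Generation::GFX10;
}

// The behaviour this patch aims for: only the minimum required features
// are checked, independent of the GPU generation.
constexpr bool canSelectAfter(const FunctionTarget &t) {
  return t.gfx10Insts && t.bit16Insts;
}

// A device-lib function compiled for a GFX9 GPU but carrying the feature
// attributes. At -O0 its guarded call survives and reaches selection.
constexpr FunctionTarget kGfx9WithAttrs{Generation::GFX9, true, true};

static_assert(!canSelectBefore(kGfx9WithAttrs), "old rule rejects it");
static_assert(canSelectAfter(kGfx9WithAttrs), "new rule accepts it");
```

Under this model the GFX9 case fails the old check even though the function carries the attributes Clang accepted, which is the -O0 codegen issue; the new check keys off the attributes only.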
Depends on D136944
Nothing here requires or implies a need for mad64_32?