Taking the address of a parameter is legal in PTX, and we do generate code that does it.
Alas, such code is currently miscompiled by ptxas on sm_50+ (NVIDIA issue 1789042).
Work around the issue by enforcing a minimum alignment on byval arguments of device functions.
The change is effectively a no-op at the SASS level for sm_3x, where ptxas already aligns the local copy to at least 4 bytes.
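As a rough illustration of the workaround described above, the alignment of a byval argument can be thought of as being clamped from below. The sketch is not the actual patch: the helper name `byvalParamAlign`, the constant `kMinByValAlign`, and the struct `Bytes3` are hypothetical, and the 4-byte minimum is an assumption taken from the sm_3x observation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical constant: assumed minimum alignment (in bytes) for byval
// arguments of device functions. The value 4 is an assumption based on the
// alignment ptxas is said to already apply on sm_3x.
constexpr std::size_t kMinByValAlign = 4;

// Hypothetical helper: returns the alignment to use for a byval argument,
// never going below the assumed minimum.
constexpr std::size_t byvalParamAlign(std::size_t declaredAlign) {
  return std::max(declaredAlign, kMinByValAlign);
}

// A small aggregate whose natural alignment is 1 byte; passed by value, it
// would have its alignment raised by the helper above.
struct Bytes3 {
  char b[3];
};

// Exercises the helper on under-aligned and already-aligned inputs.
inline void checkExamples() {
  assert(byvalParamAlign(alignof(Bytes3)) == kMinByValAlign);
  assert(byvalParamAlign(16) == 16);
}
```

Under these assumptions, only under-aligned aggregates are affected; arguments whose declared alignment already meets the minimum keep it unchanged.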
I think the condition is when we support cross-TU linking with nvcc, since this change breaks ABI compatibility with nvcc? Multi-TU compilations should work just fine as long as all of the object files come from clang.