PTX requires that identifiers consist only of [a-zA-Z0-9_$]. The
existing pass already ensured this for globals and this patch adds
the cleanup for functions with local linkage.
However, a separate problem arose when adjusted names collided:
the ValueSymbolTable then automatically appended ".N" with
increasing N to make the name unique, using the dot to help ABI
demangling. Special-case this behavior to omit the dot and append N
directly. This always gives us legal names according to the PTX
requirements.
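As a rough illustration of the renaming described above, here is a minimal standalone sketch. It does not use LLVM's ValueSymbolTable; the helper names (isLegalPTXChar, sanitizeName, makeUniqueName), the '_' replacement character, and the counter starting at 1 are all assumptions made for the example, not what the patch actually does.

```cpp
#include <set>
#include <string>

// Hypothetical helper: true if C is in the set [a-zA-Z0-9_$].
bool isLegalPTXChar(char C) {
  return (C >= 'a' && C <= 'z') || (C >= 'A' && C <= 'Z') ||
         (C >= '0' && C <= '9') || C == '_' || C == '$';
}

// Hypothetical helper: replace every character outside the legal set.
// The '_' replacement is an assumption for illustration only.
std::string sanitizeName(const std::string &Name) {
  std::string Out;
  Out.reserve(Name.size());
  for (char C : Name)
    Out.push_back(isLegalPTXChar(C) ? C : '_');
  return Out;
}

// Hypothetical helper: on collision, append the counter N directly,
// with no "." delimiter, until the name is unused. Taken stands in for
// the symbol table and records every name handed out.
std::string makeUniqueName(const std::string &Name,
                           std::set<std::string> &Taken) {
  const std::string Base = sanitizeName(Name);
  std::string Candidate = Base;
  for (unsigned N = 1; !Taken.insert(Candidate).second; ++N)
    Candidate = Base + std::to_string(N);
  return Candidate;
}
```

Under these assumptions, "foo.bar" and "foo@bar" both sanitize to the same base name, so the second one to be registered gets a numeric suffix with no dot in it.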
This patch addresses "we can't compile generated PTX because LLVM uses illegal characters", but exposes another issue -- having potentially different names on host and device is a problem for CUDA. For some objects, the host side may need to know what the object is called on the device side. We need the device-side name in order to access the object from the host (e.g. cudaMemcpyToSymbol(), or initializing static variables), and we currently assume that the names are the same. If such a symbol gets different names on host and device, compilation will succeed, but we'll have problems at runtime.
Does "." have any special meaning? Can we skip the unique delimiter altogether?
If we can't find a suitable way to guarantee identical naming, we'll need a reliable way to determine the name used on the other side of the compilation.