An attempt to enable MemCpyOpt unconditionally in D104801 uncovered the fact that
there are users that do not expect LLVM to materialize the memset intrinsic.
While other passes can do that too, MemCpyOpt triggers it more frequently and
breaks sanitizers and some downstream users.
For now, introduce a flag to force-enable the pass and opt in only CUDA
compilation with the NVPTX back-end.