[MLIR][GPU] Run generic LLVM optimizations when serializing (on AMD)
- Adds hooks that allow SerializeTo* passes to arbitrarily transform
the produced LLVM Module before it is passed to the code generation
passes.
- Uses these hooks within the SerializeToHsaco pass in order to run
LLVM optimizations and to set the optimization level on the
TargetMachine.
- Adds an optLevel parameter to SerializeToHsaco.
Future work may include moving much of what's been added to
SerializeToHsaco to SerializeToBlob, but that would require
confirmation from the NVVM backend maintainers that it would be
appropriate to do so.
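The hook mechanism described above can be sketched roughly as follows. This is a minimal, standalone illustration and not the code from this change. FakeModule stands in for the LLVM Module so the sketch compiles without LLVM. The class names SerializeToBlobSketch and SerializeToHsacoSketch, the optimizeModule hook name, and the logged step strings are all hypothetical.

```cpp
#include <string>
#include <vector>

// Stand-in for the LLVM Module so this sketch builds without LLVM (assumption).
struct FakeModule {
  std::vector<std::string> log; // records which steps ran, in order
};

// Hypothetical base pass: translate, run the hook, then generate code.
class SerializeToBlobSketch {
public:
  virtual ~SerializeToBlobSketch() = default;

  std::vector<std::string> run() {
    FakeModule module;
    module.log.push_back("translate");
    optimizeModule(module); // hook runs before code generation
    module.log.push_back("codegen");
    return module.log;
  }

protected:
  // Default hook does nothing; subclasses may transform the module here.
  virtual void optimizeModule(FakeModule &) {}
};

// Hypothetical AMD-specific pass that uses the hook and carries an opt level.
class SerializeToHsacoSketch : public SerializeToBlobSketch {
public:
  explicit SerializeToHsacoSketch(int level) : optLevel(level) {}

protected:
  void optimizeModule(FakeModule &module) override {
    // A real pass would run LLVM optimizations here; this only records it.
    module.log.push_back("optimize-O" + std::to_string(optLevel));
  }

private:
  int optLevel;
};
```

Keeping the default hook a no-op means targets that do not override it behave as before, which is one way to leave other backends untouched.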
[MLIR][AMDGPU] Link device libraries where needed
- The ROCm library path is now computed at runtime; the compile-time
default is used only as a fallback
- SerializeToHsaco no longer depends on HIP, as it no longer does
chipset autodetection (which wasn't being used anyway)
- A --rocm-path option has been added to allow the user to override
the ROCm path, in addition to the typical ROCM_PATH environment
variable
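The path lookup described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the code from this change: the function name resolveRocmPath, the precedence of --rocm-path over ROCM_PATH, and the fallback value "/opt/rocm" are all assumptions made for the example.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical compile-time fallback; the real default comes from the build.
static const char kDefaultRocmPath[] = "/opt/rocm";

// Resolve the ROCm path at runtime. Assumed precedence:
//   1. the value passed via --rocm-path (empty string means "not given")
//   2. the ROCM_PATH environment variable
//   3. the compile-time default
std::string resolveRocmPath(const std::string &cliOption) {
  if (!cliOption.empty())
    return cliOption;
  const char *env = std::getenv("ROCM_PATH");
  if (env && *env)
    return env;
  return kDefaultRocmPath;
}
```

Treating an empty ROCM_PATH the same as an unset one avoids resolving to an empty path when the variable is exported but blank.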
Wouldn't the default copy constructor already copy all of these?