The optimizing, non-broken features have all been moved to
AMDGPUAttributor. The only remaining piece of functionality was the
broken propagation of the wavesize features. This was fundamentally
broken and a hack for device library linking. It doesn't matter when
the device libraries are correctly linked and internalized.
In case of linked-as-normal-bitcode (as comgr still does), we're
reliant on the global subtarget anyway. If we can get away without
forcing target-cpu, we should just as well be able to get away without
propagating target-features.