The offload packager embeds the features in the offloading binary when
performing LTO. This had an incorrect interaction with the
--cuda-feature option because we weren't deriving the features from
the CUDA toolchain arguments when it was being specified. This patch
fixes this so the features are correctly overrideen when using this
argument.
However, this brings up a question of how best to handle conflicting
target features. The user could compile many libraries with different
features, in this case we do not know which one to pick. This was not
previously a problem when we simply passed the features in from the CUDA
installation at link-link because we just defaulted to whatever was
current on the system.