Revision rC329829 added the architecture to "target-features". This
prevents inlining of previously generated bitcode because the
feature sets don't match. Thus duplicate the architecture already
given in "target-cpu" to avoid writing special cases in the analysis.
I'm not sure whether that will save us in the long term, because
inlining will break again when we add new features. Additionally, later
CUDA versions might raise the PTX version, which is also a feature.
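
For illustration only, a minimal sketch of why the mismatch blocks
inlining, assuming the compatibility check compares the two attribute
strings for exact equality. FnAttrs and inlineCompatible are
hypothetical names, not LLVM API, and the attribute values in the usage
note are made-up examples:

```cpp
#include <string>

// Hypothetical stand-in for the two string attributes on a function.
struct FnAttrs {
  std::string TargetCPU;      // value of "target-cpu"
  std::string TargetFeatures; // value of "target-features"
};

// Assumed strict check: inlining is allowed only if both attribute
// strings are identical in caller and callee.
bool inlineCompatible(const FnAttrs &Caller, const FnAttrs &Callee) {
  return Caller.TargetCPU == Callee.TargetCPU &&
         Caller.TargetFeatures == Callee.TargetFeatures;
}
```

Under that assumption, a caller carrying "+ptx42,+sm_35" cannot inline
a callee from older bitcode that carries only "+ptx42", even though
both name the same "target-cpu". Any later change to the feature
string, such as a higher PTX version, would fail the same check.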