AMDGPUCodeGenPrepare expands this most of the time, but not always. We
will always at least need a fallback option here. This is the 3rd
implementation of the same expansion in the backend. Eventually I
would like to eliminate the IR expansion (and the DAG version
Currently the new legalizer path produces a better result, since the
IR expansion results in extra operations which need to be combined
out. Notably, the IR expansion results in multiplies by 0.