This is un-optimized by the DAG combiner now to avoid
the from-i1 conversion. We get slightly better code
by doing this than by materializing the weird constants,
since there is no 64-bit select and it ends up getting split
up. The expanded pattern also shows up in fceil / ffloor
lowering.
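
As a rough illustration only, and assuming the pattern in question is a
select between the f64 constants 1.0 and 0.0 on an i1 condition, the two
forms compute the same value. The function names below are hypothetical
and are not taken from the patch:

```cpp
#include <cstdint>

// Select form: pick one of two 64-bit floating-point constants.
// Without a 64-bit select, this has to be done as two 32-bit halves.
double select_form(bool cond) {
  return cond ? 1.0 : 0.0;
}

// Conversion form: a single 32-bit select of 0 / 1, followed by an
// integer-to-double conversion (standing in for v_cvt_f64_i32).
double convert_form(bool cond) {
  int32_t widened = cond ? 1 : 0;
  return static_cast<double>(widened);
}
```

Which form is cheaper depends on the cost of the conversion relative to
the extra 32-bit select and constant materialization.
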
This may be worse, depending on the rate of v_cvt_f64_i32. I'm
not sure the scheduling model is accurate for every subtarget;
llvm-mca says v_cvt_f64_i32 is quarter rate, but I believe
it is supposed to be half rate (or at least it used to be on older
subtargets).