Previously we returned true whenever the source type was wider than the destination type, but truncates are only "free" if they can be done with a subregister extract. This patch corrects the implementation to match.
It looks like the EVT overload was also running the check on vectors, which was probably unintentional, so I've fixed that here as well. This may have exposed some missing cases in the cost model.
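A minimal sketch of the intended rule follows. It is not the actual patch: it assumes x86-64, where every GPR has 8-, 16-, and 32-bit subregisters, and uses a hypothetical `ValueType` struct and `isTruncateFreeSketch` function in place of LLVM's `EVT` and `X86TargetLowering::isTruncateFree`.

```cpp
// Hypothetical, simplified stand-in for a value type; not LLVM's EVT.
struct ValueType {
  unsigned bits;  // scalar width in bits
  bool isVector;  // true for vector types
};

// Widths that name an x86-64 GPR or one of its subregisters (AL, AX, EAX, RAX).
constexpr bool isGPRWidth(unsigned bits) {
  return bits == 8 || bits == 16 || bits == 32 || bits == 64;
}

// Sketch (assumption): a truncate is "free" only if it can be done by reading
// a subregister of the source register, e.g. RAX -> EAX or EAX -> AL.
constexpr bool isTruncateFreeSketch(ValueType src, ValueType dst) {
  // Vector types are rejected outright; the subregister argument is scalar-only.
  if (src.isVector || dst.isVector)
    return false;
  // Both widths must map to a real register or subregister. i1 has no GPR
  // subregister, so i32 -> i1 is not free under this rule.
  if (!isGPRWidth(src.bits) || !isGPRWidth(dst.bits))
    return false;
  return src.bits > dst.bits;
}

static_assert(isTruncateFreeSketch({32, false}, {8, false}), "i32 -> i8 reads AL");
static_assert(!isTruncateFreeSketch({32, false}, {1, false}), "no i1 subregister");
```

The sketch ignores 32-bit mode, where ESI and EDI have no 8-bit subregisters, so a real implementation would need more target information than bit widths alone.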
The avx512-mask-op.ll test changed because we previously promoted the load to 32 bits under the assumption that truncating from i32 to i1 is free. That ultimately allowed the DAG to CSE the two ANDs, since they were then both i32. Now one is i32 and the other is i8.
missing AVX512 cost?