When a loaded value that has multiple users is compared with 0, the load and the compare become a single load-and-test instruction. The load is not folded into the compare in this case (because of the multiple users), but the compare instruction is eliminated.
This patch returns 0 cost for the icmp in these cases.
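To illustrate the idea, here is a minimal, self-contained sketch of the check. It is not the code from the patch: `ToyValue`, `ToyICmp`, and `toyICmpCost` are hypothetical stand-ins rather than LLVM classes, and the default cost of 1 is an assumption made only for the example.

```cpp
#include <cassert>

// Toy stand-ins for IR values; hypothetical, not LLVM classes.
struct ToyValue {
  bool IsLoad;       // value is produced by a load from memory
  unsigned NumUsers; // number of instructions that use the value
};

struct ToyICmp {
  const ToyValue *LHS; // first operand of the compare
  bool RHSIsZero;      // second operand is the constant 0
};

// Returns 0 when the compare is expected to be absorbed by a load-and-test
// (loaded value, multiple users, compared with 0), and an assumed default
// cost of 1 otherwise.
unsigned toyICmpCost(const ToyICmp &Cmp) {
  assert(Cmp.LHS && "compare must have a left-hand operand");
  const ToyValue &V = *Cmp.LHS;
  if (Cmp.RHSIsZero && V.IsLoad && V.NumUsers > 1)
    return 0; // the compare is free; the load itself is costed separately
  return 1;
}
```

The sketch only models the multiple-users case described above; a single-user load that can be folded into the compare is a separate case and is not covered here.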
This changes just 33 instruction query results. One file changed on SPEC: two loops are now kept scalar (not vectorized), see:
I also tried handling the load i32 -> sext i64 case, but this doubled the LOC of the patch while changing absolutely nothing (not a single LV query / file), so it did not seem useful enough to keep.