This is an optimization for zero extensions of i1:s, which resulted from looking into the perl regression against GCC. I noticed a lot of LHI 0, LHI 1, LOC sequences, which gcc did not seem to have.
Basically, after an ICMP NE 0, or an ICMP EQ 1, those known constants are already the ones needed as the CC result, so there is no need to load them with LHI (LGHI).
The i32 cases where quite straightforward, but then doing the same for i64 was a bit more of an effort. Since we always return i32 from getSetCCResultType(), these cases needed different handling depending on the user. If an i64 use is needed, then I chose to promote the setcc result in combineZERO_EXTEND(). For the i32 user (of i64 comparison), the reused operand was instead truncated.
For the case of i64 user, a lot of llgfr:s remained without the handling in combineZERO_EXTEND():
Optimized legalized selection DAG: %bb.0 'prototype_p:bb' SelectionDAG has 15 nodes: t0: ch = EntryToken t5: i64,ch = load<(load 8 from `%0** undef`)> t0, undef:i64, undef:i64 t25: i32 = truncate t5 t24: i32 = SystemZISD::ICMP t5, Constant:i64<0>, TargetConstant:i32<0> t29: i32 = SystemZISD::SELECT_CCMASK Constant:i32<1>, t25, TargetConstant:i32<14>, TargetConstant:i32<6>, t24 t20: i64 = zero_extend t29 t12: ch,glue = CopyToReg t0, Register:i64 $r2d, t20 t13: ch = SystemZISD::RET_FLAG t12, Register:i64 $r2d, t12:1 -> ltg %r0, 0(%r1) lochilh %r0, 1 llgfr %r2, %r0
The general effects on SPEC'17 output:
- Just the i32/i32 cases of NE 0:
lhi : 225081 221486 -3595 lghi : 445509 444420 -1089 locghilh : 3717 2676 -1041 chsi : 57297 56385 -912 lt : 13672 14348 +676 lochilh : 8796 9424 +628 llgfr : 90010 90533 +523 risbgn : 137540 137980 +440 tmll : 53266 53693 +427 ltr : 6140 6550 +410 lgr : 849527 849890 +363 llc : 39671 39994 +323 locrlh : 1492 1807 +315 ...
- Also the i64 cases, compared to (1):
lghi : 444420 441828 -2592 lhi : 221486 219782 -1704 lr : 62223 62878 +655 cghsi : 32665 32175 -490 tmll : 53693 54181 +488 llgfr : 90533 90066 -467 ltg : 157760 158133 +373 risbgn : 137980 138334 +354 jne : 42684 42990 +306 lg : 982786 982931 +145 je : 335154 335281 +127 cije : 107363 107237 -126 lgfr : 91442 91565 +123 lgr : 849890 850006 +116 ltgr : 10951 11067 +116 ...
- Also EQ 1 (both i32 and i64), compared to (2):
lochilh : 9492 9787 +295 lhi : 219782 219567 -215 lochie : 14183 13975 -208 chi : 53350 53448 +98 chsi : 56337 56259 -78 lr : 62878 62950 +72 tmll : 54181 54243 +62 lghi : 441828 441770 -58 locghie : 7174 7116 -58 risbgn : 138334 138383 +49 ...
In total, master <> (3):
lhi : 225081 219567 -5514 lghi : 445509 441770 -3739 locghilh : 3717 2673 -1044 chsi : 57297 56259 -1038 lochilh : 8796 9787 +991 tmll : 53266 54243 +977 risbgn : 137540 138383 +843 lr : 62152 62950 +798 lt : 13672 14343 +671 jne : 42430 43028 +598 cghsi : 32644 32170 -474 lgr : 849527 849994 +467 ltr : 6140 6557 +417 ...
I see some more LR:s, which I think is when the reused constant also has another user. I did not manage to avoid these cases when working with the DAGs (local only), so some kind of pseudo-expander might be more powerful here. That probably requires more work, and I am not sure if trading an LHI for an LR is bad, since the comparison does not clobber the register...
There are less comparisons with memory - the value is now loaded, compared and reused (see fun9 below).
Does this seem like a good idea to try?
New tests, master <> patched (skipped functions identical):
fun0: fun0: chi %r2, 0 chi %r2, 0 lhi %r2, 0 < lochilh %r2, 1 lochilh %r2, 1 > br %r14 br %r14 fun4: fun4: cghi %r2, 0 cghi %r2, 0 lhi %r2, 0 < lochilh %r2, 1 lochilh %r2, 1 > br %r14 br %r14 fun5: fun5: cghsi 0(%r1), 0 | ltg %r0, 0(%r1) lghi %r0, 0 < locghilh %r0, 1 locghilh %r0, 1 stg %r0, 0(%r1) stg %r0, 0(%r1) br %r14 br %r14 fun6: fun6: chi %r2, 1 chi %r2, 1 lhi %r2, 0 | lochilh %r2, 0 lochie %r2, 1 | br %r14 br %r14 fun8: fun8: cghi %r2, 1 cghi %r2, 1 lhi %r2, 0 | lochilh %r2, 0 lochie %r2, 1 | br %r14 br %r14 fun9: fun9: cghsi 0(%r1), 1 | lg %r0, 0(%r1) lghi %r0, 0 | cghi %r0, 1 locghie %r0, 1 | locghilh %r0, 0 stg %r0, 0(%r1) stg %r0, 0(%r1) br %r14 br %r14
clang-format: please reformat the code
7 diff lines are omitted. See full path.