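This patch contains a pass that transforms CBZ/CBNZ/TBZ/TBNZ instructions into a conditional branch (B.cond) when the NZCV flags can be set for "free", i.e., when the instruction that defines the tested register can be switched to its flag-setting form (e.g., ADD to ADDS). This is preferred on targets that have more flexibility when scheduling B.cond instructions than CBZ/CBNZ/TBZ/TBNZ (all other things being equal). It can also reduce register pressure: when the defined register has no other uses, the flag-setting form can write the zero register instead (e.g., CMN rather than ADD).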
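To make the transformation concrete, here is a minimal Python sketch of the legality check and the branch-to-condition mapping. It is not the pass's actual implementation. The `Inst`/`Branch` classes, the `tune` and `condition_for` helpers, the opcode subset, and the `dst_has_other_uses` parameter are all hypothetical, and the sketch ignores NZCV liveness into successor blocks. TBZ/TBNZ are handled only when they test the sign bit, because that is the only bit the N flag copies.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class Inst:
    """Hypothetical, simplified instruction (not LLVM's MachineInstr)."""
    opcode: str
    dst: Optional[str] = None
    srcs: Tuple[str, ...] = ()
    sets_nzcv: bool = False
    reads_nzcv: bool = False

@dataclass(frozen=True)
class Branch:
    """Hypothetical CBZ/CBNZ (bit is None) or TBZ/TBNZ (bit is the tested bit)."""
    opcode: str
    reg: str
    target: str
    bit: Optional[int] = None

# Illustrative subset of opcodes that have a flag-setting counterpart.
FLAG_SETTING_FORM = {"ADD": "ADDS", "SUB": "SUBS", "AND": "ANDS", "BIC": "BICS"}

def condition_for(br: Branch, width: int) -> Optional[str]:
    """Condition code matching `br`, given NZCV set from the value of br.reg."""
    if br.opcode == "CBZ":
        return "EQ"  # Z is set iff the result is zero
    if br.opcode == "CBNZ":
        return "NE"
    sign_bit = width - 1
    if br.opcode == "TBZ" and br.bit == sign_bit:
        return "PL"  # N copies the sign bit; PL means N == 0
    if br.opcode == "TBNZ" and br.bit == sign_bit:
        return "MI"  # MI means N == 1
    return None      # other bits are not reflected in NZCV

def tune(defn: Inst, between: List[Inst], br: Branch,
         dst_has_other_uses: bool, width: int = 32):
    """Return (new_def, new_branch), or None if the rewrite does not apply.

    Simplification: only checks the instructions between the def and the
    branch; NZCV liveness into successor blocks is not modeled.
    """
    if defn.opcode not in FLAG_SETTING_FORM or defn.dst != br.reg:
        return None
    # NZCV must be neither written nor read between the def and the branch.
    if any(i.sets_nzcv or i.reads_nzcv for i in between):
        return None
    cond = condition_for(br, width)
    if cond is None:
        return None
    # With no other uses, write the zero register (ADDS wzr, ... is CMN),
    # which frees the original destination register.
    zero = "wzr" if width == 32 else "xzr"
    new_def = replace(defn, opcode=FLAG_SETTING_FORM[defn.opcode],
                      dst=defn.dst if dst_has_other_uses else zero,
                      sets_nzcv=True)
    new_br = Inst(f"B.{cond}", srcs=(br.target,), reads_nzcv=True)
    return new_def, new_br
```

For example, `tune(Inst("ADD", "w8", ("w0", "w1")), [], Branch("CBZ", "w8", ".LBB0_2"), dst_has_other_uses=False)` yields an ADDS that writes wzr (i.e., `cmn w0, w1`) followed by a B.EQ to `.LBB0_2`.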
A few examples:
  add w8, w0, w1  -> cmn w0, w1          ; CMN is an alias of ADDS wzr; w8 has no other uses.
  cbz w8, .LBB0_2 -> b.eq .LBB0_2

  add w8, w0, w1  -> adds w8, w0, w1     ; w8 has multiple uses.
  cbz w8, .LBB1_2 -> b.eq .LBB1_2

  sub w8, w0, w1       -> subs w8, w0, w1   ; w8 has multiple uses.
  tbz w8, #31, .LBB6_2 -> b.pl .LBB6_2      ; bit 31 of the result is the N flag.
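These rewrites rely on how ADDS/SUBS set NZCV: Z is set exactly when the result is zero, and N is a copy of bit 31 of the result. The following Python sketch models 32-bit ADDS and SUBS flag computation (a simplification that omits 64-bit forms and ANDS/BICS) and checks the CBZ and TBZ examples on edge-case and random operands. It also records where B.GE (N == V) would disagree with TBZ #31. That happens only when SUBS signed-overflows (V = 1), so the sign-bit test corresponds to B.PL (N == 0). The helper names (`adds`, `subs`, `mismatches`) are illustrative only.

```python
import random

MASK = 0xFFFF_FFFF

def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x8000_0000 else x

def adds(a: int, b: int):
    """Model 32-bit ADDS; return (result, N, Z, C, V)."""
    full = a + b
    res = full & MASK
    v = int(to_signed(a) + to_signed(b) != to_signed(res))
    return res, res >> 31, int(res == 0), int(full > MASK), v

def subs(a: int, b: int):
    """Model 32-bit SUBS as a + NOT(b) + 1; return (result, N, Z, C, V)."""
    full = a + (~b & MASK) + 1
    res = full & MASK
    v = int(to_signed(a) - to_signed(b) != to_signed(res))
    return res, res >> 31, int(res == 0), int(full > MASK), v

# Condition codes used below, as predicates on (N, Z, C, V).
COND = {
    "EQ": lambda n, z, c, v: z == 1,
    "PL": lambda n, z, c, v: n == 0,
    "GE": lambda n, z, c, v: n == v,
}

def mismatches(pairs):
    """Assert the rewrites for each (w0, w1); return pairs where B.GE differs from TBZ #31."""
    ge_diff = []
    for a, b in pairs:
        # add w8, w0, w1 ; cbz w8  ->  cmn/adds w0, w1 ; b.eq
        res, *flags = adds(a, b)
        assert (res == 0) == COND["EQ"](*flags)
        # sub w8, w0, w1 ; tbz w8, #31  ->  subs w8, w0, w1 ; b.<cond>
        res, *flags = subs(a, b)
        tbz_taken = (res >> 31) == 0
        assert tbz_taken == COND["PL"](*flags)
        if tbz_taken != COND["GE"](*flags):
            ge_diff.append((a, b))
    return ge_diff

rng = random.Random(0)
EDGES = [0, 1, 0x7FFF_FFFF, 0x8000_0000, 0xFFFF_FFFF]
PAIRS = [(a, b) for a in EDGES for b in EDGES]
PAIRS += [(rng.getrandbits(32), rng.getrandbits(32)) for _ in range(10_000)]

if __name__ == "__main__":
    diffs = mismatches(PAIRS)
    # Every disagreement between TBZ #31 and B.GE is a signed overflow (V == 1).
    print(all(subs(a, b)[4] == 1 for a, b in diffs))  # True
```

For instance, with `w0 = 0x80000000` (INT_MIN) and `w1 = 1`, the SUB result is `0x7FFFFFFF`, so TBZ #31 is taken, while SUBS sets N = 0 and V = 1, so B.GE would not be taken.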
The pass runs after the Machine Combiner because I saw a few cases where converting an 'ADD' to an 'ADDS' prevented fusion. I also noticed that the AArch64 Conditional Compares pass (i.e., CCMP formation) doesn't handle 'ANDS' and 'BICS', but because this pass runs later, CCMP formation isn't negatively impacted.
I saw no correctness issues across SPEC2000, SPEC2006, or the llvm-test-suite. On Falkor, this improves SPEC2006/libquantum by 8.5%. I also saw a few other small improvements of ~1-2% in SPEC2000 and SPEC2006. Otherwise, the changes were mostly small improvements within noise, with no regressions above noise. I saw a few minor improvements on Kryo; I didn't test any other subtargets, as none are readily available to me. FWIW, I'm okay with predicating this on a feature flag, if preferred.