AArch64 peephole optimizer tries to remove a compare instruction. One case is comparison with 0: CmpInst %vreg, 0. It removes CmpInst and modifies the instruction defining ‘%vreg’ to the S version. It is not checked that the S version and CmpInst produce the same condition flags.
For example the following code
%vreg7<def> = ANDWri %vreg6<kill>, 15; GPR32sp:%vreg7 GPR32:%vreg6 %vreg8<def> = SUBSWri %vreg7<kill>, 0, 0, %NZCV<imp-def>; GPR32:%vreg8 GPR32sp:%vreg7 Bcc 3, <BB#2>, %NZCV<imp-use>
is transformed to the incorrect code:
%vreg7<def> = ANDSWri %vreg6<kill>, 15, %NZCV<imp-def>; GPR32common:%vreg7 GPR32:%vreg6 Bcc 3, <BB#2>, %NZCV<imp-use>
It's incorrect because 'SUBSWri %vreg7<kill>, 0' always sets the carry flag to 1 but 'ANDSWri %vreg6<kill>, 15' always sets the carry flag to 0. As a result Bcc which checks the carry flag is not working as expected.
The submitted patch contains the fix of the defect and additional peephole rules for Cmp+Brc sequences. New regression tests are created.
Changes description:
• Refactoring:
a. Function name ‘modifiesConditionCode’ is changed to ‘areCFlagsAccessedBetweenInstrs’ to reflect that the function can check modifying accesses, reading accesses or both.
b. Function ‘AArch64InstrInfo::optimizeCompareInstr’
- Documented the function
- Cmp_NZCV is DeadNZCVIdx to reflect that it is an operand index of dead NZCV
- The code for the case of substituting CmpInstr is put into separate functions the main of them is ‘substituteCmpInstr’.
• Fix of the defect:
A new algorithm was implemented:
- It’s checked that the same condition code is read by instructions after CmpInstr and before the next modification of NZCV. The optimization is not applied if different condition codes are used. It might be difficult to find a candidate for substitution to satisfy all of them. I think this case with multiple used condition codes does not happen often.
- Then it’s checked that the instruction which defines a register for CmpInstr can produce the needed condition code itself or its S variant. If it or its S variant can produce then CmpInstr is removed.
• New peephole rules for Cmp+Brc:
The following rules were implemented:
SUBS reg, 0; B.LO => B <false path>
SUBS reg, 0; B.HS <label> => B <label>
• Testing status: all existing + new tests passed.
Can you not just use