[ADMGPU] SDWA peephole optimization pass.


[ADMGPU] SDWA peephole optimization pass.

First iteration of SDWA peephole.

This pass tries to combine several instruction into one SDWA instruction. E.g. it converts:

V_LSHRREV_B32_e32 %vreg0, 16, %vreg1
V_ADD_I32_e32 %vreg2, %vreg0, %vreg3
V_LSHLREV_B32_e32 %vreg4, 16, %vreg2


V_ADD_I32_sdwa %vreg4, %vreg1, %vreg3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD


Pass structure:

  1. Iterate over machine instruction in basic block and try to apply "SDWA patterns" to each of them. SDWA patterns match machine instruction into either source or destination SDWA operand. E.g. ''' V_LSHRREV_B32_e32 %vreg0, 16, %vreg1''' is matched to source SDWA operand '''%vreg1 src_sel:WORD_1'''.
  2. Iterate over found SDWA operands and find instruction that could be potentially coverted into SDWA. E.g. for source SDWA operand potential instruction are all instruction in this basic block that uses '''%vreg0'''
  3. Iterate over all potential instructions and check if they can be converted into SDWA.
  4. Convert instructions to SDWA.

This review contains basic implementation of SDWA peephole pass. This pass requires additional testing fot both correctness and performance (no performance testing done).
There are several ways this pass can be improved:

  1. Make this pass work on whole function not only basic block. As I can see this can be done right now without changes to pass.
  2. Introduce more SDWA patterns
  3. Introduce mnemonics to limit when SDWA patterns should apply

Reviewers: vpykhtin, alex-t, arsenm, rampitec

Subscribers: wdng, nhaehnle, mgorny

Differential Revision: https://reviews.llvm.org/D30038


skoltonMar 21 2017, 5:51 AM
Differential Revision
D30038: [ADMGPU] SDWA peephole optimization pass.
rL298364: [X86][AVX512] Add _mm512_cvtsd_f64 and _mm512_cvtss_f32 intrinsics (PR32305)