This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Adding patterns for floating point SVE add instructions.
ClosedPublic

Authored by amehsan on Sep 26 2019, 12:30 PM.

Details

Summary

I started looking into SVE support, and it seems that there is a significant amount of work that needs to be done. We are planning to spend some time to help with SVE support, but we are not quite sure where to start. So I thought to start with a very small patch that helps to understand the current status of SVE support. Also to see if there are any suggestions on how we can help more effectively with the development of SVE.

Diff Detail

Event Timeline

amehsan created this revision.Sep 26 2019, 12:30 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 26 2019, 12:30 PM

Hi!

Thanks for taking an interest in SVE codegen. The tests look fine, but we had a different approach in mind for generating many of the patterns. In this case instead of making separate def : Pat< lines for each type, we add an SDPatternOperator and pattern to the multiclass for that instruction type.

So FADD_ZZZ is an instance of sve_fp_3op_u_zd, and by changing that definition in SVEInstrFormats.td to look like the following code we create matchers for all the instructions using that template.

class sve_fp_3op_u_zd<bits<2> sz, bits<3> opc, string asm,
                      ZPRRegOp zprty,
                      ValueType vt, ValueType vt2, SDPatternOperator op>
: I<(outs zprty:$Zd), (ins  zprty:$Zn, zprty:$Zm),
  asm, "\t$Zd, $Zn, $Zm",
  "",
  [(set (vt zprty:$Zd), (op (vt zprty:$Zn), (vt2 zprty:$Zm)))]>, Sched<[]> {
  bits<5> Zd;
  bits<5> Zm;
  bits<5> Zn;
  let Inst{31-24} = 0b01100101;
  let Inst{23-22} = sz;
  let Inst{21}    = 0b0;
  let Inst{20-16} = Zm;
  let Inst{15-13} = 0b000;
  let Inst{12-10} = opc;
  let Inst{9-5}   = Zn;
  let Inst{4-0}   = Zd;
}

multiclass sve_fp_3op_u_zd<bits<3> opc, string asm, SDPatternOperator op> {
  def _H : sve_fp_3op_u_zd<0b01, opc, asm, ZPR16, nxv8f16, nxv8f16, op>;
  def _S : sve_fp_3op_u_zd<0b10, opc, asm, ZPR32, nxv4f32, nxv4f32, op>;
  def _D : sve_fp_3op_u_zd<0b11, opc, asm, ZPR64, nxv2f64, nxv2f64, op>;
}

The instruction instantiations in AArch64SVEInstrInfo.td would then need to pass in the correct SDPatternOperator, with a null_frag operator for the others right now so that you don't add untested matchers:

 defm FADD_ZZZ    : sve_fp_3op_u_zd<0b000, "fadd", fadd>;
 defm FSUB_ZZZ    : sve_fp_3op_u_zd<0b001, "fsub", null_frag>;
 defm FMUL_ZZZ    : sve_fp_3op_u_zd<0b010, "fmul", null_frag>;
....

Please let me know if my explanation isn't clear enough :)

Thanks @huntergr. Will update the patch. Once I am sure we got this one right, we can look into others.

amehsan updated this revision to Diff 222309.Sep 28 2019, 5:50 PM

@huntergr Updated according to your comment. Please check. Thanks!

amehsan added inline comments.Sep 28 2019, 5:59 PM
llvm/lib/Target/AArch64/SVEInstrFormats.td
1218

Just realized that I forgot to check if we really need two distinct ValueType here or not. I will check that and remove it if not needed.

huntergr accepted this revision.Sep 30 2019, 2:50 AM

LGTM.

You can add the extra multiclass for ftsmul if you wish, but it's not needed until someone implements a matching pattern for that instruction. We only match it against an ACLE intrinsic downstream, not common SDag nodes.

llvm/lib/Target/AArch64/SVEInstrFormats.td
1218

I guess I left a bit out; the reason for the second valuetype was for the ftsmul instruction, which takes a vector of integers as one of the arguments. We added a second multiclass just for that instruction:

multiclass sve_fp_3op_u_zd_ftsmul<bits<3> opc, string asm,
                                  SDPatternOperator op> {
  def _H : sve_fp_3op_u_zd<0b01, opc, asm, ZPR16, nxv8f16, nxv8i16, op>;
  def _S : sve_fp_3op_u_zd<0b10, opc, asm, ZPR32, nxv4f32, nxv4i32, op>;
  def _D : sve_fp_3op_u_zd<0b11, opc, asm, ZPR64, nxv2f64, nxv2i64, op>;
}
This revision is now accepted and ready to land.Sep 30 2019, 2:50 AM

LGTM.

You can add the extra multiclass for ftsmul if you wish, but it's not needed until someone implements a matching pattern for that instruction. We only match it against an ACLE intrinsic downstream, not common SDag nodes.

Thanks for the review and useful comments.