This is an archive of the discontinued LLVM Phabricator instance.

[flang] Add a subset of PowerPC MMA (Matrix Multiply Accelerate) intrinsics
ClosedPublic

Authored by DanielCChen on Jul 19 2023, 9:43 AM.

Details

Summary

This patch includes the a subset of MMA intrinsics that are included in the mma intrinsic module:

mma_assemble_acc
mma_assemble_pair
mma_build_acc
mma_disassemble_acc
mma_disassemble_pair

Diff Detail

Event Timeline

DanielCChen created this revision.Jul 19 2023, 9:43 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 19 2023, 9:43 AM
DanielCChen requested review of this revision.Jul 19 2023, 9:43 AM

I bet that the mma ops are only available on newer ppc cores and not on the old Apple ones. I could not find a feature check.

I bet that the mma ops are only available on newer ppc cores and not on the old Apple ones. I could not find a feature check.

Correct. The MMA intrinsics are for Power 10 only. It is documented in IBM's compiler Language Reference

I wanted to say that I could try to use them on an Apple and the compiler would crash somewhere, see https://reviews.llvm.org/D152131.

I wanted to say that I could try to use them on an Apple and the compiler would crash somewhere, see https://reviews.llvm.org/D152131.

Right. There is a pending work item that will add semantic checking of Power10 cpu for these MMA intrinsics. Because there are some vector intrinsics that are for Power 10 only as well. We deferred the work until we have all MMA and vector intrinsics in.

Add the MMA intrinsic module and the testing of LLVM IR.

DanielCChen updated this revision to Diff 542546.EditedJul 20 2023, 8:31 AM

Attempt to fix the x64 Window build failure that failed due to linker error as:

FIRBuilder.lib(PPCIntrinsicCall.cpp.obj) : error LNK2019: unresolved external symbol "void __cdecl Fortran::common::die(char const *,...)" (?die@common@Fortran@@YAXPEBDZZ) referenced in function "class mlir::FunctionType __cdecl fir::getMmaIrFuncType(class mlir::MLIRContext *,enum fir::MMAOp)" (?getMmaIrFuncType@fir@@YA?AVFunctionType@mlir@@PEAVMLIRContext@3@W4MMAOp@1@@Z)
bin\tco.exe : fatal error LNK1120: 1 unresolved externals

Fixing a build failure using clang as the build compiler.

lib/Optimizer/Builder/PPCIntrinsicCall.cpp:1100:3: error: default label in switch which covers all enumeration values [-Werror,-Wcovered-switch-default]
kkwli0 added inline comments.Jul 24 2023, 11:24 AM
flang/lib/Optimizer/Builder/PPCIntrinsicCall.cpp
349

I don't think this is a relevant change for this patch.

371

Nit:

  • "quadCnt: number of arguments that has __vector_quad type"
407

See comment above

704

Unrelated change

812

Unrelated change

846

Unrelated change

858

Unrelated change

1214

Unrelated change

1222

Unrelated change

1225

Unrelated change

1293

use braced initializers

1301

use braced initializers

1363

use braced initializers

flang/lib/Optimizer/Dialect/FIROps.cpp
958–964

Nit: "e.g. fir.vector<4:ui32> => mlir.vector<4xi32>"

flang/module/mma.f90
83

"according to Fortran"

91

"according to Fortran"

DanielCChen marked 14 inline comments as done.Jul 25 2023, 6:48 AM
DanielCChen added inline comments.
flang/lib/Optimizer/Builder/PPCIntrinsicCall.cpp
349

Sure. I will remove the unrelated changes for a separate patch.

DanielCChen marked an inline comment as done.

Address review comments to remove unrelated changes.

DanielCChen marked 2 inline comments as done.

It seems the module and the test are missing from the last update. Put them back in after rebase.

Could you please get the pwr10 check in before continuing to add more unchecked intrinsics.

Could you please get the pwr10 check in before continuing to add more unchecked intrinsics.

Why is this important to you? This is a compiler under active development that is not yet suitable for production use.

Could you please get the pwr10 check in before continuing to add more unchecked intrinsics.

The pwr10 check that covers all PowerPC intrinsics in this patch is in https://reviews.llvm.org/D156243.
I will try to land that one before this one.

Could you please get the pwr10 check in before continuing to add more unchecked intrinsics.

Why is this important to you? This is a compiler under active development that is not yet suitable for production use.

The buildbots run the test suite from trunk and no buildbot caught the fact that there are issues trying to run the power10 intrinsics on a power9. There are missing tests and/or checks.

Could you please get the pwr10 check in before continuing to add more unchecked intrinsics.

Why is this important to you? This is a compiler under active development that is not yet suitable for production use.

The buildbots run the test suite from trunk and no buildbot caught the fact that there are issues trying to run the power10 intrinsics on a power9. There are missing tests and/or checks.

These Power10 intrinsic test in this patch and the other one (to be landed) has "-mcpu=pwr10" specified. It also only dumps the LLRM IR. In theory, they should not run on Power9. That being said, as I mentioned in the previous note, I put the semantic checking code in a a different patch, which should issue a compile time error message for unsupported cpu option.

This revision is now accepted and ready to land.Jul 26 2023, 7:29 AM

LG

Thanks. I will wait one more day in case there are more comments.

Herald added a project: Restricted Project. · View Herald TranscriptJul 27 2023, 11:36 AM

Builds are failing to compile this code in PPCIntrinsicCall.cpp:

const char *getMmaIrIntrName(MMAOp mmaOp) {
  switch (mmaOp) {
  case MMAOp::AssembleAcc:
    return "llvm.ppc.mma.assemble.acc";
  case MMAOp::AssemblePair:
    return "llvm.ppc.vsx.assemble.pair";
  case MMAOp::DisassembleAcc:
    return "llvm.ppc.mma.disassemble.acc";
  case MMAOp::DisassemblePair:
    return "llvm.ppc.vsx.disassemble.pair";
  }
}

mlir::FunctionType getMmaIrFuncType(mlir::MLIRContext *context, MMAOp mmaOp) {
  switch (mmaOp) {
  case MMAOp::AssembleAcc:
    return genMmaVqFuncType(context, /*Quad*/ 0, /*Pair*/ 0, /*Vector*/ 4);
  case MMAOp::AssemblePair:
    return genMmaVpFuncType(context, /*Quad*/ 0, /*Pair*/ 0, /*Vector*/ 2);
  case MMAOp::DisassembleAcc:
    return genMmaDisassembleFuncType(context, mmaOp);
  case MMAOp::DisassemblePair:
    return genMmaDisassembleFuncType(context, mmaOp);
  }
}

with the message

flang/lib/Optimizer/Builder/PPCIntrinsicCall.cpp:1288:1: error: control reaches end of non-void function [-Werror=return-type]

These two switches probably need an error default.

Builds are failing to compile this code in PPCIntrinsicCall.cpp:

const char *getMmaIrIntrName(MMAOp mmaOp) {
  switch (mmaOp) {
  case MMAOp::AssembleAcc:
    return "llvm.ppc.mma.assemble.acc";
  case MMAOp::AssemblePair:
    return "llvm.ppc.vsx.assemble.pair";
  case MMAOp::DisassembleAcc:
    return "llvm.ppc.mma.disassemble.acc";
  case MMAOp::DisassemblePair:
    return "llvm.ppc.vsx.disassemble.pair";
  }
}

mlir::FunctionType getMmaIrFuncType(mlir::MLIRContext *context, MMAOp mmaOp) {
  switch (mmaOp) {
  case MMAOp::AssembleAcc:
    return genMmaVqFuncType(context, /*Quad*/ 0, /*Pair*/ 0, /*Vector*/ 4);
  case MMAOp::AssemblePair:
    return genMmaVpFuncType(context, /*Quad*/ 0, /*Pair*/ 0, /*Vector*/ 2);
  case MMAOp::DisassembleAcc:
    return genMmaDisassembleFuncType(context, mmaOp);
  case MMAOp::DisassemblePair:
    return genMmaDisassembleFuncType(context, mmaOp);
  }
}

with the message

flang/lib/Optimizer/Builder/PPCIntrinsicCall.cpp:1288:1: error: control reaches end of non-void function [-Werror=return-type]

These two switches probably need an error default.

@vdonaldson Thanks for bringing it up! I am looking into it now.

Thanks to @klausler for fixing it!