This is an archive of the discontinued LLVM Phabricator instance.

[mlir][AMDGPU] Add packed 8-bit float conversion ops and lowering
ClosedPublic

Authored by krzysz00 on Jun 8 2023, 10:01 AM.

Details

Summary

Define operations that wrap gfx940's new instructions for converting
between f32 and registers containing packed sets of four 8-bit floats.

Define rocdl operations for the intrinsics and an AMDGPU dialect
wrapper around them (to account for the fact that MLIR distinguishes
the two 8-bit float formats at the type level, while LLVM IR does
not).
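
As an illustrative sketch (the op name and syntax below reflect the form the patch settled on and should be treated as assumptions, not a normative reference), the AMDGPU dialect wrapper keeps the fp8 format in the result type, which the lowering then erases when emitting the type-agnostic rocdl intrinsic:

```mlir
// Sketch: truncate two f32 values into one 2-element word of a packed
// 4 x fp8 register. The vector element type (f8E4M3FNUZ vs. f8E5M2FNUZ)
// selects which hardware intrinsic this op lowers to; at the LLVM IR
// level both formats are just bytes in an i32.
%packed = amdgpu.packed_trunc_2xfp8 %a, %b into undef[word 0]
    : f32 to vector<4xf8E4M3FNUZ>
```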

Define an ArithToAMDGPU pass, meant to run before conversion to LLVM,
that replaces relevant calls to arith.extf and arith.truncf with the
packed operations in the AMDGPU dialect. Note that the conversion
currently only handles scalars and vectors of rank <= 1, as we do not
have a use case for multi-dimensional vector support right now.
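
For example, the pass might rewrite a truncation like the one below (a hypothetical sketch; the exact op names and assembly syntax are assumptions based on the patch description):

```mlir
// Before ArithToAMDGPU: a plain arith truncation to an 8-bit float vector.
%r = arith.truncf %v : vector<2xf32> to vector<2xf8E4M3FNUZ>

// After the rewrite (sketch): the extracted scalar f32 elements are fed
// to the packed gfx940 truncation op, which fills one 2-element word of
// a 4 x fp8 register; the requested lanes are then extracted back out.
%p = amdgpu.packed_trunc_2xfp8 %e0, %e1 into undef[word 0]
    : f32 to vector<4xf8E4M3FNUZ>
```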

Diff Detail

Event Timeline

krzysz00 created this revision. Jun 8 2023, 10:01 AM
Herald added a reviewer: dcaballe.
Herald added a project: Restricted Project.
krzysz00 requested review of this revision. Jun 8 2023, 10:01 AM
jsjodin added inline comments.
mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td, line 65:

Just curious if naming the op ...Fp8x2Op vs ...2xFp8Op is of significance? Generally types are written N x BaseType, so it might be less confusing with "packed_trunc_2xfp8" instead of "packed_trunc_fp8x2".

krzysz00 updated this revision to Diff 557257. Sep 22 2023, 1:54 PM

Address naming comment from review

jsjodin accepted this revision. Sep 28 2023, 6:36 AM

Looks good to me.

This revision is now accepted and ready to land. Sep 28 2023, 6:36 AM