This is an archive of the discontinued LLVM Phabricator instance.

[MLIR][ROCDL] Add conversion for gpu.lane_id to ROCDL
ClosedPublic

Authored by sjw36 on Jul 6 2023, 4:25 PM.

Details

Summary

Creates rocdl.lane_id op with llvm conversion to:

__device__ static unsigned int __lane_id() {
    return  __builtin_amdgcn_mbcnt_hi(
               -1, __builtin_amdgcn_mbcnt_lo(-1, 0));
}

Diff Detail

Event Timeline

sjw36 created this revision.Jul 6 2023, 4:25 PM
Herald added a reviewer: dcaballe. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript
sjw36 requested review of this revision.Jul 6 2023, 4:25 PM

As a high-level note, wouldn't it make more sense to define a rocdl wrapper around the mbcnt intrinsic and then rewrite to that so than we're not hiding a substantial bit of translation in the LLVM IR builder? I've generally seen the rocdl dialect as the place for 1:1 wrappers around LLVM functionality.

Second, as a minor note, when you have the final lane_id number, would it be possible to put range metadata on it - probably a conservative [0, 63] value, but, still, that'll allow for optimizations.

arsenm added a subscriber: arsenm.Jul 7 2023, 7:26 AM

I've generally seen the rocdl dialect as the place for 1:1 wrappers around LLVM functionality.

Let's not keep doing that. Just call intrinsics directly. All of those straight intrinsic wrappers introduce more trouble

arsenm added a comment.Jul 7 2023, 7:27 AM

I've generally seen the rocdl dialect as the place for 1:1 wrappers around LLVM functionality.

Let's not keep doing that. Just call intrinsics directly. All of those straight intrinsic wrappers introduce more trouble

I missed the "dialect" part here. Just don't call the ockl wrapper for this

The ROCDL dialect is meant to directly represent the AMDGPU-specific intrinsics in LLVM IR within MLIR, and to work with the LLVM dialect (which is LLVM-IR-in-MLIR that you can run through a simple translation layer)

sjw36 updated this revision to Diff 543104.Jul 21 2023, 3:45 PM
  • updated for review feedback
  • in the future, the backend should provide a lane_id intrinsic if the HW ever adds it

This overall design works, but I've got minor nitpicks

mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
89

Would it be possible to instead explicitly spell out the arguments and the result type, given that they're known from the LLVM?

mlir/test/Target/LLVMIR/rocdl.mlir
60

Could we get a variable capture in this? %[[loCount:.+]] = call i32 ...

sjw36 added inline comments.Jul 24 2023, 1:01 PM
mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
89

Hoping this is short lived and very limited usage. So I am inclined to leave it.

krzysz00 added inline comments.Jul 24 2023, 2:30 PM
mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
89

I'm thinking this might have some other use in the future we don't know about yet so let's do it right

sjw36 updated this revision to Diff 544005.Jul 25 2023, 9:10 AM
  • updated per review
krzysz00 accepted this revision.Jul 25 2023, 10:57 AM

Overall, looks good

mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
91

Nit: We don't need to list the argument types here since they're statically known, but this is a weird rare intrinsic so it's fine.

This revision is now accepted and ready to land.Jul 25 2023, 10:57 AM
This revision was landed with ongoing or failed builds.Jul 26 2023, 8:13 AM
This revision was automatically updated to reflect the committed changes.