This is an archive of the discontinued LLVM Phabricator instance.

[MLIR][GPU] Make max flat work group size for ROCDL kernels configurable
Closed, Public

Authored by krzysz00 on Dec 14 2021, 9:38 AM.

Details

Summary

The default value of the amdgpu-flat-work-group-size attribute, "1, 256",
matches Clang's default, but some users of the ROCDL dialect, namely
TensorFlow, use larger workgroups, such as 1024 work-items. Therefore,
instead of hardcoding this value, we add a rocdl.max_flat_work_group_size
attribute that can be set on GPU kernels to override the default maximum.
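
The override described above can be sketched in plain C++ as follows. This is a minimal illustration, not the code from this revision: the helper name flatWorkGroupSizeValue and the use of std::optional to model "attribute not set" are assumptions, and the "1, N" formatting simply follows the default quoted in the summary.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper (not part of the patch): builds the value string for
// the amdgpu-flat-work-group-size function attribute.
// `maxOverride` stands in for an optional rocdl.max_flat_work_group_size
// attribute on the kernel; std::nullopt means the attribute is absent.
std::string flatWorkGroupSizeValue(std::optional<std::int64_t> maxOverride) {
  // Default maximum from the summary ("1, 256").
  constexpr std::int64_t kDefaultMax = 256;
  const std::int64_t maxSize = maxOverride.value_or(kDefaultMax);
  // A workgroup must contain at least one work-item.
  assert(maxSize >= 1 && "max flat work group size must be positive");
  // The minimum stays fixed at 1; only the maximum is configurable.
  return "1, " + std::to_string(maxSize);
}
```

With no override, the helper returns "1, 256"; with an override of 1024, as in the TensorFlow case, it returns "1, 1024".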

Event Timeline

krzysz00 created this revision on Dec 14 2021, 9:38 AM.
krzysz00 requested review of this revision on Dec 14 2021, 9:38 AM.
whchung requested changes to this revision on Dec 14 2021, 9:53 AM.

How about adding a unit test to check that the attribute can be overridden?

mlir/lib/Target/LLVMIR/Dialect/ROCDL/ROCDLToLLVMIRTranslation.cpp:21

I assume we can live without this header file?

This revision now requires changes to proceed (Dec 14 2021, 9:53 AM).
krzysz00 updated this revision to Diff 394311 on Dec 14 2021, 10:47 AM.
  • Fix bug, add tests

I've added a unit test.

mlir/lib/Target/LLVMIR/Dialect/ROCDL/ROCDLToLLVMIRTranslation.cpp:21

We do need it, since I'm using raw_svector_ostream below.

whchung accepted this revision on Dec 14 2021, 10:50 AM.
This revision is now accepted and ready to land (Dec 14 2021, 10:50 AM).