This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Always emit amdgpu-flat-work-group-size
ClosedPublic

Authored by arsenm on May 31 2019, 9:05 AM.

Details

Summary

The backend default maximum should be the hardware maximum, so the
frontend should set the implementation defined default maximum.

Diff Detail

Event Timeline

arsenm created this revision.May 31 2019, 9:05 AM
b-sumner added inline comments.May 31 2019, 9:56 AM
lib/CodeGen/TargetInfo.cpp
7947

Theoretically, shouldn't the minimum be 1?

arsenm marked an inline comment as done.May 31 2019, 10:32 AM
arsenm added inline comments.
lib/CodeGen/TargetInfo.cpp
7947

That's what I thought, but the backend is defaulting to 2 * wave size now

yaxunl added inline comments.May 31 2019, 1:59 PM
lib/CodeGen/TargetInfo.cpp
7947

I don't get it. This attribute indicates the possible workgroup size range this kernel may be run with, right? It only depends on how user execute the kernel. How is it related to backend defaults?

arsenm marked an inline comment as done.Jun 3 2019, 6:54 AM
arsenm added inline comments.
lib/CodeGen/TargetInfo.cpp
7947

The backend currently assumes 128,256 by default as the bounds. I want to make this a frontend decision, and make the backend assumption the most conservative default

My concern is that this essentially forcing user to add amdgpu_flat_work_group_size attribute to all kernels that are executed outside of (128,256). Potentially this can cause lots of regressions for existing OpenCL apps. I am not sure if it is feasible to force all OpenCL apps to make this change. Should we do some tests before making this change?

We need to communicate with anyone generating IR to ensure this is being generated before we change the default. clang is only one of those generators. This change will also need to be documented in the usage document.

My concern is that this essentially forcing user to add amdgpu_flat_work_group_size attribute to all kernels that are executed outside of (128,256). Potentially this can cause lots of regressions for existing OpenCL apps. I am not sure if it is feasible to force all OpenCL apps to make this change. Should we do some tests before making this change?

This is already the case. This is just moving it to the frontend. There's no user observable change from this patch

We need to communicate with anyone generating IR to ensure this is being generated before we change the default. clang is only one of those generators. This change will also need to be documented in the usage document.

The planned change is to make the backend more conservative, so it shouldn't break other frontends

My concern is that this essentially forcing user to add amdgpu_flat_work_group_size attribute to all kernels that are executed outside of (128,256). Potentially this can cause lots of regressions for existing OpenCL apps. I am not sure if it is feasible to force all OpenCL apps to make this change. Should we do some tests before making this change?

This is already the case. This is just moving it to the frontend. There's no user observable change from this patch

I do want to reduce the minimum bound to 1 at least (which is the default for graphics shaders), but that's a separate change

We need to communicate with anyone generating IR to ensure this is being generated before we change the default. clang is only one of those generators. This change will also need to be documented in the usage document.

The planned change is to make the backend more conservative, so it shouldn't break other frontends

It may not break other frontends, but could cause substantial performance regressions. At a minimum the summary should clearly mention this possibility.

We need to communicate with anyone generating IR to ensure this is being generated before we change the default. clang is only one of those generators. This change will also need to be documented in the usage document.

The planned change is to make the backend more conservative, so it shouldn't break other frontends

It may not break other frontends, but could cause substantial performance regressions. At a minimum the summary should clearly mention this possibility.

The summary of the future backend patch will

rampitec added inline comments.Aug 22 2019, 12:02 PM
lib/CodeGen/TargetInfo.cpp
7947

I would agree that minimum should be 1. Backend's choice is also unclear, but emitting a false attribute is a separate issue.

arsenm updated this revision to Diff 217301.Aug 26 2019, 9:07 PM

Make lower bound 1, although this is a behavior change

This revision is now accepted and ready to land.Aug 27 2019, 8:38 AM
arsenm closed this revision.Aug 27 2019, 12:24 PM

r370101