The backend default maximum should be the hardware maximum, so the
frontend should set the implementation-defined default maximum.
I don't follow. This attribute specifies the range of work-group sizes the kernel may be launched with, right? That depends only on how the user executes the kernel. How is it related to backend defaults?
My concern is that this essentially forces users to add the amdgpu_flat_work_group_size attribute to every kernel that is executed with a work-group size outside (128,256). This could cause many regressions in existing OpenCL apps, and I am not sure it is feasible to force all OpenCL apps to make this change. Should we run some tests before making this change?
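To make the concern concrete, here is a rough sketch of which launches would be affected. It assumes the default range is 128 to 256 inclusive (my reading of the (128,256) figure above) and that the flat work-group size is the product of the local size in each dimension. The helper names and constants are hypothetical and not part of clang or the backend.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed default range, read inclusively from the (128,256) figure
 * in this thread. Not taken from the backend sources. */
enum { ASSUMED_DEFAULT_MIN = 128, ASSUMED_DEFAULT_MAX = 256 };

/* Flat work-group size: product of the local size in each dimension. */
static size_t flat_work_group_size(const size_t local[3]) {
    return local[0] * local[1] * local[2];
}

/* Hypothetical helper: returns true when a launch with this local size
 * falls outside the assumed default range, i.e. the kernel would need an
 * explicit amdgpu_flat_work_group_size attribute to run with it. */
static bool needs_explicit_attribute(const size_t local[3]) {
    size_t flat = flat_work_group_size(local);
    return flat < ASSUMED_DEFAULT_MIN || flat > ASSUMED_DEFAULT_MAX;
}
```

Under these assumptions, a 16x16x1 launch (flat size 256) stays inside the range, while a 64x1x1 or 32x32x1 launch falls outside it, so existing apps using those sizes would have to annotate their kernels.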
We need to communicate with anyone generating IR to ensure the attribute is being generated before we change the default. clang is only one of those generators. This change will also need to be documented in the usage document.