If the flat workgroup size implied a larger minimum than the requested
one, the requested maximum was ignored as well. This interfered with
the logic that propagates amdgpu-waves-per-eu when accounting for the
inferred flat workgroup size. Just clamp the minimum so we still
preserve the requested maximum.
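
For illustration only, the clamping described above might look roughly
like the sketch below. The helper name clampWavesPerEU and its
parameters are hypothetical and are not taken from the patch; the
sketch only raises the minimum and does not try to handle an implied
minimum that exceeds the requested maximum.

```cpp
#include <algorithm>
#include <utility>

// Hypothetical helper, not the actual LLVM code.
// Requested is the {min, max} pair parsed from amdgpu-waves-per-eu.
// MinImpliedByFlatWorkGroupSize is the minimum implied by the flat
// workgroup size.
std::pair<unsigned, unsigned>
clampWavesPerEU(std::pair<unsigned, unsigned> Requested,
                unsigned MinImpliedByFlatWorkGroupSize) {
  // Raise the minimum if the flat workgroup size implies a larger one.
  Requested.first = std::max(Requested.first, MinImpliedByFlatWorkGroupSize);
  // The requested maximum is returned unchanged.
  return Requested;
}
```

With this shape, a request of {1, 4} and an implied minimum of 2 would
give {2, 4} rather than discarding the whole request.
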
Also, I'm not really sure what the point of the minimum is or what it
does. A few IR passes (AMDGPUPromoteAlloca and TTI) query it to get a
number of VGPRs, but everything else uses the maximum.
No test here since I don't think this is a directly observable
property, but it fixes a problem hit by a future patch which
propagates amdgpu-waves-per-eu.
Just use std::max?