This is mostly used when creating temporary stack
objects. On AMDGPU, 4-byte alignment is always
sufficient for an access of any size to be legal. All
stack accesses also need to be decomposed into 4-byte
accesses, so there is no benefit to increasing the
alignment, and the padding needed for wider types
just wastes stack space.
Allow the target to specify a lower preferred alignment.
Add a new DataLayout helper to get the preferred but
ABI-respecting alignment for cases where that is necessary
(e.g. when the object is passed to an arbitrary function call).
This is the approach discussed as an alternative to D28920.
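
The distinction between the two queries can be sketched as follows. This is only an illustration, not the code in this change: the TypeAlign struct and both function names are hypothetical, the alignment values are made up, and it assumes the ABI-respecting helper simply takes the larger of the ABI and preferred alignments.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for one type's alignment entry in a data layout.
// Not the real DataLayout representation.
struct TypeAlign {
  uint64_t ABIAlign;  // minimum alignment the ABI requires, in bytes
  uint64_t PrefAlign; // alignment the target prefers; may be lower
};

// Alignment for a temporary stack object that stays local to the
// function: the target's preference is used as-is.
uint64_t prefAlign(const TypeAlign &TA) { return TA.PrefAlign; }

// Assumed behaviour of the ABI-respecting helper: never return less
// than the ABI alignment, e.g. because the object's address is passed
// to an arbitrary callee that may rely on it.
uint64_t prefAlignRespectingABI(const TypeAlign &TA) {
  return std::max(TA.ABIAlign, TA.PrefAlign);
}
```

With illustrative values of a 16-byte ABI alignment and a 4-byte preferred alignment, the first query would give 4 and the second 16, so only objects that may escape pay for the extra padding.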