AMDGPU never wants or needs stack alignment greater than 4. The
DataLayout's preferred alignment cannot be lower than the
ABI-required alignment, which defaults to the type size. For
stack temporaries the ABI alignment constraints do not matter,
so request align 4 in all cases to save space.
This mitigates a stack usage regression that a future commit
would otherwise introduce in every program.
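
As a rough illustration of the space saving (a standalone sketch, not
the code in this change): the hypothetical helper frameSize below lays
out stack temporaries in order and caps each one's alignment at a given
maximum. It assumes every size is a power of two, so the size can stand
in for the default alignment. The local alignTo is a stand-in written
for this example.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Round Offset up to the next multiple of Align.
// Align must be a nonzero power of two.
static uint64_t alignTo(uint64_t Offset, uint64_t Align) {
  return (Offset + Align - 1) & ~(Align - 1);
}

// Hypothetical helper: place temporaries of the given byte sizes one after
// another, aligning each to min(Size, MaxAlign), and return the total bytes
// used. Assumes each size is a power of two (size == default alignment).
static uint64_t frameSize(const std::vector<uint64_t> &Sizes,
                          uint64_t MaxAlign) {
  uint64_t Offset = 0;
  for (uint64_t Size : Sizes) {
    Offset = alignTo(Offset, std::min(Size, MaxAlign));
    Offset += Size;
  }
  return Offset;
}
```

For temporaries of 4, 16, 4 and 8 bytes, frameSize({4, 16, 4, 8}, 16)
gives 48 with size-based alignment, while frameSize({4, 16, 4, 8}, 4)
gives 32 with the alignment capped at 4.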