When a struct's size is not a power of 2, the corresponding _Atomic() type is promoted to the next power of 2. We already handled normal C++ expressions of this form correctly, but direct calls to the __c11_atomic_whatever builtins ended up performing dodgy operations on the smaller non-atomic types (e.g. memcpying too many bytes). Later optimisations then removed those operations as undefined behaviour.
This patch converts EmitAtomicExpr to allocate its temporaries at the full atomic width, sidestepping the issue.
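To illustrate the width mismatch, here is a minimal sketch in plain C, outside of Clang. It is not the EmitAtomicExpr code: `struct rgb`, `atomic_width` and `fill_padded_temp` are hypothetical names, and zeroing the padding bytes is an assumption of the sketch rather than something the patch is known to do.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical 3-byte struct: its size is not a power of 2. */
struct rgb { unsigned char r, g, b; };

/* Round a nonzero size up to the next power of 2 (3 -> 4, 4 -> 4, 5 -> 8). */
static size_t atomic_width(size_t size) {
    size_t width = 1;
    while (width < size)
        width <<= 1;
    return width;
}

/* Copy a value into a temporary that is at least atomic_width(size) bytes.
   Only `size` bytes are read from the source, so nothing is read past the
   end of the smaller non-atomic object. The padding is zeroed (assumption). */
static void fill_padded_temp(void *temp, size_t temp_size,
                             const void *value, size_t size) {
    assert(temp_size >= atomic_width(size));
    memset(temp, 0, temp_size);
    memcpy(temp, value, size);
}
```

The point of the sketch is that the temporary is sized for the atomic type while the copy is sized for the value type, so a full-width atomic operation on the temporary never touches memory outside it.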
It also tidies up that function a little: previously there was a confusing dual-return situation, where the result was sometimes returned as an RValue and at other times stored into a user-provided destination. I don't think this is necessary (it dates back to the very beginning of CGAtomic.cpp).
s/Atomics/AI/? Or perhaps Atomic?