Prior to v8.4a[*] the only way to get a 128-bit atomic load or store was via ldxp/stxp (or casp), which is not only inefficient but outright impossible when the memory is not writable, since even a pure load has to perform a store with these sequences. So v8.4a extended the memory model so that any 16-byte operation aligned to 16 bytes (as all LLVM atomic loads/stores must be) is atomic.
This patch implements ISel for these instructions in both SDAG and GISel. In both cases we go for ldp/stp implementations since atomics are much more likely to be GPR-based operations.
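To make the precondition concrete, here is a minimal C++ sketch of the two facts the description relies on: the access must be 16-byte aligned, and the 128-bit value is handled as two 64-bit halves, which is the shape an ldp/stp pair moves through GPRs. The names `Pair128`, `isAligned16` and `storePair` are hypothetical and not part of the patch. The store below is an ordinary C++ store, so it only illustrates the layout and the alignment check; it does not by itself give any atomicity guarantee.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical type: a 128-bit value viewed as the two 64-bit halves
// that an ldp/stp pair would carry in general-purpose registers.
struct alignas(16) Pair128 {
    std::uint64_t lo;
    std::uint64_t hi;
};

static_assert(sizeof(Pair128) == 16, "two 64-bit halves make 16 bytes");
static_assert(alignof(Pair128) == 16, "type is 16-byte aligned");

// True when a 16-byte access at p would meet the alignment precondition.
inline bool isAligned16(const void *p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Plain store of both halves. This is NOT atomic in C++ terms; it only
// shows the two-register shape and checks the alignment assumption.
inline void storePair(Pair128 *dst, std::uint64_t lo, std::uint64_t hi) {
    assert(isAligned16(dst) && "16-byte access must be 16-byte aligned");
    dst->lo = lo;
    dst->hi = hi;
}
```

An address 8 bytes into a `Pair128` fails `isAligned16`, which is the case where the 16-byte atomicity described above would not apply.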
Why not custom legalize? Not that it really matters, I guess.