Proposed patch adding support for the ARMv8.1 Large System Extensions (LSE).
Not yet supported: CASP, NAND (LSE has no NAND instruction), and subword SUB/AND.
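LSE has no LD<OP> form for NAND, so an atomic fetch-NAND still needs a compare-and-swap retry loop. Below is a minimal portable C++ sketch of such a loop. It is not the AArch64 lowering code in this patch, and `fetch_nand` is a hypothetical helper name.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: atomic fetch-NAND via a compare-and-swap retry loop.
// Returns the old value, matching `atomicrmw nand` semantics:
// memory becomes ~(old & operand).
template <typename T>
T fetch_nand(std::atomic<T>& obj, typename std::atomic<T>::value_type operand,
             std::memory_order order = std::memory_order_seq_cst) {
  T expected = obj.load(std::memory_order_relaxed);
  // On failure, compare_exchange_weak reloads `expected` with the current
  // value, so the desired value is recomputed on every iteration.
  while (!obj.compare_exchange_weak(expected,
                                    static_cast<T>(~(expected & operand)),
                                    order, std::memory_order_relaxed)) {
  }
  return expected;
}
```

The cast back to `T` keeps the inverted result at the operand's width, which matters for 8- and 16-bit values.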
Requesting comments. I am looking into fixing subword SUB/AND and writing tests (currently tested via atomic-ops.ll with -mcpu=thunderx2t99). Also, ATOMIC_LOAD_CLR was added as a generic ISD opcode; I suspect it should move to AArch64ISD. Feedback requested.
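ATOMIC_LOAD_CLR is needed because LSE provides LDCLR (memory becomes old & ~operand) but no LDAND. Likewise it provides LDADD but no LDSUB. AND and SUB therefore have to be rewritten as CLR with an inverted operand and ADD with a negated operand.

The C++ sketch below shows those identities, assuming unsigned operands. It uses a CAS loop purely to model LDCLR. `ldclr_model`, `fetch_and_via_clr`, and `fetch_sub_via_add` are hypothetical names, not LLVM functions.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical model of LDCLR: memory becomes old & ~mask, old is returned.
// Emulated with a CAS loop here only to make the semantics concrete.
template <typename T>
T ldclr_model(std::atomic<T>& obj, typename std::atomic<T>::value_type mask) {
  T old = obj.load(std::memory_order_relaxed);
  while (!obj.compare_exchange_weak(old, static_cast<T>(old & ~mask))) {
  }
  return old;
}

// atomicrmw and x == LDCLR with ~x (LSE has no LDAND).
// The inversion is truncated to T, i.e. done at the operand's width.
template <typename T>
T fetch_and_via_clr(std::atomic<T>& obj, typename std::atomic<T>::value_type x) {
  return ldclr_model(obj, static_cast<T>(~x));
}

// atomicrmw sub x == LDADD with -x (LSE has no LDSUB).
// Exact for unsigned T because the arithmetic wraps modulo 2^bits.
template <typename T>
T fetch_sub_via_add(std::atomic<T>& obj, typename std::atomic<T>::value_type x) {
  return obj.fetch_add(static_cast<T>(-x));
}
```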
An ATOMIC_LOAD_ADD whose return value is discarded should probably be lowered via a separate opcode (ATOMIC_ADD, etc.). This is straightforward to implement and should improve performance; RFC there too.
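In A64, the no-return ST<OP> forms (for example STADD and STADDL) are aliases of LD<OP> with the zero register as destination. They are defined only for the plain and release (L) orderings.

The C++ sketch below illustrates how a hypothetical mnemonic selector might use that. `selectAddMnemonic` is an invented name, and B/H size suffixes are ignored.

```cpp
#include <cassert>
#include <string>

// Hypothetical selector for the add mnemonic of an atomicrmw add.
// ST<OP> forms are aliases of LD<OP> with WZR/XZR as destination and are
// only defined for the plain and release (L) orderings, so an acquire
// ordering keeps the LD<OP> form even when the result is unused.
// B/H size suffixes are ignored for brevity.
std::string selectAddMnemonic(bool resultUsed, bool acquire, bool release) {
  std::string mnemonic = (!resultUsed && !acquire) ? "STADD" : "LDADD";
  if (acquire) mnemonic += 'A';
  if (release) mnemonic += 'L';
  return mnemonic;
}
```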
Finally, weaker memory orderings should be implemented, but even the LDX/STX sequences seem to use full acquire/release semantics when __ATOMIC_RELAXED is specified. I will likely look into this in a later patch.
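The LSE instructions encode ordering in the mnemonic suffix (none, A, L, AL), so one part of supporting weaker orderings is choosing the right suffix. The C++ sketch below shows that mapping. It assumes consume is treated as acquire and seq_cst uses the AL form. `lseOrderingSuffix` is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <string_view>

// Hypothetical helper: LSE ordering suffix appended to a base mnemonic
// such as LDADD, SWP or CAS (e.g. LDADD + "AL" -> LDADDAL).
constexpr std::string_view lseOrderingSuffix(std::memory_order order) {
  switch (order) {
    case std::memory_order_relaxed:
      return "";
    case std::memory_order_consume:  // assumption: treated as acquire
    case std::memory_order_acquire:
      return "A";
    case std::memory_order_release:
      return "L";
    case std::memory_order_acq_rel:
    case std::memory_order_seq_cst:
      return "AL";
  }
  return "AL";  // not reached for valid orderings; fall back to strongest
}
```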
I think this will also catch load-exclusives. As far as I know, those are fine with XZR/WZR as the destination register. The only problematic ones are the atomicrmw operations: SWP and LD<OP>, where <OP> is ADD, CLR, EOR, SET, SMAX, SMIN, UMAX, or UMIN.
That said, overestimating is at least safe.
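The narrower check described above might look roughly like the C++ sketch below. It matches assembly mnemonics as strings, whereas a real pass would test opcodes. `isLSEAtomicRMW` is a hypothetical name. Load-exclusives such as LDXR and LDAXR are deliberately not matched.

```cpp
#include <cassert>
#include <initializer_list>
#include <string_view>

// True if s is an optional ordering suffix (A, L, AL) followed by an
// optional size suffix (B, H), e.g. "", "AL", "LH", "B".
static bool isOrderingAndSizeSuffix(std::string_view s) {
  if (!s.empty() && (s.back() == 'B' || s.back() == 'H')) s.remove_suffix(1);
  return s.empty() || s == "A" || s == "L" || s == "AL";
}

// Hypothetical check: is this an LSE atomicrmw mnemonic (SWP or LD<OP>)
// whose destination must not be rewritten to WZR/XZR?
// Load-exclusives such as LDXR or LDAXR are intentionally not matched.
bool isLSEAtomicRMW(std::string_view m) {
  if (m.substr(0, 3) == "SWP") return isOrderingAndSizeSuffix(m.substr(3));
  if (m.substr(0, 2) != "LD") return false;
  m.remove_prefix(2);
  for (std::string_view op :
       {"ADD", "CLR", "EOR", "SET", "SMAX", "SMIN", "UMAX", "UMIN"}) {
    if (m.substr(0, op.size()) == op)
      return isOrderingAndSizeSuffix(m.substr(op.size()));
  }
  return false;
}
```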