Proposed patch for ARMv8.1 Large System Extensions (LSE) support.
Not yet supported: CASP, NAND (LSE has no NAND instruction), and subword SUB/AND.
Comments are welcome. I am looking into fixing subword SUB/AND and into writing tests; for now the patch is tested by running atomic-ops.ll with -mcpu=thunderx2t99. Also, ATOMIC_LOAD_CLR was added to the generic IR; I suspect it should move to AArch64ISD instead. Feedback requested.
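For reviewers less familiar with LSE: the extension provides LDADD and LDCLR (bit clear) but no direct subtract or AND forms, which is presumably why a "load-clear" operation is needed at all. The sketch below only illustrates the arithmetic identities involved, using C11 atomics. The helper names (fetch_clr, fetch_and_via_clr, fetch_sub_via_add) are hypothetical and are not taken from the patch.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of a fetch-and-clear (the operation LDCLR performs):
   atomically clear the given bits and return the previous value. */
uint32_t fetch_clr(_Atomic uint32_t *p, uint32_t bits) {
    return atomic_fetch_and(p, ~bits);
}

/* AND expressed as "clear the complement of the mask": x & m == x & ~(~m). */
uint32_t fetch_and_via_clr(_Atomic uint32_t *p, uint32_t mask) {
    return fetch_clr(p, ~mask);
}

/* SUB expressed as "add the two's-complement negation": x - v == x + (-v). */
uint32_t fetch_sub_via_add(_Atomic uint32_t *p, uint32_t v) {
    return atomic_fetch_add(p, 0u - v);
}
```

The subword cases need the same operand inversion or negation applied at the narrower width, which is the part still missing from the patch.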
An ATOMIC_LOAD_ADD whose result is discarded should probably be performed via a separate IR operation (ATOMIC_ADD, etc.). This is straightforward to implement and should improve performance; comments are requested on this as well.
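A minimal source-level illustration of the two cases, assuming the GCC/Clang __atomic builtins. The function names are made up for this example. LSE offers a store form (STADD) that does not return the old value, so the second function is the kind of code a separate no-result operation would target. Whether it is actually faster is not shown here.

```c
#include <stdint.h>

/* Result is used: the previous value must be loaded and returned,
   so a load-and-add form is required. */
uint64_t next_ticket(uint64_t *counter) {
    return __atomic_fetch_add(counter, 1, __ATOMIC_RELAXED);
}

/* Result is discarded: only the memory update matters, so a
   no-result form would be sufficient. */
void count_event(uint64_t *counter) {
    (void)__atomic_fetch_add(counter, 1, __ATOMIC_RELAXED);
}
```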
Finally, weaker memory orderings should be implemented, but even the LDX/STX sequences seem to use full acquire/release semantics when __ATOMIC_RELAXED is specified. This will likely be addressed in a later patch.
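For reference, the orderings in question can be exercised from C as sketched below, again assuming the GCC/Clang __atomic builtins and made-up function names. The mnemonics in the comments are the LSE variants that correspond to each ordering in the architecture. They describe what one would hope to see selected, not what this patch currently emits.

```c
#include <stdint.h>

/* No ordering required: corresponds to plain LDADD. */
uint32_t add_relaxed(uint32_t *p, uint32_t v) {
    return __atomic_fetch_add(p, v, __ATOMIC_RELAXED);
}

/* Acquire only: corresponds to LDADDA. */
uint32_t add_acquire(uint32_t *p, uint32_t v) {
    return __atomic_fetch_add(p, v, __ATOMIC_ACQUIRE);
}

/* Release only: corresponds to LDADDL. */
uint32_t add_release(uint32_t *p, uint32_t v) {
    return __atomic_fetch_add(p, v, __ATOMIC_RELEASE);
}

/* Acquire and release: corresponds to LDADDAL. */
uint32_t add_seq_cst(uint32_t *p, uint32_t v) {
    return __atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
}
```

Compiling such a file with -mcpu=thunderx2t99 and inspecting the assembly is one way to check which variant is chosen for each ordering.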