This patch adds support for the following builtins on the Cortex-M0 and up:
__sync_fetch_and_add_4   __sync_fetch_and_add_8
__sync_fetch_and_and_4   __sync_fetch_and_and_8
__sync_fetch_and_nand_4  __sync_fetch_and_nand_8
__sync_fetch_and_or_4    __sync_fetch_and_or_8
__sync_fetch_and_sub_4   __sync_fetch_and_sub_8
__sync_fetch_and_xor_4   __sync_fetch_and_xor_8
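For context, a minimal sketch of how calling code might reach these functions. GCC documents that when a `__sync` operation cannot be implemented inline on the target, it emits a call to an external function with the same name plus a size suffix (`_4` for 32-bit operands, `_8` for 64-bit operands). The helper names `counter_next` and `flags_clear` are hypothetical and are not part of this patch.

```c
#include <stdint.h>

/* Hypothetical helper: atomically increments a 32-bit counter and
 * returns the value it held BEFORE the increment. On a target without
 * a native 32-bit atomic add, GCC may lower this to a call to
 * __sync_fetch_and_add_4. */
uint32_t counter_next(uint32_t *counter)
{
    return __sync_fetch_and_add(counter, 1u);
}

/* Hypothetical helper: atomically clears the bits in `mask` from a
 * 64-bit flag word and returns the previous value. On a target without
 * a native 64-bit atomic AND, GCC may lower this to a call to
 * __sync_fetch_and_and_8. */
uint64_t flags_clear(uint64_t *flags, uint64_t mask)
{
    return __sync_fetch_and_and(flags, ~mask);
}
```

All of the fetch-and-op builtins return the old value, so callers that need the updated value have to recompute it themselves.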
The assembly could be slightly more efficient on the Cortex-M3 and up, but I've chosen to keep it simple so that a single implementation works on every ARM processor from the Cortex-M0 upward.
We probably want to reference https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.html for these builtins, as that URL tracks the latest release of GCC.