This patch adds support for the following builtins on the Cortex-M0 and Cortex-M3:
sync_fetch_and_add_4
sync_fetch_and_add_8
sync_fetch_and_and_4
sync_fetch_and_and_8
sync_fetch_and_nand_4
sync_fetch_and_nand_8
sync_fetch_and_or_4
sync_fetch_and_or_8
sync_fetch_and_sub_4
sync_fetch_and_sub_8
sync_fetch_and_xor_4
sync_fetch_and_xor_8
The assembly for Cortex-M3 and up could be slightly more efficient, but I've kept it simple so that the same code works on all ARM processors.
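For reviewers who don't use these routines often, here is a rough sketch of what they do, written with the generic GCC/Clang `__sync_fetch_and_*` builtins. Each one atomically applies the operation to the memory location and returns the value it held beforehand; the `_4` and `_8` suffixes in the list above are the operand size in bytes. The helper names `bump` and `set_flags` are hypothetical and not part of this patch, and I'm assuming the usual behaviour where the compiler calls the size-suffixed library routine when it can't expand the builtin inline for the target.

```c
#include <stdint.h>

/* Hypothetical helper (not in the patch): atomically add `delta` to a
   32-bit counter and return the value it held BEFORE the addition.
   Assumption: on a target with no inline expansion, the compiler lowers
   this to a call to the 4-byte fetch-and-add routine. */
static uint32_t bump(uint32_t *counter, uint32_t delta) {
    return __sync_fetch_and_add(counter, delta);
}

/* Hypothetical helper (not in the patch): atomically OR `mask` into a
   64-bit flag word and return the previous flags. This corresponds to
   the 8-byte fetch-and-or variant under the same assumption. */
static uint64_t set_flags(uint64_t *flags, uint64_t mask) {
    return __sync_fetch_and_or(flags, mask);
}
```

So with a counter holding 40, `bump(&counter, 2)` returns 40 and leaves 42 in memory; the sub, and, xor and nand variants follow the same "return the old value" pattern.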