This PR improves the memory barriers generated for atomic operations.
Memory barrier semantics of LL/SC:
LL: <memory-barrier> + <load-exclusive>
SC: <store-conditional> + <memory-barrier>
Changes:
- Remove unnecessary memory barriers before LL and between LL/SC.
- Fix acquire semantics. (If the SC instruction is not executed, its trailing memory barrier does not take effect, so acquire semantics cannot be guaranteed. An acquire barrier therefore needs to be generated whenever the memory ordering includes acquire.)
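
For illustration, here is a minimal C11 sketch of the case the acquire fix targets. It is not code from this PR: `cas_acquire` is a hypothetical helper, and the instruction sequence in the comment is only a schematic of a typical LL/SC compare-and-swap loop, not the exact output of the code generator. When the comparison fails, control leaves the loop before SC runs, so the barrier attached to SC never takes effect and the acquire ordering has to come from a separate barrier on that path.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Schematic shape of an LL/SC compare-and-swap (illustrative only):
 *
 *   retry:
 *     LL   old, [p]          ; <memory-barrier> + <load-exclusive>
 *     if (old != expected) goto fail
 *     SC   desired, [p]      ; <store-conditional> + <memory-barrier>
 *     if (SC failed) goto retry
 *     goto done
 *   fail:
 *     <acquire barrier>      ; SC was skipped, so nothing else orders
 *                            ; later accesses after the load
 *   done:
 */

/* Hypothetical helper: CAS that requests acquire ordering on both the
 * success path (SC executed) and the failure path (SC not executed). */
static bool cas_acquire(atomic_int *p, int *expected, int desired)
{
    return atomic_compare_exchange_strong_explicit(
        p, expected, desired,
        memory_order_acquire,   /* ordering if the exchange succeeds */
        memory_order_acquire);  /* ordering if the comparison fails  */
}
```

A caller that reads other data after a failed `cas_acquire` relies on the failure ordering, which is why the barrier on the path that skips SC matters.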