When the compiler can't emit native code for an atomic operation, it instead emits a call to libatomic, which provides a fallback implementation. On all relevant architectures the fallback is used only for unusually sized atomics (e.g. 7 or 64 bytes). There are 4 relevant functions in libatomic: atomic_load, atomic_store, atomic_exchange and atomic_compare_exchange (so, for example, we can't do fetch_add on a 64-byte struct). See section 1.2.4 of http://stdatomic.gforge.inria.fr for details of the interface.
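The libatomic fallback causes both false positives and false negatives in tsan. The current implementation guards atomic operations with a pthread_mutex_t. Tsan intercepts the mutex and treats each lock/unlock pair as synchronization, so accesses around the atomic operation appear ordered by the lock even when the atomic itself would not order them. That causes massive false negatives. However, 16-byte operations are done with CMPXCHG16B when possible. Those operations are completely invisible to tsan, so the synchronization they do provide is missed, which causes false positives.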
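To make the lock-based path concrete, here is a minimal sketch of a size-generic fallback for the four operations. It is not libatomic's code: the `sketch_*` names, the single global lock and `struct big` are hypothetical, the memory-order arguments are omitted, and a spinlock built on `atomic_flag` stands in for the pthread_mutex_t so that the sketch needs no extra link flags.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical 64-byte object, too large for a native atomic instruction. */
struct big { unsigned char bytes[64]; };
static_assert(sizeof(struct big) == 64, "struct big must be 64 bytes");

/* Assumption: one global lock guards every fallback operation.
   The implementation described above uses pthread_mutex_t; a spinlock is
   used here only to keep the sketch dependency-free. */
static atomic_flag fallback_lock = ATOMIC_FLAG_INIT;

static void fallback_acquire(void) {
    while (atomic_flag_test_and_set_explicit(&fallback_lock,
                                             memory_order_acquire)) {
        /* spin until the lock is free */
    }
}

static void fallback_release(void) {
    atomic_flag_clear_explicit(&fallback_lock, memory_order_release);
}

/* Copy *obj into *ret under the lock. */
void sketch_atomic_load(size_t size, const void *obj, void *ret) {
    fallback_acquire();
    memcpy(ret, obj, size);
    fallback_release();
}

/* Copy *val into *obj under the lock. */
void sketch_atomic_store(size_t size, void *obj, const void *val) {
    fallback_acquire();
    memcpy(obj, val, size);
    fallback_release();
}

/* Return the old contents in *ret and install *val, as one locked step. */
void sketch_atomic_exchange(size_t size, void *obj, const void *val,
                            void *ret) {
    fallback_acquire();
    memcpy(ret, obj, size);
    memcpy(obj, val, size);
    fallback_release();
}

/* If *obj equals *expected bytewise, install *desired and return true.
   Otherwise copy the current contents into *expected and return false. */
bool sketch_atomic_compare_exchange(size_t size, void *obj, void *expected,
                                    const void *desired) {
    fallback_acquire();
    bool matched = memcmp(obj, expected, size) == 0;
    if (matched) {
        memcpy(obj, desired, size);
    } else {
        memcpy(expected, obj, size);
    }
    fallback_release();
    return matched;
}
```

Every operation here goes through the same lock no matter how weak an ordering the caller asked for. A tool that observes only the lock sees more synchronization than the atomic operation promises, which is the false-negative case described above.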
Tsan should intercept the libatomic functions for variable-sized atomics and model them precisely.