This is an archive of the discontinued LLVM Phabricator instance.

tsan: optimize __tsan_read/write16
ClosedPublic

Authored by dvyukov on Nov 25 2021, 6:50 AM.

Details

Summary

These callbacks are used for 16-byte accesses such as SSE vector loads and stores.
In some computational programs these accesses dominate.
Currently we handle each of them with two non-inlined 8-byte accesses.
Inline and optimize them in the same way as unaligned accesses.
This reduces the vector access benchmark time from 8 to 3 seconds.
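
For illustration only, here is a minimal sketch of the kind of code these callbacks serve. It assumes a build with -fsanitize=thread, where 16-byte loads and stores like the ones below are expected to be instrumented with __tsan_read16 and __tsan_write16. The names u32x4 and AddVectors are hypothetical, and the vector_size attribute is a GCC/Clang extension used here in place of the SSE intrinsics a real program might use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 16-byte vector type (GCC/Clang vector extension):
// four 32-bit lanes, 16 bytes in total.
typedef std::uint32_t u32x4 __attribute__((vector_size(16)));

// Adds two arrays lane-wise, one 16-byte vector at a time.
// Each iteration performs two 16-byte loads and one 16-byte store.
void AddVectors(const u32x4* a, const u32x4* b, u32x4* out, std::size_t n) {
  for (std::size_t i = 0; i < n; i++)
    out[i] = a[i] + b[i];
}
```

In a loop like this every iteration touches memory only through 16-byte accesses, which is why the cost of the callback can dominate the run time of such programs.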

Depends on D112603.

Event Timeline

dvyukov requested review of this revision. Nov 25 2021, 6:50 AM
dvyukov created this revision.
Herald added a project: Restricted Project. Nov 25 2021, 6:50 AM
Herald added a subscriber: Restricted Project.
melver accepted this revision. Nov 25 2021, 7:31 AM
melver added inline comments.
compiler-rt/lib/tsan/rtl/tsan_rtl_access.cpp
482

This code mostly duplicates the above. What if you wrote it as a 2-iteration for-loop? Will it generate worse or better code?

This revision is now accepted and ready to land. Nov 25 2021, 7:31 AM
dvyukov updated this revision to Diff 389799. Nov 25 2021, 7:52 AM

Hoist and deduplicate the declaration of the cur variable.

dvyukov added inline comments. Nov 25 2021, 7:56 AM
compiler-rt/lib/tsan/rtl/tsan_rtl_access.cpp
482

I've tried this code:

ALWAYS_INLINE USED void MemoryAccess16(ThreadState* thr, uptr pc, uptr addr,
                                       AccessType typ) {
  const uptr size = 16;
  FastState fast_state = thr->fast_state;
  // Nothing to do while the ignore bit is set for this thread.
  if (UNLIKELY(fast_state.GetIgnoreBit()))
    return;
  RawShadow* shadow_mem = MemToShadow(addr);
  bool traced = false;
  Shadow cur(fast_state, 0, 8, typ);
  // Handle the 16-byte access as two 8-byte halves, advancing by
  // kShadowCnt shadow slots for the second half.
  for (uptr i = 0; i < 2; i++, shadow_mem += kShadowCnt) {
    LOAD_CURRENT_SHADOW(cur, shadow_mem);
    // Fast path: the same access is already present in the shadow.
    if (LIKELY(ContainsSameAccess(shadow_mem, cur, shadow, access, typ)))
      continue;
    // Trace the full 16-byte range at most once; if tracing fails,
    // take the restart path.
    if (!traced && !TryTraceMemoryAccessRange(thr, pc, addr, size, typ))
      return RestartMemoryAccess16(thr, pc, addr, typ);
    traced = true;
    if (UNLIKELY(CheckRaces(thr, shadow_mem, cur, shadow, access, typ)))
      return;
  }
}

and it produces worse code:

before:
$ TIME="%e" time perf record ./bench_memory_access 1 1000000000 12
2.70
2.67
2.64

after:
$ TIME="%e" time perf record ./bench_memory_access 1 1000000000 12
3.22
3.19
3.17

Disasm before:
https://gist.githubusercontent.com/dvyukov/d898e8abaffe5809d9a3ec517ec81ae6/raw/4ce636ce23b3215ef6fc762444b29778b9ab27ff/gistfile1.txt

after:
https://gist.githubusercontent.com/dvyukov/69bf08dc5382c4512fea08ab95f87c61/raw/c357cff208df982428b9854cdeeea01f3ac6afb2/gistfile1.txt

The compiler fails to keep everything in registers and spills some values onto the stack.
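
To put the timings above in relative terms, here is a small sketch that averages the three runs of each variant and computes the slowdown of the loop version. The helper names Mean and RelativeSlowdown are hypothetical; only the six timings come from the measurements above.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Arithmetic mean of a non-empty list of timings, in seconds.
double Mean(const std::vector<double>& xs) {
  return std::accumulate(xs.begin(), xs.end(), 0.0) / xs.size();
}

// Relative slowdown of `after` versus `before`, as a fraction:
// 0.10 means the `after` runs took 10% longer on average.
double RelativeSlowdown(const std::vector<double>& before,
                        const std::vector<double>& after) {
  return Mean(after) / Mean(before) - 1.0;
}
```

With before = {2.70, 2.67, 2.64} and after = {3.22, 3.19, 3.17}, the means are 2.67 and about 3.19 seconds, so the loop version is roughly 19.6% slower on this benchmark.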

This revision was landed with ongoing or failed builds. Dec 21 2021, 2:33 AM
This revision was automatically updated to reflect the committed changes.