Atomic RMW is not necessary in InitializeGuardArray. It is supposed to run when no user code runs. And if user code runs concurrently, then the atomic RMW won't help anyway. So replace it with non-atomic RMW.
InitializeGuardArray takes more than 50% of time during re2 fuzzing:
real 0m47.215s
51.56% a.out a.out [.] __sanitizer_reset_coverage
6.68% a.out a.out [.] __sanitizer_cov 3.41% a.out a.out [.] __sanitizer::internal_bzero_aligned16(void*, unsigned long) 1.79% a.out a.out [.] __asan::Allocator::Allocate(unsigned long, unsigned long,
With this change:
real 0m31.661s
26.21% a.out a.out [.] sanitizer_reset_coverage
10.12% a.out a.out [.] sanitizer_cov
5.38% a.out a.out [.] __sanitizer::internal_bzero_aligned16(void*, unsigned long) 2.53% a.out a.out [.] __asan::Allocator::Allocate(unsigned long, unsigned long,
That's 33% speedup.