- User Since
- Apr 18 2021, 2:38 AM (15 w, 4 d)
Fri, Jul 30
Thu, Jul 29
Wed, Jul 28
Disabled transformation on HWASan as well. Maybe it's too paranoid but I don't have AAarch64 hardware to verify.
Tue, Jul 27
Don't transform when function is calloc or it's instrumented with ASan.
Mon, Jul 26
@delcypher: Thanks for letting me know. Now patch is reverted, Before submitting again I will fix heap-overflow.cpp test.
Fri, Jul 23
*Meant libc, not libc++ but nevermind.
@jeroen.dobbelaere: Yes, I think DSE should special case calloc to avoid infinite recursion (like it does for memset) in libc++. Thanks for catching this. I'm reverting change to fix.
Thu, Jul 22
Thanks for approval. I think premerge failures are unrelated to this change. Similar libarcher/openmp errors can be seen in many other reviews for couple weeks.
Wed, Jul 21
@nikic: Kindly ping, all comments were addressed.
Tue, Jul 20
Mon, Jul 19
Updated just with short FIXME information added to do_not_form_memmove4 UT.
Fri, Jul 16
Thanks. I updated patch and now optimization is disabled for functions with sanitize_memory attribute.
Detecting linear pattern would be possible but I'm not convinced it's worth effort. Anyway now it should work with MSan.
Thu, Jul 15
Just minor comment regarding potential benefits and benchmarking. If I understand patch correctly it flatten multilevel loop with memset into one memset. According simple microbenchmarks: https://godbolt.org/z/MTnYcvvYo on my x86-64 Skylake flattening memset in double loop (memset_3D) into 1 memset gives between even 800% performance boost (on small WS) to ~80% boost (when WS > LLC).
But how useful such transformation would be in practice? I'm not sure. We need to keep in mind that memset is usually just part of initialization/reusing memory code so in real world benchmarks flattening memsets loop may be less beneficial than microbenchmarks shows.
Hello @kcc @eugenis @pgousseau, sorry for bothering you. I added you to this review because transformation introduced in this change breaks msan_test (memcpy_unaligned/TestUnalignedMemcpy unit test).
I'm quite convinced that after my change Clang does what GCC would do if compliled msan_test.cpp file with -Ofast: https://godbolt.org/z/f7s81hjaM
Therefore I'm pretty sure that transformation works correctly (as on GCC) but it simply doesn't play well with MSan.
Since I'm not MSan expert it would be great if you could take a look on this and confirm whether or not my understanding of issue is correct.
Fri, Jul 9
Rebased and addressed comments.
Jul 6 2021
Jun 30 2021
Added test covering IsMemCpy + UseMemMove path.
Jun 29 2021
Fixed last comment.
@nikic: Thanks for your review and LGTM it. Looks like I cannot push it myself to main: "Permission to llvm/llvm-project.git denied to yurai007".
Jun 28 2021
Rebased and addressed yesterday comments.
Jun 27 2021
@efriedma : Thank you for review! Now all comments are addressed and change pass on CI.
Jun 24 2021
@xbolva00: Ok. Added more reviewers.
Jun 23 2021
Added protection against overlapping.
Jun 18 2021
Jun 17 2021
I played a little bit with patch and checked how it performs with GCC unit tests added together with similar change in GCC on 2014.
Jun 15 2021
@nikic: All comments were addressed.
Jun 11 2021
Jun 10 2021
Jun 7 2021
Jun 2 2021
Rely on D103451.
Rebased to D103009.
Jun 1 2021
TODO: rebase when D103009 is delivered.
May 31 2021
Abandoned because of better approach: https://reviews.llvm.org/D103009
Thanks. Updated patch with (hopefully) proper LibFunc_malloc detection. Extra checks are stolen from LibCallSimplifier.
Just ignore my previous comment about test. It doesn't explain how we moved from non-inlined new to calloc. Indeed better explanation was suggested by @xbolva00.
Regarding OutOfMemoryTest: https://github.com/llvm/llvm-project/blob/main/compiler-rt/test/fuzzer/OutOfMemoryTest.cpp failure explanation.
Test has silent assumption that in every loop iteration we request 268MB virtual memory and get 268MB physical memory allocated.
This is the case for malloc+memset combo (page faults trigger actual physical allocation in kernel) but not for calloc (physical memory allocation is deferred - it's faster, which is the whole point of our calloc transformation).
In consequence after my patch there is no OOM in OutOfMemoryTest so UT fails.
It can be easily fixed by saying "we _really_ want allocate and touch physical memory, not only virtual one" which end up in new volatile char + std::fill duo.