This is an archive of the discontinued LLVM Phabricator instance.

Paths

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
llvm/test/Transforms/DeadStoreElimination/noop-stores.ll

Differential D103009

[DSE] Transform memset + malloc --> calloc (PR25892)
ClosedPublic

Authored by yurai007 on May 24 2021, 3:04 AM.

Details

Reviewers

fhahn
nikic
xbolva00
jdoerfert
asbirlea
lebedev.ri
spatel
greened
kcc
eugenis
pgousseau

Commits

rG5c315bee8c9d: [DSE] Transform memset + malloc --> calloc (PR25892)
rG43234b159512: [DSE] Transform memset + malloc --> calloc (PR25892)
rG375694a07bcb: Transform memset + malloc --> calloc (PR25892)

Summary

After this change, DSE can eliminate a malloc + memset pair and emit calloc instead.
This is a follow-up to https://reviews.llvm.org/D101440.
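
As a rough illustration of the summary above, the following sketch shows the source-level pattern and a hand-written equivalent of the intended result. The function names and the all_zero helper are hypothetical and exist only for this example; the calloc(1, size) form mirrors the assembly shown later in this thread, not the patch's actual code.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Source pattern targeted by the change: an allocation that is
// immediately and fully zeroed with a second library call.
void *alloc_zeroed_two_calls(std::size_t size) {
  void *p = std::malloc(size);
  if (p)
    std::memset(p, 0, size);
  return p;
}

// Hand-written equivalent of the expected result: a single calloc call.
void *alloc_zeroed_one_call(std::size_t size) {
  return std::calloc(1, size);
}

// Hypothetical helper: returns true if every byte in [p, p + size) is zero.
bool all_zero(const void *p, std::size_t size) {
  const unsigned char *bytes = static_cast<const unsigned char *>(p);
  for (std::size_t i = 0; i < size; ++i)
    if (bytes[i] != 0)
      return false;
  return true;
}
```

Both functions hand back a zero-filled buffer of the requested size; the difference is only in how many library calls are made and in how the allocator may obtain the zeroed memory.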

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

yurai007 created this revision.May 24 2021, 3:04 AM

Herald added a subscriber: hiraditya. May 24 2021, 3:04 AM

yurai007 requested review of this revision. May 24 2021, 3:04 AM

Herald added a project: Restricted Project. May 24 2021, 3:04 AM

Herald added a subscriber: llvm-commits.

yurai007 mentioned this in D101440: [DSE] Eliminate store after calloc (PR50143).May 24 2021, 3:07 AM

yurai007 edited the summary of this revision. (Show Details)May 24 2021, 3:19 AM

Harbormaster completed remote builds in B105866: Diff 347336.May 24 2021, 3:44 AM

yurai007 updated this revision to Diff 347368.May 24 2021, 6:25 AM

yurai007 retitled this revision from [DSE] Eliminate store after calloc (PR50143) to [DSE] Eliminate memset after malloc.

yurai007 edited the summary of this revision. (Show Details)

Rebased with proper clang-formatting and updated title to be more accurate.

Harbormaster completed remote builds in B105892: Diff 347368.May 24 2021, 7:05 AM

This looks good to me, but please wait for the reviewers' comments from the previous patch version.

I'd be curious how much the decision to not ignore allocas will affect compile times in (generated) IRs with many locals.

And remove transformation from instcombine?

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1852	We should not copy attributes from malloc.

And remove transformation from instcombine?

Yes. IIRC it's already dead because malloc always has more than one use. That could be done in a separate change.

In D103009#2783897, @asbirlea wrote:

This looks good to me, but please wait for the reviewers' comments from the previous patch version.

I'd be curious how much the decision to not ignore allocas will affect compile times in (generated) IRs with many locals.

If this is about mallocs and callocs, then the impact on compile time is rather insignificant. I wouldn't expect anything more than noise, but of course we can try to measure it to make sure.
Probably the most time-consuming part in the malloc case is one AA check (AA.getModRefInfo) coming from memoryIsNotModifiedBetween, and in the calloc case it would be one getClobberingMemoryAccess call from MemorySSA (a little bit heavier than getModRefInfo).
So it's more or less one extra AA check for every m(c)alloc-memset pair in a block.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1843	That's the root cause of some crashes. We cannot assume that isMallocLikeFn implies malloc. Will fix.
1852	Thanks for pointing this out! It saves me the time of investigating the current backend error from CI. My understanding is that emitCalloc merges the input attribute list with the default calloc attributes. If so, then passing an empty list should be fine.

yurai007 updated this revision to Diff 348451.May 28 2021, 12:05 AM

Addressed comments.

Since instcombine is currently the only user of emitCalloc, after you commit this patch, please delete the AttrsList parameter from emitCalloc (and adjust this new call site in DSE) together with the removal of this transformation from instcombine.

Otherwise, I think this looks good.

Harbormaster completed remote builds in B106654: Diff 348451.May 28 2021, 12:42 AM

Sure. When I fix the tests and the commit is ready to land, I will prepare the appropriate emitCalloc/instcombine change.

Both the AddressSanitizer tests and the libFuzzer tests fail for the same reason, so there are only 2 different error causes.
It has more to do with the way asan_allocator works (it is used in all failing UTs) than with broken middle-end transformations.
I will check it further and come back with more details.

xbolva00 added inline comments.May 28 2021, 8:51 AM

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1843	https://llvm.org/doxygen/MemoryBuiltins_8cpp_source.html So: isMallocLikeFn && !isOpNewLikeFn

Regarding OutOfMemoryTest (https://github.com/llvm/llvm-project/blob/main/compiler-rt/test/fuzzer/OutOfMemoryTest.cpp), here is the failure explanation.
The test silently assumes that in every loop iteration we request 268MB of virtual memory and get 268MB of physical memory allocated.
This is the case for the malloc+memset combo (page faults trigger the actual physical allocation in the kernel) but not for calloc (physical memory allocation is deferred; it's faster, which is the whole point of our calloc transformation).
As a consequence, after my patch there is no OOM in OutOfMemoryTest, so the UT fails.
It can be easily fixed by saying "we _really_ want to allocate and touch physical memory, not only virtual memory", which ends up as a new volatile char[] + std::fill duo.

yurai007 added inline comments.May 31 2021, 4:51 AM

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1843	Ok, maybe it would be more accurate than dyn_cast. I need to double check.

Just ignore my previous comment about the test. It doesn't explain how we moved from a non-inlined new[] to calloc. Indeed, a better explanation was suggested by @xbolva00.

yurai007 updated this revision to Diff 348788.May 31 2021, 6:26 AM

Ok, this looks fine now I think.

I think this should check for just LibFunc_malloc and nothing else. MallocLike just means an allocation function that returns null on failure, and the nothrow new variants are considered MallocLike and not OpNewLike.
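
To make the distinction above concrete, here is a small sketch of two allocation routines that both report failure by returning a null pointer. It is only an illustration of why "returns null on failure" is a weaker property than "is malloc"; the function names are hypothetical, and nothing here reflects the actual LLVM checks.

```cpp
#include <cstdlib>
#include <new>

// Plain malloc: storage that is released with free, and that calloc
// could legitimately stand in for.
int *make_with_malloc() {
  return static_cast<int *>(std::malloc(sizeof(int)));
}

// Nothrow operator new: also returns nullptr on failure, but the storage
// comes from operator new (which a program may replace) and must be
// released with delete, so swapping in calloc would not be equivalent.
int *make_with_nothrow_new() {
  return new (std::nothrow) int;
}

// Each pointer has to go back through its matching deallocation function.
void release_malloc(int *p) { std::free(p); }
void release_new(int *p) { delete p; }
```

The two allocators look alike from a "may return null" point of view, which is why a name-based check for malloc itself is the more conservative condition.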

Harbormaster completed remote builds in B106908: Diff 348788.May 31 2021, 7:13 AM

In D103009#2789489, @nikic wrote:

I think this should check for just LibFunc_malloc and nothing else. MallocLike just means an allocation function that returns null on failure, and the nothrow new variants are considered MallocLike and not OpNewLike.

Good point! Yeah.

Please update the patch subject to be more reflective of "replacement" as opposed to "removal".

For example, "Use calloc for memset+malloc".

Thanks. Updated patch with (hopefully) proper LibFunc_malloc detection. Extra checks are stolen from LibCallSimplifier.

Harbormaster completed remote builds in B106916: Diff 348800.May 31 2021, 8:54 AM

yurai007 mentioned this in D101176: [SimplifyLibCalls] Transform malloc to calloc with redundant memsets elimination (PR25892).May 31 2021, 11:21 AM

yurai007 marked 2 inline comments as done.May 31 2021, 11:29 AM

yurai007 mentioned this in D103451: [SimplifyLibCalls][NFC] Clean up LibCallSimplifier from 'memset + malloc into calloc' transformation.Jun 1 2021, 4:58 AM

@xbolva00 @nikic @hubert.reinterpretcast: I think now after addressing all comments and passing on CI it's in good shape.

yurai007 mentioned this in D103523: [BuildLibCalls][NFC] Remove redundant attribute list from emitCalloc.Jun 2 2021, 7:56 AM

Ping.

nikic added inline comments.Jun 9 2021, 1:34 PM

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
514	I'm a bit surprised that MemoryLocation::get() doesn't work for MemSetInst, as it only has a single memory location, so there's no ambiguity here. @asbirlea @fhahn Do you think it would make sense to adjust the API?
1829–1880	Which part here requires the const cast?
1853	Is this cast needed? Shouldn't it be an implicit upcast?
1858	As you already have an IRBuilder, why not `IRB.getIntPtrTy(DL)`?
1864	Looks like this doesn't preserve MemorySSA? Try `-passes='dse,verify<memoryssa>'` in the test.
llvm/test/Transforms/DeadStoreElimination/noop-stores.ll
339	As it was the motivating case, I'd also expect a test where the memset is in a different block. I also don't think that the dominates() condition in your implementation is exercised by tests. Some other conditions aren't either, for example malloc and memset having different sizes.

vdsered added a subscriber: vdsered.Jun 9 2021, 10:32 PM

yurai007 updated this revision to Diff 351131.Jun 10 2021, 4:41 AM

yurai007 marked 2 inline comments as done.Jun 10 2021, 4:47 AM

yurai007 added inline comments.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1829–1880	Malloc, more precisely starting from this line: IRBuilder<> IRB(Malloc); We can const_cast later, at the time of the Malloc definition, but we cannot remove it completely: it's still required for the Builder and replaceAllUsesWith/eraseFromParent. Anyway, I moved it to the Malloc definition as it's a more appropriate place.
1864	Will check.
llvm/test/Transforms/DeadStoreElimination/noop-stores.ll
339	"I also don't think that the dominates() condition in your implementation is exercised by tests. Some other conditions aren't either, for example malloc and memset having different sizes." Sure, I can add many more tests to cover more conditions. "As it was the motivating case, I'd also expect a test where the memset is in a different block." Currently, with this patch, DSE cannot perform such a transformation. Consider the original pr25892 test from https://github.com/llvm/llvm-project/blob/main/llvm/test/Transforms/InstCombine/memset-1.ll. IIRC the reason is that the malloc block (entry) doesn't have a related local store to the malloced memory, so DSE cannot find the malloc-memset pair as a candidate for elimination. I can provide more details if you want, but when I last checked I simply concluded that more effort in DSE would be needed to make it work across blocks.

Harbormaster completed remote builds in B108593: Diff 351131.Jun 10 2021, 5:17 AM

More tests.

yurai007 added inline comments.Jun 11 2021, 3:48 AM

llvm/test/Transforms/DeadStoreElimination/noop-stores.ll
339	"Currently with this patch DSE cannot perform such transformation.(...)" After adding the original PR unit test, I noticed that DSE can now actually perform the transformation across blocks. When I checked before it couldn't; apparently changes made in the meantime unlocked it. Well, that's good :)

Harbormaster completed remote builds in B108773: Diff 351393.Jun 11 2021, 4:41 AM

yurai007 added inline comments.Jun 11 2021, 5:32 AM

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1864	"Looks like this doesn't preserve MemorySSA? Try -passes='dse,verify<memoryssa>' in the test." Right, I missed the MemorySSAUpdater. I'm submitting a fix + related UT right now.

yurai007 updated this revision to Diff 351417.Jun 11 2021, 5:36 AM

yurai007 retitled this revision from [DSE] Use calloc for memset+malloc to [DSE] Transform memset + malloc --> calloc (PR25892).

yurai007 edited the summary of this revision. (Show Details)

yurai007 marked 3 inline comments as done.Jun 11 2021, 6:22 AM

Harbormaster completed remote builds in B108790: Diff 351417.Jun 11 2021, 6:50 AM

@nikic: All comments were addressed.

xbolva00 added inline comments.Jun 15 2021, 3:08 AM

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1851	We don't need this check, do we? Given p = malloc(20); memset(p, 0, 10); reading p between 10 and 20 is UB, so with calloc we would have 0s in this area, which is fine. And the reverse case is UB too.

yurai007 added inline comments.Jun 15 2021, 3:46 AM

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1851	If we permitted to "calloc more than we memset", wouldn't we hide UB in some cases? For example, if we really read uninitialized memory much later after the memset? The second thing is that GCC doesn't transform malloc to calloc when we memset less memory than malloc allocates: https://godbolt.org/z/Ef94je4KP I'm not saying we should blindly follow them; I'm just not sure what the rationale behind that was.

nikic added inline comments.Jun 15 2021, 3:56 AM

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1851	If the memset size is different, profitability is also unclear. Converting a malloc into a calloc may be always legal, but if you have malloc(10000) followed by memset(10) it's probably not profitable to actually do it.

xbolva00 added inline comments.Jun 15 2021, 3:59 AM

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1851	There is no rule that compiler cannot “hide” UB. Compiler is free to assume that UB never happens.

xbolva00 added inline comments.Jun 15 2021, 4:04 AM

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1851	Yeah. Maybe if we know both sizes and the memset is unlikely to be expanded (there is some limit defined somewhere), we should still prefer calloc (1 call) over 2 libcalls?

xbolva00 added inline comments.Jun 15 2021, 4:05 AM

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1851	But for now, let’s start with that condition you already have. Good enough for this patch.

yurai007 added inline comments.Jun 15 2021, 4:32 AM

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1851	"If the memset size is different, profitability is also unclear. Converting a malloc into a calloc may be always legal, but if you have malloc(10000) followed by memset(10) it's probably not profitable to actually do it." Right. "There is no rule that compiler cannot “hide” UB. Compiler is free to assume that UB never happens." I'm aware that the compiler assumes UB never happens. My point was that if we permit to "calloc more than we memset", then an uninitialized access may become initialized and _maybe_ sanitizers/memcheck/other tools won't detect it. If that's not a real concern then fine, I agree we shouldn't care.
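
The size-mismatch case discussed in the comments on line 1851 can be sketched as follows. This is a hand-written illustration with hypothetical function names, not code from the patch: the first function zeroes only a prefix of the allocation, and the second is the calloc form that would be a legal replacement but zeroes the whole block, which is why profitability is questioned when the prefix is small.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// malloc(total) followed by a memset of only the first `prefix` bytes.
// Only the prefix has defined contents; reading the tail before writing
// to it is undefined behaviour.
unsigned char *alloc_prefix_zeroed(std::size_t total, std::size_t prefix) {
  unsigned char *p = static_cast<unsigned char *>(std::malloc(total));
  if (p && prefix <= total)
    std::memset(p, 0, prefix);
  return p;
}

// A legal stand-in for the function above, but it zeroes all `total`
// bytes, which may cost more than zeroing a short prefix.
unsigned char *alloc_fully_zeroed(std::size_t total) {
  return static_cast<unsigned char *>(std::calloc(1, total));
}
```

With total = 10000 and prefix = 10, both versions agree on the first 10 bytes, which is all a well-defined program may observe before writing to the rest. The patch as reviewed keeps the equal-size condition and leaves this case alone.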

I played a little bit with the patch and checked how it performs on the GCC unit tests that were added together with a similar change in GCC in 2014.

In the first case, https://github.com/gcc-mirror/gcc/blob/master/gcc/testsuite/gcc.dg/tree-ssa/calloc-1.c, malloc is transformed to calloc as in GCC, which is good.
In the LLVM case, register pressure at the assembly level is very similar to GCC (especially for the f() function), which means that the issue mentioned by Haneef in the PR discussion
(https://bugs.llvm.org/show_bug.cgi?id=25892#c1) seems to be mitigated now.

However, the second, C++-like case, https://github.com/gcc-mirror/gcc/blob/master/gcc/testsuite/g++.dg/tree-ssa/calloc.C, is not transformed.
Here the CFG is more complex, and I guess that DSE simply doesn't detect the malloc-memset pair because they aren't in neighboring blocks.
Apparently there is still room for improvement, but let's leave it for future patches :)

yurai007 marked 5 inline comments as done.Jun 17 2021, 3:30 AM

@nikic @xbolva00: All comments addressed.

Ping^2.

No comments from me. Someone more familiar with DSE should LGTM it.

yurai007 added reviewers: lebedev.ri, spatel, greened.Jun 23 2021, 6:57 AM

@xbolva00: Ok. Added more reviewers.

LGTM

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1852	It would be clearer to use Malloc instead of DefUOInst here, to match the memoryIsNotModifiedBetween() call.
llvm/test/Transforms/DeadStoreElimination/noop-stores.ll
3	Please change this RUN line to use `-passes='dse,verify<memoryssa>'`.

This revision is now accepted and ready to land.Jun 27 2021, 1:25 PM

yurai007 updated this revision to Diff 354868.Jun 28 2021, 6:02 AM

yurai007 marked 2 inline comments as done.

Rebased and addressed yesterday's comments.

Harbormaster completed remote builds in B111269: Diff 354868.Jun 28 2021, 6:38 AM

yurai007 updated this revision to Diff 354987.Jun 28 2021, 12:24 PM

Harbormaster completed remote builds in B111347: Diff 354987.Jun 28 2021, 1:15 PM

@nikic: Thanks for your review and LGTM. It looks like I cannot push it to main myself: "Permission to llvm/llvm-project.git denied to yurai007".

@yurai007 I would suggest applying for commit access, see https://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access.

Rebased.

Harbormaster completed remote builds in B112540: Diff 356622.Jul 6 2021, 1:08 AM

Closed by commit rG375694a07bcb: Transform memset + malloc --> calloc (PR25892) (authored by yurai007). Jul 9 2021, 2:02 AM

This revision was automatically updated to reflect the committed changes.

yurai007 added a commit: rG375694a07bcb: Transform memset + malloc --> calloc (PR25892).

lebedev.ri added a reverting change: rG4e332cd41acb: Revert "Transform memset + malloc --> calloc (PR25892)".Jul 9 2021, 6:27 AM

Reverted in rG4e332cd41acb2befa85e20ec1f28413ea4adbb50 - check-msan broke.
https://lab.llvm.org/buildbot/#/builders/18/builds/1934

This revision is now accepted and ready to land.Jul 9 2021, 6:28 AM

yurai007 added reviewers: kcc, eugenis, pgousseau.Jul 15 2021, 2:21 AM

Hello @kcc @eugenis @pgousseau, sorry for bothering you. I added you to this review because the transformation introduced in this change breaks msan_test (the memcpy_unaligned/TestUnalignedMemcpy unit test).
I'm quite convinced that after my change Clang does what GCC would do if it compiled the msan_test.cpp file with -Ofast: https://godbolt.org/z/f7s81hjaM
Therefore I'm pretty sure that the transformation works correctly (as in GCC) but simply doesn't play well with MSan.
Since I'm not an MSan expert, it would be great if you could take a look at this and confirm whether or not my understanding of the issue is correct.

In D103009#2879416, @yurai007 wrote:

Hello @kcc @eugenis @pgousseau, sorry for bothering you. I added you to this review because the transformation introduced in this change breaks msan_test (the memcpy_unaligned/TestUnalignedMemcpy unit test).
I'm quite convinced that after my change Clang does what GCC would do if it compiled the msan_test.cpp file with -Ofast: https://godbolt.org/z/f7s81hjaM
Therefore I'm pretty sure that the transformation works correctly (as in GCC) but simply doesn't play well with MSan.
Since I'm not an MSan expert, it would be great if you could take a look at this and confirm whether or not my understanding of the issue is correct.

Right, so this change replaces malloc with calloc in

if (src_is_poisoned)
  src_origin = __msan_get_origin(src);
else
  memset(src, 0, sz);

because the other branch contains UB.
The test can be fixed by adding __msan_allocated_memory(ptr, size) before the call to __msan_get_origin, but I'd rather disable this optimization in functions with sanitize_memory attribute because it could make us miss bugs.

If possible, it's OK to disable only the CFG-aware part of the opt. I.e. malloc + memset in linear code (or even same BB) is fair game.

Fixed Msan.

Thanks. I updated the patch, and now the optimization is disabled for functions with the sanitize_memory attribute.
Detecting the linear pattern would be possible, but I'm not convinced it's worth the effort. Anyway, now it should work with MSan.

Harbormaster completed remote builds in B114479: Diff 359297.Jul 16 2021, 6:36 AM

yurai007 updated this revision to Diff 360028.Jul 20 2021, 12:10 AM

Harbormaster completed remote builds in B115031: Diff 360028.Jul 20 2021, 12:42 AM

This revision was landed with ongoing or failed builds.Jul 20 2021, 2:51 AM

Closed by commit rG43234b159512: [DSE] Transform memset + malloc --> calloc (PR25892) (authored by yurai007).

This revision was automatically updated to reflect the committed changes.

yurai007 added a commit: rG43234b159512: [DSE] Transform memset + malloc --> calloc (PR25892).

@yurai007 Shouldn't we detect that we are implementing 'calloc' and bail out if so? Just like we do for 'memset'?
(See https://www.godbolt.org/z/xnMa9bj4r )

@jeroen.dobbelaere: Yes, I think DSE should special-case calloc to avoid infinite recursion (like it does for memset) in libc++. Thanks for catching this. I'm reverting the change to fix it.

*Meant libc, not libc++ but nevermind.
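
The recursion hazard mentioned above can be pictured with a hand-written allocator. In this sketch, my_calloc is a hypothetical stand-in for a C library's own calloc; in a real libc the function would be named calloc, and rewriting its malloc + memset body into a call to calloc would make it call itself. The overflow check is an addition for the sake of a correct example and is not taken from the thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for a libc-internal calloc implementation.
void *my_calloc(std::size_t count, std::size_t size) {
  // Reject element counts whose byte total would overflow size_t.
  if (size != 0 && count > SIZE_MAX / size)
    return nullptr;
  std::size_t total = count * size;
  void *p = std::malloc(total);
  if (p)
    std::memset(p, 0, total); // malloc + memset: the pattern DSE rewrites
  return p;
}
```

Because the body is exactly the pattern the transformation looks for, the pass needs to skip functions that are themselves calloc, in the same way the thread says it already does for memset.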

yurai007 added a reverting change: rGbc536c710150: Revert "[DSE] Transform memset + malloc --> calloc (PR25892)".Jul 23 2021, 2:55 AM

@yurai007 I think this patch may have broken the compiler-rt/test/asan/TestCases/heap-overflow.cpp test. This test is failing on our internal bots. The test fails because it expects to see malloc at -O2 in the stacktrace but we are seeing calloc() instead.

@delcypher: Thanks for letting me know. The patch is now reverted. Before submitting again I will fix the heap-overflow.cpp test.

Don't transform when the function is calloc or is instrumented with ASan.

Harbormaster completed remote builds in B116399: Diff 361985.Jul 27 2021, 6:02 AM

yurai007 updated this revision to Diff 362312.Jul 28 2021, 2:45 AM

Disabled the transformation on HWASan as well. Maybe it's too paranoid, but I don't have AArch64 hardware to verify.

Harbormaster completed remote builds in B116640: Diff 362312.Jul 28 2021, 3:34 AM

yurai007 added a commit: rG5c315bee8c9d: [DSE] Transform memset + malloc --> calloc (PR25892).Jul 29 2021, 9:34 AM

yurai007 mentioned this in rGf8cdde719507: [SimplifyLibCalls][NFC] Clean up LibCallSimplifier from 'memset + malloc into….Aug 5 2021, 7:10 AM

This commit seems to cause memory usage (rss) increase in MariaDB's mysqld by a factor of 4. Looking back into the history, I found that the previous commit here caused the same regression, but we quickly picked up the revert and moved on. Now I'm trying to isolate the problem, but it's taking time.
So far, my sole hypothesis is that malloc + memset of a smaller size can still be converted to calloc. But I have no evidence so far. Naive attempts to synthetically reproduce this didn't work. Maybe this only happens when some UB is in place, but again, I have nothing to back this up.

Given this is a quite serious regression, can we roll this back while I'm investigating?

In D103009#2954430, @alexfh wrote:

This commit seems to cause memory usage (rss) increase in MariaDB's mysqld by a factor of 4. Looking back into the history, I found that the previous commit here caused the same regression, but we quickly picked up the revert and moved on. Now I'm trying to isolate the problem, but it's taking time.
So far, my sole hypothesis is that malloc + memset of a smaller size can still be converted to calloc. But I have no evidence so far. Naive attempts to synthetically reproduce this didn't work. Maybe this only happens when some UB is in place, but again, I have nothing to back this up.

Given this is a quite serious regression, can we roll this back while I'm investigating?

You should be able to use flag -fno-builtin-calloc to disable this transformation.

This transformation is correct and valid; GCC does it as well. No reason to revert, not justified. You should check asm diffs w and w/o patch for any surprises.

In D103009#2954485, @xbolva00 wrote:

In D103009#2954430, @alexfh wrote:

This commit seems to cause memory usage (rss) increase in MariaDB's mysqld by a factor of 4. Looking back into the history, I found that the previous commit here caused the same regression, but we quickly picked up the revert and moved on. Now I'm trying to isolate the problem, but it's taking time.
So far, my sole hypothesis is that malloc + memset of a smaller size can still be converted to calloc. But I have no evidence so far. Naive attempts to synthetically reproduce this didn't work. Maybe this only happens when some UB is in place, but again, I have nothing to back this up.

Given this is a quite serious regression, can we roll this back while I'm investigating?

You should be able to use flag -fno-builtin-calloc to disable this transformation.

This transformation is correct and valid; GCC does it as well. No reason to revert, not justified. You should check asm diffs w and w/o patch for any surprises.

I found and reduced a test case that shows a too eager replacement of malloc with calloc:

https://gcc.godbolt.org/z/dTjonof74

$ cat test.cc
#include <stdlib.h>
#include <string.h>

void *my_malloc(size_t size, int my_flags)
{
  void* point = malloc(size);
  if (my_flags & 32) memset(point, 0, size);
  return point;
}

$ clang -O2 -o test.o -save-temps -c test.cc  && cat test.s
        .text
        .file   "test.cc"
        .globl  _Z9my_mallocmi                  # -- Begin function _Z9my_mallocmi
        .p2align        4, 0x90
        .type   _Z9my_mallocmi,@function
_Z9my_mallocmi:                         # @_Z9my_mallocmi
        .cfi_startproc
# %bb.0:
        pushq   %rax
        .cfi_def_cfa_offset 16
        movq    %rdi, %rsi
        movl    $1, %edi
        callq   calloc@PLT
        popq    %rcx
        .cfi_def_cfa_offset 8
        retq
.Lfunc_end0:
        .size   _Z9my_mallocmi, .Lfunc_end0-_Z9my_mallocmi
        .cfi_endproc
                                        # -- End function
        .ident  "clang version trunk (45ac5f5441818afa1b0ee4a3734583c8cc915a79)"
        .section        ".note.GNU-stack","",@progbits
        .addrsig

This is quite a serious problem. Would you please revert while working on a fix?

I think I see where this is going: just-malloced but never-touched memory
doesn't need to be backed by actual pages (see overcommit),
while I guess calloc doesn't just mark the pages as zeroed-out,
but actually marks them dirty and in need of allocation,
at least unless you happen to allocate in multiples of page size?

I guess the easy fix here is to require that memset post-dominates the malloc,
but I guess we also need some LangRef blurb about this,
because the transformation is correct: just-malloced memory is filled with undef,
which we can always define into zeros: https://alive2.llvm.org/ce/z/C4vWH2

In D103009#2955964, @alexfh wrote:
This is quite a serious problem. Would you please revert while working on a fix?

For now, use -fno-builtin-calloc, it works fine. Also another solution:

void *my_malloc(size_t size, int my_flags) __attribute__((no_builtin("calloc")))
{
  void* point = malloc(size);
  if (my_flags & 32) memset(point, 0, size);
  return point;
}

I guess the easy fix here is to require that memset post-dominates the malloc,

Yeah, but I am not sure; this is quite a specific case. I would prefer the '__attribute__((no_builtin("calloc")))' solution here, instead of messing with malloc's internals and their impact on LLVM & LangRef.

It would be great not to break the much more common pattern:

void* point = malloc(size);
if (point) memset(point, 0, size);
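
The difference between the common pattern and the reduced test case can be sketched side by side. The function names and the kZeroFill constant are hypothetical (the constant is modelled on the bit tested in the reduced test case), and the extra null check in the second function is an addition for this example.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Common pattern: the memset runs on every path where the allocation
// succeeded, so calloc(1, size) gives the same observable result.
void *alloc_always_zeroed(std::size_t size) {
  void *p = std::malloc(size);
  if (p)
    std::memset(p, 0, size);
  return p;
}

// Hypothetical flag value, modelled on `my_flags & 32` in the test case.
constexpr int kZeroFill = 32;

// Flag-guarded pattern: the memset runs only when the caller asks for it.
// Turning this into an unconditional calloc is still correct, but it makes
// callers that did not ask for zeroing pay for it.
void *alloc_maybe_zeroed(std::size_t size, int flags) {
  void *p = std::malloc(size);
  if (p && (flags & kZeroFill))
    std::memset(p, 0, size);
  return p;
}
```

A post-dominance requirement of the kind suggested in this thread would be one way to tell these two shapes apart; how to let the null-checked form through while rejecting the flag-guarded one is left open in the discussion.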

In D103009#2955983, @lebedev.ri wrote:

I think I see where this is going: just-malloced but never-touched memory
doesn't need to be backed by actual pages (see overcommit),
while I guess calloc doesn't just mark the pages as zeroed-out,
but actually marks them dirty and in need of allocation,
at least unless you happen to allocate in multiples of page size?

I found this problem in mysql compiled with tcmalloc. Mysqld (at least in the somewhat older version I'm looking at) speculatively allocates a potentially large (depending on the configuration parameters) block of memory on start, which is normally used only partially. With malloc the memory is lazily given to the process when it starts using it. With calloc (and tcmalloc) the process actually tries to get all the pages immediately, which increases RSS (and thus, real memory usage). I guess, it may affect performance as well due to the unnecessary filling with zeroes, when user code calls my_malloc without MY_ZEROFILL.

For the context: https://fossies.org/linux/mariadb/mysys/my_malloc.c (this version seems functionally close to what I'm looking at).

I guess the easy fix here is to require that memset post-dominates the malloc,
but I guess we also need some LangRef blurb about this,
because the transformation is correct: just-malloced memory is filled with undef,
which we can always define into zeros: https://alive2.llvm.org/ce/z/C4vWH2

Yep, sounds like a revert to me.

To be honest, I'm not really sure why this transform is worth it/beneficial,
so I'm not sure how much trouble the implementation should go to.

Well, I am not sure, your code compiled with GCC will have the same issue. Overallocating is also questionable - your code relies on so many things to work luckily together - not a very ideal state.

This transformation was in simplifylibcalls for a very long time anyway so if anything, then postdominating check….

In D103009#2956177, @xbolva00 wrote:

Well, I am not sure, your code compiled with GCC will have the same issue.

I'm not sure "gcc also has this problem" is a good excuse ;)

Overallocating is also questionable - your code relies on so many things to work luckily together - not a very ideal state.

Well, it's not my code, it's mysql, which is probably Oracle's code, I suppose ;) But jokes aside, the function I provided as a test case looks totally reasonable to me and it shouldn't zero-fill the allocated memory, if it wasn't asked to. It's good there is a workaround (-fno-builtin-calloc) and it seems like some time ago mysql started using this flag in its gcc builds, because this specific optimization caused substantial performance regressions: http://smalldatum.blogspot.com/2017/11/a-new-optimization-in-gcc-5x-and-mysql.html

I guess, mysql is not the only software using its wrappers around malloc + memset with a similar logic.

This transformation was in simplifylibcalls for a very long time anyway

It wasn't this transformation. It was targeting specifically the memset(malloc(size), 0, size) pattern, which is much more specific and safe to replace with calloc.

so if anything, then postdominating check….

Yes, please. More strictly, you could try to check that the memset is on all paths from malloc to where the result of malloc can be used (in any way except for checking it for zero maybe). Not sure whether it can be efficiently implemented though.

I found and reduced a test case that shows a too eager replacement of malloc with calloc: https://gcc.godbolt.org/z/dTjonof74
(...) This is quite a serious problem. Would you please revert while working on a fix?

First of all, just to make it clear: in this particular case GCC does exactly the same: https://gcc.godbolt.org/z/qbE6Kxdnv unless you pass different flags.

In D103009#2956871, @yurai007 wrote:

I found and reduced a test case that shows a too eager replacement of malloc with calloc: https://gcc.godbolt.org/z/dTjonof74
(...) This is quite a serious problem. Would you please revert while working on a fix?

First of all, just to make it clear: in this particular case GCC does exactly the same: https://gcc.godbolt.org/z/qbE6Kxdnv unless you pass different flags.

I will be blunt: I do not understand what the profitability heuristics for this transform are.
When is calloc better than a malloc immediately followed by a memset?

It is worth reading about it on Google:

https://www.google.com/amp/s/willnewton.name/2013/06/17/calloc-versus-malloc-and-memset/amp/

https://vorpus.org/blog/why-does-calloc-exist/

+ some SO pages..

I will be blunt: I do not understand what the profitability heuristics for this transform are.
When is calloc better than a malloc immediately followed by a memset?

IIRC the main rationale behind this transformation is here: https://stackoverflow.com/questions/2688466/why-mallocmemset-is-slower-than-calloc.
I think I could prepare a benchmark showing the benefits of calloc over malloc+memset; I have already observed reduced RSS and fewer page faults after the transformation on some tests.
However, if a non-standard malloc is used, it may behave differently.
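As a point of reference, the two forms being compared look roughly like this. The function names are made up for illustration. Both return zeroed memory; the argument in the linked posts is that calloc can sometimes hand back pages the OS has already zeroed without touching them, whereas memset always writes every byte.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Form before the transform: allocate, then clear explicitly.
void *zeroed_with_memset(std::size_t n) {
  void *p = std::malloc(n);
  if (p != nullptr)
    std::memset(p, 0, n);
  return p;
}

// Form after the transform: let the allocator provide zeroed memory.
void *zeroed_with_calloc(std::size_t n) {
  return std::calloc(1, n);
}
```

Whether the second form is actually cheaper depends on the allocator and the allocation size, which is what a benchmark would need to show.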

I think more code with the “common pattern” sees improvements, so the few cases that are problematic should use a flag or attribute.

Also, the unroller sometimes goes crazy and degrades performance, but nobody talks about reverting, removing, or turning it off.

I think more code with the “common pattern” sees improvements, so the few cases that are problematic should use a flag or attribute.
Also, the unroller sometimes goes crazy and degrades performance, but nobody talks about reverting, removing, or turning it off.

+1 Totally agree.

In D103009#2956871, @yurai007 wrote:

I found and reduced a test case that shows a too eager replacement of malloc with calloc: https://gcc.godbolt.org/z/dTjonof74
(...) This is quite a serious problem. Would you please revert while working on a fix?

First of all, just to make it clear: in this particular case GCC does exactly the same: https://gcc.godbolt.org/z/qbE6Kxdnv unless you pass different flags.

As I said, "GCC also has this problem" is not a good excuse.

In D103009#2956948, @xbolva00 wrote:

I think more code with the “common pattern” sees improvements, so the few cases that are problematic should use a flag or attribute.

Also, the unroller sometimes goes crazy and degrades performance, but nobody talks about reverting, removing, or turning it off.

Revert is the safest thing to do when a real problem has been root caused to the commit and there is no quick and obvious fix. If there's no consensus on introducing a postdominance check with the purpose of making this transformation more conservative, I'd insist on reverting until there's a clear path forward.
I'd probably be fine with using a workaround (-fno-builtin-calloc) for now, but it also disables the completely reasonable and uncontroversial transformation of memset(malloc(n), 0, n) to calloc(1, n), which is a net regression from the state before this commit: https://gcc.godbolt.org/z/Ev64KPs85

And needless to say, "there's a problem elsewhere, why should we do better here?" is not a productive approach ;) If you see a change in unroller pessimizing use cases you care about, it *may* be reasonable to speak up, ask for a revert and figure out how to solve the problem while keeping trunk in a better shape.

In D103009#2956897, @yurai007 wrote:

I will be blunt: I do not understand what the profitability heuristics for this transform are.
When is calloc better than a malloc immediately followed by a memset?

IIRC the main rationale behind this transformation is here: https://stackoverflow.com/questions/2688466/why-mallocmemset-is-slower-than-calloc.
I think I could prepare a benchmark showing the benefits of calloc over malloc+memset; I have already observed reduced RSS and fewer page faults after the transformation on some tests.

Yes, please. A benchmark measuring the benefits of this transformation would be nice. Specifically, it would be interesting to see the differences between the too eager (current) version of this transform vs a more conservative version (to be implemented) with a "memset postdominating malloc" check. For comparison I have a benchmark that shows a 4+x increase in RSS with this transformation in a real production code running in a rather realistic configuration.

However, if a non-standard malloc is used, it may behave differently.

In the post I linked to above (http://smalldatum.blogspot.com/2017/11/a-new-optimization-in-gcc-5x-and-mysql.html) there were measurements of performance impacts of this transformation (when done by GCC) with three different malloc implementations - glibc, tcmalloc, jemalloc. glibc malloc suffered the most (>30% drop in QPS).

there is no quick and obvious fix

There is a flag/attribute.

but it also disables the completely reasonable and uncontroversial transformation of memset(malloc(n), 0, n) to calloc(1, n)

This never worked with llvm.memset.
It also never worked if the pointer from malloc was only checked for null.

p = memset(malloc(..), 0, ..) is a horrible pattern to see in real-world code. I would rather have nothing than optimize this horrible pattern.

impacts of this transformation (when done by GCC)

And so many years have passed and GCC still has it. :) It does not look like there is a storm on the GCC Bugzilla about this “issue” either.

Well, I think users have some ways to avoid this optimization; what we could do is add a note about it to the release notes.

And a small change like:

if (ZERO_FILL)
  point = calloc(1, size);
else
  point = malloc(size);

Looks even better. The fastest.
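Spelled out as a compilable function, the suggestion might look like the sketch below. The name `alloc_bytes` and the boolean parameter are hypothetical stand-ins for whatever the real wrapper uses.

```cpp
#include <cstddef>
#include <cstdlib>

// Chooses the allocator up front instead of calling malloc and then
// conditionally clearing the result.
void *alloc_bytes(std::size_t size, bool zero_fill) {
  if (zero_fill)
    return std::calloc(1, size); // zeroed by the allocator
  return std::malloc(size);      // contents are left indeterminate
}
```

Written this way, there is no malloc + memset pair for the optimizer to merge, so the non-zeroing path cannot be turned into a calloc.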

I am fine with a revert if you post a LangRef change related to this issue (“definition of undef vs malloc”).

I agree with alexfh that this is questionable.
55 lines of complexity and a compile-time hit in return for an "optimization naturally performed by humans, deoptimization/surprise at times, with other hidden costs"?

Other hidden costs: new references to msan/asan/hwasan exclusion. (asan/hwasan are perhaps redundant.) When a new sanitizer is introduced, developers may go over the list of reference sites and adjust accordingly. We either lose test coverage or add some low-value tests.

Justification in real applications will help. I am not sure the benefits can be justified (we will need an additional cost: a post-dominance test and its accompanying test).

+1 to revert.

I'm not sure what LangRef change is being requested. This is a valid transformation, sure. It just makes the code run slower and use more memory.

It goes against the user intent in the code that is explicitly trying to avoid zero-initializing the allocated memory.

IMHO this transformation should only apply if memset can be proven to postdominate malloc *for the entire allocation size* (or at least a large part of it).

Anyway, we have some requests for this transformation:

https://bugs.llvm.org/show_bug.cgi?id=25892
https://bugs.llvm.org/show_bug.cgi?id=47159

(Maybe even more..)

So it would be good to find a solution that works for both “sides” :)

asbirlea mentioned this in D108485: [DSE] Check post-dominance for malloc+memset->calloc transform..Aug 20 2021, 2:06 PM

Sent out https://reviews.llvm.org/D108485 in an attempt to mitigate this without reverting the full transformation.
Please add yourself as a reviewer.

asbirlea mentioned this in rGe8723abf43c3: [DSE] Check post-dominance for malloc+memset->calloc transform..Aug 23 2021, 12:40 PM

yurai007 mentioned this in D110021: [DSE] Re-enable calloc transformation with extra care (PR25892).Sep 18 2021, 3:19 AM

yurai007 mentioned this in rG9e65929a8e2c: [DSE] Re-enable calloc transformation with extra care (PR25892).Oct 10 2021, 12:52 PM

nikic mentioned this in D117005: [DSE] Remove alloc function check in canSkipDef().Jan 11 2022, 3:36 AM

nikic mentioned this in rG00b77d917cd8: [DSE] Remove alloc function check in canSkipDef().Jan 17 2022, 12:25 AM

Revision Contents

Path | Size
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp | 81 lines
llvm/test/Transforms/DeadStoreElimination/noop-stores.ll | 153 lines

Diff 362312

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp

 #include "llvm/Analysis/ValueTracking.h"
 #include "llvm/IR/Argument.h"
 #include "llvm/IR/BasicBlock.h"
 #include "llvm/IR/Constant.h"
 #include "llvm/IR/Constants.h"
 #include "llvm/IR/DataLayout.h"
 #include "llvm/IR/Dominators.h"
 #include "llvm/IR/Function.h"
+#include "llvm/IR/IRBuilder.h"
 #include "llvm/IR/InstIterator.h"
 #include "llvm/IR/InstrTypes.h"
 #include "llvm/IR/Instruction.h"
 #include "llvm/IR/Instructions.h"
 #include "llvm/IR/IntrinsicInst.h"
 #include "llvm/IR/Intrinsics.h"
 #include "llvm/IR/LLVMContext.h"
 #include "llvm/IR/Module.h"
 #include "llvm/IR/PassManager.h"
 #include "llvm/IR/PatternMatch.h"
 #include "llvm/IR/Value.h"
 #include "llvm/InitializePasses.h"
 #include "llvm/Pass.h"
 #include "llvm/Support/Casting.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/DebugCounter.h"
 #include "llvm/Support/ErrorHandling.h"
 #include "llvm/Support/MathExtras.h"
 #include "llvm/Support/raw_ostream.h"
 #include "llvm/Transforms/Scalar.h"
 #include "llvm/Transforms/Utils/AssumeBundleBuilder.h"
+#include "llvm/Transforms/Utils/BuildLibCalls.h"
 #include "llvm/Transforms/Utils/Local.h"
 #include <algorithm>
 #include <cassert>
 #include <cstddef>
 #include <cstdint>
 #include <iterator>
 #include <map>
 #include <utility>
@@ ... @@ memoryIsNotModifiedBetween(Instruction *FirstI, Instruction *SecondI,
   // visit a block with different addresses.
   DenseMap<BasicBlock *, Value *> Visited;

   BasicBlock::iterator FirstBBI(FirstI);
   ++FirstBBI;
   BasicBlock::iterator SecondBBI(SecondI);
   BasicBlock *FirstBB = FirstI->getParent();
   BasicBlock *SecondBB = SecondI->getParent();
-  MemoryLocation MemLoc = MemoryLocation::get(SecondI);
+  MemoryLocation MemLoc;
+  if (auto *MemSet = dyn_cast<MemSetInst>(SecondI))
+    MemLoc = MemoryLocation::getForDest(MemSet);
+  else
+    MemLoc = MemoryLocation::get(SecondI);
[inline comment] nikic (Not Done): I'm a bit surprised that MemoryLocation::get() doesn't work for MemSetInst, as it only has a single memory location, so there's no ambiguity here. @asbirlea @fhahn Do you think it would make sense to adjust the API?

   auto *MemLocPtr = const_cast<Value *>(MemLoc.Ptr);

   // Start checking the SecondBB.
   WorkList.push_back(
       std::make_pair(SecondBB, PHITransAddr(MemLocPtr, DL, nullptr)));
   bool isFirstBlock = true;

   // Check all blocks going backward until we reach the FirstBB.
@@ ... @@ if (const IntrinsicInst *II = dyn_cast<IntrinsicInst>(I)) {
     default:
       return false;
     }
   }
   return false;
 }

 // Check if we can ignore \p D for DSE.
-bool canSkipDef(MemoryDef *D, bool DefVisibleToCaller) {
+bool canSkipDef(MemoryDef *D, bool DefVisibleToCaller,
+                const TargetLibraryInfo &TLI) {
   Instruction *DI = D->getMemoryInst();
   // Calls that only access inaccessible memory cannot read or write any memory
   // locations we consider for elimination.
   if (auto *CB = dyn_cast<CallBase>(DI))
-    if (CB->onlyAccessesInaccessibleMemory())
+    if (CB->onlyAccessesInaccessibleMemory()) {
+      if (isAllocLikeFn(DI, &TLI))
+        return false;
       return true;
+    }
   // We can eliminate stores to locations not visible to the caller across
   // throwing instructions.
   if (DI->mayThrow() && !DefVisibleToCaller)
     return true;

   // We can remove the dead stores, irrespective of the fence and its ordering
   // (release/acquire/seq_cst). Fences only constraints the ordering of
   // already visible stores, it does not make a store visible to other
   // threads. So, skipping over a fence does not change a store from being
   // dead.
   if (isa<FenceInst>(DI))
     return true;

   // Skip intrinsics that do not really read or modify memory.
-  if (isNoopIntrinsic(D->getMemoryInst()))
+  if (isNoopIntrinsic(DI))
     return true;

   return false;
 }

 struct DSEState {
   Function &F;
   AliasAnalysis &AA;
@@ ... @@ for (;; Current = cast<MemoryDef>(Current)->getDefiningAccess()) {
         return Current;
       }

       // Below, check if CurrentDef is a valid candidate to be eliminated by
       // KillingDef. If it is not, check the next candidate.
       MemoryDef *CurrentDef = cast<MemoryDef>(Current);
       Instruction *CurrentI = CurrentDef->getMemoryInst();

-      if (canSkipDef(CurrentDef, !isInvisibleToCallerBeforeRet(DefUO)))
+      if (canSkipDef(CurrentDef, !isInvisibleToCallerBeforeRet(DefUO), TLI))
         continue;

       // Before we try to remove anything, check for any extra throwing
       // instructions that block us from DSEing
       if (mayThrowBetween(KillingI, CurrentI, DefUO)) {
         LLVM_DEBUG(dbgs() << " ... skip, may throw!\n");
         return None;
       }
@@ ... @@ bool storeIsNoop(MemoryDef *Def, const MemoryLocation &DefLoc,
     Constant *StoredConstant = nullptr;
     if (Store)
       StoredConstant = dyn_cast<Constant>(Store->getOperand(0));
     if (MemSet)
       StoredConstant = dyn_cast<Constant>(MemSet->getValue());

     if (StoredConstant && StoredConstant->isNullValue()) {
       auto *DefUOInst = dyn_cast<Instruction>(DefUO);
-      if (DefUOInst && isCallocLikeFn(DefUOInst, &TLI)) {
-        auto *UnderlyingDef = cast<MemoryDef>(MSSA.getMemoryAccess(DefUOInst));
+      if (DefUOInst) {
+        if (isCallocLikeFn(DefUOInst, &TLI)) {
+          auto *UnderlyingDef =
+              cast<MemoryDef>(MSSA.getMemoryAccess(DefUOInst));
           // If UnderlyingDef is the clobbering access of Def, no instructions
           // between them can modify the memory location.
           auto *ClobberDef =
               MSSA.getSkipSelfWalker()->getClobberingMemoryAccess(Def);
           return UnderlyingDef == ClobberDef;
         }
+
+        if (MemSet) {
+          if (F.hasFnAttribute(Attribute::SanitizeMemory) ||
+              F.hasFnAttribute(Attribute::SanitizeAddress) ||
+              F.hasFnAttribute(Attribute::SanitizeHWAddress) ||
[inline comment] yurai007 (Done): That's the root cause of some crashes. We cannot assume that isMallocLikeFn implies malloc. Will fix.
[inline comment] xbolva00 (Done): https://llvm.org/doxygen/MemoryBuiltins_8cpp_source.html So: isMallocLikeFn && !isOpNewLikeFn
[inline comment] yurai007 (Done): OK, maybe it would be more accurate than dyn_cast. I need to double check.
+              F.getName() == "calloc")
+            return false;
+          auto *Malloc = const_cast<CallInst *>(dyn_cast<CallInst>(DefUOInst));
+          if (!Malloc)
+            return false;
+          auto *InnerCallee = Malloc->getCalledFunction();
+          if (!InnerCallee)
+            return false;
[inline comment] xbolva00 (Done): We don't need this check, do we? Given p = malloc(20) followed by memset(p, 0, 10), reading p between 10 and 20 is UB, so with calloc we would have 0s in this area, which is fine. And the reverse case is UB too.
[inline comment] yurai007 (Done): If we permitted to "calloc more than we memset", wouldn't we hide UB in some cases? For example, if we really read uninitialized memory much later, after the memset? The second thing is that GCC doesn't transform malloc to calloc when we memset less memory than malloc allocates: https://godbolt.org/z/Ef94je4KP I'm not saying we should blindly follow them, I'm just not sure what the rationale behind that was.
[inline comment] nikic (Done): If the memset size is different, profitability is also unclear. Converting a malloc into a calloc may always be legal, but if you have malloc(10000) followed by memset(10) it's probably not profitable to actually do it.
[inline comment] xbolva00 (Done): Yeah. Maybe if we know both sizes and the memset is unlikely to be expanded (there is some limit defined somewhere), we should still prefer calloc (1 call) over 2 libcalls?
[inline comment] xbolva00 (Done): But for now, let's start with the condition you already have. Good enough for this patch.
[inline comment] xbolva00 (Done): There is no rule that the compiler cannot "hide" UB. The compiler is free to assume that UB never happens.
[inline comment] yurai007 (Done):
  > If the memset size is different, profitability is also unclear. Converting a malloc into a calloc may be always legal, but if you have malloc(10000) followed by memset(10) it's probably not profitable to actually do it.
  Right.
  > There is no rule that compiler cannot "hide" UB. Compiler is free to assume that UB never happens.
  I'm aware that the compiler assumes UB never happens. My point was that if we permit to "calloc more than we memset", then an uninitialized access may become initialized and _maybe_ sanitizers/memcheck/other tools won't detect it. If that's unrealistic then fine, I agree we shouldn't care.
+          LibFunc Func;
[inline comment] xbolva00 (Done): We should not copy attributes from malloc.
[inline comment] yurai007 (Done): Thanks for pointing this out! It saves me the time of investigating the current backend error from CI. My understanding is that emitCalloc merges the input attribute list with the default calloc attributes. If so, then passing an empty list should be fine.
[inline comment] nikic (Done): It would be clearer to use Malloc instead of DefUOInst here, to match the memoryIsNotModifiedBetween() call.
+          if (!TLI.getLibFunc(*InnerCallee, Func) || !TLI.has(Func) ||
[inline comment] nikic (Done): Is this cast needed? Shouldn't it be an implicit upcast?
+              Func != LibFunc_malloc)
+            return false;
+          if (Malloc->getOperand(0) == MemSet->getLength()) {
+            if (DT.dominates(Malloc, MemSet) &&
+                memoryIsNotModifiedBetween(Malloc, MemSet, BatchAA, DL, &DT)) {
[inline comment] nikic (Done): As you already have an IRBuilder, why not `IRB.getIntPtrTy(DL)`?
+              IRBuilder<> IRB(Malloc);
+              const auto &DL = Malloc->getModule()->getDataLayout();
+              AttributeList EmptyList;
+              if (auto *Calloc = emitCalloc(
+                      ConstantInt::get(IRB.getIntPtrTy(DL), 1),
+                      Malloc->getArgOperand(0), EmptyList, IRB, TLI)) {
[inline comment] nikic (Done): Looks like this doesn't preserve MemorySSA? Try `-passes='dse,verify<memoryssa>'` in the test.
[inline comment] yurai007 (Done): Will check.
[inline comment] yurai007 (Done):
  > Looks like this doesn't preserve MemorySSA? Try -passes='dse,verify<memoryssa>' in the test.
  Right, I missed MemorySSAUpdater. I'm submitting a fix and a related unit test right now.
+                MemorySSAUpdater Updater(&MSSA);
+                auto *LastDef = cast<MemoryDef>(
+                    Updater.getMemorySSA()->getMemoryAccess(Malloc));
+                auto *NewAccess = Updater.createMemoryAccessAfter(
+                    cast<Instruction>(Calloc), LastDef, LastDef);
+                auto *NewAccessMD = cast<MemoryDef>(NewAccess);
+                Updater.insertDef(NewAccessMD, /*RenameUses=*/true);
+                Updater.removeMemoryAccess(Malloc);
+                Malloc->replaceAllUsesWith(Calloc);
+                Malloc->eraseFromParent();
+                return true;
+              }
+              return false;
+            }
+          }
+        }
[inline comment] nikic (Done): Which part here requires the const cast?
[inline comment] yurai007 (Done): Malloc; more precisely, starting from this line: IRBuilder<> IRB(Malloc); We can const_cast later, at the point where Malloc is defined, but we cannot remove it completely: it's still required for the builder and for replaceAllUsesWith/eraseFromParent. Anyway, I moved it to the Malloc definition as that's a more appropriate place.
+      }
     }

     if (!Store)
       return false;

     if (auto *LoadI = dyn_cast<LoadInst>(Store->getOperand(0))) {
       if (LoadI->getPointerOperand() == Store->getOperand(1)) {
         // Get the defining access for the load.

llvm/test/Transforms/DeadStoreElimination/noop-stores.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt < %s -basic-aa -dse -S | FileCheck %s
-; RUN: opt < %s -aa-pipeline=basic-aa -passes=dse -S | FileCheck %s
+; RUN: opt < %s -aa-pipeline=basic-aa -passes='dse,verify<memoryssa>' -S | FileCheck %s
[inline comment] nikic (Done): Please change this RUN line to use `-passes='dse,verify<memoryssa>'`.
 target datalayout = "E-p:64:64:64-a0:0:8-f32:32:32-f64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-v64:64:64-v128:128:128"

-declare i8* @calloc(i64, i64)
 declare void @memset_pattern16(i8*, i8*, i64)

 declare void @llvm.memset.p0i8.i64(i8* nocapture, i8, i64, i1) nounwind
 declare void @llvm.memset.element.unordered.atomic.p0i8.i64(i8* nocapture, i8, i64, i32) nounwind
 declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i1) nounwind
 declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i32) nounwind
 declare void @llvm.init.trampoline(i8*, i8*, i8*)

@@ ... @@ entry:
   %lv = load i32, i32* %x, align 4
   %inc = add nsw i32 %lv, 1
   store i32 %inc, i32* %x, align 4
   store i32 0, i32* %y, align 4
   store i32 %lv, i32* %x, align 4
   ret void
 }

+declare noalias i8* @malloc(i64)
+declare noalias i8* @_Znwm(i64)
+declare void @clobber_memory(float*)
+
+; based on pr25892_lite
+define i8* @zero_memset_after_malloc(i64 %size) {
+; CHECK-LABEL: @zero_memset_after_malloc(
+; CHECK-NEXT: [[CALL:%.*]] = call i8* @calloc(i64 1, i64 [[SIZE:%.*]])
+; CHECK-NEXT: ret i8* [[CALL]]
+;
+  %call = call i8* @malloc(i64 %size) inaccessiblememonly
+  call void @llvm.memset.p0i8.i64(i8* %call, i8 0, i64 %size, i1 false)
+  ret i8* %call
+}
+
+; based on pr25892_lite
+define i8* @zero_memset_after_malloc_with_intermediate_clobbering(i64 %size) {
+; CHECK-LABEL: @zero_memset_after_malloc_with_intermediate_clobbering(
+; CHECK-NEXT: [[CALL:%.*]] = call i8* @malloc(i64 [[SIZE:%.*]])
+; CHECK-NEXT: [[BC:%.*]] = bitcast i8* [[CALL]] to float*
+; CHECK-NEXT: call void @clobber_memory(float* [[BC]])
+; CHECK-NEXT: call void @llvm.memset.p0i8.i64(i8* [[CALL]], i8 0, i64 [[SIZE]], i1 false)
+; CHECK-NEXT: ret i8* [[CALL]]
+;
+  %call = call i8* @malloc(i64 %size) inaccessiblememonly
+  %bc = bitcast i8* %call to float*
+  call void @clobber_memory(float* %bc)
+  call void @llvm.memset.p0i8.i64(i8* %call, i8 0, i64 %size, i1 false)
+  ret i8* %call
[inline comment] nikic (Done): As it was the motivating case, I'd also expect a test where the memset is in a different block. I also don't think that the dominates() condition in your implementation is exercised by tests. Some other conditions aren't either, for example malloc and memset having different sizes.
[inline comment] yurai007 (Done):
  > I also don't think that the dominates() condition in your implementation is exercised by tests. Some other conditions aren't either, for example malloc and memset having different sizes.
  Sure, I can add many more tests to cover more conditions.
  > As it was the motivating case, I'd also expect a test where the memset is in a different block.
  Currently, with this patch, DSE cannot perform such a transformation. Consider the original pr25892 from https://github.com/llvm/llvm-project/blob/main/llvm/test/Transforms/InstCombine/memset-1.ll. IIRC the reason is that the malloc block (entry) doesn't have a related local store to the malloced memory, so DSE cannot find the malloc-memset pair as a candidate for elimination. I can provide more details if you want, but when I checked it last time I simply concluded that more effort in DSE would be needed to make it work across blocks.
[inline comment] yurai007 (Done):
  > Currently with this patch DSE cannot perform such transformation.(...)
  After adding the original PR unit test I noticed that DSE can actually perform the transformation across blocks now. When I checked it before it couldn't; apparently changes made in the meantime unlocked it. Well, that's good :)
+}

		; based on pr25892_lite
		define i8* @zero_memset_after_malloc_with_different_sizes(i64 %size) {
		; CHECK-LABEL: @zero_memset_after_malloc_with_different_sizes(
		; CHECK-NEXT: [[CALL:%.]] = call i8 @malloc(i64 [[SIZE:%.*]])
		; CHECK-NEXT: [[SIZE2:%.*]] = add nsw i64 [[SIZE]], -1
		; CHECK-NEXT: call void @llvm.memset.p0i8.i64(i8* [[CALL]], i8 0, i64 [[SIZE2]], i1 false)
		; CHECK-NEXT: ret i8* [[CALL]]
		;
		%call = call i8* @malloc(i64 %size) inaccessiblememonly
		%size2 = add nsw i64 %size, -1
		call void @llvm.memset.p0i8.i64(i8* %call, i8 0, i64 %size2, i1 false)
		ret i8* %call
		}

		; based on pr25892_lite
		define i8* @zero_memset_after_new(i64 %size) {
		; CHECK-LABEL: @zero_memset_after_new(
		; CHECK-NEXT: [[CALL:%.]] = call i8 @_Znwm(i64 [[SIZE:%.*]])
		; CHECK-NEXT: call void @llvm.memset.p0i8.i64(i8* [[CALL]], i8 0, i64 [[SIZE]], i1 false)
		; CHECK-NEXT: ret i8* [[CALL]]
		;
		%call = call i8* @_Znwm(i64 %size)
		call void @llvm.memset.p0i8.i64(i8* %call, i8 0, i64 %size, i1 false)
		ret i8* %call
		}

		; This should not create a calloc and should not crash the compiler.
		define i8* @notmalloc_memset(i64 %size, i8*(i64)* %notmalloc) {
		; CHECK-LABEL: @notmalloc_memset(
		; CHECK-NEXT: [[CALL1:%.*]] = call i8* [[NOTMALLOC:%.*]](i64 [[SIZE:%.*]])
		; CHECK-NEXT: call void @llvm.memset.p0i8.i64(i8* [[CALL1]], i8 0, i64 [[SIZE]], i1 false)
		; CHECK-NEXT: ret i8* [[CALL1]]
		;
		%call1 = call i8* %notmalloc(i64 %size)
		call void @llvm.memset.p0i8.i64(i8* %call1, i8 0, i64 %size, i1 false)
		ret i8* %call1
		}

		; This should not create recursive call to calloc.
		define i8* @calloc(i64 %nmemb, i64 %size) {
		; CHECK-LABEL: @calloc(
		; CHECK: entry:
		; CHECK-NEXT: [[MUL:%.*]] = mul i64 [[SIZE:%.*]], [[NMEMB:%.*]]
		; CHECK-NEXT: [[CALL:%.*]] = tail call noalias align 16 i8* @malloc(i64 [[MUL]])
		; CHECK-NEXT: [[TOBOOL_NOT:%.*]] = icmp eq i8* [[CALL]], null
		; CHECK-NEXT: br i1 [[TOBOOL_NOT]], label [[IF_END:%.*]], label [[IF_THEN:%.*]]
		; CHECK: if.then:
		; CHECK-NEXT: tail call void @llvm.memset.p0i8.i64(i8* nonnull align 16 [[CALL]], i8 0, i64 [[MUL]], i1 false)
		; CHECK-NEXT: br label [[IF_END]]
		; CHECK: if.end:
		; CHECK-NEXT: ret i8* [[CALL]]
		;
		entry:
		%mul = mul i64 %size, %nmemb
		%call = tail call noalias align 16 i8* @malloc(i64 %mul)
		%tobool.not = icmp eq i8* %call, null
		br i1 %tobool.not, label %if.end, label %if.then

		if.then: ; preds = %entry
		tail call void @llvm.memset.p0i8.i64(i8* nonnull align 16 %call, i8 0, i64 %mul, i1 false)
		br label %if.end

		if.end: ; preds = %if.then, %entry
		ret i8* %call
		}

		define float* @pr25892(i64 %size) {
		; CHECK-LABEL: @pr25892(
		; CHECK: entry:
		; CHECK-NEXT: [[CALL:%.*]] = call i8* @calloc(i64 1, i64 [[SIZE:%.*]])
		; CHECK-NEXT: [[CMP:%.*]] = icmp eq i8* [[CALL]], null
		; CHECK-NEXT: br i1 [[CMP]], label [[CLEANUP:%.*]], label [[IF_END:%.*]]
		; CHECK: if.end:
		; CHECK-NEXT: [[BC:%.*]] = bitcast i8* [[CALL]] to float*
		; CHECK-NEXT: br label [[CLEANUP]]
		; CHECK: cleanup:
		; CHECK-NEXT: [[RETVAL_0:%.*]] = phi float* [ [[BC]], [[IF_END]] ], [ null, [[ENTRY:%.*]] ]
		; CHECK-NEXT: ret float* [[RETVAL_0]]
		;
		entry:
		%call = call i8* @malloc(i64 %size) inaccessiblememonly
		%cmp = icmp eq i8* %call, null
		br i1 %cmp, label %cleanup, label %if.end
		if.end:
		%bc = bitcast i8* %call to float*
		call void @llvm.memset.p0i8.i64(i8* %call, i8 0, i64 %size, i1 false)
		br label %cleanup
		cleanup:
		%retval.0 = phi float* [ %bc, %if.end ], [ null, %entry ]
		ret float* %retval.0
		}

		define float* @pr25892_with_extra_store(i64 %size) {
		; CHECK-LABEL: @pr25892_with_extra_store(
		; CHECK: entry:
		; CHECK-NEXT: [[CALL:%.*]] = call i8* @calloc(i64 1, i64 [[SIZE:%.*]])
		; CHECK-NEXT: [[CMP:%.*]] = icmp eq i8* [[CALL]], null
		; CHECK-NEXT: br i1 [[CMP]], label [[CLEANUP:%.*]], label [[IF_END:%.*]]
		; CHECK: if.end:
		; CHECK-NEXT: [[BC:%.*]] = bitcast i8* [[CALL]] to float*
		; CHECK-NEXT: br label [[CLEANUP]]
		; CHECK: cleanup:
		; CHECK-NEXT: [[RETVAL_0:%.*]] = phi float* [ [[BC]], [[IF_END]] ], [ null, [[ENTRY:%.*]] ]
		; CHECK-NEXT: ret float* [[RETVAL_0]]
		;
		entry:
		%call = call i8* @malloc(i64 %size) inaccessiblememonly
		%cmp = icmp eq i8* %call, null
		br i1 %cmp, label %cleanup, label %if.end
		if.end:
		%bc = bitcast i8* %call to float*
		call void @llvm.memset.p0i8.i64(i8* %call, i8 0, i64 %size, i1 false)
		store i8 0, i8* %call, align 1
		br label %cleanup
		cleanup:
		%retval.0 = phi float* [ %bc, %if.end ], [ null, %entry ]
		ret float* %retval.0
		}
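
For readers less used to reading the IR, the pr25892 tests above correspond roughly to the source-level pattern below. This is only an illustrative sketch, not code from the patch: the function names `alloc_zeroed_before` and `alloc_zeroed_after` are hypothetical, and the second function merely shows what the expected `calloc(1, size)` output is equivalent to at the source level.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: the shape of the input IR in @pr25892.
// malloc in the entry block, a null check, then a memset of the
// same size in a different block.
float *alloc_zeroed_before(std::size_t size) {
  void *p = std::malloc(size);
  if (p == nullptr)
    return nullptr;
  std::memset(p, 0, size); // zero fill covers the whole allocation
  return static_cast<float *>(p);
}

// Hypothetical helper: what the CHECK lines expect after the transform,
// i.e. a single calloc(1, size) and no memset.
float *alloc_zeroed_after(std::size_t size) {
  return static_cast<float *>(std::calloc(1, size));
}
```

Both helpers return either null or a buffer of `size` zero bytes, which is why the pair can be folded. If the memset covered fewer bytes than the malloc, as in zero_memset_after_malloc_with_different_sizes, the two forms would no longer be interchangeable and the test expects no change.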

		; PR50143
		define i8* @store_zero_after_calloc_inaccessiblememonly() {
		; CHECK-LABEL: @store_zero_after_calloc_inaccessiblememonly(
		; CHECK-NEXT: [[CALL:%.*]] = tail call i8* @calloc(i64 1, i64 10) #[[ATTR6:[0-9]+]]
		; CHECK-NEXT: store i8 0, i8* [[CALL]], align 1
		; CHECK-NEXT: ret i8* [[CALL]]
		;
		%call = tail call i8* @calloc(i64 1, i64 10) inaccessiblememonly