Page MenuHomePhabricator

[DSE] Transform memset + malloc --> calloc (PR25892)
ClosedPublic

Authored by yurai007 on May 24 2021, 3:04 AM.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
yurai007 marked 2 inline comments as done.May 31 2021, 11:29 AM

@xbolva00 @nikic @hubert.reinterpretcast: I think now after addressing all comments and passing on CI it's in good shape.

nikic added inline comments.Jun 9 2021, 1:34 PM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
514

I'm a bit surprised that MemoryLocation::get() doesn't work for MemSetInst, as it only has a single memory location, so there's no ambiguity here. @asbirlea @fhahn Do you think it would make sense to adjust the API?

1829–1880

Which part here requires the const cast?

1853

Is this cast needed? Shouldn't it be an implicit upcast?

1858

As you already have an IRBuilder, why not IRB.getIntPtrTy(DL)?

1864

Looks like this doesn't preserve MemorySSA? Try -passes='dse,verify<memoryssa>' in the test.

llvm/test/Transforms/DeadStoreElimination/noop-stores.ll
339

As it was the motivating case, I'd also expect a test where the memset is in a different block.

I also don't think that the dominates() condition in your implementation is exercised by tests. Some other conditions aren'T either, for example malloc and memset having different sizes.

yurai007 updated this revision to Diff 351131.Jun 10 2021, 4:41 AM
yurai007 marked 2 inline comments as done.Jun 10 2021, 4:47 AM
yurai007 added inline comments.
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1829–1880

Malloc, more precisely starting from this line: IRBuilder<> IRB(Malloc);
We can const_cast later, at time of Malloc definition but we cannot remove it completely - it's still required for Builder and replaceAllUsesWith/eraseFromParent.
Anyway, I moved it to Malloc definition as it's more appropriate place.

1864

Will check.

llvm/test/Transforms/DeadStoreElimination/noop-stores.ll
339

I also don't think that the dominates() condition in your implementation is exercised by tests. Some other conditions aren'T either, for example malloc and memset having different sizes.

Sure, I can add much more tests to cover more conditions.

As it was the motivating case, I'd also expect a test where the memset is in a different block.

Currently with this patch DSE cannot perform such transformation. Consider original pr25892 from https://github.com/llvm/llvm-project/blob/main/llvm/test/Transforms/InstCombine/memset-1.ll.
IIRC the reason is that malloc block - entry doesn't have related local store to malloced memory so DSE cannot find malloc-memset pair as candidate for elimination. I can provide more details if you want but when I checked it last time I simply concluded that more effort in DSE would be needed to make it work across blocks.

yurai007 updated this revision to Diff 351393.Jun 11 2021, 3:42 AM

More tests.

yurai007 added inline comments.Jun 11 2021, 3:48 AM
llvm/test/Transforms/DeadStoreElimination/noop-stores.ll
339

Currently with this patch DSE cannot perform such transformation.(...)

After adding original PR unit test I noticed actually now DSE can perform transformation across blocks. When I checked it before it couldn't, apparently meantime changes unlocked it. Well, that's good :)

yurai007 added inline comments.Jun 11 2021, 5:32 AM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1864

Looks like this doesn't preserve MemorySSA? Try -passes='dse,verify<memoryssa>' in the test.

Right, missed MemorySSAUpdater. I'm submitting fix + related UT right now.

yurai007 updated this revision to Diff 351417.Jun 11 2021, 5:36 AM
yurai007 retitled this revision from [DSE] Use calloc for memset+malloc to [DSE] Transform memset + malloc --> calloc (PR25892).
yurai007 edited the summary of this revision. (Show Details)
yurai007 marked 3 inline comments as done.Jun 11 2021, 6:22 AM

@nikic: All comments were addressed.

xbolva00 added inline comments.Jun 15 2021, 3:08 AM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1851

We dont need this check, do we?

p = malloc(20)
memset(p, 0, 10)

Reading p between 10 and 20 is UB, so with calloc we would have 0s in this area so fine.

And reverse case is UB too.

yurai007 added inline comments.Jun 15 2021, 3:46 AM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1851

If we permitted to "calloc more than we memset" wouldn't we hide UB in some cases?
Like if we would really read unitinitialized memory much later after memset?
The second thing is that GCC doesn't transform malloc to calloc when we memset less memory than malloc allocates: https://godbolt.org/z/Ef94je4KP I'm not saying we should blindly follow them, I'm just not sure what was rationale behind that.

nikic added inline comments.Jun 15 2021, 3:56 AM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1851

If the memset size is different, profitability is also unclear. Converting a malloc into a calloc may be always legal, but if you have malloc(10000) followed by memset(10) it's probably not profitable to actually do it.

xbolva00 added inline comments.Jun 15 2021, 3:59 AM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1851

There is no rule that compiler cannot “hide” UB. Compiler is free to assume that UB never happens.

xbolva00 added inline comments.Jun 15 2021, 4:04 AM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1851

Yeah.

Maybe if we know both sizes and memset is unlikely to be expanded (there is some limit defined samowhere), we should still prefer calloc (1 call) than 2 libcalls?

xbolva00 added inline comments.Jun 15 2021, 4:05 AM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1851

But for now, let’s start with that condition you already have.

Good enough for this patch.

yurai007 added inline comments.Jun 15 2021, 4:32 AM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1851

If the memset size is different, profitability is also unclear. Converting a malloc into a calloc may be always legal, but if you have malloc(10000) followed by memset(10) it's probably not profitable to actually do it.

Right.

There is no rule that compiler cannot “hide” UB. Compiler is free to assume that UB never happens.

I'm aware that compiler assume UB never happens. My point was that if we permit to "calloc more than we memset" then uninitialized access may become initialized and _maybe_ sanitizers/memcheck/other tools won't detect it. If it's unreal then fine, I agree we shouldn't care.

I played a little bit with patch and checked how it performs with GCC unit tests added together with similar change in GCC on 2014.

In first case: https://github.com/gcc-mirror/gcc/blob/master/gcc/testsuite/gcc.dg/tree-ssa/calloc-1.c malloc is transformed to calloc like for GCC which is good.
In LLVM case on assembly level register pressure is very similar to GCC (especially for f() function) which means that issue mentioned by Haneef in PR discussion
(https://bugs.llvm.org/show_bug.cgi?id=25892#c1) seems to be mitigated now.

However the second C++-like case: https://github.com/gcc-mirror/gcc/blob/master/gcc/testsuite/g++.dg/tree-ssa/calloc.C is not transformed.
Here CFG is more complex and I guess that DSE simply doesn't detect malloc - memset pair because they aren't in neighboring blocks.
Apparently there is still room for improvements but let's leave it for future patches :)

yurai007 marked 5 inline comments as done.Jun 17 2021, 3:30 AM

@nikic @xbolva00: All comments addressed.

No comments from me. Someone more familiar with DSE should LGTM it.

@xbolva00: Ok. Added more reviewers.

nikic accepted this revision.Jun 27 2021, 1:25 PM

LGTM

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1852

It would be clearer to use Malloc instead of DefUOInst here, to match the memoryIsNotModifiedBetween() call.

llvm/test/Transforms/DeadStoreElimination/noop-stores.ll
3

Please change this RUN line to use -passes='dse,verify<memoryssa>'.

This revision is now accepted and ready to land.Jun 27 2021, 1:25 PM
yurai007 updated this revision to Diff 354868.Jun 28 2021, 6:02 AM
yurai007 marked 2 inline comments as done.

Rebased and addressed yesterday comments.

yurai007 updated this revision to Diff 354987.Jun 28 2021, 12:24 PM

@nikic: Thanks for your review and LGTM it. Looks like I cannot push it myself to main: "Permission to llvm/llvm-project.git denied to yurai007".

This revision was automatically updated to reflect the committed changes.
This revision is now accepted and ready to land.Fri, Jul 9, 6:28 AM

Hello @kcc @eugenis @pgousseau, sorry for bothering you. I added you to this review because transformation introduced in this change breaks msan_test (memcpy_unaligned/TestUnalignedMemcpy unit test).
I'm quite convinced that after my change Clang does what GCC would do if compliled msan_test.cpp file with -Ofast: https://godbolt.org/z/f7s81hjaM
Therefore I'm pretty sure that transformation works correctly (as on GCC) but it simply doesn't play well with MSan.
Since I'm not MSan expert it would be great if you could take a look on this and confirm whether or not my understanding of issue is correct.

Hello @kcc @eugenis @pgousseau, sorry for bothering you. I added you to this review because transformation introduced in this change breaks msan_test (memcpy_unaligned/TestUnalignedMemcpy unit test).
I'm quite convinced that after my change Clang does what GCC would do if compliled msan_test.cpp file with -Ofast: https://godbolt.org/z/f7s81hjaM
Therefore I'm pretty sure that transformation works correctly (as on GCC) but it simply doesn't play well with MSan.
Since I'm not MSan expert it would be great if you could take a look on this and confirm whether or not my understanding of issue is correct.

Right, so this change replaces malloc with calloc in

if (src_is_poisoned)
  src_origin = __msan_get_origin(src);
else
  memset(src, 0, sz);

because the other branch contains UB.
The test can be fixed by adding __msan_allocated_memory(ptr, size) before the call to __msan_get_origin, but I'd rather disable this optimization in functions with sanitize_memory attribute because it could make us miss bugs.

If possible, it's OK to disable only the CFG-aware part of the opt. I.e. malloc + memset in linear code (or even same BB) is fair game.

yurai007 updated this revision to Diff 359297.Fri, Jul 16, 5:57 AM

Fixed Msan.

Thanks. I updated patch and now optimization is disabled for functions with sanitize_memory attribute.
Detecting linear pattern would be possible but I'm not convinced it's worth effort. Anyway now it should work with MSan.

yurai007 updated this revision to Diff 360028.Tue, Jul 20, 12:10 AM
This revision was landed with ongoing or failed builds.Tue, Jul 20, 2:51 AM
This revision was automatically updated to reflect the committed changes.

@yurai007 Shouldn't we detect that we are implementing 'calloc' and bail out if so ? Just like we do for 'memset' ?
(See https://www.godbolt.org/z/xnMa9bj4r )

@jeroen.dobbelaere: Yes, I think DSE should special case calloc to avoid infinite recursion (like it does for memset) in libc++. Thanks for catching this. I'm reverting change to fix.

*Meant libc, not libc++ but nevermind.

delcypher added subscribers: yuri, delcypher.EditedFri, Jul 23, 1:05 PM

@yurai007 I think this patch may have broken the compiler-rt/test/asan/TestCases/heap-overflow.cpp test. This test is failing on our internal bots. The test fails because it expects to see malloc at -O2 in the stacktrace but we are seeing calloc() instead.

@delcypher: Thanks for letting me know. Now patch is reverted, Before submitting again I will fix heap-overflow.cpp test.

yurai007 updated this revision to Diff 361985.Tue, Jul 27, 5:13 AM

Don't transform when function is calloc or it's instrumented with ASan.

yurai007 updated this revision to Diff 362312.Wed, Jul 28, 2:45 AM

Disabled transformation on HWASan as well. Maybe it's too paranoid but I don't have AAarch64 hardware to verify.