This is an archive of the discontinued LLVM Phabricator instance.

[SimplifyLibCalls] Transform malloc to calloc with redundant memsets elimination (PR25892)
AbandonedPublic

Authored by yurai007 on Apr 23 2021, 9:15 AM.

Details

Summary

Before this change LLVM cannot perform any of following transformation:

memset(malloc(x), 0, x) --> calloc(1, x)
memset(calloc(1, x), 0, x) --> calloc(1, x)

However GCC is able to do that: https://godbolt.org/z/Ef94je4KP

Both transformations are connected especially when malloc is followed by more then one memset.
Therefore both transformations can be handled together in same code paths by SimplifyLibCalls.
It's very common to have malloc and memset in different blocks like in addressed PR:
https://bugs.llvm.org/show_bug.cgi?id=25892
This patch detects common pattern (icmp+br) and examine affected basic blocks conservatively looking for
intermediate store-like instructions. The change is loosely based on previous PR-related work:
https://reviews.llvm.org/D93360 and https://reviews.llvm.org/D16337.

TODO: Integrate with DSE.
TODO: Extract tests into NFC part.

Diff Detail

Event Timeline

yurai007 created this revision.Apr 23 2021, 9:15 AM
yurai007 requested review of this revision.Apr 23 2021, 9:15 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 23 2021, 9:15 AM
yurai007 edited the summary of this revision. (Show Details)Apr 23 2021, 9:15 AM
yurai007 added inline comments.
llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
1207

It's lightweight but very conservative approach. Anyway it's good enough to transform many simple cases (take a look at tests to get idea)
and in particular to satisfy PR case. However we could use alias analysis instead simple scan and in consequence transform more complex scenarios with intermediate store-like instructions. I guess it's feasible but also more intrusive way - keep in in mind currently SimplifyLibCalls knows nothing about AA. Anyway I can go in this direction if you are interested.

I am still not sure if this is the place for such transformation (just look how many new lines of code are needed...)

cc @fhahn @efriedma @nikic as they are more experienced and may recommend better place for this transformation.

I would think this naturally slots into some pass that's already doing the relevant dependence analysis, like DSE or memcpyopt.

@efriedma @xbolva00 : Ok. I can try to integrate this with memcpyopt or DCE. It should simplify scanning stores part. Not sure if it's possible to get rid of rest.
For example we still need to perform malloc/calloc checks (for now memcpyopt knows nothing about allocation functions) and some kind of matching for icmp+br pattern.

I meant *DSE ofc.

some kind of matching for icmp+br pattern.

Do you actually need to check this? The transform is correct either way... although maybe it's a bit dubious to transform malloc->calloc if the memset is rarely executed and/or smaller than the allocation.

DSE currently includes optimizations for memset of calloc.

Do you actually need to check this? The transform is correct either way... although maybe it's a bit dubious to transform malloc->calloc if the memset is rarely executed and/or smaller than the allocation.

It may be redundant. Patch is inspired by previous work and by GCC solution: https://patchwork.ozlabs.org/project/gcc/patch/alpine.DEB.2.02.1402282337290.27023@stedding.saclay.inria.fr/ .
In both cases they do such matching which doesn't mean yet it's crucial.

DSE currently includes optimizations for memset of calloc.

I assume you mean recent: https://reviews.llvm.org/D101391. Yes, it would be good to integrate this change with DSE on top of D101391.

yurai007 edited the summary of this revision. (Show Details)May 7 2021, 4:55 AM

Hello, some days ago I was looking at it.

proof of concept code: https://pastebin.com/V8m1v03a

but then I hit same issue as for calloc, you can read about it here: https://reviews.llvm.org/D101440, especially my last comments: https://reviews.llvm.org/D101440#inline-959170

We fails to find def - malloc call, since malloc is marked with inaccessiblememonly. Looks like inaccessiblememonly does not fit well for allocation functions.

Obvious fix is to remove inaccessiblememonly from (m)alloc functions and my POC code should work.. But maybe we could find better solution?

Opinions?

cc @fhahn @nikic @jdoerfert

Hi, I found time to look into it and put response in original chain: https://reviews.llvm.org/D101440#inline-959170

Abandoned because of better approach: https://reviews.llvm.org/D103009

yurai007 abandoned this revision.May 31 2021, 11:21 AM