This is an archive of the discontinued LLVM Phabricator instance.

[DSE] Re-enable calloc transformation with extra care (PR25892)
ClosedPublic

Authored by yurai007 on Sep 18 2021, 3:19 AM.

Details

Summary

Transformation from malloc+memset to calloc is always correct and in many situations
it brings significant observable benefits in terms of execution speed and memory consumption [1][2].
Unfortunately there are cases when producing calloc cause performance drops [3].
As discussed here: https://reviews.llvm.org/D103009 it's possible to differentiate between those 2 scenarios.
If optimizer is able to prove that after malloc call it's _very_ likely to reach memset branch then after
calloc emission we shouldn't observe any performance hits. Therefore finding "null pointer check" pattern
before memset basic block sounds like good justification for performing transformation.
Also that method was already suggested by GCC folks [4]. Main reason for change is that for now
to be safe we check for post dominance relation which is way too conservative approach making transformation
"almost" disabled in practice. This patch tends to enable transformation again but with extra care.

[1] https://stackoverflow.com/questions/2688466/why-mallocmemset-is-slower-than-calloc
[2] https://vorpus.org/blog/why-does-calloc-exist/
[3] http://smalldatum.blogspot.com/2017/11/a-new-optimization-in-gcc-5x-and-mysql.html
[4] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83022

Diff Detail

Event Timeline

yurai007 created this revision.Sep 18 2021, 3:19 AM
yurai007 requested review of this revision.Sep 18 2021, 3:19 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 18 2021, 3:19 AM
xbolva00 added inline comments.Sep 18 2021, 3:39 AM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1864

m_Specific(Ptr->stripPointerCasts())),

Do we need it? Can it happen from C source?

float * ptr = (float*)malloc(N * sizeof(float));
                if (ptr != (float*)NULL) {
                    buf[i] = ptr;
                    memset(ptr,'\0',N);
                }

We have there:

%5 = tail call noalias align 16 i8* @malloc(i64 %4) #3
%6 = icmp eq i8* %5, null
1868

You dont need to check for NE, as EQ is canonical. EQ is NE with swapped BBs.

1870

Check that BB2 (False BB) must be MemsetBB?

xbolva00 added inline comments.Sep 18 2021, 3:40 AM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1855

hmm.. maybe rename this lambda as "shouldCreateCalloc" or "isProfitableToCreateCalloc" or something like that?

yurai007 added inline comments.Sep 20 2021, 5:16 AM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1864

Right, given that memset always takes i8* (modulo alignment) no reason to assume in-between casting.

yurai007 updated this revision to Diff 373557.Sep 20 2021, 5:17 AM
yurai007 marked 4 inline comments as done.

Addressed comments.

Thanks, looks fine

@asbirlea?

nikic accepted this revision.Oct 10 2021, 7:09 AM

LGTM

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1862–1864
This revision is now accepted and ready to land.Oct 10 2021, 7:09 AM
yurai007 updated this revision to Diff 378519.Oct 10 2021, 10:57 AM
yurai007 updated this revision to Diff 378520.Oct 10 2021, 11:08 AM

Just poked Buildkite with exactly same change to make sure it's green.