This is an archive of the discontinued LLVM Phabricator instance.

[MemCpyOpt] Optimize MemoryDef insertion
ClosedPublic

Authored by nikic on Aug 7 2021, 2:16 PM.

Details

Summary

When converting a store into a memset, we currently insert the new MemoryDef after the store MemoryDef, which requires all uses to be renamed to the new def using a whole block scan. Instead, we can insert the new MemoryDef before the store and not rename uses, because we know that the location is immediately overwritten, so all uses should still refer to the old MemoryDef. Those uses will get renamed when the old MemoryDef is actually dropped, which is efficient.

I expect something similar can be done for some of the other MSSA updates in MemCpyOpt. This may be an alternative to D107513, at least for this particular case.

Diff Detail

Event Timeline

nikic created this revision.Aug 7 2021, 2:16 PM
nikic requested review of this revision.Aug 7 2021, 2:16 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 7 2021, 2:16 PM
aeubanks added a comment.EditedAug 9 2021, 12:27 PM

this does make memcpyopt not insanely slow for large functions

import sys

count = int(sys.argv[1])

with open(sys.argv[2], 'w') as f:
    for i in xrange(count):
        f.write('@g{} = global [2 x i64] zeroinitializer\n'.format(i))
    f.write('define void @a() {\n')
    for i in xrange(count):
        f.write('  store [2 x i64] zeroinitializer, [2 x i64]* @g{}\n'.format(i))
    f.write('  ret void\n')
    f.write('}\n')
$ python /tmp/a.py 100000 /tmp/a.ll
$ opt -passes=memcpyopt -disable-output /tmp/a.ll

for a function with 100000 stores, it previously took 88s, with this patch it takes 1s

asbirlea accepted this revision.Aug 9 2021, 5:27 PM

Thanks, this LGTM, and it does resolve the efficiency case.

This revision is now accepted and ready to land.Aug 9 2021, 5:27 PM
This revision was automatically updated to reflect the committed changes.