Page MenuHomePhabricator

[Coroutine] Move all used local allocas to the .resume function
ClosedPublic

Authored by lxfind on Nov 6 2020, 2:56 PM.

Details

Summary

Prior to D89768, any alloca that's used after suspension points will be put on to the coroutine frame, and hence they will always be reloaded in the resume function.
However D89768 introduced a more precise way to determine whether an alloca should live on the frame. Allocas that are only used within one suspension region (hence does not need to live across suspension points) will not be put on the frame. They will remain local to the resume function.
When creating the new entry for the .resume function, the existing logic only moved all the allocas from the old entry to the new entry. This covers every alloca from the old entry. However allocas that's defined afer coro.begin are put into a separate basic block during CoroSplit (the PostSpill basic block). We need to make sure these allocas are moved to the new entry as well if they are used.
This patch walks through all allocas, and check if they are still used but are not reachable from the new entry, if so, we move them to the new entry.
This should fix the bug reported in D89768

Diff Detail

Unit TestsFailed

TimeTest
380 mslinux > HWAddressSanitizer-x86_64.TestCases::sizes.cpp
Script: -- : 'RUN: at line 3'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang --driver-mode=g++ -m64 -gline-tables-only -fsanitize=hwaddress -fuse-ld=lld -mcmodel=large -mllvm -hwasan-globals -mllvm -hwasan-use-short-granules -mllvm -hwasan-instrument-landing-pads=0 -mllvm -hwasan-instrument-personality-functions /mnt/disks/ssd0/agent/llvm-project/compiler-rt/test/hwasan/TestCases/sizes.cpp -nostdlib++ -lstdc++ -o /mnt/disks/ssd0/agent/llvm-project/build/projects/compiler-rt/test/hwasan/X86_64/TestCases/Output/sizes.cpp.tmp
480 mswindows > LLVM.CodeGen/AMDGPU::ds_read2.ll
Script: -- : 'RUN: at line 2'; c:\ws\w16n2-1\llvm-project\premerge-checks\build\bin\llc.exe -march=amdgcn -mcpu=bonaire -verify-machineinstrs -mattr=+load-store-opt < C:\ws\w16n2-1\llvm-project\premerge-checks\llvm\test\CodeGen\AMDGPU\ds_read2.ll | c:\ws\w16n2-1\llvm-project\premerge-checks\build\bin\filecheck.exe -enable-var-scope -check-prefixes=GCN,CI C:\ws\w16n2-1\llvm-project\premerge-checks\llvm\test\CodeGen\AMDGPU\ds_read2.ll

Event Timeline

lxfind created this revision.Nov 6 2020, 2:56 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 6 2020, 2:56 PM
lxfind requested review of this revision.Nov 6 2020, 2:56 PM
lxfind updated this revision to Diff 303561.Nov 6 2020, 3:02 PM

update description

lxfind updated this revision to Diff 303562.Nov 6 2020, 3:03 PM
lxfind edited the summary of this revision. (Show Details)

update description

lxfind updated this revision to Diff 303563.Nov 6 2020, 3:04 PM
lxfind edited the summary of this revision. (Show Details)

rebase

Harbormaster completed remote builds in B77944: Diff 303562.
jrprice added a subscriber: jrprice.Nov 9 2020, 5:32 AM

I can confirm that this fixes the crash we've been seeing - thank you!

lxfind added a comment.Nov 9 2020, 1:07 PM

I can confirm that this fixes the crash we've been seeing - thank you!

Cool. I also noticed that your code have quite lots of allocas, which makes it a great example to benchmark the effect of memory savings from D89768.
I was wondering, comparing to before D89768 and now, how much reduction do you see in the coroutine frame size in your codebase (i.e. the eventual argument to @coroutine_alloc_frame)? I would really appreciate if you could share some insights on the magnitude of that.

junparser accepted this revision.Nov 9 2020, 5:00 PM

LGTM, Thanks!

This revision is now accepted and ready to land.Nov 9 2020, 5:00 PM
This revision was landed with ongoing or failed builds.Nov 9 2020, 5:25 PM
This revision was automatically updated to reflect the committed changes.