This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen][ARM] Implement atomicrmw as pseudo operations at -O0
ClosedPublic

Authored by tmatheson on Apr 23 2021, 7:25 AM.

Details

Summary

atomicrmw instructions are expanded by AtomicExpandPass before register allocation
into cmpxchg loops. Register allocation can insert spills between the exclusive loads
and stores, which invalidates the exclusive monitor and can lead to infinite loops.

To avoid this, reimplement atomicrmw operations as pseudo-instructions and expand them
after register allocation.

Floating point legalisation:
f16 ATOMIC_LOAD_FADD(*f16, f16) is legalised to
f32 ATOMIC_LOAD_FADD(*i16, f32) and then eventually
f32 ATOMIC_LOAD_FADD_16(*i16, f32)

Diff Detail

Event Timeline

tmatheson created this revision.Apr 23 2021, 7:25 AM
tmatheson requested review of this revision.Apr 23 2021, 7:25 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 23 2021, 7:25 AM
tmatheson updated this revision to Diff 340452.Apr 26 2021, 2:02 AM

Fix bad merge

tmatheson updated this revision to Diff 340462.Apr 26 2021, 2:33 AM

Fix bad merge

lenary accepted this revision.Apr 28 2021, 6:59 AM

LGTM

This revision is now accepted and ready to land.Apr 28 2021, 6:59 AM

I will merge this today

tmatheson updated this revision to Diff 341869.Apr 30 2021, 6:20 AM

Rebase to try to get CI to pass

This revision was landed with ongoing or failed builds.Apr 30 2021, 8:41 AM
This revision was automatically updated to reflect the committed changes.

It's already reverted, but FYI in case you've missed this https://lab.llvm.org/buildbot/#/builders/70/builds/6916

chill added a subscriber: chill.May 4 2021, 6:52 AM
chill added inline comments.
llvm/lib/Target/ARM/ARMISelLowering.cpp
19004

Why are we doing this only at -O0 ?

efriedma added inline comments.
llvm/lib/Target/ARM/ARMISelLowering.cpp
19004

As far as we know, the problem with spills only shows up with fast regalloc (i.e. at -O0). Using the pesudo-instruction at other optimization levels is possible, but I'm not sure what the performance implications would be.