[mips] Fix atomic operations at O0, v3
AcceptedPublic

Authored by sdardis on Mar 23 2017, 8:18 AM.

Details

Summary

Similar to PR/25526, fast-regalloc introduces spills at the end of basic
blocks. When this occurs in between an ll and sc, the stores can cause the
atomic sequence to fail.

This patch fixes the issue by introducing more pseudos to represent atomic
operations and moving their lowering to after the expansion of postRA
pseudos.

This version addresses issues with the initial implementation and covers
all atomic operations.

This resolves PR/32020.

Thanks to James Cowgill for reporting the issue!

Diff Detail

sdardis created this revision.Mar 23 2017, 8:18 AM
slthakur accepted this revision.Apr 4 2017, 5:56 AM

LGTM

test/CodeGen/Mips/atomic.ll
30

Nit: Spelling should be "tests"

This revision is now accepted and ready to land.Apr 4 2017, 5:56 AM
sdardis planned changes to this revision.Jun 14 2017, 3:22 AM

I'm going to upload a new version of this diff, I found problems with the register allocation in the posted version.

sdardis updated this revision to Diff 102517.Jun 14 2017, 3:28 AM

Updated the liveness tracking for new basic blocks produced by MipsExpandPseudo.
Reduced the number of basic blocks created by MipsISelLowering for atomic operations - this helps optimize the code operated on by FastRegAlloc, i.e. less spurious spills and reloads.
Updated the lowering of atomic operations not to produce spurious copies, the better liveness tracking handles values being live.
Updated the register flags used for atomic pseudos, my initial implementation marked the inputs as early clobber when it was the output that needed to be marked early clobber.

This revision is now accepted and ready to land.Jun 14 2017, 3:28 AM

Sagar, would you mind taking a second look at this?

Thanks,
Simon

sdardis planned changes to this revision.May 21 2018, 5:36 AM
sdardis added reviewers: atanasyan, arichardson.

I have a new version of this patch which should hopefully address this atomics issue.

sdardis updated this revision to Diff 147772.May 21 2018, 5:46 AM
sdardis retitled this revision from [mips] Fix atomic compare and swap at O0, v2 to [mips] Fix atomic compare and swap at O0, v3.
sdardis edited the summary of this revision. (Show Details)
sdardis set the repository for this revision to rL LLVM.

Rebase to master, and address some of the register allocation issues.

This revision is now accepted and ready to land.May 21 2018, 5:46 AM
sdardis requested review of this revision.May 21 2018, 5:46 AM

LGTM. Also looks like it should be easy to adjust our CHERI ll/sc instructions to use the same mechanism when I do the next upstream merge.

Mostly comment fix suggestions

lib/Target/Mips/MipsExpandPseudo.cpp
219–220

Lowercase i here: hasMIps32r6()

488

BNE and MOVE seem to be unused in any of the BuildMI calls in this function.

492–495

STI->hasMips32r6()

lib/Target/Mips/MipsISelLowering.cpp
1465–1466

Put | Implicit before flags, registers -> register

1469

amoung -> among

1473

amoung -> among

1509–1510

In accordance with the comment above, shouldn't the scratch register have the accompanying dead flag as well?

1672

Change to unique among registers

1679–1683

Indentation

1744

Change to unique among registers

1752

Indentation

1793

is -> are, add Implicit to the list of flags too since it is used in the addReg() for Scratch and Scratch2 below.

1798

amoung -> among

1861–1876

Either purposes -> purpose or is -> are

1863

Change to unique among registers

1874–1877

Indentation, RegState::EarlyClobber appears twice for Scratch.

test/CodeGen/Mips/atomic.ll
25

tets -> tests

sdardis updated this revision to Diff 149456.Fri, Jun 1, 6:52 AM
sdardis marked 18 inline comments as done.

Address review comments.

This revision is now accepted and ready to land.Fri, Jun 1, 7:21 AM
asb added a subscriber: asb.Fri, Jun 8, 5:32 AM
asb added inline comments.
lib/Target/Mips/MipsTargetMachine.cpp
288–290

Would it be more robust to have this pass under addPreEmit2? This guarantees it runs after potentially troublesome passes such as the machineoutliner (obviously not yet enabled for Mips anyway), and you'd no longer need to worry about machine block placement.

sdardis added inline comments.Mon, Jun 11, 3:55 AM
lib/Target/Mips/MipsISelLowering.cpp
1679–1683

This is weird, but it is the output of clang-format.

1752

This is clang-formatted.

lib/Target/Mips/MipsTargetMachine.cpp
288–290

I hadn't considered that yet. Most of my concerns were about making sure the register allocator didn't break the scratch registers by making them non-unique with the address or data registers for the pseudo instruction.

@sdardis I hadn't checked myself before posting but as you say that is indeed what clang-format outputs.

lib/Target/Mips/MipsISelLowering.cpp
1793

Implicit typo

sdardis retitled this revision from [mips] Fix atomic compare and swap at O0, v3 to [mips] Fix atomic operations at O0, v3.Wed, Jun 13, 2:19 AM
sdardis marked an inline comment as done.Wed, Jun 13, 4:43 AM
sdardis added inline comments.
lib/Target/Mips/MipsTargetMachine.cpp
288–290

After looking at the machine outliner, I can see that it's possible to use the getMachineOutlinerMBBFlags function to detect ll/sc sequences and mark those entire blocks as illegal to outline*.

The choice of preEmit2 is more complex for MIPS as we would then need to perform the equivalent of MipsBranchExpansion at the MC layer as we would have invalidated its' assumptions, along with the delay slot filler and the size reduction pass.

PreEmit seems to be the right place.

  • ISTM that if the registers used by all the occurrences of a single type of atomic operation were the same, then they could be merged into a single outlined function.
sdardis updated this revision to Diff 151131.Wed, Jun 13, 5:19 AM

Moved pass to after the scheduler and machine block placement.
Normalized the probabilities for the successors of modified basic blocks for sanity purposes.

sdardis updated this revision to Diff 151180.Wed, Jun 13, 8:50 AM

Actually include the expansion pass.

jyknight added inline comments.
lib/Target/Mips/MipsTargetMachine.cpp
288–290

The problem being that MipsBranchExpansion expects that every MI (by the time it runs) turns into 4 bytes of output? I think that just means these pseudos must be handled in getInstSizeInBytes(), right?

I think ideally the ll/sc loop should be treated as if it was effectively an inline-asm blob -- no mutations made by any other passes to the contents. So, e.g. if you want a delay slot filled, you place an instruction there yourself.

And, yes, it's perfectly reasonable to outline the LL/SC loop in its entirety, just not _parts_ of it.

But maybe this discussion should be continued on the llvm-dev email thread. Certainly moving to preEmit is closer to the ideal state than before. :)