This is an archive of the discontinued LLVM Phabricator instance.

CodeGen: Fix ARM cmpxchg64 register fragmentation in fast-regalloc
AbandonedPublic

Authored by strager on Jun 1 2017, 5:38 PM.

Details

Summary

ARM's CMP_SWAP_64 pseudo-instruction (introduced in r266679,
used with -O0) uses three GPRPair-class registers and two
GPR-class registers. The fast register allocator (also used
with -O0) allocates GPR-class registers before allocating
GPRPair-class registers.

GPRPair includes is r0+r1, r2+r3, r4+r5, r6+r7, r8+r9,
r10+r11, and r12+sp. With Clang's -ffixed-r9 option, with
the frame pointer enabled, and with sp reserved, the
register allocator can only use r0+r1, r2+r3, r4+r5, and
r6+r7 from GPRPair.

If the fast register allocator allocates CMP_SWAP_64's GPR
operands first, it may decide to allocate r1 and r3. Later,
when the allocator allocates CMP_SWAP_64's GPRPair oprands,
it realizes only two GPRPair-class registers are available
(r4+r5 and r6+r7), and it can't spill the already-allocated
registers to make room. LLVM then fails with the message:

error: ran out of registers during register allocation

In short, the fast register allocator fragments the
registers and LLVM can't compile 64-bit compare-exchanges.

As a workaround, reduce the risk of fragmentation by
allocating registers for bigger register classes before
smaller ones. For CMP_SWAP_64 on ARM, this means
registers are allocated for GPRPair operands before GPR
operands.

For consistency, change all architectures, not just ARM.

This fixes PR30228.

Test Plan:
See the included test case (test/CodeGen/ARM/cmpxchg-O0.ll
test_cmpxchg_64_register_pressure). It is based on the
following C program compiled with Clang with -O0:

void f(unsigned long long *addr, unsigned long long desired, unsigned long long new) {
  while (!__sync_bool_compare_and_swap(addr, desired, new)) {
  }
}

Event Timeline

strager created this revision.Jun 1 2017, 5:38 PM

Let me know if there's a better solution than sorting by size! (Aside from fixing the bug which necessitates CMP_SWAP_64 in the first place...)

lib/CodeGen/RegAllocFast.cpp
997–1000

(I know I should delete this before merging. Locally I need to work with LLVM 4.0.0.)

smeenai added a subscriber: smeenai.Jun 1 2017, 5:41 PM

Ping! This is ready for design and code review.

hans added a reviewer: MatzeB.Aug 4 2017, 4:28 PM
hans added a subscriber: hans.

+Matthias for register allocation

MatzeB edited edge metadata.

+ Quentin for register allocation

I'd prefer not to add additional logic into the main path of the fast register allocator. How about just reordering the operands of CMP_SWAP_64 (or for all of the CMP_SWAP operations for consistency) instead?

qcolombet requested changes to this revision.Aug 7 2017, 10:36 AM

Hi,

The approach sounds sensible to me however, I'd like we use the AllocationPriority instead of the size of the register class.
Moreover, please refactor the code (I am thinking helper function) so that the definitions are processed in the same way.
Finally, keep the order of the operand as tie breaker.

Cheers,
-Quentin
PS: If AllocationPriority does not fix the problem, then you'll have to fix the description of the related register class, because size alone is too magic.

This revision now requires changes to proceed.Aug 7 2017, 10:36 AM
qcolombet added inline comments.Aug 7 2017, 10:37 AM
lib/CodeGen/RegAllocFast.cpp
984

Reject Reg == 0

991

Why can't you just get the the reg class here?

Hi,

The approach sounds sensible to me however, I'd like we use the AllocationPriority instead of the size of the register class.
Moreover, please refactor the code (I am thinking helper function) so that the definitions are processed in the same way.
Finally, keep the order of the operand as tie breaker.

Cheers,
-Quentin
PS: If AllocationPriority does not fix the problem, then you'll have to fix the description of the related register class, because size alone is too magic.

The ARM target does not use AllocationPriority yet. Though it would be great to start using it for the GPRPair and tuple classes (in a separate commit though).

strager abandoned this revision.Nov 18 2017, 6:33 PM

I don't plan to work on this anymore.