Page MenuHomePhabricator

[RegAllocFast] Handle case where operand of copy cannot be assigned.
Needs ReviewPublic

Authored by fhahn on Feb 4 2021, 4:26 AM.

Details

Summary

This patch adds special handling to detect the case where we try to
assign a register to the VReg use of a copy, but all registers in the
required class are pre-assigned. We cannot assign such registers and
error out later.

Instead, we can try to move the use right after the definition, in the
hope that one of the registers in the class gets freed up.

This scenario can happen when assigning registers to copies to physical
registers for a function call, so initially this is limited to copies to
physical registers.

The only extra cost in practice should be the scan of the register
class. Finding the defining MI should only be needed in cases where we
would run out of registers otherwise.

I am planning to convert the test case into a MIR test case, but I'd
like to make sure the patch is on the right track first.

Diff Detail

Unit TestsFailed

TimeTest
360 msx64 debian > libarcher.races::task-dependency.c
Script: -- : 'RUN: at line 13'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests -I /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/src -L /mnt/disks/ssd0/agent/llvm-project/build/lib -Wl,-rpath,/mnt/disks/ssd0/agent/llvm-project/build/lib /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/races/task-dependency.c -o /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/task-dependency.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/deflake.bash /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/task-dependency.c.tmp 2>&1 | tee /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/task-dependency.c.tmp.log | /mnt/disks/ssd0/agent/llvm-project/build/./bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/races/task-dependency.c
350 msx64 debian > libarcher.races::task-taskgroup-unrelated.c
Script: -- : 'RUN: at line 13'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests -I /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/src -L /mnt/disks/ssd0/agent/llvm-project/build/lib -Wl,-rpath,/mnt/disks/ssd0/agent/llvm-project/build/lib /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/races/task-taskgroup-unrelated.c -o /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/task-taskgroup-unrelated.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/deflake.bash /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/task-taskgroup-unrelated.c.tmp 2>&1 | tee /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/task-taskgroup-unrelated.c.tmp.log | /mnt/disks/ssd0/agent/llvm-project/build/./bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/races/task-taskgroup-unrelated.c
330 msx64 debian > libarcher.races::task-taskwait-nested.c
Script: -- : 'RUN: at line 13'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests -I /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/src -L /mnt/disks/ssd0/agent/llvm-project/build/lib -Wl,-rpath,/mnt/disks/ssd0/agent/llvm-project/build/lib /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/races/task-taskwait-nested.c -o /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/task-taskwait-nested.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/deflake.bash /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/task-taskwait-nested.c.tmp 2>&1 | tee /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/task-taskwait-nested.c.tmp.log | /mnt/disks/ssd0/agent/llvm-project/build/./bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/races/task-taskwait-nested.c
260 msx64 debian > libarcher.races::task-two.c
Script: -- : 'RUN: at line 13'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests -I /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/src -L /mnt/disks/ssd0/agent/llvm-project/build/lib -Wl,-rpath,/mnt/disks/ssd0/agent/llvm-project/build/lib /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/races/task-two.c -o /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/task-two.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/deflake.bash /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/task-two.c.tmp 2>&1 | tee /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/task-two.c.tmp.log | /mnt/disks/ssd0/agent/llvm-project/build/./bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/races/task-two.c
330 msx64 debian > libarcher.task::task-barrier.c
Script: -- : 'RUN: at line 15'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests -I /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/src -L /mnt/disks/ssd0/agent/llvm-project/build/lib -Wl,-rpath,/mnt/disks/ssd0/agent/llvm-project/build/lib /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/task/task-barrier.c -o /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/task/Output/task-barrier.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/task/Output/task-barrier.c.tmp 2>&1 | tee /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/task/Output/task-barrier.c.tmp.log | /mnt/disks/ssd0/agent/llvm-project/build/./bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/task/task-barrier.c
View Full Test Results (62 Failed)

Event Timeline

fhahn created this revision.Feb 4 2021, 4:26 AM
fhahn requested review of this revision.Feb 4 2021, 4:26 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 4 2021, 4:26 AM
Herald added a subscriber: wdng. · View Herald Transcript
arsenm added a comment.Feb 4 2021, 5:51 AM

My first thoughts is there should be no scan or instruction moving necessary. Why can't you just spill registers to satisfy this? The MIR test would help understand this

llvm/lib/CodeGen/RegAllocFast.cpp
1083–1086

I don't see why copies specifically would be special for this problem

1089

Both operands are required to be a register for COPY, you can drop the isReg checks

1094–1101

Moving this to a separate function would be worthwhile

1103–1113

What about partial defs?

llvm/test/CodeGen/X86/regallocfast-need-to-move-copy.ll
3 ↗(On Diff #321382)

Should add a comment explaining the test

fhahn added a comment.Feb 4 2021, 5:57 AM

My first thoughts is there should be no scan or instruction moving necessary. Why can't you just spill registers to satisfy this? The MIR test would help understand this

The problem is all registers in the required class are pre-assigned and cannot be spilled (according to RegAllocFast::calcSpillCost https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/RegAllocFast.cpp#L598)

The MIR for the problematic block is below. It cannot find a register for %8 in $al = COPY %8, because all register in the class for %8 are defined between $al = COPY %8 and %8:gr8 = MOV8ri 2

bb.3.bb5:
  successors: %bb.4(0x40000000), %bb.5(0x40000000)

  %5:gr8_norex = COPY %4.sub_8bit_hi
  %6:gr32_norex = MOVZX32rr8_NOREX killed %5
  EH_LABEL <mcsymbol >
  ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
  %7:gr32 = MOV32ri 3427
  %8:gr8 = MOV8ri 2
  %9:gr64 = IMPLICIT_DEF
  $rdi = COPY %9
  $esi = COPY %7
  %10:gr64 = IMPLICIT_DEF
  $rdx = COPY %10
  %11:fr64 = IMPLICIT_DEF
  $xmm0 = COPY %11
  %12:fr64 = IMPLICIT_DEF
  $xmm1 = COPY %12
  %13:gr32 = IMPLICIT_DEF
  $ecx = COPY %13
  $r8d = COPY %6
  $al = COPY %8
  CALL64pcrel32 target-flags(x86-plt) @baz, csr_64, implicit $rsp, implicit $ssp, implicit $rdi, implicit $esi, implicit $rdx, implicit $xmm0, implicit $xmm1, implicit $ecx, implicit $r8d, implicit $al, implicit-def $rsp, implicit-def $ssp
  ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
  EH_LABEL <mcsymbol >
  JMP_1 %bb.4
fhahn marked an inline comment as done.Feb 4 2021, 10:00 AM
fhahn added inline comments.
llvm/lib/CodeGen/RegAllocFast.cpp
1083–1086

I don't think they are special, I just thought it might be worth initially limiting it to copies, as the most likely case is probably during moving values to physical registers for calls.

We can remove it, we just have to do a bit more scanning.

1103–1113

I'll update this, if the direction is the right approach to pursue.

fhahn updated this revision to Diff 321502.Feb 4 2021, 10:01 AM

Add mir test, remove isReg checks.

Is it something we could fix in the scheduler instead?

I think it is fair to end up with a "run out of registers" error here. Without changing the schedule, the allocation is not possible.

The expectation is the scheduler is supposed to keep the physical registers live-ranges as small as possible.

Basically, I am mildly against having the fast allocator doing some sort of rescheduling. This opens the door of all of crazy thing :). (Like you realized, copy are not special here, the same problem could happen with any instruction.)

fhahn added a comment.Feb 4 2021, 2:43 PM

Is it something we could fix in the scheduler instead?

I think it is fair to end up with a "run out of registers" error here. Without changing the schedule, the allocation is not possible.

The expectation is the scheduler is supposed to keep the physical registers live-ranges as small as possible.

Unfortunately this is only a problem with -O0 and the MachineScheduler does not run and the list scheduler has register pressure tracking disabled I think.

How was this test working before?

Again I am not very keen on this approach because technically we could have this problem with any instruction and:

  1. It doesn't make sense to support only COPY then
  2. We really are doing the job of the scheduler here

Note that SDISel does some scheduling, maybe the problem is there?

the list scheduler has register pressure tracking disabled I think.

That's not really a problem with register pressure tracking, this is mainly a matter of keeping the physical live-ranges as small as possible.

How was this test working before?

Again I am not very keen on this approach because technically we could have this problem with any instruction and:

  1. It doesn't make sense to support only COPY then
  2. We really are doing the job of the scheduler here

Note that SDISel does some scheduling, maybe the problem is there?

I'm not an expert in this area, but why is it any more job of the scheduler to keep the live ranges small for the sake of the allocator? Shouldn't the allocator be able to deal with any valid input (within computational reason)?

I'm not an expert in this area, but why is it any more job of the scheduler to keep the live ranges small for the sake of the allocator? Shouldn't the allocator be able to deal with any valid input (within computational reason)?

Because the input is not valid: if you don't reschedule, the code is not colorable.

To elaborate on the answer:

why is it any more job of the scheduler to keep the live ranges small for the sake of the allocator?

Because:

  1. the allocator assumes a fixed schedule, it keeps the live ranges small by splitting/spilling them (only spilling for the fast allocator)
  2. the live-ranges that are causing problems in that example are physical registers, which the allocator cannot touch, so we cannot split or spill them

Therefore, the only way to make this input colorable is by changing the schedule, which by #1 is not what the allocator does and knows to handle to be fair.

This patch adds some mechanics to teach the allocator how to do some re-scheduling in a very specific case, and I argue that we are setting a dangerous path. E.g., what if instead of a copy we have physreg = add %v, #1, then the allocator would fail exactly the same way.

Long story short, with today's capabilities and intended design, the input is invalid.

To elaborate on the answer:

why is it any more job of the scheduler to keep the live ranges small for the sake of the allocator?

Because:

  1. the allocator assumes a fixed schedule, it keeps the live ranges small by splitting/spilling them (only spilling for the fast allocator)
  2. the live-ranges that are causing problems in that example are physical registers, which the allocator cannot touch, so we cannot split or spill them

Ok, with 2. it's clearer now why this is a problem.

Therefore, the only way to make this input colorable is by changing the schedule, which by #1 is not what the allocator does and knows to handle to be fair.

This patch adds some mechanics to teach the allocator how to do some re-scheduling in a very specific case, and I argue that we are setting a dangerous path. E.g., what if instead of a copy we have physreg = add %v, #1, then the allocator would fail exactly the same way.

Long story short, with today's capabilities and intended design, the input is invalid.

It sounds like we need to have the fast-RA be able to defer to the more sophisticated allocators in certain cases, but I don't think it's possible to do with the existing pass manager.

It sounds like we need to have the fast-RA be able to defer to the more sophisticated allocators in certain cases.

That's a good idea! We could do something similar to what we do with GISel: set some function attributes to inform the other allocators whether or not they have something to do.

Now, ideally, I wish we had one performant allocator that works fast enough for O0.

It sounds like we need to have the fast-RA be able to defer to the more sophisticated allocators in certain cases.

That's a good idea! We could do something similar to what we do with GISel: set some function attributes to inform the other allocators whether or not they have something to do.

I think this is actually a problem GlobalISel does have since it doesn't have the DAG scheduler/linearizer to minimize physical register live ranges. We need an MI equivalent pass to do this

lkail added a subscriber: lkail.Feb 19 2021, 6:53 AM

I think this is actually a problem GlobalISel does have since it doesn't have the DAG scheduler/linearizer to minimize physical register live ranges. We need an MI equivalent pass to do this

That's a good point. I've been advocating for this for years, if not publicly at least internally. Without a sort of scheduler in the pipeline, on targets like x86, we would generate awful code whenever we carry around flags, since so many instructions would clobber them. Currently, I believe we would just spill the flags!

pengfei added inline comments.Feb 25 2021, 9:54 PM
llvm/lib/CodeGen/RegAllocFast.cpp
1094–1099

nit: this can be a any_of.