This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Don't use the Machine Scheduler for cortex-m at minsize
ClosedPublic

Authored by dmgreen on May 14 2019, 12:49 AM.

Details

Summary

The new cortex-m schedule in D54142 helps performance, but can increase the amount of high-registers used. This, on average, ends up increasing the codesize by a fair amount (because less instructions are converted from T2 to T1). On cortex-m at -Oz, where we are quite size-paranoid, it is better to use the existing DAG scheduler with the RegPressure scheduling preference (at least until the issues around T2 vs T1 instructions can be improved).

I have also made sure that the Sched::RegPressure dag scheduler is always chosen for MinSize.

The test shows one case where we increase the number of registers used.

Diff Detail

Repository
rL LLVM

Event Timeline

dmgreen created this revision.May 14 2019, 12:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 14 2019, 12:49 AM
This revision is now accepted and ready to land.May 14 2019, 1:25 AM
fhahn added a comment.May 14 2019, 4:16 AM

This looks like an interesting case! I think there are 2 things going wrong (see input to the scheduler below)

  1. The scheduler does not know it would be beneficial to keep things in the lower registers (the instructions mostly use the GPR/GPRNOPC classes). We could tweak the RegPressureSetLimits for Oz.
  2. We add nodes that reduce register pressure to the available queue too late, because currently we only move from pending to available when we reached the ReadyCycle for in-order CPUs. This means we end up in a situation in which we have to schedule a node that further increases reg pressure.

While the patch is a quick fix, I think we should properly address those issues. Would you mind creating a bug report?

bb.0.entry:
  liveins: $r0, $r1
  %1:rgpr = COPY $r1
  %0:gpr = COPY $r0
  %2:rgpr = t2LSRri %1:rgpr, 8, 14, $noreg, $noreg
  %3:gprnopc = t2LDRi12 %0:gpr, 16, 14, $noreg :: (load 4 from %ir.msg_out)
  %4:rgpr = t2LDRHi12 %0:gpr, 20, 14, $noreg :: (load 2 from %ir.options_out_len, align 4)
  %5:rgpr = t2ADDri %4:rgpr, 1, 14, $noreg, $noreg
  t2STRHi12 %5:rgpr, %0:gpr, 20, 14, $noreg :: (store 2 into %ir.options_out_len, align 4)
  %6:gprnopc = t2ADDrr %3:gprnopc, %4:rgpr, 14, $noreg, $noreg
  t2STRBi12 %2:rgpr, %6:gprnopc, 240, 14, $noreg :: (store 1 into %ir.arrayidx)
  %7:gprnopc = t2LDRi12 %0:gpr, 16, 14, $noreg :: (load 4 from %ir.msg_out)
  %8:rgpr = t2LDRHi12 %0:gpr, 20, 14, $noreg :: (load 2 from %ir.options_out_len, align 4)
  %9:rgpr = t2ADDri %8:rgpr, 1, 14, $noreg, $noreg
  t2STRHi12 %9:rgpr, %0:gpr, 20, 14, $noreg :: (store 2 into %ir.options_out_len, align 4)
  %10:gprnopc = t2ADDrr %7:gprnopc, %8:rgpr, 14, $noreg, $noreg
  t2STRBi12 %1:rgpr, %10:gprnopc, 240, 14, $noreg :: (store 1 into %ir.arrayidx10)
  tBX_RET 14, $noreg

I agree with this:

While the patch is a quick fix, I think we should properly address those issues. Would you mind creating a bug report?

This is a workaround for suboptimal decision making elsewhere. Teaching the scheduler new tricks and heuristics is usually not low-hanging fruit. So, since this workaround is very minimal and clean, creating a bug report looks like an excellent way forward to me.

fhahn accepted this revision.May 15 2019, 3:37 AM

LGTM, it would be great to have Cortex-M4 use the MachineScheduler in the non -Oz cases. Lets keep track of the issues (I expect more to surface) and go from there.

Yep, cheers, allocation of the high registers has been bothering us for a while now, and will indeed probably resurface sooner rather than later.

Sorry guys, I was distracted by some other work.

Florian, Nice one looking into the issues. I had tried some quick experiments around adjusting the priority of tryPressure in tryCandidate, but didn't get it to do much. I think because as you said the limits were a lot higher enough to kick in. My further ideas were more in the direction of adjusting the register allocation hints to pick higher registers less for instructions that can be shrunk into T1 instructions. Or (re-)adjusting the schedule taking into account things like CPSR live ranges, again for which instructions can and can't be shrank.

I've created a bug report in PR41884. Liek I tried to put into the comment, hopefully this is something that we don't need for long.

This revision was automatically updated to reflect the committed changes.