This is an archive of the discontinued LLVM Phabricator instance.

BPF: use Source instead of ILP scheduler for selection dag
AbandonedPublic

Authored by yonghong-song on Sep 29 2020, 9:02 PM.

Details

Reviewers
ast
Summary

This is to fix a bug reported by

https://github.com/iovisor/bpftrace/issues/1305

For the following initial selection dag,

t79: ch,glue = BPFISD::CALL t78, Constant:i64<4>, Register:i64 $r1, Register:i64 $r2, Register:i64 $r3, t78:1
t80: ch,glue = callseq_end t79, TargetConstant:i64<0>, TargetConstant:i64<0>, t79:1 
  t81: i64,ch,glue = CopyFromReg t80, Register:i64 $r0, t80:1
t82: i64,ch = load<(dereferenceable load 8 from %ir."struct cgroup.kn")> t81:1, FrameIndex:i64<2>, undef:i64
          t83: ch = lifetime.end<0 to -1> t82:1, TargetFrameIndex:i64<2>
        t86: ch = lifetime.start<0 to -1> t83, TargetFrameIndex:i64<1> 
      t89: ch = store<(store 8 into %ir.20)> t86, Constant:i64<0>, FrameIndex:i64<1>, undef:i64
    t91: ch = lifetime.start<0 to -1> t89, TargetFrameIndex:i64<0>
  t92: ch,glue = callseq_start t91, TargetConstant:i64<0>, TargetConstant:i64<0>
t93: ch,glue = CopyToReg t92, Register:i64 $r1, FrameIndex:i64<0>
t94: ch,glue = CopyToReg t93, Register:i64 $r2, Constant:i64<8>, t93:1
  t87: i64 = add t82, Constant:i64<8>
t95: ch,glue = CopyToReg t94, Register:i64 $r3, t87, t94:1
t96: ch,glue = BPFISD::CALL t95, Constant:i64<4>, Register:i64 $r1, Register:i64 $r2, Register:i64 $r3, t95:1
t97: ch,glue = callseq_end t96, TargetConstant:i64<0>, TargetConstant:i64<0>, t96:1

Note that node t89 depends on t86 which recursively depends on t82. This will enforce
load happens before store.

The optimized dag becomes

t79: ch,glue = BPFISD::CALL t78, Constant:i64<4>, Register:i64 $r1, Register:i64 $r2, Register:i64 $r3, t78:1
t80: ch,glue = callseq_end t79, TargetConstant:i64<0>, TargetConstant:i64<0>, t79:1
              t81: i64,ch,glue = CopyFromReg t80, Register:i64 $r0, t80:1
            t131: ch = TokenFactor t81:1, t130:1
          t83: ch = lifetime.end<0 to -1> t131, TargetFrameIndex:i64<2>
        t86: ch = lifetime.start<0 to -1> t83, TargetFrameIndex:i64<1>
        t128: ch = store<(store 8 into %ir.20)> t80, Constant:i64<0>, FrameIndex:i64<1>, undef:i64
      t129: ch = TokenFactor t86, t128
    t91: ch = lifetime.start<0 to -1> t129, TargetFrameIndex:i64<0>
  t92: ch,glue = callseq_start t91, TargetConstant:i64<0>, TargetConstant:i64<0>
t93: ch,glue = CopyToReg t92, Register:i64 $r1, FrameIndex:i64<0>
t94: ch,glue = CopyToReg t93, Register:i64 $r2, Constant:i64<8>, t93:1
  t87: i64 = add t130, Constant:i64<8>

...

t130: i64,ch = load<(dereferenceable load 8 from %ir."struct cgroup.kn")> t80, FrameIndex:i64<2>, undef:i64

For the optimized tag, store t128 now depends on t80 and load t130 also
depends on t80. Depending on how schedule dag scheduler works,
this opens possibility that load and store may be reordered.

Note the above optimized dag code is generated for both bpf and x86.
But x86 actually generates correct code. The reason is that x86 (and many other
architectures) is using "Source" selection dag scheduler which favors
source order in case of multiple choices. The default for bpf is ILP
which tries to optimize for instruction parallelism.

I disabled "Source" scheduler for x86 and used "ILP" scheduler,
the generated code still correct. This is because x86 has different lowering
code and it happens to work. Tweaking "ILP" might not be a robust idea
for BPF, so this patch enabled BPF to use "Source" scheduler.

There are two different ways to use "Source" scheduler.
One is to do setSchedulingPreference() in backend which will only
affect selection dag scheduling, and the other is to do
enableMachineScheduler() which enables selection dag "Source"
scheduling, but also enables "Machine Scheduler" phase and impacts
register allocation (favoring global-scope register collescing over
local collescing).

This patch uses setSchedulingPreference() for selection dag scheduler only
as benefit from machine scheduler and additional register allocation
is not clear to bpf backend at this point.

Diff Detail

Event Timeline

yonghong-song created this revision.Sep 29 2020, 9:02 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 29 2020, 9:02 PM
yonghong-song requested review of this revision.Sep 29 2020, 9:02 PM
ast accepted this revision.Sep 29 2020, 9:22 PM

ouch. sounds like we don't have much choice. Let's figure out what happened with test_verif_scale3 first, but looks like we have to land regardless.

This revision is now accepted and ready to land.Sep 29 2020, 9:22 PM
yonghong-song retitled this revision from [RFC] BPF: use Source instead of ILP scheduler for selection dag to BPF: use Source instead of ILP scheduler for selection dag.
yonghong-song edited the summary of this revision. (Show Details)

removed RFC tag as bpf selftests are solved.
using setSchedulingPreference() instead of enableMachineScheduler() to limit the scope
Regarding previous reported failures for bpf selftest test_verif_scale3.o:

  • current llvm-project master: processed 997315 insns
  • enableMachineScheduler() processed 1005967 insns (hacked verifier to increase to permit 2M insns), resulted from more instructions due to enabled machine scheduling phase (and possibly additional regalloc functionality, not verified).
  • setSchedulingPreference() processed 998991 insns
ast added a comment.Sep 30 2020, 2:04 PM

looks like x86 64-bit is ILP, so it's not niche.
Looks like Source gets the least amount of testing (judging by archs that use it).
RegPressure is probably 2nd most used after ILP.
It feels that we should fix it inside ILP instead.

In D88525#2304656, @ast wrote:

looks like x86 64-bit is ILP, so it's not niche.
Looks like Source gets the least amount of testing (judging by archs that use it).
RegPressure is probably 2nd most used after ILP.
It feels that we should fix it inside ILP instead.

The default for x86 is Source. See

ScheduleDAGSDNodes* createDefaultScheduler(SelectionDAGISel *IS,
                                           CodeGenOpt::Level OptLevel) {
  const TargetLowering *TLI = IS->TLI;
  const TargetSubtargetInfo &ST = IS->MF->getSubtarget();

  // Try first to see if the Target has its own way of selecting a scheduler
  if (auto *SchedulerCtor = ST.getDAGScheduler(OptLevel)) {
    return SchedulerCtor(IS, OptLevel);
  }
  
  if (OptLevel == CodeGenOpt::None ||
      (ST.enableMachineScheduler() && ST.enableMachineSchedDefaultSched()) ||
      TLI->getSchedulingPreference() == Sched::Source)
    return createSourceListDAGScheduler(IS, OptLevel);
  if (TLI->getSchedulingPreference() == Sched::RegPressure)
    return createBURRListDAGScheduler(IS, OptLevel);
  if (TLI->getSchedulingPreference() == Sched::Hybrid)
    return createHybridListDAGScheduler(IS, OptLevel);
  if (TLI->getSchedulingPreference() == Sched::VLIW)
    return createVLIWDAGScheduler(IS, OptLevel);
  assert(TLI->getSchedulingPreference() == Sched::ILP &&
         "Unknown sched type!");
  return createILPListDAGScheduler(IS, OptLevel);
}

If ST.enableMachineScheduler() && ST.enableMachineSchedDefaultSched()
is true, it does not matter whether what target specifies SchedulingPreference,
it will use Source. For x86, it has enableMachineScheduler() and somehow
enableMachineSchedDefaultSched() is true, hence Source...

But this is only the default Subtarget scheduler, it is possible other
subtarget may use different schedulers, e.g., ILP.

Anyway, there is a discussion in https://bugs.llvm.org/show_bug.cgi?id=47591.
Possibly there is a real bug in selection dag. It would be the best the bug can
be fixed there.

dxu added a subscriber: dxu.Nov 30 2020, 11:23 AM
yonghong-song abandoned this revision.Dec 1 2020, 7:41 PM

Looks like https://reviews.llvm.org/D91833 fixed the issue. Abandon this.