This fixes a bug reported at
https://github.com/iovisor/bpftrace/issues/1305
For the following initial selection dag:
  t79: ch,glue = BPFISD::CALL t78, Constant:i64<4>, Register:i64 $r1, Register:i64 $r2, Register:i64 $r3, t78:1
  t80: ch,glue = callseq_end t79, TargetConstant:i64<0>, TargetConstant:i64<0>, t79:1
  t81: i64,ch,glue = CopyFromReg t80, Register:i64 $r0, t80:1
  t82: i64,ch = load<(dereferenceable load 8 from %ir."struct cgroup.kn")> t81:1, FrameIndex:i64<2>, undef:i64
  t83: ch = lifetime.end<0 to -1> t82:1, TargetFrameIndex:i64<2>
  t86: ch = lifetime.start<0 to -1> t83, TargetFrameIndex:i64<1>
  t89: ch = store<(store 8 into %ir.20)> t86, Constant:i64<0>, FrameIndex:i64<1>, undef:i64
  t91: ch = lifetime.start<0 to -1> t89, TargetFrameIndex:i64<0>
  t92: ch,glue = callseq_start t91, TargetConstant:i64<0>, TargetConstant:i64<0>
  t93: ch,glue = CopyToReg t92, Register:i64 $r1, FrameIndex:i64<0>
  t94: ch,glue = CopyToReg t93, Register:i64 $r2, Constant:i64<8>, t93:1
  t87: i64 = add t82, Constant:i64<8>
  t95: ch,glue = CopyToReg t94, Register:i64 $r3, t87, t94:1
  t96: ch,glue = BPFISD::CALL t95, Constant:i64<4>, Register:i64 $r1, Register:i64 $r2, Register:i64 $r3, t95:1
  t97: ch,glue = callseq_end t96, TargetConstant:i64<0>, TargetConstant:i64<0>, t96:1
Note that node t89 (the store) depends on t86, which recursively depends
on t82 (the load). This enforces that the load happens before the store.
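To make the pattern concrete, here is a hypothetical C-level sketch of what
the dag above roughly corresponds to (the identifiers and the bpf_probe_read
prototype are assumptions for illustration; the real reproducer is in the
issue above): the pointer loaded from one stack slot feeds the source
argument of the second helper call, while a store zero-initializes another
stack slot.

  /* Hypothetical sketch for illustration only; not the actual reproducer. */
  struct cgroup { void *kn; };

  extern long bpf_probe_read(void *dst, unsigned int size, const void *src);

  static long sketch(struct cgroup *cgrp_copy) {  /* cgrp_copy: FrameIndex 2  */
    void *kn = cgrp_copy->kn;   /* load t82: read "struct cgroup.kn"          */
    long tmp = 0;               /* store t89: zero FrameIndex 1 (used later)  */
    long dst;                   /* FrameIndex 0: destination buffer           */
    return bpf_probe_read(&dst, 8, (char *)kn + 8);  /* t87 = t82 + 8 as src  */
  }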
The optimized dag becomes:
  t79: ch,glue = BPFISD::CALL t78, Constant:i64<4>, Register:i64 $r1, Register:i64 $r2, Register:i64 $r3, t78:1
  t80: ch,glue = callseq_end t79, TargetConstant:i64<0>, TargetConstant:i64<0>, t79:1
  t81: i64,ch,glue = CopyFromReg t80, Register:i64 $r0, t80:1
  t131: ch = TokenFactor t81:1, t130:1
  t83: ch = lifetime.end<0 to -1> t131, TargetFrameIndex:i64<2>
  t86: ch = lifetime.start<0 to -1> t83, TargetFrameIndex:i64<1>
  t128: ch = store<(store 8 into %ir.20)> t80, Constant:i64<0>, FrameIndex:i64<1>, undef:i64
  t129: ch = TokenFactor t86, t128
  t91: ch = lifetime.start<0 to -1> t129, TargetFrameIndex:i64<0>
  t92: ch,glue = callseq_start t91, TargetConstant:i64<0>, TargetConstant:i64<0>
  t93: ch,glue = CopyToReg t92, Register:i64 $r1, FrameIndex:i64<0>
  t94: ch,glue = CopyToReg t93, Register:i64 $r2, Constant:i64<8>, t93:1
  t87: i64 = add t130, Constant:i64<8>
  ...
  t130: i64,ch = load<(dereferenceable load 8 from %ir."struct cgroup.kn")> t80, FrameIndex:i64<2>, undef:i64
In the optimized dag, store t128 now depends on t80, and load t130 also
depends on t80. Depending on how the selection dag scheduler works,
this opens the possibility that the load and the store may be reordered.
Note that the above optimized dag is generated for both bpf and x86.
But x86 actually generates correct code. The reason is that x86 (and many other
architectures) uses the "Source" selection dag scheduler, which favors
source order when there are multiple choices. The default for bpf is the "ILP"
scheduler, which tries to optimize for instruction-level parallelism.
I disabled the "Source" scheduler for x86 and used the "ILP" scheduler instead;
the generated code is still correct. This is because x86 has different lowering
code and happens to work. Tweaking the "ILP" scheduler would not be a robust
fix for BPF, so this patch enables the "Source" scheduler for BPF.
There are two ways to use the "Source" scheduler.
One is to call setSchedulingPreference() in the backend, which only
affects selection dag scheduling. The other is to override
enableMachineScheduler(), which enables selection dag "Source"
scheduling, but also enables the "Machine Scheduler" phase and impacts
register allocation (favoring global-scope register coalescing over
local coalescing).
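For reference, a minimal sketch of the second approach (not what this patch
does), assuming the usual pattern of overriding the hook in the subtarget
class:

  // Hypothetical change in BPFSubtarget (not part of this patch): returning
  // true switches the default selection dag scheduler to "Source" and also
  // turns on the MachineScheduler pass.
  bool enableMachineScheduler() const override { return true; }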
This patch uses setSchedulingPreference() to affect the selection dag
scheduler only, as the benefit of the machine scheduler and the additional
register allocation changes is not clear for the bpf backend at this point.
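A minimal sketch of the change, assuming it lives in the BPFTargetLowering
constructor (BPFISelLowering.cpp) like similar per-target settings:

  // Prefer the "Source" selection dag scheduler so that loads and stores
  // keep their source order when the chain alone does not force one.
  setSchedulingPreference(Sched::Source);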