The following bpf linux kernel selftest failed with latest
$ ./test_progs -n 7/10 ... The sequence of 8193 jumps is too complex. verification time 126272 usec stack depth 320 processed 114799 insns (limit 1000000) ... libbpf: failed to load object 'pyperf600_nounroll.o' test_bpf_verif_scale:FAIL:110 #7/10 pyperf600_nounroll.o:FAIL #7 bpf_verif_scale:FAIL
After some investigation, I found the following llvm patch
is responsible. The patch disabled hoisting common instructions
in SimplifyCFG by default. Later on, the code changes and a
SimplifyCFG phase with hoisting on cannot do the work any more.
A test is provided to demonstrate the problem.
The IR before simplifyCFG looks like:
for.cond: %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ] %cmp = icmp ult i32 %i.0, 6 br i1 %cmp, label %for.body, label %for.cond.cleanup for.cond.cleanup: %2 = load i8*, i8** %frame_ptr, align 8, !tbaa !2 %cmp2 = icmp eq i8* %2, null %conv = zext i1 %cmp2 to i32 call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %1) #3 call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %0) #3 ret i32 %conv for.body: %3 = load i8*, i8** %frame_ptr, align 8, !tbaa !2 %tobool.not = icmp eq i8* %3, null br i1 %tobool.not, label %for.inc, label %land.lhs.true
The first two insns of for.cond.cleanup and for.body, load and
icmp, can be hoisted to for.cond block. With Patch D84108, the
optimization is delayed. But unfortunately, later on loop rotation
added addition phi nodes to for.body and hoisting cannot
be done any more.
Note such a hoisting is beneficial to bpf programs as
bpf verifier does path sensitive analysis and verification.
The hoisting preverts reloading from stack which will assume
conservative value and increase exploited insns. In this case,
it caused verifier failure.
To fix this problem, I added an IR pass from bpf target
to performance additional simplifycfg with hoisting common inst