LLVM optimization may generate certain codes which cannot be
handled by kernel verifier, e.g., some optimizations in
InstCombine and SimplifyCFG as bpf verifier is implemented
in kernel and it cannot perform complicated code analysis like
llvm compiler. Memory, verification speed and verifier complexity
are all part of considerations when adding new analysis
in verifier.
To avoid such harmful transformation, BPF backend has implemented
some passes, esp. through 'barrier' builtin function to prevent
certain InstCombine and SimplifyCFG transformations.
In these BPF backend passes pattern matching are used
to capture some specific patterns to prevent some
llvm transformations. But such pattern matching may not be precise
and may prevent some useful transformations. It would be great
if we can directly disable llvm transformations and this will
also avoid bpf specific transformation-preventing passes.
The following is 'git show --stat' to show that we can remove
lots of bpf hacking codes by adding one TTI hook.
llvm/include/llvm/Analysis/TargetTransformInfo.h | 9 +++ llvm/include/llvm/Analysis/TargetTransformInfoImpl.h | 2 + llvm/include/llvm/IR/IntrinsicsBPF.td | 3 - llvm/include/llvm/Transforms/InstCombine/InstCombiner.h | 1 + llvm/include/llvm/Transforms/Utils/LoopUtils.h | 8 +- llvm/lib/Analysis/TargetTransformInfo.cpp | 4 + llvm/lib/Target/BPF/BPF.h | 7 -- llvm/lib/Target/BPF/BPFAdjustOpt.cpp | 393 -------------------------------------------------------------------------------------------- llvm/lib/Target/BPF/BPFCheckAndAdjustIR.cpp | 45 +---------- llvm/lib/Target/BPF/BPFTargetMachine.cpp | 5 -- llvm/lib/Target/BPF/BPFTargetTransformInfo.h | 3 + llvm/lib/Target/BPF/CMakeLists.txt | 1 - llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp | 4 + llvm/lib/Transforms/Scalar/LICM.cpp | 27 ++++--- llvm/lib/Transforms/Utils/SimplifyCFG.cpp | 3 + llvm/test/CodeGen/BPF/adjust-opt-icmp1.ll | 14 +--- llvm/test/CodeGen/BPF/adjust-opt-icmp2.ll | 10 +-- llvm/test/CodeGen/BPF/adjust-opt-icmp3.ll | 12 +-- llvm/test/CodeGen/BPF/adjust-opt-icmp4.ll | 12 +-- llvm/test/CodeGen/BPF/adjust-opt-speculative1.ll | 17 +--- llvm/test/CodeGen/BPF/adjust-opt-speculative2.ll | 22 +----- 21 files changed, 69 insertions(+), 533 deletions(-)
Below are detailed explanations for three transformations
which hurts bpf verification.
FoldAndOrOfICmpsUsingRanges
The following is an example to show how FoldAndOrOfICmpsUsingRanges
transformation may generate codes which hurts bpf verifier.
For bpf prog in linux/tools/testing/selftests/bpf/progs/map_kptr_fail.c:
... id = ctx->protocol; if (id < 4 || id > 12) return 0; *(u64 *)((void *)v + id) = 0; ...
With FoldAndOrOfICmpsUsingRanges, the find pseudo code looks like:
... id = ctx->protocol; tmp = id; tmp += -13; if (tmp < 0xfffffff7) goto next; v += id; *v = 0; next:
In the above code, the verifier considers 'id' in 'v += id' as a arbitrary
unsigned integer so later '*v = 0' is considered as possible out-of-bound
memory access. This is because the verifier, as a post analysis tool,
does not know the relationship of tmp/id at 'v += id' point. Although it
is possible to improve verifier to track tmp/id relationship, this would
increase bpf verifier complexity a lot. llvm FoldAndOrOfICmpsUsingRanges
does the transformation based on pattern matching and it certainly aware
tmp/id relationship.
The actual verification failure looks like below:
; id = ctx->protocol; 9: (61) r1 = *(u32 *)(r6 +16) ; R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R6_w=ctx(off=0,imm=0) ; if (id < 4 || id > 12) 10: (bc) w2 = w1 ; R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R2_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) 11: (04) w2 += -13 ; R2=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) 12: (a6) if w2 < 0xfffffff7 goto pc+3 ; R2=scalar(umin=4294967287,umax=4294967295,var_off=(0xfffffff0; 0xf),s32_min=-9,s32_max=-1) ; *(u64 *)((void *)v + id) = 0; 13: (0f) r0 += r1 ; R0_w=map_value(off=0,ks=4,vs=32,umax=4294967295,var_off=(0x0; 0xffffffff)) R1=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) 14: (b7) r1 = 0 ; R1_w=0 ; *(u64 *)((void *)v + id) = 0; 15: (7b) *(u64 *)(r0 +0) = r1 R0 unbounded memory access, make sure to bounds check any such access
FoldTwoEntryPHINode
The following is an example to show FoldTwoEntryPHINode transformation
may generate codes which hurts bpf verifier.
For bpf prog in linux/tools/testing/selftests/bpf/progs/test_tc_dtime.c:
static void inc_errs(__u32 idx) { if (test < 9) errs[test][idx]++; else errs[UKN_TEST][idx]++; } ... if (skb->tstamp == 0xb9fbeef) inc_errs(2); ...
With FoldTwoEntryPHINode, the final generated code looks like
... r1 = test; r2 = skb->tstamp; if (r2 != 0xb9fbeef) goto next; w2 = w1; // w1/w2 are lower 32 bit values of r1/r2. if (w1 >= 9) w2 = 9; tmp = r2 * 28; r3 = errs + tmp; ... *r3 ... ...
next:
In the above code, for the case where 'w1 >= 9' is false, verifier
concludes that 'r2' at 'tmp = r2 * 28' as an arbitrary scalar which
caused verificaiton failure for later dereference of r3.
The actual verification failure looks like below:
8: (18) r1 = 0xffffc900001ca230 ; R1_w=map_value(off=560,ks=4,vs=564,imm=0) 10: (61) r1 = *(u32 *)(r1 +0) ; R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) ; if (skb->tstamp == EGRESS_ENDHOST_MAGIC) 11: (79) r2 = *(u64 *)(r6 +152) ; R2_w=scalar() R6=ctx(off=0,imm=0) ; if (skb->tstamp == EGRESS_ENDHOST_MAGIC) 12: (55) if r2 != 0xb9fbeef goto pc+10 ; R2_w=195018479 13: (bc) w2 = w1 ; R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R2_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) ; if (test < __NR_TESTS) 14: (a6) if w1 < 0x9 goto pc+1 16: R0=2 R1_w=scalar(umax=8,var_off=(0x0; 0xf)) R2_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R6=ctx(off=0,imm=0) R10=fp0 ; 16: (27) r2 *= 28 ; R2_w=scalar(umax=120259084260,var_off=(0x0; 0x1ffffffffc),s32_max=2147483644,u32_max=-4) 17: (18) r3 = 0xffffc900001ca118 ; R3_w=map_value(off=280,ks=4,vs=564,imm=0) 19: (0f) r3 += r2 ; R2_w=scalar(umax=120259084260,var_off=(0x0; 0x1ffffffffc),s32_max=2147483644,u32_max=-4) R3_w=map_value(off=280,ks=4,vs=564,umax=120259084260,var_off=(0x0; 0x1ffffffffc),s32_max=2147483644,u32_max=-4) 20: (61) r2 = *(u32 *)(r3 +0) R3 unbounded memory access, make sure to bounds check any such access
The unit test adjust-opt-speculative2.ll also shows how FoldTwoEntryPHINode
might hurt verifier.
The original code:
unsigned foo(); void *test(void *p) { unsigned ret = foo(); if (ret <= 7) p += ret; return p; }
Compiled with clang -target bpf -O2 -S t.c, with FoldTwoEntryPHINode enabled,
the following code is generated:
1: r6 = r1 2: call foo 3: r1 = r0 4: r1 <<= 32 5: r1 >>= 32 6: r2 = 8 7: if r2 > r1 goto LBB0_2 8: r0 = 0 LBB0_2: 9: r0 <<= 32 10: r0 >>= 32 11: r6 += r0 12: r0 = r6 13: exit
In the above example, insn 3 establishes r1 and r0 equivalence. Insns 4-7
establishes r1 < 8 if branch is taken. However, with branch taken, later
verifier is not able to ensure r0 < 8 and 'r6 += r0' may have a verificaiton
error.
With FoldTwoEntryPHINode disabled, the following code is generated:
1: r6 = r1 2: call foo 3: r0 <<= 32 4: r0 >>= 32 5: if r0 > 7 goto LBB0_2 6: r6 += r0 LBB0_2: 7: r0 = r6 8: exit
The 'r6 += r0' is safe as the verifier can deduce 'r0 <= 7' based on the branch.
MinMaxHoisting
Furthermore, recently we hit another issue related LICM
MinMaxHoisting transformation (https://reviews.llvm.org/D147078)
which also hurts verifier.
For bpf prog in linux/tools/testing/selftests/bpf/progs/loop6.c:
The original code:
for (i = 0; (i < VIRTIO_MAX_SGS) && (i < out_sgs); i++) { for (n = 0, sgp = get_sgp(sgs, i); sgp && (n < SG_MAX); sgp = __sg_next(sgp)) { bpf_probe_read_kernel(&len, sizeof(len), &sgp->length); length1 += len; n++; } }
After MinMaxHoisting,
upper = MIN(VIRTIO_MAX_SGS, out_sgs); for (i = 0; i < upper; i++) { for (n = 0, sgp = get_sgp(sgs, i); sgp && (n < SG_MAX); sgp = __sg_next(sgp)) { bpf_probe_read_kernel(&len, sizeof(len), &sgp->length); length1 += len; n++; } }
The verifier is not able to verify properly since it assumes the loop upper
is a arbitrary scalar (up to 32bit integer). The actual verification failure
looks like:
... 119: (15) if r1 == 0x0 goto pc+1 The sequence of 8193 jumps is too complex.
We have a draft to show bpf backend
implementation to undo the transformation (https://reviews.llvm.org/D147990).
Proposal
We feel adding proper TTI hooks is a better solution.
TTI provides a mechanism so backend can influence the
transformation. This also helps remove the associated bpf backend
pattern matching transformations.
This patch undos previous InstCombine/SimplifyCFG
transformation-preventing passes and adds one TTI hook
TTI->needsPreserveRangeInfoInVerification() such that
the above mentioned transformations can be disabled
by the target. The hook name needsPreserveRangeInfoInVerification()
implies that the transformation is disabled due to
later downstrean code verification.
Another possible solution is to legalize IR for verificaiton requirement.
This may require to add the verification requirement to IR, or
establish certain illegal code patterns, etc. This approach
requires more thought as downstream verification capability and
new code pattern verification failure is also a moving target.
You've left a lot of orphan CHECK-DISABLE in various files where you've removed the bpf-disable-serialize-icmp RUN