This is an archive of the discontinued LLVM Phabricator instance.

[TTI][BPF]: Undo specific transform-preventing passes and add one TTI hook
Needs ReviewPublic

Authored by yonghong-song on Apr 10 2023, 1:18 PM.

Details

Summary

LLVM optimization may generate certain codes which cannot be
handled by kernel verifier, e.g., some optimizations in
InstCombine and SimplifyCFG as bpf verifier is implemented
in kernel and it cannot perform complicated code analysis like
llvm compiler. Memory, verification speed and verifier complexity
are all part of considerations when adding new analysis
in verifier.

To avoid such harmful transformation, BPF backend has implemented
some passes, esp. through 'barrier' builtin function to prevent
certain InstCombine and SimplifyCFG transformations.
In these BPF backend passes pattern matching are used
to capture some specific patterns to prevent some
llvm transformations. But such pattern matching may not be precise
and may prevent some useful transformations. It would be great
if we can directly disable llvm transformations and this will
also avoid bpf specific transformation-preventing passes.
The following is 'git show --stat' to show that we can remove
lots of bpf hacking codes by adding one TTI hook.

llvm/include/llvm/Analysis/TargetTransformInfo.h        |   9 +++
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h    |   2 +
llvm/include/llvm/IR/IntrinsicsBPF.td                   |   3 -
llvm/include/llvm/Transforms/InstCombine/InstCombiner.h |   1 +
llvm/include/llvm/Transforms/Utils/LoopUtils.h          |   8 +-
llvm/lib/Analysis/TargetTransformInfo.cpp               |   4 +
llvm/lib/Target/BPF/BPF.h                               |   7 --
llvm/lib/Target/BPF/BPFAdjustOpt.cpp                    | 393 --------------------------------------------------------------------------------------------
llvm/lib/Target/BPF/BPFCheckAndAdjustIR.cpp             |  45 +----------
llvm/lib/Target/BPF/BPFTargetMachine.cpp                |   5 --
llvm/lib/Target/BPF/BPFTargetTransformInfo.h            |   3 +
llvm/lib/Target/BPF/CMakeLists.txt                      |   1 -
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp |   4 +
llvm/lib/Transforms/Scalar/LICM.cpp                     |  27 ++++---
llvm/lib/Transforms/Utils/SimplifyCFG.cpp               |   3 +
llvm/test/CodeGen/BPF/adjust-opt-icmp1.ll               |  14 +---
llvm/test/CodeGen/BPF/adjust-opt-icmp2.ll               |  10 +--
llvm/test/CodeGen/BPF/adjust-opt-icmp3.ll               |  12 +--
llvm/test/CodeGen/BPF/adjust-opt-icmp4.ll               |  12 +--
llvm/test/CodeGen/BPF/adjust-opt-speculative1.ll        |  17 +---
llvm/test/CodeGen/BPF/adjust-opt-speculative2.ll        |  22 +-----
21 files changed, 69 insertions(+), 533 deletions(-)

Below are detailed explanations for three transformations
which hurts bpf verification.

FoldAndOrOfICmpsUsingRanges

The following is an example to show how FoldAndOrOfICmpsUsingRanges
transformation may generate codes which hurts bpf verifier.
For bpf prog in linux/tools/testing/selftests/bpf/progs/map_kptr_fail.c:

...
id = ctx->protocol;
if (id < 4 || id > 12)
  return 0;
*(u64 *)((void *)v + id) = 0;
...

With FoldAndOrOfICmpsUsingRanges, the find pseudo code looks like:

...
id = ctx->protocol;
tmp = id;
tmp += -13;
if (tmp < 0xfffffff7) goto next;
v += id;
*v = 0;
next:

In the above code, the verifier considers 'id' in 'v += id' as a arbitrary
unsigned integer so later '*v = 0' is considered as possible out-of-bound
memory access. This is because the verifier, as a post analysis tool,
does not know the relationship of tmp/id at 'v += id' point. Although it
is possible to improve verifier to track tmp/id relationship, this would
increase bpf verifier complexity a lot. llvm FoldAndOrOfICmpsUsingRanges
does the transformation based on pattern matching and it certainly aware
tmp/id relationship.

The actual verification failure looks like below:

; id = ctx->protocol;
9: (61) r1 = *(u32 *)(r6 +16)         ; R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R6_w=ctx(off=0,imm=0)
; if (id < 4 || id > 12)
10: (bc) w2 = w1                      ; R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R2_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff))
11: (04) w2 += -13                    ; R2=scalar(umax=4294967295,var_off=(0x0; 0xffffffff))
12: (a6) if w2 < 0xfffffff7 goto pc+3         ; R2=scalar(umin=4294967287,umax=4294967295,var_off=(0xfffffff0; 0xf),s32_min=-9,s32_max=-1)
; *(u64 *)((void *)v + id) = 0;
13: (0f) r0 += r1                     ; R0_w=map_value(off=0,ks=4,vs=32,umax=4294967295,var_off=(0x0; 0xffffffff)) R1=scalar(umax=4294967295,var_off=(0x0; 0xffffffff))
14: (b7) r1 = 0                       ; R1_w=0
; *(u64 *)((void *)v + id) = 0;
15: (7b) *(u64 *)(r0 +0) = r1
R0 unbounded memory access, make sure to bounds check any such access

FoldTwoEntryPHINode

The following is an example to show FoldTwoEntryPHINode transformation
may generate codes which hurts bpf verifier.
For bpf prog in linux/tools/testing/selftests/bpf/progs/test_tc_dtime.c:

static void inc_errs(__u32 idx)
{
      if (test < 9)
              errs[test][idx]++;
      else
              errs[UKN_TEST][idx]++;
}
...
if (skb->tstamp == 0xb9fbeef)
  inc_errs(2);
...

With FoldTwoEntryPHINode, the final generated code looks like

...
r1 = test;
r2 = skb->tstamp;
if (r2 != 0xb9fbeef) goto next;
w2 = w1; // w1/w2 are lower 32 bit values of r1/r2.
if (w1 >= 9)
  w2 = 9;
tmp = r2 * 28;
r3 = errs + tmp;
... *r3 ...
...

next:

In the above code, for the case where 'w1 >= 9' is false, verifier
concludes that 'r2' at 'tmp = r2 * 28' as an arbitrary scalar which
caused verificaiton failure for later dereference of r3.

The actual verification failure looks like below:

8: (18) r1 = 0xffffc900001ca230       ; R1_w=map_value(off=560,ks=4,vs=564,imm=0)
10: (61) r1 = *(u32 *)(r1 +0)         ; R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff))
; if (skb->tstamp == EGRESS_ENDHOST_MAGIC)
11: (79) r2 = *(u64 *)(r6 +152)       ; R2_w=scalar() R6=ctx(off=0,imm=0)
; if (skb->tstamp == EGRESS_ENDHOST_MAGIC)
12: (55) if r2 != 0xb9fbeef goto pc+10        ; R2_w=195018479
13: (bc) w2 = w1                      ; R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R2_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff))
; if (test < __NR_TESTS)
14: (a6) if w1 < 0x9 goto pc+1 16: R0=2 R1_w=scalar(umax=8,var_off=(0x0; 0xf)) R2_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R6=ctx(off=0,imm=0) R10=fp0
;
16: (27) r2 *= 28                     ; R2_w=scalar(umax=120259084260,var_off=(0x0; 0x1ffffffffc),s32_max=2147483644,u32_max=-4)
17: (18) r3 = 0xffffc900001ca118      ; R3_w=map_value(off=280,ks=4,vs=564,imm=0)
19: (0f) r3 += r2                     ; R2_w=scalar(umax=120259084260,var_off=(0x0; 0x1ffffffffc),s32_max=2147483644,u32_max=-4) R3_w=map_value(off=280,ks=4,vs=564,umax=120259084260,var_off=(0x0; 0x1ffffffffc),s32_max=2147483644,u32_max=-4)
20: (61) r2 = *(u32 *)(r3 +0)
R3 unbounded memory access, make sure to bounds check any such access

The unit test adjust-opt-speculative2.ll also shows how FoldTwoEntryPHINode
might hurt verifier.

The original code:

unsigned foo();
void *test(void *p) {
  unsigned ret = foo();
  if (ret <= 7)
    p += ret;
  return p;
}

Compiled with clang -target bpf -O2 -S t.c, with FoldTwoEntryPHINode enabled,
the following code is generated:

1:    r6 = r1
2:    call foo
3:    r1 = r0
4:    r1 <<= 32
5:    r1 >>= 32
6:    r2 = 8
7:    if r2 > r1 goto LBB0_2
8:    r0 = 0
   LBB0_2:
9:    r0 <<= 32
10:   r0 >>= 32
11:   r6 += r0
12:   r0 = r6
13:   exit

In the above example, insn 3 establishes r1 and r0 equivalence. Insns 4-7
establishes r1 < 8 if branch is taken. However, with branch taken, later
verifier is not able to ensure r0 < 8 and 'r6 += r0' may have a verificaiton
error.

With FoldTwoEntryPHINode disabled, the following code is generated:

1:    r6 = r1
2:    call foo
3:    r0 <<= 32
4:    r0 >>= 32
5:    if r0 > 7 goto LBB0_2
6:    r6 += r0
   LBB0_2:
7:    r0 = r6
8:    exit

The 'r6 += r0' is safe as the verifier can deduce 'r0 <= 7' based on the branch.

MinMaxHoisting

Furthermore, recently we hit another issue related LICM
MinMaxHoisting transformation (https://reviews.llvm.org/D147078)
which also hurts verifier.
For bpf prog in linux/tools/testing/selftests/bpf/progs/loop6.c:

The original code:

for (i = 0; (i < VIRTIO_MAX_SGS) && (i < out_sgs); i++) {
        for (n = 0, sgp = get_sgp(sgs, i); sgp && (n < SG_MAX);
             sgp = __sg_next(sgp)) {
                bpf_probe_read_kernel(&len, sizeof(len), &sgp->length);
                length1 += len;
                n++;
        }
}

After MinMaxHoisting,

upper = MIN(VIRTIO_MAX_SGS, out_sgs);
for (i = 0; i < upper; i++) {
        for (n = 0, sgp = get_sgp(sgs, i); sgp && (n < SG_MAX);
             sgp = __sg_next(sgp)) {
                bpf_probe_read_kernel(&len, sizeof(len), &sgp->length);
                length1 += len;
                n++;
        }
}

The verifier is not able to verify properly since it assumes the loop upper
is a arbitrary scalar (up to 32bit integer). The actual verification failure
looks like:

...
119: (15) if r1 == 0x0 goto pc+1
The sequence of 8193 jumps is too complex.

We have a draft to show bpf backend
implementation to undo the transformation (https://reviews.llvm.org/D147990).

Proposal

We feel adding proper TTI hooks is a better solution.
TTI provides a mechanism so backend can influence the
transformation. This also helps remove the associated bpf backend
pattern matching transformations.

This patch undos previous InstCombine/SimplifyCFG
transformation-preventing passes and adds one TTI hook
TTI->needsPreserveRangeInfoInVerification() such that
the above mentioned transformations can be disabled
by the target. The hook name needsPreserveRangeInfoInVerification()
implies that the transformation is disabled due to
later downstrean code verification.

Another possible solution is to legalize IR for verificaiton requirement.
This may require to add the verification requirement to IR, or
establish certain illegal code patterns, etc. This approach
requires more thought as downstream verification capability and
new code pattern verification failure is also a moving target.

Diff Detail

Event Timeline

yonghong-song created this revision.Apr 10 2023, 1:18 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 10 2023, 1:18 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
yonghong-song requested review of this revision.Apr 10 2023, 1:18 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 10 2023, 1:18 PM

The previous patch caused several bpf selftest failures. Remove two flags and changed a few tests so bpf selftests can pass.

yonghong-song retitled this revision from [RFC] BPF: Undo specific transform-preventing passes and add opt flags to [RFC][TTI][BPF]: Undo specific transform-preventing passes and add TTI hooks.
yonghong-song edited the summary of this revision. (Show Details)
  • Using TTI hooks instead of flags
yonghong-song edited the summary of this revision. (Show Details)
  • do not add TTI to InstCombinerImpl, use the existing one in InstCombiner
  • add a code example to illustrate the problem in Summary
yonghong-song retitled this revision from [RFC][TTI][BPF]: Undo specific transform-preventing passes and add TTI hooks to [TTI][BPF]: Undo specific transform-preventing passes and add TTI hooks.
yonghong-song edited the summary of this revision. (Show Details)
  • add TTI hook for LICM/HoistMinMax transformations.
  • add detailed code analysis in commit message.
yonghong-song edited the summary of this revision. (Show Details)Apr 16 2023, 9:30 AM
yonghong-song edited the summary of this revision. (Show Details)
yonghong-song edited the summary of this revision. (Show Details)Apr 16 2023, 4:08 PM

Ping again. @mkazantsev @nikic @chandlerc @lebedev.ri @spatel Could you help review the patch and share your opinion? Thanks!

In my opinion this new approach of using target hooks is way better than trying to amend the effect of certain passes by adding "counter passes", which is not only a moving target, but also fragile and IMO a waste of effort. We do have the same problem with the GCC BPF backend, and will be using a similar strategy to what Yonghong is proposing here.

RKSimon added inline comments.May 14 2023, 6:43 AM
llvm/test/CodeGen/BPF/adjust-opt-icmp1.ll
36–37

You've left a lot of orphan CHECK-DISABLE in various files where you've removed the bpf-disable-serialize-icmp RUN

nikic requested changes to this revision.May 14 2023, 7:32 AM

This has already been discussed in D147078. All the middle-end maintainers who chimed in on that one were opposed to the proposal, and I don't think their responses will be different this time. I don't really want to repeat that discussion one more time here, because it felt a bit like talking at a wall. If you want to pursue this further, I would suggest starting an RFC on Discourse, because it seems pretty clear that we don't have a consensus between BPF and middle-end maintainers here.

PS: I would recommend shifting your thinking about these undo transforms from "precisely undo what this pass did" to "legalize IR for BPF". The draft patch at D147990 is needlessly complex because it tries to precisely undo what LICM does (down to loop invariance checks!) instead of treating it as a generic legalization problem.

This revision now requires changes to proceed.May 14 2023, 7:32 AM

Can we have a middle ground here. Instead of having too many new TTI interfaces here, a single one can be used instead: TTI:needsPreserveRangeInfoInVerification(). If this returns true, some passes (or subpasses in instcombine) can be turned off for the target. Similar things are done in vectorizer.

Transformations that are guarded by this check also need to be checked case by case. If the verifier can be enhanced without too much compile time overhead, it should probably be done there -- it adds more benefit of allowing more flexibility in source patterns.

Legalization can be a longer term thing to to think about. Like undoing transformations (excluding pure canonicalization ones), it does add unnecessary compile time overhead.

@davidxl Thanks for your suggestions. Adding a single TTI->needsPreserveRangeInfoInVerification() sounds a reasonable idea. Legalization of IR for BPF verification is a great idea but it requires more thought, e.g., how IR will be enhanced to encode verification requirement and how middle end optimization reacts to such requirement as typically it is not clear whether a particular optimization will hurt bpf verification or not unless running though bpf verifier or having a deep knowledge of bpf verifier. We can continue to think and discuss 'legalization of IR for BPF verification' idea, but it could be great we can have current TTI->needsPreserveRangeInfoInVerification() approach which gcc community intends to do the same.

Also as you mentioned, bpf verifier is a moving target as well and we are constantly improving verifier as well. Yes, once we fixed verifier, after some times, old kernels are considered not important, we might undo some hooks in llvm.

I will post another version of the patch soon.

yonghong-song retitled this revision from [TTI][BPF]: Undo specific transform-preventing passes and add TTI hooks to [TTI][BPF]: Undo specific transform-preventing passes and add one TTI hook.
yonghong-song edited the summary of this revision. (Show Details)
yonghong-song added a reviewer: davidxl.
  • use one hook TTI->needsPreserveRangeInfoInVerification() instead of three hooks
  • remove unused checks in the tests

Considering the overall longer term maintenance cost to middle-end maintainers and the cost the BPF backend developers, this approach seems like the least intrusive method to me. Middle end maintainers (nikic@) need to chime in if there are more concerns on the approach to unblock the progress. The assumption is that more longer term solution will be explored further (e.g. legalization, or maintaining range info etc).

wenlei added a subscriber: wenlei.May 24 2023, 12:36 PM

@nikic Could you comment on @davidxl suggestion? I would be great if we can find a path forward for this particular issue. Thanks!

My two cents:

I think that in general honoring canonicalization is good, but it's not always a clear cut. Legality and profitability are sometimes relative, and in the case of BPF, I do think there's case to be made for mid-end to look at TTI more then what is traditionally allowed. The failure mode here is different from profitability, where you may just end up with inefficient code if you don't undo later; for BPF though, it can generate program that will be rejected by verifier (not run slower). Technically BPF backend can define what is legal for that target, i.e. one could argue that BPF target requires preserving predicates in certain form (needs to be well defined though), which would then restrict certain optimization from the mid-end.

Perhaps BPF is an uncharted territory in the sense that it's so restrictive that its requirements are often at odds with more canonicalizations, and that requiring backend to undo everything is bordering impractical. A special case grant for untraditional use of TTI may be reasonable here - we can make the reasons clear so it won't set bad precedence for others.

nikic added a comment.Jun 13 2023, 2:09 AM

I think that in general honoring canonicalization is good, but it's not always a clear cut. Legality and profitability are sometimes relative, and in the case of BPF, I do think there's case to be made for mid-end to look at TTI more then what is traditionally allowed. The failure mode here is different from profitability, where you may just end up with inefficient code if you don't undo later; for BPF though, it can generate program that will be rejected by verifier (not run slower). Technically BPF backend can define what is legal for that target, i.e. one could argue that BPF target requires preserving predicates in certain form (needs to be well defined though), which would then restrict certain optimization from the mid-end.

I am, generally speaking, not opposed to doing TTI-based legality checks in special circumstances. I have approved exceptions for doing so in the past. A recent one was using TTI in InstCombine to check whether a certain address space cast is legal for the target. Without TTI, we assume that all address space casts are illegal. This exception made sense to me, because addrspacecast semantics are fundamentally target-dependent, and the legality check is compatible with the LangRef semantics.

The case here is very different, because the legality conditions for BPF are not well-defined and not compatible with IR semantics. BPF considers transforms "illegal" that are clearly legal under our operational semantics, and, to the best of my knowledge, there is no principled way a maintainer could determine whether or not a given transform would be "legal" for BPF or not.

We have other targets with somewhat "unusual" legality requirements, where the target does not accept arbitrary inputs. These include things like wasm (which has verifier requirements) and gpu targets like amdgpu and nvptx (which also have verifier requirements, as well as convergence restrictions). These additional legality requirements are handled in one of two ways: Either the backend takes the responsibility of converting IR into a legal form (e.g. CFG structurization or removal of irreducible cycles) or the additional legality requirements are encoded into the IR, e.g. through the use special intrinsics, operand bundles and/or token values. This makes the legality constraint part of the normal IR semantics, and we can use our normal reasoning and tools to determine whether transforms are legal or not.

I maintain my position that the BPF target should be using either of those approaches (or a combination of them). Otherwise we will end up with checks for the BPF target littered over random places in the code base, with a new check being added every time a commit breaks the BPF verifier.

@nikic Thanks for your comment and pointer! I will study Webassembly/AMDGPU/NVPTX to see how they resolve their respective 'legality' issues and will report back once I got some understanding about their strategy and commonality/difference w.r.t. BPF.

llvm/lib/Target/BPF/BPF.h