This is an archive of the discontinued LLVM Phabricator instance.

Differential D131633

[BPF] BPFContextAccessMarkerPass added to avoid verifier unfriendly context access
Abandoned, Public, Draft

Authored by eddyz87 on Aug 10 2022, 4:51 PM.

Details

Reviewers: None

Summary

This commit adds a new BPF-specific pass named
BPFContextAccessMarkerPass. The goal of this pass is to restrain
the SimplifyCFGPass optimization pass so that it does not generate
context access patterns that the kernel BPF verifier rejects.

Context access means an access to the context register, R1, the first
parameter of a BPF program. This patch assumes that context parameters
are marked using the following attribute:

__attribute__((btf_decl_tag("ctx")))

The kernel BPF verifier only allows accesses of the form BASE + static-offset
if BASE is the context register (or its alias).

For access to function arguments, the Clang frontend generates IR that is
translated in a conforming way. However, SimplifyCFGPass might
move some instructions and add intermediate values that lead to
dynamic computation of the base offset. This can be illustrated by the
following example:

struct bpf_sock { ... };
struct bpf_sockopt { ... };
extern int f(int x);

int _getsockopt(struct bpf_sockopt *ctx __attribute__((btf_decl_tag("ctx"))))
{
  unsigned g = 0;
  switch (ctx->level) {
  case 10:
    g = f(ctx->sk->family);
    break;
  case 20:
    g = f(ctx->optlen);
    break;
  }
  return g % 2;
}

The following (simplified) IR is generated for function _getsockopt:

define dso_local i32 @_getsockopt(ptr noundef %ctx)
  ...
sw.bb:
  %1 = load ptr, ptr %ctx                  ;; access
  %family = getelementptr                  ;; to ctx->sk->family
    inbounds %struct.bpf_sock, ptr %1      ;; (a)
  %2 = load i32, ptr %family               ;;
  %call = call i32 @f(i32 noundef %2)
  br label %sw.epilog

sw.bb1:
  %optlen = getelementptr                  ;; access
    inbounds %struct.bpf_sockopt, ptr %ctx ;; to ctx->optlen
  %3 = load i32, ptr %optlen               ;; (b)
  %call2 = call i32 @f(i32 noundef %3)
  br label %sw.epilog

sw.epilog:
  ...

Without SimplifyCFGPass, the machine code for (a) and (b) looks as follows:

$r1 = LDW $r1, 4  ;; for ctx->sk->family
$r1 = LDW $r1, 12 ;; for ctx->optlen

This is allowed by the BPF verifier. However, if SimplifyCFGPass is
executed, the code is transformed as follows:

  ...
sw.bb:
  %1 = load ptr, ptr %ctx
  %family = getelementptr inbounds %struct.bpf_sock, ptr %1
  br label %sw.epilog.sink.split

sw.bb1:
  %optlen = getelementptr inbounds %struct.bpf_sockopt, ptr %ctx
  br label %sw.epilog.sink.split

sw.epilog.sink.split:
  %optlen.sink = phi
    ptr [ %optlen, %sw.bb1 ], [ %family, %sw.bb ]
  %2 = load i32, ptr %optlen.sink                 ;; (c)
  %call2 = call fastcc i32 @f(i32 noundef %2)
  br label %sw.epilog

sw.epilog:
  ...

Note that load instructions (a) and (b) are replaced by a single load
instruction (c) that gets its pointer operand from a PHI node. This is
done by the code sinking part of SimplifyCFGPass. This leads to the
following machine code:

bb.2.sw.bb:
  $r1 = LDD $r1, 0
  $r1 = ADD_ri $r1, 4
  JMP %bb.4

bb.3.sw.bb1:
  $r1 = ADD_ri $r1, 12

bb.4.sw.epilog.sink.split:
  $r1 = LDW $r1, 0
  JAL @f

Here the offset is added dynamically to r1 (the context register). This
access pattern is not allowed by the BPF verifier.
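
The rule can be illustrated with a toy model. The C++ sketch below is not verifier code: the names Reg, Insn and acceptsCtxAccesses are made up for this illustration, loaded values are not tracked at all, and the only assumption taken from the summary is that a load through the context register must carry its offset in the load itself rather than in a previously modified base register.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy register state: whether the register holds the context pointer and
// which constant offset has been added to it so far.
struct Reg {
  bool isCtx = false;
  std::int64_t off = 0;
};

enum class Op { AddImm, Load };

// Hypothetical instruction: dst = src + imm (AddImm) or dst = *(src + imm) (Load).
struct Insn {
  Op op;
  int dst;
  int src;
  std::int64_t imm;
};

// Returns true if every load through a context pointer uses an unmodified
// base register, i.e. the offset lives only in the load instruction itself.
bool acceptsCtxAccesses(const std::vector<Insn> &prog) {
  Reg regs[11];         // r0..r10; register numbers are assumed to be in range
  regs[1].isCtx = true; // r1 holds the context on entry
  for (const Insn &insn : prog) {
    const Reg src = regs[insn.src];
    Reg dst; // defaults to "not a context pointer"
    if (insn.op == Op::AddImm) {
      dst.isCtx = src.isCtx;
      dst.off = src.off + insn.imm;
    } else if (src.isCtx && src.off != 0) {
      return false; // load through a modified context pointer
    }
    regs[insn.dst] = dst;
  }
  return true;
}
```

In this model a single load with offset 12 from r1 is accepted, while the sunk form (an add of 12 to r1 followed by a load with offset 0) is rejected, which mirrors the difference between the two machine code listings above.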

To prevent the undesired code motion, BPFContextAccessMarkerPass
inserts a call to the llvm.bpf.passthrough function after each load of a
value from a location that aliases the context parameter.

llvm.bpf.passthrough accepts a unique integer constant as one of its
parameters, thus preventing common code from being sunk past the
position of the call.

The IR from above is transformed as follows:

sw.bb:
  %2 = load ptr, ptr %ctx, align 8
  %family = getelementptr inbounds %struct.bpf_sock, ptr %2
  %3 = load i32, ptr %family, align 4
  %4 = call i32 @llvm.bpf.passthrough.i32.i32(i32 2, i32 %3)  ;; <-- added call
  %call = call i32 @f(i32 noundef %4)
  br label %sw.epilog

sw.bb1:
  %optlen = getelementptr inbounds %struct.bpf_sockopt, ptr %ctx
  %5 = load i32, ptr %optlen
  %6 = call i32 @llvm.bpf.passthrough.i32.i32(i32 3, i32 %5)  ;; <-- added call
  %call2 = call i32 @f(i32 noundef %6)
  br label %sw.epilog

sw.epilog:
  ...

This addresses the issue reported in the following thread:

https://lore.kernel.org/bpf/CAA-VZPmxh8o8EBcJ=m-DH4ytcxDFmo0JKsm1p1gf40kS0CE3NQ@mail.gmail.com/T/#m4b9ce2ce73b34f34172328f975235fc6f19841b6

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

eddyz87 created this revision. Aug 10 2022, 4:51 PM

Herald added a project: Restricted Project. Aug 10 2022, 4:51 PM

Herald added subscribers: hiraditya, mgorny.

The current implementation depends on alias analysis to identify all
aliases of parameters marked with btf_decl_tag("ctx"). The calls to
llvm.bpf.passthrough are inserted after loads from these aliases.

However, during testing I found that the default alias analysis pass
is a bit too conservative for this purpose. For example, unnecessary
passthrough calls are inserted in the following cases:

#define __CTX__ __attribute__((btf_decl_tag("ctx")))

struct outer { ..., int last; };

extern struct outer *magic();

int no_marker_expected_2(struct outer *ctx __CTX__) {
  struct outer *x = magic();
  return x->last; // passthrough is added but not needed
}

int no_marker_expected_3(struct outer *ctx __CTX__, struct outer *non_ctx) {
  return non_ctx->last; // passthrough is added but not needed
}

I want to experiment with a custom algorithm instead of alias
analysis. As far as I understand, only a limited number of instructions
has to be handled; all of them (except PHI) are present in the snippet
below:

define dso_local i32 @h(ptr noundef %i) #1 section "sockops" !dbg !15 {
entry:
  %i.addr = alloca ptr, align 8
  store ptr %i, ptr %i.addr, align 8, !tbaa !23
  %0 = load ptr, ptr %i.addr, align 8, !dbg !28, !tbaa !23
  %1 = load i64, ptr @"llvm.bpf_sock_ops:0:0$0:0", align 8
  %2 = bitcast ptr %0 to ptr
  %3 = getelementptr i8, ptr %2, i64 %1
  %4 = bitcast ptr %3 to ptr
  %5 = call ptr @llvm.bpf.passthrough.p0.p0(i32 1, ptr %4)

The custom alias analysis replacement algorithm might look as follows:

EC = EquivalenceClasses<Value*, ...>()
For each instruction in the function match one of the patterns:
- case store ptr %x, ptr %y -> EC.unionSets(%x, %y)
- case %x = load ptr, ptr %y -> EC.unionSets(%x, %y)
- case %x = bitcast %y -> EC.unionSets(%x, %y)
- case %x = getelementptr _, ptr %y, _ -> EC.unionSets(%x, %y)
- case %x = call ptr @llvm.bpf.passthrough.p0.p0(_, ptr %y) -> EC.unionSets(%x, %y)
- case %x = phi [_, %y], [_, %z] -> EC.unionSets(%x, %y), EC.unionSets(%x, %z)
Use the resulting equivalence classes instead of alias analysis results, proceed as in the current draft.
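
The outline above can be sketched without any LLVM dependency. In the C++ sketch below, ValueClasses is a minimal union-find that stands in for llvm::EquivalenceClasses, values are represented by their textual names, and PtrEdge, buildClasses and classesForSnippet are hypothetical helpers introduced only for this illustration. Each matched pattern is reduced to the pair (x, y) that gets merged.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Minimal union-find over value names; a stand-in for llvm::EquivalenceClasses.
class ValueClasses {
  std::map<std::string, std::string> parent;

public:
  std::string find(const std::string &v) {
    auto it = parent.find(v);
    if (it == parent.end()) {
      parent[v] = v; // first time seen: singleton class
      return v;
    }
    if (it->second == v)
      return v;
    std::string next = it->second;
    std::string root = find(next);
    parent[v] = root; // path compression
    return root;
  }

  void unionSets(const std::string &a, const std::string &b) {
    std::string ra = find(a);
    std::string rb = find(b);
    if (ra != rb)
      parent[ra] = rb;
  }

  bool sameClass(const std::string &a, const std::string &b) {
    return find(a) == find(b);
  }
};

// One matched pattern from the list above, reduced to the two merged values.
struct PtrEdge {
  std::string x;
  std::string y;
};

ValueClasses buildClasses(const std::vector<PtrEdge> &edges) {
  ValueClasses ec;
  for (const PtrEdge &e : edges)
    ec.unionSets(e.x, e.y);
  return ec;
}

// Pairs extracted by hand from the IR snippet for @h shown earlier.
ValueClasses classesForSnippet() {
  std::vector<PtrEdge> edges = {
      PtrEdge{"%i", "%i.addr"}, // store ptr %i, ptr %i.addr
      PtrEdge{"%0", "%i.addr"}, // %0 = load ptr, ptr %i.addr
      PtrEdge{"%2", "%0"},      // %2 = bitcast ptr %0 to ptr
      PtrEdge{"%3", "%2"},      // %3 = getelementptr i8, ptr %2, i64 %1
      PtrEdge{"%4", "%3"},      // %4 = bitcast ptr %3 to ptr
      PtrEdge{"%5", "%4"},      // %5 = call ptr @llvm.bpf.passthrough.p0.p0(...)
  };
  return buildClasses(edges);
}
```

With these pairs %5 ends up in the same class as the parameter %i, while %1 (an i64 load that matches none of the pointer patterns) stays in a class of its own.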

Harbormaster completed remote builds in B180563: Diff 451681. Aug 10 2022, 7:28 PM

Do we need alias analysis? The hook is at a very early stage of optimization, which has lots of stack operations. Not sure whether alias analysis can do a good job or not. Also, in typical bpf programs, ctx is used and not really aliased to other variables due to the special rewrite requirement for ctx references. You can check bpf selftest and cilium bpf code (https://github.com/cilium/cilium.git, bpf directory) for some examples.

Also, it is not necessary to add the passthrough builtin after every ctx load. In the ideal case, we only want to add the passthrough builtin in cases where simplifyCFG may do a "harmful" optimization. But it is possible that adding it after every ctx load is okay from the final code point of view. Some analysis is needed here to ensure we don't cause potential performance degradation, as the simplifyCFG "harmful" optimization is actually not that common.

0001-bpf-test-Added-attribte-btf_decl_tag-ctx-for-pointer.patch (349 KB)

Patch for Linux Kernel BPF tests.

For testing purposes I've updated the BPF tests in tools/testing/selftests/bpf/progs with the btf_decl_tag("ctx") attribute. When the tests are compiled with this modification I get the following statistics (program == BPF .o file):

363 programs are patched;
51 programs differ when compiled with and without btf_decl_tag("ctx");
13 programs increased in size
26 programs decreased in size
12 programs have the same size but some changes in code gen
total number of added instructions for increased programs is 215 insns
total number of removed instructions for decreased programs is 91 insns

I'm still looking through the disassembly diff to assess changes in the code generation.

When the tests are executed with the btf_decl_tag("ctx") attribute, a single test fails: test_sk_lookup.c. Here is a part of this test:

int access_ctx_sk(struct bpf_sk_lookup *ctx __CTX__)
{
	struct bpf_sock *sk1 = NULL, *sk2 = NULL;
	int err, ret;

	ret = SK_DROP;

	/* Try accessing unassigned (NULL) ctx->sk field */
	if (ctx->sk && ctx->sk->family != AF_INET)
		goto out;

	/* Assign a value to ctx->sk */
	sk1 = bpf_map_lookup_elem(&redir_map, &KEY_SERVER_A);
	if (!sk1)
		goto out;
	err = bpf_sk_assign(ctx, sk1, 0);
	if (err)
		goto out;
	if (ctx->sk != sk1)
		goto out;

	/* Access ctx->sk fields */
	if (ctx->sk->family != AF_INET ||    // <--------- error here because of access to ->family
	    ctx->sk->type != SOCK_STREAM ||
	    ctx->sk->state != BPF_TCP_LISTEN)
		goto out;
        // ...
}

The error message from BPF verifier looks as follows:

libbpf: prog 'access_ctx_sk': BPF program load failed: Permission denied
libbpf: prog 'access_ctx_sk': -- BEGIN PROG LOAD LOG --
0: R1=ctx(off=0,imm=0) R10=fp0
; int access_ctx_sk(struct bpf_sk_lookup *ctx __CTX__)
0: (bf) r8 = r1                       ; R1=ctx(off=0,imm=0) R8_w=ctx(off=0,imm=0)
; if (ctx->sk && ctx->sk->family != AF_INET)
1: (79) r1 = *(u64 *)(r8 +0)          ; R1_w=sock_or_null(id=1,off=0,imm=0) R8_w=ctx(off=0,imm=0)
; if (ctx->sk && ctx->sk->family != AF_INET)
2: (15) if r1 == 0x0 goto pc+3        ; R1_w=sock(off=0,imm=0)
3: (b4) w7 = 0                        ; R7_w=0
; if (ctx->sk && ctx->sk->family != AF_INET)
4: (61) r1 = *(u32 *)(r1 +4)          ; R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff))
; if (ctx->sk && ctx->sk->family != AF_INET)
5: (56) if w1 != 0x2 goto pc+63       ; R1_w=2
; sk1 = bpf_map_lookup_elem(&redir_map, &KEY_SERVER_A);
6: (18) r1 = 0xffff8881037d4c00       ; R1_w=map_ptr(off=0,ks=4,vs=8,imm=0)
8: (18) r2 = 0xffffc90000322000       ; R2_w=map_value(off=0,ks=4,vs=200,imm=0)
10: (85) call bpf_map_lookup_elem#1   ; R0=map_value_or_null(id=3,off=0,ks=4,vs=8,imm=0) refs=3
11: (bf) r6 = r0                      ; R0=map_value_or_null(id=3,off=0,ks=4,vs=8,imm=0) R6_w=map_value_or_null(id=3,off=0,ks=4,vs=8,imm=0) refs=3
12: (b4) w7 = 0                       ; R7_w=0 refs=3
; if (!sk1)
13: (15) if r6 == 0x0 goto pc+55      ; R6_w=sock(ref_obj_id=3,off=0,imm=0) refs=3
; err = bpf_sk_assign(ctx, sk1, 0);
14: (bf) r1 = r8                      ; R1_w=ctx(off=0,imm=0) R8=ctx(off=0,imm=0) refs=3
15: (bf) r2 = r6                      ; R2_w=sock(ref_obj_id=3,off=0,imm=0) R6_w=sock(ref_obj_id=3,off=0,imm=0) refs=3
16: (b7) r3 = 0                       ; R3_w=0 refs=3
17: (85) call bpf_sk_assign#124       ; R0_w=scalar() refs=3
18: (b4) w7 = 0                       ; R7_w=0 refs=3
19: (bf) r9 = r6                      ; R6=sock(ref_obj_id=3,off=0,imm=0) R9=sock(ref_obj_id=3,off=0,imm=0) refs=3
; if (err)
20: (56) if w0 != 0x0 goto pc+46      ; R0=scalar(smax=9223372032559808512,umax=18446744069414584320,var_off=(0x0; 0xffffffff00000000),s32_min=0,s32_max=0,u32_max=0) refs=3
; if (ctx->sk != sk1)
21: (79) r1 = *(u64 *)(r8 +0)         ; R1_w=sock_or_null(id=4,off=0,imm=0) R8=ctx(off=0,imm=0) refs=3
22: (bf) r9 = r6                      ; R6=sock(ref_obj_id=3,off=0,imm=0) R9_w=sock(ref_obj_id=3,off=0,imm=0) refs=3
; if (ctx->sk != sk1)
23: (5d) if r1 != r6 goto pc+43       ; R1_w=sock_or_null(id=4,off=0,imm=0) R6=sock(ref_obj_id=3,off=0,imm=0) refs=3
; if (ctx->sk->family != AF_INET ||
24: (61) r2 = *(u32 *)(r1 +4)
R1 invalid mem access 'sock_or_null'
processed 23 insns (limit 1000000) max_states_per_insn 0 total_states 2 peak_states 2 mark_read 1
-- END PROG LOAD LOG --
libbpf: prog 'access_ctx_sk': failed to load: -13
libbpf: failed to load object 'test_sk_lookup'
libbpf: failed to load BPF skeleton 'test_sk_lookup': -13
test_sk_lookup:FAIL:skel open_and_load failed

The verifier failure is caused by a small change in the code generation for the ctx->sk->family access. Code generated without btf_decl_tag("ctx") looks as follows (r6 contains the result of the call to bpf_map_lookup_elem, effectively sk1):

293:	5d 61 2b 00 00 00 00 00	if r1 != r6 goto +43 <LBB14_18>
294:	61 61 04 00 00 00 00 00	r1 = *(u32 *)(r6 + 4)

Code generated with btf_decl_tag("ctx") looks as follows (r6 has the same meaning):

293:	5d 61 2b 00 00 00 00 00	if r1 != r6 goto +43 <LBB14_18>
294:	61 12 04 00 00 00 00 00	r2 = *(u32 *)(r1 + 4)

Note r1 access instead of r6.

This change in code generation is caused by interference between the GVNPass transformation and the call to @llvm.bpf.passthrough.p0.p0. Without the ctx attribute, the IR before and after GVNPass looks as follows:

;;; IR Dump Before GVNPass on access_ctx_sk

if.end:                                           ; preds = %land.lhs.true, %entry
  %call = tail call ptr inttoptr (i64 1 to ptr)
    (ptr noundef nonnull @redir_map, ptr noundef nonnull @KEY_SERVER_A) #8, !dbg !588
; ...
if.end7:                                          ; preds = %if.end3
  %3 = load ptr, ptr %ctx, align 8, !dbg !596, !tbaa !550
  %cmp8.not = icmp eq ptr %3, %call, !dbg !598
  br i1 %cmp8.not, label %if.end11, label %if.end59.thread103, !dbg !599

if.end11:                                         ; preds = %if.end7
  %family12 = getelementptr inbounds %struct.bpf_sock, ptr %3, i64 0, i32 1, !dbg !600
; ...

;;; IR Dump After GVNPass on access_ctx_sk

if.end7:                                          ; preds = %if.end3
  %3 = load ptr, ptr %ctx, align 8, !dbg !596, !tbaa !550
  %cmp8.not = icmp eq ptr %3, %call, !dbg !598
  br i1 %cmp8.not, label %if.end11, label %if.end59.thread103, !dbg !599

if.end11:                                         ; preds = %if.end7
  %family12 = getelementptr inbounds %struct.bpf_sock, ptr %call, i64 0, i32 1, !dbg !600
                                                           ^^^^^
                                                           %3 replaced with %call
; ...

With the ctx attribute the replacement does not happen, and the IR after GVNPass looks as follows:

if.end7:                                          ; preds = %if.end3
  %5 = load ptr, ptr %ctx, align 8, !dbg !596, !tbaa !550
  %6 = tail call ptr @llvm.bpf.passthrough.p0.p0(i32 16, ptr %5)
  %cmp8.not = icmp eq ptr %6, %call, !dbg !598
  br i1 %cmp8.not, label %if.end11, label %if.end59.thread101, !dbg !599

if.end11:                                         ; preds = %if.end7
  %family12 = getelementptr inbounds %struct.bpf_sock, ptr %5, i64 0, i32 1, !dbg !600
                                                           ^^
                                                           no replacement

This test depends on a particular behavior of the GVNPass transformation, namely the replacement of the access to ctx->sk by an access to sk1 in the else branch of the check ctx->sk != sk1. This behavior is not guaranteed. Additionally, the verifier does not propagate register equivalence information from equality checks. Thus I think that either the test or the verifier should be updated:

- An additional NULL check could be added to the test to satisfy the verifier.
- The verifier could be updated to propagate register equivalence information from equality checks. At first glance this seems to be an easy modification: the verifier already has some logic to process comparisons with scalar 0, and it could be extended to handle register comparisons.
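
The second option can be pictured with a very small model. The C++ sketch below is not kernel code: PtrState, RegPair, refineOnEqual and canDereference are hypothetical names, and the two states are only loosely modelled on the sock_or_null and sock register types that appear in the verifier log above. It shows what propagating equality on the fall-through edge of a register comparison would mean.

```cpp
#include <cassert>

// Toy pointer states, loosely modelled on the names in the verifier log.
enum class PtrState { SockOrNull, Sock };

// The two registers compared by `if lhs != rhs goto ...`.
struct RegPair {
  PtrState lhs;
  PtrState rhs;
};

// State on the fall-through edge of `if lhs != rhs goto ...`, where both
// registers are known to be equal: if either side is known to be non-NULL,
// the other side must be non-NULL as well.
RegPair refineOnEqual(RegPair in) {
  if (in.lhs == PtrState::Sock || in.rhs == PtrState::Sock)
    return RegPair{PtrState::Sock, PtrState::Sock};
  return in;
}

// A dereference is accepted only for a pointer known to be non-NULL.
bool canDereference(PtrState s) { return s == PtrState::Sock; }
```

Applied to instruction 23 of the log (r1 is sock_or_null, r6 is sock), the refinement would make the following load through r1 acceptable in this model. Without it r1 stays sock_or_null, which matches the reported failure.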

The same verifier failure could be demonstrated by the following kernel test case:

{
	"sk_fullsock(skb->sk): sk->type [fullsock field], indirect null check",
	.insns = {
	/* This is equivalent to the following program:
	 *
	 *   r6 = skb->sk;
	 *   r7 = sk_fullsock(r6);
	 *   r0 = sk_fullsock(r6);
	 *   if (r0 == 0) return 0;    (a)
	 *   if (r0 != r7) return 0;   (b)
	 *   *r7->type;                (c)
	 *   return 0;
	 *
	 * It is safe to dereference r7 at point (c), because of (a) and (b).
	 * The test passes if relation r0 == r7 is propagated from (b) to (c).
         */

	/* read skb->sk, check null */
	BPF_LDX_MEM(BPF_DW, BPF_REG_6, BPF_REG_1, offsetof(struct __sk_buff, sk)),
	BPF_JMP_IMM(BPF_JNE, BPF_REG_6, 0, 2),
	BPF_MOV64_IMM(BPF_REG_0, 0),
	BPF_EXIT_INSN(),
	/* r7 = sk_fullsock(skb) */
	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
	BPF_EMIT_CALL(BPF_FUNC_sk_fullsock),
	BPF_MOV64_REG(BPF_REG_7, BPF_REG_0),
	/* r0 = sk_fullsock(skb) */
	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
	BPF_EMIT_CALL(BPF_FUNC_sk_fullsock),
	/* if r0 == null then exit */
	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 2),
	BPF_MOV64_IMM(BPF_REG_0, 0),
	BPF_EXIT_INSN(),
	/* if r0 != r7 then exit else read r7->type, use BPF_JNE on purpose */
	BPF_JMP_REG(BPF_JNE, BPF_REG_0, BPF_REG_7, 1),
	BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_7, offsetof(struct bpf_sock, type)),
	/* return 0 */
	BPF_MOV64_IMM(BPF_REG_0, 0),
	BPF_EXIT_INSN(),
	},
	.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
	.result = ACCEPT,
}

Currently this test case fails to pass the verifier with the same error message.

Rewrite of BPFContextAccessMarkerPass to use simple
usage tracking instead of alias analysis.

With this update the pass looks for access chains that read fields
of context parameters and puts a call to llvm.bpf.passthrough after
the last load of each chain.

Access chains are recognized as a sequence of interdependent load,
getelementptr and bitcast instructions and passthrough calls.

Access chains start from a load of a known store location
of the context parameter.

E.g. this is an access chain:

;; %i metadata contains btf_decl_tag("ctx") annotation
define dso_local i32 @h(ptr noundef %i)
  %i.addr = alloca ptr, align 8
  store ptr %i, ptr %i.addr, align 8  ;; %i.addr store location is recorded
  %0 = load ptr, ptr %i.addr, align 8                      ;; chain starts
  %1 = load i64, ptr @"llvm.bpf_sock_ops:0:0$0:0"
  %2 = bitcast ptr %0 to ptr                               ;; chain continues
  %3 = getelementptr i8, ptr %2, i64 %1                    ;; chain continues
  %4 = bitcast ptr %3 to ptr                               ;; chain continues
  %5 = call ptr @llvm.bpf.passthrough.p0.p0(i32 1, ptr %4) ;; chain continues
  %6 = load i64, ptr %5, align 8                           ;; last chain load
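
The chain recognition described above can be sketched on a flattened instruction list. The C++ sketch below does not use LLVM: Inst, Kind, lastChainLoads and exampleBody are hypothetical names, values are plain strings, and instructions are assumed to appear in straight-line order with definitions before uses. It only covers chains that start from a stack slot holding the context parameter, as in the example.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

enum class Kind { Store, Load, Bitcast, GEP, Passthrough };

// Hypothetical flattened instruction. For Store, `result` is unused, `value`
// is the stored value and `ptr` is the destination. For the other kinds,
// `ptr` is the pointer operand and `value` is unused.
struct Inst {
  Kind kind;
  std::string result;
  std::string ptr;
  std::string value;
};

// Returns the results of the loads that end an access chain, i.e. chain
// loads whose result is not consumed by a later chain instruction.
std::set<std::string> lastChainLoads(const std::vector<Inst> &body,
                                     const std::string &ctxParam) {
  std::set<std::string> ctxSlots;  // stack slots holding the context pointer
  std::set<std::string> chain;     // values that belong to some access chain
  std::set<std::string> lastLoads; // candidate chain-ending loads
  for (const Inst &inst : body) {
    if (inst.kind == Kind::Store) {
      if (inst.value == ctxParam)
        ctxSlots.insert(inst.ptr); // record the store location
      continue;
    }
    bool starts = inst.kind == Kind::Load && ctxSlots.count(inst.ptr) != 0;
    bool continues = chain.count(inst.ptr) != 0;
    if (!starts && !continues)
      continue;
    chain.insert(inst.result);
    lastLoads.erase(inst.ptr); // the operand feeds this instruction, so it is not last
    if (inst.kind == Kind::Load)
      lastLoads.insert(inst.result);
  }
  return lastLoads;
}

// The IR from the example above, transcribed by hand.
std::vector<Inst> exampleBody() {
  return {
      Inst{Kind::Store, "", "%i.addr", "%i"},
      Inst{Kind::Load, "%0", "%i.addr", ""},
      Inst{Kind::Load, "%1", "@offset", ""}, // relocation offset, not in the chain
      Inst{Kind::Bitcast, "%2", "%0", ""},
      Inst{Kind::GEP, "%3", "%2", ""},
      Inst{Kind::Bitcast, "%4", "%3", ""},
      Inst{Kind::Passthrough, "%5", "%4", ""},
      Inst{Kind::Load, "%6", "%5", ""},
  };
}
```

For this input the only chain-ending load is %6, which is where the passthrough call would be placed.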

Updated statistics for the kernel BPF tests look as follows:

47 programs differ when compiled with and without btf_decl_tag("ctx");
17 programs increased in size
18 programs decreased in size
12 programs have the same size but some changes in code gen
total number of added instructions for increased programs is 85 insns
total number of removed instructions for decreased programs is 45 insns

Harbormaster completed remote builds in B181348: Diff 452764. Aug 15 2022, 12:38 PM

Hi Yonghong,

I've tried to move the transformation to a later point in the pipeline to avoid dealing with stack load / store operations. This point has to be somewhere inside PassBuilder::buildFunctionSimplificationPipeline. At first glance it has to be placed with the following constraints:

- after SROAPass (removes unnecessary load, store and alloca);
- after EarlyCSEPass (removes unnecessary bitcasts, a soft constraint);
- before SimplifyCFG with the sinking option enabled (the very end of the pipeline).

For testing purposes I used PeepholeEPCallbacks.

However, I found out that InstCombinePass can do some unwanted code movement as well. Thus the new registration point has to be between SROAPass and InstCombinePass. Unfortunately, there are no extension points in this range.

Here is the example transformed by InstCombinePass:

int one_marker_in_each_branch(struct outer *ctx __CTX__) {
  if (ctx->first)
    return ctx->butlast;
  else
    return ctx->last;
}

Here it is before InstCombinePass:

...
if.then:
  %butlast = getelementptr inbounds %struct.outer, ptr %ctx, i32 0, i32 3
  %1 = load i32, ptr %butlast, align 8
  br label %return

if.else:
  %last = getelementptr inbounds %struct.outer, ptr %ctx, i32 0, i32 4
  %2 = load i32, ptr %last, align 4
  br label %return

return:
  %retval.0 = phi i32 [ %1, %if.then ], [ %2, %if.else ]
  ret i32 %retval.0

And here it is after InstCombinePass:

...
if.then:
  %butlast = getelementptr inbounds %struct.outer, ptr %ctx, i64 0, i32 3
  br label %return

if.else:
  %last = getelementptr inbounds %struct.outer, ptr %ctx, i64 0, i32 4
  br label %return

return:
  %retval.0.in = phi ptr [ %butlast, %if.then ], [ %last, %if.else ]
  %retval.0 = load i32, ptr %retval.0.in, align 4 ;; <---------------- load moved
  ret i32 %retval.0

Note the sunk load.

The transformation itself is quite trivial and removal of stack handling does not have much impact on the overall code size:

llvm/lib/Target/BPF/BPFContextAccessMarkerPass.cpp | 27 +++++++++++----------------
llvm/lib/Target/BPF/BPFTargetMachine.cpp           |  4 ++--
2 files changed, 13 insertions(+), 18 deletions(-)

Thus, I don't think that the current registration point (registerPipelineStartEPCallback) should be changed.

I'll continue work on the remaining action points.

Hi Yonghong,

As discussed, I've updated the SimplifyCFG and InstCombine passes to obtain a baseline for the amount of changes in the modified BPF test cases. I attach the patch to this revision for archeological purposes.

I get the following statistics for the baseline:

Same   : 355 
Differ : 8 
insn count inc : 2 
insn count dec : 1 
# of programs with more insn : 1
# of programs with less insn : 1
# of programs with diff insn but same size : 6

bpf_iter_bpf_map.o                39 vs 38
bpf_iter_ksym.o                  150 vs 152
bpf_iter_task_btf.o               37 vs 37
fexit_bpf2bpf.o                  128 vs 128
pyperf600_bpf_loop.o            1073 vs 1073
sockopt_inherit.o                 61 vs 61
test_core_reloc_type_id.o         44 vs 44
test_tcpnotify_kern.o             57 vs 57

While examining the reasons for the difference between the baseline and the marker transformation, I identified interference with the GVN and CSE passes as the main reason. To mitigate it I've implemented two adjustments to the algorithm:

- access chains are only marked with passthrough if the access chain length is more than 1 (should have been done in the first version...);
- passthrough sequence numbers are reused for identical access chains.
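
The two adjustments can be sketched as a small numbering helper. The C++ sketch below is an illustration, not the code from the patch: SeqNumbers and numberFor are hypothetical names, a chain is described by the textual form of its steps, numbering starts at 1, and 0 is used here to mean "do not mark", all of which are assumptions made for this example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hands out passthrough sequence numbers and reuses the number for chains
// that were already seen.
class SeqNumbers {
  std::map<std::vector<std::string>, unsigned> seen;
  unsigned next = 1;

public:
  // Returns 0 when the chain is too short to be marked, otherwise the
  // sequence number assigned to this chain.
  unsigned numberFor(const std::vector<std::string> &chain) {
    if (chain.size() <= 1)
      return 0; // only chains longer than 1 are marked
    auto it = seen.find(chain);
    if (it != seen.end())
      return it->second; // identical chain: reuse its number
    seen[chain] = next;
    return next++;
  }
};
```

Two identical chains then receive passthrough calls with the same constant, so the calls no longer look different to passes that compare instructions, while distinct chains still get distinct numbers.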

After these modifications I get the following stats:

Same   : 343
Differ : 20
insn count inc : 16
insn count dec : 16
# of programs with more insn : 1
# of programs with less insn : 15
# of programs with diff insn but same size : 4

bind4_prog.o                          244 vs 244
bind6_prog.o                          306 vs 306
bpf_iter_bpf_map.o                     39 vs 38
bpf_iter_netlink.o                    101 vs 100
bpf_iter_task_btf.o                    37 vs 36
bpf_iter_task_file.o                   56 vs 55
bpf_iter_task.o                       135 vs 134
bpf_iter_tcp4.o                       401 vs 400
bpf_iter_tcp6.o                       457 vs 456
bpf_iter_test_kern4.o                  61 vs 60
bpf_iter_udp4.o                       100 vs 99
bpf_iter_udp6.o                       119 vs 118
connect4_prog.o                       343 vs 342
lsm.o                                 157 vs 155
sendmsg6_prog.o                        66 vs 65
tcp_ca_write_sk_pacing.o               32 vs 32
test_core_reloc_type_id.o              44 vs 44
test_sk_lookup.o                      708 vs 724   # <-- the only increase
test_sockmap_invalid_update.o          13 vs 12
xdping_kern.o                         193 vs 192

The reason for size increase in test_sk_lookup is that a sequence of transformations GVNPass, InstCombinePass and SimplifyCFGPass can remove several if branches when passthrough calls are not inserted.

I want to polish the implementation a bit before committing, will update the revision tomorrow.

I've spent some time today trying to figure out what to do about test_sk_lookup, will continue tomorrow as well.

0001-BPF-Baseline-for-BPFContextAccessMarkerPass.patch (50 KB)

As noted in the previous comment, while examining the reasons for the difference between the baseline and the marker transformation, interference with the GVN and CSE passes was identified as the main reason for the size increase of the test programs. This version includes two updates:

- access chains are only marked with passthrough if the access chain length is more than 1 (should have been done in the first version...);
- passthrough sequence numbers are reused for identical access chains.

After these changes the stats for changes in the test programs are as follows:

Same   : 345
Differ : 18
insn count inc : 6
insn count dec : 14
# of programs with more insn : 1
# of programs with less insn : 13
# of programs with diff insn but same size : 4

bind4_prog.o                  244 vs 244
bind6_prog.o                  306 vs 306
bpf_iter_netlink.o            101 vs 100
bpf_iter_task_btf.o            37 vs 36
bpf_iter_task_file.o           56 vs 55
bpf_iter_task.o               135 vs 134
bpf_iter_tcp4.o               401 vs 400
bpf_iter_tcp6.o               457 vs 456
bpf_iter_test_kern4.o          61 vs 60
bpf_iter_udp4.o               100 vs 99
bpf_iter_udp6.o               119 vs 118
connect4_prog.o               343 vs 342
lsm.o                         157 vs 155
tcp_ca_write_sk_pacing.o       32 vs 32
test_core_reloc_type_id.o      44 vs 44
test_sk_lookup.o              708 vs 714
test_sockmap_invalid_update.o  13 vs 12
xdping_kern.o                 193 vs 192

The only program that increased in size is test_sk_lookup: a few branches in this program can't be removed when __CTX__ is defined due to unique sequence numbers in passthrough calls limiting the GVN pass.

The reason for size _decrease_ of the programs is the following pattern:

; without __CTX__
  %0 = load i64, ptr @"llvm.bpf_sock_ops:0:184$0:35:0"
  %1 = getelementptr i8, ptr %skops, i64 %0
  %2 = call ptr @llvm.bpf.passthrough.p0.p0(i32 0, ptr %1) ;; (a)
  %3 = load ptr, ptr %2                                    ;; (b)
; with __CTX__
  %0 = load i64, ptr @"llvm.bpf_sock_ops:0:184$0:35:0"
  %1 = getelementptr i8, ptr %skops, i64 %0
  %2 = load ptr, ptr %1                                    ;; (b)
  %3 = call ptr @llvm.bpf.passthrough.p0.p0(i32 2, ptr %2) ;; (a)

BPFContextAccessMarkerPass pushes the passthrough call down relative to the load instruction. This enables the EarlyCSE pass to reuse the result of the load.

Kernel BPF test cases are passing, except for access_ctx_sk as described in https://reviews.llvm.org/D131633#3722231.

Harbormaster completed remote builds in B182120: Diff 453840. Aug 18 2022, 7:58 PM

eddyz87 abandoned this revision. Sep 28 2022, 6:11 AM

Herald added projects: Restricted Project, Restricted Project. Sep 28 2022, 6:11 AM

Herald added subscribers: llvm-commits, cfe-commits.

Revision Contents

Path                                                       Size
clang/test/CodeGen/bpf-context-access-marker-sinking-1.c   55 lines
clang/test/CodeGen/bpf-context-access-marker-sinking-2.c   55 lines
llvm/lib/Target/BPF/BPF.h                                  9 lines
llvm/lib/Target/BPF/BPFAbstractMemberAccess.cpp            19 lines
llvm/lib/Target/BPF/BPFCORE.h                              9 lines
llvm/lib/Target/BPF/BPFContextAccessMarkerPass.cpp         487 lines
llvm/lib/Target/BPF/BPFTargetMachine.cpp                   3 lines
llvm/lib/Target/BPF/CMakeLists.txt                         1 line
llvm/test/CodeGen/BPF/context-access-marker.ll             594 lines

Diff 453840

clang/test/CodeGen/bpf-context-access-marker-sinking-1.c

This file was added.

// REQUIRES: bpf-registered-target
// RUN: %clang_cc1 %s -O1 -triple bpf -emit-llvm \
// RUN: -debug-info-kind=limited -o - | FileCheck %s

// This test is related to the issue described in the following thread:
//
// https://lore.kernel.org/bpf/CAA-VZPmxh8o8EBcJ=m-DH4ytcxDFmo0JKsm1p1gf40kS0CE3NQ@mail.gmail.com/T/#m4b9ce2ce73b34f34172328f975235fc6f19841b6
//
// Verifies that calls to llvm.bpf.passthrough are placed after loads
// from parameter marked with __attribute__((btf_decl_tag("ctx"))).
//
// See lib/Target/BPF/BPFContextAccessMarkerPass.cpp for additional details.

#define __ctx__ __attribute__((btf_decl_tag("ctx")))

struct bpf_sock {
  int bound_dev_if;
  int family;
};

struct bpf_sockopt {
  struct bpf_sock *sk;
  int level;
  int optlen;
};

__attribute__((noinline))
static int f(int x) {
  return x + 1;
}

__attribute__((section("cgroup/getsockopt")))
int _getsockopt(struct bpf_sockopt *ctx __ctx__)
{
  unsigned g = 0;
  switch (ctx->level) {
  // CHECK: %level = getelementptr inbounds %struct.bpf_sockopt, ptr %ctx
  // CHECK: [[r0:%[0-9]+]] = load i32, ptr %level
  // CHECK: [[r1:%[0-9]+]] = tail call i32 @llvm.bpf.passthrough.i32.i32({{.*}}, i32 [[r0]])
  case 10:
    g = f(ctx->sk->family);
    break;
  // CHECK: [[r2:%[0-9]+]] = load ptr, ptr %ctx
  // CHECK: %family = getelementptr inbounds %struct.bpf_sock, ptr [[r2]]
  // CHECK: [[r3:%[0-9]+]] = load i32, ptr %family
  // CHECK: [[r4:%[0-9]+]] = tail call i32 @llvm.bpf.passthrough.i32.i32({{.*}}, i32 [[r3]])
  case 20:
    g = f(ctx->optlen);
    break;
  // CHECK: %optlen = getelementptr inbounds %struct.bpf_sockopt, ptr %ctx
  // CHECK: [[r5:%[0-9]+]] = load i32, ptr %optlen
  // CHECK: [[r6:%[0-9]+]] = tail call i32 @llvm.bpf.passthrough.i32.i32({{.*}}, i32 [[r5]])
  }
  return g % 2;
}

clang/test/CodeGen/bpf-context-access-marker-sinking-2.c

This file was added.

// REQUIRES: bpf-registered-target
// RUN: %clang_cc1 %s -O1 -triple bpf -emit-llvm \
// RUN: -debug-info-kind=limited -o - | FileCheck %s

// This test is related to the issue described in the following thread:
//
// https://lore.kernel.org/bpf/CAA-VZPmxh8o8EBcJ=m-DH4ytcxDFmo0JKsm1p1gf40kS0CE3NQ@mail.gmail.com/T/#m4b9ce2ce73b34f34172328f975235fc6f19841b6
//
// Verifies that calls to llvm.bpf.passthrough are placed after loads
// from parameter marked with __attribute__((btf_decl_tag("ctx"))) and
// do not interfere with the __attribute__((preserve_access_index)).
//
// For additional details refer to:
// - lib/Target/BPF/BPFContextAccessMarkerPass.cpp
// - lib/Target/BPF/BPFAbstractMemberAccess.cpp

#define __reloc__ __attribute__((preserve_access_index))
#define __ctx__ __attribute__((btf_decl_tag("ctx")))

struct bpf_sock_ops {
  int op;
  int bpf_sock_ops_cb_flags;
} __reloc__;

__attribute__((noinline))
static int f(int x) {
  return x + 1;
}

int g;

__attribute__((section("sockops")))
int h(struct bpf_sock_ops *i __ctx__) {
  switch (i->op) {
  // CHECK: [[r0:%[0-9]+]] = load i64, ptr @"llvm.bpf_sock_ops:0:0$0:0"
  // CHECK: [[r1:%[0-9]+]] = getelementptr i8, ptr %i, i64 [[r0]]
  // CHECK: [[r2:%[0-9]+]] = load i32, ptr [[r1]]
  // CHECK: [[r3:%[0-9]+]] = tail call i32 @llvm.bpf.passthrough.i32.i32({{.*}}, i32 [[r2]])
  case 10:
    g = f(i->bpf_sock_ops_cb_flags);
    break;
  // CHECK [[r4:%[0-9]+]] = load i64, ptr @"llvm.bpf_sock_ops:0:4$0:1"
  // CHECK [[r5:%[0-9]+]] = getelementptr i8, ptr %i, i64 [[r4]]
  // CHECK [[r6:%[0-9]+]] = load i32, ptr [[r5]]
  // CHECK [[r7:%[0-9]+]] = tail call i32 @llvm.bpf.passthrough.i32.i32({{.*}}, i32 [[r6]])
  case 20:
    g = f(i->bpf_sock_ops_cb_flags);
    break;
  // CHECK [[r8:%[0-9]+]] = load i64, ptr @"llvm.bpf_sock_ops:0:4$0:1"
  // CHECK [[r9:%[0-9]+]] = getelementptr i8, ptr %i, i64 [[r8]]
  // CHECK [[r10:%[0-9]+]] = load i32, ptr [[r9]]
  // CHECK [[r11:%[0-9]+]] = tail call i32 @llvm.bpf.passthrough.i32.i32({{.*}}, i32 [[r10]])
  }
  return 0;
}

llvm/lib/Target/BPF/BPF.h

Show All 24 Lines
FunctionPass *createBPFPreserveDIType();		FunctionPass *createBPFPreserveDIType();
FunctionPass *createBPFIRPeephole();		FunctionPass *createBPFIRPeephole();
FunctionPass *createBPFISelDag(BPFTargetMachine &TM);		FunctionPass *createBPFISelDag(BPFTargetMachine &TM);
FunctionPass *createBPFMISimplifyPatchablePass();		FunctionPass *createBPFMISimplifyPatchablePass();
FunctionPass *createBPFMIPeepholePass();		FunctionPass *createBPFMIPeepholePass();
FunctionPass *createBPFMIPeepholeTruncElimPass();		FunctionPass *createBPFMIPeepholeTruncElimPass();
FunctionPass *createBPFMIPreEmitPeepholePass();		FunctionPass *createBPFMIPreEmitPeepholePass();
FunctionPass *createBPFMIPreEmitCheckingPass();		FunctionPass *createBPFMIPreEmitCheckingPass();
		FunctionPass *createBPFContextAccessMarkerPass();

void initializeBPFAdjustOptPass(PassRegistry&);		void initializeBPFAdjustOptPass(PassRegistry&);
void initializeBPFCheckAndAdjustIRPass(PassRegistry&);		void initializeBPFCheckAndAdjustIRPass(PassRegistry&);

void initializeBPFAbstractMemberAccessLegacyPassPass(PassRegistry &);		void initializeBPFAbstractMemberAccessLegacyPassPass(PassRegistry &);
void initializeBPFPreserveDITypePass(PassRegistry&);		void initializeBPFPreserveDITypePass(PassRegistry&);
void initializeBPFIRPeepholePass(PassRegistry&);		void initializeBPFIRPeepholePass(PassRegistry&);
void initializeBPFMISimplifyPatchablePass(PassRegistry&);		void initializeBPFMISimplifyPatchablePass(PassRegistry&);
void initializeBPFMIPeepholePass(PassRegistry&);		void initializeBPFMIPeepholePass(PassRegistry&);
void initializeBPFMIPeepholeTruncElimPass(PassRegistry&);		void initializeBPFMIPeepholeTruncElimPass(PassRegistry&);
void initializeBPFMIPreEmitPeepholePass(PassRegistry&);		void initializeBPFMIPreEmitPeepholePass(PassRegistry&);
void initializeBPFMIPreEmitCheckingPass(PassRegistry&);		void initializeBPFMIPreEmitCheckingPass(PassRegistry&);
		void initializeBPFContextAccessMarkerLegacyPassPass(PassRegistry &);

class BPFAbstractMemberAccessPass		class BPFAbstractMemberAccessPass
: public PassInfoMixin<BPFAbstractMemberAccessPass> {		: public PassInfoMixin<BPFAbstractMemberAccessPass> {
BPFTargetMachine *TM;		BPFTargetMachine *TM;

public:		public:
BPFAbstractMemberAccessPass(BPFTargetMachine *TM) : TM(TM) {}		BPFAbstractMemberAccessPass(BPFTargetMachine *TM) : TM(TM) {}
PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);		PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);
Show All 14 Lines	public:

static bool isRequired() { return true; }		static bool isRequired() { return true; }
};		};

class BPFAdjustOptPass : public PassInfoMixin<BPFAdjustOptPass> {		class BPFAdjustOptPass : public PassInfoMixin<BPFAdjustOptPass> {
public:		public:
PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM);		PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM);
};		};

		class BPFContextAccessMarkerPass
		: public PassInfoMixin<BPFContextAccessMarkerPass> {
		public:
		PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);
		};

} // namespace llvm		} // namespace llvm

#endif		#endif

llvm/lib/Target/BPF/BPFAbstractMemberAccess.cpp

	Show First 20 Lines • Show All 94 Lines • ▼ Show 20 Lines
	#define DEBUG_TYPE "bpf-abstract-member-access"			#define DEBUG_TYPE "bpf-abstract-member-access"

	namespace llvm {			namespace llvm {
	constexpr StringRef BPFCoreSharedInfo::AmaAttr;			constexpr StringRef BPFCoreSharedInfo::AmaAttr;
	uint32_t BPFCoreSharedInfo::SeqNum;			uint32_t BPFCoreSharedInfo::SeqNum;

	Instruction *BPFCoreSharedInfo::insertPassThrough(Module *M, BasicBlock *BB,			Instruction *BPFCoreSharedInfo::insertPassThrough(Module *M, BasicBlock *BB,
	Instruction *Input,			Instruction *Input,
	Instruction *Before) {			BasicBlock::InstListType::iterator Where,
				uint32_t SeqNum) {
	Function *Fn = Intrinsic::getDeclaration(			Function *Fn = Intrinsic::getDeclaration(
	M, Intrinsic::bpf_passthrough, {Input->getType(), Input->getType()});			M, Intrinsic::bpf_passthrough, {Input->getType(), Input->getType()});
	Constant *SeqNumVal = ConstantInt::get(Type::getInt32Ty(BB->getContext()),			Constant *SeqNumVal = ConstantInt::get(Type::getInt32Ty(BB->getContext()),
	BPFCoreSharedInfo::SeqNum++);			SeqNum);

	auto *NewInst = CallInst::Create(Fn, {SeqNumVal, Input});			auto *NewInst = CallInst::Create(Fn, {SeqNumVal, Input});
	BB->getInstList().insert(Before->getIterator(), NewInst);			BB->getInstList().insert(Where, NewInst);
	return NewInst;			return NewInst;
	}			}

				Instruction *BPFCoreSharedInfo::insertPassThrough(Module *M, BasicBlock *BB,
				Instruction *Input,
				BasicBlock::InstListType::iterator Where) {
				return insertPassThrough(M, BB, Input, Where, BPFCoreSharedInfo::SeqNum++);
				}

				Instruction *BPFCoreSharedInfo::insertPassThrough(Module *M, BasicBlock *BB,
				Instruction *Input,
				Instruction *Before) {
				return insertPassThrough(M, BB, Input, Before->getIterator());
				}
	} // namespace llvm			} // namespace llvm

	using namespace llvm;			using namespace llvm;

	namespace {			namespace {
	class BPFAbstractMemberAccess final {			class BPFAbstractMemberAccess final {
	public:			public:
	BPFAbstractMemberAccess(BPFTargetMachine *TM) : TM(TM) {}			BPFAbstractMemberAccess(BPFTargetMachine *TM) : TM(TM) {}
	▲ Show 20 Lines • Show All 1,100 Lines • Show Last 20 Lines

llvm/lib/Target/BPF/BPFCORE.h

//===- BPFCORE.h - Common info for Compile-Once Run-EveryWhere -- C++ --===//		//===- BPFCORE.h - Common info for Compile-Once Run-EveryWhere -- C++ --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_LIB_TARGET_BPF_BPFCORE_H		#ifndef LLVM_LIB_TARGET_BPF_BPFCORE_H
#define LLVM_LIB_TARGET_BPF_BPFCORE_H		#define LLVM_LIB_TARGET_BPF_BPFCORE_H

#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
		#include "llvm/IR/BasicBlock.h"

namespace llvm {		namespace llvm {

class BasicBlock;		class BasicBlock;
class Instruction;		class Instruction;
class Module;		class Module;

class BPFCoreSharedInfo {		class BPFCoreSharedInfo {
▲ Show 20 Lines • Show All 45 Lines • ▼ Show 20 Lines	public:

/// llvm.bpf.passthrough builtin seq number		/// llvm.bpf.passthrough builtin seq number
static uint32_t SeqNum;		static uint32_t SeqNum;

/// Insert a bpf passthrough builtin function.		/// Insert a bpf passthrough builtin function.
static Instruction *insertPassThrough(Module *M, BasicBlock *BB,		static Instruction *insertPassThrough(Module *M, BasicBlock *BB,
Instruction *Input,		Instruction *Input,
Instruction *Before);		Instruction *Before);

		static Instruction *
		insertPassThrough(Module *M, BasicBlock *BB, Instruction *Input,
		BasicBlock::InstListType::iterator Where);

		static Instruction *
		insertPassThrough(Module *M, BasicBlock *BB, Instruction *Input,
		BasicBlock::InstListType::iterator Where, uint32_t SeqNum);
};		};

} // namespace llvm		} // namespace llvm

#endif		#endif
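
The overload that takes an explicit SeqNum lets a caller reuse a sequence number that was already issued instead of taking a fresh one from the shared counter. Below is a minimal standalone sketch of that choice. It uses only the standard library, not the LLVM API, and the names NextSeqNum, Fingerprint and pickSeqNum are hypothetical stand-ins chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

// Hypothetical stand-in for the shared BPFCoreSharedInfo::SeqNum counter.
static uint32_t NextSeqNum = 0;

// Hypothetical stand-in for an access chain fingerprint.
using Fingerprint = std::vector<uint64_t>;

// Picks the sequence number for a passthrough call.
// - No fingerprint: always take a fresh number from the counter.
// - Known fingerprint: reuse the number issued for it earlier.
// - New fingerprint: take a fresh number and remember it.
uint32_t pickSeqNum(const std::optional<Fingerprint> &FP,
                    std::map<Fingerprint, uint32_t> &Cache) {
  if (!FP)
    return NextSeqNum++;
  auto It = Cache.find(*FP);
  if (It != Cache.end())
    return It->second;
  uint32_t Fresh = NextSeqNum++;
  Cache.emplace(*FP, Fresh);
  return Fresh;
}
```

Under this scheme two chains with equal fingerprints receive the same number, while a chain without a fingerprint always receives a new one.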

llvm/lib/Target/BPF/BPFContextAccessMarkerPass.cpp

This file was added.

				//===------ BPFContextAccessMarkerPass.cpp --------------------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// The BPF verifier limits the access patterns allowed for a BPF program
				// parameter passed in the context register (r1).
				// Only BASE + static-offset memory accesses are allowed.
				//
				// The goal of the BPFContextAccessMarkerPass is to ensure that the
				// SimplifyCFGPass optimization pass does not generate code that
				// uses unsupported access patterns for the context parameter.
				//
				// The following code is used as a running example:
				//
				// #define __ctx__ __attribute__((btf_decl_tag("ctx")))
				//
				// struct bpf_sock {
				// int bound_dev_if;
				// int family;
				// };
				//
				// struct bpf_sockopt {
				// struct bpf_sock *sk;
				// int level;
				// int optlen;
				// };
				//
				// __attribute__((noinline))
				// static int f(int x) { ... }
				//
				// __attribute__((section("cgroup/getsockopt")))
				// int _getsockopt(struct bpf_sockopt *ctx __ctx__)
				// {
				// unsigned g = 0;
				// switch (ctx->level) {
				// case 10:
				// g = f(ctx->sk->family);
				// break;
				// case 20:
				// g = f(ctx->optlen);
				// break;
				// }
				// return g % 2;
				// }
				//
				// Here the attribute btf_decl_tag("ctx") marks a context parameter.
				// The initial (simplified) IR for function _getsockopt looks as follows:
				//
				// define dso_local i32 @_getsockopt(ptr noundef %ctx)
				// ...
				// sw.bb:
				// %1 = load ptr, ptr %ctx ;;
				// %family = getelementptr inbounds %struct.bpf_sock, ptr %1 ;; access to ctx->sk->family
				// %2 = load i32, ptr %family ;; (a)
				// %call = call i32 @f(i32 noundef %2)
				// br label %sw.epilog
				//
				// sw.bb1:
				// %optlen = getelementptr inbounds %struct.bpf_sockopt, ptr %ctx ;; access to ctx->optlen
				// %3 = load i32, ptr %optlen ;; (b)
				// %call2 = call i32 @f(i32 noundef %3)
				// br label %sw.epilog
				//
				// sw.epilog:
				// ...
				//
				// Without additional code motion, the machine code for field accesses
				// would look as follows:
				//
				// ...
				// $r1 = LDW $r1, 4 ;; for ctx->sk->family
				// ...
				// $r1 = LDW $r1, 12 ;; for ctx->optlen
				//
				// This matches the pattern allowed by the BPF verifier.
				//
				// However, SimplifyCFGPass may rewrite the above IR separating
				// getelementptr and load instructions as shown below:
				//
				// ...
				// sw.bb:
				// %1 = load ptr, ptr %ctx
				// %family = getelementptr inbounds %struct.bpf_sock, ptr %1
				// br label %sw.epilog.sink.split
				//
				// sw.bb1:
				// %optlen = getelementptr inbounds %struct.bpf_sockopt, ptr %ctx
				// br label %sw.epilog.sink.split
				//
				// sw.epilog.sink.split:
				// %optlen.sink = phi ptr [ %optlen, %sw.bb1 ], [ %family, %sw.bb ]
				// %2 = load i32, ptr %optlen.sink ;; (c)
				// %call2 = call fastcc i32 @f(i32 noundef %2)
				// br label %sw.epilog
				//
				// sw.epilog:
				// ...
				//
				// Note that load instructions (a) and (b) are replaced by a single
				// load instruction (c) that gets its value from a PHI node. The two
				// calls to @f are also replaced by a single call that uses the result
				// of (c). This is done by the code sinking part of
				// SimplifyCFGPass. This leads to the following machine code:
				//
				// bb.2.sw.bb:
				// $r1 = LDD $r1, 0
				// $r1 = ADD_ri $r1, 4
				// JMP %bb.4
				//
				// bb.3.sw.bb1:
				// $r1 = ADD_ri $r1, 12
				//
				// bb.4.sw.epilog.sink.split:
				// $r1 = LDW $r1, 0
				// JAL @f
				//
				// Here the offset is dynamically added to r1 (the context register);
				// this access pattern is not allowed by the BPF verifier.
				//
				// To prevent the undesired code motion by SimplifyCFGPass, the
				// BPFContextAccessMarkerPass inserts a call to the llvm.bpf.passthrough
				// function after the last load in each context access chain (e.g. for
				// a->b.c->d the passthrough call is inserted after the load of d).
				// llvm.bpf.passthrough accepts a unique integer constant as one of
				// its parameters, thus preventing common code sinking past that
				// point.
				//
				// E.g. the IR from above is transformed as follows:
				//
				// sw.bb:
				// %2 = load ptr, ptr %ctx, align 8
				// %family = getelementptr inbounds %struct.bpf_sock, ptr %2
				// %3 = load i32, ptr %family, align 4
				// %4 = call i32 @llvm.bpf.passthrough.i32.i32(i32 2, i32 %3) ;; <-- added call
				// %call = call i32 @f(i32 noundef %4)
				// br label %sw.epilog
				//
				// sw.bb1:
				// %optlen = getelementptr inbounds %struct.bpf_sockopt, ptr %ctx
				// %5 = load i32, ptr %optlen
				// %6 = call i32 @llvm.bpf.passthrough.i32.i32(i32 3, i32 %5) ;; <-- added call
				// %call2 = call i32 @f(i32 noundef %6)
				// br label %sw.epilog
				//
				// sw.epilog:
				// ...

				#include "BPF.h"
				#include "BPFCORE.h"
				#include "llvm/ADT/None.h"
				#include "llvm/ADT/Optional.h"
				#include "llvm/ADT/SmallPtrSet.h"
				#include "llvm/ADT/SmallVector.h"
				#include "llvm/Analysis/AliasAnalysis.h"
				#include "llvm/IR/DebugInfoMetadata.h"
				#include "llvm/IR/InstIterator.h"
				#include "llvm/IR/Instruction.h"
				#include "llvm/IR/Instructions.h"
				#include "llvm/IR/Metadata.h"
				#include "llvm/IR/PassManager.h"
				#include "llvm/InitializePasses.h"
				#include "llvm/Pass.h"
				#include <map>
				#include <set>

				#define DEBUG_TYPE "bpf-context-access-marker"

				using namespace llvm;

				static Metadata *findMetadata(Metadata *Root,
				llvm::function_ref<bool(Metadata *)> Fn) {
				if (!Root)
				return nullptr;
				if (Fn(Root))
				return Root;
				if (auto *Node = dyn_cast<MDNode>(Root))
				for (auto &Operand : Node->operands())
				if (auto *Result = findMetadata(Operand.get(), Fn))
				return Result;
				return nullptr;
				}

				static bool hasContextTag(Metadata *Root) {
				return findMetadata(Root, [](Metadata *MD) {
				auto *Node = dyn_cast<MDNode>(MD);
				if (!Node || Node->operands().size() != 2)
				return false;
				auto *Name = dyn_cast<MDString>(Node->getOperand(0));
				auto *Val = dyn_cast<MDString>(Node->getOperand(1));
				return Name
				&& Val
				&& Name->getString().equals("btf_decl_tag")
				&& Val->getString().equals("ctx");
				});
				}

				using CtxParamsSet = SmallPtrSet<Value*, 4>;

				static CtxParamsSet collectContextParams(Function &F) {
				CtxParamsSet CtxParams;

				if (DISubprogram *SP = F.getSubprogram()) {
				for (const DINode *DN : SP->getRetainedNodes()) {
				if (const auto *DV = dyn_cast<DILocalVariable>(DN)) {
				if (DV->isParameter() && hasContextTag(DV->getRawAnnotations())) {
				Value *Arg = F.getArg(DV->getArg() - 1);
				if (Arg->getType()->isPointerTy())
				CtxParams.insert(Arg);
				}
				}
				}
				}

				return CtxParams;
				}

				static bool isPassThroughCall(Value *I) {
				auto *Call = dyn_cast<CallInst>(I);
				if (!Call)
				return false;
				auto *Func = Call->getCalledFunction();
				if (!Func)
				return false;
				return Func->getName().startswith("llvm.bpf.passthrough");
				}

				template <typename T>
				static bool contains(const std::set<T> &Set, const T &Value) {
				return Set.find(Value) != Set.end();
				}

				namespace {

				using ChainFingerprint = SmallVector<uint64_t>;

				} // Anonymous namespace

				// ChainFingerprint is a unique identifier for the access chain. It
				// is used as an std::map key to reuse sequence numbers passed in
				// llvm.bpf.passthrough calls. Some chains have no fingerprint and
				// sequence numbers are not reused for these chains.
				//
				// Fingerprint is computed recursively starting from the last
				// instruction in a chain and moving backwards to the context
				// parameter access.
				//
				// This backwards movement is supported for the following
				// instructions:
				// - load
				// - getelementptr with constant indexes
				// - bitcast
				//
				// If chain contains any other instruction the chain is considered to
				// not have a fingerprint and false is returned.
				static bool collectFingerprintElements(Value *ChainEnd,
				ChainFingerprint &Elements,
				CtxParamsSet &CtxStorePtrs) {
				auto Recur = [&](Value *Next) {
				return collectFingerprintElements(Next, Elements, CtxStorePtrs);
				};

				if (auto *Load = dyn_cast<LoadInst>(ChainEnd)) {
				if (CtxStorePtrs.contains(Load->getPointerOperand())) {
				Elements.push_back(Load->getOpcode());
				Elements.push_back((uint64_t) Load->getPointerOperand());
				return true;
				}

				Elements.push_back(Load->getOpcode());
				Elements.push_back(0);
				return Recur(Load->getPointerOperand());
				}

				if (auto *GetElementPtr = dyn_cast<GetElementPtrInst>(ChainEnd)) {
				Elements.push_back(GetElementPtr->getOpcode());
				Elements.push_back(GetElementPtr->getNumIndices());
				for (unsigned int I = 1; I <= GetElementPtr->getNumIndices(); ++I) {
				auto *Index = dyn_cast<ConstantInt>(GetElementPtr->getOperand(I));
				if (!Index)
				return false;

				Elements.push_back(Index->getZExtValue());
				}
				return Recur(GetElementPtr->getPointerOperand());
				}

				if (auto *BitCast = dyn_cast<BitCastInst>(ChainEnd))
				return Recur(BitCast->getOperand(0));

				return false;
				}

				static Optional<ChainFingerprint>
				computeChainFingerprint(Value *ChainEnd, CtxParamsSet &CtxStorePtrs) {
				ChainFingerprint Elements;
				if (collectFingerprintElements(ChainEnd, Elements, CtxStorePtrs))
				return Optional(Elements);
				return None;
				}

				// Inserts calls to llvm.bpf.passthrough after specified
				// instructions. Each instruction is considered to be a context access
				// chain end. Reuses sequence number parameters passed to passthrough
				// calls for chains with identical fingerprint (see above). Sequence
				// number re-usage allows CSE and GVN passes to remove redundant
				// computation of identical access chains.
				static void insertPassThroughCalls(Function &F,
				std::set<Instruction *> &After,
				CtxParamsSet &CtxStorePtrs) {
				auto *Module = F.getParent();
				std::map<ChainFingerprint, uint32_t> CachedSeqNums;
				for (auto *Insn : After) {
				auto *BB = Insn->getParent();
				auto OptFingerprint = computeChainFingerprint(Insn, CtxStorePtrs);
				Instruction *LoadPassThroughInst;
				if (OptFingerprint.has_value()) {
				auto Fingerprint = OptFingerprint.value();
				auto SeqNum = CachedSeqNums.find(Fingerprint);
				if (SeqNum == CachedSeqNums.end()) {
				LoadPassThroughInst =
				BPFCoreSharedInfo::insertPassThrough(Module, BB, Insn, ++Insn->getIterator());
				auto *CurNum = dyn_cast<ConstantInt>(LoadPassThroughInst->getOperand(0));
				assert(CurNum && "llvm.bpf.passthrough parameter expected to be ConstantInt");
				CachedSeqNums[Fingerprint] = CurNum->getZExtValue();
				} else {
				LoadPassThroughInst =
				BPFCoreSharedInfo::insertPassThrough(Module, BB, Insn, ++Insn->getIterator(),
				SeqNum->second);
				}
				} else {
				LoadPassThroughInst =
				BPFCoreSharedInfo::insertPassThrough(Module, BB, Insn, ++Insn->getIterator());
				}
				Insn->replaceUsesWithIf(LoadPassThroughInst, [&](Use &OtherU) {
				return LoadPassThroughInst != OtherU.getUser();
				});
				}
				}

				static bool trackContextAccessChain(Instruction *FirstInsn,
				Instruction *Insn,
				std::set<Instruction *> &ChainEnds,
				std::set<Instruction *> &PassThroughCalls) {
				bool IsLoad = dyn_cast<LoadInst>(Insn);
				bool IsPassThrough = isPassThroughCall(Insn);
				bool IsChainInsn =
				IsLoad ||
				IsPassThrough ||
				dyn_cast<BitCastInst>(Insn) ||
				dyn_cast<GetElementPtrInst>(Insn);

				bool EndAdded = false;
				if (IsChainInsn)
				for (auto &Use : Insn->uses())
				if (auto *UseInsn = dyn_cast<Instruction>(Use.getUser()))
				EndAdded |= trackContextAccessChain(FirstInsn, UseInsn, ChainEnds, PassThroughCalls);

				if (EndAdded) {
				if (IsPassThrough)
				PassThroughCalls.insert(Insn);
				return true;
				}

				if (IsLoad && Insn != FirstInsn) {
				ChainEnds.insert(Insn);
				return true;
				}

				return false;
				}

				void collectContextAccessChainEnds(Function &F,
				CtxParamsSet &CtxStorePtrs,
				std::set<Instruction *> &ChainEnds,
				std::set<Instruction *> &PassThroughCalls) {
				for (auto *StorePtr : CtxStorePtrs)
				for (auto &StorePtrUse : StorePtr->uses())
				if (auto *Load = dyn_cast<LoadInst>(StorePtrUse.getUser()))
				trackContextAccessChain(Load, Load, ChainEnds, PassThroughCalls);
				}

				static CtxParamsSet collectContextStorePtrs(Function &F,
				CtxParamsSet &CtxParams) {
				CtxParamsSet ContextStorePtrs;
				for (auto &Insn : instructions(F))
				if (auto *Store = dyn_cast<StoreInst>(&Insn))
				if (CtxParams.contains(Store->getOperand(0)))
				ContextStorePtrs.insert(Store->getOperand(1));
				return ContextStorePtrs;
				}

				static void removePassThroughCalls(std::set<Instruction *> &Calls) {
				for (auto *Call : Calls) {
				auto *Input = Call->getOperand(1);
				Call->replaceAllUsesWith(Input);
				Call->eraseFromParent();
				}
				}

				// Looks for access chains that read fields of context parameters,
				// puts a call to llvm.bpf.passthrough at the last load for each
				// chain. Access chains are recognized as a sequence of
				// interdependent load, getelementptr, bitcast instructions and
				// passthrough calls. Access chains start from a load of a known store
				// location of the context parameter.
				//
				// E.g. this is an access chain:
				//
				// ;; %i metadata contains btf_decl_tag("ctx") annotation
				// define dso_local i32 @h(ptr noundef %i)
				// %i.addr = alloca ptr, align 8
				// store ptr %i, ptr %i.addr, align 8 ;; %i.addr store location is recorded
				// %0 = load ptr, ptr %i.addr, align 8 ;; chain starts
				// %1 = load i64, ptr @"llvm.bpf_sock_ops:0:0$0:0"
				// %2 = bitcast ptr %0 to ptr ;; chain continues
				// %3 = getelementptr i8, ptr %2, i64 %1 ;; chain continues
				// %4 = bitcast ptr %3 to ptr ;; chain continues
				// %5 = call ptr @llvm.bpf.passthrough.p0.p0(i32 1, ptr %4) ;; chain continues
				// %6 = load i64, ptr %5, align 8 ;; last chain load
				static bool insertContextAccessMarkers(Function &F) {
				LLVM_DEBUG(dbgs() << "******** Context Access Markers **********\n");

				auto *Module = F.getParent();
				if (!Module)
				return false;

				// The btf_decl_tag attribute is saved in debug info
				if (Module->debug_compile_units().empty())
				return false;

				auto CtxParams = collectContextParams(F);
				LLVM_DEBUG(dbgs()
				<< "There are " << CtxParams.size() << " context parameters\n");
				if (CtxParams.empty())
				return false;

				std::set<Instruction *> CtxAccessChainEnds;
				std::set<Instruction *> PassThroughCallsToRemove;
				auto CtxStorePtrs = collectContextStorePtrs(F, CtxParams);
				collectContextAccessChainEnds(F,
				CtxStorePtrs,
				CtxAccessChainEnds,
				PassThroughCallsToRemove);
				LLVM_DEBUG(dbgs()
				<< "Modifying " << CtxAccessChainEnds.size() << " access chains\n");
				insertPassThroughCalls(F, CtxAccessChainEnds, CtxStorePtrs);
				removePassThroughCalls(PassThroughCallsToRemove);
				return !CtxAccessChainEnds.empty();
				}

				namespace {

				class BPFContextAccessMarkerLegacyPass final : public FunctionPass {
				public:
				static char ID;

				BPFContextAccessMarkerLegacyPass() : FunctionPass(ID) {}

				bool runOnFunction(Function &F) override {
				return insertContextAccessMarkers(F);
				}

				void getAnalysisUsage(AnalysisUsage &AU) const override {
				AU.addRequired<AAResultsWrapperPass>();
				}
				};

				} // End anonymous namespace

				char BPFContextAccessMarkerLegacyPass::ID = 0;
				INITIALIZE_PASS(BPFContextAccessMarkerLegacyPass, DEBUG_TYPE,
				"BPF Context Access Marker", false, false)

				FunctionPass *llvm::createBPFContextAccessMarkerPass() {
				return new BPFContextAccessMarkerLegacyPass();
				}

				PreservedAnalyses
				llvm::BPFContextAccessMarkerPass::run(Function &F, FunctionAnalysisManager &AM) {
				return insertContextAccessMarkers(F)
				? PreservedAnalyses::none()
				: PreservedAnalyses::all();
				}
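
The chain-end search done by trackContextAccessChain can be illustrated on a toy use graph. The sketch below is not the LLVM code: Node, Kind and trackChain are hypothetical stand-ins, passthrough handling is left out, and only the standard library is used. It shows the same rule: follow users through load, getelementptr and bitcast nodes, and record a load as a chain end when no user further down the chain has already produced one.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical instruction kinds for the toy graph.
enum class Kind { Load, GEP, BitCast, Other };

// Hypothetical node: an instruction kind plus the nodes that use its result.
struct Node {
  Kind K;
  std::vector<Node *> Users;
};

// Walks forward from N through chain instructions and collects chain ends.
// Returns true if N or something reachable from it was recorded as an end.
bool trackChain(const Node *First, Node *N, std::set<Node *> &Ends) {
  bool IsLoad = N->K == Kind::Load;
  bool IsChain = IsLoad || N->K == Kind::GEP || N->K == Kind::BitCast;

  bool EndAdded = false;
  if (IsChain)
    for (Node *U : N->Users)
      EndAdded |= trackChain(First, U, Ends);

  // A deeper load already closed the chain, so N is an interior node.
  if (EndAdded)
    return true;

  // The first load only reads the parameter itself, so it is never an end.
  if (IsLoad && N != First) {
    Ends.insert(N);
    return true;
  }
  return false;
}
```

For a graph shaped like ctx->sk->family and ctx->optlen from the running example, only the two final field loads end up in the set; the intermediate load of sk does not.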

llvm/lib/Target/BPF/BPFTargetMachine.cpp

Show First 20 Lines • Show All 42 Lines • ▼ Show 20 Lines	extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeBPFTarget() {
PassRegistry &PR = *PassRegistry::getPassRegistry();		PassRegistry &PR = *PassRegistry::getPassRegistry();
initializeBPFAbstractMemberAccessLegacyPassPass(PR);		initializeBPFAbstractMemberAccessLegacyPassPass(PR);
initializeBPFPreserveDITypePass(PR);		initializeBPFPreserveDITypePass(PR);
initializeBPFIRPeepholePass(PR);		initializeBPFIRPeepholePass(PR);
initializeBPFAdjustOptPass(PR);		initializeBPFAdjustOptPass(PR);
initializeBPFCheckAndAdjustIRPass(PR);		initializeBPFCheckAndAdjustIRPass(PR);
initializeBPFMIPeepholePass(PR);		initializeBPFMIPeepholePass(PR);
initializeBPFMIPeepholeTruncElimPass(PR);		initializeBPFMIPeepholeTruncElimPass(PR);
		initializeBPFContextAccessMarkerLegacyPassPass(PR);
}		}

// DataLayout: little or big endian		// DataLayout: little or big endian
static std::string computeDataLayout(const Triple &TT) {		static std::string computeDataLayout(const Triple &TT) {
if (TT.getArch() == Triple::bpfeb)		if (TT.getArch() == Triple::bpfeb)
return "E-m:e-p:64:64-i64:64-i128:128-n32:64-S128";		return "E-m:e-p:64:64-i64:64-i128:128-n32:64-S128";
else		else
return "e-m:e-p:64:64-i64:64-i128:128-n32:64-S128";		return "e-m:e-p:64:64-i64:64-i128:128-n32:64-S128";
▲ Show 20 Lines • Show All 44 Lines • ▼ Show 20 Lines
}		}

void BPFTargetMachine::adjustPassManager(PassManagerBuilder &Builder) {		void BPFTargetMachine::adjustPassManager(PassManagerBuilder &Builder) {
Builder.addExtension(		Builder.addExtension(
PassManagerBuilder::EP_EarlyAsPossible,		PassManagerBuilder::EP_EarlyAsPossible,
[&](const PassManagerBuilder &, legacy::PassManagerBase &PM) {		[&](const PassManagerBuilder &, legacy::PassManagerBase &PM) {
PM.add(createBPFAbstractMemberAccess(this));		PM.add(createBPFAbstractMemberAccess(this));
PM.add(createBPFPreserveDIType());		PM.add(createBPFPreserveDIType());
		PM.add(createBPFContextAccessMarkerPass());
PM.add(createBPFIRPeephole());		PM.add(createBPFIRPeephole());
});		});

Builder.addExtension(		Builder.addExtension(
PassManagerBuilder::EP_Peephole,		PassManagerBuilder::EP_Peephole,
[&](const PassManagerBuilder &, legacy::PassManagerBase &PM) {		[&](const PassManagerBuilder &, legacy::PassManagerBase &PM) {
PM.add(createCFGSimplificationPass(		PM.add(createCFGSimplificationPass(
SimplifyCFGOptions().hoistCommonInsts(true)));		SimplifyCFGOptions().hoistCommonInsts(true)));
});		});
Builder.addExtension(		Builder.addExtension(
PassManagerBuilder::EP_ModuleOptimizerEarly,		PassManagerBuilder::EP_ModuleOptimizerEarly,
[&](const PassManagerBuilder &, legacy::PassManagerBase &PM) {		[&](const PassManagerBuilder &, legacy::PassManagerBase &PM) {
PM.add(createBPFAdjustOpt());		PM.add(createBPFAdjustOpt());
});		});
}		}

void BPFTargetMachine::registerPassBuilderCallbacks(PassBuilder &PB) {		void BPFTargetMachine::registerPassBuilderCallbacks(PassBuilder &PB) {
PB.registerPipelineStartEPCallback(		PB.registerPipelineStartEPCallback(
[=](ModulePassManager &MPM, OptimizationLevel) {		[=](ModulePassManager &MPM, OptimizationLevel) {
FunctionPassManager FPM;		FunctionPassManager FPM;
FPM.addPass(BPFAbstractMemberAccessPass(this));		FPM.addPass(BPFAbstractMemberAccessPass(this));
FPM.addPass(BPFPreserveDITypePass());		FPM.addPass(BPFPreserveDITypePass());
		FPM.addPass(BPFContextAccessMarkerPass());
FPM.addPass(BPFIRPeepholePass());		FPM.addPass(BPFIRPeepholePass());
MPM.addPass(createModuleToFunctionPassAdaptor(std::move(FPM)));		MPM.addPass(createModuleToFunctionPassAdaptor(std::move(FPM)));
});		});
PB.registerPeepholeEPCallback([=](FunctionPassManager &FPM,		PB.registerPeepholeEPCallback([=](FunctionPassManager &FPM,
OptimizationLevel Level) {		OptimizationLevel Level) {
FPM.addPass(SimplifyCFGPass(SimplifyCFGOptions().hoistCommonInsts(true)));		FPM.addPass(SimplifyCFGPass(SimplifyCFGOptions().hoistCommonInsts(true)));
});		});
PB.registerPipelineEarlySimplificationEPCallback(		PB.registerPipelineEarlySimplificationEPCallback(
▲ Show 20 Lines • Show All 44 Lines • Show Last 20 Lines

llvm/lib/Target/BPF/CMakeLists.txt

	Show All 10 Lines
	tablegen(LLVM BPFGenMCCodeEmitter.inc -gen-emitter)			tablegen(LLVM BPFGenMCCodeEmitter.inc -gen-emitter)
	tablegen(LLVM BPFGenRegisterInfo.inc -gen-register-info)			tablegen(LLVM BPFGenRegisterInfo.inc -gen-register-info)
	tablegen(LLVM BPFGenSubtargetInfo.inc -gen-subtarget)			tablegen(LLVM BPFGenSubtargetInfo.inc -gen-subtarget)

	add_public_tablegen_target(BPFCommonTableGen)			add_public_tablegen_target(BPFCommonTableGen)

	add_llvm_target(BPFCodeGen			add_llvm_target(BPFCodeGen
	BPFAbstractMemberAccess.cpp			BPFAbstractMemberAccess.cpp
				BPFContextAccessMarkerPass.cpp
	BPFAdjustOpt.cpp			BPFAdjustOpt.cpp
	BPFAsmPrinter.cpp			BPFAsmPrinter.cpp
	BPFCheckAndAdjustIR.cpp			BPFCheckAndAdjustIR.cpp
	BPFFrameLowering.cpp			BPFFrameLowering.cpp
	BPFInstrInfo.cpp			BPFInstrInfo.cpp
	BPFIRPeephole.cpp			BPFIRPeephole.cpp
	BPFISelDAGToDAG.cpp			BPFISelDAGToDAG.cpp
	BPFISelLowering.cpp			BPFISelLowering.cpp
	Show All 34 Lines

llvm/test/CodeGen/BPF/context-access-marker.ll

This file was added.

				; RUN: opt -O2 -S %s | FileCheck %s
				;
				; This test verifies that llvm.bpf.passthrough calls are inserted by
				; BPFContextAccessMarkerPass after access to the fields of function
				; parameters marked with __attribute__((btf_decl_tag("ctx"))).
				;
				; Compile command:
				; clang -g -target bpf -O2 -Xclang -disable-llvm-passes -S -emit-llvm t.c -o t.ll
				;
				; Source:
				;
				; #define __CTX__ __attribute__((btf_decl_tag("ctx")))

				; struct inner {
				; int a;
				; int b;
				; };

				; struct outer {
				; int first;
				; struct inner inner_inline;
				; struct inner *inner_ptr;
				; int butlast;
				; int last;
				; };

				; int marker_expected(struct outer *ctx __CTX__) {
				; return ctx->last;
				; }

				; int no_marker_expected_1(struct outer *non_ctx) {
				; return non_ctx->last;
				; }

				; extern struct outer *magic();

				; int no_marker_expected_2(struct outer *ctx __CTX__) {
				; struct outer *x = magic();
				; return x->last;
				; }

				; int no_marker_expected_3(struct outer *ctx __CTX__, struct outer *non_ctx) {
				; return non_ctx->last;
				; }

				; int one_marker_expected_1(struct outer *ctx __CTX__) {
				; return ctx->inner_inline.b;
				; }

				; int one_marker_expected_2(struct outer *ctx __CTX__) {
				; return ctx->inner_ptr->b;
				; }

				; int one_marker_in_each_branch(struct outer *ctx __CTX__) {
				; if (ctx->first) {
				; return ctx->butlast;
				; } else {
				; return ctx->last;
				; }
				; }

				; extern int magic2(int x);

				; int do_not_sink(struct outer *ctx __CTX__) {
				; if (ctx->first)
				; return magic2(ctx->butlast);
				; else
				; return magic2(ctx->last);
				; }

				; struct ctx2 { int a; int b; int c; };

				; int reuse_sequence_number(struct ctx2 *ctx __CTX__) {
				; if (ctx->b)
				; return magic2(ctx->c);
				; else
				; return magic2(ctx->c);
				; }

				; struct ctx3 { int a; int b; int c; } __attribute__((preserve_access_index));

				; int dont_reuse_sequence_number(struct ctx3 *ctx __CTX__) {
				; if (ctx->b)
				; return magic2(ctx->c);
				; else
				; return magic2(ctx->c);
				; }

				; ModuleID = 't.c'
				source_filename = "t.c"
				target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n32:64-S128"
				target triple = "bpf"

				%struct.outer = type { i32, %struct.inner, ptr, i32, i32 }
				%struct.inner = type { i32, i32 }
				%struct.ctx2 = type { i32, i32, i32 }
				%struct.ctx3 = type { i32, i32, i32 }

				; Function Attrs: nounwind
				define dso_local i32 @marker_expected(ptr noundef %ctx) #0 !dbg !7 {
				entry:
				%ctx.addr = alloca ptr, align 8
				store ptr %ctx, ptr %ctx.addr, align 8, !tbaa !29
				call void @llvm.dbg.declare(metadata ptr %ctx.addr, metadata !26, metadata !DIExpression()), !dbg !33
				%0 = load ptr, ptr %ctx.addr, align 8, !dbg !34, !tbaa !29
				%last = getelementptr inbounds %struct.outer, ptr %0, i32 0, i32 4, !dbg !35
				%1 = load i32, ptr %last, align 4, !dbg !35, !tbaa !36
				ret i32 %1, !dbg !40
				; CHECK: [[r0:%[0-9]+]] = load i32, ptr %last
				; CHECK-NEXT: [[r1:%[0-9]+]] = tail call i32 @llvm.bpf.passthrough.i32.i32({{.*}}, i32 [[r0]])
				; CHECK-NEXT: ret i32 [[r1]]
				}

				; Function Attrs: nocallback nofree nosync nounwind readnone speculatable willreturn
				declare void @llvm.dbg.declare(metadata, metadata, metadata) #1

				; Function Attrs: nounwind
				define dso_local i32 @no_marker_expected_1(ptr noundef %non_ctx) #0 !dbg !41 {
				entry:
				%non_ctx.addr = alloca ptr, align 8
				store ptr %non_ctx, ptr %non_ctx.addr, align 8, !tbaa !29
				call void @llvm.dbg.declare(metadata ptr %non_ctx.addr, metadata !43, metadata !DIExpression()), !dbg !44
				%0 = load ptr, ptr %non_ctx.addr, align 8, !dbg !45, !tbaa !29
				%last = getelementptr inbounds %struct.outer, ptr %0, i32 0, i32 4, !dbg !46
				%1 = load i32, ptr %last, align 4, !dbg !46, !tbaa !36
				ret i32 %1, !dbg !47
				; CHECK: %last = getelementptr inbounds %struct.outer, ptr %non_ctx, i64 0, i32 4
				; CHECK-NEXT: [[r0:%[0-9]+]] = load i32, ptr %last
				; CHECK-NEXT: ret i32 [[r0]]
				}

				; Function Attrs: nounwind
				define dso_local i32 @no_marker_expected_2(ptr noundef %ctx) #0 !dbg !48 {
				entry:
				%ctx.addr = alloca ptr, align 8
				%x = alloca ptr, align 8
				store ptr %ctx, ptr %ctx.addr, align 8, !tbaa !29
				call void @llvm.dbg.declare(metadata ptr %ctx.addr, metadata !50, metadata !DIExpression()), !dbg !52
				call void @llvm.lifetime.start.p0(i64 8, ptr %x) #5, !dbg !53
				call void @llvm.dbg.declare(metadata ptr %x, metadata !51, metadata !DIExpression()), !dbg !54
				%call = call ptr @magic(), !dbg !55
				store ptr %call, ptr %x, align 8, !dbg !54, !tbaa !29
				%0 = load ptr, ptr %x, align 8, !dbg !56, !tbaa !29
				%last = getelementptr inbounds %struct.outer, ptr %0, i32 0, i32 4, !dbg !57
				%1 = load i32, ptr %last, align 4, !dbg !57, !tbaa !36
				call void @llvm.lifetime.end.p0(i64 8, ptr %x) #5, !dbg !58
				ret i32 %1, !dbg !59
				; CHECK: %call = tail call ptr @magic()
				; CHECK: %last = getelementptr inbounds %struct.outer, ptr %call, i64 0, i32 4
				; CHECK-NEXT: [[r0:%[0-9]+]] = load i32, ptr %last
				; CHECK-NEXT: ret i32 [[r0]]
				}

				; Function Attrs: argmemonly nocallback nofree nosync nounwind willreturn
				declare void @llvm.lifetime.start.p0(i64 immarg, ptr nocapture) #2

				declare dso_local ptr @magic(...) #3

				; Function Attrs: argmemonly nocallback nofree nosync nounwind willreturn
				declare void @llvm.lifetime.end.p0(i64 immarg, ptr nocapture) #2

				; Function Attrs: nounwind
				define dso_local i32 @no_marker_expected_3(ptr noundef %ctx, ptr noundef %non_ctx) #0 !dbg !60 {
				entry:
				%ctx.addr = alloca ptr, align 8
				%non_ctx.addr = alloca ptr, align 8
				store ptr %ctx, ptr %ctx.addr, align 8, !tbaa !29
				call void @llvm.dbg.declare(metadata ptr %ctx.addr, metadata !64, metadata !DIExpression()), !dbg !66
				store ptr %non_ctx, ptr %non_ctx.addr, align 8, !tbaa !29
				call void @llvm.dbg.declare(metadata ptr %non_ctx.addr, metadata !65, metadata !DIExpression()), !dbg !67
				%0 = load ptr, ptr %non_ctx.addr, align 8, !dbg !68, !tbaa !29
				%last = getelementptr inbounds %struct.outer, ptr %0, i32 0, i32 4, !dbg !69
				%1 = load i32, ptr %last, align 4, !dbg !69, !tbaa !36
				ret i32 %1, !dbg !70
				; CHECK: %last = getelementptr inbounds %struct.outer, ptr %non_ctx, i64 0, i32 4
				; CHECK-NEXT: [[r0:%[0-9]+]] = load i32, ptr %last
				; CHECK-NEXT: ret i32 [[r0]]
				}

				; Function Attrs: nounwind
				define dso_local i32 @one_marker_expected_1(ptr noundef %ctx) #0 !dbg !71 {
				entry:
				%ctx.addr = alloca ptr, align 8
				store ptr %ctx, ptr %ctx.addr, align 8, !tbaa !29
				call void @llvm.dbg.declare(metadata ptr %ctx.addr, metadata !73, metadata !DIExpression()), !dbg !74
				%0 = load ptr, ptr %ctx.addr, align 8, !dbg !75, !tbaa !29
				%inner_inline = getelementptr inbounds %struct.outer, ptr %0, i32 0, i32 1, !dbg !76
				%b = getelementptr inbounds %struct.inner, ptr %inner_inline, i32 0, i32 1, !dbg !77
				%1 = load i32, ptr %b, align 4, !dbg !77, !tbaa !78
				ret i32 %1, !dbg !79
				; CHECK: %b = getelementptr inbounds %struct.outer, ptr %ctx, i64 0, i32 1, i32 1
				; CHECK-NEXT: [[r0:%[0-9]+]] = load i32, ptr %b
				; CHECK-NEXT: [[r1:%[0-9]+]] = tail call i32 @llvm.bpf.passthrough.i32.i32({{.*}}, i32 [[r0]])
				; CHECK-NEXT: ret i32 [[r1]]
				}

				; Function Attrs: nounwind
				define dso_local i32 @one_marker_expected_2(ptr noundef %ctx) #0 !dbg !80 {
				entry:
				%ctx.addr = alloca ptr, align 8
				store ptr %ctx, ptr %ctx.addr, align 8, !tbaa !29
				call void @llvm.dbg.declare(metadata ptr %ctx.addr, metadata !82, metadata !DIExpression()), !dbg !83
				%0 = load ptr, ptr %ctx.addr, align 8, !dbg !84, !tbaa !29
				%inner_ptr = getelementptr inbounds %struct.outer, ptr %0, i32 0, i32 2, !dbg !85
				%1 = load ptr, ptr %inner_ptr, align 8, !dbg !85, !tbaa !86
				%b = getelementptr inbounds %struct.inner, ptr %1, i32 0, i32 1, !dbg !87
				%2 = load i32, ptr %b, align 4, !dbg !87, !tbaa !88
				ret i32 %2, !dbg !89
				; CHECK: %inner_ptr = getelementptr inbounds %struct.outer, ptr %ctx, i64 0, i32 2
				; CHECK-NEXT: [[r0:%[0-9]+]] = load ptr, ptr %inner_ptr
				; CHECK-NEXT: %b = getelementptr inbounds %struct.inner, ptr [[r0]], i64 0, i32 1
				; CHECK-NEXT: [[r1:%[0-9]+]] = load i32, ptr %b
				; CHECK-NEXT: [[r2:%[0-9]+]] = tail call i32 @llvm.bpf.passthrough.i32.i32({{.*}}, i32 [[r1]])
				; CHECK-NEXT: ret i32 [[r2]]
				}

				; Function Attrs: nounwind
				define dso_local i32 @one_marker_in_each_branch(ptr noundef %ctx) #0 !dbg !90 {
				entry:
				%retval = alloca i32, align 4
				%ctx.addr = alloca ptr, align 8
				store ptr %ctx, ptr %ctx.addr, align 8, !tbaa !29
				call void @llvm.dbg.declare(metadata ptr %ctx.addr, metadata !92, metadata !DIExpression()), !dbg !93
				%0 = load ptr, ptr %ctx.addr, align 8, !dbg !94, !tbaa !29
				%first = getelementptr inbounds %struct.outer, ptr %0, i32 0, i32 0, !dbg !96
				%1 = load i32, ptr %first, align 8, !dbg !96, !tbaa !97
				%tobool = icmp ne i32 %1, 0, !dbg !94
				br i1 %tobool, label %if.then, label %if.else, !dbg !98
				; CHECK: [[r0:%[0-9]+]] = load i32, ptr %ctx
				; CHECK-NEXT: [[r1:%[0-9]+]] = tail call i32 @llvm.bpf.passthrough.i32.i32({{.*}}, i32 [[r0]])
				; CHECK-NEXT: %tobool.not = icmp eq i32 [[r1]], 0
				; CHECK-NEXT: br i1 %tobool.not, label %if.else, label %if.then

				if.then: ; preds = %entry
				%2 = load ptr, ptr %ctx.addr, align 8, !dbg !99, !tbaa !29
				%butlast = getelementptr inbounds %struct.outer, ptr %2, i32 0, i32 3, !dbg !101
				%3 = load i32, ptr %butlast, align 8, !dbg !101, !tbaa !102
				store i32 %3, ptr %retval, align 4, !dbg !103
				br label %return, !dbg !103
				; CHECK: %butlast = getelementptr inbounds %struct.outer, ptr %ctx, i64 0, i32 3
				; CHECK-NEXT: [[r2:%[0-9]+]] = load i32, ptr %butlast
				; CHECK-NEXT: [[r3:%[0-9]+]] = tail call i32 @llvm.bpf.passthrough.i32.i32({{.*}}, i32 [[r2]])
				; CHECK-NEXT: br label %return

				if.else: ; preds = %entry
				%4 = load ptr, ptr %ctx.addr, align 8, !dbg !104, !tbaa !29
				%last = getelementptr inbounds %struct.outer, ptr %4, i32 0, i32 4, !dbg !106
				%5 = load i32, ptr %last, align 4, !dbg !106, !tbaa !36
				store i32 %5, ptr %retval, align 4, !dbg !107
				br label %return, !dbg !107
				; CHECK: %last = getelementptr inbounds %struct.outer, ptr %ctx, i64 0, i32 4
				; CHECK-NEXT: [[r4:%[0-9]+]] = load i32, ptr %last
				; CHECK-NEXT: [[r5:%[0-9]+]] = tail call i32 @llvm.bpf.passthrough.i32.i32({{.*}}, i32 [[r4]])
				; CHECK-NEXT: br label %return

				return: ; preds = %if.else, %if.then
				%6 = load i32, ptr %retval, align 4, !dbg !108
				ret i32 %6, !dbg !108
				}

				; Function Attrs: nounwind
				define dso_local i32 @do_not_sink(ptr noundef %ctx) #0 !dbg !109 {
				entry:
				%retval = alloca i32, align 4
				%ctx.addr = alloca ptr, align 8
				store ptr %ctx, ptr %ctx.addr, align 8, !tbaa !29
				call void @llvm.dbg.declare(metadata ptr %ctx.addr, metadata !111, metadata !DIExpression()), !dbg !112
				%0 = load ptr, ptr %ctx.addr, align 8, !dbg !113, !tbaa !29
				%first = getelementptr inbounds %struct.outer, ptr %0, i32 0, i32 0, !dbg !115
				%1 = load i32, ptr %first, align 8, !dbg !115, !tbaa !97
				%tobool = icmp ne i32 %1, 0, !dbg !113
				br i1 %tobool, label %if.then, label %if.else, !dbg !116
				; CHECK: [[r0:%[0-9]+]] = load i32, ptr %ctx
				; CHECK-NEXT: [[r1:%[0-9]+]] = tail call i32 @llvm.bpf.passthrough.i32.i32({{.*}}, i32 [[r0]])
				; CHECK-NEXT: %tobool.not = icmp eq i32 [[r1]], 0
				; CHECK-NEXT: br i1 %tobool.not, label %if.else, label %if.then

				if.then: ; preds = %entry
				%2 = load ptr, ptr %ctx.addr, align 8, !dbg !117, !tbaa !29
				%butlast = getelementptr inbounds %struct.outer, ptr %2, i32 0, i32 3, !dbg !118
				%3 = load i32, ptr %butlast, align 8, !dbg !118, !tbaa !102
				%call = call i32 @magic2(i32 noundef %3), !dbg !119
				store i32 %call, ptr %retval, align 4, !dbg !120
				br label %return, !dbg !120
				; CHECK: %butlast = getelementptr inbounds %struct.outer, ptr %ctx, i64 0, i32 3
				; CHECK-NEXT: [[r2:%[0-9]+]] = load i32, ptr %butlast
				; CHECK-NEXT: [[r3:%[0-9]+]] = tail call i32 @llvm.bpf.passthrough.i32.i32({{.*}}, i32 [[r2]])
				; CHECK-NEXT: br label %return

				if.else: ; preds = %entry
				%4 = load ptr, ptr %ctx.addr, align 8, !dbg !121, !tbaa !29
				%last = getelementptr inbounds %struct.outer, ptr %4, i32 0, i32 4, !dbg !122
				%5 = load i32, ptr %last, align 4, !dbg !122, !tbaa !36
				%call1 = call i32 @magic2(i32 noundef %5), !dbg !123
				store i32 %call1, ptr %retval, align 4, !dbg !124
				br label %return, !dbg !124
				; CHECK: %last = getelementptr inbounds %struct.outer, ptr %ctx, i64 0, i32 4
				; CHECK-NEXT: [[r4:%[0-9]+]] = load i32, ptr %last
				; CHECK-NEXT: [[r5:%[0-9]+]] = tail call i32 @llvm.bpf.passthrough.i32.i32({{.*}}, i32 [[r4]])
				; CHECK-NEXT: br label %return

				return: ; preds = %if.else, %if.then
				%6 = load i32, ptr %retval, align 4, !dbg !125
				ret i32 %6, !dbg !125
				}

				declare !dbg !126 dso_local i32 @magic2(i32 noundef) #3

				; Function Attrs: nounwind
				define dso_local i32 @reuse_sequence_number(ptr noundef %ctx) #0 !dbg !130 {
				; CHECK: define dso_local i32 @reuse_sequence_number
				; CHECK-NEXT: entry:
				; CHECK-NEXT: call void @llvm.dbg.value({{.*}})
				; CHECK-NEXT: %c1 = getelementptr inbounds %struct.ctx2, ptr %ctx, i64 0, i32 2
				; CHECK-NEXT: [[r0:%[0-9]+]] = load i32, ptr %c1
				; CHECK-NEXT: [[r1:%[0-9]+]] = tail call i32 @llvm.bpf.passthrough.i32.i32({{.*}}, i32 [[r0]])
				; CHECK-NEXT: {{.*}} = tail call i32 @magic2(i32 noundef [[r1]]) #8
				; CHECK-NEXT: ret i32 {{.*}}
				entry:
				%retval = alloca i32, align 4
				%ctx.addr = alloca ptr, align 8
				store ptr %ctx, ptr %ctx.addr, align 8, !tbaa !29
				call void @llvm.dbg.declare(metadata ptr %ctx.addr, metadata !140, metadata !DIExpression()), !dbg !141
				%0 = load ptr, ptr %ctx.addr, align 8, !dbg !142, !tbaa !29
				%b = getelementptr inbounds %struct.ctx2, ptr %0, i32 0, i32 1, !dbg !144
				%1 = load i32, ptr %b, align 4, !dbg !144, !tbaa !145
				%tobool = icmp ne i32 %1, 0, !dbg !142
				br i1 %tobool, label %if.then, label %if.else, !dbg !147

				if.then: ; preds = %entry
				%2 = load ptr, ptr %ctx.addr, align 8, !dbg !148, !tbaa !29
				%c = getelementptr inbounds %struct.ctx2, ptr %2, i32 0, i32 2, !dbg !149
				%3 = load i32, ptr %c, align 4, !dbg !149, !tbaa !150
				%call = call i32 @magic2(i32 noundef %3), !dbg !151
				store i32 %call, ptr %retval, align 4, !dbg !152
				br label %return, !dbg !152

				if.else: ; preds = %entry
				%4 = load ptr, ptr %ctx.addr, align 8, !dbg !153, !tbaa !29
				%c1 = getelementptr inbounds %struct.ctx2, ptr %4, i32 0, i32 2, !dbg !154
				%5 = load i32, ptr %c1, align 4, !dbg !154, !tbaa !150
				%call2 = call i32 @magic2(i32 noundef %5), !dbg !155
				store i32 %call2, ptr %retval, align 4, !dbg !156
				br label %return, !dbg !156

				return: ; preds = %if.else, %if.then
				%6 = load i32, ptr %retval, align 4, !dbg !157
				ret i32 %6, !dbg !157
				}

				; Function Attrs: nounwind
				define dso_local i32 @dont_reuse_sequence_number(ptr noundef %ctx) #0 !dbg !158 {
				entry:
				%retval = alloca i32, align 4
				%ctx.addr = alloca ptr, align 8
				store ptr %ctx, ptr %ctx.addr, align 8, !tbaa !29
				call void @llvm.dbg.declare(metadata ptr %ctx.addr, metadata !168, metadata !DIExpression()), !dbg !169
				%0 = load ptr, ptr %ctx.addr, align 8, !dbg !170, !tbaa !29
				%1 = call ptr @llvm.preserve.struct.access.index.p0.p0(ptr elementtype(%struct.ctx3) %0, i32 1, i32 1), !dbg !172, !llvm.preserve.access.index !162
				%2 = load i32, ptr %1, align 4, !dbg !172, !tbaa !173
				%tobool = icmp ne i32 %2, 0, !dbg !170
				br i1 %tobool, label %if.then, label %if.else, !dbg !175

				if.then: ; preds = %entry
				%3 = load ptr, ptr %ctx.addr, align 8, !dbg !176, !tbaa !29
				%4 = call ptr @llvm.preserve.struct.access.index.p0.p0(ptr elementtype(%struct.ctx3) %3, i32 2, i32 2), !dbg !177, !llvm.preserve.access.index !162
				%5 = load i32, ptr %4, align 4, !dbg !177, !tbaa !178
				%call = call i32 @magic2(i32 noundef %5), !dbg !179
				store i32 %call, ptr %retval, align 4, !dbg !180
				br label %return, !dbg !180
				; CHECK: if.then:
				; CHECK-NEXT: [[r7:%[0-9]+]] = tail call i32 @llvm.bpf.passthrough.i32.i32({{.*}}, i32 [[r6:%[0-9]+]])
				; CHECK-NEXT: br label %return

				if.else: ; preds = %entry
				%6 = load ptr, ptr %ctx.addr, align 8, !dbg !181, !tbaa !29
				%7 = call ptr @llvm.preserve.struct.access.index.p0.p0(ptr elementtype(%struct.ctx3) %6, i32 2, i32 2), !dbg !182, !llvm.preserve.access.index !162
				%8 = load i32, ptr %7, align 4, !dbg !182, !tbaa !178
				%call1 = call i32 @magic2(i32 noundef %8), !dbg !183
				store i32 %call1, ptr %retval, align 4, !dbg !184
				br label %return, !dbg !184
				; CHECK: if.else:
				; CHECK-NEXT: [[r8:%[0-9]+]] = tail call i32 @llvm.bpf.passthrough.i32.i32({{.*}}, i32 [[r6]])
				; CHECK-NEXT: br label %return

				return: ; preds = %if.else, %if.then
				%9 = load i32, ptr %retval, align 4, !dbg !185
				ret i32 %9, !dbg !185
				; CHECK: return:
				; CHECK-NEXT: %.sink = phi i32 [ [[r8]], %if.else ], [ [[r7]], %if.then ]
				; CHECK-NEXT: {{.*}} = tail call i32 @magic2(i32 noundef %.sink) #8
				; CHECK-NEXT: ret i32 {{.*}}
				}

				; Function Attrs: nocallback nofree nosync nounwind readnone willreturn
				declare ptr @llvm.preserve.struct.access.index.p0.p0(ptr, i32 immarg, i32 immarg) #4

				attributes #0 = { nounwind "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
				attributes #1 = { nocallback nofree nosync nounwind readnone speculatable willreturn }
				attributes #2 = { argmemonly nocallback nofree nosync nounwind willreturn }
				attributes #3 = { "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
				attributes #4 = { nocallback nofree nosync nounwind readnone willreturn }
				attributes #5 = { nounwind }

				!llvm.dbg.cu = !{!0}
				!llvm.module.flags = !{!2, !3, !4, !5}
				!llvm.ident = !{!6}

				!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 16.0.0 (https://github.com/llvm/llvm-project.git 7a3c35f68d87fef9e3b1d97a7c664139d6bb84fa)", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, splitDebugInlining: false, nameTableKind: None)
				!1 = !DIFile(filename: "/home/eddy/work/tasks/reloc-offset-tmp-issue/t.c", directory: "/home/eddy/work/llvm-project/build", checksumkind: CSK_MD5, checksum: "0fb75f548492a4e2f6bc405bb9623512")
				!2 = !{i32 7, !"Dwarf Version", i32 5}
				!3 = !{i32 2, !"Debug Info Version", i32 3}
				!4 = !{i32 1, !"wchar_size", i32 4}
				!5 = !{i32 7, !"frame-pointer", i32 2}
				!6 = !{!"clang version 16.0.0 (https://github.com/llvm/llvm-project.git 7a3c35f68d87fef9e3b1d97a7c664139d6bb84fa)"}
				!7 = distinct !DISubprogram(name: "marker_expected", scope: !8, file: !8, line: 16, type: !9, scopeLine: 16, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !25)
				!8 = !DIFile(filename: "tasks/reloc-offset-tmp-issue/t.c", directory: "/home/eddy/work", checksumkind: CSK_MD5, checksum: "0fb75f548492a4e2f6bc405bb9623512")
				!9 = !DISubroutineType(types: !10)
				!10 = !{!11, !12}
				!11 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
				!12 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !13, size: 64)
				!13 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "outer", file: !8, line: 8, size: 256, elements: !14)
				!14 = !{!15, !16, !21, !23, !24}
				!15 = !DIDerivedType(tag: DW_TAG_member, name: "first", scope: !13, file: !8, line: 9, baseType: !11, size: 32)
				!16 = !DIDerivedType(tag: DW_TAG_member, name: "inner_inline", scope: !13, file: !8, line: 10, baseType: !17, size: 64, offset: 32)
				!17 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "inner", file: !8, line: 3, size: 64, elements: !18)
				!18 = !{!19, !20}
				!19 = !DIDerivedType(tag: DW_TAG_member, name: "a", scope: !17, file: !8, line: 4, baseType: !11, size: 32)
				!20 = !DIDerivedType(tag: DW_TAG_member, name: "b", scope: !17, file: !8, line: 5, baseType: !11, size: 32, offset: 32)
				!21 = !DIDerivedType(tag: DW_TAG_member, name: "inner_ptr", scope: !13, file: !8, line: 11, baseType: !22, size: 64, offset: 128)
				!22 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !17, size: 64)
				!23 = !DIDerivedType(tag: DW_TAG_member, name: "butlast", scope: !13, file: !8, line: 12, baseType: !11, size: 32, offset: 192)
				!24 = !DIDerivedType(tag: DW_TAG_member, name: "last", scope: !13, file: !8, line: 13, baseType: !11, size: 32, offset: 224)
				!25 = !{!26}
				!26 = !DILocalVariable(name: "ctx", arg: 1, scope: !7, file: !8, line: 16, type: !12, annotations: !27)
				!27 = !{!28}
				!28 = !{!"btf_decl_tag", !"ctx"}
				!29 = !{!30, !30, i64 0}
				!30 = !{!"any pointer", !31, i64 0}
				!31 = !{!"omnipotent char", !32, i64 0}
				!32 = !{!"Simple C/C++ TBAA"}
				!33 = !DILocation(line: 16, column: 35, scope: !7)
				!34 = !DILocation(line: 17, column: 10, scope: !7)
				!35 = !DILocation(line: 17, column: 15, scope: !7)
				!36 = !{!37, !38, i64 28}
				!37 = !{!"outer", !38, i64 0, !39, i64 4, !30, i64 16, !38, i64 24, !38, i64 28}
				!38 = !{!"int", !31, i64 0}
				!39 = !{!"inner", !38, i64 0, !38, i64 4}
				!40 = !DILocation(line: 17, column: 3, scope: !7)
				!41 = distinct !DISubprogram(name: "no_marker_expected_1", scope: !8, file: !8, line: 20, type: !9, scopeLine: 20, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !42)
				!42 = !{!43}
				!43 = !DILocalVariable(name: "non_ctx", arg: 1, scope: !41, file: !8, line: 20, type: !12)
				!44 = !DILocation(line: 20, column: 40, scope: !41)
				!45 = !DILocation(line: 21, column: 10, scope: !41)
				!46 = !DILocation(line: 21, column: 19, scope: !41)
				!47 = !DILocation(line: 21, column: 3, scope: !41)
				!48 = distinct !DISubprogram(name: "no_marker_expected_2", scope: !8, file: !8, line: 26, type: !9, scopeLine: 26, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !49)
				!49 = !{!50, !51}
				!50 = !DILocalVariable(name: "ctx", arg: 1, scope: !48, file: !8, line: 26, type: !12, annotations: !27)
				!51 = !DILocalVariable(name: "x", scope: !48, file: !8, line: 27, type: !12)
				!52 = !DILocation(line: 26, column: 40, scope: !48)
				!53 = !DILocation(line: 27, column: 3, scope: !48)
				!54 = !DILocation(line: 27, column: 17, scope: !48)
				!55 = !DILocation(line: 27, column: 21, scope: !48)
				!56 = !DILocation(line: 28, column: 10, scope: !48)
				!57 = !DILocation(line: 28, column: 13, scope: !48)
				!58 = !DILocation(line: 29, column: 1, scope: !48)
				!59 = !DILocation(line: 28, column: 3, scope: !48)
				!60 = distinct !DISubprogram(name: "no_marker_expected_3", scope: !8, file: !8, line: 31, type: !61, scopeLine: 31, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !63)
				!61 = !DISubroutineType(types: !62)
				!62 = !{!11, !12, !12}
				!63 = !{!64, !65}
				!64 = !DILocalVariable(name: "ctx", arg: 1, scope: !60, file: !8, line: 31, type: !12, annotations: !27)
				!65 = !DILocalVariable(name: "non_ctx", arg: 2, scope: !60, file: !8, line: 31, type: !12)
				!66 = !DILocation(line: 31, column: 40, scope: !60)
				!67 = !DILocation(line: 31, column: 67, scope: !60)
				!68 = !DILocation(line: 32, column: 10, scope: !60)
				!69 = !DILocation(line: 32, column: 19, scope: !60)
				!70 = !DILocation(line: 32, column: 3, scope: !60)
				!71 = distinct !DISubprogram(name: "one_marker_expected_1", scope: !8, file: !8, line: 35, type: !9, scopeLine: 35, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !72)
				!72 = !{!73}
				!73 = !DILocalVariable(name: "ctx", arg: 1, scope: !71, file: !8, line: 35, type: !12, annotations: !27)
				!74 = !DILocation(line: 35, column: 41, scope: !71)
				!75 = !DILocation(line: 36, column: 10, scope: !71)
				!76 = !DILocation(line: 36, column: 15, scope: !71)
				!77 = !DILocation(line: 36, column: 28, scope: !71)
				!78 = !{!37, !38, i64 8}
				!79 = !DILocation(line: 36, column: 3, scope: !71)
				!80 = distinct !DISubprogram(name: "one_marker_expected_2", scope: !8, file: !8, line: 39, type: !9, scopeLine: 39, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !81)
				!81 = !{!82}
				!82 = !DILocalVariable(name: "ctx", arg: 1, scope: !80, file: !8, line: 39, type: !12, annotations: !27)
				!83 = !DILocation(line: 39, column: 41, scope: !80)
				!84 = !DILocation(line: 40, column: 10, scope: !80)
				!85 = !DILocation(line: 40, column: 15, scope: !80)
				!86 = !{!37, !30, i64 16}
				!87 = !DILocation(line: 40, column: 26, scope: !80)
				!88 = !{!39, !38, i64 4}
				!89 = !DILocation(line: 40, column: 3, scope: !80)
				!90 = distinct !DISubprogram(name: "one_marker_in_each_branch", scope: !8, file: !8, line: 43, type: !9, scopeLine: 43, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !91)
				!91 = !{!92}
				!92 = !DILocalVariable(name: "ctx", arg: 1, scope: !90, file: !8, line: 43, type: !12, annotations: !27)
				!93 = !DILocation(line: 43, column: 45, scope: !90)
				!94 = !DILocation(line: 44, column: 7, scope: !95)
				!95 = distinct !DILexicalBlock(scope: !90, file: !8, line: 44, column: 7)
				!96 = !DILocation(line: 44, column: 12, scope: !95)
				!97 = !{!37, !38, i64 0}
				!98 = !DILocation(line: 44, column: 7, scope: !90)
				!99 = !DILocation(line: 45, column: 12, scope: !100)
				!100 = distinct !DILexicalBlock(scope: !95, file: !8, line: 44, column: 19)
				!101 = !DILocation(line: 45, column: 17, scope: !100)
				!102 = !{!37, !38, i64 24}
				!103 = !DILocation(line: 45, column: 5, scope: !100)
				!104 = !DILocation(line: 47, column: 12, scope: !105)
				!105 = distinct !DILexicalBlock(scope: !95, file: !8, line: 46, column: 10)
				!106 = !DILocation(line: 47, column: 17, scope: !105)
				!107 = !DILocation(line: 47, column: 5, scope: !105)
				!108 = !DILocation(line: 49, column: 1, scope: !90)
				!109 = distinct !DISubprogram(name: "do_not_sink", scope: !8, file: !8, line: 53, type: !9, scopeLine: 53, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !110)
				!110 = !{!111}
				!111 = !DILocalVariable(name: "ctx", arg: 1, scope: !109, file: !8, line: 53, type: !12, annotations: !27)
				!112 = !DILocation(line: 53, column: 31, scope: !109)
				!113 = !DILocation(line: 54, column: 7, scope: !114)
				!114 = distinct !DILexicalBlock(scope: !109, file: !8, line: 54, column: 7)
				!115 = !DILocation(line: 54, column: 12, scope: !114)
				!116 = !DILocation(line: 54, column: 7, scope: !109)
				!117 = !DILocation(line: 55, column: 19, scope: !114)
				!118 = !DILocation(line: 55, column: 24, scope: !114)
				!119 = !DILocation(line: 55, column: 12, scope: !114)
				!120 = !DILocation(line: 55, column: 5, scope: !114)
				!121 = !DILocation(line: 57, column: 19, scope: !114)
				!122 = !DILocation(line: 57, column: 24, scope: !114)
				!123 = !DILocation(line: 57, column: 12, scope: !114)
				!124 = !DILocation(line: 57, column: 5, scope: !114)
				!125 = !DILocation(line: 58, column: 1, scope: !109)
				!126 = !DISubprogram(name: "magic2", scope: !8, file: !8, line: 51, type: !127, flags: DIFlagPrototyped, spFlags: DISPFlagOptimized, retainedNodes: !129)
				!127 = !DISubroutineType(types: !128)
				!128 = !{!11, !11}
				!129 = !{}
				!130 = distinct !DISubprogram(name: "reuse_sequence_number", scope: !8, file: !8, line: 62, type: !131, scopeLine: 62, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !139)
				!131 = !DISubroutineType(types: !132)
				!132 = !{!11, !133}
				!133 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !134, size: 64)
				!134 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "ctx2", file: !8, line: 60, size: 96, elements: !135)
				!135 = !{!136, !137, !138}
				!136 = !DIDerivedType(tag: DW_TAG_member, name: "a", scope: !134, file: !8, line: 60, baseType: !11, size: 32)
				!137 = !DIDerivedType(tag: DW_TAG_member, name: "b", scope: !134, file: !8, line: 60, baseType: !11, size: 32, offset: 32)
				!138 = !DIDerivedType(tag: DW_TAG_member, name: "c", scope: !134, file: !8, line: 60, baseType: !11, size: 32, offset: 64)
				!139 = !{!140}
				!140 = !DILocalVariable(name: "ctx", arg: 1, scope: !130, file: !8, line: 62, type: !133, annotations: !27)
				!141 = !DILocation(line: 62, column: 40, scope: !130)
				!142 = !DILocation(line: 63, column: 7, scope: !143)
				!143 = distinct !DILexicalBlock(scope: !130, file: !8, line: 63, column: 7)
				!144 = !DILocation(line: 63, column: 12, scope: !143)
				!145 = !{!146, !38, i64 4}
				!146 = !{!"ctx2", !38, i64 0, !38, i64 4, !38, i64 8}
				!147 = !DILocation(line: 63, column: 7, scope: !130)
				!148 = !DILocation(line: 64, column: 19, scope: !143)
				!149 = !DILocation(line: 64, column: 24, scope: !143)
				!150 = !{!146, !38, i64 8}
				!151 = !DILocation(line: 64, column: 12, scope: !143)
				!152 = !DILocation(line: 64, column: 5, scope: !143)
				!153 = !DILocation(line: 66, column: 19, scope: !143)
				!154 = !DILocation(line: 66, column: 24, scope: !143)
				!155 = !DILocation(line: 66, column: 12, scope: !143)
				!156 = !DILocation(line: 66, column: 5, scope: !143)
				!157 = !DILocation(line: 67, column: 1, scope: !130)
				!158 = distinct !DISubprogram(name: "dont_reuse_sequence_number", scope: !8, file: !8, line: 71, type: !159, scopeLine: 71, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !167)
				!159 = !DISubroutineType(types: !160)
				!160 = !{!11, !161}
				!161 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !162, size: 64)
				!162 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "ctx3", file: !8, line: 69, size: 96, elements: !163)
				!163 = !{!164, !165, !166}
				!164 = !DIDerivedType(tag: DW_TAG_member, name: "a", scope: !162, file: !8, line: 69, baseType: !11, size: 32)
				!165 = !DIDerivedType(tag: DW_TAG_member, name: "b", scope: !162, file: !8, line: 69, baseType: !11, size: 32, offset: 32)
				!166 = !DIDerivedType(tag: DW_TAG_member, name: "c", scope: !162, file: !8, line: 69, baseType: !11, size: 32, offset: 64)
				!167 = !{!168}
				!168 = !DILocalVariable(name: "ctx", arg: 1, scope: !158, file: !8, line: 71, type: !161, annotations: !27)
				!169 = !DILocation(line: 71, column: 45, scope: !158)
				!170 = !DILocation(line: 72, column: 7, scope: !171)
				!171 = distinct !DILexicalBlock(scope: !158, file: !8, line: 72, column: 7)
				!172 = !DILocation(line: 72, column: 12, scope: !171)
				!173 = !{!174, !38, i64 4}
				!174 = !{!"ctx3", !38, i64 0, !38, i64 4, !38, i64 8}
				!175 = !DILocation(line: 72, column: 7, scope: !158)
				!176 = !DILocation(line: 73, column: 19, scope: !171)
				!177 = !DILocation(line: 73, column: 24, scope: !171)
				!178 = !{!174, !38, i64 8}
				!179 = !DILocation(line: 73, column: 12, scope: !171)
				!180 = !DILocation(line: 73, column: 5, scope: !171)
				!181 = !DILocation(line: 75, column: 19, scope: !171)
				!182 = !DILocation(line: 75, column: 24, scope: !171)
				!183 = !DILocation(line: 75, column: 12, scope: !171)
				!184 = !DILocation(line: 75, column: 5, scope: !171)
				!185 = !DILocation(line: 76, column: 1, scope: !158)