This is an archive of the discontinued LLVM Phabricator instance.

[BPF] Attribute preserve_static_offset for structs
ClosedPublic

Authored by eddyz87 on Sep 6 2022, 8:31 AM.

Details

Summary

This commit adds a new BPF specific structure attribte
__attribute__((preserve_static_offset)) and a pass to deal with it.

This attribute may be attached to a struct or union declaration, where
it notifies the compiler that this structure is a "context" structure.
The following limitations apply to context structures:

  • runtime environment might patch access to the fields of this type by updating the field offset;

    BPF verifier limits access patterns allowed for certain data types. E.g. struct __sk_buff and struct bpf_sock_ops. For these types only LD/ST <reg> <static-offset> memory loads and stores are allowed.

    This is so because offsets of the fields of these structures do not match real offsets in the running kernel. During BPF program load/verification loads and stores to the fields of these types are rewritten so that offsets match real offsets. For this rewrite to happen static offsets have to be encoded in the instructions.

    See kernel/bpf/verifier.c:convert_ctx_access function in the Linux kernel source tree for details.
  • runtime environment might disallow access to the field of the type through modified pointers.

    During BPF program verification a tag PTR_TO_CTX is tracked for register values. In case if register with such tag is modified BPF programs are not allowed to read or write memory using register. See kernel/bpf/verifier.c:check_mem_access function in the Linux kernel source tree for details.

Access to the structure fields is translated to IR as a sequence:

  • (load (getelementptr %ptr %offset)) or
  • (store (getelementptr %ptr %offset))

During instruction selection phase such sequences are translated as a
single load instruction with embedded offset, e.g. LDW %ptr, %offset,
which matches access pattern necessary for the restricted
set of types described above (when %offset is static).

Multiple optimizer passes might separate these instructions, this
includes:

  • SimplifyCFGPass (sinking)
  • InstCombine (sinking)
  • GVN (hoisting)

The preserve_static_offset attribute marks structures for which the
following transformations happen:

  • at the early IR processing stage:
    • (load (getelementptr ...)) replaced by call to intrinsic llvm.bpf.getelementptr.and.load;
    • (store (getelementptr ...)) replaced by call to intrinsic llvm.bpf.getelementptr.and.store;
  • at the late IR processing stage this modification is undone.

Such handling prevents various optimizer passes from generating
sequences of instructions that would be rejected by BPF verifier.

The attribute((preserve_static_offset)) has a priority over
attribute((preserve_access_index)). When preserve_access_index
attribute is present preserve access index transformations are not
applied.

This addresses the issue reported by the following thread:

https://lore.kernel.org/bpf/CAA-VZPmxh8o8EBcJ=m-DH4ytcxDFmo0JKsm1p1gf40kS0CE3NQ@mail.gmail.com/T/#m4b9ce2ce73b34f34172328f975235fc6f19841b6

Diff Detail

Event Timeline

eddyz87 created this revision.Sep 6 2022, 8:31 AM
Herald added a project: Restricted Project. · View Herald Transcript
eddyz87 updated this revision to Diff 459560.Sep 12 2022, 2:32 PM
  • Added folding for GEP chains with constant indices:
    • inline anonymous unions work as expected
    • type casts to char* with consequitive array access work as expected
  • Moved context rewrite to a late callback function processing pipeline stage:
    • necessary to run the rewrite after loop unrolling, as rewrite can't handle non-constant indices for non structural GEP chains
eddyz87 updated this revision to Diff 460356.Sep 15 2022, 3:44 AM
  • added GVN pass after context rewrite to handle cases clobbered by marker calls
  • added marker simplification and context rewrite passes for opt pipeline
  • more test cases
eddyz87 updated this revision to Diff 461103.Sep 18 2022, 4:42 PM

Merged simplify and rewrite passes, various adjustments to unclobber CSE and GVN opportunities

eddyz87 updated this revision to Diff 461317.Sep 19 2022, 12:43 PM

Merge getelementptr and get.and.load/store calls for function inlining

eddyz87 updated this revision to Diff 462038.Sep 21 2022, 4:23 PM

Force doesNotAccessMemory for context marker function

eddyz87 updated this revision to Diff 462709.Sep 25 2022, 3:24 AM

Moved readonly/writeonly attrs as properties of call site to handle volatile load and stores

eddyz87 updated this revision to Diff 534561.Jun 26 2023, 7:55 AM

Rebase, replaced .c based back-end changes by .ll based ones.

eddyz87 updated this revision to Diff 535210.Jun 27 2023, 6:25 PM
  • renamed llvm.bpf.context.marker to llvm.context.marker.bpf (to avoid issues with intrinsic being looked up in the wrong table and not getting an ID / Attrs for CallInst);
  • use new pass manager for tests.
eddyz87 updated this revision to Diff 535380.Jun 28 2023, 6:55 AM
  • removed lots of uses of auto
  • removed usage of std::set
eddyz87 updated this revision to Diff 536643.Jul 2 2023, 5:43 PM

Generate immarg for index arguments of gep.and.load, gep.and.store.

eddyz87 published this revision for review.Jul 2 2023, 6:20 PM
eddyz87 edited reviewers, added: yonghong-song; removed: aaron.ballman.

Hi Yonghong,

Could you please take a look at this revision? I tried to describe the mechanics in the description / commit message and comments. Below are details regarding testing.
I tested these changes using a Linux Kernel special branch that declares decl_tag("ctx") for relevant functions (available here).
While comparing object files generated without decl_tag("ctx") and with decl_tag("ctx") I found some differences in 5 tests. (I use "with ctx" and "without ctx" below as a shorthand).

bpf_flow.bpf.o

Without ctx: 2713 instructions
With ctx: 2711 instructions

Difference: 2 load instructions appear in different BB and this requires two additional register to register assignments in "without ctx" case.
Difference is caused by GVNPass::PerformLoadPRE sub-pass and disappears if -mllvm -enable-load-pre=false option is passed. As the name implies this sub-pass operates only on load instructions.

connect4_prog.bpf.o

Without ctx: 351 instructions
With ctx: 351 instructions

Difference: one redundant load instruction is put in it's own basic block when compiled without ctx.

c
static __inline int set_notsent_lowat(struct bpf_sock_addr *ctx)
{ ... ctx->type ... }

int connect_v4_prog(struct bpf_sock_addr *ctx)
{
	...
	if (set_notsent_lowat(ctx))
		return 0;

	if (ctx->type ...) // ctx->type load is partially redundat after inlining
		return 0;
}

In the without ctx this is done by a part of JumpThreadingPass:

c++
/// simplifyPartiallyRedundantLoad - If LoadI is an obviously partially
/// redundant load instruction, eliminate it by replacing it with a PHI node.
/// This is an important optimization that encourages jump threading, and needs
/// to be run interlaced with other jump threading tasks.
bool JumpThreadingPass::simplifyPartiallyRedundantLoad(LoadInst *LoadI)

test_cls_redirect.bpf.o

Without ctx: 1616 instructions
With ctx: 1615 instructions

Difference: some differences in basic blocks placement. IR basic block structure is identical up until 'Branch Probability Basic Block Placement' machine pass.
Before that in without ctx case there is one additional basic block, which is introduced by GVNPass on process_icmpv4 function. Differences caused by GVNPass::PerformLoadPRE sub-pass which calls
llvm::SplitCriticalEdge that introduces the basic block.

test_misc_tcp_hdr_options.bpf.o

Without ctx: 611 instructions
With ctx: 612 instructions

Difference: one more load instruction generated in with ctx case. Happens in the following code snippet:

c
unsigned int nr_data = 0;
...
static int check_active_hdr_in(struct bpf_sock_ops *skops)
{
	...
	if (... < skops->skb_len)  // (1)
		nr_data++;
	...
	if (... == skops->skb_len) // (2)
		nr_pure_ack++;
	...
}

When compiled without ctx EarlyCSEPass reuses skops->skb_len computed at (1) for point (2). However, when compiled with ctx this does not happen. This happens because clang is not sure if skops, passed as parameter to getelementptr.and.load shares memory with nr_data. The following modification removes all differences:

c
static int check_active_hdr_in(struct bpf_sock_ops * restrict skops)

test_tcpnotify_kern.bpf.o

Without ctx: 57 instructions
With ctx: 58 instructions

Difference: one more store instruction generated, CFG structure differs slightly.
Difference is caused by SimplifyCFGPass, difference disappears if -mllvm -sink-common-insts=false option is supplied. The following IR snippet is the cause:

llvm
...
if.then:                                          ; preds = %entry
  %1 = getelementptr inbounds %struct.bpf_sock_ops, ptr %skops, i64 0, i32 1
  store i32 -1, ptr %1, align 4, !tbaa !240
  br label %cleanup30
...
...
sw.epilog:                                        ; preds = %if.end, ...
  %rv.0 = phi i32 [ -1, %sw.default ], ...
  %7 = getelementptr inbounds %struct.bpf_sock_ops, ptr %skops, i64 0, i32 1
  store i32 %rv.0, ptr %7, align 4, !tbaa !240
  br label %cleanup30, !dbg !278

cleanup30:                                        ; preds = %sw.epilog, %if.then
  %retval.0 = phi i32 [ 0, %if.then ], [ 1, %sw.epilog ], !dbg !226
  ret i32 %retval.0

Without ctx SimplifyCFGPass sinks getelementptr and store instructions to cleanup30 BB:

llvm
cleanup30:                                        ; preds = %sw.bb19, ...
  %rv.0.sink = phi i32 [ -1, %entry ], ...
  %retval.0 = phi i32 [ 0, %entry ], ...
  %6 = getelementptr inbounds %struct.bpf_sock_ops, ptr %skops, i64 0, i32 1, !dbg !226
  store i32 %rv.0.sink, ptr %6, align 4, !dbg !226, !tbaa !270
  ret i32 %retval.0, !dbg !271

For ctx case snippet looks as follows:

llvm
if.then:                                          ; preds = %entry
  tail call void ... @llvm.bpf.getelementptr.and.store.i32
    (i32 -1, ptr nonnull writeonly elementtype(%struct.bpf_sock_ops) %skops, ...
  br label %cleanup30
...
...
sw.epilog:                                        ; preds = %if.end ...
  %rv.0 = phi i32 [ -1, %sw.default ], ...
  call void ... @llvm.bpf.getelementptr.and.store.i32
    (i32 %rv.0, ptr nonnull writeonly elementtype(%struct.bpf_sock_ops) %skops, ...
  br label %cleanup30

cleanup30:                                        ; preds = %sw.epilog, %if.then
  %retval.0 = phi i32 [ 0, %if.then ], [ 1, %sw.epilog ], !dbg !228
  ret i32 %retval.0, !dbg !279

Note that in if.then the instruction is tail call while for sw.epilog the instruction is call. These instructions are considered not to be the same by SimplifyCFG.cpp:canSinkInstructions. I have an old merge request that fixes this: https://reviews.llvm.org/D134743 . Need to rebase it and ping someone to review.

Herald added projects: Restricted Project, Restricted Project. · View Herald Transcript
eddyz87 updated this revision to Diff 553819.Aug 27 2023, 5:12 PM
  • Removed special Sema processing and btf_decl_tag("ctx") prioritization in CGExpr, preserve access index calls are now handled by BPFContextMarker transformation.
  • Correctness fixes for isPointerOperand().
yonghong-song added inline comments.Aug 27 2023, 8:19 PM
llvm/include/llvm/IR/Intrinsics.td
2431

Is it possible to make this builtin as BPF specific one?

eddyz87 added inline comments.Aug 28 2023, 3:18 AM
llvm/include/llvm/IR/Intrinsics.td
2431

Currently llvm::Intrinsic::context_marker_bpf gets defined in llvm/IR/Intrinsics.h (via include of generated file IntrinsicEnums.inc, same as preserve_{struct,union,array}_access_index).

BPF specific intrinsics are defined in llvm/IR/IntrinsicsBPF.h (generated directly w/o .inc intermediary).

Thus, if I move context_marker_bpf to IntrinsicsBPF.td I would have to include IntrinsicsBPF.h in CGExpr.cpp. However, I don't see any target specific includes in that file.

eddyz87 updated this revision to Diff 553884.Aug 28 2023, 3:22 AM

git-clang-format fixes.

I will try to review other parts of code in the next few days.

llvm/include/llvm/IR/Intrinsics.td
2431

I went through the related clang code and indeed found it is hard to get a BPF target defined function in CGF or CGM. On the other hand, we can consider this new builtin under the umbrella "Intrinsics that are used to preserve debug information". Maybe we can rename the intrinsic name
to 'int_preserve_context_marker'? The goal of this builtin to preserve certain load/store which should be immune from optimizations. I try to generalize this so your function name 'wrapWithBPFContextMarker' can be renamed to
'wrapWithContextMarker'. There is no need to mention BPF any more.

In the commit message, you can mention that
similar to int_preserve_array_access_index, int_preserve_context_marker is only implemented in BPF backend. But other architectures can implement processing these intrinsics if they want to achieve some results similar to bpf backend.

WDYT?

eddyz87 added inline comments.Aug 29 2023, 5:22 AM
llvm/include/llvm/IR/Intrinsics.td
2431

I can rename these things, but tbh I don't think this functionality would be useful anywhere outside BPF, thus such renaming would be kind-of deceptive (and in case it would be useful, the renaming could be done at the time of second use).


Something more generic might look like !btf_decl_tag <str-value> metadata node attached to something. However, in the current form this would require transfer of such decl tag from type to function parameters and variables, e.g.:

struct foo {  ... } __attribute__((btf_decl_tag("ctx")));
void bar(struct foo *p) { ... }

During code-gen for bar use rule like "if function parameter has type annotated with btf_decl_tag, attach metadata to such parameter":

define void @bar(ptr %p !btf_decl_tag !1) { ... }
!1 = { "ctx" }

Such rule looks a bit weird:

  • tag is transferred from type to it's usage
  • what usages should be annotated? we care about function parameters but from generic point of view allocas or field accesses should be annotated as well.

The same metadata approach but with "ctx" attributes annotating function parameters (as you suggested originally, if I recall correctly) seems to be most generic and least controversial of all, e.g.:

void bar(struct foo *p __attribute__((btf_decl_tag("ctx")))) { ... }

Converted to:

define void @bar(ptr %p !btf_decl_tag !1) { ... }
!1 = { "ctx" }

However, this is less ergonomic for BPF, because user will have to annotate function parameters. (On the other hand, no changes in the kernel headers would be necessary).

I can rename these things, but tbh I don't think this functionality would be useful anywhere outside BPF, thus such renaming would be kind-of deceptive (and in case it would be useful, the renaming could be done at the time of second use).

Then let us keep this for now. We can decide what to do after clang frontend people reviewed this code.

ast added a subscriber: ast.Aug 30 2023, 10:33 AM

I can rename these things, but tbh I don't think this functionality would be useful anywhere outside BPF, thus such renaming would be kind-of deceptive (and in case it would be useful, the renaming could be done at the time of second use).

I agree that it's not useful outside of BPF, but it's useful outside of 'ctx'. I think 'preserve_constant_field_offset' would be more accurate description of the restriction.
We can expand in the doc that it's a constant offset when field of the struct is accessed.

Also instead of btf_tag it would be better to add another builtin similar to preserve_access_index.
Currently we add attribute((preserve_access_index)) to trigger CO-RE.
This one will be a new attribute((preserve_constant_field_offset)) that will be specified manually either in uapi/bpf.h or in vmlinux.h on some structs
and it will have precedence over preserve_access_index, so
(attribute((preserve_access_index)), apply_to = record) in vmlinux.h will be ignored on such structs.
Otherwise it's a bit odd that special names inside btf_tag have stronger rules than other attribute((preserve_access_index)).

I agree that it's not useful outside of BPF, but it's useful outside of 'ctx'. I think 'preserve_constant_field_offset' would be more accurate description of the restriction.
We can expand in the doc that it's a constant offset when field of the struct is accessed.

Also instead of btf_tag it would be better to add another builtin similar to preserve_access_index.
Currently we add attribute((preserve_access_index)) to trigger CO-RE.
This one will be a new attribute((preserve_constant_field_offset)) that will be specified manually either in uapi/bpf.h or in vmlinux.h on some structs

This makes sense, I'll adjust the implementation to use new attribute (also will try to use metadata instead of intrinsic call, replace by intrinsic on the back-end side).

and it will have precedence over preserve_access_index, so
(attribute((preserve_access_index)), apply_to = record) in vmlinux.h will be ignored on such structs.
Otherwise it's a bit odd that special names inside btf_tag have stronger rules than other attribute((preserve_access_index)).

Note on current propagation logic: whenever there is an expression E of type T, where T is a struct annotated with btf_decl_tag("ctx"), all usages of E are traversed recursively visiting getelementptr and calls to preserve_{struct,union,array}_index, the latter are replaced by getelementptr. E.g.:

#define __pai __attribute__((preserve_access_context));
#define __ctx __attribute__((btf_decl_tag("ctx")))
struct bar { int a; int b; } __pai;
struct foo {
  struct bar b;
} __ctx __pai;
... struct foo f; ...
... f.b.bb ...

The call to preserve_struct_index generated for bb in f.b.bb would be replaced by getelementptr and later by getelementptr.and.load.
However, context structures don't have nested structures at the moment. The list of context structures is:

  • __sk_buff
  • bpf_cgroup_dev_ctx
  • bpf_sk_lookup
  • bpf_sock
  • bpf_sock_addr
  • bpf_sock_ops
  • bpf_sockopt
  • bpf_sysctl
  • sk_msg_md
  • sk_reuseport_md
  • xdp_md
ast added a comment.Aug 30 2023, 1:25 PM

Note on current propagation logic: whenever there is an expression E of type T, where T is a struct annotated with btf_decl_tag("ctx"), all usages of E are traversed recursively visiting getelementptr and calls to preserve_{struct,union,array}_index, the latter are replaced by getelementptr. E.g.:

#define __pai __attribute__((preserve_access_context));
#define __ctx __attribute__((btf_decl_tag("ctx")))
struct bar { int a; int b; } __pai;
struct foo {
  struct bar b;
} __ctx __pai;
... struct foo f; ...
... f.b.bb ...

The call to preserve_struct_index generated for bb in f.b.bb would be replaced by getelementptr and later by getelementptr.and.load.
However, context structures don't have nested structures at the moment. The list of context structures is:

  • __sk_buff
  • bpf_cgroup_dev_ctx
  • bpf_sk_lookup
  • bpf_sock
  • bpf_sock_addr
  • bpf_sock_ops
  • bpf_sockopt
  • bpf_sysctl
  • sk_msg_md
  • sk_reuseport_md
  • xdp_md

Right. Such recursive propagation of PAI is necessary. For btf_tag we cannot do it. Always propagating it won't be correct.
New preserve_const_field_offset would need to be propagated too and we actually have nested unions.
__bpf_md_ptr is such example. btf_tag wouldn't propagate into that union, but attr(preserve_const_field_offset) should.

After retesting kernel build with LLVM=1 and libbpf patch to reconstruct btf_decl_tag [1], statistics for BPF selftests looks as follows:

  • out of 653 object files 13 have some differences with and w/o this change;
  • for 2 programs there is small instruction count increase (+2 insn total);
  • for 5 programs there is small instruction decrease (-6 insn total);
  • 6 programs differ slightly but number of instructions is the same.

(The differences are insignificant, the rest of the comment could be skipped as it is probably not interesting for anyone but me).


Differences for first 5 programs were already described in this comment. The rest of the differences in described below.

netns_cookie_prog.bpf.o

Without ctx: 46 instructions
With ctx: 46 instructions

Instruction reordering:

 <get_netns_cookie_sk_msg>:
 	r6 = r1
-	r2 = *(u64 *)(r6 + 0x48)
 	r1 = *(u32 *)(r6 + 0x10)
-	if w1 != 0xa goto +0xb <LBB1_4>
+	if w1 != 0xa goto +0xc <LBB1_4>
+	r2 = *(u64 *)(r6 + 0x48)
 	if r2 == 0x0 goto +0xa <LBB1_4>
 	r1 = 0x0 ll
 	r3 = 0x0

The difference is introduced by "Machine code sinking" transformation. Before the transformation both 0x48 and 0x10 loads reside in the same basic block:

;; Old:
bb.0.entry:
  ...
  %0:gpr = CORE_LD64 345, %2:gpr, @"llvm.sk_msg_md:0:72$0:10:0"
  %9:gpr32 = CORE_LD32 350, %2:gpr, @"llvm.sk_msg_md:0:16$0:2"
  JNE_ri_32 killed %9:gpr32, 10, %bb.3

;; New:
bb.0.entry:
  ...
  %0:gpr = LDD %2:gpr, 72
  %3:gpr32 = LDW32 %2:gpr, 16
  JNE_ri_32 killed %3:gpr32, 10, %bb.3

Note: CORE pseudo-instructions are replaced by regular loads because btf_decl_tag("ctx") has priority over preserve_access_index attribute. The "Machine code sinking" transformation (MachineSink.cpp) can move LDD, LDW instructions, but can't move CORE_LD* because CORE_LD* instructions are marked as MCID::UnmodeledSideEffects in BPFGenInstrInfo.inc (maybe something to adjust):

// called from MachineSinking::SinkInstruction
bool MachineInstr::isSafeToMove(AAResults *AA, bool &SawStore) const {
  if (... hasUnmodeledSideEffects())
    return false;
  ...
}

sock_destroy_prog.bpf.o

Without ctx: 102 instructions
With ctx: 101 instructions

In the following code fragment:

	if (ctx->protocol == IPPROTO_TCP)
		bpf_map_update_elem(&tcp_conn_sockets, &key, &sock_cookie, 0);
	else if (ctx->protocol == IPPROTO_UDP)
		bpf_map_update_elem(&udp_conn_sockets, &keyc, &sock_cookie, 0);
	else
		return 1;

Version w/o btf_decl_tag("ctx") keeps two loads for ctx->protocol because of the llvm.bpf.passthrough call. Version with btf_decl_tag("ctx") eliminates second load via a combination of EarlyCSEPass/InstCombinePass/SimplifyCFGPass passes.

socket_cookie_prog.bpf.o

Without ctx: 66 instructions
With ctx: 66 instructions

For the following C code fragment:

SEC("sockops")
int update_cookie_sockops(struct bpf_sock_ops *ctx)
{
	struct bpf_sock *sk = ctx->sk;
	struct socket_cookie *p;

	if (ctx->family != AF_INET6)
		return 1;

	if (ctx->op != BPF_SOCK_OPS_TCP_CONNECT_CB)
		return 1;

	if (!sk)
		return 1;
    ...
}

Code with decl_tag("ctx") does reordering for ctx->sk load relative to ctx->family and ctx->op loads:

//  old                                       new
 <update_cookie_sockops>:                  <update_cookie_sockops>:
    r6 = r1                                   r6 = r1
-   r2 = *(u64 *)(r6 + 0xb8)
    r1 = *(u32 *)(r6 + 0x14)                  r1 = *(u32 *)(r6 + 0x14)
    if w1 != 0xa goto +0x13 <LBB1_6>          if w1 != 0xa goto +0x14 <LBB1_6>
    r1 = *(u32 *)(r6 + 0x0)                   r1 = *(u32 *)(r6 + 0x0)
    if w1 != 0x3 goto +0x11 <LBB1_6>          if w1 != 0x3 goto +0x12 <LBB1_6>
                                          +   r2 = *(u64 *)(r6 + 0xb8)
    if r2 == 0x0 goto +0x10 <LBB1_6>          if r2 == 0x0 goto +0x10 <LBB1_6>
    r1 = 0x0 ll                               r1 = 0x0 ll
    r3 = 0x0                                  r3 = 0x0

Code w/o decl_tag("ctx") uses CORE_LD* instructions for these loads and does not reorder loads due to reasons as in netns_cookie_prog.bpf.o.

test_lwt_reroute.bpf.o

Without ctx: 18 instructions
With ctx: 17 instructions

The difference boils down EarlyCSEPass being able to remove last load in the store/load pair:

llvm
; Before EarlyCSEPass
  store i32 %and, ptr %mark24, align 8
  %mark25 = getelementptr inbounds %struct.__sk_buff, ptr %skb, i32 0, i32 2
  %19 = load i32, ptr %mark25, align 8
  %cmp26 = icmp eq i32 %19, 0
; After EarlyCSEPass
  %and = and i32 %cond, 255
  store i32 %and, ptr %mark, align 8
  %cmp26 = icmp eq i32 %and, 0

And unable to do so when get.element.and.{store,load} intrinsics are used. Which leads to slight codegen differences downstream.

test_sockmap_invalid_update.bpf.o

Without ctx: 13 instructions
With ctx: 12 instructions

In the following C fragment:

c
	if (skops->sk)
		bpf_map_update_elem(&map, &key, skops->sk, 0);

Code with decl_tag("ctx") loads skops->sk only once. Code w/o decl_tag("ctx") uses CO-RE relocations and does load twice. As with sock_destroy_prog.bpf.o, EarlyCSEPass does not consolidate identical %x = call llvm.bpf.passthrough; load %x pairs.

type_cast.bpf.o

Without ctx: 96 instructions
With ctx: 96 instructions

__builtin_memcpy(name, dev->name, IFNAMSIZ) is unrolled in a different order. No idea why.

core_kern.bpf.o, test_verif_scale2.bpf.o

For both programs number of instructions is unchanged (11249, 12286). Some instructions have different order after DAG->DAG Pattern Instruction Selection. Instruction selection with CO-RE and non-CO-RE loads produces slightly different result.

...
Right. Such recursive propagation of PAI is necessary. For btf_tag we cannot do it. Always propagating it won't be correct.
New preserve_const_field_offset would need to be propagated too and we actually have nested unions.
__bpf_md_ptr is such example. btf_tag wouldn't propagate into that union, but attr(preserve_const_field_offset) should.

Hi Alexei,

It just occurred to me that such an attribute would also require DWARF and BTF encoding in order to get reflected in vmlinux.h (which we already have for btf_decl_tag). Given this I think we can rename decl tag "ctx" to btf_decl_tag("preserve_const_field_offset") but we should still keep it a btf_decl_tag. I'll try to replace usage of bpf_context_marker intrinsic by metadata, if that fails will just rename the intrinsic to preserve_const_field_offset.
What do you think? (Sorry, I should have thought this through last week).

ast added a comment.Sep 4 2023, 2:21 PM

...
Right. Such recursive propagation of PAI is necessary. For btf_tag we cannot do it. Always propagating it won't be correct.
New preserve_const_field_offset would need to be propagated too and we actually have nested unions.
__bpf_md_ptr is such example. btf_tag wouldn't propagate into that union, but attr(preserve_const_field_offset) should.

Hi Alexei,

It just occurred to me that such an attribute would also require DWARF and BTF encoding in order to get reflected in vmlinux.h (which we already have for btf_decl_tag). Given this I think we can rename decl tag "ctx" to btf_decl_tag("preserve_const_field_offset") but we should still keep it a btf_decl_tag. I'll try to replace usage of bpf_context_marker intrinsic by metadata, if that fails will just rename the intrinsic to preserve_const_field_offset.
What do you think? (Sorry, I should have thought this through last week).

I still feel that new attr is much cleaner from llvm implementation/design perspective and vmlinux.h inconvenience should be low priority in this considerations.
Since ctx only applies to uapi/bpf.h header the users don't have to use vmlinux.h. I know that today we have pains combining uapi headers and vmlinux.h and several solutions were
proposed. None were accepted yet, but that shouldn't mean that we should sacrifice llvm implementation due to orthogonal issues.
As a temporary workaround for vmlinux.h we can have uapi/bpf.h to do attr(preserve_const_field_offset) _and_ btf_decl_tag("bpf_ctx_struct"). Then teach pahole to emit attr(preserve_const_field_offset)
when it sees btf_decl_tag("bpf_ctx_struct") in vmlinux BTF.
Other workarounds are possible.

eddyz87 updated this revision to Diff 556819.Sep 14 2023, 4:50 PM
eddyz87 retitled this revision from [BPF] Attribute btf_decl_tag("ctx") for structs to [BPF] Attribute preserve_static_offset for structs.
eddyz87 edited the summary of this revision. (Show Details)Sep 14 2023, 4:50 PM
  • Use (new) attribute((preserve_static_offset)) instead of attribute((btf_decl_tag("ctx"))).
  • Rename files, passes, etc accordingly.
eddyz87 updated this revision to Diff 556858.Sep 15 2023, 8:01 AM

Fix for failed unit test pragma-attribute-supported-attributes-list.test

ast added a subscriber: jemarch.Sep 19 2023, 2:36 AM

I don't have a whole lot of opinions on the attribute itself (I'm not super familiar with BPF), but did spot some things to do on the Clang side. This will also need reviewers for the LLVM changes -- any ideas on who usually reviews BPF-related changes in LLVM?

clang/lib/Sema/SemaDeclAttr.cpp
7626–7627
8877
clang/test/Sema/bpf-attr-preserve-static-offset.c
1 ↗(On Diff #556858)

You should also add a -verify test that verifies we diagnose applying the attribute to something other than a structure or union, accepts no arguments, is diagnosed on a non-BPF target, etc to ensure we've got correct diagnostic behavior.

The CFE stuff is pretty innocuous and I don't see a reason to stop this on our accord, but would like someone familiar with the LLVM stuff to review this at one point.

clang/lib/CodeGen/CGExpr.cpp
3723
3727

You can use E->getType()->getPointeeType() here instead, unless you REALLY care that this is only a normal-pointer deref (and not a PMF or reference type).

3825

I'd suggest just making hasBPFPreserveStaticOffset be nullptr tolerant and remove the 1st half of this.

clang/lib/Sema/SemaDeclAttr.cpp
7630

This should use BPFPreserveStaticOffsetAttr::Create instead of using placement new. We've been meaning to translate these all at one point, but never got around to it.

8877

Or this, this is probably better.

llvm/test/CodeGen/BPF/preserve-static-offset/load-align.ll
61 ↗(On Diff #556858)

Are we sure we want to do something like this? It seems this both depends on YOUR computer AND us never releasing Clang 18.

eddyz87 updated this revision to Diff 557502.Sep 29 2023, 5:17 PM
eddyz87 marked 8 inline comments as done.

Rebase, changes as requested by @aaron.ballman and @erichkeane.

Hi @aaron.ballman, @erichkeane,

Thank you for taking a look.
I beleive this commit covers all feedback except "clang version"
metadata comment by @erichkeane, I added inline reply there.

This will also need reviewers for the LLVM changes -- any ideas on
who usually reviews BPF-related changes in LLVM?

I'll communicate with @ast and @yonghong-song.

Rebase, changes as requested by @aaron.ballman and @erichkeane.

Hi @aaron.ballman, @erichkeane,

Thank you for taking a look.
I beleive this commit covers all feedback except "clang version"
metadata comment by @erichkeane, I added inline reply there.

This will also need reviewers for the LLVM changes -- any ideas on
who usually reviews BPF-related changes in LLVM?

I'll communicate with @ast and @yonghong-song.

I don't see the comment response you had to me.

clang/lib/CodeGen/CGExpr.cpp
3729

getPointeeType can also return nullptr, so unless you have a test elsewhere to ensure it isn't, you likely have to do a little more work here (and if so, I'd need an assert).

I don't see the comment response you had to me.

Sorry, forgot to click submit.

clang/lib/Sema/SemaDeclAttr.cpp
7630

Replaced by a call to handleSimpleAttribute as suggested by Aaron.

clang/test/Sema/bpf-attr-preserve-static-offset.c
1 ↗(On Diff #556858)

-verify looks neat, thank you.

I've added two test cases:

  • clang/test/Sema/bpf-attr-preserve-static-offset-warns.c
  • clang/test/Sema/bpf-attr-preserve-static-offset-warns-nonbpf.c

Those check for things you listed:

  • can't be used for non-bpf target
  • can't take parameters
  • is error to put it on something other than struct / union.

Tbh, I don't know if there is anything else to test in this regard.

llvm/test/CodeGen/BPF/preserve-static-offset/load-align.ll
61 ↗(On Diff #556858)

Are you sure this would be an issue?
The specific line is not a part of a CHECK and I tried the following command using my system's llvm 16 opt:

opt -O2 -mtriple=bpf-pc-linux -S -o - load-align.ll

And module was loaded / processed w/o any issues.
In general grepping shows that people don't usually mask these in tests:

$ cd llvm/test/CodeGen/
$ ag '{!"clang version' | wc -l
452
eddyz87 added inline comments.Oct 2 2023, 7:33 AM
clang/lib/CodeGen/CGExpr.cpp
3729

Is it? I actually double-checked this before pushing an update, clangd jumps to the following definition:

QualType Type::getPointeeType() const {
  if (const auto *PT = getAs<PointerType>())
    return PT->getPointeeType();
  ...
  return {};
}

The getAsRecordDecl() can return null indeed, but that null is checked.

erichkeane added inline comments.Oct 2 2023, 7:38 AM
clang/lib/CodeGen/CGExpr.cpp
3729

More correctly, it returns an empty QualType. The operator-> on that will cause a nullptr, causing the call to Type::getAsRecordDecl to have a nullptr this.

llvm/test/CodeGen/BPF/preserve-static-offset/load-align.ll
61 ↗(On Diff #556858)

I don't write LLVM tests ever, so I'm not sure. It just seems odd to provide that much irrelevant info, perhaps one of hte LLVM reviewers can comment. Also, look at those ~450 and see what they contain?

eddyz87 added inline comments.Oct 2 2023, 7:45 AM
clang/lib/CodeGen/CGExpr.cpp
3729

Oh, right, I'll add an additional check, thank you.

llvm/test/CodeGen/BPF/preserve-static-offset/load-align.ll
61 ↗(On Diff #556858)

Also, look at those ~450 and see what they contain?

Same random clang versions:

$ ag '{!"clang version' | head
X86/debug-loclists.ll:129:!6 = !{!"clang version 10.0.0 (trunk 374581) (llvm/trunk 374579)"}
X86/dbg-combine.ll:84:!11 = !{!"clang version 3.7.0 (trunk 227074)"}
X86/debuginfo-locations-dce.ll:46:!6 = !{!"clang version 8.0.0 (trunk 339665)"}
X86/pr31242.ll:48:!5 = !{!"clang version 4.0.0 (trunk 288844)"}
X86/catchpad-regmask.ll:140:!1 = !{!"clang version 3.8.0 "}
X86/debug-nodebug-crash.ll:48:!5 = !{!"clang version 4.0.0"}
X86/limit-split-cost.mir:64:  !2 = !{!"clang version 7.0.0 (trunk 335057)"}
X86/swap.ll:169:!1 = !{!"clang version 9.0.0 (trunk 352631) (llvm/trunk 352632)"}
X86/dwarf-aranges-available-externally.ll:65:!16 = !{!"clang version 15.0.0 (https://github.com/llvm/llvm-project.git 2f52a868225755ebfa5242992d3a650ac6aadce7)"}
X86/label-annotation.ll:96:!7 = !{!"clang version 9.0.0 (git@github.com:llvm/llvm-project.git 7f9a008a2db285aca57bfa0c09858c9527a7aa98)"}
erichkeane added inline comments.Oct 2 2023, 7:47 AM
llvm/test/CodeGen/BPF/preserve-static-offset/load-align.ll
61 ↗(On Diff #556858)

It seems at least removing your home-path would be a good idea, but I can't really review these either.

Note the 'trunk #' is from our SVN days, so that isn't particularly useful at all. IMO everything in the parens is worthless in the tests, but hopefully someone familiar with the LLVM tests can stop by and correct me.

eddyz87 added inline comments.Oct 2 2023, 7:57 AM
llvm/test/CodeGen/BPF/preserve-static-offset/load-align.ll
61 ↗(On Diff #556858)

Most of the tests use generated IR as a starting point and it looks like people don't bother to hide these details. But I checked and test passes if I replace this string with just "clang", I'll update the tests, not a big deal.

eddyz87 updated this revision to Diff 557541.Oct 2 2023, 3:02 PM

Additional changes to address feedback from @erichkeane:

  • Update for hasBPFPreserveStaticOffset: added null check for getPointeeType() result (note: currently this function is only called for array base, so getPointeeType() always returns a non-null but I think that having a null check instead of assert is still better in this context).
  • Update for all .ll test cases to remove references to clang version or compilation directory from all metadata.
eddyz87 marked 4 inline comments as done.Oct 2 2023, 3:03 PM
erichkeane accepted this revision.Oct 3 2023, 7:29 AM

2 nits on the CFE, else LGTM! Obviously you still need an LLVM reviewer for the backend changes.

clang/lib/CodeGen/CGExpr.cpp
3730

We override operator bool to make this work.

3732

if possible?

This revision is now accepted and ready to land.Oct 3 2023, 7:29 AM
eddyz87 updated this revision to Diff 557580.Oct 3 2023, 5:12 PM
eddyz87 marked an inline comment as done.

Rebase, added "const" qualifier in hasBPFPreserveStaticOffset().

eddyz87 added inline comments.Oct 3 2023, 5:14 PM
clang/lib/CodeGen/CGExpr.cpp
3730

Sorry, just to clarify, currently such modification fails with the following error:

clang/lib/CodeGen/CGExpr.cpp:3710:7: error: invalid argument type 'QualType' to unary expression
  if (!PointeeType)
      ^~~~~~~~~~~~
1 error generated.

And you want me to modify QualType as follows:

--- a/clang/include/clang/AST/Type.h
+++ b/clang/include/clang/AST/Type.h
@@ -796,6 +796,8 @@ public:
     return getTypePtr();
   }
 
+  explicit operator bool() const { return isNull(); }
+
   bool isCanonical() const;
   bool isCanonicalAsParam() const;

Right?

erichkeane added inline comments.Oct 4 2023, 6:53 AM
clang/lib/CodeGen/CGExpr.cpp
3730

No, don't do that, you can leave it just checking isNull. I could have sworn we already had that operator, but perhaps it was removed at one point.

eddyz87 marked 3 inline comments as done.Oct 4 2023, 6:56 AM
eddyz87 added inline comments.
clang/lib/CodeGen/CGExpr.cpp
3730

Understood.
Thank you for the review.

ast added inline comments.Oct 10 2023, 11:55 AM
clang/lib/CodeGen/CGExpr.cpp
3826

If I'm reading this correctly wrapping with preserve_static_offset doesn't prevent further preserver_access_index wrapping which is a wasted effort for pai at the end ?

eddyz87 marked an inline comment as done.Oct 10 2023, 12:00 PM
eddyz87 added inline comments.
clang/lib/CodeGen/CGExpr.cpp
3826

Yes, pai calls are undone in BPFPreserveStaticOffset.cpp:removePAICalls(). I can put back the logic that suppresses pai if preserve static offset is present.

ast added inline comments.Oct 10 2023, 1:15 PM
clang/lib/CodeGen/CGExpr.cpp
3826

I see. I guess I missed a previous discussion. Why this approach was chosen?

eddyz87 added inline comments.Oct 10 2023, 1:20 PM
clang/lib/CodeGen/CGExpr.cpp
3826

Initial version used __attribute__((btf_decl_tag("ctx"))) and Yonghong did not want to have prioritization between btf_decl_tag and preserve_access_index basing on decl tag string parameter. Now this limitation is gone (and I think this was one of your arguments in favor of separate attribute).

ast accepted this revision.Oct 10 2023, 1:28 PM
ast added inline comments.
clang/lib/CodeGen/CGExpr.cpp
3826

Ahh. Right. It made sense to avoid special treatment of strings in decl_tag, but now it's gone and PSO takes precedence over PAI. Here we're adding PAI just to remove it later. Looks like a waste of cpu cycles and code. Unless applying PAI to outer struct and PSO in inner makes implementation tricky. I doubt we need to support such combo though. I'm fine cleaning this up in a follow up. If such cleanup makes sense at all.

eddyz87 added inline comments.Oct 10 2023, 1:30 PM
clang/lib/CodeGen/CGExpr.cpp
3826

If such cleanup makes sense at all.

I think it does, I'll implement this change and push an update.

I went through LLVM part of the change, mostly look good to me from high-level and I need to go through another pass with details.
It would be great if you can post corresponding kernel patches which utilizes this functionality?

clang/include/clang/Basic/AttrDocs.td
2227 ↗(On Diff #557580)

g->b.a ?

llvm/lib/Target/BPF/BPFPreserveStaticOffset.cpp
82 ↗(On Diff #557580)

Is it possible that such a pattern already changed by GVN/CSE so later on converting to preserve_static_offset becomes impossible? I guess it is unlikely but want to double check. Maybe add a comment to clarify.

102 ↗(On Diff #557580)

So the above implies that this is bad for CSE/GVN due to different representation, right? But we want to do (1) since we want to avoid later CSE/GVN etc, so argument of clobbering is not that important, right?

I tried a couple of examples with both preserve_access_index and preserve_static_offset (both 'preserve_access_index preserve_static_offset' and 'preserve_static_offset preserve_access_index'). Looks they cooperate with each other properly. preserve_static_offset is for base pointer while
preserve_access_index is for offset. So the current implementation seems OKAY. But preserve_static_offset actually means 'static offset' (no offset change),
so
preserve_access_index seems not necessary, so if possible maybe clang frontend should make a choice to remove preserve_access_index if
preserve_static_offset is there. But this can be done later (need frontend review again, need to update doc etc.)

eddyz87 updated this revision to Diff 558190.Wed, Nov 29, 4:35 PM
eddyz87 marked an inline comment as done.

Fix for documentation type g->a.b -> g->b.a.

eddyz87 marked an inline comment as done.Wed, Nov 29, 4:45 PM
eddyz87 added inline comments.
llvm/lib/Target/BPF/BPFPreserveStaticOffset.cpp
82 ↗(On Diff #557580)

Actually yes, that is possible. It can probably be mitigated by allowing non-constant indices in the get.and.{load,store} functions. As we already agreed to land this, I'll do this as a separate MR.

102 ↗(On Diff #557580)

We want to forbid some changes by CSE/GVN but allow others. E.g. suppose the following example:

struct foo {
  int a;
  int b;
} __attribute__((preserve_static_offset));

void consume(int);

static void bar(int *p) { consume(*p); }
void foo(struct foo *foo) {
  consume(foo->b);
  bar(&foo->b);
}

After inlining of bar to foo it would look like:

consume(foo->b);
consume(foo->b);

And I attempted not to break CSE/GVN for such situations.

eddyz87 updated this revision to Diff 558196.Thu, Nov 30, 5:12 AM
eddyz87 marked an inline comment as done.

Rebase, hope for green CI.

This revision was automatically updated to reflect the committed changes.
eddyz87 reopened this revision.Thu, Nov 30, 5:20 PM

Reopening revision to fix errors reported by testbot.

This revision is now accepted and ready to land.Thu, Nov 30, 5:20 PM
eddyz87 updated this revision to Diff 558201.EditedThu, Nov 30, 5:20 PM

Fixes for issues reported by testbot:

  • bpf-preserve-static-offset-bitfield.c updated to use -target bpfel instead of -target bpf to avoid different code generation between big and little endian machines.
  • BPFPreserveStaticOffset.cpp:removePAICalls() updated to avoid use after free for WorkList elements: series of if (isXXX(V)) removeXXX(V) modified to be else-if statement.
This revision was automatically updated to reflect the committed changes.