
[BPF] Support for compile once and run everywhere
ClosedPublic

Authored by yonghong-song on May 3 2019, 10:56 AM.

Details

Summary

Introduction

This patch adds initial support for BPF program compile once
and run everywhere (CO-RE).

The main motivation is BPF programs that depend on
kernel headers, which may vary between kernel versions.
The initial discussion can be found at https://lwn.net/Articles/773198/.

Currently, a BPF program accesses kernel internal data structures
through the bpf_probe_read() helper. The idea is to capture the
kernel data structure accesses made through bpf_probe_read()
and relocate them on different kernel versions.

On each host, right before BPF program load, the BPF loader
looks at the types of the running kernel through vmlinux BTF,
calculates the proper access offsets and patches the instructions.

To accommodate this, three intrinsic functions

preserve_{array,union,struct}_access_index

are introduced. In clang they preserve the base pointer, the
struct/union/array access index and the struct/union debuginfo type
information. Later, a BPF IR pass can reconstruct the whole
getelementptr (gep) access chain without looking at the gep itself.

This patch does the following:

. An IR pass is added to convert preserve_*_access_index calls to
  global variables whose names encode the getelementptr
  access pattern. The global variable has metadata
  attached to describe the corresponding struct/union
  debuginfo type.
. A SimplifyPatchable MachineInstruction pass is added
  to remove unnecessary loads.
. The BTF output pass is enhanced to generate relocation
  records located in the .BTF.ext section.

Typical CO-RE also needs support for global variables which can
be assigned different values on different hosts. For example,
the kernel version can be used to guard different versions of code.
This patch adds support for such patchable externs as well.

Example

The following is an example.

struct pt_regs {
  long arg1;
  long arg2;
};
struct sk_buff {
  int i;
  struct net_device *dev;
};

#define _(x) (__builtin_preserve_access_index(x))
static int (*bpf_probe_read)(void *dst, int size, const void *unsafe_ptr) =
        (void *) 4;
extern __attribute__((section(".BPF.patchable_externs"))) unsigned __kernel_version;
int bpf_prog(struct pt_regs *ctx) {
  struct net_device *dev = 0;

  // ctx->arg* does not need bpf_probe_read
  if (__kernel_version >= 41608)
    bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg1)->dev));
  else
    bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg2)->dev));
  return dev != 0;
}

In the above, we want the compiler to emit relocations for the
third argument of bpf_probe_read().

-bash-4.4$ clang -target bpf -O2 -g -S trace.c

The compiler will generate two new subsections in .BTF.ext,
OffsetReloc and ExternReloc.
OffsetReloc records the structure member offset operations,
and ExternReloc records the external globals, where
only u8, u16, u32 and u64 types are supported.

 BPFOffsetReloc Size
 struct SecOffsetReloc for ELF section #1
 A number of struct BPFOffsetReloc for ELF section #1
 struct SecOffsetReloc for ELF section #2
 A number of struct BPFOffsetReloc for ELF section #2
 ...
 BPFExternReloc Size
 struct SecExternReloc for ELF section #1
 A number of struct BPFExternReloc for ELF section #1
 struct SecExternReloc for ELF section #2
 A number of struct BPFExternReloc for ELF section #2

struct BPFOffsetReloc {
  uint32_t InsnOffset;    ///< Byte offset in this section
  uint32_t TypeID;        ///< TypeID for the relocation
  uint32_t OffsetNameOff; ///< The string to traverse types
};

struct BPFExternReloc {
  uint32_t InsnOffset;    ///< Byte offset in this section
  uint32_t ExternNameOff; ///< The string for external variable
};
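As a rough illustration of the layout above, a loader-side walker over the OffsetReloc subsection might look like the C sketch below. The SecOffsetReloc field names (SecNameOff, NumOffsetReloc) are assumptions for illustration; the real names come from the patch, not this summary.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct BPFOffsetReloc {
  uint32_t InsnOffset;    /* byte offset of the insn in its section */
  uint32_t TypeID;        /* BTF type id for the relocation */
  uint32_t OffsetNameOff; /* string offset of the access string */
};

/* Hypothetical per-section header, assumed from the layout sketch. */
struct SecOffsetReloc {
  uint32_t SecNameOff;     /* string offset of the ELF section name */
  uint32_t NumOffsetReloc; /* number of records for this section */
};

/* Walk the subsection: a 4-byte record size, then per ELF section a
 * SecOffsetReloc header followed by that many records. Copies up to
 * max records into out and returns the number of records seen. */
static size_t walk_offset_relocs(const uint8_t *p, const uint8_t *end,
                                 struct BPFOffsetReloc *out, size_t max) {
  uint32_t rec_size;
  size_t n = 0;

  if (end - p < (ptrdiff_t)sizeof(rec_size))
    return 0;
  memcpy(&rec_size, p, sizeof(rec_size)); /* "BPFOffsetReloc Size" */
  p += sizeof(rec_size);

  while (p + sizeof(struct SecOffsetReloc) <= end) {
    struct SecOffsetReloc sec;
    memcpy(&sec, p, sizeof(sec));
    p += sizeof(sec);
    for (uint32_t i = 0; i < sec.NumOffsetReloc && p + rec_size <= end;
         i++, p += rec_size) {
      if (n < max)
        memcpy(&out[n], p, sizeof(out[n]));
      n++;
    }
  }
  return n;
}
```

The ExternReloc subsection would be walked the same way, just with the smaller BPFExternReloc record size.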

Note that only externs with the section attribute ".BPF.patchable_externs"
are considered for extern relocations, which will be patched by the BPF
loader right before program load.

For the above test case, two offset records and one extern record
will be generated:

OffsetReloc records:
      .long   .Ltmp12                 # Insn Offset
      .long   7                       # TypeId
      .long   242                     # Type Decode String
      .long   .Ltmp18                 # Insn Offset
      .long   7                       # TypeId
      .long   242                     # Type Decode String

ExternReloc record:
      .long   .Ltmp5                  # Insn Offset
      .long   165                     # External Variable

In string table:
      .ascii  "0:1"                   # string offset=242
      .ascii  "__kernel_version"      # string offset=165

The default member offset, encoded by the access string "0:1", is
calculated as the offset of the 2nd member (index 1, with 0 representing
the 1st member) of struct "sk_buff".
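The shape of that decoding can be sketched in C. parse_access_string() below is a hypothetical helper for illustration only; the real decoding lives in the loader, which would then look up the member's offset in the host's BTF.

```c
#include <stdlib.h>

/* Decode an access string like "0:1": the first number indexes the
 * base pointer (0 here, i.e. no array displacement) and the second is
 * the member index inside the struct (1 => the 2nd member, dev, of
 * struct sk_buff in the example). Returns 0 on success, -1 on
 * malformed input. */
static int parse_access_string(const char *s, long *ptr_idx, long *member_idx) {
  char *end;

  *ptr_idx = strtol(s, &end, 10);
  if (*end != ':')
    return -1;
  *member_idx = strtol(end + 1, &end, 10);
  return *end == '\0' ? 0 : -1;
}
```

For "0:1" this yields pointer index 0 and member index 1; with the sk_buff layout above, member index 1 maps to byte offset 8 on the current BTF, which is exactly the "r2 = 8" seen in the generated code.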

The asm code:

.Ltmp5:
.Ltmp6:
        r2 = 0
        r3 = 41608
.Ltmp7:
.Ltmp8:
        .loc    1 18 9 is_stmt 0        # t.c:18:9
.Ltmp9:
        if r3 > r2 goto LBB0_2
.Ltmp10:
.Ltmp11:
        .loc    1 0 9                   # t.c:0:9
.Ltmp12:
        r2 = 8
.Ltmp13:
        .loc    1 19 66 is_stmt 1       # t.c:19:66
.Ltmp14:
.Ltmp15:
        r3 = *(u64 *)(r1 + 0)
        goto LBB0_3
.Ltmp16:
.Ltmp17:
LBB0_2: 
        .loc    1 0 66 is_stmt 0        # t.c:0:66
.Ltmp18:
        r2 = 8
        .loc    1 21 66 is_stmt 1       # t.c:21:66
.Ltmp19:
        r3 = *(u64 *)(r1 + 8)
.Ltmp20:
.Ltmp21:
LBB0_3:
        .loc    1 0 66 is_stmt 0        # t.c:0:66
        r3 += r2
        r1 = r10
.Ltmp22:
.Ltmp23:
.Ltmp24:
        r1 += -8
        r2 = 8
        call 4

For instructions .Ltmp12 and .Ltmp18, "r2 = 8", the number
8 is the structure member offset based on the current BTF.
The loader needs to adjust it if it changes on the host.

For instruction .Ltmp5, "r2 = 0", the external variable
gets a default value of 0; the loader needs to supply an
appropriate value for the particular host.

Compiling to generate object code and disassembling:

0000000000000000 bpf_prog:
        0:       b7 02 00 00 00 00 00 00         r2 = 0
        1:       7b 2a f8 ff 00 00 00 00         *(u64 *)(r10 - 8) = r2
        2:       b7 02 00 00 00 00 00 00         r2 = 0
        3:       b7 03 00 00 88 a2 00 00         r3 = 41608
        4:       2d 23 03 00 00 00 00 00         if r3 > r2 goto +3 <LBB0_2>
        5:       b7 02 00 00 08 00 00 00         r2 = 8
        6:       79 13 00 00 00 00 00 00         r3 = *(u64 *)(r1 + 0)
        7:       05 00 02 00 00 00 00 00         goto +2 <LBB0_3>

 0000000000000040 LBB0_2:
        8:       b7 02 00 00 08 00 00 00         r2 = 8
        9:       79 13 08 00 00 00 00 00         r3 = *(u64 *)(r1 + 8)

 0000000000000050 LBB0_3:
       10:       0f 23 00 00 00 00 00 00         r3 += r2
       11:       bf a1 00 00 00 00 00 00         r1 = r10
       12:       07 01 00 00 f8 ff ff ff         r1 += -8
       13:       b7 02 00 00 08 00 00 00         r2 = 8
       14:       85 00 00 00 04 00 00 00         call 4

Instructions #2, #5 and #8 need relocation resolutions from the loader.
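Conceptually, each such fixup just rewrites the imm field of a 64-bit BPF instruction in place. A minimal C sketch follows; the instruction layout is simplified (the real bpf_insn splits the second byte into dst_reg/src_reg bitfields), and new_imm would in practice be computed from the host's vmlinux BTF rather than passed in directly.

```c
#include <stdint.h>

/* Simplified view of one 64-bit BPF instruction (little-endian). */
struct bpf_insn {
  uint8_t code;  /* opcode */
  uint8_t regs;  /* dst_reg in low 4 bits, src_reg in high 4 bits */
  int16_t off;
  int32_t imm;
};

/* Apply one relocation: overwrite the immediate of the instruction at
 * byte offset insn_off within the program text, as a loader would do
 * for each OffsetReloc/ExternReloc record. */
static void patch_imm(struct bpf_insn *prog, uint32_t insn_off,
                      int32_t new_imm) {
  prog[insn_off / sizeof(struct bpf_insn)].imm = new_imm;
}
```

For example, on a host kernel where the dev member moved to offset 16, the loader would patch instruction #5 ("r2 = 8", byte offset 0x28) with patch_imm(prog, 0x28, 16).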

Diff Detail

Repository
rL LLVM

Event Timeline

yonghong-song created this revision.May 3 2019, 10:56 AM
Herald added a project: Restricted Project. May 3 2019, 10:56 AM

I've only skimmed the patch, but the BPFAbstractMemberAccess pass seems like a bad idea. In general, given a series of pointer arithmetic instructions in optimized LLVM IR, it is not possible to recover the original C types; LLVM types are not IR types, and we intentionally do not have any mapping from LLVM types to clang types. The names are only a weak hint to people reading the IR, and the layouts are only a weak hint to the optimizer about the values stored in the underlying memory.

Your commit message specifically calls out geps with zero indexes, and unions, but those are definitely not the only cases where you can run into trouble. The optimizer will rewrite your GEPs in unfriendly ways if it thinks the result is simpler; this can mean, for example, that you end up with a gep on a completely different type than the one used by the original code.

If you need to reliably support something like Objective-C non-fragile ivars, at least some of that support needs to be implemented in clang: instead of trying to rewrite geps as an IR "optimization", the IR emitted by the clang frontend should contain some representation of the appropriate arithmetic.

@efriedma thanks for the comments. I will take a look at clang gep generation to see whether a relocatable gep can be done there.

yonghong-song edited the summary of this revision. (Show Details)
yonghong-song edited reviewers, added: eli.friedman; removed: JiongWang.
yonghong-song added a subscriber: JiongWang.

clang encodes the struct/union type name, field index, array subscript and the original getelementptr base address into the preserve_di_access_index intrinsics. This way, the BPF IR passes can just look at these intrinsics to get the original type info in order to construct the complete gep access pattern.

ormris removed a subscriber: ormris.May 10 2019, 3:54 PM

A new approach of using intrinsics to pass info from clang is used. The following two patches implement the intrinsics:

https://reviews.llvm.org/D61809
https://reviews.llvm.org/D61810

My current implementation has an issue like below:

For the intrinsic call,

@0 = private unnamed_addr constant [8 x i8] c"sk_buff\00", align 1
...
  %4 = tail call %struct.anon* @llvm.preserve.di.access.index.p0s_struct.anons.p0s_struct.anons.p0s_struct.sk_buffs(%struct.anon* nonnull %3, %struct.sk_buff* %0, i8* getelementptr inbounds ([8 x i8], [8 x i8]* @0, i64 0, i64 0), i32 1), !dbg !43

The third argument, i8* getelementptr inbounds ([8 x i8], [8 x i8]* @0, i64 0, i64 0), is constructed with CreateGlobalStringPtr, which uses GEP internally.

For a test example, if I use clang directly to generate the .s, everything is okay. But if I use llc on the IR instead, I hit a verification error.

-bash-4.4$ clang -target bpf -O2 -g -Xclang -target-feature -Xclang +offsetreloc -S struct.c                                                 
-bash-4.4$ clang -target bpf -O2 -g -Xclang -target-feature -Xclang +offsetreloc -S -emit-llvm struct.c
-bash-4.4$ llc -march=bpfel -filetype=asm -mattr=offsetreloc struct.ll
Global is referenced by parentless instruction!
[8 x i8]* @0
; ModuleID = 'struct.ll'
  <badref> = getelementptr inbounds [8 x i8], [8 x i8]* @0, i64 0, i64 0
Global is referenced by parentless instruction!
[1 x i8]* @1
; ModuleID = 'struct.ll'
  <badref> = getelementptr inbounds [1 x i8], [1 x i8]* @1, i64 0, i64 0
Global is referenced by parentless instruction!
[1 x i8]* @1
; ModuleID = 'struct.ll'
  <badref> = getelementptr inbounds [1 x i8], [1 x i8]* @1, i64 0, i64 0
LLVM ERROR: Broken module found, compilation aborted!
-bash-4.4$

This could be related to my code below

Call->replaceAllUsesWith(Call->getArgOperand(0));
Call->eraseFromParent();

But I do not understand why it works with clang alone and fails with clang/llc.

Has anybody seen similar issues?

@eli.friedman I just uploaded a new revision which uses a newly-added intrinsics to pass information from clang generated IR to BPF IR pass. Could you take a look and give your opinion on this approach?
This patch is not mergeable as I have a llvm verification failure described in my previous comments. If you have some insight on this issue, please let me know. Thanks!

"Global is referenced by parentless instruction" means in memory, there's an instruction which refers to the global, but wasn't inserted into a basic block in any function in that module. This indicates you're either creating a GEP and not inserting it into the module correctly, or memory is corrupted due to a use-after-free or something like that. I don't see any obvious issue, but I'm not reading very carefully. asan might be helpful.

In terms of the general approach, I think you're getting closer; I'll reply on D61809.

efriedma added inline comments.May 10 2019, 6:38 PM
lib/Target/BPF/BPFAbstrctMemberAccess.cpp
287 ↗(On Diff #199103)

getAsInstruction probably doesn't do what you want here. It actually creates a new Instruction*, instead of casting the expression. Probably this is the source of the verifier error.

Yes, this is indeed the problem. Thanks @eli.friedman. Changing it to directly check the opcode (Instruction::GetElementPtr) and operands fixed the issue.

yonghong-song edited the summary of this revision. (Show Details)

use the new preserve_*_access_index intrinsics

yonghong-song edited the summary of this revision. (Show Details)

changes due to intrinsics change

yonghong-song edited the summary of this revision. (Show Details)
ast added inline comments.May 23 2019, 6:10 PM
lib/Target/BPF/BPFAbstrctMemberAccess.cpp
522 ↗(On Diff #200641)

does it still need to be limited to bpf_probe_read?
Shouldn't the new __builtin_preserve_access_index intrinsic allow relocations to be inserted in all possible helpers?

yonghong-song marked an inline comment as done.May 23 2019, 6:52 PM
yonghong-song added inline comments.
lib/Target/BPF/BPFAbstrctMemberAccess.cpp
522 ↗(On Diff #200641)

Yes, with the new __builtin_preserve_access_index the BPF backend does not need to care about bpf_probe_read at all. The only reason I did it here is for debugging. People may miss some relocatable bpf_probe_read() calls. Using +checkoffsetreloc will print out all bpf_probe_read() calls that are not relocatable. I thought this was a good debugging feature.

If we do not want to provide this debugging facility, the whole bpf_probe_read() handling can be removed.

ast added inline comments.May 23 2019, 7:00 PM
lib/Target/BPF/BPFAbstrctMemberAccess.cpp
522 ↗(On Diff #200641)

It seems that the probe_read check is also gating the relocation work.
If some other call uses __builtin_preserve_access_index(), it will stay unconverted
and will probably cause a compilation failure?

It indeed makes sense to have it for debugging, but probably as a backend flag, so that users can catch not only non-relocated probe_read but other bpf helpers in the future?

yonghong-song marked an inline comment as done.May 23 2019, 10:59 PM
yonghong-song added inline comments.
lib/Target/BPF/BPFAbstrctMemberAccess.cpp
522 ↗(On Diff #200641)

If some other call uses __builtin_preserve_access_index(), it will stay unconverted and will probably cause a compilation failure?

I think we would prefer to flag these out, as they are unnecessary and may cause performance regressions.

It indeed makes sense to have it for debugging, but probably as a backend flag, so that users can catch not only non-relocated probe_read but other bpf helpers in the future?

The current flag +checkoffsetreloc can be renamed to something like check. The flag is for the BPF backend only. clang also recognizes it since for CO-RE many compilations will simply use clang only. The check can cover offsetreloc, but also any future helpers, etc. Or we could have checklevel=# with default level 0; the higher the checklevel, the more compiler checking.

yonghong-song edited the summary of this revision. (Show Details)

remove the BPF "checkoffsetreloc" subtarget option and just honor what the user says. Later on, if needed, we can introduce a pass to show the user what is transformed and what is not, so the user can decide whether any __builtin_preserve_access_index()'s are missing.

ast accepted this revision.May 28 2019, 5:21 PM
This revision is now accepted and ready to land.May 28 2019, 5:21 PM

supported the new .maps format by avoiding aggressive pointee-type pruning for those map definition structures.
(https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=abd29c9314595b1ee5ec6c61d7c49a497ffb30a3)

ast accepted this revision.Jun 24 2019, 12:36 PM

Handling the following case:

// Collect MapDef types. Map definition needs to collect
// pointee types. Do it first. Otherwise, for the following
// case:
//    struct m { ...};
//    struct t {
//      struct m *key;
//    };
//    foo(struct t *arg);
//    
//    struct mapdef {
//      ...
//      struct m *key;
//      ...
//    } __attribute__((section(".maps"))) hash_map;
//
// If subroutine foo is traversed first, a type chain
// "ptr->struct m(fwd)" will be created and later on
// when traversing mapdef, since "ptr->struct m" exists,
// the traversal of "struct m" will be omitted.

Now all kernel bpf selftests pass without any warnings as well.

yonghong-song edited the summary of this revision. (Show Details)

fixed a bug in BTFDebug related to btf fwd type generation

for the bpf maps section, change from the strict .maps section name to .maps* section names, that is, any map section whose name starts with .maps

change some tests so that spFlags will not be used, in order to make the code easier to backport to an older compiler (llvm8).

This revision was automatically updated to reflect the committed changes.
thakis added inline comments.
llvm/trunk/lib/Target/BPF/CMakeLists.txt
16

Typo "Abstrct"

yonghong-song marked an inline comment as done.Jul 9 2019, 11:08 AM
yonghong-song added inline comments.
llvm/trunk/lib/Target/BPF/CMakeLists.txt
16

Thanks! Will fix it!