This patch adds BPF Type Format (BTF) generation
for the BPF target in LLVM.
What is BTF?
First, BPF is a Linux kernel virtual machine that is
widely used for tracing, networking and security.
https://www.kernel.org/doc/Documentation/networking/filter.txt
https://cilium.readthedocs.io/en/v1.2/bpf/
BTF is the debug info format for BPF, introduced in the
Linux patch below:
https://github.com/torvalds/linux/commit/69b693f0aefa0ed521e8bd02260523b5ae446ad7#diff-06fb1c8825f653d7e539058b72c83332
That patch is part of the patch set described in this LWN article:
https://lwn.net/Articles/752047/
The BTF debug info is passed to the kernel, so
it is designed to (1) contain just enough information
for what the kernel BPF subsystem cares about, and
(2) be simple enough for the kernel to parse and verify.
The BTF format is specified in the above github commit.
In summary, its layout looks like
  struct btf_header
  type subsection (a list of types)
  string subsection (a list of strings)
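As a rough illustration of this layout, the sketch below shows how a loader might locate the two subsections inside a raw .BTF blob. The header fields are an assumption based on the 24-byte header in the kernel's include/uapi/linux/btf.h as I recall it; the authoritative layout for this patch is in include/llvm/MC/MCBTFContext.h. The names btf_header_sketch and btf_sketch_locate are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BTF_MAGIC_SKETCH 0xeB9F

/* Assumed header layout; verify against MCBTFContext.h before relying on it. */
struct btf_header_sketch {
    uint16_t magic;
    uint8_t  version;
    uint8_t  flags;
    uint32_t hdr_len;   /* size of this header in bytes */
    /* The offsets below are in bytes, relative to the end of the header. */
    uint32_t type_off;  /* start of the type subsection */
    uint32_t type_len;  /* length of the type subsection */
    uint32_t str_off;   /* start of the string subsection */
    uint32_t str_len;   /* length of the string subsection */
};

/* Hypothetical helper: find both subsections in a raw .BTF blob.
 * Returns 0 on success, -1 if the blob looks malformed. */
static int btf_sketch_locate(const uint8_t *data, size_t size,
                             const uint8_t **types, const char **strs,
                             struct btf_header_sketch *hdr)
{
    size_t body;

    if (size < sizeof(*hdr))
        return -1;
    memcpy(hdr, data, sizeof(*hdr));  /* copy to avoid unaligned reads */
    if (hdr->magic != BTF_MAGIC_SKETCH)
        return -1;
    if (hdr->hdr_len < sizeof(*hdr) || hdr->hdr_len > size)
        return -1;

    /* Both subsections must fit in what follows the header. */
    body = size - hdr->hdr_len;
    if (hdr->type_off > body || hdr->type_len > body - hdr->type_off)
        return -1;
    if (hdr->str_off > body || hdr->str_len > body - hdr->str_off)
        return -1;

    *types = data + hdr->hdr_len + hdr->type_off;
    *strs = (const char *)(data + hdr->hdr_len + hdr->str_off);
    return 0;
}
```

A name offset such as NameOff=1 would then be resolved as strs + 1, after checking that the offset is below str_len.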
With such information, the kernel and user space are able to
pretty print a particular bpf map key/value. One possible example below:
  Without BTF:
    key: [ 0x01, 0x01, 0x00, 0x00 ]
  With BTF:
    key: struct t { a : 1; b : 1; c : 0}
  where struct is defined as
    struct t { char a; char b; short c; };
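To make the pretty-printing idea concrete, here is a minimal sketch that formats the raw key bytes from the example using a hand-written member table. The table stands in for what a real tool would read from the BTF type subsection; member_sketch, struct_t_members and print_key_sketch are hypothetical names, and the code assumes byte-aligned, little-endian members printed as unsigned values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the BTF description of
 * struct t { char a; char b; short c; }. */
struct member_sketch {
    const char *name;
    uint32_t bit_offset;  /* offset of the member within the struct */
    uint32_t bit_size;    /* width of the member */
};

static const struct member_sketch struct_t_members[] = {
    { "a",  0,  8 },
    { "b",  8,  8 },
    { "c", 16, 16 },
};

/* Format the raw key bytes as "struct t { a : ..; b : ..; c : ..}". */
static void print_key_sketch(char *out, size_t outsz,
                             const uint8_t *key, size_t keysz)
{
    size_t n = sizeof(struct_t_members) / sizeof(struct_t_members[0]);
    size_t len, i, b;
    int w;

    w = snprintf(out, outsz, "struct t {");
    assert(w > 0 && (size_t)w < outsz);
    len = (size_t)w;

    for (i = 0; i < n; i++) {
        const struct member_sketch *m = &struct_t_members[i];
        size_t first = m->bit_offset / 8;
        size_t bytes = m->bit_size / 8;
        uint32_t v = 0;

        assert(first + bytes <= keysz);
        /* Assemble the member value from little-endian bytes. */
        for (b = 0; b < bytes; b++)
            v |= (uint32_t)key[first + b] << (8 * b);

        w = snprintf(out + len, outsz - len, " %s : %u%s",
                     m->name, (unsigned)v, i + 1 < n ? ";" : "");
        assert(w > 0 && (size_t)w < outsz - len);
        len += (size_t)w;
    }

    w = snprintf(out + len, outsz - len, "}");
    assert(w > 0 && (size_t)w < outsz - len);
}
```

Called with the key bytes 0x01, 0x01, 0x00, 0x00 and a 64-byte buffer, this produces the "With BTF" line shown above.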
How is BTF generated?
Currently, BTF is generated through pahole.
https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=68645f7facc2eb69d0aeb2dd7d2f0cac0feb4d69
and available in pahole v1.12
https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=4a21c5c8db0fcd2a279d067ecfb731596de822d4
Basically, the bpf program needs to be compiled with -g so that
dwarf sections are generated. pahole is enhanced such that
dwarf can be converted to a .BTF section. The format
of the .BTF section matches the format expected by
the kernel, so a bpf loader can just take the .BTF section
and load it into the kernel.
https://github.com/torvalds/linux/commit/8a138aed4a807ceb143882fb23a423d524dcdb35
The .BTF section layout is also specified in this patch,
in the file include/llvm/MC/MCBTFContext.h.
What use cases does this patch try to address?
Currently, only the bpf instruction stream needs to be
passed to the kernel. The kernel verifies it, jits it if configured
to do so, attaches it to a particular kernel attachment point,
and later executes it when a particular event happens.
This patch tries to expand BTF to support the two additional use cases below:
  (1). BPF supports subroutine calls.
       During performance analysis, it would be good to
       differentiate which call is hot instead of just
       providing a virtual address. This requires passing
       a unique identifier for each subroutine to the
       kernel; the subroutine name is a natural choice.
  (2). If a particular jitted instruction is hot, we want
       the user to know which source line this jitted
       instruction belongs to. This requires the source
       information to be available to various profiling tools.
Note that in a single ELF file,
  . there may be multiple loadable bpf programs,
  . for a particular to-be-loaded bpf instruction stream,
    its instructions may come from multiple PROGBITS sections;
    the bpf loader needs to merge them together into a single
    consecutive insn stream before loading it into the kernel.
For example:
  section .text: subroutines funcFoo
  section _progA: calling funcFoo
  section _progB: calling funcFoo
The bpf loader could construct two loadable bpf instruction
streams and load them into the kernel:
  . _progA funcFoo
  . _progB funcFoo
So the per-ELF-section function offsets and instruction offsets
will need to be adjusted before being passed to the kernel, and
the kernel essentially expects only one code section regardless
of how many there are in the ELF file.
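The adjustment described above amounts to adding the base of each ELF section within the merged stream to every per-section offset. The sketch below illustrates this under the assumption that the loader simply places sections back to back; sec_placement_sketch, place_sections_sketch and adjust_offset_sketch are hypothetical names, not part of any existing loader.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one ELF code section that the loader
 * appends to the merged instruction stream. */
struct sec_placement_sketch {
    const char *name;  /* ELF section name, e.g. ".text" */
    uint32_t size;     /* section size in bytes */
    uint32_t base;     /* byte offset in the merged stream (filled in) */
};

/* Lay the sections out back to back, in array order.
 * Returns the total size of the merged stream in bytes. */
static uint32_t place_sections_sketch(struct sec_placement_sketch *secs, int n)
{
    uint32_t off = 0;
    int i;

    for (i = 0; i < n; i++) {
        secs[i].base = off;
        off += secs[i].size;
    }
    return off;
}

/* Convert a per-section byte offset into an offset in the merged stream. */
static uint32_t adjust_offset_sketch(const struct sec_placement_sketch *sec,
                                     uint32_t sec_off)
{
    assert(sec_off < sec->size);
    return sec->base + sec_off;
}
```

For the "_progA funcFoo" stream, placing _progA first and .text second means that funcFoo, at offset 0 in .text, ends up at the byte offset equal to the size of _progA.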
What do we propose and why?
To support the above two use cases, we propose to
add an additional section, .BTF.ext, to the ELF file
which is the input of the bpf loader.
The .BTF.ext section has a similar header to the .BTF section
and it contains two subsections for func_info and line_info.
  . the func_info maps the func insn byte offset to a func
    type in the .BTF type subsection.
  . the line_info maps the insn byte offset to a line info.
  . both func_info and line_info subsections are organized
    by ELF PROGBITS AX sections.
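As a sketch of how these records could be consumed, the code below models one func_info and one line_info record and looks up the line record covering a given instruction offset. The field names are hypothetical, chosen after the labels in the debug dump later in this message (InsnOffset, TypeId, FileNameOff, LineOff, LineNum, ColumnNum); the real record layout is the one in include/llvm/MC/MCBTFContext.h and may pack these fields differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record shapes; see MCBTFContext.h for the real layout. */
struct func_info_sketch {
    uint32_t insn_offset;    /* byte offset of the function's first insn */
    uint32_t type_id;        /* FUNC type id in the .BTF type subsection */
};

struct line_info_sketch {
    uint32_t insn_offset;    /* byte offset of the first insn for this line */
    uint32_t file_name_off;  /* string table offset of the file name */
    uint32_t line_off;       /* string table offset of the source line text */
    uint32_t line_num;
    uint32_t column_num;
};

/* Return the record covering insn_off, i.e. the last record whose
 * insn_offset is <= insn_off. Records must be sorted by insn_offset.
 * Returns NULL if no record covers the offset. */
static const struct line_info_sketch *
find_line_sketch(const struct line_info_sketch *recs, size_t n,
                 uint32_t insn_off)
{
    const struct line_info_sketch *hit = NULL;
    size_t i;

    for (i = 0; i < n && recs[i].insn_offset <= insn_off; i++)
        hit = &recs[i];
    return hit;
}
```

A profiler that sees a hot instruction at some byte offset could use such a lookup to report the file name and source line text stored in the string subsection.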
The reason for using a separate ELF section, .BTF.ext, rather than
extending the existing .BTF section:
  . The existing .BTF section can be loaded directly into the
    kernel. The proposed .BTF.ext contents cannot, since the
    bpf loader needs to perform certain merging among multiple
    ELF sections before loading.
The layout of the .BTF.ext section can be found at
include/llvm/MC/MCBTFContext.h in this patch.
pahole does not seem a good place to implement .BTF.ext, as
pahole is mostly for structure hole information and, more
importantly, we want to pass the actual source code to the
kernel for the following reasons:
  . a bpf program is typically small, so the storage overhead
    should be small.
  . in bpf land, it is entirely possible that an application
    loads the bpf program into the kernel and then quits,
    so having the user space application hold the debug info
    is not practical, as you may not even know who loaded
    the bpf program.
  . having the source code kept directly by the kernel
    would ease deployment, since the original source code
    does not need to be shipped to every host and the
    kernel-devel package does not need to be deployed even
    if kernel headers are used.
LLVM seems a good place to implement this for the following reasons:
  . The only reliable time to get the source code is
    compilation time. This results in both more
    accurate information and easier deployment, as
    stated above.
  . Another consideration is JIT. Projects like bcc
    (https://github.com/iovisor/bcc) use MCJIT to
    compile a C program into bpf insns and load them
    into the kernel. The llvm generated BTF sections will
    be readily available for such cases as well.
Design and implementation of emitting .BTF.ext section
This patch implements generation of the .BTF.ext
section in the llvm compiler. It implements generation of
.BTF as well, since .BTF.ext depends on it
for types and strings.
The BTF related ELF sections will be generated
when both -target bpf and -g are specified. Two sections
are generated:
  .BTF contains all the type and string information, and
  .BTF.ext contains the func_info and line_info.
Note that dwarf sections will still be generated to
satisfy userspace tools like llvm-objdump or others
which rely on dwarf info.
  . dwarf info is used by userspace applications like
    llvm-objdump or any others which inspect dwarf debug
    information.
  . BTF sections are used by the kernel.
  . When ready to deploy to different machines for execution,
    dwarf related sections can be stripped, since the BPF
    loader and kernel only need the BTF sections.
The type and func_info are gathered during CodeGen/AsmPrinter
by traversing dwarf debug_info. The line_info is
gathered in MCObjectStreamer before writing to
the object file. After all the information is gathered,
the two sections are emitted in MCObjectStreamer::finishImpl.
The instruction byte offsets are produced by emitting
Fixup records in the MCObjectStreamer BTF emit function.
With cmake CMAKE_BUILD_TYPE=Debug, the compiler can
dump out all the tables except insn offset, which
will be resolved later as relocation records.
The debug type "btf" is used for BTFContext dump.
This patch also contains tests to verify generated
.BTF and .BTF.ext contents for all supported types,
func_info and line_info subsections, by comparing
the llvm-readelf dump of the section contents against
the expected values.
Note that the .BTF and .BTF.ext information will not
be emitted to assembly code and there is no assembler
support for BTF either.
Below, with a clang/llvm built with CMAKE_BUILD_TYPE=Debug,
the contents of each table are shown for a simple C program.
  -bash-4.2$ cat -n test.c
       1  struct A {
       2    int a;
       3    char b;
       4  };
       5
       6  int test(struct A *t) {
       7    return t->a;
       8  }
  -bash-4.2$ clang -O2 -target bpf -g -mllvm -debug-only=btf -c test.c
  Type Table:
  [1] FUNC NameOff=1 Info=0x0c000001 Size/Type=2
        ParamType=3
  [2] INT NameOff=12 Info=0x01000000 Size/Type=4
        Desc=0x01000020
  [3] PTR NameOff=0 Info=0x02000000 Size/Type=4
  [4] STRUCT NameOff=16 Info=0x04000002 Size/Type=8
        NameOff=18 Type=2 BitOffset=0
        NameOff=20 Type=5 BitOffset=32
  [5] INT NameOff=22 Info=0x01000000 Size/Type=1
        Desc=0x02000008

  String Table:
  0 :
  1 : test
  6 : .text
  12 : int
  16 : A
  18 : a
  20 : b
  22 : char
  27 : test.c
  34 : int test(struct A *t) {
  58 :   return t->a;

  FuncInfo Table:
  SecNameOff=6
        InsnOffset=<Omitted> TypeId=1

  LineInfo Table:
  SecNameOff=6
        InsnOffset=<Omitted> FileNameOff=27 LineOff=34 LineNum=6 ColumnNum=0
        InsnOffset=<Omitted> FileNameOff=27 LineOff=58 LineNum=7 ColumnNum=3
  -bash-4.2$ readelf -S test.o
  ......
    [12] .BTF              PROGBITS         0000000000000000  0000028d
         00000000000000c1  0000000000000000           0     0     1
    [13] .BTF.ext          PROGBITS         0000000000000000  0000034e
         0000000000000050  0000000000000000           0     0     1
    [14] .rel.BTF.ext      REL              0000000000000000  00000648
         0000000000000030  0000000000000010          16    13     8
  ......
  -bash-4.2$
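To help read the Info and Desc words in the dump above, the following sketch splits them into their fields. The bit positions are an assumption based on the BTF_INFO_* and BTF_INT_* macros in the kernel's include/uapi/linux/btf.h as I recall them, so verify them against that header; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit layout of the Info word: kind in bits 24-27,
 * vlen (number of members or parameters) in bits 0-15. */
static unsigned info_kind_sketch(uint32_t info)
{
    return (info >> 24) & 0x0f;
}

static unsigned info_vlen_sketch(uint32_t info)
{
    return info & 0xffff;
}

/* Assumed bit layout of the extra word that follows an INT type:
 * encoding in bits 24-27, bit offset in bits 16-23, bit width in bits 0-7. */
static unsigned int_encoding_sketch(uint32_t desc)
{
    return (desc >> 24) & 0x0f;
}

static unsigned int_offset_sketch(uint32_t desc)
{
    return (desc >> 16) & 0xff;
}

static unsigned int_bits_sketch(uint32_t desc)
{
    return desc & 0xff;
}
```

Under these assumptions, Info=0x0c000001 for [1] FUNC decodes to kind 12 with one parameter, Info=0x04000002 for [4] STRUCT decodes to kind 4 with two members, Desc=0x01000020 describes a 32-bit integer with encoding value 1, and Desc=0x02000008 describes an 8-bit integer with encoding value 2.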
Linux kernel 4.18 already supports .BTF with type information,
except for BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO, which are added in this patch.
The following patch set submitted to linux netdev:
https://www.spinics.net/lists/netdev/msg528817.html
adds support in the kernel for the .BTF.ext func_info subsection.
The patchset refers to a previous commit, which was reverted for
lack of proper review. But it can still be tried together with this
patchset, as there is no internal implementation change between this one and
https://reviews.llvm.org/rL344366.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>