
[POC] Put annotation strings into debuginfo.
Needs Review · Public

Authored by yonghong-song on Jun 2 2021, 12:06 PM.
This revision needs review, but there are no reviewers specified.

Details

Reviewers
None
Summary

This is a Proof-Of-Concept patch that seeks suggestions
from the LLVM community on how to put an attribute with an arbitrary
string into the final debuginfo in the object file.

The Use Cases

In the BPF ecosystem, BTF is the debuginfo format used for validation
and for carrying additional information.

https://www.kernel.org/doc/html/latest/bpf/btf.html

Currently, BTF for vmlinux (x86_64, aarch64, etc.) is
generated by pahole, which converts DWARF to BTF, and the
vmlinux BTF is used to validate BPF program compliance,
e.g., a BPF program's signature must match the kernel function
signature for certain tracing programs. Beyond signature
checking, the following use cases would further
help the verifier.

  1. annotation of "user"/"rcu" etc. for function arguments, structure fields and global/static variables. The kernel currently uses address_space attributes for the sparse tool, but we would like to carry this information into debuginfo. A previous attempt, https://reviews.llvm.org/D69393, tried to use address_space, but it was halted because it needed to touch a lot of other places in LLVM.
  2. annotation of functions. We would like to write something like

       foo() attribute("property1", "property2")

    but since such an attribute is not supported, the kernel currently groups functions with separate logic and has to do some magic like

       global btf_property1: btf type id for foo, ...
       global btf_property2: btf type id for foo, ...

    This is really error prone, as the function definition may be guarded by some config options, and the global btf_property1 ... may not even be in the same source file as the function. Such a disconnect between function definitions and function attributes has already caused numerous issues.

    We also want to annotate functions with certain pre-conditions (e.g., a socket lock is held), since BPF programs have started to call kernel functions. Such annotations should really be applied directly to the function definition to avoid any potential mismatch issues later.
  3. annotation of structures, e.g., if the fields of a structure may have been randomized, the verifier should know it, since it can no longer trust the structure layout recorded in debuginfo.

Sorry for the terse explanation of the use cases. The main takeaway
is that we want to annotate structure/field/func/argument/variable
with *arbitrary* strings and want such strings to be preserved
in the final DWARF (or BTF) output.

An Example

In this patch, I hacked the clang frontend to put annotations
in debuginfo and hacked llvm/CodeGen to "output" these
annotations into BTF. The target architecture is x86.
Note that these attributes are not actually written to BTF yet;
I would like to seek the LLVM community's advice first.

Below is an example showing what the source code looks like.
I am using the "annotate" attribute because it accepts arbitrary strings.

$ cat t1.c
/* a pointer pointing to user memory */
#define __user __attribute__((annotate("user")))
/* a pointer protected by rcu */
#define __rcu __attribute__((annotate("rcu")))
/* the struct has some special property */
#define __special_struct __attribute__((annotate("special_struct")))
/* sock_lock is held for the function */
#define __sock_lock_held __attribute__((annotate("sock_lock_held")))
/* the hash table element type is socket */
#define __special_info __attribute__((annotate("elem_type:socket")))

struct hlist_node;
struct hlist_head {
  struct hlist_node *prev;
  struct hlist_node *next;
} __special_struct;
struct hlist {
   struct hlist_head head __special_info;
};

extern void bar(struct hlist *);
int foo(struct hlist *h,  int *a __user, int *b __rcu) __sock_lock_held {
  bar(h);
  return *a + *b;
}
$ clang --target x86_64 -O2 -c -g t1.c
TODO (BTF2Debug.cpp): Add func arg 'a' annotation 'user' to .BTF section
TODO (BTF2Debug.cpp): Add func arg 'b' annotation 'rcu' to .BTF section
TODO (BTF2Debug.cpp): Add subroutine 'foo' annotation 'sock_lock_held' to .BTF section
TODO (BTF2Debug.cpp): Add field 'head' annotation 'elem_type:socket' to .BTF section
TODO (BTF2Debug.cpp): Add struct 'hlist_head' annotation 'special_struct' to .BTF section
$
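Each TODO line above pairs one annotated entity (argument, function, field or struct) with one string. As a rough sketch of the per-annotation information a backend would have to carry, here is a C table of the five annotations from t1.c. The names `enum annot_target`, `struct annotation`, `format_todo`, `lookup_annotation` and `print_t1_annotations` are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Kinds of entities the proposal wants to annotate (hypothetical). */
enum annot_target { ANNOT_FUNC_ARG, ANNOT_SUBROUTINE, ANNOT_FIELD, ANNOT_STRUCT };

/* One annotation: the entity it is attached to plus the arbitrary string. */
struct annotation {
  enum annot_target target;
  const char *name;  /* argument, function, field or struct name */
  const char *value; /* the string given to annotate("...") */
};

/* The annotations in t1.c, in the order the transcript reports them. */
static const struct annotation t1_annotations[] = {
    {ANNOT_FUNC_ARG, "a", "user"},
    {ANNOT_FUNC_ARG, "b", "rcu"},
    {ANNOT_SUBROUTINE, "foo", "sock_lock_held"},
    {ANNOT_FIELD, "head", "elem_type:socket"},
    {ANNOT_STRUCT, "hlist_head", "special_struct"},
};
static const size_t t1_count = sizeof t1_annotations / sizeof t1_annotations[0];

static const char *target_label(enum annot_target t) {
  switch (t) {
  case ANNOT_FUNC_ARG: return "func arg";
  case ANNOT_SUBROUTINE: return "subroutine";
  case ANNOT_FIELD: return "field";
  case ANNOT_STRUCT: return "struct";
  }
  return "unknown";
}

/* Formats a record like the TODO messages; returns snprintf's result. */
int format_todo(char *buf, size_t size, const struct annotation *a) {
  return snprintf(buf, size,
                  "TODO (BTF2Debug.cpp): Add %s '%s' annotation '%s' to .BTF section",
                  target_label(a->target), a->name, a->value);
}

/* Returns the string attached to entity `name` of kind `t`, or NULL. */
const char *lookup_annotation(enum annot_target t, const char *name) {
  for (size_t i = 0; i < t1_count; i++)
    if (t1_annotations[i].target == t && strcmp(t1_annotations[i].name, name) == 0)
      return t1_annotations[i].value;
  return NULL;
}

/* Prints one line per annotation, mirroring the compiler output above. */
void print_t1_annotations(void) {
  char line[128];
  for (size_t i = 0; i < t1_count; i++) {
    int n = format_todo(line, sizeof line, &t1_annotations[i]);
    assert(n >= 0 && (size_t)n < sizeof line); /* no truncation */
    puts(line);
  }
}
```

Calling `print_t1_annotations()` reproduces the five TODO lines. Keeping the target kind next to the string is what would let a BTF or DWARF emitter attach each string to the right type, member, subprogram or parameter.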

What Is Next

First, using the "annotate" attribute is not the best choice, as it generates
extra globals and IR. Maybe a different, clang-specific attribute?

Second, in the above example, I tried to put these attributes into BTF
because, as far as I could find, there is no way to put them into DWARF.
Do we have a way to put them into DWARF? That works for us too.
Otherwise, we could let x86/arm64 etc. generate BTF (behind a flag, of course)
that contains this attribute information.

Diff Detail

Unit Tests: Failed

Time · Test
300 ms · x64 windows > Clang.CodeGen::thinlto-clang-diagnostic-handler-in-be.c
Script: -- : 'RUN: at line 5'; llvm-profdata merge -o C:\ws\w4\llvm-project\premerge-checks\build\tools\clang\test\CodeGen\Output\thinlto-clang-diagnostic-handler-in-be.c.tmp1.profdata C:\ws\w4\llvm-project\premerge-checks\clang\test\CodeGen/Inputs/thinlto_expect1.proftext
150 ms · x64 windows > Clang.CodeGenCXX::2010-03-09-AnonAggregate.cpp
Script: -- : 'RUN: at line 1'; c:\ws\w4\llvm-project\premerge-checks\build\bin\clang.exe -cc1 -internal-isystem c:\ws\w4\llvm-project\premerge-checks\build\lib\clang\13.0.0\include -nostdsysteminc -debug-info-kind=limited -S -o C:\ws\w4\llvm-project\premerge-checks\build\tools\clang\test\CodeGenCXX\Output\2010-03-09-AnonAggregate.cpp.tmp C:\ws\w4\llvm-project\premerge-checks\clang\test\CodeGenCXX\2010-03-09-AnonAggregate.cpp
300 ms · x64 windows > Clang.CodeGenCXX::crash.cpp
Script: -- : 'RUN: at line 1'; c:\ws\w4\llvm-project\premerge-checks\build\bin\clang.exe -cc1 -internal-isystem c:\ws\w4\llvm-project\premerge-checks\build\lib\clang\13.0.0\include -nostdsysteminc C:\ws\w4\llvm-project\premerge-checks\clang\test\CodeGenCXX\crash.cpp -std=c++11 -emit-llvm-only
380 ms · x64 windows > Clang.CodeGenCXX::debug-info-byval.cpp
Script: -- : 'RUN: at line 2'; c:\ws\w4\llvm-project\premerge-checks\build\bin\clang.exe --target=x86_64-pc-windows-gnu -g -S C:\ws\w4\llvm-project\premerge-checks\clang\test\CodeGenCXX\debug-info-byval.cpp -o - | c:\ws\w4\llvm-project\premerge-checks\build\bin\filecheck.exe C:\ws\w4\llvm-project\premerge-checks\clang\test\CodeGenCXX\debug-info-byval.cpp
360 ms · x64 windows > Clang.CodeGenCXX::debug-info-ctor2.cpp
Script: -- : 'RUN: at line 2'; c:\ws\w4\llvm-project\premerge-checks\build\bin\clang.exe --target=x86_64-pc-windows-gnu -fverbose-asm -g -S C:\ws\w4\llvm-project\premerge-checks\clang\test\CodeGenCXX\debug-info-ctor2.cpp -o - | grep AT_explicit
Full test results: 573 failed.

Event Timeline

yonghong-song created this revision. Jun 2 2021, 12:06 PM
yonghong-song requested review of this revision. Jun 2 2021, 12:06 PM
Herald added projects: Restricted Project, Restricted Project. Jun 2 2021, 12:06 PM

Generally arbitrary strings are best avoided where possible owing to lack of structure, type safety, semantics, etc. But they might be suitable here since they're opaque to everything from the frontend to the backend.

As for supporting it in DWARF, probably with a custom attribute (DW_AT_BTF_annotation? (or "LLVM" instead of "BTF" perhaps, I'm not sure)) with a standard form (DW_FORM_strp/strxN/etc - the usual way we emit strings).
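To make the DWARF suggestion concrete: an attribute with a standard string form needs only a new attribute code in the abbreviation table, while the string itself goes into .debug_str like any other name. The sketch below encodes the abbreviation and the DIE for parameter `a` of `foo` carrying the annotation "user". It assumes 32-bit DWARF on a little-endian target. `HYPOTHETICAL_DW_AT_annotation` (0x3fe0) is a made-up placeholder inside the vendor range DW_AT_lo_user..DW_AT_hi_user, not a code LLVM has allocated, and the helper names (`put_uleb128`, `str_offset`, `emit_abbrev`, `emit_param_die`) are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Constants from the DWARF 5 specification. */
#define DW_TAG_formal_parameter 0x05
#define DW_CHILDREN_no          0x00
#define DW_AT_name              0x03
#define DW_FORM_strp            0x0e
#define DW_AT_lo_user           0x2000
#define DW_AT_hi_user           0x3fff

/* HYPOTHETICAL attribute code for the annotation string: a placeholder in
   the vendor range, not a value taken from LLVM's Dwarf.def. */
#define HYPOTHETICAL_DW_AT_annotation 0x3fe0

/* Writes v as unsigned LEB128 into out; returns the number of bytes. */
static size_t put_uleb128(uint8_t *out, uint32_t v) {
  size_t n = 0;
  do {
    uint8_t byte = v & 0x7f;
    v >>= 7;
    if (v != 0)
      byte |= 0x80; /* more bytes follow */
    out[n++] = byte;
  } while (v != 0);
  return n;
}

/* Writes a 4-byte little-endian value (a 32-bit DWARF section offset). */
static size_t put_u32le(uint8_t *out, uint32_t v) {
  for (int i = 0; i < 4; i++)
    out[i] = (uint8_t)(v >> (8 * i));
  return 4;
}

/* A tiny .debug_str pool: NUL-terminated strings referenced by offset. */
struct str_pool {
  char data[256];
  size_t len;
};

/* Returns the offset of s in the pool, appending it if not present yet. */
uint32_t str_offset(struct str_pool *pool, const char *s) {
  for (size_t off = 0; off < pool->len; off += strlen(pool->data + off) + 1)
    if (strcmp(pool->data + off, s) == 0)
      return (uint32_t)off;
  size_t n = strlen(s) + 1;
  assert(pool->len + n <= sizeof pool->data);
  memcpy(pool->data + pool->len, s, n);
  pool->len += n;
  return (uint32_t)(pool->len - n);
}

/* Abbreviation for a formal parameter with a name and one annotation,
   both DW_FORM_strp. Returns the number of bytes written. */
size_t emit_abbrev(uint8_t *out, uint32_t code) {
  assert(HYPOTHETICAL_DW_AT_annotation >= DW_AT_lo_user &&
         HYPOTHETICAL_DW_AT_annotation <= DW_AT_hi_user);
  size_t n = put_uleb128(out, code);
  n += put_uleb128(out + n, DW_TAG_formal_parameter);
  out[n++] = DW_CHILDREN_no;
  n += put_uleb128(out + n, DW_AT_name);
  n += put_uleb128(out + n, DW_FORM_strp);
  n += put_uleb128(out + n, HYPOTHETICAL_DW_AT_annotation);
  n += put_uleb128(out + n, DW_FORM_strp);
  out[n++] = 0; /* (0, 0) terminates the attribute specifications */
  out[n++] = 0;
  return n;
}

/* DIE body for parameter `name` annotated with `annotation`. */
size_t emit_param_die(uint8_t *out, uint32_t code, struct str_pool *pool,
                      const char *name, const char *annotation) {
  size_t n = put_uleb128(out, code);
  n += put_u32le(out + n, str_offset(pool, name));
  n += put_u32le(out + n, str_offset(pool, annotation));
  return n;
}
```

With abbreviation code 1, the abbreviation bytes come out as 01 05 00 03 0e e0 7f 0e 00 00, and the DIE is 01 followed by two 4-byte string offsets (0 for "a", 2 for "user"). Only the attribute code is new; everything else reuses existing DWARF machinery.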

> As for supporting it in DWARF, probably with a custom attribute (DW_AT_BTF_annotation? (or "LLVM" instead of "BTF" perhaps, I'm not sure)) with a standard form (DW_FORM_strp/strxN/etc - the usual way we emit strings).

This is a good idea. Could you give some pointers on where I could add a custom attribute? Does LLVM currently have any custom attributes?

> > As for supporting it in DWARF, probably with a custom attribute (DW_AT_BTF_annotation? (or "LLVM" instead of "BTF" perhaps, I'm not sure)) with a standard form (DW_FORM_strp/strxN/etc - the usual way we emit strings).
>
> This is a good idea. Could you give some pointers on where I could add a custom attribute? Does LLVM currently have any custom attributes?

Yep LLVM does have some custom attributes and such, eg: https://github.com/llvm/llvm-project/blob/effb87dfa810a28e763f914fe3e6e984782cc846/llvm/include/llvm/BinaryFormat/Dwarf.def#L592

Used here: https://github.com/llvm/llvm-project/blob/effb87dfa810a28e763f914fe3e6e984782cc846/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp#L1083

ast added a comment. Jun 2 2021, 4:38 PM

> Yep LLVM does have some custom attributes and such, eg: https://github.com/llvm/llvm-project/blob/effb87dfa810a28e763f914fe3e6e984782cc846/llvm/include/llvm/BinaryFormat/Dwarf.def#L592

This is great insight! That should work. We need to make sure that DW_AT_LLVM_annotation in structure/field/func/argument/variable won't confuse gdb and such.
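On the gdb concern: in DWARF, a consumer learns each attribute's form from the abbreviation, and the form alone determines how many bytes the value occupies, so a tool that does not recognize a vendor attribute can in principle skip it. The sketch below shows that skipping logic for a handful of forms. It is a simplified illustration (32-bit DWARF, little-endian, only the forms listed), not how gdb is implemented, so the behavior of gdb and other tools should still be tested. `struct attr_spec`, `get_uleb128`, `value_size` and `read_name` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Constants from the DWARF 5 specification. */
#define DW_AT_name     0x03
#define DW_FORM_data2  0x05
#define DW_FORM_data4  0x06
#define DW_FORM_string 0x08
#define DW_FORM_data1  0x0b
#define DW_FORM_strp   0x0e
#define DW_FORM_udata  0x0f

/* One (attribute, form) pair from an abbreviation declaration. */
struct attr_spec {
  uint32_t attr;
  uint32_t form;
};

/* Decodes an unsigned LEB128 value; stores its encoded length in *len. */
static uint64_t get_uleb128(const uint8_t *p, size_t *len) {
  uint64_t value = 0;
  unsigned shift = 0;
  size_t n = 0;
  uint8_t byte;
  do {
    assert(shift < 64); /* value would not fit in 64 bits */
    byte = p[n++];
    value |= (uint64_t)(byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);
  *len = n;
  return value;
}

/* Number of bytes a value of `form` occupies at p (32-bit DWARF assumed
   for DW_FORM_strp), or 0 if this sketch does not handle the form. */
static size_t value_size(uint32_t form, const uint8_t *p) {
  size_t n = 0;
  switch (form) {
  case DW_FORM_data1: return 1;
  case DW_FORM_data2: return 2;
  case DW_FORM_data4:
  case DW_FORM_strp: return 4;
  case DW_FORM_string:
    while (p[n] != 0)
      n++;
    return n + 1; /* include the terminating NUL */
  case DW_FORM_udata:
    get_uleb128(p, &n);
    return n;
  default: return 0;
  }
}

/* Walks a DIE body (the bytes after its abbreviation code) and extracts the
   DW_AT_name string offset. Attributes the reader does not care about,
   such as a vendor annotation attribute, are skipped purely by form.
   Returns the number of bytes consumed, or 0 on an unhandled form. */
size_t read_name(const uint8_t *die, const struct attr_spec *specs,
                 size_t nspecs, uint32_t *name_off) {
  size_t pos = 0;
  for (size_t i = 0; i < nspecs; i++) {
    size_t size = value_size(specs[i].form, die + pos);
    if (size == 0)
      return 0;
    if (specs[i].attr == DW_AT_name && specs[i].form == DW_FORM_strp)
      *name_off = (uint32_t)die[pos] | (uint32_t)die[pos + 1] << 8 |
                  (uint32_t)die[pos + 2] << 16 | (uint32_t)die[pos + 3] << 24;
    pos += size;
  }
  return pos;
}
```

For example, with specs {hypothetical vendor attribute 0x3fe0, DW_FORM_strp}, {DW_AT_name, DW_FORM_strp}, {hypothetical 0x3fe1, DW_FORM_udata} and DIE bytes 02 00 00 00 10 00 00 00 e0 7f, `read_name` consumes 10 bytes and reports name offset 0x10 without knowing what either vendor attribute means.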

@dblaikie Thanks for the pointer! I will work on a new patch, starting with DW_AT_BTF_annotation (to limit the scope in which annotations are processed), and post it soon.