This is an archive of the discontinued LLVM Phabricator instance.

[BPF] preserve debuginfo types for builtin __builtin__btf_type_id()
ClosedPublic

Authored by yonghong-song on Feb 13 2020, 11:05 AM.

Details

Summary

The builtin function

u32 btf_type_id = __builtin_btf_type_id(param)

can help preserve type info for the following use case:

extern void foo(..., void *data, int size);
int test(...) {
  struct t { int a; int b; int c; } d;
  d.a = ...; d.b = ...; d.c = ...;
  foo(..., &d, sizeof(d));
}

The function "foo" in the above only see raw data and does not
know what type of the data is. In certain cases, e.g., logging,
the additional type information will help pretty print.

This patch handles the builtin in BPF backend. It includes
an IR pass to translate the IR intrinsic to a load of
a global variable which carries the metadata, and an MI
pass to remove the intermediate load of the global variable.
Finally, in AsmPrinter pass, proper instruction are generated.

In the above example, the second argument for __builtin_btf_type_id()
is 0, which means a relocation for local adjustment,
i.e., w.r.t. bpf program BTF change, will be generated.
The value 1 for the second argument means
a relocation for remote adjustment, e.g., against vmlinux.

Diff Detail

Event Timeline

yonghong-song created this revision.Feb 13 2020, 11:05 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptFeb 13 2020, 11:05 AM
yonghong-song edited the summary of this revision. (Show Details)

add the type_id assignment to FieldInfo relocation .btf.ext ELF section.

yonghong-song edited the summary of this revision. (Show Details)

Addressing some Alexei's comments:

Change comments in clang CGBuiltin.cpp to make it easier to understand.
Remove BPFMIPreserveDIType.cpp and fold into existing BPFMISimplifyPatchable.cpp.
Other Simplifications.
yonghong-song retitled this revision from [WIP][BPF] implement intrinsic function __builtin__btf_type_id() to [BPF] preserve debuginfo types for builtin __builtin__btf_type_id().
yonghong-song edited the summary of this revision. (Show Details)

make this diff llvm BPF backend only. The clang side change will be posted separately.

The corresponding clang side of change is https://reviews.llvm.org/D74668

ast added inline comments.Feb 19 2020, 6:34 PM
llvm/lib/Target/BPF/BPFPreserveDIType.cpp
1

BPFPreserveDIType.cpp

95

may be "llvm.btf_type_id." instead?

yonghong-song marked an inline comment as done.Feb 19 2020, 8:35 PM
yonghong-song added inline comments.
llvm/lib/Target/BPF/BPFPreserveDIType.cpp
95

Oh, yes. Make sense. Forget to change this one. Will update.

change llvm.bpf_pdit to llvm.btf_type_id variable name to make it easy to understand.

ast accepted this revision.Feb 19 2020, 9:44 PM
ast added inline comments.
llvm/lib/Target/BPF/BPFPreserveDIType.cpp
1

this typo wasn't fixed

This revision is now accepted and ready to land.Feb 19 2020, 9:44 PM

fix a typo (BPFPreserveType.cpp => BPFPreserveDIType.cpp) in the comments.

ast accepted this revision.Feb 20 2020, 8:27 AM

rebase on top of master. Fix build error due to a recent func name change.

yonghong-song edited the summary of this revision. (Show Details)

add second argument to __builtin_btf_type_id(var, flag) to indicate whether a .BTF.ext relocation should be generated or not.
flag == 0, no reloc, flag == 1, reloc, others, error.

yonghong-song edited the summary of this revision. (Show Details)

change buitin flag to the following spec:

flag == 0: a local relocation against bpf_prog BTF change
flag == 1: a remote relocation against kernel
This revision was automatically updated to reflect the committed changes.

This seems like a fairly invasive way to preserve certain DWARF types.

Are you in a position to make source changes to help here? Would it be reasonable to annotate these types with attribute((used)), for instance? I think it'd probably be useful on the compiler end to support using that attribute to communicate the need for that type to be emitted into the debug info - and that could be powered by adding the attributed type to the DICompileUnit's "retainedTypes" list.

(& even if attribute used can't be used in your case, because you need to deal with the code as it's currently written - perhaps this whole feature could be replaced with a front-end tweak to add these particular types to the retainedTypes list under the same conditions you're doing all this other stuff? Without the need for the synthetic function calls, pass to remove them, etc)

(also, when making changes for/to debug info - probably worth including either/at least the debug-info tag and/or some of the usual debug info contributors (myself, @echristo, @aprantl, @JDevlieghere, @probinson, etc))

I think it'd probably be good to revert this/other patches, and maybe have a design discussion on cfe-dev. Could you do that?

@dblaikie I can revert but let me first understand the alternative way to do the work, at least in high level.

[
I do believe my commit message, esp. the first couple of lines is badly inaccurate. I suspect this is the one causing your confusion as well.
specially, In the beginning of commit message, I said:

u32 btf_type_id = __builtin_btf_type_id(param)

But later on, I also said:

In the above example, the second argument for __builtin_btf_type_id() is 0, which means a relocation for local adjustment, i.e., w.r.t. bpf program BTF change, will be generated. The value 1 for the second argument means a relocation for remote adjustment, e.g., against vmlinux.

The above statement implies two arguments.
The actually builtin signature should be

u32 btf_type_id = __builtin_btf_type_id(expr, reloc_type)

The first parameter is not necessary a parameter and it can be any expression.
This is my fault as this patch go through several revision and I forgot to change that. I am happy to revert and rework the commit message to make it more clear.
]

Maybe, the description not clear or too BPF specific. Let me try again to describe the purpose of this builtin.
The interface of this builtin likes below:

__builtin_btf_type_id(expr, reloc_type)

The goal is to:

  • the expression here can be a variable (like "a" int "struct { ...} a", or any valid C expression, e.g., &a, a->field_b).
  • return the type_id, and generates a reloc_type based on user input (the relocation is generated in .BTF.ext section).
  • here, the relocation is against the return value, e.g., unsigned btf_id = __builtin_btf_type_id(expr, reloc_type) a relocation against the BPF instruction bpf_id = ... needs to be generated.

What is the use case for this builtin? Currently, various uses are all for pretty-print. Pass some data together with
btf_id so kernel or use space can pretty-print it. For example

struct {void *p, uint32_t bpf_id} v = {a->b->c, __builtin_btf_type_id(a->b->c, 1)};
printk("....%pT...", ..., &v, ...);

here a->b->c might be a type "struct abc", and the kernel will be able to print out the object pointed by a->b->c in a pretty format based on "struct abc". Here, the second parameter "1" means the ressulted btf_id will be relocated against kernel BTF. The linux kernel target is x64, arm64, etc.

In the future, we may have other use cases.

The reason we need relocation is later BTF debug info may be merged (linked) with other bpf program and you need to adjust btf_id, or it may try to
match against types against vmlinux.

The type associated with the expression may be local which user has control or maybe in a header user does not have control.

The following is what I understand the possible alternative:

  • Keep or the retained type for the expression, so we do not need the IR builtin.

I guess, in order to do this, we need:

  • We still need to annotate the expression somehow so we later on we can generate proper relocation later on.
  • If we do have a variable here, we might be able to annotate the variable (special attributes?) so we do not carry debuginfo types. But for expressions, we would still need to carry debuginfo. Otherwise, it is very hard to get debug info in backend.

Hopefully this is more clear with the interface/intention of this builtin. Please let me know in high level whether we could do better. Thanks!

@dblaikie I can revert but let me first understand the alternative way to do the work, at least in high level.

[
I do believe my commit message, esp. the first couple of lines is badly inaccurate. I suspect this is the one causing your confusion as well.
specially, In the beginning of commit message, I said:

u32 btf_type_id = __builtin_btf_type_id(param)

But later on, I also said:

In the above example, the second argument for __builtin_btf_type_id() is 0, which means a relocation for local adjustment, i.e., w.r.t. bpf program BTF change, will be generated. The value 1 for the second argument means a relocation for remote adjustment, e.g., against vmlinux.

The above statement implies two arguments.
The actually builtin signature should be

u32 btf_type_id = __builtin_btf_type_id(expr, reloc_type)

The first parameter is not necessary a parameter and it can be any expression.
This is my fault as this patch go through several revision and I forgot to change that. I am happy to revert and rework the commit message to make it more clear.
]

Appreciate the clarification - thanks!

Maybe, the description not clear or too BPF specific. Let me try again to describe the purpose of this builtin.
The interface of this builtin likes below:

__builtin_btf_type_id(expr, reloc_type)

The goal is to:

  • the expression here can be a variable (like "a" int "struct { ...} a", or any valid C expression, e.g., &a, a->field_b).
  • return the type_id, and generates a reloc_type based on user input (the relocation is generated in .BTF.ext section).
  • here, the relocation is against the return value, e.g., unsigned btf_id = __builtin_btf_type_id(expr, reloc_type) a relocation against the BPF instruction bpf_id = ... needs to be generated.

What is the use case for this builtin? Currently, various uses are all for pretty-print. Pass some data together with
btf_id so kernel or use space can pretty-print it. For example

struct {void *p, uint32_t bpf_id} v = {a->b->c, __builtin_btf_type_id(a->b->c, 1)};
printk("....%pT...", ..., &v, ...);

here a->b->c might be a type "struct abc", and the kernel will be able to print out the object pointed by a->b->c in a pretty format based on "struct abc". Here, the second parameter "1" means the ressulted btf_id will be relocated against kernel BTF. The linux kernel target is x64, arm64, etc.

In the future, we may have other use cases.

The reason we need relocation is later BTF debug info may be merged (linked) with other bpf program and you need to adjust btf_id, or it may try to
match against types against vmlinux.

The type associated with the expression may be local which user has control or maybe in a header user does not have control.

The following is what I understand the possible alternative:

  • Keep or the retained type for the expression, so we do not need the IR builtin.

I guess, in order to do this, we need:

  • We still need to annotate the expression somehow so we later on we can generate proper relocation later on.
  • If we do have a variable here, we might be able to annotate the variable (special attributes?) so we do not carry debuginfo types. But for expressions, we would still need to carry debuginfo. Otherwise, it is very hard to get debug info in backend.

Hopefully this is more clear with the interface/intention of this builtin. Please let me know in high level whether we could do better. Thanks!

Thanks for the explanation (& sorry for my very delayed response).

BPF uses its own debug info format, right? I guess it might be far enough out of my wheelhouse that I don't have a lot of understanding/experience to impart into your situation there. I think the explanation makes some sense to me as to why you're using an intrinsic like this, and I was a bit quick to call for a revert/suggest this wasn't the right direction - sorry about that.

It might be worth having an extra test that's just IR->IR to test the BPFPreserveDIType pass in a more isolated fashion.

Or would it be reasonable to have Clang generate the IR in the post-BPFPreserveDIType form to begin with, and avoid the need for that pass to run later?