Page MenuHomePhabricator

[DebugInfo][BPF] Add 'btf:type_tag' annotation in DWARF
Needs ReviewPublic

Authored by eddyz87 on Feb 13 2023, 4:53 PM.

Details

Summary

Currently btf_type_tag attributes are represented in DWARF as child DIEs with DW_TAG_LLVM_annotation tag. Such attributes are supported for derived types of type DW_TAG_pointer_type and designate that the pointee type should have the attribute.

For example, the following C code:

// (variable :name "g"
//           :type (pointer (int :btf_type_tag "tag1")))
int __attribute__((btf_type_tag("tag1"))) *g;

Corresponds to the following DWARF:

0x1e:   DW_TAG_variable
          DW_AT_name      ("g")
          DW_AT_type      (0x29 "int *")

0x29:   DW_TAG_pointer_type
          DW_AT_type      (0x32 "int")

0x2e:     DW_TAG_LLVM_annotation
            DW_AT_name    ("btf_type_tag")
            DW_AT_const_value     ("tag1")

0x32:   DW_TAG_base_type
          DW_AT_name      ("int")

Recent discussion in Kernel BPF mailing list came to conclusion, that such annotations should apply to the annotated type itself (multiple approaches are discussed in the linked thread, "Solution 2" is the one accepted). For example, new DWARF encoding for the code above should look as follows:

0xe:   DW_TAG_variable
         DW_AT_name      ("g")
         DW_AT_type      (0x29 "int *")

0x9:   DW_TAG_pointer_type
         DW_AT_type      (0x32 "int")

0x2:   DW_TAG_base_type
         DW_AT_name      ("int")

0x6:     DW_TAG_LLVM_annotation
           DW_AT_name    ("btf:type_tag")
           DW_AT_const_value     ("tag1")

The goal of this commit is to enable new btf_type_tag encoding scheme for debug information emitted by clang. A new tag name, btf:type_tag, is used so that DWARF consumers could distinguish between old and new attachment semantics.

In order to preserve backwards compatibility both btf_type_tag and btf:type_tag are generated. btf_type_tag might be deprecated in the future. The complete DWARF for the example above looks as follows:

0x1e:   DW_TAG_variable
          DW_AT_name      ("g")
          DW_AT_type      (0x29 "int *")

0x29:   DW_TAG_pointer_type
          DW_AT_type      (0x32 "int")

0x2e:     DW_TAG_LLVM_annotation
            DW_AT_name    ("btf_type_tag")
            DW_AT_const_value     ("tag1")

0x32:   DW_TAG_base_type
          DW_AT_name      ("int")

0x36:     DW_TAG_LLVM_annotation
            DW_AT_name    ("btf:type_tag")
            DW_AT_const_value     ("tag1")

The commit includes the following changes:

  • New overload for CGDebugInfo::CreateType method is added to handle BTFTagAttributedType type. It creates a temporary "placeholder" DIDerivedType instance with tag set to DW_TAG_LLVM_annotation, base type corresponding to the base type of the BTFTagAttributedType and annotations corresponding to BTFTagAttributedType. All such placeholder instances are registered in the CGDebugInfo::AnnotationPlaceholders vector.
  • This overload is called from CGDebugInfo::CreateTypeNode and it overrides default qualifiers handling. This is necessary to put type tag annotations on the outermost CVR qualifier. Such representation makes downstream DWARF consumers simpler (eventually DWARF is used to construct BTF for kernel and kernel wants all type tag modifiers to precede const modifier).
  • CGDebugInfo::finalize() is modified to replace annotation placeholders with real types. When base type of the annotation placeholder is an instance of DIBasicType, DICompositeType, DIDerivedType, DISubroutineType that type is cloned and replaceAnnotations() method is used to copy annotations from placeholder to clone. Such logic allows to handle types with cyclic references.
  • ASTContext::getBTFTagAttributedType() is updated to ensure that BTFTagAttributedType always wraps QualType w/o local constant/volatile/restricted qualifiers.

Some additional example of DWARF generated after this commit follow (unrelated fields like file name and NULL entries are removed for brevity).

Type tag for primitive type

C:

int __attribute__((btf_type_tag("__a"))) a;

DWARF:

0x1e:   DW_TAG_variable
          DW_AT_name      ("a")
          DW_AT_type      (0x29 "int")

0x29:   DW_TAG_base_type
          DW_AT_name      ("int")

0x2d:     DW_TAG_LLVM_annotation
            DW_AT_name    ("btf:type_tag")
            DW_AT_const_value     ("__a")
Type tag for CVR modifier type

C:

volatile int __attribute__((btf_type_tag("__b"))) b;

DWARF:

0x1e:   DW_TAG_variable
          DW_AT_name	("b")
          DW_AT_type	(0x29 "volatile int")

0x29:   DW_TAG_volatile_type
          DW_AT_type	(0x2e "int")

0x2e:   DW_TAG_base_type
          DW_AT_name	("int")
          DW_AT_encoding	(DW_ATE_signed)
          DW_AT_byte_size	(0x04)

0x32:     DW_TAG_LLVM_annotation
            DW_AT_name	("btf:type_tag")
            DW_AT_const_value	("__b")
Type tag for typedef

C:

typedef int foo;
foo __attribute__((btf_type_tag("__c"))) c;

DWARF:

0x49:   DW_TAG_variable
          DW_AT_name      ("c")
          DW_AT_type      (0x54 "foo")

0x54:   DW_TAG_typedef
          DW_AT_type      (0x45 "int")
          DW_AT_name      ("foo")

0x5c:     DW_TAG_LLVM_annotation
            DW_AT_name    ("btf:type_tag")
            DW_AT_const_value     ("__c")
Type tag for composite type

C:

struct bar {};
struct bar __attribute__((btf_type_tag("__d"))) d1;
struct bar d2;

DWARF (Note two copies of struct bar one with annotations, one without):

0x60:   DW_TAG_variable
          DW_AT_name      ("d1")
          DW_AT_type      (0x6b "bar")

0x6b:   DW_TAG_structure_type
          DW_AT_name      ("bar")

0x70:     DW_TAG_LLVM_annotation
            DW_AT_name    ("btf:type_tag")
            DW_AT_const_value     ("__d")

0x74:   DW_TAG_variable
          DW_AT_name      ("d2")
          DW_AT_type      (0x7f "bar")

0x7f:   DW_TAG_structure_type
          DW_AT_name      ("bar")
Type tag for composite type with circular reference

C:

struct buz {
  struct buz __attribute__((btf_type_tag("__buz_self"))) *self;
} e;

DWARF (Note two copies of struct buz one with annotations, one without):

0x84:   DW_TAG_variable
          DW_AT_name      ("e")
          DW_AT_type      (0x8f "buz")

0x8f:   DW_TAG_structure_type
          DW_AT_name      ("buz")

0x94:     DW_TAG_member
            DW_AT_name    ("self")
            DW_AT_type    (0x9e "buz *")

0x9e:   DW_TAG_pointer_type
          DW_AT_type      (0xa7 "buz")

0xa3:     DW_TAG_LLVM_annotation
            DW_AT_name    ("btf_type_tag")
            DW_AT_const_value     ("__buz_self")

0xa7:   DW_TAG_structure_type
          DW_AT_name      ("buz")

0xac:     DW_TAG_member
            DW_AT_name    ("self")
            DW_AT_type    (0x9e "buz *")

0xb5:     DW_TAG_LLVM_annotation
            DW_AT_name    ("btf:type_tag")
            DW_AT_const_value     ("__buz_self")
Type tag for subroutine type

C:

void (__attribute__((btf_type_tag("__f"))) *f)(void);

DWARF:

0x000000b9:   DW_TAG_variable
                DW_AT_name      ("f")
                DW_AT_type      (0xc4 "void (*)(")

0x000000c4:   DW_TAG_pointer_type
                DW_AT_type      (0xcd "void (")

0x000000c9:     DW_TAG_LLVM_annotation
                  DW_AT_name    ("btf_type_tag")
                  DW_AT_const_value     ("f")

0x000000cd:   DW_TAG_subroutine_type
                DW_AT_prototyped        (true)

0x000000ce:     DW_TAG_LLVM_annotation
                  DW_AT_name    ("btf:type_tag")
                  DW_AT_const_value     ("__f")

Diff Detail

Event Timeline

eddyz87 created this revision.Feb 13 2023, 4:53 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 13 2023, 4:53 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
eddyz87 updated this revision to Diff 497835.Feb 15 2023, 3:59 PM
  • Removed DWARF tag 0x6001, use 'btf_type_tag:v2' name instead.
  • Removed CopyBTFTypeTags pass, generate 'btf_type_tag' and 'btf_type_tag:v2' in CGDebugInfo instead.
  • Added tests for BTF deduplication logic for debug info types that differ only in annotations.
eddyz87 retitled this revision from [DebugInfo][BPF] 'annotations' on a target type to represent btf_type_tag to [DebugInfo][BPF] Add 'btf_type_tag:v2' annotation in DWARF.Feb 15 2023, 4:07 PM
eddyz87 edited the summary of this revision. (Show Details)

When I tried to build llvm with the patch, I hit the following errors:

[2198/3751] Building CXX object lib/Target/BPF/CMakeFiles/LLVMBPFCodeGen.dir/BTFDebug.cpp.o
FAILED: lib/Target/BPF/CMakeFiles/LLVMBPFCodeGen.dir/BTFDebug.cpp.o 
/bin/c++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D_LIBCPP_ENABLE_ASSERTIONS -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/Target/BPF -I../lib/Target/BPF -Iinclude -I../include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wno-maybe-uninitialized -Wno-class-memaccess -Wno-noexcept-type -Wdelete-non-virtual-dtor -Wno-comment -Wno-misleading-indentation -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -fvisibility=hidden  -fno-exceptions -fno-rtti -UNDEBUG -std=c++17 -MD -MT lib/Target/BPF/CMakeFiles/LLVMBPFCodeGen.dir/BTFDebug.cpp.o -MF lib/Target/BPF/CMakeFiles/LLVMBPFCodeGen.dir/BTFDebug.cpp.o.d -o lib/Target/BPF/CMakeFiles/LLVMBPFCodeGen.dir/BTFDebug.cpp.o -c ../lib/Target/BPF/BTFDebug.cpp
../lib/Target/BPF/BTFDebug.cpp: In function ‘llvm::DINodeArray lookupAnnotations(const llvm::DIType*)’:
../lib/Target/BPF/BTFDebug.cpp:516:21: error: ‘const class llvm::DIBasicType’ has no member named ‘getAnnotations’; did you mean ‘getEncoding’?
     Annots = SubTy->getAnnotations();
                     ^~~~~~~~~~~~~~
                     getEncoding
../lib/Target/BPF/BTFDebug.cpp:522:21: error: ‘const class llvm::DISubroutineType’ has no member named ‘getAnnotations’
     Annots = SubTy->getAnnotations();
                     ^~~~~~~~~~~~~~
[2272/3751] Building CXX object lib/Passes/CMakeFiles/LLVMPasses.dir/PassBuilderPipelines.cpp.o
In file included from ../lib/Passes/PassBuilderPipelines.cpp:113:
../include/llvm/Transforms/Scalar/SROA.h:95:7: warning: ‘llvm::SROAPass’ declared with greater visibility than the type of its field ‘llvm::SROAPass::SelectsToRewrite’ [-Wattributes]
 class SROAPass : public PassInfoMixin<SROAPass> {
       ^~~~~~~~
[2279/3751] Building CXX object lib/Passes/CMakeFiles/LLVMPasses.dir/PassBuilder.cpp.o
In file included from ../lib/Passes/PassBuilder.cpp:214:
../include/llvm/Transforms/Scalar/SROA.h:95:7: warning: ‘llvm::SROAPass’ declared with greater visibility than the type of its field ‘llvm::SROAPass::SelectsToRewrite’ [-Wattributes]
 class SROAPass : public PassInfoMixin<SROAPass> {
       ^~~~~~~~
ninja: build stopped: subcommand failed.
[yhs@devbig309.ftw3 ~/work/llvm-project/llvm/build (main)]$

The top commit is

959216f9b1f1  [AMDGPU] MIR-Tests for Multiplication using KBA

Could you take a look?

My cmake command line:

cmake .. -DCMAKE_BUILD_TYPE=Release -G Ninja \
    -DLLVM_ENABLE_PROJECTS="clang;lld" \
    -DLLVM_TARGETS_TO_BUILD="BPF;X86" \
    -DLLVM_ENABLE_ASSERTIONS=ON \
    -DLLVM_ENABLE_ZLIB=ON \
    -DCMAKE_INSTALL_PREFIX=$PWD/install

../lib/Target/BPF/BTFDebug.cpp: In function ‘llvm::DINodeArray lookupAnnotations(const llvm::DIType*)’:
../lib/Target/BPF/BTFDebug.cpp:516:21: error: ‘const class llvm::DIBasicType’ has no member named ‘getAnnotations’; did you mean ‘getEncoding’?

Annots = SubTy->getAnnotations();
                ^~~~~~~~~~~~~~
                getEncoding

../lib/Target/BPF/BTFDebug.cpp:522:21: error: ‘const class llvm::DISubroutineType’ has no member named ‘getAnnotations’

Annots = SubTy->getAnnotations();
                ^~~~~~~~~~~~~~

Hi Yonghong,

Looking at the error it seems that you applied only the changes from D143967, however it depends on D143966. I organized these two as a stack to simplify the review:

  • D143966 adds annotations attribute for DIBasicType and DISubroutineType;
  • D143967 adds logic for btf_type_tag:v2 generation.

Anyways, I've rebased the revisions against current main (dad3741e3bdc [PowerPC] clang-format isPPC64ELFv2ABI() function. NFC) and see one failing test for ninja check-all target:

Failed Tests (1):
  Clang :: CodeGen/dwarf-version.c

However, it fails without my changes as well.
I'll push the rebased revisions shortly.

eddyz87 updated this revision to Diff 499281.Feb 21 2023, 1:34 PM
eddyz87 edited the summary of this revision. (Show Details)

Rebase

eddyz87 updated this revision to Diff 502033.Mar 2 2023, 5:59 PM

Rebase, change for tag name from btf_type_tag:v2 to btf:type_tag.

eddyz87 edited the summary of this revision. (Show Details)Mar 2 2023, 6:02 PM
eddyz87 retitled this revision from [DebugInfo][BPF] Add 'btf_type_tag:v2' annotation in DWARF to [DebugInfo][BPF] Add 'btf:type_tag' annotation in DWARF.
eddyz87 updated this revision to Diff 502321.Mar 3 2023, 5:17 PM

Another rebase, hope to fix windows build issue.

eddyz87 updated this revision to Diff 503198.Mar 7 2023, 5:52 PM

Fix support for cyclic types

Update to handle situations like below:

struct bravo {
  struct bravo __tag(a) * a;
} g;

During DI construction annotations are now represneted by
"annotation placeholders", special kind of derived type.
These placeholders are replaced by actual types with
proper "annotations" field in CGDebugInfo::finalize().

eddyz87 updated this revision to Diff 504493.Mar 12 2023, 6:45 PM
  • BTFDebug changes moved to a separate revision (to simplify the review);
  • Added test case for circular type reference handling.
eddyz87 published this revision for review.Mar 13 2023, 3:53 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptMar 13 2023, 3:53 PM
eddyz87 edited the summary of this revision. (Show Details)Mar 15 2023, 8:42 AM

Hi @jemarch,

Could you please take a look to verify that this implementation is on the same page with what is planned for GCC?

eddyz87 added a subscriber: jemarch.
eddyz87 added a comment.

Hi @jemarch,

Could you please take a look to verify that this implementation is on
the same page with what is planned for GCC?

Sure. Can you please add David Faust as a subscriber as well? I don't
know if he has an account in reviews.llvm.org.

Repository:

rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION

https://reviews.llvm.org/D143967/new/

https://reviews.llvm.org/D143967

dfaust added a subscriber: dfaust.Mar 15 2023, 9:14 AM

Hi @dfaust ,

Two head-ups, the following items turned out to be important when I tested using kernel BPF testsuite:

  • in the final BTF, type tags have to precede CVR modifiers, e.g. TYPE_TAG 'foo' -> CONST -> INT. Right now pahole does not do any reordering, so I ended up putting the type tag annotations on the DIE with outermost modifier. Will see if DI maintainers would be ok with this.
  • CO-RE relocation entries have to point to the actual type, not type tag. For both field relocations and variable relocations (this is related to the next patch in a stack).

Type tag for CVR modifier type

C:

volatile int __attribute__((btf_type_tag("__b"))) b;

DWARF:

0x31:   DW_TAG_variable
          DW_AT_name      ("b")
          DW_AT_type      (0x3c "volatile int")

0x3c:   DW_TAG_volatile_type
          DW_AT_type      (0x45 "int")

0x41:     DW_TAG_LLVM_annotation
            DW_AT_name    ("btf:type_tag")
            DW_AT_const_value     ("__b")

I am not sure about this case. Do we want the tag on the CVR qualifier or on the type being qualified?

Current patches for GCC place the annotation DIE as child of the 'int':

0x0000001e:   DW_TAG_variable
                DW_AT_name	("b")
                DW_AT_type	(0x00000047 "volatile int")

0x00000032:   DW_TAG_base_type
                DW_AT_byte_size	(0x04)
                DW_AT_name	("int")

0x0000003d:     DW_TAG_LLVM_annotation
                  DW_AT_name	("btf:type_tag")
                  DW_AT_const_value	("__b")

0x00000047:   DW_TAG_volatile_type
                DW_AT_type	(0x00000032 "int")

The way I see it both 'volatile' and the type tag are modifying 'int' type here, so the annotation DIE more properly fits as a child of 'int' rather than the 'volatile'.

I don't think we discussed this particular case, and I'm not sure whether there is any precedent here.

WDYT @eddyz87, @jemarch ?

in the final BTF, type tags have to precede CVR modifiers, e.g. TYPE_TAG 'foo' -> CONST -> INT. Right now pahole does not do any reordering, so I ended up putting the type tag annotations on the DIE with outermost modifier. Will see if DI maintainers would be ok with this.

Ok I was not aware of that requirement.
Internally GCC converts DWARF representation to BTF for BTF emission so we get (for the volatile example):

[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
[2] TYPE_TAG '__b' type_id=1
[3] VOLATILE '(anon)' type_id=2
[4] VAR 'b' type_id=3, linkage=global

i.e. VOLATILE -> TYPE_TAG -> INT which doesn't meet the above requirement that type tags precede CVR modifiers.

The way I see it both 'volatile' and the type tag are modifying 'int' type here, so the annotation DIE more properly fits as a child of 'int' rather than the 'volatile'.

I don't think we discussed this particular case, and I'm not sure whether there is any precedent here.

WDYT @eddyz87, @jemarch ?

This is related to my previous comment. Kernel BTF parser wants all type tags to precede modifiers (see this code). So such reordering will have to happen either during DWARF generation or by pahole. Currently pahole is "dumb" in a sense that it reflects DWARF structure in BTF effectively one-to-one. Adding such rewrite at a pahole level is possible but one-to-one mapping to DWARF would be lost, so the tool would require significant changes.

So, at least for LLVM its a tradeoff, either do it in the compiler (less code), or in the pahole (more code). I opted for less code :) If you think that this is conceptually unacceptable, I'll do the change on the pahole side (DI maintainers on LLVM side might agree, the patch has not been reviewed yet).

Another option would be to modify the kernel, but this might impede backwards compatibility.

eddyz87 added a comment.

In D143967#4197041 https://reviews.llvm.org/D143967#4197041, @dfaust wrote:

The way I see it both 'volatile' and the type tag are modifying
'int' type here, so the annotation DIE more properly fits as a child
of 'int' rather than the 'volatile'.

I don't think we discussed this particular case, and I'm not sure whether there is any precedent here.

WDYT @eddyz87, @jemarch ?

This is related to my previous comment. Kernel BTF parser wants all
type tags to precede modifiers (see this
https://elixir.bootlin.com/linux/latest/source/kernel/bpf/btf.c#L5348
code). So such reordering will have to happen either during DWARF
generation or by pahole. Currently pahole is "dumb" in a sense
that it reflects DWARF structure in BTF effectively one-to-one. Adding
such rewrite at a pahole level is possible but one-to-one mapping to
DWARF would be lost, so the tool would require significant changes.

So, at least for LLVM its a tradeoff, either do it in the compiler
(less code), or in the pahole (more code). I opted for less code :) If
you think that this is conceptually unacceptable, I'll do the change
on the pahole side (DI maintainers on LLVM side might agree, the patch
has not been reviewed yet).

Another option would be to modify the kernel, but this might impede backwards compatibility.

In GCC we also translate from the internal DWARF structures into BTF.
So, for us, it would also imply to reoder (before generating the BTF
from the interal DWARF) in case we wanted to use a different approach in
DWARF than in BTF.

I think the question is: what are the consequences of having the
annotation DIE as a child of the qualifiers instead of the qualified
type?

Repository:

rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION

https://reviews.llvm.org/D143967/new/

https://reviews.llvm.org/D143967

! In D143967#4197142, @jemarch wrote:
...
In GCC we also translate from the internal DWARF structures into BTF.
So, for us, it would also imply to reoder (before generating the BTF
from the interal DWARF) in case we wanted to use a different approach in
DWARF than in BTF.

It's not only the BTF generated for BPF programs. There are two ways
to obtain BTF that we should worry about:

  1. BTF generated for BPF programs by compiler;
  2. BTF generated during kernel build for the whole kernel, this one is generated by pahole from DWARF.

I think the question is: what are the consequences of having the
annotation DIE as a child of the qualifiers instead of the qualified
type?

Are there any? I just tried the following code with GDB:

volatile int __attribute__((btf_type_tag("__a"))) a = 1;

void foo(void) {
  a = 2;
}

int main(int argc, char **argv) {
  foo();
  return 0;
}

No warnings, a can be inspected.

I think there are two sides to this:

  • conceptual: is it ok to allow annotations for CVR modifiers?
  • technical: what is the point where such reordering is easiest to implement? For LLVM / pahole stack the path of least resistance is to modify DWARF generation (but again @dblaikie might disagree). How hard is it to adjust DI generation in GCC in this way?

...
I think there are two sides to this:

  • conceptual: is it ok to allow annotations for CVR modifiers?

Maybe someone more an expert in DWARF could chime in whether this is problematic?
As far as I know it would be "ok" in the sense that it should not break DWARF or cause issues for DWARF consumers which do not know about the tags.

My initial reaction was that placing the annotations on CVR modifiers makes less sense conceptually than placing them on the types proper.
But, I suppose it is a question of what precisely the annotated type is. e.g. in volatile int __attribute__((btf_type_tag("__b"))) b; is it the case that:

  • type tag "__b" applies to the type volatile int (i.e. cvr quals "bind more closely" than the tag), or
  • type tag "__b" and volatile both apply to int (i.e. cvr quals and type tags are "equal")

For all other cases the new "btf:type_tag" format places the annotation as a child of exactly the type that is annotated, so I think the answer determines where the annotation logically belongs in DWARF:

  • If the tag applies to volatile int then it should be a child of the volatile DIE.
  • If the tag applies to int it should be a child of the integer type DIE.

At the moment I can't say that one is more correct than the other, so I guess I have no real objection placing the annotation on the CVR modifier.

  • technical: what is the point where such reordering is easiest to implement? For LLVM / pahole stack the path of least resistance is to modify DWARF generation (but again @dblaikie might disagree). How hard is it to adjust DI generation in GCC in this way?

It is not a difficult change in GCC. Some added complexity but not too much. I have a prototype that seems to work from initial testing.
It is probably simpler than instead modifying the internal transformation to BTF to reorder tags/CVR qualifiers as kernel currently requires.

But I don't think ease of implementation should be a factor unless we are sure the format makes sense.
We are changing the format because the old one was flawed, let's try to make sure we aren't just replacing it with something flawed in a different way because it is easy.

...
I think there are two sides to this:

  • conceptual: is it ok to allow annotations for CVR modifiers?

Maybe someone more an expert in DWARF could chime in whether this is problematic?
As far as I know it would be "ok" in the sense that it should not break DWARF or cause issues for DWARF consumers which do not know about the tags.

My initial reaction was that placing the annotations on CVR modifiers makes less sense conceptually than placing them on the types proper.

I agree conceptually.

But, I suppose it is a question of what precisely the annotated type is. e.g. in volatile int __attribute__((btf_type_tag("__b"))) b; is it the case that:

  • type tag "__b" applies to the type volatile int (i.e. cvr quals "bind more closely" than the tag), or
  • type tag "__b" and volatile both apply to int (i.e. cvr quals and type tags are "equal")

For all other cases the new "btf:type_tag" format places the annotation as a child of exactly the type that is annotated, so I think the answer determines where the annotation logically belongs in DWARF:

  • If the tag applies to volatile int then it should be a child of the volatile DIE.
  • If the tag applies to int it should be a child of the integer type DIE.

I'm aware of three use-cases for "btf_type_tag": "percpu", "user", "rcu" marks in kernel code. First is used for per-cpu variables, second is used to mark pointers to userspace memory, third to mark the data that should be read with RCU lock. For these use-cases there is no difference between e.g. const __user *ptr and __user const *ptr. I don't think the difference was ever thought through.

At the moment I can't say that one is more correct than the other, so I guess I have no real objection placing the annotation on the CVR modifier.

  • technical: what is the point where such reordering is easiest to implement? For LLVM / pahole stack the path of least resistance is to modify DWARF generation (but again @dblaikie might disagree). How hard is it to adjust DI generation in GCC in this way?

It is not a difficult change in GCC. Some added complexity but not too much. I have a prototype that seems to work from initial testing.

Same here, it is a small change in clang code, but I don't like that it has to be handled as a special case.

It is probably simpler than instead modifying the internal transformation to BTF to reorder tags/CVR qualifiers as kernel currently requires.

But I don't think ease of implementation should be a factor unless we are sure the format makes sense.
We are changing the format because the old one was flawed, let's try to make sure we aren't just replacing it with something flawed in a different way because it is easy.

The main issue is that kernel currently expects type tags before modifiers. Even if the kernel code would be modified there is still an issue of backwards compatibility. During kernel build BTF is generated from DWARF and is embedded in a kernel image, it is done by tool named pahole. If we forgo modifiers reordering the embedded BTF would not match format supported by kernel.

So, I assume that reordering _has_ to happen somewhere. Either at DWARF generation stage, or in two places:

  • compiler BPF backend, when BPF programs are compiled;
  • pahole, when BTF is generated from DWARF.

Let me prototype pahole change to see how much of hassle that is, it will take a day or two.

eddyz87 added a comment.

In D143967#4200331 https://reviews.llvm.org/D143967#4200331, @dfaust wrote:

In D143967#4197288 https://reviews.llvm.org/D143967#4197288, @eddyz87 wrote:

...
I think there are two sides to this:

  • conceptual: is it ok to allow annotations for CVR modifiers?

Maybe someone more an expert in DWARF could chime in whether this is problematic?
As far as I know it would be "ok" in the sense that it should not
break DWARF or cause issues for DWARF consumers which do not know
about the tags.

My initial reaction was that placing the annotations on CVR
modifiers makes less sense conceptually than placing them on the
types proper.

I agree conceptually.

That was my initial reaction too. But see below.

But, I suppose it is a question of what precisely the annotated type
is. e.g. in volatile int __attribute__((btf_type_tag("__b"))) b;
is it the case that:

  • type tag "__b" applies to the type volatile int (i.e. cvr quals "bind more closely" than the tag), or
  • type tag "__b" and volatile both apply to int (i.e. cvr quals and type tags are "equal")

For all other cases the new "btf:type_tag" format places the
annotation as a child of exactly the type that is annotated, so I
think the answer determines where the annotation logically belongs
in DWARF:

  • If the tag applies to volatile int then it should be a child of the volatile DIE.
  • If the tag applies to int it should be a child of the integer type DIE.

I think that is the main point. DWARF composes types by chaining DIEs
via DW_AT_type attributes, like in:

const -> volatile -> pointer-to -> int

Conceptually, there are four different types in that chain, not just
one:

  1. const volatile pointer-to int
  2. volatile pointer-to int
  3. pointer-to int
  4. int

Ideally speaking, we would have liked to implement the annotations just
like the qualifiers, so we could have something like:

const -> volatile -> annotated -> pointer-to -> int

That is what would suite us best conceptually speaking. But since that
would break DWARF consumers, we decided to go with a parent-child
relationship instead, having:

const -> volatile -> pointer-to -> int
                         |
                     annotated

Which would be a different situation than, say:

const -> volatile -> pointer-to -> int
  |

annotated

So, I think it only makes sense to have the child annotation node in the
node that starts the type to which the annotation refers. Both
qualifiers and type DIEs are types on that sense.

And now that I think about this. In this conceptual situation:

const -> volatile -> annotated (foo) -> annotated (bar) -> pointer-to -> int

I guess we are encoding it with several DW_TAG_LLVM_annotation children:

const -> volatile -> pointer-to -> int
                         |
               +---------+---------+
               |                   |
          anotated (foo)    annotated (bar)

Basically accumulating/reducing what are conceptually two different
types into one. Since typedefs are explicitly encoded in the chain as
their own node, this situation will only happen when there are several
tags specified consecutively in the source code (this is another reason
why putting the annotation DIEs as children of the qualifiers is
necessary, because otherwise we would be accumulating/reducing
annotation across these nodes and we could end with situations where
annotations get lost.)

When accumulating/reducign tags like above:

Does the ordering matter here? Does it matter? Should we require
keeping the order among the annotation siblings as part of the
DW_TAG_LLVM_annotation specification? Even if we don't, we probably
want to behave the same in both compilers.

In case two identical annotations are applied to the same type, are we
deduplicating them? Should we actually require that? Even if we don't,
we probably want to behave the same in both compilers.

(Note that since DW_TAG_typedef is also a type in the type chain, we
shouldn't worry about

I'm aware of three use-cases for "btf_type_tag": "percpu", "user",
"rcu" marks in kernel code. First is used for per-cpu variables,
second is used to mark pointers to userspace memory, third to mark the
data that should be read with RCU lock. For these use-cases there is
no difference between e.g. const __user *ptr and `__user const
*ptr`. I don't think the difference was ever thought through.

At the moment I can't say that one is more correct than the other,
so I guess I have no real objection placing the annotation on the
CVR modifier.

  • technical: what is the point where such reordering is easiest to

implement? For LLVM / pahole stack the path of least resistance is
to modify DWARF generation (but again @dblaikie might
disagree). How hard is it to adjust DI generation in GCC in this
way?

It is not a difficult change in GCC. Some added complexity but not
too much. I have a prototype that seems to work from initial
testing.

Same here, it is a small change in clang code, but I don't like that it has to be handled as a special case.

It is probably simpler than instead modifying the internal
transformation to BTF to reorder tags/CVR qualifiers as kernel
currently requires.

But I don't think ease of implementation should be a factor unless we are sure the format makes sense.
We are changing the format because the old one was flawed, let's try
to make sure we aren't just replacing it with something flawed in a
different way because it is easy.

The main issue is that kernel currently expects type tags before
modifiers. Even if the kernel code would be modified there is still an
issue of backwards compatibility. During kernel build BTF is generated
from DWARF and is embedded in a kernel image, it is done by tool named
pahole https://github.com/acmel/dwarves. If we forgo modifiers
reordering the embedded BTF would not match format supported by
kernel.

So, I assume that reordering _has_ to happen somewhere. Either at DWARF generation stage, or in two places:

  • compiler BPF backend, when BPF programs are compiled;
  • pahole, when BTF is generated from DWARF.

Let me prototype pahole change to see how much of hassle that is, it will take a day or two.

Repository:

rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION

https://reviews.llvm.org/D143967/new/

https://reviews.llvm.org/D143967

eddyz87 added a comment.

In D143967#4200331 https://reviews.llvm.org/D143967#4200331, @dfaust wrote:

In D143967#4197288 https://reviews.llvm.org/D143967#4197288, @eddyz87 wrote:

...
I think there are two sides to this:

  • conceptual: is it ok to allow annotations for CVR modifiers?

Maybe someone more an expert in DWARF could chime in whether this is problematic?
As far as I know it would be "ok" in the sense that it should not
break DWARF or cause issues for DWARF consumers which do not know
about the tags.

My initial reaction was that placing the annotations on CVR
modifiers makes less sense conceptually than placing them on the
types proper.

I agree conceptually.

That was my initial reaction too. But see below.

But, I suppose it is a question of what precisely the annotated type
is. e.g. in volatile int __attribute__((btf_type_tag("__b"))) b;
is it the case that:

  • type tag "__b" applies to the type volatile int (i.e. cvr quals "bind more closely" than the tag), or
  • type tag "__b" and volatile both apply to int (i.e. cvr quals and type tags are "equal")

For all other cases the new "btf:type_tag" format places the
annotation as a child of exactly the type that is annotated, so I
think the answer determines where the annotation logically belongs
in DWARF:

  • If the tag applies to volatile int then it should be a child of the volatile DIE.
  • If the tag applies to int it should be a child of the integer type DIE.

I think that is the main point. DWARF composes types by chaining DIEs
via DW_AT_type attributes, like in:

const -> volatile -> pointer-to -> int

Conceptually, there are four different types in that chain, not just
one:

  1. const volatile pointer-to int
  2. volatile pointer-to int
  3. pointer-to int
  4. int

Ideally speaking, we would have liked to implement the annotations just
like the qualifiers, so we could have something like:

const -> volatile -> annotated -> pointer-to -> int

That is what would suite us best conceptually speaking. But since that
would break DWARF consumers, we decided to go with a parent-child
relationship instead, having:

const -> volatile -> pointer-to -> int
                         |
                     annotated

Which would be a different situation than, say:

const -> volatile -> pointer-to -> int
  |
annotated

So, I think it only makes sense to have the child annotation node in the
node that starts the type to which the annotation refers. Both
qualifiers and type DIEs are types on that sense.

Ok, I understand your point.
For example if we have:

volatile int __tag1 b;

The tag applies to the type (volatile int) and we have 3 types total:

  1. __tag1 volatile int
  2. volatile int
  3. int

And in DWARF the tag must be child of the qualifier:

var(b) -> volatile -> int
             |
            __tag1

Doing it the other way would be incorrect - if we instead did this:

var(b) -> volatile -> int
                        |
                     __tag1

We encode a different chain of types:

var(b) -> volatile -> __tag1 -> int

Which reflects a different set of types than the ones we actually have:

  1. volatile __tag1 int
  2. __tag1 int
  3. int

So I guess it is clear, conceptually the tags do belong on the qualifiers.

And now that I think about this. In this conceptual situation:

const -> volatile -> annotated (foo) -> annotated (bar) -> pointer-to -> int

I guess we are encoding it with several DW_TAG_LLVM_annotation children:

const -> volatile -> pointer-to -> int
                         |
               +---------+---------+
               |                   |
          anotated (foo)    annotated (bar)

Basically accumulating/reducing what are conceptually two different
types into one. Since typedefs are explicitly encoded in the chain as
their own node, this situation will only happen when there are several
tags specified consecutively in the source code (this is another reason
why putting the annotation DIEs as children of the qualifiers is
necessary, because otherwise we would be accumulating/reducing
annotation across these nodes and we could end with situations where
annotations get lost.)

When accumulating/reducign tags like above:

Does the ordering matter here? Does it matter? Should we require
keeping the order among the annotation siblings as part of the
DW_TAG_LLVM_annotation specification? Even if we don't, we probably
want to behave the same in both compilers.

A quick check suggests order does not matter for DWARF for normal qualifiers:

const volatile int x;
volatile const int y;

Both compilers emit something like this ('x' and 'y' share type):

0x0000001e:   DW_TAG_variable
                DW_AT_name	("x")
                DW_AT_type	(0x00000029 "const volatile int")

0x00000029:   DW_TAG_const_type
                DW_AT_type	(0x0000002e "volatile int")

0x0000002e:   DW_TAG_volatile_type
                DW_AT_type	(0x00000033 "int")

0x00000033:   DW_TAG_base_type
                DW_AT_name	("int")

0x00000037:   DW_TAG_variable
                DW_AT_name	("y")
                DW_AT_type	(0x00000029 "const volatile int")

Interestingly, GCC and LLVM do not behave exactly the same in that in
GCC the DWARF types for both variables are

x, y   -> volatile -> const -> int

while in LLVM (shown above) it is

x, y -> const -> volatile -> int

So I'm not sure we necessarily need both compilers to be exactly the same.

In case two identical annotations are applied to the same type, are we
deduplicating them? Should we actually require that? Even if we don't,
we probably want to behave the same in both compilers.

FWIW, if you write something like:

int __attribute__((btf_type_tag("__c"))) __attribute__((btf_type_tag("__c"))) c;

GCC de-duplicates attributes with the same value, so the tag "__c" will only appear
once. I don't know if this would be easy to change if we wanted to.

(Note that since DW_TAG_typedef is also a type in the type chain, we
shouldn't worry about

Looks like this got cutoff.

Ok, I understand your point.
For example if we have:

volatile int __tag1 b;

The tag applies to the type (volatile int) and we have 3 types total:

  1. __tag1 volatile int
  2. volatile int
  3. int

And in DWARF the tag must be child of the qualifier:

var(b) -> volatile -> int
             |
            __tag1

Doing it the other way would be incorrect - if we instead did this:

var(b) -> volatile -> int
                        |
                     __tag1

We encode a different chain of types:

var(b) -> volatile -> __tag1 -> int

Which reflects a different set of types than the ones we actually have:

  1. volatile __tag1 int
  2. __tag1 int
  3. int

So I guess it is clear, conceptually the tags do belong on the qualifiers.

But that's just an encoding quirk. For the C language "const volatile int *" and "volatile const int *" have the same meaning. In clang AST such qualifiers are represented as bitfields.

If we consider type tag a qualifier, conceptually it would be simpler if ordering would not matter for it as well, so that the following have identical meaning:

  • __tag volatile int
  • volatile __tag int

And now that I think about this. In this conceptual situation:

const -> volatile -> annotated (foo) -> annotated (bar) -> pointer-to -> int

I guess we are encoding it with several DW_TAG_LLVM_annotation children:

const -> volatile -> pointer-to -> int
                         |
               +---------+---------+
               |                   |
          anotated (foo)    annotated (bar)

Basically accumulating/reducing what are conceptually two different
types into one. Since typedefs are explicitly encoded in the chain as
their own node, this situation will only happen when there are several
tags specified consecutively in the source code (this is another reason
why putting the annotation DIEs as children of the qualifiers is
necessary, because otherwise we would be accumulating/reducing
annotation across these nodes and we could end with situations where
annotations get lost.)

When accumulating/reducign tags like above:

Does the ordering matter here? Does it matter? Should we require
keeping the order among the annotation siblings as part of the
DW_TAG_LLVM_annotation specification? Even if we don't, we probably
want to behave the same in both compilers.

A quick check suggests order does not matter for DWARF for normal qualifiers:

const volatile int x;
volatile const int y;

Both compilers emit something like this ('x' and 'y' share type):

0x0000001e:   DW_TAG_variable
                DW_AT_name	("x")
                DW_AT_type	(0x00000029 "const volatile int")

0x00000029:   DW_TAG_const_type
                DW_AT_type	(0x0000002e "volatile int")

0x0000002e:   DW_TAG_volatile_type
                DW_AT_type	(0x00000033 "int")

0x00000033:   DW_TAG_base_type
                DW_AT_name	("int")

0x00000037:   DW_TAG_variable
                DW_AT_name	("y")
                DW_AT_type	(0x00000029 "const volatile int")

Interestingly, GCC and LLVM do not behave exactly the same in that in
GCC the DWARF types for both variables are

x, y   -> volatile -> const -> int

while in LLVM (shown above) it is

x, y -> const -> volatile -> int

So I'm not sure we necessarily need both compilers to be exactly the same.

Unfortunately, for ordering we are also limited by a consumer. In the kernel case the ordering of const/volatile/restrict does not matter, but it does want type tags to come first.

In case two identical annotations are applied to the same type, are we
deduplicating them? Should we actually require that? Even if we don't,
we probably want to behave the same in both compilers.

FWIW, if you write something like:

int __attribute__((btf_type_tag("__c"))) __attribute__((btf_type_tag("__c"))) c;

GCC de-duplicates attributes with the same value, so the tag "__c" will only appear
once. I don't know if this would be easy to change if we wanted to.

LLVM de-duplicates as well.

AFAIK current type tag use cases do not require to distinguish "const tag" from "tag const" and do not need to handle "tag tag" separately from "__tag". So, I think we should stick with simpler semantics.

eddyz87 added a comment.

In D143967#4231517 https://reviews.llvm.org/D143967#4231517, @dfaust wrote:

In D143967#4220233 https://reviews.llvm.org/D143967#4220233, @jemarch wrote:

Ok, I understand your point.
For example if we have:

volatile int __tag1 b;

The tag applies to the type (volatile int) and we have 3 types total:

  1. __tag1 volatile int
  2. volatile int
  3. int

And in DWARF the tag must be child of the qualifier:

var(b) -> volatile -> int
             |
            __tag1

Doing it the other way would be incorrect - if we instead did this:

var(b) -> volatile -> int
                        |
                     __tag1

We encode a different chain of types:

var(b) -> volatile -> __tag1 -> int

Which reflects a different set of types than the ones we actually have:

  1. volatile __tag1 int
  2. __tag1 int
  3. int

So I guess it is clear, conceptually the tags do belong on the qualifiers.

But that's just an encoding quirk. For the C language "const volatile
int *" and "volatile const int *" have the same meaning. In clang AST
such qualifiers are represented as bitfields.

If we consider type tag a qualifier, conceptually it would be simpler
if ordering would not matter for it as well, so that the following
have identical meaning:

  • __tag volatile int
  • volatile __tag int

Makes sense.

But then I think we should allow the tags to be anywhere in the chain in
the DWARF, and not necessarily in the qualified type. For example
given:

typedef const int bar;
volatile bar __tag1 foo;

The resulting DWARF type chain is:

volatile -> typedef (bar) -> const -> int

IMO we should allow __tag1 to be a child of any of the elements of the
chain, like:

volatile -> typedef (bar) -> const -> int
                 |
               __tag1

or

volatile -> typedef (bar) -> const -> int
                                       |
                                    __tag1

or

volatile -> typedef (bar) -> const -> int
                               |
                            __tag1

or

volatile -> typedef (bar) -> const -> int
   |
 __tag1

would be all accepted and equivalent. Also Faust noted that GCC
sometimes optimizes out the typedef nodes, so we need to be able to
place the tags (in the internal representatinos) in the typedef'ed type
instead.

And now that I think about this. In this conceptual situation:

const -> volatile -> annotated (foo) -> annotated (bar) -> pointer-to -> int

I guess we are encoding it with several DW_TAG_LLVM_annotation children:

const -> volatile -> pointer-to -> int
                         |
               +---------+---------+
               |                   |
          anotated (foo)    annotated (bar)

Basically accumulating/reducing what are conceptually two different
types into one. Since typedefs are explicitly encoded in the chain as
their own node, this situation will only happen when there are several
tags specified consecutively in the source code (this is another reason
why putting the annotation DIEs as children of the qualifiers is
necessary, because otherwise we would be accumulating/reducing
annotation across these nodes and we could end with situations where
annotations get lost.)

When accumulating/reducign tags like above:

Does the ordering matter here? Does it matter? Should we require
keeping the order among the annotation siblings as part of the
DW_TAG_LLVM_annotation specification? Even if we don't, we probably
want to behave the same in both compilers.

A quick check suggests order does not matter for DWARF for normal qualifiers:

const volatile int x;
volatile const int y;

Both compilers emit something like this ('x' and 'y' share type):

0x0000001e:   DW_TAG_variable
                DW_AT_name	("x")
                DW_AT_type	(0x00000029 "const volatile int")

0x00000029:   DW_TAG_const_type
                DW_AT_type	(0x0000002e "volatile int")

0x0000002e:   DW_TAG_volatile_type
                DW_AT_type	(0x00000033 "int")

0x00000033:   DW_TAG_base_type
                DW_AT_name	("int")

0x00000037:   DW_TAG_variable
                DW_AT_name	("y")
                DW_AT_type	(0x00000029 "const volatile int")

Interestingly, GCC and LLVM do not behave exactly the same in that in
GCC the DWARF types for both variables are

x, y   -> volatile -> const -> int

while in LLVM (shown above) it is

x, y -> const -> volatile -> int

So I'm not sure we necessarily need both compilers to be exactly the same.

Unfortunately, for ordering we are also limited by a consumer. In the
kernel case the ordering of const/volatile/restrict does not matter,
but it does want type tags to come first.

In case two identical annotations are applied to the same type, are we
deduplicating them? Should we actually require that? Even if we don't,
we probably want to behave the same in both compilers.

FWIW, if you write something like:

int __attribute__((btf_type_tag("__c"))) __attribute__((btf_type_tag("__c"))) c;

GCC de-duplicates attributes with the same value, so the tag "__c" will only appear
once. I don't know if this would be easy to change if we wanted to.

LLVM de-duplicates as well.

AFAIK current type tag use cases do not require to distinguish "const
tag" from "tag const" and do not need to handle "tag tag"
separately from "__tag". So, I think we should stick with simpler
semantics.

Repository:

rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION

https://reviews.llvm.org/D143967/new/

https://reviews.llvm.org/D143967

eddyz87 added a comment.
...
If we consider type tag a qualifier, conceptually it would be simpler
if ordering would not matter for it as well, so that the following
have identical meaning:

  • __tag volatile int
  • volatile __tag int

Makes sense.

But then I think we should allow the tags to be anywhere in the chain in
the DWARF, and not necessarily in the qualified type. For example
given:

typedef const int bar;
volatile bar __tag1 foo;

The resulting DWARF type chain is:

volatile -> typedef (bar) -> const -> int

IMO we should allow __tag1 to be a child of any of the elements of the
chain, like:

volatile -> typedef (bar) -> const -> int
                 |
               __tag1

or

volatile -> typedef (bar) -> const -> int
                                       |
                                    __tag1

or

volatile -> typedef (bar) -> const -> int
                               |
                            __tag1

or

volatile -> typedef (bar) -> const -> int
   |
 __tag1

would be all accepted and equivalent. Also Faust noted that GCC
sometimes optimizes out the typedef nodes, so we need to be able to
place the tags (in the internal representatinos) in the typedef'ed type
instead.

Well, it's a logical consequence, I guess.
In practice I'm going to emit it like below:

volatile -> const -> int
                      |
                   __tag1

But please remember that in BTF it would have to be encoded like this:

__tag1 -> volatile -> const -> int

(that's how kernel expects it, we can change the kernel but will loose backwards compatibility).
For DWARF -> BTF transformation in pahole I will make the necessary modifications, but BTF might also be emitted C->BPF backend.

@yonghong-song, @anakryiko, what do you think?

eddyz87 added a subscriber: anakryiko.
eddyz87 added a comment.

In D143967#4231746 https://reviews.llvm.org/D143967#4231746, @jemarch wrote:

eddyz87 added a comment.
...
If we consider type tag a qualifier, conceptually it would be simpler
if ordering would not matter for it as well, so that the following
have identical meaning:

  • __tag volatile int
  • volatile __tag int

Makes sense.

But then I think we should allow the tags to be anywhere in the chain in
the DWARF, and not necessarily in the qualified type. For example
given:

typedef const int bar;
volatile bar __tag1 foo;

The resulting DWARF type chain is:

volatile -> typedef (bar) -> const -> int

IMO we should allow __tag1 to be a child of any of the elements of the
chain, like:

volatile -> typedef (bar) -> const -> int
                 |
               __tag1

or

volatile -> typedef (bar) -> const -> int
                                       |
                                    __tag1

or

volatile -> typedef (bar) -> const -> int
                               |
                            __tag1

or

volatile -> typedef (bar) -> const -> int
   |
 __tag1

would be all accepted and equivalent. Also Faust noted that GCC
sometimes optimizes out the typedef nodes, so we need to be able to
place the tags (in the internal representatinos) in the typedef'ed type
instead.

Well, it's a logical consequence, I guess.
In practice I'm going to emit it like below:

volatile -> const -> int
                      |
                   __tag1

We can probably do the same in GCC, to be consistent.

But please remember that in BTF it would have to be encoded like this:

__tag1 -> volatile -> const -> int

(that's how kernel expects it, we can change the kernel but will loose
backwards compatibility). For DWARF -> BTF transformation in pahole
I will make the necessary modifications, but BTF might also be emitted
C->BPF backend.

We can do the same thing when we generate BTF directly from GCC.
Faust, do you agree with the above?

@yonghong-song, @anakryiko, what do you think?

Repository:

rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION

https://reviews.llvm.org/D143967/new/

https://reviews.llvm.org/D143967

eddyz87 added a subscriber: anakryiko.
eddyz87 added a comment.

In D143967#4231746 https://reviews.llvm.org/D143967#4231746, @jemarch wrote:

eddyz87 added a comment.
...
If we consider type tag a qualifier, conceptually it would be simpler
if ordering would not matter for it as well, so that the following
have identical meaning:

  • __tag volatile int
  • volatile __tag int

Makes sense.

But then I think we should allow the tags to be anywhere in the chain in
the DWARF, and not necessarily in the qualified type. For example
given:

typedef const int bar;
volatile bar __tag1 foo;

The resulting DWARF type chain is:

volatile -> typedef (bar) -> const -> int

IMO we should allow __tag1 to be a child of any of the elements of the
chain, like:

volatile -> typedef (bar) -> const -> int
                 |
               __tag1

or

volatile -> typedef (bar) -> const -> int
                                       |
                                    __tag1

or

volatile -> typedef (bar) -> const -> int
                               |
                            __tag1

or

volatile -> typedef (bar) -> const -> int
   |
 __tag1

would be all accepted and equivalent. Also Faust noted that GCC
sometimes optimizes out the typedef nodes, so we need to be able to
place the tags (in the internal representatinos) in the typedef'ed type
instead.

Well, it's a logical consequence, I guess.
In practice I'm going to emit it like below:

volatile -> const -> int
                      |
                   __tag1

We can probably do the same in GCC, to be consistent.

But please remember that in BTF it would have to be encoded like this:

__tag1 -> volatile -> const -> int

(that's how kernel expects it, we can change the kernel but will loose
backwards compatibility). For DWARF -> BTF transformation in pahole
I will make the necessary modifications, but BTF might also be emitted
C->BPF backend.

We can do the same thing when we generate BTF directly from GCC.
Faust, do you agree with the above?

OK, that's fine with me.

eddyz87 added a subscriber: anakryiko.
eddyz87 added a comment.

In D143967#4231746 https://reviews.llvm.org/D143967#4231746, @jemarch wrote:

eddyz87 added a comment.
...
If we consider type tag a qualifier, conceptually it would be simpler
if ordering would not matter for it as well, so that the following
have identical meaning:

  • __tag volatile int
  • volatile __tag int

Makes sense.

But then I think we should allow the tags to be anywhere in the chain in
the DWARF, and not necessarily in the qualified type. For example
given:

typedef const int bar;
volatile bar __tag1 foo;

The resulting DWARF type chain is:

volatile -> typedef (bar) -> const -> int

IMO we should allow __tag1 to be a child of any of the elements of the
chain, like:

volatile -> typedef (bar) -> const -> int
                 |
               __tag1

or

volatile -> typedef (bar) -> const -> int
                                       |
                                    __tag1

or

volatile -> typedef (bar) -> const -> int
                               |
                            __tag1

or

volatile -> typedef (bar) -> const -> int
   |
 __tag1

would be all accepted and equivalent. Also Faust noted that GCC
sometimes optimizes out the typedef nodes, so we need to be able to
place the tags (in the internal representatinos) in the typedef'ed type
instead.

Well, it's a logical consequence, I guess.
In practice I'm going to emit it like below:

volatile -> const -> int
                      |
                   __tag1

But please remember that in BTF it would have to be encoded like this:

__tag1 -> volatile -> const -> int

(that's how kernel expects it, we can change the kernel but will loose
backwards compatibility).

Thinking about typedef, some C cases may be problematic if we adopt the
flexible rule we are discussing:

typedef int bar;
const bar __tag1 var1;
bar var2;

DWARF:

var1 -> const -> typedef (bar) -> int
         |           ^
        __tag1       |
                     |
var2 ----------------+

If we would allow __tag1 to be "moved" to either the typedef DWARF node
or the node for int like this:

DWARF:

var1 -> const -> typedef (bar) -> int
                     ^             |
                     |          __tag1
var2 ----------------+

Then the tag1 would also apply to var2's type. But I would say in the
C snippet above
tag1 should apply to the type of var1, but not
to the type of var2.

This makes me think we shouldn't allow tags to "jump" over typedef
nodes, in neither DWARF nor BTF.

PS: I am a bit concerned with the fact that the kernel's interpretation

of BTF is so rigit in this case as to assume C's type system
semantics when it comes to type qualifiers.  Other languages may be
coming to the BPF world (Rust, for example) and in these languages
the ordering of type qualifiers may actually matter.  There is a
reason why DWARF encodes qualifiers as explicit nodes in the type
chain rather than as attributes of the type nodes.

Thinking about typedef, some C cases may be problematic if we adopt the
flexible rule we are discussing:

typedef int bar;
const bar __tag1 var1;
bar var2;

DWARF:

var1 -> const -> typedef (bar) -> int
         |           ^
        __tag1       |
                     |
var2 ----------------+

If we would allow __tag1 to be "moved" to either the typedef DWARF node
or the node for int like this:

DWARF:

var1 -> const -> typedef (bar) -> int
                     ^             |
                     |          __tag1
var2 ----------------+

Then the tag1 would also apply to var2's type. But I would say in the
C snippet above
tag1 should apply to the type of var1, but not
to the type of var2.

I'm not sure I understand, the DWARF #2 is not a valid representation of the program for all the reasons you write.
Current LLVM implementation should generate DWARF #1 in this case.
If some tooling applies such tags movement it should also apply appropriate copying of tags, e.g. it should transform DWARF like this:

var1 -> const -> typedef (bar) -> int
                                   |
                                __tag1

var2 ----------> typedef (bar) -> int

(and it is what needs to be implemented in pahole to get BTF qualifiers ordering expected by kernel, but the move is in the opposite direction).

PS: I am a bit concerned with the fact that the kernel's interpretation

of BTF is so rigit in this case as to assume C's type system
semantics when it comes to type qualifiers.  Other languages may be
coming to the BPF world (Rust, for example) and in these languages
the ordering of type qualifiers may actually matter.  There is a
reason why DWARF encodes qualifiers as explicit nodes in the type
chain rather than as attributes of the type nodes.

Need to read a bit about Rust, can't comment right now.

eddyz87 added a comment.

In D143967#4232089 https://reviews.llvm.org/D143967#4232089, @jemarch wrote:

Thinking about typedef, some C cases may be problematic if we adopt the
flexible rule we are discussing:

typedef int bar;
const bar __tag1 var1;
bar var2;

DWARF:

var1 -> const -> typedef (bar) -> int
         |           ^
        __tag1       |
                     |
var2 ----------------+

If we would allow __tag1 to be "moved" to either the typedef DWARF node
or the node for int like this:

DWARF:

var1 -> const -> typedef (bar) -> int
                     ^             |
                     |          __tag1
var2 ----------------+

Then the tag1 would also apply to var2's type. But I would say in the
C snippet above
tag1 should apply to the type of var1, but not
to the type of var2.

I'm not sure I understand, the DWARF #2 is not a valid representation
of the program for all the reasons you write. Current LLVM
implementation should generate DWARF #1 in this case.

Yes that was my point: tags can cross qualifiers and still keep C
language semantics, but not (all) typedefs.

If some tooling applies such tags movement it should also apply
appropriate copying of tags, e.g. it should transform DWARF like this:

var1 -> const -> typedef (bar) -> int
                                   |
                                __tag1

var2 ----------> typedef (bar) -> int

(and it is what needs to be implemented in pahole to get BTF
qualifiers ordering expected by kernel, but the move is in the
opposite direction).

So the kernel expects tags to precede typedefs as well as qualifiers?
i.e. given this DWARF:

var1 -> const -> typedef (bar) -> int
                     ^             |
                     |           __tag1
                     |
var2 ----------------+

We have to transform to these two BTF type chains:

var1 -> __tag1 -> const -> typedef (bar) -> int
                                ^
                                |
var2 -> __tag1 -----------------+

Correct?

PS: I am a bit concerned with the fact that the kernel's interpretation

of BTF is so rigit in this case as to assume C's type system
semantics when it comes to type qualifiers.  Other languages may be
coming to the BPF world (Rust, for example) and in these languages
the ordering of type qualifiers may actually matter.  There is a
reason why DWARF encodes qualifiers as explicit nodes in the type
chain rather than as attributes of the type nodes.

Need to read a bit about Rust, can't comment right now.

Repository:

rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION

https://reviews.llvm.org/D143967/new/

https://reviews.llvm.org/D143967

tbaeder removed a subscriber: tbaeder.Mar 30 2023, 6:39 AM

If some tooling applies such tags movement it should also apply
appropriate copying of tags, e.g. it should transform DWARF like this:

var1 -> const -> typedef (bar) -> int
                                   |
                                __tag1

var2 ----------> typedef (bar) -> int

(and it is what needs to be implemented in pahole to get BTF
qualifiers ordering expected by kernel, but the move is in the
opposite direction).

So the kernel expects tags to precede typedefs as well as qualifiers?
i.e. given this DWARF:

var1 -> const -> typedef (bar) -> int
                     ^             |
                     |           __tag1
                     |
var2 ----------------+

We have to transform to these two BTF type chains:

var1 -> __tag1 -> const -> typedef (bar) -> int
                                ^
                                |
var2 -> __tag1 -----------------+

Correct?

This is controlled by the following code (btf.c:btf_check_type_tags()):

https://elixir.bootlin.com/linux/latest/source/kernel/bpf/btf.c#L5349

It uses btf_type_is_modifier() utility function, which treats typedef as a modifier.

So, in theory the transformation moving tags past typedef is necessary. On the other hand, such transformation is not applied now, and this does not cause issues. Which suggests that there are no cases in practice where type tag follows typedef (and thus, this is not required for backwards compatibility).

Moving type tags cross typedef feels wrong.

Moving type tags past typedefs would also make C code reconstruction from BTF incomplete. Such reconstruction is used now by e.g. bpftool to create a vmlinux.h header with all kernel type definitions. So, I think type tags should not be moved past typedefs.

Moving type tags past typedefs would also make C code reconstruction from BTF incomplete. Such reconstruction is used now by e.g. bpftool to create a vmlinux.h header with all kernel type definitions. So, I think type tags should not be moved past typedefs.

Yes I agree. Tags should not be moved past typedefs.
In the GCC implementation we are not moving tags past typedefs.

It uses btf_type_is_modifier() utility function, which treats typedef as a modifier.

So, in theory the transformation moving tags past typedef is necessary. On the other hand, such transformation is not applied now, and this does not cause issues. Which suggests that there are no cases in practice where type tag follows typedef (and thus, this is not required for backwards compatibility).

It seems to me then that the appropriate fix is to remove the expectation of this transformation from the verifier (for the reasons above).

If it is not important for backwards compatibility, then I guess it should not be problematic to change in the kernel? (I say, as not a kernel hacker...)

I find it odd that BTF_KIND_TYPEDEF is included in btf_type_is_modifier(), seems like the expectation of this transformation is almost an unintended side effect of that.

Related to btf_type_is_modifier() issue, Not that depending on call site, sometimes typedef is skipped and sometimes are not. So we could keep btf_type_is_modifier() as is and modify the call size to not skip typedef in specific places. Or we could remove typedef from the function and add a 'or' in call side if typedef needs to be skipped. The following are a few examples in kernel/bpf/btf.c:

/* Similar to btf_type_skip_modifiers() but does not skip typedefs. */
static const struct btf_type *btf_type_skip_qualifiers(const struct btf *btf,

u32 id)

{

const struct btf_type *t = btf_type_by_id(btf, id);

while (btf_type_is_modifier(t) &&
       BTF_INFO_KIND(t->info) != BTF_KIND_TYPEDEF) {
        t = btf_type_by_id(btf, t->type);
}      
       
return t;

}

So we can add necessary changes (in specific context) to make the btf_type_tag approach with this patch work without breaking backward compatibility.

I see the following in the Summary:

Type tag for CVR modifier type

C:

volatile int __attribute__((btf_type_tag("__b"))) b;

DWARF:

0x31:   DW_TAG_variable
          DW_AT_name      ("b")
          DW_AT_type      (0x3c "volatile int")

0x3c:   DW_TAG_volatile_type
          DW_AT_type      (0x45 "int")

0x41:     DW_TAG_LLVM_annotation
            DW_AT_name    ("btf:type_tag")
            DW_AT_const_value     ("__b")

Basically, the btf_type_tag is put under 'volatile' type in dwarf. Is this what gcc also agrees with?

eddyz87 planned changes to this revision.Tue, May 2, 9:31 AM

I see the following in the Summary:

Type tag for CVR modifier type

C:

volatile int __attribute__((btf_type_tag("__b"))) b;

DWARF:

0x31:   DW_TAG_variable
          DW_AT_name      ("b")
          DW_AT_type      (0x3c "volatile int")

0x3c:   DW_TAG_volatile_type
          DW_AT_type      (0x45 "int")

0x41:     DW_TAG_LLVM_annotation
            DW_AT_name    ("btf:type_tag")
            DW_AT_const_value     ("__b")

Basically, the btf_type_tag is put under 'volatile' type in dwarf. Is this what gcc also agrees with?

We decided against it but changes to force all type tags after CVR modifiers are not implemented, should probably add a wip tag.

eddyz87 updated this revision to Diff 522338.Mon, May 15, 2:11 PM
eddyz87 edited the summary of this revision. (Show Details)

Changes to avoid attaching type tags to DWARF derived types for const/volatile/restrict qualifiers.

@dblaikie could you help take a look at this patch? Similar to https://reviews.llvm.org/D143966, this patch is the clang frontend change to support new btf_type_tag format, as we discussed and agreed with gcc community (https://lore.kernel.org/bpf/87r0w9jjoq.fsf@oracle.com/). Please let us know if you have any questions. Thanks!

Changes to avoid attaching type tags to DWARF derived types for const/volatile/restrict qualifiers.

I just tested this locally and can confirm it LGTM in terms of implementing the DWARF format we've been discussing and agreed upon.
Also compared against my WIP patches implementing the same in GCC, and looks like it is all consistent (apart from one known bug in my patches)

Changes to avoid attaching type tags to DWARF derived types for const/volatile/restrict qualifiers.

I just tested this locally and can confirm it LGTM in terms of implementing the DWARF format we've been discussing and agreed upon.
Also compared against my WIP patches implementing the same in GCC, and looks like it is all consistent (apart from one known bug in my patches)

Heads-up, it currently fails on the following example:

#define __rcu __attribute__((btf_type_tag("rcu")))

struct cred;
const struct cred *cred;
void commit_creds()
{
  cred = (typeof(*cred) __rcu *)(0);
}

I'm working on the fix.
Also I missed the fact that D145891 (actual BTF generation) needs adjustment as well.

eddyz87 updated this revision to Diff 523233.Wed, May 17, 6:12 PM

Fix for case when CVR qualifier is behind type ignored by UnwrapTypeForDebugInfo().
E.g. for the following example:

const int *foo;
typeof(*foo) __attribute__((btf_type_tag("tag"))) buz;
eddyz87 edited the summary of this revision. (Show Details)Tue, May 23, 2:53 PM

Hi @dblaikie ,

Could you please take a look at this revision (and the previous one in the stack, D143966) or suggest some other reviewer familiar with this part of the code base?