This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo][BPF] Add 'btf:type_tag' annotation in DWARF
AcceptedPublic

Authored by eddyz87 on Feb 13 2023, 4:53 PM.

Details

Summary

Currently btf_type_tag attributes are represented in DWARF as child DIEs with DW_TAG_LLVM_annotation tag. Such attributes are supported for derived types of type DW_TAG_pointer_type and designate that the pointee type should have the attribute.

For example, the following C code:

// (variable :name "g"
//           :type (pointer (int :btf_type_tag "tag1")))
int __attribute__((btf_type_tag("tag1"))) *g;

Corresponds to the following DWARF:

0x1e:   DW_TAG_variable
          DW_AT_name      ("g")
          DW_AT_type      (0x29 "int *")

0x29:   DW_TAG_pointer_type
          DW_AT_type      (0x32 "int")

0x2e:     DW_TAG_LLVM_annotation
            DW_AT_name    ("btf_type_tag")
            DW_AT_const_value     ("tag1")

0x32:   DW_TAG_base_type
          DW_AT_name      ("int")

Recent discussion in Kernel BPF mailing list came to conclusion, that such annotations should apply to the annotated type itself (multiple approaches are discussed in the linked thread, "Solution 2" is the one accepted). For example, new DWARF encoding for the code above should look as follows:

0xe:   DW_TAG_variable
         DW_AT_name      ("g")
         DW_AT_type      (0x29 "int *")

0x9:   DW_TAG_pointer_type
         DW_AT_type      (0x32 "int")

0x2:   DW_TAG_base_type
         DW_AT_name      ("int")

0x6:     DW_TAG_LLVM_annotation
           DW_AT_name    ("btf:type_tag")
           DW_AT_const_value     ("tag1")

The goal of this commit is to enable new btf_type_tag encoding scheme for debug information emitted by clang. A new tag name, btf:type_tag, is used so that DWARF consumers could distinguish between old and new attachment semantics.

In order to preserve backwards compatibility both btf_type_tag and btf:type_tag are generated. btf_type_tag might be deprecated in the future. The complete DWARF for the example above looks as follows:

0x1e:   DW_TAG_variable
          DW_AT_name      ("g")
          DW_AT_type      (0x29 "int *")

0x29:   DW_TAG_pointer_type
          DW_AT_type      (0x32 "int")

0x2e:     DW_TAG_LLVM_annotation
            DW_AT_name    ("btf_type_tag")
            DW_AT_const_value     ("tag1")

0x32:   DW_TAG_base_type
          DW_AT_name      ("int")

0x36:     DW_TAG_LLVM_annotation
            DW_AT_name    ("btf:type_tag")
            DW_AT_const_value     ("tag1")

The commit includes the following changes:

  • New overload for CGDebugInfo::CreateType method is added to handle BTFTagAttributedType type. It creates a temporary "placeholder" DIDerivedType instance with tag set to DW_TAG_LLVM_annotation, base type corresponding to the base type of the BTFTagAttributedType and annotations corresponding to BTFTagAttributedType. All such placeholder instances are registered in the CGDebugInfo::AnnotationPlaceholders vector.
  • This overload is called from CGDebugInfo::CreateTypeNode and it overrides default qualifiers handling. This is necessary to put type tag annotations on the outermost CVR qualifier. Such representation makes downstream DWARF consumers simpler (eventually DWARF is used to construct BTF for kernel and kernel wants all type tag modifiers to precede const modifier).
  • CGDebugInfo::finalize() is modified to replace annotation placeholders with real types. When base type of the annotation placeholder is an instance of DIBasicType, DICompositeType, DIDerivedType, DISubroutineType that type is cloned and replaceAnnotations() method is used to copy annotations from placeholder to clone. Such logic allows to handle types with cyclic references.
  • ASTContext::getBTFTagAttributedType() is updated to ensure that BTFTagAttributedType always wraps QualType w/o local constant/volatile/restricted qualifiers.

Some additional example of DWARF generated after this commit follow (unrelated fields like file name and NULL entries are removed for brevity).

Type tag for primitive type

C:

int __attribute__((btf_type_tag("__a"))) a;

DWARF:

0x1e:   DW_TAG_variable
          DW_AT_name      ("a")
          DW_AT_type      (0x29 "int")

0x29:   DW_TAG_base_type
          DW_AT_name      ("int")

0x2d:     DW_TAG_LLVM_annotation
            DW_AT_name    ("btf:type_tag")
            DW_AT_const_value     ("__a")
Type tag for CVR modifier type

C:

volatile int __attribute__((btf_type_tag("__b"))) b;

DWARF:

0x1e:   DW_TAG_variable
          DW_AT_name	("b")
          DW_AT_type	(0x29 "volatile int")

0x29:   DW_TAG_volatile_type
          DW_AT_type	(0x2e "int")

0x2e:   DW_TAG_base_type
          DW_AT_name	("int")
          DW_AT_encoding	(DW_ATE_signed)
          DW_AT_byte_size	(0x04)

0x32:     DW_TAG_LLVM_annotation
            DW_AT_name	("btf:type_tag")
            DW_AT_const_value	("__b")
Type tag for typedef

C:

typedef int foo;
foo __attribute__((btf_type_tag("__c"))) c;

DWARF:

0x49:   DW_TAG_variable
          DW_AT_name      ("c")
          DW_AT_type      (0x54 "foo")

0x54:   DW_TAG_typedef
          DW_AT_type      (0x45 "int")
          DW_AT_name      ("foo")

0x5c:     DW_TAG_LLVM_annotation
            DW_AT_name    ("btf:type_tag")
            DW_AT_const_value     ("__c")
Type tag for composite type

C:

struct bar {};
struct bar __attribute__((btf_type_tag("__d"))) d1;
struct bar d2;

DWARF (Note two copies of struct bar one with annotations, one without):

0x60:   DW_TAG_variable
          DW_AT_name      ("d1")
          DW_AT_type      (0x6b "bar")

0x6b:   DW_TAG_structure_type
          DW_AT_name      ("bar")

0x70:     DW_TAG_LLVM_annotation
            DW_AT_name    ("btf:type_tag")
            DW_AT_const_value     ("__d")

0x74:   DW_TAG_variable
          DW_AT_name      ("d2")
          DW_AT_type      (0x7f "bar")

0x7f:   DW_TAG_structure_type
          DW_AT_name      ("bar")
Type tag for composite type with circular reference

C:

struct buz {
  struct buz __attribute__((btf_type_tag("__buz_self"))) *self;
} e;

DWARF (Note two copies of struct buz one with annotations, one without):

0x84:   DW_TAG_variable
          DW_AT_name      ("e")
          DW_AT_type      (0x8f "buz")

0x8f:   DW_TAG_structure_type
          DW_AT_name      ("buz")

0x94:     DW_TAG_member
            DW_AT_name    ("self")
            DW_AT_type    (0x9e "buz *")

0x9e:   DW_TAG_pointer_type
          DW_AT_type      (0xa7 "buz")

0xa3:     DW_TAG_LLVM_annotation
            DW_AT_name    ("btf_type_tag")
            DW_AT_const_value     ("__buz_self")

0xa7:   DW_TAG_structure_type
          DW_AT_name      ("buz")

0xac:     DW_TAG_member
            DW_AT_name    ("self")
            DW_AT_type    (0x9e "buz *")

0xb5:     DW_TAG_LLVM_annotation
            DW_AT_name    ("btf:type_tag")
            DW_AT_const_value     ("__buz_self")
Type tag for subroutine type

C:

void (__attribute__((btf_type_tag("__f"))) *f)(void);

DWARF:

0x000000b9:   DW_TAG_variable
                DW_AT_name      ("f")
                DW_AT_type      (0xc4 "void (*)(")

0x000000c4:   DW_TAG_pointer_type
                DW_AT_type      (0xcd "void (")

0x000000c9:     DW_TAG_LLVM_annotation
                  DW_AT_name    ("btf_type_tag")
                  DW_AT_const_value     ("f")

0x000000cd:   DW_TAG_subroutine_type
                DW_AT_prototyped        (true)

0x000000ce:     DW_TAG_LLVM_annotation
                  DW_AT_name    ("btf:type_tag")
                  DW_AT_const_value     ("__f")

Diff Detail

Event Timeline

eddyz87 created this revision.Feb 13 2023, 4:53 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 13 2023, 4:53 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
eddyz87 updated this revision to Diff 497835.Feb 15 2023, 3:59 PM
  • Removed DWARF tag 0x6001, use 'btf_type_tag:v2' name instead.
  • Removed CopyBTFTypeTags pass, generate 'btf_type_tag' and 'btf_type_tag:v2' in CGDebugInfo instead.
  • Added tests for BTF deduplication logic for debug info types that differ only in annotations.
eddyz87 retitled this revision from [DebugInfo][BPF] 'annotations' on a target type to represent btf_type_tag to [DebugInfo][BPF] Add 'btf_type_tag:v2' annotation in DWARF.Feb 15 2023, 4:07 PM
eddyz87 edited the summary of this revision. (Show Details)

When I tried to build llvm with the patch, I hit the following errors:

[2198/3751] Building CXX object lib/Target/BPF/CMakeFiles/LLVMBPFCodeGen.dir/BTFDebug.cpp.o
FAILED: lib/Target/BPF/CMakeFiles/LLVMBPFCodeGen.dir/BTFDebug.cpp.o 
/bin/c++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D_LIBCPP_ENABLE_ASSERTIONS -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/Target/BPF -I../lib/Target/BPF -Iinclude -I../include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wno-maybe-uninitialized -Wno-class-memaccess -Wno-noexcept-type -Wdelete-non-virtual-dtor -Wno-comment -Wno-misleading-indentation -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -fvisibility=hidden  -fno-exceptions -fno-rtti -UNDEBUG -std=c++17 -MD -MT lib/Target/BPF/CMakeFiles/LLVMBPFCodeGen.dir/BTFDebug.cpp.o -MF lib/Target/BPF/CMakeFiles/LLVMBPFCodeGen.dir/BTFDebug.cpp.o.d -o lib/Target/BPF/CMakeFiles/LLVMBPFCodeGen.dir/BTFDebug.cpp.o -c ../lib/Target/BPF/BTFDebug.cpp
../lib/Target/BPF/BTFDebug.cpp: In function ‘llvm::DINodeArray lookupAnnotations(const llvm::DIType*)’:
../lib/Target/BPF/BTFDebug.cpp:516:21: error: ‘const class llvm::DIBasicType’ has no member named ‘getAnnotations’; did you mean ‘getEncoding’?
     Annots = SubTy->getAnnotations();
                     ^~~~~~~~~~~~~~
                     getEncoding
../lib/Target/BPF/BTFDebug.cpp:522:21: error: ‘const class llvm::DISubroutineType’ has no member named ‘getAnnotations’
     Annots = SubTy->getAnnotations();
                     ^~~~~~~~~~~~~~
[2272/3751] Building CXX object lib/Passes/CMakeFiles/LLVMPasses.dir/PassBuilderPipelines.cpp.o
In file included from ../lib/Passes/PassBuilderPipelines.cpp:113:
../include/llvm/Transforms/Scalar/SROA.h:95:7: warning: ‘llvm::SROAPass’ declared with greater visibility than the type of its field ‘llvm::SROAPass::SelectsToRewrite’ [-Wattributes]
 class SROAPass : public PassInfoMixin<SROAPass> {
       ^~~~~~~~
[2279/3751] Building CXX object lib/Passes/CMakeFiles/LLVMPasses.dir/PassBuilder.cpp.o
In file included from ../lib/Passes/PassBuilder.cpp:214:
../include/llvm/Transforms/Scalar/SROA.h:95:7: warning: ‘llvm::SROAPass’ declared with greater visibility than the type of its field ‘llvm::SROAPass::SelectsToRewrite’ [-Wattributes]
 class SROAPass : public PassInfoMixin<SROAPass> {
       ^~~~~~~~
ninja: build stopped: subcommand failed.
[yhs@devbig309.ftw3 ~/work/llvm-project/llvm/build (main)]$

The top commit is

959216f9b1f1  [AMDGPU] MIR-Tests for Multiplication using KBA

Could you take a look?

My cmake command line:

cmake .. -DCMAKE_BUILD_TYPE=Release -G Ninja \
    -DLLVM_ENABLE_PROJECTS="clang;lld" \
    -DLLVM_TARGETS_TO_BUILD="BPF;X86" \
    -DLLVM_ENABLE_ASSERTIONS=ON \
    -DLLVM_ENABLE_ZLIB=ON \
    -DCMAKE_INSTALL_PREFIX=$PWD/install

../lib/Target/BPF/BTFDebug.cpp: In function ‘llvm::DINodeArray lookupAnnotations(const llvm::DIType*)’:
../lib/Target/BPF/BTFDebug.cpp:516:21: error: ‘const class llvm::DIBasicType’ has no member named ‘getAnnotations’; did you mean ‘getEncoding’?

Annots = SubTy->getAnnotations();
                ^~~~~~~~~~~~~~
                getEncoding

../lib/Target/BPF/BTFDebug.cpp:522:21: error: ‘const class llvm::DISubroutineType’ has no member named ‘getAnnotations’

Annots = SubTy->getAnnotations();
                ^~~~~~~~~~~~~~

Hi Yonghong,

Looking at the error it seems that you applied only the changes from D143967, however it depends on D143966. I organized these two as a stack to simplify the review:

  • D143966 adds annotations attribute for DIBasicType and DISubroutineType;
  • D143967 adds logic for btf_type_tag:v2 generation.

Anyways, I've rebased the revisions against current main (dad3741e3bdc [PowerPC] clang-format isPPC64ELFv2ABI() function. NFC) and see one failing test for ninja check-all target:

Failed Tests (1):
  Clang :: CodeGen/dwarf-version.c

However, it fails without my changes as well.
I'll push the rebased revisions shortly.

eddyz87 updated this revision to Diff 499281.Feb 21 2023, 1:34 PM
eddyz87 edited the summary of this revision. (Show Details)

Rebase

eddyz87 updated this revision to Diff 502033.Mar 2 2023, 5:59 PM

Rebase, change for tag name from btf_type_tag:v2 to btf:type_tag.

eddyz87 edited the summary of this revision. (Show Details)Mar 2 2023, 6:02 PM
eddyz87 retitled this revision from [DebugInfo][BPF] Add 'btf_type_tag:v2' annotation in DWARF to [DebugInfo][BPF] Add 'btf:type_tag' annotation in DWARF.
eddyz87 updated this revision to Diff 502321.Mar 3 2023, 5:17 PM

Another rebase, hope to fix windows build issue.

eddyz87 updated this revision to Diff 503198.Mar 7 2023, 5:52 PM

Fix support for cyclic types

Update to handle situations like below:

struct bravo {
  struct bravo __tag(a) * a;
} g;

During DI construction annotations are now represneted by
"annotation placeholders", special kind of derived type.
These placeholders are replaced by actual types with
proper "annotations" field in CGDebugInfo::finalize().

eddyz87 updated this revision to Diff 504493.Mar 12 2023, 6:45 PM
  • BTFDebug changes moved to a separate revision (to simplify the review);
  • Added test case for circular type reference handling.
eddyz87 published this revision for review.Mar 13 2023, 3:53 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptMar 13 2023, 3:53 PM
eddyz87 edited the summary of this revision. (Show Details)Mar 15 2023, 8:42 AM

Hi @jemarch,

Could you please take a look to verify that this implementation is on the same page with what is planned for GCC?

eddyz87 added a subscriber: jemarch.
eddyz87 added a comment.

Hi @jemarch,

Could you please take a look to verify that this implementation is on
the same page with what is planned for GCC?

Sure. Can you please add David Faust as a subscriber as well? I don't
know if he has an account in reviews.llvm.org.

Repository:

rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION

https://reviews.llvm.org/D143967/new/

https://reviews.llvm.org/D143967

dfaust added a subscriber: dfaust.Mar 15 2023, 9:14 AM

Hi @dfaust ,

Two head-ups, the following items turned out to be important when I tested using kernel BPF testsuite:

  • in the final BTF, type tags have to precede CVR modifiers, e.g. TYPE_TAG 'foo' -> CONST -> INT. Right now pahole does not do any reordering, so I ended up putting the type tag annotations on the DIE with outermost modifier. Will see if DI maintainers would be ok with this.
  • CO-RE relocation entries have to point to the actual type, not type tag. For both field relocations and variable relocations (this is related to the next patch in a stack).

Type tag for CVR modifier type

C:

volatile int __attribute__((btf_type_tag("__b"))) b;

DWARF:

0x31:   DW_TAG_variable
          DW_AT_name      ("b")
          DW_AT_type      (0x3c "volatile int")

0x3c:   DW_TAG_volatile_type
          DW_AT_type      (0x45 "int")

0x41:     DW_TAG_LLVM_annotation
            DW_AT_name    ("btf:type_tag")
            DW_AT_const_value     ("__b")

I am not sure about this case. Do we want the tag on the CVR qualifier or on the type being qualified?

Current patches for GCC place the annotation DIE as child of the 'int':

0x0000001e:   DW_TAG_variable
                DW_AT_name	("b")
                DW_AT_type	(0x00000047 "volatile int")

0x00000032:   DW_TAG_base_type
                DW_AT_byte_size	(0x04)
                DW_AT_name	("int")

0x0000003d:     DW_TAG_LLVM_annotation
                  DW_AT_name	("btf:type_tag")
                  DW_AT_const_value	("__b")

0x00000047:   DW_TAG_volatile_type
                DW_AT_type	(0x00000032 "int")

The way I see it both 'volatile' and the type tag are modifying 'int' type here, so the annotation DIE more properly fits as a child of 'int' rather than the 'volatile'.

I don't think we discussed this particular case, and I'm not sure whether there is any precedent here.

WDYT @eddyz87, @jemarch ?

in the final BTF, type tags have to precede CVR modifiers, e.g. TYPE_TAG 'foo' -> CONST -> INT. Right now pahole does not do any reordering, so I ended up putting the type tag annotations on the DIE with outermost modifier. Will see if DI maintainers would be ok with this.

Ok I was not aware of that requirement.
Internally GCC converts DWARF representation to BTF for BTF emission so we get (for the volatile example):

[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
[2] TYPE_TAG '__b' type_id=1
[3] VOLATILE '(anon)' type_id=2
[4] VAR 'b' type_id=3, linkage=global

i.e. VOLATILE -> TYPE_TAG -> INT which doesn't meet the above requirement that type tags precede CVR modifiers.

The way I see it both 'volatile' and the type tag are modifying 'int' type here, so the annotation DIE more properly fits as a child of 'int' rather than the 'volatile'.

I don't think we discussed this particular case, and I'm not sure whether there is any precedent here.

WDYT @eddyz87, @jemarch ?

This is related to my previous comment. Kernel BTF parser wants all type tags to precede modifiers (see this code). So such reordering will have to happen either during DWARF generation or by pahole. Currently pahole is "dumb" in a sense that it reflects DWARF structure in BTF effectively one-to-one. Adding such rewrite at a pahole level is possible but one-to-one mapping to DWARF would be lost, so the tool would require significant changes.

So, at least for LLVM its a tradeoff, either do it in the compiler (less code), or in the pahole (more code). I opted for less code :) If you think that this is conceptually unacceptable, I'll do the change on the pahole side (DI maintainers on LLVM side might agree, the patch has not been reviewed yet).

Another option would be to modify the kernel, but this might impede backwards compatibility.

eddyz87 added a comment.

In D143967#4197041 https://reviews.llvm.org/D143967#4197041, @dfaust wrote:

The way I see it both 'volatile' and the type tag are modifying
'int' type here, so the annotation DIE more properly fits as a child
of 'int' rather than the 'volatile'.

I don't think we discussed this particular case, and I'm not sure whether there is any precedent here.

WDYT @eddyz87, @jemarch ?

This is related to my previous comment. Kernel BTF parser wants all
type tags to precede modifiers (see this
https://elixir.bootlin.com/linux/latest/source/kernel/bpf/btf.c#L5348
code). So such reordering will have to happen either during DWARF
generation or by pahole. Currently pahole is "dumb" in a sense
that it reflects DWARF structure in BTF effectively one-to-one. Adding
such rewrite at a pahole level is possible but one-to-one mapping to
DWARF would be lost, so the tool would require significant changes.

So, at least for LLVM its a tradeoff, either do it in the compiler
(less code), or in the pahole (more code). I opted for less code :) If
you think that this is conceptually unacceptable, I'll do the change
on the pahole side (DI maintainers on LLVM side might agree, the patch
has not been reviewed yet).

Another option would be to modify the kernel, but this might impede backwards compatibility.

In GCC we also translate from the internal DWARF structures into BTF.
So, for us, it would also imply to reoder (before generating the BTF
from the interal DWARF) in case we wanted to use a different approach in
DWARF than in BTF.

I think the question is: what are the consequences of having the
annotation DIE as a child of the qualifiers instead of the qualified
type?

Repository:

rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION

https://reviews.llvm.org/D143967/new/

https://reviews.llvm.org/D143967

! In D143967#4197142, @jemarch wrote:
...
In GCC we also translate from the internal DWARF structures into BTF.
So, for us, it would also imply to reoder (before generating the BTF
from the interal DWARF) in case we wanted to use a different approach in
DWARF than in BTF.

It's not only the BTF generated for BPF programs. There are two ways
to obtain BTF that we should worry about:

  1. BTF generated for BPF programs by compiler;
  2. BTF generated during kernel build for the whole kernel, this one is generated by pahole from DWARF.

I think the question is: what are the consequences of having the
annotation DIE as a child of the qualifiers instead of the qualified
type?

Are there any? I just tried the following code with GDB:

volatile int __attribute__((btf_type_tag("__a"))) a = 1;

void foo(void) {
  a = 2;
}

int main(int argc, char **argv) {
  foo();
  return 0;
}

No warnings, a can be inspected.

I think there are two sides to this:

  • conceptual: is it ok to allow annotations for CVR modifiers?
  • technical: what is the point where such reordering is easiest to implement? For LLVM / pahole stack the path of least resistance is to modify DWARF generation (but again @dblaikie might disagree). How hard is it to adjust DI generation in GCC in this way?

...
I think there are two sides to this:

  • conceptual: is it ok to allow annotations for CVR modifiers?

Maybe someone more an expert in DWARF could chime in whether this is problematic?
As far as I know it would be "ok" in the sense that it should not break DWARF or cause issues for DWARF consumers which do not know about the tags.

My initial reaction was that placing the annotations on CVR modifiers makes less sense conceptually than placing them on the types proper.
But, I suppose it is a question of what precisely the annotated type is. e.g. in volatile int __attribute__((btf_type_tag("__b"))) b; is it the case that:

  • type tag "__b" applies to the type volatile int (i.e. cvr quals "bind more closely" than the tag), or
  • type tag "__b" and volatile both apply to int (i.e. cvr quals and type tags are "equal")

For all other cases the new "btf:type_tag" format places the annotation as a child of exactly the type that is annotated, so I think the answer determines where the annotation logically belongs in DWARF:

  • If the tag applies to volatile int then it should be a child of the volatile DIE.
  • If the tag applies to int it should be a child of the integer type DIE.

At the moment I can't say that one is more correct than the other, so I guess I have no real objection placing the annotation on the CVR modifier.

  • technical: what is the point where such reordering is easiest to implement? For LLVM / pahole stack the path of least resistance is to modify DWARF generation (but again @dblaikie might disagree). How hard is it to adjust DI generation in GCC in this way?

It is not a difficult change in GCC. Some added complexity but not too much. I have a prototype that seems to work from initial testing.
It is probably simpler than instead modifying the internal transformation to BTF to reorder tags/CVR qualifiers as kernel currently requires.

But I don't think ease of implementation should be a factor unless we are sure the format makes sense.
We are changing the format because the old one was flawed, let's try to make sure we aren't just replacing it with something flawed in a different way because it is easy.

...
I think there are two sides to this:

  • conceptual: is it ok to allow annotations for CVR modifiers?

Maybe someone more an expert in DWARF could chime in whether this is problematic?
As far as I know it would be "ok" in the sense that it should not break DWARF or cause issues for DWARF consumers which do not know about the tags.

My initial reaction was that placing the annotations on CVR modifiers makes less sense conceptually than placing them on the types proper.

I agree conceptually.

But, I suppose it is a question of what precisely the annotated type is. e.g. in volatile int __attribute__((btf_type_tag("__b"))) b; is it the case that:

  • type tag "__b" applies to the type volatile int (i.e. cvr quals "bind more closely" than the tag), or
  • type tag "__b" and volatile both apply to int (i.e. cvr quals and type tags are "equal")

For all other cases the new "btf:type_tag" format places the annotation as a child of exactly the type that is annotated, so I think the answer determines where the annotation logically belongs in DWARF:

  • If the tag applies to volatile int then it should be a child of the volatile DIE.
  • If the tag applies to int it should be a child of the integer type DIE.

I'm aware of three use-cases for "btf_type_tag": "percpu", "user", "rcu" marks in kernel code. First is used for per-cpu variables, second is used to mark pointers to userspace memory, third to mark the data that should be read with RCU lock. For these use-cases there is no difference between e.g. const __user *ptr and __user const *ptr. I don't think the difference was ever thought through.

At the moment I can't say that one is more correct than the other, so I guess I have no real objection placing the annotation on the CVR modifier.

  • technical: what is the point where such reordering is easiest to implement? For LLVM / pahole stack the path of least resistance is to modify DWARF generation (but again @dblaikie might disagree). How hard is it to adjust DI generation in GCC in this way?

It is not a difficult change in GCC. Some added complexity but not too much. I have a prototype that seems to work from initial testing.

Same here, it is a small change in clang code, but I don't like that it has to be handled as a special case.

It is probably simpler than instead modifying the internal transformation to BTF to reorder tags/CVR qualifiers as kernel currently requires.

But I don't think ease of implementation should be a factor unless we are sure the format makes sense.
We are changing the format because the old one was flawed, let's try to make sure we aren't just replacing it with something flawed in a different way because it is easy.

The main issue is that kernel currently expects type tags before modifiers. Even if the kernel code would be modified there is still an issue of backwards compatibility. During kernel build BTF is generated from DWARF and is embedded in a kernel image, it is done by tool named pahole. If we forgo modifiers reordering the embedded BTF would not match format supported by kernel.

So, I assume that reordering _has_ to happen somewhere. Either at DWARF generation stage, or in two places:

  • compiler BPF backend, when BPF programs are compiled;
  • pahole, when BTF is generated from DWARF.

Let me prototype pahole change to see how much of hassle that is, it will take a day or two.

eddyz87 added a comment.

In D143967#4200331 https://reviews.llvm.org/D143967#4200331, @dfaust wrote:

In D143967#4197288 https://reviews.llvm.org/D143967#4197288, @eddyz87 wrote:

...
I think there are two sides to this:

  • conceptual: is it ok to allow annotations for CVR modifiers?

Maybe someone more an expert in DWARF could chime in whether this is problematic?
As far as I know it would be "ok" in the sense that it should not
break DWARF or cause issues for DWARF consumers which do not know
about the tags.

My initial reaction was that placing the annotations on CVR
modifiers makes less sense conceptually than placing them on the
types proper.

I agree conceptually.

That was my initial reaction too. But see below.

But, I suppose it is a question of what precisely the annotated type
is. e.g. in volatile int __attribute__((btf_type_tag("__b"))) b;
is it the case that:

  • type tag "__b" applies to the type volatile int (i.e. cvr quals "bind more closely" than the tag), or
  • type tag "__b" and volatile both apply to int (i.e. cvr quals and type tags are "equal")

For all other cases the new "btf:type_tag" format places the
annotation as a child of exactly the type that is annotated, so I
think the answer determines where the annotation logically belongs
in DWARF:

  • If the tag applies to volatile int then it should be a child of the volatile DIE.
  • If the tag applies to int it should be a child of the integer type DIE.

I think that is the main point. DWARF composes types by chaining DIEs
via DW_AT_type attributes, like in:

const -> volatile -> pointer-to -> int

Conceptually, there are four different types in that chain, not just
one:

  1. const volatile pointer-to int
  2. volatile pointer-to int
  3. pointer-to int
  4. int

Ideally speaking, we would have liked to implement the annotations just
like the qualifiers, so we could have something like:

const -> volatile -> annotated -> pointer-to -> int

That is what would suite us best conceptually speaking. But since that
would break DWARF consumers, we decided to go with a parent-child
relationship instead, having:

const -> volatile -> pointer-to -> int
                         |
                     annotated

Which would be a different situation than, say:

const -> volatile -> pointer-to -> int
  |

annotated

So, I think it only makes sense to have the child annotation node in the
node that starts the type to which the annotation refers. Both
qualifiers and type DIEs are types on that sense.

And now that I think about this. In this conceptual situation:

const -> volatile -> annotated (foo) -> annotated (bar) -> pointer-to -> int

I guess we are encoding it with several DW_TAG_LLVM_annotation children:

const -> volatile -> pointer-to -> int
                         |
               +---------+---------+
               |                   |
          anotated (foo)    annotated (bar)

Basically accumulating/reducing what are conceptually two different
types into one. Since typedefs are explicitly encoded in the chain as
their own node, this situation will only happen when there are several
tags specified consecutively in the source code (this is another reason
why putting the annotation DIEs as children of the qualifiers is
necessary, because otherwise we would be accumulating/reducing
annotation across these nodes and we could end with situations where
annotations get lost.)

When accumulating/reducign tags like above:

Does the ordering matter here? Does it matter? Should we require
keeping the order among the annotation siblings as part of the
DW_TAG_LLVM_annotation specification? Even if we don't, we probably
want to behave the same in both compilers.

In case two identical annotations are applied to the same type, are we
deduplicating them? Should we actually require that? Even if we don't,
we probably want to behave the same in both compilers.

(Note that since DW_TAG_typedef is also a type in the type chain, we
shouldn't worry about

I'm aware of three use-cases for "btf_type_tag": "percpu", "user",
"rcu" marks in kernel code. First is used for per-cpu variables,
second is used to mark pointers to userspace memory, third to mark the
data that should be read with RCU lock. For these use-cases there is
no difference between e.g. const __user *ptr and `__user const
*ptr`. I don't think the difference was ever thought through.

At the moment I can't say that one is more correct than the other,
so I guess I have no real objection placing the annotation on the
CVR modifier.

  • technical: what is the point where such reordering is easiest to

implement? For LLVM / pahole stack the path of least resistance is
to modify DWARF generation (but again @dblaikie might
disagree). How hard is it to adjust DI generation in GCC in this
way?

It is not a difficult change in GCC. Some added complexity but not
too much. I have a prototype that seems to work from initial
testing.

Same here, it is a small change in clang code, but I don't like that it has to be handled as a special case.

It is probably simpler than instead modifying the internal
transformation to BTF to reorder tags/CVR qualifiers as kernel
currently requires.

But I don't think ease of implementation should be a factor unless we are sure the format makes sense.
We are changing the format because the old one was flawed, let's try
to make sure we aren't just replacing it with something flawed in a
different way because it is easy.

The main issue is that kernel currently expects type tags before
modifiers. Even if the kernel code would be modified there is still an
issue of backwards compatibility. During kernel build BTF is generated
from DWARF and is embedded in a kernel image, it is done by tool named
pahole https://github.com/acmel/dwarves. If we forgo modifiers
reordering the embedded BTF would not match format supported by
kernel.

So, I assume that reordering _has_ to happen somewhere. Either at DWARF generation stage, or in two places:

  • compiler BPF backend, when BPF programs are compiled;
  • pahole, when BTF is generated from DWARF.

Let me prototype pahole change to see how much of hassle that is, it will take a day or two.

Repository:

rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION

https://reviews.llvm.org/D143967/new/

https://reviews.llvm.org/D143967

eddyz87 added a comment.

In D143967#4200331 https://reviews.llvm.org/D143967#4200331, @dfaust wrote:

In D143967#4197288 https://reviews.llvm.org/D143967#4197288, @eddyz87 wrote:

...
I think there are two sides to this:

  • conceptual: is it ok to allow annotations for CVR modifiers?

Maybe someone more an expert in DWARF could chime in whether this is problematic?
As far as I know it would be "ok" in the sense that it should not
break DWARF or cause issues for DWARF consumers which do not know
about the tags.

My initial reaction was that placing the annotations on CVR
modifiers makes less sense conceptually than placing them on the
types proper.

I agree conceptually.

That was my initial reaction too. But see below.

But, I suppose it is a question of what precisely the annotated type
is. e.g. in volatile int __attribute__((btf_type_tag("__b"))) b;
is it the case that:

  • type tag "__b" applies to the type volatile int (i.e. cvr quals "bind more closely" than the tag), or
  • type tag "__b" and volatile both apply to int (i.e. cvr quals and type tags are "equal")

For all other cases the new "btf:type_tag" format places the
annotation as a child of exactly the type that is annotated, so I
think the answer determines where the annotation logically belongs
in DWARF:

  • If the tag applies to volatile int then it should be a child of the volatile DIE.
  • If the tag applies to int it should be a child of the integer type DIE.

I think that is the main point. DWARF composes types by chaining DIEs
via DW_AT_type attributes, like in:

const -> volatile -> pointer-to -> int

Conceptually, there are four different types in that chain, not just
one:

  1. const volatile pointer-to int
  2. volatile pointer-to int
  3. pointer-to int
  4. int

Ideally speaking, we would have liked to implement the annotations just
like the qualifiers, so we could have something like:

const -> volatile -> annotated -> pointer-to -> int

That is what would suite us best conceptually speaking. But since that
would break DWARF consumers, we decided to go with a parent-child
relationship instead, having:

const -> volatile -> pointer-to -> int
                         |
                     annotated

Which would be a different situation than, say:

const -> volatile -> pointer-to -> int
  |
annotated

So, I think it only makes sense to have the child annotation node in the
node that starts the type to which the annotation refers. Both
qualifiers and type DIEs are types on that sense.

Ok, I understand your point.
For example if we have:

volatile int __tag1 b;

The tag applies to the type (volatile int) and we have 3 types total:

  1. __tag1 volatile int
  2. volatile int
  3. int

And in DWARF the tag must be child of the qualifier:

var(b) -> volatile -> int
             |
            __tag1

Doing it the other way would be incorrect - if we instead did this:

var(b) -> volatile -> int
                        |
                     __tag1

We encode a different chain of types:

var(b) -> volatile -> __tag1 -> int

Which reflects a different set of types than the ones we actually have:

  1. volatile __tag1 int
  2. __tag1 int
  3. int

So I guess it is clear, conceptually the tags do belong on the qualifiers.

And now that I think about this. In this conceptual situation:

const -> volatile -> annotated (foo) -> annotated (bar) -> pointer-to -> int

I guess we are encoding it with several DW_TAG_LLVM_annotation children:

const -> volatile -> pointer-to -> int
                         |
               +---------+---------+
               |                   |
          anotated (foo)    annotated (bar)

Basically accumulating/reducing what are conceptually two different
types into one. Since typedefs are explicitly encoded in the chain as
their own node, this situation will only happen when there are several
tags specified consecutively in the source code (this is another reason
why putting the annotation DIEs as children of the qualifiers is
necessary, because otherwise we would be accumulating/reducing
annotation across these nodes and we could end with situations where
annotations get lost.)

When accumulating/reducign tags like above:

Does the ordering matter here? Does it matter? Should we require
keeping the order among the annotation siblings as part of the
DW_TAG_LLVM_annotation specification? Even if we don't, we probably
want to behave the same in both compilers.

A quick check suggests order does not matter for DWARF for normal qualifiers:

const volatile int x;
volatile const int y;

Both compilers emit something like this ('x' and 'y' share type):

0x0000001e:   DW_TAG_variable
                DW_AT_name	("x")
                DW_AT_type	(0x00000029 "const volatile int")

0x00000029:   DW_TAG_const_type
                DW_AT_type	(0x0000002e "volatile int")

0x0000002e:   DW_TAG_volatile_type
                DW_AT_type	(0x00000033 "int")

0x00000033:   DW_TAG_base_type
                DW_AT_name	("int")

0x00000037:   DW_TAG_variable
                DW_AT_name	("y")
                DW_AT_type	(0x00000029 "const volatile int")

Interestingly, GCC and LLVM do not behave exactly the same in that in
GCC the DWARF types for both variables are

x, y   -> volatile -> const -> int

while in LLVM (shown above) it is

x, y -> const -> volatile -> int

So I'm not sure we necessarily need both compilers to be exactly the same.

In case two identical annotations are applied to the same type, are we
deduplicating them? Should we actually require that? Even if we don't,
we probably want to behave the same in both compilers.

FWIW, if you write something like:

int __attribute__((btf_type_tag("__c"))) __attribute__((btf_type_tag("__c"))) c;

GCC de-duplicates attributes with the same value, so the tag "__c" will only appear
once. I don't know if this would be easy to change if we wanted to.

(Note that since DW_TAG_typedef is also a type in the type chain, we
shouldn't worry about

Looks like this got cutoff.

Ok, I understand your point.
For example if we have:

volatile int __tag1 b;

The tag applies to the type (volatile int) and we have 3 types total:

  1. __tag1 volatile int
  2. volatile int
  3. int

And in DWARF the tag must be child of the qualifier:

var(b) -> volatile -> int
             |
            __tag1

Doing it the other way would be incorrect - if we instead did this:

var(b) -> volatile -> int
                        |
                     __tag1

We encode a different chain of types:

var(b) -> volatile -> __tag1 -> int

Which reflects a different set of types than the ones we actually have:

  1. volatile __tag1 int
  2. __tag1 int
  3. int

So I guess it is clear, conceptually the tags do belong on the qualifiers.

But that's just an encoding quirk. For the C language "const volatile int *" and "volatile const int *" have the same meaning. In clang AST such qualifiers are represented as bitfields.

If we consider type tag a qualifier, conceptually it would be simpler if ordering would not matter for it as well, so that the following have identical meaning:

  • __tag volatile int
  • volatile __tag int

And now that I think about this. In this conceptual situation:

const -> volatile -> annotated (foo) -> annotated (bar) -> pointer-to -> int

I guess we are encoding it with several DW_TAG_LLVM_annotation children:

const -> volatile -> pointer-to -> int
                         |
               +---------+---------+
               |                   |
          anotated (foo)    annotated (bar)

Basically accumulating/reducing what are conceptually two different
types into one. Since typedefs are explicitly encoded in the chain as
their own node, this situation will only happen when there are several
tags specified consecutively in the source code (this is another reason
why putting the annotation DIEs as children of the qualifiers is
necessary, because otherwise we would be accumulating/reducing
annotation across these nodes and we could end with situations where
annotations get lost.)

When accumulating/reducign tags like above:

Does the ordering matter here? Does it matter? Should we require
keeping the order among the annotation siblings as part of the
DW_TAG_LLVM_annotation specification? Even if we don't, we probably
want to behave the same in both compilers.

A quick check suggests order does not matter for DWARF for normal qualifiers:

const volatile int x;
volatile const int y;

Both compilers emit something like this ('x' and 'y' share type):

0x0000001e:   DW_TAG_variable
                DW_AT_name	("x")
                DW_AT_type	(0x00000029 "const volatile int")

0x00000029:   DW_TAG_const_type
                DW_AT_type	(0x0000002e "volatile int")

0x0000002e:   DW_TAG_volatile_type
                DW_AT_type	(0x00000033 "int")

0x00000033:   DW_TAG_base_type
                DW_AT_name	("int")

0x00000037:   DW_TAG_variable
                DW_AT_name	("y")
                DW_AT_type	(0x00000029 "const volatile int")

Interestingly, GCC and LLVM do not behave exactly the same in that in
GCC the DWARF types for both variables are

x, y   -> volatile -> const -> int

while in LLVM (shown above) it is

x, y -> const -> volatile -> int

So I'm not sure we necessarily need both compilers to be exactly the same.

Unfortunately, for ordering we are also limited by a consumer. In the kernel case the ordering of const/volatile/restrict does not matter, but it does want type tags to come first.

In case two identical annotations are applied to the same type, are we
deduplicating them? Should we actually require that? Even if we don't,
we probably want to behave the same in both compilers.

FWIW, if you write something like:

int __attribute__((btf_type_tag("__c"))) __attribute__((btf_type_tag("__c"))) c;

GCC de-duplicates attributes with the same value, so the tag "__c" will only appear
once. I don't know if this would be easy to change if we wanted to.

LLVM de-duplicates as well.

AFAIK current type tag use cases do not require to distinguish "const tag" from "tag const" and do not need to handle "tag tag" separately from "__tag". So, I think we should stick with simpler semantics.

eddyz87 added a comment.

In D143967#4231517 https://reviews.llvm.org/D143967#4231517, @dfaust wrote:

In D143967#4220233 https://reviews.llvm.org/D143967#4220233, @jemarch wrote:

Ok, I understand your point.
For example if we have:

volatile int __tag1 b;

The tag applies to the type (volatile int) and we have 3 types total:

  1. __tag1 volatile int
  2. volatile int
  3. int

And in DWARF the tag must be child of the qualifier:

var(b) -> volatile -> int
             |
            __tag1

Doing it the other way would be incorrect - if we instead did this:

var(b) -> volatile -> int
                        |
                     __tag1

We encode a different chain of types:

var(b) -> volatile -> __tag1 -> int

Which reflects a different set of types than the ones we actually have:

  1. volatile __tag1 int
  2. __tag1 int
  3. int

So I guess it is clear, conceptually the tags do belong on the qualifiers.

But that's just an encoding quirk. For the C language "const volatile
int *" and "volatile const int *" have the same meaning. In clang AST
such qualifiers are represented as bitfields.

If we consider type tag a qualifier, conceptually it would be simpler
if ordering would not matter for it as well, so that the following
have identical meaning:

  • __tag volatile int
  • volatile __tag int

Makes sense.

But then I think we should allow the tags to be anywhere in the chain in
the DWARF, and not necessarily in the qualified type. For example
given:

typedef const int bar;
volatile bar __tag1 foo;

The resulting DWARF type chain is:

volatile -> typedef (bar) -> const -> int

IMO we should allow __tag1 to be a child of any of the elements of the
chain, like:

volatile -> typedef (bar) -> const -> int
                 |
               __tag1

or

volatile -> typedef (bar) -> const -> int
                                       |
                                    __tag1

or

volatile -> typedef (bar) -> const -> int
                               |
                            __tag1

or

volatile -> typedef (bar) -> const -> int
   |
 __tag1

would be all accepted and equivalent. Also Faust noted that GCC
sometimes optimizes out the typedef nodes, so we need to be able to
place the tags (in the internal representatinos) in the typedef'ed type
instead.

And now that I think about this. In this conceptual situation:

const -> volatile -> annotated (foo) -> annotated (bar) -> pointer-to -> int

I guess we are encoding it with several DW_TAG_LLVM_annotation children:

const -> volatile -> pointer-to -> int
                         |
               +---------+---------+
               |                   |
          anotated (foo)    annotated (bar)

Basically accumulating/reducing what are conceptually two different
types into one. Since typedefs are explicitly encoded in the chain as
their own node, this situation will only happen when there are several
tags specified consecutively in the source code (this is another reason
why putting the annotation DIEs as children of the qualifiers is
necessary, because otherwise we would be accumulating/reducing
annotation across these nodes and we could end with situations where
annotations get lost.)

When accumulating/reducign tags like above:

Does the ordering matter here? Does it matter? Should we require
keeping the order among the annotation siblings as part of the
DW_TAG_LLVM_annotation specification? Even if we don't, we probably
want to behave the same in both compilers.

A quick check suggests order does not matter for DWARF for normal qualifiers:

const volatile int x;
volatile const int y;

Both compilers emit something like this ('x' and 'y' share type):

0x0000001e:   DW_TAG_variable
                DW_AT_name	("x")
                DW_AT_type	(0x00000029 "const volatile int")

0x00000029:   DW_TAG_const_type
                DW_AT_type	(0x0000002e "volatile int")

0x0000002e:   DW_TAG_volatile_type
                DW_AT_type	(0x00000033 "int")

0x00000033:   DW_TAG_base_type
                DW_AT_name	("int")

0x00000037:   DW_TAG_variable
                DW_AT_name	("y")
                DW_AT_type	(0x00000029 "const volatile int")

Interestingly, GCC and LLVM do not behave exactly the same in that in
GCC the DWARF types for both variables are

x, y   -> volatile -> const -> int

while in LLVM (shown above) it is

x, y -> const -> volatile -> int

So I'm not sure we necessarily need both compilers to be exactly the same.

Unfortunately, for ordering we are also limited by a consumer. In the
kernel case the ordering of const/volatile/restrict does not matter,
but it does want type tags to come first.

In case two identical annotations are applied to the same type, are we
deduplicating them? Should we actually require that? Even if we don't,
we probably want to behave the same in both compilers.

FWIW, if you write something like:

int __attribute__((btf_type_tag("__c"))) __attribute__((btf_type_tag("__c"))) c;

GCC de-duplicates attributes with the same value, so the tag "__c" will only appear
once. I don't know if this would be easy to change if we wanted to.

LLVM de-duplicates as well.

AFAIK current type tag use cases do not require to distinguish "const
tag" from "tag const" and do not need to handle "tag tag"
separately from "__tag". So, I think we should stick with simpler
semantics.

Repository:

rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION

https://reviews.llvm.org/D143967/new/

https://reviews.llvm.org/D143967

eddyz87 added a comment.
...
If we consider type tag a qualifier, conceptually it would be simpler
if ordering would not matter for it as well, so that the following
have identical meaning:

  • __tag volatile int
  • volatile __tag int

Makes sense.

But then I think we should allow the tags to be anywhere in the chain in
the DWARF, and not necessarily in the qualified type. For example
given:

typedef const int bar;
volatile bar __tag1 foo;

The resulting DWARF type chain is:

volatile -> typedef (bar) -> const -> int

IMO we should allow __tag1 to be a child of any of the elements of the
chain, like:

volatile -> typedef (bar) -> const -> int
                 |
               __tag1

or

volatile -> typedef (bar) -> const -> int
                                       |
                                    __tag1

or

volatile -> typedef (bar) -> const -> int
                               |
                            __tag1

or

volatile -> typedef (bar) -> const -> int
   |
 __tag1

would be all accepted and equivalent. Also Faust noted that GCC
sometimes optimizes out the typedef nodes, so we need to be able to
place the tags (in the internal representatinos) in the typedef'ed type
instead.

Well, it's a logical consequence, I guess.
In practice I'm going to emit it like below:

volatile -> const -> int
                      |
                   __tag1

But please remember that in BTF it would have to be encoded like this:

__tag1 -> volatile -> const -> int

(that's how kernel expects it, we can change the kernel but will loose backwards compatibility).
For DWARF -> BTF transformation in pahole I will make the necessary modifications, but BTF might also be emitted C->BPF backend.

@yonghong-song, @anakryiko, what do you think?

eddyz87 added a subscriber: anakryiko.
eddyz87 added a comment.

In D143967#4231746 https://reviews.llvm.org/D143967#4231746, @jemarch wrote:

eddyz87 added a comment.
...
If we consider type tag a qualifier, conceptually it would be simpler
if ordering would not matter for it as well, so that the following
have identical meaning:

  • __tag volatile int
  • volatile __tag int

Makes sense.

But then I think we should allow the tags to be anywhere in the chain in
the DWARF, and not necessarily in the qualified type. For example
given:

typedef const int bar;
volatile bar __tag1 foo;

The resulting DWARF type chain is:

volatile -> typedef (bar) -> const -> int

IMO we should allow __tag1 to be a child of any of the elements of the
chain, like:

volatile -> typedef (bar) -> const -> int
                 |
               __tag1

or

volatile -> typedef (bar) -> const -> int
                                       |
                                    __tag1

or

volatile -> typedef (bar) -> const -> int
                               |
                            __tag1

or

volatile -> typedef (bar) -> const -> int
   |
 __tag1

would be all accepted and equivalent. Also Faust noted that GCC
sometimes optimizes out the typedef nodes, so we need to be able to
place the tags (in the internal representatinos) in the typedef'ed type
instead.

Well, it's a logical consequence, I guess.
In practice I'm going to emit it like below:

volatile -> const -> int
                      |
                   __tag1

We can probably do the same in GCC, to be consistent.

But please remember that in BTF it would have to be encoded like this:

__tag1 -> volatile -> const -> int

(that's how kernel expects it, we can change the kernel but will loose
backwards compatibility). For DWARF -> BTF transformation in pahole
I will make the necessary modifications, but BTF might also be emitted
C->BPF backend.

We can do the same thing when we generate BTF directly from GCC.
Faust, do you agree with the above?

@yonghong-song, @anakryiko, what do you think?

Repository:

rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION

https://reviews.llvm.org/D143967/new/

https://reviews.llvm.org/D143967

eddyz87 added a subscriber: anakryiko.
eddyz87 added a comment.

In D143967#4231746 https://reviews.llvm.org/D143967#4231746, @jemarch wrote:

eddyz87 added a comment.
...
If we consider type tag a qualifier, conceptually it would be simpler
if ordering would not matter for it as well, so that the following
have identical meaning:

  • __tag volatile int
  • volatile __tag int

Makes sense.

But then I think we should allow the tags to be anywhere in the chain in
the DWARF, and not necessarily in the qualified type. For example
given:

typedef const int bar;
volatile bar __tag1 foo;

The resulting DWARF type chain is:

volatile -> typedef (bar) -> const -> int

IMO we should allow __tag1 to be a child of any of the elements of the
chain, like:

volatile -> typedef (bar) -> const -> int
                 |
               __tag1

or

volatile -> typedef (bar) -> const -> int
                                       |
                                    __tag1

or

volatile -> typedef (bar) -> const -> int
                               |
                            __tag1

or

volatile -> typedef (bar) -> const -> int
   |
 __tag1

would be all accepted and equivalent. Also Faust noted that GCC
sometimes optimizes out the typedef nodes, so we need to be able to
place the tags (in the internal representatinos) in the typedef'ed type
instead.

Well, it's a logical consequence, I guess.
In practice I'm going to emit it like below:

volatile -> const -> int
                      |
                   __tag1

We can probably do the same in GCC, to be consistent.

But please remember that in BTF it would have to be encoded like this:

__tag1 -> volatile -> const -> int

(that's how kernel expects it, we can change the kernel but will loose
backwards compatibility). For DWARF -> BTF transformation in pahole
I will make the necessary modifications, but BTF might also be emitted
C->BPF backend.

We can do the same thing when we generate BTF directly from GCC.
Faust, do you agree with the above?

OK, that's fine with me.

eddyz87 added a subscriber: anakryiko.
eddyz87 added a comment.

In D143967#4231746 https://reviews.llvm.org/D143967#4231746, @jemarch wrote:

eddyz87 added a comment.
...
If we consider type tag a qualifier, conceptually it would be simpler
if ordering would not matter for it as well, so that the following
have identical meaning:

  • __tag volatile int
  • volatile __tag int

Makes sense.

But then I think we should allow the tags to be anywhere in the chain in
the DWARF, and not necessarily in the qualified type. For example
given:

typedef const int bar;
volatile bar __tag1 foo;

The resulting DWARF type chain is:

volatile -> typedef (bar) -> const -> int

IMO we should allow __tag1 to be a child of any of the elements of the
chain, like:

volatile -> typedef (bar) -> const -> int
                 |
               __tag1

or

volatile -> typedef (bar) -> const -> int
                                       |
                                    __tag1

or

volatile -> typedef (bar) -> const -> int
                               |
                            __tag1

or

volatile -> typedef (bar) -> const -> int
   |
 __tag1

would be all accepted and equivalent. Also Faust noted that GCC
sometimes optimizes out the typedef nodes, so we need to be able to
place the tags (in the internal representatinos) in the typedef'ed type
instead.

Well, it's a logical consequence, I guess.
In practice I'm going to emit it like below:

volatile -> const -> int
                      |
                   __tag1

But please remember that in BTF it would have to be encoded like this:

__tag1 -> volatile -> const -> int

(that's how kernel expects it, we can change the kernel but will loose
backwards compatibility).

Thinking about typedef, some C cases may be problematic if we adopt the
flexible rule we are discussing:

typedef int bar;
const bar __tag1 var1;
bar var2;

DWARF:

var1 -> const -> typedef (bar) -> int
         |           ^
        __tag1       |
                     |
var2 ----------------+

If we would allow __tag1 to be "moved" to either the typedef DWARF node
or the node for int like this:

DWARF:

var1 -> const -> typedef (bar) -> int
                     ^             |
                     |          __tag1
var2 ----------------+

Then the tag1 would also apply to var2's type. But I would say in the
C snippet above
tag1 should apply to the type of var1, but not
to the type of var2.

This makes me think we shouldn't allow tags to "jump" over typedef
nodes, in neither DWARF nor BTF.

PS: I am a bit concerned with the fact that the kernel's interpretation

of BTF is so rigit in this case as to assume C's type system
semantics when it comes to type qualifiers.  Other languages may be
coming to the BPF world (Rust, for example) and in these languages
the ordering of type qualifiers may actually matter.  There is a
reason why DWARF encodes qualifiers as explicit nodes in the type
chain rather than as attributes of the type nodes.

Thinking about typedef, some C cases may be problematic if we adopt the
flexible rule we are discussing:

typedef int bar;
const bar __tag1 var1;
bar var2;

DWARF:

var1 -> const -> typedef (bar) -> int
         |           ^
        __tag1       |
                     |
var2 ----------------+

If we would allow __tag1 to be "moved" to either the typedef DWARF node
or the node for int like this:

DWARF:

var1 -> const -> typedef (bar) -> int
                     ^             |
                     |          __tag1
var2 ----------------+

Then the tag1 would also apply to var2's type. But I would say in the
C snippet above
tag1 should apply to the type of var1, but not
to the type of var2.

I'm not sure I understand, the DWARF #2 is not a valid representation of the program for all the reasons you write.
Current LLVM implementation should generate DWARF #1 in this case.
If some tooling applies such tags movement it should also apply appropriate copying of tags, e.g. it should transform DWARF like this:

var1 -> const -> typedef (bar) -> int
                                   |
                                __tag1

var2 ----------> typedef (bar) -> int

(and it is what needs to be implemented in pahole to get BTF qualifiers ordering expected by kernel, but the move is in the opposite direction).

PS: I am a bit concerned with the fact that the kernel's interpretation

of BTF is so rigit in this case as to assume C's type system
semantics when it comes to type qualifiers.  Other languages may be
coming to the BPF world (Rust, for example) and in these languages
the ordering of type qualifiers may actually matter.  There is a
reason why DWARF encodes qualifiers as explicit nodes in the type
chain rather than as attributes of the type nodes.

Need to read a bit about Rust, can't comment right now.

eddyz87 added a comment.

In D143967#4232089 https://reviews.llvm.org/D143967#4232089, @jemarch wrote:

Thinking about typedef, some C cases may be problematic if we adopt the
flexible rule we are discussing:

typedef int bar;
const bar __tag1 var1;
bar var2;

DWARF:

var1 -> const -> typedef (bar) -> int
         |           ^
        __tag1       |
                     |
var2 ----------------+

If we would allow __tag1 to be "moved" to either the typedef DWARF node
or the node for int like this:

DWARF:

var1 -> const -> typedef (bar) -> int
                     ^             |
                     |          __tag1
var2 ----------------+

Then the tag1 would also apply to var2's type. But I would say in the
C snippet above
tag1 should apply to the type of var1, but not
to the type of var2.

I'm not sure I understand, the DWARF #2 is not a valid representation
of the program for all the reasons you write. Current LLVM
implementation should generate DWARF #1 in this case.

Yes that was my point: tags can cross qualifiers and still keep C
language semantics, but not (all) typedefs.

If some tooling applies such tags movement it should also apply
appropriate copying of tags, e.g. it should transform DWARF like this:

var1 -> const -> typedef (bar) -> int
                                   |
                                __tag1

var2 ----------> typedef (bar) -> int

(and it is what needs to be implemented in pahole to get BTF
qualifiers ordering expected by kernel, but the move is in the
opposite direction).

So the kernel expects tags to precede typedefs as well as qualifiers?
i.e. given this DWARF:

var1 -> const -> typedef (bar) -> int
                     ^             |
                     |           __tag1
                     |
var2 ----------------+

We have to transform to these two BTF type chains:

var1 -> __tag1 -> const -> typedef (bar) -> int
                                ^
                                |
var2 -> __tag1 -----------------+

Correct?

PS: I am a bit concerned with the fact that the kernel's interpretation

of BTF is so rigit in this case as to assume C's type system
semantics when it comes to type qualifiers.  Other languages may be
coming to the BPF world (Rust, for example) and in these languages
the ordering of type qualifiers may actually matter.  There is a
reason why DWARF encodes qualifiers as explicit nodes in the type
chain rather than as attributes of the type nodes.

Need to read a bit about Rust, can't comment right now.

Repository:

rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION

https://reviews.llvm.org/D143967/new/

https://reviews.llvm.org/D143967

tbaeder removed a subscriber: tbaeder.Mar 30 2023, 6:39 AM

If some tooling applies such tags movement it should also apply
appropriate copying of tags, e.g. it should transform DWARF like this:

var1 -> const -> typedef (bar) -> int
                                   |
                                __tag1

var2 ----------> typedef (bar) -> int

(and it is what needs to be implemented in pahole to get BTF
qualifiers ordering expected by kernel, but the move is in the
opposite direction).

So the kernel expects tags to precede typedefs as well as qualifiers?
i.e. given this DWARF:

var1 -> const -> typedef (bar) -> int
                     ^             |
                     |           __tag1
                     |
var2 ----------------+

We have to transform to these two BTF type chains:

var1 -> __tag1 -> const -> typedef (bar) -> int
                                ^
                                |
var2 -> __tag1 -----------------+

Correct?

This is controlled by the following code (btf.c:btf_check_type_tags()):

https://elixir.bootlin.com/linux/latest/source/kernel/bpf/btf.c#L5349

It uses btf_type_is_modifier() utility function, which treats typedef as a modifier.

So, in theory the transformation moving tags past typedef is necessary. On the other hand, such transformation is not applied now, and this does not cause issues. Which suggests that there are no cases in practice where type tag follows typedef (and thus, this is not required for backwards compatibility).

Moving type tags cross typedef feels wrong.

Moving type tags past typedefs would also make C code reconstruction from BTF incomplete. Such reconstruction is used now by e.g. bpftool to create a vmlinux.h header with all kernel type definitions. So, I think type tags should not be moved past typedefs.

Moving type tags past typedefs would also make C code reconstruction from BTF incomplete. Such reconstruction is used now by e.g. bpftool to create a vmlinux.h header with all kernel type definitions. So, I think type tags should not be moved past typedefs.

Yes I agree. Tags should not be moved past typedefs.
In the GCC implementation we are not moving tags past typedefs.

It uses btf_type_is_modifier() utility function, which treats typedef as a modifier.

So, in theory the transformation moving tags past typedef is necessary. On the other hand, such transformation is not applied now, and this does not cause issues. Which suggests that there are no cases in practice where type tag follows typedef (and thus, this is not required for backwards compatibility).

It seems to me then that the appropriate fix is to remove the expectation of this transformation from the verifier (for the reasons above).

If it is not important for backwards compatibility, then I guess it should not be problematic to change in the kernel? (I say, as not a kernel hacker...)

I find it odd that BTF_KIND_TYPEDEF is included in btf_type_is_modifier(), seems like the expectation of this transformation is almost an unintended side effect of that.

Related to btf_type_is_modifier() issue, Not that depending on call site, sometimes typedef is skipped and sometimes are not. So we could keep btf_type_is_modifier() as is and modify the call size to not skip typedef in specific places. Or we could remove typedef from the function and add a 'or' in call side if typedef needs to be skipped. The following are a few examples in kernel/bpf/btf.c:

/* Similar to btf_type_skip_modifiers() but does not skip typedefs. */
static const struct btf_type *btf_type_skip_qualifiers(const struct btf *btf,

u32 id)

{

const struct btf_type *t = btf_type_by_id(btf, id);

while (btf_type_is_modifier(t) &&
       BTF_INFO_KIND(t->info) != BTF_KIND_TYPEDEF) {
        t = btf_type_by_id(btf, t->type);
}      
       
return t;

}

So we can add necessary changes (in specific context) to make the btf_type_tag approach with this patch work without breaking backward compatibility.

I see the following in the Summary:

Type tag for CVR modifier type

C:

volatile int __attribute__((btf_type_tag("__b"))) b;

DWARF:

0x31:   DW_TAG_variable
          DW_AT_name      ("b")
          DW_AT_type      (0x3c "volatile int")

0x3c:   DW_TAG_volatile_type
          DW_AT_type      (0x45 "int")

0x41:     DW_TAG_LLVM_annotation
            DW_AT_name    ("btf:type_tag")
            DW_AT_const_value     ("__b")

Basically, the btf_type_tag is put under 'volatile' type in dwarf. Is this what gcc also agrees with?

eddyz87 planned changes to this revision.May 2 2023, 9:31 AM

I see the following in the Summary:

Type tag for CVR modifier type

C:

volatile int __attribute__((btf_type_tag("__b"))) b;

DWARF:

0x31:   DW_TAG_variable
          DW_AT_name      ("b")
          DW_AT_type      (0x3c "volatile int")

0x3c:   DW_TAG_volatile_type
          DW_AT_type      (0x45 "int")

0x41:     DW_TAG_LLVM_annotation
            DW_AT_name    ("btf:type_tag")
            DW_AT_const_value     ("__b")

Basically, the btf_type_tag is put under 'volatile' type in dwarf. Is this what gcc also agrees with?

We decided against it but changes to force all type tags after CVR modifiers are not implemented, should probably add a wip tag.

eddyz87 updated this revision to Diff 522338.May 15 2023, 2:11 PM
eddyz87 edited the summary of this revision. (Show Details)

Changes to avoid attaching type tags to DWARF derived types for const/volatile/restrict qualifiers.

@dblaikie could you help take a look at this patch? Similar to https://reviews.llvm.org/D143966, this patch is the clang frontend change to support new btf_type_tag format, as we discussed and agreed with gcc community (https://lore.kernel.org/bpf/87r0w9jjoq.fsf@oracle.com/). Please let us know if you have any questions. Thanks!

Changes to avoid attaching type tags to DWARF derived types for const/volatile/restrict qualifiers.

I just tested this locally and can confirm it LGTM in terms of implementing the DWARF format we've been discussing and agreed upon.
Also compared against my WIP patches implementing the same in GCC, and looks like it is all consistent (apart from one known bug in my patches)

Changes to avoid attaching type tags to DWARF derived types for const/volatile/restrict qualifiers.

I just tested this locally and can confirm it LGTM in terms of implementing the DWARF format we've been discussing and agreed upon.
Also compared against my WIP patches implementing the same in GCC, and looks like it is all consistent (apart from one known bug in my patches)

Heads-up, it currently fails on the following example:

#define __rcu __attribute__((btf_type_tag("rcu")))

struct cred;
const struct cred *cred;
void commit_creds()
{
  cred = (typeof(*cred) __rcu *)(0);
}

I'm working on the fix.
Also I missed the fact that D145891 (actual BTF generation) needs adjustment as well.

eddyz87 updated this revision to Diff 523233.May 17 2023, 6:12 PM

Fix for case when CVR qualifier is behind type ignored by UnwrapTypeForDebugInfo().
E.g. for the following example:

const int *foo;
typeof(*foo) __attribute__((btf_type_tag("tag"))) buz;
eddyz87 edited the summary of this revision. (Show Details)May 23 2023, 2:53 PM

Hi @dblaikie ,

Could you please take a look at this revision (and the previous one in the stack, D143966) or suggest some other reviewer familiar with this part of the code base?

I haven't looked closely at this, but from a vague/quick reading, it sounds like this is creating annotations on certain type DINodes, but then moving them around to different types? It'd be nice if we could avoid that/create them in the right place in the first place.

I haven't looked closely at this, but from a vague/quick reading, it sounds like this is creating annotations on certain type DINodes, but then moving them around to different types? It'd be nice if we could avoid that/create them in the right place in the first place.

Hi @dblaikie, thank you for taking a look!

I assume you refer to the "placeholder" nodes created for annotated types.
I added these to handle the following example:

#define __tag1 __attribute__((btf_type_tag("tag1")))

struct st {
  struct st __tag1 *self;
} g;

Here I need to create two DICompositeType instances for struct st:

  • one with btf_type_tag annotations;
  • one without btf_type_tag annotations.

The AST for struct st __tag1 * looks as follows:

PointerType 0x55c364ead270 'struct st  __attribute__((btf_type_tag("tag1")))*'
`-BTFTagAttributedType 0x55c364ead210
      'struct st __attribute__((btf_type_tag("tag1")))' sugar
  `-ElaboratedType 0x55c364ead1a0 'struct st' sugar
    `-RecordType 0x55c364ead120 'struct st'
      `-Record 0x55c364ead0a0 'st'

(Note the BTFTagAttributedType node).

W/o type tags this examples is processed as follows:

  • getOrCreateType() is called for struct st
    • CreateType(const RecordType *) is called for struct st
      • CreateTypeDefinition(const RecordType *) is called for struct st
        • getOrCreateLimitedType(const RecordType *) is called for struct st
          • CreateLimitedType(const RecordType *) is called for struct st, which returns a TempDICompositeType (let's call it T)
          • TypeCache for struct st is set to T
        • members of struct st are processed:
          • getOrCreateType() is called for struct st *
            • CreateType(const PointerType *) is called for struct st *
              • getOrCreateType() is called for struct st, which returns T, because of TypeCache state.
        • all uses of T are replaced by complete type.

With type tags processing is similar, but pointee type for struct st * is actually (BTFTagAttributedType (RecordType 'struct st')).
The CreateType(const BTFTagAttributedType *) creates an instance of TempDIDerivedType with correct annotations (the placeholder) and base type corresponding to struct st.
Then, at CGDebugInfo::finalize() the placeholder is removed:

  • the copy of the base type with correct annotations is created;
  • all used of placeholder are replaced by this copy.

So, because at the point when CreateType(BTFTagAttributedType *) is called, the base type is not guaranteed to be complete yet there is a need to create some kind of temporary node.

An alternative to my current implementation would be creation of TempDICompositeType nodes, but this would require passing annotations down from CreateType(BTFTagAttributedType *) to CreateLimitedType(const RecordType *) via getOrCreateType() / CreateTypeNode(). But this seems very intrusive for the sake of a narrow feature used by a single backend.

Thus, I think that current implementation is better, because all btf_decl_tag annotations processing is grouped in two points:

  • CreateType(const BTFTagAttributedType *);
  • copyAnnotations() called from finalize().
eddyz87 updated this revision to Diff 533746.Jun 22 2023, 1:05 PM
eddyz87 edited the summary of this revision. (Show Details)

Rebase: replace usage of removeLocalCVRQualifiers by removeLocalFastQualifiers.

Hi @dblaikie ,

Could you please take a look at my reply here?
It would be really great if this could be sorted out before LLVM 17 release...

Hi @dblaikie,

Could you please take a look at my reply?

I haven't looked closely at this, but from a vague/quick reading, it sounds like this is creating annotations on certain type DINodes, but then moving them around to different types? It'd be nice if we could avoid that/create them in the right place in the first place.

Hi @dblaikie, thank you for taking a look!

I assume you refer to the "placeholder" nodes created for annotated types.
...

Yeah, seems unfortunate to break the abstractions and have to add members and finalizing for such a narrow feature too. Though I appreciate the complications with adding this during type building too.

So you get some bunch of annotations from the BTFTagAttributedType, then you build the underlying type (which might already be built/needed/fine because it's used without attributes in other places/needs to exist independently) - and then at the end you copy any of these types that are needed and put the annotations on them? Does the BTFTagAttributedType always have to apply to the immediate type that's going to be modified/mutated to have the attributes applied to it directly? Is the set of types that may have these attributes/annotations added small/closed? (struct types, void, anything else? could you add tags to int __tag *, for instance and get a DW_TAG_base_type for "int" with annotations on it?

If it's really generic/can apply to any type - but always the /immediate/ type (I guess with the special handling you've got to skip applying it to the DW_TAG_const_type, etc)...

What if you skipped getOrCreateType - and called into the CreateType dispatch below that? Since you never want a cached instance of a type, right? You want to create a new copy of the type you could then apply annotations to.

Except, I guess, in the instance you showed, where the type is being completed - can't go and create another one, because we're part-way through building this one... hmm, maybe you can, though? What happens if we start making a totally distinct copy of that same type? I guess there's some map that contains such partially completed types, and that map would get confused/break if we tried to build two types for the same type without adding in some extra key goo that contained the annotations... - maybe that wouldn't be too invasive, though?

Hmm - I guess what to put in the type cache would be an interesting question. Though that does raise other questions.

given this code:

#define __tag1 __attribute__((btf_type_tag("tag1")))

struct st {
  struct st *self;
};
struct st __tag1 x;

What's the expected DWARF here? x's type is a the attributed/annotated st type, but then does the self pointer inside there point to the attributed type or the unattributed type?

Maybe what you've got is the best of some bad options, but I'm not sure/trying to think it through. (@aprantl wouldn't mind your perspective - if it's too much to consume easily, let me know and I can make a best-effort at summarizing, maybe even better over a video chat if you've got a few minutes)

Hi @dblaikie

Thank you for the detailed response!

So you get some bunch of annotations from the BTFTagAttributedType, then you build the underlying type (which might already be built/needed/fine because it's used without attributes in other places/needs to exist independently) - and then at the end you copy any of these types that are needed and put the annotations on them?

Yes.

Does the BTFTagAttributedType always have to apply to the immediate type that's going to be modified/mutated to have the attributes applied to it directly? Is the set of types that may have these attributes/annotations added small/closed? (struct types, void, anything else? could you add tags to int __tag *, for instance and get a DW_TAG_base_type for "int" with annotations on it?

We decided against applying it to const/volatile/restrict, but it can be applied to struct types, void, "base" types ("int"), pointers and typedefs.

If it's really generic/can apply to any type - but always the /immediate/ type (I guess with the special handling you've got to skip applying it to the DW_TAG_const_type, etc)...

What if you skipped getOrCreateType - and called into the CreateType dispatch below that? Since you never want a cached instance of a type, right? You want to create a new copy of the type you could then apply annotations to.

Except, I guess, in the instance you showed, where the type is being completed - can't go and create another one, because we're part-way through building this one... hmm, maybe you can, though? What happens if we start making a totally distinct copy of that same type? I guess there's some map that contains such partially completed types, and that map would get confused/break if we tried to build two types for the same type without adding in some extra key goo that contained the annotations... - maybe that wouldn't be too invasive, though?

I made a prototype of such change, available here. The interesting function is CGDebugInfo::CreateType(const BTFTagAttributedType *Ty, ...).
Pseudo code for old version (this revision):

def CGDebugInfo::CreateType(const BTFTagAttributedType *Ty, ...):
  WrappedTy, Annotations = collect annotations are base type from Ty
  WrappedDI = getOrCreateType(WrappedTy, ...)
  Placeholder = create temporary placeholder node with
                references to WrappedDI and Annotations
  AnnotationPlaceholders.push_back(Placeholder)
  return Placeholder
  
def CGDebugInfo::finalize():
  ...
  for Placeholder in AnnotationPlaceholders:
    T = Placeholder.WrappedDI.Clone()
    T.replaceAnnotations(Placeholder.Annotations)
    replace Placeholder with T

Pseudo code for new version (link to my github from above):

def CGDebugInfo::CreateType(const BTFTagAttributedType *Ty, ...):
  WrappedTy, Annotations = collect annotations are base type from Ty
  Placeholder = empty temporary node
  TypeCache[Ty] = Placeholder
  WrappedDI = CreateType(UnwrapTypeForDebugInfo(WrappedTy), ...)
  T = WrappedDI.Clone()
  T.replaceAnnotations(Annotations)
  replace Placeholder with T
  return T

Comparing the "old" and "new" versions "old" appears to be less hacky to me: it does not break getOrCreateType() abstraction and it does not process members two times for circular types.

Note that this diff is important for the new version, without it CreateType won't actually create a new object in the circular type case. I checked call-graph for CreateType(RecordType *) and it looks like it is only invoked from getOrCreateType(), so getTypeOrNull() within it is not actually needed. It was added in commit 3b1cc9b85846 back in 2013. Test cases for that commit are passing.

given this code:

#define __tag1 __attribute__((btf_type_tag("tag1")))

struct st {
  struct st *self;
};
struct st __tag1 x;

What's the expected DWARF here? x's type is a the attributed/annotated st type, but then does the self pointer inside there point to the attributed type or the unattributed type?

The self should have type w/o tag, the x should have type with tag.

Maybe what you've got is the best of some bad options, but I'm not sure/trying to think it through. (@aprantl wouldn't mind your perspective - if it's too much to consume easily, let me know and I can make a best-effort at summarizing, maybe even better over a video chat if you've got a few minutes)

I appreciate the effort, if you and @aprantl decide that the call is necessary I can attend if you need me.

Hi @dblaikie,

Could you please take a look at my response?

Hi @dblaikie,

If you have some time, could you please take a look at my response?

Hi @dblaikie,

If you have some time, could you please take a look at my response?
It would be great if decision on this revision could be reached before next LLVM release.
The overall design of this thing was discussed with GCC BPF team and we really would like to have identical implementation for LLVM and GCC.

dblaikie accepted this revision.Nov 8 2023, 10:54 AM

Fair enough - all seems a bit unfortunate to be pushing these attributes through to places they don't technically apply to (both complicates the implementation, and might be confusing to users).

Thanks for trying the prototype - not clear what the right design is, so let's go with what you've got here.

clang/lib/CodeGen/CGDebugInfo.cpp
1230

use empty() rather than size() == 0

1233

probably write this as if (!WrappedDI)

This revision is now accepted and ready to land.Nov 8 2023, 10:54 AM
eddyz87 updated this revision to Diff 558086.Nov 13 2023, 2:57 PM
eddyz87 marked 2 inline comments as done.

Rebase, fixes for review feedback.

Fair enough - all seems a bit unfortunate to be pushing these attributes through to places they don't technically apply to (both complicates the implementation, and might be confusing to users).

Thank you for taking a look!
I'll wait till the last part of the commit stack (BPF-specific) is ironed out and apply all three patches.

eddyz87 added a subscriber: ast.Mon, Dec 4, 7:35 AM

Hi @dblaikie,

While doing some additional testing for this feature I found that one of the tools used in Linux kernel build, namely pahole, does not handle newly generated btf:type_tag annotations well. (pahole reads DWARF and emits BTF, a special debug format for data structures used in kernel).

Specifically, there are two issues:

  1. This revision uses DW_TAG_unspecified_type for void with annotations.
  2. This revision attaches annotations to DW_TAG_base_type.

These issues either preclude kernel compilation for pahole versions 1.23 and 1.24, or emit lots of warnings for pahole version 1.25 (current). Examples of error messages are at the bottom of this message.

After discussing this issue with BPF sub-system maintainers (@ast, @yonghong-song) and given that gcc had not landed support for this feature yet, we decided that it would be preferable to keep this feature in LLVM but use hidden flag to enable it. This would allow to:

  • keep things working as-is at the moment;
  • test changes to pahole using clang main and eventually release pahole 1.26 with necessary changes;
  • bump minimal pahole requirement for kernel builds to 1.26;
  • eventually remove hidden flag from LLVM and switch to always generating btf:type_tag annotation instead of btf_type_tag.

The drawback is that after last step it won't be possible to compile kernel with BTF support using old pahole and new clang.

The changes on top of currently approved revision are not big, the diff is at the bottom of this message. Temporarily, I pushed updated branch to my github.

Given that there are very few hidden options in clang/lib/CodeGen/, my understanding is that this change of course might be controversial.
Would you agree to such change to this revision?


Proposed diff on top of currently approved changes.

diff
diff --git a/clang/lib/CodeGen/CGDebugInfo.cpp b/clang/lib/CodeGen/CGDebugInfo.cpp
index 91a019027a35..5b4755821b0d 100644
--- a/clang/lib/CodeGen/CGDebugInfo.cpp
+++ b/clang/lib/CodeGen/CGDebugInfo.cpp
@@ -56,6 +56,16 @@
 using namespace clang;
 using namespace clang::CodeGen;
 
+// Temporarily hide new format for btf_type_tags / DW_TAG_LLVM_annotation
+// behind an option to allow transitory period for tooling dependent on
+// this annotation. The goal is to remove this flag after transitory period.
+static llvm::cl::opt<bool> BTFTypeTagV2(
+    "btf-type-tag-v2", llvm::cl::Hidden,
+    llvm::cl::desc("For __attribute__((btf_type_tag(...))) generate "
+                   "DW_TAG_LLVM_annotation tags with DW_AT_name 'btf:type_tag' "
+                   "attached to annotated type itself"),
+    llvm::cl::init(false));
+
 static uint32_t getTypeAlignIfRequired(const Type *Ty, const ASTContext &Ctx) {
   auto TI = Ctx.getTypeInfo(Ty);
   return TI.isAlignRequired() ? TI.Align : 0;
@@ -1265,6 +1275,9 @@ llvm::DIType *CGDebugInfo::CreateType(const BTFTagAttributedType *Ty,
   auto WrappedTy = collectBTFTypeTagAnnotations(
       CGM.getLLVMContext(), DBuilder, Annotations, Ty, "btf:type_tag");
 
+  if (!BTFTypeTagV2 || Annotations.empty())
+    return getOrCreateType(WrappedTy, Unit);
+
   // After discussion with GCC BPF team in [1] it was decided to avoid
   // attaching BTF type tags to const/volatile/restrict DWARF DIEs.
   // So, strip qualifiers from WrappedTy and apply those to a final
@@ -1276,9 +1289,6 @@ llvm::DIType *CGDebugInfo::CreateType(const BTFTagAttributedType *Ty,
   WrappedTy.removeLocalFastQualifiers(Qualifiers::CVRMask);
 
   llvm::DIType *WrappedDI = getOrCreateType(WrappedTy, Unit);
-  if (Annotations.empty())
-    return WrappedDI;
-
   if (!WrappedDI)
     WrappedDI = DBuilder.createUnspecifiedType("void");
 
@@ -1341,8 +1351,8 @@ llvm::DIType *CGDebugInfo::CreatePointerLikeType(llvm::dwarf::Tag Tag,
           CGM.getTypes().getTargetAddressSpace(PointeeTy));
 
   llvm::DINodeArray Annotations = nullptr;
-  if (auto *BTFAttrTy =
-          dyn_cast<BTFTagAttributedType>(PointeeTy.getTypePtr())) {
+  auto *BTFAttrTy = dyn_cast<BTFTagAttributedType>(PointeeTy.getTypePtr());
+  if (!BTFTypeTagV2 && BTFAttrTy) {
     SmallVector<llvm::Metadata *, 4> AnnotationsVec;
     collectBTFTypeTagAnnotations(CGM.getLLVMContext(), DBuilder, AnnotationsVec,
                                  BTFAttrTy, "btf_type_tag");

Plus changes in tests.


pahole error message details:

Because of (1) versions 1.23 and 1.24 of pahole fail to generate BTF and exit with error (thus preventing Kernel compilation):

Unsupported DW_TAG_unspecified_type(0x3b)
Encountered error while encoding BTF.

Because of (2) version 1.25 (current version) of pahole emits lots of warnings like:

die__create_new_base_type: DW_TAG_base_type WITH children!
die__create_new_tag: unspecified_type WITH children!
die__create_new_array: DW_TAG_LLVM_annotation (0x6000) @ <0xa8e7a> not handled!