This is an archive of the discontinued LLVM Phabricator instance.

[clang][DebugInfo] Emit DISubprogram for extern functions with reserved names
ClosedPublic

Authored by eddyz87 on Oct 16 2022, 9:04 AM.

Details

Summary

Callsite DISubprogram entries are not generated for:

  • builtin functions;
  • external functions with reserved names (e.g. names starting from "__").

This limitation was added by the commit [1] as a workaround for the
situation described in [2] that triggered the IR verifier error.
The goal of the present commit is to lift this limitation by adjusting
the IR verifier logic.

The logic behind [1] is to avoid the following situation:

  • a DISubprogram is added for some builtin function;
  • there is some location where this builtin is also emitted by a transformation (w/o debug location);
  • the Verifier::visitCallBase sees a call to a function with DISubprogram but w/o debug location and emits an error.

Here is an updated example of such situation taken from [2]:

extern "C" int memcmp(void *, void *, long);

struct a { int b; int c; int d; };

struct e { int f[1000]; };

bool foo(e g, e &h) {
  // DISubprogram for memcmp is created here when [1] is commented out
  return memcmp(&g, &h, sizeof(e));
}

bool bar(a &g, a &h) {
  // memcmp might be generated here by MergeICmps
  return g.b == h.b && g.c == h.c && g.d == h.d;
}

This triggers the verifier error when:

  • compiled for AArch64: clang++ -c -g -Oz -target aarch64-unknown-linux-android21 test.cpp;
  • [1] check is commented out.

Instead of forbidding generation of DISubprogram entries as in [1]
one can instead adjust the verifier to additionally check if callee
has a body. Functions w/o bodies cannot be inlined and thus verifier
warning is not necessary.

E.g. llvm::InlineFunction requires functions for which
GlobalValue::isDeclaration() == false.

[1] 568db780bb7267651a902da8e85bc59fc89aea70
[2] https://bugs.chromium.org/p/chromium/issues/detail?id=1022296

Diff Detail

Event Timeline

eddyz87 created this revision.Oct 16 2022, 9:04 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 16 2022, 9:04 AM
eddyz87 published this revision for review.Oct 16 2022, 4:34 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptOct 16 2022, 4:34 PM

Hello Adrian, I've noticed that you was tagged as a reviewer in a somewhat related revision: https://reviews.llvm.org/D133060, so decided to tag you here.
Could you please take a look?

@aprantl The following example shows the difference between gcc and clang w.r.t. reserved names.

[$ ~/tmp3] cat t.c
extern void *__bpf_kptr_new(int, int, void *)
__attribute__((section(".ksyms")));
#define bpf_kptr_new(x) __bpf_kptr_new(x, 0, 0)

struct foo {
        int data;
};

int main(void)
{
        struct foo *f;

        f = bpf_kptr_new(0);
        return f->data;
}
[$ ~/tmp3] clang -target bpf -O2 -g -c t.c
[$ ~/tmp3] llvm-dwarfdump t.o | grep bpf_kptr_new
[$ ~/tmp3] gcc -O2 -g -c t.c
[$ ~/tmp3] llvm-dwarfdump t.o | grep bpf_kptr_new
                  DW_AT_abstract_origin (0x000000a3 "__bpf_kptr_new")
                DW_AT_linkage_name      ("__bpf_kptr_new")
                DW_AT_name      ("__bpf_kptr_new")
[$ ~/tmp3]

If not using reserved name, the function will appear in debuginfo/dwarf:

[$ ~/tmp3] cat t1.c
extern void *bpf_kptr_new_(int, int, void *)
__attribute__((section(".ksyms")));
#define bpf_kptr_new(x) bpf_kptr_new_(x, 0, 0)

struct foo {
        int data;
};

int main(void)
{
        struct foo *f;

        f = bpf_kptr_new(0);
        return f->data;
}
[$ ~/tmp3] clang -target bpf -O2 -g -c t1.c
[$ ~/tmp3] llvm-dwarfdump t1.o | grep bpf_kptr_new
                DW_AT_name      ("bpf_kptr_new_")
[$~/tmp3] gcc -O2 -g -c t1.c
[$ ~/tmp3] llvm-dwarfdump t1.o | grep bpf_kptr_new
                  DW_AT_abstract_origin (0x000000a3 "bpf_kptr_new_")
                DW_AT_linkage_name      ("bpf_kptr_new_")
                DW_AT_name      ("bpf_kptr_new_")
[$ ~/tmp3]

In BPF, we need the above extern function in debuginfo. This happens when a bpf program is calling a function defined in kernel. For example,
in kernel we have function

int tcp_drop(struct sk_buff *skb, ...) { ...
}
/* allow tcp_drop() function accessible to bpf programs */

In bpf program, we could have

extern int tcp_drop(struct sk_buff *sbk, ...);
BPF_PROG(...) {
    tcp_drop(...)

BPF verifier will need to check whether the signature of tcp_drop() in bpf program matched the one in kernel or not. In kernel, we DO have some global functions with '__' prefix in the function name. Although no reserved function name works, we want all legal names working so user won't be surprised why some kernel functions are not available to them.

}
aprantl accepted this revision as: aprantl.Oct 17 2022, 10:53 AM

I think this is reasonable. Ideally the LLVM and Clang patches should be two independent commits.
Please wait a few days for others to chime in though. Do you have any idea how much this will grow the debug info for something like, e.g., clang?

This revision is now accepted and ready to land.Oct 17 2022, 10:53 AM

I think this is reasonable. Ideally the LLVM and Clang patches should be two independent commits.
Please wait a few days for others to chime in though. Do you have any idea how much this will grow the debug info for something like, e.g., clang?

Thanks for the review,
I will split the patch and provide the measurements, most likely tomorrow.

Hmm - this does mean linking IR can produce invalid code, though, right (you link in a definition of the function, so what was valid is now invalid - because it now has a definition, can be inlined, etc)? Is that new? concerning?

Hmm - this does mean linking IR can produce invalid code, though, right (you link in a definition of the function, so what was valid is now invalid - because it now has a definition, can be inlined, etc)? Is that new? concerning?

@dblaikie do you have a concrete example to illustrate the problem?

Hmm - this does mean linking IR can produce invalid code, though, right (you link in a definition of the function, so what was valid is now invalid - because it now has a definition, can be inlined, etc)? Is that new? concerning?

As far as I understand this discussion the check in question is "best effort".
My change further narrows conditions when verifier would report an error, thus it should not add any new failures to the existing code.
But yes, hypothetically there might be a situation when old version of the check would have caught a miss-behaving transformation at an earlier stage (before linking IR rather then after).

eddyz87 updated this revision to Diff 468514.Oct 18 2022, 6:13 AM

Split the original commit in two to separate clang and llvm parts.

dblaikie accepted this revision.Oct 18 2022, 8:42 AM

Hmm - this does mean linking IR can produce invalid code, though, right (you link in a definition of the function, so what was valid is now invalid - because it now has a definition, can be inlined, etc)? Is that new? concerning?

@dblaikie do you have a concrete example to illustrate the problem?

One translation unit with a call to a function declaration, that call has no !dbg attachment.
Another translation unit with the definition of that declared function.

As-is, this would not cause any verifier error or backend debug info assertion.
Link the two together and now it's a verifier error - and, depending on the contents of the function definition, maybe a backend debug info assertion.

Hmm - this does mean linking IR can produce invalid code, though, right (you link in a definition of the function, so what was valid is now invalid - because it now has a definition, can be inlined, etc)? Is that new? concerning?

As far as I understand this discussion the check in question is "best effort".

Fair point

My change further narrows conditions when verifier would report an error, thus it should not add any new failures to the existing code.

No, the opposite.

But yes, hypothetically there might be a situation when old version of the check would have caught a miss-behaving transformation at an earlier stage (before linking IR rather then after).

Right - now things that were invalid categorically, are now valid, unless you link them. Which is a bit tricky.

Ah well, as you say - it's best effort.

... Do you have any idea how much this will grow the debug info for something like, e.g., clang?

I made the measurements but I'm not sure what to make of these numbers (e.g. is it good or bad).
There were no changes in executable size for Clang (in Debug mode, I'll try in Release mode a bit later), so I tried the Linux kernel with debug symbols enabled instead.
Here are executable sizes in three modes:

  • (main) w/o my changes;
  • (a) with these two commits;
  • (b) with these two commits but with CalleeDecl->getBuiltinID() != 0 in CGDebugInfo::EmitFuncDeclForCallSite put back.
abmaina - mainb - main
vmlinux size (bytes)667,827,456667,365,792663,996,200+3,831,256+3,369,592
Debug section size (bytes)565,548,585565,086,931561,717,339+3,831,246+3,369,592
#call site DIEs311,217303,909242,933+68,284+60,976
#call site parameter DIEs369,405361,080301,738+67,667+59,342

The BTF use-case does not care about the builtin functions, so I can change this revision to variant (b) to save some space.
What do you think?

probinson added inline comments.
llvm/lib/IR/Verifier.cpp
3472

(Interposable functions are not inlinable, neither are functions without
definitions.)

eddyz87 updated this revision to Diff 468611.Oct 18 2022, 10:38 AM

Comment fixed.

Nice. This also fixes DW_AT_call_all_calls for functions which call builtins/reserved name functions.

llvm/test/Verifier/callsite-dbgloc.ll
52

to not match other functions named @i*

MaskRay added inline comments.Oct 18 2022, 6:39 PM
llvm/lib/IR/Verifier.cpp
3475

Removing this check does not break any check-llvm check-clang test.

eddyz87 updated this revision to Diff 469081.Oct 19 2022, 4:46 PM
eddyz87 marked an inline comment as done.

Fixed test case to fail if !Call.getCalledFunction()->isDeclaration() condition is removed in Verifier::visitCallBase.

eddyz87 marked 2 inline comments as done.Oct 19 2022, 4:50 PM
eddyz87 added inline comments.
llvm/lib/IR/Verifier.cpp
3475

Well, that's embarrassing, thank you for catching it!
I've updated llvm/test/Verifier/callsite-dbgloc.ll by adding !dbg metadata for @i. Know the test fails if the check above is commented out.