This is an archive of the discontinued LLVM Phabricator instance.

[RFC][DebugInfo] emit user specified address_space in dwarf
Needs Review · Public

Authored by yonghong-song on Oct 24 2019, 9:49 AM.

Details

Summary

The RFC intends to kick off the discussion on how to support
user defined address_space attributes in dwarf for linux kernel.

The use case:

In linux kernel, under certain make flags, pointers may be
annotated with additional address_space information. The source
code is below:

https://github.com/torvalds/linux/blob/master/include/linux/compiler_types.h

For example, we have

  # define __user   __attribute__((noderef, address_space(1)))
  # define __kernel __attribute__((address_space(0)))
  # define __iomem  __attribute__((noderef, address_space(2)))
  # define __percpu __attribute__((noderef, address_space(3)))
  # define __rcu    __attribute__((noderef, address_space(4)))

Currently, the address_space annotation is not used when compiling
a normal (production) kernel. It is typically used during development
and used by 'sparse' tool to check proper pointer usage.
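
A minimal sketch of this conditional annotation, assuming the `__CHECKER__` macro that sparse predefines is what switches the attributes on. `read_annotated` is a hypothetical helper added only for illustration and is not part of the kernel or of this patch:

```c
/* Sketch: the annotation is active only when the checker macro is defined. */
#ifdef __CHECKER__
# define __user __attribute__((noderef, address_space(1)))
#else
# define __user /* expands to nothing in a normal build */
#endif

/* Hypothetical helper: reads through a pointer carrying the annotation.
 * In a normal build __user is empty, so this is an ordinary dereference;
 * with the attribute active, a checker can flag the direct dereference. */
static int read_annotated(const int __user *p)
{
    return *p;
}
```

In a normal build the compiler never sees the address space, which is why none of it reaches the debug info today.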

Now there is a growing need to put address_space info into debug info,
e.g., dwarf, in linux binary to help automatically differentiate
pointers accessing kernel and user memories in order to avoid
explicit user annotations like below:

http://lkml.iu.edu/hypermail/linux/kernel/1905.1/05750.html

Other tracing tools like bpftrace, bcc would have similar issues.

The current patch

The proposal here is simple: for a user specified address_space, just
add it to DebugInfo, and it will then be automatically emitted into
dwarf. For example,

-bash-4.4$ cat t.c
#define __user __attribute__((noderef, address_space(1)))
void __user *g;
extern int __user *foo(int *a, int __user *b);
int __user *foo(int *a, int __user *b) { return b; }
-bash-4.4$ clang -O2 -g -c t.c
-bash-4.4$ llvm-dwarfdump t.o
...
0x0000002a:   DW_TAG_variable
              DW_AT_name      ("g")
              DW_AT_type      (0x00000042 "*")
0x00000042:   DW_TAG_pointer_type
              DW_AT_address_class     (0x00000001)
0x00000060:     DW_TAG_formal_parameter
                DW_AT_name    ("a")
                DW_AT_type    (0x00000091 "int*")
0x00000071:     DW_TAG_formal_parameter
                DW_AT_name    ("b")
                DW_AT_type    (0x00000081 "int*")
0x00000081:   DW_TAG_pointer_type
              DW_AT_type      (0x0000008a "int")
              DW_AT_address_class     (0x00000001)
0x00000091:   DW_TAG_pointer_type
              DW_AT_type      (0x0000008a "int")
              DW_AT_address_class     (0x00000000)

The DW_AT_address_class attribute tells the debugger how to
dereference the pointer on the linux kernel platform.
Note that on linux, dereferencing kernel and user
addresses may require different pre-setting of
certain registers.
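
As a rough sketch of how a consumer of this debug info might act on the attribute: the values 0 and 1 are the ones shown in the dump above, while the enum names and `access_for_class` are hypothetical and do not come from the patch or from any debugger API.

```c
#include <stdint.h>

/* Hypothetical labels for the values seen in the dump above:
 * 0 on the plain pointer, 1 on the __user pointer. */
enum addr_class {
    ADDR_CLASS_DEFAULT = 0,
    ADDR_CLASS_USER    = 1,
};

enum access_kind {
    ACCESS_KERNEL,  /* read through the kernel-memory path */
    ACCESS_USER,    /* read through the user-memory path */
    ACCESS_UNKNOWN, /* unrecognized class: refuse to guess */
};

/* Map a DW_AT_address_class value to the access path a tool should use. */
static enum access_kind access_for_class(uint64_t address_class)
{
    switch (address_class) {
    case ADDR_CLASS_DEFAULT:
        return ACCESS_KERNEL;
    case ADDR_CLASS_USER:
        return ACCESS_USER;
    default:
        return ACCESS_UNKNOWN;
    }
}
```

A tracing tool could make this decision once per pointer type instead of relying on explicit user annotations.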

Question:

The new DW_AT_address_class value 0x1 introduced here for BPF
and X86 is really an OS-level concept, but I do not know how to
express it as such. Currently, it is specified at each target level,
which also allows more flexibility in DW_AT_address_class values.

Event Timeline

yonghong-song created this revision. Oct 24 2019, 9:49 AM

The DW_AT_address_class attribute is intended to be used for target architectures where a simple address value is ambiguous, and the debugger needs additional information to resolve the ambiguity. The canonical example is something like i386 with its segmented addresses, and an address value has different interpretations depending on whether it is (for example) a "near" or "far" pointer.
This would be in contrast to something like the VAX, where the upper two bits of the address identify a subdivision of the flat address space (VAX calls these P0, P1, S0, where P0/P1 are Process [user] spaces and S0 is system/kernel space); however, there is no ambiguity because addresses are always 32-bit and the same address value can't be interpreted different ways.

The address spaces envisioned by the Linux kernel appear to be more informational and not descriptive of hardware characteristics. Therefore, I would be inclined to say it's not appropriate to propagate them into DW_AT_address_class.

ast added a comment. Edited Oct 24 2019, 8:17 PM

The address spaces envisioned by the Linux kernel appear to be more informational and not descriptive of hardware characteristics.

From the kernel pov the __user and normal are two different address spaces that must be accessed via different kernel primitives.
User access needs stac/clac on x86 and other precautions.
iirc other architectures have co-processor address space that needs its own load/store equivalents.
__percpu is also different address space. It's roughly equivalent to __thread in user space.

@probinson Thanks for the input! That is my concern too: mixing the user defined and language defined address spaces may not be a good idea and may actually cause confusion. This is exactly what this RFC is for. Let us try a different dwarf encoding and then we can continue the discussion.

In D69393#1720816, @ast wrote:

The address spaces envisioned by the Linux kernel appear to be more informational and not descriptive of hardware characteristics.

From the kernel pov the __user and normal are two different address spaces that must be accessed via different kernel primitives.
User access needs stac/clac on x86 and other precautions.
iirc other architectures have co-processor address space that needs its own load/store equivalents.
__percpu is also different address space. It's roughly equivalent to __thread in user space.

Ah, it has been so long since I worked on priv code that I had forgotten about the use of user-context access instructions. That would be a reasonable use of the DW_AT_address_class attribute for kernel code, so that a kernel-mode debugger would do the right thing with user-space addresses.

For proper separation of concerns, it would be best to define values to use for the DWARF attribute independently of whatever conventions the Linux kernel might have. What the debugger needs to know is how to dereference the pointers; this may be different than how the kernel chooses to classify addresses.
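
One way to picture that separation is a translation step between the kernel's source-level address_space numbers and a separately chosen set of DWARF values. The sketch below is illustrative only: the source-level numbers come from the macro list in the summary, while the `dw_class` values and `dwarf_class_for` are hypothetical and not taken from the DWARF specification or from this patch.

```c
/* Kernel source-level address_space numbers, as listed in the summary. */
enum kernel_as {
    KAS_KERNEL = 0,
    KAS_USER   = 1,
    KAS_IOMEM  = 2,
    KAS_PERCPU = 3,
    KAS_RCU    = 4,
};

/* Hypothetical DWARF-side values describing how to dereference. */
enum dw_class {
    DWC_NONE        = -1, /* emit no DW_AT_address_class at all */
    DWC_PLAIN       = 0,  /* ordinary load */
    DWC_USER_ACCESS = 1,  /* needs the user-access path */
};

/* Translate the kernel's classification into a dereference rule. */
static enum dw_class dwarf_class_for(unsigned source_as)
{
    switch (source_as) {
    case KAS_KERNEL:
    case KAS_RCU: /* assumption: still an ordinary kernel address */
        return DWC_PLAIN;
    case KAS_USER:
        return DWC_USER_ACCESS;
    default:
        return DWC_NONE; /* iomem, percpu, unknown: unmapped in this sketch */
    }
}
```

Two kernel classifications can collapse to the same dereference rule here, which is the sense in which the debugger's view may differ from the kernel's.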

yonghong-song edited the summary of this revision.

minor update to only support address_space 1 for user pointer

While experimenting with linux kernel code, I found that clang does not allow the address_space attribute for function pointers; specifically, in clang/lib/Sema/SemaType.cpp,

// ISO/IEC TR 18037 S5.3 (amending C99 6.7.3): "A function type shall not be
// qualified by an address-space qualifier."
if (Type->isFunctionType()) {
  S.Diag(Attr.getLoc(), diag::err_attribute_address_function_type);
  Attr.setInvalid();
  return;
}

But the linux kernel annotates signal handling function pointers with an address space to indicate that they point into user space.

typedef __signalfn_t __user *__sighandler_t;
typedef __restorefn_t __user *__sigrestore_t;

Such an attribute makes sense for linux, since the signal handler code indeed resides in user space and the kernel pointer
merely points to user memory here.

But such attributes are not allowed for function pointers.

Maybe somebody can give some context about this particular ISO/IEC TR 18037 specification? cc @probinson

I have no insight into the standards committees. @rsmith is your guru here.

jemarch added a subscriber: jemarch. Nov 4 2019, 2:29 AM

For what it's worth, in our downstream fork of Clang we have added the ability for function types to possess an address space.

Though technically, even in our fork it is not possible to actually declare functions/function pointers with an address space; the target-specified AS is implicitly applied to all functions.