This is an archive of the discontinued LLVM Phabricator instance.

[BTF] Add BTF DebugInfo
AbandonedPublic

Authored by yonghong-song on Oct 25 2018, 3:45 PM.

Details

Summary

This patch adds BPF Debug Format (BTF) as a standalone
LLVM debuginfo. The BTF related sections are directly
generated from IR. The BTF debuginfo is generated
only when the compilation target is BPF.

What is BTF?

First, the BPF is a linux kernel virtual machine
and widely used for tracing, networking and security.

https://www.kernel.org/doc/Documentation/networking/filter.txt
https://cilium.readthedocs.io/en/v1.2/bpf/

BTF is the debug info format for BPF, introduced in the below
linux patch

https://github.com/torvalds/linux/commit/69b693f0aefa0ed521e8bd02260523b5ae446ad7#diff-06fb1c8825f653d7e539058b72c83332

in the patch set mentioned in the below lwn article.

https://lwn.net/Articles/752047/

The BTF format is specified in the above github commit.
In summary, its layout looks like

struct btf_header
type subsection (a list of types)
string subsection (a list of strings)

With such information, the kernel and the user space is able to
pretty print a particular bpf map key/value. One possible example below:

Withtout BTF:
  key: [ 0x01, 0x01, 0x00, 0x00 ]
With BTF:
  key: struct t { a : 1; b : 1; c : 0}
where struct is defined as
  struct t { char a; char b; short c; };

How BTF is generated?

Currently, the BTF is generated through pahole.

https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=68645f7facc2eb69d0aeb2dd7d2f0cac0feb4d69

and available in pahole v1.12

https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=4a21c5c8db0fcd2a279d067ecfb731596de822d4

Basically, the bpf program needs to be compiled with -g with
dwarf sections generated. The pahole is enhanced such that
a .BTF section can be generated based on dwarf. This format
of the .BTF section matches the format expected by
the kernel, so a bpf loader can just take the .BTF section
and load it into the kernel.

https://github.com/torvalds/linux/commit/8a138aed4a807ceb143882fb23a423d524dcdb35

The .BTF section layout is also specified in this patch:
with file include/llvm/BinaryFormat/BTF.h.

What use cases this patch tries to address?

Currently, only the bpf instruction stream is required to
pass to the kernel. The kernel verifies it, jits it if configured
to do so, attaches it to a particular kernel attachment point,
and later executes when a particular event happens.

This patch tries to expand BTF to support two more use cases below:

(1). BPF supports subroutine calls.
     During performance analysis, it would be good to
     differentiate which call is hot instead of just
     providing a virtual address. This would require to
     pass a unique identifier for each subroutine to
     the kernel, the subroutine name is a natual choice.
(2). If a particular jitted instruction is hot, we want
     user to know which source line this jitted instruction
     belongs to. This would require the source information
     is available to various profiling tools.

Note that in a single ELF file,

. there may be multiple loadable bpf programs,
. for a particular to-be-loaded bpf instruction stream,
  its instructions may come from multiple PROGBITS sections,
  the bpf loader needs to merge them together to a single
  consecutive insn stream before loading to the kernel.

For example:

section .text: subroutines funcFoo
section _progA: calling funcFoo
section _progB: calling funcFoo

The bpf loader could construct two loadable bpf instruction
streams and load them into the kernel:

. _progA funcFoo
. _progB funcFoo

So per ELF section function offset and instruction offset
will need to be adjusted before passing to the kernel, and
the kernel essentially expect only one code section regardless
of how many in the ELF file.

What do we propose and Why?

To support the above two use cases, we propose to
add an additional section, .BTF.ext, to the ELF file
which is the input of the bpf loader. A different section
is preferred since loader may need to manipulate it before
loading part of its data to the kernel.

The .BTF.ext section has a similar header to the .BTF section
and it contains two subsections for func_info and line_info.

. the func_info maps the func insn byte offset to a func
  type in the .BTF type subsection.
. the line_info maps the insn byte offset to a line info.
. both func_info and line_info subsections are organized
  by ELF PROGBITS AX sections.

pahole is not a good place to implement .BTF.ext as
pahole is mostly for structure hole information and more
importantly, we want to pass the actual code to the kernel.

. bpf program typically is small so storage overhead
  should be small.
. in bpf land, it is totally possible that
  an application loads the bpf program into the
  kernel and then that application quits, so
  holding debug info by the user space application
  is not practical as you may not even know who
  loads this bpf program.
. having source codes directly kept by kernel
  would ease deployment since the original source
  code does not need ship on every hosts and
  kernel-devel package does not need to be
  deployed even if kernel headers are used.

LLVM is a good place to implement.

. The only reliable time to get the source code is
  during compilation time. This will result in both more
  accurate information and easier deployment as
  stated in the above.
. Another consideration is for JIT. The project like bcc
  (https://github.com/iovisor/bcc)
  use MCJIT to compile a C program into bpf insns and
  load them to the kernel. The llvm generated BTF sections
  will be readily available for such cases as well.

Design and implementation of emiting .BTF/.BTF.ext sections

The BTF debuginfo format is defined. Both .BTF and .BTF.ext
sections are generated directly from IR when both
"-target bpf" and "-g" are specified. Note that
dwarf sections are still generated as dwarf is used
by user space tools like llvm-objdump etc. for BPF target.

This patch also contains tests to verify generated
.BTF and .BTF.ext sections for all supported types, func_info
and line_info subsections. The patch is also tested
against linux kernel bpf sample tests and selftests.

Signed-off-by: Yonghong Song <yhs@fb.com>

Diff Detail

Repository
rL LLVM

Event Timeline

yonghong-song created this revision.Oct 25 2018, 3:45 PM

change a few struct/func test cases so names are shown as ascii instead of bytes.

aprantl added inline comments.Oct 26 2018, 9:25 AM
include/llvm/BinaryFormat/BTF.h
46 ↗(On Diff #171245)

These names don't follow the LLVM coding guidelines.
Perhaps these enums should have underlying types?
enum : uint32_t or something?

include/llvm/MC/MCObjectFileInfo.h
210 ↗(On Diff #171245)

/// BTF-specific sections. (BTF is ...)

380 ↗(On Diff #171245)

same as above

lib/CodeGen/AsmPrinter/BTFDebug.cpp
207 ↗(On Diff #171245)

There are suspiciously few comments in most of these functions. Is what they are doing really obvious?

419 ↗(On Diff #171245)

This function looks like it would fit better in DebugInfoFinder?http://llvm.org/doxygen/classllvm_1_1DebugInfoFinder.html

I would also check how Verifier.cpp does this and whether it could be factored out.

lib/CodeGen/AsmPrinter/BTFDebug.h
119 ↗(On Diff #171245)

please add doxygen comments to all member functions that aren't obvious getter/setters and such.

150 ↗(On Diff #171245)

This is a little long for an inline definition, perhaps move to .cpp file?

150 ↗(On Diff #171245)

doxygen comment?

Fixed various stylish issues pointed out by Adrian.

I did not use DebugInfoFinder or refactor Verifier.cpp to type visiting. There are a couple of reasons for this.

. DebugInfoFinder tries to collect all information for a module and process them together.
. The BTF type processing is very similar to Dwarf and Codeview. 
  In Dwarf, getDIE() is used to check whether a type->DIE has been done or not. If it has,
  that type will not be processed further. 
  Similarly, for CodeView, TypeIndices variable is used to check whether a type -> Index has
  been done or not. If it has, that type will be skipped.
. The current BTF type processing may skip some types during traversal. With DebugInfoFinder approach,
  it will be hard to find which type should be skipped without recursive traversing again.... It may be
  just some space waste, but not ideal.
. With DebugInfoFinder, the type information will need to be collected before the first subprogram
  is processed. Not sure whether all type information is available or not.

I feel refactoring Verifier.cpp to provide a common interface for visiting the type hierarchy is better.
But it looks like complicated and it probably should be done in a different patch. This common
interface could be used by BTF, CodeView and Dwarf as all of them have such a code pattern.
@aprantl what do you think? Maybe I missed something here?

@aprantl @dblaikie Just pinging. Any comments? Thanks!

@dblaikie / @echristo: Since this is specific to the BTF target, should the code be moved to that target's directory, so clients that don't build the BTF target don't need to pay the code size penalty? Is that something we do / can do at the AsmPrinter level?

Otherwise I think this is mostly looking good now.

yonghong-song updated this revision to Diff 171634.EditedOct 29 2018, 9:35 PM

. rebase on latest trunk as there is a conflict and people need to resolve manually otherwise
. fix a small bug related to StringRef usage where StringRef is constructed with a local std::string and after function return local std::string is deallocated, StringRef may not hold correct values.

kernel BTF verifier requires if a FUNC kind is a subprogram, all its arguments must be valid identifier.
Using SP RetainedNodes instead DbgValues so we can get all argument local variables even if they
are not used in the function body. Added a test case to test such a scenario.

the previous revision used SP retained nodes instead of dbgValues. So remove the use of header file DbgEntityHistoryCalculator.h.

If we want to move BTFDebug.h and BTFDebug.cpp to lib/Target/BPF, the header files BTFDebug.h and DebugHandlerBase.h needs to be exposed. In the future, other header files like DbgEntityHistoryCalculator.h may need to be exposed as well as BTF itself may evolve.

@dblaikie @echristo What is your opinion on this patch? Thanks!

emit record size for funcInfo and lineinfo to make funcinfo/lineinfo subsections extensible.

fix a bug in calculating the func_info/debug_info with newly added record size in subsections.

ast accepted this revision.Oct 31 2018, 4:28 PM

Eric, David, any objections to land it now?
It's orthogonal to everything else in the llvm land and we'll keep working on it for foreseeable future.
All comments were addressed and any further refactoring requests can be done later.
I'd like to avoid sitting on it for too long, since a bunch of folks in bpf community are waiting on it.
We have two large sets of kernel patches that depend on it.
We'd like to land llvm bits first, so our kernel test bots can do testing with latest llvm trunk and latest kernel tree asap.

This revision is now accepted and ready to land.Oct 31 2018, 4:28 PM

previous we may generate two line_info record for the first insn in a function: the line info based subprogram line and the line info based on the debugloc of the first insn. This is confusing. Now let us just generate one. for the first insn of the function, if the debugloc is available, use that to generate the line info. Otherwise, use the subprogram line to generate a line info record for BTF.

Hi, @echristo, just a ping. Not sure whether you got some time or not to look at this patch. Thanks!

@echristo, just ping. Did you get some time to take a look at this patch? Thanks!

Meanwhile, have you looked into what it would take to make the BTF part of AsmPrinter link only if the BTF target is being built?

@aprantl I just tried. The following is the change which works by using a stub BPFDebug when BPF target is not specified.

diff --git a/lib/CodeGen/AsmPrinter/BTFDebugStub.cpp b/lib/CodeGen/AsmPrinter/BTFDebugStub.cpp
new file mode 100644
index 00000000000..f27e9c5c75f
--- /dev/null
+++ b/lib/CodeGen/AsmPrinter/BTFDebugStub.cpp
@@ -0,0 +1,46 @@
+//===- llvm/lib/CodeGen/AsmPrinter/BTFDebugStub.cpp -------------*- C++ -*-===//
+//
+//                     The LLVM Compiler Infrastructure
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+///
+/// \file
+/// This file is a stub for BTF generation support. The real implementation
+/// is at BTFDebug.cpp which will be included if BPF target is built.
+///
+//===----------------------------------------------------------------------===//
+
+#include "DebugHandlerBase.h"
+
+namespace llvm {
+
+/// Collect and emit BTF information.
+class BTFDebug : public DebugHandlerBase {
+
+protected:
+  /// Gather pre-function debug information.
+  void beginFunctionImpl(const MachineFunction *MF) override {};
+
+  /// Post process after all instructions in this function are processed.
+  void endFunctionImpl(const MachineFunction *MF) override {};
+
+public:
+  BTFDebug(AsmPrinter *AP);
+
+  void setSymbolSize(const MCSymbol *Symbol, uint64_t Size) override {}
+
+  /// Process beginning of an instruction.
+  void beginInstruction(const MachineInstr *MI) override {
+    DebugHandlerBase::beginInstruction(MI);
+  };
+
+  /// Complete all the types and emit the BTF sections.
+  void endModule() override {};
+};
+
+BTFDebug::BTFDebug(AsmPrinter *AP) : DebugHandlerBase(AP) {}
+
+} // end namespace llvm
diff --git a/lib/CodeGen/AsmPrinter/CMakeLists.txt b/lib/CodeGen/AsmPrinter/CMakeLists.txt
index 24c5d33a120..e12fbf92c77 100644
--- a/lib/CodeGen/AsmPrinter/CMakeLists.txt
+++ b/lib/CodeGen/AsmPrinter/CMakeLists.txt
@@ -1,3 +1,14 @@
+# depends on whether BPF target is built or not
+# eitheer BTFDebug.cpp or BTFDebugStub.cpp is included,
+# but not both.
+set(LLVM_OPTIONAL_SOURCES "BTFDebug.cpp" "BTFDebugStub.cpp")
+list(FIND LLVM_TARGETS_TO_BUILD "BPF" idx)
+if( NOT idx LESS 0 )
+set(BTFDebug_File "BTFDebug.cpp")
+else()
+set(BTFDebug_File "BTFDebugStub.cpp")
+endif()
+
 add_llvm_library(LLVMAsmPrinter
   AccelTable.cpp
   AddressPool.cpp
@@ -24,7 +35,7 @@ add_llvm_library(LLVMAsmPrinter
   WinException.cpp
   CodeViewDebug.cpp
   WasmException.cpp
-  BTFDebug.cpp
+  ${BTFDebug_File}
 
   DEPENDS
   intrinsics_gen

This adds complexity to make file and it adds one more file BTFDebugStub.cpp. I personally think this is not a good approach.
But please let me know what you think and I can implement this if you think it is good.

@aprantl I just tried. The following is the change which works by using a stub BPFDebug when BPF target is not specified.

diff --git a/lib/CodeGen/AsmPrinter/BTFDebugStub.cpp b/lib/CodeGen/AsmPrinter/BTFDebugStub.cpp
new file mode 100644
index 00000000000..f27e9c5c75f
--- /dev/null
+++ b/lib/CodeGen/AsmPrinter/BTFDebugStub.cpp
@@ -0,0 +1,46 @@
+//===- llvm/lib/CodeGen/AsmPrinter/BTFDebugStub.cpp -------------*- C++ -*-===//
+//
+//                     The LLVM Compiler Infrastructure
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+///
+/// \file
+/// This file is a stub for BTF generation support. The real implementation
+/// is at BTFDebug.cpp which will be included if BPF target is built.
+///
+//===----------------------------------------------------------------------===//
+
+#include "DebugHandlerBase.h"
+
+namespace llvm {
+
+/// Collect and emit BTF information.
+class BTFDebug : public DebugHandlerBase {
+
+protected:
+  /// Gather pre-function debug information.
+  void beginFunctionImpl(const MachineFunction *MF) override {};
+
+  /// Post process after all instructions in this function are processed.
+  void endFunctionImpl(const MachineFunction *MF) override {};
+
+public:
+  BTFDebug(AsmPrinter *AP);
+
+  void setSymbolSize(const MCSymbol *Symbol, uint64_t Size) override {}
+
+  /// Process beginning of an instruction.
+  void beginInstruction(const MachineInstr *MI) override {
+    DebugHandlerBase::beginInstruction(MI);
+  };
+
+  /// Complete all the types and emit the BTF sections.
+  void endModule() override {};
+};
+
+BTFDebug::BTFDebug(AsmPrinter *AP) : DebugHandlerBase(AP) {}
+
+} // end namespace llvm
diff --git a/lib/CodeGen/AsmPrinter/CMakeLists.txt b/lib/CodeGen/AsmPrinter/CMakeLists.txt
index 24c5d33a120..e12fbf92c77 100644
--- a/lib/CodeGen/AsmPrinter/CMakeLists.txt
+++ b/lib/CodeGen/AsmPrinter/CMakeLists.txt
@@ -1,3 +1,14 @@
+# depends on whether BPF target is built or not
+# eitheer BTFDebug.cpp or BTFDebugStub.cpp is included,
+# but not both.
+set(LLVM_OPTIONAL_SOURCES "BTFDebug.cpp" "BTFDebugStub.cpp")
+list(FIND LLVM_TARGETS_TO_BUILD "BPF" idx)
+if( NOT idx LESS 0 )
+set(BTFDebug_File "BTFDebug.cpp")
+else()
+set(BTFDebug_File "BTFDebugStub.cpp")
+endif()
+
 add_llvm_library(LLVMAsmPrinter
   AccelTable.cpp
   AddressPool.cpp
@@ -24,7 +35,7 @@ add_llvm_library(LLVMAsmPrinter
   WinException.cpp
   CodeViewDebug.cpp
   WasmException.cpp
-  BTFDebug.cpp
+  ${BTFDebug_File}
 
   DEPENDS
   intrinsics_gen

This adds complexity to make file and it adds one more file BTFDebugStub.cpp. I personally think this is not a good approach.
But please let me know what you think and I can implement this if you think it is good.

LLVM code size is an actual problem, and since this code only makes sense when the BTF target is present, I think that we should make sure that it's only there when it's needed. Since you don't seem too happy with detecting the BTF target in the AsmPrinter CMake file, perhaps defining the interface in AsmPrinter, moving the implementation to the target directory and having a unique_ptr to the implementation in AsmPrinter would be more elegant?

@aprantl I just prototyped to move BTF generation to BPF backend. This indeed works. The BPF backend
already extends AsmPrinter (in BPFAsmPrinter), which makes changes easier.
The main changes are below:

  1. Add an override interface EmitBTF in AsmPrinter and change HandlerInfo to protected as EmitBTF will access this member.

diff --git a/include/llvm/CodeGen/AsmPrinter.h b/include/llvm/CodeGen/AsmPrinter.h
index 14bc0816782..19ff4c2270d 100644

  • a/include/llvm/CodeGen/AsmPrinter.h

+++ b/include/llvm/CodeGen/AsmPrinter.h
@@ -137,6 +137,7 @@ private:

static char ID;

+protected:

struct HandlerInfo {
  AsmPrinterHandler *Handler;
  const char *TimerName;

@@ -425,6 +426,11 @@ public:

/// instructions in verbose mode.
virtual void emitImplicitDef(const MachineInstr *MI) const;

+ / Targets can override this to output BTF.
+
/ Currently, only BPF target does this.
+ virtual void EmitBTF(const char *DbgTimerName, const char *DbgTimerDescription,
+ const char *GroupName, const char *GroupDescription) {};
+

  1. Move BTFDebug.{cpp,h} to lib/Target/BPF directory. And AsmPrinter.cpp change

looks like below:
diff --git a/lib/CodeGen/AsmPrinter/AsmPrinter.cpp b/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
index 1a5bbd06033..8b1332ba9a5 100644

  • a/lib/CodeGen/AsmPrinter/AsmPrinter.cpp

+++ b/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
@@ -12,8 +12,6 @@
===----------------------------------------------------------------------===

#include "llvm/CodeGen/AsmPrinter.h"
-#include "AsmPrinterHandler.h"
-#include "BTFDebug.h"
#include "CodeViewDebug.h"
#include "DwarfDebug.h"
#include "DwarfException.h"
@@ -37,6 +35,7 @@
#include "llvm/BinaryFormat/COFF.h"
#include "llvm/BinaryFormat/Dwarf.h"
#include "llvm/BinaryFormat/ELF.h"
+#include "llvm/CodeGen/AsmPrinterHandler.h"
#include "llvm/CodeGen/GCMetadata.h"
#include "llvm/CodeGen/GCMetadataPrinter.h"
#include "llvm/CodeGen/GCStrategy.h"
@@ -313,12 +312,8 @@ bool AsmPrinter::doInitialization(Module &M) {

  Handlers.push_back(HandlerInfo(DD, DbgTimerName, DbgTimerDescription,
                                 DWARFGroupName, DWARFGroupDescription));
}
  • const Triple &TT = TM.getTargetTriple();
  • if (TT.getArch() == Triple::bpfel || TT.getArch() == Triple::bpfeb) {
  • Handlers.push_back(HandlerInfo(new BTFDebug(this),
  • DbgTimerName, DbgTimerDescription,
  • BTFGroupName, BTFGroupDescription));
  • }

+
+ EmitBTF(DbgTimerName, DbgTimerDescription, BTFGroupName, BTFGroupDescription);

}
 
switch (MAI->getExceptionHandlingType()) {
  1. Since BPFDebug extends DebugHandlerBase, so I have to move lib/CodeGen/AsmPrinter/DebugHandlerBase.h

to include/llvm/CodeGen directory. The same happens lib/CodeGen/AsmPrinter/DbgEntityHistoryCalculator.h
and lib/CodeGen/AsmPrinter/AsmPrinterHandler.h to include/llvm/CodeGen directory.
As a result, a lot of files under lib/CodeGen/AsmPrinter needs to be changed to reflect the new include
location.

  1. The BPFAsmPrinter simply implements the override function.

diff --git a/lib/Target/BPF/BPFAsmPrinter.cpp b/lib/Target/BPF/BPFAsmPrinter.cpp
index 705211b486b..6dc25f7d708 100644

  • a/lib/Target/BPF/BPFAsmPrinter.cpp

+++ b/lib/Target/BPF/BPFAsmPrinter.cpp
@@ -16,6 +16,7 @@
#include "BPFInstrInfo.h"
#include "BPFMCInstLower.h"
#include "BPFTargetMachine.h"
+#include "BTFDebug.h"
#include "InstPrinter/BPFInstPrinter.h"
#include "llvm/CodeGen/AsmPrinter.h"
#include "llvm/CodeGen/MachineConstantPool.h"
@@ -49,6 +50,8 @@ public:

                           raw_ostream &O) override;
 
void EmitInstruction(const MachineInstr *MI) override;

+ void EmitBTF(const char *DbgTimerName, const char *DbgTimerDescription,
+ const char *GroupName, const char *GroupDescription) override;
};
} // namespace

@@ -131,6 +134,13 @@ void BPFAsmPrinter::EmitInstruction(const MachineInstr *MI) {

EmitToStreamer(*OutStreamer, TmpInst);

}

+void BPFAsmPrinter::EmitBTF(const char *DbgTimerName, const char *DbgTimerDescription,
+ const char *GroupName, const char *GroupDescription) {
+ Handlers.push_back(HandlerInfo(new BTFDebug(this),
+ DbgTimerName, DbgTimerDescription,
+ GroupName, GroupDescription));
+}
+
// Force static initialization.

I think this approach is better. But since it will move a few private header files to public include,
maybe you can comment whether this is okay or not. If it is considered at this point moving these private headers to public headers is premature,
I can implement cmake detection mechanism and we can refactor later if necessary.
Please let me know.

aprantl added a comment.EditedNov 16 2018, 10:39 AM

Since BPFDebug extends DebugHandlerBase, so I have to move lib/CodeGen/AsmPrinter/DebugHandlerBase.h to include/llvm/CodeGen directory. The same happens lib/CodeGen/AsmPrinter/DbgEntityHistoryCalculator.h and lib/CodeGen/AsmPrinter/AsmPrinterHandler.h to include/llvm/CodeGen directory. As a result, a lot of files under lib/CodeGen/AsmPrinter needs to be changed to reflect the new include location.

@dblaikie: any ideas for how to best get the layering right here?

Added BTFDebugStub.cpp and changed CMakefile such that if BPF target is not built, the stub will be used so we can reduce the binary size.
Also added BTF_KIND_FUNC_PROTO per discussion with Edward Cree.

@aprantl @dblaikie I just updated a revision with cmakefile change such that a simple stub in BTFDebugStub.cpp will be included if BPF target is not in build target list. This can be used if you decided that it is not good idea to expose private debuginfo headers at this moment. Thanks!

MaskRay requested changes to this revision.Nov 17 2018, 12:04 PM
MaskRay added inline comments.
lib/CodeGen/AsmPrinter/BTFDebug.cpp
142 ↗(On Diff #174499)

const DINode *

144 ↗(On Diff #174499)

cast if you don't think it can be null. LLVM_ENABLE_ASSERTIONS builds have assert() in cast()

195 ↗(On Diff #174499)

cast

490 ↗(On Diff #174499)

dyn_cast + assert can just be replaced by cast

lib/CodeGen/AsmPrinter/BTFDebug.h
171 ↗(On Diff #174499)

StringRef

199 ↗(On Diff #174499)

Must these maps be ordered?

202 ↗(On Diff #174499)

Consider StringMap

This revision now requires changes to proceed.Nov 17 2018, 12:04 PM

Address Fangrui's comments:

  • using "cast" instead of "dyn_cast" and removed a few assertions.
  • change argument of addString from std::string to StringRef.
  • change std::map to std::unordered_map in several instances as we do not really care its ordering.
  • Do not really understand the pros/cons of StringMap vs. std::map, but anyway changed FileContent to use StringMap.

@MaskRay does this patch addressed your comments?
@dblaikie @echristo I currently included BTFDebug.cpp in build only if BPF is in the target list. This is intended to reduce the final binary size. Another approach is to put BTFDebug.cpp in lib/Target/BPF directory. But that involves exposing several private headers in lib/CodeGen/AsmPrinter directory. if the refactoring is anticipated to involve nontrivial work, maybe we can first use the CMakefile tweaking approach and we can have another patch to deal with refactoring? Please let me know your thought. Thanks!

The kernel change has been merged to the kernel bpf-next tree.

https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=740baecd811f8e0be7475e952a0f3174b82d813b

It would be great if we can resolve this patch soon so people can just clone llvm trunk, without applying additional out-of-tree patches, for trying and testing purpose.
Thanks!

Pinging...
@MaskRay could you check whether your comment is properly addressed or not?
@dblaikie could you check whether the current way to handle code size is proper or not and what is the way forward?
Thanks both of you for attention in this patch!

MaskRay added a comment.EditedNov 26 2018, 12:33 AM

Sorry for my slow response. Since this revision proposes to extend the currently supported debug info formats from 2 to 3 (hope I didn't get it wrong), this comes with code size increase and greater maintenance cost (people have to ensure the 3 debug formats continue working when they make changes to the common interface), so please be a bit patient for the slower review process:)
I find that there hasn't been actual discussion on the mailing list https://lists.llvm.org/pipermail/llvm-dev/2018-November/127953.html What about making a follow-up to remind people of that RFC? http://llvmweekly.org/ is also a great place to attract more attention.

Since you are proposing a format similar to CTF (Compact C Type Format), which is used by dtrace. Users on the platforms supporting dtrace may shed some light: Solaris, Mac OS X (since 10.5), FreeBSD (2008), NetBSD (2010), OpenBSD (ongoing since 2016). ctfconvert ctfmerge ctfdump are tools used widely by dtrace but it doesn't seem they need direct support from the compiler.

I'm still learning these stuff to understand better about the whole picture.

Note that dwarf sections are still generated as dwarf is used by user space tools like llvm-objdump etc. for BPF target.

If it is the case, it seems there is no size benefit of object files. I also wonder if https://facebookmicrosites.github.io/bpf/blog/2018/11/14/btf-enhancement.html is related to this revision. That article compares the sizes of BTF and DWARF.
BTF type descriptors does not seem to improve much compared with DWARF, but the deduplicated result is impressive (137x reduction).
I am not an expert on DWARF but I have a feeling that we can probably get a similar reduction rate by leveraging a similar technique to deduplicate DWARF. (For the pass 2 mentioned in that article I think it is a form of DFA minimalization)

ast added a comment.Nov 26 2018, 8:50 AM

I find that there hasn't been actual discussion on the mailing list

at this point we're no longer looking for feedback on the format itself.
Since BTF already ships as part of linux kernel we cannot change the core format anymore.
Only future BTF extensions are up for discussion.
If you're interested in participating I suggest to subscribe to netdev mailing list.
That's where any BTF/BPF discussions were and will be happening.

BTF type descriptors does not seem to improve much compared with DWARF, but the deduplicated result is impressive (137x reduction).

BTF is tightly coupled with BPF ISA. It's not a traditional debug format like dwarf and not like Solari's CTF.
Linux kernel loads the BTF description and verifies it, since it's about to start using BTF for _safety_ decisions.
It's not debug-info only format.

The latest version of pahole already can convert dwarf to BTF.
But we need tight integration of BTF with llvm's BPF backend to be able to provide func_info and line_info directly into .BTF and .BTF.ext sections
(in this patch set) and for symbolic struct field references (future work).

We don't expect BTF to be used with anything but BPF ISA.

Thanks for the review! A few comments for some of your concerns.

I find that there hasn't been actual discussion on the mailing list https://lists.llvm.org/pipermail/llvm-dev/2018-November/127953.html What about making a follow-up to remind people of that RFC? http://llvmweekly.org/ is also a great place to attract more attention.

This is regarding to the format (using auto vs. using explicit type). I do not have a strong preference on this and I just follow some general guideline e.g., using it in iterators, using it if it would just repeat the same already-appeared type name, using a particular "auto" usage pattern just because somebody else already used it that way during my code study.... Let me know what I need to change in this regard.

I'm still learning these stuff to understand better about the whole picture.

The whole BPF and BTF discussion happens in netdev mailing list https://www.spinics.net/lists/netdev/. You are more than welcome to participate in the discussion. The commit message of this patch provided some limited information about what is BTF and why we want to implement in LLVM.

Note that dwarf sections are still generated as dwarf is used by user space tools like llvm-objdump etc. for BPF target.

The size benefit is not our goal. As stated in commit message, dwarf is used for user space application and BTF is used for kernel.
The size benefit you mentioned in https://facebookmicrosites.github.io/bpf/blog/2018/11/14/btf-enhancement.html is part of work
for compile-once run-everywhere bpf program, presented at linux plumbers conference 2018.
(http://vger.kernel.org/lpc-bpf.html, see the slide under "compile-once run-everywhere BPF program).

@aprantl @echristo @dblaikie Since Adrian has reviewed the patch extensively and the only remaining issue is for possible refactoring and there is no other change request, I would like to merge the patch within a few days. Please do voice out if you have any further change requests. After the merge, I will file a RFE in bugzilla for possible refactoring work so we do not lose track of it. Thanks!

This revision was not accepted when it landed; it landed in state Needs Review.Nov 30 2018, 8:25 AM
Closed by commit rL347999: [BTF] Add BTF DebugInfo (authored by yhs). · Explain Why
This revision was automatically updated to reflect the committed changes.
yonghong-song reopened this revision.Nov 30 2018, 9:02 AM

The patch has been reverted.

The previous brief commit (reverted minutes later) shows some compilation failures with the patch.
http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/40411
The reason is I implemented in a simpler BTFDebug class in BTFDebugStub.cpp which causing mangled name slightly different between BTFDebugStub.cpp and AsmPrinter.cpp.
This new updated patch fixed the issue and now it builds properly on x86 only platform with the failed test case configuration.

In one of my earlier comments, I mentioned it is possible to push the implementation into BPF target directory with the need
moving three private headers to include/llvm/CodeGen directory.
I did not get opinion whether that implementation is preferred or not, but I feel that maybe it is a good idea just to go ahead
with a diff which may facilitate discussion and move review/discussion faster.
I created another diff, https://reviews.llvm.org/D55752, which did this implementation.

@echristo @aprantl @dblaikie This patch has stuck in this state for more than one month and did not make progress.
I have suggested a possible way to move some header files to include directory and then we will be
able to move BTFDebug.{cpp,h} under BPF directory. I have implemented this approach with the
following three patches for easy to see pro and cons of two approaches:

(1) https://reviews.llvm.org/D55755, moving header files
(2) https://reviews.llvm.org/D55756, changes some fields from private to protected
(3) https://reviews.llvm.org/D55752, changes only under target BPF

The new patch series tried to minimize the changes outside the BPF target.
I will really appreciate if you can timely review this patch and related other "competing" patches.
Thanks!

yonghong-song abandoned this revision.Dec 19 2018, 8:46 AM

The alternative method (put most of implementation under lib/Target/BPF with slight refactoring at the core) is used. Abandon this one.