This is an archive of the discontinued LLVM Phabricator instance.

Generate DIFile with main program if source is not available
AbandonedPublic

Authored by yonghong-song on Oct 16 2018, 9:56 AM.

Details

Summary

A llvm segfault happens when I tried to add "-g -gdwarf-5 -gembed-source"
in bcc in order to get soruce code into IR and dwarf.

A little bit background, bcc (https://github.com/iovisor/bcc)
utilizes llvm MCJIT to generate BPF insns and load them
to the kernel for execution. Several files passed to clang
are memory mapped including the main program /virtual/main.c.

I did some analysis on why segfault happens and whether we did
get all file sources. And I found that we did not get source
for one instance:

!32 = !DIFile(filename: "/virtual/main.c", directory: "/usr/src/kernels/4.11.3-70_fbk18_4116_g1cf3f1a0ca4f")

After some debugging, I found this patch can fix the problem.
Basically, when the source is not available to generate DIFile
use the main program.

You can get more details of the bug, IR, cc1 flags, segfault
stack and how to reproduce at this commit:

https://github.com/yonghong-song/bcc/commit/7ac342e05468e60138d61e0e41691ed2f98bd929

  Signed-off-by: Yonghong Song <yhs@fb.com>

Diff Detail

Repository
rC Clang

Event Timeline

yonghong-song created this revision.Oct 16 2018, 9:56 AM

Hi, @vsk @HsiangKai @ABataev I found a bug in clang when trying to use "-g -gdwarf-5 -gembed-source" for
bcc MCJIT based clang/llvm compilation. This patch fixed the issue but I am not sure whether this is the
correct fix or not. Please help take a look and advise if the fix is not correct and there is a better one. Thanks!

Couldn't you just pass -main-file-name to cc1 instead?

We already have -main-file-name in cc1.
clang -cc1 -triple x86_64-unknown-linux-gnu ... -main-file-name main.c ... -o main.bc -x c /virtual/main.c -faddrsig
Is this expected?

BTW, we just pass the normal C flags to the driver like

vector<const char *> flags_cstr({"-O0", "-O2", "-emit-llvm", "-I", dstack.cwd(),
                                 "-D", "__BPF_TRACING__",
                                 "-Wno-deprecated-declarations",
                                 "-Wno-gnu-variable-sized-type-not-at-end",
                                 "-Wno-pragma-once-outside-header",
                                 "-Wno-address-of-packed-member",
                                 "-Wno-unknown-warning-option",
                                 "-fno-color-diagnostics",
                                 "-fno-unwind-tables",
                                 "-fno-asynchronous-unwind-tables",
                                 "-x", "c", "-c", abs_file.c_str()});
flags_cstr.push_back("-g");
flags_cstr.push_back("-gdwarf-5");
flags_cstr.push_back("-gembed-source");
// A few other flags

We call drv.BuildCompilation to get the cc1 flags.

In the above /virtual/main.c is a memory mapped file. The invocation0.getPreprocessorOpts().addRemappedFile(...) adds the mapping
for the compilation.

@aprantl - yeah, not sure I have any big feels about this (nor do I fully understand it)

@yonghong-song - could you explain this maybe in a bit more detail. What behavior does this fix provide? (compared to behavior of existing working cases that don't hit this bug, for example)

Sure. Let me provide a little bit more context and what I want to achieve:

. I have a tool, called bcc (https://github.com/iovisor/bcc) which uses clang CompilerInvocation interface and
  MCJIT to generates BPF code and load into kernel
. Several files (the main.c and a few others headers) are clang memory mapped.
. The particular fix here is related to https://reviews.llvm.org/D53261, getting source code into BTF, but
  before that, based on that particular implementation, the source code needs to be in IR.
  What I found is that for the memory mapped /virtual/main.c file, there is one DIFile entry in
  generated IR, the associated 'source' is empty, which actually caused a seg fault later.

  Not that this bug itself does not need https://reviews.llvm.org/D53261.

So without this fix, bcc tool will seg fault.
With this fix, bcc tool works properly and all DIFile entry has proper source codes embedded, which
if coupled with IR->BTF or Dwarf->BTF should generate correct BTF debug info.

Sure. Let me provide a little bit more context and what I want to achieve:

. I have a tool, called bcc (https://github.com/iovisor/bcc) which uses clang CompilerInvocation interface and
  MCJIT to generates BPF code and load into kernel
. Several files (the main.c and a few others headers) are clang memory mapped.
. The particular fix here is related to https://reviews.llvm.org/D53261, getting source code into BTF, but
  before that, based on that particular implementation, the source code needs to be in IR.
  What I found is that for the memory mapped /virtual/main.c file, there is one DIFile entry in
  generated IR, the associated 'source' is empty, which actually caused a seg fault later.

  Not that this bug itself does not need https://reviews.llvm.org/D53261.

So without this fix, bcc tool will seg fault.
With this fix, bcc tool works properly and all DIFile entry has proper source codes embedded, which
if coupled with IR->BTF or Dwarf->BTF should generate correct BTF debug info.

Thanks for the explanation/context.

Any chance of a test case? I'm not sure how the current virtual filesystem support is tested & whether it could be extended/used to cover this issue.

@dblaikie This is not blocking me yet since the main BTF support is not in llvm yet. I will be in a conference next week. I will dig into this issue the week after, to have a smaller test case and try to root cause a little further. Will update here as soon as I got anything. Thanks!

@dblaikie I got a standalone reproducible test now.

-bash-4.4$ pwd
/home/yhs/work/bcc/clang-crash
-bash-4.4$ ls
clang-test.cpp  compile.sh  README  ttest.c  ttest.h
-bash-4.4$

README has details how to reproduce the issue. Basically, step 1, build a llvm with x86/bpf support; step 2, run "compile.sh"; step 3, run the binary "clang-test" produced in step 2 and you will get IR with missing file content and segfault.

I did not investigate further, but my above patch does not really fix the issue. So the right fix should be a different one. There could be multiple issues with clang remapped files.

It will be great if you or others can help resolve these issues!

  2. $ ./compile.sh
  3. $ ./clang-test

The test will print IR before JITing for BPF backend.
The IR has something like:
   ...
!8 = !DIFile(filename: "/virtual/ttest.h", directory: "/tmp", checksumkind: CSK_MD5, checksum: "6a615c0cc48c46cdbe7d9917c635d292", source: "\0A#define BPF_PERF_OUTPUT2(_name) \5C\0Astruct _name##_table_t { \5C\0A  int key; \5C\0A  unsigned leaf; \5C\0A  int (*perf_submit) (void *, void *, unsigned); \5C\0A  int (*perf_submit_skb) (void *, unsigned, void *, unsigned); \5C\0A  unsigned max_entries; \5C\0A}; \5C\0A__attribute__((section(\22maps/perf_output\22))) \5C\0Astruct _name##_table_t _name = { .max_entries = 0 }\0A\0Aint g;\0A")
!9 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
!10 = !DIFile(filename: "/virtual/ttest.c", directory: "/tmp")
!11 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "probe_table_t", file: !10, line: 3, size: 256, elements: !12)
   ...
Segmentation fault (core dumped)

Note that the "!10" DIFile does not have source code.

yonghong-song added a comment.EditedNov 14 2018, 8:28 AM

The above segment fault looks like

#0  0x00000000000000a1 in ?? ()
#1  0x0000000001490d63 in clang::SourceManager::~SourceManager() ()
#2  0x00000000005ad0ee in clang::CompilerInstance::~CompilerInstance() ()
#3  0x000000000056ad1b in main ()

In my example, I did not implement the customized SourceManager while the real bcc implements it
https://github.com/iovisor/bcc/blob/master/src/cc/bpf_module.cc#L73

I will try to implement this later. With the attached tar file, you should be still observe the missing
file contents. And my patch does fix this particular issue.

If I am using regular files instead of remapped files, I did not see the segfault. So is it possible that this is a clang issue?

Thanks for this - though it looks like the test program hits an assertion failure (for me at least - before it gets to the interesting point.

clang-test: /usr/local/google/home/blaikie/dev/llvm/src/lib/ExecutionEngine/MCJIT/MCJIT.cpp:204: virtual void llvm::MCJIT::generateCodeForModule(llvm::Module *): Assertion `M->getDataLayout() == getDataLayout() && "DataLayout Mismatch"' failed.

Also, while this is useful for me to understand what's going on (thanks!) & it looks like might lead us to a better understanding of where the bug(s) are - this change would also need a standalone test case in clang without the need to build a new/separate binary. But I imagine this could be tested in Clang by running clang on this source file & checking the IR that clang generates? Or does clang not generate the interesting IR itself, only when invoked through some other codepath you've demonstrated in the clang-test.cpp does this come up?

You compiled with the assertion on, which triggered the issue. I should have thought about this. Sorry.
For error,

clang-test: /usr/local/google/home/blaikie/dev/llvm/src/lib/ExecutionEngine/MCJIT/MCJIT.cpp:204: virtual void llvm::MCJIT::generateCodeForModule(llvm::Module *): Assertion `M->getDataLayout() == getDataLayout() && "DataLayout M ismatch"' failed.

It is because the clang compiler flag is compiled with native arch. which is x86 and later on it is changed to generatee BPF code. Could you try to temporarily build without assertion on? I will try to modify the test (or you could do that too) to generate native code to see whether the bug is still there or not.

Just attached a new test case (clang-crash-x86.tar) which works even if the compiler is compiled with assertion on. JIT now generates x86-64 code instead of BPF code.

The error message looks like:

Failure value returned from cantFail wrapped call
UNREACHABLE executed at /home/yhs/work/llvm/include/llvm/Support/Error.h:732!
Aborted (core dumped)

The core stack looks like

Core was generated by `./clang-test'. 
Program terminated with signal SIGABRT, Aborted. 
#0  0x00007f47542e0277 in raise () from /lib64/libc.so.6 
(gdb) where
#0  0x00007f47542e0277 in raise () from /lib64/libc.so.6
#1  0x00007f47542e1968 in abort () from /lib64/libc.so.6
#2  0x00000000045f66b6 in llvm::llvm_unreachable_internal (msg=0x5df9ba0 "Failure value returned from cantFail wrapped call",  
    file=0x5df9bd8 "/home/yhs/work/llvm/include/llvm/Support/Error.h", line=732)
    at /home/yhs/work/llvm/lib/Support/ErrorHandling.cpp:222
#3  0x00000000032482f1 in llvm::cantFail<unsigned int> (ValOrErr=..., 
    Msg=0x5df9ba0 "Failure value returned from cantFail wrapped call") at /home/yhs/work/llvm/include/llvm/Support/Error.h:732
#4  0x0000000003247183 in llvm::MCStreamer::EmitDwarfFileDirective (this=0x8625e40, FileNo=0, Directory=..., Filename=..., 
    Checksum=0x8640868, Source=..., CUID=0) at /home/yhs/work/llvm/include/llvm/MC/MCStreamer.h:786 
#5  0x000000000324184b in llvm::DwarfCompileUnit::getOrCreateSourceID (this=0x8630ed0, File=0x861ae80)
    at /home/yhs/work/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp:110
#6  0x00000000032a1903 in llvm::DwarfUnit::addSourceLine (this=0x8630ed0, Die=..., Line=13, File=0x861ae80)
    at /home/yhs/work/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp:395
#7  0x00000000032a1a95 in llvm::DwarfUnit::addSourceLine (this=0x8630ed0, Die=..., G=0x861af10)
    at /home/yhs/work/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp:410 
#8  0x0000000003241b5c in llvm::DwarfCompileUnit::getOrCreateGlobalVariableDIE (this=0x8630ed0, GV=0x861af10, GlobalExprs=...)
    at /home/yhs/work/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp:153
#9  0x000000000325b5f9 in llvm::DwarfDebug::beginModule (this=0x862ec00)
    at /home/yhs/work/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp:782
#10 0x000000000320636d in llvm::AsmPrinter::doInitialization (this=0x8626820, M=...) 
    at /home/yhs/work/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp:309
#11 0x00000000024eeea4 in llvm::X86AsmPrinter::doInitialization (this=0x8626820, M=...)
    at /home/yhs/work/llvm/lib/Target/X86/X86AsmPrinter.h:137
#12 0x00000000044aa4ee in llvm::FPPassManager::doInitialization (this=0x8633e10, M=...)
    at /home/yhs/work/llvm/lib/IR/LegacyPassManager.cpp:1688
#13 0x00000000044aa6a2 in (anonymous namespace)::MPPassManager::runOnModule (this=0x868a380, M=...) 
    at /home/yhs/work/llvm/lib/IR/LegacyPassManager.cpp:1720
#14 0x00000000044ab027 in llvm::legacy::PassManagerImpl::run (this=0x860e710, M=...)
    at /home/yhs/work/llvm/lib/IR/LegacyPassManager.cpp:1857
#15 0x00000000044ab2a7 in llvm::legacy::PassManager::run (this=0x7fff67f37fe0, M=...)
    at /home/yhs/work/llvm/lib/IR/LegacyPassManager.cpp:1888
#16 0x0000000002d4a39b in llvm::MCJIT::emitObject (this=0x862ab00, M=0x85a7bd0)
    at /home/yhs/work/llvm/lib/ExecutionEngine/MCJIT/MCJIT.cpp:169
#17 0x0000000002d4a669 in llvm::MCJIT::generateCodeForModule (this=0x862ab00, M=0x85a7bd0)
    at /home/yhs/work/llvm/lib/ExecutionEngine/MCJIT/MCJIT.cpp:208
#18 0x0000000002d4ab68 in llvm::MCJIT::finalizeObject (this=0x862ab00)
    at /home/yhs/work/llvm/lib/ExecutionEngine/MCJIT/MCJIT.cpp:263
#19 0x000000000040ac10 in main ()

This coredump stack matches one of stacks I hit earlier with using bcc infrastructure to reproduce the issue.
Thanks!

Thanks!

So I can see where/how this fails now - the LLVM backend seems to require that if any file has embedded source, they all do. Would you be able to/would you mind adding a debug info verifier check for this? That'd help make this fail sooner. (@scott.linder - perhaps you'd be able to/interested in doing this - looks like you wrote the initial implementation?)

Beyond that, then the question is whether this fix is the right way to satisfy that requirement (I'm assuming the requirement is reasonable - though I've not looked at the embedded source support & have no idea if there's a way/it's reasonable to support some embedded source and some not). My big question & I've tried to do some debugging to better understand this - but don't have the time to dive much further into it (& haven't really figured it out as yet): Why is this source not accessible through this API at this point? is the fallback to the main file always accurate?

(as far as I got was figuring out that SourceManager::getBufferData had an SLoc with isFile == true for the header, but isFile was false for the SLocs related to the .cpp file. Can't say I know why that is/what that means & would like to understand that (or someone else more familiar with this stuff who already knows can review this code) before approving this patch - if you could help explain this, that'd be great)

Thanks!

So I can see where/how this fails now - the LLVM backend seems to require that if any file has embedded source, they all do. Would you be able to/would you mind adding a debug info verifier check for this? That'd help make this fail sooner. (@scott.linder - perhaps you'd be able to/interested in doing this - looks like you wrote the initial implementation?)

Beyond that, then the question is whether this fix is the right way to satisfy that requirement (I'm assuming the requirement is reasonable - though I've not looked at the embedded source support & have no idea if there's a way/it's reasonable to support some embedded source and some not). My big question & I've tried to do some debugging to better understand this - but don't have the time to dive much further into it (& haven't really figured it out as yet): Why is this source not accessible through this API at this point? is the fallback to the main file always accurate?

(as far as I got was figuring out that SourceManager::getBufferData had an SLoc with isFile == true for the header, but isFile was false for the SLocs related to the .cpp file. Can't say I know why that is/what that means & would like to understand that (or someone else more familiar with this stuff who already knows can review this code) before approving this patch - if you could help explain this, that'd be great)

I can certainly update the verifier; my initial implementation should have included tests for this case so I would have noticed how unpleasantly it fails. I might be a bit slow to get a patch up, as I'm on vacation all next week.

The decision to require "all or nothing" for embedded source came after quite a bit of discussion, and the original proposal does include a boolean flag (see http://dwarfstd.org/ShowIssue.php?issue=180201.1). LLVM is still the only implementation I know of, and it is not in any actual DWARF standard yet, so if there are strong use-cases for allowing some sources be absent this could still be changed.

Thanks @scott.linder, actually "all or nothing" is what I wanted here. We really want to get the source for these remapped files.

The Verifier patch has landed as https://reviews.llvm.org/rL348022

I do not have any more knowledge than the reviewers of the right way to handle providing source for remapped files, but at least compilation will warn and strip debug-info rather than segfault when source is provided inconsistently.

@scott.linder Actually clang can get source for remapped files. I just uploaded another test tarball which may help you debug the issue.
The test script is changed as well since recent llvm/clang requiring linking with -ltinfo.

Just give you a summary here.

-bash-4.2$ cat ttest.h

#define BPF_PERF_OUTPUT2(_name) \
struct _name##_table_t { \
  int key; \
  unsigned leaf; \
  int (*perf_submit) (void *, void *, unsigned); \
  int (*perf_submit_skb) (void *, unsigned, void *, unsigned); \
  unsigned max_entries; \
}; \
__attribute__((section("maps/perf_output"))) \
struct _name##_table_t _name = { .max_entries = 0 }

int g;
-bash-4.2$ cat ttest1.c
#include "ttest.h"

int main() { return g; }
-bash-4.2$ cat ttest2.c
#include "ttest.h"

BPF_PERF_OUTPUT2(probe);

int main() { return g; }
-bash-4.2$

The header file ttest.h is remapped.
For remapped file ttest1.c, the clang can get the source properly.
For remapped file ttest2.c, the clang cannot get the source.
The difference between two files are macro usage.

-bash-4.2$ diff ttest1.c ttest2.c
2a3,4
> BPF_PERF_OUTPUT2(probe);
> 
-bash-4.2$

So macro existence may somehow prevent availability of remapped file source.

yonghong-song added a comment.EditedDec 27 2018, 11:42 PM

I found an easy way to reproduce the issue and also did some preliminary analysis. I have filed a bug https://bugs.llvm.org/show_bug.cgi?id=40170. The reproducible case and my analysis is below:

-bash-4.4$ cat ttest2.c
#include "ttest.h"

BPF_PERF_OUTPUT2(probe);

int main() { return g; }
-bash-4.4$ cat ttest.h

#define BPF_PERF_OUTPUT2(_name) \
struct _name##_table_t { \
  int key; \
  unsigned leaf; \
  int (*perf_submit) (void *, void *, unsigned); \
  int (*perf_submit_skb) (void *, unsigned, void *, unsigned); \
  unsigned max_entries; \
}; \
__attribute__((section("maps/perf_output"))) \
struct _name##_table_t _name = { .max_entries = 0 }

int g;
-bash-4.4$ clang -g -O2 -gdwarf-5 -gembed-source -c ttest2.c
inconsistent use of embedded source
fatal error: error in backend: Broken module found, compilation aborted!
clang-8: error: clang frontend command failed with exit code 70 (use -v to see invocation)
clang version 8.0.0 (trunk 350092) (llvm/trunk 350084)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/yhs/work/llvm/build/install/bin
clang-8: note: diagnostic msg: PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
clang-8: note: diagnostic msg: 
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang-8: note: diagnostic msg: /tmp/ttest2-6c497a.c
clang-8: note: diagnostic msg: /tmp/ttest2-6c497a.sh
clang-8: note: diagnostic msg: 

********************
-bash-4.4$

Basically, a simple C file and header file. The C file contains a macro expansion. Compiling with -g -gdwarf-5 -gembed-source will cause the
compiler complains:

fatal error: error in backend: Broken module found, compilation aborted!

When I tried to root cause the bug described in https://reviews.llvm.org/D53329, I found a simple test case like the above will break clang.

I did some analysis. The following are rough reason.

  1. Looks like for macro in the source code, the clang actually will create a "FileID" for it. The FileID is not associated with a file. It is actually associated with a source location range.
  2. In my particular example, the macro defines a global variable and the global variable needs a file.
  3. When clang tries to find the FileID based on source location of the global variable, it found a source expansion FileID which does not have source associated with it.
  4. So clang simply does not have a "source" attribute for the DIFile corresponding to the global as in (3) it does not have such information.
  5. Later on, IR verification complains embed-source is enabled but all not DIFile has sources, so aborted.

My above workaround is obviously incorrect. But I am not sure what is the correct way to fix this issue. Maybe one of you can help. Thanks!

@dblaikie @scott.linder Have you got time to look at this issue? Looks like a trivial test case will be able to reproduce the problem. I have the simple example in the previous comments. Practically, -gdwarf-5 -gembed-source is broken now.

@dblaikie @scott.linder Have you got time to look at this issue? Looks like a trivial test case will be able to reproduce the problem. I have the simple example in the previous comments. Practically, -gdwarf-5 -gembed-source is broken now.

Sorry, I haven't had a chance to look into the issue yet. I don't know if I will have a chance this week, and I'm on vacation next week, but I hope to be able to soon.

@scott.linder Thanks! It would be good if you could give at least an initial, even rough, analysis.

Thanks for filing the bug/reducing a test case - this change should include an automated test case that captures this issue (check for other tests that pass -gembed-source to see how this might be tested, possibly expanding the existing test case case or introducing a new one (test/CodeGen/debug-info-embed-source.c looks like the place to start)

I reduced your test case down a bit further:

test.h:

int g;

test.c:

#include "test.h"
#define MACRO int x;
MACRO

Though the #include is needed to reproduce the /crash/, but even without the #include you can still reproduce the underlying bug - a DIFile witohut source and a DIFile with source in the same IR file (one compiled with -gembed-source).

Actually - is that a required invariant, that all DIFiles have source if any of them do? That could be a problem for LTO situations that might merge modules (one of which may have embedded source and another that doesn't)

Is this fix correct even when the problem occurs in a header file, for instance?

Yeah, seems this problem doesn't occur if you move everything into the header - I think it'd be worthwhile to understand/explain why that works out (despite the file ID for the macro would be associated with the source range in both macro-used-in-header and macro-used-in-primary-source-file cases, so why do they need different handling later? Maybe there's a better way to handle this more consistently?)

@dblaikie As I mentioned in one of my earlier comments, after some analysis, I think this patch definitely not the right way to fix the problem. We just cannot default to the main file if the DIFile source is not available.

I have a bug filed for the same issue (https://bugs.llvm.org/show_bug.cgi?id=40170) Maybe I should close this patch and put the information on the bug report. Do you agree?

@dblaikie As I mentioned in one of my earlier comments, after some analysis, I think this patch definitely not the right way to fix the problem. We just cannot default to the main file if the DIFile source is not available.

I have a bug filed for the same issue (https://bugs.llvm.org/show_bug.cgi?id=40170) Maybe I should close this patch and put the information on the bug report. Do you agree?

Ah, I understand now - thanks for explaining. Yeah, if you feel like this patch isn't the right direction, please mark it "abandoned" & if you're not sure how to proceed, probably worth having some discussion on the bug or in an llvm-dev thread. (or if you're not sure you have the bandwidth to continue - yeah, documenting in the bug anything we've learned so you/someone else can pick it up another time would be great)

Thanks!

yonghong-song abandoned this revision.Jan 30 2019, 9:54 AM

Abandon this patch as the proposed fix is not correct. I have put relative information in the bug https://bugs.llvm.org/show_bug.cgi?id=40170.