This is an archive of the discontinued LLVM Phabricator instance.

[JITLoaderGDB] Set eTypeJIT for objects read from JIT descriptors
ClosedPublic

Authored by sgraenitz on May 6 2019, 2:05 PM.

Details

Summary

First part of a fix for JITed code debugging. This has been a regression from 5.0 to 6.0 and it's is still reproducible on current master: https://bugs.llvm.org/show_bug.cgi?id=36209

The address of the breakpoint site is corrupt: the 0x4 value we end up with, looks like an offset on a zero base address. When we parse the ELF section headers from the JIT descriptor, the load address for the text section we find in header.sh_addr is correct.

The bug manifests in VMAddressProvider::GetVMRange(const ELFSectionHeader &) (follow it from ObjectFileELF::CreateSections()). Here we think the object type was eTypeObjectFile and unleash some extra logic [1] which essentially overwrites the address with a zero value.

The object type is deduced from the ELF header's e_type in ObjectFileELF::CalculateType(). It never returns eTypeJIT, because the ELF header has no representation for it [2]. Instead the in-memory ELF object states ET_REL, which leads to eTypeObjectFile. This is what we get from lli at least since 3.x. (Might it be better to write ET_EXEC on the JIT side instead? In fact, relocations were already applied at this point, so "Relocatable" is not quite exact.)

So, this patch proposes to set eTypeJIT explicitly whenever we read from a JIT descriptor. In ObjectFileELF::CreateSections() we can then call GetType(), which returns the explicit value or otherwise falls back to CalculateType().

LLDB then sets the breakpoint successfully. Next step: debug info.

Process 1056 stopped
* thread #1, name = 'lli', stop reason = breakpoint 1.2
    frame #0: 0x00007ffff7ff7000 JIT(0x3ba2030)`jitbp()
JIT(0x3ba2030)`jitbp:
->  0x7ffff7ff7000 <+0>:  pushq  %rbp
    0x7ffff7ff7001 <+1>:  movq   %rsp, %rbp
    0x7ffff7ff7004 <+4>:  movabsq $0x7ffff7ff6000, %rdi     ; imm = 0x7FFFF7FF6000 
    0x7ffff7ff700e <+14>: movabsq $0x7ffff6697e80, %rcx     ; imm = 0x7FFFF6697E80

[1] It was first introduced with https://reviews.llvm.org/D38142#change-lF6csxV8HdlL, which has also been the original breaking change. The code has changed a lot since then.

[2] ELF object types: https://github.com/llvm/llvm-project/blob/2d2277f5/llvm/include/llvm/BinaryFormat/ELF.h#L110

Event Timeline

sgraenitz created this revision.May 6 2019, 2:05 PM
labath accepted this revision.May 7 2019, 7:33 AM

Sounds like a reasonable thing to do.

If you're going to be looking into jitted code more, I'd strongly encourage you to try to come up with a testing strategy here. Setting up a smoke test which jits a simple function using llvm jit, and tries to debug it shouldn't be fundamentally difficult (though it will likely be tedious to set up the relevant infrastructure), and it will increase our jit test coverage by a factor of infinity (from 0% to ??%).

Might it be better to write ET_EXEC on the JIT side instead?

I don't think that would be right, but ET_DYN might be, since the jitted object behaves in a lot of ways like a shared library that was dlopened at runtime. OTOH, you could probably find a lot of differences too (I bet lli does not bother with generating the program headers for the jitted object)

This revision is now accepted and ready to land.May 7 2019, 7:33 AM
sgraenitz updated this revision to Diff 198692.May 8 2019, 10:11 AM

Add lit test

sgraenitz marked 3 inline comments as done.May 8 2019, 10:34 AM

Thanks for your reply and thoughts about that.

I'd strongly encourage you to try to come up with a testing strategy here.

Yes, I just added a simple lit test. What do you think?
I ran it in isolation on macOS (UNSUPPORTED) and on Ubuntu 18.04 (PASS) using:

$ cd path/to/llvm-build
$ ninja FileCheck llvm-config lli clang lldb
$ python bin/llvm-lit -v /path/to/llvm-project/lldb/lit/Breakpoint/jitbp_elf.test
lldb/lit/Breakpoint/Inputs/jitbp.cpp
2

lli already has a main() and so we have jitbp() here to set the breakpoint on.

lldb/lit/Breakpoint/jitbp_elf.test
1

The test only works with ELF on Linux. Is the REQUIRES sufficient?

3

With these args, clang shouldn't optimize away jitbp()

labath added a comment.May 9 2019, 5:19 AM

Thanks for adding the test. It looks like it was much easier than I feared. I do wonder if the test couldn't be made to run on more platforms. It sounds like it is generic enough for that, and given that this is our first and only jit test, it would be great if that were the case.

lldb/lit/Breakpoint/jitbp_elf.test
1

Yes, but what is the reason for that? It looks like the test is generic enough that it should run on any system where lli is able to jit code. In particular I'd expect this to also work on macOS if you set plugin-jit-loader.gdb.enable to on.

3

Is that a question? If it is I'm pretty sure the answer is "it shouldn't", because a lot of our tests depend on -O0 not doing anything funny.

sgraenitz updated this revision to Diff 198823.May 9 2019, 7:32 AM
sgraenitz marked 2 inline comments as done.

Generalize lit test

sgraenitz marked 2 inline comments as done.May 9 2019, 7:42 AM

@stella.stamenova Can you have a look at the lit test please? It works on macOS and Linux, but I didn't test Windows. Should I add something like # REQUIRES: nowindows or is it fine like this?

lldb/lit/Breakpoint/jitbp_elf.test
1

Indeed, since https://reviews.llvm.org/D57689 it does work on macOS with ELF.

3

Yes, I didn't know whether -g is enough or I better pass something like -O0 explicitly?

sgraenitz marked an inline comment as done.May 9 2019, 7:42 AM
clayborg accepted this revision.May 9 2019, 7:44 AM

lgtm.

labath accepted this revision.May 9 2019, 8:02 AM

@stella.stamenova Can you have a look at the lit test please? It works on macOS and Linux, but I didn't test Windows. Should I add something like # REQUIRES: nowindows or is it fine like this?

Maybe I'm pessimistic, but I'd guess that this will fail on windows because some of the features this needs is not implemented there (but I'm not sure which one). You can just add XFAIL: system-windows and then you'll get an email if the guess is wrong.

lldb/lit/Breakpoint/jitbp_elf.test
1

Cool. I think you can also drop the target-x86_64 part, as I don't see a reason why this shouldn't work on arm for instance (though we don't have any arm bots around to verify that). I am not sure why you're using the --target argument to clang -- I think you should be able to just drop that and make clang generate IR for the host. (the native feature ensures that "host" is the default target for clang).

3

I'm pretty sure -O0 will always be the default opt level for clang (and -g as a matter of principle has no impact on codegen decisions like inlining).

@stella.stamenova Can you have a look at the lit test please? It works on macOS and Linux, but I didn't test Windows. Should I add something like # REQUIRES: nowindows or is it fine like this?

Sorry for the drive-by... what is this REQUIRES: nowindows? I don't see where lit generates this property. I grepped all of llvm-project.git and I see it used in two tests, but not where it's produced. Which suggests that those tests don't actually run *anywhere*.
I think maybe you want UNSUPPORTED: system-windows instead?

sgraenitz marked 3 inline comments as done.May 9 2019, 8:16 AM
sgraenitz added inline comments.
lldb/lit/Breakpoint/jitbp_elf.test
1

Without --target we emit MachO, which is not implemented on the JIT side and LLDB doesn't even get a JIT descriptor:

$ bin/lldb
(lldb) target create bin/lli
Current executable set to 'bin/lli' (x86_64).
(lldb) b getObjectForDebug
Breakpoint 1: 3 locations.
(lldb) run -jit-kind=mcjit tools/lldb/lit/Breakpoint/Output/jitbp_elf.test.tmp.ll
Process 40151 launched: /path/to/llvm-build/bin/lli' (x86_64)
Process 40151 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.3
    frame #0: 0x0000000100f3ddb3 lli`(anonymous namespace)::LoadedMachOObjectInfo::getObjectForDebug(this=0x0000000105c03540, Obj=0x0000000105c03880) const at RuntimeDyldMachO.cpp:38:12
   35  	
   36  	  OwningBinary<ObjectFile>
   37  	  getObjectForDebug(const ObjectFile &Obj) const override {
-> 38  	    return OwningBinary<ObjectFile>();
   39  	  }
   40  	};
   41
sgraenitz updated this revision to Diff 198834.May 9 2019, 8:17 AM
sgraenitz marked an inline comment as done.
  1. XFAIL: system-windows

Sorry for the drive-by... what is this REQUIRES: nowindows? I don't see where lit generates this property. I grepped all of llvm-project.git and I see it used in two tests, but not where it's produced. Which suggests that those tests don't actually run *anywhere*.

Thanks for the heads-up. I also just grepped. The history says that Stella will know more about it. I went with XFAIL for now.

labath added inline comments.May 9 2019, 8:27 AM
lldb/lit/Breakpoint/jitbp_elf.test
1

Ah, interesting. That's unfortunate, cause that makes running the test on other architectures trickier (but I guess it doesn't matter that much as we'll always have x86 bots around to check that.

sgraenitz updated this revision to Diff 198843.May 9 2019, 8:45 AM

Add back test requirement target-x86_64

stella.stamenova added a comment.EditedMay 9 2019, 9:07 AM

Sorry for the drive-by... what is this REQUIRES: nowindows? I don't see where lit generates this property. I grepped all of llvm-project.git and I see it used in two tests, but not where it's produced. Which suggests that those tests don't actually run *anywhere*.

Thanks for the heads-up. I also just grepped. The history says that Stella will know more about it. I went with XFAIL for now.

The syntax is supposed to be different: XFAIL means that the test is expected to fail no that platform, but it will still run. This makes sense for tests that should eventually pass on that platform, but don't yet. REQUIRES: noX means that the test won't run unless the requirement is satisfied. In the case of REQUIRES: nowindows this means that the test does not even support windows and it should never pass there. There are currently a couple of tests in LLVM and LLDB marked with REQUIRES: nowindows because they are not expected to run there at all.

The "feature" nowindows is supposed to be added to the available features in lit/llvm/config.py, but it looks like it's not at the moment, so I am wondering if the two tests that are marked as nowindows run on any platform right now. I can update config.py to create the right features, but we can also use UNSUPPORTED: system-windows, so I'll look into the best way to do this.

For your tests, I think they are expected to *eventually* pass on Windows even if they don't today, correct? If that's the case, then you should use XFAIL. Otherwise, you should use REQUIRES: nosystem-windows.

I can apply your patch locally tomorrowish and run the tests to see how they behave on Windows.

I can update config.py to create the right features, but we can also use UNSUPPORTED: system-windows, so I'll look into the best way to do this.

Perfect, thanks

For your tests, I think they are expected to *eventually* pass on Windows even if they don't today, correct? If that's the case, then you should use XFAIL. Otherwise, you should use REQUIRES: nosystem-windows.

Ok, should be fine to keep XFAIL then.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptMay 9 2019, 9:38 AM

@stella.stamenova I'm not familiar with any lit feature that gives a special meaning to the prefix "no". The opposite of "REQUIRES: windows" is not "REQUIRES: nowindows" but "UNSUPPORTED: windows" AFAIK.
This part of the discussion should probably be taken to llvm-dev, though.

FTR I don't see that lldb's lit.cfg.py sets any features based on host OS, so even "UNSUPPORTED: windows" probably does not work currently.

labath added a comment.May 9 2019, 9:51 AM

@stella.stamenova I'm not familiar with any lit feature that gives a special meaning to the prefix "no". The opposite of "REQUIRES: windows" is not "REQUIRES: nowindows" but "UNSUPPORTED: windows" AFAIK.
This part of the discussion should probably be taken to llvm-dev, though.

There's a binary_feature function in utils/lit/lit/llvm/config.py, but it's only used in a handful of cases. And yeah, I would generally prefer UNSUPPORTED: XYZ over REQUIRES: noXYZ.

@stella.stamenova I'm not familiar with any lit feature that gives a special meaning to the prefix "no". The opposite of "REQUIRES: windows" is not "REQUIRES: nowindows" but "UNSUPPORTED: windows" AFAIK.
This part of the discussion should probably be taken to llvm-dev, though.

FTR I don't see that lldb's lit.cfg.py sets any features based on host OS, so even "UNSUPPORTED: windows" probably does not work currently.

@probinson LLDB's lit configuration derives from the lit configuration in LLVM. The most important file you are looking for is:

llvm\utils\lit\lit\llvm\config.py

You can see there that we add various platform to the list of available features including system-windows, so XFAIL: system-windows, UNSUPPORTED: system-windows, etc. all work.

This is also where binary_features are set, such as 'zlib/nozlib', 'asan/not_asan' etc. The prefix varies for historical reasons, but the paradigm has existed for a long time. If we wanted to support nosystem-windows, we would add it here.

@stella.stamenova I'm not familiar with any lit feature that gives a special meaning to the prefix "no". The opposite of "REQUIRES: windows" is not "REQUIRES: nowindows" but "UNSUPPORTED: windows" AFAIK.
This part of the discussion should probably be taken to llvm-dev, though.

FTR I don't see that lldb's lit.cfg.py sets any features based on host OS, so even "UNSUPPORTED: windows" probably does not work currently.

@probinson LLDB's lit configuration derives from the lit configuration in LLVM. The most important file you are looking for is:

llvm\utils\lit\lit\llvm\config.py

You can see there that we add various platform to the list of available features including system-windows, so XFAIL: system-windows, UNSUPPORTED: system-windows, etc. all work.

This is also where binary_features are set, such as 'zlib/nozlib', 'asan/not_asan' etc. The prefix varies for historical reasons, but the paradigm has existed for a long time. If we wanted to support nosystem-windows, we would add it here.

Thanks @stella.stamenova I did find that. But I agree with Pavel, we are better off not having "nofoo" or "not_foo" features, because we can get the same effect using UNSUPPORTED: and it can mislead people into thinking the no/not_ prefix applies generally, and things like "REQUIRES: nowindows" will unexpectedly disable a test everywhere.

I am going to propose eliminating the negative keywords on llvm-dev.

It looks like this test is flaky:
http://lab.llvm.org:8014/builders/lldb-x86_64-debian/builds/1339/steps/test/logs/stdio
http://lab.llvm.org:8014/builders/lldb-x86_64-debian/builds/1294/steps/test/logs/stdio

The failure messages I've seen are all the same and seem to indicate that the "breakpoint locations added" and "process stopped" messages come out of order. This seems plausible since the two messages are reported through different channels, and I was able to reproduce this failure locally by adding a sleep(1) to Debugger::HandleBreakpointEvent.

Given that the behavior of asynchronous breakpoint location messages is not the puprose of this test, what would you say to just deleting the "# CHECK: 1 location added to breakpoint 1" line from the test?

It looks like this test is flaky [...] what would you say to just deleting the "# CHECK: 1 location added to breakpoint 1" line from the test?

Thanks for reporting and explaining. Sounds like a reasonable change. Pushed with rL360571.