This is an archive of the discontinued LLVM Phabricator instance.

Add a new tool named "lldb-vscode" that implements the Visual Studio Code Debug Adaptor Protocol
ClosedPublic

Authored by clayborg on Aug 6 2018, 3:24 PM.

Details

Summary

This patch adds a new lldb-vscode tool that speaks the Microsoft Visual Studio Code debug adaptor protocol. It has full unit tests that test all packets.

This tool can be easily packaged up into a native extension and used with Visual Studio Code, and it can also be used by Nuclide.

Not sure who to add as a reviewer, so I started with a few people. Feel free to add more as needed.

Diff Detail

Event Timeline

clayborg created this revision.Aug 6 2018, 3:24 PM
xiaobai added a subscriber: xiaobai.Aug 6 2018, 3:43 PM

Can't speak much to the contents yet (Haven't done a thorough pass yet) but it looks like there's a lot of dead code you might want to remove. I commented on a few of them but there is probably more.

packages/Python/lldbsuite/test/tools/lldb-vscode/variables/TestVSCode_variables.py
16

Should remove this

89

Commented out line

packages/Python/lldbsuite/test/tools/lldb-vscode/vscode.py
83–87

Dead code

882

Dead code

902–924

Dead code

clayborg updated this revision to Diff 159526.Aug 7 2018, 9:24 AM

Removed dead code from python files.

Really cool! Are you planning to add some documentation for it? (set up
instructions, etc...)

Really cool! Are you planning to add some documentation for it? (set up instructions, etc...)

Yes. I will add a README.txt for this patch and also a python script that will create an VSCode extension.

labath edited reviewers, added: zturner, davide; removed: labath.Aug 8 2018, 2:42 AM

I am not sure I'll have the resources to see this review through, so I'd prefer to leave this to someone else.

The thoughts I have had so far are:

  • the patch is very big and probably runs afoul of the "you shall develop incrementally" section in the LLVM developer policy. At least the JSON parts should be split off into a separate patch and tested independently.
  • however, the choice of the JSON library is also an open question. We currently have at least three options to choose from:
    • llvm/Support/JSON.h
    • lldb/Utility/JSON.h
    • debugserver/source/JSON.h

      Of these, the third one is the one I'd least expect to be used here.
  • Since this is essentially starting a new-subproject, I think it's in place to discuss various conventions. E.g., right now, this seems to use a mixture of UpperCamel and snake_case, and so isn't very consistent with neither llvm nor lldb naming conventions.
packages/Python/lldbsuite/test/tools/lldb-vscode/attach/TestVSCode_attach.py
48

You might want to set NO_DEBUG_INFO_TESTCASE = True in the base VSCodeTestCaseBase to avoid these. I guess none of these tests should be debug-info dependent, right?

tools/lldb-vscode/lldb-vscode.cpp
1689

gcc-specific attribute

labath added a comment.Aug 8 2018, 8:15 AM

To end things on a positive note: I think the test coverage is pretty good (I'm sure somebody will suggest switching to lit), and I think a VS code plugin would be a great addition to lldb.

How does this differ from VS Code's builtin LLDB MI support?

To elaborate, if you install the C/C++ plugin, and you go to Debug ->
Configurations, and you go down to the MICmdMode property, it is by default
set to "gdb". But you can change this to "lldb" and it works out of the
box.

To elaborate, if you install the C/C++ plugin, and you go to Debug ->
Configurations, and you go down to the MICmdMode property, it is by default
set to "gdb". But you can change this to "lldb" and it works out of the
box.

It is a different protocol. The LLDB MI plugin didn't work very well as was quite flaky when I tested it a while back. Then I grabbed the CodeLLDB plugin by Vadim Chugunov and it worked very well. When I looked more closely at this plugin, it was using a javascript/typescript plug-in to launch a LLDB instance and then ran a python script that received all of the javascript packets from the javascript/typescript based plug-in and ran the debug session using the python interpreter. It worked very well and was rock solid stable. So I then created this plug-in for Nuclide for use at Facebook as they switched all of the debugging plug-in over to use the VSCode debug adaptor protocols. It works event better than the CodeLLDB plugin with Visual Studio Code and it also doesn't stop you from using the python interpreter. The issue I had with the CodeLLDB is that is uses the python interpreter to run the debug session thus it isn't available to you as a LLDB user.

So long story short: our IDE at Facebook uses the VSCode protocol, MI is clunky and doesn't work that well and was flaky, so this tool was created. This also provides a really nice way to do remote debugging where the lldb-vscode is run remotely on other systems. This removes the need for copying files from a remote host up onto the system that is doing the debugging. So we use this at Facebook and it also provides the best way to use LLDB to debug using Visual Studio Code or Nuclide.

This looks really cool :-)

I started doing a superficial pass but I have a hard time following what everything is doing. I think it could really help if you added more structure/abstraction. Can you also run clang-format over the new files?

tools/lldb-vscode/lldb-vscode.cpp
66

Can we replace these with static consexprs?

94

I think you could use llvm's to_integer from StringExtras.h for this.

118

Should this maybe become a member of (SB)Thread?

137

I very much prefer enum classes to prevent collisions. It's probably not that important here but it also doesn't harm to have the values qualified with the enum name. Same for the other enums in this file.

254

I'd move this into a separate header file with all the other breakpoint classes. I think this file is pretty big already and it would help readability/maintainability to split things up.

338

Some newlines to group related members and methods would really improve readability in this struct.

351

Can we remove this?

473

Building on my previous suggestion, I'd prefer separate files over these pragmas as my editor doesn't understand them.

513

I'd move this up and turn it into a doxygen comment.

925

Could be a single statement.

1371

Should this live in the json lib?

jkorous added a subscriber: jkorous.Aug 8 2018, 9:25 AM

Hi Greg, this looks interesting! I got curious about your patch since I am dealing with another protocol from the same "family" - LSP.

tools/lldb-vscode/lldb-vscode.cpp
1425

It looks to me that since you are just logging protocol level error here they will manifest as JSON parsing errors down the road (L 4113). Is that ok?

BTW there's another implementation of HTTP-like header parsing in clangd:
https://github.com/llvm-mirror/clang-tools-extra/blob/master/clangd/JSONRPCDispatcher.cpp#L238

4108

This looks like some possible forgotten debugging artefact.

I am not sure I'll have the resources to see this review through, so I'd prefer to leave this to someone else.

The thoughts I have had so far are:

  • the patch is very big and probably runs afoul of the "you shall develop incrementally" section in the LLVM developer policy. At least the JSON parts should be split off into a separate patch and tested independently.
  • however, the choice of the JSON library is also an open question. We currently have at least three options to choose from:
    • llvm/Support/JSON.h
    • lldb/Utility/JSON.h
    • debugserver/source/JSON.h

      Of these, the third one is the one I'd least expect to be used here.

Ok, I removed the debugserver reliance for the JSON parser and I have recoded it to use the LLVM JSON parser. Will post patch soon.

  • Since this is essentially starting a new-subproject, I think it's in place to discuss various conventions. E.g., right now, this seems to use a mixture of UpperCamel and snake_case, and so isn't very consistent with neither llvm nor lldb naming conventions.

I will fix these things before checkin and run clang-format

lemo added a subscriber: zturner.Aug 8 2018, 5:52 PM

The LLDB MI plugin didn't work very well as was quite flaky when I tested
it a while back.

Just curious, what was the flaky part, the debug adapter or the LLDB MI
interface?

Haven't had time to read through the main part of the patch, but I think the changes to debugserver's JSON parser would be good additions even if you've switched away from using it. The debugserver JSON parser is a copy of lldb's with all the StringRef etc llvm classes removed quickly, so it's always been a bit of a mess, I should have forked it from before any of the llvm stuff started being integrated.

clayborg updated this revision to Diff 160466.Aug 13 2018, 3:20 PM
  • Use the LLVM JSON parser
  • Split lldb-vscode.cpp into smaller files
  • Fix function names
  • ran clang format on everything
clayborg updated this revision to Diff 160494.Aug 13 2018, 5:02 PM

Sort the &#%&$# Xcode project changes...

clayborg updated this revision to Diff 160736.Aug 14 2018, 5:46 PM

Replace README.txt with README.md and add full content to it.

This should be in great shape now. Can anyone find time for this?

zturner added inline comments.Aug 15 2018, 9:34 AM
tools/lldb-vscode/BreakpointBase.cpp
14–22

Can we delete this function and use StringRef::getAsInteger() instead?

tools/lldb-vscode/BreakpointBase.h
19

This should have a virtual destructor.

25

Shouldn't this be an integer?

tools/lldb-vscode/ExceptionBreakpoint.cpp
15

Early return.

tools/lldb-vscode/ExceptionBreakpoint.h
18

Should this inherit from BreakpointBase?

24–25

Can you make this constructor take the first two arguments as std::string, then initialize them with filter(std::move(f)), and label(std::move(l))? The current code is guaranteed to make a copy, the suggested approach might not.

zturner added inline comments.Aug 15 2018, 9:34 AM
tools/lldb-vscode/ExceptionBreakpoint.cpp
23

Is it useful to have a notion of a breakpoint object that isn't actually set? Why not just delete this class instance entirely and let the caller allocate a new one when they're ready. If we want to keep this, then I think an llvm::Optional<SBBreakpoint> would be a better approach, because it makes explicit for someone reading the code that not existing is a valid state.

24

Early return.

tools/lldb-vscode/FunctionBreakpoint.h
18

Can we call this variable functionName and delete the comment?

tools/lldb-vscode/JSONUtils.h
28

I think this could return StringRef

45–46

I think this could return a StringRef. Also, it seems weird to have an overload which takes a pointer. Can we delete that overload? I don't think it would make sense for anyone to call this function with a null obj.

142–143

This could return a vector<StringRef>

226

StringRef event_name

262–263

StringRef name

tools/lldb-vscode/SourceBreakpoint.h
25

StringRef source_path

tools/lldb-vscode/SourceReference.h
20

llvm::DenseMap

tools/lldb-vscode/VSCode.cpp
23–25

I'd write this as:

inline bool is_empty_line(StringRef S) {
  return S.ltrim().empty();
}
130

You should check out llvm/Support/LineIterator.h

tools/lldb-vscode/VSCode.h
46

llvm::DenseMap

47

llvm::StringMap

61–62

Any reason why we have to use FILE* instead of fds?

73–74

Delete commented out code

75

llvm::Optional<llvm::raw_fd_ostream>?

76–77

llvm::DenseMap<>

78

llvm::StringMap

80–84

llvm::SmallVector

90

llvm::DenseSet<>

98

StringRef filter

105

StringRef json_str

115–116

StringRef output

131–132

StringRef prefix, ArrayRef<std::string> commands

tools/lldb-vscode/lldb-vscode.cpp
48–59

This block of code can be deleted. We have similar abstractions in either lldb or llvm.

62

llvm::StringMap

70

I don't believe these pragma statements will work on all compilers

72

Can you make all global functions static?

109

Can this take an ArrayRef<std::string> instead?

433

This should be an llvm::raw_string_ostream which builds up the command, then at the end call stream.str() to render the final string.

clayborg added inline comments.Aug 15 2018, 2:49 PM
tools/lldb-vscode/BreakpointBase.cpp
22

Sure thing

tools/lldb-vscode/ExceptionBreakpoint.cpp
23

Exceptions breakpoints get set once at the beginning of the session after the "initialized" event is sent back to the debugger. The ClearBreakpoint is just for cleanup after all is said and done.

tools/lldb-vscode/ExceptionBreakpoint.h
18

No. It doesn't actually share anything from BreakpointBase. You would think it would, but it doesn't.

25

sure

tools/lldb-vscode/JSONUtils.h
46

it keeps the code much cleaner. Some of the LLVM JSON calls return an "llvm::json::Object *" when fetching a value for a key which can be NULL. The primary reason for these functions it to keep the code clean and simple. I prefer to keep these overloads.

143

Need to keep as is because as noted in the description, numbers and booleans will be converted to strings. This is in case you specify arguments for your program as:

"args": [ 123, "hello", true ]
tools/lldb-vscode/SourceBreakpoint.h
25

I am gonna leave this one as is because we must pass a "const char *" the API that is called inside the body of this method:

lldb::SBBreakpoint lldb::SBTarget::BreakpointCreateByLocation(const char *file, uint32_t line);
tools/lldb-vscode/VSCode.h
46

Causes build errors when I tried switching.

47

Doesn't work with std::string and we need the std::string to back the string content.

84

These are usually empty. No need to use SmallVector and force storage when we don't need it.

98

This is iterating through an array of classes and check if the member variable, which is a std::string is equal. Leaving as std::string

105

Leaving as std::string as we need to guarantee null termination

tools/lldb-vscode/lldb-vscode.cpp
72

I will add an anonymous namespace around all local functions.

109

yes

zturner added inline comments.Aug 15 2018, 3:12 PM
tools/lldb-vscode/JSONUtils.h
143

That's unfortunate. All json values are backed by text in the underlying file, so in theory you could have StringRefs referrings to the part of the file that contains the string true. But it seems the JSON library doesn't preserve the full text of each value, so maybe this isn't possible.

tools/lldb-vscode/SourceBreakpoint.h
25

In that case you can use const llvm::Twine &. You can pass many types of objects to it, such as const char*, std::string, llvm::StringRef. Then, inside the function you can write:

llvm::SmallString<32> S;
StringRef NTS = source_path.toNullTerminatedStringRef(S);

If the original parameter was constructed with a const char* or std::string, then toNullTerminatedStringRef is smart enough to know that it doesn't need to make a copy, and it just returns the pointer. If it was constructed with a StringRef, then it null terminates it using S as the backing storage and returns that pointer. So it's the best of both worlds

tools/lldb-vscode/VSCode.h
46

What kind of build errors? DenseMap<uint32_t, SourceBreakpoint> shouldn't cause any. If SourceBreakpoint was the key it might, but not if it's the value.

47

llvm::StringMap copies the keys so it will always be backed.

61–62

I think maybe out would be better as an llvm::raw_fd_ostream, and in would be better as an llvm::MemoryBuffer which you could obtain via llvm::MemoryBuffer::getSTDIN(). These are always stdin and stdout and never seem to change, and being able to stream data to out and read from in as very convenient and cleans up a lot of code.

98

That shouldn't be a problem, should it? With the current code, if you write GetExceptionBreakpoint("foo"); there will be an unnecessary allocation. With StringRef, there won't. I'm not aware of *any* good reason to prefer const string& over StringRef unless you need to guarantee null termination, but that is questionable as well since the implementation details of a function end up affecting its API, so the implementation details aren't truly hidden.

clayborg updated this revision to Diff 160943.Aug 15 2018, 4:41 PM

Fixed almost all issues mentioned by zturner.

clayborg marked 23 inline comments as done.Aug 15 2018, 4:48 PM
clayborg added inline comments.
tools/lldb-vscode/SourceBreakpoint.h
25

That is just way too much work for no gain.

tools/lldb-vscode/VSCode.h
46

I meant to remove this. I have everything working with DenseMap, DenseSet and StringMap

47

Ditto, meant to remove this. Got everything working.

62

This can work with a port number. Rather not switch away from FILE * for now. No real benefit.

Looking pretty good, I went over it again and found a few more things. There's a bit more auto than I'm comfortable with, but I'm not gonna make a big deal about it. it does make the code a bit hard to read when it's used for trivial return values though.

tools/lldb-vscode/BreakpointBase.cpp
23

I remembered that we have a function that wraps this and returns a more intuitive error value (true means it succeeded). So this can actually be

if (llvm::to_integer(hitCondition, hitCount))
  bp.SetIgnoreCount(hitCount-1);
tools/lldb-vscode/BreakpointBase.h
11–12

We should use all uppercase for the include guards.

tools/lldb-vscode/ExceptionBreakpoint.h
22

What is value? Can you maybe make a more descriptive name? Does it indicate whether the breakpoint is enabled?

25–26

This doesn't look clang-formatted, I would expect some of the constructor initializers to be on the same line.

tools/lldb-vscode/JSONUtils.cpp
94

Can we do an early return here? Also I haven't checked, but if the json_array object contains a size() field, which I expect it does, then we might want to do a reserve() on the vector before adding items to it.

98–99

I don't think we need the if statement here. Unless value.kind() has a bug in it, this should be guaranteed to be correct without the check.

143

I'd like to avoid using unsafe functions from the printf family if possible. I think this function could be written:

StringRef Value = v.GetValue();
StringRef Summary = v.GetSummary();
StringRef TypeName = v.GetType().GetDisplayTypeName();

std::string Result;
llvm::raw_string_ostream OS(Result);
if (!Value.empty()) {
  OS << Value;
  if (!Summary.empty())
    OS << ' ' << Summary;
} else if (!Summary.empty()) {
  OS << ' ' << Summary;
} else if (!TypeName.empty()) {
  OS << TypeName;
  lldb::addr_t address = v.GetLoadAddress();
  if (address != LLDB_INVALID_ADDRESS)
    OS << " @ " << llvm::format_hex(address, 0);
}
object.try_emplace(OS.str());

I think this is a lot more concise.

288

Can we do an early return here?

306

Early return here.

308

And here.

472–473

Can we use the std::string overload here so we aren't limited to PATH_MAX?

542–544

Can we use a raw_string_ostream here and non-printf formatting functions here and elsewhere in this function? One way to re-write this would be:

Stream << formatv("{0:X+}: <{1}> {2} {3} {4}", inst_addr, inst_offset, fmt_repeat(' ', spaces), m, o);

Which I think is nice and easy to read. Similarly for the rest of the function.

549–550

With the above approach you don't need this copy because you're rendering it directly to the string you ultimately plan to use anyway, so this could go away. Also not a huge deal, but if you pull this std::string out of the for loop, you can avoid a re-allocation on every iteration.

560

Maybe just `source.content.push_back('\n');'

tools/lldb-vscode/JSONUtils.h
18

Can you put all the functions in this file in a namespace? I realize this is a standalone utility but it's still good practice to not pollute the global namespace from a header file.

26

Comment should be updated.

42

Commented should be updated here too.

tools/lldb-vscode/LLDBUtils.cpp
16

Can this be an llvm::raw_ostream& instead of an SBStream&?

clayborg updated this revision to Diff 161040.Aug 16 2018, 9:17 AM

Fix all of zturner's inline comments.

clayborg marked 13 inline comments as done.Aug 16 2018, 9:20 AM
clayborg added inline comments.
tools/lldb-vscode/JSONUtils.cpp
473

This is a SBFileSpec. We don't allow any STL to be used in API.

I had a couple of other comments, but since I responded from email since I was on the go and I guess they didn't show up inline. Sorry about that. If you prefer I can resubmit them all as inline comments, or I guess you can just respond to the email thread.

tools/lldb-vscode/JSONUtils.cpp
473

Ahh good point, my bad.

Inline comments really help if you don't mind.

zturner added inline comments.Aug 16 2018, 10:06 AM
tools/lldb-vscode/SourceBreakpoint.h
25

The gain is that const char* is unsafe and error prone and should be avoided wherever possible unless it’s very cumbersome not to.

The reason is that it's a bit like const propagation. It's inflexible and as soon as one function parameter is const, it is restricted to only being able to call other const functions, and it trickles downward throughout the entire program. Now in this case it's good, because const is intended to help you, but in the cast of raw string pointers like const char * it's bad, because raw pointers in general are unsafe and error prone, and in this case additionally inefficient (if multiple functions need to do something with the string, you end up with multiple calls to strlen).

It's true that the implementation of the function needs to call an SB API function, but interface design and interface implementation should be orthogonal. You pass a StringRef because it's the right thing to do for the interface, that the implementation needs a null terminated string is an implementation detail (some day, for example, we might have an SB API v2 that can be made to work with C++ types via clever SWIG mappings or some other technology).

It might be a minor nitpick, but we have a basically clean slate here so I would prefer to start on the right foot with modern c++ constructs and paradigms

tools/lldb-vscode/lldb-vscode.cpp
2641–2645

Now I see what you mean about FILE* being able to be used with a port. But I wasn't familiar with this behavior of FILE*. My undersatnding of FILE* is that its behavior is implementation defined. So can we guarantee this mutual exclusion on all platforms? I don't even know Windows' behavior with respect to this, I will have to check on this first to see if Windows behaves the same way.

If I understand your comment correctly though, it sounds like you're saying that you can't simultaneously read from and write to a full duplex socket, so the main advantage to using FILE* is that it gives us a mutex? How does it know to use the same mutex for both in and out? It seems like it would only guarantee mutual exclusion per-file object. So you couldn't read from the same socket on multiple threads, and you couldn't write to the same socket on multiple threads. But the current code still allows reading from and writing to the same socket simultaneously. Is that ok?

I need to confirm this behavior on other platforms. That said, we could also just put a mutex in the global VSCode class.

clayborg updated this revision to Diff 161058.Aug 16 2018, 10:26 AM

Change void SourceBreakpoint::SetBreakpoint() to use llvm::StringRef

clayborg marked 2 inline comments as done.Aug 16 2018, 10:29 AM
clayborg added inline comments.
tools/lldb-vscode/lldb-vscode.cpp
2647

The mutex isn't the problem, it is being able to read and write from the in and out streams. So adding a mutex to VSCode won't help. We have very simple interactions with in and out where we read lines and read a specific amount of data, and we just write data out.

zturner added inline comments.Aug 16 2018, 10:36 AM
tools/lldb-vscode/lldb-vscode.cpp
2647

So if all you've got is a a single integer fd, you can't do both a read and write on the same fd (even from the same thread)? Isn't a FILE* just a wrapper around the fd anyway?

Regardless, I guess my suggestion for the llvm streams won't work after all since LLVM doesn't have a good abstraction for a true input stream (like a socket). Maybe one of these days I should add one.

zturner accepted this revision.Aug 16 2018, 10:42 AM

Thanks for being patient. Looking forward to actually using this on my Linux box.

This revision is now accepted and ready to land.Aug 16 2018, 10:42 AM
This revision was automatically updated to reflect the committed changes.
lemo added a comment.Sep 20 2018, 3:05 PM

Hi Greg, looking at request_evaluate() I noticed that it will evaluate the
string as a lldb command if prefixed by ` .

This is a great feature (it allows building REPL consoles on top of DAP),
but I'm curious how you picked up this convention? For example I believe
that the gdb DAP uses -exec 'command' instead.

Hi Greg, looking at request_evaluate() I noticed that it will evaluate the string as a lldb command if prefixed by ` .

This is a great feature (it allows building REPL consoles on top of DAP), but I'm curious how you picked up this convention? For example I believe that the gdb DAP uses -exec 'command' instead.

` is an illegal expression character so it won't stop you from evaluating any possible expression. The gdb prefix "-exec" stops you from being able to negate a local variable named "exec". Not a huge deal. So I just picked a good prefix character that wouldn't stop anyone from evaluating any valid expression (at least in C/C++/ObjC/Swift).

The solution I would love to see is to have the initialize packet return something via the DAP that says "I have a command line interpreter, please send a packet with a file handle (slave path to slave side of pseudo terminal maybe)". VS Code and Nuclide both emulated tty already, so we could have a true command line that exposes the "(gdb)" prompt for GDB and "(lldb)" for lldb and all the power that comes with it.

I needed something that could run LLDB commands when things go wrong to do trouble shooting (enable logging, run commands to dump vital information) so I hacked it in with `