Page MenuHomePhabricator

Add a new tool named "lldb-vscode" that implements the Visual Studio Code Debug Adaptor Protocol

Authored by clayborg on Aug 6 2018, 3:24 PM.



This patch adds a new lldb-vscode tool that speaks the Microsoft Visual Studio Code debug adaptor protocol. It has full unit tests that test all packets.

This tool can be easily packaged up into a native extension and used with Visual Studio Code, and it can also be used by Nuclide.

Not sure who to add as a reviewer, so I started with a few people. Feel free to add more as needed.

Diff Detail


Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
zturner added inline comments.Aug 15 2018, 9:34 AM
22 ↗(On Diff #160736)

Is it useful to have a notion of a breakpoint object that isn't actually set? Why not just delete this class instance entirely and let the caller allocate a new one when they're ready. If we want to keep this, then I think an llvm::Optional<SBBreakpoint> would be a better approach, because it makes explicit for someone reading the code that not existing is a valid state.

23 ↗(On Diff #160736)

Early return.

17 ↗(On Diff #160736)

Can we call this variable functionName and delete the comment?

27 ↗(On Diff #160736)

I think this could return StringRef

44–45 ↗(On Diff #160736)

I think this could return a StringRef. Also, it seems weird to have an overload which takes a pointer. Can we delete that overload? I don't think it would make sense for anyone to call this function with a null obj.

141–142 ↗(On Diff #160736)

This could return a vector<StringRef>

225 ↗(On Diff #160736)

StringRef event_name

261–262 ↗(On Diff #160736)

StringRef name

24 ↗(On Diff #160736)

StringRef source_path

19 ↗(On Diff #160736)


22–24 ↗(On Diff #160736)

I'd write this as:

inline bool is_empty_line(StringRef S) {
  return S.ltrim().empty();
129 ↗(On Diff #160736)

You should check out llvm/Support/LineIterator.h

45 ↗(On Diff #160736)


46 ↗(On Diff #160736)


60–61 ↗(On Diff #160736)

Any reason why we have to use FILE* instead of fds?

72–73 ↗(On Diff #160736)

Delete commented out code

74 ↗(On Diff #160736)


75–76 ↗(On Diff #160736)


77 ↗(On Diff #160736)


79–83 ↗(On Diff #160736)


89 ↗(On Diff #160736)


97 ↗(On Diff #160736)

StringRef filter

104 ↗(On Diff #160736)

StringRef json_str

114–115 ↗(On Diff #160736)

StringRef output

130–131 ↗(On Diff #160736)

StringRef prefix, ArrayRef<std::string> commands

47–58 ↗(On Diff #160736)

This block of code can be deleted. We have similar abstractions in either lldb or llvm.

61 ↗(On Diff #160736)


69 ↗(On Diff #160736)

I don't believe these pragma statements will work on all compilers

71 ↗(On Diff #160736)

Can you make all global functions static?

108 ↗(On Diff #160736)

Can this take an ArrayRef<std::string> instead?

432 ↗(On Diff #160736)

This should be an llvm::raw_string_ostream which builds up the command, then at the end call stream.str() to render the final string.

clayborg added inline comments.Aug 15 2018, 2:49 PM
21 ↗(On Diff #160736)

Sure thing

22 ↗(On Diff #160736)

Exceptions breakpoints get set once at the beginning of the session after the "initialized" event is sent back to the debugger. The ClearBreakpoint is just for cleanup after all is said and done.

17 ↗(On Diff #160736)

No. It doesn't actually share anything from BreakpointBase. You would think it would, but it doesn't.

24 ↗(On Diff #160736)


45 ↗(On Diff #160736)

it keeps the code much cleaner. Some of the LLVM JSON calls return an "llvm::json::Object *" when fetching a value for a key which can be NULL. The primary reason for these functions it to keep the code clean and simple. I prefer to keep these overloads.

142 ↗(On Diff #160736)

Need to keep as is because as noted in the description, numbers and booleans will be converted to strings. This is in case you specify arguments for your program as:

"args": [ 123, "hello", true ]
24 ↗(On Diff #160736)

I am gonna leave this one as is because we must pass a "const char *" the API that is called inside the body of this method:

lldb::SBBreakpoint lldb::SBTarget::BreakpointCreateByLocation(const char *file, uint32_t line);
45 ↗(On Diff #160736)

Causes build errors when I tried switching.

46 ↗(On Diff #160736)

Doesn't work with std::string and we need the std::string to back the string content.

83 ↗(On Diff #160736)

These are usually empty. No need to use SmallVector and force storage when we don't need it.

97 ↗(On Diff #160736)

This is iterating through an array of classes and check if the member variable, which is a std::string is equal. Leaving as std::string

104 ↗(On Diff #160736)

Leaving as std::string as we need to guarantee null termination

71 ↗(On Diff #160736)

I will add an anonymous namespace around all local functions.

108 ↗(On Diff #160736)


zturner added inline comments.Aug 15 2018, 3:12 PM
142 ↗(On Diff #160736)

That's unfortunate. All json values are backed by text in the underlying file, so in theory you could have StringRefs referrings to the part of the file that contains the string true. But it seems the JSON library doesn't preserve the full text of each value, so maybe this isn't possible.

24 ↗(On Diff #160736)

In that case you can use const llvm::Twine &. You can pass many types of objects to it, such as const char*, std::string, llvm::StringRef. Then, inside the function you can write:

llvm::SmallString<32> S;
StringRef NTS = source_path.toNullTerminatedStringRef(S);

If the original parameter was constructed with a const char* or std::string, then toNullTerminatedStringRef is smart enough to know that it doesn't need to make a copy, and it just returns the pointer. If it was constructed with a StringRef, then it null terminates it using S as the backing storage and returns that pointer. So it's the best of both worlds

45 ↗(On Diff #160736)

What kind of build errors? DenseMap<uint32_t, SourceBreakpoint> shouldn't cause any. If SourceBreakpoint was the key it might, but not if it's the value.

46 ↗(On Diff #160736)

llvm::StringMap copies the keys so it will always be backed.

60–61 ↗(On Diff #160736)

I think maybe out would be better as an llvm::raw_fd_ostream, and in would be better as an llvm::MemoryBuffer which you could obtain via llvm::MemoryBuffer::getSTDIN(). These are always stdin and stdout and never seem to change, and being able to stream data to out and read from in as very convenient and cleans up a lot of code.

97 ↗(On Diff #160736)

That shouldn't be a problem, should it? With the current code, if you write GetExceptionBreakpoint("foo"); there will be an unnecessary allocation. With StringRef, there won't. I'm not aware of *any* good reason to prefer const string& over StringRef unless you need to guarantee null termination, but that is questionable as well since the implementation details of a function end up affecting its API, so the implementation details aren't truly hidden.

clayborg updated this revision to Diff 160943.Aug 15 2018, 4:41 PM

Fixed almost all issues mentioned by zturner.

clayborg marked 23 inline comments as done.Aug 15 2018, 4:48 PM
clayborg added inline comments.
24 ↗(On Diff #160736)

That is just way too much work for no gain.

45 ↗(On Diff #160736)

I meant to remove this. I have everything working with DenseMap, DenseSet and StringMap

46 ↗(On Diff #160736)

Ditto, meant to remove this. Got everything working.

61 ↗(On Diff #160736)

This can work with a port number. Rather not switch away from FILE * for now. No real benefit.

Looking pretty good, I went over it again and found a few more things. There's a bit more auto than I'm comfortable with, but I'm not gonna make a big deal about it. it does make the code a bit hard to read when it's used for trivial return values though.

22 ↗(On Diff #160943)

I remembered that we have a function that wraps this and returns a more intuitive error value (true means it succeeded). So this can actually be

if (llvm::to_integer(hitCondition, hitCount))
10–11 ↗(On Diff #160943)

We should use all uppercase for the include guards.

21 ↗(On Diff #160943)

What is value? Can you maybe make a more descriptive name? Does it indicate whether the breakpoint is enabled?

24–25 ↗(On Diff #160943)

This doesn't look clang-formatted, I would expect some of the constructor initializers to be on the same line.

93 ↗(On Diff #160943)

Can we do an early return here? Also I haven't checked, but if the json_array object contains a size() field, which I expect it does, then we might want to do a reserve() on the vector before adding items to it.

97–98 ↗(On Diff #160943)

I don't think we need the if statement here. Unless value.kind() has a bug in it, this should be guaranteed to be correct without the check.

142 ↗(On Diff #160943)

I'd like to avoid using unsafe functions from the printf family if possible. I think this function could be written:

StringRef Value = v.GetValue();
StringRef Summary = v.GetSummary();
StringRef TypeName = v.GetType().GetDisplayTypeName();

std::string Result;
llvm::raw_string_ostream OS(Result);
if (!Value.empty()) {
  OS << Value;
  if (!Summary.empty())
    OS << ' ' << Summary;
} else if (!Summary.empty()) {
  OS << ' ' << Summary;
} else if (!TypeName.empty()) {
  OS << TypeName;
  lldb::addr_t address = v.GetLoadAddress();
  if (address != LLDB_INVALID_ADDRESS)
    OS << " @ " << llvm::format_hex(address, 0);

I think this is a lot more concise.

287 ↗(On Diff #160943)

Can we do an early return here?

305 ↗(On Diff #160943)

Early return here.

307 ↗(On Diff #160943)

And here.

471–472 ↗(On Diff #160943)

Can we use the std::string overload here so we aren't limited to PATH_MAX?

541–543 ↗(On Diff #160943)

Can we use a raw_string_ostream here and non-printf formatting functions here and elsewhere in this function? One way to re-write this would be:

Stream << formatv("{0:X+}: <{1}> {2} {3} {4}", inst_addr, inst_offset, fmt_repeat(' ', spaces), m, o);

Which I think is nice and easy to read. Similarly for the rest of the function.

548–549 ↗(On Diff #160943)

With the above approach you don't need this copy because you're rendering it directly to the string you ultimately plan to use anyway, so this could go away. Also not a huge deal, but if you pull this std::string out of the for loop, you can avoid a re-allocation on every iteration.

559 ↗(On Diff #160943)

Maybe just `source.content.push_back('\n');'

17 ↗(On Diff #160943)

Can you put all the functions in this file in a namespace? I realize this is a standalone utility but it's still good practice to not pollute the global namespace from a header file.

25 ↗(On Diff #160943)

Comment should be updated.

41 ↗(On Diff #160943)

Commented should be updated here too.

15 ↗(On Diff #160943)

Can this be an llvm::raw_ostream& instead of an SBStream&?

clayborg updated this revision to Diff 161040.Aug 16 2018, 9:17 AM

Fix all of zturner's inline comments.

clayborg marked 13 inline comments as done.Aug 16 2018, 9:20 AM
clayborg added inline comments.
472 ↗(On Diff #160943)

This is a SBFileSpec. We don't allow any STL to be used in API.

I had a couple of other comments, but since I responded from email since I was on the go and I guess they didn't show up inline. Sorry about that. If you prefer I can resubmit them all as inline comments, or I guess you can just respond to the email thread.

472 ↗(On Diff #160943)

Ahh good point, my bad.

Inline comments really help if you don't mind.

zturner added inline comments.Aug 16 2018, 10:06 AM
24 ↗(On Diff #160736)

The gain is that const char* is unsafe and error prone and should be avoided wherever possible unless it’s very cumbersome not to.

The reason is that it's a bit like const propagation. It's inflexible and as soon as one function parameter is const, it is restricted to only being able to call other const functions, and it trickles downward throughout the entire program. Now in this case it's good, because const is intended to help you, but in the cast of raw string pointers like const char * it's bad, because raw pointers in general are unsafe and error prone, and in this case additionally inefficient (if multiple functions need to do something with the string, you end up with multiple calls to strlen).

It's true that the implementation of the function needs to call an SB API function, but interface design and interface implementation should be orthogonal. You pass a StringRef because it's the right thing to do for the interface, that the implementation needs a null terminated string is an implementation detail (some day, for example, we might have an SB API v2 that can be made to work with C++ types via clever SWIG mappings or some other technology).

It might be a minor nitpick, but we have a basically clean slate here so I would prefer to start on the right foot with modern c++ constructs and paradigms

2641–2645 ↗(On Diff #161040)

Now I see what you mean about FILE* being able to be used with a port. But I wasn't familiar with this behavior of FILE*. My undersatnding of FILE* is that its behavior is implementation defined. So can we guarantee this mutual exclusion on all platforms? I don't even know Windows' behavior with respect to this, I will have to check on this first to see if Windows behaves the same way.

If I understand your comment correctly though, it sounds like you're saying that you can't simultaneously read from and write to a full duplex socket, so the main advantage to using FILE* is that it gives us a mutex? How does it know to use the same mutex for both in and out? It seems like it would only guarantee mutual exclusion per-file object. So you couldn't read from the same socket on multiple threads, and you couldn't write to the same socket on multiple threads. But the current code still allows reading from and writing to the same socket simultaneously. Is that ok?

I need to confirm this behavior on other platforms. That said, we could also just put a mutex in the global VSCode class.

clayborg updated this revision to Diff 161058.Aug 16 2018, 10:26 AM

Change void SourceBreakpoint::SetBreakpoint() to use llvm::StringRef

clayborg marked 2 inline comments as done.Aug 16 2018, 10:29 AM
clayborg added inline comments.
2646 ↗(On Diff #161058)

The mutex isn't the problem, it is being able to read and write from the in and out streams. So adding a mutex to VSCode won't help. We have very simple interactions with in and out where we read lines and read a specific amount of data, and we just write data out.

zturner added inline comments.Aug 16 2018, 10:36 AM
2646 ↗(On Diff #161058)

So if all you've got is a a single integer fd, you can't do both a read and write on the same fd (even from the same thread)? Isn't a FILE* just a wrapper around the fd anyway?

Regardless, I guess my suggestion for the llvm streams won't work after all since LLVM doesn't have a good abstraction for a true input stream (like a socket). Maybe one of these days I should add one.

zturner accepted this revision.Aug 16 2018, 10:42 AM

Thanks for being patient. Looking forward to actually using this on my Linux box.

This revision is now accepted and ready to land.Aug 16 2018, 10:42 AM
This revision was automatically updated to reflect the committed changes.
lemo added a comment.Sep 20 2018, 3:05 PM

Hi Greg, looking at request_evaluate() I noticed that it will evaluate the
string as a lldb command if prefixed by ` .

This is a great feature (it allows building REPL consoles on top of DAP),
but I'm curious how you picked up this convention? For example I believe
that the gdb DAP uses -exec 'command' instead.

Hi Greg, looking at request_evaluate() I noticed that it will evaluate the string as a lldb command if prefixed by ` .

This is a great feature (it allows building REPL consoles on top of DAP), but I'm curious how you picked up this convention? For example I believe that the gdb DAP uses -exec 'command' instead.

` is an illegal expression character so it won't stop you from evaluating any possible expression. The gdb prefix "-exec" stops you from being able to negate a local variable named "exec". Not a huge deal. So I just picked a good prefix character that wouldn't stop anyone from evaluating any valid expression (at least in C/C++/ObjC/Swift).

The solution I would love to see is to have the initialize packet return something via the DAP that says "I have a command line interpreter, please send a packet with a file handle (slave path to slave side of pseudo terminal maybe)". VS Code and Nuclide both emulated tty already, so we could have a true command line that exposes the "(gdb)" prompt for GDB and "(lldb)" for lldb and all the power that comes with it.

I needed something that could run LLDB commands when things go wrong to do trouble shooting (enable logging, run commands to dump vital information) so I hacked it in with `