This is an archive of the discontinued LLVM Phabricator instance.

Add StructuredData unit tests; move packet processing into delegate.
ClosedPublic

Authored by tfiala on Aug 25 2016, 10:41 AM.

Details

Summary

This change does the following:

  • Changes the signature for the continuation delegate method that handles async structured data from accepting an already-parsed structured data element to taking just the packet contents.
  • Moves the conversion of the JSON-async: packet contents from GDBRemoteClientBase to the continuation delegate method.
  • Adds a new unit test for verifying that the $JSON-asyc: packets get decoded and that the decoded packets get forwarded on to the delegate for further processing. Thanks to Pavel for making that whole section of code easily unit testable!
  • Tightens up the packet verification on reception of a $JSON-async: packet contents. The code prior to this change is susceptible to a segfault if a packet is carefully crafted that starts with $J but has a total length shorter than the length of "$JSON-async:".

Diff Detail

Repository
rL LLVM

Event Timeline

tfiala updated this revision to Diff 69269.Aug 25 2016, 10:41 AM
tfiala retitled this revision from to Add StructuredData unit tests; remove JSON parsing string copy.
tfiala updated this object.
tfiala added reviewers: clayborg, labath, jasonmolenda.
tfiala added a subscriber: lldb-commits.
labath edited edge metadata.Aug 26 2016, 2:24 AM

Thank you for writing the tests. I have two stylistic comments, but otherwise looks great.

source/Plugins/Process/gdb-remote/GDBRemoteClientBase.cpp
113 ↗(On Diff #69269)

I'd like to move this code to a separate function. The main job of SendContinuePacketAndWaitForResponse is dealing with the thread synchronization issues, which is tricky enough without having json parsing in the middle of it.

unittests/Process/gdb-remote/GDBRemoteClientBaseTest.cpp
371 ↗(On Diff #69269)

Please put the "expected" values first (ASSERT_NE(expected, actual)). Otherwise the error message will come out wrong when the comparison fails. The same issue is present in a number of other checks.

clayborg requested changes to this revision.Aug 26 2016, 8:43 AM
clayborg edited edge metadata.

We should probably move over to using llvm::StrringRef instead of "const char *". It the JSON parser isn't using llvm::StringRef already it probably should.

This revision now requires changes to proceed.Aug 26 2016, 8:43 AM
tfiala added inline comments.Aug 26 2016, 8:50 AM
source/Plugins/Process/gdb-remote/GDBRemoteClientBase.cpp
113 ↗(On Diff #69269)

Sure, sounds like a good change.

unittests/Process/gdb-remote/GDBRemoteClientBaseTest.cpp
371 ↗(On Diff #69269)

Okay. That's somewhat unfortunate that Python unittest's messages are geared for the reverse:
http://bugs.python.org/issue10573

I will reverse these, thanks for catching that!

clayborg accepted this revision.Aug 26 2016, 8:52 AM
clayborg edited edge metadata.

Actually after looking at the JSON parser I see that it isn't using llvm::StringRef throughout. So I vote to keep the "const char *" for now since we need to guarantee that the JSON is null terminated. If we take a llvm::StringRef, then we will need to call llvm::StringRef::str() to make a std::string so we can guarantee that the JSON is null terminated which we do not want to have to do.

Either we include changing the JSON parser over to using llvm::StringRef throughout and change the constructors, or we leave things as "const char *" for now. I am assuming Todd doesn't want to take on the conversion over to llvm::StringRef so I am going to say I am ok with this change.

This revision is now accepted and ready to land.Aug 26 2016, 8:52 AM
tfiala added a comment.EditedAug 26 2016, 8:54 AM

<nevermind>

Looking closer at the JSON parser llvm::StringRef isn't the right thing to use when parsing JSON. The parser will often remove desensitizing characters from say a string like:

"hello \"world\""

And when parsing a token in JSONParser::GetToken, we can't just hand out a llvm::StringRef do this data in memory. The function prototype is:

JSONParser::Token
JSONParser::GetToken (std::string &value)

This would mean we would need "value" to be able to have a new backing store in case the values don't match what is in the JSON text buffer which would defeat the purpose of using llvm::StringRef. We could make a struct like:

struct ParseString
{
    llvm::StringRef value;
    std::string backing_store;
};

But then the code becomes more complex for no real benefit.

I'm not sure I follow. StringRef is literally just a wrapper around a const char*, so if const char* works, why doesn't StringRef? JSONParser constructor is already taking a const char* in this patch, so it's not like it is modifying the contents of the string.

Checking the length and/or getting a substring or trimming values from the beginning and end is going to be a far more common operation than converting it into something that has to be null terminated (why does it have to be null terminated for that matter anyway?), so the benefit of having the length stored with the string outweighs the con of having to call str() when you want something null terminated, IMO

Actually it looks like JSONParser constructor takes a StringRef and then converts it to a std::string internally. Why can't it just store the StringRef internally? It doesn't modify the buffer, and incidentally it also keeps a uint64_t representing the "offset" into the string. That's what a StringRef is. an immutable buffer and an offset. Each time you extract something, you just say m_buffer = m_buffer.drop_front(n).

There is the SetFilePos() method, but it looks like it's only being used for two purposes: 1) To move forward, and 2) To reset the position to 0 after setting a new string in the extractor. The first one is still possible with a StringRef by writing extractor.skip(n), and the second one is still possible by writing extractor.SetString(NewStringRef).

I'm not sure I follow. StringRef is literally just a wrapper around a const char*, so if const char* works, why doesn't StringRef?

"const char *" assumed null termination. StringRef can be a reference to part of a C string:

const char *cstr = "\"valid json string\"and some other junk";
llvm::StringRef json_text(cstr, 19);

Not that the last character after the ending quote isn't a '\0' charcter it is a 'a'.

So if we want to actually use the "json_text" to parse the JSON, we need make a std::string so that it becomes null terminated. We could switch the parsing code to use a new class StringRefExtractor instead of StringExtractor in which case we can accept the data as a llvm::StringRef from the constructor. But all internal code would need to use std::string (GetToken() would still need to use std::string).

JSONParser constructor is already taking a const char* in this patch, so it's not like it is modifying the contents of the string.

You are correct. It does make a contract that used to not be there where the caller must guarantee the data it is parsing doesn't go away, but I don't think that will be a problem.

Checking the length and/or getting a substring or trimming values from the beginning and end is going to be a far more common operation than converting it into something that has to be null terminated (why does it have to be null terminated for that matter anyway?)

Currently it is so the parser knows when to stop, but it doesn't have to be if we switch to using a new StringRefExtractor class.

So the benefit of having the length stored with the string outweighs the con of having to call str() when you want something null terminated, IMO

Yes it it fine as long as you convert the parsing code.

Actually it looks like JSONParser constructor takes a StringRef and then converts it to a std::string internally. Why can't it just store the StringRef internally? It doesn't modify the buffer, and incidentally it also keeps a uint64_t representing the "offset" into the string. That's what a StringRef is. an immutable buffer and an offset. Each time you extract something, you just say m_buffer = m_buffer.drop_front(n).

Yep, we just need to create a StringRefExtractor class that uses a llvm::StringRef class internally instead of a std::string. If you do something about this, don't change StringExtractor since all of the GDB remote packets use these and actually use the std::string as the backing store for all of its incoming packets.

There is the SetFilePos() method, but it looks like it's only being used for two purposes: 1) To move forward, and 2) To reset the position to 0 after setting a new string in the extractor. The first one is still possible with a StringRef by writing extractor.skip(n), and the second one is still possible by writing extractor.SetString(NewStringRef).

You don't want to modify the string in any way. StringRefExtractor would contain a llvm::StringRef + the current offset position just like StringExtractor. No need to change anything functionality wise.

Again, if we switch the parser over to use an llvm::StringRef then this make sense, otherwise it doesn't.

You mention a StringRefExtractor. But nothing about StringExtractor that I can tell depends on the fact that it's null terminated. It only needs to know where to stop. A StringRef contains a length, so you stop when it gets to the end of the StringRef. It seems to me like we don't need a StringRefExtractor, but rather that StringExtractor itself can just be updated to use a StringRef instead of a std::string.

Anyway, I will try to whip up a patch that converts StringExtractor to store a StringRef internally. I'll put it up for review when I'm done just so you can see what I have in mind. If it works great, if it doesn't oh well :)

Again, if we switch the parser over to use an llvm::StringRef then this make sense, otherwise it doesn't.

I don't want to conflate this change with switching the parser over to a string ref. The only reason I changed the entry to a const char* was to avoid creating a couple unnecessary temporaries.

We can consider the parser conversion a separate logical change that can be done independently.

Again, if we switch the parser over to use an llvm::StringRef then this make sense, otherwise it doesn't.

I don't want to conflate this change with switching the parser over to a string ref. The only reason I changed the entry to a const char* was to avoid creating a couple unnecessary temporaries.

We can consider the parser conversion a separate logical change that can be done independently.

Greg and I talked about this a bit.

I'm going to reverse out the change to the JSON parsing signature, and just add a move-enabled one to avoid the temporary I was concerned about.

Getting back to this change.

I'm going to rebase for code reformatting, remove the signature change on the JSON parsing, adjust my call sites for it, refactor the structured data parsing in the gdb-remote reception to a separate function, and then put this back up for review.

tfiala updated this revision to Diff 70909.Sep 9 2016, 2:42 PM
tfiala retitled this revision from Add StructuredData unit tests; remove JSON parsing string copy to Add StructuredData unit tests; move packet processing into delegate..
tfiala edited edge metadata.
tfiala updated this object.Sep 9 2016, 3:45 PM
tfiala updated this object.
zturner added inline comments.Sep 9 2016, 3:47 PM
source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
4812 ↗(On Diff #70909)

How about just a global llvm::StringRef, or even a StringRef at local scope? Doesn't seem worth using a function local static for this.

4817 ↗(On Diff #70909)

Change the function parameter to an llvm::StringRef, and then you can do the following:

4821–4838 ↗(On Diff #70909)

This entire block becomes:

if (!packet.consume_front("JSON-async:")) {
   // print the log statement
}
auto json_sp = StructuredData::ParseJSON(packet);
4856 ↗(On Diff #70909)

This is doing a string copy since you're going from a StringRef to a std::string. Use StringRef all the way down.

unittests/Process/gdb-remote/GDBRemoteClientBaseTest.cpp
38 ↗(On Diff #70909)

This can be a vector of StringRefs as well, unless there's some reason you need to throw away the memory backing the StringRef, which it doesn't appear you do.

Also, if you have a rough idea of how many StringRefs there's going to be ahead of time, or at least an upper bound, then an llvm::SmallVector<StringRef> will be more efficient.

335 ↗(On Diff #70909)

Would be nice to see PutEscapedBytes updated to take a StringRef. Every occurrence of passing const char * str, int len should be replaced with StringRef as we find occurrences of it.

tfiala marked 3 inline comments as done.Sep 9 2016, 4:15 PM

I'll make a few more adjustments here based on Zachary's feedback.

source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
4812 ↗(On Diff #70909)

If llvm::StringRef at global scope does not incur a global constructor, that's fine. If it does, we are prevented from adding global constructors within our products except on a special-case basis. This would not pass that mark.

Easy enough for me to try, though. I'll check.

4821–4838 ↗(On Diff #70909)

Yep, that looks great. Thanks!

unittests/Process/gdb-remote/GDBRemoteClientBaseTest.cpp
38 ↗(On Diff #70909)

It's not clear to me what the lifetime is of the packet message content and how that interplays with a StringRef. I was under the impression that a StringRef will assume the backing store hangs around. That sounds like a recipe for potential invalid memory access? I'll check the code, though. It may be totally fine based on how the packet content is handled.

335 ↗(On Diff #70909)

Yep. I'd prefer to not make that part of this change, though.

tfiala added a comment.Sep 9 2016, 4:16 PM

I'll make a few more adjustments here based on Zachary's feedback.

Heh, ahem, that was meant to be "global *destructor*" above, not global constructor.

zturner added inline comments.Sep 9 2016, 4:40 PM
unittests/Process/gdb-remote/GDBRemoteClientBaseTest.cpp
335 ↗(On Diff #70909)

Yea you are right, since the packets are coming in asynchronously I don't think we can make any assumptions about how long it will live.

tfiala updated this revision to Diff 70930.Sep 9 2016, 4:49 PM
tfiala updated this revision to Diff 70931.

Re-uploaded last patch with full context.

tfiala added inline comments.Sep 9 2016, 4:52 PM
unittests/Process/gdb-remote/GDBRemoteClientBaseTest.cpp
38 ↗(On Diff #70931)

Yeah, I did try a std::vector<llvm::StringRef> but that actually blew up since the backing storage did disappear.

For the rest of the interface, I was able to pipe through the llvm::StringRef, though, since that's all serial up to the JSON parser, which itself then makes a copy that it requires. So I think that's all for the better. Thanks for the suggestion!

tfiala added a comment.Sep 9 2016, 5:09 PM

I'm going to go ahead with this since I think the biggest concerns have been addressed. @labath we can adjust more later if you still have concerns.

This revision was automatically updated to reflect the committed changes.