This is an archive of the discontinued LLVM Phabricator instance.

[lldb/Reproducers] Fix passive replay for functions returning string as (char*, size_t).
ClosedPublic

Authored by JDevlieghere on Apr 8 2020, 3:18 PM.

Details

Summary

Several SB API functions return strings a (char*, size_t) pair. During capture, we serialize an empty string for the char* because the memory can be uninitialized. During replay, we have custom replay redirects that ensure that we don't override the buffer from which we're reading, but rather a buffer on the heap with the given length. This is sufficient for the "regular" reproducer use case, where we only care about the side effects of the API calls, not the actual return values.

This is not sufficient for passive replay. For passive replay, we ignore all the incoming arguments, and re-execute the current function with the arguments serialized in the reproducer. This means that these function will update the deserialized copy of the arguments, rather than whatever was passed in by the SWIG wrapper. To solve this problem, I've extended the reproducer instrumentation with special case replayers and corresponding macros. They ignore the replayer in the registry and the incoming char pointer, and instead reinvoke the current method on the deserialized class, and populate the output argument. It's unfortunate that this needs to be special cased, but I don't see a better solution.

Diff Detail

Event Timeline

JDevlieghere created this revision.Apr 8 2020, 3:18 PM

Rebase + add unit tests

JDevlieghere retitled this revision from [lldb/Reproducers] Fix inline replay for functions returning string as (char*, size_t). to [lldb/Reproducers] Fix passive replay for functions returning string as (char*, size_t)..Apr 13 2020, 12:36 PM
JDevlieghere edited the summary of this revision. (Show Details)

The thing which bugs me here (besides the duplication, which I think we now know how to handle) is that the definition of the behavior for the char*+size functions is spread out over REGISTER and RECORD macros -- one define the behavior of active replay, one for passive.

I sort of get that this is because the two modes are fairly different, but I think it still would be nice to centralize that, and I think that with some thought, that might be possible. As I understand it, tight now, the "active" redirection works by passing a pointer which completely replaces the behavior of the called method. In reality the replacing function will eventually call the replaced function anyway, with some fixups applied. What if, instead of replacing the function completely, we passed a pair of callbacks which would run before+after the original function. The first callback could apply any necessary fixups to the deserialized arguments, and the second one could (if in passive mode) translate the modifications done by the function into the "real" arguments. That way, we wouldn't need to do anything special in the RECORD macro...

Other ideas may be possible too...

Remove duplication similar to D77602.

Honestly, at this point I'm not convinced that the difference between active and passive replay warrants special handling. Other than the slightly different macros, which are unfortunate but OTOH also a good indicator that something special is happening, there's not that much more to support active/passive replay than there is for the generic case.

Hmm... well, I like how this (active&passive) is unified at the template level. It still feels like there's too much code to implement this thing (coming from the duplication for const, non-const and static variants), but I'm not sure what can be done about that. I'll need to ponder this a bit more

lldb/include/lldb/Utility/ReproducerInstrumentation.h
1016

this forward decl is superfluous.

labath accepted this revision.Apr 20 2020, 7:02 AM

Hmm... well, I like how this (active&passive) is unified at the template level. It still feels like there's too much code to implement this thing (coming from the duplication for const, non-const and static variants), but I'm not sure what can be done about that. I'll need to ponder this a bit more

After sleeping on this a couple of times, I've come to believe that the problem here is that we've let the redirection functions inherit the c++ function pointer weirdness, which resulted in a need to write template goo for const, non-const and static methods (and maybe sometimes even constructors). The goo is (more or less) unavoidable if we want to have compiler-generated "identifiers" for all these functions, but there's a difference between writing that once, and once for each redirection we need to make. Particularly as it doesn't seem like there's a compelling reason for that. There no reason that the "redirected" function in the replayer machinery needs be globally unique in the same way as the "original" functions do. In fact I think it doesn't even have to be a template at all -- it could just take the function-to-wrap as a regular argument instead of a templated one. I believe char_ptr_redirect<goo>::method<&goo>::record could have been written as:

// this handles const and non-const methods
template<typename Result, typename This> // Maybe "Result" not even needed, if this is always the same type.
Result char_ptr_redirect(Result (*f)(This *, char *, size_t), This *this_, char *ptr, size_t len) {
  char *buffer = reinterpret_cast<char *>(calloc(len, sizeof(char)));
  return f(this_, buffer, len)
}
// and another overload without This for static functions

Similar thing could be applied to the "replay" functions introduced here. Reduction from three overloads to two may not sound like much, but given that it would also remove the need for a lot of template goo (and reduce code size), I still find it interesting. It might be possible to also merge the two remaining overloads if we try hard enough.

Nonetheless, as long as we have only one kind of redirects (char_ptr thingy), it's benefit is not very clear, so I'm not going to insist on that. However, if more of these crop up, I think we should definitely revisit this.

This revision is now accepted and ready to land.Apr 20 2020, 7:02 AM

Hmm... well, I like how this (active&passive) is unified at the template level. It still feels like there's too much code to implement this thing (coming from the duplication for const, non-const and static variants), but I'm not sure what can be done about that. I'll need to ponder this a bit more

After sleeping on this a couple of times, I've come to believe that the problem here is that we've let the redirection functions inherit the c++ function pointer weirdness, which resulted in a need to write template goo for const, non-const and static methods (and maybe sometimes even constructors). The goo is (more or less) unavoidable if we want to have compiler-generated "identifiers" for all these functions, but there's a difference between writing that once, and once for each redirection we need to make. Particularly as it doesn't seem like there's a compelling reason for that. There no reason that the "redirected" function in the replayer machinery needs be globally unique in the same way as the "original" functions do. In fact I think it doesn't even have to be a template at all -- it could just take the function-to-wrap as a regular argument instead of a templated one.

Yeah, I've come to the same conclusion while working on this. Hindsight is 20/20 and changing all that might be a significant amount of work.

I believe char_ptr_redirect<goo>::method<&goo>::record could have been written as:

// this handles const and non-const methods
template<typename Result, typename This> // Maybe "Result" not even needed, if this is always the same type.
Result char_ptr_redirect(Result (*f)(This *, char *, size_t), This *this_, char *ptr, size_t len) {
  char *buffer = reinterpret_cast<char *>(calloc(len, sizeof(char)));
  return f(this_, buffer, len)
}
// and another overload without This for static functions

Similar thing could be applied to the "replay" functions introduced here. Reduction from three overloads to two may not sound like much, but given that it would also remove the need for a lot of template goo (and reduce code size), I still find it interesting. It might be possible to also merge the two remaining overloads if we try hard enough.

Agreed and while it would be a small thing it would also help compile times: this file is by far the slowest to compile in lldb. That and the templates have a significant cognitive overhead.

Nonetheless, as long as we have only one kind of redirects (char_ptr thingy), it's benefit is not very clear, so I'm not going to insist on that. However, if more of these crop up, I think we should definitely revisit this.

Thanks, I totally share your concerns and really appreciate the pragmatism.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptApr 20 2020, 1:34 PM

Hmm... well, I like how this (active&passive) is unified at the template level. It still feels like there's too much code to implement this thing (coming from the duplication for const, non-const and static variants), but I'm not sure what can be done about that. I'll need to ponder this a bit more

After sleeping on this a couple of times, I've come to believe that the problem here is that we've let the redirection functions inherit the c++ function pointer weirdness, which resulted in a need to write template goo for const, non-const and static methods (and maybe sometimes even constructors). The goo is (more or less) unavoidable if we want to have compiler-generated "identifiers" for all these functions, but there's a difference between writing that once, and once for each redirection we need to make. Particularly as it doesn't seem like there's a compelling reason for that. There no reason that the "redirected" function in the replayer machinery needs be globally unique in the same way as the "original" functions do. In fact I think it doesn't even have to be a template at all -- it could just take the function-to-wrap as a regular argument instead of a templated one.

Yeah, I've come to the same conclusion while working on this. Hindsight is 20/20 and changing all that might be a significant amount of work.

That last part makes me thing that we have not come to the same conclusion. :P I'm not talking about changing the entire mechanism of how we assign "ids" to functions and stuff. While it is true that it's not working out as well as I originally hoped, I still think that's ok, and having a bit of template goo to enable that is not bad. I'm just talking about changing how the "redirection" mechanism works -- the redirected functions don't need to be uniquely identifiable in the same way, as they can just be linked to the original function id, and so we should be able to come up with a less goo-ey mechanism for writing those. While that would certainly be "work" I don't think it would be a "significant" amount of work, as we only have a handful of redirected functions.