Changes the signature of the std::function to return an Expected<std::string>
instead of std::string to allow for (non-fatal) failures.  Previously, we
expected that any failures would be expressed with assertions. However, that's
unfriendly to running the code in servers or other places that don't want their
library calls to crash the program.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Why would we consider this a legitimate failure, rather than a programming error?
Same argument could be made about any form of format-string-like functions, e.g. llvm::formatv or sprintf.
Yet, they return strings and not Expected<string> or their equivalent.
It's not that it's a legitimate failure so much as an invalid argument -- that is, the caller caused it and therefore should handle the failure. For a standalone tool making the call, I'd argue they should treat it as a programming error and assert to crash. However, a server that takes the input from, say, a user will want to propagate that back to the user.
As for formatv and sprintf -- I think the difference is that this has a more dynamic design. Both of those take explicit arguments at the callsite, vs. TextGenerator which takes a MatchResult. But, that's just a detail -- the key thing is that we're trying to support a usecase where the input may be flawed and we want to fail gracefully. We do so at the cost of the added complexity of Expected<>.
Alternatives:
- add a separate validation function. But, this will complicate the design since we'd need to pass two functions rather than one. That is, TextGenerator would need to be a pair of functions.
- return an empty string on failure. This has the typical tradeoffs of using a sentinel value.
I'd argue it's the server's job to validate the inputs in that case.
The code that landed so far clearly looks like a C++ DSL to describe transformations of the source. While it can be used a dependency in the server-side, I don't see why doing user-input checking should be done by the library, rather than the server itself.
User input would have to be transformed into the calls of the C++ API somehow, that looks like a proper layer to do the validation.
The problem is that validation can't* be done in the abstract. It has to be done with respect to a specific match result. Unfortunately, the server won't be layered directly on top of the call to the TextGenerator -- the rewrite rule is interposed between them. That is, the client of the TG is the rewriterule and that's where the validation has to happen, but the TG is opaque to the rewrite rule, so it can't hardcode that validation logic. So, we'd need to change TextGenerator to bundle a validation function and string generator. Is it still worth it in that case?
*"can't" is a bit strong. I could imagine a design which allows full analysis and validation of rewrite rules before they are executed, but it would be far more sophisticated than the current design.
Could you provide more details into how the server app is layered? Specifically, what are the user inputs and how do they get translated into the transformer library calls?
I imagine their should be a syntax for textual representation of the generator, the representation should allow referring to the named match results. While parsing this representation, it should be possible to collect all matches of named nodes and make sure there are corresponding binding in the rewrite rule.
If the setup is different, I may be missing where the complexity comes from and I would love to learn about it.
Add test for new behavior.
In the process, tweak the handling of errors from TextGenerators in Transformer:
instead of printing to llvm::errs, we set the error in the AtomicChange.
| clang/include/clang/Tooling/Refactoring/Transformer.h | ||
|---|---|---|
| 47 ↗ | (On Diff #197255) | NIT: maybe shorten a bit, still capturing the essence? Something like the following should be enough: Note that \p TextGenerator is allowed to fail, e.g. when trying to access a matched node that was not bound. Allowing this to fail simplifies error handling for interactive tools like clang-query. | 
| 58 ↗ | (On Diff #197255) | Maybe drop trivially successful? Does not seem to be super-important. | 
| 238 ↗ | (On Diff #197255) | s/it's/its | 
| clang/lib/Tooling/Refactoring/Transformer.cpp | ||
| 164 ↗ | (On Diff #197255) | Maybe follow a typical pattern for handling errors here (to avoid OrErr suffixes and an extra Err variable)? I.e. auto Replacement = Edit.Replacement(Result); if (!Replacement) return Replacement.takeError(); T.Replacement = std::move(*Replacement); | 
| 205 ↗ | (On Diff #197255) | This looks super-complicated. The following alternatives would encourage clients to handle errors properly: 
 WDYT about picking one of those two? | 
| clang/lib/Tooling/Refactoring/Transformer.cpp | ||
|---|---|---|
| 205 ↗ | (On Diff #197255) | Agreed! I was using setError on the assumption that it was the "standard" way to express errors. Given that it seems to be totally ignored otherwise, let's go with option 1. I'll update the revision. | 
Addresses comments, including error handling style and signature of ChangeConsumer.
Updates testing code to use new ChangeConsumer signature.
| clang/lib/Tooling/Refactoring/Transformer.cpp | ||
|---|---|---|
| 164 ↗ | (On Diff #197255) | Here and elsewhere. | 
This breaks a test: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap-msan/builds/12112/steps/check-llvm%20check-clang%20stage3%2Fmsan/logs/stdio
[----------] 1 test from TransformerTest
[ RUN      ] TransformerTest.NodePartNameDeclRefFailure
/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm/tools/clang/unittests/Tooling/TransformerTest.cpp:66: Failure
Value of: MaybeActual
  Actual: false
Expected: true
Rewrite failed. Expecting: 
    struct Y {
      int operator*();
    };
    int neutral(int x) {
      Y y;
      int (Y::*ptr)() = &Y::operator*;
      return *y + x;
    }
  
[  FAILED  ] TransformerTest.NodePartNameDeclRefFailure (83 ms)
[----------] 1 test from TransformerTest (83 ms total)Can you take a look?