This is an archive of the discontinued LLVM Phabricator instance.

[Support] Allow formatv() to consume llvm::Error by-value without crashing.
AbandonedPublic

Authored by sammccall on Jul 6 2018, 2:27 AM.

Details

Reviewers
zturner
ioeric
Summary

Currently the following code crashes if an error occurs:

llvm::Expected<Foo> MaybeFoo = ...
if (!MaybeFoo)
  llvm::errs() << formatv("Error: {0}", MaybeFoo.takeError());

I feel conflicted about adding a special case like this, but

  • Error has an unusual API that interacts strangely with other libraries.
  • it's fairly important, and
  • it gets used conditionally on error-handling paths that are rarely tested.

The main alternative I can see is to remove operator<< from Error again.
This requires you to write formatv(..., llvm::toString(MaybeFoo.takeError())),
and provides no way of printing an error by reference.

Diff Detail

Event Timeline

sammccall created this revision.Jul 6 2018, 2:27 AM
dblaikie added a subscriber: lhames.Jul 9 2018, 2:50 PM
dblaikie added a subscriber: dblaikie.

There are two more possibilities I can think of. Re-writing your original code sample in 2 different ways:

// Method 1
llvm::Expected<Foo> MaybeFoo = ...
if (!MaybeFoo) {
  Error E = MaybeFoo.takeError();
  llvm::errs() << formatv("Error: {0}", E);
  consumeError(std::move(E));
}
// Method 2
llvm::Expected<Foo> MaybeFoo = ...
if (!MaybeFoo)
  llvm::errs() << formatv("Error: {0}", fmt_consume_error(MaybeFoo.takeError()));

The former is more verbose and may be the best solution unless you're finding that you need to format errors very very often. It has the advantage of requiring no code change at all

The latter is a bit more verbose (perhaps we can make the name more concise), but at least it makes it explicit that the error object will be destroyed as a result of this call.

The latter could be done by modeling a FormatAdapter after the existing ones like fmt_repeat, fmt_pad, etc.

Thoughts?

lhames added a comment.Jul 9 2018, 3:19 PM

Currently the toString operation on Error is consuming. Could this just be:

errs() << toString(MaybeFoo.takeError());

If the aim is to apply special formatting to errors, my preference would be for Zachery's Method (2) as it makes the path of ownership more explicit.

Thanks for looking at this, and sorry it's a complicated space.
Context is: I'm trying to move clangd's logging to wrap formatv instead of ad-hoc concatenation. One of the motivations is to make dealing with llvm::Error/Expected less painful. (In hindsight, we maybe should have rolled our own type here maybe, but Error is pervasive in our code now).

There are two more possibilities I can think of. Re-writing your original code sample in 2 different ways:

// Method 1
llvm::Expected<Foo> MaybeFoo = ...
if (!MaybeFoo) {
  Error E = MaybeFoo.takeError();
  llvm::errs() << formatv("Error: {0}", E);
  consumeError(std::move(E));
}

The former is more verbose and may be the best solution unless you're finding that you need to format errors very very often. It has the advantage of requiring no code change at all

This is a really common pattern for clangd - in many places we don't have better options in the protocol than logging the error and returning a null response to the client.

So with that context, I think there are at least two problems with method 1:

  • it's too verbose/noisy for such a common pattern
  • it's error prone: if you forget consumeError you get a crash, but only if the error condition actually occurs. If tests don't trigger an error path, it's easy to check in time bomb bugs.
// Method 2
llvm::Expected<Foo> MaybeFoo = ...
if (!MaybeFoo)
  llvm::errs() << formatv("Error: {0}", fmt_consume_error(MaybeFoo.takeError()));

The latter is a bit more verbose (perhaps we can make the name more concise), but at least it makes it explicit that the error object will be destroyed as a result of this call.

This seems better, but there are still a couple of problems:

  • it's error prone: if you forget the fmt_consume_error then it still compiles, but crashes (again, only if the error condition actually occurs).
  • it still seems too verbose to me (I don't *want* error consumption to be explicit, I'm already passing by value so it just adds noise). I can't think of a great name to mitigate this :-( However we could provide fmt_take_error(Expected<T>& X) equivalent to fmt_consume_error(X.takeError()) which reduces verbosity at the cost of orthogonality.

The latter could be done by modeling a FormatAdapter after the existing ones like fmt_repeat, fmt_pad, etc.

Thoughts?

We can probably find something along these lines (explicit) that will work. My question is *why* :-)

  • If you don't think that formatv(char*, Foo.takeError())is the best syntax, I'd like to try to convince you otherwise. (Roughly: it's clear and unambiguous, and consistent with formatv for other types. It's also consistent with toString(Error).)
  • if you don't want to have a special case in the implementation of formatv for Error, I can agree with that. Obviously reasonable people can disagree about the tradeoffs here. I'd prefer workarounds for Error to live in Error.h, but I don't see how to do that here.

One point that seems hard to resolve is that we really need to be able to print errors *without* consuming them, but we want to avoid the error-prone patterns above.
i.e. we'd like this to be legal: formatv("{0}", const Error&) but this should not be: formatv("{0}", Error&&) (unless we make it consume).
This doesn't seem easy to do directly in formatv, maybe we need to write the first formatv("{0}", fmt_err(const Error&)) (and rename the operator<< on Error to something else so it can't be selected).

Currently the toString operation on Error is consuming. Could this just be:
errs() << toString(MaybeFoo.takeError());

I don't think toString is the right function here (when passing to formatv).

  • it's semantically the wrong name: formatv's job is to stringify things, so stringifying them first is confusing and inconsistent with other types.
  • we may not actually want to produce the string (e.g. if the Logger implementation decides to filter out the formatv_object_base based on level)
  • llvm::toString is dangerously close to the generic llvm::to_string which does the same thing without consuming the error.

If this is the desired syntax form, a FormatAdapter could give similar syntax with a better name and semantics, I think.

An alternate approach in D49129: give Error an overloaded operator<< that has the same consume semantics at toString when passed rvalues.

This makes formatv, to_string, toString all do basically the same thing (which is the only sensible thing to do when passed an rvalue).

It does require minor changes to formatv and to_string to make them propagate the right kind of reference, but these seem reasonable.

Overall I think I like that approach better than this one (more consistent, Error handling lives in Error.h). WDYT?

In the samples you gave in the original code review, it was always just
formatv(“Error: {0}”, E);

Is it always this simple and consistent? Or do you frequently construct
complex messages with many other parameters in the same format string? If
it’s always (or even often) that simple, maybe you just need a function
like void logError(Error E)

In the samples you gave in the original code review, it was always just
formatv(“Error: {0}”, E);

Is it always this simple and consistent? Or do you frequently construct
complex messages with many other parameters in the same format string? If
it’s always (or even often) that simple, maybe you just need a function
like void logError(Error E)

No, there are custom messages and other parameters.
First example from the initial migration in D49008:
elog("Failed to update {0}: {1}", File, Contents.takeError());

(where elog forwards its args to formatv and then sends the results to the current logger)

That seems plausible (op<< for Error) - but does that handle the case
mentioned above "we may not actually want to produce the string (eg: if the
Logger implementation decides to filter out the formatv_object_base based
on level)" - is op<< still called even with this filtering? (seems like if
it was, then it wouldn't avoid the stringification costs? but if it doesn't
call the op<< then it'll still fail the consume semantic requirement of
llvm::Error)

That seems plausible (op<< for Error) - but does that handle the case
mentioned above "we may not actually want to produce the string (eg: if the
Logger implementation decides to filter out the formatv_object_base based
on level)" - is op<< still called even with this filtering? (seems like if
it was, then it wouldn't avoid the stringification costs? but if it doesn't
call the op<< then it'll still fail the consume semantic requirement of
llvm::Error)

Would it be reasonable to have the wrapper for an Error value call consumeError in its destructor? That way a filtered value would silently consume the error. A printed value would consume it twice, but that should be safe (if it isn't already then we can make it safe).

sammccall abandoned this revision.Jul 12 2018, 12:15 AM

Would it be reasonable to have the wrapper for an Error value call consumeError in its destructor? That way a filtered value would silently consume the error. A printed value would consume it twice, but that should be safe (if it isn't already then we can make it safe).

D49170 uses the constructor-wrapper approach, with an explicit adapter. I'll abandon this revision.
I don't think having the wrapper call <<(Error&&) is a good idea even with this, because it means formatv_object's operator<< isn't truly const (when printing twice the message is different, printing concurrently is racy, etc).
So we could still add <<(Error&&), but the wrapper should call <<(const Error&) as it does in D49170.