This is an archive of the discontinued LLVM Phabricator instance.

[Error] Add stack traces for llvm::Error invariant violations.
Needs ReviewPublic

Authored by lhames on Nov 22 2019, 8:03 AM.

Details

Summary

This patch adds stack traces for two llvm::Error invariant violations,
dropped errors and failure values passed to cantFail calls. In each
case the stack trace of the failure point will now be printed. E.g.

"""
Failure value returned from cantFail wrapped call
foo
cantFail called from:
0 error-backtrace-test 0x0000000100e220a3 llvm::sys::
PrintStackTrace(llvm::raw_ostream&) + 307
1 error-backtrace-test 0x0000000100d4c286 llvm::detail::
fatalCantFailUnhandledError(llvm::Error, char const*) + 614
2 error-backtrace-test 0x0000000100ea475a llvm::
cantFail(llvm::Error, char const*) + 458
3 error-backtrace-test 0x0000000100ea4517 foo() + 231
4 error-backtrace-test 0x0000000100ea49d1 main + 497
5 libdyld.dylib 0x00007fff5c321011 start + 1
"""

If the LLVM_ENABLE_ERROR_TRACING environment variable is set (and the Error
was a failure value) then a second trace will be printed before the first
showing where the error originated:

"""
Error thrown from:
0 error-backtrace-test 0x00000001016f20a3 llvm::sys::
PrintStackTrace(llvm::raw_ostream&) + 307
1 error-backtrace-test 0x000000010161b206 llvm::detail::
annotateErrorWithTrace(llvm::ErrorInfoBase&) + 278
2 error-backtrace-test 0x00000001017742fb llvm::Error
llvm::make_error<llvm::StringError, char const (&) [4], std::1::error_code>(
char const (&) [4], std::
1::error_code&&) + 683
3 error-backtrace-test 0x0000000101773fcc bar() + 412
4 error-backtrace-test 0x0000000101774508 foo() + 216
5 error-backtrace-test 0x00000001017749d1 main + 497
6 libdyld.dylib 0x00007fff5c321011 start + 1
7 libdyld.dylib 0x0000000000000001 start + 18446603338969378801
"""

There are no performance implications for the first stack trace (to the failure
point) since printing only happens on invariant violation where we were about
to abort anyway.

The second scheme does have performance implications, however I believe the
trade-offs are reasonable:

  • Success values are unaffected.
  • When the flag is off the impact for each failure value is two null-checks: One at failure value construction time, and one at failure value destruction time.
  • When the flag is on the impact for each failure value is one stack trace and two map operations: One to store the trace and a second to discard it.

This scheme is intended as a generalization of the proposals in
https://reviews.llvm.org/D70259 and https://reviews.llvm.org/D70263 . The aim
is to make debugging of Error handling failures easier, including when working
with remote JITs where logging may be substantially easier than attaching a
debugger.

Known outstanding issues:

  • LLVM's stack traces don't include file and line info on Darwin yet. I am investigating that now.
  • Test cases needed.

Event Timeline

lhames created this revision.Nov 22 2019, 8:03 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 22 2019, 8:03 AM
lhames updated this revision to Diff 230672.Nov 22 2019, 9:27 AM
  • Replace report_fatal_error declaration that was accidentally removed.
lhames edited the summary of this revision. (Show Details)Nov 22 2019, 9:29 AM
lhames edited the summary of this revision. (Show Details)
dblaikie added inline comments.Nov 22 2019, 5:49 PM
llvm/include/llvm/Support/Error.h
762–763
llvm/lib/Support/Error.cpp
73–74

Two map lookups - better to use find & check for end to do only one map lookup.

Actually DenseMap's 'lookup' might suffice.

86–105

I was going to ask - why all this rather than just asserting and letting the existing crash reporting stuff work. But I guess that crash reporting stuff only works for clang but not other LLVM tools? Perhaps it could be made to work? (though perhaps that's too much work - guess it involves reserving memory for use safely during crashes, etc))

Hmm, I guess it's not entirely about the clang crash reporter - a simple assert failure in llvm-dwarfdump does at least print the stack trace:

llvm-dwarfdump-tot: /usr/local/google/home/blaikie/dev/llvm/src/llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp:632: int main(int, char **): Assertion `false' failed.
Stack dump:
0.      Program arguments: llvm-dwarfdump-tot -debug-info -v loc.o 
 #0 0x000000000145b977 llvm::sys::PrintStackTrace(llvm::raw_ostream&) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/Support/Unix/Signals.inc:548:11
 #1 0x000000000145bb19 PrintStackTraceSignalHandler(void*) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/Support/Unix/Signals.inc:609: 1

So, yeah - why not just an assert? (or wasn't that what the failure mode was before this change)

Oh, I see it was unreachable and still has an unreachable - that might make this error message a bit redundant - printing the stack trace & then unreachable, which will print the stack trace again?

(the ErrorTraces part is value-add/new functionality here, for sure & assert/unreachable wouldn't ever provide that functionality of course)

175

This abort'll also print the stack trace - so seems that the "PrintStackTrace(dbgs())" would be redundant with this as well?

llvm/lib/Support/InitLLVM.cpp
32–34

If the code for this is always going to be present - what's the benefit from using a side table rather than making Error itself a little bigger (a pointer) & skip the side table/global state/global ctors/dtors? It's not like storing lots of Errors is a supported scenario, so I doubt an extra pointer in an Error would significantly increase memory usage at any point?

lhames marked 5 inline comments as done.Nov 22 2019, 7:08 PM
lhames added inline comments.
llvm/include/llvm/Support/Error.h
762–763

Yep.

llvm/lib/Support/Error.cpp
73–74

Yep.

86–105

I was going to ask - why all this rather than just asserting and letting the existing crash reporting stuff work.

The benefit to printing ourselves is that we can disambiguate the stack traces ("this is for the error, this is for where you dropped it") and ensure (by calling PrintStackTrace manually) that these two things are rendered side-by-side. I have not satisfied myself that assertions will always generate a similar trace (and not be intercepted by some crash logging tool) yet, but if that's the case then we can just do that.

175

Yep -- error message polish is to-do here. We do want to avoid redundant traces.

llvm/lib/Support/InitLLVM.cpp
32–34

It's sort of much-of-a-muchness. We do support multiple in-flight errors, so I like the idea of not increasing the size of ErrorInfo for something that won't be used 99.9% of the time.

grimar added a subscriber: grimar.Nov 24 2019, 12:38 AM
dblaikie added inline comments.Dec 12 2019, 4:25 PM
llvm/lib/Support/Error.cpp
86–105

Yeah, the current error message behavior I'm getting is a bit more than desirable, I imagine (the "discarded from" and "?

Program aborted due to an unhandled Error. Error message:
"version 6 of .debug_addr section at offset 0x0 is not supported"
Error thrown from:
#0 0x00000000014751a7 llvm::sys::PrintStackTrace(llvm::raw_ostream&) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/Support/Unix/Sign
als.inc:548:11
#1 0x00000000013dd931 llvm::detail::annotateErrorWithTrace(llvm::ErrorInfoBase&) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/Suppo
rt/Error.cpp:64:3
#2 0x0000000000d098d5 llvm::Error llvm::make_error<llvm::StringError, std::cxx11::basic_string<char, std::char_traits<char>, std::allocat
or<char> >&, std::error_code&>(std::
cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::error_code&) /usr/loca
l/google/home/blaikie/dev/llvm/src/llvm/include/llvm/Support/Error.h:356:16
#3 0x0000000000d2de7b llvm::Error llvm::createStringError<unsigned short, unsigned long>(std::error_code, char const*, unsigned short const
&, unsigned long const&) /usr/local/google/home/blaikie/dev/llvm/src/llvm/include/llvm/Support/Error.h:1211:1
#4 0x0000000000d2d231 llvm::DWARFDebugAddrTable::extract(llvm::DWARFDataExtractor, unsigned long*, unsigned short, unsigned char, std::func
tion<void (llvm::Error)>) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/DebugInfo/DWARF/DWARFDebugAddr.cpp:90:5
#5 0x0000000000cf470e dumpAddrSection(llvm::raw_ostream&, llvm::DWARFDataExtractor&, llvm::DIDumpOptions, unsigned short, unsigned char) /u
sr/local/google/home/blaikie/dev/llvm/src/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp:250:21
#6 0x0000000000cf1a9c llvm::DWARFContext::dump(llvm::raw_ostream&, llvm::DIDumpOptions, std::array<llvm::Optional<unsigned long>, 28ul>) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp:537:18
#7 0x0000000000cbe626 dumpObjectFile(llvm::object::ObjectFile&, llvm::DWARFContext&, llvm::Twine, llvm::raw_ostream&) /usr/local/google/home/blaikie/dev/llvm/src/llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp:447:3
#8 0x0000000000cd83ed std::_Function_handler<bool (llvm::object::ObjectFile&, llvm::DWARFContext&, llvm::Twine, llvm::raw_ostream&), bool (*)(llvm::object::ObjectFile&, llvm::DWARFContext&, llvm::Twine, llvm::raw_ostream&)>::_M_invoke(std::_Any_data const&, llvm::object::ObjectFile&, llvm::DWARFContext&, llvm::Twine&&, llvm::raw_ostream&) /usr/local/google/home/blaikie/install/bin/../lib/gcc/x86_64-pc-linux-gnu/8.1.0/../../../../include/c++/8.1.0/bits/std_function.h:282:2
#9 0x0000000000cc9ade std::function<bool (llvm::object::ObjectFile&, llvm::DWARFContext&, llvm::Twine, llvm::raw_ostream&)>::operator()(llvm::object::ObjectFile&, llvm::DWARFContext&, llvm::Twine, llvm::raw_ostream&) const /usr/local/google/home/blaikie/install/bin/../lib/gcc/x86_64-pc-linux-gnu/8.1.0/../../../../include/c++/8.1.0/bits/std_function.h:687:7
#10 0x0000000000cbe90d handleBuffer(llvm::StringRef, llvm::MemoryBufferRef, std::function<bool (llvm::object::ObjectFile&, llvm::DWARFContext&, llvm::Twine, llvm::raw_ostream&)>, llvm::raw_ostream&) /usr/local/google/home/blaikie/dev/llvm/src/llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp:494:14
#11 0x0000000000cbe131 handleFile(llvm::StringRef, std::function<bool (llvm::object::ObjectFile&, llvm::DWARFContext&, llvm::Twine, llvm::raw_ostream&)>, llvm::raw_ostream&) /usr/local/google/home/blaikie/dev/llvm/src/llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp:528:3
#12 0x0000000000cbda32 main /usr/local/google/home/blaikie/dev/llvm/src/llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp:643:7
#13 0x00007f187e56552b libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2352b)
#14 0x0000000000caf02a _start (/usr/local/google/home/blaikie/dev/llvm/build/default/bin/llvm-dwarfdump+0xcaf02a)
Error discarded from:
#0 0x00000000014751a7 llvm::sys::PrintStackTrace(llvm::raw_ostream&) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/Support/Unix/Signals.inc:548:11
#1 0x00000000013de260 llvm::Error::fatalUncheckedError() const /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/Support/Error.cpp:172:3
#2 0x0000000000cb517a llvm::Error::assertIsChecked() /usr/local/google/home/blaikie/dev/llvm/src/llvm/include/llvm/Support/Error.h:286:3
#3 0x0000000000cb3f3c llvm::Error::~Error() /usr/local/google/home/blaikie/dev/llvm/src/llvm/include/llvm/Support/Error.h:245:5
#4 0x0000000000cf47e2 dumpAddrSection(llvm::raw_ostream&, llvm::DWARFDataExtractor&, llvm::DIDumpOptions, unsigned short, unsigned char) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp:250:15
#5 0x0000000000cf1a9c llvm::DWARFContext::dump(llvm::raw_ostream&, llvm::DIDumpOptions, std::array<llvm::Optional<unsigned long>, 28ul>) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp:537:18
#6 0x0000000000cbe626 dumpObjectFile(llvm::object::ObjectFile&, llvm::DWARFContext&, llvm::Twine, llvm::raw_ostream&) /usr/local/google/home/blaikie/dev/llvm/src/llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp:447:3
#7 0x0000000000cd83ed std::_Function_handler<bool (llvm::object::ObjectFile&, llvm::DWARFContext&, llvm::Twine, llvm::raw_ostream&), bool (*)(llvm::object::ObjectFile&, llvm::DWARFContext&, llvm::Twine, llvm::raw_ostream&)>::_M_invoke(std::_Any_data const&, llvm::object::ObjectFile&, llvm::DWARFContext&, llvm::Twine&&, llvm::raw_ostream&) /usr/local/google/home/blaikie/install/bin/../lib/gcc/x86_64-pc-linux-gnu/8.1.0/../../../../include/c++/8.1.0/bits/std_function.h:282:2
#8 0x0000000000cc9ade std::function<bool (llvm::object::ObjectFile&, llvm::DWARFContext&, llvm::Twine, llvm::raw_ostream&)>::operator()(llvm::object::ObjectFile&, llvm::DWARFContext&, llvm::Twine, llvm::raw_ostream&) const /usr/local/google/home/blaikie/install/bin/../lib/gcc/x86_64-pc-linux-gnu/8.1.0/../../../../include/c++/8.1.0/bits/std_function.h:687:7
#9 0x0000000000cbe90d handleBuffer(llvm::StringRef, llvm::MemoryBufferRef, std::function<bool (llvm::object::ObjectFile&, llvm::DWARFContext&, llvm::Twine, llvm::raw_ostream&)>, llvm::raw_ostream&) /usr/local/google/home/blaikie/dev/llvm/src/llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp:494:14
#10 0x0000000000cbe131 handleFile(llvm::StringRef, std::function<bool (llvm::object::ObjectFile&, llvm::DWARFContext&, llvm::Twine, llvm::raw_ostream&)>, llvm::raw_ostream&) /usr/local/google/home/blaikie/dev/llvm/src/llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp:528:3
#11 0x0000000000cbda32 main /usr/local/google/home/blaikie/dev/llvm/src/llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp:643:7
#12 0x00007f187e56552b
libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2352b)
#13 0x0000000000caf02a _start (/usr/local/google/home/blaikie/dev/llvm/build/default/bin/llvm-dwarfdump+0xcaf02a)
Stack dump:
0. Program arguments: /usr/local/google/home/blaikie/dev/llvm/build/default/bin/llvm-dwarfdump -debug-addr x.o
#0 0x00000000014751a7 llvm::sys::PrintStackTrace(llvm::raw_ostream&) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/Support/Unix/Signals.inc:548:11
#1 0x0000000001475349 PrintStackTraceSignalHandler(void*) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/Support/Unix/Signals.inc:609:1
#2 0x0000000001473c2b llvm::sys::RunSignalHandlers() /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/Support/Signals.cpp:67:5
#3 0x0000000001475ac5 SignalHandler(int) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/Support/Unix/Signals.inc:390:1
#4 0x00007f187f8553a0 restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x123a0)
#5 0x00007f187e578cfb raise (/lib/x86_64-linux-gnu/libc.so.6+0x36cfb)
#6 0x00007f187e5638ad abort (/lib/x86_64-linux-gnu/libc.so.6+0x218ad)
#7 0x00000000013de27e (/usr/local/google/home/blaikie/dev/llvm/build/default/bin/llvm-dwarfdump+0x13de27e)
#8 0x0000000000cb517a llvm::Error::assertIsChecked() /usr/local/google/home/blaikie/dev/llvm/src/llvm/include/llvm/Support/Error.h:286:3
#9 0x0000000000cb3f3c llvm::Error::~Error() /usr/local/google/home/blaikie/dev/llvm/src/llvm/include/llvm/Support/Error.h:245:5
#10 0x0000000000cf47e2 dumpAddrSection(llvm::raw_ostream&, llvm::DWARFDataExtractor&, llvm::DIDumpOptions, unsigned short, unsigned char) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp:250:15
#11 0x0000000000cf1a9c llvm::DWARFContext::dump(llvm::raw_ostream&, llvm::DIDumpOptions, std::array<llvm::Optional<unsigned long>, 28ul>) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp:537:18
#12 0x0000000000cbe626 dumpObjectFile(llvm::object::ObjectFile&, llvm::DWARFContext&, llvm::Twine, llvm::raw_ostream&) /usr/local/google/home/blaikie/dev/llvm/src/llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp:447:3
#13 0x0000000000cd83ed std::_Function_handler<bool (llvm::object::ObjectFile&, llvm::DWARFContext&, llvm::Twine, llvm::raw_ostream&), bool (*)(llvm::object::ObjectFile&, llvm::DWARFContext&, llvm::Twine, llvm::raw_ostream&)>::_M_invoke(std::_Any_data const&, llvm::object::ObjectFile&, llvm::DWARFContext&, llvm::Twine&&, llvm::raw_ostream&) /usr/local/google/home/blaikie/install/bin/../lib/gcc/x86_64-pc-linux-gnu/8.1.0/../../../../include/c++/8.1.0/bits/std_function.h:282:2
#14 0x0000000000cc9ade std::function<bool (llvm::object::ObjectFile&, llvm::DWARFContext&, llvm::Twine, llvm::raw_ostream&)>::operator()(llvm::object::ObjectFile&, llvm::DWARFContext&, llvm::Twine, llvm::raw_ostream&) const /usr/local/google/home/blaikie/install/bin/../lib/gcc/x86_64-pc-linux-gnu/8.1.0/../../../../include/c++/8.1.0/bits/std_function.h:687:7
#15 0x0000000000cbe90d handleBuffer(llvm::StringRef, llvm::MemoryBufferRef, std::function<bool (llvm::object::ObjectFile&, llvm::DWARFContext&, llvm::Twine, llvm::raw_ostream&)>, llvm::raw_ostream&) /usr/local/google/home/blaikie/dev/llvm/src/llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp:494:14
#16 0x0000000000cbe131 handleFile(llvm::StringRef, std::function<bool (llvm::object::ObjectFile&, llvm::DWARFContext&, llvm::Twine, llvm::raw_ostream&)>, llvm::raw_ostream&) /usr/local/google/home/blaikie/dev/llvm/src/llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp:528:3
#17 0x0000000000cbda32 main /usr/local/google/home/blaikie/dev/llvm/src/llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp:643:7
#18 0x00007f187e56552b
libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2352b)
#19 0x0000000000caf02a _start (/usr/local/google/home/blaikie/dev/llvm/build/default/bin/llvm-dwarfdump+0xcaf02a)
Aborted

(wonder if there's anything we can do about making these things more readable in general - line wrapping, colours, etc? even just some blank lines/bigger/bolder headers at the start of each stack trace might be helpful - but that sort of thing probably wouldn't be in this patch, of course)

If you're set on not producing a separate/duplicate stack trace from the one general stack dump one, fair enough/guess I can't talk you out of it & follow up patches could help make it more legible, for sure. (trimming the top few stack frames to get back up into user code/through the llvm::Error implementation details (something the generic code can't reasonably do), maybe pruning the common stack frames between the thrown/dropped stack? etc)

llvm/lib/Support/InitLLVM.cpp
32–34

I'd probably lean towards adding the extra pointer until there's a real use case for Error that has scaling problems because of the size rather than adding the side table now. Global variables/global ctors/dtors are something I think we're generally trying to avoid in LLVM (so users of LLVM don't have to pay for things they don't need - in the form of global ctors that hurt those users startup time) (though the cl::opt is the biggest perpetrator & hasn't been addressed yet).

Not super relevant, but FWIW, assigning to and reading from a unique_ptr isn't multithread safe - so you will have to make sure you call enableErrorTracing before any threads start that might create errors.

lhames marked 2 inline comments as done.Jan 2 2020, 10:02 AM
lhames added inline comments.
llvm/lib/Support/Error.cpp
86–105

Yeah, the current error message behavior I'm getting is a bit more than desirable...

I think stack traces are often long, and since this involves two it's going to be twice as long. I wonder if adding more verbiage at the top might actually help in this case, especially for people who will encounter this infrequently:

"Program crashed due to an unhandled error. The following two stack traces describe where the unhandled error was generated, and where it was discarded while in the unchecked state:"

I love the idea of adding line wrapping/colors to make this more readable, but I think that should just be applied to stack traces in general. I don't think it's specific to this patch.

Trimming the top few frames sounds *excellent*. I was thinking of proposing that myself. What about a "DropNFrames[=0]" argument to sys::PrintStackTrace for people who know their call to PrintStackTrace will sit some number of frames inside some uninteresting error handling machinery?

I love the idea of pruning common frames, but I can't figure out how to render it in a way that won't end up taking more cognitive effort to parse than just presenting two traces. Very happy to take suggestions on it, but I think it's something we could address in a follow-up.

llvm/lib/Support/InitLLVM.cpp
32–34

I'd be happy to switch it to a ManagedStatic (which I believe doesn't have static ctors/dtors) or a custom global with manual management and a zero-init, but I do want to keep it as a side-table.

Thanks for pointing out the unique_ptr thread safety issue. That's another reason to move away from unique_ptr here.

FWIW working on a codebase that heavily uses both Error and threads, the side-table seems fairly a bit terrifying. Even if the thread safety issues are fixed, do we really want to pay for mutex lock/unlock to create errors, to save a pointer?

thread_local might be an option, though as I understand llvm/Support may have to be portable to environments where that doesn't work well.

lhames added a comment.Jan 2 2020, 5:01 PM

FWIW working on a codebase that heavily uses both Error and threads, the side-table seems fairly a bit terrifying. Even if the thread safety issues are fixed, do we really want to pay for mutex lock/unlock to create errors, to save a pointer?

thread_local might be an option, though as I understand llvm/Support may have to be portable to environments where that doesn't work well.

An atomic pointer would work too. This cost is only paid on failure values and these are already heap allocated, so I'm I wouldn't expect the overhead to be prohibitive either way. I like the idea that the side table can be entirely compiled out in release mode (if we decide to go that way) without changing the shape of any error types or leaving any dead fields.

lhames added a comment.Jan 2 2020, 5:09 PM

Oh, I remember part of my motivation for designing it this way: While failure values are expected to be expensive, we don't want to make them unnecessarily expensive, which is why the error-geneation-point trace is under a flag: You don't want to stack-trace every error that is ever generated. However, making this conditional means we need to check a global flag to decide whether we *should* do the stack trace. Once you're doing that, you may as well make your "flag" a pointer to the side table.

Oh, I remember part of my motivation for designing it this way: While failure values are expected to be expensive, we don't want to make them unnecessarily expensive, which is why the error-geneation-point trace is under a flag: You don't want to stack-trace every error that is ever generated. However, making this conditional means we need to check a global flag to decide whether we *should* do the stack trace. Once you're doing that, you may as well make your "flag" a pointer to the side table.

Oh, totally all for the conditional stack trace generation, and we've got lots of flags (cl::opts, env variable tests, some straight up global variables (the latter being fairly rare, but I think there's some precedent for them in LLVM?)) - but I think there's significantly (that's certainly a value judgment - impossible to quantify) complexity in having a mutable side table for those stack traces - so, for me at least, one doesn't follow from the other necessarily.

FWIW working on a codebase that heavily uses both Error and threads, the side-table seems fairly a bit terrifying. Even if the thread safety issues are fixed, do we really want to pay for mutex lock/unlock to create errors, to save a pointer?

thread_local might be an option, though as I understand llvm/Support may have to be portable to environments where that doesn't work well.

An atomic pointer would work too. This cost is only paid on failure values and these are already heap allocated, so I'm I wouldn't expect the overhead to be prohibitive either way.

I don't follow this - are you saying if heap allocation is OK, locking a mutex must be too, as it's cheaper?
Locking/unlocking a shared global mutex can certainly be more expensive than malloc if there's a lot of contention. malloc implementations typically go to some lengths to avoid a global lock for this reason.

Example workload: clangd typically runs ~30 threads for background indexing, each thread frequently calls functions returning Expected<T> and recovers from errors (Error is documented as being for recoverable errors). It'd be a big surprise if this sort of use of Error led to contention/serialization on these worker threads.

FWIW working on a codebase that heavily uses both Error and threads, the side-table seems fairly a bit terrifying. Even if the thread safety issues are fixed, do we really want to pay for mutex lock/unlock to create errors, to save a pointer?

thread_local might be an option, though as I understand llvm/Support may have to be portable to environments where that doesn't work well.

An atomic pointer would work too. This cost is only paid on failure values and these are already heap allocated, so I'm I wouldn't expect the overhead to be prohibitive either way.

I don't follow this - are you saying if heap allocation is OK, locking a mutex must be too, as it's cheaper?
Locking/unlocking a shared global mutex can certainly be more expensive than malloc if there's a lot of contention. malloc implementations typically go to some lengths to avoid a global lock for this reason.

Example workload: clangd typically runs ~30 threads for background indexing, each thread frequently calls functions returning Expected<T> and recovers from errors (Error is documented as being for recoverable errors). It'd be a big surprise if this sort of use of Error led to contention/serialization on these worker threads.

Note this feature's only intended to be opt-in for debugging purposes, so for production use cases it shouldn't be a huge deal (though could be cheaper/free if it was a build-time setting rather than something you could opt into/out of at runtime - perhaps that's the point you're making? Though we do check various globals (cl::opts, etc) in a fair number of codepaths, I think, such that adding one doesn't seem likely to be super problematic?). (as much as I agree with you that I'd rather not have a side table)

FWIW working on a codebase that heavily uses both Error and threads, the side-table seems fairly a bit terrifying. Even if the thread safety issues are fixed, do we really want to pay for mutex lock/unlock to create errors, to save a pointer?

thread_local might be an option, though as I understand llvm/Support may have to be portable to environments where that doesn't work well.

An atomic pointer would work too. This cost is only paid on failure values and these are already heap allocated, so I'm I wouldn't expect the overhead to be prohibitive either way.

I don't follow this - are you saying if heap allocation is OK, locking a mutex must be too, as it's cheaper?
Locking/unlocking a shared global mutex can certainly be more expensive than malloc if there's a lot of contention. malloc implementations typically go to some lengths to avoid a global lock for this reason.

Example workload: clangd typically runs ~30 threads for background indexing, each thread frequently calls functions returning Expected<T> and recovers from errors (Error is documented as being for recoverable errors). It'd be a big surprise if this sort of use of Error led to contention/serialization on these worker threads.

Note this feature's only intended to be opt-in for debugging purposes, so for production use cases it shouldn't be a huge deal (though could be cheaper/free if it was a build-time setting rather than something you could opt into/out of at runtime - perhaps that's the point you're making? Though we do check various globals (cl::opts, etc) in a fair number of codepaths, I think, such that adding one doesn't seem likely to be super problematic?). (as much as I agree with you that I'd rather not have a side table)

Part of the reason for this patch was a need to provide file/line number for errors that lead to aborts via llvm::cantFail. This can be very important when debugging failing bots for which you have not access and are unable to reproduce. Since this solution is too expensive to turn on all the time, it's doubtful it'll be enabled on any bots, which makes it useless for my primary use case.

That doesn't mean it wouldn't be useful in cases where you can reproduce locally, but that wasn't my initial motivation. I'm also wary of the side-table.

FWIW working on a codebase that heavily uses both Error and threads, the side-table seems fairly a bit terrifying. Even if the thread safety issues are fixed, do we really want to pay for mutex lock/unlock to create errors, to save a pointer?

thread_local might be an option, though as I understand llvm/Support may have to be portable to environments where that doesn't work well.

An atomic pointer would work too. This cost is only paid on failure values and these are already heap allocated, so I'm I wouldn't expect the overhead to be prohibitive either way.

I don't follow this - are you saying if heap allocation is OK, locking a mutex must be too, as it's cheaper?
Locking/unlocking a shared global mutex can certainly be more expensive than malloc if there's a lot of contention. malloc implementations typically go to some lengths to avoid a global lock for this reason.

Example workload: clangd typically runs ~30 threads for background indexing, each thread frequently calls functions returning Expected<T> and recovers from errors (Error is documented as being for recoverable errors). It'd be a big surprise if this sort of use of Error led to contention/serialization on these worker threads.

Note this feature's only intended to be opt-in for debugging purposes, so for production use cases it shouldn't be a huge deal (though could be cheaper/free if it was a build-time setting rather than something you could opt into/out of at runtime - perhaps that's the point you're making? Though we do check various globals (cl::opts, etc) in a fair number of codepaths, I think, such that adding one doesn't seem likely to be super problematic?). (as much as I agree with you that I'd rather not have a side table)

Part of the reason for this patch was a need to provide file/line number for errors that lead to aborts via llvm::cantFail. This can be very important when debugging failing bots for which you have not access and are unable to reproduce. Since this solution is too expensive to turn on all the time, it's doubtful it'll be enabled on any bots, which makes it useless for my primary use case.

That doesn't mean it wouldn't be useful in cases where you can reproduce locally, but that wasn't my initial motivation. I'm also wary of the side-table.

Sorry - I meant opt-in for production use cases. I think it'd make a lot of sense to enable it for buildbots (same way that we enable assertions on buildbots - but you'd never want assertions enabled in your clang production build, it'd be much too slow). But I'd like to see some functionality to specifically disable all crash symbolication for crash-expected test cases (GUnit EXPECT_DEATH) - those crashes aren't reported anyway (gunit swallows the output of the crashing program).