With Twine now ubiquitous after rG92a79dbe91413f685ab19295fc7a6297dbd6c824,
it needs support for string_view when using llvm as a library under C++ standards.
This is similar to how StringRef is handled.
Differential D103935
Add Twine support for std::string_view. saugustine on Jun 8 2021, 5:03 PM. Authored by
Details With Twine now ubiquitous after rG92a79dbe91413f685ab19295fc7a6297dbd6c824, This is similar to how StringRef is handled.
Diff Detail
Event Timeline
Comment Actions This addresses the upstream comments. It also adds a string_view overload for raw_ostream, as now that
Comment Actions This patch causes segmentation faults when you have an application compiled with -std=c++17 (The current gcc default) and libLLVM.so compiled with c++14 (The current LLVM project default). Is this a valid use case or do we require that all library users use the same c++ version as the library. Comment Actions The most immediate reason is the enum value StdStringViewKind has an (unnecessary) #ifdef around it. Making that unconditional seems entirely reasonable, and is easy enough. But, even with that, compiling with different standards versions will result in an ODR violation in getSingleStringRef(), which may or may not cause a problem, depending on whether it's inlined, and whether the C++14-compiled version of it wins the link choice. Also, if llvm itself is compiled with c++14 while the consumer is compiled with c++17, the various print methods defined in the cpp file will not work either. Comment Actions I'm not sure removing the ifdef guards around the enum value is better. That would also mean moving the #ifdef guards in all the switch statements so the enum value would be present but unhandled by doing nothing (or asserting or similar) which seems pretty wrong. Comment Actions Seems like valid/important use case to me! I wonder if the Twine part of the patch should wait for LLVM to drop support for C++14.
Comment Actions @dblaikie / @lhames , maybe one of you has ideas for how to add Twine support for std::string_view in C++17 and later without changing Twine's ABI? I think the following won't work, because the lifetime of the StringRef wouldn't be extended to match the lifetime of the calling expression: Twine(const std::string_view &view) : Twine(StringRef(view)) {} Comment Actions The ODR violation issues would need to be resolved to support this case.
This patch follows on another one that enables Twine support everywhere over std::string. Before that, because strings are trivially convertable to string_views, there was no ambiguity. With that change there is ambiguity in many places, and causes similar problems to the present one for downstream C++ 17 users. Not sure which case should win.
Comment Actions Well, there's also the option of reverting this patch.
I took a quick look at the patch mentioned in the description, rG92a79dbe91413f685ab19295fc7a6297dbd6c824, which seems to be just updating MLIR. I imagine there's a way to make those APIs easier to use without breaking LLVM. IMO, a compile-time failure (problem before this patch) is much easier to reason about and fix for a library user than a miscompile (problem after this patch). What is the ambiguity between? Twine also has a constructor directly from std::string so it's not obvious to me how that patch introduces ambiguity. If the problem is an LLVM client that's using std::string_view instead of StringRef, it seems reasonable for the client to explicitly convert to a StringRef() to use the APIs. Alternatively, specific APIs can add std::string_view wrapper overloads that defer to the Twine versions. Comment Actions Reverting this patch might help in the instant case, but the odds of something similar happening, where the data structure is different across different versions of the standard is very hard to guarantee. And also hard to test for. Comment Actions When things are hard to test for, we usually rely on vendors to find and report problems during qualification/deployment (like @tstellar did) and then revert/refine/fix the patch that broke them. Not sure why you see it as likely that a data structure would vary across different versions of the standard. Seems to me that'd only happen when someone has code (like this patch) that explicitly asks for it, assuming your toolchain is ABI compatible between standards versions, like clang and libc++ are. Comment Actions Yeah, the current state of this is not acceptable - compatibility across different standard versions is necessary. Yep. Would it be so bad if we made "Child" larger by storing a StringRef (or equivalent - anything involving a pointer+length) in it (so Child would be two words rather than one)? Then we could collapse several cases into that (StringRef, string_view, std::string, SmallString, (could even do char as well)). Comment Actions llvm/include has such __cplusplus dispatches in a few other places, e.g. llvm/include/llvm/ADT/StringRef.h L82 strLen. So those (-std=c++14 libLLVM.so + -std=c++17 application using llvm/include headers) users are already relying on benign ODR violation. LLVM has -fvisibility-inlines-hidden, and an additional -Bsymbolic-functions for safeguard, after linking into libLLVM.so, the ODR violation should be benign. Ensuring that struct/enum sizes are the same for different -std= modes is probably a sufficient fix to regular users (and a sufficiently good workaround for libLLVM.so with mixed -std= users). This thread does raise the question about what degree of support we want to provide for -DLLVM_LINK_LLVM_DYLIB=on users. Before previously shared object users were explicitly not supported. Now it may change as many Linux distributions have switched to -DLLVM_LINK_LLVM_DYLIB=on for their llvm/clang packages for size. With shared objects, we have to care about all sorts of compiler options which can cause non-benign ODR issues. Comment Actions I think this will be an issue for static or dynamic linking, due to inlining. Importantly - if LLVM was built withoun string_view support but a caller was built with it and that caller tries to pass a string_view into some LLVM API then even if the layout and enum elements were consistent, we'd still hit UB/assert/etc because the Twine passed in would have that unexpected enum value which wouldn't be covered by the Twine code inside LLVM. Having the implementation be the same (by using some form of char*+length representation which would always be supported), having an extra ctor that takes string_view but lowers to that same representation should be safe (yes, a benign ODR violation - the debug info will be a bit weird/missing that extra function/ctor - but codewise, I can't think of any way it wouldn't work). Comment Actions I will implement dblaikie's idea, which should unblock in the short term, but I think a larger conversation about the level of support that the project wants for this would be good to have. Comment Actions Just ran into this issue (we were a bit behind on integrating upstream) - glad to see it's already fixed. Regarding the following:
This particular case isn't a particularly egregious case of issues being hard to detect, given that it's an ABI change based on an ifdef - LLVM has a long history of avoiding doing this kind of thing (either through being compiler agnostic with Compiler.h or using LLVM_ENABLE_ABI_BREAKING_CHECKS) |
Is this constructor necessary? I would have expected this to get picked up as a conversion by the StringRef constructor... but maybe there are two candidates?