It implements the multiline support as described at https://wg21.link/LWG2503
Details
Diff Detail
Event Timeline
To keep the code easier to read, I would be tempted to remove almost all of the #ifdefs here, and just make __use_multiline and the constants depend on the language level.
[ Note that I haven't tried this; just looking to make the code simpler. ]
libcxx/include/regex | ||
---|---|---|
2003 | I think that this is going to have to go behind an "ABI break" flag, because it changes the size of __l_anchor. I'll look into it some more. |
Note: Since this was resolved by an LWG issue (rather than a paper), we should provide this capability back to C++11.
(So you can get rid of all the #ifdefs
Thanks for the review.
Once it's known whether or not the __l_anchor changes are an ABI break I'll update the patch.
Is there a way to test what is and what is not an ABI break?
Yes, I think it is. Imagine a program compiled with an old version of libc++ (which assumes that __l_anchor doesn't have a bool) which creates a regex with a __l_anchor node in it, and then passes that regex to a shared library that was compiled with the new headers (which assumes __l_anchor has a bool). The new library will try to read the field which doesn't exist. This can be a problem because the <regex> code is free to be inlined into users programs and libraries too.
If we added a new derived class instead, I think this could solve our problem. Indeed, __l_anchor would stay unchanged, however we could have something like a __multiline_l_anchor?
The other option is that we just take that ABI break. Do we have a way of quantifying how bad it would be? What regexes would be impacted? Is it regexes with ^ or $ used as anchors, passed through ABI boundaries? If we want to go down this route, we could also try to see what the extent of the ABI break is, i.e. if we tend to crash it's better than if we silently run and do something weird.
Thanks for the confirmation it's an ABI break. I like your suggestion to use a derived class, so I'll try that solution first.
I'm not 100% sure that solution addresses the ABI break problem, though. I'd have to think about it harder and run it through at least another person to be confident.
Can you comment on that:
Do we have a way of quantifying how bad it would be? What regexes would be impacted? Is it regexes with ^ or $ used as anchors, passed through ABI boundaries?
If that's an ABI break we can afford to make, it would be great cause it would avoid increasing the code complexity.
I'll update the diff with this approach in a sec.
Can you comment on that:
Do we have a way of quantifying how bad it would be? What regexes would be impacted? Is it regexes with ^ or $ used as anchors, passed through ABI boundaries?
Yes it requires a regex with a ^ or $ anchor. It should only break if created with the old version (no bool in the struct) and executed by the new version (with a bool in the struct)
If that's an ABI break we can afford to make, it would be great cause it would avoid increasing the code complexity.
Googling/looking in GitHub is hard since it ignores special characters. I searched on https://codesearch.debian.net
and found 318 hits looking for std::regex("^ 205 were in the llvm projects so about 100 in other projects.
Looking at std::regex\(".*\$ found 309 matches with 177 in the llvm projects. This regex hit some false positives.
So at least in Debian it is not used that often. So we have 3 options:
- Use the version with inheritance if ABI safe. Pro no #ifdefs, Con adds some 'dead code' for ABI compatibility
- Replace #if _LIBCPP_STD_VER > 14 of the current patch with an ABI flag. Pro ABI compatible, Con adds a lot if #ifdefs
- Break the ABI, Pro cleanest code, Con an ABI breakage
If 1 is ABI safe I would prefer that option, else I would slightly prefer to avoid an ABI breakage.
What's your opinion?
libcxx/include/regex | ||
---|---|---|
2104 | To avoid code duplication, we could create a helper function that takes a bool __multiline parameter. Then, __r_anchor_multiline<_CharT>::__exec would call that helper function with __multiline as an argument, and __r_anchor<_CharT>::__exec would call the helper function with false. | |
4771 | Do we ever create a __l_anchor anymore? If not, then we could in theory delete that code. Actually, now that I think of it, I believe we could simply add the functionality to __l_anchor itself, but rename the class to something else. The idea is:
I believe the ABI break only happens if we change __l_anchor::execute without changing the name of the class, since that's when two libraries compiled at different times (and hence with different versions of __l_anchor) can disagree on the layout of __l_anchor. If we change the name of the class, I think it should all work out. WDYT? |
libcxx/include/regex | ||
---|---|---|
2104 | That's indeed possible. I can create an updated patch with that approach. | |
4771 | No after this change we'll never create a __l_anchor. I expected removing the __l_anchor class from the header would be an ABI break. But I think you're right since this is a templated class. So that means I:
Agreed? |
libcxx/include/regex | ||
---|---|---|
4771 | Sorry this got dropped. Yes, we agree on the approach. Regarding
My reasoning is that since it's a class template that isn't explicitly instantiated in and exported from the dylib, any code that references __l_anchor already has a weak definition for the vtable. It needs to, because that vtable isn't defined anywhere else the linker can tell upfront. So, just removing that class from the headers isn't going to break anything, because
ABIs are tricky, so I hope I'm not missing something. But if my reasoning holds, your approach LGTM. Let's go with that. |
libcxx/include/regex | ||
---|---|---|
4771 | Thanks for the explanation. |
LGTM, thanks a lot for this! Consider my comment about the comments you left behind, but feel free to commit without implementing my suggestion.
I don't understand why CI isn't running -- it seems to be doing that with a few patches. If you commit this, can you please keep an eye on https://buildkite.com/llvm-project/libcxx-ci/builds?branch=master for the next build?
libcxx/include/regex | ||
---|---|---|
2001 | I don't think this comment is relevant, since __l_anchor doesn't exist anymore. I feel like questions of this nature can be answered best via git blame and following the link back to this review. If you feel strongly that it adds value, please feel free to leave it in. | |
2040 | Ditto. |
Thanks for the review!
I'll remove them.
I don't understand why CI isn't running -- it seems to be doing that with a few patches. If you commit this, can you please keep an eye on https://buildkite.com/llvm-project/libcxx-ci/builds?branch=master for the next build?
I'll keep an eye on the build bot.
I update the patches in the review using Phabricator's web-interface. Can the fact that I don't use arc be the reason the CI builds aren't triggered?
I think that's most likely it. It seems like CI isn't triggered when the repository isn't set on the review. I think Mikhail is looking into it. Don't worry about it.
I don't think this comment is relevant, since __l_anchor doesn't exist anymore. I feel like questions of this nature can be answered best via git blame and following the link back to this review. If you feel strongly that it adds value, please feel free to leave it in.