Page MenuHomePhabricator

[WIP] Replace std::forward & std::move by cast expressions during Sema
Needs ReviewPublic

Authored by cor3ntin on Apr 5 2021, 11:27 AM.

Details

Summary

This is a very early draft, mostly looking for direction/guidance at this point (notably whether the cost/benefit ratio of the overall idea is sensible).
I need to do a lot of refactors, some (a lot of) tests, etc.

std::move/std::forward are just casts, and are used a lot (230K instances in the VCPKG repository for std::move).

This code some issues:

  • Worse debug experience
  • More symbols
  • Slower build times through template instantiations
  • Worse Codegen
push    rbp
mov     rbp, rsp
sub     rsp, 16
mov     dword ptr [rbp - 4], edi
lea     rdi, [rbp - 4]
call    std::remove_reference<int&>::type&& std::move<int&>(int&)
mov     eax, dword ptr [rax]
add     rsp, 16
pop     rbp
ret

These functions could be marked force inline, but that does not address the build time concerns, and it doesn't solve O0 (some people seem to care?)
I also consider an LLVM pass to replace calls to std::move/std::forward, but that doesn't solve build time concerns.

In the end, I landed on the idea to replace these calls by a cast expression in the front end.

The general idea is to try to intercept qualified calls before overload resolution, and after overload resolutions for all calls.
The functions are still required to be declared, so lookup is not modified.
When a replacement fails, we try to create the CallExpr as usual so any diagnostic is emitted from there.
The intent is of course that this replacement cannot be observable.

  • Is this an avenue worth pursuing ?
  • Do we want an even more generic solution so that calls to eg, is_constant_evaluated can be replaced too?
  • Do we want to expand to as_const, to_integer, to_underlying, bit_cast?

In terms of AST nodes, do we want to preserve the original replaced expression?
My current CXXStdLibraryCastExpr assumes the call is resolved to a CallExpr, but that's no longer the case since I also replace calls before template instantiations,
so I would probably do something to keep the arguments.

I don't know how much information we need to retain, and as is certainly very visible, I have no idea what I am doing.

Nevertheless, I figured the path might be in a good enough state to start a discussion.
I am also thinking that making std::forward(x) or std::fwd(x) (notice the lack of angle brackets) a magic expression in C++23 using the same mechanism might be worth pursuing.
(instead of new syntax as in http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0644r1.html)

Diff Detail

Event Timeline

cor3ntin created this revision.Apr 5 2021, 11:27 AM
cor3ntin requested review of this revision.Apr 5 2021, 11:27 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 5 2021, 11:27 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
cor3ntin retitled this revision from [WIP] Replace std::forward & std::move by cast expression to [WIP] Replace std::forward & std::move by cast expressions during Sema.Apr 5 2021, 11:59 AM
cor3ntin edited the summary of this revision. (Show Details)
cor3ntin added a reviewer: aaron.ballman.

Adding a few more reviewers for early feedback.

To be noted, this generally goes against the [current] spirit of the AST,
those were calls in the source code, but they won't be represented as such any longer,
which is likely not very expected by whatever tool that could be consuming the AST.
Though, i'm not sure how much that matters.

I would guess, having clang emit MLIR C++ dialect and not LLVM IR
might allow such transforms without degrading AST,
though i will be surprised if that happens [upstream] soon.

I think a pattern-match in IRGen is probably a much more reasonable alternative if the goal is primarily to avoid the code-generation overheads at -O0 / optimization costs at -O<n>. You'd still have template instantiation overheads; I don't know how significant those really are.

rsmith added a comment.Apr 5 2021, 6:26 PM

I think a pattern-match in IRGen is probably a much more reasonable alternative if the goal is primarily to avoid the code-generation overheads at -O0 / optimization costs at -O<n>. You'd still have template instantiation overheads; I don't know how significant those really are.

I agree, this kind of special-casing really goes against the spirit of Clang's AST model, and such special-casing seems to fit better in IR generation than in the AST representation. The approach in this patch is going to require additional complexity in all AST traversals that want to treat function calls uniformly, and will break all existing AST matchers looking for std::move / std::forward calls.

Perhaps we could instead treat std::move and std::forward as builtin functions? We could then skip instantiating the definition given that we already have a builtin definition, and IR generation is already set up to perform custom code generation for built-in functions. That should minimize the amount of special-casing and non-uniform AST representation we need here, and closely matches how we model other library functions that have well-known semantic effects. Note that the "builtin function" approach won't allow us to avoid performing overload resolution, but I don't know how important that is for what you're trying to achieve here -- and in any case, skipping overload resolution seems fraught with problems given that there is another (algorithm) overload of std::move and that both std::move and (especially) std::forward will reject some calls in overload resolution.

Treating a namespace-std function template as a builtin isn't entirely novel; we already do this for MSVC's std::_GetExceptionInfo (though we don't actually handle that properly: we're missing the "namespace std" check, at least). Treating the builtin definition as overriding an inline library definition might be novel, though that doesn't seem like a huge problem.

Treating a namespace-std function template as a builtin isn't entirely novel; we already do this for MSVC's std::_GetExceptionInfo (though we don't actually handle that properly: we're missing the "namespace std" check, at least). Treating the builtin definition as overriding an inline library definition might be novel, though that doesn't seem like a huge problem.

This was my initial approach, I found some issues

  • I struggled getting a built in forward to return a reference
  • I had no clue how to inject them in namespace std (knowing about std::_GetExceptionInfo will certainly help, thanks!
  • I really didn't want to modify standard headers and wasn't sure how to override the declaration.

But overall, I agree with all of you, it's a better design if we can make it work. Back to the drawing board, and thanks for the feedbacks :)

Our builtin logic is capable of recognizing user declarations as builtins without any sort of explicit annotation in source. That's how we recognize C library builtins, for example. As Richard points out, that logic isn't really set up to do the right thing for namespaced declarations (or template ones, for that matter), but that seems fixable.

The namespace issue might be annoying because of the inline namespaces that some standard libraries used for ABI versioning.

Our builtin logic is capable of recognizing user declarations as builtins without any sort of explicit annotation in source. That's how we recognize C library builtins, for example. As Richard points out, that logic isn't really set up to do the right thing for namespaced declarations (or template ones, for that matter), but that seems fixable.

The namespace issue might be annoying because of the inline namespaces that some standard libraries used for ABI versioning.

I managed to make a somewhat generic mechanism to mark functions declared in the namespace std ( and I do think we want there to be a declaration in the header always) as built-ins.
Currently it's using the number of parameters as a filter - which I think is reasonable ( the issue as Richard pointed out is that std::move is overloaded with the algorithm version).

I'm now struggling to handle code generation and constant evaluation, but hopefully I will find a way to get there.

This would solve some of the issues (perfect code-gen / no body instantiation), but the declarations are still instantiated of course.
And I have a few folks asking me whether I could get rid of it. Unfortunately, I think the goals of preserving the AST as is today and getting rid of these template instantiations are contradictory.

No matter how it works internally, I think that (nearer the end of the process) someone should insist that you add some Clang tests to verify e.g.

Widget&& a = static_cast<Widget&&>(Widget());  // is lifetime-extended, but
Widget&& b = std::move(Widget());              // is not

https://godbolt.org/z/9Ka7hcode
and whatever other corner-cases people can think of. That is, these may turn into not-a-function-calls internally, but they'd better continue to behave observably as if they were function-calls.

rsmith added a comment.Apr 6 2021, 4:35 PM

This would solve some of the issues (perfect code-gen / no body instantiation), but the declarations are still instantiated of course.
And I have a few folks asking me whether I could get rid of it. Unfortunately, I think the goals of preserving the AST as is today and getting rid of these template instantiations are contradictory.

Do you know how much of the cost here is instantiating the function declaration, and how much is instantiating std::remove_reference? I imagine we could make the latter cheaper by adding a __remove_reference builtin and changing libc++'s forward and move to use it if that's worthwhile.