This is an archive of the discontinued LLVM Phabricator instance.

[C++2a] Implement operator<=> Part 2: Operator Rewritting and Overload Resolution.
Needs ReviewPublic

Authored by EricWF on Apr 16 2018, 1:15 AM.

Details

Summary

This is a work-in-progress attempt to add operator<=> rewriting, including the required changes to overload resolution.

The rewritten expression is represented with the newly added AST node CXXRewrittenExpr. This node contains two sub-statements. The first is a "dummy" expression representing the syntactic form of the initial expression, where the input expressions are represented using OpaqueValueExpr. The second represents the rewritten expression, the input expressions are used here directly (ie are not wrapped in OpaqueValueExprs).

As currently implemented, rewritten and synthesized candidates are only partially checked for viability when they're added to the overload candidates (the conversion sequence from the argument type -> parameter type is checked, but nothing else). This can lead non-viable candidates being selected or causing ambiguity when final overload resolution is attempted.

The solution implemented in this patch is to fully build "potentially viable" candidates if they are selected or cause ambiguity. If building the candidate fails, it is marked as non-viable, and a new best viable function is computed.

The problem with this approach is that it's expensive and potentially wasteful. For example, if overload resolution results in ambiguity with N rewritten candidates with the same partial ordering, then all N candidates are fully built, and those results potentially thrown away.

For builtin candidates this can be avoided by separating out the bits of CheckBinaryOperands which compute it's result type from the bits which actually build the expression and convert the arguments. Once we know the return type of the builtin, we can deduce if the comparison category type can be used in the expression 0 @ <comp-category> without much effect.

However, for non-builtin overloads of operator<=> (which don't return a comparison category type), it seems non-trivial to check that 0 @ <result> is valid without actually attempting to build the overloaded binary operator.

Diff Detail

Event Timeline

EricWF created this revision.Apr 16 2018, 1:15 AM

I'm guessing this implementation is super non-conforming, since expressions are evaluated during overload resolution.

My plan is to keep the general shape of this patch intact, but fix the eager building of expressions with something less wasteful.

EricWF planned changes to this revision.Apr 16 2018, 5:40 AM

I know how to do better.

include/clang/Sema/Overload.h
941

This is unneeded. As long as the rewritten candidates are added last, restoring the Functions member is unneeded.

rsmith added inline comments.Apr 16 2018, 9:48 AM
lib/Sema/SemaOverload.cpp
839

I think this might be clearer if named getParamIndexForArgIndex or similar.

8880–8882

No \brief here (we use autobrief), and please use /// for continuation lines.

8888

Does the implementation below actually rely on this?

8901

No, you shouldn't be doing any lookups here. Instead, what you should do is to change the LookupOverloadedOperatorName call in BuildOverloadedBinOp to look up both operator@ and operator<=> in the case where @ is an equality or relational op, and then filter Fns by operator name both here and when adding the non-member function candidates.

The reason is that we need to do these lookups in the context of the template definition when an equality or relational expression appears within a template, so we need to look up both the original operator and <=> eagerly.

8936–8938

Type isn't sufficient to conclude this; you'll need to check the other relevant properties of LHS and RHS too (at least value kind, but maybe also bit-field-ness and others?).

9218–9219

You also need to check the reverse condition and return false (the "or if not that" is ... somehow ... supposed to imply that).

12502–12535

This results in an inadequate representation for the rewritten <=> form. We need to retain the "as-written" form of the operator, for multiple reasons (source fidelity for tooling, expression mangling, pretty-printing, ...). Generally, Clang's philosophy is to desugar the minimum amount that's necessary to capture both the syntactic and semantic forms.

A couple of possible representations come to mind here:

  1. A specific Expr subclass for a rewritten <=> operator, which is just a wrapper around a (a <=> b) op 0 expression, and knows how to extract the subexpressions itself, or...
  2. A general "rewritten operator wrapper" Expr subclass, which holds its operands and a rewritten expression, and for which the rewritten expression refers to its operands via OpaqueValueExprs. We already have such a class (PseudoObjectExpr).

The former is a smaller and simpler representation, but will likely be more work to implement.

12537–12565

Hmm, this is an interesting approach. It's quite a divergence from our usual approach, which is to perform the viability checks first, before partial ordering, even if we could eliminate some candidates based on partial ordering. And it's observable -- it will result in fewer template instantiations being performed, which will mean we do not diagnose some ill-formed (no diagnostic required) situations which would otherwise fail due to an error in a non-immediate context.

Can you say something about why you chose this approach instead of doing the lookup for operator @ early and marking the candidate non-viable if the lookup / overload resolution for it fails?

(I'm also vaguely wondering whether we should be pushing back on the C++ committee for making viability of the rewritten candidate depend on validity of the @ expression, not just the <=> expression.)

12570–12572

This is a really interesting situation. What does it even mean to consider the (a <=> b) @ c expression if lookup for <=> finds a deleted function? (It would be novel for the return type of a deleted function to be meaningful, and it would be novel to exclude an overload candidate because it's deleted.) And what if lookup for @ finds a deleted function?

I think this is good reason to go back to CWG and suggest that we don't check the @ expression at all until after overload resolution.

12608–12609

This is not correct, because it turns out that isBetterOverloadCandidate is not a strict weak order. It's not any kind of order at all, actually, since it lacks the transitivity property. I don't know whether you can do better than marking your chosen candidate non-viable and rerunning the candidate selection algorithm if your presumed-best candidate turns out to be non-viable.

12634

I am a vampire and

EricWF marked 6 inline comments as done.Apr 16 2018, 5:48 PM
EricWF added inline comments.
lib/Sema/SemaOverload.cpp
8888

Like should it be possible to hit this line of code? No. It was preemptive. I'll remove it.

12537–12565

The main reason for choosing this approach was the current expense of validating the candidates eagerly when they're added. Currently builtin candidates need to be fully built before their viability is known. My initial attempt at this made the compiler just about hang.

If I understand what you mean by performing lookup/overload resolution early for operator @, and I probably don't, then it would require performing overload resolution separately every rewritten candidate added, since each candidate could have a different return type.

I think rewritten builtins could be checked more eagerly by moving a bunch of the checking from SemaExpr.cpp to SemaOverload.cpp, and using it to eagerly check the bits we need to, but not actually building the expression as calling into SemaExpr.cpp would currently do.

As for the root question: Should the validity of the @ expression be considered during overload resolution? I think the answer is tricky. For example, the following case could easily occur and break existing code, especially as more users add defaulted comparison operators. In this case if the base class adds <=> without the derived classes knowledge.

struct B {
  using CallbackTy = void(*)();
  CallbackTy CB = nullptr; // B carries around a function pointer.
#if __cplusplus >= 201703L
  friend auto operator<=>(B const&, B const&) = default;
#else
  friend bool operator==(B, B);
  friend bool operator!=(B, B);
#endif
};
struct D {
  operator int() const; // D uses this conversion to provide comparisons.
};
D d;
// comparison ill-formed if the `@` expression isn't considered for viability.
auto r = d < d;

However, allowing the @ expression to effect viability leads to a different set of bugs. For example this slicing comparison bug:

struct B {
  int x;
  friend auto operator<=>(B, B) = default;
};
struct D : B {
  decltype(nullptr) y = {};
  friend auto operator<=>(D, D) = default;
};
D d;
// If the `@` expression is considered for viability, then `operator<=>(B, B)` will be used, effectively slicing the data used during the comparison. 
auto r = d < d;

What is clear is that it should go back to CWG for more discussion.

12570–12572

What does it even mean to consider the (a <=> b) @ c expression if lookup for <=> finds a deleted function?

Separate from allowing return types to affect overload resolution, I suspect selecting a deleted rewritten candidate means that we should have never considered the candidate in the first place. For example:

struct B { auto operator<=>(B) const = delete; };
struct D : private B { operator int() const; };
D{} < D{};

In the above case, the intent of the code seems clear. But because the compiler invented a <=> candidate with a better conversion ranking, our code is ill-formed. Therefore, it makes sense to prefer non-rewritten candidates over deleted rewritten ones.

That said, it certainly doesn't make sense to "make way" for a worse rewritten candidate when the best viable function is deleted. For example:

struct Any { Any(...); };
auto operator<=>(Any, Any);
struct T { friend auto operator<=>(T, T) = delete; };
T{} < T{}; // selecting operator<=>(Any, Any) would be bad.
12608–12609

I was afraid of that, but I bit too tired to initially check this. Thanks for the correction.

If I'm not mistaken, isBetterOverloadCandidate should still be able to provide list of equivalent candidates, which could be used when handling this case.

12634

Ha. I had to look that reference up.

EricWF marked 2 inline comments as done.Apr 16 2018, 6:29 PM
EricWF added inline comments.
lib/Sema/SemaOverload.cpp
12537–12565

It's quite a divergence from our usual approach, which is to perform the viability checks first, before partial ordering, even if we could eliminate some candidates based on partial ordering. And it's observable -- it will result in fewer template instantiations being performed, which will mean we do not diagnose some ill-formed (no diagnostic required) situations which would otherwise fail due to an error in a non-immediate context.

I should clarify that the same validity checks are applied to rewritten candidates are also applied to non-rewritten candidates, such as checking conversions to parameter types. So I suspect, for all intents and purposes, it should produce the same amount of template instantiations as existing operator lookup does.

EricWF marked an inline comment as done.Apr 16 2018, 6:47 PM
EricWF added inline comments.
lib/Sema/SemaOverload.cpp
9218–9219

Hmm. So I'm wondering what is intended by the language F1 and F2 are rewritten candidates, and F2 is a synthesized candidate with reversed order of parameters and F1 is not. For example, what happens when comparing two distinct member functions with only one explicit parameter?

struct T;
struct U { auto operator<=>(T); };
struct T { auto operator<=>(U); };
auto r = T{} < U{}; // Are the synthesized and rewritten overloads ambiguous?
EricWF added inline comments.Apr 16 2018, 7:04 PM
lib/Sema/SemaOverload.cpp
9218–9219

And what about

struct U {};
struct T { auto operator<=>(U); };
auto operator<=>(U, T);
auto r = T{} < U{};
EricWF added inline comments.Apr 16 2018, 9:17 PM
lib/Sema/SemaOverload.cpp
9218–9219

Nevermind. I found the language. The implicit object parameter isn't considered in this case.

Definitely some interesting questions to take to CWG here. :)

lib/Sema/SemaOverload.cpp
9218–9219

The parameters in scope in this rule are the parameters for the purpose of overload resolution, which include the implicit object parameter for a member function.

In your first example, both candidates have implicit conversion sequences with Exact Match rank for both parameters (and none of the ordering special cases apply), so we fall down to this tie-breaker and it selects the T::operator<=>(U) candidate, unless I'm missing something.

Your second example appears to receive the same treatment.

12537–12565

I'm not sure that it's true in general that we get the same amount of instantiation with this approach. For example, checking the @ expression will trigger instantiation of the return type of the operator<=>, and may trigger instantiation of the operator<=> function itself in some cases. (I think this is a conforming technique; I think the more interesting question is whether it's a good thing. I believe GCC uses techniques like this, and it does create portability problems in practice for code written against GCC, for example -- but on the other hand, they presumably spend less time in overload resolution as a result.)

12570–12572

I'm not sure I see clear intent in your first example. B is saying "any attempt to compare me is ill-formed". It's not clear to me what D's intent when privately inheriting from B is. But if the operator<=> were not deleted, we would[1] have selected it (and such selection would even be valid if done in a context in which B is an accessible base class of D) -- so I think that program should be rejected.

[1]: Well, here's the problem. We would have selected it if it returned a type that supports < 0. And that's not a meaningful question to ask. I suppose we could argue that if the "= delete" were simply removed, we'd have rejected it due to trying to use a function with a deduced return type before the return type is available.

EricWF added inline comments.Apr 17 2018, 3:09 AM
lib/Sema/SemaOverload.cpp
9218–9219

Alright. So I found the wrong language.

I think my first case meant to have const & parameters and const methods. That way the implicit object parameter type matched the reversed parameter type. The second example was meant to demonstrate the case where the implicit object parameter type doesn't match.

What type should we be comparing against in these cases?

EricWF updated this revision to Diff 143224.Apr 19 2018, 8:30 PM

Updating this with the latest work. This adds the semantics for checking and generating default comparison operators.

The overload resolution bits are still a mess, and I'll clean those up shortly. Also there are not sufficient tests yet, those will also come shortly.
I also need to go through the wording for defaulted functions again, to ensure the constexpr and noexcept deduction is correct, as well
as the diagnostics for deleted comparisons.

One oddity introduced here is that defaulted comparisons are evaluated using the "Special Member" matchinary, even though they're
not technically special members, or even members. However, it seemed like the appropriate place to implement them. I'll probably rename
most FooSpecialMember functions to FooDefaultedMember to clear up any confusion.

EricWF updated this revision to Diff 143433.Apr 21 2018, 1:18 AM
  • Better integrate defaulted comparison with other special members; this buys correct access, constexpr, and except specification checking.

However it's not clear if this is the best approach. Lookups for defaulted comparison can't be cached like other special members, because the lookup can change after the class definition. It is clear, however, that using the existing machinery would be too expensive if not for cached lookup results, because the checks for constexpr, deletion, and noexcept all depend on the results for lookup being cached.

rsmith added a comment.May 2 2018, 2:49 PM

Can you split this patch up a bit? There's changes here for the <=> rewriting, but also for supporting operator<=>(...) = default, and some constant evaluation and code generation changes too. It's a lot easier to review more directed, smaller patches.

include/clang/AST/ExprCXX.h
4212

This should have a name ending Expr.

4230–4231

It probably won't ever be type-dependent, but could absolutely be value-dependent, instantiation-dependent, or contain an unexpanded parameter pack.

4294–4296

I find the interface exposed by this class a little confusing. On the one hand, it provides the original expression as an Expr*, and on the other hand it hard-codes knowledge about the form of that expression and exposes accessors into that original form. The upshot is that we pay the cost of a general mechanism (that can cope with any form of original expression) but don't get any benefit from that (because the class hard-codes non-generality).

This also generates a distinctly non-tree-shaped AST; clients naively walking recursively over expressions and their subexpressions will see the same`Expr*` multiple times and potentially end up performing an exponential-time tree walk.

I think there are (at least) two reasonable approaches here:

  1. Use OpaqueValueExprs to handle the repeated AST nodes, and track an original expression, a rewritten expression, and probably an original LHS expr and an original RHS expr; the original and rewrite would use OVEs to refer indirectly to the LHS and RHS. Remove all hardcoded knowledge of the original and rewrite from this expression node and make it generic for rewritten expression forms.
  1. Make this node explicitly be specific to spaceship rewrites. Remove the original expression pointer and rename it to something more specific. Keep the accessors that make hardcoded assumptions about the form of the rewritten expression.

Option 2 looks closer to what this patch is currently doing (you're not using the original expression much).

lib/AST/ItaniumMangle.cpp
3555

No, the expression should mangle as-written.

EricWF updated this revision to Diff 145390.May 5 2018, 9:11 PM
EricWF marked 7 inline comments as done.
  • Remove the = default changes as requested.
  • Attempt to work around ambiguity in overload resolution caused by differing lvalue or qualification conversions when ranking synthesized candidates. For example:
struct T {};
auto operator<=>(T const&, T&&);
T{} <=> T{}; // Ambiguous according to the current spec.

The workaround ignores worse implicit conversions when the conversions are both an exact match and a rewritten candidate tie-breaker is present.

EricWF added inline comments.May 5 2018, 9:18 PM
lib/Sema/SemaOverload.cpp
9177

This doesn't seem anywhere close to correct, but It's the best I've come up with so far.
For now though, it seems nice to avoid the ambiguity caused by the current specification.

EricWF updated this revision to Diff 145398.May 6 2018, 4:32 AM

Generalize CXXRewrittenExpr to contain only a "representation" of the original expression, and the fully checked rewritten expression. By "representation" i mean to imply that we can't actually build and check a full representation for the expression as written, because it need not be valid. For example:

struct T { auto operator<=>(T const&) const = default; };
T{} < T{};

Currently this patch builds a BinaryOperator node to represent the expression as written, regardless of whether the original expression would actually select a builtin comparison operator. A consequence of this pseudo-representation is that we can't effectively transform it in TreeTransform.h (currently we only transform the rewritten expression and not the representation of the original; there are some kinks I need to work out there).

EricWF updated this revision to Diff 145419.May 6 2018, 3:13 PM
  • Further generalize CXXRewrittenExpr and get it working with TreeTransform.h.
EricWF retitled this revision from [C++2a] Add operator<=> Rewriting - Early Attempt to [C++2a] Implement operator<=> Part 2: Operator Rewritting and Overload Resolution..May 7 2018, 12:38 AM
EricWF edited the summary of this revision. (Show Details)
EricWF set the repository for this revision to rC Clang.
EricWF updated this revision to Diff 145643.May 8 2018, 1:13 AM
  • Rebase on master.

I think the correct direction to head with this patch is to start removing the bits of the implementation which evaluate and build rewritten expressions during overload resolution. I'll submit such an update tomorrow.

EricWF added a comment.May 8 2018, 2:01 AM

I'm still not sure how to best generate a diff that best ignores white spaces changes caused by clang-format or my editor.

I'll work on removing existing whitespace changes.

EricWF updated this revision to Diff 146090.May 10 2018, 1:31 AM

Remove a bunch of nonsense this patch originally had. Specifically remove the bits of overload resolution which disambiguated rewritten expressions after the fact; Instead attempt to validate that the return type of the three-way comparison operand in a rewritten relational or equality can be used with the the specified operand (e.g. std::strong_equality cannot be used with a relational operator).

rsmith added inline comments.May 10 2018, 4:37 PM
include/clang/AST/ExprCXX.h
4219–4227

I don't think you need this.

4294–4296

As noted, if you want this to be a general-purpose "rewritten expression" node, you should use OpaqueValueExprs to track the repeated AST nodes between the original and rewritten expression forms rather than forming a non-tree-shaped AST. However, if you want to take that path, you don't need to add a new Expr node for that; that's what PseudoObjectExpr is for.

include/clang/AST/Stmt.h
254

I don't think you need this either.

include/clang/Sema/Overload.h
81–83

These names are not very good. The standard describes both these cases as "rewritten", so ROC_Rewritten is ambiguous, and it says "a synthesized candidate with reversed order of parameters", not merely "synthesized" (which would fail to communicate what's going on).

How about ROC_AsThreeWayComparison and ROC_AsReversedThreeWayComparison or similar?

941

Please instead extend the Functions set to be a set of PointerIntPair<Decl*, 2, RewrittenOperatorCandidateKind>.

include/clang/Serialization/ASTBitCodes.h
1814

Drop the _OPERATOR here.

lib/Sema/SemaOverload.cpp
12542

in a *call* to a

12543–12547

If you want to model this as a generic (syntactic form, semantic form) pair, the syntactic form needs to preserve enough information that TreeTransform on it can recreate the semantic form. That means you need to store a CXXOperatorCallExpr to an UnresolvedLookupExpr to hold Fns if Fns is not empty (see the code at the start of CreateOverloadedBinOp that deals with the type-dependent case for an example of what you would need to do).

Though perhaps this is as good a reason as any to give up the idea of a generic rewrite expression and add something specific to a three-way comparison rewrite; the representation needed to fit this into a generic rewrite mechanism is very unwieldy. (If we add any more forms of rewritten operator later, we can consider whether a generalization is worthwhile.)

lib/Sema/TreeTransform.h
11569–11573

This doesn't seem right to me; TreeTransform should be transforming the original expression, not the rewritten one. We shouldn't be assuming we know what kind of transforms the client wants to perform, it's just our job to give them the syntactic pieces and ask for those pieces to be transformed.

(But this is all redundant if you switch to using PseudoObjectExpr.)

EricWF updated this revision to Diff 146274.May 10 2018, 8:18 PM
EricWF marked 13 inline comments as done.
  • Change CXXRewrittenExpr to be a non-general wrapper only for rewritten comparison operators.
  • Start fixing and improving overload resolution diagnostics.
EricWF added inline comments.May 10 2018, 8:19 PM
include/clang/AST/ExprCXX.h
4219–4227

I guess it's just future proofing, or an attempt to demonstrate how the class may be generalized. I'll remove it for now.

include/clang/Sema/Overload.h
81–83

Sounds good to me.

941

Sure. I'll get that done.

There are two options here:

(A) Keep the guard object. Store the RewrittenOverloadCandidateKind which we're currently adding candidates for, and have addCandidate and isNewCandidate handle the ROC.

(B) Change all of the AddFooCandidates interfaces in Sema to take yet another default parameter, and pass it around everywhere.

Which would you prefer @rsmith?

lib/Sema/SemaOverload.cpp
12543–12547

I think the FIXME comment may be incorrect.

Shouldn't this be sufficient since we never actually build the rewritten expression until the input expressions are no longer type dependent?
Additionally, with this representation TreeTransform can transform the semantic form, since (if I'm correct), a rebuild should never re-perform overload resolution to choose how to rewrite the initial expression. (Does that make sense, and is it correct?)

I'll go ahead and start converting to a less-general representation of the rewritten expression.

EricWF marked 2 inline comments as done.May 10 2018, 8:20 PM
EricWF added inline comments.
lib/Sema/SemaOverload.cpp
12543–12547

Nevermind. This was answered offline.

EricWF marked 11 inline comments as done.May 10 2018, 8:22 PM
EricWF added inline comments.
include/clang/AST/Stmt.h
254

Does keeping it help save bits in CXXRewrittenOperatorExpr?

EricWF updated this revision to Diff 146280.May 10 2018, 10:03 PM

Do better english in diagnostics.

EricWF updated this revision to Diff 146281.May 10 2018, 10:15 PM
EricWF marked 2 inline comments as done.

Fix fetching the correct source location information in CXXRewrittenOperatorExpr.

EricWF updated this revision to Diff 146285.May 10 2018, 10:55 PM
  • Update tests and improve diagnostics. Still more work to be done here though.