
[Sema, CodeGen] Allow [[likely]] and [[unlikely]] on labels
Changes Planned · Public

Authored by Mordante on Aug 25 2020, 11:08 AM.

Details

Summary

Allows the likelihood attributes on labels that are not part of a case statement. When a goto is used, it accepts the attribute placed on the target label.
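
For illustration, the accepted form looks roughly like this (a hypothetical example, not taken from the patch's tests):

int f(int i) {
  if (i < 0)
    goto error;  // the goto picks up the [[unlikely]] from its target label
  return i;
[[unlikely]] error:
  return -1;
}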

Depends on D85091.

Diff Detail

Event Timeline

Mordante created this revision. Aug 25 2020, 11:08 AM
Mordante requested review of this revision. Aug 25 2020, 11:08 AM

This feels like the wrong approach to me... but I admit that I don't know what the "right" approach might be. (I doubt any right approach exists.)

if (ch == ' ') [[likely]] {
    goto whitespace;  // A
} else if (ch == '\n' || ch == '\t') [[unlikely]] {
    goto whitespace;  // B
} else {
    foo();
}
[[likely]] whitespace: bar();  // C

It seems like this patch would basically "copy" the [[likely]] attribute from line C up to lines A and B, where it would reinforce the likelihood of path A and (maybe?) "cancel out" the unlikelihood of path B, without actually saying anything specifically about the likelihood of label C (which is surely what the programmer intended by applying the attribute, right?). OTOH, I can't think of any particular optimization that would care about the likelihood of label C. I could imagine trying to align label C to a 4-byte boundary or something, but that wouldn't be an optimization on any platform as far as I know.

It seems like this patch would basically "copy" the [[likely]] attribute from line C up to lines A and B, where it would reinforce the likelihood of path A and (maybe?) "cancel out" the unlikelihood of path B, without actually saying anything specifically about the likelihood of label C (which is surely what the programmer intended by applying the attribute, right?).

Yes, A is doubly [[likely]] and B is both [[likely]] and [[unlikely]]. Based on http://eel.is/c++draft/dcl.attr.likelihood#:attribute,likely
"The use of the likely attribute is intended to allow implementations to optimize for the case where paths of execution including it are arbitrarily more likely than any alternative path of execution that does not include such an attribute on a statement or label."
So it seems a goto is intended to be treated as part of a path of execution, since it's an unconditional jump.

But I think the standard should be improved:

if(foo) {
  [[likely]][[unlikely]] bar(1); // error: The attribute-token likely shall not appear in an
  bar(2);                        //        attribute-specifier-seq that contains the attribute-token unlikely.
}
if(foo) {
  [[likely]] bar(1);
  [[unlikely]] bar(2);           // This contradiction in the `ThenStmt` is allowed
}
if(foo) {
  [[likely]] bar(1);
} else {
  [[likely]] bar(2);             // This contradiction between the `ThenStmt` and `ElseStmt` is allowed
}

So I'll work on a paper to discuss these issues. I already have some notes, but I would prefer to get more implementation experience before starting to write.
IMO we should ignore conflicting likelihoods and issue a diagnostic. (For a switch, multiple [[likely]] cases would be allowed, but still no mixing.)

It seems like this patch would basically "copy" the [[likely]] attribute from line C up to lines A and B, where it would reinforce the likelihood of path A and (maybe?) "cancel out" the unlikelihood of path B, without actually saying anything specifically about the likelihood of label C (which is surely what the programmer intended by applying the attribute, right?).

Yes, A is doubly [[likely]] and B is both [[likely]] and [[unlikely]]. Based on http://eel.is/c++draft/dcl.attr.likelihood#:attribute,likely
"The use of the likely attribute is intended to allow implementations to optimize for the case where paths of execution including it are arbitrarily more likely than any alternative path of execution that does not include such an attribute on a statement or label."
So it seems a goto is intended to be treated as part of a path of execution, since it's an unconditional jump.

But I think the standard should be improved:

if(foo) {
  [[likely]][[unlikely]] bar(1); // error: The attribute-token likely shall not appear in an
  bar(2);                        //        attribute-specifier-seq that contains the attribute-token unlikely.
}
if(foo) {
  [[likely]] bar(1);
  [[unlikely]] bar(2);           // This contradiction in the `ThenStmt` is allowed
}
if(foo) {
  [[likely]] bar(1);
} else {
  [[likely]] bar(2);             // This contradiction between the `ThenStmt` and `ElseStmt` is allowed
}

So I'll work on a paper to discuss these issues. I already have some notes, but I would prefer to get more implementation experience before starting to write.
IMO we should ignore conflicting likelihoods and issue a diagnostic. (For a switch, multiple [[likely]] cases would be allowed, but still no mixing.)

I'd like to understand your reasoning for ignore + diagnose as opposed to count attrs (+ optionally diagnose) or other strategies. I can see pros and cons to basically anything we do here.

If we decide to ignore conflicting likelihoods, then I agree we should issue a diagnostic. However, I suspect that's going to come up frequently in practice because people are going to write macros that include these attributes. For instance, it's very sensible for a developer to put [[unlikely]] in front of an assert under the assumption this is telling the compiler "this assertion isn't likely to happen" when in fact it's saying "this branch containing the assertion is unlikely to be taken". This is one of the reasons why the GCC behavior of allowing these attributes on arbitrary statements feels like an awful trap for users to get into. If the attribute only applied to compound statements following a flow control construct, I think the average programmer will have a far better chance of not using this wrong. (I know that I said in the other thread we should match the GCC behavior, but I'm still uncomfortable with it because that behavior seems to be fraught with dangers for users, and this patch introduces new sharp edges, as @Quuxplusone pointed out.)
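
Concretely, the trap might look something like this (a hypothetical sketch; the MyAssert macro is invented for the example, and the comment assumes the arbitrary-statement semantics under discussion):

#include <cassert>

// Developer intent: "this assertion is unlikely to fire".
#define MyAssert(cond) [[unlikely]] assert(cond)

int process(int x) {
  if (x > 0) {
    MyAssert(x < 100); // actually marks the whole (x > 0) branch as unlikely
    return x * 2;
  }
  return 0;
}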

I'd like to understand your reasoning for ignore + diagnose as opposed to count attrs (+ optionally diagnose) or other strategies. I can see pros and cons to basically anything we do here.

If we decide to ignore conflicting likelihoods, then I agree we should issue a diagnostic. However, I suspect that's going to come up frequently in practice because people are going to write macros that include these attributes. For instance, it's very sensible for a developer to put [[unlikely]] in front of an assert under the assumption this is telling the compiler "this assertion isn't likely to happen" when in fact it's saying "this branch containing the assertion is unlikely to be taken".

This macro is an example of why I dislike the counting. If MyAssert has an [[unlikely]] attribute, then changing the number of MyAssert statements will influence the likelihood of the branches taken.

This is one of the reasons why the GCC behavior of allowing these attributes on arbitrary statements feels like an awful trap for users to get into. If the attribute only applied to compound statements following a flow control construct, I think the average programmer will have a far better chance of not using this wrong. (I know that I said in the other thread we should match the GCC behavior, but I'm still uncomfortable with it because that behavior seems to be fraught with dangers for users, and this patch introduces new sharp edges, as @Quuxplusone pointed out.)

I'm also not fond of GCC's implementation, but I think their implementation conforms to the standard. In hindsight I think it indeed makes more sense to only allow it on compound statements. That will require thinking about how to mark multiple cases as [[likely]] when falling through:

switch(a) {
  case '0':
  case '1':
  case '2': [[likely]] { return 'a' - '0'; }    // All 3 cases likely?

  case '3':                                     // neutral?
  case '4': [[likely]] {}                       // likely?
  case '5': [[unlikely]] { return 'a' - '0'; }  // unlikely?
}

Of course this can be solved by only allowing it on the case labels:

switch(a) {
  [[likely]] case '0':
  [[likely]] case '1':
  [[likely]] case '2': { return 'a' - '0'; }    // All 3 cases likely

  case '3':                                     // neutral
  [[likely]] case '4':                          // likely
  [[unlikely]] case '5': { return 'a' - '0'; }  // unlikely
}

I fully agree the behaviour mandated by the standard is way too complex and user unfriendly. It would have been nice if there were simpler rules, making it easier to use and to teach. Still I think it would be best to use the complex approach now, since that's what the standard specifies. During that process we can see whether there are more pitfalls. Then we can discuss it with other vendors and see whether we can change the wording of the standard. Do you agree?

I'd like to understand your reasoning for ignore + diagnose as opposed to count attrs (+ optionally diagnose) or other strategies. I can see pros and cons to basically anything we do here.

If we decide to ignore conflicting likelihoods, then I agree we should issue a diagnostic. However, I suspect that's going to come up frequently in practice because people are going to write macros that include these attributes. For instance, it's very sensible for a developer to put [[unlikely]] in front of an assert under the assumption this is telling the compiler "this assertion isn't likely to happen" when in fact it's saying "this branch containing the assertion is unlikely to be taken".

This macro is an example of why I dislike the counting. If MyAssert has an [[unlikely]] attribute, then changing the number of MyAssert statements will influence the likelihood of the branches taken.

<nod>, but this is an example of why I don't like the behavior of applying this to arbitrary statements; this is spooky behavior that would be very hard to spot in code reviews unless you're an expert on C++.

This is one of the reasons why the GCC behavior of allowing these attributes on arbitrary statements feels like an awful trap for users to get into. If the attribute only applied to compound statements following a flow control construct, I think the average programmer will have a far better chance of not using this wrong. (I know that I said in the other thread we should match the GCC behavior, but I'm still uncomfortable with it because that behavior seems to be fraught with dangers for users, and this patch introduces new sharp edges, as @Quuxplusone pointed out.)

I'm also not fond of GCC's implementation, but I think their implementation conforms to the standard.

Conforming to the standard is not hard in this case. We can parse the attributes and ignore their semantics entirely (but not the constraints) and that also conforms to the standard. Our job is to figure out what the best behavior is for our implementation (which may or may not mean following the GCC behavior).

In hindsight I think it indeed makes more sense to only allow it on compound statements. That will require thinking about how to mark multiple cases as [[likely]] when falling through:

switch(a) {
  case '0':
  case '1':
  case '2': [[likely]] { return 'a' - '0'; }    // All 3 cases likely?

  case '3':                                     // neutral?
  case '4': [[likely]] {}                       // likely?
  case '5': [[unlikely]] { return 'a' - '0'; }  // unlikely?
}

Of course this can be solved by only allowing it on the case labels:

switch(a) {
  [[likely]] case '0':
  [[likely]] case '1':
  [[likely]] case '2': { return 'a' - '0'; }    // All 3 cases likely

  case '3':                                     // neutral
  [[likely]] case '4':                          // likely
  [[unlikely]] case '5': { return 'a' - '0'; }  // unlikely
}

That is exactly the behavior I am coming to believe we should follow. You can either write it on a compound statement that is guarded by a flow control decision (if/else/while) or you can write it on a label, otherwise the attribute is parsed and ignored (with a diagnostic). This feels like the right mixture of useful with understandable semantics so far, but perhaps we'll find other examples that change our minds.

The fallthrough behavior question has one more case we may want to think about:

switch (x) {
case 0:
case 1:
[[likely]] case 2: break;
[[unlikely]] default: break;
}

Does this mark cases 0 and 1 as being likely or not? I could see users wanting to use shorthand rather than repeat themselves on all the cases. However, I'm also not certain whether there would be any performance impact if we marked only case 2 as likely and left cases 0 and 1 with default likelihood. My gut feeling is that this should only mark case 2, but others may have different views.

I fully agree the behaviour mandated by the standard is way too complex and user unfriendly. It would have been nice if there were simpler rules, making it easier to use and to teach. Still I think it would be best to use the complex approach now, since that's what the standard specifies. During that process we can see whether there are more pitfalls. Then we can discuss it with other vendors and see whether we can change the wording of the standard. Do you agree?

The only requirement from the standard is that we parse [[likely]] or [[unlikely]] on a statement or label, that we don't allow conflicting attributes in the same attribute sequence, and that the attribute has no arguments to it. All the rest is recommended practice that's left up to the implementation as a matter of QoI. So we conform to the standard by following the approach on compound statements and labels + the constraint checking. We may not be following the recommended practice precisely, but we may not *want* to follow the recommended practice because it's bad for usability. So I'd like us to design the feature that we think gives the most value and is the easiest to use that matches the recommended practice as best we can, then start talking to other vendors. If other vendors don't want to follow our design, that's totally fine, but we should ensure our design *allows* users to write code that will behave similarly for other implementations without diagnostics. e.g., we don't want to design something where the user has to use macros to avoid diagnostics in cross-compiler code. After that, we may want to propose some changes to the recommended practice to WG21.

From my current thinking, it seems that there may be agreement that allowing these on arbitrary statements may be really difficult for users to understand the behavior of and that compound statements and labels are what we think is an understandable and useful feature and is also a strict subset of what we COULD support. By that, I think we should limit the feature to just compound statements and labels; this leaves the door open to allow the attributes on arbitrary statements in the future without breaking code. If we allow on arbitrary statements from the start, we can't later restrict the feature because we'd break code. This lets us start getting field experience with that behavior to see how we like it, which may also help when talking to other vendors because we'll have something concrete to talk about (hopefully). WDYT?

Mordante planned changes to this revision. Aug 27 2020, 10:21 AM

That is exactly the behavior I am coming to believe we should follow. You can either write it on a compound statement that is guarded by a flow control decision (if/else/while) or you can write it on a label, otherwise the attribute is parsed and ignored (with a diagnostic). This feels like the right mixture of useful with understandable semantics so far, but perhaps we'll find other examples that change our minds.

The fallthrough behavior question has one more case we may want to think about:

switch (x) {
case 0:
case 1:
[[likely]] case 2: break;
[[unlikely]] default: break;
}

Does this mark cases 0 and 1 as being likely or not? I could see users wanting to use shorthand rather than repeat themselves on all the cases. However, I'm also not certain whether there would be any performance impact if we marked only case 2 as likely and left cases 0 and 1 with default likelihood. My gut feeling is that this should only mark case 2, but others may have different views.

Not according to the standard: "A path of execution includes a label if and only if it contains a jump to that label."
However, for an if statement I use 2 weights: 2000 for likely, 1 for unlikely. (These can be configured by a command-line option already used for __builtin_expect.)
For a switch I intend to use 3 weights: 2000 for likely, (2000 + 1)/2 for no likelihood, 1 for unlikely. __builtin_expect doesn't have a 'neutral' value since in a switch you can only mark one branch as likely. Instead of adding a new option I picked the midpoint.
So the weights in your example would be:

switch (x) {
case 0:                      // 1000
case 1:                      // 1000
[[likely]] case 2: break;    // 2000
[[unlikely]] default: break; // 1
}

In this case the user could also write:

switch (x) {
case 0:                      // 1000
case 1:                      // 1000
case 2: break;               // 1000
[[unlikely]] default: break; // 1
}

where the values 0, 1, 2 still would be more likely than the default statement. Obviously this will be documented when I implement the switch.
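
To make the mapping concrete, the weight selection could be sketched as a small helper (a toy model using the defaults quoted above; Clang's real implementation differs and the names are invented):

enum class Likelihood { Unlikely, None, Likely };

constexpr unsigned LikelyWeight = 2000; // assumed default, tunable like __builtin_expect's weights
constexpr unsigned UnlikelyWeight = 1;  // assumed default

constexpr unsigned switchCaseWeight(Likelihood L) {
  switch (L) {
  case Likelihood::Likely:   return LikelyWeight;                        // 2000
  case Likelihood::Unlikely: return UnlikelyWeight;                      // 1
  case Likelihood::None:     return (LikelyWeight + UnlikelyWeight) / 2; // 1000
  }
  return (LikelyWeight + UnlikelyWeight) / 2; // unreachable, keeps compilers happy
}
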
How do you feel about this approach?

I fully agree the behaviour mandated by the standard is way too complex and user unfriendly. It would have been nice if there were simpler rules, making it easier to use and to teach. Still I think it would be best to use the complex approach now, since that's what the standard specifies. During that process we can see whether there are more pitfalls. Then we can discuss it with other vendors and see whether we can change the wording of the standard. Do you agree?

The only requirement from the standard is that we parse [[likely]] or [[unlikely]] on a statement or label, that we don't allow conflicting attributes in the same attribute sequence, and that the attribute has no arguments to it. All the rest is recommended practice that's left up to the implementation as a matter of QoI. So we conform to the standard by following the approach on compound statements and labels + the constraint checking. We may not be following the recommended practice precisely, but we may not *want* to follow the recommended practice because it's bad for usability. So I'd like us to design the feature that we think gives the most value and is the easiest to use that matches the recommended practice as best we can, then start talking to other vendors. If other vendors don't want to follow our design, that's totally fine, but we should ensure our design *allows* users to write code that will behave similarly for other implementations without diagnostics. e.g., we don't want to design something where the user has to use macros to avoid diagnostics in cross-compiler code. After that, we may want to propose some changes to the recommended practice to WG21.

From my current thinking, it seems that there may be agreement that allowing these on arbitrary statements may be really difficult for users to understand the behavior of and that compound statements and labels are what we think is an understandable and useful feature and is also a strict subset of what we COULD support. By that, I think we should limit the feature to just compound statements and labels; this leaves the door open to allow the attributes on arbitrary statements in the future without breaking code. If we allow on arbitrary statements from the start, we can't later restrict the feature because we'd break code. This lets us start getting field experience with that behavior to see how we like it, which may also help when talking to other vendors because we'll have something concrete to talk about (hopefully). WDYT?

I'm somewhat on the fence. In general I prefer to implement what the standard expects. However, the more I try to do that, the more I'm convinced what the standard suggests is difficult to achieve and, more importantly, too difficult to explain to C++ users. In order to implement this feature properly I've been looking at using the CFG. That allows me to follow the path of execution properly. However, it makes the feature more complicated. AFAIK the CFG is only used in Clang for AnalysisBasedWarnings::IssueWarnings, which feels like a hint that this is not the proper solution. Another "interesting" issue with the current approach is that it'll be hard to determine the likelihood when looking at the code:

if(a) {
    foo(a);
    [[likely]] ++a;
}

If foo is declared [[noreturn]], the ++a is never executed, so the attribute shouldn't affect the likelihood. (This is of course great when creating C++ trivia, but not great for users.)
Another issue is how to properly implement it when a branch has a simple if inside it:

if(a > 5) {
    if(a < 10)
        ++a;
    [[likely]] ++a;
}

Obviously the attribute is in the a > 5 branch and will always be executed, but I haven't found a solution to figure that out using the CFG. I need to find a way to determine that the a < 10 branch will always resume the original branch. I'm convinced it can be done, but it made me wonder whether this is the best approach.

So I think it would be wise to follow your suggestion: implement a simple version and try to convince other vendors to follow suit. This leaves the door open to implement a more complex solution if needed. I think this approach does C++ users and ourselves a favour. I only feel we need a slight relaxation in the requirements. In your approach the attribute is only allowed on a compound statement; I would like to also allow it on the first statement, as in my initial patch. This allows using the attribute without being forced to use a compound statement. I feel forcing users to use a certain coding style is a bad idea.

if(a) [[likely]] { // Good: allowed on the compound statement
  ...
}

if(a)
    [[likely]] return; // Good: allowed on the first statement

if(a) {
    [[likely]] return; // Ignored, not on the first statement
}

if(a)                      // Attribute not allowed on a declaration,
    [[likely]] int i = 5;  // but I can't think of a good use-case
                           // for this code.

if(a) [[likely]] { // Good: allowed on the compound statement
    int i = 5;     // Now i seems to have a purpose
    ...
}

if(a) [[likely]] {} // A diagnostic is issued and the attributes
else [[likely]] {}  // are ignored

The diagnostics for ignored attributes are enabled by default, so the contradiction is visible by default. For the switch I'll use the previous proposal.
WDYT?

If C++20 were still in development I would even consider writing a proposal to make the attributes more like __builtin_expect:

[[likely]] if(a) {
  // Considered to be arbitrarily likely
} else {
  // Considered to be arbitrarily unlikely
}

if(a) {
} [[unlikely]] else { // Not allowed on the else statement
}

if(a) {
  // No likelihood suggestion
} else
  // No likelihood suggestion

[[unlikely]] if(a) {
  // Considered to be arbitrarily unlikely
}

But it's too late to consider this change.

That is exactly the behavior I am coming to believe we should follow. You can either write it on a compound statement that is guarded by a flow control decision (if/else/while) or you can write it on a label, otherwise the attribute is parsed and ignored (with a diagnostic). This feels like the right mixture of useful with understandable semantics so far, but perhaps we'll find other examples that change our minds.

The fallthrough behavior question has one more case we may want to think about:

switch (x) {
case 0:
case 1:
[[likely]] case 2: break;
[[unlikely]] default: break;
}

Does this mark cases 0 and 1 as being likely or not? I could see users wanting to use shorthand rather than repeat themselves on all the cases. However, I'm also not certain whether there would be any performance impact if we marked only case 2 as likely and left cases 0 and 1 with default likelihood. My gut feeling is that this should only mark case 2, but others may have different views.

Not according to the standard: "A path of execution includes a label if and only if it contains a jump to that label."

A switch statement contains a jump to all of its labels, though. So all of those labels are included in the path of execution, which is not likely what's intended by the standard.

However, for an if statement I use 2 weights: 2000 for likely, 1 for unlikely. (These can be configured by a command-line option already used for __builtin_expect.)
For a switch I intend to use 3 weights: 2000 for likely, (2000 + 1)/2 for no likelihood, 1 for unlikely. __builtin_expect doesn't have a 'neutral' value since in a switch you can only mark one branch as likely. Instead of adding a new option I picked the midpoint.
So the weights in your example would be:

switch (x) {
case 0:                      // 1000
case 1:                      // 1000
[[likely]] case 2: break;    // 2000
[[unlikely]] default: break; // 1
}

I think this is defensible.

In this case the user could also write:

switch (x) {
case 0:                      // 1000
case 1:                      // 1000
case 2: break;               // 1000
[[unlikely]] default: break; // 1
}

where the values 0, 1, 2 still would be more likely than the default statement. Obviously this will be documented when I implement the switch.
How do you feel about this approach?

I think it's a reasonable approach; at least, I can't think of a case where this falls over.

I fully agree the behaviour mandated by the standard is way too complex and user unfriendly. It would have been nice if there were simpler rules, making it easier to use and to teach. Still I think it would be best to use the complex approach now, since that's what the standard specifies. During that process we can see whether there are more pitfalls. Then we can discuss it with other vendors and see whether we can change the wording of the standard. Do you agree?

The only requirement from the standard is that we parse [[likely]] or [[unlikely]] on a statement or label, that we don't allow conflicting attributes in the same attribute sequence, and that the attribute has no arguments to it. All the rest is recommended practice that's left up to the implementation as a matter of QoI. So we conform to the standard by following the approach on compound statements and labels + the constraint checking. We may not be following the recommended practice precisely, but we may not *want* to follow the recommended practice because it's bad for usability. So I'd like us to design the feature that we think gives the most value and is the easiest to use that matches the recommended practice as best we can, then start talking to other vendors. If other vendors don't want to follow our design, that's totally fine, but we should ensure our design *allows* users to write code that will behave similarly for other implementations without diagnostics. e.g., we don't want to design something where the user has to use macros to avoid diagnostics in cross-compiler code. After that, we may want to propose some changes to the recommended practice to WG21.

From my current thinking, it seems that there may be agreement that allowing these on arbitrary statements may be really difficult for users to understand the behavior of and that compound statements and labels are what we think is an understandable and useful feature and is also a strict subset of what we COULD support. By that, I think we should limit the feature to just compound statements and labels; this leaves the door open to allow the attributes on arbitrary statements in the future without breaking code. If we allow on arbitrary statements from the start, we can't later restrict the feature because we'd break code. This lets us start getting field experience with that behavior to see how we like it, which may also help when talking to other vendors because we'll have something concrete to talk about (hopefully). WDYT?

I'm somewhat on the fence. In general I prefer to implement what the standard expects. However, the more I try to do that, the more I'm convinced what the standard suggests is difficult to achieve and, more importantly, too difficult to explain to C++ users. In order to implement this feature properly I've been looking at using the CFG. That allows me to follow the path of execution properly. However, it makes the feature more complicated. AFAIK the CFG is only used in Clang for AnalysisBasedWarnings::IssueWarnings, which feels like a hint that this is not the proper solution.

FWIW, that's where we implement the logic for the [[fallthrough]] attribute as well, so there's certainly precedent for it. When you consider how tightly coupled the feature is to control flow, using the control flow graph doesn't seem entirely unreasonable. I think the potential downside will be performance concerns (you have to opt in to fallthrough diagnostics, so you don't pay for the CFG expense unless you enable it), but those may be negligible depending on how you're implementing it.

Another "interesting" issue with the current approach is, that it'll be hard to determine the likelihood when looking at the code:

if(a) {
    foo(a);
    [[likely]] ++a;
}

If foo is declared [[noreturn]], the ++a is never executed, so the attribute shouldn't affect the likelihood. (This is of course great when creating C++ trivia, but not great for users.)

Hah, what a fun use case! :-D

Another issue is how to properly implement it when a branch has a simple if inside it:

if(a > 5) {
    if(a < 10)
        ++a;
    [[likely]] ++a;
}

Obviously the attribute is in the a > 5 branch and will always be executed, but I haven't found a solution to figure that out using the CFG. I need to find a way to determine that the a < 10 branch will always resume the original branch. I'm convinced it can be done, but it made me wonder whether this is the best approach.

You should be able to tell that information from the CFG, I believe, because the exit node for both paths should converge. However, "should be able to" and "can easily do" are not the same thing; I don't know how much of a challenge it would be.

So I think it would be wise to follow your suggestion: implement a simple version and try to convince other vendors to follow suit. This leaves the door open to implement a more complex solution if needed. I think this approach does C++ users and ourselves a favour. I only feel we need a slight relaxation in the requirements. In your approach the attribute is only allowed on a compound statement; I would like to also allow it on the first statement, as in my initial patch. This allows using the attribute without being forced to use a compound statement. I feel forcing users to use a certain coding style is a bad idea.

if(a) [[likely]] { // Good: allowed on the compound statement
  ...
}

if(a)
    [[likely]] return; // Good: allowed on the first statement

Yes, I agree this makes sense -- the attribute is being written on the substatement of a control flow branch (whether it's a compound statement or just a regular statement).

if(a) {
    [[likely]] return; // Ignored, not on the first statement
}

Agreed.

if(a)                      // Attribute not allowed on a declaration,
    [[likely]] int i = 5;  // but I can't think of a good use-case
                           // for this code.

This is a fun example because I can think of a somewhat reasonable use-case but we can't support it without a compound statement. :-D The declaration could be an RAII object,

if (a)
  [[likely]] SomeRAIIObj Obj(*a);

However, you're right that we cannot write the attribute there because a declaration-statement cannot be attributed (http://eel.is/c++draft/stmt.stmt#stmt.pre-1), so the attribute instead appertains to the declaration and not the statement. So the user would have to write:

if (a) [[likely]] {
  SomeRAIIObj Obj(*a);
}

That said, I think this is enough of a corner case that requiring the compound statement is not really a burden. If we find users run into that often (they'd get an attribute ignored error if they tried), we could add a fix-it to help them out, but that doesn't seem necessary initially.

if(a) [[likely]] { // Good: allowed on the compound statement
    int i = 5;     // Now i seems to have a purpose
    ...
}

if(a) [[likely]] {} // A diagnostic is issued and the attributes
else [[likely]] {}  // are ignored

The diagnostics for ignored attributes are enabled by default, so the contradiction is visible by default. For the `switch` I'll use the previous proposal.
WDYT?

I think this all seems reasonable to me.

If C++20 were still in development I would even consider writing a proposal to make the attributes more like __builtin_expect:

[[likely]] if(a) {
  // Considered to be arbitrarily likely
} else {
  // Considered to be arbitrarily unlikely
}

if(a) {
} [[unlikely]] else { // Not allowed on the else statement
}

if(a) {
  // No likelihood suggestion
} else
  // No likelihood suggestion

[[unlikely]] if(a) {
  // Considered to be arbitrarily unlikely
}

But it's too late to consider this change.

Yeah, unfortunately, I think that ship has sailed.

That is exactly the behavior I am coming to believe we should follow. You can either write it on a compound statement that is guarded by a flow control decision (if/else/while) or you can write it on a label, otherwise the attribute is parsed and ignored (with a diagnostic). This feels like the right mixture of useful with understandable semantics so far, but perhaps we'll find other examples that change our minds.

The fallthrough behavior question has one more case we may want to think about:

switch (x) {
case 0:
case 1:
[[likely]] case 2: break;
[[unlikely]] default: break;
}

Does this mark cases 0 and 1 as being likely or not? I could see users wanting to use shorthand rather than repeat themselves on all the cases. However, I'm also not certain whether there would be any performance impact if we marked only case 2 as likely and left cases 0 and 1 with default likelihood. My gut feeling is that this should only mark case 2, but others may have different views.

Not according to the standard: "A path of execution includes a label if and only if it contains a jump to that label."
However, for an if statement I use 2 weights: 2000 for likely, 1 for unlikely. (These can be configured by a command-line option already used for __builtin_expect.)
For a switch I intend to use 3 weights: 2000 for likely, (2000 + 1)/2 for no likelihood, 1 for unlikely. __builtin_expect doesn't have a 'neutral' value since in a switch you can only mark one branch as likely. Instead of adding a new option I picked the midpoint.
So the weights in your example would be:

switch (x) {
case 0:                      // 1000
case 1:                      // 1000
[[likely]] case 2: break;    // 2000
[[unlikely]] default: break; // 1
}

In this case the user could also write:

switch (x) {
case 0:                      // 1000
case 1:                      // 1000
case 2: break;               // 1000
[[unlikely]] default: break; // 1
}

where the values 0, 1, 2 still would be more likely than the default statement. Obviously this will be documented when I implement the switch.
How do you feel about this approach?

As one of the SG14 industry members driving this, I'm firmly in the camp that this is what we're expecting. In the first case the 1/2 cases are "neutral". This is a very explicit, and local, marker. Anything else makes it so vague as to be unusable for fine-tuned code.

I should also make the point that we are not talking about a feature that is expected, or indeed should be, used by anyone other than someone with an exceedingly good understanding of what is going on. This is not a "teach everyone about it, it's safe" feature. It's there to produce a very fine-grained control in those cases where it really matters, and where profiling-guided optimizations would produce exactly the wrong result. Using this feature should be an automatic "is this needed" question in a code review. It is needed sometimes, just rarely.

if(a)                      // Attribute not allowed on a declaration,
    [[likely]] int i = 5;  // but I can't think of a good use-case
                           // for this code.

if(a) [[likely]] { // Good: allowed on the compound statement
    int i = 5;     // Now i seems to have a purpose
    ...
}

The use case for this becomes clearer when considering that the attribute that will be used 95% of the time is [[unlikely]]. You may have a constructor that you wish to ask the compiler to please, please, not inline in a particular function, even if the rest of that function is either likely or neutral.

if (a)
  [[unlikely]] expensive_class q{};

This could be the case if expensive_class held large static members, for instance.

This feels like the wrong approach to me... but I admit that I don't know what the "right" approach might be. (I doubt any right approach exists.)

if (ch == ' ') [[likely]] {
    goto whitespace;  // A
} else if (ch == '\n' || ch == '\t') [[unlikely]] {
    goto whitespace;  // B
} else {
    foo();
}
[[likely]] whitespace: bar();  // C

It seems like this patch would basically "copy" the [[likely]] attribute from line C up to lines A and B, where it would reinforce the likelihood of path A and (maybe?) "cancel out" the unlikelihood of path B, without actually saying anything specifically about the likelihood of label C (which is surely what the programmer intended by applying the attribute, right?). OTOH, I can't think of any particular optimization that would care about the likelihood of label C. I could imagine trying to align label C to a 4-byte boundary or something, but that wouldn't be an optimization on any platform as far as I know.

The only reason I could see for writing the above (and it's far more likely to be written with [[unlikely]] in my experience) is to control the probability that the call to bar() gets inlined or not.

That is exactly the behavior I am coming to believe we should follow. You can either write it on a compound statement that is guarded by a flow control decision (if/else/while) or you can write it on a label, otherwise the attribute is parsed and ignored (with a diagnostic). This feels like the right mixture of useful with understandable semantics so far, but perhaps we'll find other examples that change our minds.

The fallthrough behavior question has one more case we may want to think about:

switch (x) {
case 0:
case 1:
[[likely]] case 2: break;
[[unlikely]] default: break;
}

Does this mark cases 0 and 1 as being likely or not? I could see users wanting to use shorthand rather than repeat themselves on all the cases. However, I'm also not certain whether there would be any performance impact if we marked only case 2 as likely and left cases 0 and 1 with default likelihood. My gut feeling is that this should only mark case 2, but others may have different views.

Not according to the standard: "A path of execution includes a label if and only if it contains a jump to that label."

A switch statement contains a jump to all of its labels, though. So all of those labels are included in the path of execution, which is not likely what's intended by the standard.

In the example above, if x == 0 there will be a jump to case 0, which then falls through to case 1 and case 2; so case 0 doesn't jump to case 2 and thus doesn't "execute" the label.

if(a) {
    [[likely]] return; // Ignored, not on the first statement
}

Agreed.

if(a)                      // Attribute not allowed on a declaration,
    [[likely]] int i = 5;  // but I can't think of a good use-case
                           // for this code.

This is a fun example because I can think of a somewhat reasonable use-case but we can't support it without a compound statement. :-D The declaration could be an RAII object,

if (a)
  [[likely]] SomeRAIIObj Obj(*a);

However, you're right that we cannot write the attribute there because a declaration-statement cannot be attributed (http://eel.is/c++draft/stmt.stmt#stmt.pre-1), so the attribute instead appertains to the declaration and not the statement. So the user would have to write:

if (a) [[likely]] {
  SomeRAIIObj Obj(*a);
}

That said, I think this is enough of a corner case that requiring the compound statement is not really a burden. If we find users run into that often (they'd get an attribute ignored error if they tried), we could add a fix-it to help them out, but that doesn't seem necessary initially.

I had thought about RAII before, and I think it's also not a real issue there. Your example does the same as:

if (a)
   [[likely]] SomeRAIIObj{*a};

Here there's no declaration, so the attribute is allowed. If the RAII object is used in a declaration, I expect it will usually be inside a compound statement to create a scope where the object is alive. In that case the attribute is placed on the compound statement. I don't expect people to really write code like this, but it may happen when using macros.

As one of the SG14 industry members driving this, I'm firmly in the camp that this is what we're expecting. In the first case the 1/2 cases are "neutral". This is a very explicit, and local, marker. Anything else makes it so vague as to be unusable for fine-tuned code.

Thanks for your interest and affirming this looks like the right path to take.

I should also make the point that we are not talking about a feature that is expected, or indeed should be, used by anyone other than someone with an exceedingly good understanding of what is going on. This is not a "teach everyone about it, it's safe" feature. It's there to produce a very fine-grained control in those cases where it really matters, and where profiling-guided optimizations would produce exactly the wrong result. Using this feature should be an automatic "is this needed" question in a code review. It is needed sometimes, just rarely.

I think it's hard to predict how the feature will be used. For example, if a well-known C++ speaker gives a presentation at a conference about the attributes, they might see more use. Even though as humans we're bad at guessing the performance bottleneck in our code, I still think there are cases where the attribute can be used without testing. For example, when writing a cache in an application you can mark the cache hit as more likely. If that isn't the case, there's no reason for adding a cache. (Of course it would still be wise to measure.)

The use case for this becomes clearer when considering that the attribute that will be used 95% of the time is [[unlikely]]. You may have a constructor that you wish to ask the compiler to please, please, not inline in a particular function, even if the rest of that function is either likely or neutral.

At the moment the likelihood is used only for branch prediction and not for inlining decisions. I'm not sure it would be a good idea to give this attribute that dual purpose. If it's desired to control the inlining of calls at the caller's side, I feel a separate attribute would make more sense.

void foo()
{
  [[unlikely]] expensive_class q{};
  ...
}

This looks odd to me and the attribute isn't allowed on the declaration. I think this looks better:

void foo()
{
  [[noinline]] expensive_class q{};
  ...
}

if (a)
  [[unlikely]] expensive_class q{};

This could be the case if expensive_class held large static members, for instance.

This can be rewritten as an expression instead of a declaration; then the attribute can be used:

if(a)
  [[unlikely]] expensive_class{};

As one of the SG14 industry members driving this, I'm firmly in the camp that this is what we're expecting. In the first case the 1/2 cases are "neutral". This is a very explicit, and local, marker. Anything else makes it so vague as to be unusable for fine-tuned code.

Thank you for chiming in!

I should also make the point that we are not talking about a feature that is expected, or indeed should be, used by anyone other than someone with an exceedingly good understanding of what is going on.

That doesn't mean we should design something that's really hard to use for average folks too.

This is not a "teach everyone about it, it's safe" feature. It's there to produce a very fine-grained control in those cases where it really matters, and where profiling-guided optimizations would produce exactly the wrong result. Using this feature should be an automatic "is this needed" question in a code review. It is needed sometimes, just rarely.

+1, but I would point out that when PGO is enabled, this attribute is ignored, so it's an either/or feature. Either you get tuned optimizations, or you get to guess at them, but the current design doesn't let you mix and match. Do you see that as being a major issue with the design?

In the example above, if x == 0 there will be a jump to case 0, which then falls through to case 1 and case 2; so case 0 doesn't jump to case 2 and thus doesn't "execute" the label.

My point was that the standard doesn't say the jump has to be executed, just that it has to exist. I think we both agree with how we'd like to interpret this bit, but if we're going to write a paper trying to improve the wording in the standard, I think this is a minor thing we could perhaps clean up.

I had thought about RAII before, and I think it's also not a real issue there. Your example does the same as:

if (a)
   [[likely]] SomeRAIIObj{*a};

Here there's no declaration, so the attribute is allowed. If the RAII object is used in a declaration, I expect it will usually be inside a compound statement to create a scope where the object is alive. In that case the attribute is placed on the compound statement. I don't expect people to really write code like this, but it may happen when using macros.

This is a good point. I also agree that more RAII uses are going to use a compound statement than not. I think we're in agreement that this isn't a case we need to do anything special for.

As one of the SG14 industry members driving this, I'm firmly in the camp that this is what we're expecting. In the first case the 1/2 cases are "neutral". This is a very explicit, and local, marker. Anything else makes it so vague as to be unusable for fine-tuned code.

Thank you for chiming in!

I should also make the point that we are not talking about a feature that is expected, or indeed should be, used by anyone other than someone with an exceedingly good understanding of what is going on.

That doesn't mean we should design something that's really hard to use for average folks too.

This is not a "teach everyone about it, it's safe" feature. It's there to produce a very fine-grained control in those cases where it really matters, and where profiling-guided optimizations would produce exactly the wrong result. Using this feature should be an automatic "is this needed" question in a code review. It is needed sometimes, just rarely.

+1, but I would point out that when PGO is enabled, this attribute is ignored, so it's an either/or feature. Either you get tuned optimizations, or you get to guess at them, but the current design doesn't let you mix and match. Do you see that as being a major issue with the design?

We did have discussions in SG14 about the possible need for "inverse PGO" optimization passes, but we didn't have the compiler dev expertise to know if we were on a good track or not. From the 8 years I've spent working on this kind of code, I've never mixed the two, since the interactions are just too unpredictable. What I will frequently do, however, is compare benchmarks between PGO-generated code and our hand-specified optimizations. Sometimes the PGO detects interesting program flow that a mere human brain could not have envisioned, which we can then take advantage of (or de-prioritize) as needed.

As one of the SG14 industry members driving this, I'm firmly in the camp that this is what we're expecting. In the first case the 1/2 cases are "neutral". This is a very explicit, and local, marker. Anything else makes it so vague as to be unusable for fine-tuned code.

Thank you for chiming in!

I should also make the point that we are not talking about a feature that is expected, or indeed should be, used by anyone other than someone with an exceedingly good understanding of what is going on.

That doesn't mean we should design something that's really hard to use for average folks too.

This is not a "teach everyone about it, it's safe" feature. It's there to produce a very fine-grained control in those cases where it really matters, and where profiling-guided optimizations would produce exactly the wrong result. Using this feature should be an automatic "is this needed" question in a code review. It is needed sometimes, just rarely.

+1, but I would point out that when PGO is enabled, this attribute is ignored, so it's an either/or feature. Either you get tuned optimizations, or you get to guess at them, but the current design doesn't let you mix and match. Do you see that as being a major issue with the design?

We did have discussions in SG14 about the possible need for "inverse PGO" optimization passes, but we didn't have the compiler dev expertise to know if we were on a good track or not. From the 8 years I've spent working on this kind of code, I've never mixed the two, since the interactions are just too unpredictable. What I will frequently do, however, is compare benchmarks between PGO-generated code and our hand-specified optimizations. Sometimes the PGO detects interesting program flow that a mere human brain could not have envisioned, which we can then take advantage of (or de-prioritize) as needed.

Great, thank you for your perspective!

This is not a "teach everyone about it, it's safe" feature. It's there to produce a very fine-grained control in those cases where it really matters, and where profiling-guided optimizations would produce exactly the wrong result. Using this feature should be an automatic "is this needed" question in a code review. It is needed sometimes, just rarely.

+1, but I would point out that when PGO is enabled, this attribute is ignored, so it's an either/or feature. Either you get tuned optimizations, or you get to guess at them, but the current design doesn't let you mix and match. Do you see that as being a major issue with the design?

We did have discussions in SG14 about the possible need for "inverse PGO" optimization passes, but we didn't have the compiler dev expertise to know if we were on a good track or not. From the 8 years I've spent working on this kind of code, I've never mixed the two, since the interactions are just too unpredictable. What I will frequently do, however, is compare benchmarks between PGO-generated code and our hand-specified optimizations. Sometimes the PGO detects interesting program flow that a mere human brain could not have envisioned, which we can then take advantage of (or de-prioritize) as needed.

Are you currently using __builtin_expect and testing whether these expectations match the PGO expectations? This isn't part of the current patches, but it's something I intend to look at later.

This is not a "teach everyone about it, it's safe" feature. It's there to produce a very fine-grained control in those cases where it really matters, and where profiling-guided optimizations would produce exactly the wrong result. Using this feature should be an automatic "is this needed" question in a code review. It is needed sometimes, just rarely.

+1, but I would point out that when PGO is enabled, this attribute is ignored, so it's an either/or feature. Either you get tuned optimizations, or you get to guess at them, but the current design doesn't let you mix and match. Do you see that as being a major issue with the design?

We did have discussions in SG14 about the possible need for "inverse PGO" optimization passes, but we didn't have the compiler dev expertise to know if we were on a good track or not. From the 8 years I've spent working on this kind of code, I've never mixed the two, since the interactions are just too unpredictable. What I will frequently do, however, is compare benchmarks between PGO-generated code and our hand-specified optimizations. Sometimes the PGO detects interesting program flow that a mere human brain could not have envisioned, which we can then take advantage of (or de-prioritize) as needed.

Are you currently using __builtin_expect and testing whether these expectations match the PGO expectations? This isn't part of the current patches, but it's something I intend to look at later.

Not per se (as I understand the question). We use PGO as inspiration to find interesting code paths, but if we find a substantial overlap between such a code path and our annotations, then that is a decided code smell - we're manually stating something the caches and system will discover on their own. To our mind, __builtin_expect, the noinline attribute, and other tools at our disposal to shape the generated code should only be used in those extreme cases where the metrics of the compiler, together with the runtime cache / preload systems, will get it wrong for our "one-in-ten-million" use cases. The testing we do is typically by looking at the overall performance of the application over a day's processing (albeit at accelerated timescales). If we want to take advantage of an interesting code path discovered by examining the results of a PGO run, we'll craft the code to produce that outcome (including manual assembly if needed). That way we avoid trying to do both manual and automatic tuning - that rarely ends in a happy place in our experience.

Mordante updated this revision to Diff 290809. Sep 9 2020, 1:00 PM

Update the patch to match the behaviour where the attribute only affects the substatement of an if statement. This means the likelihood attributes on labels are accepted but are a no-op. Since they're a no-op the documentation doesn't mention them.

Friendly ping.

aaron.ballman added inline comments. Sun, Sep 27, 6:02 AM
clang/lib/Sema/SemaDeclAttr.cpp
6454

This is entering into somewhat novel territory for attributes, so some of this feedback is me thinking out loud.

Attr.td lists these two attributes as being a StmtAttr and not any kind of declaration attribute. We have DeclOrTypeAttr for attributes that can be applied to declarations or types, but we do not have something similar for statement attributes yet. We do have some custom semantic handling logic in SemaDeclAttr.cpp for statement attributes, but you avoid hitting that code path by adding a case for the two likelihood attributes. These attributes only apply to label declarations currently, and labels cannot be redeclared, so there aren't yet questions about whether this is inheritable or not. So we *might* be okay with this, but I'm not 100% certain. For instance, I don't recall if the pretty printer or AST dumpers will need to distinguish between whether this attribute is written on the statement or the declaration (which is itself a bit of an interesting question: should the attribute attach only to the statement rather than trying to attach to the underlying decl? http://eel.is/c++draft/stmt.stmt#stmt.label-1.sentence-2 is ambiguous, but I don't think of case or default labels as being declarations so I tend to not think of identifier labels as such either.). There's a part of me that wonders if we have a different issue where the attribute is trying to attach to the declaration rather than the statement and that this should be handled purely as a statement attribute.

I'm curious what your thoughts are, but I'd also like to see some additional tests for the random other bits that interact with attributes like AST dumping and pretty printing to be sure the behavior is reasonable.
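
For instance, a pretty-printing round-trip test might look something like this (hypothetical RUN/CHECK lines in the style of existing -ast-print tests; the CHECK syntax is only a sketch):

// RUN: %clang_cc1 -std=c++20 -ast-print %s | FileCheck %s

void f(int i) {
  if (i == 0)
    goto skip;
  ++i;
[[likely]] skip:;
}

// The interesting question: does the output keep the attribute on the label,
// i.e. does it round-trip as "[[likely]] skip:"?
// CHECK: {{\[\[likely\]\]}} skip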

Mordante added inline comments. Sun, Sep 27, 11:28 AM
clang/lib/Sema/SemaDeclAttr.cpp
6454

The labels in a switch are indeed different and the code in trunk should already allow the attribute there. (I'm still busy with the CodeGen patch.)
I agree the Standard isn't clear on whether the attribute is attached to the statement or the declaration.

The LabelDecl expects a pointer to a LabelStmt and not to an AttributedStmt. Since declarations can already have attributes, I used that feature. I just checked: the LabelDecl isn't shown in the AST dump, so the attributes also aren't shown. I can adjust that.

Another option would be to change LabelDecl and have two overloads of setStmt:

  void setStmt(LabelStmt *T) { TheStmt = T; }
  void setStmt(AttributedStmt *T) { TheStmt = T; }

Then TheStmt needs to be a Stmt and an extra getter would be required to get the generic statement.
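
As a self-contained toy, the shape could look like this (the class names mirror Clang's, but everything below is hypothetical; the assert shows one way to enforce that stripping the attributes still yields a LabelStmt):

#include <cassert>

struct Stmt { virtual ~Stmt() = default; };
struct LabelStmt : Stmt {};
struct AttributedStmt : Stmt {
  Stmt *SubStmt; // the attributed substatement
  explicit AttributedStmt(Stmt *S) : SubStmt(S) {}
};

struct LabelDecl {
  Stmt *TheStmt = nullptr;
  void setStmt(LabelStmt *S) { TheStmt = S; }
  void setStmt(AttributedStmt *S) {
    // Invariant: the attributed statement must resolve to a LabelStmt.
    assert(dynamic_cast<LabelStmt *>(S->SubStmt) && "substatement must be a label");
    TheStmt = S;
  }
};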

I think neither solution is a trivial change. Currently the attribute has no effect on labels, so it not being visible isn't a real issue. However, I feel that's not a proper solution. I expect attributes will be used more in C and C++ in the future. For example, I can imagine a [[cold]] attribute becoming available for labels.
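
If that ever happens, usage might look like this (purely hypothetical; no such attribute exists today):

int parse(int c) {
  if (c < 0)
    goto failure;
  return c;
[[cold]] failure:  // hypothetical: keep the failure path out of the hot code region
  return -1;
}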

So I'm leaning towards biting the bullet and changing the implementation of LabelDecl to allow an AttributedStmt instead of a LabelStmt.
WDYT?

aaron.ballman added inline comments. Mon, Sep 28, 9:38 AM
clang/lib/Sema/SemaDeclAttr.cpp
6454

Currently the attribute has no effect on labels, so it not being visible isn't a real issue.

That's not entirely true though -- we have pretty printing capabilities that will lose the attribute when written on a label, so round-tripping through the pretty printer will fail. But we have quite a few issues with pretty printing attributes as it stands, so I'm not super concerned either.

So I'm leaning towards biting the bullet and changing the implementation of LabelDecl to allow an AttributedStmt instead of a LabelStmt.
WDYT?

I'm curious if @rsmith feels the same way, but I think something along those lines makes sense (if not an overload, perhaps a templated function with SFINAE). We'd have to make sure that the attributed statement eventually resolves to a LabelStmt once we strip the attributes away, but this would keep the attribute at the statement level rather than making it a declaration one, which I think is more along the lines of what's intended for the likelihood attributes (and probably for hot/cold if we add that support later).

Mordante planned changes to this revision. Sun, Oct 18, 4:59 AM
Mordante added inline comments.
clang/lib/Sema/SemaDeclAttr.cpp
6454

Currently the attribute has no effect on labels, so it not being visible isn't a real issue.

That's not entirely true though -- we have pretty printing capabilities that will lose the attribute when written on a label, so round-tripping through the pretty printer will fail. But we have quite a few issues with pretty printing attributes as it stands, so I'm not super concerned either.

I'll keep that in mind when I start working on that.

So I'm leaning towards biting the bullet and changing the implementation of LabelDecl to allow an AttributedStmt instead of a LabelStmt.
WDYT?

I'm curious if @rsmith feels the same way, but I think something along those lines makes sense (if not an overload, perhaps a templated function with SFINAE). We'd have to make sure that the attributed statement eventually resolves to a LabelStmt once we strip the attributes away, but this would keep the attribute at the statement level rather than making it a declaration one, which I think is more along the lines of what's intended for the likelihood attributes (and probably for hot/cold if we add that support later).

Yes, if we go for an overload I need to make sure that the attributed statement has a LabelStmt as its substatement. I haven't looked into how to enforce that.

@rsmith Any opinion on whether the likelihood attribute should be "attached" to the label declaration or the label statement?