This is an archive of the discontinued LLVM Phabricator instance.

[Sema, CodeGen] Allow [[likely]] and [[unlikely]] on labels
Changes PlannedPublic

Authored by Mordante on Aug 25 2020, 11:08 AM.

Details

Summary

Allows the likelihood attributes on labels which are not part of a case statement. When a goto is used, the attribute placed on the target label is honored.
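A minimal sketch of what this enables: the attribute sits on a plain label that is only reachable via goto. Function and variable names here are illustrative, and compilers without this support may warn and ignore the attribute.

```cpp
#include <cassert>

// classify returns 1 for whitespace-like characters, 0 otherwise.
int classify(int ch) {
  if (ch == ' ')
    goto whitespace;
  if (ch == '\n' || ch == '\t')
    goto whitespace;
  return 0;              // the cold path in this sketch
[[likely]] whitespace:   // hint: the jumps to this label are the hot path
  return 1;
}
```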

Depends on D85091.

Diff Detail

Event Timeline

Mordante created this revision.Aug 25 2020, 11:08 AM
Mordante requested review of this revision.Aug 25 2020, 11:08 AM

This feels like the wrong approach to me... but I admit that I don't know what the "right" approach might be. (I doubt any right approach exists.)

if (ch == ' ') [[likely]] {
    goto whitespace;  // A
} else if (ch == '\n' || ch == '\t') [[unlikely]] {
    goto whitespace;  // B
} else {
    foo();
}
[[likely]] whitespace: bar();  // C

It seems like this patch would basically "copy" the [[likely]] attribute from line C up to lines A and B, where it would reinforce the likelihood of path A and (maybe?) "cancel out" the unlikelihood of path B, without actually saying anything specifically about the likelihood of label C (which is surely what the programmer intended by applying the attribute, right?). OTOH, I can't think of any particular optimization that would care about the likelihood of label C. I could imagine trying to align label C to a 4-byte boundary or something, but that wouldn't be an optimization on any platform as far as I know.
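For comparison, the pre-C++20 way to express this intent is `__builtin_expect` on each condition rather than an attribute on the shared label. This is a sketch; `foo`/`bar` are stand-ins for the calls in the snippet above, with counters added so the control flow is observable.

```cpp
// Counters so the control flow is observable in this sketch.
static int foo_calls = 0, bar_calls = 0;
static void foo() { ++foo_calls; }
static void bar() { ++bar_calls; }

void lex(int ch) {
  if (__builtin_expect(ch == ' ', 1)) {                        // hint: likely
    goto whitespace;
  } else if (__builtin_expect(ch == '\n' || ch == '\t', 0)) {  // hint: unlikely
    goto whitespace;
  } else {
    foo();
    return;
  }
whitespace:
  bar();
}
```

Note that here each jump site carries its own hint, so there is no question of a label's hint being "copied" to its predecessors.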

It seems like this patch would basically "copy" the [[likely]] attribute from line C up to lines A and B, where it would reinforce the likelihood of path A and (maybe?) "cancel out" the unlikelihood of path B, without actually saying anything specifically about the likelihood of label C (which is surely what the programmer intended by applying the attribute, right?).

Yes A is double [[likely]] and B is both [[likely]] and [[unlikely]]. Based on http://eel.is/c++draft/dcl.attr.likelihood#:attribute,likely
"The use of the likely attribute is intended to allow implementations to optimize for the case where paths of execution including it are arbitrarily more likely than any alternative path of execution that does not include such an attribute on a statement or label."
So it seems it is intended that a goto is seen as part of a path of execution, since it's an unconditional jump.

But I think the standard should be improved:

if(foo) {
  [[likely]][[unlikely]] bar(1); // error: The attribute-token likely shall not appear in an
  bar(2);                        //        attribute-specifier-seq that contains the attribute-token unlikely.
}
if(foo) {
  [[likely]] bar(1);
  [[unlikely]] bar(2);           // This contradiction in the `ThenStmt` is allowed
}
if(foo) {
  [[likely]] bar(1);
} else {
  [[likely]] bar(2);             // This contradiction between the `ThenStmt` and `ElseStmt` is allowed
}

So I'll work on a paper to discuss these issues. I already have some notes, but I would prefer to get more implementation experience before starting to write.
IMO we should ignore conflicting likelihoods and issue a diagnostic. (For a switch multiple [[likely]] cases would be allowed, but still no mixing.)

It seems like this patch would basically "copy" the [[likely]] attribute from line C up to lines A and B, where it would reinforce the likelihood of path A and (maybe?) "cancel out" the unlikelihood of path B, without actually saying anything specifically about the likelihood of label C (which is surely what the programmer intended by applying the attribute, right?).

Yes A is double [[likely]] and B is both [[likely]] and [[unlikely]]. Based on http://eel.is/c++draft/dcl.attr.likelihood#:attribute,likely
"The use of the likely attribute is intended to allow implementations to optimize for the case where paths of execution including it are arbitrarily more likely than any alternative path of execution that does not include such an attribute on a statement or label."
So it seems it is intended that a goto is seen as part of a path of execution, since it's an unconditional jump.

But I think the standard should be improved:

if(foo) {
  [[likely]][[unlikely]] bar(1); // error: The attribute-token likely shall not appear in an
  bar(2);                        //        attribute-specifier-seq that contains the attribute-token unlikely.
}
if(foo) {
  [[likely]] bar(1);
  [[unlikely]] bar(2);           // This contradiction in the `ThenStmt` is allowed
}
if(foo) {
  [[likely]] bar(1);
} else {
  [[likely]] bar(2);             // This contradiction between the `ThenStmt` and `ElseStmt` is allowed
}

So I'll work on a paper to discuss these issues. I already have some notes, but I would prefer to get more implementation experience before starting to write.
IMO we should ignore conflicting likelihoods and issue a diagnostic. (For a switch multiple [[likely]] cases would be allowed, but still no mixing.)

I'd like to understand your reasoning for ignore + diagnose as opposed to count attrs (+ optionally diagnose) or other strategies. I can see pros and cons to basically anything we do here.

If we decide to ignore conflicting likelihoods, then I agree we should issue a diagnostic. However, I suspect that's going to come up frequently in practice because people are going to write macros that include these attributes. For instance, it's very sensible for a developer to put [[unlikely]] in front of an assert under the assumption this is telling the compiler "this assertion isn't likely to happen" when in fact it's saying "this branch containing the assertion is unlikely to be taken". This is one of the reasons why the GCC behavior of allowing these attributes on arbitrary statements feels like an awful trap for users to get into. If the attribute only applied to compound statements following a flow control construct, I think the average programmer will have a far better chance of not using this wrong. (I know that I said in the other thread we should match the GCC behavior, but I'm still uncomfortable with it because that behavior seems to be fraught with dangers for users, and this patch introduces new sharp edges, as @Quuxplusone pointed out.)
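The assert trap can be made concrete with a hypothetical MyAssert macro (a sketch; the macro name and surrounding code are invented for illustration):

```cpp
#include <cassert>

// The author means "a failing assertion is unlikely", but under
// statement-level semantics the attribute instead marks the path of
// execution containing the whole statement -- here, the if branch.
#define MyAssert(cond) do { [[unlikely]] assert(cond); } while (0)

int handle(int x) {
  if (x > 0) {
    MyAssert(x < 100);  // unintentionally hints that x > 0 is unlikely
    return 1;
  }
  return 0;
}
```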

I'd like to understand your reasoning for ignore + diagnose as opposed to count attrs (+ optionally diagnose) or other strategies. I can see pros and cons to basically anything we do here.

If we decide to ignore conflicting likelihoods, then I agree we should issue a diagnostic. However, I suspect that's going to come up frequently in practice because people are going to write macros that include these attributes. For instance, it's very sensible for a developer to put [[unlikely]] in front of an assert under the assumption this is telling the compiler "this assertion isn't likely to happen" when in fact it's saying "this branch containing the assertion is unlikely to be taken".

This macro is an example of why I dislike counting. If MyAssert has an [[unlikely]] attribute, then changing the number of MyAssert statements will influence the likelihood of the branches taken.

This is one of the reasons why the GCC behavior of allowing these attributes on arbitrary statements feels like an awful trap for users to get into. If the attribute only applied to compound statements following a flow control construct, I think the average programmer will have a far better chance of not using this wrong. (I know that I said in the other thread we should match the GCC behavior, but I'm still uncomfortable with it because that behavior seems to be fraught with dangers for users, and this patch introduces new sharp edges, as @Quuxplusone pointed out.)

I'm also not fond of GCC's implementation, but I think their implementation conforms with the standard. In hindsight I think it indeed makes more sense to only allow it on compound statements. That will require thinking about how to mark multiple cases as [[likely]] when falling through:

switch(a) {
  case '0':
  case '1':
  case '2': [[likely]] { return 'a' - '0'; }    // All 3 cases likely?

  case '3':                                     // neutral?
  case '4': [[likely]] {}                       // likely?
  case '5': [[unlikely]] { return 'a' - '0'; }  // unlikely?
}

Of course this can be solved by only allowing it on the case labels:

switch(a) {
  [[likely]] case '0':
  [[likely]] case '1':
  [[likely]] case '2': { return 'a' - '0'; }    // All 3 cases likely

  case '3':                                     // neutral
  [[likely]] case '4':                          // likely
  [[unlikely]] case '5': { return 'a' - '0'; }  // unlikely
}

I fully agree the behaviour mandated by the standard is way too complex and user unfriendly. It would have been nice if there were simpler rules, making it easier to use and to teach. Still I think it would be best to use the complex approach now, since that's what the standard specifies. During that process we can see whether there are more pitfalls. Then we can discuss it with other vendors and see whether we can change the wording of the standard. Do you agree?

I'd like to understand your reasoning for ignore + diagnose as opposed to count attrs (+ optionally diagnose) or other strategies. I can see pros and cons to basically anything we do here.

If we decide to ignore conflicting likelihoods, then I agree we should issue a diagnostic. However, I suspect that's going to come up frequently in practice because people are going to write macros that include these attributes. For instance, it's very sensible for a developer to put [[unlikely]] in front of an assert under the assumption this is telling the compiler "this assertion isn't likely to happen" when in fact it's saying "this branch containing the assertion is unlikely to be taken".

This macro is an example of why I dislike counting. If MyAssert has an [[unlikely]] attribute, then changing the number of MyAssert statements will influence the likelihood of the branches taken.

<nod>, but this is an example of why I don't like the behavior of applying this to arbitrary statements; this is spooky behavior that would be very hard to spot in code reviews unless you're an expert on C++.

This is one of the reasons why the GCC behavior of allowing these attributes on arbitrary statements feels like an awful trap for users to get into. If the attribute only applied to compound statements following a flow control construct, I think the average programmer will have a far better chance of not using this wrong. (I know that I said in the other thread we should match the GCC behavior, but I'm still uncomfortable with it because that behavior seems to be fraught with dangers for users, and this patch introduces new sharp edges, as @Quuxplusone pointed out.)

I'm also not fond of GCC's implementation, but I think their implementation conforms with the standard.

Conforming to the standard is not hard in this case. We can parse the attributes and ignore their semantics entirely (but not the constraints) and that also conforms to the standard. Our job is to figure out what the best behavior is for our implementation (which may or may not mean following the GCC behavior).

In hindsight I think it indeed makes more sense to only allow it on compound statements. That will require thinking about how to mark multiple cases as [[likely]] when falling through:

switch(a) {
  case '0':
  case '1':
  case '2': [[likely]] { return 'a' - '0'; }    // All 3 cases likely?

  case '3':                                     // neutral?
  case '4': [[likely]] {}                       // likely?
  case '5': [[unlikely]] { return 'a' - '0'; }  // unlikely?
}

Of course this can be solved by only allowing it on the case labels:

switch(a) {
  [[likely]] case '0':
  [[likely]] case '1':
  [[likely]] case '2': { return 'a' - '0'; }    // All 3 cases likely

  case '3':                                     // neutral
  [[likely]] case '4':                          // likely
  [[unlikely]] case '5': { return 'a' - '0'; }  // unlikely
}

That is exactly the behavior I am coming to believe we should follow. You can either write it on a compound statement that is guarded by a flow control decision (if/else/while) or you can write it on a label, otherwise the attribute is parsed and ignored (with a diagnostic). This feels like the right mixture of useful with understandable semantics so far, but perhaps we'll find other examples that change our minds.

The fallthrough behavior question has one more case we may want to think about:

switch (x) {
case 0:
case 1:
[[likely]] case 2: break;
[[unlikely]] default:
}

Does this mark cases 0 and 1 as being likely or not? I could see users wanting to use shorthand rather than repeat themselves on all the cases. However, I'm also not certain whether there would be any performance impact if we marked only case 2 as likely and left cases 0 and 1 with default likelihood. My gut feeling is that this should only mark case 2, but others may have different views.

I fully agree the behaviour mandated by the standard is way too complex and user unfriendly. It would have been nice if there were simpler rules, making it easier to use and to teach. Still I think it would be best to use the complex approach now, since that's what the standard specifies. During that process we can see whether there are more pitfalls. Then we can discuss it with other vendors and see whether we can change the wording of the standard. Do you agree?

The only requirement from the standard is that we parse [[likely]] or [[unlikely]] on a statement or label, that we don't allow conflicting attributes in the same attribute sequence, and that the attribute has no arguments to it. All the rest is recommended practice that's left up to the implementation as a matter of QoI. So we conform to the standard by following the approach on compound statements and labels + the constraint checking. We may not be following the recommended practice precisely, but we may not *want* to follow the recommended practice because it's bad for usability. So I'd like us to design the feature that we think gives the most value and is the easiest to use that matches the recommended practice as best we can, then start talking to other vendors. If other vendors don't want to follow our design, that's totally fine, but we should ensure our design *allows* users to write code that will behave similarly for other implementations without diagnostics. e.g., we don't want to design something where the user has to use macros to avoid diagnostics in cross-compiler code. After that, we may want to propose some changes to the recommended practice to WG21.

From my current thinking, it seems that there may be agreement that allowing these on arbitrary statements may be really difficult for users to understand the behavior of and that compound statements and labels are what we think is an understandable and useful feature and is also a strict subset of what we COULD support. By that, I think we should limit the feature to just compound statements and labels; this leaves the door open to allow the attributes on arbitrary statements in the future without breaking code. If we allow on arbitrary statements from the start, we can't later restrict the feature because we'd break code. This lets us start getting field experience with that behavior to see how we like it, which may also help when talking to other vendors because we'll have something concrete to talk about (hopefully). WDYT?

Mordante planned changes to this revision.Aug 27 2020, 10:21 AM

That is exactly the behavior I am coming to believe we should follow. You can either write it on a compound statement that is guarded by a flow control decision (if/else/while) or you can write it on a label, otherwise the attribute is parsed and ignored (with a diagnostic). This feels like the right mixture of useful with understandable semantics so far, but perhaps we'll find other examples that change our minds.

The fallthrough behavior question has one more case we may want to think about:

switch (x) {
case 0:
case 1:
[[likely]] case 2: break;
[[unlikely]] default:
}

Does this mark cases 0 and 1 as being likely or not? I could see users wanting to use shorthand rather than repeat themselves on all the cases. However, I'm also not certain whether there would be any performance impact if we marked only case 2 as likely and left cases 0 and 1 with default likelihood. My gut feeling is that this should only mark case 2, but others may have different views.

Not according to the standard: "A path of execution includes a label if and only if it contains a jump to that label."
However, for an if statement I use 2 weights: 2000 for likely, 1 for unlikely. (These can be configured by a command-line option already used for __builtin_expect.)
For a switch I intend to use 3 weights: 2000 for likely, (2000 + 1)/2 for no likelihood, 1 for unlikely. __builtin_expect doesn't have a 'neutral' value, since in a switch you can only mark one branch as likely. Instead of adding a new option I picked the midpoint.
So the weights in your example would be:

switch (x) {
case 0:  // 1000
case 1:  // 1000
[[likely]] case 2: break; // 2000
[[unlikely]] default: // 1
}

In this case the user could also write:

switch (x) {
case 0:  // 1000
case 1:  // 1000
case 2: break; // 1000
[[unlikely]] default: // 1
}

where the values 0, 1, 2 still would be more likely than the default statement. Obviously this will be documented when I implement the switch.
How do you feel about this approach?
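The arithmetic behind those numbers can be sketched directly (assumed weights from this discussion: likely = 2000, unlikely = 1, neutral = the midpoint):

```cpp
#include <cassert>

// Proposed per-branch weights; a branch's relative probability is its
// weight divided by the sum of all weights in the switch.
constexpr int Likely   = 2000;
constexpr int Unlikely = 1;
constexpr int Neutral  = (Likely + Unlikely) / 2;  // midpoint = 1000

// Weights for: case 0, case 1 (neutral), [[likely]] case 2,
// [[unlikely]] default -- as in the first example above.
constexpr int Sum = Neutral + Neutral + Likely + Unlikely;  // 4001
```

With these values the default branch ends up at roughly 1/4001 of the total weight, so even the neutral cases dominate it by three orders of magnitude.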

I fully agree the behaviour mandated by the standard is way too complex and user unfriendly. It would have been nice if there were simpler rules, making it easier to use and to teach. Still I think it would be best to use the complex approach now, since that's what the standard specifies. During that process we can see whether there are more pitfalls. Then we can discuss it with other vendors and see whether we can change the wording of the standard. Do you agree?

The only requirement from the standard is that we parse [[likely]] or [[unlikely]] on a statement or label, that we don't allow conflicting attributes in the same attribute sequence, and that the attribute has no arguments to it. All the rest is recommended practice that's left up to the implementation as a matter of QoI. So we conform to the standard by following the approach on compound statements and labels + the constraint checking. We may not be following the recommended practice precisely, but we may not *want* to follow the recommended practice because it's bad for usability. So I'd like us to design the feature that we think gives the most value and is the easiest to use that matches the recommended practice as best we can, then start talking to other vendors. If other vendors don't want to follow our design, that's totally fine, but we should ensure our design *allows* users to write code that will behave similarly for other implementations without diagnostics. e.g., we don't want to design something where the user has to use macros to avoid diagnostics in cross-compiler code. After that, we may want to propose some changes to the recommended practice to WG21.

From my current thinking, it seems that there may be agreement that allowing these on arbitrary statements may be really difficult for users to understand the behavior of and that compound statements and labels are what we think is an understandable and useful feature and is also a strict subset of what we COULD support. By that, I think we should limit the feature to just compound statements and labels; this leaves the door open to allow the attributes on arbitrary statements in the future without breaking code. If we allow on arbitrary statements from the start, we can't later restrict the feature because we'd break code. This lets us start getting field experience with that behavior to see how we like it, which may also help when talking to other vendors because we'll have something concrete to talk about (hopefully). WDYT?

I'm somewhat on the fence. In general I prefer to implement what the standard expects. However, the more I try to do that, the more I'm convinced that what the standard suggests is difficult to achieve and, more importantly, too difficult to explain to C++ users. In order to implement this feature properly I've been looking at using the CFG. That allows me to follow the path of execution properly. However, it makes the feature more complicated. AFAIK the CFG is only used in Clang for AnalysisBasedWarnings::IssueWarnings. This feels like a hint that this is not the proper solution. Another "interesting" issue with the current approach is that it'll be hard to determine the likelihood when looking at the code:

if(a) {
    foo(a);
    [[likely]] ++a;
}

If foo is declared [[noreturn]] the ++a is never executed so the attribute shouldn't affect the likelihood. (This is of course great when creating C++ trivia, but not great for users.)
Another issue is how to properly implement it when a branch has a simple if inside it:

if(a > 5) {
    if(a < 10)
        ++a;
    [[likely]] ++a;
}

Obviously the attribute is in the a > 5 branch and will always be executed, but I haven't found a solution to figure that out using the CFG. I need to find a way to determine that the a < 10 branch will always resume the original branch. I'm convinced it can be done, but it made me wonder whether this is the best approach.

So I think it would be wise to follow your suggestion: implement a simple version and try to convince other vendors to follow suit. This leaves the door open to implement a more complex solution if needed. I think this approach does C++ users and ourselves a favour. I only feel we need a slight relaxation in the requirements. In your approach the attribute is only allowed on a compound statement. I would like to also allow it on the first statement, like in my initial patch. This allows using the attribute without being forced to use a compound statement. I feel forcing users to use a certain coding style is a bad idea.

if(a) [[likely]] { // Good: allowed on the compound statement
  ...
}

if(a)
    [[likely]] return; // Good: allowed on the first statement

if(a) {
    [[likely]] return; // Ignored, not on the first statement
}

if(a)                      // Attribute not allowed on a declaration,
    [[likely]] int i = 5;  // but I can't think of a good use-case
                           // for this code.

if(a) [[likely]] { // Good: allowed on the compound statement
    int i = 5;     // Now i seems to have a purpose
  ...
}

if(a) [[likely]] {} // A diagnostic is issued and the attributes
else [[likely]] {}  // are ignored

The diagnostics for ignored attributes are enabled by default, so contradictions are visible by default. For the switch I'll use the previous proposal.
WDYT?

If C++20 were still in development I would even consider writing a proposal to make the attributes more like __builtin_expect:

[[likely]] if(a) {
  // Considered to be arbitrarily likely
} else {
  // Considered to be arbitrarily unlikely
}

if(a) {
} [[unlikely]] else { // Not allowed on the else statement
}

if(a) {
  // No likelihood suggestion
} else
  // No likelihood suggestion

[[unlikely]] if(a) {
  // Considered to be arbitrarily unlikely
}

But it's too late to consider this change.

That is exactly the behavior I am coming to believe we should follow. You can either write it on a compound statement that is guarded by a flow control decision (if/else/while) or you can write it on a label, otherwise the attribute is parsed and ignored (with a diagnostic). This feels like the right mixture of useful with understandable semantics so far, but perhaps we'll find other examples that change our minds.

The fallthrough behavior question has one more case we may want to think about:

switch (x) {
case 0:
case 1:
[[likely]] case 2: break;
[[unlikely]] default:
}

Does this mark cases 0 and 1 as being likely or not? I could see users wanting to use shorthand rather than repeat themselves on all the cases. However, I'm also not certain whether there would be any performance impact if we marked only case 2 as likely and left cases 0 and 1 with default likelihood. My gut feeling is that this should only mark case 2, but others may have different views.

Not according to the standard: "A path of execution includes a label if and only if it contains a jump to that label."

A switch statement contains a jump to all of its labels, though. So all of those labels are included in the path of execution, which is not likely what's intended by the standard.

However, for an if statement I use 2 weights: 2000 for likely, 1 for unlikely. (These can be configured by a command-line option already used for __builtin_expect.)
For a switch I intend to use 3 weights: 2000 for likely, (2000 + 1)/2 for no likelihood, 1 for unlikely. __builtin_expect doesn't have a 'neutral' value, since in a switch you can only mark one branch as likely. Instead of adding a new option I picked the midpoint.
So the weights in your example would be:

switch (x) {
case 0:  // 1000
case 1:  // 1000
[[likely]] case 2: break; // 2000
[[unlikely]] default: // 1
}

I think this is defensible.

In this case the user could also write:

switch (x) {
case 0:  // 1000
case 1:  // 1000
case 2: break; // 1000
[[unlikely]] default: // 1
}

where the values 0, 1, 2 still would be more likely than the default statement. Obviously this will be documented when I implement the switch.
How do you feel about this approach?

I think it's a reasonable approach; at least, I can't think of a case where this falls over.

I fully agree the behaviour mandated by the standard is way too complex and user unfriendly. It would have been nice if there were simpler rules, making it easier to use and to teach. Still I think it would be best to use the complex approach now, since that's what the standard specifies. During that process we can see whether there are more pitfalls. Then we can discuss it with other vendors and see whether we can change the wording of the standard. Do you agree?

The only requirement from the standard is that we parse [[likely]] or [[unlikely]] on a statement or label, that we don't allow conflicting attributes in the same attribute sequence, and that the attribute has no arguments to it. All the rest is recommended practice that's left up to the implementation as a matter of QoI. So we conform to the standard by following the approach on compound statements and labels + the constraint checking. We may not be following the recommended practice precisely, but we may not *want* to follow the recommended practice because it's bad for usability. So I'd like us to design the feature that we think gives the most value and is the easiest to use that matches the recommended practice as best we can, then start talking to other vendors. If other vendors don't want to follow our design, that's totally fine, but we should ensure our design *allows* users to write code that will behave similarly for other implementations without diagnostics. e.g., we don't want to design something where the user has to use macros to avoid diagnostics in cross-compiler code. After that, we may want to propose some changes to the recommended practice to WG21.

From my current thinking, it seems that there may be agreement that allowing these on arbitrary statements may be really difficult for users to understand the behavior of and that compound statements and labels are what we think is an understandable and useful feature and is also a strict subset of what we COULD support. By that, I think we should limit the feature to just compound statements and labels; this leaves the door open to allow the attributes on arbitrary statements in the future without breaking code. If we allow on arbitrary statements from the start, we can't later restrict the feature because we'd break code. This lets us start getting field experience with that behavior to see how we like it, which may also help when talking to other vendors because we'll have something concrete to talk about (hopefully). WDYT?

I'm somewhat on the fence. In general I prefer to implement what the standard expects. However, the more I try to do that, the more I'm convinced that what the standard suggests is difficult to achieve and, more importantly, too difficult to explain to C++ users. In order to implement this feature properly I've been looking at using the CFG. That allows me to follow the path of execution properly. However, it makes the feature more complicated. AFAIK the CFG is only used in Clang for AnalysisBasedWarnings::IssueWarnings. This feels like a hint that this is not the proper solution.

FWIW, that's where we implement the logic for the [[fallthrough]] attribute as well, so there's certainly precedent for it. When you consider how tightly coupled the feature is to control flow, using the control flow graph doesn't seem entirely unreasonable. I think the potential downside will be performance concerns (you have to opt in to fallthrough diagnostics, so you don't pay for the CFG expense unless you enable it), but those may be negligible depending on how you're implementing it.
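For reference, [[fallthrough]] (whose implicit-fallthrough checking lives in that same analysis-based-warnings pass) is used like this (a sketch with invented names):

```cpp
// bucket demonstrates a deliberate case fallthrough annotated with
// [[fallthrough]], which silences -Wimplicit-fallthrough at that point.
int bucket(int x) {
  int v = 0;
  switch (x) {
  case 0:
    v += 1;
    [[fallthrough]];  // deliberate: execution continues into case 1
  case 1:
    v += 10;
    break;
  default:
    v = -1;
  }
  return v;
}
```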

Another "interesting" issue with the current approach is that it'll be hard to determine the likelihood when looking at the code:

if(a) {
    foo(a);
    [[likely]] ++a;
}

If foo is declared [[noreturn]] the ++a is never executed so the attribute shouldn't affect the likelihood. (This is of course great when creating C++ trivia, but not great for users.)

Hah, what a fun use case! :-D

Another issue is how to properly implement it when a branch has a simple if inside it:

if(a > 5) {
    if(a < 10)
        ++a;
    [[likely]] ++a;
}

Obviously the attribute is in the a > 5 branch and will always be executed, but I haven't found a solution to figure that out using the CFG. I need to find a way to determine that the a < 10 branch will always resume the original branch. I'm convinced it can be done, but it made me wonder whether this is the best approach.

You should be able to tell that information from the CFG, I believe, because the exit node for both paths should converge. However, "should be able to" and "can easily do" are not the same thing; I don't know how much of a challenge it would be.

So I think it would be wise to follow your suggestion: implement a simple version and try to convince other vendors to follow suit. This leaves the door open to implement a more complex solution if needed. I think this approach does C++ users and ourselves a favour. I only feel we need a slight relaxation in the requirements. In your approach the attribute is only allowed on a compound statement. I would like to also allow it on the first statement, like in my initial patch. This allows using the attribute without being forced to use a compound statement. I feel forcing users to use a certain coding style is a bad idea.

if(a) [[likely]] { // Good: allowed on the compound statement
  ...
}

if(a)
    [[likely]] return; // Good: allowed on the first statement

Yes, I agree this makes sense -- the attribute is being written on the substatement of a control flow branch (whether it's a compound statement or just a regular statement).

if(a) {
    [[likely]] return; // Ignored, not on the first statement
}

Agreed.

if(a)                      // Attribute not allowed on a declaration,
    [[likely]] int i = 5;  // but I can't think of a good use-case
                           // for this code.

This is a fun example because I can think of a somewhat reasonable use-case but we can't support it without a compound statement. :-D The declaration could be an RAII object,

if (a)
  [[likely]] SomeRAIIObj Obj(*a);

However, you're right that we cannot write the attribute there because a declaration-statement cannot be attributed (http://eel.is/c++draft/stmt.stmt#stmt.pre-1), so the attribute instead appertains to the declaration and not the statement. So the user would have to write:

if (a) [[likely]] {
  SomeRAIIObj Obj(*a);
}

That said, I think this is enough of a corner case that requiring the compound statement is not really a burden. If we find users run into that often (they'd get an attribute ignored error if they tried), we could add a fix-it to help them out, but that doesn't seem necessary initially.

if(a) [[likely]] { // Good, allowed on the compound statement

  int i = 5;     // Now i seems to have a purpose
...

}

if(a) [[likely]] {}  // A diagnostic is issued and the
else [[likely]] {}   // attributes are ignored

The diagnostics for ignored attributes are enabled by default, so the contradiction is visible by default. For the `switch` I'll use the previous proposal.
WDYT?

I think this all seems reasonable to me.

If C++20 were still in development I would even consider writing a proposal to make the attributes more like __builtin_expect:

[[likely]] if(a) {
  // Considered to be arbitrarily likely
} else {
  // Considered to be arbitrarily unlikely
}

if(a) {
} [[unlikely]] else { // Not allowed on the else statement
}

if(a) {
  // No likelihood suggestion
} else
  // No likelihood suggestion

[[unlikely]] if(a) {
  // Considered to be arbitrarily unlikely
}

But it's too late to consider this change.
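For comparison, roughly the same effect can be had today with the GNU builtin, which annotates the condition itself rather than a branch (a sketch; `handle` and `processed` are illustrative names):

```cpp
#include <cassert>

int processed = 0;

// The second argument of __builtin_expect is the expected value of the
// first, so the truthy branch below is hinted as the likely one.
void handle(int a) {
  if (__builtin_expect(a != 0, 1)) {  // condition expected to be true
    ++processed;                      // hot path
  } else {
    processed = -1;                   // cold path
  }
}
```

Unlike the attributes, the builtin is an expression, so the hint and the condition cannot drift apart the way a hint on a branch body can.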

Yeah, unfortunately, I think that ship has sailed.

That is exactly the behavior I am coming to believe we should follow. You can either write it on a compound statement that is guarded by a flow control decision (if/else/while) or you can write it on a label, otherwise the attribute is parsed and ignored (with a diagnostic). This feels like the right mixture of useful with understandable semantics so far, but perhaps we'll find other examples that change our minds.

The fallthrough behavior question has one more case we may want to think about:

switch (x) {
case 0:
case 1:
[[likely]] case 2: break;
[[unlikely]] default:
}

Does this mark cases 0 and 1 as being likely or not? I could see users wanting to use shorthand rather than repeat themselves on all the cases. However, I'm also not certain whether there would be any performance impact if we marked only case 2 as likely and left cases 0 and 1 with default likelihood. My gut feeling is that this should only mark case 2, but others may have different views.

Not according to the standard: "A path of execution includes a label if and only if it contains a jump to that label."
However for an if statement I use 2 weights: 2000 for likely, 1 for unlikely. (These can be configured by a command-line option already used for __builtin_expect.)
For a switch I intend to use 3 weights: 2000 for likely, (2000 + 1)/2 for no likelihood, 1 for unlikely. __builtin_expect doesn't have a 'neutral' value since in a switch you can only mark one branch as likely. Instead of adding a new option I picked the midpoint.
So the weights in your example would be:

switch (x) {
case 0:  // 1000
case 1:  // 1000
[[likely]] case 2: break; // 2000
[[unlikely]] default: // 1
}
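The mapping sketched in the comments above can be modeled directly (a sketch with illustrative names; the real values live in Clang's CodeGen and mirror the defaults used when lowering __builtin_expect):

```cpp
#include <cassert>

enum class Likelihood { Unlikely, None, Likely };

// Default weights; in LLVM the __builtin_expect lowering exposes these
// via the -likely-branch-weight/-unlikely-branch-weight options.
constexpr unsigned LikelyWeight = 2000;
constexpr unsigned UnlikelyWeight = 1;

constexpr unsigned branchWeight(Likelihood L) {
  switch (L) {
  case Likelihood::Likely:
    return LikelyWeight;
  case Likelihood::Unlikely:
    return UnlikelyWeight;
  case Likelihood::None:
    // The midpoint models a 'neutral' case in a switch.
    return (LikelyWeight + UnlikelyWeight) / 2;
  }
  return 0;
}
```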

In this case the user could also write:

switch (x) {
case 0:  // 1000
case 1:  // 1000
case 2: break; // 1000
[[unlikely]] default: // 1
}

where the values 0, 1, 2 still would be more likely than the default statement. Obviously this will be documented when I implement the switch.
How do you feel about this approach?

As one of the SG14 industry members driving this, I'm firmly in the camp that this is what we're expecting. In the first example, cases 0 and 1 are "neutral". This is a very explicit, and local, marker. Anything else makes it so vague as to be unusable for fine-tuned code.

I should also make the point that we are not talking about a feature that is expected, or indeed should be, used by anyone other than someone with an exceedingly good understanding of what is going on. This is not a "teach everyone about it, it's safe" feature. It's there to produce a very fine-grained control in those cases where it really matters, and where profiling-guided optimizations would produce exactly the wrong result. Using this feature should be an automatic "is this needed" question in a code review. It is needed sometimes, just rarely.

if(a) // Attribute not allowed on a declaration,

[[likely]] int i = 5;  // but I can't think about a good use-case
                       // for this code.

if(a) [[likely]] { // Good, allowed on the compound statement

  int i = 5;     // Now i seems to have a purpose
...

}

The use case for this becomes clearer when considering that the attribute that will be used 95% of the time is [[unlikely]]. You may have a constructor that you wish to ask the compiler to please, please, not inline in a particular function, even if the rest of that function is either likely or neutral.

if (a)
  [[unlikely]] expensive_class q{};

This could be the case if expensive_class held large static members, for instance.

This feels like the wrong approach to me... but I admit that I don't know what the "right" approach might be. (I doubt any right approach exists.)

if (ch == ' ') [[likely]] {
    goto whitespace;  // A
} else if (ch == '\n' || ch == '\t') [[unlikely]] {
    goto whitespace;  // B
} else {
    foo();
}
[[likely]] whitespace: bar();  // C

It seems like this patch would basically "copy" the [[likely]] attribute from line C up to lines A and B, where it would reinforce the likelihood of path A and (maybe?) "cancel out" the unlikelihood of path B, without actually saying anything specifically about the likelihood of label C (which is surely what the programmer intended by applying the attribute, right?). OTOH, I can't think of any particular optimization that would care about the likelihood of label C. I could imagine trying to align label C to a 4-byte boundary or something, but that wouldn't be an optimization on any platform as far as I know.

The only reason I could see for writing the above (and it's far more likely to be written with [[unlikely]] in my experience) is to control the probability that the call to bar() gets inlined or not.

That is exactly the behavior I am coming to believe we should follow. You can either write it on a compound statement that is guarded by a flow control decision (if/else/while) or you can write it on a label, otherwise the attribute is parsed and ignored (with a diagnostic). This feels like the right mixture of useful with understandable semantics so far, but perhaps we'll find other examples that change our minds.

The fallthrough behavior question has one more case we may want to think about:

switch (x) {
case 0:
case 1:
[[likely]] case 2: break;
[[unlikely]] default:
}

Does this mark cases 0 and 1 as being likely or not? I could see users wanting to use shorthand rather than repeat themselves on all the cases. However, I'm also not certain whether there would be any performance impact if we marked only case 2 as likely and left cases 0 and 1 with default likelihood. My gut feeling is that this should only mark case 2, but others may have different views.

Not according to the standard: "A path of execution includes a label if and only if it contains a jump to that label."

A switch statement contains a jump to all of its labels, though. So all of those labels are included in the path of execution, which is not likely what's intended by the standard.

In the example above, if x == 0 there will be a jump to case 0, which then falls through to case 1 and case 2. So case 0 doesn't jump to case 2 and thus doesn't "execute" that label.

if(a) {

[[likely]] return; // Ignored, not on the first statement

}

Agreed.

if(a) // Attribute not allowed on a declaration,

[[likely]] int i = 5;  // but I can't think about a good use-case
                       // for this code.

This is a fun example because I can think of a somewhat reasonable use-case but we can't support it without a compound statement. :-D The declaration could be an RAII object,

if (a)
  [[likely]] SomeRAIIObj Obj(*a);

However, you're right that we cannot write the attribute there because a declaration-statement cannot be attributed (http://eel.is/c++draft/stmt.stmt#stmt.pre-1), so the attribute instead appertains to the declaration and not the statement. So the user would have to write:

if (a) [[likely]] {
  SomeRAIIObj Obj(*a);
}

That said, I think this is enough of a corner case that requiring the compound statement is not really a burden. If we find users run into that often (they'd get an attribute ignored error if they tried), we could add a fix-it to help them out, but that doesn't seem necessary initially.

I had thought about RAII before and I think it's also not a real issue there. Your example does the same as:

if (a)
   [[likely]] SomeRAIIObj{*a};

Here there's no declaration, so the attribute is allowed. If the RAII object is used in a declaration I expect it usually will be inside a compound statement to create a scope where the object is alive. In that case the attribute is placed on the compound statement. I don't expect people to really write code like this, but it may happen when using macros.

As one of the SG14 industry members driving this, I'm firmly in the camp that this is what we're expecting. In the first case the 1/2 case are "neutral". This is a very explicit, and local, marker. Anything else makes it so vague as to be unusable for fine tuned code.

Thanks for your interest and affirming this looks like the right path to take.

I should also make the point that we are not talking about a feature that is expected, or indeed should be, used by anyone other than someone with an exceedingly good understanding of what is going on. This is not a "teach everyone about it, it's safe" feature. It's there to produce a very fine-grained control in those cases where it really matters, and where profiling-guided optimizations would produce exactly the wrong result. Using this feature should be an automatic "is this needed" question in a code review. It is needed sometimes, just rarely.

I think it's hard to predict how the feature will be used. For example, if a well-known C++ speaker gives a conference presentation about the attributes, they might see more use. Even though as humans we're bad at guessing the performance bottlenecks in our code, I still think there are cases where the attribute can be used without testing. For example, when writing a cache in an application you can mark the cache hit as more likely; if that weren't the case there'd be no reason for adding a cache. (Of course it would still be wise to measure.)
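The cache example could look like this (a sketch; the names and the map-based cache are illustrative):

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

std::unordered_map<int, std::string> cache;
int misses = 0;

// Stand-in for an expensive computation worth caching.
std::string expensiveCompute(int key) {
  ++misses;
  return std::to_string(key);
}

std::string lookup(int key) {
  auto it = cache.find(key);
  if (it != cache.end()) [[likely]]    // cache hit: the expected common path
    return it->second;
  auto value = expensiveCompute(key);  // cache miss: the cold path
  cache.emplace(key, value);
  return value;
}
```

Note the attribute sits on the first statement of the branch rather than a compound statement, i.e. the relaxed placement discussed earlier in this thread.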

The use case for this becomes clearer when considering that the attribute that will be used 95% of the time is [[unlikely]]. You may have a constructor that you wish to ask the compiler to please, please, not inline in a particular function, even if the rest of that function is either likely or neutral.

At the moment the likelihood is used only for branch prediction and not for inlining decisions. I'm not sure it would be a good idea to use this attribute for both purposes. If controlling the inlining of calls at the caller's side is wanted, I feel a separate attribute would make more sense.

void foo()
{
  [[unlikely]] expensive_class q{};
  ...
}

This looks odd to me and the attribute isn't allowed on the declaration. I think this looks better:

void foo()
{
  [[noinline]] expensive_class q{};
  ...
}
if (a)
  [[unlikely]] expensive_class q{};

This could be the case if expensive_class held large static members, for instance.

This can be rewritten as an expression instead of a declaration; then the attribute can be used:

if(a)
  [[unlikely]] expensive_class{};

As one of the SG14 industry members driving this, I'm firmly in the camp that this is what we're expecting. In the first case the 1/2 case are "neutral". This is a very explicit, and local, marker. Anything else makes it so vague as to be unusable for fine tuned code.

Thank you for chiming in!

I should also make the point that we are not talking about a feature that is expected, or indeed should be, used by anyone other than someone with an exceedingly good understanding of what is going on.

That doesn't mean we should design something that's really hard for average folks to use, though.

This is not a "teach everyone about it, it's safe" feature. It's there to produce a very fine-grained control in those cases where it really matters, and where profiling-guided optimizations would produce exactly the wrong result. Using this feature should be an automatic "is this needed" question in a code review. It is needed sometimes, just rarely.

+1 but I would point out that when PGO is enabled, this attribute is ignored, so it's an either/or feature. Either you get tuned optimizations, or you get to guess at them, but the current design doesn't let you mix and match. Do you see that as being a major issue with the design?

In the example above, if x == 0 there will be a jump to case 0, which then falls through to case 1 and case 2. So case 0 doesn't jump to case 2 and thus doesn't "execute" that label.

My point was that the standard doesn't say the jump has to be executed, just that it has to exist. I think we both agree with how we'd like to interpret this bit, but if we're going to write a paper trying to improve the wording in the standard, I think this is a minor thing we could perhaps clean up.

I had thought about RAII before and I think it's also not a real issue there. Your example does the same as:

if (a)
   [[likely]] SomeRAIIObj{*a};

Here there's no declaration, so the attribute is allowed. If the RAII object is used in a declaration I expect it usually will be inside a compound statement to create a scope where the object is alive. In that case the attribute is placed on the compound statement. I don't expect people to really write code like this, but it may happen when using macros.

This is a good point. I also agree that more RAII uses are going to use a compound statement than not. I think we're in agreement that this isn't a case we need to do anything special for.

As one of the SG14 industry members driving this, I'm firmly in the camp that this is what we're expecting. In the first case the 1/2 case are "neutral". This is a very explicit, and local, marker. Anything else makes it so vague as to be unusable for fine tuned code.

Thank you for chiming in!

I should also make the point that we are not talking about a feature that is expected, or indeed should be, used by anyone other than someone with an exceedingly good understanding of what is going on.

That doesn't mean we should design something that's really hard for average folks to use, though.

This is not a "teach everyone about it, it's safe" feature. It's there to produce a very fine-grained control in those cases where it really matters, and where profiling-guided optimizations would produce exactly the wrong result. Using this feature should be an automatic "is this needed" question in a code review. It is needed sometimes, just rarely.

+1 but I would point out that when PGO is enabled, this attribute is ignored, so it's an either/or feature. Either you get tuned optimizations, or you get to guess at them, but the current design doesn't let you mix and match. Do you see that as being a major issue with the design?

We did have discussions in SG14 about the possible need for "inverse PGO" optimization passes, but we didn't have the compiler dev expertise to know if we were on a good track or not. From the 8 years I've spent working on this kind of code, I've never mixed the two, since the interactions are just too unpredictable. What I will frequently do however, is compare benchmarks between PGO generated code, and our hand-specified optimizations. Sometimes the PGO detects interesting program flow that a mere human brain could not have envisioned, which we can then take advantage of (or de-prioritize) as needed.

As one of the SG14 industry members driving this, I'm firmly in the camp that this is what we're expecting. In the first case the 1/2 case are "neutral". This is a very explicit, and local, marker. Anything else makes it so vague as to be unusable for fine tuned code.

Thank you for chiming in!

I should also make the point that we are not talking about a feature that is expected, or indeed should be, used by anyone other than someone with an exceedingly good understanding of what is going on.

That doesn't mean we should design something that's really hard for average folks to use, though.

This is not a "teach everyone about it, it's safe" feature. It's there to produce a very fine-grained control in those cases where it really matters, and where profiling-guided optimizations would produce exactly the wrong result. Using this feature should be an automatic "is this needed" question in a code review. It is needed sometimes, just rarely.

+1 but I would point out that when PGO is enabled, this attribute is ignored, so it's an either/or feature. Either you get tuned optimizations, or you get to guess at them, but the current design doesn't let you mix and match. Do you see that as being a major issue with the design?

We did have discussions in SG14 about the possible need for "inverse PGO" optimization passes, but we didn't have the compiler dev expertise to know if we were on a good track or not. From the 8 years I've spent working on this kind of code, I've never mixed the two, since the interactions are just too unpredictable. What I will frequently do however, is compare benchmarks between PGO generated code, and our hand-specified optimizations. Sometimes the PGO detects interesting program flow that a mere human brain could not have envisioned, which we can then take advantage of (or de-prioritize) as needed.

Great, thank you for your perspective!

This is not a "teach everyone about it, it's safe" feature. It's there to produce a very fine-grained control in those cases where it really matters, and where profiling-guided optimizations would produce exactly the wrong result. Using this feature should be an automatic "is this needed" question in a code review. It is needed sometimes, just rarely.

+1 but I would point out that when PGO is enabled, this attribute is ignored, so it's an either/or feature. Either you get tuned optimizations, or you get to guess at them, but the current design doesn't let you mix and match. Do you see that as being a major issue with the design?

We did have discussions in SG14 about the possible need for "inverse PGO" optimization passes, but we didn't have the compiler dev expertise to know if we were on a good track or not. From the 8 years I've spent working on this kind of code, I've never mixed the two, since the interactions are just too unpredictable. What I will frequently do however, is compare benchmarks between PGO generated code, and our hand-specified optimizations. Sometimes the PGO detects interesting program flow that a mere human brain could not have envisioned, which we can then take advantage of (or de-prioritize) as needed.

Are you currently using __builtin_expect and testing whether these expectations match the PGO expectations? This isn't part of the current patches, but it's something I intend to look at later.

This is not a "teach everyone about it, it's safe" feature. It's there to produce a very fine-grained control in those cases where it really matters, and where profiling-guided optimizations would produce exactly the wrong result. Using this feature should be an automatic "is this needed" question in a code review. It is needed sometimes, just rarely.

+1 but I would point out that when PGO is enabled, this attribute is ignored, so it's an either/or feature. Either you get tuned optimizations, or you get to guess at them, but the current design doesn't let you mix and match. Do you see that as being a major issue with the design?

We did have discussions in SG14 about the possible need for "inverse PGO" optimization passes, but we didn't have the compiler dev expertise to know if we were on a good track or not. From the 8 years I've spent working on this kind of code, I've never mixed the two, since the interactions are just too unpredictable. What I will frequently do however, is compare benchmarks between PGO generated code, and our hand-specified optimizations. Sometimes the PGO detects interesting program flow that a mere human brain could not have envisioned, which we can then take advantage of (or de-prioritize) as needed.

Are you currently using __builtin_expect and testing whether these expectations match the PGO expectations? This isn't part of the current patches, but it's something I intend to look at later.

Not per se (as I understand the question). We use PGO as inspiration to find interesting code paths, but if we find a substantial overlap with such a code path and our annotations, then that is a decided code smell - we're manually stating something the caches and system will discover on its own. To our mind, __builtin_expect, the noinline attribute, and other tools at our disposal to shape the generated code should only be used in those extreme cases where the metrics of the compiler, together with the runtime cache / preload systems, will get it wrong for our one-in-ten-million use cases. The testing we do is typically by looking at the overall performance of the application over a day's processing (albeit at accelerated timescales). If we want to take advantage of an interesting code path discovered by examining the results of a PGO run, we'll craft the code to produce that outcome (including manual assembly if needed). That way we avoid trying to do both manual and automatic tuning - that rarely ends in a happy place in our experience.

Mordante updated this revision to Diff 290809.Sep 9 2020, 1:00 PM

Update the patch to match the behaviour where the attribute only affects the substatement of an if statement. This means the likelihood attributes on labels are accepted but are a no-op. Since they're a no-op the documentation doesn't mention them.

Friendly ping.

aaron.ballman added inline comments.Sep 27 2020, 6:02 AM
clang/lib/Sema/SemaDeclAttr.cpp
6950

This is entering into somewhat novel territory for attributes, so some of this feedback is me thinking out loud.

Attr.td lists these two attributes as being a StmtAttr and not any kind of declaration attribute. We have DeclOrTypeAttr for attributes that can be applied to declarations or types, but we do not have something similar for statement attributes yet. We do have some custom semantic handling logic in SemaDeclAttr.cpp for statement attributes, but you avoid hitting that code path by adding a case for the two likelihood attributes. These attributes only apply to label declarations currently, and labels cannot be redeclared, so there aren't yet questions about whether this is inheritable or not. So we *might* be okay with this, but I'm not 100% certain. For instance, I don't recall if the pretty printer or AST dumpers will need to distinguish between whether this attribute is written on the statement or the declaration (which is itself a bit of an interesting question: should the attribute attach only to the statement rather than trying to attach to the underlying decl? http://eel.is/c++draft/stmt.stmt#stmt.label-1.sentence-2 is ambiguous, but I don't think of case or default labels as being declarations so I tend to not think of identifier labels as such either.). There's a part of me that wonders if we have a different issue where the attribute is trying to attach to the declaration rather than the statement and that this should be handled purely as a statement attribute.

I'm curious what your thoughts are, but I'd also like to see some additional tests for the random other bits that interact with attributes like AST dumping and pretty printing to be sure the behavior is reasonable.

Mordante added inline comments.Sep 27 2020, 11:28 AM
clang/lib/Sema/SemaDeclAttr.cpp
6950

The labels in a switch are indeed different and the code in trunk already should allow the attribute there. (I'm still busy with the CodeGen patch.)
I agree that Standard isn't clear whether the attribute is attached to the statement or the declaration.

The LabelDecl expects a pointer to a LabelStmt and not to an AttributedStmt. Since declarations can already have attributes I used that feature. I just checked and the LabelDecl isn't shown in the AST and so the attributes also aren't shown. I can adjust that.

Another option would be to change the LabelDecl and have two overloads of setStmt
void setStmt(LabelStmt *T) { TheStmt = T; }
void setStmt(AttributedStmt *T) { TheStmt = T; }
Then TheStmt needs to be a Stmt and an extra getter would be required to get the generic statement.

I think both solutions aren't trivial changes. Currently the attribute has no effect on labels so it not being visible isn't a real issue. However I feel that's not a proper solution. I expect attributes will be used more in C and C++ in the future. For example, I can imagine a [[cold]] attribute becoming available for labels.

So I'm leaning towards biting the bullet and change the implementation of LabelDecl to allow an AttributedStmt instead of a LabelStmt.
WDYT?

aaron.ballman added inline comments.Sep 28 2020, 9:38 AM
clang/lib/Sema/SemaDeclAttr.cpp
6950

Currently the attribute has no effect on labels so it not being visible isn't a real issue.

That's not entirely true though -- we have pretty printing capabilities that will lose the attribute when written on a label, so round-tripping through the pretty printer will fail. But we have quite a few issues with pretty printing attributes as it stands, so I'm not super concerned either.

So I'm leaning towards biting the bullet and change the implementation of LabelDecl to allow an AttributedStmt instead of a LabelStmt.
WDYT?

I'm curious if @rsmith feels the same way, but I think something along those lines makes sense (if not an overload, perhaps a templated function with SFINAE). We'd have to make sure that the attributed statement eventually resolves to a LabelStmt once we strip the attributes away, but this would keep the attribute at the statement level rather than making it a declaration one, which I think is more along the lines of what's intended for the likelihood attributes (and probably for hot/cold if we add that support later).
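The templated-with-SFINAE variant mentioned above might look roughly like this (a sketch using stand-in types, not Clang's real AST classes):

```cpp
#include <cassert>
#include <type_traits>

// Stand-ins for the real AST nodes; Clang's Stmt hierarchy is far richer.
struct Stmt {};
struct LabelStmt : Stmt {};
struct AttributedStmt : Stmt {};

struct LabelDecl {
  Stmt *TheStmt = nullptr;

  // Accept only the two statement kinds that may represent a label;
  // anything else fails to instantiate.
  template <typename T,
            typename = std::enable_if_t<
                std::is_same<T, LabelStmt>::value ||
                std::is_same<T, AttributedStmt>::value>>
  void setStmt(T *S) { TheStmt = S; }

  Stmt *getStmt() const { return TheStmt; }
};
```

This keeps the stored pointer as a generic Stmt while constraining the setter, which is the shape the overload proposal earlier in the thread also has; checking that an AttributedStmt actually wraps a LabelStmt would still need a runtime check.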

Mordante planned changes to this revision.Oct 18 2020, 4:59 AM
Mordante added inline comments.
clang/lib/Sema/SemaDeclAttr.cpp
6950

Currently the attribute has no effect on labels so it not being visible isn't a real issue.

That's not entirely true though -- we have pretty printing capabilities that will lose the attribute when written on a label, so round-tripping through the pretty printer will fail. But we have quite a few issues with pretty printing attributes as it stands, so I'm not super concerned either.

I'll keep that in mind when I start working on that.

So I'm leaning towards biting the bullet and change the implementation of LabelDecl to allow an AttributedStmt instead of a LabelStmt.
WDYT?

I'm curious if @rsmith feels the same way, but I think something along those lines makes sense (if not an overload, perhaps a templated function with SFINAE). We'd have to make sure that the attributed statement eventually resolves to a LabelStmt once we strip the attributes away, but this would keep the attribute at the statement level rather than making it a declaration one, which I think is more along the lines of what's intended for the likelihood attributes (and probably for hot/cold if we add that support later).

Yes, if we go for an overload I need to make sure that the attributed statement has a LabelStmt as its substatement. I haven't looked into how to enforce that.

@rsmith Any opinion on whether the likelihood attribute should be "attached" to the label declaration or the label statement?

Mordante added inline comments.Oct 31 2020, 10:40 AM
clang/lib/Sema/SemaDeclAttr.cpp
6950

So I'm leaning towards biting the bullet and change the implementation of LabelDecl to allow an AttributedStmt instead of a LabelStmt.

Actually it seems this isn't required: it works when ActOnLabelStmt returns an AttributedStmt. This means the LabelDecl class doesn't need to change.

Mordante updated this revision to Diff 302103.Oct 31 2020, 10:43 AM

Sema::ActOnLabelStmt now processes the statement attributes placed on the LabelDecl. Returning an AttributedStmt from this function seems to work as intended. This changes the behaviour of [[nomerge]] when it is allowed on labels. [[nomerge]] was introduced in D79121.

Given that some parts of this confused me, I'd like to make sure I'm clear on the approach you're taking. Based on the fact that labels are statements (defined in [stmt.label], in the statement production, etc.), I would expect that attributes which appertain to labels all appertain to the statement and not the declaration, and that the underlying issue is that we attach attributes to the label declaration (such as for __attribute__((unused))). So I wasn't expecting that we'd slide attributes around, but redefine which construct they live on (they'd always make an AttributedStmt for the label rather than adding the attribute to the Decl). However, I could imagine there being cases when we might want a helper function on LabelDecl that looks for attributes on the associated LabelStmt and returns results about it if that helps ease implementation or refactoring burdens. WDYT? Also, does @rsmith have opinions here?

clang/include/clang/Sema/Sema.h
4528

Don't we have a ParsedAttributesViewWithRange (or something along those lines) that could be used instead of separating attributes and their source range?

clang/lib/Parse/ParseStmt.cpp
677

I think I would have expected this to be calling Actions.ProcessStmtAttributes() as done a few lines up on 651 rather than changing the interface to ActOnLabelStmt().

clang/lib/Sema/SemaDeclAttr.cpp
7368

This also surprises me -- I wouldn't have expected the attribute to be in the list for the label decl in the first place.

clang/lib/Sema/TreeTransform.h
1307

I wouldn't have expected any changes to be needed here because the attributed statement should be rebuilt properly by RebuildAttributedStmt(), no?

Mordante planned changes to this revision.Nov 2 2020, 12:45 PM
Mordante marked 2 inline comments as done.

Let me try to clear up the confusion.
The parser first parses the LabelDecl and tries to attach the attributes to this declaration. If that fails, instead of issuing a diagnostic for a StmtAttr attached to a declaration, it lets them be attached to the LabelStmt. This way attributes can be placed on both the LabelDecl and the LabelStmt. For example:

void foo() {
  __label__ lbl; // Needed to show the LabelDecl in the AST
  [[likely, clang::annotate("foo")]] lbl:;
}

Will result in the following AST

`-FunctionDecl 0x560be00b6728 </tmp/x.cpp:1:1, line:4:1> line:1:6 foo 'void ()'
  `-CompoundStmt 0x560be00b6a00 <col:12, line:4:1>
    |-DeclStmt 0x560be00b6860 <line:2:3, col:16>
    | `-LabelDecl 0x560be00b6810 <col:3, col:13> col:13 lbl
    |   `-AnnotateAttr 0x560be00b6920 <line:3:13, col:34> "foo"
    |     `-StringLiteral 0x560be00b68f8 <col:29> 'const char [4]' lvalue "foo"
    `-AttributedStmt 0x560be00b69e8 <col:3, col:42>
      |-LikelyAttr 0x560be00b69c0 <col:5>
      `-LabelStmt 0x560be00b69a8 <col:38, col:42> 'lbl'
        `-NullStmt 0x560be00b6918 <col:42>

I thought this was a sensible approach where I don't change the behaviour of DeclAttr on the LabelDecl, but at the same time allow the StmtAttr to be "forwarded" to the LabelStmt, turning it into an AttributedStmt with the LabelStmt as substatement.

However, I could imagine there being cases when we might want a helper function on LabelDecl that looks for attributes on the associated LabelStmt and returns results about it if that helps ease implementation or refactoring burdens.

If we want that we need to change the LabelDecl to point to either a LabelStmt or an AttributedStmt. This was the approach I thought I had to take, but then I found this solution. We can still take that direction if wanted, but then we need to place the DeclAttrs in an AttributedStmt, and I'm not sure how well that works or how much additional code needs to change to find the attributes there, since in that case we need to call this helper function at the appropriate places.

Does this clear up the confusion?
Do you agree with this approach, or do you think changing the LabelDecl is the better solution?
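For context, here is a minimal sketch of the usage this patch aims to enable (the function and label names are hypothetical): a likelihood attribute placed on a plain label that is the target of goto statements. The attribute is only an optimization hint, so the observable behaviour is the same with or without the patch.

```cpp
// Hypothetical example: [[likely]] on a label targeted by gotos.
// With this patch, Clang can use the hint when weighting the goto
// branches; without it, the attribute on a non-case label is ignored
// (possibly with a warning). Semantics are unchanged either way.
int classify(char ch) {
  if (ch == ' ')
    goto whitespace;
  if (ch == '\n' || ch == '\t')
    goto whitespace;
  return 0; // not whitespace

  [[likely]] whitespace:
  return 1; // hinted as the common path
}
```

Compiled as C++20, both paths return the same values; only the branch weights in the generated code may differ.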

clang/include/clang/Sema/Sema.h
4528

Yes, there is. I initially used it, but at some point I changed it; I ran into some issues with it in an earlier iteration of the patch, though I can't recall the exact details. I'll test again with the version taking a range.

Note that since the new arguments aren't at the end, and thus aren't defaulted, the template rebuild call also needs to be changed.

clang/lib/Parse/ParseStmt.cpp
677

An interesting suggestion; I haven't tried that approach, so I'll have to test whether it works. Since ActOnLabelStmt uses the LabelDecl, I think we can't do the same as on line 651. Maybe something like:

  • First call ActOnLabelStmt without the attributes,
  • If the result is valid and we have attributes, call ProcessStmtAttributes.

The difference from line 651 is where the attributes are placed:

void foo() {
  __label__ L; // Needed to show the LabelDecl in the AST
  L: __attribute__((unused));
}

This results in the following AST:

`-FunctionDecl 0x55e1debc1728 </tmp/x.cpp:1:1, line:4:1> line:1:6 foo 'void ()'
  `-CompoundStmt 0x55e1debc18f0 <col:12, line:4:1>
    |-DeclStmt 0x55e1debc1860 <line:2:1, col:12>
    | `-LabelDecl 0x55e1debc1810 <col:1, col:11> col:11 L
    |   `-UnusedAttr 0x55e1debc1880 <line:3:19> unused
    `-LabelStmt 0x55e1debc18d8 <col:1, col:27> 'L'
      `-NullStmt 0x55e1debc1878 <col:27>

This is the GNU label attribute extension https://gcc.gnu.org/onlinedocs/gcc/Label-Attributes.html
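As a small, hedged illustration of that extension (the function and label names are hypothetical): the attribute specifier after the colon appertains to the label itself, and the classic use is suppressing the unused-label warning.

```cpp
// Hypothetical sketch of the GNU label-attribute placement: the
// __attribute__ specifier after the colon attaches to the label
// declaration. Here it silences -Wunused-label for a label that is
// never jumped to. The attribute has no effect on behaviour.
int gnu_label_demo(int x) {
  maybe_later: __attribute__((unused));
  return x * 2;
}
```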

clang/lib/Sema/SemaDeclAttr.cpp
7368

I was also a bit surprised, but the LabelDecl is parsed first and the attributes are attached to it. This is probably done because it's already possible to attach attributes to a LabelDecl.

clang/lib/Sema/TreeTransform.h
1307

You're correct. There are no real changes here. I added a comment explaining why no changes are needed. Since the signature of ActOnLabelStmt has changed, I only call the function using its new interface, but without attributes.

Let me try to clear up the confusion.
The parser first parses the LabelDecl and tries to attach the attributes to that declaration. If that fails, instead of issuing a diagnostic about a StmtAttr attached to a declaration, it lets the attributes be attached to the LabelStmt. This way attributes can be placed on both the LabelDecl and the LabelStmt. For example:

void foo() {
  __label__ lbl; // Needed to show the LabelDecl in the AST
  [[likely, clang::annotate("foo")]] lbl:;
}

This results in the following AST:

`-FunctionDecl 0x560be00b6728 </tmp/x.cpp:1:1, line:4:1> line:1:6 foo 'void ()'
  `-CompoundStmt 0x560be00b6a00 <col:12, line:4:1>
    |-DeclStmt 0x560be00b6860 <line:2:3, col:16>
    | `-LabelDecl 0x560be00b6810 <col:3, col:13> col:13 lbl
    |   `-AnnotateAttr 0x560be00b6920 <line:3:13, col:34> "foo"
    |     `-StringLiteral 0x560be00b68f8 <col:29> 'const char [4]' lvalue "foo"
    `-AttributedStmt 0x560be00b69e8 <col:3, col:42>
      |-LikelyAttr 0x560be00b69c0 <col:5>
      `-LabelStmt 0x560be00b69a8 <col:38, col:42> 'lbl'
        `-NullStmt 0x560be00b6918 <col:42>

I thought this was a sensible approach: it doesn't change the behaviour of a DeclAttr on the LabelDecl, but at the same time allows a StmtAttr to be "forwarded" to the LabelStmt, turning it into an AttributedStmt with the LabelStmt as its substatement.

I think this is a defensible approach, FWIW.

However, I could imagine there being cases when we might want a helper function on LabelDecl that looks for attributes on the associated LabelStmt and returns results about it if that helps ease implementation or refactoring burdens.

If we want that, we need to change the LabelDecl to point to either a LabelStmt or an AttributedStmt. That was the approach I initially thought I had to take, but then I found the current solution. We can still take that direction if wanted, but then we need to place the DeclAttr in an AttributedStmt; I'm not sure how well that works or how much additional code needs to change to find the attributes there, since in that case we need to call this helper function at all the appropriate places.

Ah, I was thinking of something slightly different here. I was thinking that LabelDecl would hold a Stmt* so that it could be either a label statement or an attribute statement. The helper functions would give you access to the attributes of the statement and to the LabelStmt itself (stripping off any attributed statements). Then in Attr.td, we'd move attributes off the label declarations and onto the label statements. At the end of the day, all attributes on labels would appertain to the statement at the AST level, but you'd have an easier time transitioning in some places if you could get to the attributes if the only thing you have is the LabelDecl. (So this doesn't mix statement and declaration attributes together, it shifts the model to putting all attributes on labels on the statement level rather than having a somewhat odd split between decl and statement.)

Does this clear up the confusion?
Do you agree with this approach, or do you think changing the LabelDecl is the better solution?

Thank you for the explanations, I understand your design better now. I'm not certain what the right answer is yet, but thinking out loud about my concerns: I worry that making a distinction between a label statement and a label declaration (at least for attributes) generates heat without light. Making the attribute author decide "which label construct should my attribute appertain to" adds extra burden on attribute authors and I'm not sure I have any advice for them on whether to use the decl or the statement -- it seems entirely arbitrary. Coupled with the fact that the standard defines labels as being statements, that suggests to me that putting all of the attributes on the statement level is the right decision -- it's one place (removes confusion about where it goes), it's the "obvious" place (matches where the standard suggests that attributes live), and we should have all the machinery in place to make it possible within the compiler (given that you can reach the LabelStmt from the LabelDecl).

Any thoughts from @rjmccall or @rsmith?

Mordante marked 2 inline comments as done.Nov 3 2020, 10:16 AM

Let me try to clear up the confusion.

However, I could imagine there being cases when we might want a helper function on LabelDecl that looks for attributes on the associated LabelStmt and returns results about it if that helps ease implementation or refactoring burdens.

If we want that, we need to change the LabelDecl to point to either a LabelStmt or an AttributedStmt. That was the approach I initially thought I had to take, but then I found the current solution. We can still take that direction if wanted, but then we need to place the DeclAttr in an AttributedStmt; I'm not sure how well that works or how much additional code needs to change to find the attributes there, since in that case we need to call this helper function at all the appropriate places.

Ah, I was thinking of something slightly different here. I was thinking that LabelDecl would hold a Stmt* so that it could be either a label statement or an attribute statement.

Yes, I wanted to take that route. While investigating it I found the current solution, which seemed less intrusive, so that's why I went with the current approach.

The helper functions would give you access to the attributes of the statement and to the LabelStmt itself (stripping off any attributed statements). Then in Attr.td, we'd move attributes off the label declarations and onto the label statements. At the end of the day, all attributes on labels would appertain to the statement at the AST level, but you'd have an easier time transitioning in some places if you could get to the attributes if the only thing you have is the LabelDecl. (So this doesn't mix statement and declaration attributes together, it shifts the model to putting all attributes on labels on the statement level rather than having a somewhat odd split between decl and statement.)

If we go that route, then we need to think about attributes that are normally allowed on declarations. For example:

def Unused : InheritableAttr {
  let Spellings = [CXX11<"", "maybe_unused", 201603>, GCC<"unused">,
                   C2x<"", "maybe_unused", 201904>];
  let Subjects = SubjectList<[Var, ObjCIvar, Type, Enum, EnumConstant, Label,
                              Field, ObjCMethod, FunctionLike]>;
  let Documentation = [WarnMaybeUnusedDocs];
}

It also seems that attributes on the LabelDecl aren't visible in the AST at the moment. If I modify the example posted yesterday to:

void foo() {
  [[likely, clang::annotate("foo")]] lbl:;
}

The AST will become:

`-FunctionDecl 0x5607f7dbb7a8 </tmp/x.cpp:1:1, line:3:1> line:1:6 foo 'void ()'
  `-CompoundStmt 0x5607f7dbba60 <col:12, line:3:1>
    `-AttributedStmt 0x5607f7dbba48 <line:2:3, col:42>
      |-LikelyAttr 0x5607f7dbba20 <col:5>
      `-LabelStmt 0x5607f7dbba08 <col:38, col:42> 'lbl'
        `-NullStmt 0x5607f7dbb928 <col:42>

Maybe it would be a good idea to see whether the LabelDecl needs to be visible in the AST even if we proceed with the current approach.

Does this clear up the confusion?
Do you agree with this approach, or do you think changing the LabelDecl is the better solution?

Thank you for the explanations, I understand your design better now. I'm not certain what the right answer is yet, but thinking out loud about my concerns: I worry that making a distinction between a label statement and a label declaration (at least for attributes) generates heat without light. Making the attribute author decide "which label construct should my attribute appertain to" adds extra burden on attribute authors and I'm not sure I have any advice for them on whether to use the decl or the statement -- it seems entirely arbitrary. Coupled with the fact that the standard defines labels as being statements, that suggests to me that putting all of the attributes on the statement level is the right decision -- it's one place (removes confusion about where it goes), it's the "obvious" place (matches where the standard suggests that attributes live), and we should have all the machinery in place to make it possible within the compiler (given that you can reach the LabelStmt from the LabelDecl).

I agree it's not clear-cut what the best way is. For example, unused is a declaration attribute, but likely is a statement attribute; both can be used on a label. I think in most cases it will be clear to authors, since their attribute is also used on other declarations or statements. The harder case will be when a label-specific attribute is added: should it go on the declaration or the statement? I foresee another issue with attributes that can be attached to both a declaration and a statement, for example if we implement [[clang::suppress("compiler.warning.12345")]]. (But I suspect that attribute has more implementation challenges.) In the current approach such an attribute will be attached to the declaration, since that's parsed first. I think that regardless of the final solution, labels will remain a special case for attributes.

During the review of [[clang::nomerge]], the placement on labels was also discussed. I also see some threads on the dev mailing list about making more use of attributes, for example to enhance the analyzers, which sounds very interesting. So I feel it's worth the effort to look for the "best" approach, even if the usefulness for the likelihood attributes is limited.

Let me try to clear up the confusion.

However, I could imagine there being cases when we might want a helper function on LabelDecl that looks for attributes on the associated LabelStmt and returns results about it if that helps ease implementation or refactoring burdens.

If we want that, we need to change the LabelDecl to point to either a LabelStmt or an AttributedStmt. That was the approach I initially thought I had to take, but then I found the current solution. We can still take that direction if wanted, but then we need to place the DeclAttr in an AttributedStmt; I'm not sure how well that works or how much additional code needs to change to find the attributes there, since in that case we need to call this helper function at all the appropriate places.

Ah, I was thinking of something slightly different here. I was thinking that LabelDecl would hold a Stmt* so that it could be either a label statement or an attribute statement.

Yes, I wanted to take that route. While investigating it I found the current solution, which seemed less intrusive, so that's why I went with the current approach.

The helper functions would give you access to the attributes of the statement and to the LabelStmt itself (stripping off any attributed statements). Then in Attr.td, we'd move attributes off the label declarations and onto the label statements. At the end of the day, all attributes on labels would appertain to the statement at the AST level, but you'd have an easier time transitioning in some places if you could get to the attributes if the only thing you have is the LabelDecl. (So this doesn't mix statement and declaration attributes together, it shifts the model to putting all attributes on labels on the statement level rather than having a somewhat odd split between decl and statement.)

If we go that route, then we need to think about attributes that are normally allowed on declarations. For example:

def Unused : InheritableAttr {
  let Spellings = [CXX11<"", "maybe_unused", 201603>, GCC<"unused">,
                   C2x<"", "maybe_unused", 201904>];
  let Subjects = SubjectList<[Var, ObjCIvar, Type, Enum, EnumConstant, Label,
                              Field, ObjCMethod, FunctionLike]>;
  let Documentation = [WarnMaybeUnusedDocs];
}

Yup, annotate is another such attribute, to some degree. I was envisioning that these tablegen definitions would have to change slightly to allow for writing on a statement for the label case.

It also seems that attributes on the LabelDecl aren't visible in the AST at the moment. If I modify the example posted yesterday to:

void foo() {
  [[likely, clang::annotate("foo")]] lbl:;
}

The AST will become:

`-FunctionDecl 0x5607f7dbb7a8 </tmp/x.cpp:1:1, line:3:1> line:1:6 foo 'void ()'
  `-CompoundStmt 0x5607f7dbba60 <col:12, line:3:1>
    `-AttributedStmt 0x5607f7dbba48 <line:2:3, col:42>
      |-LikelyAttr 0x5607f7dbba20 <col:5>
      `-LabelStmt 0x5607f7dbba08 <col:38, col:42> 'lbl'
        `-NullStmt 0x5607f7dbb928 <col:42>

Maybe it would be a good idea to see whether the LabelDecl needs to be visible in the AST even if we proceed with the current approach.

Good catch, I hadn't noticed that before. I agree, we should figure that out. Some simple testing shows clang-query also doesn't match label declarations with the decl() matcher, so the issue is wider than just dumping the AST.

Does this clear up the confusion?
Do you agree with this approach, or do you think changing the LabelDecl is the better solution?

Thank you for the explanations, I understand your design better now. I'm not certain what the right answer is yet, but thinking out loud about my concerns: I worry that making a distinction between a label statement and a label declaration (at least for attributes) generates heat without light. Making the attribute author decide "which label construct should my attribute appertain to" adds extra burden on attribute authors and I'm not sure I have any advice for them on whether to use the decl or the statement -- it seems entirely arbitrary. Coupled with the fact that the standard defines labels as being statements, that suggests to me that putting all of the attributes on the statement level is the right decision -- it's one place (removes confusion about where it goes), it's the "obvious" place (matches where the standard suggests that attributes live), and we should have all the machinery in place to make it possible within the compiler (given that you can reach the LabelStmt from the LabelDecl).

I agree it's not clear-cut what the best way is. For example, unused is a declaration attribute, but likely is a statement attribute; both can be used on a label.

unused is also a bit weird because [[maybe_unused]] technically does not apply to labels (http://eel.is/c++draft/dcl.attr.unused#2, we are using a conforming extension in our implementation), but __attribute__((unused)) does apply to labels.
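A hedged sketch (hypothetical function and label names) contrasting the two spellings on labels: Clang accepts [[maybe_unused]] on a label as a conforming extension, while __attribute__((unused)) is the form GCC documents for labels; neither changes the program's behaviour.

```cpp
// Sketch: both spellings applied to labels that are never jumped to.
// [[maybe_unused]] on a label is a conforming Clang extension;
// __attribute__((unused)) is the GNU-documented form. Other compilers
// may warn that the C++ attribute is ignored in this position.
int label_attr_demo(int x) {
  [[maybe_unused]] cxx_style: ;
  gnu_style: __attribute__((unused));
  return x + 1;
}
```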

I think in most cases it will be clear to authors, since their attribute is also used on other declarations or statements. The harder case will be when a label-specific attribute is added: should it go on the declaration or the statement? I foresee another issue with attributes that can be attached to both a declaration and a statement, for example if we implement [[clang::suppress("compiler.warning.12345")]]. (But I suspect that attribute has more implementation challenges.) In the current approach such an attribute will be attached to the declaration, since that's parsed first. I think that regardless of the final solution, labels will remain a special case for attributes.

I'm more and more thinking of LabelDecl as an implementation detail that allows us to perform lookups easier rather than a salient piece of the AST, so I guess I view label-specific attributes as being the easy case where the attributes consistently go on the statement level.

During the review of [[clang::nomerge]], the placement on labels was also discussed. I also see some threads on the dev mailing list about making more use of attributes, for example to enhance the analyzers, which sounds very interesting. So I feel it's worth the effort to look for the "best" approach, even if the usefulness for the likelihood attributes is limited.

I agree and I greatly appreciate your patience while we work the design out.

What is the status of this patch?