This is an archive of the discontinued LLVM Phabricator instance.

[Sema, CodeGen] Implement [[likely]] and [[unlikely]] in IfStmt
ClosedPublic

Authored by Mordante on Aug 2 2020, 4:42 AM.

Details

Summary

This contains the initial part of the implementation for the C++20 likelihood attributes.
For now it only handles them in an IfStmt, I want to add support for other statements after this one is done.

I was unsure whether it's preferred to have one patch for both the Sema and CodeGen part. If wanted I can split them easily.

Diff Detail

Event Timeline

Mordante created this revision.Aug 2 2020, 4:42 AM
Herald added a project: Restricted Project. · View Herald Transcript
Mordante requested review of this revision.Aug 2 2020, 4:42 AM

Thank you for starting the implementation work on these attributes!

clang/include/clang/Basic/Attr.td
1292

Hmm, I'm on the fence about specifying 201803 for these attributes. Given that this is only the start of supporting the attribute, do we want to claim it already matches the standard's behavior? Or do we just want to return 1 to signify that we understand this attribute but we don't yet fully support it in common cases (such as on labels in switch statements, etc)?

As another question, should we consider adding a C2x spelling [[clang::likely]] and [[clang::unlikely]] to add this functionality to C?

clang/include/clang/Basic/AttrDocs.td
1698

Why? I would expect this to be reasonable code:

if (foo) [[likely]] {
  ...
} else if (bar) [[unlikely]] {
  ...
} else if (baz) [[likely]] {
  ...
} else {
  ...
}

Especially because I would expect this to be reasonable code:

switch (value) {
[[likely]] case 0: ... break;
[[unlikely]] case 1: ... break;
[[likely]] case 2: ... break;
[[unlikely]] default: ... break;
}

As motivating examples, consider a code generator that knows whether a particular branch is likely or not and it writes out the attribute on all branches. Or, something like this:

float rnd_value = get_super_random_number_between_zero_and_one();
if (rnd_value < .1) [[unlikely]] {
} else if (rnd_value > .9) [[unlikely]] {
} else [[likely]] {
}
1703

We should document whether we silently accept the attribute in other locations and do nothing with it (I'm not keen on this approach because it surprises users) or that we loudly diagnose the attribute in situations that we don't support it in (I think this is a more user-friendly approach, but may require a separate warning flag).

1709

Why does -Wimplicit-fallthrough matter?

clang/include/clang/Basic/DiagnosticGroups.td
659 ↗(On Diff #282444)

Is this because you expect people will want to control diagnostics on if statements separate from diagnostics in other syntactic locations?

clang/include/clang/Basic/DiagnosticSemaKinds.td
2401

This isn't needed, we already have err_attributes_are_not_compatible.

2403

We've already got note_conflicting_attribute for this as well.

clang/lib/CodeGen/CGStmt.cpp
658

const auto *

679–680

I don't think the frontend should issue a diagnostic for that situation. I think codegen should handle chains appropriately instead as those seem plausible in reasonable code.

Quuxplusone added inline comments.
clang/include/clang/Basic/AttrDocs.td
1698

Right, annotating both/multiple branches of a control statement should be perfectly fine. Even if (x) [[likely]] { } else [[likely]] { } should be perfectly okay as far as the code generator is concerned (and we should have a test for that); it's silly, but there's no reason to warn against it in the compiler docs.

Aaron, notice that if (x) [[likely]] { } else if (y) [[likely]] { } is not actually annotating "both" branches of any single if-statement. There you're annotating the true branch of an if (without annotating the else branch), and then annotating the true branch of another if (which doesn't have an else branch).

Mordante, in these docs, please document the "interesting" behavior of the standard attribute on labels — annotating a label is different from annotating the labeled statement itself. In particular,

[[likely]] case 1: case 2: foo(); break;
case 3: [[likely]] case 4: foo(); break;
case 5: case 6: [[likely]] foo(); break;

indicates that the likely cases are 1, 4, 5, and 6. (1 and 4 because their labels are annotated; 5 and 6 because their control flow passes through an annotated statement. Case 3's control flow passes through an annotated label, but that doesn't matter to the standard attribute.)

clang/include/clang/Basic/DiagnosticSemaKinds.td
2404

Typo: /declarered/declared/

clang/lib/Sema/SemaStmt.cpp
585

I agree with Aaron, this doesn't deserve a warning. Or at least there should be some thought put into the "threat model." (Are we protecting against someone typoing [[likely]]/[[likely]] when they meant [[likely]]/[[unlikely]]? But they weren't supposed to annotate both branches in the first place...)

clang/lib/Sema/SemaStmtAttr.cpp
349

Nit: Remove this blank line, for parallelism with lines 358-359.

clang/test/SemaCXX/attr-likelihood.cpp
102

I'd like to see a case like if (x) [[likely]] i=1; just to prove that it works on statements that aren't blocks or empty statements. (My understanding is that this should already work with your current patch.)

I'd like to see a case like if (x) { [[likely]] i=1; } to prove that it works on arbitrary statements. This should have the same effect as if (x) [[likely]] { i=1; }. My understanding is that your current patch doesn't get us there yet. If it's unclear how we'd get there by proceeding along your current trajectory, then I would question whether we want to commit to this trajectory at all, yet.

Mordante planned changes to this revision.Aug 14 2020, 12:52 PM
Mordante marked 14 inline comments as done.
Mordante added inline comments.
clang/include/clang/Basic/Attr.td
1292

I was also somewhat on the fence, for now I'll change it to specify 1.

I'll have a look at the C2x changes. Just curious do you know whether there's a proposal to add this to C2x?

clang/include/clang/Basic/AttrDocs.td
1698

Aaron, the current CodeGen code for the switch statement allows all branches to have a weight, for the if there are only two weights allowed. As Arthur explained the chained ifs are different statements, so this will work properly.

Arthur, I agree we should add the documentation about the switch, but I'd rather do that when the attributes are implemented for the switch. For now I think it would make sense to add some documentation about the fact that the attribute can be placed inside the CompoundStmt.

1703

I'll update the documentation. I intend to continue to work on implementing these attributes so I feel adding a temporary diagnostic is not worth the effort.

1709

It doesn't, thanks for spotting the copy-paste error.

clang/include/clang/Basic/DiagnosticGroups.td
659 ↗(On Diff #282444)

This is about warning about [[likely]] in both branches. Due to your an Arthur's remarks I've decided to remove them. Especially since Arthur correctly pointed out the attribute doesn't need to be directly after the if(x).

clang/include/clang/Basic/DiagnosticSemaKinds.td
2401

Thanks I didn't find them before, removed.

2403

Thanks I didn't find them before, removed.

clang/lib/CodeGen/CGStmt.cpp
679–680

This is the diagnostic about if(b) [[likely]] {...} else [[likely]] {...}. I'll remove the diagnostic, however the CodeGen will still prioritize the [[likely]] attribute in the ThenStmt. With Arthur's information the user can even do more unwise things:

if(b) [[likely]] {
    [[unlikely]] ...
} else [[likely]] {
    [[unlikely]]...
}

If the user provided conflicting attributes the CodeGen will not fail, but might not do what the user wanted.
This doesn't affect the chained if:

if(b) [[likely]] {
    ...
} else if(c) [[likely]] {
    ...
}

For the switch it will be allowed to annotate multiple cases with the same attribute and that should work as intended. (In a case it won't 'protect' against conflicting attributes in one case).

clang/lib/Sema/SemaStmt.cpp
585

Yes it protects against [[likely]]/[[likely]]. I've decided to remove the warning.

clang/test/SemaCXX/attr-likelihood.cpp
102

I agree it would be a good idea to add a test like if (x) [[likely]] i=1; but I don't feel adding an additional test in this file proves anything. I'll add a test to clang/test/CodeGenCXX/attr-likelihood-if-branch-weights.cpp to prove it not only accepts the code but also honours the attribute. This is especially important due to your correct observation that if (x) { [[likely]] i=1; } doesn't have any effect. The code is accepted but the CodeGen won't honour the attribute.

I think we can use the current approach, but I need to adjust getLikelihood. Instead of only testing whether the current Stmt is an AttributedStmt it needs to iterate over all Stmts and test them for the attribute. Obviously it should avoid looking into Stmts that also use the attribute, e.g:

if(b) {
    switch(c) {
        [[unlikely]] case 0: ... break; // The attribute "belongs" to the switch not to the if
        [[likely]] case 1: ... break; // The attribute "belongs" to the switch not to the if
    }
}
aaron.ballman added inline comments.Aug 16 2020, 5:19 AM
clang/include/clang/Basic/Attr.td
1292

I'll have a look at the C2x changes. Just curious do you know whether there's a proposal to add this to C2x?

There isn't one currently because there is no implementation experience with the attributes in C. This is a way to get that implementation experience so it's easier to propose the feature to WG14.

clang/include/clang/Basic/AttrDocs.td
1698

Aaron, notice that if (x) [[likely]] { } else if (y) [[likely]] { } is not actually annotating "both" branches of any single if-statement. There you're annotating the true branch of an if (without annotating the else branch), and then annotating the true branch of another if (which doesn't have an else branch).

I agree, which is why I pointed it out -- I think users are going to be surprised by this behavior because it won't do what they expect it to do.

As Arthur explained the chained ifs are different statements, so this will work properly.

We may have differing definitions of "properly." Users are going to expect to be able to write chains like this:

if (foo) [[likely]] {
} else if (bar) [[likely]] {
} else [[unlikely]] {
}

and have it behave as if they wrote this:

if (foo) [[likely]] {
} else [[likely]] if (bar) [[likely]] {
} else [[unlikely]] {
}

instead of this:

if (foo) [[likely]] {
} else [[unlikely]] if (bar) [[likely]] {
} else [[unlikely]] {
}

in terms of what information gets passed to the optimizer. Given what we currently can support for branch weights, Perhaps the behavior closest to the user's expectation would be as if they wrote this:

if (foo) {
} else if (bar) [[likely]] {
} else [[unlikely]] {
}

(So we say nothing about the first if/else decision branch because the user wants them both to be treated as equally likely.)

1703

SGTM!

clang/test/SemaCXX/attr-likelihood.cpp
102

This is especially important due to your correct observation that if (x) { [[likely]] i=1; } doesn't have any effect.

This code should diagnose the attribute as being ignored.

Mordante marked 14 inline comments as done.Aug 16 2020, 11:47 AM
Mordante added inline comments.
clang/include/clang/Basic/Attr.td
1292

I was also somewhat on the fence, for now I'll change it to specify 1.

There seems to be an issue using the 1 so I used 2 instead. (Didn't want to look closely at the issue.)

1292

I'll have a look at the C2x changes. Just curious do you know whether there's a proposal to add this to C2x?

There isn't one currently because there is no implementation experience with the attributes in C. This is a way to get that implementation experience so it's easier to propose the feature to WG14.

Nice to know. It seems the C2x wasn't at straightforward as I hoped, so I didn't implement it yet. I intend to look at it later. I first want the initial part done to see whether this is the right approach.

clang/include/clang/Basic/AttrDocs.td
1698

Aaron, notice that if (x) [[likely]] { } else if (y) [[likely]] { } is not actually annotating "both" branches of any single if-statement. There you're annotating the true branch of an if (without annotating the else branch), and then annotating the true branch of another if (which doesn't have an else branch).

I agree, which is why I pointed it out -- I think users are going to be surprised by this behavior because it won't do what they expect it to do.

As Arthur explained the chained ifs are different statements, so this will work properly.

We may have differing definitions of "properly." Users are going to expect to be able to write chains like this:

if (foo) [[likely]] {
} else if (bar) [[likely]] {
} else [[unlikely]] {
}

and have it behave as if they wrote this:

if (foo) [[likely]] {
} else [[likely]] if (bar) [[likely]] {
} else [[unlikely]] {
}

instead of this:

if (foo) [[likely]] {
} else [[unlikely]] if (bar) [[likely]] {
} else [[unlikely]] {
}

in terms of what information gets passed to the optimizer. Given what we currently can support for branch weights, Perhaps the behavior closest to the user's expectation would be as if they wrote this:

if (foo) {
} else if (bar) [[likely]] {
} else [[unlikely]] {
}

(So we say nothing about the first if/else decision branch because the user wants them both to be treated as equally likely.)

1698

We may have differing definitions of "properly." Users are going to expect to be able to write chains like this:

if (foo) [[likely]] {
} else if (bar) [[likely]] {
} else [[unlikely]] {
}

and have it behave as if they wrote this:

if (foo) [[likely]] {
} else [[likely]] if (bar) [[likely]] {
} else [[unlikely]] {
}

instead of this:

if (foo) [[likely]] {
} else [[unlikely]] if (bar) [[likely]] {
} else [[unlikely]] {
}

I consider this the "proper" behaviour from the compiler's point of view. I think it's not intended to "taint" the entire path to the attribute. E.g

if(foo) { // 1
   // do stuff
   if(error) [[unlikely]] { // 2
       // handle error
   }
} else { // 3
   // do stuff     
}

Here I expect:

  • 1 and 3 to be equally likely
  • 2 to be unlikely

in terms of what information gets passed to the optimizer. Given what we currently can support for branch weights, Perhaps the behavior closest to the user's expectation would be as if they wrote this:

if (foo) {
} else if (bar) [[likely]] {
} else [[unlikely]] {
}

(So we say nothing about the first if/else decision branch because the user wants them both to be treated as equally likely.)

I'll give this some more thought and will write some more documentation on what my current approach will do and discuss it from there. It should also take the iteration statements into account and how they handle their likelihood. I'll also have a look at behaviour of GCC and MSVC to see how they've implemented the feature.

clang/test/SemaCXX/attr-likelihood.cpp
102

What I understood from Arthur and rereading the standard this should work, but the initial version of the patch didn't handle this properly.

aaron.ballman added inline comments.Aug 17 2020, 6:34 AM
clang/include/clang/Basic/Attr.td
1292

What issues are you running into? 1 is the default value when you don't specify a value specifically.

clang/include/clang/Basic/AttrDocs.td
1698

Here I expect:
1 and 3 to be equally likely
2 to be unlikely

That matches my expectations.

I'll give this some more thought and will write some more documentation on what my current approach will do and discuss it from there. It should also take the iteration statements into account and how they handle their likelihood. I'll also have a look at behaviour of GCC and MSVC to see how they've implemented the feature.

Thank you! I think my design principle for a binary decision is that repeating the attribute on both branches is valid but a noop because it gives the compiler no additional information, and using opposite attribute on the branches is identical to only one branch being annotated because the information given to the compiler is logically consistent.

clang/test/SemaCXX/attr-likelihood.cpp
102

What I understood from Arthur and rereading the standard this should work, but the initial version of the patch didn't handle this properly.

My original belief was that it's acceptable for the attribute to be placed there in terms of syntax, but the recommended practice doesn't apply to this case because there's no alternative paths of execution once you've entered the compound statement. That means:

if (x) [[likely]] { i = 1; }
// is not the same as
if (x) { [[likely]] i = 1; }

That said, I can squint at the words to get your interpretation and your interpretation matches what's in the original paper (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0479r2.html#examples).

Ultimately, I think we should strive for consistency between implementations, and I think we should file an issue with WG21 to clarify the wording once we figure out how implementations want to interpret questions like these.

clang/test/SemaCXX/attr-likelihood.cpp
102

I don't know WG21's actual rationale, but I can imagine a programmer wanting to annotate something like this:

#define FATAL(msg) [[unlikely]] log_and_abort("Fatal error!", msg);
...
auto packet = receive();
if (packet.malformed()) {
    save_to_log(packet);
    FATAL("malformed packet");
}
...

The "absolute unlikeliness" of the malformed-packet codepath is encapsulated within the FATAL macro so that the programmer doesn't have to tediously repeat [[unlikely]] at some branch arbitrarily far above the FATAL point. Compilers actually do this already, every time they see a throw — every throw is implicitly "unlikely" and taints its whole codepath. We want to do something similar for [[unlikely]].

It's definitely going to be unsatisfyingly fuzzy logic, though, similar to how inlining heuristics and inline are today.

My understanding is that

if (x) { [[likely]] i = 1; }
if (x) [[likely]] { i = 1; }

have exactly the same surface reading: the same set of codepaths exist in both cases, and the same "x true, i gets 1" codepath is [[likely]] in both cases. Of course your heuristic might choose to treat them differently, just as you might choose to treat

struct S { int f() { ... } };
struct S { inline int f() { ... } };

differently for inlining purposes despite their identical surface readings.

aaron.ballman added inline comments.Aug 17 2020, 9:07 AM
clang/test/SemaCXX/attr-likelihood.cpp
102

Compilers actually do this already, every time they see a throw — every throw is implicitly "unlikely" and taints its whole codepath. We want to do something similar for [[unlikely]].

Heh, fun story, one of the use cases I was told was critical when discussing this feature in the committee came from the safety-critical space where they want to mark catch blocks as being likely in order to force the optimizer to do its best on the failure path, because that's the code path which had the hard requirements for bailing out quickly (so the elevator doesn't kill you, car doesn't crash, etc). They needed to be able to write:

try {
  ...
} catch (something& s) [[likely]] {
  ...
} catch (...) [[likely]] {
  ...
}

My understanding is that ... have exactly the same surface reading

I can read the standard to get to that understanding, but am skeptical that programmers will be able to predict what these attributes do anywhere they're written for anything but contrived examples with that interpretation. I think limiting the impact to only being at branch points is at least explicable behavior, otherwise programmers are left reasoning through code like:

auto what = []{ FATAL("malformed packet"); };
if (foo) {
  what();
} else {
  if (something) {
    what();
  }
}

(Does this make the foo branch unlikely? What impact is there on the else branch?) Now replace the lambda with a call to an inline function, or a GNU statement expression, an RAII constructor you can see the definition of, etc and ask the same questions.

My fear is that by going with the least restrictive design, programmers will eventually treat these attributes the same way they treat use of the register keyword -- something to be avoided because the best you can hope for is the implementation ignores it.

This isn't to say that I'm *opposed* to that approach, it's more that I'm skeptical of it unless we have some implementation agreement between major compilers about how to set user expectations appropriately.

Mordante marked 8 inline comments as done.Aug 23 2020, 11:08 AM
Mordante added inline comments.
clang/include/clang/Basic/Attr.td
1292

It give the error "clang/include/clang/Basic/Attr.td:265:7: error: Standard attributes must have valid version information."

clang/test/SemaCXX/attr-likelihood.cpp
102

Compilers actually do this already, every time they see a throw — every throw is implicitly "unlikely" and taints its whole codepath. We want to do something similar for [[unlikely]].

Heh, fun story, one of the use cases I was told was critical when discussing this feature in the committee came from the safety-critical space where they want to mark catch blocks as being likely in order to force the optimizer to do its best on the failure path, because that's the code path which had the hard requirements for bailing out quickly (so the elevator doesn't kill you, car doesn't crash, etc). They needed to be able to write:

try {
  ...
} catch (something& s) [[likely]] {
  ...
} catch (...) [[likely]] {
  ...
}

Interesting use case, not sure how feasible it would be to implement it. I don't know how much it would affect the implementation if the exception handling. I tested with GCC and ICC and this doesn't compile. Looking at the standard it shouldn't compile. catch http://eel.is/c++draft/except.pre requires a compound statement http://eel.is/c++draft/stmt.block. (Obviously the attribute can be placed in the compound statement.)

A use-case I found where placing the attribute inside the compound statement is when using goto. I think this might be especially interesting when using it in C code.

int foo()
{
  int result = 0;
  struct bar *ptr;
  
  ptr = malloc(sizeof(struct bar));
  if(!ptr) {
      result = -ENOMEM;
      goto err;
  }

  ...
  if(!f(ptr)) {
    result = -ENOMDEV;
    goto err_free;
  }

[[unlikely]] err_free:
  free(ptr);  
[[unlikely]] err:
   return result; 
}

This allows to mark the error paths as [[unlikely]] by only placing the attribute on the label and every goto will mark the branch unlikely.

102

Ultimately, I think we should strive for consistency between implementations, and I think we should file an issue with WG21 to clarify the wording once we figure out how implementations want to interpret questions like these.

I agree having consistency would be a good. I did some testing.
It seems GCC's implementation is quite mature and does indeed handle the attributes inside the compound statements.
It seems MSVC's implementation seems a WIP.
It seems ICC accepts the attribute but it doesn't seem to change its branch weights.

What would be the best way to discuss this question with the other vendors? I assume I first need to write a paper, but would it then be best to discuss this with the other vendors or directly send it to the committee?

Mordante added inline comments.Aug 23 2020, 11:08 AM
clang/include/clang/Basic/Attr.td
1292

I'll have a look at the C2x changes. Just curious do you know whether there's a proposal to add this to C2x?

There isn't one currently because there is no implementation experience with the attributes in C. This is a way to get that implementation experience so it's easier to propose the feature to WG14.

Nice to know. It seems the C2x wasn't at straightforward as I hoped, so I didn't implement it yet. I intend to look at it later. I first want the initial part done to see whether this is the right approach.

I had another look and I got it working in C.

Mordante updated this revision to Diff 287264.Aug 23 2020, 11:29 AM
Mordante marked 2 inline comments as done.

Addresses review comments:

  • Adds support for c2x using [[clang::likely]] and [[clang::unlikely]]
  • Remove warnings about conflicting likelihoods, except those required by the Standard
  • The attributes can now be placed inside the compound statement in of the ThenStmt and ElseStmt

The current implementation matches GCC's behaviour. This behaviour aligns best with my interpretation of the standard and GCC seems to have the most mature implementation. I would like to get the current implementation in Clang and proceed to work on the missing pieces. In parallel we can align with the other vendors and, if needed, adjust our implementation.

The current implementation matches GCC's behaviour. This behaviour aligns best with my interpretation of the standard and GCC seems to have the most mature implementation. I would like to get the current implementation in Clang and proceed to work on the missing pieces. In parallel we can align with the other vendors and, if needed, adjust our implementation.

I agree with this direction.

clang/include/clang/Basic/Attr.td
1292

If you leave the version number off entirely, it defaults to 1.

clang/include/clang/Basic/AttrDocs.td
1692

I think this paragraph is misleading because the attribute no longer impacts the statement, it impacts the *entire branch the statement is contained within*.

1697

You should clarify here that you CAN annotate multiple statements in the same block and what we do about it. e.g.,

if (foo) {
  [[likely]];
  [[unlikely]];
  bar();
}

Note, I doubt this will happen as obviously as this example. More likely what will happen is:

#define ERROR_UNLESS(x) [[unlikely]] assert(x)

if (foo) {
  [[likely]];
  ERROR_UNLESS(baz == 12);
  bar();
}
1700

Spell out PGO and "or optimization" -> "or at optimization"

1727

Something else we should document is what our behavior is when the attribute is not immediately inside of an if or else statement. e.g.,

void func() { // Does not behave as though specified with [[gnu::hot]]
  [[likely]];
}

void func() {
  if (x) {
    {
      [[likely]]; // Does this make the if branch likely?
      SomeRAIIObj Obj;
    }
  } else {
  }
}

void func() {
  if (x) {
    int y = ({[[likely]]; /* Does this make the if branch likely? */ 1;});
  } else {
  }
}
clang/lib/CodeGen/CGStmt.cpp
656

//! -> //

691

Is early return the correct logic? Won't that give surprising results in the case the programmer has both attributes in the same compound statement? I think we need to look over all the statements in the current block, increment a counter when we hit [[likely]], decrement the counter when we hit [[unlikely]] and return whether the counter is 0, negative (unlikely), or positive (likely).

721

Perhaps not for this patch, but I wonder if we'd be doing users a good deed by alerting them when PGO weights do not match the attribute specified. e.g., if an attribute says "this branch is likely" and PGO shows it's unlikely, that seems like something the programmer may wish to know. WDYT?

723

I do not think conflicts should result in the first match being used (unless we're diagnosing the situation, at which point some of this code should be lifted into Sema and set a bit on an AST node that can be read during codegen time), so this comment may need to be updated.

733

LH = LH?

741

You should return None here.

clang/lib/Sema/SemaStmtAttr.cpp
345

I believe you can drop the calls to getSpelling() and just pass in the Attr* directly. The diagnostic engine takes care of printing the attribute properly.

clang/test/SemaCXX/attr-likelihood.cpp
102

What would be the best way to discuss this question with the other vendors? I assume I first need to write a paper, but would it then be best to discuss this with the other vendors or directly send it to the committee?

I would discuss directly with vendors first so that vendors can talk about what situations they are able to support and how, and then a paper could come second which details any suggested changes or open questions.

That said, for this particular patch, let's go with allowing it on arbitrary statements as GCC is doing. We can always revisit the decision later if we find an intractable problem.

Mordante marked 4 inline comments as done.Aug 26 2020, 11:18 AM
Mordante added inline comments.
clang/include/clang/Basic/AttrDocs.td
1727

Something else we should document is what our behavior is when the attribute is not immediately inside of an if or else statement. e.g.,

void func() { // Does not behave as though specified with [[gnu::hot]]
  [[likely]];
}

No a few days ago I wondered whether it makes sense, but the [[gnu::hot]] should be at the declaration of the function and not in the body.
So I think this doesn't make sense and the [[likely]] here will be ignored.

void func() {

if (x) {
  {
    [[likely]]; // Does this make the if branch likely?
    SomeRAIIObj Obj;
  }
} else {
}

}

Yes this should work, the attribute will recursively visit compound statements.

void func() {

if (x) {
  int y = ({[[likely]]; /* Does this make the if branch likely? */ 1;});
} else {
}

}

Not in this patch. I'm working on more improvements to make switch statements working. I tested it with my current WIP and there it does work.
So in the future this will work.

clang/lib/CodeGen/CGStmt.cpp
691

Yes here it accepts the first likelihood it finds and accepts that as the wanted likelihood. I also start to doubt whether that's wanted. In my current switch WIP I issue an diagnostic if there's a conflict between the likelihood diagnostics and ignore them. I feel that's a better approach. This diagnostic is added to the -Wignored-attributes group, which is shown by default.

I don't like the idea of using a counter. If the attribute is "hidden" in a validation macro, adding it to a branch might suddenly change the likelihood of that branch.

721

I already investigated before and there's a diagnostic warn_profile_data_misexpect when using __builtin_expect so I expect I can reuse existing code. So I want to have a look at it, but I first want to get the more important parts of the likelihood attributes working properly.

723

Agreed. In my current switch WIP I already started experimenting with moving the likelihood selection to Sema and store the likelihood in the AST bits.

aaron.ballman added inline comments.Aug 27 2020, 5:52 AM
clang/include/clang/Basic/AttrDocs.td
1727

So I think this doesn't make sense and the [[likely]] here will be ignored.

Great, that matches the behavior I was hoping for.

Yes this should work, the attribute will recursively visit compound statements.

Great!

Not in this patch. I'm working on more improvements to make switch statements working.

I'm not certain what that example had to do with switch statement, but just as an FYI, I'd like this to work initially (assuming it's not overly hard) only because GNU statement expressions show up rather frequently in macros as an alternative form of the do ... while (0); pattern, which can have likely/unlikely path sensitivity.

clang/lib/CodeGen/CGStmt.cpp
691

I don't like the idea of using a counter. If the attribute is "hidden" in a validation macro, adding it to a branch might suddenly change the likelihood of that branch.

That's definitely true and is also a concern of mine. I don't know what the correct answer is here by going on a per-statement basis. None of the solutions are satisfying and all of them leave this feature with surprising behavior.

721

So I want to have a look at it, but I first want to get the more important parts of the likelihood attributes working properly.

SGTM

Mordante planned changes to this revision.Aug 27 2020, 9:43 AM
Mordante marked 15 inline comments as done.

I fixed the issues locally. Before I resubmit them I'd rather discuss in D86559 the direction we want to take, so we can avoid an additional review cycle.

clang/include/clang/Basic/Attr.td
1292

Yes and that gives the same error. So the default value doesn't "compile". I assume that's intentional to avoid setting a date of 1 for standard attributes. So we could keep it at 2 or switch back to 201803. Which value do you prefer?

clang/include/clang/Basic/AttrDocs.td
1692

I'll update the documentation including the other remarks added.

1727

It doesn't have anything to do with the switch by itself. However while working on the switch I needed a better way to follow the path of executed statements to handle break and return in a switch. As a side-effect it also handle the last example. So it will be possible to handle this case in a future patch. (However I think we should first discuss the approach to take in D86559.)

clang/lib/CodeGen/CGStmt.cpp
733

Has an extra LH =, thanks for catching it.

aaron.ballman added inline comments.Aug 27 2020, 11:22 AM
clang/include/clang/Basic/Attr.td
1292

Ah, yeah, you're right (sorry for the noise). I think 2 is fine for now unless you find that the new patch actually hits enough of the important cases from the standard to justify 201803. We can figure that out with the updated patch series.

clang/include/clang/Basic/AttrDocs.td
1727

Ah, thank you for the explanation.

Mordante marked 7 inline comments as done.Aug 29 2020, 8:08 AM
Mordante added inline comments.
clang/include/clang/Basic/Attr.td
1292

For now let's keep it at 2, this patch won't implement the switch. Once the switch works the iteration statements still need to be done, but I think they aren't as important as the selection statements.

Mordante updated this revision to Diff 288781.Aug 29 2020, 9:05 AM
Mordante marked an inline comment as done.

Reworked the patch to match the behaviour as discussed in D86559.

  • The likelihood attributes only have an effect when used directly on the ThenStmt and ElseStmt.
  • Conflicting attributes on the ThenStmt and ElseStmt generate a diagnostic.
  • Moved the likelihood determination from the CodeGen to the Sema. This requires the state to be stored in the AST.
  • Updated the AST functions for the new likelihood bits.
  • Updated the documentation.

Note currently there's no diagnostic for ignored attributes, this will be added in a future patch.

aaron.ballman added inline comments.Sep 1 2020, 11:06 AM
clang/include/clang/Basic/AttrDocs.td
1693

How about: This is done by marking the branch substatement with one of the two attributes.

1697

This sentence and the next one seem to say opposite things. I think this should read:

Annotating the 'true' and 'false' branch of an 'if' statement with the same likelihood attribute will result in a diagnostic and the attributes are ignored on both branches.

1705

How about: In Clang, the attributes will be ignored if they're not placed on the substatement of a selection statement.

1726

The formatting on this comment looks off (maybe different indentation, or tabs vs spaces).

1736

if or else statement.

clang/lib/AST/JSONNodeDumper.cpp
1452

I don't think this change should be necessary because the attribute should be written out for the AST node already. It's worth verifying though.

clang/lib/AST/TextNodeDumper.cpp
927

Similar here.

clang/lib/Sema/SemaStmt.cpp
618

returs -> returns

Mordante marked 8 inline comments as done.Sep 1 2020, 12:05 PM

Thanks again for your feedback. Once we agree on what to do with the Likelihood state in the output I'll finish the changes and submit a new patch.

clang/include/clang/Basic/AttrDocs.td
1697

I think the current wording is correct, but I like your wording better. So I'll use your wording.

1705

Your wording is correct for an if statement, but it will be different in a switch statement. I'll use your wording but replace selection statement with if or else statement. I like the substatement instead of first statement.

1726

Good catch, I thought I disabled using tabs.

clang/lib/AST/JSONNodeDumper.cpp
1452

They are in the AST. I added them to make it clear what the status is. I marked a case with a conflict, due to the same attribute in both branches. When the attribute is there but not on the substatement it's ignored. I'm not sure whether this is too important in the JSON output. I don't know who consumes it. So I don't feel to strongly about removing this from the JSON output.

What's your opinion?

clang/lib/AST/TextNodeDumper.cpp
927

I'm against removing it from the text output. This is be best way to determine how the attributes are parsed and processed. Especially when an attribute is ignored the likelihood may be different form what's expected. The Node->hasInitStorage()'s status is also visible in the AST and written out, so there are more examples of "duplicated" information.

Do you agree to keep it here?

clang/test/AST/ast-dump-if-json.cpp
1333

Here is the likelihood status in the If statement.

1429

The attribute in the ThenStmt.

1483

The attribute in the ElseStmt.

rsmith added a comment.Sep 1 2020, 1:37 PM

Looking specifically for attributes in the 'then' and 'else' cases of an if seems like a fine first pass at this, but I think this is the wrong way to lower these attributes in the longer term: we should have a uniform treatment of them that looks for the most recent prior branch and annotates it with weights. We could do that either by generating LLVM intrinsic calls that an LLVM pass lowers, or by having the frontend look backwards for the most recent branch and annotate it. I suppose it's instructive to consider a case such as:

void mainloop() noexcept; // probably terminates by calling `exit`
void f() {
  mainloop();
  [[unlikely]];
  somethingelse();
}

... where the [[unlikely]]; should probably cause us to split the somethingelse() out into a separate basic block that we mark as cold, but should not cause f() itself or its call to mainloop() to be considered unlikely -- except in the case where mainloop() can be proven to always terminate, in which case the implication is that it's unlikely that f() is invoked. Cases like this probably need the LLVM intrinsic approach to model them properly.

clang/include/clang/AST/Stmt.h
175–178

Do we need to store this here? The "then" and "else" cases should be an AttributedStmt holding the likelihood attribute, so duplicating that info here seems redundant.

clang/lib/Sema/SemaStmt.cpp
578

Sema doesn't care about any of this; can you move this code to CodeGen and remove the code that stores likelihood data in the AST?

aaron.ballman added inline comments.Sep 1 2020, 1:50 PM
clang/lib/Sema/SemaStmt.cpp
578

FWIW, I asked for it to be moved out of CodeGen and into Sema because the original implementation in CodeGen was doing a fair amount of work trying to interrogate the AST for this information. Now that we've switched the design to only be on the substatement of an if/else statement (rather than an arbitrary statement), it may be that CodeGen is a better place for this again (and if so, I'm sorry for the churn).

Mordante marked 8 inline comments as done.Sep 2 2020, 10:12 AM

Looking specifically for attributes in the 'then' and 'else' cases of an if seems like a fine first pass at this, but I think this is the wrong way to lower these attributes in the longer term: we should have a uniform treatment of them that looks for the most recent prior branch and annotates it with weights. We could do that either by generating LLVM intrinsic calls that an LLVM pass lowers, or by having the frontend look backwards for the most recent branch and annotate it. I suppose it's instructive to consider a case such as:

void mainloop() noexcept; // probably terminates by calling `exit`
void f() {
  mainloop();
  [[unlikely]];
  somethingelse();
}

... where the [[unlikely]]; should probably cause us to split the somethingelse() out into a separate basic block that we mark as cold, but should not cause f() itself or its call to mainloop() to be considered unlikely -- except in the case where mainloop() can be proven to always terminate, in which case the implication is that it's unlikely that f() is invoked. Cases like this probably need the LLVM intrinsic approach to model them properly.

We indeed considered similar cases and wondered whether it was really intended to work this way. Since this approach seems a bit hard to explain to users, I changed the code back to only allow the attribute on the substatement of the if and else. For now I first want to implement the simple approach, which I expect will be the most common use case. Once finished we can see what the next steps will be.

clang/include/clang/AST/Stmt.h
175–178

Agreed. I'll remove it again.

clang/lib/Sema/SemaStmt.cpp
578

At the moment the Sema cares about it to generate diagnostics about conflicting annotations:

if(i) [[ likely]] {}
else [[likely]] {}

This diagnostic only happens for an if statement, for a switch multiple values can be considered likely.
Do you prefer to have the diagnostic also in the CodeGen?

rsmith added inline comments.Sep 2 2020, 12:28 PM
clang/include/clang/AST/Stmt.h
1105–1111

I'd suggest you order these in increasing order of likeliness: unlikely, none, likely. Then you can determine what branch weights to use by a three-way comparison of the likeliness of the then branch and the likeliness of the else branch.

clang/lib/Sema/SemaStmt.cpp
578

It looked to me like you'd reached agreement to remove that diagnostic. Are you intending to keep it?

If so, then I'd suggest you make getLikelihood a member of Stmt so that CodeGen and the warning here can both easily call it.

Mordante marked 2 inline comments as done.Sep 2 2020, 1:02 PM
Mordante added inline comments.
clang/lib/Sema/SemaStmt.cpp
578

@aaron.ballman I thought we wanted to keep this conflict warning, am I correct?
I'll add an extra function to the Stmt to test for a conflict and use that for the diagnostic in the Sema. This allows me to use LH_None for no attribute or a conflict of attributes in the CodeGen. Then there's no need for LH_Conflict and it can be removed from the enumerate.

aaron.ballman added inline comments.Sep 3 2020, 5:51 AM
clang/lib/Sema/SemaStmt.cpp
578

@aaron.ballman I thought we wanted to keep this conflict warning, am I correct?

I want to keep the property that any time the user writes an explicit annotation in their code but we decide to ignore the annotation (for whatever reason), the user gets some sort of diagnostic telling them their expectations may not be met. Because we're ignoring the attributes in the case where they're identical on both branches, I'd like to keep some form of diagnostic.

Looking specifically for attributes in the 'then' and 'else' cases of an if seems like a fine first pass at this, but I think this is the wrong way to lower these attributes in the longer term: we should have a uniform treatment of them that looks for the most recent prior branch and annotates it with weights. We could do that either by generating LLVM intrinsic calls that an LLVM pass lowers, or by having the frontend look backwards for the most recent branch and annotate it. I suppose it's instructive to consider a case such as:

void mainloop() noexcept; // probably terminates by calling `exit`
void f() {
  mainloop();
  [[unlikely]];
  somethingelse();
}

... where the [[unlikely]]; should probably cause us to split the somethingelse() out into a separate basic block that we mark as cold, but should not cause f() itself or its call to mainloop() to be considered unlikely -- except in the case where mainloop() can be proven to always terminate, in which case the implication is that it's unlikely that f() is invoked. Cases like this probably need the LLVM intrinsic approach to model them properly.

We indeed considered similar cases and wondered whether it was really intended to work this way. Since this approach seems a bit hard to explain to users, I changed the code back to only allow the attribute on the substatement of the if and else. For now I first want to implement the simple approach, which I expect will be the most common use case. Once finished we can see what the next steps will be.

+1 to the cautious approach. While I can understand how to implement what Richard is suggesting, I'm not convinced it results in readable code or that we're missing important optimization cases by using the more restrictive approach, so I'd like to get some field experience with users first.

Mordante marked an inline comment as done.Sep 4 2020, 9:08 AM
Mordante added inline comments.
clang/lib/Sema/SemaStmt.cpp
578

Good then we're on the same page.

Mordante updated this revision to Diff 289979.Sep 4 2020, 9:10 AM
Mordante marked an inline comment as done.

Addresses review comments, mainly:

  • Improving the documentation
  • Removing the AST bits since they're no longer required
aaron.ballman accepted this revision.Sep 9 2020, 10:00 AM

LGTM aside from a commenting nit. Thank you for the work on this!

clang/include/clang/AST/Stmt.h
1184

This only returns one value, so it can't return the likelihood of "the branches". How about something along the lines of:

returns the likelihood of the 'then' branch of an 'if' statement. The 'else' branch is required to determine whether both branches specify the same likelihood, which impacts the result.

This revision is now accepted and ready to land.Sep 9 2020, 10:00 AM
Mordante marked an inline comment as done.Sep 9 2020, 11:08 AM

Thanks for your feedback during the review.

clang/include/clang/AST/Stmt.h
1184

Good catch, I'll use your suggestion.

clang/include/clang/AST/Stmt.h
1184

s/impacts/affects/ ;)

Mordante marked 2 inline comments as done.Sep 9 2020, 11:48 AM
Mordante added inline comments.
clang/include/clang/AST/Stmt.h
1184

Just in time before I commit this patch :-)

This revision was automatically updated to reflect the committed changes.
Mordante marked an inline comment as done.