This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Emit a call instruction instead of an invoke if the called llvm function is marked nounwind
ClosedPublic

Authored by ahatanak on Jul 15 2020, 2:11 PM.

Details

Summary

This fixes cases where an invoke is emitted, even though the called llvm function is marked nounwind, because CodeGenModule::ConstructAttributeList fails to add the attribute to the attribute list. LLVM optimization passes will turn such invokes into calls and optimize away the exception handling code, but it's better to avoid emitting that code in the front-end in the first place when the called function is known not to raise an exception.
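
For illustration, a minimal hypothetical example of the kind of call site this affects (the names are invented; this assumes a normal C++ compile with exceptions enabled):

void callee() {}                // already emitted; its llvm::Function carries nounwind

void caller() noexcept {
  // With this change, IRGen emits a plain call here instead of an invoke
  // with a landing pad, because the llvm::Function for callee is already
  // marked nounwind.
  callee();
}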

Diff Detail

Event Timeline

ahatanak created this revision.Jul 15 2020, 2:11 PM

In case it wasn't clear, the calls to mayThrow() in the test cases are needed to prevent TryMarkNoThrow from annotating the functions with nounwind, which would cause a lot of churn.

vsk accepted this revision.Jul 15 2020, 2:33 PM

Looks good to me.

clang/test/CodeGenObjCXX/os_log.mm
15–16

This comment can simply read "Check that the os_log_helper is marked nounwind."

This revision is now accepted and ready to land.Jul 15 2020, 2:33 PM
ahatanak updated this revision to Diff 278316.Jul 15 2020, 2:42 PM
ahatanak marked an inline comment as done.

Update comment in test case.

This revision was automatically updated to reflect the committed changes.
wenlei added a subscriber: wenlei.Mar 6 2023, 2:22 PM
Herald added a project: Restricted Project.Mar 6 2023, 2:22 PM
wlei added subscribers: modimo, hoy, wlei.Mar 8 2023, 3:10 PM

Hi @ahatanak

We recently hit an issue of inconsistent codegen related to this optimization. In one build, the Clang frontend generates different LLVM IR for the same function, which originally comes from a single header file. It turns out this optimization gives different results depending on the order of function definitions, which is inherently unstable.

See these two repro programs:

p1.cpp: https://godbolt.org/z/bavTYEG1x

void foo() {};
void bar() noexcept {foo();};


p2.cpp: https://godbolt.org/z/zfsnzPrE6

void foo();
void bar() noexcept {foo();};
void foo(){};

Note that the codegen for bar differs between the two: in p2.cpp, the callee (foo)'s definition comes after the caller (bar), so foo is not yet known to be nounwind when bar is emitted, and Clang still generates the invoke and the associated exception handling code.

This inconsistency affects AutoFDO: one piece of our work assigns consecutive numeric IDs to the basic blocks of the CFG, and the unstable CFGs cause the BB IDs to mismatch, so a lot of samples are lost.

We'd like to hear your feedback. We're wondering whether the FE can handle this consistently, or whether we should just leave it to the BE. Thank you in advance!

cc @hoy @modimo @wenlei

akyrtzi added a subscriber: akyrtzi.Mar 8 2023, 4:27 PM

Hi @ahatanak

We recently hit an issue of inconsistent codegen related to this optimization. In one build, the Clang frontend generates different LLVM IR for the same function, which originally comes from a single header file. It turns out this optimization gives different results depending on the order of function definitions, which is inherently unstable.

See these two repro programs:

p1.cpp: https://godbolt.org/z/bavTYEG1x

void foo() {};
void bar() noexcept {foo();};


p2.cpp: https://godbolt.org/z/zfsnzPrE6

void foo();
void bar() noexcept {foo();};
void foo(){};

Note that the codegen for bar differs between the two: in p2.cpp, the callee (foo)'s definition comes after the caller (bar), so foo is not yet known to be nounwind when bar is emitted, and Clang still generates the invoke and the associated exception handling code.

This inconsistency affects AutoFDO: one piece of our work assigns consecutive numeric IDs to the basic blocks of the CFG, and the unstable CFGs cause the BB IDs to mismatch, so a lot of samples are lost.

We'd like to hear your feedback. We're wondering whether the FE can handle this consistently, or whether we should just leave it to the BE. Thank you in advance!

cc @hoy @modimo @wenlei

To be clear, there's no miscompile, correct?

(Also, can the backend safely optimize an invoke to a linkonce_odr function that's nounwind to a call? I thought it couldn't, in case the function is de-refined to a version that's not nounwind. But the frontend can do it since it has access to the source and knows it can't be de-refined in that way?)

In any case, let's say the backend can do this optimization.

I wonder if this is just a single example, where there could be various other (header-related) peepholes that cause similar problems for stable output. IIRC, the usual Clang approach is to make as-close-to-optimal IR up front, but maybe in some situations it's desirable to delay optimizations to improve stability. Another application where that could be useful is caching.

Maybe the high level principle deserves a broader discussion on the forums. Do we want IRGen to prefer stable IR, or optimized IR? Should there be a -cc1 flag to decide (which AutoFDO could set)?

@rjmccall, any thoughts?

hoy added a comment.Mar 8 2023, 8:58 PM

(Also, can the backend safely optimize an invoke to a linkonce_odr function that's nounwind to a call? I thought it couldn't, in case the function is de-refined to a version that's not nounwind. But the frontend can do it since it has access to the source and knows it can't be de-refined in that way?)

Can you please elaborate on what de-refining does? The backend does have the ability to optimize a nounwind invoke and its landing pad into a single call instruction.

In any case, let's say the backend can do this optimization.

I wonder if this is just a single example, where there could be various other (header-related) peepholes that cause similar problems for stable output. IIRC, the usual Clang approach is to make as-close-to-optimal IR up front, but maybe in some situations it's desirable to delay optimizations to improve stability. Another application where that could be useful is caching.

Maybe the high level principle deserves a broader discussion on the forums. Do we want IRGen to prefer stable IR, or optimized IR? Should there be a -cc1 flag to decide (which AutoFDO could set)?

A flag to allow for stable IR generation would be nice, but I guess in general we do not want to lose optimization opportunities that are only available to the front end just to favor AutoFDO. The current case sounds to me like a very specific one that the backend can also handle, and so far it's the only case we have seen affecting IR stability, so I'm inclined to just defer it to the backend. WDYT?

wenlei added a comment.Mar 8 2023, 9:51 PM

I wonder if this is just a single example, where there could be various other (header-related) peepholes that cause similar problems for stable output. IIRC, the usual Clang approach is to make as-close-to-optimal IR up front, but maybe in some situations it's desirable to delay optimizations to improve stability. Another application where that could be useful is caching.

I think this nounwind propagation is a classic IPA problem, where you need a proper per-function summary first and then propagate it through the call graph to get the final per-function attribute (as the Attributor does). The frontend is not the right place to do this kind of IPA/IPO.
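
For intuition, a schematic sketch (plain C++, not LLVM's Attributor or FunctionAttrs pass) of the bottom-up propagation described above: compute a local "can the body itself throw?" summary per function, then propagate nounwind over the call graph so callees are resolved before their callers.

#include <string>
#include <unordered_map>
#include <vector>

struct FuncSummary {
  bool BodyMayThrow = false;          // local summary computed per function
  std::vector<std::string> Callees;   // call-graph edges
};

// Returns true if F can be considered nounwind: its own body cannot throw
// and every known callee is nounwind. Unknown callees and recursive cycles
// are treated conservatively as "may throw".
bool computeNoUnwind(const std::string &F,
                     const std::unordered_map<std::string, FuncSummary> &CG,
                     std::unordered_map<std::string, bool> &Memo) {
  if (auto It = Memo.find(F); It != Memo.end())
    return It->second;
  auto SI = CG.find(F);
  if (SI == CG.end())
    return Memo[F] = false;           // unknown external callee
  Memo[F] = false;                    // conservative placeholder for cycles
  bool NoUnwind = !SI->second.BodyMayThrow;
  for (const auto &Callee : SI->second.Callees)
    NoUnwind = NoUnwind && computeNoUnwind(Callee, CG, Memo);
  return Memo[F] = NoUnwind;
}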

Do we want IRGen to prefer stable IR, or optimized IR? Should there be a -cc1 flag to decide (which AutoFDO could set)?

Unstable IR is a side effect of trying to do IPA in the frontend, which is naturally going to be half-complete.

I'm not sure I follow why the invoke -> call optimization cannot be done in the mid-end. If it can, I think this should be deferred to the mid-end.

Oh, de-refining is pretty nifty / evil. This patch has background:
https://reviews.llvm.org/D18634

Since 2016, the optimizer is not allowed to do IPA on functions that can be de-refined (such as linkonce_odr functions).

Here's a hypothetical problematic scenario for the optimizer:

  • original IR for B has a throw somewhere
  • function A invokes function B
  • in this TU, B is optimized and removes exceptions, and gets marked nounwind
  • function A leverages the nounwind to turn the invoke into a call
  • function B is de-refined at link/load time: the linker chooses a *different* function B which still has a throw
  • "evil magic" happens (see the discussions around the patch linked above)
  • a crash is introduced

At first blush, it sounds like this could only be a problem if the code has UB in it. However, read the above patch (and follow-ups, and related discussion) for a variety of examples of non-UB cases where IPA on de-refineable functions introduces crashes. I don't know for sure this could be a problem for nounwind specifically, but in general the LLVM optimizer doesn't look at attributes of de-refineable functions.

(Note that if you're doing LTO (like AutoFDO), this usually won't block optimization, since at LTO time there are very few de-refineable functions (most linkonce_odr functions are internalized, and not exported as weak). So if we added a -cc1 flag to prefer "stable IR" over "frontend peepholes", it would make sense for -flto to imply it.)

On the other hand, the frontend knows the token sequence from the source language. It knows whether function B is inherently nounwind based on its ODR token sequence; in which case, it's safe to use the attribute for an IPA peephole.


BTW, I'm not personally against removing this peephole from the frontend (even without a flag), or limiting it somehow to cases where it doesn't make IR output unstable. I like the idea of stable IRGen output.

Nevertheless, it feels like removing IPA-based peepholes from Clang in the name of stable IRGen output is a change in direction, which might deserve a discussion in the forums rather than in a particular patch review.

clang marks the called function foo in p1.cpp as nounwind here: https://github.com/llvm/llvm-project/blob/main/clang/lib/CodeGen/CodeGenFunction.cpp#L1284

clang can also mark a function declaration as nounwind based on the information in the source code, for example, when it is annotated with __attribute__((pure)).

I haven't read everything discussed in https://reviews.llvm.org/D18634 yet, but it seems like it's safe to do this optimization when the called function is linkonce_odr. If clang or llvm's optimization determines one version of the function doesn't throw, then other versions of the same function can't throw either.

But it looks like clang doesn't do the right thing when foo is weak. clang emits a call instead of an invoke when it compiles the following code:

int foo() __attribute__((weak, pure));
int bar() noexcept { return foo();};

There are *some* properties we can still assume about linkonce_odr functions despite them being replaceable at link time. The high-level language guarantee we're starting from is that the source semantics of all versions of the function are identical. The version of the function we're looking at has been transformed from the original source — it is, after all, now LLVM IR, not C/C++ — but it has presumably faithfully preserved the source semantics. We can therefore rely on any properties of the semantics that are required to be preserved by transformation, which includes things like "does it terminate", "what value does it return", "what side effects does it perform", and so on. What we can't rely on are properties of the implementation that are not required to be preserved by transformation, like whether or not it uses a certain argument — transformations are permitted to change that.

The output-stability argument is an interesting one. The critical thing here is to avoid instability on the same source. When the source is different, I mean, it'd be nice to make a best effort at stability, but even putting optimization aside, things like header processing order or template instantiation order are necessarily going to affect things like order in the function lists. That's going to affect output, at the very least in terms of object file order, but also in that we can't realistically promise that function processing order in the optimizer will *never* have any impact. Our interprocedural passes generally try to work in call-dependency order, but that's not a perfect tree, and function order inevitably comes into it.

With all that said, I don't feel strongly that we need to preserve this frontend optimization if it's causing real problems.

hoy added a comment.Mar 9 2023, 10:00 AM

Oh, de-refining is pretty nifty / evil. This patch has background:
https://reviews.llvm.org/D18634

Since 2016, the optimizer is not allowed to do IPA on functions that can be de-refined (such as linkonce_odr functions).

Here's a hypothetical problematic scenario for the optimizer:

  • original IR for B has a throw somewhere
  • function A invokes function B
  • in this TU, B is optimized and removes exceptions, and gets marked nounwind
  • function A leverages the nounwind to turn the invoke into a call
  • function B is de-refined at link/load time: the linker chooses a *different* function B which still has a throw
  • "evil magic" happens (see the discussions around the patch linked above)
  • a crash is introduced

At first blush, it sounds like this could only be a problem if the code has UB in it. However, read the above patch (and follow-ups, and related discussion) for a variety of examples of non-UB cases where IPA on de-refineable functions introduces crashes. I don't know for sure this could be a problem for nounwind specifically, but in general the LLVM optimizer doesn't look at attributes of de-refineable functions.

(Note that if you're doing LTO (like AutoFDO), this usually won't block optimization, since at LTO time there are very few de-refineable functions (most linkonce_odr functions are internalized, and not exported as weak). So if we added a -cc1 flag to prefer "stable IR" over "frontend peepholes", it would make sense for -flto to imply it.)

On the other hand, the frontend knows the token sequence from the source language. It knows whether function B is inherently nounwind based on its ODR token sequence; in which case, it's safe to use the attribute for an IPA peephole.


BTW, I'm not personally against removing this peephole from the frontend (even without a flag), or limiting it somehow to cases where it doesn't make IR output unstable. I like the idea of stable IRGen output.

Nevertheless, it feels like removing IPA-based peepholes from Clang in the name of stable IRGen output is a change in direction, which might deserve a discussion in the forums rather than in a particular patch review.

There are *some* properties we can still assume about linkonce_odr functions despite them being replaceable at link time. The high-level language guarantee we're starting from is that the source semantics of all versions of the function are identical. The version of the function we're looking at has been transformed from the original source — it is, after all, now LLVM IR, not C/C++ — but it has presumably faithfully preserved the source semantics. We can therefore rely on any properties of the semantics that are required to be preserved by transformation, which includes things like "does it terminate", "what value does it return", "what side effects does it perform", and so on. What we can't rely on are properties of the implementation that are not required to be preserved by transformation, like whether or not it uses a certain argument — transformations are permitted to change that.

The output-stability argument is an interesting one. The critical thing here is to avoid instability on the same source. When the source is different, I mean, it'd be nice to make a best effort at stability, but even putting optimization aside, things like header processing order or template instantiation order are necessarily going to affect things like order in the function lists. That's going to affect output, at the very least in terms of object file order, but also in that we can't realistically promise that function processing order in the optimizer will *never* have any impact. Our interprocedural passes generally try to work in call-dependency order, but that's not a perfect tree, and function order inevitably comes into it.

With all that said, I don't feel strongly that we need to preserve this frontend optimization if it's causing real problems.

Thanks for the detailed explanation about de-refining!

I feel a bit confused about linkonce_odr. From the LLVM IR reference I see the definition of

linkonce_odr, weak_odr
Some languages allow differing globals to be merged, such as two functions with different semantics. Other languages, such as C++, ensure that only equivalent globals are ever merged (the “one definition rule” — “ODR”). Such languages can use the linkonce_odr and weak_odr linkage types to indicate that the global will only be merged with equivalent globals. These linkage types are otherwise the same as their non-odr versions.

It sounds to me that at link time only equivalent symbols can replace each other. Then de-refining some of those equivalent symbols should not affect their semantics as far as nothrow is concerned? Just as @rjmccall pointed out, the C++ language guarantee we're starting from is that the source semantics of all versions of the function are identical.

That said, the LLVM optimizer does not strictly subsume the front-end because of how it fails to handle linkonce_odr functions as in https://reviews.llvm.org/D18634. I'm wondering how common the linkonce_odr linkage is for C++. In @wlei's example, none of the functions there is linkonce_odr. Is there a particular source-level annotation that specifies functions to be linkonce_odr?

Discussing a path to stable IR generation in general in the forum would be great. In the meantime I'd like to propose removing this specific peephole to unblock AutoFDO, if nobody objects.

In D83906#4181981, @hoy wrote:

That said, the LLVM optimizer does not strictly subsume the front-end because of how it fails to handle linkonce_odr functions as in https://reviews.llvm.org/D18634. I'm wondering how common the linkonce_odr linkage is for C++. In @wlei's example, none of the functions there is linkonce_odr. Is there a particular source-level annotation that specifies functions to be linkonce_odr?

In C++, you get linkonce_odr all over the place. It's basically all functions that are defined in C++ headers that are available for inlining.

  • any function marked inline
  • any function in a class/struct whose declaration is its definition (approximately all templated code)

A few exceptions:

  • If a function is explicitly instantiated (e.g., member functions of T<int> if template class T<int>;), it gets weak_odr, which IIRC cannot be de-refined?
  • If a function has local linkage (like free functions with static inline), it gets internal, which cannot be de-refined.
  • If a function is marked inline inside an extern "C" block, it gets available_externally. This can also be de-refined (but without ODR, you wouldn't be tempted to optimize based on its attributes anyway).
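
For concreteness, a small hypothetical header-style example; the linkage noted in each comment is what Clang typically emits for that form, per the list above:

// example.h (hypothetical)
inline int inl() { return 1; }            // linkonce_odr

struct S {
  int member() { return 2; }              // defined in-class: linkonce_odr
};

template <class T> struct Tmpl {
  int get() { return 3; }                 // linkonce_odr when implicitly instantiated
};
template struct Tmpl<int>;                // explicit instantiation: weak_odr

static inline int local() { return 4; }   // local linkage: internal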

It sounds to me that at link time only equivalent symbols can replace each other. Then de-refining some of those equivalent symbols should not affect their semantics as far as nothrow is concerned? Just as @rjmccall pointed out, the C++ language guarantee we're starting from is that the source semantics of all versions of the function are identical.

The rule is subtly different. Only symbols that are source-equivalent can replace each other. But they aren't necessarily equivalent to the function you see, which may have been refined by optimization.

Here's a concrete example. Say we have function maybe_nounwind that is not nounwind at the source level, and a catch_all function that wraps it.

// Defined in header.
extern std::atomic<int> Global;

// LLVM: linkonce_odr
inline int maybe_nounwind(int In) {
  int Read1 = Global;
  int Read2 = Global;
  if (Read1 != Read2)
    throw 0;

  return /* Big, non-inlineable computation on In */;
}

// Defined in source.
// LLVM: nounwind
int catch_all(int In) {
  try {
    return maybe_nounwind(In);
  } catch (...) {
    return -1;
  }
}

There's no UB here, since comparing two atomic loads is allowed. In rare cases, an unoptimized maybe_nounwind could throw, if another thread is changing the value of Global between the two loads.

But the optimizer will probably CSE the two atomic loads since it's allowed to assume that both loads happen at the same time. This refines maybe_nounwind. It'll turn into IR equivalent to:

// Defined in header. Then optimized.
// LLVM: linkonce_odr nounwind readnone
inline int maybe_nounwind(int In) {
  return /* Big, non-inlineable computation on In */;
}

// Defined in source. Then optimized.
// LLVM: nounwind
int catch_all(int In) {
  try {
    return maybe_nounwind(In);
  } catch (...) {
    return -1;
  }
}

It's important that catch_all is NOT optimized based on maybe_nounwind's new nounwind attribute. At link time, it's possible for the linker to choose an unoptimized copy of maybe_nounwind. Just in case it does, catch_all needs to keep its try/catch block, since unoptimized maybe_nounwind can throw. Similarly, catch_all should not be marked readnone, even though the refined/optimized maybe_nounwind is readnone, since a de-refined copy reads from memory.

There are *some* properties we can still assume about linkonce_odr functions despite them being replaceable at link time. The high-level language guarantee we're starting from is that the source semantics of all versions of the function are identical. The version of the function we're looking at has been transformed from the original source — it is, after all, now LLVM IR, not C/C++ — but it has presumably faithfully preserved the source semantics. We can therefore rely on any properties of the semantics that are required to be preserved by transformation, which includes things like "does it terminate", "what value does it return", "what side effects does it perform", and so on. What we can't rely on are properties of the implementation that are not required to be preserved by transformation, like whether or not it uses a certain argument — transformations are permitted to change that.

  • At IRGen time, you know the LLVM attributes have not been adjusted after the optimizer refined the function's behaviour. It should be safe to have IPA peepholes, as long as IRGen's other peepholes don't refine behaviour and add attributes based on that.
  • In the optimizer, if you're looking at a de-refineable function, you don't know which attributes come directly from the source and which were implied by optimizer refinements. You can't trust you'll get the same function attributes at runtime.
hoy added a comment.Mar 9 2023, 12:03 PM

In C++, you get linkonce_odr all over the place. It's basically all functions that are defined in C++ headers that are available for inlining.

On the other hand, the frontend knows the token sequence from the source language. It knows whether function B is inherently nounwind based on its ODR token sequence; in which case, it's safe to use the attribute for an IPA peephole.

Thanks for the detailed explanation again! As you pointed out previously, linkonce_odr is something the front end can optimize. I'm wondering why the front end can be confident that the linker will not replace the current definition with something else.

In D83906#4182428, @hoy wrote:

In C++, you get linkonce_odr all over the place. It's basically all functions that are defined in C++ headers that are available for inlining.

On the other hand, the frontend knows the token sequence from the source language. It knows whether function B is inherently nounwind based on its ODR token sequence; in which case, it's safe to use the attribute for an IPA peephole.

Thanks for the detailed explanation again! As you pointed out previously, linkonce_odr is something the front end can optimize. I'm wondering why the front end can be confident that the linker will not replace the current definition with something else.

The frontend has generated unrefined IR, with all side effects from the must-be-ODR-equivalent source still present. It's not until an optimizer gets at it that side effects can be refined away. (Unless the IRGen peepholes are powerful enough to refine away side effects, but I don't believe IRGen does that.)

Since the IR from IRGen is unrefined (still has all side effects present in the source), whatever the linker/loader chooses cannot gain "extra" side effects through de-refinement.

  • At IRGen time, you know the LLVM attributes have not been adjusted after the optimizer refined the function's behaviour. It should be safe to have IPA peepholes, as long as IRGen's other peepholes don't refine behaviour and add attributes based on that.
  • In the optimizer, if you're looking at a de-refineable function, you don't know which attributes come directly from the source and which were implied by optimizer refinements. You can't trust you'll get the same function attributes at runtime.

Hmm. I see what you're saying, but it's an interesting question how it applies here. In principle, the optimizer should not be changing the observable semantics of functions, which certainly includes things like whether the function throws. Maybe the optimizer can only figure out that a function doesn't throw in one TU, but if it "figures that out" and then a function with supposedly the same semantics actually does throw — not just retains the static ability to throw on a path that happens not to be taken dynamically, but actually throws at runtime — then arguably something has gone badly wrong. As I recall, the de-refinement discussion was originally about properties that are *not* invariant to optimization in this way, things like whether the function uses one of its arguments. Those properties are not generally considered to be part of the function's externally-observable semantics.

Of course, that's making a lot of assumptions about both what transformations are legal and to what extent they can be observed. All bets are off the second you have a single transformation that's observable in code. For example, we have a C++ optimization that promotes scoped heap allocations to the stack; that can definitely change whether exceptions are thrown, and then you can handle that exception and change return values, trigger extra side effects, and so on. I don't think anyone wants to argue that we shouldn't do that optimization. Even more simply, fast-math optimization can certainly change return values; and of course *anything* can change semantics under SEH.
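
As a concrete illustration of the heap-allocation point (a hypothetical example, not taken from the review): the unoptimized function below can throw std::bad_alloc, but allocation elision (permitted since C++14) can remove the short-lived allocation entirely, and with it the only throwing path.

// Hypothetical example. Unoptimized, the 'new' can throw std::bad_alloc;
// after allocation elision or heap-to-stack promotion, the function cannot
// throw at all -- an optimization that observably changes whether it throws.
int scoped_sum(int a, int b) {
  int *p = new int(a + b);
  int result = *p;
  delete p;
  return result;
}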

Even if we just need to assume that that's always going to be possible in LLVM — that there will always be optimizations in play that can arbitrarily change observable semantics — maybe we can at least be a little more principled about them? It's still true that the vast majority of transformations in LLVM cannot trigger arbitrary changes to source semantics, at least when SEH isn't in use. Most transformations that change semantics are pretty narrow in practice — they don't touch most functions — so instead of conservatively assuming that *any* function might have been altered, it's probably profitable to track that on specific functions. That would at least eliminate this artificial boundary between the frontend and the optimizer: the optimizer would have the information it would need to do this analysis on unaltered functions.

hoy added a comment.Mar 9 2023, 1:52 PM
In D83906#4182428, @hoy wrote:

In C++, you get linkonce_odr all over the place. It's basically all functions that are defined in C++ headers that are available for inlining.

On the other hand, the frontend knows the token sequence from the source language. It knows whether function B is inherently nounwind based on its ODR token sequence; in which case, it's safe to use the attribute for an IPA peephole.

Thanks for the detailed explanation again! As you pointed out previously, linkonce_odr is something the front end can optimize. I'm wondering why the front end can be confident that the linker will not replace the current definition with something else.

The frontend has generated unrefined IR, with all side effects from the must-be-ODR-equivalent source still present. It's not until an optimizer gets at it that side effects can be refined away. (Unless the IRGen peepholes are powerful enough to refine away side effects, but I don't believe IRGen does that.)

Since the IR from IRGen is unrefined (still has all side effects present in the source), whatever the linker/loader chooses cannot gain "extra" side effects through de-refinement.

As far as I know, the optimizer IPO pass that infers function attributes (i.e., InferFunctionAttrsPass) is placed at the very beginning of the optimization pipeline. Does this sound to you like the side effects computed for linkonce_odr functions there can be trusted by the rest of the pipeline?

  • At IRGen time, you know the LLVM attributes have not been adjusted after the optimizer refined the function's behaviour. It should be safe to have IPA peepholes, as long as IRGen's other peepholes don't refine behaviour and add attributes based on that.
  • In the optimizer, if you're looking at a de-refineable function, you don't know which attributes come directly from the source and which were implied by optimizer refinements. You can't trust you'll get the same function attributes at runtime.

Hmm. I see what you're saying, but it's an interesting question how it applies here. In principle, the optimizer should not be changing the observable semantics of functions, which certainly includes things like whether the function throws. Maybe the optimizer can only figure out that a function doesn't throw in one TU, but if it "figures that out" and then a function with supposedly the same semantics actually does throw — not just retains the static ability to throw on a path that happens not to be taken dynamically, but actually throws at runtime — then arguably something has gone badly wrong.

I believe in my example, it's kind of the reverse. Only one TU *remembers* that the function can throw; the other one "forgets" because it has optimized its variant not to throw.

Maybe it's useful to note that, while maybe_nounwind has no UB, whether it throws or not depends on thread timing, and is generally non-reproducible (run it twice, you can get different results). In the TU that forgets, the optimizer is choosing to assume that the two adjacent atomic loads happen so quickly that no store happens in between; choosing the thread timing where there's no store to contend with. This is a valid refinement of the original source semantics -- optimizers are allowed to CSE adjacent atomic loads.

As I recall, the de-refinement discussion was originally about properties that are *not* invariant to optimization in this way, things like whether the function uses one of its arguments. Those properties are not generally considered to be part of the function's externally-observable semantics.

The example described in the referenced de-refinement commit is where a function that writes to memory is refined to readnone. I think my nounwind example above is analogous.

Here's the original from https://reviews.llvm.org/D18634:

For instance, FunctionAttrs cannot assume a comdat function is
actually readnone even if it does not have any loads or stores in
it; since there may have been loads and stores in the "original
function" that were refined out in the currently visible variant, and
at the link step the linker may in fact choose an implementation with
a load or a store. As an example, consider a function that does two
atomic loads from the same memory location, and writes to memory only
if the two values are not equal. The optimizer is allowed to refine
this function by first CSE'ing the two loads, and then folding the
comparison to always report that the two values are equal. Such a
refined variant will look like it is readonly. However, the
unoptimized version of the function can still write to memory (since
the two loads can result in different values), and selecting the
unoptimized version at link time will retroactively invalidate
transforms we may have done under the assumption that the function
does not write to memory.

So your argument is that it would not be possible to recognize that we're doing such an optimization and mark the function as having had a possible semantics change?

In D83906#4182847, @hoy wrote:

As far as I know, the optimizer IPO pass that infers function attributes (i.e., InferFunctionAttrsPass) is placed at the very beginning of the optimization pipeline. Does this sound to you like the side effects computed for linkonce_odr functions there can be trusted by the rest of the pipeline?

Depends what you mean by "trusted". It assumes the attributes accurately describe the function it sees. The properties promised there will apply if/when the code is inlined. But, since the commit in 2016, it doesn't trust that they fully describe the source semantics, so IPA ignores them when the function is not inlined.

Note that the optimizer doesn't know if its input IR has already been optimized. Is this the first optimizer that has run on the IR, or could side effects have been refined away already? E.g., if the optimization pipeline in question runs at LTO time, the compile-time optimization pipeline has already run.

dexonsmith added a subscriber: sanjoy.EditedMar 9 2023, 2:37 PM

So your argument is that it would not be possible to recognize that we're doing such an optimization and mark the function as having had a possible semantics change?

I suspect it would be error-prone to do that precisely. I'd bet there are a variety of hard-to-reason about examples. Originally, when @sanjoy was first describing this problem (to me and others), his examples all had UB in the original code (e.g., reading and writing to globals in different threads). Eventually he invented the adjacent-atomic-load device, described above, which does not rely on UB in the original code. I just assume there are more devices out there that we don't know about or understand.

Maybe it would be useful to do it imprecisely? E.g., have all transformation passes mark all functions as changed (or drop a pristine attribute)? Then at least you know whether it has come directly from IRGen.

Not sure if it's valuable enough though. I don't think the regressions were as bad as we expected. It seems okay for the optimizer to delay propagating attributes from de-refineable functions until you have the export list for the link (and non-exported symbols can be internalized).


Sorry... this seems to be pretty off topic, in a way, although maybe not many people know the de-refinement ins and outs so maybe this is useful.

My original points:

  • Moving this to the middle end isn't easy (currently, it doesn't have the info it needs, although John might have a proposal for providing it)
  • Dropping frontend peepholes in favour of stable IR output seems novel and might deserve a forum discussion (I'd be in favour, personally)
sanjoy added a comment.Mar 9 2023, 3:14 PM

I just assume there are more devices out there that we don't know about or understand.

I don't totally understand the broader discussion, but malloc(4) == nullptr is another gadget. This is optimized to false by LLVM even though at runtime it can be false or true depending on the state of the heap.
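
For reference, a minimal sketch of that gadget (the function name is invented for illustration):

#include <cstdlib>

// LLVM folds this comparison to false: the allocation is unused apart from
// the null check, so the optimizer is allowed to assume it succeeds and
// remove it. At runtime, an un-optimized malloc(4) could still return
// nullptr (or leak 4 bytes), depending on the state of the heap.
bool alloc_failed() {
  return std::malloc(4) == nullptr;
}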

hoy added a comment.Mar 9 2023, 5:11 PM
In D83906#4182847, @hoy wrote:

As far as I know, the optimizer IPO pass that infers function attributes (i.e., InferFunctionAttrsPass) is placed at the very beginning of the optimization pipeline. Does this sound to you like the side effects computed for linkonce_odr functions there can be trusted by the rest of the pipeline?

Depends what you mean by "trusted". It assumes the attributes accurately describe the function it sees. The properties promised there will apply if/when the code is inlined. But, since the commit in 2016, it doesn't trust that they fully describe the source semantics, so IPA ignores them when the function is not inlined.

Note that the optimizer doesn't know if its input IR has already been optimized. Is this the first optimizer that has run on the IR, or could side effects have been refined away already? E.g., if the optimization pipeline in question runs at LTO time, the compile-time optimization pipeline has already run.

Wondering if we can come up with a way to tell the optimizer about that, e.g., through a new module flag. When it comes to LTO, the selection of linkonce_odr symbols should already have been done, and the optimizer may be able to recompute the attributes based on pre-LTO attributes, or at least we could limit the IPO to a single module, which should still do a better job than the FE does?

In D83906#4183453, @hoy wrote:

Wondering if we can come up with a way to tell the optimizer about that, e.g., through a new module flag. When it comes to LTO, the selection of linkonce_odr symbols should already have been done, and the optimizer may be able to recompute the attributes based on pre-LTO attributes, or at least we could limit the IPO to a single module, which should still do a better job than the FE does?

I don't think there's much point in passing anything to LTO. There are very few linkonce_odr symbols in LTO, since LTO has the advantage of an export list from the link. Symbols not on the export list are internalized (they're given local linkage).

hoy added a comment.Mar 11 2023, 1:57 PM
In D83906#4183453, @hoy wrote:

Wondering if we can come up with a way to tell the optimizer about that, e.g., through a new module flag. When it comes to LTO, the selection of linkonce_odr symbols should already have been done, and the optimizer may be able to recompute the attributes based on pre-LTO attributes, or at least we could limit the IPO to a single module, which should still do a better job than the FE does?

I don't think there's much point in passing anything to LTO. There are very few linkonce_odr symbols in LTO, since LTO has the advantage of an export list from the link. Symbols not on the export list are internalized (they're given local linkage).

That sounds to me like an opportunity to get a broader IPO done precisely in the prelink optimizer, as long as we find a way to tell it the incoming IR has source fidelity. What do you think about the idea of introducing a module flag? Maybe it's worth discussing in the forum as a follow-up to introducing a cc1 flag for stable IR gen.

In D83906#4186887, @hoy wrote:
In D83906#4183453, @hoy wrote:

Wondering if we can come up with a way to tell the optimizer about that, e.g., through a new module flag. When it comes to LTO, the selection of linkonce_odr symbols should already have been done, and the optimizer may be able to recompute the attributes based on pre-LTO attributes, or at least we could limit the IPO to a single module, which should still do a better job than the FE does?

I don't think there's much point in passing anything to LTO. There are very few linkonce_odr symbols in LTO, since LTO has the advantage of an export list from the link. Symbols not on the export list are internalized (they're given local linkage).

That sounds to me like an opportunity to get a broader IPO done precisely in the prelink optimizer, as long as we find a way to tell it the incoming IR has source fidelity. What do you think about the idea of introducing a module flag? Maybe it's worth discussing in the forum as a follow-up to introducing a cc1 flag for stable IR gen.

I'm not sure I'm following.

The prelink optimizer will already be internalizing (i.e., NOT exporting) these symbols. That should be enough. AFAICT, it's non-LTO pipelines that might have headroom after this is reverted.

I'm also not sure what the module flag would be for. If "this module has source fidelity", it won't work, because the gadgets I'm aware of are implemented in function passes (probably -instcombine?). A function pass isn't allowed to touch module state. Were you thinking of a different module flag? (But, I repeat, I think LTO pipelines have nothing to worry about anyway.)

hoy added a comment.Mar 13 2023, 9:10 AM
In D83906#4186887, @hoy wrote:
In D83906#4183453, @hoy wrote:

Wondering if we can come up with a way to tell the optimizer about that, e.g., through a new module flag. When it comes to LTO, the selection of linkonce_odr symbols should already have been done, and the optimizer may be able to recompute the attributes based on pre-LTO attributes, or at least we could limit the IPO to a single module, which should still do a better job than the FE does?

I don't think there's much point in passing anything to LTO. There are very few linkonce_odr symbols in LTO, since LTO has the advantage of an export list from the link. Symbols not on the export list are internalized (they're given local linkage).

That sounds to me like an opportunity to get a broader IPO done precisely in the prelink optimizer, as long as we find a way to tell it the incoming IR has source fidelity. What do you think about the idea of introducing a module flag? Maybe it's worth discussing in the forum as a follow-up to introducing a cc1 flag for stable IR gen.

I'm not sure I'm following.

The prelink optimizer will already be internalizing (i.e., NOT exporting) these symbols. That should be enough. AFAICT, it's non-LTO pipelines that might have headroom after this is reverted.

By prelink I meant the optimizer run by Clang; the one run by the linker is usually called the postlink optimizer. As you pointed out, the postlink optimizer is unlikely to see many linkonce_odr functions because of the internalization done right before it. But the prelink optimizer, which is basically a non-LTO optimizer, will still see them.

We actually didn't see the expected wins with AutoFDO ThinLTO after disabling this specific FE peephole, which I'm guessing might be due to the lack of such a peephole in the prelink optimizer.

I'm also not sure what the module flag would be for. If "this module has source fidelity", it won't work, because the gadgets I'm aware of are implemented in function passes (probably -instcombine?). A function pass isn't allowed to touch module state. Were you thinking of a different module flag? (But, I repeat, I think LTO pipelines have nothing to worry about anyway.)

One of the common places where an invoke is promoted to a call is in SimplifyCFG (https://github.com/llvm/llvm-project/blob/207854b07dd9bd0d79add49bc5af17f1aabc752f/llvm/lib/Transforms/Utils/Local.cpp#L2498), which is based on function attributes that are inferred by InferFunctionAttrsPass. That pass is the first optimization pass in the pipeline, so as long as we can get it right, the downstream optimizations should be safe. The module flag I was thinking of would inform the InferFunctionAttrsPass module pass that the IR it's seeing is unrefined.
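
For illustration, a minimal sketch of what consulting such a flag could look like; the flag name ("clang.unrefined-ir") and the gating idea are assumptions made up for this discussion, not an existing LLVM interface:

#include "llvm/IR/Constants.h"
#include "llvm/IR/Metadata.h"
#include "llvm/IR/Module.h"

// Hypothetical helper: returns true if the frontend tagged this module as
// unrefined, i.e. straight out of IRGen with no refining optimizations run
// yet. A pass like InferFunctionAttrsPass could check this before trusting
// attributes on de-refineable (e.g. linkonce_odr) functions.
static bool hasUnrefinedIR(const llvm::Module &M) {
  if (auto *Flag = llvm::mdconst::extract_or_null<llvm::ConstantInt>(
          M.getModuleFlag("clang.unrefined-ir")))
    return Flag->getZExtValue() != 0;
  return false;
}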