This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Revert "Temporarily do not drop volatile stores before unreachable"
AbandonedPublic

Authored by lebedev.ri on Jul 2 2021, 2:06 AM.

Details

Summary

This reverts commit 4e413e16216d0c94ada2171f3c59e0a85f4fa4b6,
which landed almost 10 months ago under premise that the original behavior
didn't match reality and was breaking users, even though it was correct as per
the LangRef. But the LangRef change still hasn't appeared, which might suggest
that the affected parties aren't really worried about this problem.

Please refer to disscussion in:

Perhaps, this isn't destined to land, but merely to nudge towards further disscussion and a proper resolution.

Diff Detail

Event Timeline

lebedev.ri created this revision.Jul 2 2021, 2:06 AM
lebedev.ri requested review of this revision.Jul 2 2021, 2:06 AM
nikic added a comment.Jul 2 2021, 8:34 AM

Ah yes, nothing as permanent as a temporary solution... Thanks for following up on this.

Let's wait another week for a LangRef patch to appear, otherwise this LGTM.

I'm on board with either solution stated in the comment already, the current code is a workaround, as stated.

 		-     // TODO: Either remove this code, or properly integrate the check into
		-     // isGuaranteedToTransferExecutionToSuccessor().

Ah yes, nothing as permanent as a temporary solution... Thanks for following up on this.

Let's wait another week for a LangRef patch to appear, otherwise this LGTM.

BTW, i have already previously warned that this will happen 3 months after said patch landed,
and that was half a year ago: https://reviews.llvm.org/D87399#2472707
So sure, we can wait one more week, but somehow i'm not sure that'll get their attention.

jdoerfert accepted this revision.Jul 7 2021, 8:56 PM

We actually hit this in the OpenMP device runtime. It seems nobody is interested in discussing a LangRef change. LGTM assuming there is no LangRef patch by Friday.

This revision is now accepted and ready to land.Jul 7 2021, 8:56 PM

We actually hit this in the OpenMP device runtime. It seems nobody is interested in discussing a LangRef change. LGTM assuming there is no LangRef patch by Friday.

Yep, i plan on landing this in T-24h.

We actually hit this in the OpenMP device runtime. It seems nobody is interested in discussing a LangRef change. LGTM assuming there is no LangRef patch by Friday.

As I said in the review thread of the prior patch -- I don't have a strong opinion about the LLVM IR semantics here.

But just reverting this is pretty hostile unless Clang has been updated to make C and C++ volatile stores continue to behave as the language expects first.

Just regressing Clang because no one has written the LangRef change doesn't make sense to me. I think you should fix Clang to not be buggy first if you want to pursue this (or work with the Clang devs to get such a patch), just like we would fix a buggy LLVM pass that broke real users before enabling a valid optimization that uncovered the big.

lebedev.ri edited the summary of this revision. (Show Details)Jul 8 2021, 2:45 AM
spatel added a comment.Jul 8 2021, 9:53 AM

But just reverting this is pretty hostile unless Clang has been updated to make C and C++ volatile stores continue to behave as the language expects first.

Just regressing Clang because no one has written the LangRef change doesn't make sense to me. I think you should fix Clang to not be buggy first if you want to pursue this (or work with the Clang devs to get such a patch), just like we would fix a buggy LLVM pass that broke real users before enabling a valid optimization that uncovered the big.

I agree. I don't know exactly what fixes are needed for Clang, but we do know that code like this:
https://github.com/llvm/llvm-project/blob/8cf60e61e7b0e3213b5b81039878dcf1ef18c00f/compiler-rt/lib/gwp_asan/guarded_pool_allocator.cpp#L254
( the root cause of https://llvm.org/PR47480 ) is going to break again with this change.

If the goal is to get source like that to change, then we might as well fix that first, so we can point to it as a reference example of how to fix code?

But just reverting this is pretty hostile unless Clang has been updated to make C and C++ volatile stores continue to behave as the language expects first.

Just regressing Clang because no one has written the LangRef change doesn't make sense to me. I think you should fix Clang to not be buggy first if you want to pursue this (or work with the Clang devs to get such a patch), just like we would fix a buggy LLVM pass that broke real users before enabling a valid optimization that uncovered the big.

I agree. I don't know exactly what fixes are needed for Clang, but we do know that code like this:
https://github.com/llvm/llvm-project/blob/8cf60e61e7b0e3213b5b81039878dcf1ef18c00f/compiler-rt/lib/gwp_asan/guarded_pool_allocator.cpp#L254
( the root cause of https://llvm.org/PR47480 ) is going to break again with this change.

If the goal is to get source like that to change, then we might as well fix that first, so we can point to it as a reference example of how to fix code?

I really don't know how to change that code in a meaningful way. It is already marking the store as volatile (in the C sense).

I would expect this to require extensive changes to kernel code for example, and I'm not sure there would be any appetite for those changes.

Even LLVM's own blog contains advice to use the code pattern that GWP-ASan uses: 'If you're using an LLVM-based compiler, you can dereference a "volatile" null pointer to get a crash if that's what you're looking for [...]' (http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html), and while it is talking about null pointers, the prior section seems to make it very clear this is also how to get a crash from something like a guard page which is the specific use case in question.

But just reverting this is pretty hostile unless Clang has been updated to make C and C++ volatile stores continue to behave as the language expects first.

Just regressing Clang because no one has written the LangRef change doesn't make sense to me. I think you should fix Clang to not be buggy first if you want to pursue this (or work with the Clang devs to get such a patch), just like we would fix a buggy LLVM pass that broke real users before enabling a valid optimization that uncovered the big.

I agree. I don't know exactly what fixes are needed for Clang, but we do know that code like this:
https://github.com/llvm/llvm-project/blob/8cf60e61e7b0e3213b5b81039878dcf1ef18c00f/compiler-rt/lib/gwp_asan/guarded_pool_allocator.cpp#L254
( the root cause of https://llvm.org/PR47480 ) is going to break again with this change.

If the goal is to get source like that to change, then we might as well fix that first, so we can point to it as a reference example of how to fix code?

I really don't know how to change that code in a meaningful way. It is already marking the store as volatile (in the C sense).

For example by using __builtin_trap().

I would expect this to require extensive changes to kernel code for example, and I'm not sure there would be any appetite for those changes.

Either that, or someone would have to come up with an RFC to carve out an exception specifically for volatile store to null when null pointer is defined.

Even LLVM's own blog contains advice to use the code pattern that GWP-ASan uses: 'If you're using an LLVM-based compiler, you can dereference a "volatile" null pointer to get a crash if that's what you're looking for [...]' (http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html), and while it is talking about null pointers, the prior section seems to make it very clear this is also how to get a crash from something like a guard page which is the specific use case in question.

There has been a very lengthy disscussion originating from this very message starting at https://reviews.llvm.org/D87149#2264099

But just reverting this is pretty hostile unless Clang has been updated to make C and C++ volatile stores continue to behave as the language expects first.

Just regressing Clang because no one has written the LangRef change doesn't make sense to me. I think you should fix Clang to not be buggy first if you want to pursue this (or work with the Clang devs to get such a patch), just like we would fix a buggy LLVM pass that broke real users before enabling a valid optimization that uncovered the big.

I agree. I don't know exactly what fixes are needed for Clang, but we do know that code like this:
https://github.com/llvm/llvm-project/blob/8cf60e61e7b0e3213b5b81039878dcf1ef18c00f/compiler-rt/lib/gwp_asan/guarded_pool_allocator.cpp#L254
( the root cause of https://llvm.org/PR47480 ) is going to break again with this change.

If the goal is to get source like that to change, then we might as well fix that first, so we can point to it as a reference example of how to fix code?

I really don't know how to change that code in a meaningful way. It is already marking the store as volatile (in the C sense).

The suggestion there was to change __builtin_unreachable() to __builtin_trap(). Is that not an adequate fix?

If the goal is to get source like that to change, then we might as well fix that first, so we can point to it as a reference example of how to fix code?

I really don't know how to change that code in a meaningful way. It is already marking the store as volatile (in the C sense).

We have __builtin_trap that's supposed to do something similar. If they need a SEGV specifically (i.e. this is a unix-like OS), they can use kill(getpid(), SIGSEGV).

But just reverting this is pretty hostile unless Clang has been updated to make C and C++ volatile stores continue to behave as the language expects first.

Just regressing Clang because no one has written the LangRef change doesn't make sense to me. I think you should fix Clang to not be buggy first if you want to pursue this (or work with the Clang devs to get such a patch), just like we would fix a buggy LLVM pass that broke real users before enabling a valid optimization that uncovered the big.

I agree. I don't know exactly what fixes are needed for Clang [...]

Maybe it's just me but I have not understood what needs fixing in Clang. The way I understood this, the "problem" is that people think (e.g., maybe based on the blog post from 2011) that volatile store can be used to force a trap.
As shown very nicely in https://reviews.llvm.org/D87149#2266434, the argument is mood. As long as UB is hoisted across those volatile stores, they are effectively removed.

If the goal is to get source like that to change, then we might as well fix that first, so we can point to it as a reference example of how to fix code?

We pointed them to a fix (__builtin_trap()) a while ago, didn't we?

spatel added a subscriber: hctim.Jul 8 2021, 1:44 PM

If the goal is to get source like that to change, then we might as well fix that first, so we can point to it as a reference example of how to fix code?

We pointed them to a fix (__builtin_trap()) a while ago, didn't we?

Yep, and now it's in place with (cc @hctim ):
D105654 / d458f379324967
...but there's a code comment and comment in PR47480 that says LLVM has a bug. I assume that means the LangRef has a bug, but nobody wants to change it?

As for Clang, I don't know how we'd solve that problem (without changing LLVM) other than to say (loudly) that the blog post no longer applies, and we want code to update. Now we can at least point people to the GWP-asan patch as a reference, but as noted, there is very likely going to be fallout...

If the goal is to get source like that to change, then we might as well fix that first, so we can point to it as a reference example of how to fix code?

We pointed them to a fix (__builtin_trap()) a while ago, didn't we?

Yep, and now it's in place with (cc @hctim ):
D105654 / d458f379324967
...but there's a code comment and comment in PR47480 that says LLVM has a bug. I assume that means the LangRef has a bug, but nobody wants to change it?

It depends on how you look at it. LangRef and blog post disagree, that's for sure. What is "correct" depends now. I would modify the blog post with a big note, add a clang warning, and start to delete the stores.

As for Clang, I don't know how we'd solve that problem (without changing LLVM) other than to say (loudly) that the blog post no longer applies, and we want code to update. Now we can at least point people to the GWP-asan patch as a reference, but as noted, there is very likely going to be fallout...

Thinking out load here: What about a warning in clang if we store to a 0 literal?

(still tracking towards landing this in T-12H unless delayed by further patches appearing)

If the goal is to get source like that to change, then we might as well fix that first, so we can point to it as a reference example of how to fix code?

We pointed them to a fix (__builtin_trap()) a while ago, didn't we?

Yep, and now it's in place with (cc @hctim ):
D105654 / d458f379324967
...but there's a code comment and comment in PR47480 that says LLVM has a bug. I assume that means the LangRef has a bug, but nobody wants to change it?

It depends on how you look at it. LangRef and blog post disagree, that's for sure. What is "correct" depends now. I would modify the blog post with a big note, add a clang warning, and start to delete the stores.

As for Clang, I don't know how we'd solve that problem (without changing LLVM) other than to say (loudly) that the blog post no longer applies, and we want code to update. Now we can at least point people to the GWP-asan patch as a reference, but as noted, there is very likely going to be fallout...

Thinking out load here: What about a warning in clang if we store to a 0 literal?

That could be a very simple diag, and it could be useful i suppose,
but as seen in GWP-ASan case, it won't help people potentially expecting any volatile store trapping.

That could be a very simple diag, and it could be useful i suppose,
but as seen in GWP-ASan case, it won't help people potentially expecting any volatile store trapping.

There is also a related diagnostic that needs to be fixed, since it suggests the volatile approach:

rtt.cc:2:3: warning: indirection of non-volatile null pointer will be deleted, not trap [-Wnull-dereference]
  *reinterpret_cast<int*>(0) = 0;
  ^~~~~~~~~~~~~~~~~~~~~~~~~~
rtt.cc:2:3: note: consider using __builtin_trap() or qualifying pointer with 'volatile'
This revision was landed with ongoing or failed builds.Jul 9 2021, 4:17 AM
This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Jul 9 2021, 7:24 AM

This seems fairly unfriendly to users. The commit description doesn't say why this change is being made – is there any upside to the change? Is it big enough to justify the downside? Seems weird to land this without mentioning any advantages when the downside is pretty clear :)

thakis added a comment.Jul 9 2021, 7:25 AM

Also, seems like this should go in considerable time after https://reviews.llvm.org/rGf4877c78c0fc98be47b926439bbfe33d5e1d1b6d and not around the same time so that people have time to update their codebases.

thakis added a comment.Jul 9 2021, 7:35 AM

Lifting a comment from IRC:

[10:26:49]  <thakis_> i don't have an opinion on if this should be done at all, but if there's some good reason to do it
[10:26:58]  <thakis_> then i think it should go like a) land diag b) wait a bit c) land change
[10:27:12]  <thakis_> as-is all users basically have to fix all instances of the warning before they can upgrade compilers
[10:27:51]  <thakis_> if the diag lands first, users can instead a) upgrade and disable warning b) fix warning async c) receive actual codegen change at a later date, when they're hopefully done cleaning up
[10:28:13]  <thakis_> ( https://bugs.chromium.org/p/chromium/issues/detail?id=1227705 )
[10:28:35]  <thakis_> i'd guess (but i don't know) that we have many many instances of volatile nulls, given that that's what the diag recommended up until yesterday
[10:28:43]  <thakis_> spread over many repositories
[10:29:31]  <thakis_> (…and https://reviews.llvm.org/rGf4877c78c0fc98be47b926439bbfe33d5e1d1b6d landed without review?)

Given that this landed with several people raising concerns and it making life difficult for probably all users of LLVM, I think this and the clang change should be reverted for now. If there's agreement to move forward (which isn't clear to me from this review here, but I didn't read the prior discussions you linked to), then the clang diag should go on well before the actual LLVM change. And there should be a convincing explanation for the benefits of requiring updating (likely) a significant chunk of the world's code for this in the commit message of both changes.

In any case, the current rollout strategy doesn't seem workable to me. (Happy to hear from others, though.)

So maybe revert and provide a plan how to progress with it? No need for rush. Diag patch should be reviewed properly as well.

The aggressive style of commiting important patches without consensus is definitely a bad thing for the quality of LLVM. You can be aggressive in your LLVM forks, with you experimental builds, but not in the upstream.

thakis added a comment.Jul 9 2021, 7:55 AM

To provide some data on how disruptive this is: https://bugs.chromium.org/p/chromium/issues/detail?id=1227705#c2 (Summary: A naive regex search for \bvolatile.*(NULL|nullptr).*= finds this pattern in at least 10 different repositories looking only at chromium code).

So +1 to revert for now, getting numbers on perf to decide if this is worth pursuing at all (they'd have to outweigh the disruption caused), if so then landing the clang warning, waiting a release, and only then landing the llvm change.

nikic added a comment.Jul 9 2021, 8:08 AM

@thakis I think everyone here would be happy with this being reverted as soon as someone submits a LangRef patch and RFC to llvm-dev for changing the specified behavior of volatile stores in LLVM. I'd personally also be fine with this being reverted as long as someone takes personal responsibility for providing such a LangRef patch and RFC as soon as possible.

To provide some data on how disruptive this is: https://bugs.chromium.org/p/chromium/issues/detail?id=1227705#c2 (Summary: A naive regex search for \bvolatile.*(NULL|nullptr).*= finds this pattern in at least 10 different repositories looking only at chromium code).

That sounds fun.

So +1 to revert for now,

getting numbers on perf to decide if this is worth pursuing at all (they'd have to outweigh the disruption caused),

This is irrelevant and i will not be doing that.
Please note that i'm *not* touching clang, i'm only changing llvm.
clang is not the only llvm front-end, and i'm not confident it should get to dictate
what temporary hacks that violate llvm langref the llvm contains.

Tangentially, how about we propagate this hack from llvm into clang itself?
Since all the complaints so far are about *(volatile int*)0 = ??? pattern specifically,
since we already diagnose it, we could also just codegen it as a trap directly.

if so then landing the clang warning, waiting a release, and only then landing the llvm change.

as per https://llvm.org/, July 27: release/13.x branch created, i'd be fine with reverting for now, and relanding right after the 13.0 branching.
Does this work for you?

thakis added a comment.Jul 9 2021, 8:20 AM

@thakis I think everyone here would be happy with this being reverted as soon as someone submits a LangRef patch and RFC to llvm-dev for changing the specified behavior of volatile stores in LLVM. I'd personally also be fine with this being reverted as long as someone takes personal responsibility for providing such a LangRef patch and RFC as soon as possible.

I think this mixes up several things. This breaks users of LLVM who care about that their stuff works and who've never looked at the LangRef. They shouldn't have to pay for that we don't have our house in order.

thakis added a comment.Jul 9 2021, 8:23 AM

To provide some data on how disruptive this is: https://bugs.chromium.org/p/chromium/issues/detail?id=1227705#c2 (Summary: A naive regex search for \bvolatile.*(NULL|nullptr).*= finds this pattern in at least 10 different repositories looking only at chromium code).

That sounds fun.

So +1 to revert for now,

getting numbers on perf to decide if this is worth pursuing at all (they'd have to outweigh the disruption caused),

This is irrelevant and i will not be doing that.
Please note that i'm *not* touching clang, i'm only changing llvm.
clang is not the only llvm front-end, and i'm not confident it should get to dictate
what temporary hacks that violate llvm langref the llvm contains.

Tangentially, how about we propagate this hack from llvm into clang itself?
Since all the complaints so far are about *(volatile int*)0 = ??? pattern specifically,
since we already diagnose it, we could also just codegen it as a trap directly.

That seems like a good approach to me, but it should go through normal review etc :) Don't even need the warning change then, things will just keep working, right? (But again, please revert this for now and let's land this in some organized fashion.)

if so then landing the clang warning, waiting a release, and only then landing the llvm change.

as per https://llvm.org/, July 27: release/13.x branch created, i'd be fine with reverting for now, and relanding right after the 13.0 branching.
Does this work for you?

Updating 10 repositories in 3 weeks sounds challenging. But maybe that's not even necessary if the frontend rewrites this to a call to __builtin_trap?

To provide some data on how disruptive this is: https://bugs.chromium.org/p/chromium/issues/detail?id=1227705#c2 (Summary: A naive regex search for \bvolatile.*(NULL|nullptr).*= finds this pattern in at least 10 different repositories looking only at chromium code).

That sounds fun.

So +1 to revert for now,

getting numbers on perf to decide if this is worth pursuing at all (they'd have to outweigh the disruption caused),

This is irrelevant and i will not be doing that.
Please note that i'm *not* touching clang, i'm only changing llvm.
clang is not the only llvm front-end, and i'm not confident it should get to dictate
what temporary hacks that violate llvm langref the llvm contains.

Tangentially, how about we propagate this hack from llvm into clang itself?
Since all the complaints so far are about *(volatile int*)0 = ??? pattern specifically,
since we already diagnose it, we could also just codegen it as a trap directly.

That seems like a good approach to me, but it should go through normal review etc :)

Don't even need the warning change then, things will just keep working, right?

Nope. The idea is to move this hack from llvm to clang side, and then revert it on 28'th.

(But again, please revert this for now and let's land this in some organized fashion.)

if so then landing the clang warning, waiting a release, and only then landing the llvm change.

as per https://llvm.org/, July 27: release/13.x branch created, i'd be fine with reverting for now, and relanding right after the 13.0 branching.
Does this work for you?

Updating 10 repositories in 3 weeks sounds challenging. But maybe that's not even necessary if the frontend rewrites this to a call to __builtin_trap?

I'm only proposing to move the hack, not make it not-a-hack.
Does that sound ok to you?

thakis added a comment.Jul 9 2021, 8:28 AM

Don't even need the warning change then, things will just keep working, right?

Nope. The idea is to move this hack from llvm to clang side, and then revert it on 28'th.

What? Why? Why not keep it in clang permanently if you're not willing to collect data on any upside?

thakis added a subscriber: rsmith.Jul 9 2021, 8:45 AM

(I reverted this and the clang bit in 97c675d3d43fe02a0ff0a8350d79344c845758af to "unbreak" trunk while discussions on the way forward are ongoing.)

Adding @rsmith who might have an opinion on doing this in the frontend.

Possible paths forward:

  1. Make clang lower volatile literal nullptr derefs to __builtin_abort, land LLVM change as-is
    • Maybe add option to control whether to do this lowering or not
    • Advantages:
      • Keeps user code working as expected without work on part of users
      • LLVM change can go in quickly
    • Disadvantages:
      • Embedded folks might want to have a way to write to 0?
      • Most of LLVM's "nullptr is magic" logic probably exists for C, so it's a bit strange to make this not fire for C frontends
      • Other frontends still exposed to the change, but some of them (Rust, Swift) probably rely on nullptrs less. Not sure about Fortran
  1. Make clang diag volatile literal nullptr derefs and ask users to migrate all their code, then land LLVM some appropriate time later
  1. *handwave* Make langref self-consistent (which seems to be the motivation for this change? Still not quite sure) by making other cases less aggressive instead of this case more aggressive
  1. Keep things as is, update some documentation

From an uninformed distance (1) and (4) seem alright.

rsmith added a comment.EditedJul 9 2021, 2:20 PM

(I reverted this and the clang bit in 97c675d3d43fe02a0ff0a8350d79344c845758af to "unbreak" trunk while discussions on the way forward are ongoing.)

Adding @rsmith who might have an opinion on doing this in the frontend.

Example from upthread: https://github.com/llvm/llvm-project/blob/8cf60e61e7b0e3213b5b81039878dcf1ef18c00f/compiler-rt/lib/gwp_asan/guarded_pool_allocator.cpp#L254

It is not feasible to change the frontend to handle cases such as this differently, by emitting a trap directly, because the analysis required to determine whether a trap is emitted is too sophisticated. For the same reason it's not feasible for the frontend to warn on such cases in general.

The expectation of programmers using volatile is that you actually get a memory access, and volatile memory accesses are sometimes used in practice in cases where that memory access might have side effects on the state of the machine, meaning that execution -- in general -- does *not* necessarily continue past the instruction. Is there some way we can model that set of expectations in the LLVM IR semantics after this patch? If not, and assuming we want to keep this direction, could we add such a mechanism -- either a flag on the instruction to say that it might trap, or some kind of barrier intrinsic, or something? I think it would be fine for Clang to emit such a marker on every volatile load or store it emits; I'd even be happy to have a frontend flag to control whether we assume that volatile memory operations always complete or not, but I personally don't think it's worth adding such a thing given that volatile memory operations are extremely rare and the cost of the barrier is likely (almost) always negligible.

lebedev.ri added a comment.EditedJul 9 2021, 2:29 PM

Thank you for taking a look!

(I reverted this and the clang bit in 97c675d3d43fe02a0ff0a8350d79344c845758af to "unbreak" trunk while discussions on the way forward are ongoing.)

Adding @rsmith who might have an opinion on doing this in the frontend.

Example from upthread: https://github.com/llvm/llvm-project/blob/8cf60e61e7b0e3213b5b81039878dcf1ef18c00f/compiler-rt/lib/gwp_asan/guarded_pool_allocator.cpp#L254

It is not feasible to change the frontend to handle cases such as this differently, by emitting a trap directly, because the analysis required to determine whether a trap is emitted is too sophisticated. For the same reason it's not feasible for the frontend to warn on such cases in general.

The expectation of programmers using volatile is that you actually get a memory access, and volatile memory accesses are sometimes used in practice in cases where that memory access might have side effects on the state of the machine, meaning that execution -- in general -- does *not* necessarily continue past the instruction. Is there some way we can model that set of expectations in the LLVM IR semantics after this patch? If not, and assuming we want to keep this direction, could we add such a mechanism -- either a flag on the instruction to say that it might trap, or some kind of barrier intrinsic, or something? I think it would be fine for Clang to emit such a marker on every volatile load or store it emits; I'd even be happy to have a frontend flag to control whether we assume that volatile memory operations always complete or not, but I personally don't think it's worth adding such a thing given that volatile memory operations are extremely rare and the cost of the barrier is likely (almost) always negligible.

I think we should separate two patterns:

  1. *(volatile int)0 = 0, aka let's trap
  2. volatile int* ptr; <...> *ptr = x;

Indeed, we can't do anything reasonable with the latter.
Langref states that they do not trap, D53184, so the clang is already miscompiling the source C code that expects them to trap.
Unless someone comes forward with an RFC to change the semantics, LLVM will continue to believe that they will not trap.
Any pondering around this that does not involve an actual RFC is bikeshedding.

For the former, the pattern is obvious and the solution is simple.

rsmith added a comment.Jul 9 2021, 4:47 PM

I think we should separate two patterns:

  1. *(volatile int)0 = 0, aka let's trap
  2. volatile int* ptr; <...> *ptr = x;

Indeed, we can't do anything reasonable with the latter.
Langref states that they do not trap, D53184, so the clang is already miscompiling the source C code that expects them to trap.

In the absence of the recent semantics change, are there actually end-to-end miscompilations, or do you instead mean that Clang is producing IR that the IR semantics don't guarantee to behave as expected (but do in practice)? Those are two quite different outcomes, especially given that there's code in our own compiler-rt (and presumably in lots of other places) that assumes that a volatile null dereference gives an actual null dereference even if the null pointer is not "immediate".

Unless someone comes forward with an RFC to change the semantics, LLVM will continue to believe that they will not trap.

My question stands: is there a way for a frontend to generate IR that gives the semantics that Clang expects here? Or is the outcome that, after this change to the de facto semantics (and the earlier change to the de jure semantics in D53184), LLVM IR can no longer represent C programs with the semantics that Clang expects?

If there's nothing today, we could presumably add something like an intrinsic that is not guaranteed to return and that reads inaccessible memory only, that gets lowered to a no-op, and Clang could insert a call to that after every volatile load or store. That would seem more robust to me than an ad-hoc syntactic check for only pattern 1, given we know that pattern 2 also exists in the wild, and Clang has promised that both patterns work for many years.

CC @jfb, does C and C++ actually provide the guarantee we are talking about here?

I think we should separate two patterns:

  1. *(volatile int)0 = 0, aka let's trap
  2. volatile int* ptr; <...> *ptr = x;

Indeed, we can't do anything reasonable with the latter.
Langref states that they do not trap, D53184, so the clang is already miscompiling the source C code that expects them to trap.

Actually, i failed to convey the full detail of the problem.
This patch only matters when said volatile store and all the instructions after it must transfer execution to the unreachable.
So

store volatile
throw 1 ; execution of the normal, non-exception codeflow stops here, unreachable is not reached, execution continues in exception handler
unreachable

is fine and the store can not be dropped, while in

store volatile
printf("how did we get here?"); ; nothrow willreturn
unreachable

both the call and the store will be consumed by the unreachable.

Also, there aren't two patterns, there are three:

  1. *(volatile int)0 = 0, aka let's trap
  2. volatile int* ptr; <...> *ptr = x; // that's it, we've successfully trapped, i.e. what we've had in GWP-ASAN in https://github.com/llvm/llvm-project/blob/8cf60e61e7b0e3213b5b81039878dcf1ef18c00f/compiler-rt/lib/gwp_asan/guarded_pool_allocator.cpp#L254
  3. volatile int* ptr; <...> *ptr = x; /* did we trap? */ *ptr = y; /* what about now ? */

Point 1. and point 2. are the same, they are specifically written with the intention of trapping, the only difference is the address.
Point 3 is the problematic one. Does anyone actually expects that *any* volatile store could potentially "arbitrairly" trap? CC @jfb

Point 1. is "addressed" by D105728.
Point 2. potentially needs salvaging in the user code, iff it ends with unreachable, by either making it the first snippet, or ensuring that it does not end in unreachable (by e.g. ending with an explicit trap, or with an endless loop that isn't mustprogress)
Point 3 is rather hopeless.

In the absence of the recent semantics change, are there actually end-to-end miscompilations,

It is easy to construct: https://godbolt.org/z/fKch73G4M
While we have to compute 333 / (42 * width) before the first volatile store,
we also compute the 666 / (84 * width) before said first volatile store,
so if the 84 * width would overflow (resulting in poison as per https://llvm.org/docs/LangRef.html#id108),
then the 666 / (84 * width) would be UB (as per https://llvm.org/docs/LangRef.html#poison-values)
but if the first volatile store would have trapped, then we would't have experienced an UB,
yet we did. Not sure if i'm mixing up C++ and IR semantics..

or do you instead mean that Clang is producing IR that the IR semantics don't guarantee to behave as expected (but do in practice)?

Mainly, yes.

Those are two quite different outcomes, especially given that there's code in our own compiler-rt (and presumably in lots of other places) that assumes that a volatile null dereference gives an actual null dereference even if the null pointer is not "immediate".

Unless someone comes forward with an RFC to change the semantics, LLVM will continue to believe that they will not trap.

My question stands: is there a way for a frontend to generate IR that gives the semantics that Clang expects here? Or is the outcome that, after this change to the de facto semantics (and the earlier change to the de jure semantics in D53184), LLVM IR can no longer represent C programs with the semantics that Clang expects?

If there's nothing today, we could presumably add something like an intrinsic that is not guaranteed to return and that reads inaccessible memory only, that gets lowered to a no-op, and Clang could insert a call to that after every volatile load or store. That would seem more robust to me than an ad-hoc syntactic check for only pattern 1, given we know that pattern 2 also exists in the wild, and Clang has promised that both patterns work for many years.

Hm, load too? :] We already happily drop those: https://godbolt.org/z/8Kdo7z3jd

Let me just add a few points to the discussion:

  • __builtin_unreachable() in LLVM is equivalent to triggering UB.
  • Program executions that trigger UB at any point are meaningless. For the purposes of checking if an optimization is correct, we only need to consider the inputs for which no UB is triggered,
  • load/store of null is UB. And this fact is used by several analyses and optimizations in LLVM.

The discussion is now around volatile load/stores of null. There are at least 3 possible semantics:

  1. UB (current)
  2. mandatory abort
  3. non-deterministic abort

The 3rd option doesn't fix the present issue, so I would discard it. Since we allow derefs to sometimes not abort execution, we could declare a trace of deref null followed by unreachable UB.
For option 2), it adds a new requirement for CPUs that LLVM didn't previously have (that they must abort on null deref). If this a desired change, it must be clearly documented and announced in the release notes. Plus are we sure all CPUs abort on null deref in all execution modes?

Let's just not try to make temporary hacks definitive or else no one will accept temporary hacks again. Which are sometimes useful and important. But they must be temporary.

Let me just add a few points to the discussion:

  • __builtin_unreachable() in LLVM is equivalent to triggering UB.
  • Program executions that trigger UB at any point are meaningless. For the purposes of checking if an optimization is correct, we only need to consider the inputs for which no UB is triggered,
  • load/store of null is UB. And this fact is used by several analyses and optimizations in LLVM.

The discussion is now around volatile load/stores of null. There are at least 3 possible semantics:

  1. UB (current)
  2. mandatory abort
  3. non-deterministic abort

The 3rd option doesn't fix the present issue, so I would discard it. Since we allow derefs to sometimes not abort execution, we could declare a trace of deref null followed by unreachable UB.
For option 2), it adds a new requirement for CPUs that LLVM didn't previously have (that they must abort on null deref). If this a desired change, it must be clearly documented and announced in the release notes. Plus are we sure all CPUs abort on null deref in all execution modes?

Let's just not try to make temporary hacks definitive or else no one will accept temporary hacks again. Which are sometimes useful and important. But they must be temporary.

Sorry, forgot to add that option 2) makes some optimizations of LLVM wrong. We can't propagate backwards that a dereferenced ptr is non-null, only forwards. So if the semantics is changed, a few analyses would need to be changed, including ValueTracking, function attribute inference, etc.

lebedev.ri reopened this revision.Jul 11 2021, 10:02 AM
This revision is now accepted and ready to land.Jul 11 2021, 10:02 AM
efriedma added a comment.EditedJul 11 2021, 3:59 PM

My perspective:

In LLVM IR, there are basically the following possibilities for how we can handle volatile ops that trap:

  1. Volatile ops can have side-effects equivalent to an external call.
  2. Execution may not continue after the volatile operation (i.e. isGuaranteedToTransferExecutionToSuccessor() is false), but they can't modify arbitrary memory. They only modify the address in question, and possibly "inaccessible" memory. (i.e. we can use AliasAnalysis::alias() to determine if a non-volatile memory operation aliases a volatile memory operation).
  3. Execution will continue after the volatile operation, and they can't modify arbitrary memory. They only modify the address in question, and possibly "inaccessible" memory.
  4. They can't trap, and they can't modify memory. They only modify "inaccessible" memory (MMIO registers, etc.).
  5. (3), but don't DCE volatile ops followed by unreachable. ("unreachable" can't be hoisted past volatile ops, but other sorts of undefined behavior, like divide by zero, can.)

(1) is terrible for performance, if you're writing embedded code that actually needs to access MMIO registers regularly. It prevents a bunch of commonly expected transformations. Maybe useful for supporting exotic compilation modes like clang-cl's /EHa or /volatile:ms.

(3) is basically the minimum necessary to support the C standard's rules for volatile accesses. (2) is basically that minimum, plus volatile ops are allowed to kill the process. I'm not sure what the impact would be overall. Probably acceptable for most applications; isGuaranteedToTransferExecutionToSuccessor() isn't critical for the most important optimizations.

(4) is basically promising that you're only using volatile ops to access MMIO. Maybe some programming languages would want to expose this as an intrinsic for embedded programming.


This patch implements (3).

As far as I can tell, all of icc, msvc, gcc, and clang currently implement (5). See https://godbolt.org/z/cz7jPh5b3 . So it's basically standardized, right? :)

Link in previous comment should have been https://godbolt.org/z/cz7jPh5b3 .

@efriedma your proposal (5) is to change the semantics of builtin_unreachable() so it lowers to e.g. @llvm.trap rather than unreachable. I don't think having 2 builtin_* with the same semantics is any useful. Plus gcc documentation clearly says that __builtin_unreachable triggers UB.

Defining volatile ops as producing side-effects alone is not sufficient to prevent them from being deleted if followed by unreachable. We also delete function calls that precede unreachable. Anything that precedes UB can be deleted.
If we were to special case volatile ops, the implications are not local. We would need to consider that a called function may execute volatile ops.

CompCert's correctness theorem disallows removing operations before unreachable, but adopting that model in LLVM is a significant change. And I don't really see the point when users can just read the documentation and use the right intrinsic?

I agree with @eli.friedman's breakdown of options here: https://reviews.llvm.org/D105338#2870156

That said, programmers generally assume that volatile effects will same-thread happens-before other things. Yes it's misplaced, yes the LangRef says otherwise. So we have a few choices:

  1. Argue with developers that they're wrong. This is often a great way to get into shouting matches, and get developers to despise compiler developers. "But the LangRef was clear!!!" really isn't a compelling argument. In particular, "it's always worked before" is a strong argument, "clang can't even model your semantics" as @rsmith points out, and "OK but you haven't given me a better tool to do what I want, and I don't have a good way to find all the places you're saying my code is 'broken' now".
  2. Change the LangRef to reflect developer expectations. This might cause performance degradation, for example when mixing __builtin_unreachable with volatile, or when doing really basic MMIO accesses, or when __builtin_unreachable now has to assume that *any* function call might contain volatile, so can't eliminate opaque calls that come before it.

I don't have a strong opinion on which direction we should go here. We're talking about interactions between volatile which is ill-defined (and defined by what programmers feel it should do), and other features such as __builtin_unreachable. Without good perf data, and data about at-scale breakage, it's hard to find the right solution.

I do think we should get data, and be empathetic towards our developers :)

efriedma added a comment.EditedJul 12 2021, 10:47 AM

@efriedma your proposal (5) is to change the semantics of __builtin_unreachable() so it lowers to e.g. @llvm.trap rather than unreachable. I don't think having 2 __builtin_* with the same semantics is any useful. Plus gcc documentation clearly says that __builtin_unreachable triggers UB.

Sorry I wasn't quite clear. (5) is just declaring "whatever transforms compilers currently implement are legal, and whatever transforms they don't are illegal". I don't think there's any way to turn it into a semantically consistent model.

Defining volatile ops as producing side-effects alone is not sufficient to prevent them from being deleted if followed by unreachable. We also delete function calls that precede unreachable.

We only delete calls before unreachable if they're willreturn. The idea behind (2) is that we can mess with whether volatile ops are considered willreturn without allowing other side-effects.

Thank you everyone for taking a look!

@efriedma your proposal (5) is to change the semantics of __builtin_unreachable() so it lowers to e.g. @llvm.trap rather than unreachable. I don't think having 2 __builtin_* with the same semantics is any useful. Plus gcc documentation clearly says that __builtin_unreachable triggers UB.

Sorry I wasn't quite clear. (5) is just declaring "whatever transforms compilers currently implement are legal, and whatever transforms they don't are illegal". I don't think there's any way to turn it into a semantically consistent model.

Defining volatile ops as producing side-effects alone is not sufficient to prevent them from being deleted if followed by unreachable. We also delete function calls that precede unreachable.

We only delete calls before unreachable if they're willreturn. The idea behind (2) is that we can mess with whether volatile ops are considered willreturn without allowing other side-effects.

Since volatile store isn't modelled as trapping, we could end up outlining it into a new function, and said function would obviously be willreturn, and thus the call to said outlined function would obviously be eraseable.

D106309 has landed, so now we're in a stable state, although not really an ideal one.

I suspect the way forward here involves adding flavors of volatile at the IR level: we can let the frontend request the optimization restrictions it wants, sort of like atomic orderings.