This is an archive of the discontinued LLVM Phabricator instance.

[LICM] Only create load in pre-header when promoting load.
ClosedPublic

Authored by fhahn on Apr 10 2022, 2:26 PM.

Details

Summary

When only a store is sunk, there is no need to create a load in the
pre-header, as the result of the load will never get used.

The dead load can can introduce UB, if the function is marked as
writeonly.

Fixes #51248.

Diff Detail

Unit TestsFailed

Event Timeline

fhahn created this revision.Apr 10 2022, 2:26 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 10 2022, 2:26 PM
fhahn requested review of this revision.Apr 10 2022, 2:26 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 10 2022, 2:26 PM
nikic added inline comments.Apr 11 2022, 2:18 AM
llvm/lib/Transforms/Scalar/LICM.cpp
2180

Do we need to add the poison value here at all? If we don't add it, shouldn't SSAUpdater avoid creating the phi altogether and directly use the value stored in the loop?

fhahn updated this revision to Diff 421874.Apr 11 2022, 4:29 AM
fhahn marked an inline comment as done.

Update patch to only add available value when needed.

fhahn added inline comments.Apr 11 2022, 4:30 AM
llvm/lib/Transforms/Scalar/LICM.cpp
2180

We don't need to add the poison value. We still need the phi node in the test case, because the store doesn't dominate the exit. The SSA updater uses undef instead of poison.

nikic accepted this revision.Apr 11 2022, 5:35 AM

LGTM

This revision is now accepted and ready to land.Apr 11 2022, 5:35 AM

I think the patch is not the full fix, as far as I understand. It's a good improvement, perf & correctness wise, though, but doesn't fix #51248.

There's one missing case: the load is needed and the function can't load memory (writeonly, readnone, etc?). In that case the optimization should bail out.

fhahn added a comment.Apr 11 2022, 7:38 AM

I think the patch is not the full fix, as far as I understand. It's a good improvement, perf & correctness wise, though, but doesn't fix #51248.

There's one missing case: the load is needed and the function can't load memory (writeonly, readnone, etc?). In that case the optimization should bail out.

I think at the moment, the load will only be generated if the source already had a load that is guaranteed to execute. So hoisting that in a write only function should be fine, as it is already UB.

I think the patch is not the full fix, as far as I understand. It's a good improvement, perf & correctness wise, though, but doesn't fix #51248.

There's one missing case: the load is needed and the function can't load memory (writeonly, readnone, etc?). In that case the optimization should bail out.

I think at the moment, the load will only be generated if the source already had a load that is guaranteed to execute. So hoisting that in a write only function should be fine, as it is already UB.

OK then, thank you!
LGTM

This revision was landed with ongoing or failed builds.Apr 11 2022, 7:45 AM
This revision was automatically updated to reflect the committed changes.
fhahn added inline comments.Apr 11 2022, 10:36 AM
llvm/lib/Transforms/Scalar/LICM.cpp
2180

Hm, it looks like adding the available value unconditionally was needed for the code later on to preserve LCSSA form. Otherwise SSAUpdater may introduce non-LCSSA uses later on.

I am planning on re-committing with a poison value for the pre-header to fix this.

After this got recommitted, our bot has issue during stage2 with llvm-tablgen crashing, have you seen this elsewhere?

Oh I see you already reverted the recommit, nevermind.

In case it helps, I was diagnosing a regression in Rust that bisected to the recommit of this when the re-revert landed. https://buildkite.com/llvm-project/rust-llvm-integrate-prototype/builds/9758 has the failure, though I imagine you'd want some un-optimized IR for the test case or something. If you'd like that please let me know and I can get something together. Thanks!

rsmith added a subscriber: rsmith.Apr 13 2022, 1:38 PM

If you're looking for a testcase for the latest failure, this change appears to cause a miscompile in llvm::VersionTuple::tryParse, at least in our internal bootstrap builds.

fhahn reopened this revision.Apr 20 2022, 11:48 AM
This revision is now accepted and ready to land.Apr 20 2022, 11:48 AM
fhahn updated this revision to Diff 423986.Apr 20 2022, 11:48 AM

I think this needs another look. AFAICT the original version was causing some mis-compiles, because *if* the object we store to is function local (alloca), we sink the store even if the store is not guarnateed to execute (tests added in 5e54a413de1f803).

I updated the patch to introduce the load, unless the the store is guaranteed to execute. If the load is introduced, the writeonly attribute is removed. This isn't ideal, because the attribute could have knock-on effects for functions calling the modified function.

Maybe a better solution would be to clarify the wording for writeonly in langref to be clear that the attribute only considers writes to memory visible to its callers. Then we can retain writeonly even if loads to local objects are introduced. I doubt considering reads to function-local objects for writeonly adds any value.

Maybe a better solution would be to clarify the wording for writeonly in langref to be clear that the attribute only considers writes to memory visible to its callers. Then we can retain writeonly even if loads to local objects are introduced. I doubt considering reads to function-local objects for writeonly adds any value.

These are already accepted semantics for memory attributes, see e.g. the code in checkFunctionMemoryAccess(), which ignores local memory. The wording in LangRef is "If a writeonly function reads memory visible to the program", which is a bit unclear, but probably the intent of "visible to the program" here is "visible to the caller".

Maybe a better solution would be to clarify the wording for writeonly in langref to be clear that the attribute only considers writes to memory visible to its callers. Then we can retain writeonly even if loads to local objects are introduced. I doubt considering reads to function-local objects for writeonly adds any value.

These are already accepted semantics for memory attributes, see e.g. the code in checkFunctionMemoryAccess(), which ignores local memory. The wording in LangRef is "If a writeonly function reads memory visible to the program", which is a bit unclear, but probably the intent of "visible to the program" here is "visible to the caller".

Right, Alive2 implements that semantics. Memory accesses to local allocas/mallocs are ok in functions that restrict loads and/or stores.

fhahn added a comment.Apr 20 2022, 2:08 PM

Maybe a better solution would be to clarify the wording for writeonly in langref to be clear that the attribute only considers writes to memory visible to its callers. Then we can retain writeonly even if loads to local objects are introduced. I doubt considering reads to function-local objects for writeonly adds any value.

These are already accepted semantics for memory attributes, see e.g. the code in checkFunctionMemoryAccess(), which ignores local memory. The wording in LangRef is "If a writeonly function reads memory visible to the program", which is a bit unclear, but probably the intent of "visible to the program" here is "visible to the caller".

Yep, that's the ambiguity that would be good to clarify. I put up D124124.

nlopes added inline comments.Apr 20 2022, 2:10 PM
llvm/lib/Transforms/Scalar/LICM.cpp
2165

we can't remove the writeonly attribute locally. We would need to remove it transitively as it's illegal for a writeonly function to call a non-writeonly function.
We need to bail out if a non-local store is needed.

fhahn updated this revision to Diff 424602.Apr 22 2022, 2:10 PM

Update the patch to not remove the writeonly attribute, but instead verify that the writeonly semantics are not violated if we need to introduce a load, as clarified in D123473. If the store is not guaranteed to execute, a new load should only be introduced for not escaping noalias calls or allocas.

fhahn marked an inline comment as done.Apr 22 2022, 2:12 PM
fhahn added inline comments.
llvm/lib/Transforms/Scalar/LICM.cpp
2165

Yeah, I tried to allude to that in the message for the update. Updated to assert no problematic loads will be introduced in write only functions as per clarification in D124124.

I'm surprised the assertion is not an if. But if it doesn't trigger, then LGTM.

fhahn marked an inline comment as done.Apr 26 2022, 2:19 PM

I'm surprised the assertion is not an if. But if it doesn't trigger, then LGTM.

Yeah, I couldn't find an example where this would trigger. The hope is that if it triggers we will soon have a test case.

nlopes added a subscriber: regehr.Apr 26 2022, 2:30 PM

@regehr If you have time, it would be nice to fuzz LICM to check if this assertion can be triggered (after the patch lands).
It needs a function with the writeonly attribute + a pointer as input (or a global variable), a loop with a store to said pointer, a phi, and that's it (i.e., the test cases here as seed). Then some random stuff to make it crash.
Thanks!

reames added a comment.May 1 2022, 9:02 AM

Coming into this quite late, but I'm concerned about this patch and the reasoning discussed in this thread.

Consider the fact that the original program may have contained a load along a dynamically dead path inside a loop contained by a writeonly function. Even if we interpret the load as being UB when executed, the original program is well defined. If we then speculatively insert a load into the preheader - which we do based on speculation safety, even with this patch - then the load will execute in a writeonly function.

Before:

loop { 
  v = 0
  if (never_taken) {
    v = load addr
  }
  store v, addr
}

After:

v1 = load addr
loop { 
  v = 0
  if (never_taken) {
    v = v1
  }
  store v, addr
}

If we consider such a load immediate UB, then this is not a valid transform. However, consider that this transform is the same as performed by LICM (e.g. not store promotion.)(

If we actually want to prevent this transform, I think we'd have to consider loads in writeonly functions not safe to speculate.

I don't think we actually want to do that. Instead, I think we're making a mistake by considering the load to be immediate UB as opposed to simply returning poison in this case.

If we accept that semantic, then this patch becomes completely unnecessary.

nlopes added a comment.May 2 2022, 4:01 AM

I don't think we actually want to do that. Instead, I think we're making a mistake by considering the load to be immediate UB as opposed to simply returning poison in this case.

You've a point. My concern is that it's surprising to have LLVM introducing a load in a function that's not supposed to do any load. I was thinking about I/O, but those accesses should be volatile. For kernel code it may not be ok, but probably they also use volatile to prevent the compiler messing around instead of relying on writeonly.
I can give it a try. I will run Alive2 on the LLVM test suite and see if anything fails.
Nevertheless, as mentioned below, whether the load is UB or returns poison is orthogonal to this patch.

If we accept that semantic, then this patch becomes completely unnecessary.

Not true. The loop may not execute and thus you need the previous value. If the loaded value is poison, you still can't do the transformation,
Plus this patch removes useless loads; it's a perf win.

aqjune added a comment.May 2 2022, 6:42 AM

I don't think we actually want to do that. Instead, I think we're making a mistake by considering the load to be immediate UB as opposed to simply returning poison in this case.

If there is a transformation that turns load in a writeonly function into unreachable, defining it as UB would be necessary.
However, my guess is that existing optimizations won't be that aggressive; perhaps for existing opts, poison might be enough. This is still my guess, though.

reames added a comment.May 3 2022, 1:26 PM

I don't think we actually want to do that. Instead, I think we're making a mistake by considering the load to be immediate UB as opposed to simply returning poison in this case.

You've a point. My concern is that it's surprising to have LLVM introducing a load in a function that's not supposed to do any load. I was thinking about I/O, but those accesses should be volatile. For kernel code it may not be ok, but probably they also use volatile to prevent the compiler messing around instead of relying on writeonly

Something can be surprising, and yet a natural consequence of other reasonable choices. I think this is one of them.

On the topic of volatile, you raise an interesting point. We don't seem to be suppressing speculation of volatile loads in the common utility. That sounds like a bug, and likely worth a patch of it's own. It's much broader than LICM though.
.

If we accept that semantic, then this patch becomes completely unnecessary.

Not true. The loop may not execute and thus you need the previous value. If the loaded value is poison, you still can't do the transformation,

Yes, you can. That's my point. If the code which branches on it executes, then you have UB. Not before.

Plus this patch removes useless loads; it's a perf win.

Hm, this a minor argument for folding load of caller accessible memory in writeonly to poison as an instcombine rule, but is pretty unconvincing here.

I don't think we actually want to do that. Instead, I think we're making a mistake by considering the load to be immediate UB as opposed to simply returning poison in this case.

If there is a transformation that turns load in a writeonly function into unreachable, defining it as UB would be necessary.
However, my guess is that existing optimizations won't be that aggressive; perhaps for existing opts, poison might be enough. This is still my guess, though.

If you find it, let me know. We can debate removing it.

I don't think we actually want to do that. Instead, I think we're making a mistake by considering the load to be immediate UB as opposed to simply returning poison in this case.

You've a point. My concern is that it's surprising to have LLVM introducing a load in a function that's not supposed to do any load. I was thinking about I/O, but those accesses should be volatile. For kernel code it may not be ok, but probably they also use volatile to prevent the compiler messing around instead of relying on writeonly

I run LLVM's test suite under Alive2 and found no issues with the load-is-poison semantics for writeonly & argmemonly functions.

Giving it a second thought, I agree with you. These attributes are not there to prevent optimizations, but to enable them. The compiler can do whatever it wants as long as it respects the memory model. The kernel may have a different idea about how the memory model looks like, but for now we don't support such special casing like gcc does (or used to).

So I think we can change it. It's generally better to keep UB usage as low as possible, and yield poison whenever possible. So I favor the change.

As for this patch, it optimizes the case where the loop & store are guaranteed to execute, so no load is needed. This is independent of writeonly.
See this example:

define void @test(i8 %var, ptr %ptr) {
entry:
  br label %for.cond

for.cond:
  %i = phi i64 [ 0, %entry ], [ %add, %cond.end ]
  %cmp = icmp ult i64 %i, 2
  br i1 %cmp, label %for.body39, label %for.end

for.body39:
  %div = sdiv i8 %var, 3
  %cmp2 = icmp slt i8 %div, 0
  br i1 %cmp2, label %cond.true, label %cond.false

cond.true:
  br label %cond.end

cond.false:
  br label %cond.end

cond.end:
  %merge = phi i8 [ %div, %cond.true ], [ 0, %cond.false ]
  store i8 %merge, ptr %ptr, align 1
  %add = add i64 %i, 4
  br label %for.cond

for.end:
  ret void
}

LICM currently produces a load that is replaced with poison with this patch:

define void @test(i8 %var, ptr %ptr) {
entry:
  %div = sdiv i8 %var, 3
  %cmp2 = icmp slt i8 %div, 0
  %ptr.promoted = load i8, ptr %ptr, align 1
  br label %for.cond
...
fhahn added a comment.May 5 2022, 4:00 AM

I don't think we actually want to do that. Instead, I think we're making a mistake by considering the load to be immediate UB as opposed to simply returning poison in this case.

You've a point. My concern is that it's surprising to have LLVM introducing a load in a function that's not supposed to do any load. I was thinking about I/O, but those accesses should be volatile. For kernel code it may not be ok, but probably they also use volatile to prevent the compiler messing around instead of relying on writeonly

While I think that's a great first data point, the coverage of writeonly functions is probably not too extensive unfortunately. Another interesting data point may be to (randomly?) add writeonly attributes to the existing tests and verify that.

I run LLVM's test suite under Alive2 and found no issues with the load-is-poison semantics for writeonly & argmemonly functions.

Giving it a second thought, I agree with you. These attributes are not there to prevent optimizations, but to enable them. The compiler can do whatever it wants as long as it respects the memory model. The kernel may have a different idea about how the memory model looks like, but for now we don't support such special casing like gcc does (or used to).

So I think we can change it. It's generally better to keep UB usage as low as possible, and yield poison whenever possible. So I favor the change.

I think it makes sense in isolation. But I think we should be careful to consider consistency with other/related attributes. At the moment, violations of a set of related memory attributes all have the same effect, UB. If we decide to change that, I think we should update at least all related attributes (readnone, inaccessiblememonly, inaccessiblemem_or_argmemonly, argmemonly, !dereferenceable metadata,...). For attributes that forbid writing to memory, we would need to retain UB?

Changing the langref is straight-forward, but enforcing it is difficult. Is there anything we could do to be more confident when adjusting the semantics?

I added a test that triggers the assertion in the current version of the patch (3497a4f39601).

As for this patch, it optimizes the case where the loop & store are guaranteed to execute, so no load is needed. This is independent of writeonly.

Yeah, this should be an independent improvement. I am leaning towards landing the improvement without the assertion. And leave resolution of #51248 until we either adjust LangRef or add a bailout. I probably won't have time in the immediate future to work on updating the langref and possibly audit existing code.

nlopes added a comment.May 9 2022, 8:01 AM

I don't think we actually want to do that. Instead, I think we're making a mistake by considering the load to be immediate UB as opposed to simply returning poison in this case.

You've a point. My concern is that it's surprising to have LLVM introducing a load in a function that's not supposed to do any load. I was thinking about I/O, but those accesses should be volatile. For kernel code it may not be ok, but probably they also use volatile to prevent the compiler messing around instead of relying on writeonly

While I think that's a great first data point, the coverage of writeonly functions is probably not too extensive unfortunately. Another interesting data point may be to (randomly?) add writeonly attributes to the existing tests and verify that.

Agreed. There are very few tests for writeonly. The good news is that it isn't heavily used either.

I run LLVM's test suite under Alive2 and found no issues with the load-is-poison semantics for writeonly & argmemonly functions.

Giving it a second thought, I agree with you. These attributes are not there to prevent optimizations, but to enable them. The compiler can do whatever it wants as long as it respects the memory model. The kernel may have a different idea about how the memory model looks like, but for now we don't support such special casing like gcc does (or used to).

So I think we can change it. It's generally better to keep UB usage as low as possible, and yield poison whenever possible. So I favor the change.

I think it makes sense in isolation. But I think we should be careful to consider consistency with other/related attributes. At the moment, violations of a set of related memory attributes all have the same effect, UB. If we decide to change that, I think we should update at least all related attributes (readnone, inaccessiblememonly, inaccessiblemem_or_argmemonly, argmemonly, !dereferenceable metadata,...).

This kind of attributes are usually added by LLVM itself, they are not specified by hand by users. Hence we are more free to select the semantics.
There are valid reasons to stop the compiler from hoisting loads, but I don't think writeonly is the right attribute. As it was probably added by LLVM to denote a function that is functional, may write to memory, but it will always write the same thing. So repeated calls can be removed. Introducing loads that yield poison doesn't change a thing, as long as their result is not used at run-time.
So for the uses of writeonly that we have today, I think yield poison is sufficient. Nevertheless, (part of) your patch is still useful regardless of the semantics to avoid creating useless loads.

For attributes that forbid writing to memory, we would need to retain UB?

Yes, as we use the "nowrite" attributes to determine whether a function can be hoisted and etc, so we can't introduce a store. Also, introducing stores is already forbidden in many cases.

Changing the langref is straight-forward, but enforcing it is difficult. Is there anything we could do to be more confident when adjusting the semantics?

The best thing we have today is fuzzing + Alive2. @regehr has been working on the fuzzing side of things. This strategy has been reasonably successful in testing semantics. The only caveat is that Alive2 doesn't support IPO, so we disable it for the Attributor and related things to avoid false-positives. For attributes that's not ideal.

fhahn closed this revision.May 29 2022, 2:04 PM

I went ahead and recommitted the patch as 0776c48f9b7e. As per earlier discussions, the committed version doesn't fix the Github issue.