This is an archive of the discontinued LLVM Phabricator instance.

[IR] Add dead_on_unwind attribute
AbandonedPublic

Authored by nikic on Aug 9 2023, 6:02 AM.

Details

Summary

Add the dead_on_unwind attribute, which states that the caller will not read from this argument if the call unwinds. This allows eliding stores that could otherwise be visible on the unwind path, for example:

declare void @may_unwind()

define void @src(ptr noalias dead_on_unwind %out) {
    store i32 0, ptr %out
    call void @may_unwind()
    store i32 1, ptr %out
    ret void
}

define void @tgt(ptr noalias dead_on_unwind %out) {
    call void @may_unwind()
    store i32 1, ptr %out
    ret void
}

The optimization is not valid without dead_on_unwind, because the i32 0 value might be read if @may_unwind unwinds.

This attribute is primarily intended to be used on sret arguments. In fact, I previously wanted to change the semantics of sret to include this "no read after unwind" property (see D116998), but based on the feedback there it is better to keep these attributes orthogonal (sret is an ABI attribute, dead_on_unwind is an optimization attribute). This is a reboot of that change with a separate attribute.

Diff Detail

Build Status
Buildable 258151

Event Timeline

nikic created this revision.Aug 9 2023, 6:02 AM
nikic requested review of this revision.Aug 9 2023, 6:02 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 9 2023, 6:02 AM

This approach makes sense to me.

I reconfirmed for myself that this will work for C++ sret objects that need to be cleaned up: https://gcc.godbolt.org/z/hnddhTqWa It's the only way it could possibly work, but it took a minute to remind myself how it works.

Please add a test to llvm/test/Bitcode/attributes.ll for the new attribute.

cc some folks with an interest in IR: @asbirlea @jyknight @aeubanks

nikic updated this revision to Diff 548902.Aug 10 2023, 12:53 AM

Add to llvm/test/Bitcode/attributes.ll, add verifier check for pointer type.

needs a verifier check that the attribute isn't a return value attribute

a follow up change will infer this for sret args in inferattrs?

is this intended to be derivable for args of internal/private functions or mostly just for sret/something the frontend can add?

what changes would need to be made to sink the store in https://gcc.godbolt.org/z/hnddhTqWa ?

needs a verifier check that the attribute isn't a return value attribute

This is specified in Attributes.td and does not need to be separately verified (it's enforced by the IR / bitcode parser).

a follow up change will infer this for sret args in inferattrs?

No, this won't be inferred for sret and should be added by the frontend instead, similar to the other attributes that usually go with sret (noalias, dereferenceable, and for many languages also nocapture). The followup would be to emit it in clang.

is this intended to be derivable for args of internal/private functions or mostly just for sret/something the frontend can add?

The semantics allow deriving it (simplest case: All callers use "call" and pass an alloca to the argument) though I don't have immediate plans to implement this. We currently only perform RPO attribute inference in the late pipeline, at which point the attribute likely won't be useful anymore. If we have motivating cases for this, we could make it work though.

what changes would need to be made to sink the store in https://gcc.godbolt.org/z/hnddhTqWa ?

I don't think we have any store sinking support at all right now, so we'd need that first. This change is primarily aimed at call slot optimization and DSE.

I get the motivation, though, I'm not happy as this is super specific "no-read" and "after-unwind".

I'm spitballing mostly:

What about "dead-after-unwind", or is it used but not read specifically?

Different thought: memory(after_unwind: none)?

Also, do we just concatenate words?

nikic added a comment.Aug 14 2023, 1:11 AM

I get the motivation, though, I'm not happy as this is super specific "no-read" and "after-unwind".

I'm spitballing mostly:

What about "dead-after-unwind", or is it used but not read specifically?

I think the thing we really want to convey is that "memory contents do not matter after unwind". There shouldn't be "real" reads or writes after unwind, but there may be a lifetime.end, which we'd usually consider a write.

Something I suggested in previous discussions was to make this "poison on unwind", which does give us the "memory contents do not matter" property without actually forbidding further reads or writes. I think this is what we want in terms of semantics, but I find the name quite intuitive (because it's phrased in terms of an additional virtual action performed on unwind that happens to imply a restriction). Maybe calling it "dead on unwind" with those poison semantics would be reasonable? After all, a write of poison is how we tend to model lifetime end.

Different thought: memory(after_unwind: none)?

This is just for a single argument, while memory() applies to the whole function.

Also, do we just concatenate words?

We have a mix of both, I wouldn't mind making this no_read_after_unwind instead.

nikic updated this revision to Diff 549934.Aug 14 2023, 7:48 AM
nikic retitled this revision from [IR] Add noreadafterunwind attribute to [IR] Add dead_on_unwind attribute.
nikic edited the summary of this revision. (Show Details)

Rename to dead_on_unwind, specify in terms of writing a poison value on unwind.

nikic edited the summary of this revision. (Show Details)Aug 14 2023, 7:50 AM

LG from my end. Unsure if others want to chime in.

llvm/docs/LangRef.rst
1543

I think this sounds sensible.

The "underlying object of a pointer argument" is not something that's defined in LangRef. And if you mean something like getUnderlyingObject(), that has weird consequences for the semantics of memory allocation.

Maybe what you actually want is "any values stored through this pointer will not be read by the caller if this call unwinds", or something like that?

I like the name dead_on_unwind, though; I think the name suggests roughly the correct semantics.

nikic updated this revision to Diff 550267.Aug 15 2023, 4:47 AM

Replace "underlying object" with "allocated object".

nikic added a comment.Aug 15 2023, 4:53 AM

The "underlying object of a pointer argument" is not something that's defined in LangRef. And if you mean something like getUnderlyingObject(), that has weird consequences for the semantics of memory allocation.

I've replaced "underlying object" with "allocated object". I don't think we define that term either, but we do use it to specify various things, including GEP semantics.

The motivation for talking about objects here is to avoid the need to reason about extents. The semantics could be coupled to dereferenceable, but that would make require additional checks for every use of the attribute for no clear benefit.

Maybe what you actually want is "any values stored through this pointer will not be read by the caller if this call unwinds", or something like that?

That would work, with the caveat that "any values stored" needs to include stores that don't exist in the original program. For call slot optimization we may be introducing additional stores, not removing them.

I'm mostly concerned that depending on the size of the allocation leads to weird results if, for example, the caller decides to merge two allocas. For example, you have a function f(ptr dead_on_unwind, ptr), and the caller passes each argument a pointer to an alloca i32. If the caller merge those two allocas into {i32, i32}, you'd expect the semantics to stay consistent... but if the whole allocation is "dead", then merging affects stores through the second pointer. So using the "allocation" as the boundary isn't really well-defined.

Maybe dead was a bad idea after all (mea culpa).
invalidated_on_unwind(<size>), would allow to deal with the merge case and express the semantic even more clearly, I guess?

I'm mostly concerned that depending on the size of the allocation leads to weird results if, for example, the caller decides to merge two allocas. For example, you have a function f(ptr dead_on_unwind, ptr), and the caller passes each argument a pointer to an alloca i32. If the caller merge those two allocas into {i32, i32}, you'd expect the semantics to stay consistent... but if the whole allocation is "dead", then merging affects stores through the second pointer. So using the "allocation" as the boundary isn't really well-defined.

Okay, I can see the concern. In that case, I would suggest the following wording:

At a high level, this attribute indicates that the pointer argument is dead if the call unwinds, in the sense that the caller will not depend on the contents of the memory. Stores that would only be visible on the unwind path can be elided.

More precisely, the behavior is as-if the memory within the dereferenceable region of the pointer argument (if any), as well as any memory written through the pointer during the execution of the function, is overwritten with a poison value on unwind. The caller is allowed to access the affected memory, but all loads that are not preceded by a store will return poison.

This attribute cannot be applied to return values.

This should allow all the motivating optimizations without additional checks, while still allowing things like alloca merging or alloca reuse.

Does that sound reasonable?

That seems close. Not sure about the "dereferenceable region" bit; the concept makes sense, it's just not really compatible with the current meaning of "dereferenceable". "dereferenceable" doesn't currently imply it's legal to write to a region in memory, only to read from it. I think you need some other way to indicate it's safe to speculatively write to the memory.

I would remove:

The caller is allowed to access the affected memory, but all loads that are not preceded by a store will return poison.

And

dereferenceable region of the pointer argument

is super unclear, especially after you apply @efriedma's alloca merge. A size/type would make this clearer, we would not need to talk about "deref" at all, just that the region [ptr, ptr + size) is overwritten by poison on unwind.

nikic planned changes to this revision.Aug 15 2023, 1:37 PM

That seems close. Not sure about the "dereferenceable region" bit; the concept makes sense, it's just not really compatible with the current meaning of "dereferenceable". "dereferenceable" doesn't currently imply it's legal to write to a region in memory, only to read from it. I think you need some other way to indicate it's safe to speculatively write to the memory.

Yeah, you're right. This is an existing issue in call slot optimization, and it probably makes sense to address this first: https://llvm.godbolt.org/z/xa5P7TEed As written this transform would only be valid if the call is also willreturn.

I think I'll propose a writable or so attribute first, and then return back to this.

I think I'll propose a writable or so attribute first, and then return back to this.

We have deref and maybe we want to add modifiers instead. We "have"/"need" deref(globally) anyway, and we can do deref(write) as well.

nikic added a comment.Aug 16 2023, 7:01 AM

Writable attribute is up at D158081.

nikic updated this revision to Diff 558081.Nov 13 2023, 6:20 AM

Rebase and adjust LangRef wording again. We now say that any memory written in the function will be poisoned on unwind. The connection to the writable attribute here is that it implies a write on entry to the function.

@efriedma Does this address your concerns?

seems reasonable to me

nikic added a comment.Mon, Nov 27, 7:51 AM

ping (mainly for @efriedma)

lgtm, but leaving up to @efriedma to stamp.

llvm/test/Transforms/DeadStoreElimination/simple.ll
516–517

Update comment?

nikic updated this revision to Diff 558208.Mon, Dec 4, 12:31 AM

Update comment.

nikic abandoned this revision.Mon, Dec 4, 12:43 AM

I've migrated this patch to GitHub because Phabricator stopped sending emails: https://github.com/llvm/llvm-project/pull/74289